Chapter 5. Monitoring Quality of Service

You can use the grioqos(1M) command to extract and report the QoS metrics that GRIO maintains for each active stream. grioqos loops, repeatedly fetching new QoS statistics from the kernel for the specified application stream or node-level allocation. grioqos displays a header that identifies the system, the filesystem, and the monitored stream (its type, current reservation, and stream ID), as shown in the examples later in this chapter.

This chapter discusses the following:

  • “grioqos Command Line”

  • “GRIO Scheduler”

  • “Monitoring Stream and I/O Metrics”

  • “Quality of Service”

  • “Quality-of-Service Metrics”

  • “grioqos Caveats”

  • “grioqos Examples”

grioqos Command Line

grioqos [options] [streamID|fs] [delay [count]]

-c

Clears the screen before printing each new set of statistics.

-h

Prints a usage message.

-i

Reports the following low-level QoS metrics for all currently configured sampling intervals:

minbw
maxbw
lastbw
minio
maxio
lastio

For details about these metrics, see “Quality-of-Service Metrics”.

-I intervals

Reports the same low-level QoS metrics as -i, but for a specified range of sampling intervals.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-I 5,10,1000ms,2000ms

-l

Lists active streams in an easily parsed form, one per line, with the following fields:

  • Filesystem mount point (or the string <unmounted> if the filesystem is not mounted)

  • Type of the stream

  • Stream ID

  • Reserved bandwidth reported in bytes and msecs

  • Process ID for application-created streams

-m

Enables monitoring mode, which reports the following high-level stream and I/O metrics:

bytes
msecs
bckt
bckt (max)
total
rate
bklg
issd
idle
thrt
wait

See “Monitoring Stream and I/O Metrics” for more information.

-n

Prints a more human-readable version of the most important performance information reported with the -i option, covering the following metrics for all currently configured sampling intervals:

lastbw
minbw
maxio

For more information, see “Quality-of-Service Metrics”.

The minimum bandwidth and maximum average service time are the metrics of most concern when attempting to deliver guaranteed data rates.

-N intervals

Reports the same metrics as -n, but for a specified range of sampling intervals.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-N 5,10,1000ms,2000ms

-o file

Logs output to the specified file.

-r

Resets the specified statistics when used with one of the following options:

  • High-level stream statistics: -m

  • Low-level QoS statistics: -i, -I, -n, -N, -t, or -T

The -r option is ignored if none of these other options is specified.

GRIO continues to update some kernel statistics even when no I/O is being performed (for example, the rate metric reported in -m mode is updated even on an idle stream). To get results that accurately correspond to those seen by a user application, start grioqos with the -r option at the same time that the application test begins.

-R intervals

Reconfigures the kernel QoS monitoring intervals and resets the statistics. This allows you to change the set of sampling intervals used in the kernel to compute recent bandwidth and average service time.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-R 5,10,1000ms,2000ms

By default, GRIO is configured to compute statistics for a wide range of sampling intervals. However, it can be useful to change these intervals using the -R option when a monitored application has a buffering behavior that is not well-matched by the default intervals.


Note: GRIO always configures two additional intervals automatically:

  • The single sample interval, which tracks the best and worst case service times for individual I/Os

  • The maximum interval, which is as large as the kernel data structures can accommodate



-s

Prints a more human-readable summary of active streams than -l. Results are grouped per filesystem and include the following:

  • Stream type

  • Process ID (for application streams)

  • Bandwidth reservation in MB/s

  • Stream IDs (when -v is also specified)

For more information about the output format, see the grioqos(1M) man page.

-t

Displays a per-stream I/O service time histogram for all buckets.

-T buckets

Displays a per-stream I/O service time histogram for the specified buckets. You can specify individual buckets or ranges of buckets. For example, the following would cause the values of 11 histogram buckets to be displayed:

-T 0,1,2,3,20-25,52

-v

Displays verbose output (used with -s).

streamID

Specifies the ID of an active GRIO stream.

fs

Specifies a path that identifies a mounted GRIO-managed filesystem.

delay

Specifies the length of time in seconds that grioqos should sleep before retrieving each new set of statistics.

count

Specifies the total number of samples to be retrieved.
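
For example, assuming streamID identifies an active application stream, the following would collect 10 samples of the high-level stream metrics at 2-second intervals:

grioqos -m streamID 2 10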

If you invoke grioqos without any arguments, it prints a usage message.

To terminate grioqos, press Ctrl-C or send the process a SIGINT signal.

GRIO Scheduler

Interpreting the statistics collected by grioqos requires a basic understanding of the GRIO scheduler.

GRIO uses the token bucket abstraction to limit the average rate and burstiness of I/O flowing to or from the filesystem. Conceptually, each stream has a bucket of tokens, and each token confers the right to issue one unit of I/O. Tokens are added to the bucket at a rate corresponding to the GRIO reservation and accumulate up to the maximum size of the bucket, at which point any further tokens are discarded. When a new I/O request arrives, it is issued if there are sufficient tokens in the bucket; if there are insufficient tokens, it is added to the throttle queue for the stream, where it is held for a short period before the token bucket is checked again. The rate at which tokens accrue to the bucket controls the average rate of the stream; the maximum size of the bucket controls the size of the largest burst of I/O that can be issued.
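
The following is a minimal sketch of the token-bucket decision described above, written in Python purely for illustration; it is not the GRIO kernel implementation, and the names (TokenBucket, try_issue, and so on) are hypothetical.

    # Illustrative token-bucket sketch (not the GRIO kernel code).
    # Tokens are measured in bytes: the fill rate corresponds to the
    # reservation, and the bucket size bounds the largest burst.
    import time

    class TokenBucket:
        def __init__(self, rate_bytes_per_sec, max_bytes):
            self.rate = rate_bytes_per_sec   # reserved average rate
            self.max = max_bytes             # largest permitted burst
            self.tokens = max_bytes          # the bucket starts full
            self.last = time.monotonic()

        def refill(self):
            # Tokens accrue at the reserved rate and are capped at the
            # bucket size; any further tokens are discarded.
            now = time.monotonic()
            self.tokens = min(self.max,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now

        def try_issue(self, io_bytes, throttle_queue):
            # Issue the I/O if enough tokens are available; otherwise
            # place it on the throttle queue to be re-checked shortly.
            self.refill()
            if self.tokens >= io_bytes:
                self.tokens -= io_bytes
                return True                  # caller issues the I/O now
            throttle_queue.append(io_bytes)  # held and retried later
            return False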

The ability to issue a temporary burst of I/O above the reserved data rate is important. It is the mechanism within GRIO by which an application or device that temporarily falls below the required data rate can catch up, thus preserving the required average data rate.

GRIO implements a variation of the weighted round-robin scheduling discipline. At each scheduler activation, it visits each stream in the system and issues as much I/O as it can, up to the limit of the token bucket. The order in which the streams are visited is always the same. To increase the determinism of the resulting I/O flow, GRIO will (on platforms where it is possible) attempt to disable further I/O reordering operations in lower-level devices.
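
As a rough sketch of one such scheduler activation (again illustrative only, building on the hypothetical TokenBucket above; the stream attributes bucket, throttle_queue, and issue are likewise hypothetical):

    # Illustrative scheduler pass (not the GRIO kernel code): visit the
    # streams in a fixed order and issue as much queued I/O for each as
    # its token bucket allows; anything left stays throttled until a
    # later pass.
    def scheduler_pass(streams):
        for stream in streams:               # always the same order
            bucket = stream.bucket
            queue = stream.throttle_queue
            bucket.refill()
            while queue and bucket.tokens >= queue[0]:
                io_bytes = queue.pop(0)
                bucket.tokens -= io_bytes
                stream.issue(io_bytes)       # hand the I/O to the volume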

Monitoring Stream and I/O Metrics

In monitoring mode (enabled with -m), grioqos reports the following metrics:

bytes, msecs 

Reports the current GRIO reservation, expressed as bytes transferred per period of msecs. If the monitored stream is a non-GRIO stream, this includes both the static and dynamic components (and may change as the DBA periodically adjusts the dynamic allocation or if an administrator modifies the static allocation using grioadmin). An application reservation may change if the application uses the grio_modify(3X) call to modify its reservation at runtime.

bckt, bckt (max) 

Describes the current state of the token bucket:

  • bckt measures the current contents of the token bucket in MB. The contents of bckt change continuously as I/O is issued.

  • bckt (max) is the size of the token bucket in MB and the maximum burst of I/O that GRIO will issue to the filesystem. The value of bckt (max) is related to the size of the current reservation and only changes when the reservation is changed.

total, rate 

Describes the amount of data transferred:

  • total is the total amount of data in MB transferred across the stream since it was created or the statistics were reset

  • rate is the overall data rate in MB/s that was achieved

When a stream is first initialized, the token bucket is full, which means that bckt is equal to bckt (max). An unthrottled application can issue a large initial burst of I/O before it drains its token bucket and the GRIO throttle forcibly slows it down. Depending on the size of individual I/Os, the action of the throttle can cause the instantaneous bandwidth to oscillate slightly above and below the guaranteed rate. In these cases, however, the overall data rate including the initial burst is greater than the requested data rate and can be verified with the rate metric (for example, by using grioqos -rm).
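
For example, with the 20-MB/s reservation and 40-MB token bucket shown in the sample output later in this chapter, an unthrottled application could transfer the 40 MB from the initially full bucket plus roughly 200 MB of newly accrued tokens during its first 10 seconds, an overall rate of about 24 MB/s, even though its steady-state rate is throttled to 20 MB/s.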

bklg, issd 

Tracks I/Os being actively processed by the stream:

  • bklg is the backlog of I/O that has been placed on the throttle queue

  • issd is active I/O that has been issued to the volume but has not yet completed

idle, thrt, wait 

Accounts for the utilization of the stream. These are instantaneous metrics that are computed for the period since the last sample (see the sketch after this list):

  • idle is the percentage of the time during which the stream was not processing I/O, that is, there was no active I/O and no I/O on the throttle queue (bklg and issd are both equal to 0)

  • thrt is the percentage of the time during which the stream had I/O on the throttle queue (bklg is non-zero)

  • wait is the percentage of the time during which there was active I/O (issd is non-zero)
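
The following is a minimal sketch of how these percentages relate to the bklg and issd counters, again in Python purely for illustration; it is not the kernel accounting, and the fixed-tick sampling scheme is a simplification. Note that thrt and wait can overlap (both counters can be non-zero at once), so they need not sum to 100.

    # Illustrative utilization sketch (not the GRIO kernel code):
    # classify each tick of a sample period by the stream's counters.
    def utilization(samples):
        # samples is a list of (bklg, issd) pairs observed at a fixed
        # tick during one sample period
        idle = thrt = wait = 0
        for bklg, issd in samples:
            if bklg == 0 and issd == 0:
                idle += 1            # no queued or active I/O
            if bklg > 0:
                thrt += 1            # I/O held on the throttle queue
            if issd > 0:
                wait += 1            # I/O issued but not yet complete
        n = len(samples) or 1
        return (100 * idle // n, 100 * thrt // n, 100 * wait // n)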

The stream utilization metrics (idle, thrt, and wait) can be useful when trying to understand the interaction between an application, the GRIO scheduler, and the storage device. Table 5-1 describes commonly observed behaviors and their corresponding metrics.

Table 5-1. Relationship of Stream Utilization Metrics to Application State

idle: Low, thrt: Low, wait: Low

Expected behavior for a self-throttled application:

  • The application is issuing I/O to the filesystem efficiently, so the stream is rarely idle

  • The application is not issuing I/O at a rate faster than its reservation, as there is little I/O on the throttle queue of the stream

  • I/O is being serviced quickly, suggesting that the filesystem is not currently oversubscribed

idle: Low, thrt: High, wait: Low

Expected behavior for an application being throttled by GRIO.

idle: Any, thrt: Any, wait: High

The application is spending a lot of time waiting for I/O. This may or may not be a problem, but if the application is seeing poor QoS as reported by the -i, -I, -n, -N, -t, or -T options, you should review the qualified bandwidth for this filesystem. Indications of poor QoS are low worst-case bandwidth and high average service times over relatively long sampling intervals.

idle: High, thrt: Any, wait: Any

The stream is spending a lot of time idle. The application may not be issuing I/O to the filesystem efficiently; investigate whether it is using multithreaded or asynchronous I/O. If the desired data rate is not being achieved in userspace, review the behavior of the application.


Quality of Service

Depending on the amount of I/O buffering an application performs, it may be more or less sensitive to variation in I/O service time, also known as jitter. The time scale of concern can vary from tens of seconds for applications that have large buffers and use threaded or asynchronous I/O, to tens of milliseconds for single-threaded applications with little buffering that require a low upper bound on I/O service time.

Approaches to measuring I/O performance often focus on one end of this spectrum or the other, which can be limiting. They measure one of the following:

  • Average bandwidth, which ignores the effects of service interruptions over shorter time intervals

  • Worst-case service time, which for applications that can tolerate more jitter is a stronger criterion than is useful

The GRIO QoS infrastructure provides a configurable mechanism for monitoring performance over the entire range of time scales, from the service times of individual I/Os to the sustained bandwidth over long sampling intervals. It can do so for an individual application over any period of time, without instrumenting or otherwise disrupting the performance of the application.

Quality-of-Service Metrics

Within the kernel, GRIO records the I/O completion times for all recent I/Os to or from a stream. From this high-resolution data, it computes a number of derived metrics that can be efficiently exported to userspace. You can change the monitoring intervals over which these metrics are computed by using grioqos. Sampling intervals can be expressed as either a time t (such as 1000ms) or as a number of individual samples n. For instance, grioqos can display average I/O service time and bandwidth for the last four I/Os, the last 200ms, the last second, and so forth.

GRIO computes the following metrics for each configured sampling interval (a simplified sketch of how they can be derived follows these definitions):

lastbw 

Describes the recent average bandwidth, which is the bandwidth observed over the last t ms or n samples. It is an instantaneous metric describing recent stream activity.

minbw, maxbw 

Describes the minimum and maximum values of lastbw. These metrics track the worst- and best-case bandwidth delivered over any continuous interval of the specified length since the creation of the stream or the last time the statistics were reset.

lastio 

Describes the average I/O service time for I/Os over the last t ms or n samples. When n is 1, this metric records the actual service times of individual I/Os. When n is greater than 1, this metric is the average of the observed service times. It is an instantaneous metric describing recent stream activity.

minio, maxio 

Describes the minimum and maximum values of lastio. Like minbw and maxbw, these metrics track the worst- and best-case average service times delivered over any continuous interval of the specified length since the statistics were initialized or last reset.
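
Taken together, these metrics amount to a sliding-window computation over recent I/O completion records. The following Python sketch, for a count-based window of the n most recent I/Os (n of at least 2), is illustrative only; it is not how the kernel stores or computes the data, and the names are hypothetical. Time-based intervals (t ms) work the same way but select records by completion time rather than by count.

    # Illustrative QoS-metric sketch (not the GRIO kernel code).
    # Each completion record is (service_time_ms, io_bytes, completed_at_sec).
    from collections import deque

    class QosWindow:
        def __init__(self, n_samples):
            self.window = deque(maxlen=n_samples)
            self.minbw = self.maxbw = None   # running extremes of lastbw
            self.minio = self.maxio = None   # running extremes of lastio

        def record(self, service_time_ms, io_bytes, completed_at):
            self.window.append((service_time_ms, io_bytes, completed_at))
            if len(self.window) < self.window.maxlen:
                return None, None            # insufficient data ("-")
            elapsed = completed_at - self.window[0][2]
            if elapsed <= 0:
                return None, None
            total_bytes = sum(b for _, b, _ in self.window)
            lastbw = total_bytes / elapsed / 1e6              # MB/s
            lastio = (sum(t for t, _, _ in self.window)
                      / len(self.window))                     # average ms
            self.minbw = lastbw if self.minbw is None else min(self.minbw, lastbw)
            self.maxbw = lastbw if self.maxbw is None else max(self.maxbw, lastbw)
            self.minio = lastio if self.minio is None else min(self.minio, lastio)
            self.maxio = lastio if self.maxio is None else max(self.maxio, lastio)
            return lastbw, lastio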

grioqos Caveats

There is a size restriction on the kernel structures used to hold recent I/O statistics. If a requested metric cannot be computed because there is insufficient data, a single hyphen (-) is printed. This can also happen when the QoS metrics have been recently reset using the -r or -R options. For example, requesting a sampling interval of 10000ms may display only a hyphen (-) because the GRIO kernel structures cannot hold enough individual samples to compute an average over ten seconds. However, for most I/O rates and sampling intervals, the kernel structures should be adequate.
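
For example, at the 20-MB/s reservation and roughly 8-MB I/O size used in the examples in this chapter, only two or three I/Os complete per second, so averaging over 10000ms requires on the order of 25 recent samples; if the kernel structures cannot hold that many, a hyphen is displayed for that interval.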

Use care when interpreting the low-level QoS statistics. A number of the bandwidth and service time measures only make sense if they have been recorded during a period of continuous, consistent application I/O (for example, for a video playout).

The lastbw and maxbw metrics are meaningful regardless of the behavior of the application. However, minbw tracks all interruptions to the flow of I/O. This includes interruptions due to the normal operation of the application as opposed to an actual service interruption in the filesystem or device. Thus, if the application stops and starts I/O during the sampling period, this will be recorded in the minbw, which will in turn be of little use in detecting a real service interruption and is unlikely to provide any useful insight into the performance of the application and system.

Similarly, the lastio metric is most useful if the application uses a consistent request size when issuing I/O to the filesystem. If the application issues I/O of widely varying sizes, then the service time is perturbed both by filesystem and device issues and by the behavior of the application, which makes it very difficult to determine the origin of a performance issue. This is particularly true for the non-GRIO stream, which manages all of the I/O on a node that does not otherwise have an explicit GRIO reservation. This includes the following:

  • Direct I/O from applications that do not have a GRIO reservation

  • Buffered I/O from all sources via the buffer cache (or whatever the native filesystem caching mechanism is for the platform)

  • All other system I/O to the managed filesystem

The result is that the non-GRIO stream may see a large variation in I/O sizes and the average service time of those I/Os is unlikely to provide useful insight into the performance of the system.

grioqos Examples

This section shows grioqos used to monitor a GRIO-aware application. High-level stream and low-level quality-of-service metrics are collected. The application is temporarily suspended to show the effect on the stream utilization and average data rate. The example filesystem /mirror has a qualified bandwidth of 30 MB/s.

  1. Confirm the available bandwidth on /mirror:

    $ grioadmin -a /mirror
    29.94 MB/s available on /mirror
    0.06 MB/s allocated to this node

    There are just under 30 MB/s available, and a minimal dynamic allocation. Now we start the test application, which makes a 20-MB/s reservation and starts performing reads as fast as it can. The I/O size is just under 8 MB. The application is multithreaded and configured to have up to four I/Os active.

  2. List the active streams and get the stream ID of the application's GRIO stream:

    $ grioqos -sv
    /mirror:
      Dynamic          0.06 MB/s  b77c9351-7b63-1029-8f56-08006913a7f7
      App (6754151)   20.00 MB/s  03041498-871c-1029-87e2-08006913a7f7

  3. Monitor the application stream:

    $ grioqos -m 03041498-871c-1029-87e2-08006913a7f7 1
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -           bytes msecs  bckt (max)  total rate  bklg issd idle thrt wait
    -           bytes    ms    MB    MB     MB MB/s    MB    MB   %    %    %
    21:00:38 20971520  1000 26.97 40.00   0.00 0.00  0.00 15.82   -    -    -
    21:00:39 20971520  1000  7.63 40.00  31.64 25.5  0.00 23.73   0    5  100
    21:00:40 20971520  1000  4.12 40.00  63.28 28.1 15.82 15.82   0   88  100
    21:00:41 20971520  1000  0.61 40.00  94.92 29.1 23.73  7.91   0  100   85
    21:00:42 20971520  1000  5.01 40.00 110.74 25.9 23.73  7.91   0  100   72
    21:00:43 20971520  1000  1.49 40.00 134.47 25.5 23.73  7.91   0  100   55
    21:00:44 20971520  1000  5.89 40.00 158.20 25.2 31.64  0.00   0  100   71
    21:00:45 20971520  1000  2.41 40.00 174.02 23.8 23.73  7.91   0  100   60
    21:00:46 20971520  1000  6.80 40.00 197.75 23.8 31.64  0.00   0  100   65
    21:00:47 20971520  1000  3.29 40.00 213.57 22.9 23.73  7.91   0  100   66
    21:00:48 20971520  1000  7.69 40.00 237.30 23.0 31.64  0.00   0  100   61
    21:00:49 20971520  1000  4.18 40.00 253.12 22.3 23.73  7.91   0  100   70
    21:00:50 20971520  1000  0.67 40.00 276.86 22.4 23.73  7.91   0  100   55
    21:00:51 20971520  1000  5.13 40.00 292.68 21.9 23.73  7.91   0  100   70
    ...

    The first few samples show that the token bucket (bckt) is initially full, which allows the overall data rate (rate) to jump briefly above the reserved 20 MB/s (see “Monitoring Stream and I/O Metrics”).

    The stream utilization metrics idle, thrt, and wait show that while the application is draining its token bucket, the application spends all of its time waiting for I/O to the device. Very quickly, the token bucket empties completely and GRIO begins to throttle the application. thrt jumps to 100%. wait drops to around 60-70%, which is consistent with the qualified bandwidth.

    The maximum this filesystem can deliver is 30 MB/s; therefore, a reservation of 20 MB/s should keep the filesystem active approximately two-thirds of the time, which is what we see. The application is clearly very efficient about issuing I/O to the filesystem (multithreaded with four active I/Os), because there is never any point when the stream is idle and the filesystem does not have I/O to process.
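
    That is, the reserved 20 MB/s divided by the 30-MB/s qualified bandwidth is roughly 67%, which is consistent with the wait values of approximately 60-70% seen once the throttle takes effect.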

  4. To simulate an interruption, temporarily suspend the application in userspace (by sending it a SIGSTOP signal). The grioqos -m output changes as follows:

    21:01:04 20971520  1000  7.42 40.00 561.62 21.2 15.82  0.00   0  100   61
    21:01:05 20971520  1000 11.82 40.00 577.44 21.0  0.00  0.00  31   44   49
    21:01:06 20971520  1000 32.04 40.00 577.44 20.2  0.00  0.00 100    0    0
    21:01:07 20971520  1000 40.00 40.00 577.44 19.5  0.00  0.00 100    0    0
    21:01:08 20971520  1000 40.00 40.00 577.44 18.9  0.00  0.00 100    0    0
    ...

    The application stops issuing I/O completely, and the utilization metrics change immediately:

    • The token bucket fills

    • Any remaining I/O on the throttle queue drains out (thrt goes to 0)

    • The stream becomes completely idle


    Note: The rate metric, which computes the overall data rate, is updated even while the stream is idle and gradually decreases during this period of inactivity.


  5. Restart the application. The grioqos -m output changes accordingly:

    21:01:12 20971520  1000 16.10 40.00 593.26 17.1  0.00 23.73  23    0   77
    21:01:13 20971520  1000  4.80 40.00 632.81 17.8 15.82 15.82   0   70   99
    21:01:14 20971520  1000  1.53 40.00 664.45 18.1 23.73  7.91   0  100  100
    21:01:15 20971520  1000  5.93 40.00 688.18 18.3 31.64  0.00   0  100   66
    21:01:16 20971520  1000  2.42 40.00 704.00 18.2 23.73  7.91   0  100   60
    21:01:17 20971520  1000  6.82 40.00 727.73 18.3 31.64  0.00   0  100   64
    21:01:18 20971520  1000  3.31 40.00 743.55 18.3 23.73  7.91   0  100   63
    21:01:19 20971520  1000  7.71 40.00 767.29 18.4 31.64  0.00   0  100   63

    There is a small initial burst as the token bucket is drained and GRIO throttles the application to 20 MB/s.

  6. During the same run, we collect low-level QoS statistics. At the start of the run, use -i to display all of the intervals that are being monitored in the kernel:

    $ grioqos -i 03041498-871c-1029-87e2-08006913a7f7 1
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -          interval   minbw   maxbw  lastbw  minio  maxio lastio
    -                 -    MB/s    MB/s    MB/s     ms     ms     ms
    21:00:38        1io       -       -       -  296.8 1004.3 1004.3
    +               2io   29.32   32.89   32.89  402.2  967.8  967.8
    +               3io   30.68   31.94   31.00  505.0  882.0  882.0
    +               4io   31.01   31.38   31.38  611.6  788.4  788.4
    +               5io   31.46   31.46   31.46  690.1  690.1  690.1
    +               6io       -       -       -      -      -      -
    +              10io       -       -       -      -      -      -
    +             100ms   29.32   32.89   32.89  402.2  967.8  967.8
    +             200ms   29.32   32.89   32.89  402.2  967.8  967.8
    +             500ms   30.68   31.00   31.00  716.5  882.0  882.0
    +            1000ms   31.46   31.46   31.46  690.1  690.1  690.1
    +            2000ms       -       -       -      -      -      -
    +            5000ms       -       -       -      -      -      -
    +            1500io       -       -       -      -      -      -

    There are 14 intervals being monitored for this stream. This sample was collected just after the application was started, when only a small number of I/Os had been issued. There is insufficient data to compute some of these metrics, so a number of values are displayed as “-”.

  7. Select two intervals (500ms and 2000ms) and monitor them during the course of the run:

    $ grioqos -I "500ms,2000ms" 03041498-871c-1029-87e2-08006913a7f7 2
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -          interval   minbw   maxbw  lastbw  minio  maxio lastio
    -                 -    MB/s    MB/s    MB/s     ms     ms     ms
    21:00:38      500ms       -       -       -      -      -      -
    +            2000ms       -       -       -      -      -      -
    21:00:40      500ms   30.37   31.93   31.48  479.0 1009.0  959.2
    +            2000ms   31.46   31.46   31.46  789.4  789.4  789.4
    21:00:42      500ms   18.91   32.31   20.14  479.0 1224.1 1224.1
    +            2000ms   25.37   31.72   25.37  789.4 1057.5 1057.5
    21:00:44      500ms   18.91   32.31   20.75  479.0 1588.4 1583.9
    +            2000ms   19.74   31.72   20.38  789.4 1527.7 1527.7
    ...

    As seen in the high-level metrics, there is an initial burst of I/O before the application is throttled by GRIO. The current bandwidth lastbw quickly stabilizes at around 20 MB/s. After the application is suspended in userspace, the low-level QoS statistics clearly record the interruption:

    21:01:13      500ms    1.15   32.31   27.27  479.0 1609.3  554.7
    +            2000ms    1.15   31.72    3.19  789.4 1594.4  816.3
    21:01:15      500ms    1.15   34.95   31.96  479.0 1609.3  988.3
    +            2000ms    1.15   33.08   33.08  788.7 1594.4  861.1
    ...