Chapter 5. Monitoring Quality of Service

You can use the grioqos(1M) command to extract and report the QoS metrics that GRIO maintains for each active stream. grioqos loops, repeatedly fetching new QoS statistics from the kernel for the specified application stream or node-level allocation. grioqos displays a header that identifies the system, the filesystem, and the monitored stream (its type, current reservation, and stream ID), as shown in the examples later in this chapter.

This chapter discusses the following:

  • “grioqos Command Line”

  • “GRIO Scheduler”

  • “Monitoring Stream and I/O Metrics”

  • “Quality of Service”

  • “Quality-of-Service Metrics”

  • “grioqos Caveats”

  • “grioqos Examples”

grioqos Command Line

grioqos [options] [streamID|fs] [delay [count]]

-c

Clears the screen before printing each new set of statistics.

-h

Prints a usage message.

-i

Reports the following low-level QoS metrics for all currently configured sampling intervals:

minbw
maxbw
lastbw
minio
maxio
lastio

For details about these metrics, see “Quality-of-Service Metrics”.

-I intervals

Reports the same low-level QoS metrics as -i, but for a specified range of sampling intervals.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-I 5,10,1000ms,2000ms

-l

Lists active streams in an easily parsed form, one per line, with the following fields:

  • Filesystem mount point (or the string <unmounted> if the filesystem is not mounted)

  • Type of the stream

  • Stream ID

  • Reserved bandwidth reported in bytes and msecs

  • Process ID for application-created streams

-m

Enables monitoring mode, which reports the following high-level stream and I/O metrics:

bytes
msecs
bckt
bckt (max)
total
rate
bklg
issd
idle
thrt
wait

See “Monitoring Stream and I/O Metrics” for more information.

-n

Prints a more human-readable version of the most important performance information reported with the -i option, covering the following metrics for all currently configured sampling intervals:

lastbw
minbw
maxio

For more information, see “Quality-of-Service Metrics”.

The minimum bandwidth and maximum average service time are the metrics of most concern when attempting to deliver guaranteed data rates.

-N intervals

Reports the same metrics as -n, but for a specified range of sampling intervals.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-N 5,10,1000ms,2000ms

-o file

Logs output to the specified file.

-r

Resets the specified statistics when used with one of the following options:

  • High-level stream statistics: -m

  • Low-level QoS statistics: -i, -I, -n, -N, -t, or -T

The -r option is ignored if none of these other options is specified.

GRIO continues to update some kernel statistics even when no I/O is being performed (for example, the rate metric reported in -m mode is updated even on an idle stream). To get results that accurately correspond to those seen by a user application, start grioqos with the -r option at the same time that the application test begins.

-R intervals

Reconfigures the kernel QoS monitoring intervals and resets the statistics. This allows you to change the set of sampling intervals used in the kernel to compute recent bandwidth and average service time.

intervals is a comma-separated list of sampling intervals expressed as either a number of I/Os or a time interval in msecs. For example, the following would report results averaged over the last 5 and 10 I/Os, and over the last 1 and 2 seconds, respectively:

-R 5,10,1000ms,2000ms

By default, GRIO is configured to compute statistics for a wide range of sampling intervals. However, it can be useful to change these intervals using the -R option when a monitored application has a buffering behavior that is not well-matched by the default intervals.


Note: GRIO always configures two additional intervals automatically:

  • The single sample interval, which tracks the best and worst case service times for individual I/Os

  • The maximum interval, which is as large as the kernel data structures can accommodate



-s

Prints a more human-readable summary of active streams than -l. Results are grouped per filesystem and include the following:

  • Stream type

  • Process ID (for application streams)

  • Bandwidth reservation in MB/s

  • Stream IDs (when -v is also specified)

For more information about the output format, see the grioqos(1M) man page.

-t

Displays a per-stream I/O service time histogram for all buckets.

-T buckets

Displays a per-stream I/O service time histogram for the specified buckets. You can specify individual buckets or ranges of buckets. For example, the following would cause the values of 11 histogram buckets to be displayed:

-T 0,1,2,3,20-25,52

-v

Displays verbose output (used with -s).

streamID

Specifies the ID of an active GRIO stream.

fs

Specifies a path that identifies a mounted GRIO-managed filesystem.

delay

Specifies the length of time in seconds that grioqos should sleep before retrieving each new set of statistics.

count

Specifies the total number of samples to be retrieved.
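
For example, assuming streamID identifies an active application stream, the following would collect 10 samples of the high-level stream metrics at 2-second intervals:

grioqos -m streamID 2 10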

If you invoke grioqos without any arguments, it prints a usage message.

To terminate grioqos, press Ctrl-C or send the process a SIGINT signal.

GRIO Scheduler

Interpreting the statistics collected by grioqos requires a basic understanding of the GRIO scheduler.

GRIO uses the token bucket abstraction to limit the average rate and burstiness of I/O flowing to or from the filesystem. Conceptually, each stream has a bucket of tokens, and each token confers the right to issue one unit of I/O. Tokens are added to the bucket at a rate corresponding to the GRIO reservation and accumulate up to the maximum size of the bucket, at which point any further tokens are discarded. When a new I/O request arrives, it is issued if there are sufficient tokens in the bucket; if there are insufficient tokens, it is added to the throttle queue for the stream, where it is held for a short period before the token bucket is checked again. The rate at which tokens accrue to the bucket controls the average rate of the stream; the maximum size of the bucket controls the size of the largest burst of I/O that can be issued.
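
The following is a minimal sketch of the token-bucket decision described above, written in Python purely for illustration; it is not the GRIO kernel implementation, and the names (TokenBucket, try_issue, and so on) are hypothetical.

    # Illustrative token-bucket sketch (not the GRIO kernel code).
    # Tokens are measured in bytes: the fill rate corresponds to the
    # reservation, and the bucket size bounds the largest burst.
    import time

    class TokenBucket:
        def __init__(self, rate_bytes_per_sec, max_bytes):
            self.rate = rate_bytes_per_sec   # reserved average rate
            self.max = max_bytes             # largest permitted burst
            self.tokens = max_bytes          # the bucket starts full
            self.last = time.monotonic()

        def refill(self):
            # Tokens accrue at the reserved rate and are capped at the
            # bucket size; any further tokens are discarded.
            now = time.monotonic()
            self.tokens = min(self.max,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now

        def try_issue(self, io_bytes, throttle_queue):
            # Issue the I/O if enough tokens are available; otherwise
            # place it on the throttle queue to be re-checked shortly.
            self.refill()
            if self.tokens >= io_bytes:
                self.tokens -= io_bytes
                return True                  # caller issues the I/O now
            throttle_queue.append(io_bytes)  # held and retried later
            return False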

The ability to issue a temporary burst of I/O above the reserved data rate is important. It is the mechanism within GRIO by which an application or device that temporarily falls below the required data rate can catch up, thus preserving the required average data rate.

GRIO implements a variation of the weighted round-robin scheduling discipline. At each scheduler activation, it visits each stream in the system and issues as much I/O as it can, up to the limit of the token bucket. The order in which the streams are visited is always the same. To increase the determinism of the resulting I/O flow, GRIO will (on platforms where it is possible) attempt to disable further I/O reordering operations in lower-level devices.
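
As a rough sketch of one such scheduler activation (again illustrative only, building on the hypothetical TokenBucket above; the stream attributes bucket, throttle_queue, and issue are likewise hypothetical):

    # Illustrative scheduler pass (not the GRIO kernel code): visit the
    # streams in a fixed order and issue as much queued I/O for each as
    # its token bucket allows; anything left stays throttled until a
    # later pass.
    def scheduler_pass(streams):
        for stream in streams:               # always the same order
            bucket = stream.bucket
            queue = stream.throttle_queue
            bucket.refill()
            while queue and bucket.tokens >= queue[0]:
                io_bytes = queue.pop(0)
                bucket.tokens -= io_bytes
                stream.issue(io_bytes)       # hand the I/O to the volume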

Monitoring Stream and I/O Metrics

In monitoring mode (enabled with -m), grioqos reports the following metrics:

bytes, msecs 

Reports the current GRIO reservation, expressed as bytes transferred per period of msecs. If the monitored stream is a non-GRIO stream, this includes both the static and dynamic components (and may change as the DBA periodically adjusts the dynamic allocation or if an administrator modifies the static allocation using grioadmin). An application reservation may change if the application uses the grio_modify(3X) call to modify its reservation at runtime.

bckt, bckt (max) 

Describes the current state of the token bucket:

  • bckt measures the current contents of the token bucket in MB. The contents of bckt change continuously as I/O is issued.

  • bckt (max) is the size of the token bucket in MB and the maximum burst of I/O that GRIO will issue to the filesystem. The value of bckt (max) is related to the size of the current reservation and only changes when the reservation is changed.

total, rate 

Describes the amount of data transferred:

  • total is the total amount of data in MB transferred across the stream since it was created or the statistics were reset

  • rate is the overall data rate in MB/s that was achieved

When a stream is first initialized, the token bucket is full, which means that bckt is equal to bckt (max). An unthrottled application can issue a large initial burst of I/O before it drains its token bucket and the GRIO throttle forcibly slows it down. Depending on the size of individual I/Os, the action of the throttle can cause the instantaneous bandwidth to oscillate slightly above and below the guaranteed rate. In these cases, however, the overall data rate including the initial burst is greater than the requested data rate and can be verified with the rate metric (for example, by using grioqos -rm).
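
For example, with the 20-MB/s reservation and 40-MB token bucket shown in the sample output later in this chapter, an unthrottled application could transfer the 40 MB from the initially full bucket plus roughly 200 MB of newly accrued tokens during its first 10 seconds, an overall rate of about 24 MB/s, even though its steady-state rate is throttled to 20 MB/s.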

bklg, issd 

Tracks I/Os being actively processed by the stream:

  • bklg is the backlog of I/O that has been placed on the throttle queue

  • issd is active I/O that has been issued to the volume but has not yet completed

idle, thrt, wait 

Accounts for the utilization of the stream. These are instantaneous metrics that are computed for the period since the last sample (see the sketch after this list):

  • idle is the percentage of the time during which the stream was not processing I/O, that is, there was no active I/O and no I/O on the throttle queue (bklg and issd are both equal to 0)

  • thrt is the percentage of the time during which the stream had I/O on the throttle queue (bklg is non-zero)

  • wait is the percentage of the time during which there was active I/O (issd is non-zero)
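
The following is a minimal sketch of how these percentages relate to the bklg and issd counters, again in Python purely for illustration; it is not the kernel accounting, and the fixed-tick sampling scheme is a simplification. Note that thrt and wait can overlap (both counters can be non-zero at once), so they need not sum to 100.

    # Illustrative utilization sketch (not the GRIO kernel code):
    # classify each tick of a sample period by the stream's counters.
    def utilization(samples):
        # samples is a list of (bklg, issd) pairs observed at a fixed
        # tick during one sample period
        idle = thrt = wait = 0
        for bklg, issd in samples:
            if bklg == 0 and issd == 0:
                idle += 1            # no queued or active I/O
            if bklg > 0:
                thrt += 1            # I/O held on the throttle queue
            if issd > 0:
                wait += 1            # I/O issued but not yet complete
        n = len(samples) or 1
        return (100 * idle // n, 100 * thrt // n, 100 * wait // n)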

The stream utilization metrics (idle, thrt, and wait) can be useful when trying to understand the interaction between an application, the GRIO scheduler, and the storage device. Table 5-1 describes commonly observed behaviors and their corresponding metrics.

Table 5-1. Relationship of Stream Utilization Metrics to Application State

idle: Low, thrt: Low, wait: Low

Expected behavior for a self-throttled application:

  • The application is issuing I/O to the filesystem efficiently, so the stream is rarely idle

  • The application is not issuing I/O at a rate faster than its reservation, as there is little I/O on the throttle queue of the stream

  • I/O is being serviced quickly, suggesting that the filesystem is not currently oversubscribed

idle: Low, thrt: High, wait: Low

Expected behavior for an application being throttled by GRIO.

idle: Any, thrt: Any, wait: High

The application is spending a lot of time waiting for I/O. This may or may not be a problem, but if the application is seeing poor QoS as reported by the -i, -I, -n, -N, -t, or -T options, you should review the qualified bandwidth for this filesystem. Indications of poor QoS are low worst-case bandwidth and high average service times over relatively long sampling intervals.

idle: High, thrt: Any, wait: Any

The stream is spending a lot of time idle. The application may not be issuing I/O to the filesystem efficiently; investigate whether it is using multithreaded or asynchronous I/O. If the desired data rate is not being achieved in userspace, review the behavior of the application.


Quality of Service

Depending on the amount of I/O buffering an application performs, it may be more or less sensitive to variation in I/O service time, also known as jitter. The time scale of concern can vary from tens of seconds for applications that have large buffers and use threaded or asynchronous I/O, to tens of milliseconds for single-threaded applications with little buffering that require a low upper bound on I/O service time.

Approaches to measuring I/O performance often focus on one end of this spectrum or the other, which can be limiting. They measure one of the following:

  • Average bandwidth, which ignores the effects of service interruptions over shorter time intervals

  • Worst-case service time, which for applications that can tolerate more jitter is a stronger criterion than is useful

The GRIO QoS infrastructure provides a configurable mechanism for monitoring performance over the entire range of time scales, from the service times of individual I/Os to the sustained bandwidth over long sampling intervals. It can do so for an individual application over any period of time, without instrumenting or otherwise disrupting the performance of the application.

Quality-of-Service Metrics

Within the kernel, GRIO records the I/O completion times for all recent I/Os to or from a stream. From this high-resolution data, it computes a number of derived metrics that can be efficiently exported to userspace. You can change the monitoring intervals over which these metrics are computed by using grioqos. Sampling intervals can be expressed as either a time t (such as 1000ms) or as a number of individual samples n. For instance, grioqos can display average I/O service time and bandwidth for the last four I/Os, the last 200ms, the last second, and so forth.

GRIO computes the following metrics for each configured sampling interval (a simplified sketch of how they can be derived follows these definitions):

lastbw 

Describes the recent average bandwidth, which is the bandwidth observed over the last t ms or n samples. It is an instantaneous metric describing recent stream activity.

minbw, maxbw 

Describes the minimum and maximum values of lastbw. These metrics track the worst- and best-case bandwidth delivered over any continuous interval of the specified length since the creation of the stream or the last time the statistics were reset.

lastio 

Describes the average I/O service time for I/Os over the last t ms or n samples. When n is 1, this metric records the actual service times of individual I/Os. When n is greater than 1, this metric is the average of the observed service times. It is an instantaneous metric describing recent stream activity.

minio, maxio 

Describes the minimum and maximum values of lastio. Like minbw and maxbw, these metrics track the worst- and best-case average service times delivered over any continuous interval of the specified length since the statistics were initialized or last reset.
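
Taken together, these metrics amount to a sliding-window computation over recent I/O completion records. The following Python sketch, for a count-based window of the n most recent I/Os (n of at least 2), is illustrative only; it is not how the kernel stores or computes the data, and the names are hypothetical. Time-based intervals (t ms) work the same way but select records by completion time rather than by count.

    # Illustrative QoS-metric sketch (not the GRIO kernel code).
    # Each completion record is (service_time_ms, io_bytes, completed_at_sec).
    from collections import deque

    class QosWindow:
        def __init__(self, n_samples):
            self.window = deque(maxlen=n_samples)
            self.minbw = self.maxbw = None   # running extremes of lastbw
            self.minio = self.maxio = None   # running extremes of lastio

        def record(self, service_time_ms, io_bytes, completed_at):
            self.window.append((service_time_ms, io_bytes, completed_at))
            if len(self.window) < self.window.maxlen:
                return None, None            # insufficient data ("-")
            elapsed = completed_at - self.window[0][2]
            if elapsed <= 0:
                return None, None
            total_bytes = sum(b for _, b, _ in self.window)
            lastbw = total_bytes / elapsed / 1e6              # MB/s
            lastio = (sum(t for t, _, _ in self.window)
                      / len(self.window))                     # average ms
            self.minbw = lastbw if self.minbw is None else min(self.minbw, lastbw)
            self.maxbw = lastbw if self.maxbw is None else max(self.maxbw, lastbw)
            self.minio = lastio if self.minio is None else min(self.minio, lastio)
            self.maxio = lastio if self.maxio is None else max(self.maxio, lastio)
            return lastbw, lastio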

grioqos Caveats

There is a size restriction on the kernel structures used to hold recent I/O statistics. If a requested metric cannot be computed because there is insufficient data, a single hyphen (-) is printed. This can also happen when the QoS metrics have been recently reset using the -r or -R options. For example, requesting a sampling interval of 10000ms may display only a hyphen (-) because the GRIO kernel structures cannot hold enough individual samples to compute an average over ten seconds. However, for most I/O rates and sampling intervals, the kernel structures should be adequate.
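
For example, at the 20-MB/s reservation and roughly 8-MB I/O size used in the examples in this chapter, only two or three I/Os complete per second, so averaging over 10000ms requires on the order of 25 recent samples; if the kernel structures cannot hold that many, a hyphen is displayed for that interval.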

Use care when interpreting the low-level QoS statistics. A number of the bandwidth and service time measures only make sense if they have been recorded during a period of continuous, consistent application I/O (for example, for a video playout).

The lastbw and maxbw metrics are meaningful regardless of the behavior of the application. However, minbw tracks all interruptions to the flow of I/O. This includes interruptions due to the normal operation of the application as opposed to an actual service interruption in the filesystem or device. Thus, if the application stops and starts I/O during the sampling period, this will be recorded in the minbw, which will in turn be of little use in detecting a real service interruption and is unlikely to provide any useful insight into the performance of the application and system.

Similarly, the lastio metric is most useful if the application uses a consistent request size when issuing I/O to the filesystem. If the application issues I/O of widely varying sizes, then the service time is perturbed both by filesystem and device issues and by the behavior of the application, which makes it very difficult to determine the origin of a performance issue. This is particularly true for the non-GRIO stream, which manages all of the I/O on a node that does not otherwise have an explicit GRIO reservation. This includes the following:

  • Direct I/O from applications that do not have a GRIO reservation

  • Buffered I/O from all sources via the buffer cache (or whatever the native filesystem caching mechanism is for the platform)

  • All other system I/O to the managed filesystem

The result is that the non-GRIO stream may see a large variation in I/O sizes and the average service time of those I/Os is unlikely to provide useful insight into the performance of the system.

grioqos Examples

This section shows grioqos used to monitor a GRIO-aware application. High-level stream and low-level quality-of-service metrics are collected. The application is temporarily suspended to show the effect on the stream utilization and average data rate. The example filesystem /mirror has a qualified bandwidth of 30 MB/s.

  1. Confirm the available bandwidth on /mirror:

    $ grioadmin -a /mirror
    29.94 MB/s available on /mirror
    0.06 MB/s allocated to this node

    There are just under 30 MB/s available, and a minimal dynamic allocation. Now we start the test application, which makes a 20-MB/s reservation and starts performing reads as fast as it can. The I/O size is just under 8 MB. The application is multithreaded and configured to have up to four I/Os active.

  2. List the active streams and get the stream ID of the application's GRIO stream:

    $ grioqos -sv
    /mirror:
      Dynamic          0.06 MB/s  b77c9351-7b63-1029-8f56-08006913a7f7
      App (6754151)   20.00 MB/s  03041498-871c-1029-87e2-08006913a7f7

  3. Monitor the application stream:

    $ grioqos -m 03041498-871c-1029-87e2-08006913a7f7 1
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -           bytes msecs  bckt (max)  total rate  bklg issd idle thrt wait
    -           bytes    ms    MB    MB     MB MB/s    MB    MB   %    %    %
    21:00:38 20971520  1000 26.97 40.00   0.00 0.00  0.00 15.82   -    -    -
    21:00:39 20971520  1000  7.63 40.00  31.64 25.5  0.00 23.73   0    5  100
    21:00:40 20971520  1000  4.12 40.00  63.28 28.1 15.82 15.82   0   88  100
    21:00:41 20971520  1000  0.61 40.00  94.92 29.1 23.73  7.91   0  100   85
    21:00:42 20971520  1000  5.01 40.00 110.74 25.9 23.73  7.91   0  100   72
    21:00:43 20971520  1000  1.49 40.00 134.47 25.5 23.73  7.91   0  100   55
    21:00:44 20971520  1000  5.89 40.00 158.20 25.2 31.64  0.00   0  100   71
    21:00:45 20971520  1000  2.41 40.00 174.02 23.8 23.73  7.91   0  100   60
    21:00:46 20971520  1000  6.80 40.00 197.75 23.8 31.64  0.00   0  100   65
    21:00:47 20971520  1000  3.29 40.00 213.57 22.9 23.73  7.91   0  100   66
    21:00:48 20971520  1000  7.69 40.00 237.30 23.0 31.64  0.00   0  100   61
    21:00:49 20971520  1000  4.18 40.00 253.12 22.3 23.73  7.91   0  100   70
    21:00:50 20971520  1000  0.67 40.00 276.86 22.4 23.73  7.91   0  100   55
    21:00:51 20971520  1000  5.13 40.00 292.68 21.9 23.73  7.91   0  100   70
    ...

    The first few samples show that the token bucket (bckt) is initially full, which allows the overall data rate (rate) to jump briefly above the reserved 20 MB/s (see “Monitoring Stream and I/O Metrics”).

    The stream utilization metrics idle, thrt, and wait show that while the application is draining its token bucket, the application spends all of its time waiting for I/O to the device. Very quickly, the token bucket empties completely and GRIO begins to throttle the application. thrt jumps to 100%. wait drops to around 60-70%, which is consistent with the qualified bandwidth.

    The maximum this filesystem can deliver is 30 MB/s; therefore, a reservation of 20 MB/s should keep the filesystem active approximately two-thirds of the time, which is what we see. The application is clearly very efficient about issuing I/O to the filesystem (multithreaded with four active I/Os), because there is never any point when the stream is idle and the filesystem does not have I/O to process.
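
    That is, the reserved 20 MB/s divided by the 30-MB/s qualified bandwidth is roughly 67%, which is consistent with the wait values of approximately 60-70% seen once the throttle takes effect.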

  4. To simulate an interruption, temporarily suspend the application in userspace (by sending it a SIGSTOP signal). The grioqos -m output changes as follows:

    21:01:04 20971520  1000  7.42 40.00 561.62 21.2 15.82  0.00   0  100   61
    21:01:05 20971520  1000 11.82 40.00 577.44 21.0  0.00  0.00  31   44   49
    21:01:06 20971520  1000 32.04 40.00 577.44 20.2  0.00  0.00 100    0    0
    21:01:07 20971520  1000 40.00 40.00 577.44 19.5  0.00  0.00 100    0    0
    21:01:08 20971520  1000 40.00 40.00 577.44 18.9  0.00  0.00 100    0    0
    ...

    The application stops issuing I/O completely, and the utilization metrics change immediately:

    • The token bucket fills

    • Any remaining I/O on the throttle queue drains out (thrt goes to 0)

    • The stream becomes completely idle


    Note: The rate metric, which computes the overall data rate, is updated even while the stream is idle and gradually decreases during this period of inactivity.


  5. Restart the application. The grioqos -m output changes accordingly:

    21:01:12 20971520  1000 16.10 40.00 593.26 17.1  0.00 23.73  23    0   77
    21:01:13 20971520  1000  4.80 40.00 632.81 17.8 15.82 15.82   0   70   99
    21:01:14 20971520  1000  1.53 40.00 664.45 18.1 23.73  7.91   0  100  100
    21:01:15 20971520  1000  5.93 40.00 688.18 18.3 31.64  0.00   0  100   66
    21:01:16 20971520  1000  2.42 40.00 704.00 18.2 23.73  7.91   0  100   60
    21:01:17 20971520  1000  6.82 40.00 727.73 18.3 31.64  0.00   0  100   64
    21:01:18 20971520  1000  3.31 40.00 743.55 18.3 23.73  7.91   0  100   63
    21:01:19 20971520  1000  7.71 40.00 767.29 18.4 31.64  0.00   0  100   63

    There is a small initial burst as the token bucket is drained and GRIO throttles the application to 20 MB/s.

  6. During the same run, we collect low-level QoS statistics. At the start of the run, use -i to display all of the intervals that are being monitored in the kernel:

    $ grioqos -i 03041498-871c-1029-87e2-08006913a7f7 1
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -          interval   minbw   maxbw  lastbw  minio  maxio lastio
    -                 -    MB/s    MB/s    MB/s     ms     ms     ms
    21:00:38        1io       -       -       -  296.8 1004.3 1004.3
    +               2io   29.32   32.89   32.89  402.2  967.8  967.8
    +               3io   30.68   31.94   31.00  505.0  882.0  882.0
    +               4io   31.01   31.38   31.38  611.6  788.4  788.4
    +               5io   31.46   31.46   31.46  690.1  690.1  690.1
    +               6io       -       -       -      -      -      -
    +              10io       -       -       -      -      -      -
    +             100ms   29.32   32.89   32.89  402.2  967.8  967.8
    +             200ms   29.32   32.89   32.89  402.2  967.8  967.8
    +             500ms   30.68   31.00   31.00  716.5  882.0  882.0
    +            1000ms   31.46   31.46   31.46  690.1  690.1  690.1
    +            2000ms       -       -       -      -      -      -
    +            5000ms       -       -       -      -      -      -
    +            1500io       -       -       -      -      -      -

    There are 14 intervals being monitored for this stream. This sample was collected just after the application was started, when only a small number of I/Os had been issued. There is insufficient data to compute some of these metrics, so a number of values are displayed as “-”.

  7. Select two intervals (500ms and 2000ms) and monitor them during the course of the run:

    $ grioqos -I "500ms,2000ms" 03041498-871c-1029-87e2-08006913a7f7 2
    
    IRIX64 octane 6.5 01062343 IP30 07/21/05
    Filesystem: /mirror
    
    App (6754151) 20.00 MB/s 03041498-871c-1029-87e2-08006913a7f7
    
    -          interval   minbw   maxbw  lastbw  minio  maxio lastio
    -                 -    MB/s    MB/s    MB/s     ms     ms     ms
    21:00:38      500ms       -       -       -      -      -      -
    +            2000ms       -       -       -      -      -      -
    21:00:40      500ms   30.37   31.93   31.48  479.0 1009.0  959.2
    +            2000ms   31.46   31.46   31.46  789.4  789.4  789.4
    21:00:42      500ms   18.91   32.31   20.14  479.0 1224.1 1224.1
    +            2000ms   25.37   31.72   25.37  789.4 1057.5 1057.5
    21:00:44      500ms   18.91   32.31   20.75  479.0 1588.4 1583.9
    +            2000ms   19.74   31.72   20.38  789.4 1527.7 1527.7
    ...

    As seen in the high-level metrics, there is an initial burst of I/O before the application is throttled by GRIO. The current bandwidth lastbw quickly stabilizes at around 20 MB/s. After the application is suspended in userspace, the low-level QoS statistics clearly record the interruption:

    21:01:13      500ms    1.15   32.31   27.27  479.0 1609.3  554.7
    +            2000ms    1.15   31.72    3.19  789.4 1594.4  816.3
    21:01:15      500ms    1.15   34.95   31.96  479.0 1609.3  988.3
    +            2000ms    1.15   33.08   33.08  788.7 1594.4  861.1
    ...