Chapter 3. Setting Up GRIO

This chapter discusses the following:

  • “Installation Requirements”

  • “Deployment Considerations for Cluster Volumes”

  • “Data Layout”

  • “Choosing a Qualified Bandwidth”

  • “Local Volumes and Cluster Volumes”

  • “Local Volume Domain Configuration”

  • “Cluster Volume Domain Configuration”

  • “Licensing”

  • “Starting GRIO”

  • “Monitoring GRIO”

Installation Requirements

To operate in the local volume domain on an IRIX node, you must install the eoe.sw.grio2 product.

To enable clustered GRIO support on IRIX, you must install both eoe.sw.grio2 and cxfs.sw.grio2_cell.

In a cluster deployment, every node must be GRIO-enabled. Consult the CXFS multiOS release notes to determine whether your platform has been GRIO-enabled.
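
To verify that the required subsystems are installed, you can query the installed-software database with versions(1M). A minimal check (output omitted; the exact listing depends on your release):

# versions eoe.sw.grio2 cxfs.sw.grio2_cell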

Deployment Considerations for Cluster Volumes

You must observe the following constraints when setting up GRIO filesystems:

  • If any of the logical units (LUNs) on a particular device will be managed as GRIO filesystems, then all of the LUNs should be managed as GRIO filesystems. Typically, there will be hardware contention between separate LUNs, both in the storage area network (SAN) and within the storage device. If only a subset of the LUNs are managed, I/O to the unmanaged LUNs could still cause oversubscription of the device and could in turn violate guarantees on the managed filesystems.

  • A storage device containing GRIO-managed filesystems should not be shared between clusters. The GRIO daemons running within different clusters are not coordinated, and unmanaged I/O from one cluster can cause guarantees in the other cluster to be violated.

Data Layout

To set up a filesystem on a RAID device such that you achieve correct filesystem device alignment and maximize I/O performance, remember to do the following:

  • Ensure that each data partition is correctly aligned with the internal disk layout of its LUN

  • Set XVM stripe parameters correctly

  • Pass the correct volume geometry (stripe unit and stripe width) to mkfs_xfs(1), as in the sketch below
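
The following is a sketch only, assuming a four-LUN stripe: the volume name (vol1), slice names, 512 KB stripe unit, and device path are illustrative, not values from an actual configuration. Both the XVM stripe unit and the mkfs_xfs sunit/swidth values are expressed in 512-byte blocks:

# xvm
xvm:local> stripe -volname vol1 -unit 1024 slice/lun0s0 slice/lun1s0 slice/lun2s0 slice/lun3s0
xvm:local> exit

# mkfs_xfs -d sunit=1024,swidth=4096 /dev/lxvm/vol1

Here sunit=1024 (512 KB) matches the stripe unit, and swidth=4096 spans the full width of the four-LUN stripe.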

For more information, see the grio2(5) man page.

Choosing a Qualified Bandwidth

You can adjust the qualified bandwidth to reflect the specific trade-off between delivered quality of service and utilization of the storage infrastructure for your situation.

The following factors affect the qualified bandwidth you choose:

  • The hardware configuration

  • The application work flow and I/O load

  • The specific quality-of-service requirements of applications and users

Typically, the first concern is whether the required bandwidth can be delivered by the storage system. The second concern is the timeliness, or service times, observed for individual I/Os.

Determining qualified bandwidth is an iterative process. There are several strategies you can use to determine and fine-tune the qualified bandwidth for a filesystem. For example:

  • Establish a given bandwidth and then adjust so that the quality-of-service requirements are met. Do the following:

    1. Make an initial estimate of the qualified bandwidth. You can use the fixed storage architecture parameters (RAID performance, number of HBAs, and so on) to estimate the anticipated peak bandwidth that can be delivered. The qualified bandwidth is then determined as an appropriate fraction of this peak (see the worked example after this list).

    2. Configure ggd2 appropriately, using either /etc/griotab or the cluster database.

    3. Run a test workload.

    4. Monitor the delivered performance.

    5. Refine the estimate as needed.

  • Establish that quality-of-service requirements are satisfied and then adjust to maximize throughput. To do this, increase the load until the storage system can no longer meet the application quality-of-service requirements; the qualified bandwidth must be lower than this value.

  • Explore the space of possible workloads and test whether a given workload satisfies both bandwidth and application quality-of-service requirements.
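
As a worked example of the first strategy (all figures are illustrative assumptions, not measurements), suppose the storage path consists of two 2-Gbit Fibre Channel HBAs, each delivering roughly 200 MB/s, and you take 80 percent of the anticipated peak as the initial estimate:

    peak bandwidth    = 2 HBAs * 200 MB/s = 400 MB/s
    initial estimate  = 0.80 * 400 MB/s   = 320 MB/s
                      = 320*1024*1024     = 335544320 bytes per second

You would configure 335544320 as the qualified bandwidth, run the test workload, and then refine the figure based on the delivered quality of service.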

Although the hardware configuration provides a basis for calculating an estimate, remember that the qualified bandwidth is also affected by the particular work-flow issues and the quality-of-service requirements of individual applications. For example, an application that has large tunable buffers (such as a flipbook application that does aggressive RAM caching) can tolerate a greater variation in I/O service time than can a media broadcast system that must cue and play a sequence of very short clips. In the first example, the qualified bandwidth would be configured as a larger proportion of the sustained maximum. In the second example, the qualified bandwidth might be reduced to minimize the system utilization levels and improve I/O service times.

A high qualified bandwidth will generally achieve the greatest overall throughput but with the consequence that individual applications may intermittently experience longer service times for some I/Os. This variation in individual service times is referred to as jitter; as the storage system approaches saturation, service-time jitter will typically increase. A lower qualified bandwidth means that total throughput will be reduced, but because the I/O infrastructure is under less stress, individual requests will typically be processed with less variation in individual I/O service times. Figure 3-1 illustrates these basic ideas. The precise relationship between load on the storage system and variation in I/O service time is highly dependent on your storage hardware.

Figure 3-1. Tradeoff Between Throughput and Variation in I/O Service Time (Jitter)

Some storage devices (particularly those with real-time schedulers) can provide a fixed bound on I/O service time even at utilization levels close to their maximum. In this case, the qualified bandwidth can be set higher even where applications have tight quality-of-service requirements. The user-adjustable qualified bandwidth provides the flexibility required for GRIO to work with both dedicated real-time devices as well as more common off-the-shelf storage systems.


Note: In all cases, you must verify the chosen qualified bandwidth by testing the storage system under a realistic workload.

You can use the grioqos(1M) tool to measure the delivered quality-of-service performance. This tool extracts quality-of-service performance data for an active stream without disturbing the application or the kernel scheduler. GRIO maintains very detailed performance metrics for each active stream. Using the grioqos command while running a workload test lets you answer questions such as the following for every active stream in the system:

  • What has been the worst observed bandwidth over a 1-second period?

  • What is the worst observed average I/O service time for a sequence of 10 I/Os?

For more information about GRIO tools and the mechanisms for accessing quality-of-service data within the kernel, see the grioqos(1M) and grioqos(5) man pages.

Local Volumes and Cluster Volumes

A managed volume can be one of the following:

  • A local volume is attached to the node in question. This volume is in the local volume domain.

    Local volumes are always managed by the instance of the ggd2 daemon running on the node to which they are attached.

  • A cluster volume is used with CXFS filesystems and is shared among nodes in a cluster. This volume is in the cluster volume domain.

    All cluster volumes are managed by a single instance of the ggd2 daemon running on one of the CXFS administration nodes in the cluster; this node is referred to as the GRIO server. There is one GRIO server per cluster.

    The GRIO server is elected automatically. You can relocate it by using the grioadmin(1M) command. The GRIO server must be a CXFS administration node. Client-only nodes will never be elected as GRIO servers.

    If a given CXFS administration node has locally attached volumes and has also been selected as the GRIO server, then the ggd2 instance running on that node serves double duty, managing both its own local volume domain and the cluster volume domain.

For more information about CXFS, see “Cluster Volume Domain Configuration” and CXFS Administration Guide for SGI InfiniteStorage.

Local Volume Domain Configuration

To configure GRIO for local volume domains, you must provide information in the /etc/griotab file.

The /etc/griotab file lists the volumes that should be managed by GRIO and the maximum qualified bandwidth they can deliver. This file is read at startup and whenever ggd2 receives a SIGHUP signal (such as when you issue a killall -HUP ggd2 command). See the ggd2(1M) and griotab(4) man pages for more information.
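
For example, a griotab entry names the managed XVM volume and its qualified bandwidth in bytes per second. The entry below is a sketch only; the volume name (vol1), the device path, and the 200 MB/s figure (200*1024*1024 bytes) are illustrative, and the authoritative syntax is given in griotab(4):

/dev/lxvm/vol1 209715200

After editing the file, signal ggd2 to reread it:

# killall -HUP ggd2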

Cluster Volume Domain Configuration

You must use the cmgr(1M) cluster configuration tool to configure cluster volumes for GRIO.

To mark a filesystem as GRIO-managed and set its qualified bandwidth, use the following commands:

# /usr/cluster/bin/cmgr
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> modify cxfs_filesystem fsname in cluster clustername
cmgr> set grio_managed to true
cmgr> set grio_qualified_bandwidth to qualified_bandwidth
cmgr> done

The value for qualified_bandwidth is specified in bytes per second. For example, the following sets the qualified bandwidth to 200 MB/s (200*1024*1024):

cmgr> set grio_qualified_bandwidth to 209715200

To show the current status of a shared filesystem:

cmgr> show cxfs_filesystem fsname in cluster clustername
...
               GRIO Managed Filesystem: true
               GRIO Managed Bandwidth: qualified_bandwidth
...


Note: In cmgr, you must unmount a filesystem before you can modify it.
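
For example, you might bracket the change with the cmgr unmount and mount commands, sketched here with the same placeholder names (see the CXFS Administration Guide for SGI InfiniteStorage for the authoritative syntax):

cmgr> admin cxfs_unmount cxfs_filesystem fsname in cluster clustername
cmgr> modify cxfs_filesystem fsname in cluster clustername
cmgr> set grio_qualified_bandwidth to 209715200
cmgr> done
cmgr> admin cxfs_mount cxfs_filesystem fsname in cluster clustername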

A prompting mode is also available for cmgr. For more information, see the CXFS Administration Guide for SGI InfiniteStorage.

If you have installed the cxfs.sw.grio2_cell subsystem and turned on GRIO, the ggd2 daemon will automatically query the cluster configuration database for GRIO volume configuration information. ggd2 dynamically tracks updates to the cluster database.

Licensing

The GRIO FLEXlm licensing regime controls a number of configuration parameters including the total number of active streams and the total aggregate qualified bandwidth of filesystems under management. Separate license types are provided for the local and cluster volume domains, and license constraints are enforced for each volume domain separately.

The ggd2 daemon checks the license at startup, whenever it detects a configuration change, or when it receives a SIGHUP signal.
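
For example, after installing an updated license file you can prompt an immediate recheck rather than waiting for ggd2 to detect a configuration change:

# killall -HUP ggd2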

License enforcement for streams is straightforward. The license for a given volume domain specifies a maximum number of active streams. All reservation requests above this limit are denied.

In the case of bandwidth, a license specifies the maximum total aggregate qualified bandwidths for all volumes within the volume domain. The ggd2 daemon validates the configuration at startup and whenever the configuration is changed:

  • For the local domain, ggd2 tracks changes to /etc/griotab (ggd2 is notified of changes with a SIGHUP)

  • For the cluster volume domain, ggd2 tracks the relevant cluster database entries for cluster volume qualified bandwidth

If the configuration of a volume domain is altered and becomes unlicensed, ggd2 enters a passive mode in which all further requests pertaining to that domain, with the exception of release requests, are denied. A message is sent to the system log and that volume domain will remain deactivated until the configuration returns to a licensed state, at which time another message will be logged indicating the domain is again active.

For more information, see the license.dat(5) man page.

Starting GRIO

Although you can have both the GRIOv1 subsystem and the GRIOv2 subsystem installed on the same machine, only one of these subsystems can be active. The subsystem that is turned on in chkconfig is started by default at boot time and remains in effect until the chkconfig setting is changed and the machine is rebooted.

Starting GRIOv2 when GRIOv1 is Active

Suppose you were running GRIOv1 and wanted to switch to GRIOv2. After performing the configuration tasks discussed in this guide, you would do the following:

  1. Turn off GRIOv1 (grio) and turn on GRIOv2 (grio2):

    # chkconfig grio off
    # chkconfig grio2 on

  2. Reboot the system to allow the kernel to be reinitialized with the GRIOv2 scheduler.

You do not need to manually start GRIOv2 because the daemon is automatically started upon reboot when the chkconfig setting is on.


Note: If GRIOv1 is still enabled when you make a GRIOv2 library call, the call returns ENOSYS. If neither the GRIOv1 nor the GRIOv2 kernel subsystem has been initialized, the call returns EAGAIN, indicating that the subsystem has not yet initialized and that the application should retry the request.


Starting GRIOv2 when GRIOv1 is Not Active

If you have not run GRIOv1 during the current boot session, you can start GRIOv2 by doing the following:

  1. Turn on GRIOv2:

    # chkconfig grio2 on

  2. Start GRIOv2:

    # /etc/init.d/grio2 start

You need to perform this manual start only once. When the machine is rebooted, GRIOv2 will be restarted automatically as long as its chkconfig setting remains on.

Monitoring GRIO

You can use the griomon(1M) tool to monitor active streams within the system and display their high-level performance metrics, such as the currently allocated bandwidth and total bytes transferred.