Chapter 6. Administration and Maintenance

You can perform offline administration tasks using the cmgr(1M) command when logged into any CXFS administration node (one that is installed with the cxfs_cluster product) in the pool, or when the GUI is connected to any CXFS administration node in the pool. However, when the filesystems are mounted, administration must be done from the metadata server. (You cannot use cmgr(1M) or connect the GUI to a client-only node.)

The following are the same in CXFS and XFS:

For more information about these topics, see IRIX Admin: Disks and Filesystems.

The rest of this chapter discusses the following topics:

If you have upgraded directly from 6.5.12f or earlier, you must manually convert your filesystem definitions to the new format. See “Convert Filesystem Definitions for Upgrades” in Chapter 2.

Transforming an Existing Node into a Client-Only Node

If you are upgrading to 6.5.19f from 6.5.17f or earlier and you want to change an existing node with weight 1 (which as of 6.5.18f was defined as a server-capable administration node) to be a client-only node, you must do the following:

NFS Export Scripts

When you install CXFS, the following default scripts are placed in the /var/cluster/clconfd-scripts directory:

  • cxfs-pre-mount

  • cxfs-post-mount

  • cxfs-pre-umount

  • cxfs-post-umount

These scripts allow you to use NFS to export the CXFS filesystems listed in /etc/exports if they are successfully mounted. The clconfd daemon executes these scripts before and after mounting or unmounting CXFS filesystems specified in the /etc/exports file. The files must be named exactly as above and must have root execute permission. You can modify these scripts if needed.


Note: The /etc/exports file describes the filesystems that are being exported to NFS clients. If a CXFS mount point is included in the exports file, the empty mount point is exported unless the filesystem is re-exported after the CXFS mount using the cxfs-post-mount script.

The /etc/exports file cannot contain any filesystems managed by IRIS FailSafe.
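
For reference, a typical /etc/exports entry for a CXFS mount point might look like the following; the mount point and client names here are only assumptions, and exports(4) describes the full syntax:

/concat0 -rw=client1:client2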

The following arguments are passed to the files:

  • cxfs-pre-mount: filesystem device name

  • cxfs-post-mount: filesystem device name and exit code

  • cxfs-pre-umount: filesystem device name

  • cxfs-post-umount: filesystem device name and exit code

Because the filesystem name is passed to the scripts, you can write the scripts so that they take different actions for different filesystems; because the exit codes are passed to the post files, you can write the scripts to take different actions based on success or failure of the operation.

The clconfd daemon checks the exit code for these scripts. In the case of failure (nonzero), the following occurs:

  • For cxfs-pre-mount and cxfs-pre-umount, the corresponding mount or unmount is not performed.

  • For cxfs-post-mount and cxfs-post-umount, clconfd will retry the entire operation (including the -pre- script) for that operation.

This implies that if you do not want a filesystem to be mounted on a host, the cxfs-pre-mount script should return a failure for that filesystem while the cxfs-post-mount script returns success.
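
For example, the following hedged sketch of a cxfs-pre-mount script refuses to mount one particular filesystem on this host while allowing all others; the device name /dev/cxvm/nomount is hypothetical. The corresponding cxfs-post-mount script would simply continue to return success:

#!/bin/ksh
# /var/cluster/clconfd-scripts/cxfs-pre-mount
# $1 is the filesystem device name passed in by clconfd.
# A nonzero exit status tells clconfd not to mount the filesystem.
if [ "$1" = "/dev/cxvm/nomount" ] ; then
    echo "$0: refusing to mount $1 on this host"
    exit 1
fi
exit 0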

Unmounting lofs File Systems

You must unmount lofs mounts of a CXFS filesystem before attempting to unmount the CXFS filesystem. You can use a script such as the following to unexport and locally unmount an lofs filesystem:

#!/bin/ksh
# /var/cluster/clconfd-scripts/cxfs-pre-umount
echo "$0: Preparing to unmount CXFS file system \"$1\""
# Derive the mount point from field 3 of the mount(1M) output.
MNTPNT=`mount | grep "$1 " | cut -f 3 -d" "`
print "MNTPNT $MNTPNT"
if [ -n "${MNTPNT}" ] ; then
    # Find any lofs mounts that overlay this mount point.
    lofslist=`mount | grep 'type lofs' | grep "${MNTPNT}" | nawk '{print $3}'`
    # Exit nonzero on any failure so that clconfd does not
    # proceed with the CXFS unmount.
    set -e
    for lofs in ${lofslist}
    do
        echo "$0: unmounting $lofs"
        umount -k $lofs
    done
    # If the mount point is NFS-exported, unexport it as well.
    if /usr/etc/exportfs | /sbin/grep -q "${MNTPNT}" ; then
        echo "$0: unexporting $MNTPNT"
        /usr/etc/exportfs -u ${MNTPNT}
    fi
fi

Using telnet and I/O Fencing

If there are problems with a node, the I/O fencing software sends a message via the telnet protocol to the appropriate Fibre Channel switch. The switch only allows one telnet session at a time; therefore, if you are using I/O fencing, you must keep the telnet port on the Fibre Channel switch free at all times. Do not perform a telnet to the switch and leave the session connected.

Using fsr(1M)

The fsr(1M) command can be used only on the active metadata server for the filesystem; the bulkstat system call has been disabled for CXFS clients. Run fsr manually, and only on the active metadata server.
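
For example, a hedged invocation on the active metadata server, assuming the /concat0 filesystem used in this chapter's examples:

# fsr /concat0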

Using cron(1) in a CXFS Cluster

The cron(1) daemon can cause severe stress on a CXFS filesystem if multiple nodes in a cluster start the same filesystem-intensive task simultaneously. An example of such a task is one that uses the find(1) command to search files in a filesystem.

Any task initiated using cron on a CXFS filesystem should be launched from a single node in the cluster, preferably from the active metadata server.
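
For example, the following hedged wrapper script (the node name and search path are assumptions) can be installed in the crontab on every node; it exits quietly unless the local node is the designated one, so the task effectively runs on a single node:

#!/bin/sh
# Run a filesystem-intensive cron task from one designated node only.
# RUNHOST is an assumption; ideally name the active metadata server.
RUNHOST=cxfs7
if [ "`uname -n`" != "$RUNHOST" ] ; then
    exit 0
fi
find /concat0 -type f -mtime +30 -print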

Using Hierarchical Storage Management (HSM) Products

CXFS supports the use of hierarchical storage management (HSM) products through the data management application programming interface (DMAPI), also known as the X/Open Data Storage Management (XDSM) specification. An example of an HSM product is the Data Migration Facility (DMF), which is currently the only HSM product supported with CXFS.

The HSM application must make all of its DMAPI interface calls through the active metadata server. The CXFS client nodes do not provide a DMAPI interface to CXFS mounted filesystems. A CXFS client routes all of its communication to the HSM application through the metadata server. This generally requires that the HSM application run on the CXFS metadata server.

To use HSM with CXFS, do the following:

Discovering the Active Metadata Server for a Filesystem

You can discover the active metadata server using the GUI or the cluster_status(1M) or clconf_info commands.

Metadata Server Discovery with the GUI

Do the following:

  1. Select View: Filesystems

  2. In the view area, click on the name of the filesystem you wish to view. The name of the active metadata server is displayed in the details area to the right.

Figure 6-1 shows an example.

Figure 6-1. Window Showing the Metadata Server

Metadata Server Discovery with cluster_status

You can use the cluster_status command to discover the active metadata server. For example:

# /var/cluster/cmgr-scripts/cluster_status

+ Cluster=cxfs6-8  FailSafe=Not Configured CXFS=ACTIVE               15:15:33    
   Nodes =   cxfs6    cxfs7    cxfs8
FailSafe =
    CXFS =      UP       UP       UP

CXFS              DevName           MountPoint           MetaServer     Status
        /dev/cxvm/concat0             /concat0                cxfs7         UP

For more information, see “Check Cluster Status with cluster_status” in Chapter 9.

Metadata Server Discovery with clconf_info

You can use the clconf_info command to discover the active metadata server for a given filesystem. For example, the following shows that cxfs7 is the metadata server:

cxfs6 # clconf_info
Membership since Thu Mar  1 08:15:39 2001
Node         NodeId     Status    Age   Incarnation     CellId
cxfs6             6         UP      0             0          2
cxfs7             7         UP      0             0          1
cxfs8             8         UP      0             0          0
1 CXFS FileSystems
/dev/cxvm/concat0 on /concat0  enabled  server=(cxfs7)  2 client(s)=(cxfs8,cxfs6)

Metadata Server Recovery


Note: In this release, relocation is disabled by default and recovery is supported only when using standby nodes.

Relocation and recovery are fully implemented, but the number of associated problems prevents full support of these features in the current release. Although data integrity is not compromised, cluster node panics or hangs are likely to occur. Relocation and recovery will be fully supported in a future release when these issues are resolved.

If the node acting as the metadata server for a filesystem dies, another node in the list of potential metadata servers will be chosen as the new metadata server. This assumes that at least two potential metadata servers are listed when you define a filesystem. For more information, see “Define a CXFS Filesystem with the GUI” in Chapter 4, or “Modify a Cluster with cmgr” in Chapter 5.

The metadata server that is chosen must be a filesystem client; other filesystem clients will experience a delay during the relocation process. Each filesystem will take time to recover, depending upon the number of active inodes; the total delay is the sum of time required to recover each filesystem. Depending on how active the filesystem is at the time of recovery, the total delay could take up to several minutes per filesystem.

If a CXFS client dies, the metadata server will clean up after the client. Other CXFS clients may experience a delay during this process. The delay depends on what tokens, if any, the dead client holds: if the client holds no tokens, there will be no delay; if the client holds a token that must be revoked in order to allow another client to proceed, the other client will be held up until recovery returns the failed node's tokens (for example, when the dead client holds the write token and another client wants to read). The actual length of the delay depends upon the following:

  • CXFS kernel membership situation

  • Whether any servers have died

  • Where the servers are in the recovery order relative to recovering this filesystem

The deceased CXFS client is not allowed to rejoin the CXFS kernel membership until all metadata servers have finished cleaning up after the client.

Shutdown of the Database and CXFS

This section tells you how to perform the following:

If there are problems, see Chapter 10, “Troubleshooting”. For more information about states, see Chapter 9, “Monitoring Status”.

Cluster Database Shutdown

A cluster database shutdown terminates the following user-space daemons that manage the cluster database:

  • cad

  • clconfd

  • cmond

  • crsd

  • fs2d

After shutting down the database on a node, access to the shared filesystems remains available and the node is still a member of the cluster, but the node is not available for database updates. Rebooting of the node results in a restart of all services.

To perform a cluster database shutdown, enter the following:

# /etc/init.d/cluster stop

If you also want to disable the daemons from restarting at boot time, enter the following:

# /etc/chkconfig cluster off

Node Status and Cluster Database Shutdown

A cluster database shutdown is appropriate when you want to perform a maintenance operation on the node and then reboot it, returning it to ACTIVE status.

If you perform a cluster database shutdown, the node status will be DOWN, which has the following impacts:

  • The DOWN node is still considered part of the cluster, but unavailable.

  • The DOWN node does not get cluster database updates; however, it will be notified of all updates after it is rebooted.

    Missing cluster database updates can cause problems if the kernel portion of CXFS is active. That is, if the node continues to have access to CXFS, the node's kernel level will not see the updates and will not respond to attempts by the remaining nodes to propagate these updates at the kernel level. This in turn will prevent the cluster from acting upon the configuration updates.

Restart the Cluster Database

To restart the cluster database, enter the following:

# /etc/init.d/cluster start

Normal CXFS Shutdown

You should perform a normal CXFS shutdown when you want to stop all CXFS services on a node and remove it from the CXFS kernel membership quorum. A normal CXFS shutdown does the following:

  • Unmounts all the filesystems except those for which it is the active metadata server; those filesystems for which the node is the active metadata server will become inaccessible from the node after it is shut down.

  • Terminates the CXFS kernel membership of this node in the cluster.

  • Marks the node as INACTIVE.

The effect of this is that cluster disks are unavailable and no cluster database updates will be propagated to this node. Rebooting the node leaves it in the shutdown state.

If the node on which you shut down CXFS services is an active metadata server for a filesystem, then that filesystem will be recovered by another node that is listed as one of its potential metadata servers. For more information, see “Define a CXFS Filesystem with the GUI” in Chapter 4, or “Modify a Cluster with cmgr” in Chapter 5. The server that is chosen must be a filesystem client; other filesystem clients will experience a delay during the recovery process.


Note: If the node on which the CXFS shutdown is performed is the sole potential metadata server (that is, there are no other nodes listed as potential metadata servers for the filesystem), then you should use the CXFS GUI or the cmgr(1M) command to unmount the filesystem from all nodes before performing the shutdown.

To perform a normal CXFS shutdown, enter the following cmgr(1M) command:

cmgr> stop cx_services on node nodename for cluster clustername

You could also use the GUI; see “Stop CXFS Services (Normal CXFS Shutdown) with the GUI” in Chapter 4.


Note: This action deactivates CXFS services on one node, forming a new CXFS kernel membership after deactivating the node. If you want to stop services on multiple nodes, you must enter this command multiple times or perform the task using the GUI.

After you shut down cluster services on a node, the node is marked as inactive and is no longer used when calculating the CXFS kernel membership. See “Node Status” in Chapter 9.


Node Status and Normal CXFS Shutdown

After performing normal CXFS shutdown on a node, its state will be INACTIVE; therefore, it will not impact CXFS kernel membership quorum calculation. See “Normal CXFS Shutdown”.

When You Should Not Perform a Normal CXFS Shutdown

You should not perform a normal CXFS shutdown under the following circumstances:

  • On the local node, which is the CXFS administration node on which the cluster manager is running or the node to which the GUI is connected

  • If stopping CXFS services on the node will result in loss of CXFS kernel membership quorum

  • If the node is the only available metadata server for one or more active CXFS filesystems

If you want to perform a CXFS shutdown under these conditions, you must perform a forced CXFS shutdown. See “Forced CXFS Shutdown: Revoke Membership of Local Node”.

Rejoining the Cluster after a Normal CXFS Shutdown

The node will not rejoin the cluster after a reboot. The node will rejoin the cluster only when CXFS services are explicitly reactivated with the GUI (see “Start CXFS Services with the GUI” in Chapter 4) or the following command:

cmgr> start cx_services on node nodename for cluster clustername

Forced CXFS Shutdown: Revoke Membership of Local Node

A forced CXFS shutdown is appropriate when you want to shut down the local node even though it may drop the cluster below its CXFS kernel membership quorum requirement.

CXFS does the following:

  • Shuts down all cluster filesystems on the local node; subsequent attempts to access the cluster filesystems result in I/O errors (you may need to manually unmount the filesystems)

  • Removes this node from the CXFS kernel membership

  • Marks the node as DOWN


Caution: A forced CXFS shutdown may cause the cluster to fail if the cluster drops below CXFS kernel membership quorum.

If you do a forced shutdown on an active metadata server, it loses membership immediately. At this point another potential metadata server must take over (and recover the filesystems) or quorum is lost and a forced shutdown follows on all nodes.

If you do a forced CXFS shutdown that forces a loss of quorum, the remaining part of the cluster (which now must also do a forced shutdown) will not reset the departing node.

To perform a forced CXFS shutdown, enter the following cmgr(1M) command to revoke the CXFS kernel membership of the local node:

cmgr> admin cxfs_stop

You can also perform this action with the GUI; see “Revoke Membership of the Local Node with the GUI” in Chapter 4. This action can also be triggered automatically by the kernel after a loss of CXFS kernel membership quorum.

Node Status and Forced CXFS Shutdown

After a forced CXFS shutdown, the node is still considered part of the configured cluster and is taken into account when propagating the cluster database (these services are still running) and when computing the cluster database (fs2d) membership quorum (this could cause a loss of quorum for the rest of the cluster, causing the other nodes to do a forced shutdown). The state is INACTIVE.

It is important that this node stays accessible and keeps running the cluster infrastructure daemons to ensure database consistency. In particular, if more than half the nodes in the pool are down or not running the infrastructure daemons, cluster database updates will stop being propagated and will result in inconsistencies. To be safe, you should remove those nodes that will remain unavailable from the cluster and pool. See:

Rejoining the Cluster after a Forced CXFS Shutdown

After a forced CXFS shutdown, the local node will not resume CXFS kernel membership until the node is rebooted or until you explicitly allow CXFS kernel membership for the local node by entering the following cmgr(1M) command:

cmgr> admin cxfs_start

You can also perform this step with the GUI; see “Allow Membership of the Local Node with the GUI” in Chapter 4.

If you perform a forced shutdown on a CXFS administration node, you must restart CXFS on that node before it can return to the cluster. If you do this while the cluster database still shows that the node is in a cluster and is activated, the node will restart the CXFS kernel membership daemon. Therefore, you may want to do this after resetting the database or after stopping CXFS services.

For example:

cmgr> admin cxfs_start

Serial Hardware Reset Capability and Forced CXFS Shutdown


Caution: If you perform forced shutdown on an IRIX node with serial hardware reset capability and the shutdown will not cause loss of cluster quorum, the node will be reset (rebooted) by the appropriate node.

For more information about resets, see “Serial Hardware Reset” in Chapter 1.

Avoiding a CXFS Restart at Reboot

The cxfs_cluster flag to chkconfig(1M) controls the clconfd daemon on a CXFS administration node. The cxfs_client flag controls the cxfs_client daemon on a client-only node.

If these flags are turned off, the daemons will not be started at the next reboot and the kernel will not be configured to join the cluster. It is useful to turn them off before rebooting if you want to temporarily remove the nodes from the cluster for system or hardware upgrades or for other maintenance work.

To avoid restarting the cluster database on a CXFS administration node, set the cluster option to off.

For example, do the following on a CXFS administration node:

# /etc/chkconfig cxfs_cluster off
# /etc/chkconfig cluster off
# reboot

Log File Management

You should rotate the log files at least weekly so that your disk will not become full. The following sections provide example scripts. For information about log levels, see “Configure Log Groups with the GUI” in Chapter 4.

Rotating All Log Files

You can run the /var/cluster/cmgr-scripts/rotatelogs script to copy all log files to a new location. This script saves log files with the day and the month name as a suffix. If you run the script twice in one day, it will append the current log file to the previous saved copy. The root crontab file has an entry to run this script weekly.

The script syntax is as follows:

/var/cluster/cmgr-scripts/rotatelogs [-h] [-d|-u]

If no option is specified, the log files will be rotated. Options are as follows:

-h

Prints the help message. The log files are not rotated and other options are ignored.

-d

Deletes saved log files that are older than one week before rotating the current log files. You cannot specify this option and -u.

-u

Unconditionally deletes all saved log files before rotating the current log files. You cannot specify this option and -d.

By default, the rotatelogs script will be run by crontab once a week, which is sufficient if you use the default log levels. If you plan to run with a high debug level for several weeks, you should reset the crontab entry so that the rotatelogs script is run more often.
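
For example, to run the script daily at 4:00 a.m. rather than weekly, you could change the entry in the root crontab file (using crontab(1) as root) to something like the following:

0 4 * * * /var/cluster/cmgr-scripts/rotatelogs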

On heavily loaded machines, or for very large log files, you may want to move resource groups and stop CXFS services before running rotatelogs.

Rotating Large Log Files

You can use a script such as the following to copy large files to a new location. The files in the new location will be overwritten each time this script is run.

#!/bin/sh
# Argument is maximum size of a log file (in characters) - default: 500000

size=${1:-500000}
find /var/cluster/ha/log -type f ! -name '*.OLD' -size +${size}c -print | while read log_file; do
        cp ${log_file} ${log_file}.OLD
        echo '*** LOG FILE ROTATION ' `date` '***' > ${log_file}
done

Also see “/etc/config/cad.options on CXFS Administration Nodes” and “/etc/config/fs2d.options on CXFS Administration Nodes” in Chapter 2.

Volume Management

CXFS uses the XVM volume manager. XVM can combine many disks into high transaction rate, high bandwidth, and highly reliable filesystems. CXFS uses XVM to provide the following:

  • Disk striping

  • Mirroring

  • Concatenation

  • Advanced recovery features


Note: The xvm(1M) command must be run on a CXFS administration node. If you try to run an XVM command before starting the CXFS daemons, you will get a warning message and be put into XVM's local domain.

When you are in XVM's local domain, you could define your volumes, but when you later start CXFS and XVM switches to the cluster domain, those volumes will not be recognized because they were defined in the local domain. To use them in the cluster domain, you would have to use the XVM give command to move them. Therefore, it is better to define the volumes directly in the cluster domain.

For more information, see the XVM Volume Manager Administrator's Guide.

Disk Management

This section describes the CXFS differences for backups, NFS, quotas, and Samba.

Disk Backups

CXFS enables the use of commercial backup packages such as VERITAS NetBackup and Legato NetWorker for backups that are free of the local area network (LAN): the backup work is consolidated onto a backup server, and the data passes through a storage area network (SAN) rather than through a lower-speed LAN.

For example, a backup package can run on a host on the SAN designated as a backup server. This server can use attached tape drives and channel connections to the SAN disks. It runs the backup application, which views the filesystems through CXFS and transfers the data directly from the disks, through the backup server, to the tape drives.

This allows the backup bandwidth to scale to match the storage size, even for very large filesystems. You can increase the number of disk channels, the size of the backup server, and the number of tape channels to meet the backup-bandwidth requirements.

NFS

You can put an NFS server on top of CXFS so that computer systems that are not part of the cluster can share the filesystems. This can be performed on any node.

Quotas

XFS quotas are supported. However, the quota mount options must be the same on all mounts of the filesystem. You can administer quotas from anywhere in the cluster, just as if it were an XFS filesystem.
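
For example, assuming the /concat0 filesystem from the earlier examples is mounted with quotas enabled, you could report usage from any node with the standard quota tools:

# repquota /concat0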

Samba

You can run Samba on top of CXFS, allowing Windows clients to access the CXFS filesystem.

Filesystem Maintenance

Although filesystem information is traditionally stored in /etc/fstab on IRIX nodes, the CXFS filesystems information is relevant to the entire cluster and is therefore stored in the replicated cluster database instead.

As the administrator, you will supply the CXFS filesystem configuration by using the CXFS Cluster Manager tools. For information about the GUI, see “Filesystem Tasks with the GUI” in Chapter 4; for information about cmgr(1M), see “Cluster Tasks with cmgr” in Chapter 5.

The information is then automatically propagated consistently throughout the entire cluster. The cluster configuration daemon mounts the filesystems on each node according to this information, as soon as it becomes available.

A CXFS filesystem will be automatically mounted on all the nodes in the cluster. You can add a new CXFS filesystem to the configuration when the cluster is active.

Whenever the cluster configuration daemon detects a change in the cluster configuration, it does the equivalent of a mount -a command on all the filesystems that are configured.


Caution: You must not modify or remove a CXFS filesystem definition while the filesystem is mounted. You must unmount it first and then mount it again after the modifications.


Mounting Filesystems

You supply mounting information with the GUI Mount a Filesystem task (which is part of the Set Up a New Filesystem guided configuration task) or with the modify subcommand to cmgr(1M). See the following:

When properly defined and mounted, the CXFS filesystems are automatically mounted on each node by the local cluster configuration daemon, clconfd, according to the information collected in the replicated database. After the filesystems configuration has been entered in the database, no user intervention is necessary.


Caution: Do not attempt to use the mount(1M) command to mount a CXFS filesystem. Doing so can result in data loss and/or corruption due to inconsistent use of the filesystem from different nodes.

CXFS filesystems must be mounted on all nodes in the cluster or none. (Otherwise, the filesystem may be mounted on different nodes in an inconsistent way that may result in data loss and/or corruption.) The GUI and cmgr will not let you mount a filesystem on a subset of nodes in the cluster.

Mount points cannot be nested when using CXFS. That is, you cannot have a filesystem within a filesystem, such as /usr and /usr/home.

Unmounting Filesystems

To unmount CXFS filesystems, use the GUI Unmount a Filesystem task or the admin subcommand to cmgr. For information, see “Unmount a CXFS Filesystem with the GUI” in Chapter 4, or “Unmount a CXFS Filesystem with cmgr” in Chapter 5.

These tasks unmount a filesystem from all nodes in the cluster. Although this action triggers an unmount on all the nodes, some might fail if the filesystem is busy. On active metadata servers, the unmount cannot succeed before all of the CXFS clients have successfully unmounted the filesystem. All nodes will retry the unmount until it succeeds, but there is no centralized report that the filesystem has been unmounted on all nodes. To verify that the filesystem has been unmounted from all nodes, do one of the following:

  • Check the SYSLOG files on the metadata servers for a message indicating that the filesystem has been unmounted (a search sketch follows this list)

  • Run the GUI or cmgr on the metadata server, disable the filesystem from the server and wait until the GUI shows that the filesystem has been fully disabled (it will be an error if it is still mounted on some CXFS clients and the GUI will show which clients are left)
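
For the first method, a search such as the following on each metadata server may help; the exact message text is an assumption and varies by release:

# grep -i unmount /var/adm/SYSLOG | grep concat0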

Growing Filesystems

To grow a CXFS filesystem, do the following:

  1. Unmount the CXFS filesystem. For information, see “Unmount a CXFS Filesystem with the GUI” in Chapter 4, or “Unmount a CXFS Filesystem with cmgr” in Chapter 5.

  2. Mount the filesystem as an XFS filesystem. See IRIX Admin: Disks and Filesystems.

  3. Use the xfs_growfs(1M) command to grow it.

  4. Unmount the XFS filesystem with the umount(1M) command.

  5. Mount the filesystem as a CXFS filesystem. See “Mount a CXFS Filesystem with the GUI” in Chapter 4, or “Define a CXFS Filesystem with cmgr” in Chapter 5.
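
The following hedged sketch shows steps 2 through 4 as run on the metadata server, assuming the /dev/cxvm/concat0 device and /concat0 mount point from the earlier examples; steps 1 and 5 are performed with the GUI or cmgr as described above. With no size argument, xfs_growfs grows the data section to the size of the underlying volume:

# mount -t xfs /dev/cxvm/concat0 /concat0
# xfs_growfs /concat0
# umount /concat0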

Dump and Restore

You must perform dump and restore procedures from the active metadata server. The xfsdump(1M) and xfsrestore(1M) commands make use of special system calls that will only function on the metadata server.

The filesystem can have active clients during a dump process.

In a clustered environment, a CXFS filesystem may be directly accessed simultaneously by many CXFS clients and the active metadata server. With failover or simply metadata server reassignment, a filesystem may, over time, have a number of metadata servers. Therefore, in order for xfsdump to maintain a consistent inventory, it must access the inventory for past dumps, even if this information is located on another node. It is recommended that the inventory be made accessible by potential metadata server nodes in the cluster using one of the following methods:

  • Relocate the inventory to a shared filesystem.

    For example:

    • On the node currently containing the inventory, enter the following:

      # cp -r /var/xfsdump /shared_filesystem
      # mv /var/xfsdump /var/xfsdump.bak
      # ln -s ../shared_filesystem /var/xfsdump

    • On all other IRIX nodes in the cluster, enter the following:

      # mv /var/xfsdump /var/xfsdump.bak
      # ln -s ../shared_filesystem /var/xfsdump 

  • Export the directory using an NFS shared filesystem.

    For example:

    • On the node currently containing the inventory, add /var/xfsdump to /etc/exports and then enter the following:

      # exportfs -a

    • On all other IRIX nodes in the cluster, enter the following:

      # mv /var/xfsdump /var/xfsdump.bak
      # ln -s /hosts/hostname/var/xfsdump /var/xfsdump


Note: It is the /var/xfsdump directory that should be shared, rather than the /var/xfsdump/inventory directory. If there are inventories stored on various nodes, you can use xfsinvutil(1M) to merge them into a single common inventory, prior to sharing the inventory among the cluster.


Cluster Database Backup and Restore

You should perform a database backup whenever you want to save the database and be able to restore it to the current state at a later point.

You can use the following methods to restore the database:

  • If the database is accidentally deleted from a node, use the fs2d(1M) daemon to replicate the database from another node in the pool.

  • If you want to be able to recreate the current configuration, use the build_cmgr_script(1M) script. You can then recreate the configuration by running the generated script.

  • If you want to retain a copy of the database and all node-specific information such as local logging, use the cdbBackup(1M) and cdbRestore(1M) commands.

Restoring the Database from Another Node

If the database has been accidentally deleted from an individual node, you can replace it with a copy from another node. Do not use this method if the cluster database has been corrupted.

Do the following:

  1. Stop the cluster database daemons by running the following command on each node:

    # /etc/init.d/cluster stop

  2. Run cdbreinit(1M) on nodes that are missing the cluster database.

  3. Restart the cluster database daemons by running the following command on each node:

    # /etc/init.d/cluster start

    The fs2d(1M) daemon will then replicate the cluster database to those nodes from which it is missing.

Using build_cmgr_script(1M) for the Cluster Database

You can use the build_cmgr_script(1M) command from one node in the cluster to create a cmgr(1M) script that will recreate the node, cluster, switch, and filesystem definitions for all nodes in the cluster database. You can then later run the resulting script to recreate a database with the same contents; this method can be used for missing or corrupted cluster databases.


Note: The build_cmgr_script script does not contain local logging information, so it cannot be used as a complete backup/restore tool.

To perform a database backup, use the build_cmgr_script script from one node in the cluster, as described in “Creating a cmgr Script Automatically” in Chapter 5.


Caution: Do not make configuration changes while you are using the build_cmgr_script command.

By default, this creates a cmgr script in the following location:

/tmp/cmgr_create_cluster_clustername_processID

You can specify another filename by using the -o option.

To perform a restore on all nodes in the pool, do the following:

  1. Stop CXFS services for all nodes in the cluster.

  2. Stop the cluster database daemons on each node.

  3. Remove all copies of the old database by using the cdbreinit command on each node.

  4. Execute the cmgr script (which was generated by the build_cmgr_script script) on the node that is defined first in the script. This will recreate the backed-up database on each node.


    Note: If you want to run the generated script on a different node, you must modify the generated script so that the node is the first one listed in the script.


  5. Restart cluster database daemons on each node.

For example, to back up the current database, clear the database, and restore the database to all nodes, do the following:

On one node:
# /var/cluster/cmgr-scripts/build_cmgr_script -o /tmp/newcdb
Building cmgr script for cluster clusterA ...
build_cmgr_script: Generated cmgr script is /tmp/newcdb

On one node:
# stop cx_services for cluster clusterA

On each node:
# /etc/init.d/cluster stop

On each node:
# /usr/cluster/bin/cdbreinit

On each node:
# /etc/init.d/cluster start

On the *first* node listed in the /tmp/newcdb script:
# /tmp/newcdb

Using cdbBackup(1M) and cdbRestore(1M) for the Cluster Database and Logging Information

The cdbBackup(1M) and cdbRestore(1M) commands back up and restore the cluster database and node-specific information, such as local logging information. You must run these commands individually for each node.

To perform a backup of the cluster, use the cdbBackup(1M) command on each node.


Caution: Do not make configuration changes while you are using the cdbBackup command.

To perform a restore, run the cdbRestore command on each node. You can use this method for either a missing or corrupted cluster database. Do the following:

  1. Stop CXFS services.

  2. Stop cluster services on each node.

  3. Remove the old database by using the cdbreinit command on each node.

  4. Stop cluster services again (these were restarted automatically by cdbreinit in the previous step) on each node.

  5. Use the cdbRestore command on each node.

  6. Start cluster services on each node.

For example, to back up the current database, clear the database, and then restore the database to all nodes, do the following:

On each node:
# /usr/cluster/bin/cdbBackup

On one node in the cluster:
# stop cx_services for cluster clusterA

On each node:
# /etc/init.d/cluster stop

On each node:
# /usr/cluster/bin/cdbreinit

On each node (again):
# /etc/init.d/cluster stop

On each node:
# /usr/cluster/bin/cdbRestore

On each node:
# /etc/init.d/cluster start

For more information, see the cdbBackup(1M) and cdbRestore(1M) man pages.

chkconfig Flags


Note: These flags are not normally manipulated with the chkconfig command by the administrator; they are set or unset by the GUI or cmgr. These flags only control the processes, not the cluster. Stopping the processes that control the cluster will not stop the cluster, and starting the processes will start the cluster only if the CXFS services are marked as activated in the database.

CXFS has the following flags to the chkconfig(1M) command:

  • On CXFS administration nodes, cluster, which controls the other cluster administration daemons, such as the replicated cluster database. If it is turned off, the database daemons will not be started at the next reboot and the local copy of the database will not be updated if you make changes to the cluster configuration on the other nodes. This could cause problems later, especially if a majority of nodes are not running the database daemons. If the database daemons are not running, the cluster database will not be accessible locally and the node will not be configured to join the cluster.

  • On CXFS administration nodes, cxfs_cluster, which controls the clconfd daemon and whether or not the cxfs_shutdown command is used during a system shutdown.

    The cxfs_shutdown command attempts to withdraw from the cluster gracefully before rebooting. Otherwise, the reboot is seen as a failure and the other nodes have to recover from it.

  • On client-only nodes, cxfs_client controls whether or not the cxfs_client daemon should be started.
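
To check the current state of these flags on a node, you can list them with chkconfig(1M); the egrep pattern here simply narrows the listing to the CXFS-related flags:

# /etc/chkconfig | egrep 'cluster|cxfs'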

System Tunable Parameters

Table 6-1 shows the system tunable parameters available with CXFS.

Table 6-1. System Tunable Parameters

cms_fence_timeout

Specifies the number of seconds to wait for clconfd to acknowledge a fence request. 0 is an infinite wait and is the default. If a non-zero value is set and the time-out expires, CXFS takes the action specified by the cms_fence_timeout_action parameter. This parameter may be changed at run time. Before setting the time-out, you should understand the ramifications of doing so on your system. Modification of this parameter is not generally recommended.

This parameter is located in /var/sysgen/mtune/cell.

cms_fence_timeout_action

Specifies the action to be taken when clconfd does not acknowledge a fence request (determined by cms_fence_timeout). cms_fence_timeout_action may be changed at run time, and may be set to one of the following. Before setting the time-out, you should understand the ramifications of doing so on your system. Modification of this parameter is not generally recommended.

  • 0 - Causes the node waiting for the fence acknowledgement to forcibly withdraw from the cluster, equivalent to a forced shutdown that occurs when a node loses quorum (default). If clconfd is still present and functioning properly, it will then restart the kernel cms daemon and the node will attempt to rejoin the cluster.

  • 1 - Clears all pending fence requests and continues (that is, fakes acknowledgment). CAUTION: Setting this value is potentially dangerous.

  • 2 - Panics the local node

This parameter is located in /var/sysgen/mtune/cell.

cms_reset_timeout

Specifies the number of seconds to wait for clconfd to acknowledge a reset request. 0 is an infinite wait and is the default. If a non-zero value is set and the time-out expires, CXFS takes the action specified by the cms_reset_timeout_action parameter. This parameter may be changed at run time.

This parameter is located in /var/sysgen/mtune/cell.

cms_reset_timeout_action

Specifies the action to be taken when clconfd does not acknowledge a reset request (determined by cms_reset_timeout). cms_reset_timeout_action may be changed at run time, and may be set to one of the following:

  • 0 - Causes the node waiting for the reset acknowledgement to forcibly withdraw from the cluster, equivalent to a forced shutdown that occurs when a node loses quorum (default). If clconfd is still present and functioning properly, it will then restart the kernel cms daemon and the node will attempt to rejoin the cluster.

  • 1 - Clears all pending resets and continues (that is, fakes acknowledgment). CAUTION: Setting this value is potentially dangerous.

  • 2 - Panics the local node

This parameter is located in /var/sysgen/mtune/cell.

cxfsd_min

Specifies the minimum number of cxfsd threads to run per CXFS filesystem.

The cxfsd threads do the disk block allocation for delayed allocation buffers in CXFS and the flushing of buffered data for files that are being removed from the local cache by the metadata server. The threads are allocated at filesystem mount time. The value of the cxfsd_min parameter at mount time remains in effect for a filesystem until it is unmounted.

The legal value for cxfsd_min is an integer in the range 1 through 256. The default is 4.

This parameter is located in /var/sysgen/mtune/cxfs.

cxfsd_max

Specifies the maximum number of cxfsd threads to run per CXFS filesystem. The value of the cxfsd_max parameter at mount time remains in effect for a filesystem until it is unmounted.

The legal value for cxfsd_max is an integer in the range 8 through 65536. The default is 256. The value for cxfsd_max cannot be less than the value specified for cxfsd_min.

This parameter is located in /var/sysgen/mtune/cxfs.

cxfs_relocation_ok

Specifies whether relocation is disabled or enabled:

  • 0 - Disables relocation

  • 1 - Enables relocation


Note: Relocation is disabled by default. SGI recommends that you do not enable relocation.

This parameter is located in /var/sysgen/mtune/cxfs.

cxfs_shutdown_time

Specifies the time it takes a node to take media offline after it has recognized that it has lost quorum. SGI recommends a value of 50 (0.5 seconds).

This parameter is located in /var/sysgen/mtune/cell.

mtcp_nodelay

Enables TCP_NODELAY on CXFS message channels. SGI recommends that you do not change this value.

This parameter is located in /var/sysgen/mtune/cell.

mtcp_hb_period

Specifies the length of time, in HZ, that CXFS waits for heartbeat from other nodes before declaring node failure. SGI recommends a value of 500 (5 seconds).

This parameter is located in /var/sysgen/mtune/cell.

mtcp_reserve_size

Sets the size of the TCP window. SGI recommends that you do not change this value.

This parameter is located in /var/sysgen/mtune/cell.

mtcp_mesg_validate

Enables checksumming on top of what TCP is already doing. Normally, this is not needed and is only used if TCP data corruption is suspected.

The legal values are as follows:

  • 0 - Performs no validation

  • 1 - Generates checksums, but does not perform validation

  • 2 - Generates and validates checksums, warns (via a SYSLOG message) on validation failure

  • 3 - Generates and validates checksums, warns and returns an error message on validation failure

  • 4 - Generates and validates checksums, warns and panics on validation error

This parameter is located in /var/sysgen/mtune/cell.
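
As a hedged example of working with these parameters, you could display and then change one of the run-time tunables with systune(1M); the value shown is illustrative only. The first command displays the current value and the second requests a new value; parameters that cannot be changed at run time require an autoconfig(1M) and reboot to take effect:

# systune cxfsd_min
# systune cxfsd_min 8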