Chapter 13. Cluster Database Management

This chapter contains the following:

  • Performing Cluster Database Backup and Restoration

  • Checking the Cluster Configuration with cxfs-config

Performing Cluster Database Backup and Restoration

Perform a cluster database backup whenever you want to be able to restore the database to its current state at a later time.

You can use the following methods to restore the database:

  • If the database is accidentally deleted from a node, use the fs2d daemon to replicate the database from another node in the pool.

  • If you want to be able to recreate the current configuration, use the build_cmgr_script script to generate a cmgr script. Running the generated script later recreates this configuration.

  • If you want to retain a copy of the database and all node-specific information such as local logging, use the cdbBackup and cdbRestore commands.

Restoring the Database from Another Node

If the database has been accidentally deleted from an individual administration node, you can replace it with a copy from another administration node. Do not use this method if the cluster database has been corrupted.

Do the following:

  1. Stop the CXFS daemons by running the following command on each administration node:

    # /etc/init.d/cxfs_cluster stop

  2. Run cdbreinit on administration nodes that are missing the cluster database.

  3. Restart the daemons by running the following commands on each administration node:

    # /etc/init.d/cxfs_cluster start

    The fs2d daemon will then replicate the cluster database to those nodes from which it is missing; a consolidated sketch of this procedure appears below.
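
For example, the following is a minimal sketch of the same procedure driven from one administration node. It assumes password-less ssh access as root; the node names and the choice of which node is missing the database are hypothetical.

#!/bin/sh
# Sketch only: replace the node lists with your own administration nodes.
ALL_NODES="node1 node2 node3"    # every administration node (hypothetical names)
MISSING_NODES="node1"            # nodes missing the cluster database (hypothetical)

# 1. Stop the CXFS daemons on every administration node.
for n in $ALL_NODES; do
    ssh root@$n /etc/init.d/cxfs_cluster stop
done

# 2. Reinitialize the database on the nodes that are missing it.
for n in $MISSING_NODES; do
    ssh root@$n /usr/cluster/bin/cdbreinit
done

# 3. Restart the daemons everywhere; fs2d then replicates the cluster
#    database to the reinitialized nodes.
for n in $ALL_NODES; do
    ssh root@$n /etc/init.d/cxfs_cluster start
done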

Using build_cmgr_script for the Cluster Database

You can use the build_cmgr_script command from one node in the cluster to create a cmgr script that will recreate the node, cluster, switch, and filesystem definitions for all nodes in the cluster database. You can later run the resulting script to recreate a database with the same contents; this method can be used for a missing or corrupted cluster database.


Note: The cmgr script generated by build_cmgr_script does not contain local logging information, so it cannot be used as a complete backup/restore tool.

To perform a database backup, use the build_cmgr_script script from one node in the cluster, as described in “Creating a cmgr Script Automatically” in Chapter 11.


Caution: Do not make configuration changes while you are using the build_cmgr_script command.

By default, this creates a cmgr script in the following location:

/tmp/cmgr_create_cluster_clustername_processID

You can specify another filename by using the -o option.
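
For example, the following is a minimal sketch (not taken from the CXFS documentation) that saves a timestamped copy of the generated script. The /cdb-backups destination directory is a hypothetical location that you would create beforehand, and the crontab entry is only a suggestion for scheduling the backup.

#!/bin/sh
# Sketch only: /cdb-backups is a hypothetical directory for saved scripts.
DEST=/cdb-backups/cmgr_script_`date +%Y%m%d`
/var/cluster/cmgr-scripts/build_cmgr_script -o $DEST

# A possible crontab entry (an assumption, not a documented requirement)
# to capture the configuration nightly at 2 AM:
#   0 2 * * * /var/cluster/cmgr-scripts/build_cmgr_script -o /cdb-backups/cmgr_script_nightly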

To perform a restore on all nodes in the pool, do the following:

  1. Stop CXFS services for all nodes in the cluster.

  2. Stop the cluster database daemons on each node.

  3. Remove all copies of the old database by using the cdbreinit command on each node.

  4. Execute the cmgr script (which was generated by the build_cmgr_script script) on the node that is defined first in the script. This will recreate the backed-up database on each node.


    Note: If you want to run the generated script on a different node, you must modify the generated script so that the node is the first one listed in the script.


  5. Restart cluster database daemons on each node.

For example, to back up the current database, clear the database, and restore it to all administration nodes, do the following on administration nodes as directed:

On one:
# /var/cluster/cmgr-scripts/build_cmgr_script -o /tmp/newcdb
Building cmgr script for cluster clusterA ...
build_cmgr_script: Generated cmgr script is /tmp/newcdb

On one:
# stop cx_services for cluster clusterA

On each:
# /etc/init.d/cxfs_cluster stop

On each:
# /usr/cluster/bin/cdbreinit

On each:
# /etc/init.d/cxfs_cluster start

On the *first* node listed in the /tmp/newcdb script:
# /tmp/newcdb

Using cdbBackup and cdbRestore for the Cluster Database and Logging Information

The cdbBackup and cdbRestore commands back up and restore the cluster database and node-specific information, such as local logging information. You must run these commands individually for each node.

To perform a backup of the cluster database, use the cdbBackup command on each node.


Caution: Do not make configuration changes while you are using the cdbBackup command.
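
For example, the following is a minimal sketch that triggers a backup on every administration node from one host. It assumes password-less ssh access as root; the node names are hypothetical.

#!/bin/sh
# Sketch only: replace the node list with your own administration nodes.
# cdbBackup must run on every node because each backup also captures
# node-specific information such as local logging configuration.
for n in node1 node2 node3; do
    ssh root@$n /usr/cluster/bin/cdbBackup
done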

To perform a restore, run the cdbRestore command on each node. You can use this method for either a missing or corrupted cluster database. Do the following:

  1. Stop CXFS services.

  2. Stop cluster services on each node.

  3. Remove the old database by using the cdbreinit command on each node.

  4. Stop cluster services again (these were restarted automatically by cdbreinit in the previous step) on each node.

  5. Use the cdbRestore command on each node.

  6. Start cluster services on each node.

For example, to back up the current database, clear the database, and then restore the database to all administration nodes, do the following as directed on administration nodes in the cluster:

On each:
# /usr/cluster/bin/cdbBackup

On one:
# stop cx_services for cluster clusterA

On each:
# /etc/init.d/cxfs_cluster stop

On each:
# /usr/cluster/bin/cdbreinit

On each (again):
# /etc/init.d/cxfs_cluster stop

On each:
# /usr/cluster/bin/cdbRestore

On each:
# /etc/init.d/cxfs_cluster start

For more information, see the cdbBackup and cdbRestore man page.

Checking the Cluster Configuration with cxfs-config

The cxfs-config command displays and checks configuration information in the cluster database. You can run it on any administration node in the cluster.

By default, cxfs-config displays the following:

  • Cluster name and cluster ID

  • Tiebreaker node

  • Networks for CXFS kernel-to-kernel messaging


    Note: Use of these networks is deferred.


  • Nodes in the pool:

    • Node ID

    • Cell ID (as assigned by the kernel when added to the cluster and stored in the cluster database)

    • Status of CXFS services (configured to be enabled or disabled)

    • Operating system

    • Node function

  • CXFS filesystems:

    • Name, mount point (enabled means that the filesystem is configured to be mounted; if it is not mounted, there is an error)

    • Device name

    • Mount options

    • Potential metadata servers

    • Nodes that should have the filesystem mounted (if there are no errors)

  • Switches:

    • Switch name, user name to use when sending a telnet message, mask (a hexadecimal string representing a 64-bit port bitmap that indicates the list of ports in the switch that will not be fenced)

    • Ports on the switch that have a client configured for fencing at the other end

  • Warnings or errors

For example:

thump# /usr/cluster/bin/cxfs-config
Global:
    cluster: topiary (id 1)
    tiebreaker: <none>

Networks:
    net 0: type tcpip  192.168.0.0      255.255.255.0   
    net 1: type tcpip  134.14.54.0      255.255.255.0   
    net 2: type tcpip  1.2.3.4          255.255.255.0   

Machines:
    node leesa: node 6     cell 2  enabled  Linux32 client_only 
        fail policy: Fence
        nic 0: address: 192.168.0.164 priority: 1 network: 0
        nic 1: address: 134.14.54.164 priority: 2 network: 1

    node thud: node 8     cell 1  enabled  IRIX    client_admin
        fail policy: Fence
        nic 0: address: 192.168.0.204 priority: 1 network: 0
        nic 1: address: 134.14.54.204 priority: 2 network: 1

    node thump: node 1     cell 0  enabled  IRIX    server_admin
        fail policy: Fence
        nic 0: address: 192.168.0.186 priority: 1 network: 0
        nic 1: address: 134.14.54.186 priority: 2 network: 1

Filesystems:
    fs dxm: /mnt/dxm             enabled 
        device = /dev/cxvm/tp9500a4s0
        options = []
        servers = thump (1)
        clients = leesa, thud, thump

Switches:
    switch 0: admin@asg-fcsw1      mask 0000000000000000
        port 8: 210000e08b0ead8c thump
        port 12: 210000e08b081f23 thud

    switch 1: admin@asg-fcsw0      mask 0000000000000000

Warnings/errors:
    enabled machine leesa has fencing enabled but is not present in switch database

The command has the following options:

  • -ping contacts each NIC in the machine list and displays whether packets are transmitted and received. For example:

    node leesa: node 6     cell 2  enabled  Linux32 client_only 
       fail policy: Fence
       nic 0: address: 192.168.0.164 priority: 1 
           ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
           ping: round-trip min/avg/max = 0.477/0.666/1.375 ms
       nic 1: address: 134.14.54.164 priority: 2 
           ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
           ping: round-trip min/avg/max = 0.469/0.645/1.313 ms

  • -xfs lists XFS information for each CXFS filesystem, such as size. For example:

    Filesystems:
        fs dxm: /mnt/dxm             enabled 
            device = /dev/cxvm/tp9500a4s0
            options = []
            servers = thump (1)
            clients = leesa, thud, thump
            xfs:
                magic: 0x58465342
                blocksize: 4096
                uuid: 3459ee2e-76c9-1027-8068-0800690dac3c
                data size 17.00 Gb
    

  • -xvm lists XVM information for each CXFS filesystem, such as volume size and topology. For example:

    Filesystems:
        fs dxm: /mnt/dxm             enabled 
            device = /dev/cxvm/tp9500a4s0
            options = []
            servers = thump (1)
            clients = leesa, thud, thump
            xvm:
                vol/tp9500a4s0                    0 online,open
                    subvol/tp9500a4s0/data     35650048 online,open
                        slice/tp9500a4s0           35650048 online,open
                
                data size: 17.00 Gb

  • -check performs extra verification, such as comparing the XFS filesystem size with the XVM volume size for each CXFS filesystem. This option may take a few moments to execute. A sketch of a periodic check built on this option follows this list.
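
Because cxfs-config reports problems under a Warnings/errors heading (as in the example output above), you can build a simple periodic check around it. The following is a minimal sketch, not a feature of cxfs-config itself; it assumes that any text after the Warnings/errors: heading indicates a problem, and the mail recipient address is a placeholder.

#!/bin/sh
# Sketch only: assumes any output after the "Warnings/errors:" heading
# indicates a problem worth reporting; adjust the recipient address.
OUT=`/usr/cluster/bin/cxfs-config -check`
PROBLEMS=`echo "$OUT" | sed -n '/^Warnings\/errors:/,$p' | sed '1d'`
if [ -n "$PROBLEMS" ]; then
    echo "$OUT" | mailx -s "cxfs-config reported problems" admin@example.com
fi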

For more information, see the cxfs-config man page.