Chapter 3. Programming Checkpoint and Restart

This chapter describes how to write applications that checkpoint and restart processes gracefully. Code samples are provided, and code fragments at the end of the chapter show sample usage of IRIX CPR library routines.

For applications with checkpoint-unsafe objects, the principal programming concern is setting up event handlers that clean up at checkpoint time and, at restart time, restore network sockets, graphics state, tape I/O, CD-ROM status, and similar objects.


Design of Checkpoint and Restart

This section describes some design issues that governed the implementation of CPR.

POSIX Compliance

IRIX Checkpoint and Restart is based on POSIX 1003.1m draft 11, and was initially implemented in IRIX release 6.4. Because POSIX draft standards often change radically from inception to approval, the interfaces in IRIX release 6.5 are not guaranteed to be fully compliant, nor can SGI make any assurance that they will conform to the POSIX 1003.1m standard when it is eventually approved.

IRIX Extensions

The cpr command is not specified in POSIX 1003.1m draft 11. It is an IRIX-specific command provided for the convenience of customers; see the cpr(1) man page. The POSIX draft standard covers only the programming interfaces for checkpoint and restart.

The ckpt_stat() function, which returns information about the status of checkpoint statefiles, is not specified in POSIX 1003.1m draft 11; see the ckpt_stat(3) man page. The ckpt_setup() function specified in the POSIX draft is unimplemented; when applications call this routine, it is a no-op.

Programming Issues

This section describes the CPR library interfaces and signals, and shows how to write programs that set up event handlers using atcheckpoint() to prepare for a checkpoint and using atrestart() to restore non-checkpointable system objects at restart time. See “Limitations and Caveats” for a list of non-checkpointable objects.

CPR Library Interfaces

Application interfaces for adding CPR event handlers are contained in the C library, and are listed below. For more information, see the atcheckpoint(3C) man page.

  • atcheckpoint() - add an event handler function for checkpointing

  • atrestart() - add an event handler function for restarting

The checkpoint and restart library interfaces are contained in the libcpr.so dynamic shared object (DSO). When using this library, include the <ckpt.h> header file:

#include <ckpt.h>

The available library routines are listed below. For more information, see the ckpt_create(3) man page.

  • ckpt_create() - checkpoint a process or set of processes into statefiles

  • ckpt_restart() - resume execution of checkpointed process or process group

  • ckpt_stat() - retrieve status information about a checkpoint statefile

  • ckpt_remove() - delete a checkpoint statefile directory

  • ckpt_setup() - control checkpoint creation attributes (currently a no-op)

In the following discussion, “set of processes” can mean one process or a group of processes.

SIGCKPT and SIGRESTART

When a program (such as the cpr command) calls ckpt_create() to create a checkpoint, that function sends a SIGCKPT signal to the set of processes specified by the checkpoint ID argument to ckpt_create(). Applications add an event handler to catch SIGCKPT if they need to release or save the state of non-checkpointable objects such as network sockets, a graphic state, or file pointers to a CD-ROM. The default action is to ignore SIGCKPT.

When a program calls ckpt_restart() to resume execution from a checkpoint, the restart function sends a SIGRESTART signal to the set of processes checkpointed in the statefile specified by the path argument to ckpt_restart(). Applications add an event handler to catch SIGRESTART if they need to restore non-checkpointable objects such as sockets, a graphic state, or CD-ROM files. The default action is to ignore SIGRESTART.

After sending a SIGCKPT signal, ckpt_create() waits for the application to finish its signal handling before CPR proceeds with further checkpoint activities. At restart time, the first thing ckpt_restart() runs is the application's SIGRESTART handler, if one exists. Consequently, a checkpoint or restart can "get stuck" in a SIGCKPT or SIGRESTART handler that never returns, for example one that waits indefinitely for an external event.

Adding Event Handlers

The SIGCKPT and SIGRESTART signals are not intended to be handled directly by an application. Instead, CPR provides two C library functions that allow applications to establish a list of functions for handling checkpoint and restart events.

The atcheckpoint() routine takes one parameter, a pointer to your application's checkpoint handling function, and adds this function to the list of functions that are called when SIGCKPT is received. Similarly, the atrestart() routine registers the specified callback function to be called when SIGRESTART is received.

These functions are recommended for use during initialization when applications expect to be checkpointed but contain checkpoint-unsafe objects. An application may register multiple checkpoint event handlers to be called when checkpoint occurs, and multiple restart event handlers to be called when restart occurs.

At checkpoint time and at restart time, registered functions are called in first-in, first-out order, that is, in the order in which they were registered with atcheckpoint() or atrestart(), respectively. Registration order therefore matters for applications that register multiple callback handlers for checkpoint or restart events.
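The following portable sketch models the documented first-in, first-out calling order with a simple array of function pointers. The names model_register() and model_run() are hypothetical stand-ins for atcheckpoint() and for the CPR code that invokes the registered functions; the actual IRIX implementation is not described in this guide and may differ.

```c
#include <stddef.h>

/*
 * Hypothetical model of a first-in, first-out handler list.
 * This is NOT the IRIX C library implementation of atcheckpoint();
 * it only illustrates the calling order described in the text.
 */
#define MAX_HANDLERS 16

typedef void (*handler_fn)(void);

static handler_fn handlers[MAX_HANDLERS];
static size_t nhandlers = 0;

/* Append a handler to the end of the list; returns 0 on success, -1 on failure. */
int model_register(handler_fn fn)
{
    if (fn == NULL || nhandlers == MAX_HANDLERS)
        return -1;
    handlers[nhandlers++] = fn;
    return 0;
}

/* Invoke every registered handler, oldest registration first. */
void model_run(void)
{
    size_t i;

    for (i = 0; i < nhandlers; i++)
        handlers[i]();
}
```

Under this ordering, Example 3-1 runs ckptSocket() before ckptXserver() at checkpoint time, and restartSocket() before restartXserver() at restart time. If one handler depends on work done by another, register the one it depends on first.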

Using atcheckpoint() and atrestart() ensures that registered handlers are invoked only when a checkpoint or restart of the application is actually in progress, and not when a user sends the signals directly with a function such as sigsend().


Caution: If applications catch the SIGCKPT and SIGRESTART signals directly, they can undo all of the CPR signal handling that atcheckpoint() and atrestart() set up automatically, including CPR signal handlers that some libraries may register without the application programmer's knowledge.


Preparing for Checkpoint

If an application needs to restore network sockets, graphic state, tape I/O, CD-ROM mounts, or some other non-checkpointable system object, it should set up automatic checkpoint and restart event handlers using the recommended library routines.

The following sample code calls atcheckpoint() and atrestart() to set up functions for handling checkpoint and restart events. It is possible for this setup to fail on operating systems that do not (yet) support CPR.

Example 3-1. Checkpoint and Restart Event Handling

#include <stdio.h>
#include <stdlib.h>
#include <ckpt.h>

/* Application-supplied handlers, defined elsewhere */
extern void ckptSocket(void);
extern void ckptXserver(void);
extern void restartSocket(void);
extern void restartXserver(void);

int
main(int argc, char *argv[])
{
    /* Handlers run in registration order: socket first, then X server */
    if ((atcheckpoint(ckptSocket) == -1) ||
        (atcheckpoint(ckptXserver) == -1) ||
        (atrestart(restartSocket) == -1) ||
        (atrestart(restartXserver) == -1))
            perror("Cannot setup checkpoint and restart handling");
    /*
     *  processing ...
     */
    exit(0);
}


Handling a Checkpoint

Suppose your program mounts an ISO 9660 format CD-ROM, from which it reads data as a basis for more complex processing. Since the CD-ROM is not a checkpointable object, your program needs to record the current file offset, close all open files on the CD-ROM, and perhaps unmount the CD-ROM device.

The following sample code marks the current file position in the open cdFile, saves it for restoration at restart time, closes cdFile, and unmounts the CD-ROM.

Example 3-2. Routine to Handle Checkpoint

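#include <sys/types.h>
#include <sys/mount.h>
#include <stdio.h>

extern char *cdFile;    /* name of the file being read from the CD-ROM */
extern FILE *fpCD;      /* stream open on cdFile */
long cdOffset;          /* file offset saved for the restart handler */

void
catchCKPT(void)
{
    cdOffset = ftell(fpCD);         /* remember the current position */
    fclose(fpCD);
    fpCD = NULL;
    if (umount("/CDROM") == -1)     /* typically requires root privilege */
        perror("cannot unmount /CDROM");
    /* return normally (do not exit) so that the checkpoint can proceed */
}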


Note: The checkpoint event handler should return directly to its calling routine—it must not contain any sigsetjmp() or siglongjmp() code.



Checkpoint Time-outs

For programs that must wait for some external condition before exiting the checkpoint event handling function, it might be wise to set a time-out. For example, if a program is waiting for data to arrive over a TCP socket that must be shut down before checkpoint, and the data never arrives, the program should not wait forever.

The alarm() system call sends a SIGALRM signal to the calling program after a specified number of seconds. Because the default action for SIGALRM is to terminate the process, calling alarm(60) near the top of a checkpoint handling routine sets a 1-minute time-out; calling alarm(0) once the work is done cancels it.

Example 3-3. Setting an Alarm in Callback

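#include <unistd.h>

extern int sock;    /* file descriptor for socket */

void
catchCKPT(void)
{
    alarm(60);      /* SIGALRM (default action: terminate) if close() blocks for 60 s */
    close(sock);
    alarm(0);       /* cancel the pending alarm */
}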
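Because the default SIGALRM action terminates the process when the time-out expires, a handler that would rather give up on the pending data and carry on can bound its wait with poll() instead. This is a swapped-in technique, not one the CPR interfaces require, and wait_readable() is a hypothetical helper name.

```c
#include <poll.h>

/*
 * Hypothetical helper: wait at most timeout_ms milliseconds for fd to
 * become readable. Returns 1 if data is ready, 0 on time-out, -1 on error.
 * Unlike alarm(), a time-out here does not terminate the process.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;              /* error, including interruption (EINTR) */
    return n > 0 ? 1 : 0;
}
```

In a checkpoint handler, a call such as wait_readable(sock, 60000) waits up to 60 seconds for the expected data; whatever it returns, the handler can then close the socket and return normally.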


Handling a Restart

Suppose your program that unmounted the ISO 9660 CD-ROM at checkpoint time is restarted with the cpr command. Now it needs to ensure that the CD-ROM is mounted, reopen the formerly active file, and seek to the previous file offset position. Once it accomplishes all that, your program is ready to continue reading data from the CD-ROM.

The following sample code waits for the CD-ROM to become mounted, then reopens the cdFile, and seeks to the remembered offset position in cdFile.

Example 3-4. Routine to Handle Restart

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

extern char *cdFile;
extern FILE *fpCD;
extern long cdOffset;   /* saved by the checkpoint handler */

void
catchRESTART(void)
{
    /* Wait until the CD-ROM is mounted and readable */
    while (access("/CDROM/data", R_OK) == -1) {
        perror("please insert CDROM");
        sleep(60);
    }
    if ((fpCD = fopen(cdFile, "r")) == NULL) {
        perror("cannot open cdFile");
        exit(1);
    }
    if (fseek(fpCD, cdOffset, SEEK_SET) != 0) {
        perror("cannot seek to cdOffset");
        exit(1);
    }
    /*
     * etc.
     */
}


Note: The restart event handler should return directly to its calling routine—it must not contain any sigsetjmp() or siglongjmp() code.
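The loop in Example 3-4 waits indefinitely, which can leave the restart stuck in the handler. A minimal sketch of a bounded alternative follows; wait_for_path() and its retry parameters are hypothetical.

```c
#include <unistd.h>

/*
 * Hypothetical helper: check up to `tries` times whether `path` is
 * readable, sleeping `delay` seconds between attempts.
 * Returns 0 once the path is readable, -1 if it never becomes readable.
 */
int wait_for_path(const char *path, int tries, unsigned delay)
{
    int i;

    for (i = 0; i < tries; i++) {
        if (access(path, R_OK) == 0)
            return 0;
        if (i + 1 < tries)
            sleep(delay);       /* no sleep after the final attempt */
    }
    return -1;
}
```

For instance, catchRESTART() could call wait_for_path("/CDROM/data", 10, 60) and exit with an error message if it returns -1, rather than looping forever.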



Checkpoint and Restart of System Objects

Due to the nature of UNIX process checkpoint and restart, it is hard, if not impossible, to claim that everything that an original process owns or connects with can be restored. The following list defines what is clearly supported (checkpoint safe), and what limitations are known to exist. For items not listed, application writers and customers must decide what is checkpoint-safe.

Checkpoint-Safe Objects

All known checkpoint-safe entities are listed in the following sections.

Supported Process Groupings

CPR works on UNIX processes, process groups, terminal control sessions, array sessions, process hierarchies (trees of processes started from a common ancestor), IRIX jobs, POSIX threads (see the pthreads(5) man page), IRIX sproc() share groups (see the sproc(2) man page), and random process sets.

User Memory

All user memory regions are saved and restored, including the user stack and data regions. User text is not saved at checkpoint time; at restart it is remapped directly from the application binaries and libraries. However, if REPLACE is used as the default file disposition, user text is saved as well. The saved text might not replace the original if the original has not changed since the checkpoint. Locked memory regions remain locked after restart.

System States in Kernel

At restart, most important kernel state is restored to match the original, including basic process and user information, signal disposition and signal mask, scheduling information, owner credentials, accounting data, resource limits, current working directory, root directory, and user semaphores (see the usnewsema(3P) man page).

System Calls

All system calls are checkpoint safe as long as applications handle system call return values and error numbers correctly. Fast system calls are allowed to finish before the checkpoint proceeds. Slow system calls are interrupted and may return to the caller with partial results; applications that use system calls capable of returning partial results must check for them and be prepared to handle them. Slow system calls that have produced no results are transparently reissued at restart.
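A minimal sketch of checking for partial results, using read() as the example. The helper read_fully() is hypothetical and reflects ordinary POSIX practice rather than a CPR interface: it keeps reading after a short read, and reissues the call if it was interrupted (EINTR) before transferring any data.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from fd, tolerating partial
 * results. Returns the number of bytes read (less than len only at end
 * of file), or -1 on a real error.
 */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted with no data: reissue */
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        done += (size_t)n;          /* partial result: keep reading */
    }
    return (ssize_t)done;
}
```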

Some system calls are handled individually. The sleep() function is reissued for the time that remained at checkpoint; see the sleep(3C) man page. The alarm() system call is handled similarly: after restart, the alarm expires once the time remaining at checkpoint has elapsed; for more information see the alarm(2) man page.

Signals

Undelivered signals and queued signals are saved at checkpoint and delivered at restart.

Open Files and Devices

Processes with regular open files or mapped files, including NFS-mounted files, can be checkpointed and restarted with few restrictions, provided users choose the correct file disposition in the CPR attribute file, as described in the section “Checkpoint and Restart Attributes” in Chapter 1.

All file locks are also restored at restart. If the file regions that the restarting process needs to lock have already been locked by another process, CPR tries to acquire the locks a few times before it aborts the restart.

Supported special files are:

  • /dev/tty

  • /dev/console

  • /dev/zero

  • /dev/null

  • ccsync (see the ccsync(7M) man page).

Inherited file descriptors are restored at restart. Applications using R10000 counters through the /proc interface are checkpoint safe, provided the /proc file descriptor is closed.

Open Pipes

Applications with SVR3 or SVR4 pipes open can be checkpointed and restarted without restrictions. Pipeline data and streams pipe message modes are also saved and restored.

Shared Memory and Semaphores

Applications using SVR4 shared memory can be checkpointed and restarted; for more information see the shmop(2) man page. The original shared memory ID (shmid) is now restored—this was not the case in the IRIX 6.4 release.

Applications using POSIX semaphores, or shared arena semaphores and locks, can be checkpointed and restarted. For more information, see the psema(D3X) or usinit(3P) man pages, respectively.

Application Licensing

Applications using node-lock licenses (one license per machine) are generally safe for checkpoint and restart. Applications using floating licenses may be safe for checkpoint and restart, depending on the license library implementation. In IRIX 6.5 and later, the FLEXlm library includes atcheckpoint() and atrestart() event handlers.

If your license library employs open-and-warm sockets without CPR-aware handlers, you should do one of the following:

  • Add atcheckpoint() and atrestart() event handlers to your application. The atcheckpoint() handler should disconnect license checking, and the atrestart() handler should reconnect license checking.

  • Ask your license software vendor to add similar handlers to their license library.

Network Applications Using Array Services

Jobs started with Power ChallengeArray or ChallengeArray services can be checkpointed and restarted, provided the jobs have a unique ASH (array session handle) number; for more information see the array_services(5) man page. Array services jobs may use several methods to generate a new ASH, including calling newarraysess(). For more information, see the newarraysess(2) man page.

During an array checkpoint, a checkpoint server is responsible for starting, monitoring, and synchronizing all checkpoint clients running on the different machines, based on the given ASH. Statefiles are saved locally on each machine for all processes with the given ASH running on that machine. Restart occurs in a similar fashion, with the restart server synchronizing with all local restart clients to restore all processes on the different machines.

An interactive array job with a controlling terminal on a given machine has to be checkpointed and restarted from that very same machine. Otherwise the controlling terminal cannot be restored.

Other Supported Commands

Applications using blockproc() and unblockproc() are checkpoint safe; for more information see the blockproc(2) man page.

Memory regions added by calling prctl() with the PR_ATTACHADDR argument can be safely checkpointed and restarted. For more information, see the prctl(2) man page.

The Power Fortran join synchronization accelerator is checkpoint safe. For more information, see the ccsync(7M) man page.

Applications using R10000 counters are checkpoint safe. For more information, see the libperfex(3C) or perfex(1) man page.

Compatibility Between Releases

A statefile created under the current release is likely to be restartable under future releases, owing to the object-oriented architecture of the CPR implementation.

With certain limitations, if a system object available in the current release becomes obsolete in a future release, it will be remapped at restart to a new replacement object.

Limitations and Caveats

Various CPR restrictions and warnings are listed in the following sections.

SVR4 Semaphores and Messages

Applications using SVR4 semaphores, or SVR4 messages, cannot be checkpointed and restarted; for more information see the semop(2) or msgop(2) man pages, respectively.

Networking Socket Connections

Generally speaking, an application with open socket connections (see the socket(2) man page) should not be checkpointed and restarted without special CPR-aware handling code. The application needs to register event handlers with atcheckpoint() and atrestart() that disconnect any open socket before checkpoint and reconnect the socket after restart.
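A minimal sketch of such a disconnect and reconnect pair for a TCP client, assuming an IPv4 connection held in a single global descriptor. The names socket_ckpt(), socket_restart(), sock, and saved_peer are hypothetical; on IRIX the two functions would be registered with atcheckpoint() and atrestart(), as in Example 3-1.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical application state: the connected socket and its peer. */
static int sock = -1;
static struct sockaddr_in saved_peer;   /* zeroed, so sin_family == AF_UNSPEC */

/* Checkpoint handler: remember the peer address, then close the socket. */
void socket_ckpt(void)
{
    socklen_t len = sizeof saved_peer;

    if (sock == -1)
        return;
    if (getpeername(sock, (struct sockaddr *)&saved_peer, &len) == -1)
        saved_peer.sin_family = AF_UNSPEC;  /* peer unknown: cannot reconnect */
    close(sock);
    sock = -1;
}

/* Restart handler: open a new socket and reconnect to the saved peer. */
void socket_restart(void)
{
    if (saved_peer.sin_family != AF_INET)
        return;
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        perror("socket");
        return;
    }
    if (connect(sock, (struct sockaddr *)&saved_peer, sizeof saved_peer) == -1) {
        perror("connect");
        close(sock);
        sock = -1;
    }
}
```

Only the transport connection is re-established. Any application-level state on the connection, such as a login or protocol handshake, must be replayed by the restart handler, and the server must still be reachable at the saved address.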

Since the MPI (message passing interface) library uses sockets for network connections to the array services daemon arrayd, it is generally not possible to checkpoint MPI code. For more information, refer to the MPI and PVM User's Guide, or see the mpi(5) man page.

Other Special Devices

Any device or special file not listed in the section “Open Files and Devices” as checkpoint safe should be considered unsupported for checkpoint and restart. This includes tape, CD-ROM, and other real or pseudo special devices. As with sockets, applications need to close these devices before checkpoint and reopen them after restart, using event handlers registered with atcheckpoint() and atrestart().

Graphics

X terminals and other kinds of graphics terminals are not supported. Applications with these devices open must be CPR-aware and clean up properly in checkpoint and restart event handlers registered with atcheckpoint() and atrestart(), much as socket connections should be handled.

Miscellaneous Restrictions

Applications with open directories cannot be properly checkpointed; for more information see the directory(3C) man page.

A potential problem exists with setuid() programs. Resources such as file descriptors and locks that were acquired with a different (especially higher) privilege may not be restorable at restart. For example, a root process may first open some files and then call setuid(guest). If this process is checkpointed after the setuid() call, the corresponding restart fails because guest cannot access the files that root opened. Similar restrictions apply to resources that a non-root process inherits from a privileged process, such as file descriptors.

Saving State Using ckpt_create()

The ckpt_create() function checkpoints a process or set of processes into a statefile. The following code shows sample usage of the ckpt_create() function.

Example 3-5. Sample Usage of the ckpt_create() Function

#include <stdio.h>
#include <sys/types.h>
#include <ckpt.h>

static int
do_checkpoint(ckpt_id_t id, u_long type, char *pathname)
{
    int rc;

    /* Cast so that %lld is correct whatever the width of ckpt_id_t */
    printf("Checkpointing id %lld (type %s) to directory %s\n",
        (long long)id, ckpt_type_str(CKPT_REAL_TYPE(type)), pathname);
    if ((rc = ckpt_create(pathname, id, type, 0, 0)) != 0) {
        printf("Failed to checkpoint process %lld\n", (long long)id);
        return (rc);
    }
    return (0);
}

The global variable cpr_flags, defined in <ckpt.h>, permits programmers to specify checkpoint-related options. The following flags may be bitwise ORed into cpr_flags before a call to ckpt_create():

CKPT_CHECKPOINT_CONT
 

Have checkpoint target processes continue running after this checkpoint is finished. This overrides both the default WILL policy and any WILL policy specified in a user's CPR attribute file.

CKPT_CHECKPOINT_KILL
 

Kill checkpoint target processes after this checkpoint is finished. This is the default WILL policy, but overrides a CONT setting in a user's CPR attribute file.

CKPT_CHECKPOINT_UPGRADE
 

Use this flag only when issuing a checkpoint immediately before an operating system upgrade. This forces a save of all executable files and DSO libraries used by the current processes, so that target processes can be restarted in an upgraded environment. This flag must be used again if restarted processes are again checkpointed in the new environment.

CKPT_OPENFILE_DISTRIBUTE
 

Instead of saving open files in the statefile directory, save each open file in the directory where it resides, under a unique name. For example, if a checkpointed process had the /etc/passwd file open when this flag was set, the open file would be saved as /etc/passwd.ckpt.pidXXX. Although this mode raises security concerns, it is useful when disk space is at a premium.

Since cpr_flags is a process-wide global variable, make sure to reset or clear flags appropriately before a second call to ckpt_create().

Resuming Using ckpt_restart()

The ckpt_restart() function resumes execution of a checkpointed process or processes. The following code shows sample usage of the ckpt_restart() function.

Example 3-6. Sample Usage of the ckpt_restart() Function

#include <stdio.h>
#include <ckpt.h>

static int
do_restart(char *path)
{
    printf("Restarting processes from directory %s\n", path);
    if (ckpt_restart(path, 0, 0) < 0) {
        printf("Restart %s failed\n", path);
        return (-1);
    }
    return (0);     /* success */
}

The global variable cpr_flags, defined in <ckpt.h>, permits programmers to specify restart-related options. The following flags may be bitwise ORed into cpr_flags before a call to ckpt_restart():

CKPT_RESTART_INTERACTIVE
 

Make a process or group of processes interactive (that is, subject to UNIX job control), if the original processes were interactive. The calling process or the calling process' group leader becomes the group leader of restarted processes, but the original process group ID cannot be restored. Without this flag, the default is to restart target processes as an independent process group with the original group ID restored.

CKPT_RESTART_MIGRATE
 

Migrate process memory so it is restored to the location in the system topology where the restart operation is executing, for example, within a specific cpuset, within the global cpuset, and so on. The global cpuset is the pool of CPUs not assigned to any specific named cpuset. Without this option, the default restart behavior on NUMA systems is to restore process memory back to where it was at the time of the checkpoint. See the migration(3) man page for scenarios that may prevent pages from migrating properly. This option has no effect on non-NUMA systems.

Since cpr_flags is a process-wide global variable, make sure to reset or clear flags appropriately before a second call to ckpt_restart().

Checking Status Using ckpt_stat()

The ckpt_stat() function retrieves status information about a checkpoint statefile. The following code shows sample usage of the ckpt_stat() function.

Example 3-7. Sample Usage of the ckpt_stat() Function

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <ckpt.h>

/* rev_to_str() formats the statefile revision for display */
static int
ckpt_info(char *path)
{
    ckpt_stat_t *sp, *sp_next;
    int rc;

    if ((rc = ckpt_stat(path, &sp)) != 0) {
        printf("Cannot get information on CPR file %s\n", path);
        return (rc);
    }
    printf("\nInformation About Statefile %s (%s):\n",
        path, rev_to_str(sp->cs_revision));
    /* Walk the returned list, printing and freeing each entry */
    while (sp) {
        printf(" Process:\t\t%s\n", sp->cs_psargs);
        printf(" PID,PPID:\t\t%d,%d\n", sp->cs_pid, sp->cs_ppid);
        printf(" PGRP,SID:\t\t%d,%d\n", sp->cs_pgrp, sp->cs_sid);
        printf(" Working at dir:\t%s\n", sp->cs_cdir);
        printf(" Num of Openfiles:\t%d\n", sp->cs_nfiles);
        /* ctime() output already ends with a newline */
        printf(" Checkpointed @\t%s", ctime(&sp->cs_stat.st_mtime));
        sp_next = sp->cs_next;
        free(sp);
        sp = sp_next;
    }
    return (0);
}


Removing Checkpoints Using ckpt_remove()

The ckpt_remove() function deletes a checkpoint statefile directory.

The following code shows sample usage of the ckpt_remove() function.

Example 3-8. Sample Usage of the ckpt_remove() Function

#include <stdio.h>
#include <ckpt.h>

static int
do_remove(char *path)
{
    int rc;

    if ((rc = ckpt_remove(path)) != 0) {
        printf("Remove checkpoint statefile %s failed\n", path);
        return (rc);
    }
    return (0);     /* success */
}


Preparing Checkpoints Using ckpt_setup()

This function, described in the POSIX draft standard, is implemented as a no-op.

The following code shows the current implementation of the ckpt_setup() function.

Example 3-9. Implementation of the ckpt_setup() Function

int ckpt_setup(struct ckpt_args *args[], size_t nargs)
{
    return(0);
}