Chapter 12. Process-Level Parallelism

The process is the traditional unit of UNIX execution. The concept of the process (and its relationship to the concept of a thread) is covered under “Process-Level Parallelism”. The purpose of this chapter is to review how you can use IRIX processes to perform parallel processing in a single program.

Using Multiple Processes

In general, you can create a new process for each unit of work that your program could do in parallel. The processes can share the address space of the original program, or each can have its own address space. You design the processes so that they coordinate work and share data using any and all of the interprocess communication (IPC) features discussed in Part II, “Interprocess Communication.”

Software products from Silicon Graphics use process-level parallelism. For example, the IRIS Performer graphics library normally creates a separate lightweight process to manage the graphics pipe in parallel with rendering work. The run-time library for statement-level parallelism creates a pool of lightweight processes and dispatches them to execute parts of loop code in parallel (see “Managing Statement-Parallel Execution”).

Process Creation and Share Groups

The most important system functions you use to create and manage processes are summarized in Table 12-1.

Table 12-1. Commands and System Functions for Process Management

Function Name    Purpose and Operation

npri(1)          Command to run a process at a specified nondegrading priority.
runon(1)         Command to run a process on a specific CPU.
fork(2)          Create a new process with a private address space.
pcreate(3C)      Create a new process with a private address space running a designated program with specified arguments.
sproc(2)         Create a new process in the caller's address space using a private stack.
sprocsp(2)       Create a new process in the caller's address space using a preallocated stack area.
prctl(2)         Query and set assorted process attributes.
sysmp(2)         Query multiprocessor status and assign processes to CPUs.
syssgi(2)        Query process virtual and real memory use, and other operations.

You can initiate a program at a specified nondegrading priority (explained under “Process Scheduling”) using npri. You can initiate a program running on a specific CPU of a multiprocessor using runon. Both attributes—the assigned priority and the assigned CPU—are inherited by any child processes that the program creates.

Process Creation

The process that creates another is called the parent process. The processes it creates are child processes, or siblings. The parent and its children together are a share group. IRIX provides special services to share groups. For example, you can send a signal to all processes in a share group.

The fork() function is the traditional UNIX way of creating a process. The new process is a duplicate of the parent process, running in a duplicate of the parent's address space. Both execute the identical program text; that is, both processes “return” from the fork() call. Your code can distinguish them by the return code, which is 0 in the child process, but in the parent is the new process ID.
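The return-code test described above can be sketched as follows. This is a minimal illustration rather than code from this guide: the names runInChild() and sampleWork() are hypothetical, and passing the result back through the child's exit status is only one convenient way to make the outcome visible to the parent.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical unit of work; stands in for real application code. */
static int sampleWork(void)
{
    return 7;
}

/*
|| Run work() in a forked child and return the child's exit status,
|| or -1 if the fork or wait fails or the child ends abnormally.
*/
int runInChild(int (*work)(void))
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                 /* fork failed, no child exists */
    if (pid == 0)                  /* return code 0: this is the child */
        _exit(work());             /* exit without flushing inherited stdio */
    /* nonzero return code: this is the parent, pid is the child's ID */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as runInChild(sampleWork) executes sampleWork() in the duplicate address space, so any variables the child changes are not seen by the parent; only the exit status comes back.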

The sproc() and sprocsp() functions create a lightweight process. The difference between these calls is that sproc() allocates a new memory segment to serve as the stack for the new process. You use sprocsp() to specify a stack segment that you have already allocated—for example, a block of memory that you allocate and lock against paging using mpin().

The sproc() calls take as an argument the address of the function that the new process should execute. The new process begins execution in that function, and when that function returns, the process is terminated. Read the sproc(2) reference page for details on the flags that specify which process attributes a child process shares with its parent, and for other comparisons between fork() and sproc().


Note: The sproc() and sprocsp() functions are not available for use in a threaded program (see Chapter 13, “Thread-Level Parallelism”). The pthreads library uses lightweight processes to implement threading, and has to control the creation of processes. Also, when your program uses the MPI library (see Chapter 14, “Message-Passing Parallelism”), the use of sproc() and sprocsp() can cause problems.


Process Management

Certain system functions give you some control over the processes you create. The prctl() function offers a variety of operations. These are some of the most useful:

PR_MAXPROCS

Query the system limit on processes per user (also available from sysconf(_SC_CHILD_MAX), see sysconf(2)).

PR_MAXPPROCS

Query the maximum number of CPUs that are available to the calling process and its children. This reflects both the system hardware and reservations made on CPUs, but does not reflect system load.

PR_GETNSHARE

Query the number of processes in the share group with the calling process.

PR_GETSTACKSIZE

Query the maximum size of the stack segment of the calling process. For the parent process this reflects the system limit (also available from getrlimit(RLIMIT_STACK), see getrlimit(2)). For a process started by sprocsp(), it is the size of the allocated stack.

PR_SETSTACKSIZE

Set an upper limit on stack growth for the calling process and for child processes it creates in the future.

PR_RESIDENT

Prevent the calling process from being swapped out. This is unrelated to paging; it concerns the swapping out of an entire, inactive process under heavy system load.
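Two of the queries in this list have portable equivalents that the descriptions name: sysconf(_SC_CHILD_MAX) and getrlimit(RLIMIT_STACK). The following sketch shows only those portable calls, not prctl() itself. The wrapper names maxProcessesPerUser() and stackSizeLimit() are hypothetical.

```c
#include <sys/time.h>
#include <sys/resource.h>
#include <unistd.h>

/* Per-user process limit; -1 means an error or no fixed limit. */
long maxProcessesPerUser(void)
{
    return sysconf(_SC_CHILD_MAX);
}

/*
|| Store the current (soft) stack size limit in *bytes.
|| Returns 0 on success, -1 on failure. A value of RLIM_INFINITY
|| means that no limit is set.
*/
int stackSizeLimit(rlim_t *bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return -1;
    *bytes = lim.rlim_cur;
    return 0;
}
```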

The sysmp() function gives a privileged process information about and control over the use of a multiprocessor. Some of the operations it provides are as follows:

MP_NPROCS

Number of CPUs physically in the system.

MP_NAPROCS

Number of CPUs available to the scheduler; should be the same as prctl(PR_MAXPPROCS).

MP_MUSTRUN

Assign the calling process to run on a specific CPU.

MP_MUSTRUN_PID

Assign a specified other process (typically a just-created child process) to run on a specific CPU.

MP_GETMUSTRUN
MP_GETMUSTRUN_PID

Query the must-run assignment of the calling process or of a specified process.

MP_RUNANYWHERE
MP_RUNANYWHERE_PID

Allow the calling process, or a specified process, to run on any CPU.

The runon command (see “Process Creation” and runon(1)) initiates the parent process of a program running on a specific CPU. Any child processes also run on that CPU unless the parent reassigns them to run anywhere, or to run on a different CPU, using sysmp(). The use of restricted CPUs and assigned CPUs to get predictable real-time performance is discussed at length in the REACT Real-Time Programmer's Guide.

The syssgi() function has a number of interesting uses but only one of interest for managing processes: syssgi(SGI_PROCSZ) returns the virtual and resident memory occupancy of the calling process.

Process “Reaping”

A parent process should not terminate while its child processes continue to run. When it does so, the parent process ID of each child becomes 1, the ID of the init process. This causes problems if a child process should loop or hang. The functions you use to collect (the technical term is to “reap”) the status of child processes are summarized in Table 12-2.

Table 12-2. Functions for Child Process Management

Function Name    Purpose and Operation

wait(2)          Function to block until a child stops or terminates, and to receive the cause of its change of status.
waitpid(2)       POSIX extension of wait() which allows more selectivity and returns more information.
wait3(2)         BSD extension of wait() that allows you to poll for terminated children without suspending.
waitid(2)        Function to suspend until one of a selected set of status changes occurs in one or more child processes.

When the parent process has nothing to do after starting the child processes, it can loop on wait() until wait() reports no more children exist; then it can exit.
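The wait() loop just described might look like the following sketch. The function names startChildren() and reapAllChildren() are hypothetical, and the children here exit immediately only so that the example is complete; real children would do work first.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start n children that exit at once; returns the number started. */
int startChildren(int n)
{
    int started = 0;

    while (started < n) {
        pid_t pid = fork();
        if (pid == -1)
            break;                 /* could not create more children */
        if (pid == 0)
            _exit(0);              /* child: a real program would work here */
        started++;
    }
    return started;
}

/*
|| Block in wait() until no children remain.
|| Returns the number of terminated children whose status was collected.
*/
int reapAllChildren(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        if (wait(&status) > 0) {
            if (WIFEXITED(status) || WIFSIGNALED(status))
                reaped++;          /* this child has terminated */
        } else if (errno == ECHILD) {
            break;                 /* no more children exist */
        } else if (errno != EINTR) {
            break;                 /* unexpected error, give up */
        }
        /* on EINTR the wait was interrupted by a signal, so retry */
    }
    return reaped;
}
```

After reapAllChildren() returns, the parent can exit without leaving any child to be inherited by init.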

Sometimes it is necessary to handle child termination and other work, and the parent cannot suspend. In this case the parent can treat the termination of a child process as an asynchronous event, and trap it in a signal handler for SIGCLD (see “Catching Signals”). The wait(2) reference page has extensive discussion of the three methods (BSD, SVR4, and POSIX) for handling this situation, with example code for each.

Process Scheduling

There are two different approaches to setting the scheduling priorities of a process, one compatible with IRIX and BSD, the other POSIX compliant.

Controlling Scheduling With IRIX and BSD-Compatible Facilities

The IRIX-compatible and BSD-compatible scheduling operations are summarized in Table 12-3.

Table 12-3. Commands and Functions for Scheduling Control

Function Name     Purpose and Operation

schedctl(2)       Query and set IRIX process scheduling attributes.
getpriority(2)    Return the scheduling priority of a process or process group.
setpriority(2)    Set the priority of a process or process group.
nice(1)           Run a program at a positive or negative increment from normal priority.
renice(1)         Alter the priority of a running process by a positive or negative increment.

For BSD compatibility, use the nice and renice commands to alter priorities, and within a program use getpriority() and setpriority() to query and set priorities. These commands and functions use priority numbers ranging from -20 through 0 to +20, with lower arithmetic values having superior access to the CPU.
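The following sketch shows one way a program might use getpriority() and setpriority() on itself. The helper names getOwnPriority() and lowerOwnPriority() are hypothetical. The sketch only adds a positive increment, because moving to a numerically lower (superior) priority normally requires privilege.

```c
#include <errno.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
|| Read the priority of the calling process. Because -1 is a legal
|| priority value, errno must be cleared first and checked afterward.
|| Returns 0 on success and stores the value in *prio, else -1.
*/
int getOwnPriority(int *prio)
{
    int value;

    errno = 0;
    value = getpriority(PRIO_PROCESS, 0);   /* who == 0 means the caller */
    if (value == -1 && errno != 0)
        return -1;
    *prio = value;
    return 0;
}

/*
|| Give up some access to the CPU by adding a positive increment to
|| the current priority number. Returns 0 on success, else -1.
*/
int lowerOwnPriority(int increment)
{
    int prio;

    if (increment < 0 || getOwnPriority(&prio) != 0)
        return -1;
    return setpriority(PRIO_PROCESS, 0, prio + increment);
}
```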

Only the IRIX schedctl() function gives you complete access to a variety of operations related to process scheduling. Some of the key operations are as follows:

NDPRI

Set a nondegrading priority for the calling process (see text).

GETNDPRI

Query the nondegrading priority of the calling process.

SETMASTER

Set the master process of a share group. By default the parent process is the master process, but it can transfer that honor.

SCHEDMODE, SGS_SINGLE

Cause all processes in the share group to be suspended except the master process (set with SETMASTER).

SCHEDMODE, SGS_GANG

Cause all processes in the share group to be scheduled as a “gang,” with all running concurrently.

SCHEDMODE, SGS_FREE

Schedule the share group in the default fashion.

A program started interactively inherits a scheduling discipline based on degrading priorities. That is, the longer the process executes without voluntarily suspending, the lower its dispatching priority becomes. This strategy keeps a runaway process from monopolizing the hardware. However, you may have a CPU-intensive application that needs a predictable execution rate. This is the purpose of nondegrading priorities set with schedctl(NDPRI) or with the npri command (see the npri(1) reference page).

There are three bands of nondegrading priorities, designated by symbolic names declared in sys/schedctl.h:

  • A real-time band from NDPHIMAX to NDPHIMIN. System daemons and real-time programs run in this band, which has higher priority than any interactive process.

  • A normal band from NDPNORMMAX to NDPNORMMIN. These values have the same priority as interactive programs. Processes at these priorities compete with interactive processes, but their priorities do not degrade with time.

  • A batch band from NDPLOMAX to NDPLOMIN. Processes at these priorities receive available CPU time and are scheduled from a batch queue.


Tip: The IRIX priority numbers are inverted, in the sense that numerically smaller values have superior priority. For example, NDPHIMAX is 30 and NDPHIMIN is 39. However, as long as you declare priority values using symbolic expressions, the numbers work out correctly. For example, the statement


#define NDPHIMIDDLE (NDPHIMIN+((NDPHIMAX-NDPHIMIN)/2))

produces a “middle” value of 35, as it should.
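The arithmetic behind that result can be checked with a small sketch. It uses stand-in constants holding the values quoted in the Tip, rather than including sys/schedctl.h, so that it compiles on any system. The names DEMO_NDPHIMAX, DEMO_NDPHIMIN, and bandMiddle() are hypothetical.

```c
/* Stand-ins for the sys/schedctl.h names, using the values quoted above. */
#define DEMO_NDPHIMAX 30   /* numerically smallest, superior priority */
#define DEMO_NDPHIMIN 39   /* numerically largest, inferior priority */

/*
|| Midpoint of a band given its two symbolic ends. The difference
|| (bandMax - bandMin) is negative, and integer division truncates
|| toward zero (guaranteed since C99), so the offset here is -4.
*/
int bandMiddle(int bandMax, int bandMin)
{
    return bandMin + ((bandMax - bandMin) / 2);
}
```

With these values, bandMiddle(DEMO_NDPHIMAX, DEMO_NDPHIMIN) evaluates 39 + (-9 / 2), which is 39 - 4, or 35.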

When you create a cooperating group of processes, it is important that they all execute at the same time, provided there are enough CPUs to handle all the members of the group that are ready to run. This minimizes the time that members of the share group spend waiting for each other to release locks or semaphores.

Use schedctl() to initiate “gang” scheduling for the share group. IRIX attempts to schedule all processes to execute at the same time, when possible.


Note: Through IRIX 6.2, schedctl() also supported a scheduling mode called “deadline scheduling.” This scheduling mode is being removed and will not be supported in the future. Do not design a program based on the use of deadline scheduling.


Controlling Scheduling With POSIX Functions

The POSIX-compliant functions to control process scheduling are summarized in Table 12-4.

Table 12-4. POSIX Functions for Scheduling

Function Name                                        Purpose and Operation

sched_getparam(2), sched_setparam(2)                 Query and change the POSIX scheduling priority of a process.
sched_getscheduler(2), sched_setscheduler(2)         Query and change the POSIX scheduling policy and priority of a process.
sched_get_priority_max(2), sched_get_priority_min(2) Query the maximum (most use of CPU) and minimum (least use) priority numbers for use with sched_setparam().
sched_get_rr_interval(2)                             Query the timeslice interval of the round-robin scheduling policy.
sched_yield(2)                                       Let other processes of the same priority execute.

Use the functions sched_get_priority_max() and sched_get_priority_min() to get the ranges of priority numbers you can use. Use sched_setparam() to change priorities. POSIX dispatching priorities are nondegrading. (Note that in a program that links with the pthreads library, these same function names are library functions that return thread scheduling priority numbers unrelated to process scheduling.)


Tip: The POSIX scheduling priority values reported by these functions and declared in sched.h are not numerically the same as the bands supported by schedctl() and declared in sys/schedctl.h. The POSIX numbers are numerically higher for superior priority. However, the POSIX range is functionally (but not numerically) equivalent to the “normal” range supported by schedctl() (NDPNORMMAX to NDPNORMMIN).

POSIX scheduling uses one of two scheduling policies, strict FIFO and round-robin, which are described in detail in the sched_setscheduler(2) reference page. The round-robin scheduler, which rotates processes of equal priority on a time-slice basis, is the default. You can query the time-slice interval with sched_get_rr_interval(). You can change both the policy and the priority by using sched_setscheduler().
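The query functions can be combined as in the following sketch, which only reads scheduling information and changes nothing. The wrapper names policyPriorityRange() and currentPolicy() are hypothetical. The sketch assumes that -1 is not a valid priority for the policy being queried, because it treats that value as the error return.

```c
#include <sched.h>

/*
|| Store the lowest and highest priority numbers valid for a policy
|| (SCHED_FIFO or SCHED_RR). Returns 0 on success, -1 on failure.
*/
int policyPriorityRange(int policy, int *lowest, int *highest)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)      /* assumed to indicate an error */
        return -1;
    *lowest = lo;
    *highest = hi;
    return 0;
}

/* Scheduling policy of the calling process, or -1 on failure. */
int currentPolicy(void)
{
    return sched_getscheduler(0);  /* pid 0 means the caller */
}
```

A program would typically call policyPriorityRange() once for the policy it intends to use, then choose values within the reported range before calling sched_setparam() or sched_setscheduler().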

Self-Dispatching Processes

Often, each child process has a particular role to play in the application, and the function that you name to sproc() represents that work. The child process stays in that function until it terminates.

Another design is possible. In some applications, you may have to manage a flow of many relatively short activities that should be done in parallel. However, the sproc() function has considerable overhead. It is inefficient to continually create and destroy child processes. You do not want to create a new child process for each small activity and destroy it afterward. Instead, you can create a pool containing a small number of processes. When a piece of work needs to be done, you can dispatch one process to do it. The fragmentary code in Example 12-1 shows the general approach.

Example 12-1. Partial Code to Manage a Pool of Processes

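#include <ulocks.h>              /* usema_t, ulock_t, and the us*() calls */
#define NUMSPROCS 4              /* hypothetical pool size, for illustration */
typedef void (*workFunc)(void *arg);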
struct oneSproc {
   struct oneSproc *next;        /* -> next oneSproc ready to run */
   workFunc calledFunc;          /* -> function the sproc is to call */
   void *callArg;                /* argument to pass to the called func */
   usema_t *sprocDone;           /* optional sema to post on completion */
   usema_t *sprocWait;           /* sproc waits for work here */
} sprocPool[NUMSPROCS];          /* one structure per process in the pool */
usema_t *readySprocs;            /* count represents sprocs ready to work */
ulock_t sprocListLock;           /* mutex control of sprocList head */
struct oneSproc *sprocList;      /* -> first ready oneSproc */
/*
|| Put a oneSproc structure on the ready list and sleep on it.
|| Called by a child process when its work is done.
*/
void sprocSleep(struct oneSproc *theSproc)
{
    ussetlock(sprocListLock);    /* acquire exclusive rights to sprocList */
    theSproc->next = sprocList;  /* put self on the list */
    sprocList = theSproc;
    usunsetlock(sprocListLock);  /* release sprocList */
    usvsema(readySprocs);        /* notify master, at least 1 on the list */
    uspsema(theSproc->sprocWait);/* sleep until master posts me */
}
/*
|| Body of a general-purpose child process. The argument, which must
|| be declared void* to match the sproc() prototype, is the oneSproc
|| structure that represents this process.   The contents of that
|| struct, in particular sprocWait, are initialized by the parent.
*/
void childBody(void *theSprocAsVoid)
{
    struct oneSproc *mySproc = (struct oneSproc *)theSprocAsVoid;
    /* here one could establish signal handlers, etc. */
    for(;;)
    {
        sprocSleep(mySproc);      /* wait for work to do */
        mySproc->calledFunc(mySproc->callArg);  /* do the work */
        if (mySproc->sprocDone)   /* if a completion sema is given, */
            usvsema(mySproc->sprocDone); /* ..post it */
    }
}
/*
|| Acquire a oneSproc structure from the ready list, waiting if necessary. 
|| Called by the master process as part of dispatching a sproc.
*/
struct oneSproc *getSproc(void)
{
    struct oneSproc *theSproc;
    uspsema(readySprocs);        /* wait until at least 1 sproc is free */
    ussetlock(sprocListLock);    /* acquire exclusive rights to sprocList */
    theSproc = sprocList;        /* get address of first free oneSproc */
    sprocList = theSproc->next;  /* make next in list, the head of list */
    usunsetlock(sprocListLock);  /* release sprocList */
    return theSproc;
}
/*
|| Start a function going asynchronously. Called by master process.
*/
void execFunc(workFunc toCall, void *callWith, usema_t *done)
{
    struct oneSproc *theSproc = getSproc();
    theSproc->calledFunc = toCall;     /* set address of func to exec */
    theSproc->callArg = callWith;      /* set argument to pass */
    theSproc->sprocDone = done;        /* set sema to post on completion */
    usvsema(theSproc->sprocWait);      /* wake up sleeping process */
}


Parallelism in Real-Time Applications

In real-time programs such as aircraft or vehicle simulators, separate processes are used to divide the work of the simulation and distribute it onto multiple CPUs. In these demanding applications, the programmer frequently uses IRIX facilities to

  • reserve one or more CPUs of a multiprocessor for exclusive use by the application

  • isolate the reserved CPUs from all interrupts

  • assign specific processes to execute on specific, reserved CPUs

These facilities are described in detail in the REACT Real-Time Programmer's Guide (007-2499-nnn). Also covered in that book is the use of the Frame Scheduler, an alternate process scheduler. The normal process scheduling algorithm of the IRIX kernel attempts to keep all CPUs busy and to keep all processes advancing in a fair manner. This algorithm is in conflict with the stringent needs of a real-time program, which needs to dedicate predictable amounts of hardware capacity to its processes, without regard to fairness.

The Frame Scheduler seizes one or more CPUs of a multiprocessor, isolates them, and executes a specified set of processes on each CPU in strict rotation. The Frame Scheduler has much lower overhead than the normal IRIX scheduler, and it has features designed for real-time work, including detection of overrun (when a scheduled process does not complete its work in the necessary time) and underrun (when a scheduled process fails to execute in its turn).

At this writing there are no real-time applications that use multiple nodes of an Array system.