Chapter 8. Debugging Multiprocess Programs

This chapter explains multiprocess debugging procedures. The sections that follow cover each topic in turn.

Processes and Threads

dbx supports debugging multiprocess programs, including processes spawned with either the fork(2) or sproc(2) system calls. You can attach child processes automatically to dbx. You also can perform process control operations on a single process or on all processes in a group.

dbx provides commands specifically for seizing, stopping, and debugging currently running processes. When dbx seizes a process, it adds it to a pool of processes available for debugging. Once you select a process from the pool of available processes, you can use all the dbx commands normally available.

Once you are finished with the process, you can terminate it, return it to the pool, or return it to the operating system.

dbx also provides limited support for the IRIX pthreads library. You can obtain information about threads, but cannot specify threads in program-control commands (such as stop).

Setting up Your Environment

When debugging a multiprocess program (one compiled with the -mp option), enter the following command:

(dbx) ignore TERM

This command allows a multiprocessed program to terminate gracefully after execution is complete.

When debugging pthreaded programs, set the following dbx variables as shown below:

(dbx) set $mp_program = 1

(dbx) set $promptonfork = 2

Using the pid Clause

Many dbx commands allow you to append the clause pid pid (where pid is a numeric process ID or a debugger variable holding a process ID). Using the pid pid clause means you can apply a command to any process in the process pool even though it is not the active process.

Example 8-1. Setting breakpoints using pid

To set a breakpoint at line 97 of the process whose ID is 12745, enter:

(dbx) stop at 97 pid 12745
Process 12745: [3] stop at "/usr/demo/test.c":97

Commands that accept the pid pid clause include:

active        edit        resume         wait
addproc       file        return         whatis
assign        func        showproc       when, when[i]
catch         goto        status         where
cont, cont[i] ignore      step, step[i]  whereis
delete        kill        stop, stop[i]  which
delproc       next        suspend
directory     print       trace, trace[i]
down          printf      up
dump          printregs   use


Using the pgrp Clause

Many dbx commands allow the pgrp clause as a way to apply a command to several processes. For information, see “Using the pid Clause” and “Handling sproc System Calls and Process Group Debugging”.

Using the thread Clause

You can append the clause thread tid (where tid is a numeric thread ID, a debugger variable holding a thread ID, or the qualifier all) to some dbx commands that provide program information. The thread clause is accepted where the pid or pgrp clauses are accepted; however, you cannot use the thread tid clause with program-control commands such as stop, trace, when, or continue. With the thread tid clause, you can apply a command to any thread, even if it is not the current thread or is not in the current process. The current thread is defined to be the thread that is running in the current process. Examples of the thread tid clause are:

(dbx) where thread
(dbx) where thread $no  

The outputs of these commands are respectively: a stack trace of the current thread and a stack trace of the thread whose ID is stored in $no.
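
For instance, because all is accepted in place of a thread ID, the following sketch would request a stack trace for every thread in the session (output is omitted because it depends on the program being debugged):

(dbx) where thread all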

The showthread command provides status information about the threads in your program. In one dbx session, you cannot debug more than one program that uses threads.

The syntax of the showthread command is:

showthread [full] [thread] [number | $no | all]

The following list describes these options and arguments:

  • showthread [full]: prints brief status information about the current thread. If the full qualifier is included, prints full status information.

  • showthread [full] [thread] [number | $no | all]: prints brief status information about the thread identified by number or by the value of $no, or about all threads associated with the debug session. If the full qualifier is included, prints full status information. The thread qualifier does not affect the output, but it is allowed so the syntax can be the same as that for other commands that use the thread clause.
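
As a sketch, the forms above might be used in sequence as follows. The thread ID 3 is hypothetical, and $no is simply a debugger variable chosen to hold it; no output is shown because it depends on the program being debugged:

(dbx) showthread
(dbx) showthread full
(dbx) set $no = 3
(dbx) showthread full thread $no
(dbx) showthread all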

Using Scripts

dbx also provides two variables that you can use when writing scripts to debug multiprocess programs:

  • $lastchild: always set to the process ID of the last child process created by a fork or sproc.

  • $pid0: always set to the process ID of the process started by the run command.
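
As an illustration, the following sketch uses $lastchild to take control of a newly created child without typing its process ID. It assumes that the child was not added to the process pool automatically, that it is still running, and that line 97 (borrowed from Example 8-1) is a meaningful place to stop in your source; each command is described elsewhere in this chapter:

(dbx) addproc $lastchild
(dbx) suspend pid $lastchild
(dbx) stop at 97 pid $lastchild
(dbx) resume pid $lastchild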

Listing Available Processes

Use the showproc command to list the available processes:

showproc [all | pid]

The following list describes the options and arguments:

  • showproc (with no arguments): lists the processes already in the dbx process pool, that is, the processes that dbx already controls.

  • showproc all: lists all the processes controlled by dbx and all the processes it could control but that are not yet added to the process pool.

  • showproc pid: shows the status of the process whose ID is pid.

Example 8-2. showproc command

For example, to display all processes in the process pool, enter:

(dbx) showproc
Process 12711 (test) Trace/BPT trap [main:14 ,0x40028c]
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]

To display only process 12712, enter:

(dbx) showproc 12712
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]

To display all processes that dbx can control, enter:

(dbx) showproc all
Process 12711 (test) Trace/BPT trap [main:14 ,0x40028c]
Process 12055 (tcsh)
Process 12006 (clock)
Process 12054 (tcsh)
Process 12673 (zipxgizmo)
Process 12672 (zip)
Process 11974 (4Dwm)
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]
Process 12708 (dbx)
Process 12034 (xlock)


Adding a Process to the Process Pool

The addproc command adds one or more specified processes to the dbx process pool. This allows you to debug a program that is already running.

Example 8-3. addproc command

The following examples show the syntax of the addproc command:

addproc pid [...]

addproc var

For example:

(dbx) addproc 12924
Reading symbolic information of Process 12924 . . .
Process 12924 (loop_test) added to pool
Process 12924 (loop_test) running

Equivalently, you can store the process ID in a dbx variable and pass the variable to addproc:

(dbx) set $foo = 12924
(dbx) addproc $foo


Deleting a Process from the Process Pool

The delproc command removes one or more processes, specified by process ID or by a variable holding a process ID, from the process pool, freeing them from dbx control. When you delete a process from the process pool, dbx automatically returns the process to normal operation.

Example 8-4. delproc command

The following examples show the syntax of the delproc command:

delproc pid [...]

delproc var

For example:

(dbx) delproc 12924
Process 12924 (loop_test) deleted from pool

Equivalently, you can store the process ID in a dbx variable and pass the variable to delproc:

(dbx) set $foo = 12924
(dbx) delproc $foo


Selecting a Process

dbx can control multiple processes. However, dbx commands (by default) apply to only one process at a time, the active process. To make a process in the process pool the active process, use the active command with its process ID, pid. If you do not provide a process ID, dbx prints the currently active process ID.

Example 8-5. active command

For example, to determine which process is currently active, enter:

(dbx) active
Process 12976 (test1) is active

To then select process 12977 as the active process, enter:

(dbx) active 12977
Process 12977 (test1) after fork [.fork.fork:15 +0x8,0x4005e8]


Suspending a Process

The suspend command allows you to stop a process in the dbx process pool; the following list shows the options and arguments for this command:

  • suspend: suspends the active process if it is running. If it is not running, this command does nothing.

  • suspend all: suspends all the processes.

  • suspend pid pid: suspends the process named by pid if it is in the dbx process pool. If it is not running, this command does nothing.

  • suspend pgrp: suspends all the processes in the process group specified by pgrp.

Example 8-6. suspend command

For example, to stop the active process, enter:

(dbx) suspend
Process 12987 (loop_test) requested stop [main:10 +0x8,0x400244]
  10  i = i % 10;

Then to stop process 12988, enter:

(dbx) suspend pid 12988
Process 12988 (test3) requested stop [main:29 +0x4,0x400424]
  29  j = k / 10.0;


Resuming a Suspended Process

To resume execution of a suspended dbx-controlled process, you can use either the cont command or the resume command. If you use cont, you do not return to the dbx command interpreter until the program encounters an event (for example, a breakpoint). On the other hand, the resume command returns immediately to the dbx command interpreter.

The resume command also accepts a signal argument. In that case, it sends the specified signal to the process as the process resumes execution, and then returns immediately to the dbx command interpreter.

Because the resume command returns you to the dbx command interpreter after restarting the process, it is more useful than the cont command when you are debugging multiple processes. With resume, you are free to select and debug a process while another process is running.

If any resumed process modifies the terminal modes (for example, if it uses curses(3X)), dbx cannot correctly control the modes. Intercept programs that use curses by typing dbx -p (or dbx -P).

Example 8-7. resume command

If you are debugging multiple processes and want to resume the active process, enter:

(dbx) resume

dbx restarts the active process and returns the dbx prompt. You can then continue debugging, for example by switching to another process.

To resume all the processes in pgrp 2 and send a SIGINT signal to the process when dbx resumes, enter:

(dbx) resume SIGINT 2


Waiting for a Resumed Process

To wait for a process to stop for an event (such as a breakpoint), use the wait command. This is useful after a resume command. See also the waitall command, described in “Waiting for Any Running Process”.

The syntax of the wait command is:

wait [pid]

wait without arguments waits for the active process to stop for an event. With pid, it waits for the process pid to stop for an event.

Example 8-8. wait command

Assume that you want to wait until process 14280 stops, perhaps at a breakpoint you have set. To do so, enter:

(dbx) wait pid 14280

After you enter this command, dbx waits until process 14280 stops, at which point it displays the dbx prompt.
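
A typical pattern, sketched here with the process IDs from the examples in this chapter as placeholders, is to resume one process, work in another while the first runs, and then wait for the first to stop:

(dbx) resume pid 14280
(dbx) active 14281
(dbx) where
(dbx) wait pid 14280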


Waiting for Any Running Process

To wait for any currently running process to reach a breakpoint or stop for any other reason, use the waitall command. It causes dbx to wait until a running process in the process list stops, at which point it returns you to the dbx command interpreter.


Note: When you return to the dbx command interpreter after a waitall command, dbx does not automatically make the process that stopped the active process. To work with that process, you must select it with the active command.


Example 8-9. waitall command

To wait until one of your processes under dbx control stops, enter:

(dbx) waitall

After you enter this command, dbx waits until a process stops, at which point it indicates which process stopped and displays the dbx prompt. For example:

Process 14281 (loop_test) Terminated [main:10 +0x8,0x400244]
  10  i = i % 10;
(dbx)
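
Because waitall leaves the active process unchanged, a follow-up such as the sketch below can be used to examine the process that stopped. The process ID 14281 is the one reported in the sample output above; substitute the ID that dbx reports in your session:

(dbx) showproc 14281
(dbx) active 14281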


Killing a Process

To kill a process in the process pool while running dbx, use the kill command:

kill [pid]

The kill command without arguments kills the active process. With the pid argument, it kills the specified process.

Example 8-10. kill command

For example, to kill process 14257, enter:

(dbx) kill 14257
Process 14257 (fork_test) terminated
Process 14257 (fork_test) deleted from pool


Handling fork System Calls

When a program executes a fork system call and starts another process, dbx allows you to add that process to the process pool. (See also “Stopping at System Calls” in Chapter 6.)

The dbx $promptonfork variable determines how dbx treats fork system calls. The following list summarizes its effects:

  • 0 (default): dbx does not add the child process to the process pool. Both the child process and the parent process continue to run.

  • 1: dbx stops the parent process and asks if you want to add the child process to the process pool. If you answer yes, then dbx adds the child process to the pool and stops the child process; if you answer no, dbx allows the child process to run and does not place it in the process pool.

  • 2: dbx automatically stops both the parent and child processes and adds the child process to the process pool.

“Handling sproc System Calls and Process Group Debugging” provides additional information on debugging multiprocess programs; some of the material in that section also applies to programs that use the fork system call.

Example 8-11. fork system calls

Consider a program named fork that contains these lines:

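#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
     pid_t pid;
     if ((pid = fork()) == -1)
        perror("fork");        /* fork failed; no child was created */
     else if (pid == 0)
        printf("child\n");     /* fork returns 0 in the child */
     else
        printf("parent\n");    /* fork returns the child's ID in the parent */
     return 0;
}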

If you set $promptonfork to 1 and run the program, dbx prompts you whether it should add the child process to the process pool:

(dbx) set $promptonfork = 1
(dbx) run
Process 22661 (fork) started
Process 22662 (fork) has executed the "fork" system call

Add child to process pool (n if no)?y
Process 22662 (fork) added to pool
Process 22662 (fork) stopped on sysexit fork [_fork:28 ,0x40643a4]
Process 22661 (fork) stopped on sysexit fork [_fork:28 ,0x40643a4]
           Source (of /shamu/lib/libc/libc_64/proc/fork.s) not available for Process 22661
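
By comparison, setting $promptonfork to 2 avoids the prompt. The following sketch shows the commands only, without output; it assumes that the active command accepts a debugger variable such as $lastchild in place of a numeric process ID (if it does not, enter the ID reported by showproc):

(dbx) set $promptonfork = 2
(dbx) run
(dbx) showproc
(dbx) active $lastchild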


Handling exec System Calls

The exec system call executes another program. During an exec, the new program replaces the calling program and takes over its process ID. When a program using DSOs executes an exec call, dbx runs the new program to main. When a program linked with a non-shared library executes an exec call, dbx reads the symbolic information for the new program and then stops program execution. In either case, you can continue by entering a cont or resume command.

Example 8-12. exec system call

Consider the programs exec1.c and exec2.c:

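/* exec1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
   printf("in exec1\n");
   fflush(stdout);   /* flush buffered output before the image is replaced */

   /* Invoke the "exec2" program */
   execl("exec2", "exec2", (char *)0);

   /* We'll only get here if execl() fails */
   perror("execl");
   return 1;
}

/* exec2.c */
#include <stdio.h>

int main(void)
{
   printf("in exec2\n");
   return 0;
}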

You can enter cont to continue executing exec2. For example:

(dbx) cont
in exec2
Process 14409 (exec2) finished


Handling sproc System Calls and Process Group Debugging

The process group facility allows a single dbx command to operate on a group of processes simultaneously. When you are dealing with processes created with the sproc system call, this is more convenient than issuing individual resume, suspend, or breakpoint-setting commands for each process. The facility was created for applications that consist of multiple sproc processes with built-in barriers, such as those created by MP Fortran on IRIX 6.4 (and earlier) systems.

The dbx $mp_program variable determines how dbx treats sproc system calls. The following list summarizes its effects:

  • 0 (default): dbx treats calls to sproc in the same way as it treats calls to fork.

  • 1: child processes created by calls to sproc are allowed to run; they block on multiprocessor synchronization code emitted by MP Fortran or C code. When you set $mp_program to 1, multiprocess Fortran or C code is easier to debug.

Whenever a process executes a sproc, if dbx adds the child to the process pool, dbx also adds the parent and child to the group list. The group list is simply a list of processes. If you set the dbx $groupforktoo variable to 1, then forked processes are added to the group list automatically, just as processes created by sproc are. (By default, $groupforktoo is set to 0.)

You can explicitly add one or more processes to the group list with the addpgrp command (you can add only processes in the process pool to the group list). The syntax of the command is:

addpgrp pid [...]

You can remove processes from the group list with the delpgrp command:

delpgrp pid [...]
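
For example, the following sketch adds two processes to the group list, displays the list, and then removes one of them. The process IDs are placeholders taken from the examples in this chapter, and both processes are assumed to be in the process pool already:

(dbx) addpgrp 14558 14559
(dbx) showpgrp
(dbx) delpgrp 14559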

The showpgrp command displays information about the group list. It shows the process group numbers and all the stop, trace, or when events in each. These events are created by stop[i], trace[i], and when[i] commands that include the pgrp keyword (which create multiple stop, trace, or when events), and they are deleted by delete commands that include the pgrp keyword.

Example 8-13. showpgrp command

The following example shows the output of the showpgrp command with two processes in the group list:

(dbx) showpgrp
2 processes in group:
   14559 14558

Once you add processes to the group list, you can apply certain dbx commands to all processes in the group by adding the keyword pgrp to the end of the command. The commands to which you can append pgrp are: delete, list, next[i], resume, status, stop[i], suspend, trace[i], and when.
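
As a sketch, the following sequence stops every process in the group list, lists the breakpoints set in each of them, and then restarts them all. It assumes that the group list is not empty; output is omitted:

(dbx) suspend pgrp
(dbx) status pgrp
(dbx) resume pgrp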

The breakpoints and traces set by the stop[i], trace[i], and when commands, when used with the pgrp keyword, are also added to the group history. This group history is displayed as a numbered list when you execute showpgrp.

To delete breakpoints from multiple processes with a single command, use the group history number with the delete command. For example, to delete the history entry 7 for the process group, enter:

(dbx) delete 7 pgrp

The dbx $newpgrpevent variable stores the group history number of the most recent pgrp event. This can be useful when writing a script, for example:

set $myevent = $newpgrpevent
....
delete $myevent pgrp

Breakpoints set on the process group are recorded both in the group and in each process. Deleting breakpoints individually (even if set by a group command) is allowed.

For example, the following command sets a breakpoint at line 10 in all processes in the group list:

(dbx) stop at 10 pgrp
Process 14558: [6] stop at "/usr/demo/pgrp_test.c":10
Process 14559: [7] stop at "/usr/demo/pgrp_test.c":10

If you now enter a status command, only those breakpoints associated with the active process are displayed:

(dbx) status
Process 14559: [7] {pgrp 269011340} stop at "/usr/demo/pgrp_test.c":10

By appending the keyword pgrp, you can display the breakpoints for all processes in the group list:

(dbx) status pgrp
Process 14558: [6] {pgrp 269011276} stop at "/usr/demo/pgrp_test.c":10
Process 14559: [7] {pgrp 269011340} stop at "/usr/demo/pgrp_test.c":10

Use the showpgrp command to display the group history:

(dbx) showpgrp
2 processes in group:
   14559 14558
Group history number: 10
        Process 14558 Process 14558: [6] stop at "/usr/demo/pgrp_test.c":10
        Process 14559 Process 14559: [7] stop at "/usr/demo/pgrp_test.c":10

You can delete the breakpoints in both processes by deleting the associated group history entry. For example, enter:

(dbx) delete 10 pgrp
(dbx) showpgrp
2 processes in group:
   14559 14558