Chapter 8. Multiple Process Debugging

This chapter explains procedures for debugging multiprocess programs with dbx.

Processes and Threads

dbx supports debugging multiprocess applications, including processes spawned with either the fork(2) or sproc(2) system calls. You can attach child processes automatically to dbx. You also can perform process control operations on a single process or on all processes in a group.

dbx provides commands specifically for seizing, stopping, and debugging currently running processes. When dbx seizes a process, it adds it to a pool of processes available for debugging. Once you select a process from the pool of available processes, you can use all the dbx commands normally available.

Once you are finished with the process, you can terminate it, return it to the pool, or return it to the operating system.

dbx also provides limited support for the IRIX pthreads library. You can obtain information about threads, but cannot specify threads in program-control commands.

Using the pid Clause

Many dbx commands allow you to append the clause pid pid (where the second pid is a numeric process ID or a debugger variable holding a process ID). With the pid pid clause, you can apply a command to any process in the process pool, even if it is not the active process.

For example, to set a breakpoint at line 97 of the process whose ID is 12745, enter:

(dbx) stop at 97 pid 12745
Process 12745: [3] stop at "/usr/demo/test.c":97

Commands that accept the pid pid clause include:

active        edit        resume         wait
addproc       file        return         whatis
assign        func        showproc       when, when[i]
catch         goto        status         where
cont, cont[i] ignore      step, step[i]  whereis
delete        kill        stop, stop[i]  which
delproc       next        suspend
directory     print       trace, trace[i]
down          printf      up
dump          printregs   use

Using the pgrp Clause

Many dbx commands allow the pgrp clause as a way to apply a command to several processes. For information, see "Handling sproc System Calls and Process Group Debugging".

Using the thread Clause

You can append the clause thread tid (where tid is a numeric thread ID, a debugger variable holding a thread ID, or the qualifier all) to some dbx commands that provide program information. You cannot use the thread tid clause with program-control commands such as stop, trace, when, or continue. With the thread tid clause, you can apply a command to any thread, even if it is not the current thread or is not in the current process. The current thread is the thread that is running in the current process. Examples of the thread tid clause are:

(dbx) where thread
(dbx) where thread $no
(dbx) print x thread all 

The outputs of these commands are respectively: a stack trace of the current thread, a stack trace of the thread whose ID is stored in $no, and the values of all instances of the program variable x in all threads.

The showthread command provides status information about the threads in your program. In one dbx session, you cannot debug more than one program that uses threads.

The syntax of the showthread command is:

showthread [full]
 

Prints brief status information about the current thread. If the full qualifier is included, prints full status information.

showthread [full] [thread] {number | $no | all} 


Prints brief status information about the thread identified by number or the value of $no, or all threads associated with the debug session. If the full qualifier is included, prints full status information. The thread qualifier does not affect the output, but it is allowed so the syntax can be the same as that for other commands that use the thread clause.

Using Scripts

Additionally, dbx provides two variables that you can use when writing scripts to debug multiprocess programs:

$lastchild  

Always set to the process ID of the last child process created by a fork or sproc.

$pid0  

Always set to the process ID of the process started by the run command.

See the dbx online help file section on hint_mp_debug for sample multiprocessing debugging scripts.

Listing Available Processes

Use the showproc command to list the available processes:

showproc 

Shows processes already in the dbx process pool or processes that dbx can control. Without any arguments, dbx lists the processes it already controls.

showproc all 

Lists all the processes that dbx controls as well as all the processes that it could control but that have not yet been added to the process pool.

showproc pid 

Shows the status of the process whose ID is pid.

For example, to display all processes in the process pool, enter:

(dbx) showproc
Process 12711 (test) Trace/BPT trap [main:14 ,0x40028c]
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]

To display only process 12712, enter:

(dbx) showproc 12712
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]

To display all processes that dbx can control, enter:

(dbx) showproc all
Process 12711 (test) Trace/BPT trap [main:14 ,0x40028c]
Process 12055 (tcsh)
Process 12006 (clock)
Process 12054 (tcsh)
Process 12673 (zipxgizmo)
Process 12672 (zip)
Process 11974 (4Dwm)
Process 12712 (test) Trace/BPT trap [main:18 ,0x4002b4]
Process 12708 (dbx)
Process 12034 (xlock)

Adding a Process to the Process Pool

The addproc command adds one or more specified processes to the dbx process pool. This allows you to debug a program that is already running. The syntax of the addproc command is:

addproc pid [ ... ]
addproc var

For example:

(dbx) addproc 12924
Reading symbolic information of Process 12924 . . .
Process 12924 (loop_test) added to pool
Process 12924 (loop_test) running

Equivalently, you can store the process ID in a debugger variable and pass the variable to addproc:

(dbx) set $foo = 12924
(dbx) addproc $foo
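
To experiment with addproc, you need a process that is already running. The following is a minimal sketch of such a target, assuming a POSIX system. It is not the loop_test program from the transcript above, whose source is not shown in this guide; the function name count_loop and its parameters are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical attach target: prints its process ID, then loops so that a
   debugger has time to attach. Returns the final value of i. */
int count_loop(int iterations, unsigned int delay_seconds)
{
    int i = 0;
    int n;

    printf("pid %ld\n", (long)getpid());
    fflush(stdout);               /* make the ID visible before looping */

    for (n = 0; n < iterations; n++) {
        i = (i + 1) % 10;         /* a convenient line for a breakpoint */
        if (delay_seconds > 0)
            sleep(delay_seconds);
    }
    return i;
}
```

Calling count_loop(600, 1) from main keeps the process alive for roughly ten minutes. The process ID that it prints is the value you would pass to addproc.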

Deleting a Process From the Process Pool

The delproc command removes one or more processes from the process pool, freeing them from dbx control. You can specify each process by its process ID or by a debugger variable that holds a process ID. When you delete a process from the process pool, dbx automatically returns the process to normal operation. The syntax of the delproc command is:

delproc pid [ ... ]
delproc var

For example:

(dbx) delproc 12924
Process 12924 (loop_test) deleted from pool

Equivalently, you can store the process ID in a debugger variable and pass the variable to delproc:

(dbx) set $foo = 12924
(dbx) delproc $foo

Selecting a Process

dbx can control multiple processes. By default, however, dbx commands apply to only one process at a time, the active process. To select a process from the process pool to be the active process, use the active command:

active [pid] 

Selects a process, pid, from the dbx process pool as the active process. If you do not provide a process ID, dbx prints the currently active process ID.

For example, to determine which process is currently active, enter:

(dbx) active
Process 12976 (test1) is active

To then select process 12977 as the active process, enter:

(dbx) active 12977
Process 12977 (test1) after fork [.fork.fork:15 +0x8,0x4005e8]

Suspending a Process

The suspend command allows you to stop a process in the dbx process pool:

suspend  

Suspends the active process if it is running. If it is not running, this command does nothing.

suspend all 


Suspends all the processes.

suspend pid pid  


Suspends the process pid if it is in the dbx process pool. If it is not running, this command does nothing.

suspend pgrp  


Suspends all the processes in the process group list.

For example, to stop the active process, enter:

(dbx) suspend
Process 12987 (loop_test) requested stop [main:10 +0x8,0x400244]
  10  i = i % 10;

Then to stop process 12988, enter:

(dbx) suspend pid 12988
Process 12988 (test3) requested stop [main:29 +0x4,0x400424]
  29  j = k / 10.0;

Resuming a Suspended Process

To resume execution of a suspended process that dbx controls, you can use either the cont command or the resume command. If you use cont, you do not return to the dbx command interpreter until the program encounters an event (for example, a breakpoint). The resume command, by contrast, returns immediately to the dbx command interpreter.

The syntax of the resume command is:

resume  

Resumes execution of the program, and returns immediately to the dbx command interpreter.

resume [signal] 

Resumes execution of the process, sending it the specified signal, and returns immediately to the dbx command interpreter.

Because the resume command returns you to the dbx command interpreter after restarting the process, it is more useful than using the cont command when you're debugging multiple processes. With resume, you are free to select and debug a process while another process is running.

If any resumed process modifies the terminal modes (for example, if it uses curses(3X)), dbx cannot correctly control the modes. Intercept programs that use curses by typing dbx -p (or dbx -P).

For example, if you are debugging multiple processes and want to resume the active process, enter:

(dbx) resume

dbx restarts the active process and returns the dbx prompt. You can then continue debugging, for example by switching to another process.

To resume all the processes in pgrp 2 and send a SIGINT signal to the process when dbx resumes, enter:

(dbx) resume SIGINT 2

Waiting for a Resumed Process

To wait for a process to stop for an event (such as a breakpoint), use the wait command. This is useful after a resume command. See also the waitall command, described in "Waiting for Any Running Process".

The syntax of the wait command is:

wait  

Waits for the active process to stop for an event.

wait pid pid  

Waits for the process pid to stop for an event.

For example, assume that you want to wait until process 14280 stops, perhaps at a breakpoint you have set. To do so, enter:

(dbx) wait pid 14280

After you enter this command, dbx waits until process 14280 stops, at which point it displays the dbx prompt.

Waiting for Any Running Process

To wait for any currently running process to reach a breakpoint or stop for any other reason, use the waitall command. It causes dbx to wait until a running process in the process list stops, at which point it returns you to the dbx command interpreter.


Note: When you return to the dbx command interpreter after a waitall command, dbx does not make the stopped process the active process. You must use the active command to change the active process.

For example, to wait until one of your processes under dbx control stops, enter:

(dbx) waitall

After you enter this command, dbx waits until a process stops, at which point it indicates which process stopped and displays the dbx prompt. For example:

Process 14281 (loop_test) Terminated [main:10 +0x8,0x400244]
  10  i = i % 10;
(dbx)

Killing a Process

To kill a process in the process pool while running dbx, use the kill command:

kill  

Kills the active process.

kill pid [ ... ] 

Kills the specified process(es).

For example, to kill process 14257, enter:

(dbx) kill 14257
Process 14257 (fork_test) terminated
Process 14257 (fork_test) deleted from pool

Handling fork System Calls

When a program executes a fork system call and starts another process, dbx allows you to add that process to the process pool. (See also "Stopping at System Calls".)

The dbx variable $promptonfork determines how dbx treats forks. Table 8-1 summarizes its effects.

Table 8-1. How the $promptonfork Variable Affects dbx's Treatment of Forks

$promptonfork Value   Effect on dbx's Treatment of Forks
0 (default)           dbx does not add the child process to the process pool.
                      Both the child process and the parent process continue
                      to run.
1                     dbx stops the parent process and asks if you want to add
                      the child process to the process pool. If you answer
                      yes, dbx adds the child process to the pool and stops
                      the child process; if you answer no, dbx allows the
                      child process to run and does not place it in the
                      process pool.
2                     dbx automatically stops both the parent and child
                      processes and adds the child process to the process
                      pool.



Note: "Handling sproc System Calls and Process Group Debugging" provides additional information on debugging multiprocessing programs; some of the material in that section can apply also to programs that use the fork system call.

Consider a program named fork that contains these lines:

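#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
   pid_t pid;

   if ((pid = fork()) == -1)
      perror("fork");        /* fork failed; no child was created */
   else if (pid == 0)
      printf("child\n");     /* fork returns 0 in the child */
   else
      printf("parent\n");    /* fork returns the child's ID in the parent */
   return 0;
}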

If you set $promptonfork to 1 and run the program, dbx asks whether it should add the child process to the process pool:

(dbx) set $promptonfork = 1
(dbx) run
Process 22661 (fork) started
Process 22662 (fork) has executed the "fork" system call

Add child to process pool (n if no)?y
Process 22662 (fork) added to pool
Process 22662 (fork) stopped on sysexit fork [_fork:28 ,0x40643a4]
Process 22661 (fork) stopped on sysexit fork [_fork:28 ,0x40643a4]
         Source (of /shamu/lib/libc/libc_64/proc/fork.s) not available for Process 22661
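
In the fork program above, the parent does not wait for the child, so either process may finish first. When you examine both processes under $promptonfork, a parent that waits for its child can be easier to follow. The following is a minimal sketch, assuming a POSIX system; fork_and_wait is a hypothetical helper and is not part of dbx or of the example above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with child_status; the parent waits for it.
   Returns the child's exit status, or -1 on error. */
int fork_and_wait(int child_status)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;                 /* fork failed */
    if (pid == 0)
        _exit(child_status);       /* child: exit right away */

    /* Parent: block until the child has exited. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the parent blocks in waitpid, it stays in place while the child is being examined, so the two processes do not race to completion.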

Handling exec System Calls

The exec system call executes another program. During an exec, the first program gives up its process number to the program it executes. When a program using DSOs executes an exec() call, dbx runs the new program to main. When a program linked with a non-shared library executes an exec() call, dbx reads the symbolic information for the new program and then stops program execution. In either case, you can continue by entering a cont or resume command.

For example, consider the programs exec1.c and exec2.c:

/* exec1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
   printf("in exec1\n");

   /* Invoke the "exec2" program */
   execl("exec2", "exec2", (char *)0);

   /* We'll only get here if execl() fails */
   perror("execl");
   return 1;
}

/* exec2.c */
#include <stdio.h>

int main(void)
{
   printf("in exec2\n");
   return 0;
}

You can enter cont to continue executing exec2. For example:

(dbx) cont
in exec2
Process 14409 (exec2) finished
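
The statement that a program gives up its process number to the program it executes can be checked outside the debugger. The following is a rough sketch, assuming a POSIX system with a shell at /bin/sh. The helper name pid_survives_exec is hypothetical: a forked child replaces itself with a shell that prints its own process ID, and the parent compares that ID with the one returned by fork.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the program started by execl() reports the same process ID
   that fork() returned for the child, 0 if it differs, -1 on error. */
int pid_survives_exec(void)
{
    int fds[2];
    char buf[32];
    ssize_t n;
    pid_t child;

    if (pipe(fds) == -1)
        return -1;

    child = fork();
    if (child == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* Child: route stdout into the pipe, then replace this program. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)0);
        _exit(127);               /* reached only if execl() fails */
    }

    /* Parent: read the process ID printed by the new program. */
    close(fds[1]);
    n = read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    return atol(buf) == (long)child;
}
```

The unchanged process ID is consistent with the transcript above, in which dbx keeps reporting the same process after the exec while the program name changes to exec2.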

Handling sproc System Calls and Process Group Debugging

The process group facility allows a group of processes to be operated on simultaneously by a single dbx command. This is more convenient to use when dealing with processes created with the sproc system call than issuing individual resume, suspend, or breakpoint setting commands. This facility was created to deal more conveniently with parallel programs created, for example, by the Power Fortran Accelerator (PFA).

The dbx variable $mp_program determines how dbx treats sproc system calls. Table 8-2 summarizes its effects.

Table 8-2. How the $mp_program Variable Affects dbx's Treatment of sprocs

$mp_program Value     Effect on dbx's Treatment of sproc
0 (default)           dbx treats calls to sproc in the same way as it treats
                      calls to fork.
1                     Child processes created by calls to sproc are allowed to
                      run; they block on multiprocessor synchronization code
                      emitted by mp Fortran or C code. When you set
                      $mp_program to 1, multiprocess Fortran or C code is
                      easier to debug.

Whenever a process executes a sproc, if dbx adds the child to the process pool, dbx also adds the parent and child to the group list. The group list is simply a list of processes. If you set the dbx variable $groupforktoo to 1, then forked processes are added to the group list automatically just as sproced processes are. (By default, $groupforktoo is 0.)

You can explicitly add one or more processes to the group list with the addpgrp command (you can add only processes in the process pool to the group list):

addpgrp pid [ ... ]

You can remove processes from the group list with the delpgrp command:

delpgrp pid [ ... ]

The showpgrp command displays information about the group list. It shows the process group numbers and all the stop, trace, or when events in each. These events are created by stop[i], trace[i], and when[i] commands that end with the pgrp keyword (each of which creates multiple stop, trace, or when events) and are deleted by delete commands that end with the pgrp keyword.

The following example shows the output of the showpgrp command with two processes in the group list:

(dbx) showpgrp
2 processes in group:
 14559 14558

Once you add processes to the group list, you can apply certain dbx commands to all processes in the group by adding the keyword pgrp to the end of the command. The commands to which you can append pgrp are: delete, list, next[i], resume, status, stop[i], suspend, trace[i], and when.

The breakpoints and traces set by the stop[i], trace[i], and when commands, when used with the pgrp keyword, are also added to the group history. This group history is displayed as a numbered list when you execute showpgrp.

To delete breakpoints from multiple processes with a single command, use the group history number with the delete command. For example, to delete the history entry 7 for the process group, enter:

(dbx) delete 7 pgrp

The dbx variable $newpgrpevent stores the group history number of the most recent pgrp event. This can be useful when writing a script, for example:

set $myevent = $newpgrpevent
....
delete $myevent pgrp

Breakpoints set on the process group are recorded both in the group and in each process. Deleting breakpoints individually (even if set by a group command) is allowed.

For example, the following command sets a breakpoint at line 10 in all processes in the group list:

(dbx) stop at 10 pgrp

Process 14558: [6] stop at "/usr/demo/pgrp_test.c":10
Process 14559: [7] stop at "/usr/demo/pgrp_test.c":10

If you now enter a status command, only those breakpoints associated with the active process are displayed:

(dbx) status
Process 14559: [7] {pgrp 269011340} stop at "/usr/demo/pgrp_test.c":10

By appending the keyword pgrp, you can display the breakpoints for all processes in the group list:

(dbx) status pgrp
Process 14558: [6] {pgrp 269011276} stop at "/usr/demo/pgrp_test.c":10
Process 14559: [7] {pgrp 269011340} stop at "/usr/demo/pgrp_test.c":10

Use the showpgrp command to display the group history:

(dbx) showpgrp
2 processes in group:
 14559 14558
Group history number: 10
        Process 14558 Process 14558: [6] stop at "/usr/demo/pgrp_test.c":10
        Process 14559 Process 14559: [7] stop at "/usr/demo/pgrp_test.c":10

You can delete the breakpoints in both processes by deleting the associated group history entry. For example, enter:

(dbx) delete 10 pgrp
(dbx) showpgrp
2 processes in group:
 14559 14558