Chapter 4. Debugging MPI Applications

Debugging MPI applications can be more challenging than debugging sequential applications. This chapter presents methods for debugging MPI applications.

MPI Routine Argument Checking

By default, the SGI MPI implementation does not check the arguments to some performance-critical MPI routines, such as most of the point-to-point and collective communication routines. To force MPI to always check the input arguments to MPI functions, set the MPI_CHECK_ARGS environment variable. Because argument checking might degrade application performance, set this variable only when debugging.
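One way to keep argument checking confined to debugging runs is to set the variable for a single command rather than for the whole login session. The following is a minimal sketch, not part of SGI MPI: the function name run_checked is hypothetical, the value 1 is an assumption (this chapter says only that the variable must be set), and the sketch assumes a Bourne-compatible shell rather than the csh-style prompt used in the examples in this chapter.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGI MPI): run one command with MPI
# argument checking enabled, leaving the calling shell's environment alone.
# Assumption: setting the variable to any value enables checking; the
# value 1 is illustrative only.
run_checked() {
    # env(1) adds the variable to the environment of this one command only.
    env MPI_CHECK_ARGS=1 "$@"
}

# Example invocation (commented out so that sourcing this file runs nothing):
#   run_checked mpirun -np 2 a.out
```

Because the variable is never exported in the calling shell, later production runs started from the same shell are not slowed by argument checking.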

Using the ProDev™ WorkShop Debugger with MPI Programs


Note: The ProDev WorkShop debugger (also known as CVD) is available on IRIX systems only.


Recent versions of the ProDev WorkShop debugger work well with MPI jobs running within a single host. You can use the Debugger to debug MPI applications that make use of MPI-2 spawn functions. To use the Debugger, perform the following steps:

Procedure 4-1. Steps for Using the Debugger

  1. Use the following command to bring up the Debugger:

    % cvd /usr/bin/mpirun

  2. When the Debugger comes up, click on the Admin menu and select Multiprocess View.

  3. When Multiprocess View appears, click on the Config menu, then the Preferences menu.

  4. When the Preferences menu appears, check the first two unchecked boxes and click on OK. To avoid setting these options again the next time you bring up the Debugger, click on the Save button.

  5. In the Debugger command window (bottom of the main window), enter the following commands:

    cvd> set $pendingtraps=true
    cvd> stop pgrp all in MPI_SGI_init

    (You can also use a function in your a.out file.)

  6. In the command window (the top of the main window), enter the mpirun command with arguments, as in the following example:

    /usr/bin/mpirun -np 2 a.out

    Then click on the Run button.

  7. Watch the Multiprocess View window as it forks processes. Eventually, execution stops in MPI_SGI_init (or in the function you specified), and the Debugger focuses on the stopped process. If you compiled with the -g option, the Debugger shows the source.

  8. If you did not compile with the -g option, you can execute a file command to select a certain file and see the source, as in the following example:

    file ep.f

For complete details about using the Debugger, see the ProDev WorkShop: Debugger User's Guide.

Setting Breakpoints

The pgrp attribute on the stop command (see Step 5 above) sets the breakpoint for all processes in the Multiprocess View window (including those that will be spawned as slaves). You can also set a breakpoint by clicking just to the left of a source line, but by default such a breakpoint applies only to that particular process, not to all processes in the Multiprocess View window.

You can change this default by clicking on the Traps menu in the Debugger and selecting both of the unchecked boxes. When Group Trap Default is set, the pgrp attribute is added; when Stop All Default is set, the all attribute is added. The all attribute stops all processes in the Multiprocess View window when any process hits the breakpoint.

Finding Windows

To find various windows, use the Views menu. The Call Stack and Trap Manager windows are particularly useful. You can also type dbx commands in the Debugger command window at the bottom of the main window.

Continuing and Stepping Processes

The buttons in the Multiprocess View window cause all processes to continue or step. Typically, you will want to use these. The buttons in the Debugger main window apply to a single process, unless a button indicates All. To display line numbers, use the Display menu.

Rerunning a Process

If you want to rerun the process, simply click on the Run button. To temporarily turn off breakpoints, use the Traps menu in the Trap Manager window.

Using TotalView with MPI Programs

The syntax for running SGI MPI with Etnus' TotalView is as follows:

% totalview mpirun -a -np 4 a.out

Note that TotalView is not expected to operate with MPI processes started via the MPI_Comm_spawn or MPI_Comm_spawn_multiple functions.

Using dbx and gdb with MPI Programs

Because the dbx and gdb debuggers are designed for sequential applications, they are generally not well suited to debugging and developing MPI programs. However, the MPI_SLAVE_DEBUG_ATTACH environment variable makes these debuggers more usable.

If you set the MPI_SLAVE_DEBUG_ATTACH environment variable to a global rank number, the MPI process sleeps briefly in startup while you use dbx or gdb to attach to the process. A message is printed to the screen, telling you how to use dbx to attach to the process.

Similarly, if you want to debug the MPI daemon, setting MPI_DAEMON_DEBUG_ATTACH causes the daemon to sleep briefly while you attach to it. Both of these environment variables are available on IRIX and Linux.
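The attach workflow for a single rank, as described above for MPI_SLAVE_DEBUG_ATTACH, might be scripted along the following lines. This is a minimal sketch under stated assumptions, not part of SGI MPI: the function name run_attach_rank is hypothetical, the sketch assumes a Bourne-compatible shell, and the gdb command in the closing comment assumes that you find the process ID of the paused rank yourself (for example with ps).

```shell
#!/bin/sh
# Hypothetical helper (not part of SGI MPI): launch a command with
# MPI_SLAVE_DEBUG_ATTACH set to one global rank, so that rank pauses in
# startup long enough for a debugger to attach.
#   $1     global rank to pause (non-negative integer)
#   $2...  the command to run, for example an mpirun command line
run_attach_rank() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_attach_rank RANK COMMAND [ARGS...]" >&2
        return 2
    fi
    attach_rank=$1
    shift
    # Reject anything that is not a plain non-negative integer.
    case $attach_rank in
        ''|*[!0-9]*)
            echo "run_attach_rank: rank must be a non-negative integer" >&2
            return 2
            ;;
    esac
    # env(1) sets the variable for this one command only.
    env MPI_SLAVE_DEBUG_ATTACH="$attach_rank" "$@"
}

# Example session (commented out; <pid> stands for the process ID of the
# paused rank):
#   run_attach_rank 1 mpirun -np 4 a.out &
#   gdb a.out <pid>
```

Starting the job in the background leaves the terminal free for the debugger. Because the process sleeps only briefly, attach promptly after the startup message appears.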