Chapter 6. Setting Up and Running Experiments: ssrun

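This chapter explains how to set up and run performance analysis experiments using the ssrun command. It covers building your executable, setting up output directories and files, using runtime environment variables, running experiments, using calipers, and the effects of ssrun.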

Building Your Executable

The ssrun command is designed to be used with normally built executables and default environment settings. However, there are some cases where you need to change the way you build your executable or set certain environment variables.

This section explains when to change the way you build your executable program. For information on setting environment variables, see "Using Runtime Environment Variables."

  • If you have used the ssrt_caliper_point() function provided in the SpeedShop libraries, you have to explicitly link in the SpeedShop libraries libss.so and libssrt.so. For more information on setting caliper points, see "Using Calipers."

  • If you are planning to build your executable using the -32 option to the cc command, and you want to run the usertime experiment, you must add -lexc to the link line. For more information on cc -32, see the cc reference page.

  • If you have built a stripped executable, you need to rebuild a non-stripped version to use with SpeedShop. For example, if you are using ld to link your C program, do not use the -s option because this strips debugging information from the program object and makes the program unusable for performance analysis.

  • If you have used compiler optimization level 3 and you are performing experiments that report function-level information, the procedure inlining performed at that level can produce extremely misleading profiles: time spent in an inlined procedure shows up in the profile as time spent in the procedure into which it was inlined. It is generally better to use compiler optimization level 2 or lower when gathering an execution profile.

Special Information for MP Fortran Programs

If you are compiling MP Fortran programs, you may encounter anomalies in the displayed data:

  • For all MP Fortran compilations, parallel loops within the program are represented as subroutines with names relating to the source routine in which they are embedded. The naming conventions for these subroutines are different for 32-bit and 64-bit compilations.

    For example, in the linpack example program, most of the time is spent in the routine DAXPY, which can be parallelized.

    • In an n32 or 64-bit MP version, the routine has the name "DAXPY," but most of that work is done in the MP routine named "DAXPY.PREGION1."

    • In a 32-bit version, the DAXPY routine is named "daxpy_," and the MP routine "_daxpy_519_aaab_."

  • If you perform an ideal experiment, the source annotations for 32-bit and 64-bit compilations with the -g option differ and are not correct in most cases.

    • In 64-bit source annotations, the exclusive time is correctly shown for each line, but the inclusive time for the first line of the loop (do statement) includes the time spent in the loop body. This same time appears on the lines comprising the loop's body, in effect representing a double-counting.

    • In 32-bit source annotations, the exclusive time is incorrectly shown for the line comprising the loop's body. The line-level data for the loop-body routine ("_daxpy_519_aaab_") doesn't refer to proper lines. If the program was compiled with the -mp_keep flag, the line-level data should refer to the temporary files that are saved from the compilation, but the temporary files do not contain that information, so no source or disassembly data can be shown. The disassembly data for the main routine does not show the times for the loop-body.

    • If the 32-bit program was compiled without the -mp_keep flag, the line-level data for the loop-body routine is incorrect. Most lines refer to line 0 of the file, and the rest to other lines at seemingly random places in the file. Consequently, spurious annotations will appear on these other lines. Disassembly correctly shows the instructions and their data, but the line numbers are wrong. This reflects essentially the same double-counting problem as seen in 64-bit compilations, but the extra counts go to other places in the file, rather than to the first line of the loop.

Setting Up Output Directories and Files

When you run an experiment, performance data files are written to the current working directory by default. They are named using the following convention:

prog_name.exp_type.id 

The experiment ID, id, consists of one or two letters (designating the process type) and the process ID number. See Table 6-1 for letter codes and descriptions.

Table 6-1. Letter Codes in Experiment ID Numbers

Letter code   Description
m             Master process created by ssrun
p             Process created by a call to sproc()
f             Process created by a call to fork()
s             Process created by a call to system()
e             Process created by a call to exec()
fe            Process created by a call to fork() and exec()

In a single-process application, ssrun generates a single performance data file. In a multi-process application, ssrun generates a performance data file for each process.
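The naming convention can be unpacked mechanically. The following shell sketch splits a data file name into its parts and looks up the letter code from Table 6-1. The helper name describe_expfile is hypothetical and is not part of SpeedShop; the sketch also assumes that the ID always ends in the numeric process ID.

```shell
#!/bin/sh
# describe_expfile: hypothetical helper (not part of SpeedShop) that splits
# a data file name of the form prog_name.exp_type.id into its components.
describe_expfile() {
    file=$1
    id=${file##*.}            # text after the last dot, e.g. m16064
    rest=${file%.*}           # name without the id
    exp_type=${rest##*.}      # e.g. pcsampx
    prog_name=${rest%.*}      # e.g. generic
    pid=${id##*[!0-9]}        # trailing digits: the process ID
    code=${id%"$pid"}         # leading letters: the process type

    case $code in
        m)  origin="master process created by ssrun" ;;
        p)  origin="process created by sproc()" ;;
        f)  origin="process created by fork()" ;;
        s)  origin="process created by system()" ;;
        e)  origin="process created by exec()" ;;
        fe) origin="process created by fork() and exec()" ;;
        *)  origin="unknown process type" ;;
    esac

    echo "$prog_name $exp_type $pid: $origin"
}

describe_expfile generic.pcsampx.m16064
```

Because the program name is taken as everything before the last two dot-separated fields, a program name that itself contains dots (a.out, for example) is still handled.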

You can change the default filename or directory for performance data files using environment variables. See _SPEEDSHOP_OUTPUT_DIRECTORY and _SPEEDSHOP_OUTPUT_FILENAME in Table 6-2 for more information.
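As an illustration, the following sketch redirects the data files to a dedicated directory before running an experiment. It uses Bourne shell syntax (csh users would use setenv instead), the directory name is an arbitrary choice, and the experiment is run only if ssrun is found on the system.

```shell
#!/bin/sh
# Collect performance data in a dedicated directory instead of the
# current working directory. The directory name is an arbitrary choice.
outdir=${TMPDIR:-/tmp}/speedshop_data
mkdir -p "$outdir"

_SPEEDSHOP_OUTPUT_DIRECTORY=$outdir
export _SPEEDSHOP_OUTPUT_DIRECTORY

# Run the experiment only where SpeedShop is installed.
if command -v ssrun >/dev/null 2>&1; then
    ssrun -pcsampx generic
    ls "$outdir"              # data files are expected to appear here
else
    echo "ssrun not found; data would be written to $outdir"
fi
```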

Using Runtime Environment Variables

This section describes the available environment variables, grouped by functionality: user variables, process tracking variables, and expert-mode variables.

User Environment Variables

A number of environment variables are normally used to control the operation of SpeedShop. Table 6-2 lists these variables.

Table 6-2. General Environment Variables

Variable

Description

_SPEEDSHOP_VERBOSE

Causes a log of each program's operation to be written to stderr. If this variable is set to an empty string, only major events are logged; if it is set to a non-empty string, more detailed events are logged.

_SPEEDSHOP_SILENT

Suppresses all SpeedShop output, other than fatal error messages.

If both _SPEEDSHOP_VERBOSE and _SPEEDSHOP_SILENT are set, _SPEEDSHOP_VERBOSE is ignored.

_SPEEDSHOP_CALIPER_POINT_SIG sig_num

Causes the specified signal number to be used for recording a caliper-point in the experiment.

_SPEEDSHOP_REUSE_FILE_DESCRIPTORS

Opens and closes the file descriptors for the output files every time performance data is to be written.

_SPEEDSHOP_HWC_COUNTER_NUMBER

Specifies the counter to be used for prof_hwc experiments. Counters are numbered from 0 to 31 and are described in Chapter 14 of the MIPS R10000 Microprocessor User's Manual. The counters belonging to hardware counter 0 are numbered 0-15, and those belonging to hardware counter 1 are numbered 16-31.

_SPEEDSHOP_HWC_COUNTER_OVERFLOW

Specifies the overflow value for the counter to be used in prof_hwc experiments. The value chosen may be any number greater than 0. Some choices may produce data that is not statistically random, but reflects a correlation between the overflow interval and a cyclic behavior in the application. Users may want to do two or more runs with different overflow values.

_SPEEDSHOP_OUTPUT_NOCOMPRESS

Disables the compression of performance data.

_SPEEDSHOP_OUTPUT_DIRECTORY

Causes the output data files to be placed in the specified directory, rather than the current working directory.

_SPEEDSHOP_OUTPUT_FILENAME

Causes the output file to be saved under the specified name.

If _SPEEDSHOP_OUTPUT_DIRECTORY is also specified, it is prepended to the filename you specify.
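The counter numbering used by _SPEEDSHOP_HWC_COUNTER_NUMBER can be checked with a few lines of shell. In this sketch the helper hwc_physical_counter is hypothetical, and the counter number and overflow value are placeholders chosen only for illustration; consult the processor manual for the meaning of each counter.

```shell
#!/bin/sh
# hwc_physical_counter: hypothetical helper that reports which hardware
# counter (0 or 1) a SpeedShop counter number in the range 0-31 belongs to.
hwc_physical_counter() {
    n=$1
    if [ "$n" -ge 0 ] && [ "$n" -le 15 ]; then
        echo 0
    elif [ "$n" -ge 16 ] && [ "$n" -le 31 ]; then
        echo 1
    else
        echo "counter number must be between 0 and 31" >&2
        return 1
    fi
}

# Select a counter and an overflow value for a prof_hwc experiment.
# Both numbers are placeholders chosen for illustration.
_SPEEDSHOP_HWC_COUNTER_NUMBER=21
_SPEEDSHOP_HWC_COUNTER_OVERFLOW=1009
export _SPEEDSHOP_HWC_COUNTER_NUMBER _SPEEDSHOP_HWC_COUNTER_OVERFLOW

echo "counter $_SPEEDSHOP_HWC_COUNTER_NUMBER is on hardware counter $(hwc_physical_counter "$_SPEEDSHOP_HWC_COUNTER_NUMBER")"
```

Repeating the run with a different overflow value, as the table suggests, is one way to check that the profile is not an artifact of the sampling interval.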


Process Tracking Environment Variables

A number of environment variables may be used for controlling the treatment of processes spawned from the original target. Table 6-3 lists these variables.

Table 6-3. Process Tracking Environment Variables

Variable

Description

_SPEEDSHOP_TRACE_FORK [True|False]

If True, specifies that processes spawned by calls to fork() will be monitored if they don't call exec(). If they do call exec(), and _SPEEDSHOP_TRACE_FORK_TO_EXEC is not set to True, the data covering the time between the fork() and exec() will be discarded. It is True by default.

Note: In the current release, data are recorded independent of whether the process calls exec() or not.

_SPEEDSHOP_TRACE_FORK_TO_EXEC [True|False]

If True, specifies that processes spawned by calls to fork() will be monitored even if they also call exec(). It is False by default.

_SPEEDSHOP_TRACE_EXEC [True|False]

If True, specifies that processes spawned by calls to any of the various flavors of exec() will be monitored. It is True by default.

_SPEEDSHOP_TRACE_SPROC [True|False]

If True, specifies that a process spawned by calls to sproc() will be monitored. It is True by default.

_SPEEDSHOP_TRACE_SYSTEM [True|False]

If True, specifies that system() calls will be monitored. It is False by default.


Expert-Mode Environment Variables

A number of variables may be used for debugging and finer control of the operation of SpeedShop. Table 6-4 lists these variables.

Table 6-4. Expert-Mode Environment Variables

Variable

Description

_SPEEDSHOP_SAMPLING_MODE

For PC-sampling and hardware-counter profiling. If set to 1, generates data for the base executable only. If not set, or set to a value different from 1, data is generated for the executable and all DSOs it uses.

_SPEEDSHOP_INIT_DEFERRED_SIG sig_num

If specified, initialization of the experiment is not performed when the target process starts, but will be delayed until the specified signal is sent to the process. A handler for the given signal is installed when the process starts. It is the user's responsibility to ensure that it is not overridden by the target code.

_SPEEDSHOP_SHUTDOWN_SIG sig_num

If specified, termination of the experiment will not be performed when the target process exits, but rather will happen when the specified signal is sent to the process. A handler for the given signal will be installed when the process starts, and it is the user's responsibility to ensure that it is not overridden by the target code.

_SPEEDSHOP_EXPERIMENT_TYPE

Passes the name of the experiment to the runtime. It is normally set by ssrun, but may be overwritten.

_SPEEDSHOP_MARCHING_ORDERS

Passes the marching orders of the experiment to the runtime. It is normally set by ssrun from the experiment type, but may be overwritten.

_SPEEDSHOP_SBRK_BUFFER_LENGTH

Defines the maximum size of the internal malloc arena used. This arena is completely separate from the user's arena, and has a default size of 0x100000.

_SPEEDSHOP_FILE_BUFFER_LENGTH

Defines the size of the buffer used for writing the experiment files. The default length is 8 KB. The buffer is used only for writing small records to the file; large records are written directly to avoid the buffering overhead.

_SPEEDSHOP_DEBUG_NO_SIG_TRAPS

Disables the normal setting of signal handlers for all fatal and exit signals.

_SPEEDSHOP_DEBUG_NO_STACK_UNWIND

Suppresses the stack unwind as done in usertime experiments, and as is done at caliper-samples for all experiments. The option is used as a workaround for various unwind bugs in libexc.


Running Experiments

This section describes how to use ssrun to perform experiments. For information on using pixie directly, see Chapter 8, "Using SpeedShop in Expert Mode: pixie."

ssrun Syntax

ssrun flags -exp_type prog_name prog_args 

flags  

Zero or more of the flags described in Table 6-5 that control the data collection and the treatment of descendent processes or programs, and how the data is to be externalized.

-exp_type  

The experiment type. Experiments are described in detail in Chapter 4, "Experiment Types."

prog_name  

The name of the program on which you want to run an experiment.

prog_args  

Arguments to your program, if any.

ssrun generates a performance data file that is named as described in the section "Setting Up Output Directories and Files."
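The syntax lends itself to a small wrapper when you want to run the same program under several experiment types. The following is a minimal sketch: run_experiments and the DRY_RUN switch are hypothetical names, and with DRY_RUN left at its default of 1 the commands are only printed, so the sketch can be tried on a system without SpeedShop.

```shell
#!/bin/sh
# run_experiments: hypothetical wrapper that runs one program under
# several experiment types, following the ssrun syntax above.
# With DRY_RUN=1 (the default here) it only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run_experiments() {
    prog=$1
    shift
    for exp_type in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "ssrun -$exp_type $prog"
        else
            ssrun "-$exp_type" "$prog"
        fi
    done
}

run_experiments generic pcsampx usertime ideal
```

Each experiment produces its own data file, distinguished by the exp_type field of the file name.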

Table 6-5. Flags for ssrun

Name

Result

-hang

Specifies that the process should be left waiting just before executing its first instruction. This allows you to attach the process to a debugger.

-mo marching_orders

Allows you to specify marching orders. If this option is used, the environment variable _SPEEDSHOP_MARCHING_ORDERS is not examined.

-name target_name

Specifies that the target should be run with argv[0] set to target_name.

-purify

Can be used only when the Purify® product is installed. Specifies that purify should be run on the target, and then runs the resulting "purified" executable. Note that -purify and SpeedShop performance experiments cannot be combined.

-v

Prints a log of the operation of ssrun to stderr. The same behavior occurs if the environment variable _SPEEDSHOP_VERBOSE is set to an empty string.

-V

Prints a detailed log of the operation of ssrun to stderr. The same behavior occurs if the environment variable _SPEEDSHOP_VERBOSE is set to a non-zero-length string. This option can be used to see how to set the various environment variables, and how to invoke instrumentation when necessary.
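The difference between an unset variable, an empty string, and a non-empty string is easy to get wrong in the shell. The following sketch encodes the rules stated in Tables 6-2 and 6-5; log_level is a hypothetical helper, and it assumes that _SPEEDSHOP_SILENT takes effect whenever it is set, whatever its value.

```shell
#!/bin/sh
# log_level: hypothetical helper that reports which logging level the
# current environment selects, following the rules in Tables 6-2 and 6-5.
log_level() {
    if [ "${_SPEEDSHOP_SILENT+set}" = set ]; then
        echo silent                 # _SPEEDSHOP_VERBOSE is ignored
    elif [ "${_SPEEDSHOP_VERBOSE+set}" != set ]; then
        echo default                # neither variable is set
    elif [ -z "$_SPEEDSHOP_VERBOSE" ]; then
        echo major-events           # same as ssrun -v
    else
        echo detailed               # same as ssrun -V
    fi
}

unset _SPEEDSHOP_SILENT _SPEEDSHOP_VERBOSE
_SPEEDSHOP_VERBOSE=""; export _SPEEDSHOP_VERBOSE
log_level                           # prints: major-events
_SPEEDSHOP_VERBOSE=1
log_level                           # prints: detailed
```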


ssrun Examples

This section provides examples of using ssrun with options and experiment types. For additional examples, see Chapter 2, "Tutorial for C Users," or Chapter 3, "Tutorial for Fortran Users."

Example Using the pcsampx Experiment

The pcsampx experiment collects data to estimate the actual CPU time for each source code line, machine instruction, and function in your program. The optional x suffix causes a 32-bit bin size to be used, allowing a larger number of counts to be recorded. For a more detailed description of the pcsamp experiment, see the "pcsamp Experiment" section in Chapter 4, "Experiment Types."

This example performs a pcsampx experiment on the generic executable:

ssrun -pcsampx generic

To see the performance data that has been generated, run prof on the performance data file, generic.pcsampx.m16064:

prof generic.pcsampx.m16064

The report is printed to stdout. (The layout of this report has been altered slightly to accommodate presentation needs.) For more information on prof and the reports generated by prof, see Chapter 7, "Analyzing Experiment Results: prof."

-------------------------------------------------------------------------------
Profile listing generated Thu May 23 10:30:40 1996
    with:       prof generic.pcsampx.m16064 
-------------------------------------------------------------------------------
samples   time    CPU    FPU   Clock   N-cpu  S-interval Countsize
   2058    21s  R4000  R4010 150.0MHz   1     10.0ms     4(bytes)
 
Each sample covers 4 bytes for every 10.0ms ( 0.05% of 20.5800s)
-------------------------------------------------------------------------------
  -p[rocedures] using pc-sampling.
  Sorted in descending order by the number of samples in each procedure.
  Unexecuted procedures are excluded.
-------------------------------------------------------------------------------
samples   time(%)      cum time(%)      procedure (dso:file)
 
1926      19s( 93.6)   19s( 93.6)         anneal (generic:/usr/demos/
          SpeedShop/generic/generic.c)
 111       1.1s(  5.4)   20s( 99.0)   slaveusrtime (/usr/demos/SpeedShop/
          generic/dlslave.so:/usr/demos/SpeedShop/generic/dlslave.c)
  15      0.15s(  0.7)   21s( 99.7)          _read (/usr/lib32/libc.so.1:
          /work/irix/lib/libc/libc_n32_M3/sys/read.s)
   2      0.02s(  0.1)   21s( 99.8)         memcpy (/usr/lib32/libc.so.1:
          /work/irix/lib/libc/libc_n32_M3/strings/bcopy.s)
   1      0.01s(  0.0)   21s( 99.9)         _xstat (/usr/lib32/libc.so.1:
          /work/irix/lib/libc/libc_n32_M3/sys/xstat.s)
   1      0.01s(  0.0)   21s( 99.9)        _ltzset (/usr/lib32/libc.so.1:
          /work/irix/lib/libc/libc_n32_M3/gen/time_comm.c)
   1      0.01s(  0.0)   21s(100.0)         __sinf (/usr/lib32/libm.so:
          /work/cmplrs/libm/fsin.c)
   1      0.01s(  0.0)   21s(100.0)         _write (/usr/lib32/libc.so.1:
          /work/irix/lib/libc/libc_n32_M3/sys/write.s)
 
2058        21s(100.0)   21s(100.0)          TOTAL
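The times in this report follow directly from the sample counts: each sample stands for one sampling interval (10.0 ms here), and each percentage is a procedure's share of the 2058 total samples. The following sketch reproduces the arithmetic for the total and for anneal; the helper names are hypothetical.

```shell
#!/bin/sh
# sample_seconds: hypothetical helper that converts a pcsamp sample count
# into seconds, given the sampling interval in milliseconds.
sample_seconds() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", n * ms / 1000 }'
}

# sample_percent: hypothetical helper that gives a sample count as a
# percentage of the total number of samples.
sample_percent() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * n / total }'
}

sample_seconds 2058 10.0     # total run: 20.58
sample_seconds 1926 10.0     # anneal:    19.26
sample_percent 1926 2058     # anneal:    93.6
```

The report rounds the times for display, which is why anneal appears as 19s and the total as 21s.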

Example Using the -v Option

To get information about how a SpeedShop experiment is set up and performed, you can supply the -v option to ssrun.

This example performs a pcsampx experiment on the generic executable:

ssrun -v -pcsampx generic

The ssrun command writes the following output to stderr. It displays information as the command line is parsed and shows the environment variables that ssrun sets.

fraser 75% ssrun -v -pcsampx generic

ssrun: setenv _SPEEDSHOP_MARCHING_ORDERS pc,4,10000,0:cu
ssrun: setenv _SPEEDSHOP_EXPERIMENT_TYPE pcsampx
ssrun: setenv _SPEEDSHOP_TARGET_FILE generic
ssrun: setenv _RLD_LIST libss.so:libssrt.so:DEFAULT
...

Using ssrun With a Debugger

To use the ssrun command in conjunction with a debugger such as dbx or the ProDev WorkShop debugger, you need to call ssrun with the -hang option and the name of your program.

Follow these steps to run the FPE trace experiment on generic, and then run generic in a debugger.

  1. Call ssrun as follows:

    ssrun -hang -fpe generic

    ssrun parses the command line, sets up the environment for the experiment, calls the target process using exec, and hangs the target process on exiting from the call to exec.

  2. Get the process ID of the call to ssrun using a command such as ps.

  3. Start your debugging session.

  4. Attach the process to the debugger.

  5. Run the process from the debugger.

You can also invoke ssrun from within a debugger. In this case, ssrun leaves the target hung on exiting the call to exec, and informs the debugger of that fact.

You can also use either dbx or the WorkShop debugger to set calipers to record performance data for a part of your program. See "Using Calipers" for more information on setting calipers.

Running Experiments on MPI Programs

The Message Passing Interface (MPI) is a library specification for message-passing, proposed as a standard by a committee of vendors, implementors, and users. It allows processes to communicate by "mailing" data "messages" to other processes, even those running on distant computers.

If your program uses MPI, you need to set up SpeedShop experiments a little differently. There are two ways to accomplish this. The first method takes two steps:

  1. Set up a shell script that contains the call to ssrun and the experiment you want to run.

    For example, if you have a program called testit, and you want to run the pcsampx experiment, a script, named exp_script, might look like the following:

    #!/bin/sh
    ssrun -pcsampx testit

  2. Call mpirun with the script name using one of the following:

    mpirun -np 6 exp_script
    mpirun host1 2, host2 2 exp_script
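The two steps can be combined in a short setup script, sketched below. Making exp_script executable and invoking it as ./exp_script are assumptions about your environment (they are needed when the current directory is not on your search path); the experiment is launched only if both mpirun and ssrun are found.

```shell
#!/bin/sh
# Step 1: write the wrapper script and make it executable.
cat > exp_script <<'EOF'
#!/bin/sh
ssrun -pcsampx testit
EOF
chmod +x exp_script

# Step 2: launch it under mpirun where MPI and SpeedShop are installed.
if command -v mpirun >/dev/null 2>&1 && command -v ssrun >/dev/null 2>&1; then
    mpirun -np 6 ./exp_script
else
    echo "mpirun or ssrun not found; exp_script was created but not run"
fi
```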

The second method is to use one of the following:

mpirun -np 6 ssrun -pcsampx testit
mpirun host1 2, host2 2 ssrun -pcsampx testit

The master experiment file created on each MPI host might not contain performance data from the application (depending on the MPI version), but rather from a master program that spawns the actual MPI application slaves. You can choose to exclude that file from performance analysis.

When using ssrun -ideal, or ssrun -purify, you should take care that the code for each separate host executes out of a different physical directory, not out of the same NFS mounted directory. During process creation, instrumentation is performed, and since different hosts may have different versions of the same named library (libc.so.1, for example), conflicts may occur. You may also need to use the -d option with mpirun to specify the directory on each host.

Running Experiments on Programs Using Pthreads

Pthreads are the threads defined by the POSIX® operating system standard (IEEE1003.1c-1995). This standard contains a set of interfaces and semantics for creating and managing threads within the POSIX operating system definition. The basic Silicon Graphics pthreads implementation consists of a library (one for each o32, n32 and n64 ABI) and a header file.

Applications using pthreads are specifically identified by SpeedShop. Performance data collection is done on a per-program basis, rather than on a per-pthread basis. Under IRIX 6.2, 6.3 and 6.4, SpeedShop creates as many experiment files as the number of sprocs used by the pthreads library to create and manage the pthreads. In addition, cm_usage data is not supported, and SIGTERM is reserved to be used to terminate the application normally. You should analyze all the experiment files together via prof to get a valid profile for the code. Under IRIX 6.5, SpeedShop creates only one experiment file. For usertime and fpe experiments, however, you can specify the -pthreads option with prof to get per-pthread performance reports.

Using Calipers

In some cases, you may want to generate performance data reports for only a part of your program. You can do this by setting caliper points to identify the area or areas for which you want to see performance data. When you run prof, you can specify a region for which to generate a report by supplying the -calipers option and the appropriate caliper numbers. For more information on prof -calipers, see "Using the -calipers Option" in Chapter 7, "Analyzing Experiment Results: prof."

Table 6-6 shows how you can set caliper points in three different ways.

Table 6-6. Setting Caliper Points

Use This Approach...

For These Benefits...

Explicitly link with the SpeedShop runtime and call ssrt_caliper_point to record a caliper sample.

Allows you to set a caliper point at a specific location in a file.

Define a signal to be used to record a caliper sample by specifying a signal as a value to the environment variable _SPEEDSHOP_CALIPER_POINT_SIG and then sending the target the given signal.

Useful if you want to be able to set a caliper point as your program is running.

Set a caliper sample trap in dbx or the WorkShop debugger. Setting a trap involves setting a breakpoint and evaluating the expression libss_caliper_point(1) when the process stops.

Useful if you're working with a debugger in conjunction with SpeedShop.

An implicit caliper point is always present at the start of execution of the process. A final caliper-point is recorded when the process calls _exit. The implicit caliper point at the beginning of the program is numbered 0, the first caliper point recorded is numbered 1, and any additional caliper points are numbered sequentially.

In addition, caliper points are automatically recorded under the following circumstances to ensure that at least one valid set of data is recorded.

  • When a fatal signal is received, such as SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGEMT, SIGFPE, SIGBUS, SIGSEGV, SIGSYS, SIGXCPU or SIGXFSZ. Note that this list does not and cannot include SIGKILL.

  • When the program calls an exec function such as execve() or execvp().

  • When a program closes a DSO by calling dlclose().

  • When an exit signal is received, such as SIGHUP, SIGINT, SIGPIPE, SIGALRM, SIGTERM, SIGUSR1, SIGUSR2, SIGPOLL, SIGIO, SIGRTMIN or SIGRTMAX.

Setting Calipers With ssrt_caliper_point

To set calipers with ssrt_caliper_point, follow these steps:

  1. Insert calls to ssrt_caliper_point() in your source code. Call the function with the argument 1 (True).

    ...
    ssrt_caliper_point(1);
    ...

    You can insert one or more calls at any point in your code.

  2. Link the SpeedShop library libss.so into your application.

    The library should be placed last on the link line.

  3. Run your program with ssrun and the desired experiment type.

    For example, if you want to run the ideal experiment on generic:

    ssrun -ideal generic

    The caliper points you have set in the source file are recorded in the performance data file that is generated by ssrun.

Setting Calipers With Signals

To set calipers with signals, follow these steps:

  1. Set the _SPEEDSHOP_CALIPER_POINT_SIG variable to the signal number you want to use.

    Choose a signal that doesn't terminate the program. The signal should also not be caught by the target program, because this would interfere with its use for triggering a caliper point.

    The following signals are good choices because they don't have any semantics already associated with them:

    SIGUSR1 16      /* user defined signal 1 */
    SIGUSR2 17      /* user defined signal 2 */

  2. Run ssrun with your program.

  3. Enter a command such as ps or top to determine the process ID of ssrun. This is also the process ID of the program you are working on.

  4. Send the signal you used in step 1 to the process using the kill command:

    kill -sig_num pid

    A caliper point is set at the point in the program where the signal was received by the SpeedShop runtime.
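The steps above can be sketched as a single script. Here record_caliper is a hypothetical helper, the five-second delay is an arbitrary placeholder for "wait until the program reaches the region of interest," and signal 16 is SIGUSR1 only under the numbering listed in step 1; signal numbers differ between systems, so check yours before reusing the value.

```shell
#!/bin/sh
# Step 1: choose the caliper signal. 16 is SIGUSR1 in the list above;
# signal numbers differ between systems, so check before reusing it.
_SPEEDSHOP_CALIPER_POINT_SIG=16
export _SPEEDSHOP_CALIPER_POINT_SIG

# record_caliper: hypothetical helper that sends the caliper signal to
# the process whose ID is given as the first argument (step 4).
record_caliper() {
    kill -"$_SPEEDSHOP_CALIPER_POINT_SIG" "$1"
}

# Steps 2 and 3, where SpeedShop is installed: start the experiment in
# the background; $! is the process ID of ssrun and so of the target.
if command -v ssrun >/dev/null 2>&1; then
    ssrun -pcsampx generic &
    target_pid=$!
    sleep 5                       # arbitrary delay before the caliper point
    record_caliper "$target_pid"
    wait "$target_pid"
fi
```

Starting ssrun in the background from the same shell avoids the ps or top lookup, because the shell already knows the process ID.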

Setting Calipers With a Debugger

From either dbx or the WorkShop debugger, you can set a caliper point anywhere it is possible to set a breakpoint: function entry or exit, line numbers, execution addresses, watchpoints, pollpoints (timer-based). You can also attach conditions and/or cycle counts.

  1. Set a breakpoint in your program at the point at which you want to set a caliper point.

  2. When the process stops, evaluate the expression libss_caliper_point(1).

    The evaluation of the expression always returns zero, but a side effect of the evaluation is the recording of the appropriate data.

  3. Resume execution of the process.

Effects of ssrun

When you call ssrun, the system performs the following operations for all experiments:

  • Sets various environment variables like _SPEEDSHOP_MARCHING_ORDERS and _SPEEDSHOP_EXPERIMENT_TYPE.

    For more information on these variables, see "Using Runtime Environment Variables."

  • Inserts the SpeedShop libraries libss.so and libssrt.so as part of your executable using the environment variable _RLD_LIST.

  • Invokes the target process by calling exec().

  • Writes the appropriate experiment data to the output file, using the SpeedShop runtime library.

Effects of ssrun -ideal

When you run an ideal experiment, the following additional operations occur:

  • libpixrt.so is inserted first in the executable's library list.

  • libssrt.so and libss.so are inserted in the executable's library list.

  • ssrun generates pixified versions of all the libraries that the program uses, as well as the executable.

    The generated pixified versions have an extension that depends on the ABI:

    • .pixie for the executable

    • .pix32 for all 32-bit libraries

    • .pixn32 for all n32 libraries

    • .pix64 for all 64-bit libraries

    The generated files are written to the current working directory, and include code that allows performance data to be collected for each function and basic block.

    For more information on the ideal experiment, see the "ideal Experiment" section in Chapter 4, "Experiment Types."
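To anticipate which files an ideal experiment leaves in the current working directory, the extension rules above can be written as a small lookup. This is a sketch only: pixified_name is a hypothetical helper, and it assumes that the extension is appended to the base name of the original file.

```shell
#!/bin/sh
# pixified_name: hypothetical helper that predicts the name of the
# pixified copy written by ssrun -ideal, given a file and its kind:
# exe (the executable) or the ABI of a library (32, n32 or 64).
# Assumes the extension is appended to the file's base name.
pixified_name() {
    file=$(basename "$1")
    case $2 in
        exe) echo "$file.pixie" ;;
        32)  echo "$file.pix32" ;;
        n32) echo "$file.pixn32" ;;
        64)  echo "$file.pix64" ;;
        *)   echo "unknown kind: $2" >&2; return 1 ;;
    esac
}

pixified_name generic exe                  # generic.pixie
pixified_name /usr/lib32/libc.so.1 n32     # libc.so.1.pixn32
```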