Chapter 2. Tutorial for C Users

This chapter provides a tutorial that shows you how to gather and analyze performance data in a C program using the SpeedShop tools. It explains how to set up the sample program and then how to run, report on, and interpret a series of experiments.


Note: Because of inherent differences between systems and because of concurrent processes that may be running on your system, your experiment will produce different results from the one in this tutorial. However, the basic structure of the results should be the same.

The C tutorial demonstrates the usertime, pcsamp, dsc_hwc, bbcounts, and fpe experiments.

For an MPI tracing tutorial and an additional hardware counter experiment, see Chapter 3, “Tutorial for Fortran Users”.

Tutorial Overview

This tutorial uses a sample program called generic. There are three versions of the program:

  • generic directory: contains files for the new 32-bit (n32) ABI

  • generico32 directory: contains files for the (old) 32-bit ABI

  • generic64 directory: contains files for the 64-bit ABI

When you work with the tutorial, choose the version of generic most appropriate for your system. A good guideline is to choose the version that corresponds to the way you expect to develop your programs.

This tutorial was written and tested using the version of generic in the generic directory.

Contents of the generic Program

The generic program was designed as a test and demonstration application. It contains code to run scenarios that each test a different area of SpeedShop. The version of generic used in this tutorial performs scenarios that:

  • Build a linked list of structures

  • Use a lot of user time (a hypothetical sketch of this kind of CPU-bound code follows this list)

  • Scan a directory and run the stat command on each file

  • Perform file I/O

  • Generate a number of floating-point exceptions

  • Load and call a routine in a DSO
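
The following is a minimal, hypothetical C sketch (not the actual generic.c source) of the kind of CPU-bound routine the user-time scenario represents. Code like this spends nearly all of its time computing in user mode, which is why a single routine such as anneal dominates the usertime, pcsamp, and bbcounts profiles shown later in this chapter.

    /* cpu_burn.c -- hypothetical example; not part of the generic.c source */
    #include <stdio.h>

    /* Spend time in a pure user-mode floating-point loop. */
    static double
    burn_cpu(long iterations)
    {
        double x = 0.0;
        long   i;

        for (i = 0; i < iterations; i++)
            x += (double)i * 0.5;
        return x;
    }

    int
    main(void)
    {
        printf("result = %f\n", burn_cpu(50000000L));
        return 0;
    }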

Output from the generic Program

Output from the program looks like the following:

0:00:00.000 ======== (27173)             Begin script Fri  06 Feb 2002
15:03:31.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.002 ======== (27173)        start of linklist Fri  06 Feb 2002
15:03:31.
        linklist completed.
0:00:00.003 ======== (27173)         start of usrtime Fri  06 Feb 2002
15:03:31.
        usertime completed.
0:00:25.572 ======== (27173)         start of cvttrap Fri  06 Feb 2002
15:03:57.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:25.806 ======== (27173)         start of dirstat Fri  06 Feb 2002
15:03:57.
        dirstat of /usr/include completed, 304 files.
0:00:26.618 ======== (27173) start of iofile -- stdio Fri  06 Feb 2002
15:03:58.
        stdio iofile on /unix completed, 7307988 chars.
0:00:26.864 ======== (27173)        start of fpetraps Fri  06 Feb 2002
15:03:58.
        fpetraps completed.
0:00:26.865 ======== (27173)          start of libdso Fri  06 Feb 2002
15:03:58.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:27.972 ======== (27173)               End script Fri  06 Feb 2002
15:03:59.

Tutorial Setup

Copy the program to a directory where you have write permission and compile it so that you can use it in the tutorial.

  1. Change to the /usr/demos/SpeedShop directory.

  2. Copy the appropriate generic directory and its contents to a directory where you have write permission:

    % cp -r generic your_dir

  3. Change to the directory you just created:

    % cd your_dir/generic

  4. Compile the program by entering:

    % make all

    This produces the executable used in the experiments. (The complete command sequence is summarized after these steps.)
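
Assuming your_dir stands for a directory of your own in which you have write permission, the complete setup sequence looks like the following csh session:

% cd /usr/demos/SpeedShop
% cp -r generic your_dir
% cd your_dir/generic
% make all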

Analyzing Performance Data

This section explains how to run the following experiments on the generic program, generate the experiment's results, and interpret the results:

  • usertime. As a first cut at optimization, this may be the most useful experiment. It breaks down a program into its functions and returns the CPU time used in each. See “A usertime Experiment”.

  • pcsamp. This experiment estimates the actual CPU time spent in each function by periodically sampling the program counter. See “A pcsamp Experiment”.

  • dsc_hwc. This experiment counts the number of times a required data item was not found in the secondary data cache. When a data item is not in the secondary data cache, it must be fetched from main memory, which requires more time. See “A Hardware Counter Experiment”.

  • bbcounts. This experiment counts how many times each basic block is executed and uses those counts to compute an ideal execution time. It also maps out a complete call graph. See “A Basic Block Experiment”.

  • fpe. This experiment counts the number of floating-point exceptions in each function. See “An fpe Trace”.

You can follow the tutorial from start to finish, or you can choose the experiment you want to perform.

A usertime Experiment

This section explains how to perform a usertime experiment. The usertime experiment allows you to gather data on the amount of CPU time spent in each function in your program.


Note: Due to statistical sampling of the call stack, not all functions may appear in the experiment output.


For more information on usertime, see “Call Stack Profiling Experiment (usertime/totaltime)” in Chapter 4.

Performing a usertime Experiment

From the command line, enter the following:

% ssrun -usertime generic

This command starts the experiment. Output from generic and from ssrun is printed to stdout, as shown in the following example. A data file is also generated. The name consists of the process name (generic), the experiment type (usertime), and the experiment ID. In this example, the file name is generic.usertime.m16957.

0:00:00.000 ======== (16957)             Begin script Mon  18 Mar 2002
06:56:38.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.004 ======== (16957)        start of linklist Mon  18 Mar 2002
06:56:38.
        linklist completed.
0:00:00.005 ======== (16957)         start of usrtime Mon  18 Mar 2002
06:56:38.
        usertime completed.
0:00:18.736 ======== (16957)         start of cvttrap Mon  18 Mar 2002
06:56:57.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:18.906 ======== (16957)         start of dirstat Mon  18 Mar 2002
06:56:57.
        dirstat of /usr/include completed, 264 files.
0:00:18.941 ======== (16957) start of iofile -- stdio Mon  18 Mar 2002
06:56:57.
        stdio iofile on /unix completed, 7965088 chars.
0:00:20.426 ======== (16957)        start of fpetraps Mon  18 Mar 2002
06:56:59.
        fpetraps completed.
0:00:20.428 ======== (16957)          start of libdso Mon  18 Mar 2002
06:56:59.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:21.217 ======== (16957)               End script Mon  18 Mar 2002
06:56:59.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > usertime.results

In this example, your_output_file_name would be generic.usertime.m16957. The prof command writes its report to stdout; in the command above, the output is redirected to the file usertime.results.


Note: Because of line width restrictions, the DSO, file name, and line number information at the end of each line is wrapped to the next line in the following sample output.


-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Mar 18 07:00:30 2002
   prof generic.usertime.m16957
                 generic (n32): Target program
                      usertime: Experiment name
                         ut:cu: Marching orders
                 R5000 / R5000: CPU / FPU
                             1: Number of CPUs
                           180: Clock frequency (MHz.)
  Experiment notes--
	  From file generic.usertime.m16957:
	Caliper point 0 at target begin, PID 16957
			/speedtest/generic/generic
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical callstack sampling data (usertime)--
                           664: Total Samples
                             0: Samples with incomplete traceback
                        19.920: Accumulated Time (secs.)
                          30.0: Sample interval (msecs.)
-------------------------------------------------------------------------
Function list, in descending order by exclusive time
-------------------------------------------------------------------------
 [index]  excl.secs excl.%   cum.%  incl.secs incl.%    samples  procedure  (dso: file, line)

     [3]     18.570  93.2%   93.2%     18.570  93.2%        619  anneal (generic: generic.c, 1570)
     [5]      0.750   3.8%   97.0%      0.750   3.8%         25  slaveusrtime (dlslave.so: dlslave.c, 22)
     [8]      0.420   2.1%   99.1%      0.420   2.1%         14  __read (libc.so.1: read.s, 20)
    [12]      0.150   0.8%   99.8%      0.150   0.8%          5  cvttrap (generic: generic.c, 318)
    [13]      0.030   0.2%  100.0%      0.030   0.2%          1  _ngetdents (libc.so.1: ngetdents.s, 16)
     [1]      0.000   0.0%  100.0%     19.920 100.0%        664  main (generic: generic.c, 102)
     [2]      0.000   0.0%  100.0%     19.920 100.0%        664  Scriptstring (generic: generic.c, 185)
     [4]      0.000   0.0%  100.0%     18.570  93.2%        619  usrtime (generic: generic.c, 1385)
    [14]      0.000   0.0%  100.0%      0.030   0.2%          1  dirstat (generic: generic.c, 349)
    [15]      0.000   0.0%  100.0%      0.030   0.2%          1  _readdir (libc.so.1: readdir.c, 135)
     [9]      0.000   0.0%  100.0%      0.420   2.1%         14  iofile (generic: generic.c, 462)
    [10]      0.000   0.0%  100.0%      0.420   2.1%         14  fread (libc.so.1: fread.c, 27)
    [11]      0.000   0.0%  100.0%      0.420   2.1%         14  _read (libc.so.1: readSCI.c, 27)
     [6]      0.000   0.0%  100.0%      0.750   3.8%         25  libdso (generic: generic.c, 623)
     [7]      0.000   0.0%  100.0%      0.750   3.8%         25  dlslave_routine (dlslave.so: dlslave.c, 7)

Analyzing the Report

The report shows information for each function. The meanings of the column headings are described below, followed by a worked example that relates the sample counts to the reported times:

  • The index column assigns a reference number to each function.

  • The excl.secs column shows how much time, in seconds, was spent in the function itself (exclusive time). Routines that begin with an underscore, such as __start, are internal routines that you cannot change.

  • The excl.% column shows the percentage of a program's total time that was spent in the function.

  • The cum.% column shows the cumulative percentage of the program's total time accounted for by the functions listed so far.

  • The incl.secs column shows how much time, in seconds, was spent in the function and its descendants (inclusive time).

  • The incl.% column shows the percentage of the program's total time spent in the function and its descendants.

  • The samples column shows how many samples were taken when the process was executing in the function and in all of the function's descendants.

  • The procedure (dso:file,line) columns list the function name, its DSO name, its file name, and its line number. For example, the top line reports statistics for the function anneal, the DSO generic, in the file generic.c, at line 1570.
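
Because usertime is a statistical callstack-sampling experiment, the reported times follow directly from the sample counts and the sample interval. In the listing above, the interval is 30.0 milliseconds: anneal was on the call stack during 619 of the 664 samples, so its time is reported as 619 × 0.030 = 18.57 seconds, and the total accumulated time is 664 × 0.030 = 19.92 seconds.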

A pcsamp Experiment

This section explains how to perform a pcsamp experiment. The pcsamp experiment allows you to gather information on actual CPU time for each function in your program. For more information on pcsamp, see “PC Sampling Experiment (pcsamp)” in Chapter 4.

From the command line, enter the following:

% ssrun -fpcsamp generic

This starts the experiment. The f prefix is added to pcsamp because generic runs quickly and would gather relatively little data at the default pcsamp sampling rate; the fpcsamp variant samples more frequently and therefore collects more data. Output from generic and from ssrun is printed to stdout, as shown in the following example.

A data file is also generated. The name consists of the process name (generic), the experiment type (fpcsamp), and the experiment ID. In this example, the file name is generic.fpcsamp.m16969.

0:00:00.000 ======== (16969)             Begin script Mon  18 Mar 2002
07:02:19.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.005 ======== (16969)        start of linklist Mon  18 Mar 2002
07:02:19.
        linklist completed.
0:00:00.008 ======== (16969)         start of usrtime Mon  18 Mar 2002
07:02:19.
        usertime completed.
0:00:18.260 ======== (16969)         start of cvttrap Mon  18 Mar 2002
07:02:37.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:18.430 ======== (16969)         start of dirstat Mon  18 Mar 2002
07:02:37.
        dirstat of /usr/include completed, 264 files.
0:00:18.464 ======== (16969) start of iofile -- stdio Mon  18 Mar 2002
07:02:37.
        stdio iofile on /unix completed, 7965088 chars.
0:00:18.813 ======== (16969)        start of fpetraps Mon  18 Mar 2002
07:02:38.
        fpetraps completed.
0:00:18.815 ======== (16969)          start of libdso Mon  18 Mar 2002
07:02:38.
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:19.577 ======== (16969)               End script Mon  18 Mar 2002
07:02:39.



Generating a Report

To generate a report on the data collected, and to redirect the output to a file, enter the following:

% prof your_output_file_name > pcsamp.results

Output similar to the following is generated:

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Mar 18 07:03:54 2002
   prof generic.fpcsamp.m16969
                 generic (n32): Target program
                       fpcsamp: Experiment name
                pc,2,1000,0:cu: Marching orders
                 R5000 / R5000: CPU / FPU
                             1: Number of CPUs
                           180: Clock frequency (MHz.)
  Experiment notes--
	  From file generic.fpcsamp.m16969:
	Caliper point 0 at target begin, PID 16969
			/speedtest/generic/generic
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical PC sampling data (fpcsamp)--
                         19329: Total samples
                        19.329: Accumulated time (secs.)
                           1.0: Time per sample (msecs.)
                             2: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
 [index]      secs    %    cum.%   samples  function (dso: file, line)

     [1]    18.084  93.6%  93.6%     18084  anneal (generic: generic.c, 1570)
     [2]     0.716   3.7%  97.3%       716  slaveusrtime (dlslave.so: dlslave.c, 22)
     [3]     0.329   1.7%  99.0%       329  __read (libc.so.1: read.s, 20)
     [4]     0.147   0.8%  99.7%       147  cvttrap (generic: generic.c, 318)
     [5]     0.031   0.2%  99.9%        31  _xstat (libc.so.1: xstat.s, 12)
     [6]     0.012   0.1%  99.9%        12  __write (libc.so.1: write.s, 20)
     [7]     0.004   0.0% 100.0%         4  fread (libc.so.1: fread.c, 27)
     [8]     0.002   0.0% 100.0%         2  iofile (generic: generic.c, 462)
     [9]     0.001   0.0% 100.0%         1  fprintf (libc.so.1: fprintf.c, 23)
    [10]     0.001   0.0% 100.0%         1  _cerror (libc.so.1: cerror.s, 30)
             0.002   0.0% 100.0%         2  **OTHER** (includes excluded DSOs, rld, etc.)

            19.329 100.0% 100.0%     19329  TOTAL



Analyzing the Report

The report has the following columns; a worked example relating samples to time follows the list:

  • The [index] column assigns a reference number to each function.

  • The secs column shows the amount of CPU time, in seconds, that was spent in the function.

  • The % column shows the percentage of the total program time that was spent in the function.

  • The cum.% column shows the cumulative percentage of the program's total time accounted for by the functions listed so far.

  • The samples column shows how many samples were taken when the process was executing in the function.

  • The function (dso: file, line) columns list the function, its DSO name, its file name, and its line number.
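
In a pcsamp experiment the time and sample figures are directly related: each sample represents one sample interval of CPU time. In the fpcsamp listing above, the interval is 1.0 millisecond, so anneal's 18084 samples correspond to 18.084 seconds, and the 19329 total samples account for the 19.329 seconds of accumulated time.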

A Hardware Counter Experiment


Note: This experiment can be performed only on systems that have built-in hardware counters (machines with the R10000, R12000, R14000, or R16000 class of CPU).

This section takes you through the steps to perform a hardware counter experiment. There are a number of hardware counter experiments, but this tutorial describes the steps involved in performing the dsc_hwc experiment. This experiment captures information about secondary data cache misses. For more information on hardware counter experiments, see “Hardware Counter Experiments (*_hwc, *_hwctime)” in Chapter 4.

Performing a Hardware Counter Experiment

From the command line, enter:

% ssrun -dsc_hwc generic

This starts the experiment. Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (dsc_hwc), and the experiment ID. In this example, the file name is generic.dsc_hwc.m294398.

Generating a Report

To generate a report on the data collected and redirect the output to a file, enter the following:

% prof your_output_file_name > dsc_hwc.results

The report should look similar to the following listing:

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Feb  2 11:11:44 1998
   prof generic.dsc_hwc.m294398
                 generic (n32): Target program
                       dsc_hwc: Experiment name
                 hwc,26,131:cu: Marching orders
               R10000 / R10010: CPU / FPU
                            16: Number of CPUs
                           195: Clock frequency (MHz.)
  Experiment notes--
          From file generic.dsc_hwc.m294398:
        Caliper point 0 at target begin, PID 294398
               /usr/demos/SpeedShop/linpack.demos/c/generic
        Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of R10K perf. counter overflow PC sampling data (dsc_hwc)--
                             6: Total samples
       Sec cache D misses (26): Counter name (number)
                           131: Counter overflow value
                           786: Total counts
-------------------------------------------------------------------------
Function list, in descending order by counts
-------------------------------------------------------------------------
 [index]        counts     %   cum.%   samples  function (dso: file, line)

     [1]           131  16.7%  16.7%     1  init2da (generic: generic.c, 1430)
     [2]           131  16.7%  33.3%     1  genLog (generic: generic.c, 1686)
     [3]           131  16.7%  50.0%     1  _write (libc.so.1: writeSCI.c, 27)
                   393  50.0% 100.0%     3  **OTHER** (includes excluded DSOs, rld, etc.)

                   786 100.0% 100.0%         6  TOTAL

Analyzing the Report

The information immediately preceding the function list displays the following:

  • The Total samples is the number of times the program counter was sampled. It is sampled once for each overflow, or once each time the hardware counter exceeds the specified value.

  • The Counter name (number) indicates the hardware counter used in the experiment. In this case, hardware counter 26 counts the number of times a value required in a calculation was not available in secondary cache. For a complete list of the hardware counters and their numbers, see Table 4-2.

  • The Counter overflow value is the count at which the hardware counter overflows and a sample is recorded. In this case, the value is 131, which is the default for dsc_hwc. The fdsc_hwc experiment runs the same hardware counter experiment with a preset overflow value of 29. To use a different overflow value, set the _SPEEDSHOP_HWC_COUNTER_OVERFLOW environment variable to a value larger than 0, set the _SPEEDSHOP_HWC_COUNTER_NUMBER environment variable to 26, and run the prof_hwc experiment instead of dsc_hwc (see the example after this list). See “_hwc Hardware Counter Experiments” in Chapter 4 to learn how to choose a counter overflow value.

  • The Total counts is the total number of times a value was not in the secondary cache when needed. This value is determined by multiplying the total number of samples by the overflow value (in the listing above, 6 × 131 = 786); extra counts that do not cause an overflow are not recorded.
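
For example, to repeat this measurement of secondary data cache misses with a larger overflow value, you might set the environment variables described above and then run the prof_hwc experiment. The commands below use csh syntax; the overflow value of 5000 is only an illustration, not a recommended setting:

% setenv _SPEEDSHOP_HWC_COUNTER_NUMBER 26
% setenv _SPEEDSHOP_HWC_COUNTER_OVERFLOW 5000
% ssrun -prof_hwc generic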

The function list has the following columns:

  • The index column assigns a reference number to each function.

  • The counts column shows the number of times a data item was not in secondary cache when needed for a calculation during the execution of the function. As with Total counts, a function's counts value is determined by multiplying its samples value by the overflow value.

  • The % column shows the percentage of the program's overflows that occurred in the function.

  • The cum.% column shows the percentage of the program's overflows that occurred in the functions listed so far. Because the list is sorted in descending order by counts, a function near the bottom of the list can have a small value in its % column while its cum.% value is high.

  • The samples column shows the number of times the program counter was sampled during execution of the function. A sample is taken for each overflow of the hardware counter.

  • The function (dso: file, line) columns list the function name, the DSO, the file name, and line number of the function.

A Basic Block Experiment

This section takes you through the steps to perform a bbcounts experiment. The times reported represent an idealized computation: the experiment ignores potential floating-point interlocks and memory latency (cache misses and memory bus contention). The times reported are therefore always lower than the times for an actual run. For more information on the bbcounts experiment, see “Basic Block Counting Experiment (bbcounts)” in Chapter 4.

Performing a bbcounts Experiment

From the command line, enter:

% ssrun -bbcounts generic

This starts the experiment. First the executable, rld, and the DSOs are instrumented. This entails making copies of the libraries and executables, giving the copies an extension of .pixie.

Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (bbcounts), and the experiment ID. In this example, the file name is generic.bbcounts.m16991, and the following is written to stdout:

        instrumenting /lib32/rld
        instrumenting /usr/lib32/libssrt.so
        instrumenting /usr/lib32/libss.so
        instrumenting /usr/lib32/libm.so
        instrumenting /usr/lib32/libc.so.1
        instrumenting /speedtest/generic/generic

0:00:00.001 ======== (16991)             Begin script Mon  18 Mar 2002
07:05:46.
        begin script `ll.u.cvt.d.i.f.dso'
0:00:00.016 ======== (16991)        start of linklist Mon  18 Mar 2002
07:05:46.
        linklist completed.
0:00:00.025 ======== (16991)         start of usrtime Mon  18 Mar 2002
07:05:46.
        usertime completed.
0:00:19.943 ======== (16991)         start of cvttrap Mon  18 Mar 2002
07:06:06.
        cvttrap completed, y = 2.60188e+14, z = 2.60188e+14.
0:00:20.201 ======== (16991)         start of dirstat Mon  18 Mar 2002
07:06:06.
        dirstat of /usr/include completed, 264 files.
0:00:20.245 ======== (16991) start of iofile -- stdio Mon  18 Mar 2002
07:06:06.
        stdio iofile on /unix completed, 7965088 chars.
0:00:20.623 ======== (16991)        start of fpetraps Mon  18 Mar 2002
07:06:07.
        fpetraps completed.
0:00:20.631 ======== (16991)          start of libdso Mon  18 Mar 2002
07:06:07.
        instrumenting /speedtest/generic/./dlslave.so
dlslave_init executed
dlslave_routine executed
        slaveusertime completed, x = 5000000.000000.
        libdso: dynamic routine returned 13
        end of script `ll.u.cvt.d.i.f.dso'
0:00:22.058 ======== (16991)               End script Mon  18 Mar 2002
07:06:08.

The output lines beginning with “instrumenting” show that ssrun instruments first the libraries and then the generic executable itself.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > bbcounts.results

This command redirects output to a file called bbcounts.results. The file contains results that look similar to the following partial listing. The number of functions listed and their names may vary from system to system.

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Mar 18 07:07:06 2002
   prof generic.bbcounts.m16991
                 generic (n32): Target program
                      bbcounts: Experiment name
                         it:cu: Marching orders
                 R5000 / R5000: CPU / FPU
                             1: Number of CPUs
                           180: Clock frequency (MHz.)
  Experiment notes--
	  From file generic.bbcounts.m16991:
	Caliper point 0 at target begin, PID 16991
			/speedtest/generic/generic
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of ideal time data (bbcounts)--
                    2048459432: Total number of instructions executed
                    3266522347: Total computed cycles
                        18.147: Total computed execution time (secs.)
                         1.595: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
 [index]   excl.secs   excl.%     cum.%        cycles  instructions    calls  function  (dso: file, line)

     [1]      17.304    95.4%     95.4%    3114690027    1956780024        1  anneal (generic.pixbb: generic.c, 1570)
     [2]       0.694     3.8%     99.2%     125000842      75000732        1  slaveusrtime (dlslave.so: dlslave.c, 22)
     [3]       0.139     0.8%     99.9%      25000068      15000054        1  cvttrap (generic.pixbb: generic.c, 318)
     [4]       0.002     0.0%    100.0%        348146        324941     1286  general_find_symbol (rld: rld.c, 2038)
     [5]       0.002     0.0%    100.0%        337349        305518     2874  resolve_relocations (rld: rld.c, 2636)
     [6]       0.001     0.0%    100.0%        148338        148338     1301  elfhash (rld: obj.c, 1184)
     [7]       0.001     0.0%    100.0%         95076         95076     4138  obj_dynsym_got (rld: objfcn.c, 46)
     [8]       0.000     0.0%    100.0%         88459         88459     1663  strcmp (rld: strcmp.s, 34)
     [9]       0.000     0.0%    100.0%         78410         69944        1  fix_all_defineds (rld: rld.c, 3419)
    [10]       0.000     0.0%    100.0%         76859         74309     1289  resolve_symbol (rld: rld.c, 1828)
    [11]       0.000     0.0%    100.0%         75590         58123        1  init2da (generic.pixbb: generic.c, 1427)
    [12]       0.000     0.0%    100.0%         72817         71565     1256  resolving (rld: rld.c, 1499)
    [13]       0.000     0.0%    100.0%         67753         62361      487  fread (libc.so.1: fread.c, 27)
    [14]       0.000     0.0%    100.0%         48270         45600       53  _doprnt (libc.so.1: doprnt.c, 227)
    [15]       0.000     0.0%    100.0%         48000         35200     1600  _drand48 (libc.so.1: drand48.c, 116)
    [16]       0.000     0.0%    100.0%         38783         27864      628  __sinf (libm.so: fsin.c, 97)
    [17]       0.000     0.0%    100.0%         34408         34320        6  search_for_externals (rld: rld.c, 3987)
    .
    .
    .
    .

Analyzing the Report

The columns in the report provide the following information:

  • The index column assigns a reference number to each function.

  • The excl.secs column shows the minimum number of seconds that might be spent in the function under ideal conditions (see the worked example after this list).

  • The excl.% column shows how much of the program's total time was spent in the function.

  • The cum.% column shows the cumulative percentage of time spent in the functions listed so far.

  • The cycles column shows the total number of machine cycles used by the function.

  • The instructions column shows the total number of instructions executed by a function.

  • The calls column shows the total number of calls made to the function.

  • The function (dso:file, line) columns list the function, its DSO name, its file name, and the line number.
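
The computed (ideal) times are derived from the cycle counts and the clock frequency shown in the report header. In the listing above, 3,266,522,347 total computed cycles at 180 MHz give approximately 18.147 seconds of total computed execution time, and dividing those cycles by the 2,048,459,432 instructions executed gives the average of 1.595 cycles per instruction. Likewise, anneal's 3,114,690,027 cycles account for its 17.304 exclusive seconds.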

An fpe Trace

This section takes you through the steps to perform a floating-point exception (fpe) trace, which identifies functions in which floating-point exceptions have occurred. For more information on the fpe trace, see “Floating-Point Exception Trace Experiment (fpe)” in Chapter 4.

Performing an fpe Trace

From the command line, enter:

% ssrun -fpe generic

Output from generic and from ssrun is printed to stdout. A data file is created with a name generated by concatenating the process name (generic), the experiment type (fpe), and the experiment ID. In this example, the file name is generic.fpe.m17002.

Generating a Report

To generate a report on the data collected, enter the following at the command line:

% prof your_output_file_name > fpe.results

The report should look similar to the following partial listing:

-------------------------------------------------------------------------
SpeedShop profile listing generated Mon Mar 18 07:09:16 2002
   prof generic.fpe.m17002
                 generic (n32): Target program
                           fpe: Experiment name
                        fpe:cu: Marching orders
                 R5000 / R5000: CPU / FPU
                             1: Number of CPUs
                           180: Clock frequency (MHz.)
  Experiment notes--
	  From file generic.fpe.m17002:
	Caliper point 0 at target begin, PID 17002
			/speedtest/generic/generic
	Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of FPE callstack tracing data (fpe)--
                             4: Total FPEs
                             0: Samples with incomplete traceback
-------------------------------------------------------------------------
Function list, in descending order by exclusive FPEs
-------------------------------------------------------------------------
 [index]   excl.FPEs excl.%   cum.%   incl.FPEs incl.%  function  (dso: file, line)

     [1]           4 100.0%  100.0%           4 100.0%  fpetraps (generic: generic.c, 406)
     [2]           0   0.0%  100.0%           4 100.0%  main (generic: generic.c, 102)
     [3]           0   0.0%  100.0%           4 100.0%  Scriptstring (generic: generic.c, 185)



Analyzing the Report

The report shows information for each function:

  • The index column assigns a reference number to each function.

  • The excl.FPEs column shows how many floating-point exceptions occurred in the function itself.

  • The excl.% column shows the percentage of the total number of floating-point exceptions that were found in the function.

  • The cum.% column shows the percentage of exclusive floating-point exceptions in the functions that have been listed so far. The list is sorted by the number of floating-point exceptions, with the most in the top line and the least in the bottom line. Because all of the exceptions are in the first function listed in this example, all entries in this column are 100%.

  • The incl.FPEs column shows how many floating-point exceptions were generated by the function and the functions it called.

  • The incl.% column shows the percentage of the program's total number of floating-point exceptions generated by this function and the functions it called. Because fpetraps is called by way of all of the other listed functions, each of them shows 100%.

  • The function (dso:file, line) columns list the routine name, its DSO name, its file name, and its line number.