Chapter 7. Analyzing Experiment Results: prof

This chapter provides information on how to view and analyze experiment results. It consists of the following sections:

Using prof to Generate Performance Reports

Performance data is examined using prof, a text-based report generator that prints to stdout.

The prof command can be used in two modes:

  • To generate a report from performance data gathered during experiments recorded by ssrun:

    prof <options> <perf-data-file> <perf-data-file> ...

  • To generate a report from data files produced by running a program that has been instrumented by pixie:

    prof executable_name [options] [pixie counts file]

    This chapter focuses on the use of prof to generate reports from ssrun experiments. For information on prof for a pixie experiment, see Chapter 8, "Using SpeedShop in Expert Mode: pixie."

prof Syntax

The syntax for prof when using it with data files from ssrun is:

prof options data_file data_file ...

options 

Zero or more of the options described in Table 7-2.

data_file 

One or more names of performance data files generated by ssrun. These files are usually of the format prog_name.exp_type.id.

The experiment ID, id, consists of one or two letters (designating the process type) and the process ID number. See Table 7-1 for letter codes and descriptions.

Table 7-1 lists the letter codes for id.

Table 7-1. Letter Codes in Experiment ID Numbers

Letter Code

Description

m

Master process created by ssrun

p

Process created by a call to sproc()

f

Process created by a call to fork()

s

Process created by a call to system()

e

Process created by a call to exec()

fe

Process created by a call to fork() and exec()


prof Options

Table 7-2 lists prof options. For more information, see the prof reference page.

Table 7-2. Options for prof

Name

Result

-calipers n1 n2

Restricts analysis to a segment of program execution. This option works only for SpeedShop experiments.

Causes prof to compute the data between caliper points n1 and n2, rather than for the entire experiment.

If n1 >= n2, an error is reported.

If n1 is negative, it is set to the beginning of the experiment.

If n2 is greater than the maximum number of caliper points recorded, it is set to the maximum.

If n1 is omitted, zero (the beginning of the program) is assumed.

–c[lock] n

Lists the number of seconds spent in each routine, based on the CPU clock frequency n, expressed in megahertz. This option is useful when generating reports for ideal experiments, or for basic block counting data obtained with pixie. The default is to use the clock frequency of the machine where the performance data was collected.

-cycle n

Sets the cycle time to n nanoseconds.

-den[sity]

Prints a list of procedures with non-zero instruction cycles sorted by the instruction density, which is the number of cycles per instruction.

This option can be used when generating reports for ideal experiments, or for basic block counting data obtained with pixie.

-debug:dbg_flags

dbg_flags can be combinations of the following:

GPROF_FLAG 0x00000001

COUNTS_FLAG 0x00000002

SAMPLE_FLAG 0x00000004

MISS_FLAG 0x00000008

FEEDBACK_FLAG 0x00000010

CORD_FLAG 0x00000020

USERPC_FLAG 0x00000040

MDEBUG_FLAG 0x00000080

BEAD_FLAG 0x00000100

LIBSSRT_FLAG 0x00000200

-dis[assemble]

Disassembles and annotates the analyzed object code with cycle times if you have run an ideal experiment, collected data using pixie, or have run a pcsamp or prof_hwc experiment.

–dso [dso_name]

Generates a report only for the named DSO. If you don't specify dso_name, prof prints a list of applicable DSO names. Only the basename of the DSO needs to be specified.

-dsolist

List all the DSOs in the program and their start and end text addresses.

–e[xclude] proc1...procN

Excludes information on the specified procedures. If you specify uppercase –E, prof also omits the specified procedures from the base upon which it calculates percentages.

–feedback

Produces files with information that can be used to (a) arrange procedures in the binary in an optimal ordering using cord, and (b) tell the compiler how to optimize compilation of the program using cc -fb filename.cfb. This option can be used when generating reports for ideal experiments, or for basic block counting data obtained with pixie.

cord feedback files are named program.fb or libso.fb. Compiler feedback files are named progam.cfb or libso.cfb. These are binary files and may be dumped using the fbdump command.

Procedures are normally ordered by their measured invocation counts; if -gprof is also specified, procedures are ordered using call graph counts, rather than invocation counts.

–gprof

Calculates cycles and propagates basic block counting to a procedure's callers proportionately.

This option can be used when generating reports for

ideal experiments, or for basic block counting data obtained with pixie. It can also be used for fpe and usertime experiments.

–h[eavy]

Lists the most heavily used lines of source code in descending order of use, sorting lines by their frequency of use. This option can be used when generating reports for ideal, pcsamp, or prof_hwc experiments, or for basic block counting data obtained with pixie.

–i[nvocations]

Lists the number of times each procedure is invoked. This option can be used when generating reports for ideal experiments, or for basic block counting data obtained with pixie.

–l[ines]

Lists the most heavily used lines of source code in descending order of use, but lists lines grouped by procedure, sorted by cycles executed per procedure. This option can be used when generating reports for ideal, pcsamp, or prof_hwc experiments, or for basic block counting data obtained with pixie.

-nocounts

Analyzes an executable or a .o file using the pixie machine model, and assuming each instruction is executed once. This analysis cannot match any possible real run of any executable which contains one or more conditional branch instructions.

-nofilenames

Removes a.out, DSO, and source filenames from the listing; useful for scripted analysis of prof output.

–o[nly] proc1...procN

Reports information on only the procedures specified.

If you specify uppercase

–O, prof uses only the procedures, rather than the entire program, as the base upon which it calculates percentages.

–p[rocedures]

Lists the time spent in each procedure.

-pthreads
pthrd1... pthrdN

Analyzes data only for the specified pthreads (for usertime and fpe experiments on applications that use pthreads (on Irix 6.5 or later)).

–q[uit] n

Condenses output listings by truncating -p[rocedures], -h[eavy], -l[ines], and -gprof listings. You can specify n in three ways:

n, an integer, truncates everything after n lines;

n%, an integer followed by a percent sign, does not print any procedure or line with less than n in the % column;

ncum%, an integer followed by cum%, does not print any procedure or line with more than n in the cum% column. That is, it truncates the listing after the last procedure or line which brings the cumulative total to n%. If -gprof is also specified, it behaves the same as -q n%.

For example, -q 15 truncates each part of the report after 15 lines of text.
-q 15% truncates each part of the report that represents less than 15% of the whole, and -q 15cum% truncates each part of the report that has a cumulative percentage above 15%.

-r10000|-r8000
|-r5000|-r4000
|-r3000

Overrides the default processor scheduling model that prof uses to generate a report. If this option is not specified, prof uses the scheduling model for the processor on which the experiment is being run.

-S (-source)

Disassembles and annotates the analyzed object code with cycle times, or PC samples, and source code,

if you have run an

ideal, pcsamp, or prof_hwc experiment, or collected data using pixie.

–z[ero]

Lists the procedures that are never invoked. Use this option when generating reports for ideal experiments, or for basic block counting data obtained with pixie.


prof Output

prof generates a performance report that is printed to stdout. Warning and fatal errors are printed to stderr.


Note: Fortran alternate entry point times are attributed to the main function/subroutine, since there is no general way for prof to separate the times for the alternate entries.


Using prof With ssrun

When you call prof with one or more SpeedShop performance data files, it collects the data from all the output files and produces a listing depending on the experiment type. The prof command is able to detect which experiment was run and generate an appropriate report. It provides reports for all experiment types.

In cases where prof accepts more than one data file as input, it sums up the results. The multiple input data files must be generated from the same executable, using the same experiment type.

prof may report times for procedures named with a prefix of *DF*, for example *DF*_hello.init_2. DF stands for "Dummy Function," and indicates cycles spent in parts of text which are not in any function: init and fini sections, and MIPS.stubs sections, for example.

The types of reports that prof generates are described in the following sections:

usertime Experiment Reports

For usertime experiments, prof generates a list of callers and callees of each function, with information on how much time was spent in the function, its callers and its callees.

The report shows information for each function, its callers and its callees. The function names are show in the right-hand column of the report. The function that is being reported is shown outdented from its caller and callee(s). For example, the first function shown in this report is __start(), which has no callers and two callees. The remaining columns are described below:

  • The index column provides an index number for reference.

  • The %samples column shows the cumulative percentage of time spent in each function.

  • The self column shows how much time, in seconds, was spent in the function.

  • The descendents columns shows how much time, in seconds, was spent in callees of the function.

  • The total column provides information on the number of samples of the function.

This example is a truncated version of the full report. For a complete report see "Generating a Report".

-----------------------------------------------------------------------
Profile listing generated Mon Nov 18 11:43:45 1996
    with: prof generic.usertime.m24479 
-----------------------------------------------------------------------
 
Total Time (secs)     : 43.98
Total Samples         : 1466
Stack backtrace failed: 1
Sample interval (ms)  : 30
CPU                   : R4600
FPU                   : R4600
Clock                 : 100.0MHz
Number of CPUs        : 1
 
-----------------------------------------------------------------------
 
index  %Samples  self descendents  total     name
[1]     99.9%    0.00       43.95    1465    __start

pcsamp Experiment Reports

For [f]pcsamp[x] experiments, prof generates a function list annotated with the number of samples taken for the function, and the estimated time spent in the function.

  • The samples column shows how many samples of the function were taken.

  • The time(%) column shows the amount of time, and the percentage of that time over the total time that was spent in the function.

  • The cum time(%) column shows how much time has been spent up to and including the procedure being examined.

  • The procedure (dso:file) column lists the procedure, its DSO name and file name. For example, the first line reports statistics for the procedure anneal in the file generic.c in the generic executable.

    -------------------------------------------------------------------------------
    Profile listing generated Sun May 19 17:21:27 1996
        with:       prof generic.fpcsamp.m14480 
    -------------------------------------------------------------------------------
    samples   time    CPU    FPU   Clock   N-cpu  S-interval Countsize
      19077    19s  R4000  R4010 150.0MHz   1      1.0ms     2(bytes)
     
    Each sample covers 4 bytes for every  1.0ms ( 0.01% of 19.0770s)
    -------------------------------------------------------------------------------
      -p[rocedures] using pc-sampling.
      Sorted in descending order by the number of samples in each procedure.
      Unexecuted procedures are excluded.
    -------------------------------------------------------------------------------
    samples   time(%)      cum time(%)      procedure (dso:file)
     
      17794    18s( 93.3)   18s( 93.3)         anneal (/usr/demos/SpeedShop/
              generic/generic:/usr/demos/Speedshop/generic/generic.c)

Hardware Counter Experiment Reports

For the various hwc experiments, prof generates a function list annotated with the number of overflows generated by the function.

  • The overflows(%) column shows the number of overflows caused by the function, and the percentage of that number over the total number of overflows in the program.

  • The cum overflows(%) column shows a cumulative number and percentage of overflows.

  • The procedure (dso:file) column shows the procedure name and the DSO and filename that contain the procedure.

    -------------------------------------------------------------------------------
    Profile listing generated Sun May 19 17:35:21 1996
        with:       prof generic.dsc_hwc.m5999 
    -------------------------------------------------------------------------------
            Counter               : Sec cache D misses
            Counter overflow value: 131
            Total numer of ovfls  : 10
            CPU                   : R10000
            FPU                   : R10010
            Clock                 : 196.0MHz
            Number of CPUs        : 1
    -------------------------------------------------------------------------------
      -p[rocedures] using counter overflow.
      Sorted in descending order by the number of overflows in each procedure.
      Unexecuted procedures are excluded.
    -------------------------------------------------------------------------------
      overflows(%)  cum overflows(%)      procedure (dso:file)
     
          4( 40.0)          4( 40.0)         memcpy (/usr/lib64/libc.so.1:
              /work/irix/lib/libc/libc_64_M4/strings/bcopy.s)

ideal Experiment Reports

For ideal experiments, prof generates a function list annotated with the number of cycles and instructions attributed to the function, and the estimated time spent in the function.

prof does not take into account interactions between basic blocks. Within a single basic block, prof computes cycles for one execution and multiplies it with the number of times that basic block is executed.

If any of the object files linked into the application have been stripped of line-number information (with ld -x for example), prof warns about the affected procedures. The instruction counts for such procedures are shown as a procedure total, not on a per-basic-block basis. Where a line number would normally appear in a report on a function without line numbers, question marks appear instead.

  • The cycles(%) column reports the number and percentage of machine cycles used for the procedure. For example, 2524610038 cycles, or 94.81% of cycles were spent in the anneal() procedure.

  • The cum% column shows the cumulative percentage of calls. For example, 99.88% of all calls were spent between the top two functions in the listing: anneal() and slaveusrtime().

  • The secs column shows the number of seconds spent in the procedure. For example, 16.83 seconds were spent in the anneal() procedure. The time represents an idealized computation based on modelling the machine. Potential floating-point interlocks and memory latency time (cache misses and memory bus contention) are ignored.

  • The instrns column shows the number of instructions executed for a procedure. For example, there were 1797940023 instructions devoted to the anneal() procedure.

  • The calls column reports the number of calls to the procedure. For example, there was just one call to the anneal() procedure.

  • The procedure (dso:file) column lists the procedure, its DSO name and filename. For example, the first line reports statistics for the procedure anneal() in the file generic.c in the generic executable.

    Prof run at: Sun May 19 17:49:10 1996
     Command line: prof generic.ideal.m14517 
     
             2662778531: Total number of cycles
              17.75186s: Total execution time
             1875323907: Total number of instructions executed
                  1.420: Ratio of cycles / instruction
                    150: Clock rate in MHz
                  R4000: Target processor modelled
    ---------------------------------------------------------
    Procedures sorted in descending order of cycles executed.
    Unexecuted procedures are not listed. Procedures
    beginning with *DF* are dummy functions and represent
    init, fini and stub sections.
    ---------------------------------------------------------
    
            cycles(%)  cum %     secs    instrns    calls procedure(dso:file)
     
    2524610038(94.81)  94.81    16.83 1797940023     1 anneal(generic:/usr/demos/
              SpeedShop/generic/generic.c)

If the -gprof flag is added to prof, a list of callers and callees of each function is provided:

                          self             kids        called/total    parents
index     cycles(%)      self(%)          kids(%)  called+self     name     index
                          self             kids        called/total    children
 
[1] 2661528037(99.95%)    71( 0.00%) 2661527966(100.00%)   0     __start [1]
                          44         2661527913          1/1       main [2]
                           5                  0          1/1       __istart [107]
                           4                  0         1/1
__readenv_sigfpe [108]
 
 
--------------------------------------------------------------------------------
                          44         2661527913         1/1       __start [1][2] 2661527957(99.95%)        44( 0.00%) 2661527913(100.00%)  1     main [2]
                        2152         2661524760         1/1       Scriptstring[3]
                          67                934         1/1       exit [55]
--------------------------------------------------------------------------------
                        2152         2661524760         1/1        main [2]
[3] 2661526912(99.95%)  2152( 0.00%) 2661524760(100.00%)  1      Scriptstring [3]
                          40         2525080081         1/1         usrtime [4]
                          82          135044460         1/1         libdso [6]
                       68058            1148856         1/2         iofile [10]
                         124              52933         2/8         genLog [16]
                        7211              45001         1/1         dirstat [27]
                        1438              32051         1/1         linklist [31]
                         632              32051         1/1         fpetraps [32]
                         124              10922         2/19        fprintf [20]
                         696                  0        45/45        strcmp [61]
--------------------------------------------------------------------------------
                          40         2525080081         1/1       Scriptstring[3]
[4] 2525080121(94.83%)    40( 0.00%) 2525080081(100.00%)  1     usrtime [4]
                  2524610038             437992         1/1         anneal [5]
                          62              26466         1/8         genLog [16]
                          62               5461         1/19        fprintf [20]

FPE Trace Reports

The report shows information for each function. The function name is show in the right column of the report. The remaining columns are described below.

  • The index column provides an index number for reference.

  • The %FPEs column shows the percentage of the total number of floating point exceptions that were found in the function.

  • The self column shows how many floating point exceptions were found in the function. For example, 0 floating point exceptions were found in __start().

  • The descendents columns shows how many floating point exceptions were found in the descendents of the function.

  • The totals column provides information on the number of floating point exceptions out of the total that were found.

    -----------------------------------------------------------------------
    Profile listing generated Mon Nov 18 11:46:33 1996
        with:      prof generic.fpe.m18823 
    -----------------------------------------------------------------------
    Total FPEs            : 4
    Stack backtrace failed: 0
    CPU                   : R4600
    FPU                   : R4600
    Clock                 : 100.0MHz
    Number of CPUs        : 1
     
    -----------------------------------------------------------------------
    index  %FPEs     self descendents  total         name
    [1]    100.0%       0           4       4   __start

Using prof Options

This section shows the output from calling prof with some of the options available for prof.

Using the -dis Option

For pcsamp and ideal experiments, the -dis option to prof can be used to obtain machine instruction information. prof provides the standard report and then appends the machine instruction information to the end of the report. The example below shows partial output from prof, for a pcsamp experiment.

-------------------------------------------------------------------------------
Profile listing generated Tue May 27 18:04:10 1997
    with:     prof -dis generic.pcsamp.m875 
-------------------------------------------------------------------------------
 
samples   time    CPU    FPU   Clock   N-cpu  S-interval Countsize
   4142    41s  R4600  R4600 100.0MHz   1     10.0ms     2(bytes)
 
Each sample covers 4 bytes for every 10.0ms ( 0.02% of 41.4200s)
 
-------------------------------------------------------------------------------
  -p[rocedures] using pc-sampling.
  Sorted in descending order by the number of samples in each procedure.
  Unexecuted or inlined procedures are excluded.
-------------------------------------------------------------------------------
 
samples   time(%)      cum time(%)      procedure (dso:file)
 
   3975    40s( 96.0)   40s( 96.0)         anneal (generic:generic.c)
    124   1.2s(  3.0)   41s( 99.0)   slaveusrtime (./dlslave.so:dlslave.c)
     32  0.32s(  0.8)   41s( 99.7)          _read (/usr/lib32/libc.so.1:/xlv1/
          bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/read.s)
      4  0.04s(  0.1)   41s( 99.8)         _xstat (/usr/lib32/libc.so.1:/xlv1/
          bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/xstat.s)
      2  0.02s(  0.0)   41s( 99.9)          fread (/usr/lib32/libc.so.1:/xlv1/
          bonsai-sep09/work/irix/lib/libc/libc_n32_M3/stdio/fread.c)
      1  0.01s(  0.0)   41s( 99.9)         iofile (generic:generic.c)
      1  0.01s(  0.0)   41s( 99.9)        usrtime (generic:generic.c)
      1  0.01s(  0.0)   41s(100.0)         _write (/usr/lib32/libc.so.1:/xlv1/
          bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/write.s
      1  0.01s(  0.0)   41s(100.0)      _morecore (/usr/lib32/libc.so.1:/xlv1/
          bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/malloc.c)
      1  0.01s(  0.0)   41s(100.0)           next (/usr/lib32/libc.so.1:/xlv1
          /bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c)
  
   4142    41s(100.0)   41s(100.0)          TOTAL
-------------------------------------------------------------------------------
* -dis[assemble] listing annotated pc-samples                                 *
*     Procedures with zero samples are excluded.                              *
-------------------------------------------------------------------------------
  
...
  
generic.c
anneal: <0x100065b8-0x100068c4>    3975 total samples(95.97%)
  [1514] 0x100065b8     0x27bdffd0     addiu     sp,sp,-48           # 1
  [1514] 0x100065bc     0xffbf0018     sd        ra,24(sp)           # 2
  [1514] 0x100065c0     0xffbc0020     sd        gp,32(sp)           # 3
  [1514] 0x100065c4     0x3c030002     lui       v1,0x2              # 4
  [1514] 0x100065c8     0x246399e0     addiu     v1,v1,-26144        # 5
  [1514] 0x100065cc     0x0323e021     addu      gp,t9,v1            # 6
  [1516] 0x100065d0     0xd7808040     ldc1      $f0,-32704(gp)      # 7
          <2 cycle stall for following instruction>
  [1516] 0x100065d4     0xf7a00000     sdc1      $f0,0(sp)           # 10
  [1518] 0x100065d8     0x24010001     li        at,1                # 11
  [1518] 0x100065dc     0x8f8281c0     lw        v0,-32320(gp)       # 12
          <2 cycle stall for following instruction>
  [1518] 0x100065e0     0xac410000     sw        at,0(v0)            # 15
  [1519] 0x100065e4     0x8f99819c     lw        t9,-32356(gp)       # 16
          <2 cycle stall for following instruction>
  [1519] 0x100065e8     0x0320f809     jalr      ra,t9               # 19
  [1519] 0x100065ec     0000000000     nop       # 20
          <2 cycle stall for following instruction>
  [1527] 0x100065f0     0xafa00008     sw        zero,8(sp)          # 23
  [1527] 0x100065f4     0x8fa40008     lw        a0,8(sp)            # 24
          <2 cycle stall for following instruction>
  [1527] 0x100065f8     0x28842710     slti      a0,a0,10000         # 27
  [1527] 0x100065fc     0x108000ac     beq       a0,zero,0x100068b0  # 28
  [1527] 0x10006600     0000000000     nop       # 29
          <2 cycle stall for following instruction>
  [1529] 0x10006604     0x24070001     li        a3,1                # 32
  ^------    1 samples(0.02%)------^
  [1529] 0x10006608     0xafa7000c     sw        a3,12(sp)           # 33
  [1529] 0x1000660c     0x8f8681b8     lw        a2,-32328(gp)       # 34
          <2 cycle stall for following instruction>
  [1529] 0x10006610     0x8cc60000     lw        a2,0(a2)            # 37
          <2 cycle stall for following instruction>
  [1529] 0x10006614     0x24c6ffff     addiu     a2,a2,-1            # 40
  [1529] 0x10006618     0x8fa5000c     lw        a1,12(sp)           # 41
          <2 cycle stall for following instruction>
  [1529] 0x1000661c     0x00a6282a     slt       a1,a1,a2            # 44
  ^------    1 samples(0.02%)------^
  [1529] 0x10006620     0x10a0009c     beq       a1,zero,0x10006894  # 45
  [1529] 0x10006624     0000000000     nop       # 46
          <2 cycle stall for following instruction>
  [1530] 0x10006628     0x240a0001     li        t2,1                # 49
  ^------    2 samples(0.05%)------^
  [1530] 0x1000662c     0xafaa0010     sw        t2,16(sp)           # 50
  [1530] 0x10006630     0x8f8981b8     lw        t1,-32328(gp)       # 51
  ^------    1 samples(0.02%)------^
          <2 cycle stall for following instruction>
  [1530] 0x10006634     0x8d290000     lw        t1,0(t1)            # 54
          <2 cycle stall for following instruction>
  [1530] 0x10006638     0x2529ffff     addiu     t1,t1,-1            # 57
  [1530] 0x1000663c     0x8fa80010     lw        t0,16(sp)           # 58
          <2 cycle stall for following instruction>
  [1530] 0x10006640     0x0109402a     slt       t0,t0,t1            # 61
  ^------    3 samples(0.07%)------^
  [1530] 0x10006644     0x11000089     beq       t0,zero,0x1000686c  # 62
  ^------    1 samples(0.02%)------^
  [1530] 0x10006648     0000000000     nop       # 63
          <2 cycle stall for following instruction>
  [1531] 0x1000664c     0x8fa90010     lw        t1,16(sp)           # 66
  ^------   17 samples(0.41%)------^
          <2 cycle stall for following instruction>
  [1531] 0x10006650     0x25290001     addiu     t1,t1,1             # 69
  ^------   18 samples(0.43%)------^
  [1531] 0x10006654     0x8fab000c     lw        t3,12(sp)           # 70
  ^------   15 samples(0.36%)------^
          <2 cycle stall for following instruction>
  [1531] 0x10006658     0x256b0001     addiu     t3,t3,1             # 73
  ^------   33 samples(0.80%)------^
  [1531] 0x1000665c     0x000b5080     sll       t2,t3,2             # 74
  ^------   21 samples(0.51%)------^
  [1531] 0x10006660     0x014b5021     addu      t2,t2,t3            # 75
  ^------    9 samples(0.22%)------^
  [1531] 0x10006664     0x000a50c0     sll       t2,t2,3             # 76
  ^------   15 samples(0.36%)------^
   
 

The listing shows statistics about the procedure anneal() in the file generic.c and lists the beginning and ending addresses of anneal(): <0x100065b8-0x100068c4>. The five columns display the following information:

Column...

Displays...

1

Line number of the instruction: [1514].

2

Beginning address of the instruction: 0x100065b8.

3

Instruction in hexadecimal: 0x27bdffd0.

4

Assembler form (mnemonic) of the instruction: addiu sp,sp,-48.

5

Cycle in which the instruction executed: # 1.

Other information includes:

  • The number of times the immediately preceding branch was executed and taken (ideal only).

  • The total number of cycles in a basic block and the percentage of the total cycles for that basic block, the number of times the branch terminating that basic block was executed, and the number of cycles for one execution of that basic block (ideal only).

  • The total number of samples in an instruction (pcsamp only).

  • Any cycle stalls, that is, cycles that were wasted.

Using the -S Option

For ideal experiments, the -S option to prof can be used to obtain source line information. prof provides the standard report and then appends the source line information to the end of the report.

This example shows output from calling prof for an ideal experiment:

    Prof run at: Tue May 27 18:04:51 1997
 Command line: prof -S -dis generic.ideal.m876 
 
         3900085040: Total number of cycles
          39.00085s: Total execution time
         2046668286: Total number of instructions executed
              1.906: Ratio of cycles / instruction
                100: Clock rate in MHz
              R4600: Target processor modelled
---------------------------------------------------------
Procedures sorted in descending order of cycles executed.
Unexecuted or inlined procedures are not listed. Procedures
beginning with *DF* are dummy functions and represent
init, fini and stub sections.
---------------------------------------------------------
 
        cycles(%)  cum %  secs  instrns   calls   procedure(dso:file)
 
3754320037(96.26)  96.26  37.54 1971220024   1 anneal(generic.pixie:generic.c)
 145001146( 3.72)  99.98   1.45   75000728   1 slaveusrtime(./dlslave.so.pixn32:
          dlslave.c)
    187200( 0.00)  99.99   0.00     124800  1600 next(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c)
    101504( 0.00)  99.99   0.00      58124     1 init2da(generic.pixie:generic.c)
     91200( 0.00)  99.99   0.00      62400  1600 _drand48(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/math/drand48.c)
     78574( 0.00)  99.99   0.00      30063   628 __sinf(./libm.so.pixn32:
          ../../libm/fsin.c)
     64442( 0.00)  99.99   0.00      45661    48 _doprnt(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/print/doprnt.c)
     57888( 0.00) 100.00   0.00       9648    16 offtime(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/time_comm.c)
     43767( 0.00) 100.00   0.00      29215   263 fread(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/stdio/fread.c)
     16484( 0.00) 100.00   0.00      12285   299 _readdir(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/gen/readdir.c)
     12376( 0.00) 100.00   0.00       7224   281 memcpy(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/strings/bcopy.s)
     10526( 0.00) 100.00   0.00       6321     1 dirstat(generic.pixie:generic.c)
      9545( 0.00) 100.00   0.00       6103     1 iofile(generic.pixie:generic.c)
      6258( 0.00) 100.00   0.00       5066   298 _stat(./libc.so.1.pixn32:
          /xlv1/bonsai-sep09/work/irix/lib/libc/libc_n32_M3/sys/stat.c)
 
-----------------------
disassembly listing    
-----------------------
 
*DF*_generic.MIPS.stubs_1
*DF*_generic.MIPS.stubs_1: <0x10001a90-0x10001db4>
    154 total cycles(0.00%) invoked 0 times, average ? cycles/invocation 
  [1] 0x10001a90     0x0006000d     break     0x6                # 1
  ^---      0 total cycles(0.00%) executed     0 times, average  1 cycles.---^
  [1] 0x10001a94     0x8f998050     lw        t9,-32688(gp)      # 1
  [1] 0x10001a98     0x03e07825     move      t7,ra              # 2
          <1 cycle stall for following instruction>
  [1] 0x10001a9c     0x0320f809     jalr      ra,t9              # 4
  [1] 0x10001aa0     0x34180029     ori       t8,zero,0x29       # 5
          <2 cycle stall for following instruction>
  ^---      7 total cycles(0.00%) executed     1 times, average  7 cycles.---^
  [1] 0x10001aa4     0x8f998050     lw        t9,-32688(gp)      # 1
  [1] 0x10001aa8     0x03e07825     move      t7,ra              # 2
          <1 cycle stall for following instruction>
  [1] 0x10001aac     0x0320f809     jalr      ra,t9              # 4
  [1] 0x10001ab0     0x3418002a     ori       t8,zero,0x2a       # 5
 
...
 
generic.c
main: <0x10001ecc-0x10002000>
    44 total cycles(0.00%) invoked 1 times, average 44 cycles/invocation 
File `generic.c':
Skipping source listing to line 87
88: void    sproctestgrandchild(void *);    /* sproc grandchild code */
89: 
90: static  struct timeval   starttime;     /* starting time--first timestamp */
91: static  struct timeval   ttime;         /* last-recorded timestamp */
92: static  struct timeval   deltatime;
93: 
94: int     pagesize;
95: 
96: main(unsigned argc, char **argv)
97: {
  [97] 0x10001ecc     0x27bdffd0     addiu     sp,sp,-48           # 1
  [97] 0x10001ed0     0xffbf0008     sd        ra,8(sp)            # 2
  [97] 0x10001ed4     0xffbc0010     sd        gp,16(sp)           # 3
  [97] 0x10001ed8     0x3c010002     lui       at,0x2              # 4
  [97] 0x10001edc     0x2421e0cc     addiu     at,at,-7988         # 5
  [97] 0x10001ee0     0x0321e021     addu      gp,t9,at            # 6
  [97] 0x10001ee4     0xafa40024     sw        a0,36(sp)           # 7
  [97] 0x10001ee8     0xafa5002c     sw        a1,44(sp)           # 8
98:      int     i;
99: 
100:      /* initialize the timestamp */
101:      (void) gettimeofday(&starttime, NULL);
  [101] 0x10001eec     0x27848360     addiu     a0,gp,-31904        # 9
  [101] 0x10001ef0     0x00002825     move      a1,zero             # 10
  [101] 0x10001ef4     0x8f99807c     lw        t9,-32644(gp)       # 11
  ^---     11 total cycles(0.00%) executed     1 times, average 11 cycles.---^
  [101] 0x10001ef8     0x0320f809     jalr      ra,t9               # 1
  [101] 0x10001efc     0000000000     nop       # 2
          <2 cycle stall for following instruction>
  ^---      4 total cycles(0.00%) executed     1 times, average  4 cycles.---^
102: 
103:      /* set up to reap any children */
104:      (void) sigset(SIGCHLD, (SIG_PF)reapSig);
  [104] 0x10001f00     0x24040012     li        a0,18               # 1
  [104] 0x10001f04     0x8f858144     lw        a1,-32444(gp)       # 2
  [104] 0x10001f08     0x8f998080     lw        t9,-32640(gp)       # 3
  ^---      3 total cycles(0.00%) executed     1 times, average  3 cycles.---^
  [104] 0x10001f0c     0x0320f809     jalr      ra,t9               # 1
  [104] 0x10001f10     0000000000     nop       # 2
          <2 cycle stall for following instruction>
  ^---      4 total cycles(0.00%) executed     1 times, average  4 cycles.---^
105: 
106:      if(argc == 1) {
  [106] 0x10001f14     0x8fa20024     lw        v0,36(sp)           # 1
  [106] 0x10001f18     0x24030001     li        v1,1                # 2
          <1 cycle stall for following instruction>
  [106] 0x10001f1c     0x1443000c     bne       v0,v1,0x10001f50    # 4
  [106] 0x10001f20     0000000000     nop       # 5
          <2 cycle stall for following instruction>
      Preceding branch executed 1 times, taken 0 times.
  ^---      7 total cycles(0.00%) executed     1 times, average  7 cycles.---^
107:           Scriptstring(DEFAULT_SCRIPT);
  [107] 0x10001f24     0x8f84805c     lw        a0,-32676(gp)       # 1
          <2 cycle stall for following instruction>
  [107] 0x10001f28     0x24847038     addiu     a0,a0,28728         # 4
  [107] 0x10001f2c     0x8f99814c     lw        t9,-32436(gp)       # 5
  ^---      5 total cycles(0.00%) executed     1 times, average  5 cycles.---^
  [107] 0x10001f30     0x0320f809     jalr       ra,t9              # 1
 

Using the -calipers Option

When you run prof on the output of an experiment in which you have recorded caliper points, you can use the -calipers option to specify the area of the program for which you want to generate a performance report. For example, if you set just one caliper point in the middle of your program, prof can provide a report from the beginning of the program up to the first caliper point using the following command:

prof -calipers 0 1

prof can also provide a report from the caliper point to the end of the program using the following command:

prof -calipers 1 2

If you set two caliper points, prof can generate a report from the first to the second caliper point:

prof -calipers 1 2

Using the -gprof Option

For ideal, usertime, and fpe experiments, the -gprof option to prof can be used to obtain inclusive basic block counting information. prof provides the standard report and then appends the inclusive function counts information to the end of the report. The example below shows partial output from prof, showing just the inclusive function counts report.

With inclusive cycle counting, prof prints a list of functions at the end, which are called but not defined. This list includes functions starting with _rld because rld is not instrumented. It also includes functions from libss; they are instrumented, but their data is normally excluded.

prof fails to list cycles of a procedure in the inclusive listing for the following reasons:

  • init & fini sections and MIPS stubs are not part of any procedure.

  • Calls to procedures that don't use a "jump and link" are not recognized as procedure calls.

  • When global procedures with the same name are executed in different DSOs, only one of them is listed.

These exceptions are listed at the end of the report.

This example shows output from calling prof for a usertime experiment:

-------------------------------------------------------------------------------
Profile listing generated Tue May 27 18:18:21 1997
    with:     prof -gprof generic.usertime.m1019 
-------------------------------------------------------------------------------
 
     Total Time (secs)     : 41.01
     Total Samples         : 1367
     Stack backtrace failed: 1
     Sample interval (ms)  : 30
     CPU                   : R4600
     FPU                   : R4600
     Clock                 : 100.0MHz
     Number of CPUs        : 1
 
 
-------------------------------------------------------------------------------
 
index  %Samples     self descendents  total        name
[1]     99.9%    0.00       40.98    1366          __start
[2]     99.9%    0.00       40.98    1366          main
[3]     99.9%    0.00       40.98    1366          Scriptstring
[4]     96.0%    0.00       39.36    1312          usrtime
[5]     96.0%   39.36        0.00    1312          anneal
[6]      3.1%    0.00        1.26      42          libdso
[7]      3.1%    0.00        1.26      42          dlslave_routine
[8]      3.1%    1.26        0.00      42          slaveusrtime
[9]      0.8%    0.00        0.33      11          iofile
[10]     0.7%    0.00        0.30      10          fread
[11]     0.7%    0.30        0.00      10          _read
[12]     0.1%    0.00        0.03       1          dirstat
[13]     0.1%    0.00        0.03       1          _stat
[14]     0.1%    0.03        0.00       1          _xstat
[15]     0.1%    0.00        0.03       1          genLog
[16]     0.1%    0.00        0.03       1          fprintf
[17]     0.1%    0.00        0.03       1          _doprnt
[18]     0.1%    0.00        0.03       1          _dowrite
[19]     0.1%    0.00        0.03       1          fwrite
[20]     0.1%    0.03        0.00       1          _write
    
    
-------------------------------------------------------------------------------
     Gprof Listing
-------------------------------------------------------------------------------
 
 
                                  caller/total        parents
index  %time     self descendents  total (self)    name
                                  callee/descend      children
 
-------------------------------------------------------------------------------
 
[1]     99.9%    0.00       40.98    1366 (0)      __start [1]
                 0.00       40.98    1366/1366         0x10001e9c main [2]
 
-------------------------------------------------------------------------------
 
                 0.00       40.98    1366/1366         0x10001e9c __start [1]
[2]     99.9%    0.00       40.98    1366 (0)      main [2]
                 0.00       40.98    1366/1366         0x10001f30 Scriptstring [3]
 
-------------------------------------------------------------------------------
 
                 0.00       40.98    1366/1366         0x10001f30 main [2]
[3]     99.9%    0.00       40.98    1366 (0)      Scriptstring [3]
                 0.00       39.36    1312/1366         0x10002378 usrtime [4]
                 0.00        1.26      42/1366         0x10002378 libdso [6]
                 0.00        0.33      11/1366         0x10002378 iofile [9]
                 0.00        0.03       1/1366         0x10002378 dirstat [12]
 
-------------------------------------------------------------------------------
 
                 0.00       39.36    1312/1312         0x10002378 Scriptstring [3]
[4]     96.0%    0.00       39.36    1312 (0)      usrtime [4]
                39.36        0.00    1312/1312         0x100059b8 anneal [5]
 
-------------------------------------------------------------------------------
 
                39.36        0.00    1312/1312         0x100059b8 usrtime [4]
[5]     96.0%   39.36        0.00    1312 (1312)   anneal [5]
 
-------------------------------------------------------------------------------
 
                 0.00        1.26      42/42           0x10002378 Scriptstring [3]
[6]      3.1%    0.00        1.26      42 (0)      libdso [6]
                 0.00        1.26      42/42           0x10003028 dlslave_routine [7]
 
-------------------------------------------------------------------------------
 
                 0.00        1.26      42/42           0x10003028 libdso [6]
[7]      3.1%    0.00        1.26      42 (0)      dlslave_routine [7]
                 1.26        0.00      42/42           0x5ffe0650 slaveusrtime [8]
 
-------------------------------------------------------------------------------
 
                 1.26        0.00      42/42           0x5ffe0650 dlslave_routine [7]
[8]      3.1%    1.26        0.00      42 (42)     slaveusrtime [8]
 
-------------------------------------------------------------------------------
 
                 0.00        0.33      11/11           0x10002378 Scriptstring [3]
[9]      0.8%    0.00        0.33      11 (0)      iofile [9]
                 0.00        0.30      10/11           0x10002ab8 fread [10]
                 0.00        0.03       1/11           0x10002a5c genLog [15]
 
-------------------------------------------------------------------------------
 
                 0.00        0.30      10/10           0x10002ab8 iofile [9]
[10]     0.7%    0.00        0.30      10 (0)      fread [10]
                 0.30        0.00      10/10           0xfad26e0 _read [11]
 
-------------------------------------------------------------------------------
 
                 0.30        0.00      10/10           0xfad26e0 fread [10]
[11]     0.7%    0.30        0.00      10 (10)     _read [11]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0x10002378 Scriptstring [3]
[12]     0.1%    0.00        0.03       1 (0)      dirstat [12]
                 0.00        0.03       1/1            0x10002820 _stat [13]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0x10002820 dirstat [12]
[13]     0.1%    0.00        0.03       1 (0)      _stat [13]
                 0.03        0.00       1/1            0xfaf8c10 _xstat [14]
 
-------------------------------------------------------------------------------
 
                 0.03        0.00       1/1            0xfaf8c10 _stat [13]
[14]     0.1%    0.03        0.00       1 (1)      _xstat [14]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0x10002a5c iofile [9]
[15]     0.1%    0.00        0.03       1 (0)      genLog [15]
                 0.00        0.03       1/1            0x10006bc4 fprintf [16]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0x10006bc4 genLog [15]
[16]     0.1%    0.00        0.03       1 (0)      fprintf [16]
                 0.00        0.03       1/1            0xfab55ec _doprnt [17]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0xfab55ec fprintf [16]
[17]     0.1%    0.00        0.03       1 (0)      _doprnt [17]
                 0.00        0.03       1/1            0xfab215c _dowrite [18]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0xfab215c _doprnt [17]
[18]     0.1%    0.00        0.03       1 (0)      _dowrite [18]
                 0.00        0.03       1/1            0xfab1ddc fwrite [19]
 
-------------------------------------------------------------------------------
 
                 0.00        0.03       1/1            0xfab1ddc _dowrite [18]
[19]     0.1%    0.00        0.03       1 (0)      fwrite [19]
                 0.03        0.00       1/1            0xfad30f8 _write [20]
 
-------------------------------------------------------------------------------
 
                 0.03        0.00       1/1            0xfad30f8 fwrite [19]
[20]     0.1%    0.03        0.00       1 (1)      _write [20]
 

Generating Reports for Different Machine Types

If you need to generate a report for a machine model that is different from the one on which the experiment was performed, you can use several of the prof options to specify a machine model.

For example, if you record an ideal experiment on an R4000 processor with a clock frequency of 100 megahertz, but you want to generate a report for an R10000 processor, the prof command would be

prof -r10000 -clock 196 generic.ideal.m4561

Generating Reports for Multiprocessed Executables

You can gather data from executables that use the sproc() and sprocsp() system calls, such as those executables generated by POWER Fortran and POWER C. Prepare and run the job using the same method as for uniprocessed executables. For multiprocessed executables, each thread of execution writes its own separate data file. View these data files with prof like any other data files.

The only difference between multiprocessed and regular executables is how the data files are named. The data files are named prog_name.experiment_type.id.

The experiment ID, id, consists of one or two letters (designating the process type) and the process ID number. See Table 7-3 for letter codes and descriptions.This naming convention avoids the potential conflict of multiple threads attempting to write simultaneously to the same file.

Table 7-3. Letter Codes in Experiment ID Numbers

Letter Code

Description

m

Master process created by ssrun

p

Process created by a call to sproc()

f

Process created by a call to fork()

s

Process created by a call to system()

e

Process created by a call to exec()

fe

Process created by a call to fork() and exec()


Generating Compiler Feedback Files

If you run an ideal experiment, run prof with the -feedback option to generate a feedback file that can be used to arrange procedures more efficiently on the next recompile. You can rearrange procedures using the -fb flag to cc, or using the cord command. For more information, view the cc or cord reference page.

Interpreting Reports

If the target process was blocked for a long time as a result of an instruction, that instruction will show up as having a low or zero CPU time. On the other hand, CPU-intensive instructions will show up as having a high CPU time.

One way to sanity-check inclusive cycle counts is to look at the percentage cycles for __start(). If the value is anything less than 98 -99%, the inclusive report is suspect. Look for other warnings that prof didn't take into account certain procedures.