Chapter 23. Statistics

This chapter describes the OpenGL Performer profiling utilities. Statistics are available on nearly every aspect of OpenGL Performer's operation and can be used to diagnose both functionality and performance problems, as well as for writing benchmarks and for load management. For more detailed information on interpreting statistics to tune the performance of your application, refer to Chapter 24, “Performance Tuning and Debugging”.

To collect most OpenGL Performer statistics, all you have to do is enable them; OpenGL Performer then collects them automatically for you in pfStats and pfFrameStats data structures (for libpr and libpf, respectively). You can query the contents of these structures from your program, or write the data to files. A libpf application can also display the contents of a pfFrameStats structure in a channel by calling pfDrawChanStats() or pfDrawFStats(). The statistics drawn for a channel are the statistics accumulated in the channel's own pfFrameStats. Such a display is not necessary for statistics collection. The pointer to the pfFrameStats structure for a channel can be obtained with pfGetChanFStats(). You can then control which statistics for the channel are being accumulated.

Most of the OpenGL Performer demo programs display some subset of these statistics. This chapter first explains some of the complex graphical displays and then discusses how to display statistics from a libpf-based application. Subsequent sections explain how to access and manipulate statistics from within an application. Topics include enabling and disabling statistics classes, printing, querying, and copying statistics data, as well as some basic examples showing common uses of statistics. At the end of this chapter is a discussion of the different statistics classes for libpr and libpf along with details of their use.

Interpreting Statistics Displays

Many types of statistics can be displayed in a channel. Most such displays consist simply of labeled numbers and are fairly self-explanatory; however, some of the displays, such as the stage timing graph, warrant further explanation.

OpenGL Performer tracks the time spent in the application, cull, and draw stages of the rendering pipeline. The basic statistics display shows a timing graph for each stage of the past several frames, as well as showing the current frame rate and load information. This profiling diagram is useful for optimizing both the database and application structure.

Figure 23-1 shows a sample stage timing graph from an OpenGL Performer demo program. It might be helpful to refer to a running example as well—by turning on a statistics display in perfly, for instance—while reading this section.

Figure 23-1. Stage Timing Statistics Display

The statistics diagram in Figure 23-1 is the simplest of the standard statistics displays. There are several other standard display formats, each emphasizing other classes of statistics. Statistics collection, though highly optimized, can take extra time in OpenGL Performer operations. Because of this, you have a great deal of fine control over exactly what statistics are currently being collected and what statistics are being displayed. Statistics are divided into classes (separated into vertically stacked boxes in a display), and into modes within each class. The next several sections describe the classes shown in a typical statistics display.

Status Line

The top line of a standard statistics display, above the box that contains the rest of the statistics, shows the current average frame rate followed by a slash and the target frame rate. (To set a target frame rate, call pfFrameRate().) The rest of that status line indicates what frame-rate control method you are using (FLOAT, FREE, LIMIT, or LOCK—for details, see “Achieving the Frame Rate” in Chapter 5), your multiprocess model (set with pfMultiprocess()), and the average time (in milliseconds) spent in the channel draw callback. An optional part of the status line indicates the number of triangles in the scene.

Stage Timing Graph

The main part of the timing display is the stage timing graph, occupying the top portion of the statistics display. The red vertical lines (the darker ones in Figure 23-1) mark video retrace intervals, which occur at the video refresh rate of the system (commonly 60 times per second); a field is the period of time between two video retrace boundaries. The green vertical lines (the lighter ones in the figure) indicate frame boundaries. Note that frame boundaries are always on field boundaries and are an integral number of fields.
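Because frame boundaries always fall on field boundaries, a frame that runs even slightly longer than one field occupies two. The following is a minimal sketch of that arithmetic, not OpenGL Performer code. The helper names fields_per_frame() and effective_frame_rate() are hypothetical, and the model assumes that a frame always waits for the next field boundary before it ends.

```c
/* Hypothetical helpers illustrating field-quantized frame times.
 * These are not part of the OpenGL Performer API. */

/* Number of whole video fields occupied by a frame that needs frame_ms
 * milliseconds of work, given a video refresh rate of field_hz. */
int fields_per_frame(double frame_ms, double field_hz)
{
    double field_ms = 1000.0 / field_hz;
    int fields = (int)(frame_ms / field_ms);   /* whole fields fully used */

    if ((double)fields * field_ms < frame_ms)  /* round a partial field up */
        fields++;
    return fields < 1 ? 1 : fields;            /* a frame is at least one field */
}

/* Resulting frame rate when every frame is padded to a field boundary. */
double effective_frame_rate(double frame_ms, double field_hz)
{
    return field_hz / (double)fields_per_frame(frame_ms, field_hz);
}
```

Under these assumptions, at a 60 Hz refresh rate a 16 ms frame fits in one field, while a 17 ms frame occupies two fields and therefore runs at 30 frames per second. This is why the timing graph is useful for judging how close a scene is to the next frame-rate step.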

The segmented horizontal lines in the top portion of the timing graph show the time taken by each of the OpenGL Performer pipeline stages and additional processes: i (intersection), a (application), c (cull), d (draw), l (lpoint), db (dbase), and cx (compute) for each of the four frames shown (0 through 3). On screen, all stages belonging to a given frame are drawn in one color; different colors indicate different frames. You will notice that the application lines show a change in color. This point is where pfFrame() returned and is the start of the next application frame. At that point is a label for the stage name and the age of the frame being represented. The stages of the most recent frame, at the right of the graph, are marked a0, c0, and d0; previous frames have higher numbers (so “a-1” indicates the application stage of the immediately previous frame).

All stages performed by the same process are connected by vertical lines. If two stages are performed by different processes, they are not connected by a vertical line. In most multiprocessing modes, a stage of one frame occurs at the same time as another stage for a different frame, so that (for instance) d0 is directly below c-1 in the graph. The exception is the PFMP_CULLoDRAW model, in which the cull and draw stages for a given frame are performed in tandem; in this mode d0 is directly below c0 in the graph. (In Figure 23-1, the PFMP_APP_CULL_DRAW model was used and all stages are part of separate processes.)

These stage timings are helpful when choosing a process model and balancing the cull and draw tasks for a database. Furthermore, the timing graph can show you how close you are to an improvement in frame rate as you view the database.

The timing lines for each stage are broken into pieces displayed at slightly different heights and thicknesses to show the time taken by significant subtasks within each stage. Raised segments reflect time spent in user code, intermediate lines reflect time spent in OpenGL Performer code, and lowered lines reflect time waiting on other operations.

Figure 23-2 illustrates the parts of a draw-stage timing line. Note that this figure is not drawn to scale; sizes are exaggerated in order to discuss the individual parts more easily.

Figure 23-2. Conceptual Diagram of a Draw-Stage Timing Line

The following explains the potential displayed elements of each stage line in the timing graph. Note that if you run your application or perfly, you may not see some parts if the corresponding operations are not needed.

The application stage is divided into six subsegments, starting at the point where pfFrame() returns and the new application frame is beginning:

  • The time spent in the application's main loop between the pfFrame() call and the pfSync() call (highest segment in application line, drawn as a thick, bright line).

  • The time spent cleaning the scene graph from application changes during pfSync(); drawn as a mid-height, thick, bright line. This also includes the time for pfAppFrame(), which is called from pfSync() if not already called for the current frame by the user. pfSequences are also evaluated as part of pfAppFrame().

  • The time spent sleeping in pfSync() while waiting for the next field or frame boundary (depending on pfPhase and process model); the lowest point in the application line, drawn as a thin, pale, dotted line. Note that in single-process mode with a pfPhase of FREE_RUN, the application sleeps until the swapbuffer of the draw completes before continuing. Any other graphics call would effectively force such a sleep anyway, and in a place where its timing effect could not be measured.

  • The time spent in the application code between calling pfSync() and calling pfFrame(); drawn as a bright raised line. This is the critical path section and this line should be as small as possible or nonexistent.

  • The time spent in pfFrame() cleaning the scene graph after any changes that might have been made in the previous subsegment, and then checking intersections; drawn as a mid-height, thick, bright line. This line should typically be very small or nonexistent: it is part of the critical path, and its presence implies database changes between pfSync() and pfFrame(), which is an expensive place to make such changes.

  • The time spent waiting while the cull and other downstream processes copy updated information from the application and then starting the downstream stages on the now-finished frame (drawn as a low thin line). The end of this line is where pfFrame() returns and the user main application section (or post frame section) starts again.

The cull stage is divided into only two subsegments:

  • The time spent receiving updates from the application (in some multiprocessing models, this overlaps with the last subsegment of the application stage). This time is displayed for all channels even though it is only done one time. This is drawn as a lowered thick line.

  • The time spent in the channel cull callback for the given channel, including time spent in pfCull() (drawn as a raised line). Note that there may be a large space between this and the update line if there are multiple channels on the same pfPipe that are processed first.

The draw stage has potentially six subsegments:

  • The time spent in the channel draw callback before the call to pfDraw() (a very short thick dark raised segment). This includes the time for your call to pfClearChan(). However, under normal circumstances, this segment should barely be visible at all. Operations taking place during this time should only be latency-critical since they are holding off the draw for the current frame.

  • The time spent by OpenGL Performer traversing the scene graph in pfDraw(); it is drawn as a lowered, bright, thick segment. This should typically be the largest segment of the draw line.

  • The time spent in the channel draw callback after pfDraw() (another short thick dark raised segment). On InfiniteReality, if graphics pipeline timing statistics have been enabled (PFFSTATS_ENGFXPFTIMES), this line will include the time to finish the fill for this channel. Otherwise, it only includes the time for the CPU to execute and send graphics commands, and graphics pipeline processing from this channel could impact the timing of other channels.

  • The time spent rendering raster light points computed by a forked lpoint process. This is drawn as a very raised bright line and, if it exists, is the highest point in the draw line.

  • The time for the graphics pipeline to finish its drawing, which is included in the line for the last channel drawn on the pipe. Even if you have no operations after pfDraw() in your draw callback, this line for the last channel might look quite long, particularly if you are very fill-limited and do not have InfiniteReality graphics pipeline statistics enabled. Rendering calls issued in the previous section can fill up the graphics FIFO, so that calls issued in this section must wait while the graphics pipeline processes the commands and the FIFO drains, making the time look longer than expected. If there is no forked lpoint process, this line is combined with the post-draw line of the last pfChannel.

  • The time spent waiting for the graphics pipeline to finish drawing the current frame, draw the channel statistics (for all channels), and make the call to swap color buffers. This is drawn as a pale dotted line. The hardware will complete the swapbuffers upon the following vertical field or frame line.

Below the stage timing lines, the average time for each stage (in milliseconds) is shown. Note that the time given for the draw stage is the same as the time shown for the draw stage on the status line above the statistics box.

Load and Stress

The lower portion of the channel statistics diagram shows the recent history of graphics load and stress management. The load measure is based on the amount of time taken to draw previous frames in the channel relative to the specified goal frame time. A wavy red horizontal line is drawn to show the last three seconds of graphics load. A pair of white horizontal lines represent the upper and lower bounds of graphics load for invoking stress management. Thus, when the red line wanders outside the boundaries set by the white lines, stress management is invoked.

Stress management causes scaling of LODs in the database to meet the target frame rate with maximum scene detail. The last three seconds of stress are shown in white while stress management is running. Thus, the channel statistics graph can be used to tune the upper and lower bounds of the hysteresis band for invoking stress management and for tuning LODs of objects in the database.
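The relationship between load and the hysteresis band can be sketched as follows. This is an illustrative model only, assuming that load is simply the draw time divided by the goal frame time and that stress changes only when load leaves the band. The type ExStressBand and the functions ex_graphics_load() and ex_stress_direction() are hypothetical names, not OpenGL Performer API, and the real stress filter may weigh additional factors.

```c
/* Hypothetical model of the load and stress display.
 * Not part of the OpenGL Performer API. */

typedef struct {
    double low_load;   /* lower white line: below this, stress may be relaxed */
    double high_load;  /* upper white line: above this, stress is increased   */
} ExStressBand;

/* Load as a fraction of the goal frame time (1.0 means exactly on budget). */
double ex_graphics_load(double draw_ms, double goal_frame_ms)
{
    return draw_ms / goal_frame_ms;
}

/* Returns +1 when load is above the band (scale LODs toward less detail),
 * -1 when load is below the band (detail can be restored),
 *  0 when load is inside the band (leave stress unchanged). */
int ex_stress_direction(double load, ExStressBand band)
{
    if (load > band.high_load)
        return 1;
    if (load < band.low_load)
        return -1;
    return 0;
}
```

In this model, a wider band makes stress changes less frequent at the cost of reacting later to load changes, which is the trade-off to keep in mind when tuning the upper and lower bounds against the graph.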

CPU Statistics

The CPU statistics keep track of system usage and require that the corresponding hardware statistics be enabled:

pfEnableStatsHw(PFSTATSHW_ENCPU);

The percentage of time CPUs spend idle, busy, in user code, waiting on the graphics pipeline, or swapping memory is calculated. The statistics package also counts the following:

  • Context switches (process and graphics)

  • System calls

  • Times the graphics FIFO is found to be full

  • Times a CPU went to sleep waiting on a full graphics FIFO

  • Graphics pipeline IOCTLs issued (by the system)

  • Swapbuffers seen

All of these statistics are computed over an elapsed period of time.


Note: Use an interval of at least one second.


PFSTATSHW_CPU_SYS

This mode calculates the above CPU statistics for the entire system. This mode is enabled by default.

PFSTATSHW_CPU_IND

This mode calculates the above CPU statistics for each individual CPU; it is much more expensive than using just the summed statistics.

CPU statistics, illustrated (with some other statistics) in Figure 23-3, give you information on system usage and load. The numbers shown correspond exactly to numbers given by osview; they are updated every update period just like other statistics (see “Controlling Update Rate” for information on how to change the update rate). These numbers represent averages (per second) across all CPUs; thus, if one or more CPUs are busy with some other task, the CPU statistics shown may not accurately reflect OpenGL Performer CPU use. Note that the top line of the CPU statistics panel shows the total number of frames during the last update period and the total time elapsed during that period.

RTMon Statistics (IRIX Only)

The IRIX kernel collects timestamps using the rtmon daemon, rtmond(1). OpenGL Performer issues rtmon timestamps for all operations in the timing graph if the rtmon statistics are enabled with pfStatsClass(PFFSTATS_ENRTMON, PF_ON).

Figure 23-3. Other Statistics Classes

Rendering Statistics

Several other classes of statistics can be shown, each representing a different aspect of rendering performance. Some of these classes show the following:

  • Data about visible geometry, including a histogram showing the percentage of triangles in the scene that are part of strips of a given length (from 1 to 14). Quads are counted as strips of length two; independent triangles count as strips of length one. This histogram is mostly useful as a diagnostic to see how well your database is structured for drawing efficiency; if it shows too many very short strips, you may want to go back and restructure your database. (As a general rule of thumb, consider a “very short strip” to be one that is less than four triangles long, but that number may vary depending on your database.) To enable these statistics on a channel, do the following:

    pfFStatsClass(pfGetChanFStats(chan),
        PFSTATS_ENGFX, PFSTATS_ON);
    

  • A summary of the graphics state operations (including loading of textures) and of the number of operations that have recently been performed on the transformation stack (also part of the GFX stats class), the number of libpf nodes being drawn in several categories (including billboards, light points, and geodes), plus the number of nodes of each type evaluated in the application and cull stages. The following enables these statistics:

    pfFStatsClass(pfGetChanFStats(chan),
        PFFSTATS_ENDB, PFSTATS_ON);
    

  • Cull statistics, including how many nodes and pfGeoSets are being tested, how many are accepted, and how many are rejected by the libpf culling task. The following enables these statistics:

    pfFStatsClass(pfGetChanFStats(chan),
        PFFSTATS_ENCULL, PFSTATS_ON);
    

  • Graphics pipeline timing statistics showing the time spent rendering as measured by the graphics pipeline. This timing is then used internally for more accurate load management. This is supported by InfiniteReality graphics platforms. These statistics are enabled as follows:

    pfFStatsClass(pfGetChanFStats(chan),
        PFSTATSHW_ENGFXPIPE_TIMES, PFSTATS_ON);
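The strip-length histogram described in the first item of the list above can be modeled with a few lines of C. This sketch is only an illustration of how such percentages might be computed from a list of strip lengths; it does not use or reproduce OpenGL Performer internals. The names EX_MAX_STRIP_LEN and ex_strip_histogram() are hypothetical, and clamping longer strips into the last bin is an assumption.

```c
/* Hypothetical model of a strip-length histogram.
 * Not part of the OpenGL Performer API. */

#define EX_MAX_STRIP_LEN 14   /* the display shows lengths 1 through 14 */

/* strip_lens[i] is the length, in triangles, of strip i (an independent
 * triangle has length 1, a quad has length 2). On return, percent[k] holds
 * the percentage of all triangles that belong to strips of length k + 1.
 * Strips longer than EX_MAX_STRIP_LEN are counted in the last bin
 * (an assumption made for this sketch). Returns the total triangle count. */
int ex_strip_histogram(const int *strip_lens, int num_strips,
                       double percent[EX_MAX_STRIP_LEN])
{
    int tris[EX_MAX_STRIP_LEN] = {0};
    int total = 0;
    int i, k;

    for (i = 0; i < num_strips; i++) {
        int len = strip_lens[i];
        if (len < 1)
            continue;                      /* ignore degenerate entries */
        k = (len > EX_MAX_STRIP_LEN ? EX_MAX_STRIP_LEN : len) - 1;
        tris[k] += len;                    /* weight each bin by triangles */
        total += len;
    }
    for (k = 0; k < EX_MAX_STRIP_LEN; k++)
        percent[k] = total > 0 ? 100.0 * tris[k] / total : 0.0;
    return total;
}
```

For a scene with two independent triangles, one quad, and three strips of four triangles, this model reports 12.5 percent of triangles in strips of length one, 12.5 percent in strips of length two, and 75 percent in strips of length four.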
    

Fill Statistics

The fill statistics display indicates how many millions of pixels have been drawn since the last statistics update. (For information on setting the length of time between statistics updates, see “Controlling Update Rate.”) It also computes the average depth complexity of the image, which is the average number of times each pixel was touched per frame.
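As a rough illustration of the depth complexity figure, the sketch below divides the number of pixels drawn during an update period by the number of pixels available in that period. The function name ex_depth_complexity() and its parameters are hypothetical; this is the arithmetic implied by the definition above, not the OpenGL Performer implementation.

```c
/* Hypothetical helper: average depth complexity over an update period.
 * Not part of the OpenGL Performer API. */
double ex_depth_complexity(double pixels_drawn, int frames,
                           int width, int height)
{
    if (frames <= 0 || width <= 0 || height <= 0)
        return 0.0;                        /* nothing was measured */

    /* Pixels written, divided by the pixels in the channel per frame,
     * averaged over all frames in the period. */
    return pixels_drawn / ((double)frames * (double)width * (double)height);
}
```

For example, 196.608 million pixels drawn over 60 frames in a 1280 by 1024 channel corresponds to an average depth complexity of 2.5 under this model.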

The depth complexity of a scene is also displayed in the main channel. Each pixel is colored according to how many times that pixel was written to during display, rather than according to the current rendering modes. The colors used range from dark blue (not written to at all) to bright pink (written over many times). This color scheme is used in calculating fill statistics; the coloring is done whenever you gather fill statistics, even when you are not displaying the totals in your channel statistics display.

Stencil planes are used to store the number of times a pixel is written and, thus, to calculate fill statistics. If n stencil planes are available, no more than 2^n writes to any given pixel will be counted. By default, the calculation of fill statistics uses three stencil planes; to change that default, call pfStatsHwAttr().

Fill statistics are part of the libpr pfStats statistics but can be enabled on both pfStats and pfFrameStats classes. To enable fill statistics, simply use the following:


pfStatsClass(statsptr,
    PFSTATSHW_ENGFXPIPE_FILL, PFSTATS_ON);

To enable fill statistics for a channel's pfFrameStats, use the following:


pfFStatsClass(pfGetChanFStats(chan),
    PFSTATSHW_ENGFXPIPE_FILL, PFSTATS_ON);

Examples of fill statistics can be found in perfly and in /usr/share/Performer/src/pguide/libpr/C/fillstats.c for IRIX and Linux and in %PFROOT%\Src\pguide\libpr\C\fillstats.c for Microsoft Windows.

Collecting and Accessing Statistics in Your Application

If you just want to bring up a statistics display in your application, you may not need to know details about the data structures used for statistics. If, however, you want to do more complicated statistics-handling (including collecting statistics without displaying them), you need more advanced information. This section provides a general overview of statistics manipulation, followed by subsections containing specific information.

If you use libpf, a wide variety of statistics-manipulation functions is available. If you use libpr, however, you must do some things on your own. For instance, you have to bind your own pfStats structure in which to accumulate statistics.

Furthermore, you cannot access some kinds of statistics except through libpf calls—for instance, you cannot get culling statistics using libpr calls. If you want full access to statistics, you must use libpf. There are, however, libpr routines that allow you to do your own cumulative totaling and averaging of collected statistics.

To create your own statistics display, enable the statistics classes you want to use and disable any modes you do not want to use. Then enable any relevant hardware, if necessary, with pfEnableStatsHw().

To ensure accurate timing for your rendering statistics, flush the graphics pipeline with glFinish() before calling pfGetTime(). Such flushes are expensive; perform them no more often than once at the start and once at the end of drawing in each frame.

Displaying Statistics Simply

To put up a simple statistics display, all you have to do is call the function pfDrawChanStats() and pass it a pointer to the pfChannel whose statistics you want to display. The pfDrawChanStats() routine can be called from any process within the application; the statistics will be displayed in the channel specified.

If you want to display one channel's statistics in another channel, call pfDrawFStats(); for an example of this technique, as well as the enabling and disabling of every statistics class, see the statistics programming example in the file /usr/share/Performer/src/pguide/libpf/C/stats.c for IRIX and Linux and in the file %PFROOT%\Src\pguide\libpf\C\stats.c for Microsoft Windows.

By default, a statistics display shows all enabled statistics. If you want to show only a subset of the statistics you are collecting, call pfChanStatsMode() with an enabling bitmask indicating which classes are to be displayed.

Enabling and Disabling Statistics for a Channel

For efficiency, you may want to turn off statistics collection for a given channel when you are not displaying that channel's statistics. In particular, the stage timing statistics are enabled by default; so, if you are using a channel whose statistics you do not care about, you should disable statistics for that channel. To turn off statistics for a channel, use the following:

pfFStatsClass(pfGetChanFStats(chan),
    PFSTATS_ALL, PFSTATS_OFF);

Use the same function with different parameters to enable all or specific classes of statistics for a channel. You can specify which classes to enable in order to minimize statistics collection overhead.

Statistics in libpr and libpf—pfStats Versus pfFrameStats

libpf statistics accumulate into a pfFrameStats structure to later be displayed, printed, queried, or otherwise processed. The pfFrameStats structure actually contains four buffers of statistics: a buffer for the previous frame's statistics, a buffer of averaged statistics for the previous update period, a buffer of accumulated statistics for the current update period, and a buffer of statistics being accumulated for the current frame.

The pfFrameStats structure is built upon the libpr pfStats structure; so, the pfFrameStats API includes routines to duplicate the functionality of pfStats. The duplicated API exists because the routines cannot be intermixed. pfStats routines can only be used on pfStats structures and pfFrameStats routines can only be called with pfFrameStats structures. However, pfStats classes and class modes (designated with the PFSTATS_ prefix) can be enabled on a pfFrameStats structure.

The pfStats statistics classes include the system and hardware statistics for the graphics pipeline and the CPU, as well as the pixel fill statistics and rendering statistics on geometry, graphics state, and matrix transformations. Some of the libpr statistics commands, such as pfEnableStatsHw(PFSTATSHW_ENGFXPIPE_FILL), require an active graphics context and thus should only be called from the draw process. However, these commands are rarely necessary in a libpf application because the pfFrameStats operation handles them automatically.

Statistics Class Structures

The pfFrameStats structure and the pfStats structure are both inherited from the pfObject structure. Thus, you can use the pfObject routines (pfCopy(), pfPrint(), pfDelete(), pfUserData(), pfGetType(), and so on) with pfStats and pfFrameStats structures. However, some pfObject routines will not support all of the semantics of a pfStats or pfFrameStats structure; so, some pfStats versions of a few of these routines take extra arguments. These routines will have a pfFrameStats version as well. In particular, pfCopyStats() and pfCopyFStats() should be used to copy pfStats and pfFrameStats structures, respectively.

Routines that have “FStats” in their names (rather than just “Stats”) expect to be passed a full pfFrameStats structure rather than a pfStats structure. The pfFrameStats API includes additional routines beyond pfStats for supporting libpf statistics. For example, pfDrawFStats() displays statistics in a channel and pfFStatsCountNode() accumulates the static database statistics for the scene graph rooted at the provided node. Additionally, pfFrameStats has special support for the multiprocessed environment of libpf and ensures that the statistics operations are all done in the correct process. All modification of a pfFrameStats structure, including enabling and disabling of classes, printing, and copying, should be done in the application process. pfDrawFStats() and pfDrawChanStats() can be called in either the application process or the draw process.

Statistics Rules of Use

Enabling and disabling of statistics and setting of modes and attributes on a statistics structure should always be done in the application process; the settings will automatically be passed down the process pipeline. To enable classes of statistics on a pfFrameStats, call pfFStatsClass() and provide a statistics structure, a bitmask of statistics-enabling tokens (tokens with “STATS_EN” in their names) for the desired classes, and the token PFSTATS_ON. Obtain the statistics structure from the desired channel by calling pfGetChanFStats() as follows:

pfFStatsClass(pfGetChanFStats(chan), PFFSTATS_ENCULL |
    PFFSTATS_ENDB, PFSTATS_ON);

This call enables the cull statistics and database statistics classes, leaving settings alone for any other classes. Notice that the classes specific to pfFrameStats have a PFFSTATS_ prefix. You can use PFSTATS_SET instead of PFSTATS_ON to enable only the specified classes (disabling any others that might already be enabled).
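The difference between the ON, OFF, and SET value tokens is easiest to see as bitmask arithmetic. The sketch below models that behavior with made-up constants; the EX_ names and their numeric values are hypothetical stand-ins for the real PFSTATS_ and PFFSTATS_ tokens, whose actual values are defined by the OpenGL Performer headers.

```c
/* Hypothetical model of class-enable semantics.
 * The EX_ names and values are illustrative, not OpenGL Performer tokens. */

enum {
    EX_EN_PFTIMES = 1,   /* stands in for a process frame times class */
    EX_EN_DB      = 2,   /* stands in for a database statistics class */
    EX_EN_CULL    = 4,   /* stands in for a cull statistics class     */
    EX_EN_GFX     = 8    /* stands in for a graphics statistics class */
};

typedef enum { EX_OFF, EX_ON, EX_SET } ExValue;

/* Returns the new enable bitmask after applying mask with the given value. */
unsigned ex_apply_class_enable(unsigned current, unsigned mask, ExValue val)
{
    switch (val) {
    case EX_ON:  return current | mask;    /* enable these, keep the rest   */
    case EX_OFF: return current & ~mask;   /* disable these, keep the rest  */
    case EX_SET: return mask;              /* exactly these, disable others */
    }
    return current;
}
```

Starting from a mask with only the frame-times class enabled, applying ON with the cull and database bits leaves all three enabled, whereas applying SET with the same bits leaves only cull and database enabled.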

Statistics Tokens

There are five main types of statistics tokens:

  • Statistics class-enable bitmasks, used for selecting a set of classes to enable with pfStatsClass(). Class enables and disables are specified with bitmasks. Each statistics class has an enable token: a PFSTATS_EN* token that can be ORed with other statistics enable tokens and the result passed in to enable and disable statistics operations. These bitmasks are also used when printing with pfPrint() or copying with pfCopy() and pfCopyStats() as well as with the pfResetStats(), pfClearStats(), pfAverageStats(), and pfAccumulateStats() routines (and their pfFrameStats counterparts). These tokens are of the form PFSTATS_EN* and PFFSTATS_EN* for pfStats and pfFrameStats classes, respectively. The PFSTATS_ALL token selects all statistics classes and also all statistics buffers in the case of a pfFrameStats structure. The token PFSTATS_EN_MASK selects all pfStats classes and the token PFFSTATS_EN_MASK selects all pfFrameStats statistics classes, which includes all pfStats classes.

  • Value tokens, used to specify how to set a value for a specified pfStats or pfFrameStats class enable or mode. Value tokens include PFSTATS_ON, PFSTATS_OFF, and PFSTATS_DEFAULT. Another value token, PFSTATS_SET, is used to specify that the entire class enable or mode bitmask should be set to the specified mask. These tokens are used in conjunction with the class bitmasks and the class name tokens for pfStatsClass() and pfStatsClassMode().

  • Class name tokens, used to name a specific class. For instance, these tokens can be passed to pfStatsClassMode() to set individual modes of a statistics class.

  • Class mode tokens, of which each statistics class has its own and which have the form PFSTATS_class_mode and PFFSTATS_class_mode for pfStats and pfFrameStats class modes, respectively.

  • Statistics query tokens, used with pfQueryStats(), pfMQueryStats(), pfQueryFStats(), and pfMQueryFStats(). These tokens are of the form PFSTATSVAL_* and PFFSTATSVAL_* and have matching pfStatsVal* types for holding the returned data. The token PFFSTATS_BUF_MASK selects the pfFrameStats statistics buffers.

Statistics Buffers

You can only access the PREV and CUM statistics buffers from the OpenGL Performer application process. Statistics from desired buffers in other processes should be queried in the application process and then passed down the process pipeline. You can do this using the channel data utility.

The AVG buffer is copied down the process pipeline at the end of each update period and, so, is available to be queried by other processes. The CUR statistics buffer is the current working area and contains the statistics accumulated so far from previous stages of the current frame; the contents of the CUR buffer are very dependent on the multiprocess configuration (but the buffer is almost always empty in the application process, so queries should access the PREV buffer). Statistics that are added to the CUR buffer by copying, accumulation, or immediate-mode collection (such as with pfStatsCountGSet() and pfFStatsCountNode()) are propagated down the process pipeline and then back up to the application process to be included in the PREV buffer.

In a libpf application, most statistics collection is completely automatic. The application must simply enable the desired classes of statistics with pfFStatsClass() and/or pfFStatsClassMode().

The OpenGL Performer processes are responsible for actually opening the pfFrameStats structure in which to accumulate the enabled statistics classes as well as for managing any statistics hardware resources. All types of libpf statistics can be accumulated without ever making specific calls to open a structure for accumulation or enabling statistics hardware.

When using only libpr statistics, however, one must explicitly open a pfStats structure for statistics accumulation by calling pfOpenStats().

Hardware statistics resources must also be managed by an application using only libpr statistics. Statistics function calls that have “HW” in their names, such as pfEnableStatsHw() and pfStatsHwAttr(), directly access system hardware (such as graphics hardware and CPU); be careful to make such calls only from the relevant process. pfEnableStatsHw() expects PFSTATSHW_EN* bitmask tokens. Statistics classes which have corresponding statistics hardware have a PFSTATSHW_ prefix in their token names.

In a libpf application, OpenGL Performer takes care of enabling the correct hardware modes that correspond to enabled classes of statistics. For more information about specific statistics classes, see the pfFrameStats(3pf) and pfStats(3pf) man pages.

Reducing the Cost of Statistics

Collecting and displaying statistics can have a big impact on performance. This section describes ways to reduce that impact.

Enabling Only Statistics of Interest

Each channel has its Process Frame Times (PFTIMES) statistics class enabled by default. This class maintains a short history of process frame times and averages the frame times over the default update period of two seconds.

To minimize unnecessary overhead, turn off statistics on channels when you are not using them. To turn off all statistics for a channel, call pfFStatsClass() in the application process with the statistics structure of the given channel:

pfFStatsClass(pfGetChanFStats(chan), PFSTATS_ALL,
    PFSTATS_OFF);

Each statistics class has default mode settings. The short history of process frame time is used to draw the timing graph. By default, this history consists of four frames of each OpenGL Performer process (app, cull, draw, intersections).

Maintaining this short history of statistics can be disabled by calling pfFStatsClassMode() on the channel's pfFrameStats with the token PFFSTATS_PFTIMES_HIST:

pfFStatsClassMode(fstats, PFFSTATS_PFTIMES,
    PFFSTATS_PFTIMES_HIST, PFSTATS_OFF);

This is useful if you are only interested in the average frame times of each task with minimal overhead and you do not need to display the timing graph. However, for most applications, the overhead incurred by keeping the timing history is not noticeable.

Controlling Update Rate

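The update rate controls how often statistics are averaged and new results are made available in the AVG buffer for display or query. Change the update rate with pfFStatsAttr(), passing either PFFSTATS_UPDATE_FRAMES or PFFSTATS_UPDATE_SECS as the attribute (the braces in the following call denote that choice and are not C syntax):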

pfFStatsAttr(fstats, {PFFSTATS_UPDATE_FRAMES,
    PFFSTATS_UPDATE_SECS}, val);

When the update period is nonzero, statistics are accumulated every frame. When the update period is set to zero, no statistics accumulation or averaging is done and only statistics in the PREV and CUR buffers are maintained.

When statistics are accumulated and averaged, the averaging happens only in the application process, but accumulation is done in each OpenGL Performer process.
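The relationship between the update period and the statistics buffers can be sketched with a small stand-alone model. This is not OpenGL Performer code: the ToyFrameStat type and both functions are hypothetical, the period is counted in frames only, and the model assumes that the running sum restarts after each average is published, which the text above does not state.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical names, not Performer types or calls. */
typedef struct {
    double cur;          /* value being gathered for the frame in progress */
    double prev;         /* completed value from the last finished frame */
    double cum;          /* running sum since the last published average */
    double avg;          /* last published average */
    int    cumFrames;    /* frames added to cum so far */
    int    updateFrames; /* update period in frames; 0 disables averaging */
} ToyFrameStat;

static void toyInit(ToyFrameStat *s, int updateFrames)
{
    assert(updateFrames >= 0);
    memset(s, 0, sizeof *s);
    s->updateFrames = updateFrames;
}

/* Call once at the end of each frame with that frame's measured time. */
static void toyEndFrame(ToyFrameStat *s, double frameTime)
{
    s->prev = frameTime; /* the finished frame's value moves to PREV */
    s->cur = 0.0;        /* CUR restarts for the next frame */

    if (s->updateFrames == 0)
        return;          /* period of zero: only PREV and CUR are kept */

    s->cum += s->prev;   /* nonzero period: accumulate every frame */
    if (++s->cumFrames == s->updateFrames) {
        s->avg = s->cum / s->cumFrames; /* publish a new average */
        s->cum = 0.0;                   /* assumption: sum restarts */
        s->cumFrames = 0;
    }
}
```

With a period of two frames, the model's average changes only after every second call to toyEndFrame(); with a period of zero, the average and running sum are never touched.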

Statistics Output

Once you have collected some statistics, you need to be able to access and manipulate them.

Printing

To print the contents of pfStats and pfFrameStats structures, use the general pfPrint() routine. The verbosity-level parameter to pfPrint() sets the level of detail to use in printing statistics. Statistics class-enable bitmasks can be used to select a subset of statistics to print. For instance, to print only the enabled statistics, use the following:

pfPrint(stats, pfGetStatsClass(stats, PFSTATS_ALL),
    PFPRINT_VB_INFO, 0);

When printing the contents of pfFrameStats structures, you can select which buffers are to be printed: PREV, CUR, AVG, or CUM. The selected statistics from all selected buffers are printed. The following call prints the currently enabled statistics from the previous frame and from the averaged statistics buffer:

pfPrint(stats, PFFSTATS_BUF_PREV | PFFSTATS_BUF_AVG |
    pfGetStatsClass(stats, PFSTATS_ALL),
    PFPRINT_VB_INFO, 0);

Copying

You can copy entire pfStats and pfFrameStats structures with the general pfCopy() command. pfCopy() copies all of the statistics data as well as information on mode settings and which classes are enabled. The source and destination structures must be of the same type. If both statistics structures are pfFrameStats structures, then all statistics from all buffers are copied.

The pfCopyStats() and pfCopyFStats() routines copy only statistics data (not class enables or mode settings) and accept a class enable bitmask to select statistics classes for the copy, as shown in the following:

pfCopyStats(statsA, statsB, pfGetStatsClass(statsB,
    PFSTATS_ALL));

For a pfFrameStats structure, a PFFSTATS_BUF_* token can be included in the stats enable bitmask to select the buffer to be accessed. If no buffer is specified, the CUR buffer is used. The following call copies the currently enabled classes of stats into the PREV buffer of fstats:

pfCopyStats(fstats, stats, PFFSTATS_BUF_PREV |
    pfGetStatsClass(stats, PFSTATS_ALL));

When one of the two structures is a pfStats, it is an error to select more than one statistics buffer, so PFSTATS_ALL cannot be used as the select. If you specify two pfFrameStats structures, the buffer select is used for both structures; if you select multiple buffers, then each selected statistics class from each selected buffer is copied. The pfCopyFStats() routine allows you to copy between two different buffers of two pfFrameStats structures.

pfCopyFStats() takes explicit PFFSTATS_BUF_* selects for source and destination. Any PFFSTATS_BUF_* included with the class-enable bitmask is simply ignored, making it safe to specify PFSTATS_ALL. This routine will not accept a pfStats structure.
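The buffer-select rule described above (one buffer bit allowed, CUR assumed when none is given, more than one treated as an error) can be illustrated with a stand-alone bitmask check. The TOY_* values below are invented for this sketch; the real PFFSTATS_BUF_* and class-enable values are defined in the OpenGL Performer headers and are not reproduced here.

```c
#include <assert.h>

/* Hypothetical bit layout for illustration only. */
#define TOY_BUF_PREV   0x01000000u
#define TOY_BUF_CUR    0x02000000u
#define TOY_BUF_AVG    0x04000000u
#define TOY_BUF_CUM    0x08000000u
#define TOY_BUF_MASK   0x0f000000u
#define TOY_CLASS_MASK 0x00ffffffu

/* Returns the single selected buffer bit, the default (CUR) when no
 * buffer bit is present, or 0 when more than one buffer bit is set. */
static unsigned toySelectBuffer(unsigned which)
{
    unsigned buf = which & TOY_BUF_MASK;

    /* Sanity check of the toy layout: the two fields must not overlap. */
    assert((TOY_BUF_MASK & TOY_CLASS_MASK) == 0);

    if (buf == 0)
        return TOY_BUF_CUR;   /* no buffer given: fall back to CUR */
    if (buf & (buf - 1))
        return 0;             /* more than one bit set: error */
    return buf;
}

/* Returns only the class-enable bits of a combined select mask. */
static unsigned toyClassBits(unsigned which)
{
    return which & TOY_CLASS_MASK;
}
```

An all-ones mask sets every buffer bit at once, which is why the model rejects it in the same way the text says PFSTATS_ALL cannot be used as a single-buffer select.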

Querying

pfQueryStats() and pfMQueryStats() (and the corresponding pfFrameStats versions) copy values out of a pfStats or pfFrameStats structure into an exposed structure that your program can read directly.

These routines are useful when you want to use specific statistics for your own custom load management or for benchmarking, and you can use them to implement your own custom statistics utility routines. pfQueryStats() and pfMQueryStats() both take a pfStats (or pfFrameStats for pfQueryFStats() and pfMQueryFStats()) and return the number of bytes written to the provided destination buffer. pfQueryStats() takes a token that specifies a single query while pfMQueryStats() expects a token buffer for multiple queries. If an error is encountered, both query routines immediately halt and return with the total number of bytes successfully written.

There are specific tokens for querying individual values or entire classes of statistics. The query tokens are of the forms PFSTATSVAL_* and PFFSTATSVAL_*, and the corresponding exposed structure names are of the form pfStatsVal* and pfFStatsVal*. Queries on pfFrameStats structures with PFFSTATSVAL_* tokens expect a PFFSTATS_BUF_* select token to be ORed with the query select. It is an error to include more than one pfFrameStats buffer select token. If no buffer select token is provided, the CUR buffer will be queried. The statistics query tokens and structures are defined in prstats.h and pfstats.h.
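Because the query routines stop at the first error and return the bytes written so far, a caller can detect a failed query by comparing the return value with the size it expected. The following stand-alone model shows that pattern. Everything in it is hypothetical: the ToyStats type, the TOY_Q_* tokens, the zero-terminated token list, and toyMQuery() itself are stand-ins, not the OpenGL Performer API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical query tokens; TOY_Q_END terminates a token list. */
enum { TOY_Q_END = 0, TOY_Q_FRAME_TIME = 1, TOY_Q_TRI_COUNT = 2 };

/* Hypothetical stand-in for a statistics structure. */
typedef struct {
    float frameTime;
    float triCount;
} ToyStats;

/* Writes one float per recognized token into dst. Stops at the first
 * unknown token or when dst has no room left, and returns the number
 * of bytes written up to that point. */
static int toyMQuery(const ToyStats *s, const int *tokens,
                     void *dst, int size)
{
    char *out = dst;
    int written = 0;

    assert(s != NULL && tokens != NULL && dst != NULL);

    for (; *tokens != TOY_Q_END; ++tokens) {
        float v;

        if (*tokens == TOY_Q_FRAME_TIME)
            v = s->frameTime;
        else if (*tokens == TOY_Q_TRI_COUNT)
            v = s->triCount;
        else
            break;                       /* error: halt immediately */

        if (written + (int)sizeof v > size)
            break;                       /* destination is full */

        memcpy(out + written, &v, sizeof v);
        written += (int)sizeof v;
    }
    return written;
}
```

A caller that asks for two values and receives fewer than 2 * sizeof(float) bytes knows that the second query failed and that only the first value in the destination buffer is valid.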

Customizing Displays

The standard statistics displays have several parameters hard-wired. For instance, you cannot change the colors used in such displays. If you want to use different colors, you will have to use your own display routines.

Setting Update Rate

To set the frequency at which statistics are automatically collected, use pfFStatsAttr(). See the pfFrameStats(3pf) man page for details. If you want to turn off cumulative statistics collection (and, thus, running averages) entirely, set the update rate to zero. (Note that doing this will change your statistics display; in particular, your actual frame rate will be changed and other averages will not be displayed.)

The pfStats Data Structure

The pfStats data structure contains four statistics buffers: one for current statistics, one for previous statistics, one for cumulative statistics, and one for averages.

If you are using libpf calls to have OpenGL Performer keep track of statistics for you, you should always look at the previous-stats buffer; the current-stats buffer is kept in a state of flux, and if you look at it you are likely to find meaningless numbers there.

If, on the other hand, you are using libpr and keeping track of your own statistics, the current-stats buffer does contain accurate information.

Setting Statistics Class Enables and Modes

This section contains some examples of statistics calls.

  • Set all statistics class enables on a pfStats to their default values:

    
    pfStatsClass(stats, PFSTATS_ALL, PFSTATS_DEFAULT);
    

  • Set all modes for the PFSTATS_GFX class on a pfFrameStats to their default values:

    
    pfFStatsClassMode(fstats, PFSTATS_GFX, PFSTATS_ALL,
        PFSTATS_DEFAULT);
    

    Note that pfStatsClassMode() takes a class name as its class specifier (second argument) and not a bitmask. However, you can use PFSTATS_CLASS to refer to the modes of all classes.

  • Set all modes of all pfStats classes to their default values:

    
    pfFStatsClassMode(fstats, PFSTATS_CLASS, PFSTATS_ALL,
        PFSTATS_DEFAULT);
    

    For pfFrameStats classes, there is PFFSTATS_CLASS.

  • Set the entire class enable mask to PFSTATS_ALL, effectively enabling all statistics classes:

    
    pfFStatsClass(fstats, PFSTATS_ALL, PFSTATS_SET);
    

  • Force off all modes of the PFSTATS_GFX class of a pfStats:

    
    pfStatsClassMode(stats, PFSTATS_GFX, PFSTATS_OFF,
        PFSTATS_SET);
    

  • To track triangle strip lengths on a pfFrameStats, enable the graphics statistics class mode:

    
    pfFStatsClassMode(fstats, PFSTATS_GFX,
        PFSTATS_GFX_TSTRIP_LENGTHS, PFSTATS_ON);