Appendix C. Application Tuning

You can often increase system performance by tuning your applications to work more closely within your system's resource limits. If you are concerned about a decrease in your system's performance, first check your application software to see whether it is making the best use of the operating system. If you are using an application that you developed yourself, you can take steps to improve its performance. Even if a commercially purchased application is degrading system performance, you can identify the problem and use that information to make decisions about system tuning or new hardware, or even simply about when and how to use the application. The following sections explain how to examine and tune applications. For more detailed information on application tuning, see the MIPSpro Compiling and Performance Tuning Guide.

Checking Application Performance with timex

If your system seems slow (for example, an application runs slowly), first check the application. Poorly designed applications can perpetuate poor system performance. Conversely, an efficiently written application has smaller code and a shorter execution time.

The timex utility is a good place to start when you are trying to determine the source of the problem. timex reports how a particular application uses its CPU time. The format is:

timex -s program

This shows the program's real time (actual elapsed time), user time (time the process spent executing its own code), and sys time (time spent in kernel services for its system calls). For example:

timex -s ps -el

The above command executes ps -el and then displays the time that program spent:

real 0.95
user 0.08
sys 0.41
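The same three quantities can also be collected from inside a program. The following C sketch is a hypothetical helper (run_timed is not part of IRIX or timex) that runs a command and reports its elapsed, user, and system time using the POSIX fork, execvp, waitpid, getrusage, and clock_gettime calls. It assumes a POSIX system and does not reproduce the additional system activity data that timex -s collects.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result type: the real/user/sys triple that timex prints. */
struct run_times {
    double real;   /* elapsed wall-clock seconds */
    double user;   /* CPU seconds spent in the program's own code */
    double sys;    /* CPU seconds spent in the kernel on its behalf */
    int status;    /* raw status from waitpid() */
};

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: run argv[0] with arguments argv, wait for it, and
   fill *out. Returns 0 on success, -1 if the child could not be started
   or reaped. */
static int run_timed(char *const argv[], struct run_times *out)
{
    struct rusage before, after;
    struct timespec start, end;

    assert(argv != NULL && argv[0] != NULL);
    /* RUSAGE_CHILDREN accumulates over all reaped children, so take a
       snapshot before starting the child and subtract it afterwards. */
    if (getrusage(RUSAGE_CHILDREN, &before) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &start);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                       /* exec failed */
    }
    if (waitpid(pid, &out->status, 0) != pid)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &end);
    if (getrusage(RUSAGE_CHILDREN, &after) != 0)
        return -1;

    out->real = (double)(end.tv_sec - start.tv_sec)
              + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    out->user = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    out->sys  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}

/* Print in the same layout as the timex output shown above. */
static void print_times(const struct run_times *t)
{
    printf("real %.2f\nuser %.2f\nsys %.2f\n", t->real, t->user, t->sys);
}
```

For example, declaring char *args[] = {"ps", "-el", NULL}; and then calling run_timed(args, &t) followed by print_times(&t) times the same command as the timex example. A high sys figure relative to user points toward the system call guidelines below.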

Tuning an Application

There are many reasons why an application spends most of its time in either user or sys space. Two common causes, used as examples in the following sections, are excessive system calls and poor locality of code.

Typically, you can tune only applications that you are developing. Applications purchased for your system cannot be tuned in this manner, although you can usually report poor performance to the application vendor.

Guidelines for Reducing High User Time

If the application is primarily spending its time in user space, the first approach to take is to tune the application to reduce its user time by using the pixie and prof commands. See the respective man pages for more information about these commands. To reduce high user time, make sure that the program does the following:

  • Makes only the necessary number of system calls. Use timex -s to find out how many system calls per second (scall/s) the program makes, and try to keep that number to a minimum. System calls are calls such as read and exec; they are listed in section 2 of the man pages.

  • Uses buffers of at least 4K for read and write system calls, or uses the standard I/O library routines fread and fwrite, which buffer user data.

  • Uses shared memory rather than record locking where possible. With record locking, the system checks for a record lock on every read and write to the file. To improve performance, use shared memory and semaphores to control access to common data (see the shmop(2), semop(2), and usinit(3P) man pages).

  • Defines efficient search paths ($PATH variable). List the most frequently used directories first, and include only the required entries, so that infrequently used directories are not searched every time.

  • Eliminates polling loops (see the select(2) man page).

  • Eliminates busy waits (use sginap(0) to give up the processor rather than spinning).

  • Eliminates system errors. Look at /var/adm/SYSLOG, the system error log, to check for errors that the program generated, and try to eliminate them.
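To see why buffer size matters, the following C sketch copies a file with read() and write() and counts how many of those system calls it issues for a given buffer size. copy_fd and copy_file are hypothetical helper names chosen for illustration, and the sketch assumes a POSIX system. Copying a 64K regular file with a 16-byte buffer takes several thousand calls, while an 8K buffer needs only a handful.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy fd `in` to fd `out` through buf[bufsize],
   counting the read() and write() calls made (open/close are not counted).
   Returns bytes copied, or -1 on error. */
static long copy_fd(int in, int out, char *buf, size_t bufsize, long *syscalls)
{
    long total = 0;

    assert(bufsize > 0);
    *syscalls = 0;
    for (;;) {
        ssize_t n = read(in, buf, bufsize);
        ++*syscalls;
        if (n == 0)
            break;                          /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {  /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            ++*syscalls;
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += (long)n;
    }
    return total;
}

/* Hypothetical helper: open src and dst, copy with a bufsize-byte buffer,
   and clean up. Returns bytes copied, or -1 on error. */
static long copy_file(const char *src, const char *dst, size_t bufsize, long *syscalls)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    char *buf = malloc(bufsize);
    long total = (buf != NULL) ? copy_fd(in, out, buf, bufsize, syscalls) : -1;
    free(buf);
    close(in);
    close(out);
    return total;
}
```

Using fread and fwrite gives a similar benefit without managing the buffer yourself, because the standard I/O library collects small requests into larger read and write calls.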

Guidelines for Reducing Excessive Paging

Run timex again. If the application still spends most of its time in user or sys space, suspect excessive paging due to poor “locality” of text and data. An application with good locality of code executes instructions in a localized portion of its text space, for example within program loops and subroutines. In this case, try to reduce high user and sys time by making sure that the program does the following:

  • Groups its subroutines together. If often-used subroutines in a loaded program are mixed with seldom-used routines, the program could require more of the system's memory resources than if the routines were loaded in the order of likely use. This is because the seldom-used routines might be brought into memory as part of a page.

  • Has a working set that fits within physical memory. This minimizes the amount of paging and swapping the system must perform.

  • Has correctly ported Fortran-to-C code. Fortran arrays are laid out differently from C arrays: Fortran stores them in column-major order, while C uses row-major order. If you do not port the program correctly, the application will have poor data locality.
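When porting Fortran loops to C, the loop nest usually has to be inverted. A Fortran loop that runs down a column (first subscript varying fastest) touches consecutive memory, but the same loop order in C strides across rows. The following sketch, with hypothetical function names and an assumed fixed 512 x 512 array, shows the row-major order that keeps C's inner loop on adjacent elements, next to a direct transliteration of the Fortran order. Both compute the same sum; only the memory access pattern differs.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 512   /* hypothetical array dimensions */
#define COLS 512

/* C is row major: a[i][j] and a[i][j+1] are adjacent in memory, so the
   last subscript should vary fastest in the innermost loop. */
static double sum_row_major(double a[ROWS][COLS])
{
    double s = 0.0;

    assert(a != NULL);
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Literal translation of a Fortran column-order loop: each inner step
   jumps COLS * sizeof(double) bytes, touching a new cache line (and,
   for large arrays, possibly a new page) on almost every iteration. */
static double sum_fortran_order(double a[ROWS][COLS])
{
    double s = 0.0;

    assert(a != NULL);
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```

The fix when porting is simply to swap the loops (or transpose the subscripts) so that the innermost loop varies the last C subscript.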

After you tune your program, run timex again. If sys time is still high, tuning the operating system may help reduce this time.

Guidelines for Improving I/O Throughput

You can do a few other things to improve the application's I/O throughput. If you are on a single-user workstation, make sure that:

  • The application gains I/O bandwidth by using more than one drive (if applicable). If an application needs to concurrently do I/O on more than one file, try to set things up so that the files are in different filesystems, preferably on different drives and ideally on different controllers.

  • The application obtains an unfragmented file layout. Try to arrange the application so that only one file at a time is being written to the filesystem where it resides. That is, if you have several files to write to a filesystem and you can write them either one after another or concurrently, you get better space allocation (and consequently better I/O throughput) by writing them one after another.

If you are on a multiuser server, it is hard to control how other applications access the system. Use large I/O requests (16K or more). You may also be able to set up separate filesystems for different users. If timex reports high sys time, monitor the operating system to determine why this time is high.
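The following C sketch applies two pieces of advice from this section: it writes a set of files one after another rather than interleaving them, and it issues each write() as a request of up to 16K. The names write_chunked and write_files_one_at_a_time and the IO_CHUNK constant are hypothetical. The sketch assumes a POSIX system; it only controls the order and size of requests, and the actual on-disk allocation is still up to the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define IO_CHUNK (16 * 1024)   /* hypothetical request size: "16K or more" */

/* Hypothetical helper: write len bytes to fd in requests of at most
   IO_CHUNK bytes, resuming after short writes. Returns 0 or -1. */
static int write_chunked(int fd, const char *data, size_t len)
{
    size_t off = 0;

    assert(fd >= 0);
    while (off < len) {
        size_t want = len - off < IO_CHUNK ? len - off : IO_CHUNK;
        ssize_t n = write(fd, data + off, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted: retry */
            return -1;
        }
        off += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: finish each file before opening the next, so that
   only one file at a time is growing on the filesystem. */
static int write_files_one_at_a_time(const char *const paths[],
                                     const char *const data[],
                                     const size_t lens[], size_t nfiles)
{
    for (size_t i = 0; i < nfiles; i++) {
        int fd = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        int rc = write_chunked(fd, data[i], lens[i]);
        if (close(fd) != 0)
            rc = -1;
        if (rc != 0)
            return -1;
    }
    return 0;
}
```

Writing the files concurrently instead (for example, round-robin 16K pieces to each open file) would issue the same requests, but would let the filesystem interleave the files' blocks.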

Looking at Reordering an Application

Many applications have routines that are executed over and over. You can optimize program performance by modifying these heavily used routines in the source code. The following paragraphs describe the tools that can help tune your programs.

Analyzing Program Behavior with prof

Profiling lets you monitor program behavior during execution and determine how much time is spent in each routine of the program. There are two types of profiling:

  • Program counter (PC) sampling

  • Basic block counting

PC sampling is a statistical method that interrupts the program frequently and records the value of the program counter at each interrupt. Basic block counting, on the other hand, uses the pixie utility to modify the program module by inserting code at the beginning of each basic block (a sequence of instructions with a single entry point and no branches except possibly at its end) that counts the number of times each block is entered. Both types of profiling are useful; the primary difference is that basic block counting is deterministic, while PC sampling is statistical. To do PC sampling, compile the program with the -p option. When the resulting program is executed, it generates output files containing the PC sampling information, which you can then analyze with the prof(1) utility. prof and pixie are not shipped with the basic IRIX distribution; they are found in the optional IRIS Development Option software distribution.
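To make the idea of basic block counting concrete, the following C sketch hand-instruments a small function in the spirit of what pixie does to a binary: a counter is incremented at the start of each block. It is only an illustration. pixie works on the machine-level basic blocks of the compiled program, which do not map one-to-one onto source lines, and count_negatives and bb_counts are hypothetical names. Because the counts depend only on the input, running it twice on the same data gives identical counts, which is what makes this kind of profiling deterministic.

```c
#include <assert.h>

/* Hypothetical: one counter per source-level block of count_negatives(). */
enum { BB_ENTRY, BB_LOOP_TEST, BB_LOOP_BODY, BB_NEGATIVE, BB_EXIT, BB_COUNT };
static unsigned long bb_counts[BB_COUNT];

static int count_negatives(const int *v, int n)
{
    int neg = 0;
    int i = 0;

    bb_counts[BB_ENTRY]++;               /* entered once per call */
    assert(n >= 0);
    for (;;) {
        bb_counts[BB_LOOP_TEST]++;       /* loop test: n + 1 times per call */
        if (i >= n)
            break;
        bb_counts[BB_LOOP_BODY]++;       /* loop body: n times per call */
        if (v[i] < 0) {
            bb_counts[BB_NEGATIVE]++;    /* only for negative elements */
            neg++;
        }
        i++;                             /* folded into the body count here */
    }
    bb_counts[BB_EXIT]++;                /* return path */
    return neg;
}
```

For the input {3, -1, 4, -1, -5}, the counters record 1 entry, 6 loop tests, 5 loop bodies, 3 negative branches, and 1 exit; a report built from such counts shows which blocks dominate the run.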

Procedure C-1 describes how to do basic block counting:

Procedure C-1. Basic Block Counting

  1. Compile the program.

  2. Execute pixie on it to produce a new binary file that contains the extra instructions to do the counting.

    When the resulting program is executed, it produces output files that are then used with prof to generate reports of the number of cycles consumed by each basic block.

  3. Use the output of prof to analyze the behavior of the program and optimize the algorithms that consume the majority of the program's time.

Refer to the cc(1), f77(1), pixie(1), and prof(1) man pages for more information about the mechanics of profiling.

Reordering a Program with pixie

User program text is demand-loaded a page at a time (currently 4K). Thus, when a reference is made to an instruction that is not currently in memory and mapped to the user's address space, the encompassing page of instructions is read into memory and then mapped into the user's address space. If often-used subroutines in a loaded program are mixed with seldom-used routines, the program can require more of the system's memory resources than if the routines are loaded in the order of likely use. This is because the seldom-used routines might be brought into memory as part of a page of instructions from another routine.
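A toy model shows the effect. The following C sketch lays out routines back to back in a given order and counts how many 4K text pages must be resident to run all the frequently used ("hot") routines; it then moves the hot routines to the front. The routine sizes and hot/cold flags are made up for illustration, and real reordering tools work from measured execution counts rather than a yes/no flag.

```c
#include <assert.h>
#include <stddef.h>

#define TEXT_PAGE_SIZE 4096UL   /* page size quoted in the text */

struct routine {
    size_t size;   /* bytes of text (hypothetical values) */
    int hot;       /* nonzero if called often */
};

/* Number of distinct text pages that hold at least one hot routine when
   routines r[0..n-1] are laid out back to back from address 0. */
static size_t hot_pages(const struct routine *r, size_t n)
{
    size_t addr = 0, pages = 0, next_uncounted = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].hot && r[i].size > 0) {
            size_t first = addr / TEXT_PAGE_SIZE;
            size_t last = (addr + r[i].size - 1) / TEXT_PAGE_SIZE;
            if (first < next_uncounted)
                first = next_uncounted;    /* page already counted */
            if (last >= first)
                pages += last - first + 1;
            if (last + 1 > next_uncounted)
                next_uncounted = last + 1;
        }
        addr += r[i].size;
    }
    return pages;
}

/* Stable reorder into dst: hot routines first, then cold ones. */
static void reorder_hot_first(const struct routine *src, struct routine *dst, size_t n)
{
    size_t k = 0;

    assert(src != dst);                    /* not an in-place reorder */
    for (size_t i = 0; i < n; i++)
        if (src[i].hot)
            dst[k++] = src[i];
    for (size_t i = 0; i < n; i++)
        if (!src[i].hot)
            dst[k++] = src[i];
}
```

With three 1000-byte hot routines separated by 6000-byte cold routines, the original order spreads the hot code over three pages; after reordering, all of it fits in one page.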

Tools are available to analyze the execution history of a program and rearrange the program so that the routines are loaded in most-used order (according to the recorded execution history). These tools include pixie, prof, and cc. By using these tools, you can maximize the cache hit ratio (checked by running sar -b) or minimize paging (checked by running sar -p), and effectively reduce a program's execution time. Procedure C-2 describes how to reorganize a program named fetch.

Procedure C-2. Reordering a Program

  1. Execute the pixie command, which adds profiling code to fetch:

    pixie fetch
    

    This creates an output file, fetch.pixie, and a file that contains basic block addresses, fetch.Addrs.

  2. Run fetch.pixie (created in the previous step) on one or more representative sets of input data. This creates the file named fetch.Counts, which contains the basic block counts.

  3. Create a feedback file that the compiler passes to the loader. Do this by executing prof:

    prof -pixie -feedback fbfile fetch fetch.Addrs fetch.Counts
    

    This produces a feedback file named fbfile.

  4. Compile the program with the original flags and options, and add the following two options:

    -feedback fbfile
    

For more information, see the prof(1) and pixie(1) man pages.

Working around Slow Commercial Applications

You cannot usually tune commercially available applications to any great degree. If your monitoring has told you that a commercially purchased application is causing your system to run at unacceptably slow levels, you have a few options:

  • You can look for other areas in which to reduce system overhead and increase speed, such as reducing the system load elsewhere to compensate for your application. Running files and programs as batch jobs when system load permits often produces a noticeable increase in performance. See “Task Scheduling with the at, batch, and cron Commands” in Chapter 2.

  • You can use the nice, renice, npri, and runon utilities to change the priority of other processes to give your application a greater share of CPU time. See “Prioritizing Processes” and “Changing the Priority of a Running Process” in Chapter 7.

  • You can undertake a general program of system performance enhancement, which can include maximizing operating system I/O through disk striping and increased swap space. See the IRIX Admin: Disks and Filesystems guide.

  • You can add memory or disk space, or even upgrade to a faster CPU.

  • You can find another application that performs the same function but that is less intensive on your system. (This is the least preferable option, of course.)