About This Guide

This guide describes tuning procedures for best performance for programs that run on the SGI Origin 2000, Onyx2, and Origin 200 multiprocessor systems. The material is meant for two different uses:

This guide also contains a glossary of terms related to performance tuning and to the hardware concepts of SN0 (Scalable Node 0, the name of the SGI server architecture).

Who Can Benefit from This Guide

This guide is written for experienced programmers who are familiar with IRIX commands and with either the C or the Fortran programming language. The focus is on achieving the highest possible performance by exploiting the features of IRIX, the MIPS R10000, R12000, or R14000 CPU, and the SN0 architecture. Many of the figures and tables contain MIPS R10000 examples, but they are meant to apply to the MIPS R12000 and R14000 architectures as well.

The material assumes that you know the basics of software engineering and that you are familiar with standard methods and data structures. If you are new to software design, to UNIX, to IRIX, or to SGI hardware, this guide will not help you learn these things.

What the Guide Contains

Chapter 1, “Understanding SN0 Architecture”, describes the features of the SN0 architecture that affect performance, in particular the cache-coherent nonuniform memory architecture (CC-NUMA).

Chapter 2, “SN0 Memory Management”, reviews general programming issues for these systems and the programming practices that lead to good (and bad) performance.

Chapter 3, “Tuning for a Single Process”, covers tuning for single-process performance in detail, showing how to take best advantage of the R10000, R12000, and R14000 CPU and cache memory, how to use the profiling tools, and how to select among the many compiler options.

Chapter 4, “Profiling and Analyzing Program Behavior”

Chapter 5, “Using Basic Compiler Optimizations”

Chapter 6, “Optimizing Cache Utilization”

Chapter 7, “Using Loop Nest Optimization”

Chapter 8, “Tuning for Parallel Processing”, discusses tuning issues for parallel programs, including points on how to avoid cache contention and how to distribute virtual memory segments to different nodes.

Appendix A, “Bentley's Rules Updated”, is a summary of the performance-tuning guidelines first published by Jon Bentley in the out-of-print classic Writing Efficient Programs, updated for the modern world of superscalar CPUs and multiprocessors.

Appendix B, “R10000 Counter Event Types”, describes the meanings of the event counter registers in the R10000 CPU and their use for tuning.

Appendix C, “Useful Scripts and Code”, contains several longer examples and scripts mentioned in the text.

Related Documents

The material covered in this book is related to other works in the SGI library.

Related Manuals

All of the following books can be read online from the Tech Pubs Library at http://techpubs.sgi.com/library.

Hardware Manuals

  • MIPS R10000 Microprocessor User Guide, Version 2.0, is the authoritative guide to the internal operations of the CPU chip used in SN0 systems.

  • Origin and Onyx2 Theory of Operations Manual covers the basic design of the SN0 architecture.

  • Origin and Onyx2 Programmer's Reference Manual has additional details of SN0 physical and virtual addressing and other topics.

Compiler Manuals

  • MIPSpro Compiling and Performance Tuning Guide covers compiler and linker use that is common to all the compilers, including the many optimization directives and command-line options.

  • MIPSpro 64-Bit Porting and Transition Guide discusses the problems that arise when porting from a 32-bit to a 64-bit computing environment, and has some discussion of optimization features.

  • The Fortran compilers are documented in MIPSpro Fortran 77 Programmer's Guide and MIPSpro 7 Fortran 90 Commands and Directives Reference Manual. These books address general run-time issues, have some discussion of performance tuning, and document compiler directives, including the OpenMP directives for parallel processing.

  • MIPSpro C and C++ Pragmas covers parallelization and other directives for C programming.

Software Tool Manuals

  • SpeedShop User's Guide documents the tuning and profiling tools mentioned in this book.

  • Topics in IRIX Programming details the available models for parallel programming and documents a number of advanced programming topics.

  • Message Passing Toolkit: MPI Programmer's Manual and Message Passing Toolkit: PVM Programmer's Manual document the use of these popular libraries for parallel programming.

  • IRIX Admin: System Configuration and Operation documents the commands the system administrator uses, including the system tuning variables.

Third-Party Resources

The foundation of the SN0 multiprocessor design is explained in Scalable Shared-Memory Multiprocessing by Daniel Lenoski and Wolf-Dietrich Weber (San Francisco: Morgan Kaufmann, 1995).

A good book on parallel programming is Practical Parallel Programming by Barr E. Bauer (Academic Press, 1992; ISBN 0120828103). Although it is not current for SGI compilers and SN0 hardware, it has good conceptual material.

Courses and information about parallel and distributed programming are available on the Internet. The following are some useful links:

  • The Boston University Scientific Computing and Visualization Group offers a number of useful tutorials on such topics as parallel programming in Fortran 90 and the use of MPI. The URL is http://scv.bu.edu/SCV/Tutorials.

  • The web page for the Computational Science and Engineering Graduate Option Program at the University of Illinois at Urbana-Champaign links to the lecture notes for courses on parallel computation and parallel numerical algorithms. The URL is http://www.cse.uiuc.edu.

  • The entire text of Designing and Building Parallel Programs by Ian Foster (Addison-Wesley 1995; ISBN 0-201-57594-9) is available online, with a wealth of supplementary material and links related to parallel programming. The URL is http://www.mcs.anl.gov/dbpp.

Related Man Pages

The man pages for the compilers and tools are detailed and informative. Look up the following man pages using the InfoSearch facility (under IRIX 6.5, found in the desktop Toolchest menu under Help > Man Pages). From the InfoSearch window you can print copies of these pages for study and for reference.

  • cc(1), CC(1), f77(1), and f90(1) each document the operation and main option groups for one compiler. These pages are very similar because most options are used by the common back end and linker.

  • ipa(5) documents the -IPA option subgroup, controlling the interprocedural analysis phase of all compilers.

  • lno(5) documents the -LNO option group, controlling loop-nest optimization for all compilers.

  • opt(5) documents the -OPT option group, controlling general optimizations for all compilers.

  • math(3m) details the standard math library used by all programs. Specially tuned libraries are described in libfastm(3m).

  • ld(1) documents the linker; rld(1) documents the runtime linker; and dso(5) documents the format of dynamic shared objects (runtime-linkable libraries).

  • The auto-parallelizing feature is documented in apo(5).

Text Conventions

Different text fonts are used in this book to indicate different kinds of information, as shown in the following table:

Purpose | Example

Terms that are defined in the Glossary. You can click such terms to link to their definitions. | This performance problem is generically referred to as cache contention.

Names of IRIX commands and command-line options. | Compile with cc -LNO:off. Check the CPU clock rate with hinv.

Names of filesystems, paths, linkable libraries, and files. | Examine /etc/config. Devices appear in the /hw filesystem.

Names of routines, functions, and procedures when used as names. | Two common library functions are printf() and wait().

User input and program statements or expressions, when they must be typed exactly as shown. | Use c$doacross mp_schedtype=simple to parallelize with the basic scheduling. Enter y when prompted.

Program and mathematical variables used as names, and variable elements of program expressions. | Applying p CPUs to a program does not result in a speedup of p times. The feedback file is written as program.n.fb.


Obtaining Publications

To obtain SGI documentation, go to the SGI Technical Publications Library at:

http://techpubs.sgi.com

Reader Comments

If you have comments about the technical accuracy, content, or organization of this document, please tell us. Be sure to include the title and document number of the manual with your comments. (Online, the document number is located in the front matter of the manual. In printed manuals, the document number is located at the bottom of each page.)

You can contact us in any of the following ways:

  • Send e-mail to the following address:

    techpubs@sgi.com
    

  • Use the Feedback option on the Technical Publications Library World Wide Web page:

    http://techpubs.sgi.com
    

  • Contact your customer service representative and ask that an incident be filed in the SGI incident tracking system.

  • Send mail to the following address:

    Technical Publications
    SGI
    1200 Crittenden Lane, M/S 3-535
    Mountain View, California 94043-1351

  • Send a fax to the attention of “Technical Publications” at +1 650 932 0801.

We value your comments and will respond to them promptly.