Index

64-bit address space
Selecting an ABI and ISA

adi2 example program
Program adi2

aliasing models
Understanding Aliasing Models

Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
awk script for
Awk Script for Amdahl's Law Estimation
execution time given n and p
Predicting Execution Time with n CPUs
parallel fraction p
Understanding Amdahl's Law
parallel fraction p given speedup(n)
Calculating the Parallel Fraction of a Program
speedup(n) given p
Understanding Amdahl's Law
superlinear speedup
Understanding Superlinear Speedup

application binary interface (ABI)
Selecting an ABI and ISA
64-bit
64-Bit ABI
new 32-bit
New 32-Bit ABI
old 32-bit
Old 32-Bit ABI

arithmetic error
Understanding Arithmetic Standards

array padding
Using Array Padding to Prevent Thrashing
Diagnosing and Eliminating Cache Thrashing
Using Array Padding

auto-parallelizing
Compiling Serial Code for Parallel Execution

Bentley, Jon
Bentley's Rules Updated

cache
and hardware event counter
Primary Cache Use
blocking
Understanding Cache Blocking
Controlling Cache Blocking
cache miss
Understanding Level-One and Level-Two Cache Use
coherent
Understanding Cache Coherency
Cache Coherency Events
compiler's model of
Adjusting the Optimizer's Cache Model
contention in
Diagnosing Cache Problems
correcting
Correcting Cache Contention in General
event 31 reveals
Diagnosing Cache Problems
Identifying False Sharing
diagnosing problems in
Identifying Cache Problems with Perfex and SpeedShop
Diagnosing Cache Problems
directory-based
Memory Overhead Bits
Understanding Directory-Based Coherency
false sharing of
Identifying False Sharing
L1
Level-1 Cache
Understanding Level-One and Level-Two Cache Use
Primary Cache Use
L2
Level-Two Cache
Understanding Level-One and Level-Two Cache Use
Secondary Cache Use
line size
Understanding Level-One and Level-Two Cache Use
data structure blocking for
Data Structure Augmentation
on-chip
Cache Architecture
operation of
Understanding Cache Coherency
Understanding Directory-Based Coherency
Understanding Level-One and Level-Two Cache Use
principles of use
Principles of Good Cache Use
proper use of
Principles of Good Cache Use
Using Other Cache Techniques
array padding
Using Array Padding
blocking data for
Understanding Cache Blocking
Controlling Cache Blocking
grouping related data for
Grouping Data Used at the Same Time
loop fusion for
Understanding Loop Fusion
parallel execution issues
Diagnosing Cache Problems
stride-one access for
Using Stride-One Access
transposition for
Understanding Transpositions
set-associative
Understanding Level-One and Level-Two Cache Use
thrashing in
Understanding Cache Thrashing
snoopy
Coherency Methods
thrashing
Understanding Cache Thrashing
Diagnosing and Eliminating Cache Thrashing

cache coherence
and hardware event counter
Cache Coherency Events

cache coherency
Understanding Cache Coherency

cache line
Understanding Level-One and Level-Two Cache Use

call hierarchy profile
Profiling the Call Hierarchy

compiler directive
See directive

compiler feedback file
Creating a Compiler Feedback File

compiler flag
See compiler option

compiler option
-32
Old 32-Bit ABI
-64
64-Bit ABI
recommended
Understanding Compiler Options
-apo
Compiling an Auto-Parallel Version of a Program
-check_bounds
Computational Differences
Using Array Padding
-clist
Reading the Transformation File
default
Understanding Compiler Options
-fb
Creating a Compiler Feedback File
Passing a Feedback File
-flist
Reading the Transformation File
for cache model
Adjusting the Optimizer's Cache Model
IEEE_arithmetic
Exploit Algebraic Identities
-INLINE
Using Manual Inlining
Using Automatic Inlining
-IPA
Requesting IPA
forcedepth
Using Automatic Inlining
inline
Using Automatic Inlining
space
Using Automatic Inlining
-LNO
Using Loop Nest Optimization
blocking
Adjusting Cache Blocking Block Sizes
fission
Controlling Fission and Fusion
gather_scatter
Understanding Gather-Scatter
ignore_pragmas
Requesting LNO
interchange=off
Using Loop Interchange
outer_unroll
Controlling Loop Unrolling
prefetch
Controlling Prefetching
vintr
Vector Intrinsics
-mips3
New 32-Bit ABI
-mips4
New 32-Bit ABI
Recommended Starting Options
-n32
New 32-Bit ABI
Recommended Starting Options
-On
Setting Optimization Level with -On
-O2
Recommended Starting Options
-O3
for SWP
Enabling Software Pipelining with -O3
-Ofast
versus -O3
Compile -O3 or -Ofast for Critical Modules
-Olimit
Using Automatic Inlining
-OPT
alias
Understanding Aliasing Models
cray_ivdep
Breaking Other Dependencies
IEEE_arithmetic
Recommended Starting Options
IEEE Conformance
IEEE_NaN_inf
IEEE Conformance
liberal_ivdep
Breaking Other Dependencies
reorg_common
Using Array Padding
roundoff
Roundoff Control
-r10000
Standard Math Library
Setting Target System with -TARG
-r5000
Standard Math Library
Setting Target System with -TARG
-r8000
Standard Math Library
Setting Target System with -TARG
roundoff
Exploit Algebraic Identities
-S
Reading Software Pipelining Messages
-static
Uninitialized Variables
-TARG
Setting Target System with -TARG
-TENV
Profiling Exception Frequency
X
Controlling the Level of Speculation

copying
to reduce TLB thrashing
Using Copying to Circumvent TLB Thrashing

correctness
Getting the Right Answers

CPU
See MIPS CPU

CrayLink
Hub and NUMAlink

data distribution
Using Data Distribution Directives
and dplace
Using _DSM_VERBOSE
directives for
Understanding Directive Syntax
Distribute directive
Using Distribute for Loop Parallelization
mapping types
Understanding Distribution Mapping Options
ONTO clause
Understanding the ONTO Clause
page placement
Using the Page_Place Directive for Custom Mappings
redistribution
Understanding the Redistribution Directives
reshaped
Using Reshaped Distribution Directives
restrictions
Restrictions of Reshaped Distribution

data placement
Scalability and Data Placement
for libmp programs
Tuning Data Placement for MP Library Programs
modifying code for
Modifying the Code to Tune Data Placement

DAXPY
Understanding Software Pipelining
and alias model
Understanding Aliasing Models
loop fusion of
Understanding Loop Fusion
with indirection
Breaking Other Dependencies

debugging
possible with -O2
Start with -O2 for All Modules
use -O0 for
Use -O0 for Debugging

dependency
Breaking Other Dependencies

directive
blocking size
Adjusting Cache Blocking Block Sizes
for data distribution
Fortran Source with Directives
Using Data Distribution Directives
Distribute
Using Distribute for Loop Parallelization
page place
Using the Page_Place Directive for Custom Mappings
syntax
Understanding Directive Syntax
for loop interchange
Using Loop Interchange
for loop nest optimizer
Requesting LNO
for loop unrolling
Controlling Loop Unrolling
for parallel execution
Fortran Source with Directives
affinity clause
Using Parallel Do with Distributed Data
Understanding the AFFINITY Clause for Data
Understanding the AFFINITY Clause for Threads
nest clause
Understanding the NEST Clause
for prefetching
Controlling Prefetching
ivdep
Breaking Other Dependencies
OpenMP
Fortran Source with Directives

dlook
Applying dlook

dplace
Non-MP Library Programs and Dplace
disables data distribution directives
Using _DSM_VERBOSE
enable migration with
Enabling Page Migration
library interface to
Using the dplace Library for Dynamic Placement
not for use with libmp
Non-MP Library Programs and Dplace
placement file
Placement File Syntax
distribute statement
Assigning Threads to Memories
memories statement
Using the memories Statement
threads statement
Using the threads Statement
set page size with
Changing the Page Size
Using Larger Page Sizes to Reduce TLB Misses
specify topology with
Specifying the Topology
with MPI
Using dplace with MPI 3.1

dprof
Applying dprof

dynamic page migration
Dynamic Page Migration
Enabling Page Migration
Trying Dynamic Page Migration
administration
Trying Dynamic Page Migration
enabling
Trying Dynamic Page Migration

environment variable
_DSM_MIGRATION
Trying Dynamic Page Migration
Experimenting with Migration Levels
_DSM_PPM
Advanced Options
_DSM_ROUND_ROBIN
Trying Round-Robin Placement
_DSM_VERBOSE
Using _DSM_VERBOSE
for SpeedShop
Identifying False Sharing
in dplace placement file
Using Environment Variables in Placement Files
MP_SET_NUMTHREADS
Controlling a Parallelized Program at Run Time
MPI_DSM_OFF
Using dplace with MPI 3.1
PAGESIZE_*
Using Larger Page Sizes to Reduce TLB Misses
SGI_ABI
Specifying the ABI
SpeedShop use of
Sampling Through Other Hardware Counters
TRAP_FPE
Understanding Treatment of Underflow Exceptions

event counter
See hardware event counter

exception
event counter overflow
R10000 Counter Event Types
from speculative execution
Permitting Speculative Execution
handling
Using Exception Profiling
profiling occurrence of
Using Exception Profiling
TLB miss
Understanding TLB and Virtual Memory Use
underflow
Understanding Treatment of Underflow Exceptions

exception profile
Using Exception Profiling

false sharing
Memory Contention
Identifying False Sharing

fast Fourier transform (FFT)
Understanding Transpositions
data placement for
First-Touch Placement with Multiple Data Distributions

feedback file
Creating a Compiler Feedback File
use of
Passing a Feedback File

FFT
See fast Fourier transform (FFT)

first-touch placement
Using First-Touch Placement
Programming For First-Touch Placement

floating-point exception
See exception

floating-point status register (FSR)
Understanding Treatment of Underflow Exceptions

graduated instruction
Graduated Instructions

hardware event counter
R10000 Counter Event Types
branch instructions
Branching Instructions
cache coherency
Cache Coherency Events
cache use
Primary Cache Use
clock cycles
Clock Cycles
event 21
Displaying Operation Counts
Finding and Removing Memory Access Problems
event 31
Sampling Through Other Hardware Counters
Finding and Removing Memory Access Problems
Diagnosing Cache Problems
Identifying False Sharing
event 4
Finding and Removing Memory Access Problems
instruction counts
Instructions Issued and Done
lock instructions
Lock-Handling Instructions
profiling from
Sampling through Hardware Event Counters
Sampling Through Other Hardware Counters
TLB miss
Virtual Memory Use

hardware graph
Indicating Resource Affinity

hardware trap
See exception, page fault, TLB

hub
SN0 Organization
Hub and NUMAlink
cache coherency support
Understanding Directory-Based Coherency

hypercube
SN0 Organization
SN0 Memory Distribution

ideal time profile
Using Ideal Time Profiling

IEEE 754
Understanding Arithmetic Standards
versus optimization
IEEE Conformance

IEEE arithmetic
Understanding Arithmetic Standards

inlining
Understanding Inlining
automatic versus manual
Understanding Inlining
manual with -INLINE
Using Manual Inlining

instruction scheduling
Setting Target System with -TARG
Understanding Software Pipelining

instruction set architecture (ISA)
MIPS I
Old 32-Bit ABI
MIPS II
Old 32-Bit ABI
MIPS III
Old 32-Bit ABI
New 32-Bit ABI
MIPS IV
MIPS IV Instruction Set Architecture
New 32-Bit ABI

interprocedural analysis (IPA)
Exploiting Interprocedural Analysis
applied during link step
Compiling and Linking with IPA
features of
Exploiting Interprocedural Analysis
requesting
Requesting IPA

-IPA
See compiler option, -IPA

IRIX
memory management in
SN0 Memory Management
porting to
Dealing with Porting Issues

lazy evaluation
Lazy Evaluation

ld
performs IPA
Compiling and Linking with IPA

library
BLAS
CHALLENGEcomplib Library
SCSL Library
CHALLENGEcomplib
Exploiting Existing Tuned Code
CHALLENGEcomplib Library
EISPACK
CHALLENGEcomplib Library
LAPACK
CHALLENGEcomplib Library
SCSL Library
libc
Standard Math Library
libfastm
Exploiting Existing Tuned Code
libfastm Library
Recommended Starting Options
libfpe
Using Exception Profiling
Understanding Treatment of Underflow Exceptions
libmp
Controlling a Parallelized Program at Run Time
conflicts with dplace
Non-MP Library Programs and Dplace
data placement with
Tuning Data Placement for MP Library Programs
page migration with
Trying Dynamic Page Migration
Experimenting with Migration Levels
page size control
Using Larger Page Sizes to Reduce TLB Misses
round-robin placement with
Trying Round-Robin Placement
LINPACK
CHALLENGEcomplib Library
SCSL
Exploiting Existing Tuned Code
SCSL Library

library routine
bzero
Initializing to Zero
calloc
Initializing to Zero
dplace_file
Using the dplace Library for Dynamic Placement
dplace_line
Using the dplace Library for Dynamic Placement
dsm_home_threadnum
Using Dynamic Placement Information
handle_sigfpes
Using Exception Profiling
sasum
Using Reshaped Distribution Directives
sscal
Using Reshaped Distribution Directives

-LNO
See loop nest optimizer (LNO) and compiler option -LNO

loop fission
Using Loop Fission

loop fusion
by LNO
Using Loop Fusion
manual
Understanding Loop Fusion

loop interchange
Using Loop Interchange
disabling
Using Loop Interchange

loop nest optimizer (LNO)
Using Loop Nest Optimization
cache blocking by
Controlling Cache Blocking
controlling
Adjusting Cache Blocking Block Sizes
disable loop transformation
Requesting LNO
gather-scatter by
Understanding Gather-Scatter
loop fission by
Using Loop Fission
loop fusion by
Using Loop Fusion
loop interchange
Using Loop Interchange
loop unrolling
Using Outer Loop Unrolling
prefetching by
Prefetch Overhead and Unrolling
requesting
Requesting LNO
transformed source file
Reading the Transformation File
vector intrinsic transformation
Vector Intrinsics

loop peeling
Using Loop Fusion

loop unrolling
and roundoff
Roundoff Control
and SWP
Using Outer Loop Unrolling
by loop nest optimizer (LNO)
Using Outer Loop Unrolling
with loop interchange
Combining Loop Interchange and Loop Unrolling

makefile
example
Basic Makefile
use of
Using a Makefile

math libraries
Exploiting Existing Tuned Code
vector intrinsics
Standard Math Library

matrix multiply
cache blocking of
Controlling Cache Blocking
loop unrolling of
Using Outer Loop Unrolling
memory use in
Understanding Cache Blocking
performance of
Understanding Cache Blocking

memory
64-bit addressing
Selecting an ABI and ISA
administrator setup
Using Larger Page Sizes to Reduce TLB Misses
Trying Dynamic Page Migration
bus-based
Memory for Multiprocessors
Scalability in Multiprocessors
cache directory bits
Memory Overhead Bits
contention for
Memory Contention
distributed versus shared
Shared Memory Multiprocessing
error correction bits
Memory Overhead Bits
hierarchy
Understanding the Levels of the Memory Hierarchy
latency of
SN0 Latencies and Bandwidths
Degrees of Latency
locality management
Memory Locality Management
management by IRIX
SN0 Memory Management
page fault
Understanding TLB and Virtual Memory Use
paged virtual
Understanding TLB and Virtual Memory Use
parallel execution tuning
Finding and Removing Memory Access Problems
physical address display
Page Address Routine va2pa()
placement
first-touch
Using First-Touch Placement
Programming For First-Touch Placement
round-robin
Using Round-Robin Placement
Trying Round-Robin Placement
prefetching
Understanding Prefetching
Using Prefetching
stride
Using Stride-One Access
virtual
Understanding Level-One and Level-Two Cache Use
See also page

memory locality domain (MLD)
Memory Locality Management
Memory Locality Domain Use

memory locality domain set (MLDS)
Memory Locality Domain Use

Message-Passing Interface (MPI)
Message-Passing Models MPI and PVM
dplace with
Using dplace with MPI 3.1
perfex with
Using perfex with MPI

MIPS CPU
architecture of
Understanding MIPS R10000 Architecture
Understanding Prefetching
event counters in
R10000 Counter Event Types
issued versus graduated instruction
Graduated Instructions
off-chip cache
Level-Two Cache
on-chip cache
Cache Architecture
out-of-order execution
Executing Out of Order
R10000
speculative execution
Hardware Speculative Execution
underflow control
Understanding Treatment of Underflow Exceptions
R4000
Specifying the ABI
R8000
Specifying the ABI
Software Speculative Execution
Dealing with Software Pipelining Failures
underflow ignored on
Understanding Treatment of Underflow Exceptions
specify to compiler
Standard Math Library
speculative execution
Speculative Execution
superscalar features
Superscalar CPU Features
See also hardware event counter

MIPS IV ISA
MIPS IV Instruction Set Architecture
and IEEE 754
IEEE Conformance
prefetch in
Understanding Prefetching

MP library
See library, libmp

MPI
See Message-Passing Interface (MPI)

mpirun
with perfex
Using perfex with MPI

node
SN0 Organization
SN0 Node Board
CPU in
CPUs and Memory

nonuniform memory access (NUMA)
SN0 Memory Distribution
Dealing With Nonuniform Access Time
and parallel program
Parallel Programs under NUMA
and single-threaded program
Single-Threaded Programs under NUMA

numeric error
Understanding Arithmetic Standards

OpenMP directives
Fortran Source with Directives
C pragmas for
C and C++ Source with Pragmas

-OPT
See compiler option, -OPT

optimization level
Setting Optimization Level with -On

out-of-order execution
Executing Out of Order

packing
Packing

page
Understanding TLB and Virtual Memory Use
migration of
Dynamic Page Migration
Enabling Page Migration
Trying Dynamic Page Migration
size of
Dynamic Page Migration
Policy Modules
Single-Threaded Programs under NUMA
Using Larger Page Sizes to Reduce TLB Misses
set with dplace
Changing the Page Size
valid sizes
Using Larger Page Sizes to Reduce TLB Misses

page fault
Understanding TLB and Virtual Memory Use

parallel execution
affinity clause
Using Parallel Do with Distributed Data
Understanding the AFFINITY Clause for Data
Understanding the AFFINITY Clause for Threads
Amdahl's law
Understanding Parallel Speedup and Amdahl's Law
auto-parallelizing
Compiling Serial Code for Parallel Execution
data placement for
Scalability and Data Placement
memory access tuning for
Finding and Removing Memory Access Problems
nest clause
Understanding the NEST Clause
parallel fraction p
Understanding Amdahl's Law
Ensuring That the Program Is Properly Parallelized
programming models for
Explicit Models of Parallel Computation
scalability of
Scalability in Multiprocessors
Scalability and Data Placement
topology
Specifying the Topology
tuning SN0 for
Tuning Parallel Code for SN0

perfex
Analyzing Performance with perfex
absolute event counts
Taking Absolute Counts of One or Two Events
analytic output
Getting Analytic Output with the -y Option
awk script to parse
Awk Script for Perfex Output
cache use analysis
Identifying Cache Problems with Perfex and SpeedShop
library interface
Collecting Data over Part of a Run
statistical counts
Taking Statistical Counts of All Events

performance
aphorisms about
Bentley's Rules Updated
of matrix multiply
Understanding Cache Blocking
of parallel program
Parallel Programs under NUMA
of single-threaded program
Single-Threaded Programs under NUMA

performance techniques
algebraic identities
Exploit Algebraic Identities
array padding
Using Array Padding to Prevent Thrashing
Using Array Padding
avoiding tests
Combining Tests
cache blocking
Understanding Cache Blocking
Controlling Cache Blocking
caching
Principles of Good Cache Use
code motion
Code Motion Out of Loops
combining related functions
Combine Paired Computation
common block padding
Exploiting Interprocedural Analysis
common subexpressions
Eliminate Common Subexpressions
constant propagation
Exploiting Interprocedural Analysis
copying
Using Copying to Circumvent TLB Thrashing
coroutines
Use Coroutines
data structure augmentation
Data Structure Augmentation
dead function elimination
Exploiting Interprocedural Analysis
dead variable elimination
Exploiting Interprocedural Analysis
gather-scatter
Understanding Gather-Scatter
inlining
Exploiting Interprocedural Analysis
Collapse Procedure Hierarchies
interpreters
Interpreters
lazy evaluation
Lazy Evaluation
loop fission
Using Loop Fission
loop fusion
Understanding Loop Fusion
Using Loop Fusion
Loop Fusion
loop interchange
Using Loop Interchange
loop unrolling
Using Outer Loop Unrolling
Loop Unrolling
packing
Packing
precomputation
Store Precomputed Results
Precompute Logical Functions
prefetching
Using Prefetching
recursion elimination
Transform Recursive Procedures
short-circuiting
Short-Circuit Monotone Functions
software pipelining
Understanding Software Pipelining
speculative execution
Permitting Speculative Execution
transposition
Understanding Transpositions

policy module (PM)
Memory Locality Management
Policy Modules

Parallel Virtual Machine (PVM)
Message-Passing Models MPI and PVM

POSIX threads
C Source Using POSIX Threads

pragma
See directive

precomputation
Store Precomputed Results

prefetching
Understanding Prefetching
Using Prefetching
controlling
Controlling Prefetching
manual
Using Manual Prefetching
overhead of
Prefetch Overhead and Unrolling
pseudo
Using Pseudo-Prefetching

prof
default report
Displaying Profile Reports from Sampling
feedback file
Creating a Compiler Feedback File
ideal time report
Default Ideal Time Profile
line numbers off with opt
Including Line-Level Detail
option -archinfo
Displaying Operation Counts
option -butterfly
Displaying Ideal Time Call Hierarchy
option -feedback
Creating a Compiler Feedback File
Passing a Feedback File
option -heavy
Displaying Profile Reports from Sampling
Including Line-Level Detail
option -lines
Including Line-Level Detail
simplifying report
Removing Clutter from the Report

profiling
address space usage
Using Address Space Profiling
cache usage
Identifying Cache Problems with Perfex and SpeedShop
call hierarchy
Profiling the Call Hierarchy
ideal time for
Using Ideal Time Profiling
Identifying Cache Problems with Perfex and SpeedShop
opcode counts
Displaying Operation Counts
sampling for
Understanding Sample Time Bases
Identifying Cache Problems with Perfex and SpeedShop
tools for
Profiling Tools

program correctness
Getting the Right Answers

R4000
See MIPS CPU

R8000
See MIPS CPU

R10000
See MIPS CPU

roundoff
Roundoff Control

round-robin placement
Using Round-Robin Placement
Trying Round-Robin Placement

scalability
Scalability in Multiprocessors
and bus architecture
Scalability in Multiprocessors
and data placement
Scalability and Data Placement
and shared memory
Scalability and Shared, Distributed Memory

smake
Using a Makefile

SN0
CrayLink
Hub and NUMAlink
hub
SN0 Organization
Hub and NUMAlink
Input/Output
SN0 Input/Output
latencies
SN0 Latencies and Bandwidths
node
SN0 Organization
SN0 Node Board
router
SN0 Organization
XIO
SN0 Organization
XIO Connection

SN0 architecture
Understanding SN0 Architecture
building blocks of
SN0 Organization
hypercube
SN0 Organization
SN0 Memory Distribution
nonuniform memory access (NUMA)
SN0 Memory Distribution

snoopy cache
Coherency Methods

software pipelining (SWP)
Exploiting Software Pipelining
compiler report in .s
Reading Software Pipelining Messages
Using Outer Loop Unrolling
script to extract
Software Pipeline Script swplist
dereferenced pointer defeats
Improving C Loops
effect of alias model
Understanding Aliasing Models
enable with -O3
Enabling Software Pipelining with -O3
failure cause
Dealing with Software Pipelining Failures
global variables defeat
Improving C Loops
loop unrolling with
Using Outer Loop Unrolling
of DAXPY loop
Pipelining the DAXPY Loop

speculative execution
Speculative Execution
Permitting Speculative Execution
hardware driven
Hardware Speculative Execution
software-driven
Software Speculative Execution

SpeedShop
Using SpeedShop
sample time bases
Understanding Sample Time Bases
See also prof, ssrun

ssrun
exception trace
Profiling Exception Frequency
experiment types
Understanding Sample Time Bases
ideal time trace
Capturing an Ideal Time Trace
Passing a Feedback File
output filename format
Performing ssrun Experiments
shell script to run
Shell Script ssruno
usertime experiment
Displaying Usertime Call Hierarchy
using
Performing ssrun Experiments

stride
Using Stride-One Access

superlinear speedup
Understanding Superlinear Speedup

superscalar
Superscalar CPU Features

-SWP
See compiler option, -SWP

swplist shell script
Reading Software Pipelining Messages

system routine
mmap
C and C++ Source Using UNIX Processes
Initializing to Zero
sproc
C and C++ Source Using UNIX Processes
sysmp
Advanced Options
syssgi
Using Dynamic Placement Information

thread
C Source Using POSIX Threads

TLB
See translation lookaside buffer (TLB)

translation lookaside buffer (TLB)
Understanding TLB and Virtual Memory Use
miss
Understanding TLB and Virtual Memory Use
hardware counter
Virtual Memory Use
thrashing elimination
Diagnosing and Eliminating TLB Thrashing
copying
Using Copying to Circumvent TLB Thrashing
larger page size
Using Larger Page Sizes to Reduce TLB Misses

transposition
Understanding Transpositions

trap
See exception

uninitialized variable, avoiding
Uninitialized Variables

vector intrinsic function
Standard Math Library
and LNO
Vector Intrinsics

virtual memory
Understanding TLB and Virtual Memory Use

XIO
SN0 Organization
XIO Connection

zero-fill
Initializing to Zero