Chapter 10. System Performance Tuning

This chapter describes how you can improve your system performance. It covers the basics of tuning the IRIX operating system for the best possible performance for your particular needs.

Your system is configured to run as fast as possible under most circumstances. However, you may find that adjusting certain parameters and operating system values improves your total performance, or you may want to optimize your system for a particular area, such as disk access, to make better use of the graphics features or your application software.

Information provided includes the following topics:

See Appendix C, “Application Tuning”, for information on tuning applications under development.

System Performance Tuning

The standard IRIX system configuration is designed for a broad range of uses, and adjusts itself to operate efficiently under all but the most unusual and extreme conditions. The operating system controls the execution of programs in memory and the movement of programs from disk to memory and back to disk.

Procedure 10-1 outlines the basic method of system tuning:

Procedure 10-1. System Tuning Steps

  1. Monitor system performance using various utilities.

  2. Adjust specific values (for example, the maximum number of processes).

  3. Reboot the system if necessary.

  4. Test the performance of the new system to see if it is improved.

Note that performance tuning cannot expand the capabilities of a system beyond its hardware capacity. You may need to add hardware, in particular another disk or additional memory, to improve performance.

Files Used for Kernel Tuning

Table 10-1 lists the files/directories used for tuning and reconfiguring a system.

Table 10-1. Files and Directories Used for Tuning

File/Directory            Purpose
/var/sysgen/system/*      Directory containing files defining software modules
/var/sysgen/master.d      Directory containing files defining kernel switches and parameters
/var/sysgen/mtune/*       Directory containing files defining more tunable parameters
/var/sysgen/stune         File defining default parameter values
/var/sysgen/boot/*        Directory of object files
/unix                     File containing kernel image

Typically you tune a parameter in one of the files located in the mtune directory (for example, the kernel file) by using the systune command. For information about systune, see the systune(1M) man page.
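For example, to check the current value of a single parameter before changing it (nproc is used here only as an illustration), you can list all values and filter the output:

systune | grep nproc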

Overview of Kernel Tunable Parameters

Tunable parameters control characteristics of processes, files, and system activity. They set various table sizes and system thresholds to handle the expected system load. If certain system structures are too large, they waste memory that could otherwise be used by processes, and can increase system overhead due to lengthy table searches. If they are set too low, they can cause excessive I/O, process aborts, or even a system crash, depending on the particular parameter.

This section briefly introduces some of the tunable parameters and switches. Appendix A, “IRIX Kernel Tunable Parameters” describes all parameters, gives default values, provides suggestions on when to change each parameter, and describes problems you may encounter.

Tunable parameters are specified in separate configuration files in the /var/sysgen/mtune and /var/sysgen/master.d directories. For mtune and master.d information, see the mtune(4) and master(4) man pages.

The default values for the tunable parameters are usually acceptable for most configurations in a single-user workstation environment. If you have a lot of memory or your environment has special needs, you may want to adjust parameter values to meet those needs. A few of the parameters you may want to adjust are:

nproc              Maximum number of processes, systemwide, typically auto-configured
maxup              Maximum number of processes per UID
rlimit_core_cur    Maximum size of a core file
rlimit_data_cur    Maximum amount of data space available to a process
rlimit_fsize_cur   Maximum file size available to a process
rlimit_nofile_cur  Maximum number of file descriptors available to a process
rlimit_rss_cur     Maximum resident set size available to a process
rlimit_vmem_cur    Maximum amount of mapped memory for a process
sshmseg            Maximum number of attached shared memory segments per process
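Several of the rlimit_* parameters above surface as the ordinary per-process resource limits that your shell reports. As a quick, informal cross-check (not a substitute for systune), the csh limit builtin lists the limits in effect for your current shell; ksh users can use ulimit -a for the same purpose:

limit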

Large System Tunable Parameters

Table 10-2 lists the system tuning parameters recommended for large (64 processors or greater) systems. See Appendix A, “IRIX Kernel Tunable Parameters” for detailed descriptions of each of these parameters.


Note: These parameters are highly system-dependent. The values listed are recommended initial values. You may want to alter them for a specific system or set of applications, and then evaluate the results of the changes.

Note also that the kernel forces the value of the maxup parameter to be less than the value of the nproc parameter by 20. While setting the value of maxup to be equal to or greater than nproc will work in systune, the next time the system is rebooted or the kernel is reconfigured, the limit to maxup of nproc minus 20 will be enforced. See the systune(1M) man page for more information on systune.

Table 10-2. Large System Tunable Parameters

Parameter                       Recommended Initial Value
dump_level                      3
maxdmasz                        0x2001
rsshogfrac                      99
rlimit_stack_max                0x20000000ll
rlimit_stack_cur                0x04000000ll
rlimit_rss_max                  0x20000000ll
rlimit_rss_cur                  0ll
rlimit_data_max                 0ll
rlimit_data_cur                 0ll
rlimit_vmem_max                 0ll
rlimit_vmem_cur                 0ll
nbuf                            2000
syssegsz                        0xfe800
sshmseg                         2000
shmmax                          0x4000000ll
semmni                          2000
semume                          80
semopm                          80
gpgshi                          2000
gpgslo                          1000
maxup                           7980
nproc                           8000
percent_totalmem_1m_pages       0
percent_totalmem_4m_pages       0
percent_totalmem_16m_pages      0
percent_totalmem_64k_pages      0
percent_totalmem_256k_pages     0
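As a sketch of how a few of the values in Table 10-2 might be applied, you can enter them in an interactive systune session like the one shown later in this chapter; set nproc before maxup so that the nproc minus 20 rule described above is satisfied:

systune -i
systune-> nproc = 8000
systune-> maxup = 7980
systune-> quit

As with any kernel parameter change, reboot if systune tells you the new values do not take effect on the running system.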


Monitoring the Operating System

Before you make any changes to your kernel parameters, learn which parameters should be changed and why. Monitoring the functions of the operating system will help you determine if changing parameters will help your performance, or if new hardware is necessary.

Receiving Kernel Messages and Adjusting Table Sizes

In rare instances, a table overflows because it is not large enough to meet the needs of the system. In this case, an error message appears on the console and in /var/adm/SYSLOG. If the console window is closed or stored, check SYSLOG periodically.

Some system calls return an error message that can indicate a number of conditions, one of which is that you need to increase the size of a parameter. Table 10-3 lists the error messages and parameters that may need adjustment. These parameters are in the /var/sysgen/master.d/kernel file.

Table 10-3. System Call Errors and Related Parameters

Message                                             System Call                  Parameter
EAGAIN (No more processes)                          fork(2)                      Increase nproc or add swap space
ELIBMAX (linked more shared libraries than limit)   exec(2)                      Increase the shlbmax tunable parameter
E2BIG (Arg list too long)                           shell(1), make(1), exec(2)   Increase the ncargs tunable parameter

Be aware that there can be other reasons for these errors. For example, EAGAIN may appear because of insufficient virtual memory. In this case, you may need to add more swap space. For other conditions that can cause these messages, see the owner's guide appendix on error messages.

Other system calls fail and return error messages that may indicate IPC (interprocess communication) structures need adjustment. These messages and the parameters to adjust are listed in Appendix A, “IRIX Kernel Tunable Parameters”.

timex, sar, and par

Three utilities you can use to monitor system performance are timex, sar, and par. They provide very useful information about what is happening in the system.

The operating system has a number of counters that measure internal system activity. Each time an operation is performed, an associated counter is incremented. You can monitor internal system activity by reading the values of these counters.

The timex and sar utilities monitor the value of the operating system counters, and thus sample system performance. Both utilities use sadc, the sar data collector, which collects data from the operating system counters and puts it in a file in binary format. The difference is that timex takes a sample over a single span of time, while sar takes a sample at specified time intervals. The sar program also has options that allow sampling of a specific function such as CPU usage (-u option) or paging (-p option). In addition, the utilities display the data differently.

The par utility has the ability to trace system call and scheduling activity. It can be used to trace the activity of a single process, a related group of processes, or the system as a whole.

When would you use one utility over the other? If you are running a single application or a couple of programs, use timex. If you have a multiuser/multiprocessor system and/or are running many programs, use sar or par.

As in all performance tuning, be sure to run these utilities at the same time you are running an application or a benchmark, and be concerned only when figures are outside the acceptable limits over a period of time.

Using timex

The timex utility is a useful troubleshooting tool when you are running a single application. For example:

timex -s application

The -s option reports total system activity (not just that due to the application) that occurred during the execution interval of application. To redirect timex output to a file (assuming you use the Bourne shell, sh(1)), enter:

timex -s application 2> file

The same command, entered using the C shell, looks like this:

timex -s application >& file

Using sar

The sar utility is a useful troubleshooting tool when you are running many programs and processes and/or have a multiuser system such as a server. You can take a sample of the operating system counters over a period of time (for a day, a few days, or a week).

Depending on your needs, you can choose the way in which you examine system activity. You can monitor the system:

  • During daily operation

  • Consecutively with an interval

  • Before and after an activity under your control

  • During the execution of a command

You can set up the system so that sar automatically collects system activity data and puts it into files for you. Use the chkconfig command to turn on sar's automatic reporting feature, which generates a sar -A listing. A crontab entry instructs the system to sample the system counters every 20 minutes during working hours and every hour at other times for the current day (data is kept for the last 7 days). To enable this feature, type:

/etc/chkconfig sar on

The collected data is put in /var/adm/sa in the form sann and sarnn, where nn is the date of the report (sarnn is in ASCII format). You can use the sar(1M) command to output the results of system activity.
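For example, to review the CPU figures collected automatically on the 15th of the month (the file name is hypothetical; substitute the date you want), enter:

sar -u -f /var/adm/sa/sa15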

Using sar Consecutively with a Time Interval

You can use sar to generate consecutive reports about the current state of the system. On the command line, specify a time interval and a count. For example:

sar -u 5 8

This prints information about CPU use eight times at five-second intervals.

Using sar before and after a User-Controlled Activity

You may find it useful to take a snapshot of the system activity counters before and after running an application (or after running several applications concurrently). To take a snapshot of system activity, instruct sadc (the data collector) to dump its output into a file. Then run the application(s) either under normal system load or restricted load, and when you are ready to stop recording, take another snapshot of system activity. Then compare results to see what happened.

Following is an example of commands that sample the system counters before and after running the application:

/usr/lib/sa/sadc 1 1 file

Run the application(s) or perform any work you want to monitor, then type:

/usr/lib/sa/sadc 1 1 file 
sar -f file

If file does not exist, sadc creates it. If it does exist, sadc appends data to it.

Using sar and timex during the Execution of a Command

Often you want to examine system activity during the execution of a command or set of commands. For example, to examine all system activity while running nroff(1), type:

/usr/lib/sa/sadc 1 1 sa.out 
nroff -mm file.mm > file.out 
/usr/lib/sa/sadc 1 1 sa.out 
sar -A -f sa.out 

By using timex, you can do the same thing with a single command line:

timex -s nroff -mm file.mm > file.out

Note that the timex output also includes the real, user, and system time spent executing the nroff command.

There are two minor differences between timex and sar. The sar program can limit its output (for example, the -u option reports only CPU activity), while timex always prints the -A listing. Also, sar works in a variety of ways, but timex works only by executing a command; however, the command can be a shell file.

If you are interested in system activity during the execution of two or more commands running concurrently, put the commands into a shell file and run timex -s on the file. For example, suppose the file nroff.sh contained the following lines:

nroff -mm file1.mm > file1.out &
nroff -mm file2.mm > file2.out &
wait

To get a report of all system activity after both of the nroff requests (running concurrently) finish, invoke timex as follows:

timex -s nroff.sh

Using par

You can use par in much the same way as you use sar:

  • During daily operation

  • Consecutively with an interval

  • Before and after an activity under your control

  • During the execution of a command

See the par(1) man page for specifics on usage.

Use par instead of sar when you want a finer look at a suspect or problem process. Instead of simply telling you how much total time was used while your process was executing, like timex, par breaks down the information so you can get a better idea of what parts of the process are consuming time. In particular, use the following command options:

-isSSdu    Checks the time used by each system call and the intervening time lag.
-rQQ       Checks process scheduling, to see if it should be run more or less frequently.

When tracing system calls, par prints a report showing all system calls made by the subject processes complete with arguments and return values. In this mode, par also reports all signals delivered to the subject processes. In schedule tracing mode, par prints a report showing all scheduling events taking place in the system during the measurement period. The report shows each time a process is put on the run queue, started on a processor, and unscheduled from a processor, including the reason that the process was unscheduled. The events include timestamps. You can set up the system so par automatically collects system activity data and puts it into files for you.

The par utility works by processing the output of padc (see the padc(1) man page). This can be done in two ways:

  • padc can be run separately and the output saved in a file to be fed to par as a separate operation.

  • padc can be invoked by par to perform the data collection and reporting in one step.

The par utility can provide different types of reports from a given set of padc data depending on the reporting options that are specified. This is a reason why it is sometimes desirable to run the data collection as a separate step.

Summary of sar, par, and timex

Now that you have learned when and how to use par, sar, and timex, you can choose one of these utilities to monitor the operating system. Then examine the output and try to determine what is causing performance degradation. Look for numbers that show large fluctuation or change over a sustained period; do not be too concerned if numbers occasionally go beyond the maximum.

The first thing to check is how the system is handling the disk I/O process. After that, check for excessive paging/swapping. Finally look at CPU use and memory allocation.

The following sections assume that the system you are tuning is active (with applications/benchmark executing).

Disk I/O Performance

The system uses disks to store data, and transfers data between the disk and memory. This input/output (I/O) process consumes a lot of system resources; so you want the operating system to be as efficient as possible when it performs I/O.

Checking Disk I/O

If you are going to run a large application or have a heavy system load, the system benefits from disk I/O tuning. Run sar -A or timex -s and look at the %busy, %rcache, %wcache, and %wio fields. To see if your disk subsystem needs tuning, check your output of sar -A against the figures in Table 10-4. (Note that in this table, the right column lists the sar option that prints only selected output, for example, output for disk usage (sar -d) or CPU activity (sar -u).)

Table 10-4 lists sar results that indicate an I/O-bound system.

Table 10-4. Indications of an I/O-Bound System

Field                                   Value                              sar Option
%busy (% time disk is busy)             >85%                               sar -d
%rcache (reads in buffer cache)         low, <85%                          sar -b
%wcache (writes in buffer cache)        low, <60%                          sar -b
%wio (idle CPU waiting for disk I/O)    dev. system >30; fileserver >80    sar -u

Notice that for the %wio figure (the percentage of time the CPU is idle while waiting for disk I/O), the table gives examples for two types of systems:

  • A development system that has users who are running programs such as make. In this case, if %wio > 30, check the breakdown of %wio (sar -u). By looking at the %wfs (waiting for filesystem) and %wswp (waiting for swap), you can pinpoint exactly what the system is waiting for.

  • An NFS system that is serving NFS clients and is running as a file server. In this case, if %wio > 80, %wfs > 90, the system is disk I/O bound.
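For example, a quick way to sample the fields in Table 10-4 on a live system is to run sar with an interval and a count while your normal workload is executing (here, ten samples at five-second intervals):

sar -d 5 10
sar -b 5 10
sar -u 5 10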

There are many other factors to consider when you tune for maximum I/O performance. You may also be able to increase performance by:

  • Using logical volumes

  • Using partitions on different disks

  • Adding hardware (a disk, controller, memory)

Logical Volumes for Improving Disk I/O

By using logical volumes, you can improve disk I/O:

  • You can increase the size of an existing filesystem without having to disturb the existing filesystem contents.

  • You can stripe filesystems across multiple disks. You may be able to obtain up to 50% improvement in your I/O throughput by creating striped volumes on disks.

Striping works best on disks that are on different controllers. Logical volumes give you more space without remaking the first filesystem. Disk striping gives you more space with increased performance potential, but you run the risk that if you lose one of the disks with striped data, you lose all the data on the filesystem, since the data is interspersed across all the disks.

Contiguous logical volumes fill up one disk, and then write to the next. Striped logical volumes write to both disks equally, spreading each file across all disks in the volume. It is impossible to recover from a bad disk if the data is striped, but it is possible if the data is in a contiguous logical volume. For information on creating a striped disk volume, see IRIX Admin: Disks and Filesystems .

Partitions and Additional Disks for Improving Disk I/O

There are obvious ways to increase your system's throughput, such as limiting the number of programs that can run at peak times, shifting processes to non-peak hours (run batch jobs at night), and shifting processes to another system. You can also set up partitions on separate disks to redistribute the disk load or add disks.

Before continuing with the discussion about partitions, look at how a program uses a disk as it executes. Table 10-5 shows various reasons why an application may need to access the disk.

Table 10-5. Disk Access of an Application

Application                        Disk Access
Execute object code                Text and data
Use swap space for data, stack     /dev/swap
Write temporary files              /tmp and /var/tmp
Read/write data files              Data files

You can maximize I/O performance by using separate partitions on different disks for some of the disk access areas. In effect, you are spreading out the application's disk access routines, which speeds up I/O.

By default, disks are partitioned to allow access in one of two ways:

  • Two partitions: partitions 0 and 1

  • One large partition: partition 7 (encompasses the two smaller partitions)

On the system disk, partition 0 is for root and partition 1 is for swap.


Note: On older systems, disks may have three partitions: partitions 0, 1, and 6. On the system disk, partition 0 is for root, 1 is for swap, and 6 is for /usr. If there is one large partition, it encompasses the three smaller partitions.

For each additional disk, decide if you want a number of partitions or one large one and the filesystems (or swap) you want on each disk and partition. It is best to distribute filesystems in the disk partitions so that different disks are being accessed concurrently.

The configuration depends on how you use the system; so it helps to look at a few examples.

  • Consider a system that typically runs a single graphics application that often reads from a data file. The application is so large that its pages are often swapped out to the swap partition.

    In this case, it might make sense to have the application's data file on a disk separate from the swap area.

  • If after configuring the system this way, you find that it does not have enough swap space, consider either obtaining more memory, or backing up everything on the second hard disk and creating partitions to contain both a swap area and a data area.

  • Changing the size of a partition containing an existing filesystem may make any data in that filesystem inaccessible. Always do a complete and current backup (with verification) and document partition information before making a change. If you change the wrong partition, you can change it back, providing you do not run mkfs on it or overwrite it. It is recommended that you print a copy of the prtvtoc command output after you have customized your disks, so that they may be more easily restored in the event of severe disk damage.

If you have a very large application and have three disks, consider using partitions on the second and third disks for the application's executables (/bin and /usr/bin) and for data files, respectively. Next, consider a system that mostly runs as a compile engine.

In this case, it might be best to place the /tmp directory on a disk separate from the source code being compiled. Make sure that you check and mount the filesystem before creating any files on it. (If this is not feasible, you can instruct the compiler to use a directory on another disk for temporary files. Just set the TMPDIR environment variable to the new directory for temporary files.) Now, look at a system that mainly runs many programs at the same time and does a lot of swapping.

In this case, it might be best to distribute the swap area in several partitions on different disks.
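For the compile-engine example above, pointing the compiler's temporary files at another disk is simply a matter of setting the TMPDIR environment variable; the directory name below is only an illustration. In the C shell:

setenv TMPDIR /disk2/tmp

In the Bourne shell:

TMPDIR=/disk2/tmp; export TMPDIR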

Adding Disk Hardware to Improve Disk I/O

If improved I/O performance still does not occur after you have tuned your system, you may want to consider adding more hardware: disks, controllers, or memory.

If you are going to add more hardware to your system, how do you know which disk or controller to add? You can compare hardware specifications for currently supported disks and controllers by looking up the system specifications in your hardware owner's guide. By using this information, you can choose the right disk or controller to suit your particular needs.

By balancing the most active filesystems across controllers/disks, you can speed up disk access.

Another way to reduce the number of reads and writes that go out to the disk is to add more memory. This reduces swapping and paging.

Paging and Swapping

The CPU can only reference data and execute code if the data or code are in the main memory (RAM). Because the CPU executes multiple processes, there may not be enough memory for all the processes. If you have very large programs, they may require more memory than is physically present in the system. So, processes are brought into memory in pages. If there is not enough memory, the operating system frees memory by writing pages temporarily to a secondary memory area, the swap area, on a disk.

The IRIX system overcommits real memory, loading and starting many more processes than can fit at one time into the available memory. Each process is given its own virtual section of memory, called its address space, which is theoretically large enough to contain the entire process. However, only those pages of the address space that are currently in use are actually kept in memory. These pages are called the working set. As the process needs new pages of data or code to continue running, the needed pages are read into main memory (called faulting in pages or page faults). If a page has not been used in the recent past, the operating system moves the page out of main memory and into the swap space to make room for new pages being faulted in. Pages written out can be faulted back in later. This process is called paging, and it should not be confused with the action of swapping.

Swapping is when all the pages of an inactive process are removed from memory to make room for pages belonging to active processes. The entire process is written out to the swap area on the disk and its execution effectively stops. When an inactive process becomes active again, its pages must be recovered from disk into memory before it can execute. This is called swapping in the process. On a personal workstation, swapping in is the familiar delay for disk activity, after you click on the icon of an inactive application and before its window appears.

Checking for Excessive Paging and Swapping

When the IRIX system is multiprocessing a large number of processes, the amount of this swapping and paging activity can dominate the performance of the system. You can use the sar command to detect this condition and other tools to deal with it.

Determining whether your system is overloaded with paging and swapping requires some knowledge of a baseline. You need to use sar under various conditions to determine a baseline for your specific implementation. For example, you can run some baseline tests shortly after booting with a limited number of processes running, then run them again during a period of light use, during heavy networking activity, and especially when the load is high and you are experiencing poor performance. Recording the results in your system log book helps you make these baseline comparisons.
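One way to capture such a baseline is to save the raw counters to a file with the sar -o option and examine them later with -f; the file name and the sampling plan (60 samples at 60-second intervals) are only examples:

sar -A -o /var/adm/sa/baseline 60 60
sar -u -f /var/adm/sa/baseline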

Table 10-6 shows indicators of excessive paging and swapping on a smaller system.

Table 10-6. Indicators of Excessive Swapping and Paging

Important Field                                       sar Option
vflt/s (page faults; valid page not in memory)        sar -p
bswot/s (transfers from memory to disk swap area)     sar -w
bswin/s (transfers to memory)                         sar -w
%swpocc (time swap queue is occupied)                 sar -q
rflt/s (page reference faults)                        sar -t
freemem (average pages for user processes)            sar -r

You can use the following sar options to determine if poor system performance is related to swap I/O or to other factors:

-u %wswp 

Percent of total I/O wait time owed to swap input. This measures the percentage of time during which active processes were blocked waiting for a page to be read or written. This number is not particularly meaningful unless the wio value is also high.

-p vflt/s 

Frequency with which a process accessed a page that was not in memory. Compare this number between times of good and bad performance. If the onset of poor performance is associated with a sharp increase of vflt/s, swap I/O may be a problem even if %wswp is low or 0.

-r freemem 

Unused memory pages. The paging daemon (vhand) recovers what it thinks are unused pages and returns them to this pool. When a process needs a fresh page, the page comes from this pool. If the pool is low or empty, the IRIX system often has to get a page for one process by taking a page from another process, encouraging further page faults.

-p pgswp/s 

Number of read/write data pages retrieved from the swap disk space per second.

-p pgfil/s 

Number of read-only code pages retrieved from the disk per second.

If the %wswp number is 0 or very low, and vflt/s does not increase with the onset of poor performance, the performance problem is not primarily due to swap I/O.

Fixing Swap I/O Problems

When swap I/O may be the cause of poor performance, there are several possible actions you can take:

  • Provide more real memory. This is especially effective in personal workstations, where it is relatively economical to double the available real memory.

  • Reduce the demand for memory by running fewer processes. This can be effective when the system load is not interactive, but composed of batch programs or long-running commands. Schedule commands for low-demand hours, using cron and at. Experiment to find out whether the total execution time of a set of programs is less when they are run sequentially with low swap I/O, or concurrently with high swap I/O.

  • Make the swap input of read-only pages more effective. For example, if pages of dynamic shared objects are loaded from NFS-mounted drives over a slow network, you can make page input faster by moving all or a selection of dynamic shared objects to a local disk.

  • Make swap I/O of writable pages more effective. For example, use swap(1M) to spread swap activity across several disks or partitions (a brief sketch follows this list). For more information on swapping to files and creating new swap areas, see “Swap Space” in Chapter 6.

  • If you have changed process or CPU-related kernel parameters (for example, nproc), consider restoring them to their former values.

  • Reduce page faults. Construct programs with “locality” in mind (see Appendix C, “Application Tuning”).

  • Consider using shared libraries when constructing applications.

  • Reduce resident set size limits with systune. See “ System Limits Parameters” in Appendix A for the names and characteristics of the appropriate parameters.
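As a sketch of the swap-spreading suggestion above, assuming a second disk is mounted at /disk2 and has room for a 128 MB swap file (see “Swap Space” in Chapter 6 for the complete procedure and options):

mkfile 128m /disk2/swapfile
swap -a /disk2/swapfile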

Refer to “Multiple Page Sizes” for information on dynamic tuning of page size.

CPU Activity and Memory Allocation

After looking at disk I/O and paging for performance problems, check CPU activity and memory allocation.

Checking the CPU

A CPU can execute only one process at any given instant. If the CPU becomes overloaded, processes have to wait instead of executing. You cannot change the speed of the CPU (although you may be able to upgrade to a faster CPU or add CPU boards to your system if your hardware allows it), but you can monitor CPU load and try to distribute it. Table 10-7 shows the fields to check for indications that a system is CPU bound.

Table 10-7. Indications of a CPU-Bound System

Field                                                        Value    sar Option
%idle (% of time CPU has no work to do)                      <5       sar -u
runq-sz (processes in memory waiting for CPU)                >2       sar -q
%runocc (% run queue occupied and processes not executing)   >90      sar -q

You can also use the top(1) or gr_top(1) commands to display processes having the highest CPU usage. For each process, the output lists the user, process state flags, process ID and group ID, CPU cycles used, processor currently executing the process, process priority, process size (in pages), resident set size (in pages), amount of time used by the process, and the process name. For more information, see the top(1) or gr_top(1) man page.

Increasing CPU Performance

To increase CPU performance, make the following modifications:

  • Off-load jobs to non-peak times or to another system, set efficient paths, and tune applications.

  • Eliminate polling loops (see the select(2) man page).

  • Increase the slice-size parameter (the length of a process time slice). For example, change slice-size from Hz/30 to Hz/10. However, be aware that this may slow interactive response time.

  • Upgrade to a faster CPU or add another CPU.

Checking Available Memory

“Paging and Swapping” describes what happens when you do not have enough physical (main) memory for processes. This section discusses a different problem—what happens when you do not have enough available memory (sometimes called virtual memory), which includes both physical memory and logical swap space.

The IRIX virtual memory subsystem allows programs that are larger than physical memory to execute successfully. It also allows several programs to run even if the combined memory needs of the programs exceed physical memory. It does this by storing the excess data on the swap device(s).

The allocation of swap space is done after program execution has begun. This allows programs with a large virtual address space to run as long as the actual amount of virtual memory allocated does not exceed the memory and swap resources of the machine.

Usually it is evident when you run out of memory, because a message is sent to the console that begins:

Out of logical swap space...

If you see this message, these are the possible causes:

  • The process has exceeded ENOMEM or UMEM.

  • There is not enough physical memory for the kernel to hold the required non-pageable data structures.

  • There is not enough logical swap space.

You can add virtual swap space to your system at any time. See “Swap Space” in Chapter 6 to add more swap space. You need to add physical swap space, though, if you see the message:

Process killed due to insufficient memory

The following system calls return EAGAIN if there is insufficient available memory: exec, fork, brk, sbrk (called by malloc), mpin, and plock. Applications should check the return status and exit gracefully with a useful message.

To check the size (in pages) of a process that is running, execute ps -el (you can also use top). The SZ:RSS field shows very large processes.

By checking this field, you can determine the amount of memory the process is using. A good strategy is to run very large processes at less busy times.

Determining the Amount of System Memory

To see the amount of main memory, use the hinv(1) command. It displays data about your system's configuration. For example:

Main memory size: 64 Mb

Maximizing Memory

To increase the amount of virtual memory, increase the amount of real memory and/or swap space. Note that most of the paging/swapping solutions are also ways to conserve available memory. These include:

  • Limiting the number of programs

  • Using shared libraries

  • Adding more memory

  • Decreasing the size of system tables

However, the most dramatic way to increase the amount of virtual memory is to add more swap space. See “Swap Space” in Chapter 6.

Operating System Tuning

The process of tuning the operating system is not difficult, but it should be approached carefully. Make complete notes of your actions in case you need to reverse your changes later on. Understand what you are going to do before you do it, and do not expect miraculous results; the IRIX system has been engineered to provide the best possible performance under all but the most extreme conditions. Software that provides a great deal of graphics manipulation or data manipulation also carries a great deal of overhead for the system, and can seriously affect the speed of an otherwise robust system. No amount of tuning can change these situations.

Operating System Tuning Procedure

To tune a system, you first monitor its performance with various system utilities as described in “Monitoring the Operating System”. Procedure 10-2 describes the steps to take when you are tuning a system.

Procedure 10-2. Tuning a System

  1. Determine the general area that needs tuning (for example, disk I/O or the CPU) and monitor system performance using utilities such as sar and osview. If you have not already done so, see “Monitoring the Operating System”.

  2. Pinpoint a specific area and monitor performance over a period of time. Look for numbers that show large fluctuation or change over a sustained period; do not be too concerned if numbers occasionally go beyond the maximum.

  3. Modify one value/characteristic at a time (for example, change a parameter, add a controller) to determine its effect. It is good practice to document any changes in a system notebook.

  4. Use the systune command to change parameter values or make the change in the master.d directory structure if the variable is not tunable through systune. Remake the kernel and reboot if necessary.

  5. Remeasure performance and compare the before and after results. Then evaluate the results (is system performance better?) and determine whether further change is needed.

Keep in mind that the tuning procedure is more an art than a science; you may need to repeat these steps and do more extensive monitoring and testing to thoroughly fine-tune your system.

Operating System Tuning: Finding Parameter Values

Before you can tune your system, you need to know the current values of the tunable parameters. To find the current value of your kernel parameters, use the systune command. This command, entered with no arguments, prints the current values of all tunable parameters on your system. For complete information on this command, see the systune(1M) man page.

Operating System Tuning: Changing Parameters and Reconfiguring the System

After determining the parameter or parameters to adjust, you must change the parameters and you may need to reconfigure the system for the changes to take effect. The systune utility tells you when you make parameter changes if you must reboot to activate those changes. Procedure 10-3 describes the steps you take to reconfigure a system.

Procedure 10-3. Reconfiguring a System

  1. Back up the system.

  2. Copy your existing kernel to unix.save.

  3. Make your changes.

  4. Reboot your system, if necessary.

Backing Up the System

Before you reconfigure the system by changing kernel parameters, it is a good idea to have a current and complete backup of the system. See IRIX Admin: Backup, Security, and Accounting .


Caution: Always back up the entire system before tuning.


Copying the Kernel

After determining the parameter you need to change (for example, you need to increase nproc because you have a large number of users), you must first back up the system and the kernel. Give the command:

cp /unix /unix.save

This command creates a safe copy of your kernel. Through the rest of this example, this is called your old saved kernel. If you make this copy, you can always go back to your old saved kernel if you are not satisfied with the results of your tuning.

Changing a Parameter

Once your backups are complete, you can execute the systune command. Note that you can present new values to systune in either hexadecimal or decimal notation. Both values are printed by systune.

An invocation of systune to increase nproc looks something like this:

systune -i
Updates will be made to running system and /unix.install
systune-> nproc
        nproc = 400 (0x190)
systune-> nproc = 500
        nproc = 400 (0x190)
        Do you really want to change nproc to 500 (0x1f4)? (y/n) y
In order for the change in parameter nproc to become effective /unix.install must be moved to /unix and the system rebooted
systune-> quit

Then reboot your system. Also, be sure to document the parameter change you made in your system log book.


Caution: When you issue the reboot command, the system overwrites the current kernel (/unix) with the kernel you have just created ( /unix.install). This is why you should always copy the current kernel to a safe place before rebooting.


Creating and Booting a New Kernel with autoconfig

The systune command creates a new kernel automatically. However, if you changed parameters without using systune, or if you have added new system hardware (such as a new CPU board on a multiprocessor system), you must use autoconfig to generate a new kernel.

The autoconfig command uses some environment variables. These variables are described in detail in the autoconfig(1M) man page. If you have any of the following variables set, you may need to unset them before running autoconfig:

  • UNIX

  • SYSGEN

  • BOOTAREA

  • SYSTEM

  • MASTERD

  • STUNEFILE

  • MTUNEDIR

  • WORKDIR

To build a new kernel after reconfiguring the system, follow the steps in Procedure 10-4:

Procedure 10-4. Building a New Kernel

  1. Become the superuser by giving the command:

    su
    

  2. Make a copy of your current kernel with the command:

    cp /unix /unix.save
    

  3. Give the command:

    /etc/autoconfig -f
    

    This command creates a new kernel and places it in the file /unix.install.

  4. Reboot your system with the command:

    reboot
    


    Caution: When you issue the reboot command, the system overwrites the current kernel (/unix) with the kernel you have just created (/unix.install). This is why you should always copy the current kernel to a safe place before rebooting.


An autoconfiguration script, found in /etc/rc2.d/S95autoconfig, runs during system startup. This script asks you if you would like to build a new kernel under the following conditions:

  • A new board has been installed for which no driver exists in the current kernel.

  • There have been changes to object files in /var/sysgen/mtune, master files in /var/sysgen/master.d, or the system files in /var/sysgen/system. This is determined by the modification dates on these files and the kernel.

If any of these conditions is true, the system prompts you during startup to reconfigure the operating system:

Automatically reconfigure the operating system? y

If you answer y to the prompt, the script runs lboot and generates /unix.install with the new image. You can disable the autoconfiguration script by renaming /etc/rc2.d/S95autoconfig to something else that does not begin with the letter S, for example, /etc/rc2.d/wasS95autoconfig.
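For example, to disable the script as described, rename it with a command such as:

mv /etc/rc2.d/S95autoconfig /etc/rc2.d/wasS95autoconfig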

Recovering from an Unbootable Kernel

Procedure 10-5 explains how to recover from an unbootable /unix, and describes how to get a viable version of the software running after an unsuccessful reconfiguration attempt. If you use the systune utility, you should never have to use this information, since systune does not allow you to set your parameters to unworkable values.

Procedure 10-5. Recovering from an Unbootable Kernel

  1. If the system fails to reboot, try to reboot it again. If it still fails, interrupt the boot process and direct the boot PROM to boot from your old saved kernel (unix.save).

  2. Press the Reset button. You see the System Maintenance Menu:

    System Maintenance Menu
    

    1) Start System.
    2) Install System Software.
    3) Run Diagnostics.
    4) Recover System.
    5) Enter Command Monitor.
    

  3. Choose option 5 to enter the command monitor. You see:

    Command Monitor. Type "exit" to return to the menu.
    >>
    

  4. Now at the >> prompt, tell the PROM to boot your old saved kernel. The command is:

    boot unix.save
    

    The system boots the old saved kernel.

  5. Once the system is running, use the following command to move your old saved kernel to the default /unix name. This method also keeps a copy of your old saved kernel in unix.save:

    cp /unix.save /unix
    

You can then boot the system normally while you investigate the problem with the new kernel. Try to figure out what went wrong. What was changed that stopped the kernel from booting? Review the changes that you made.

  • Did you increase/decrease a parameter by a large amount? If so, make the change less drastic.

  • Did you change more than one parameter? If so, make a change to only one parameter at a time.

Multiple Page Sizes

The operating system supports multiple page sizes, which can be tuned as described in this section.

Recommended Page Sizes

The page sizes supported depend on the base page size of the system. The base page size can be obtained with the getpagesize() system call. Currently, the IRIX system supports two base page sizes, 16K and 4K. On systems with a 16K base page size, the following tunable page sizes are supported: 16K, 64K, 256K, 1M, 4M, and 16M. On systems with a 4K base page size, the following tunable page sizes are supported: 4K, 16K, 64K, 256K, 1M, 4M, and 16M. In general, for most applications the 4K, 16K, and 64K page sizes are sufficient to eliminate TLB miss overhead.

Tunable Parameters for Coalescing

The IRIX kernel tries to keep a percentage of total free memory in the system at a certain page size. It periodically tries to coalesce a group of adjacent pages to form a large page. The following tunable parameters specify the upper limit for the number of free pages at a particular page size. Systems that do not need large pages can set these parameters to zero. The tunable parameters are:

  • percent_totalmem_16k_pages

  • percent_totalmem_64k_pages

  • percent_totalmem_256k_pages

  • percent_totalmem_1m_pages

  • percent_totalmem_4m_pages

  • percent_totalmem_16m_pages

The parameters specify the percentage of total memory that can be used as an upper limit for the number of free pages at a specific page size. Thus, setting percent_totalmem_64k_pages to 20 implies that the coalescing mechanism tries to limit the number of free 64K pages to 20% of total memory in the system. These parameters can be tuned dynamically at run time. Note that very large pages (>= 1 MB) are harder to coalesce dynamically during run time on a busy system; in such cases it is recommended that these tunable parameters be set at boot time. Setting these tunable parameters to a high value can result in high coalescing activity. If the system runs low on memory, the large pages can be split into smaller pages as needed.
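Because these parameters can be changed on the running system, a brief experiment might look like the following systune session (20 is simply the example value used above):

systune -i
systune-> percent_totalmem_64k_pages = 20
systune-> quit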

Reserving Large Pages

It is hard to coalesce very large pages (>= 1 MB) at run time due to fragmentation of physical memory. If applications need such pages, you can set tunable parameters to reserve large pages at boot time. The values are specified as a number of pages. The tunable parameters are:

  • nlpages_64k

  • nlpages_256k

  • nlpages_1m

  • nlpages_4m

  • nlpages_16m

Thus, setting nlpages_4m to 4 results in the system reserving four 4 MB pages during boot time. If the system runs low on memory, the reserved pages can be split into smaller pages for use by other applications. The osview command can be used to view the number of free pages available at a particular page size (see the osview(1) man page). The default value for all of these parameters is zero. Refer to “nlpages_64k” in Appendix A for additional information.
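As a sketch, reserving the four 4 MB pages mentioned above could be done with systune; reboot afterward so the reservation is made during startup:

systune -i
systune-> nlpages_4m = 4
systune-> quit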