Chapter 2. Job Limits

Standard system resource limits are set up so that each process receives the same process-based limits at the time the process is created. While limits on individual processes are useful, they do not restrict individual users to a given share of the system. With the IRIX kernel job limits feature, all processes associated with a particular login session or batch submission are encapsulated as a single logical unit called a job. The job is the container used to group processes by login session. Limits on resource usage are applied on a per user basis for a particular job and these limits are enforced by the kernel. All processes are associated with a particular job and are identified by a unique job identifier (job ID). The processes belonging to a particular job can be limited, controlled, queried, and accounted for as a unit. This allows a system administrator to set job-specific limits on CPU time, memory, file space, and other system resources. The user limits database (ULDB) allows user-specific limits for jobs. If no ULDB is defined, job limits are the same for all jobs. Job limits software can help maximize utilization of larger systems in a multiuser environment.


Note: Job limit values (rlim_t) are 64-bit in both n32 and n64 binaries. Consequently, n32 binaries can set 64-bit limits. o32 binaries cannot set 64-bit limits because rlim_t is 32 bits in o32 binaries. IRIX supports three Application Binary Interfaces (ABIs): o32, n64, and n32 (for more information on ABIs, see the abi(5) man page).

For more information on rlimit_* values, see “Using systune to Display and Set Process Limits” in Chapter 1 and “showlimits”.


This chapter contains the following sections:

Read Me First

The sections in this chapter contain information about installing job limits software on your system. You should reference them in the order they are listed here:

  1. For a general description of jobs and job limits, see “Job Limits Overview”, and “Job Limits Supported”.

  2. To install the job limits package, see “Installing Job Limits”.

  3. For information about writing a user limits directives input file infile and creating the user limits database (ULDB), see “Creating the User Limits Directives Input File”, and “Creating the User Limits Database”, respectively.

    For a list of man pages related to job limits, see “Job Limits Man Pages”.

  4. For information on how to use the systune joblimits command to set systemwide default values for job limits, see “Using systune to Display and Set Job Limits”.

  5. For information on how to view job limits on a system, see “User Commands for Viewing and Setting Job Limits”.

  6. For information on troubleshooting your job limits installation, see “Troubleshooting Job Limits”.

  7. For information on application programming interfaces, see “Application Programming Interface for Job Limits” in Appendix A, and “Application Programming Interface for the ULDB” in Appendix A.

Job Limits Overview

Job limits software helps ensure that each user has access to the appropriate amount of system resources such as CPU time and memory and makes sure that users do not exceed their allotted amount. Job limits software can improve system throughput and utilization by restricting how much of a machine each user can use. For information on user-based job limits supported in IRIX, see “Job Limits Supported”.

Work on a machine is submitted in a variety of ways, such as an interactive login, a submission from a workload management system, a cron job, or a remote access such as rsh, rcp, or array services. Each of these points of entry creates an original shell process, and multiple processes flow from that original point of entry. The kernel job provides a means to limit the resource usage of all the processes resulting from a point of entry. A job is a group of related processes all descended from a point of entry process and identified by a unique job ID. A job can contain multiple process groups, sessions, or array sessions, and all processes in one of these subgroups are always contained within one job. Figure 2-1 shows the point of entry processes that initiate the creation of jobs.

Figure 2-1. Point of Entry Processes


IRIX job limits have the following characteristics:

  • A job is an inescapable container. A process cannot leave the job nor can a new process be created outside the job without explicit action, that is, a system call with root privilege.

  • Each new process inherits the job ID and limits from its parent process.

  • All point of entry processes (job initiators) create a new job and set the job limits appropriately.

  • Users can raise and lower their own job limits within maximum values specified by the system administrator.

  • The job initiator performs authentication and security checks.

The process control initialization process (init(1M)) and startup scripts called by init are not part of a job and have a job ID of zero.


Note: The upper bits of the job ID indicate the machine ID; that is, the job ID contains the array services machine ID (asmchid). Array services are started by the init process, and the job IDs that are created can be large. To an administrator who has not set the machine ID, these large job ID values may appear to have no explanation. For more information on the asmchid parameter, see Appendix A, “IRIX Kernel Tunable Parameters”, in IRIX Admin: System Configuration and Operation, and the arsctl(2) and newarraysess(2) man pages.



Note: The existing IRIX jobs(1), fg(1), and bg(1) commands apply to shell “jobs” and are not related to IRIX kernel job limits.



Note: Job initiators like secure shell that are not developed by SGI might not initiate an IRIX kernel job.


Figure 2-2 shows two limit domains. Limit domains are a way to categorize work. The job initiators shown in Figure 2-1 can be categorized as either interactive or batch processes. Limit domain names are defined by the system administrator when the user limits database (ULDB) is created. Applications that use the ULDB to retrieve job limits information expect to find limit information with specific names. These names are defined by convention. For additional information on limit domains and the ULDB, see “User Limits Database”.

Figure 2-2. Limit Domains


The IRIX operating system provides a number of commands that provide information about the memory usage on a system. The job limits jstat(1) command reports the current usage and highwater memory values of all concurrently running processes within a job. For more information on memory usage in IRIX, see Chapter 6, “IRIX Memory Usage”. For more information on the jstat(1) command, see “jstat”.

Job Limits Supported

Table 2-1 shows job limits supported by the IRIX operating system. Each limit restricts the use of a particular system resource for all the processes contained within a job. Job limits software also introduces a limit unique to jobs called JLIMIT_NUMPROC that controls the number of processes in a job.

Table 2-1. Job Limits

Limit Name:   jlimit_numproc_cur, jlimit_numproc_max
Symbolic ID:  JLIMIT_NUMPROC
Units:        processes
Description:  Maximum number of processes within the job
Enforcement:  Process creation by any job fails with errno set to EAGAIN

Limit Name:   jlimit_nofile_cur, jlimit_nofile_max
Symbolic ID:  JLIMIT_NOFILE
Units:        file descriptors
Description:  Maximum total number of open file descriptors that all processes in a job can have
Enforcement:  open(2) calls by any job fail with errno set to EMFILE

Limit Name:   jlimit_rss_cur, jlimit_rss_max
Symbolic ID:  JLIMIT_RSS
Units:        bytes
Description:  Maximum total resident set size for all processes in a job
Enforcement:  Resident pages above the limit become prime swap candidates

Limit Name:   jlimit_vmem_cur, jlimit_vmem_max
Symbolic ID:  JLIMIT_VMEM
Units:        bytes
Description:  Maximum total address space for all processes in a job
Enforcement:  The brk(2) and mmap(2) calls in any job fail with errno set to ENOMEM

Limit Name:   jlimit_data_cur, jlimit_data_max
Symbolic ID:  JLIMIT_DATA
Units:        bytes
Description:  Maximum total heap size for all processes in a job
Enforcement:  The brk(2) calls in any job fail with errno set to ENOMEM

Limit Name:   jlimit_cpu_cur, jlimit_cpu_max
Symbolic ID:  JLIMIT_CPU
Units:        seconds
Description:  Maximum number of CPU seconds allowed for all processes in a job
Enforcement:  All processes in a job that continue to consume CPU time are terminated via the SIGXCPU signal. See the Note below. You can also use the cpulimit_gracetime parameter to alter signalling behavior; see “cpulimit_gracetime”.

Limit Name:   jlimit_pmem_cur, jlimit_pmem_max
Symbolic ID:  JLIMIT_PMEM
Units:        bytes
Description:  Maximum total resident set size for all processes in a job
Enforcement:  All processes in a job that continue to consume system resources are terminated via the SIGKILL signal. See the Note below and “cpulimit_gracetime”.

getjlimit and setjlimit

Limits on the consumption of system resources by a job, shown in Table 2-1, may be obtained with the getjlimit(2) function and set by the setjlimit(2) function. The getjlimit function gets the current and maximum job limits values for the specified job. The CAP_MAC_READ capability is needed to retrieve values from jobs belonging to other users.

The setjlimit(2) function sets the current and maximum job limits values for the specified job. If the current job is different from the job being requested, the setjlimit function checks for the CAP_MAC_WRITE capability. If the maximum (hard) limits are being raised, the setjlimit function checks for the CAP_PROC_MGT capability.

For additional information, see the getjlimit(2) man page. For more information on the capability mechanism that provides fine-grained control over the privileges of a process, see the capability(4) and capabilities(4) man pages.

waitjob

The waitjob mechanism allows a batch processing system to find out job limit information for jobs that exit abnormally. The waitjob function obtains information about a terminated job that was set to wait with the setwaitjobpid function. For more information on the waitjob(2) and setwaitjobpid(2) calls, see “Application Programming Interface for Job Limits” in Appendix A and “Application Programming Interface for the ULDB” in Appendix A, respectively, and the waitjob(2) and setwaitjobpid(2) man pages.

systune

You can use the systune joblimits command to set systemwide defaults. For additional information, see “Using systune to Display and Set Job Limits” and the systune(1M) man page.

cpulimit_gracetime

The cpulimit_gracetime parameter establishes a grace period for processes that exceed the CPU time limit. Each process in a job has a cpulimit_gracetime associated with it. If the cpulimit_gracetime parameter is set to 10 seconds and a job has 100 processes, theoretically, a job could run for an additional 1000 seconds after the JLIMIT_CPU limit had been exceeded. The cpulimit_gracetime parameter controls the signalling behavior associated with the CPU limit. For additional information on the cpulimit_gracetime parameter, see “Additional Process Limits Parameters” in Chapter 1.
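
The worst case described above is simple arithmetic. The following is a minimal sketch of it; the function name is hypothetical, and the calculation assumes that every process receives its own grace period and that the grace periods do not overlap at all.

```python
def worst_case_extra_seconds(gracetime_seconds: int, process_count: int) -> int:
    """Upper bound on how long a job may keep running past JLIMIT_CPU.

    Assumption: each process in the job uses its entire grace period,
    one process after another (the theoretical worst case).
    """
    if gracetime_seconds < 0 or process_count < 0:
        raise ValueError("arguments must be non-negative")
    return gracetime_seconds * process_count


# The example from the text: a 10-second grace time and 100 processes.
print(worst_case_extra_seconds(10, 100))  # 1000
```

In practice the extra time is usually much smaller, because processes that are executing at the same time use up their grace periods concurrently.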

Job limits software works in a manner similar to process limits when dealing with the cpulimit_gracetime parameter. As a process executes, the CPU usage increases. When the limit is reached, the SIGXCPU signal is sent individually to each process when it executes. When the SIGXCPU signal is sent to a process, the grace period goes into effect for that process. If the process is still executing when the grace period expires, it is terminated with the SIGKILL signal. Only the processes in a job that are executing are sent a SIGXCPU signal. Each process in a job gets an individual grace period. Therefore, the SIGXCPU signal is not sent en masse to all processes in a job.


Note: Only processes in a job that are executing and consuming system resources, such as CPU time or memory, when a clock interrupt occurs and a JLIMIT_CPU or JLIMIT_PMEM limit has been exceeded, will receive either a SIGXCPU or SIGKILL signal, respectively. It is possible that processes in a job that are idle will not be signalled even if a limit has been exceeded.


User Limits Database

The User Limits Database (ULDB) contains job limits information that allows a system administrator to control access to a machine on a per-user basis. Job initiators are the applications that initiate new jobs on the system, such as login, rsh, rlogin, cron, and workload management systems like Miser. They retrieve job limits values from the ULDB for a particular user and use the information to set limits appropriately.

For more information on job initiators, see “Job Limits Overview”.

The ULDB is used to set job limit and process limit values for jobs when the job limits package is installed. If job limits are not installed, process limits are handled by the current resource limits functionality.

Domain defaults apply to all users unless there is a "user" entry that describes values for that user. User specific values override the domain defaults. Values in the ULDB override the system default values for both job limits and process limits.

This section describes the commands used to create, maintain, and display the contents of the ULDB and the library application programming interface (API), which allows applications access to the ULDB information.


Note: The ULDB configuration file contained in the /etc/jlimits.in file contains a template you can follow when setting up the ULDB on your system.


The /etc directory also contains the jlimits and jlimits.m files. The jlimits.in file is parsed into the colon delimited jlimits file, which is used to load job limits into the local ULDB jlimits.m file or into the NIS master map. The jlimits file is automatically generated by the genlimits(1M) command. The jlimits.m file is the local ULDB mdbm file.

Creating the User Limits Database

The command to create the ULDB is as follows:

genlimits [-i infile] [-l] [-m] [-L local_database] [-N nisfile] [-v]

The genlimits command parses the formatted ASCII user limits directives input file (infile) into a colon-delimited ASCII file, which can be used to create one of the following output formats:

  • Network Information Service (NIS) master server map (-m option)

  • Local database for NIS or direct (non-NIS) use (-l option)

The genlimits command accepts the following options:

-i infile 

Identifies the location of the user limits directives input file. If you do not specify the -i option, the default file is /etc/jlimits.in.

-l 

Creates a local database for Network Information Service (NIS) or direct (non-NIS) use. When NIS is enabled, the local database contains local entries which override or supplement entries from the NIS server. When NIS is not enabled, the local database contains information to set limits on the system. By default, this database is in the /etc/jlimits.m file. You cannot use the -l option with the -m option.

-m 

Creates the NIS master server map. It generates and stores the map in the standard NIS map location. You cannot override this location. You cannot use the -m option with the -l option.

-L local_database 

Specifies an alternate location for the local database. The -L option works in conjunction with the -l option.

-N nisfile 

Specifies a different location for the created NIS database source input file. The default location is the /etc/jlimits file. You can use the -N nisfile option to create a new database without overwriting the existing /etc/jlimits file.

-v 

Specifies verbose mode, which prints out messages describing actions of the genlimits command.

For additional information, see the genlimits(1M) man page.

Creating the User Limits Directives Input File

The user limits directive file contains the input to the genlimits(1M) command, defining the information on domains, limits, and users that will be used to generate the ULDB. This section describes how to write a user limits directives input file.

Comments

Any text following the # character is treated as a comment.

Numeric Limit Values

A numeric value can have a letter appended that indicates a multiplier. The multiplier is applied to the numeric value to determine the limit value, as follows:

Letter         Multiplier Value
-----------    -------------------------
k (kilo)       1024 (2**10)
m (mega)       1,048,576 (2**20)
g (giga)       1,073,741,824 (2**30)
t (tera)       1,099,511,627,776 (2**40)
H (hours)      3600
M (minutes)    60

  • Use the k, m, g, and t multipliers when defining memory limits or other large values.

  • Use the H and M multipliers when defining time values.

Multiplier values are defined in the /usr/include/uldb.h system include file.

There is no requirement that multipliers be used in the above manner.

Numeric limit values can also be specified as “unlimited” which indicates there is no upper limit for this particular limit type.

For additional information about creating the ULDB, see the genlimits(1M) man page.
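
The multiplier arithmetic can be sketched as follows. This is only an illustration of the table above, not the genlimits parser: the function name and the use of None to represent “unlimited” are assumptions made for the example, and the exact input syntax that genlimits accepts is defined by the genlimits(1M) man page.

```python
# Multipliers from the table above. The letters are case-sensitive:
# "m" is mega (2**20), while "M" is minutes (60).
MULTIPLIERS = {
    "k": 2**10,
    "m": 2**20,
    "g": 2**30,
    "t": 2**40,
    "H": 3600,
    "M": 60,
}

UNLIMITED = None  # hypothetical sentinel for the "unlimited" keyword


def parse_limit_value(text):
    """Convert a directive value such as "2m" or "unlimited" to a number."""
    text = text.strip()
    if text == "unlimited":
        return UNLIMITED
    if text and text[-1] in MULTIPLIERS:
        # Split the trailing letter from the digits and apply the multiplier.
        return int(text[:-1]) * MULTIPLIERS[text[-1]]
    return int(text)


print(parse_limit_value("4m"))   # 4194304 bytes
print(parse_limit_value("2H"))   # 7200 seconds
```

Note that "2m" and "2M" give very different results (2,097,152 and 120), so the case of the letter matters when writing time and memory limits.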

Domain Directives

Each limit domain that is referenced in the ULDB must first be identified using the "domain" directive. The directive provides the ASCII domain name and a list of the default limit values for the domain. An example domain directive follows:

 domain domain_name {
      limit_name = value
      limit_name:machname = value
      ...
   }

Certain domain names are reserved for user job limits. Other domain names may be created and used for special purposes. The following list contains reserved domain names:

Reserved Domain Name    Description
--------------------    -----------------------------------------------------------
interactive             Used by interactive job initiators such as telnet and login
batch                   A generic batch domain used as secondary choice for all workload management software
miser                   The domain used when submitting work to Miser
nqe                     The domain used when submitting work to NQE
lsf                     The domain used when submitting work to LSF

User Directives

The "user" directive specifies a set of limits for an individual user. The user name must identify a valid login account. The uid value is optional. If uid is specified, the genlimits command verifies that the uid provided matches the uid defined for the user on the machine where genlimits executes. Domain clauses identify each domain for which the user will have unique limit values. The domain listed in the user directive must already be defined in a prior domain directive. The syntax and semantics of the domain clause are the same as those of the domain directive. It is not necessary to provide user directives for all users on the system. If there is no user directive for a queried user or there are no values for a queried domain, the default values for that domain are returned. An example user directive follows:

user user_name[:uid] {
      domain_name {
         limit_name = value
         limit_name:machname = value
         ...
      }
      domain_name {
         ...
      }
      ...
   }

The limit specifications for both the domain and user directives may include an optional machine name. Limit values specified with a machine name apply only to that machine. Limits without a machine name apply to all machines in the cluster. The directives input file can contain several occurrences of the same limit, each with a different machine name, as well as an occurrence without a machine name specified.

The genlimits command processes limit values with associated machine names differently depending on the type of database (see “Creating the User Limits Database”) being generated:

  • If the -m option is used to generate a NIS master map, limit values with associated machine names are ignored. Only clusterwide values without machine names are included in the database.

  • If the -l option is used to generate a local database, the genlimits command selects the limit value with the name of the local machine if present. If there is no limit value with the local machine name, the genlimits command selects the clusterwide value with no machine name. You can determine the local machine name by running the uname -n command. For additional information on the uname command, see the uname(1) man page.
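
The selection rules in the two bullets above can be sketched as follows. This is a model of the described behavior, not the genlimits implementation; the function name, the dictionary layout, and the host names are hypothetical.

```python
def select_limit_value(entries, local_machine=None):
    """Pick the value that ends up in the database for one limit.

    entries maps a machine name to a value; the key None holds the
    clusterwide value (the occurrence written without a machine name).

    local_machine=None models the -m option (NIS master map), where
    values tied to a machine name are ignored. A machine name models
    the -l option, where the local machine's value is preferred and
    the clusterwide value is the fallback.
    """
    if local_machine is not None and local_machine in entries:
        return entries[local_machine]
    return entries.get(None)


# One limit written twice in the input file: once clusterwide,
# once for the (hypothetical) machine "hosta".
cpu_max = {None: 120, "hosta": 600}

print(select_limit_value(cpu_max, "hosta"))  # 600  (-l on hosta)
print(select_limit_value(cpu_max, "hostb"))  # 120  (-l on another machine)
print(select_limit_value(cpu_max))           # 120  (-m, NIS master map)
```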

Setting Up a User Limits Directive Input File Example

Because the ULDB is completely rebuilt whenever the genlimits command is invoked, the input directive file must contain a complete representation of the database. When changes are needed, the system administrator must edit the user limits directives input file and then rebuild the database. Because domain defaults are used if there is no user entry for a particular user, the administrator only needs to provide user entries for the named users whose limits override the default values. The following example shows a user limits directives input file that specifies three limit types, two domains, and one user with individual limits. The ULDB only stores the limit values. The meaning of a value and the units it expresses are up to the application that uses the limit.


Note: If you are updating entries in the ULDB and they do not change the job limit values on your system, make sure that limit names used in the ULDB and limit names used in the systune joblimits group are exactly the same. For additional information, see “Troubleshooting Job Limits”.


domain interactive {              # domain for interactive logins
   jlimit_cpu_cur = 60
   jlimit_cpu_max = 120           # limit interactive jobs to 120 CPU seconds
   jlimit_vmem_cur = 2m
   jlimit_vmem_max = 4m           # limit interactive jobs to 4 megabytes of virtual memory
   jlimit_numproc_cur = 10
   jlimit_numproc_max = 20        # limit interactive jobs to 20 concurrent processes
}
domain batch {                    # domain for batch submissions
   jlimit_cpu_cur = 3600
   jlimit_cpu_max = 7200          # limit batch jobs to two hours of CPU time
   jlimit_vmem_cur = 128m
   jlimit_vmem_max = 256m         # limit batch jobs to 256 megabytes of memory
   jlimit_numproc_cur = unlimited
   jlimit_numproc_max = unlimited # no limit on processes in a batch job
}

user fred:123 {                   # User "fred" gets his own interactive CPU limits
   interactive  {                 #
      jlimit_cpu_cur = 300
      jlimit_cpu_max = 600        # "fred" needs to run longer jobs in interactive mode
    }
}
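
The effect of this file on a lookup can be sketched as follows. The data structures and the function name are hypothetical, and the sketch assumes that a limit not named in a user entry falls back to the domain default for that limit; it is an illustration of the override rule, not the ULDB library API.

```python
MEGA = 2**20
UNLIMITED = None  # hypothetical sentinel for "unlimited"

# Domain defaults, copied from the example file above.
DOMAIN_DEFAULTS = {
    "interactive": {
        "jlimit_cpu_cur": 60,
        "jlimit_cpu_max": 120,
        "jlimit_vmem_cur": 2 * MEGA,
        "jlimit_vmem_max": 4 * MEGA,
        "jlimit_numproc_cur": 10,
        "jlimit_numproc_max": 20,
    },
    "batch": {
        "jlimit_cpu_cur": 3600,
        "jlimit_cpu_max": 7200,
        "jlimit_vmem_cur": 128 * MEGA,
        "jlimit_vmem_max": 256 * MEGA,
        "jlimit_numproc_cur": UNLIMITED,
        "jlimit_numproc_max": UNLIMITED,
    },
}

# User-specific values, copied from the "user fred" directive.
USER_ENTRIES = {
    "fred": {"interactive": {"jlimit_cpu_cur": 300, "jlimit_cpu_max": 600}},
}


def lookup_limits(user, domain):
    """Return the limits for a user in a domain: defaults plus overrides."""
    limits = dict(DOMAIN_DEFAULTS[domain])
    limits.update(USER_ENTRIES.get(user, {}).get(domain, {}))
    return limits


print(lookup_limits("fred", "interactive")["jlimit_cpu_max"])  # 600
print(lookup_limits("fred", "batch")["jlimit_cpu_max"])        # 7200
```

Under these assumptions, fred gets the longer CPU limits only in the interactive domain; his batch limits, and every limit for users without an entry, come from the domain defaults.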

Using systune to Display and Set Job Limits

You can use the systune joblimits command to view and set systemwide default values for user job limits. The ULDB will override these values if it exists. The joblimits group contains the following variables:

    jlimit_cpu_cur 
    jlimit_cpu_max 
    jlimit_data_cur 
    jlimit_data_max 
    jlimit_vmem_cur 
    jlimit_vmem_max 
    jlimit_rss_cur 
    jlimit_rss_max 
    jlimit_nofile_cur 
    jlimit_nofile_max 
    jlimit_numproc_cur 
    jlimit_numproc_max
    jlimit_pmem_cur
    jlimit_pmem_max

Output from the systune joblimits command follows:

$ systune joblimits
group: joblimits (statically changeable)
        jlimit_numproc_max = 1024 (0x400) ll
        jlimit_numproc_cur = 1024 (0x400) ll
        jlimit_nofile_max = 5000 (0x1388) ll
        jlimit_nofile_cur = 400 (0x190) ll
        jlimit_rss_max = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_rss_cur = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_vmem_max = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_vmem_cur = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_data_max = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_data_cur = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_cpu_max = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_cpu_cur = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_pmem_max = 9223372036854775807 (0x7fffffffffffffff) ll
        jlimit_pmem_cur = 9223372036854775807 (0x7fffffffffffffff) ll

The display information is described below:

  • jlimit_numproc - Number of processes limit

  • jlimit_nofile - Number of files limit

  • jlimit_rss - Resident set size, default is in bytes

  • jlimit_vmem - Virtual memory limit, default is in bytes

  • jlimit_data - Data size, default is in bytes

  • jlimit_cpu - CPU time, default in seconds.

  • jlimit_pmem - Maximum resident set size for all processes in a job, default in bytes

For additional information, see the systune(1M) and jlimit(1) man pages.

User Commands for Viewing and Setting Job Limits

This section describes the following user commands which can be used to view and set job limits:

showlimits

The command to view limit information from the ULDB is as follows:

showlimits [-D] [-d] [-u user_name] [domain_name]

The showlimits command displays limits information from the user limits database (ULDB).

The showlimits command accepts the following options:

-D 

Displays the names of all the domains defined in the ULDB. When you specify the -D option, the domain name and other options are ignored.

-d 

Displays the domain default limits. When no domain name is specified, the showlimits command displays the default limits for all domains.

-u user_name 

Displays the limits values for the specified user rather than the current user.

domain_name 

Displays the limits values for the specified domain rather than all domains.

If no options are specified, the showlimits command displays the current limits information for the current user for all domains as shown below:

% showlimits

Domain interactive:
        jlimit_cpu_cur: unlimited
        jlimit_cpu_max: unlimited
        jlimit_data_cur: unlimited
        jlimit_data_max: unlimited
        jlimit_nofile_cur: 400
        jlimit_nofile_max: unlimited
        jlimit_vmem_cur: unlimited
        jlimit_vmem_max: unlimited
        jlimit_rss_cur: unlimited
        jlimit_rss_max: unlimited
        jlimit_pthread_cur: 2k
        jlimit_pthread_max: 65535
        jlimit_numproc_cur: 1k
        jlimit_numproc_max: 65535
        rlimit_cpu_cur: unlimited
        rlimit_cpu_max: unlimited
        rlimit_fsize_cur: unlimited
        rlimit_fsize_max: unlimited
        rlimit_data_max: unlimited
        rlimit_stack_cur: 64m
        rlimit_stack_max: unlimited
        rlimit_core_cur: unlimited
        rlimit_core_max: unlimited
        rlimit_nofile_cur: 200
        rlimit_nofile_max: unlimited
        rlimit_vmem_max: unlimited
        rlimit_rss_max: unlimited
       
Domain batch:
        jlimit_cpu_cur: unlimited
        jlimit_cpu_max: unlimited
        jlimit_data_cur: unlimited
        jlimit_data_max: unlimited
        jlimit_nofile_cur: 400
        jlimit_nofile_max: unlimited
        jlimit_vmem_cur: unlimited
        jlimit_vmem_max: unlimited
        jlimit_rss_cur: unlimited
        jlimit_rss_max: unlimited
        jlimit_pthread_cur: 2k
        jlimit_pthread_max: 65535
        jlimit_numproc_cur: 1k
        jlimit_numproc_max: 65535
        rlimit_cpu_cur: unlimited
        rlimit_cpu_max: unlimited
        rlimit_fsize_cur: unlimited
        rlimit_fsize_max: unlimited
        rlimit_data_max: unlimited
        rlimit_stack_cur: 64m
        rlimit_stack_max: unlimited
        rlimit_core_cur: unlimited
        rlimit_core_max: unlimited
        rlimit_nofile_cur: 200
        rlimit_nofile_max: unlimited
        rlimit_vmem_max: unlimited
        rlimit_rss_max: unlimited


Note: If the ULDB has changed since the user logged in, the limits displayed are not the ones in effect for that user's existing login session. The displayed limits take effect for users who log in after the change.


For a description of the job limit values, see Table 2-1. For a description of the process limit values, see Table 1-1.

For additional information, see the showlimits(1) man page.

jlimit

The command to display and set job limits is as follows:

jlimit [-j job_id] [-h] [limit_name [value]]

The jlimit command displays and changes limits on job resource usage. The current and maximum (hard) limits are set when a job starts from values that are contained in the user limits database (ULDB) information for the user. You can raise and lower your current limits, provided that they do not exceed your maximum limit. You can irrevocably lower your maximum limit. You must have the CAP_PROC_MGT capability to raise your maximum limit. Limit enforcement always occurs at the current limit regardless of your maximum limit value. See the capability(4) and capabilities(4) man pages for additional information on the capability mechanism that provides fine-grained control over the privileges of a process.
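
The rules in the preceding paragraph can be sketched as a small check. The function and parameter names are hypothetical; the sketch covers only changes to your own job and says nothing about how the kernel validates a request internally. Changing another user's job also requires the capabilities described under the -j option.

```python
import math

UNLIMITED = math.inf  # hypothetical stand-in for "unlimited"


def change_allowed(new_value, maximum, hard=False, has_cap_proc_mgt=False):
    """Return True if a requested limit change fits the stated rules.

    hard=False: the current limit may move up or down, but not above
                the maximum limit.
    hard=True:  the maximum limit may always be lowered; raising it
                requires the CAP_PROC_MGT capability.
    """
    if hard:
        return new_value <= maximum or has_cap_proc_mgt
    return new_value <= maximum


# With a maximum CPU limit of 600 seconds:
print(change_allowed(300, 600))                                    # True
print(change_allowed(700, 600))                                    # False
print(change_allowed(700, 600, hard=True))                         # False
print(change_allowed(700, 600, hard=True, has_cap_proc_mgt=True))  # True
```

Because lowering the maximum is irrevocable without CAP_PROC_MGT, a user who lowers it to 300 in this model can no longer raise the current limit above 300 for that job.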

The jlimit command accepts the following options:

-j job_id 

Specifies a particular job ID for a job where limits are going to be changed. You must have the CAP_MAC_WRITE and CAP_PROC_MGT capabilities to change job limits for jobs that belong to other users. The job ID is printed out in hexadecimal. When the job ID is specified, the "0x" prefix is optional.

-h  

Specifies that the maximum (hard) limit values for a job are displayed or modified. If you do not specify the -h option, the jlimit command displays or modifies current limit values.

limit_name [value]  

Displays or sets the value for the specified limit:

  • If no limit name is specified, jlimit displays the values for all limits.

  • If the limit name is specified without a value, jlimit displays the value for the limit.

  • If both a limit name and a value are specified, jlimit sets the appropriate value for the limit.

If the -j option with a job_id argument is specified, the jlimit command prints out the following information:

% jlimit -j 0x14
cputime: unlimited
datasize: unlimited
files: unlimited
vmemory: unlimited
ressetsize: unlimited
processes: 65535

For an explanation of the limit values, see Table 2-1.

For additional information, see the jlimit(1) man page.

jstat

The command to display job status information for active jobs is as follows:

jstat [-a] [-l] [-p]
jstat [-j job_id] [-l] [-p]

The jstat command accepts the following options:

-a 

Displays information about all jobs.

-j job_id 

Displays information only for the specified job ID (job_id).

-l 

Displays limit information about the current or specified job including the current usage, current limit, and maximum limit.

-p 

Displays information about each process that belongs to the current or specified job including the process ID, state, and executing command.

-P 

Displays the memory limits information in pages rather than in bytes. This option is used with the -l option.

If neither the -a option nor the -j job_id option is used, the jstat command displays information on the current job.

If the -l option is specified, the jstat command prints out the current usage, high usage, current limit, and maximum limit information for the current job as shown below:

% jstat -l

JID             OWNER          COMMAND       
--------------- -------------- --------------
0x5eac0000001bd terry            -csh          

LIMIT NAME      USAGE          HIGH USAGE     CURRENT LIMIT  MAX LIMIT     
--------------- -------------- -------------- -------------- --------------
cputime         1:05           1:05           unlimited      unlimited     
datasize        400k           400k           unlimited      unlimited     
files           10             35             400            5000          
vmemory         44             201            unlimited      unlimited     
ressetsize      340            357            unlimited      unlimited     
processes       2              4              1024           1024  

If the -l and -P options are specified, the jstat command prints out the same information that the -l option displays, except that memory values are shown in pages. SGI systems support multiple page sizes. For more information on page sizes, see the "Multiple Page Sizes" section in Chapter 10, "System Performance Tuning", of the IRIX Admin: System Configuration and Operation manual.

Summary information is always printed. For an explanation of the limit values, see Table 2-1.

For additional information, see the jstat(1) man page.

Job Limits and Existing IRIX software

The ps -j command prints out the process ID, process group ID, session ID, and job ID in hexadecimal:

% ps -j
        PID       PGID        SID        JID TTY          TIME CMD
     253430     253430     253430     0x5eac001bd ttyq12  0:00 csh 
     254563     254563     253430     0x5eac001bd ttyq12  0:00 ps 

For additional information, see the ps(1) man page.
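Because the JID column is hexadecimal, a script that groups processes by job needs to convert it. The following Python sketch assumes the ps -j column headings shown above; the function name job_ids is hypothetical.

```python
SAMPLE = """\
        PID       PGID        SID        JID TTY          TIME CMD
     253430     253430     253430     0x5eac001bd ttyq12  0:00 csh
     254563     254563     253430     0x5eac001bd ttyq12  0:00 ps
"""


def job_ids(ps_output):
    """Return {pid: jid} from `ps -j` output, with each JID as an integer."""
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    # Locate the columns by name rather than by fixed position.
    pid_col = header.index("PID")
    jid_col = header.index("JID")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        # int(..., 16) accepts the leading "0x" printed by ps.
        result[int(fields[pid_col])] = int(fields[jid_col], 16)
    return result


if __name__ == "__main__":
    for pid, jid in job_ids(SAMPLE).items():
        print(pid, hex(jid))
```

In the sample, both processes map to the same job ID, which is what you would expect for a shell and a command started from it in one login session.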

The array services daemon, arrayd(1M), propagates the job ID from the originating machine to any other machines when starting new processes for the job on other machines in a cluster.

For additional information, see the arrayd(1M) man page.

The cpr(1) command allows you to include job information in the system restart statefile. A JID checkpoint type has been added to the cpr -p option. This JID type allows you to checkpoint and restart an entire job. For example:

% cpr -c ckpt02 -p 0x8000000000001234:JID

This example checkpoints all the processes contained within a job with the job ID 0x8000000000001234 to the statefile directory ./ckpt02.

For additional information, see the cpr(1) man page.
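If a script already holds a job ID as an integer, it can build the same cpr command line as the example above. The following Python sketch only constructs the argument list and does not run cpr; the function name cpr_checkpoint_args is hypothetical, and only the -c and -p options shown in the example are used.

```python
def cpr_checkpoint_args(statefile, job_id):
    """Build the argument list for checkpointing a whole job with cpr.

    The job ID is formatted in hexadecimal with a leading "0x" and
    tagged with the JID checkpoint type, as in the example above.
    """
    return ["cpr", "-c", statefile, "-p", f"{job_id:#x}:JID"]


if __name__ == "__main__":
    print(" ".join(cpr_checkpoint_args("ckpt02", 0x8000000000001234)))
```

Returning a list rather than a single string lets the caller pass it to a process-launching function without shell quoting concerns.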

If job limits software is installed on your system and you want jobs started through the remote shell server (rshd(1M)) and the remote execution server (rexecd(1M)) to recognize the SIGXCPU signal, you must set the SVR4_SIGNALS parameter to NO in the /etc/default/rshd and /etc/default/rexecd files, respectively.

For additional information, see the rshd(1M) and rexecd(1M) man pages.

Running Job Limits with Message Passing Interface (MPI) Jobs

Message Passing Interface (MPI) jobs require a large number of file descriptors. By default, a job's current limit for files is set to 400, as shown by the jstat command with the -l option:

% jstat -l

JID                OWNER          COMMAND       
------------------ -------------- --------------
0x23fc000000000035 user           -csh          

LIMIT NAME         USAGE          HIGH USAGE     CURRENT LIMIT  MAX LIMIT     
------------------ -------------- -------------- -------------- --------------
cputime            0              0              unlimited      unlimited     
datasize           80k            208k           unlimited      unlimited     
files              8              28             400            5000
vmemory            2384k          9824k          unlimited      unlimited     
ressetsize         608k           2320k          unlimited      unlimited     
threads            1              1              2048           2048          
processes          2              6              1024           1024          
physmem            608k           2320k          unlimited      unlimited     

If you run MPI jobs on systems with 16 or more CPUs, the default current limit of 400 for files is easily reached. MPI jobs then fail with an error message similar to the following:

MPI: fork_slaves/fork: Resource temporarily unavailable
MPI: daemon terminated: mice1 - job aborting

To avoid this error, raise the default current limit for files when you are running MPI jobs. For information on setting system job limits, see “User Limits Database” and “Using systune to Display and Set Job Limits”.

The following table contains the recommended default current limit for the files limit when you are running large MPI jobs depending upon the number of CPUs in your system. The recommended settings are approximate values.

Number of CPUs  Default Current Limit or Higher
--------------- -------------------------------
16              351
17              380
18              410
20              472
25              648
30              848
50              4448
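The table lists only some CPU counts. One way to use it from a script is to round up to the next listed row, as in the following Python sketch. The rounding rule and the function name recommended_files_limit are assumptions made for illustration; the table itself does not say how to treat CPU counts that fall between rows or above 50.

```python
import bisect

# (number of CPUs, recommended default current limit for files),
# copied from the table above.
RECOMMENDED_FILES_LIMIT = [
    (16, 351),
    (17, 380),
    (18, 410),
    (20, 472),
    (25, 648),
    (30, 848),
    (50, 4448),
]


def recommended_files_limit(ncpus):
    """Return the value from the first table row with at least `ncpus` CPUs.

    Rounding up to the next row is an assumption; it errs toward a
    higher limit. CPU counts beyond the table raise ValueError because
    the source gives no value to extrapolate from.
    """
    cpu_counts = [cpus for cpus, _ in RECOMMENDED_FILES_LIMIT]
    index = bisect.bisect_left(cpu_counts, ncpus)
    if index == len(cpu_counts):
        raise ValueError(f"no recommendation listed for {ncpus} CPUs")
    return RECOMMENDED_FILES_LIMIT[index][1]


if __name__ == "__main__":
    for n in (16, 19, 50):
        print(n, recommended_files_limit(n))
```

Because the recommended settings are approximate, treat the result as a starting point rather than an exact requirement.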

Installing Job Limits

Use the inst(1M) software installation tool or the swmgr(1M) software management tool to install kernel job limits software. For more information on inst(1M) and swmgr(1M), see IRIX Admin: Software Installation and Licensing in the IRIX Admin manual set and their respective man pages.

To install the kernel job limits software on IRIX systems, install this subsystem: eoe.sw.jlimits.

Once the job limits software is installed, run the autoconfig(1M) command and reboot the system.

Job limits software is only available in the IRIX feature stream.

To turn off job limits, you must deinstall the eoe.sw.jlimits software module and then reboot the system.

Troubleshooting Job Limits

If you update entries in the ULDB and the job limit values on your system do not change, make sure that the limit names used in the ULDB and the limit names used in the systune joblimits group are exactly the same. The ULDB cannot determine which job limit variables are valid and which are not. If the symbolic names in the ULDB are entered incorrectly, values from the systune joblimits group are applied instead. For information on limit names, see Table 2-1.
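A quick way to spot such mismatches is to compare the two sets of names and flag any ULDB name that has no exact counterpart. The following Python sketch shows the idea under stated assumptions: it takes the names as plain lists (how you extract them from the ULDB source file and from systune output is not shown), and the names in the usage example are stand-ins, not the real limit names from Table 2-1.

```python
import difflib


def check_limit_names(uldb_names, systune_names):
    """Return {bad ULDB name: [closest systune name]} for inexact matches.

    The comparison is exact and case-sensitive, mirroring the rule that
    the names must be exactly the same. The suggestion list is empty
    when nothing similar is found.
    """
    known = set(systune_names)
    problems = {}
    for name in uldb_names:
        if name not in known:
            problems[name] = difflib.get_close_matches(name, systune_names, n=1)
    return problems


if __name__ == "__main__":
    # Stand-in names for illustration only; see Table 2-1 for real ones.
    systune = ["cputime", "files", "vmemory"]
    uldb = ["files", "vmemroy"]
    for bad, suggestion in check_limit_names(uldb, systune).items():
        print(f"{bad}: no exact match, closest is {suggestion}")
```

A name reported here is one for which the systune value would be applied instead of the ULDB value.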

Job Limits Man Pages

The man command provides online help on all resource management commands. To view a man page online, type man commandname.

User-Level Man Pages

The following user-level man pages are provided with job limits software:

User-level man page  Description
-------------------- ----------------------------------------------------------
jlimit(1)            Displays and sets resource limits
jstat(1)             Displays job status information
showlimits(1)        Displays limits information from the user limits database

Administrator Man Pages

The following administrator man page is provided with job limits software:

Administrator man page  Description
----------------------- ---------------------------------
genlimits(1M)           Creates the user limits database

Application Interface Man Pages

The following online man pages are provided with job limits software to help those who develop applications that use job limits software:

Application interface man page  Description
------------------------------- -----------------------------------------------
getjid(2)                       Gets the job ID
getjlimit(2)                    Controls a job's maximum system resource consumption
getjusage(2)                    Gets job usage information
killjob(2)                      Terminates all processes for the specified job
jlimit_startjob(3c)             Creates a new job
makenewjob(2)                   Creates a new job container
setjusage(2)                    Updates the resource usage values for the specified job ID
setwaitjobpid(2)                Sets a job to wait for a specified process ID (PID) to call the waitjob(2) function
waitjob(2)                      Obtains information about a terminated job
uldb_get_limit_values(3c)       Collection of functions that interact with the user limits database (ULDB) to retrieve or set limit values for a domain or user

Error Messages

The following error messages related to job limits can be returned:

EBUSY 

The requested job ID value is in use.

EINVAL 

Invalid parameters encountered.

ENOATTR 

The domain name or namelist is not specified.

ENOEXIST 

The jlimits file does not exist.

ENOJOB 

A job with the specified job ID cannot be found.

ENOMEM 

Sufficient memory is not available.

ENOPKG 

The job limits software is not installed.