Appendix C. Autotasking Directives (Outmoded)

If your system includes multiple central processing units (CPUs), your program may be able to use multitasking, that is, to run simultaneously on more than one CPU. Multitasking can speed up a program by reducing its elapsed (wall-clock) time. To determine the number of CPUs on your system, enter the hinv(1) command.

The compiler automatically recognizes many parallel coding constructs, and it compiles them for multitasking without requiring additional user input; this capability is called Autotasking.

Autotasking directives let you specify the level of parallelism desired. You can start and end parallel processing at any number of suitable points within a subprogram. These directives are useful when the compiler fails to recognize parallelism that you know exists. This can occur, for example, when you have subroutine calls that can be executed in parallel.


Note: The directives in this section are outmoded, but they are still supported for older codes that require this functionality. Silicon Graphics encourages you to write new codes using the OpenMP directives described in Chapter 4, “OpenMP Fortran API Multiprocessing Directives”.

This section provides an overview of the Autotasking directives recognized by the compiler.


Caution: Using Autotasking directives in a subprogram that accesses a variable through host association can result in undefined behavior. This applies only to Autotasking directives; it does not apply to parallelism detected by the compiler.

A branch out of a parallel region is not permitted and can produce incorrect results.

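Autotasking directives control the way the compiler multitasks your program. You can insert tasking directive lines directly into your source code. The compiler supports the following Autotasking directives:

  • CASE and ENDCASE, which delimit concurrent code blocks

  • CNCALL, which declares that subroutines called from a loop have no loop-related side effects

  • DOALL, which marks a parallel loop

  • DOPARALLEL and ENDDO, which mark a parallel loop within a parallel region

  • GUARD and ENDGUARD, which delimit a critical region

  • NUMCPUS, which specifies the maximum number of CPUs for a parallel region

  • PARALLEL and ENDPARALLEL, which mark a parallel region

  • PERMUTATION, which declares that an integer array has no repeated values

The following sections describe the Autotasking directives.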

Using Directives

The following sections describe how to use the CF90 Autotasking directives and the effects they have on programs.

For additional general information on using directives, see “Using Directives” in Chapter 3.

Directive Continuation

In the following example, an asterisk (*) appears in column 6 to indicate that the second line is a continuation of the preceding line:

!MIC$ GU
!MIC$*ARD

If you want to specify more than one directive on a line, separate each directive with a comma. Some directives require that you specify one or more arguments; when specifying a directive of this type, no other directive can appear on the line.

Spaces can precede, follow, or be embedded within a directive, regardless of source form.

Do not use source preprocessor (#) directives within multiline compiler directives (CMIC$ or !MIC$).

Directive Range and Placement

The range and placement of directives are as follows:

  • The Autotasking directives must appear within a program unit.

  • The ENDDO directive is optional. If it appears, it must follow the loop body of a DOPARALLEL loop, and the corresponding DOPARALLEL directive must be present.

  • The following directives apply only to the next loop encountered lexically:

    • DOALL

    • DOPARALLEL

  • The following Autotasking directives must appear as pairs within a program unit:

    • CASE, ENDCASE

    • GUARD, ENDGUARD

    • PARALLEL, ENDPARALLEL

Interaction of Directives with the -x Command Line Option

The -x option on the f90(1) command accepts one or more directives as arguments. When your input is compiled, the compiler ignores the directives named as arguments to the -x option. If you specify -x mipspro, all directives are ignored. If you specify -x dirname, only the directive named dirname is ignored. For more information on this command line option, see “-xdirlist” in Chapter 2.

Concurrent Blocks: CASE and ENDCASE

The !MIC$ CASE directive serves as a separator between adjacent code blocks that can be executed concurrently. It marks the beginning of a control structure and signals that the code following it will be executed on a single processor.

!MIC$ ENDCASE serves as the terminator for a group of one or more parallel CASE directives. All work within the control structure must complete before execution continues with the code below the ENDCASE. The compiler does not automatically generate CASE directives.

The formats for these directives are as follows:

!MIC$ CASE
!MIC$ ENDCASE

Example. A single CASE/ENDCASE directive pair can also be used within a parallel region to allow only one processor to execute a code block, as follows:

!MIC$ PARALLEL
!MIC$ CASE
      CALL XYZ
!MIC$ ENDCASE
       :
!MIC$ DOPARALLEL
      DO I = 1, IMAX
       :
      END DO
!MIC$ ENDPARALLEL

In the preceding code, only one processor calls XYZ, and then all available processors execute the code following the ENDCASE.
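When several CASE directives appear before one ENDCASE, each block between them is a candidate for concurrent execution. The following is a minimal sketch under that reading; the subroutine name TWOJOB and its arguments are hypothetical and are not taken from the compiler documentation.

```fortran
! Sketch: two independent reductions written as concurrent CASE
! blocks. TWOJOB and its arguments are hypothetical names.
      SUBROUTINE TWOJOB(N, A, B, ASUM, BMAX)
      IMPLICIT NONE
      INTEGER N
      REAL A(N), B(N), ASUM, BMAX
!MIC$ PARALLEL SHARED(N,A,B,ASUM,BMAX)
!MIC$ CASE
! First block: executed by a single processor.
      ASUM = SUM(A)
!MIC$ CASE
! Second block: may run on another processor at the same time.
      BMAX = MAXVAL(B)
!MIC$ ENDCASE
! Both blocks are complete before execution continues here.
!MIC$ ENDPARALLEL
      END
```

The two blocks write different shared variables and only read A and B, so they do not depend on each other. A compiler that does not recognize the directives treats them as comments and runs the blocks one after the other.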

Declare Lack of Side Effects: CNCALL

The !MIC$ CNCALL directive allows a loop to be Autotasked by asserting that subroutines called from the loop have no loop-related side effects (that is, they do not modify data referenced in other iterations of the loop) and therefore can be called concurrently by separate iterations of the loop. CNCALL is inserted immediately preceding the loop.

The format for this directive is as follows:

!MIC$ CNCALL

Example:

!MIC$ CNCALL
      DO I = 1, N
        CALL CRUNCH(A(I), B(I))
      END DO
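For the CNCALL assertion in the preceding example to hold, CRUNCH must not touch data that another iteration uses. The body below is a hypothetical illustration of such a routine, not the actual CRUNCH; it assumes that each call receives distinct array elements, as in the loop above.

```fortran
! Hypothetical CRUNCH: it reads and writes only its two arguments,
! so calls made by different loop iterations are independent.
      SUBROUTINE CRUNCH(X, Y)
      IMPLICIT NONE
      REAL X, Y
      Y = X*X + Y
      X = SQRT(ABS(Y))
      END
```

By contrast, a version that updated a COMMON block counter or a SAVE variable would have a loop-related side effect, and CNCALL should not be asserted for it.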

Mark Parallel Loop: DOALL

The !MIC$ DOALL directive indicates that the DO loop beginning on the next line may be executed in parallel by multiple processors. No directive is needed to end a DOALL loop (that is, the DOALL initiates a parallel region that contains only a DO loop with independent iterations). The loop index variable for a DOALL must be specified as a PRIVATE variable.

For a !MIC$ DOALL directive, all the variables and arrays in the region must be defined in a SHARED or PRIVATE parameter.

The format of this directive is as follows:

!MIC$ DOALL parameter[[,]parameter] ... [[,]work_distribution]

parameter

Table C-1 describes the parameters for the DOALL directive. More than one parameter can appear on the directive, but they must be separated by commas or blanks.

work_distribution

Specifies the work distribution policy for iterations of the parallel DO loop. Only one work distribution parameter can be used for a given DO loop.

By default, iterations are distributed one at a time. Table C-2 describes the work distribution parameters.

The default scheduling for a DOALL directive is STATIC, with a default chunk size of CEILING(n/p), where n is the loop trip count and p is the number of processors.

The DOALL directive does not accept the MAXCPUS or AUTOSCOPE clauses; their presence generates a fatal error.

Table C-1. Autotasking directive parameter values

parameter

Description

IF(expr)

Performs a run-time test to choose between uniprocessing and multiprocessing. When not specified, multiprocessing is chosen if the loop is not in a routine that was called from within a parallel region. The logical expression (expr) determines (at run time) whether multiprocessing will occur. When expr is true, multiprocessing is enabled.

PRIVATE(var[,var] ...)

Specifies that the variables listed will have private scope; that is, each task (original or helper) will have its own private copy of these variables. The PRIVATE clause identifies those variables that are not shared between parallel processes. A variable cannot be declared both PRIVATE and SHARED. The loop control variable of the DOALL loop cannot be specified as SHARED; it must be specified as PRIVATE. Listed variables cannot be subobjects (that is, array elements or components of derived types).

SAVELAST

Specifies that the values of private variables from the final iteration of a DOALL loop are retained in the original task after the loop completes. By default, private variables are not guaranteed to retain their last-iteration values. SAVELAST can be used only with DOALL. If the full iteration set is not completed (for example, if the loop is exited early), the values of private variables are indeterminate.

SHARED(var[,var] ...)

Specifies that the variables listed will have shared scope; that is, they are accessible to both the original task and all helper tasks. The SHARED clause identifies those variables that are shared between parallel processes. A variable cannot be declared both PRIVATE and SHARED. The loop control variable of the DOALL loop cannot be specified as SHARED; it must be specified as PRIVATE. Listed variables cannot be subobjects (that is, array elements or components of derived types).
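As an illustration of the SAVELAST parameter in Table C-1, the following minimal sketch reads a private variable after the loop. The function name RLAST and its arguments are hypothetical, and the sketch assumes N is at least 1 so that the final iteration actually runs.

```fortran
! Sketch: RLAST and its arguments are hypothetical names.
! Assumes N .GE. 1.
      REAL FUNCTION RLAST(N, A)
      IMPLICIT NONE
      INTEGER N, I
      REAL A(N), T
      T = 0.0
!MIC$ DOALL PRIVATE(I,T) SHARED(N,A) SAVELAST
      DO I = 1, N
         T = 2.0*A(I)
         A(I) = T
      END DO
! With SAVELAST, T holds the value from iteration I = N here.
      RLAST = T
      END
```

Without SAVELAST, the value of T after the loop would not be guaranteed, because T is private to each task.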


Table C-2. Autotasking directive work_distribution values

work_distribution

Description

CHUNKSIZE(n)

Specifies the number of iterations to distribute to an available processor. n is an integer expression. For best performance, n should be an integer constant. For example, given 100 iterations and CHUNKSIZE(4), 4 iterations at a time are distributed to each available processor until the 100 iterations are complete.

By default, n is the number of loop iterations divided by the number of processors.

GUIDED[(vl)]

Specifies the use of guided self-scheduling to distribute the iterations to available processors. This mechanism minimizes synchronization overhead while providing acceptable dynamic load balancing.

The vl argument is the vector length. vl must be of type integer and can be either a constant or a variable.

The default vl is 1.
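The parameters in Table C-1 and the work distribution values in Table C-2 can be combined on one DOALL directive. The following is a minimal sketch; the subroutine name SCALEV, its arguments, the threshold of 1000 iterations, and the chunk size of 64 are all hypothetical choices made for illustration.

```fortran
! Sketch: SCALEV, its arguments, and the constants 1000 and 64 are
! hypothetical choices.
      SUBROUTINE SCALEV(N, S, X, Y)
      IMPLICIT NONE
      INTEGER N, I
      REAL S, X(N), Y(N)
! Multiprocess only when the loop is long enough to repay the
! overhead, and hand out 64 iterations at a time.
!MIC$ DOALL IF(N .GT. 1000) PRIVATE(I) SHARED(N,S,X,Y)
!MIC$*      CHUNKSIZE(64)
      DO I = 1, N
         Y(I) = S*X(I) + Y(I)
      END DO
      END
```

Replacing CHUNKSIZE(64) with GUIDED would select guided self-scheduling instead; only one of the two can appear on the directive.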


Mark Parallel Loop: DOPARALLEL and ENDDO

The !MIC$ DOPARALLEL directive indicates that the DO loop beginning on the next line may be executed in parallel by multiple processors. No directive is needed to end a DOPARALLEL loop.

The !MIC$ ENDDO directive extends a control structure beyond the DO loop. Without a !MIC$ ENDDO directive, all CPUs synchronize immediately after the loop, so that no processors can continue executing until all of the iterations are done. A !MIC$ ENDDO directive moves this point of synchronization from the end of the loop to the line of the !MIC$ ENDDO directive.

This lets the compiler use parallelism in loops containing some forms of reduction computations. These directives can be used only within a parallel region bounded by the PARALLEL and ENDPARALLEL directives.

All variables and arrays in a parallel region must be declared as PRIVATE or SHARED.

The formats for these directives are as follows:

!MIC$ DOPARALLEL [work_distribution]
!MIC$ ENDDO

The work_distribution arguments are described in Table C-2. Only one work_distribution can be used for a given DO loop.

In the following example, a parallel region is defined by PARALLEL and ENDPARALLEL. A reduction computation is implemented by a DOPARALLEL/ENDDO pair, which ensures that all contributions to SUM and BIG are included, and GUARD/ENDGUARD, which protects the updating of shared variables SUM and BIG.

      SUM = 0.0
      BIG = -1.0
!MIC$ PARALLEL PRIVATE(XSUM,XBIG,I)
!MIC$*         SHARED(SUM,BIG,AA,BB,CC)
      XSUM = 0.0
      XBIG = -1.0
!MIC$ DOPARALLEL
      DO I = 1, 2000
         :
         XSUM = XSUM + (AA(I)*(BB(I)-CC(AA(I))))
         XBIG = MAX(ABS(AA(I)*BB(I)), XBIG)
         :
      END DO
!MIC$ GUARD
      SUM = SUM + XSUM
      BIG = MAX(XBIG,BIG)
!MIC$ ENDGUARD
!MIC$ ENDDO
!MIC$ ENDPARALLEL

Critical Region: GUARD and ENDGUARD

The !MIC$ GUARD and !MIC$ ENDGUARD directives delimit a critical region, providing the necessary synchronization to protect or guard the code inside the critical region. A critical region is a code block that is to be executed by only one processor at a time, although all processors that enter a parallel region will execute it.

The formats for these directives are as follows:

!MIC$ GUARD [n]
!MIC$ ENDGUARD [n]

n

Mutual exclusion flag; two regions with the same flag cannot be active concurrently. n must be of type integer and can be a variable or an expression, from which the low-order 6 bits are used. For example, GUARD 1 and GUARD 2 can be active concurrently, but two GUARD 7 directives cannot.

For optimal performance, no n should be specified. Otherwise, n should be an integer constant; a general expression can be used for the unusual case that the critical region number must be passed to a lower-level routine. When n is not provided, the critical region blocks only other instances of itself, but no other critical regions. Critical regions may appear anywhere in a program. That is, they are not limited to parallel regions.

Numbered GUARD directives are not supported; the compiler implements them as unnumbered GUARD directives. This can lead to deadlock if GUARD directives are nested.
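A typical use of an unnumbered critical region is an update to a shared array through an index that may repeat across iterations. The following is a minimal sketch; the subroutine name HISTO and its arguments are hypothetical, and every KEY(I) is assumed to lie between 1 and NBIN.

```fortran
! Sketch: HISTO and its arguments are hypothetical names. KEY(I) is
! assumed to lie in 1..NBIN and may contain repeated values.
      SUBROUTINE HISTO(N, KEY, NBIN, NCOUNT)
      IMPLICIT NONE
      INTEGER N, NBIN, I, K
      INTEGER KEY(N), NCOUNT(NBIN)
      DO I = 1, NBIN
         NCOUNT(I) = 0
      END DO
!MIC$ DOALL PRIVATE(I,K) SHARED(N,KEY,NCOUNT)
      DO I = 1, N
         K = KEY(I)
! Two iterations may select the same bin, so serialize the update.
!MIC$ GUARD
         NCOUNT(K) = NCOUNT(K) + 1
!MIC$ ENDGUARD
      END DO
      END
```

Only the update itself is inside the critical region; the read of KEY(I) stays outside so that as little work as possible is serialized.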

Specify Maximum Number of CPUs for a Parallel Region: NUMCPUS

The !MIC$ NUMCPUS directive globally indicates the maximum number of CPUs that a section of code can use effectively. It does not guarantee that this number of processors will actually be assigned. The directive remains in effect, across program units and in all subsequently called subroutines, until a subsequent NUMCPUS directive is encountered. Without this directive, CPUs are allocated based on the MP_SET_NUMTHREADS environment variable and workload.

The format for this directive is as follows:

!MIC$ NUMCPUS (ncpus)

ncpus

Globally specifies the maximum number of CPUs that a code can use effectively. ncpus must be of type integer and can be a constant, variable, or expression.

The number of CPUs specified with this directive should be equal to or less than the number of CPUs specified by the MP_SET_NUMTHREADS environment variable. If the number requested with the NUMCPUS directive is greater than the number specified by the MP_SET_NUMTHREADS environment variable, no error is issued, but the directive has no effect.
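The following minimal sketch places a NUMCPUS directive ahead of a parallel loop. The subroutine name SMOOTH, its arguments, and the limit of 4 CPUs are hypothetical; the sketch assumes MP_SET_NUMTHREADS is set to 4 or more, so that the directive can take effect.

```fortran
! Sketch: SMOOTH and its arguments are hypothetical names, and the
! limit of 4 CPUs is an arbitrary illustrative value.
      SUBROUTINE SMOOTH(N, U, V)
      IMPLICIT NONE
      INTEGER N, I
      REAL U(N), V(N)
! Request at most 4 CPUs from here on; fewer may be assigned.
!MIC$ NUMCPUS (4)
!MIC$ DOALL PRIVATE(I) SHARED(N,U,V)
      DO I = 2, N - 1
         V(I) = (U(I-1) + U(I) + U(I+1)) / 3.0
      END DO
      END
```

Because the directive stays in effect after SMOOTH returns, a caller that needs a different limit for later parallel work would have to issue another NUMCPUS directive.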

Mark Parallel Region: PARALLEL and ENDPARALLEL

The !MIC$ PARALLEL and !MIC$ ENDPARALLEL directives mark, respectively, the beginning and end of a parallel region. Parallel regions are combinations of redundant code blocks and partitioned code blocks. The formats for these directives are as follows:

!MIC$ PARALLEL [parameter[[,]parameter] ...]
!MIC$ ENDPARALLEL

The parameters are described in Table C-1.

The PARALLEL directive indicates where multiple processors enter execution. The portion of code that all processors execute until reaching a DOPARALLEL directive is called a redundant code block. Because the iterations of the DO loop within a DOPARALLEL directive are distributed across available processors, this portion of code is called the partitioned code block. The scope of a variable in a parallel region is either shared or private. Shared variables are used by all processors; private variables are unique to a processor.

When the compiler generates code for a !MIC$ PARALLEL directive, all the variables and arrays in the region must be defined in a SHARED or PRIVATE parameter.
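The distinction between redundant and partitioned code blocks can be sketched as follows. The subroutine name NORMLZ and its arguments are hypothetical, and XMAX is assumed to be nonzero.

```fortran
! Sketch: NORMLZ and its arguments are hypothetical names.
! Assumes XMAX is nonzero.
      SUBROUTINE NORMLZ(N, XMAX, A)
      IMPLICIT NONE
      INTEGER N, I
      REAL XMAX, A(N), RSCALE
!MIC$ PARALLEL PRIVATE(I,RSCALE) SHARED(N,XMAX,A)
! Redundant code block: every processor computes its own RSCALE.
      RSCALE = 1.0 / XMAX
! Partitioned code block: the loop iterations are divided among
! the processors.
!MIC$ DOPARALLEL
      DO I = 1, N
         A(I) = A(I) * RSCALE
      END DO
!MIC$ ENDPARALLEL
      END
```

RSCALE is private because each processor writes it in the redundant block and then reads it in its share of the loop; XMAX and N are only read, so they are shared.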

Declare an Array with No Repeated Values: PERMUTATION

The !MIC$ PERMUTATION directive declares that an integer array has no repeated values. This is useful when the integer array is used as a subscript for another array (vector-valued subscript). The format for this directive is as follows:

!MIC$ PERMUTATION (ia[, ia] ...)

ia

Integer array that has no repeated values for the entire routine.

When an array with a vector-valued subscript appears on both sides of the equal sign in a loop, many-to-one assignment is possible even when the subscript is identical. Many-to-one assignment occurs if any repeated elements exist in the subscripting array. If it is known that the integer array is used merely to permute the elements of the subscripted array, it can often be determined that many-to-one assignment does not exist with that array reference.

Sometimes a vector-valued subscript is used as a means of indirect addressing because the elements of interest in an array are sparsely distributed; in this case, an integer array is used to select only the desired elements, and no repeated elements exist in the integer array, as in the following example:

!MIC$ PERMUTATION(IPNT) ! IPNT has no repeated values
      ...
      DO I = 1, N
         A(IPNT(I)) = A(IPNT(I)) + B(I)
      END DO

Examples

The following examples show shared and private variables and arrays.

Read-only Variables

The following examples show read-only variables:

!MIC$ DOALL PRIVATE(I) SHARED(N1,N2,A)
      DO I = N1, N2
         ... = A
      END DO

A is a shared variable because it is a read-only variable. All processors share the same location for A.

!MIC$ DOALL SHARED(N1,N2,M1,M2,V) PRIVATE(I,J)
      DO 10 I = N1, N2
      DO 10 J = M1, M2
         ... = V(J)
   10 CONTINUE

V is shared because it is a read-only array. N1, N2, M1, and M2 are also shared because they are read-only variables. I and J are written and then read, so they are private variables.

Array Indexed by Loop Index

The following example shows an array indexed by the loop index:

!MIC$ DOALL SHARED(N1,N2,V,U,J) PRIVATE(I,T)
      DO I = N1, N2
        T = V(I)
        U(I,J) = T
      END DO

U and V are shared arrays because they are indexed by the loop index. All processors share the same location for V and U. T is written and then read, so it is a private variable. J is shared because it is a read-only variable.

Read-then-write Variables

The following example shows read-then-write variables:

      SUM = 0.0
!MIC$ DOALL SHARED(N1,N2,V,SUM) PRIVATE(I,T)
      DO I = N1, N2
        T = V(I)
!MIC$ GUARD
        SUM = SUM + T
!MIC$ ENDGUARD
      END DO

SUM is a shared variable because it is read before it is written. Special care is needed when writing into a shared variable that is not indexed by the loop control variable; here, the GUARD and ENDGUARD directives ensure that only one processor at a time updates SUM.

Write-then-read Variables and Arrays

The following example shows write-then-read variables and arrays:

!MIC$ DOALL SHARED(N1,N2,M1,M2) PRIVATE(I,J,V)
      DO 10 I = N1, N2
      DO 10 J = M1, M2
         V(J) = ...
         ... = V(J)
   10 CONTINUE

V is written and then read within each iteration of the I loop, so it must be a private array.