Chapter 3. General Directives

A directive is a line inserted into Fortran source code that specifies actions to be performed by the compiler. Directive lines are not Fortran statements.

Many MIPSpro 7 Fortran 90 compiler features are implemented as either command line options or directives. The features implemented as command line options are set at compile time and applied to all files in the compilation. The features implemented through directives are set within your Fortran source code, and they apply to portions of your source code.

This chapter introduces the MIPSpro 7 Fortran 90 directive set and describes the general directives.

The sections in this chapter are as follows:

Using Directives

All directives are of the following form:

prefix directive
prefix

Each directive begins with a prefix. The prefix needed for each directive is shown in the directive's description. The following directive prefixes are used by the MIPSpro 7 Fortran 90 compiler:

The prefix used also depends on which Fortran source form you are using, as follows:

  • If you are using fixed source form, begin a directive line with the characters Cprefix or !prefix. The ! or C character must appear in column 1. Beginning the directive with a ! or C character ensures that compilers other than the MIPSpro 7 Fortran 90 compiler will treat compiler directive lines as comment lines.

  • If you are using free source form, begin a directive line with the characters !prefix, followed by a space, and then one or more directives. The !prefix need not start in column 1, but it must be the first text on a line.

Because both fixed source form and free source form accept directives that start with the exclamation point (!), that is the initial character used in all directive syntax descriptions in this manual.

directive

This is the specific directive's syntax. The syntax usually consists of the directive name. Some directives accept arguments. A directive's arguments, if any, are shown in the description for the directive itself.
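As a minimal sketch of the prefix rules above, the following free source form program places a directive line before a loop. The program name, the array, and the choice of the UNROLL directive (described later in this chapter) are illustrative assumptions. The trailing comments show how the same directive line would begin in fixed source form.

```fortran
! Free source form: "!" plus the prefix, a space, then the directive.
! The directive need not start in column 1, but it must be the first
! text on its line.
program directive_form
  implicit none
  integer :: i
  real :: a(100)

  a = 0.0
  !*$* UNROLL (4)
  do i = 1, 100
     a(i) = a(i) + 1.0
  end do
  print *, sum(a)
end program directive_form

! In fixed source form, the directive line would instead start in
! column 1 with either character:
!   C*$* UNROLL (4)
!   !*$* UNROLL (4)
```

Because the directive line begins with an exclamation point, a compiler that does not recognize the prefix treats it as an ordinary comment.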

The following sections describe the general format for directives and explain how directives are continued across source code lines.


Note: The multiprocessing directives supported in previous MIPSpro 7 Fortran 90 releases, along with the !$PAR, C$PAR, !$, and C$ directive prefixes, are outmoded. They are still supported for older codes that require this functionality. Silicon Graphics and Cray Research encourage you to modify your code to use the OpenMP directives described in Chapter 4, “OpenMP Fortran API Multiprocessing Directives”.


Directives and Command Line Options

Some compiler features can be activated on the command line and through compiler directives. The difference is that a command line setting applies to all files in the compilation, but a directive applies to only a program unit or to another specific part of a source file.

Generally, and by default, directives override command line options. There are exceptions to this rule, however. The exceptions, if any, are noted in the introductory text to each directive group.

Directive Range

The range of a particular directive depends on the directive itself, as follows:

  • If a directive appears within a program unit, it applies only to that program unit. Within a program unit, many directives apply only to the loops that they immediately precede.

  • If a directive appears outside a program unit (for example, prior to program code in a file) it applies to the entire file.

The descriptions for the individual directives indicate the range of the directive.

Directive Continuation and Other Considerations

It is sometimes necessary to continue a directive across one or more source code lines. The continuation character used and its placement within the directive line depend on the type of directive you are using. The introductory text for each directive group indicates the continuation character that is appropriate for that group.

For all directives in this chapter, the prefix for a directive line that is a continuation line is !*$*&.

Do not use source preprocessor (#) directives within multiline compiler directives.
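The following sketch shows what a continued directive might look like. Only the !*$*& continuation prefix comes from the text above. The point at which the line is split (after a comma), the subroutine name, and the choice of the PREFETCH_REF directive (described later in this chapter) are assumptions made for illustration.

```fortran
      SUBROUTINE SUMCOL(A, N, S)
      IMPLICIT NONE
      INTEGER N, I
      REAL(KIND=8) A(N), S
      S = 0.0D0
! Assumed line break: the directive is split after a comma, and the
! second line starts with the continuation prefix.
!*$* PREFETCH_REF=A(I), stride=16,
!*$*& level=2, kind=rd
      DO I = 1, N
         S = S + A(I)
      END DO
      END
```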

LNO Directives

The loop nest optimization (LNO) directives control loop nest optimizations. By default, directives override command line options. To reverse this, and have command line options override the LNO directives, specify -LNO:ignore_pragmas. For information on the -LNO:ignore_pragmas option, see “-LNO:ignore_pragmas=setting” in Chapter 2.

To continue a directive, the continuation line must begin with !*$*&.

The following directives control loop nest optimizations:

  • AGGRESSIVEINNERLOOPFISSION

  • BLOCKABLE

  • BLOCKINGSIZE, NOBLOCKING

  • FISSION, FISSIONABLE, NOFISSION

  • FUSE, FUSEABLE, NOFUSION

  • INTERCHANGE, NOINTERCHANGE

  • PREFETCH

  • PREFETCH_MANUAL

  • PREFETCH_REF

  • PREFETCH_REF_DISABLE

  • UNROLL

The following sections describe the LNO directives.

Request Loop Fission for Inner Loops: AGGRESSIVEINNERLOOPFISSION Directive

The AGGRESSIVEINNERLOOPFISSION directive specifies that the following loop should be split into as many loops as possible. In a loop nest, this directive must precede an inner loop.

The format of this directive is as follows:

!*$* AGGRESSIVEINNERLOOPFISSION

Permit Cache Blocking: BLOCKABLE Directive

The BLOCKABLE directive specifies that it is legal to cache block the subsequent loops. For more information on controlling cache blocking, see the -LNO:blocking option in “-LNO:blocking=setting” in Chapter 2, and the -LNO:blocking_size option in “-LNO:blocking_size=n1,n2” in Chapter 2.

The format of this directive is as follows:

!*$* BLOCKABLE (do_variable,do_variable[,do_variable]...)
do_variable

Specify the do_variable names of two or more loops. The loops identified by the do_variable names must be adjacent and nested within each other, although they need not be perfectly nested.

This directive informs the compiler that these loops can be involved in a blocking situation with each other, even if the compiler would consider such a transformation illegal. The loops must also be interchangeable and unrollable. This directive does not instruct the compiler on which of these transformations to apply.
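As a hedged illustration, the following sketch shows a loop nest that the compiler would probably not block on its own, because the indirect subscript IDX(J) hides whether different iterations touch the same column of A. The subroutine, its arguments, and the guarantee about IDX are hypothetical. The directive is appropriate only if that guarantee really holds.

```fortran
      SUBROUTINE SCADD(A, B, IDX, M, N)
      IMPLICIT NONE
      INTEGER M, N, I, J
      INTEGER IDX(N)
      REAL(KIND=8) A(M,N), B(M,N)
! IDX is assumed to hold each value 1..N exactly once, so no two
! iterations of J update the same column of A.
!*$* BLOCKABLE (J,I)
      DO J = 1, N
         DO I = 1, M
            A(I,IDX(J)) = A(I,IDX(J)) + B(I,J)
         END DO
      END DO
      END
```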

Declare Cache Blocking: BLOCKINGSIZE and NOBLOCKING Directives

The BLOCKINGSIZE and NOBLOCKING directives assert that the loop following the directive is (BLOCKINGSIZE) or is not (NOBLOCKING) involved in cache blocking for the primary or secondary cache.

The formats of these directives are as follows:

!*$* BLOCKINGSIZE(n1[,n2])
!*$* NOBLOCKING
n1,n2

An integer number that indicates the block size. If the loop is involved in a blocking, it will have a block size of n1 for the primary cache and n2 for the secondary cache. The compiler attempts to include this loop within such a block, but it cannot guarantee this.

If n1 or n2 is 0, the loop is not blocked, but the entire loop is inside the block.

Example:

      SUBROUTINE AMAT(X,Y,Z,N,M,MM)
      REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
      DO K = 1, N
!*$* BLOCKINGSIZE (20)
         DO J = 1, M
!*$* BLOCKINGSIZE (20)
            DO I = 1, MM
               Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
            END DO
         END DO
      END DO
      END

For the preceding code, the compiler makes 20-by-20 blocks when blocking, but it might block the loop nest such that loop K is not included in the tile. If loop K is not included, add a BLOCKINGSIZE(0) directive just before loop K to suggest that the compiler generate a nest such as the following:

      SUBROUTINE AMAT(X,Y,Z,N,M,MM)
      REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
      DO JJ = 1, M, 20
         DO II = 1, MM, 20
            DO K = 1, N
               DO J = JJ, MIN(M, JJ+19)
                  DO I = II, MIN(MM, II+19)
                     Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
                  END DO
               END DO
            END DO
         END DO
      END DO
      END
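As a sketch, the source that requests the preceding nest might look as follows. It repeats the AMAT example with a BLOCKINGSIZE(0) directive added just before loop K. The explicit INTEGER declarations are an addition for clarity and are not part of the original example.

```fortran
      SUBROUTINE AMAT(X,Y,Z,N,M,MM)
      INTEGER N, M, MM, I, J, K
      REAL(KIND=8) X(100,100), Y(100,100), Z(100,100)
! A block size of 0 asks that K stay unblocked but inside the block.
!*$* BLOCKINGSIZE (0)
      DO K = 1, N
!*$* BLOCKINGSIZE (20)
         DO J = 1, M
!*$* BLOCKINGSIZE (20)
            DO I = 1, MM
               Z(I,K) = Z(I,K) + X(I,J)*Y(J,K)
            END DO
         END DO
      END DO
      END
```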

Note that an INTERCHANGE directive can be applied to the same loop nest as a BLOCKINGSIZE directive. The BLOCKINGSIZE directive applies to the loop it directly precedes; it moves with that loop when an interchange is applied.

The NOBLOCKING directive prevents the compiler from involving the subsequent loop in a cache blocking situation.

Control Loop Fission for Outer Loops: FISSION, FISSIONABLE, and NOFISSION Directives

The fission control directives specify whether the compiler should perform loop fission on the loops that immediately follow these directives.

The formats of these directives are as follows:

!*$* FISSION[(level)]
!*$* FISSIONABLE
!*$* NOFISSION
level

Specify an integer number that indicates the number of loop levels that should undergo loop fission.

The FISSION directive specifies that loop fission should be attempted. The compiler performs a validity test on the subsequent loops unless you have also specified a FISSIONABLE directive. The NOFISSION directive specifies that the following loop should not undergo fission, but its inner loops, if any, may undergo fission.

These directives do not cause statements to be reordered.
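The following sketch illustrates a FISSION request on a loop whose two statements are independent of each other. The subroutine and array names are hypothetical, and the commented-out loops are only a picture of the intended effect, not actual compiler output.

```fortran
      SUBROUTINE SPLIT(A, B, C, D, N)
      IMPLICIT NONE
      INTEGER N, I
      REAL A(N), B(N), C(N), D(N)
! Request fission of the loop that follows. The compiler still
! checks validity because FISSIONABLE is not specified.
!*$* FISSION
      DO I = 1, N
         A(I) = B(I) + 1.0
         C(I) = D(I) * 2.0
      END DO
! If fission is applied, the effect is as if two loops had been
! written, one per statement, in the original statement order:
!     DO I = 1, N
!        A(I) = B(I) + 1.0
!     END DO
!     DO I = 1, N
!        C(I) = D(I) * 2.0
!     END DO
      END
```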

Control Loop Fusion for Outer Loops: FUSE, FUSEABLE, and NOFUSION Directives

The fusion control directives specify whether the compiler should perform loop fusion on the loops that immediately follow these directives.

The formats of these directives are as follows:

!*$* FUSE[(n[,level])]
!*$* FUSEABLE
!*$* NOFUSION
n

Specify an integer number that indicates the number of subsequent loops that should undergo loop fusion. The default is 2.

level

Specify an integer that indicates how deeply the loops should be fused.

The level of loop fusion is determined by the maximum perfectly nested loop levels of the fused loops, although partial fusion is allowed.

Loop iterations may be peeled as needed during loop fusion. The limit of this peeling is 5, or the number specified by the -LNO:fusion_peeling_limit command line option.

The FUSE directive specifies that loop fusion should be attempted. The compiler performs a validity test on the subsequent loops unless you have also specified a FUSEABLE directive. When the FUSEABLE directive is specified, the fusion is done for loops with identical iteration counts. The NOFUSION directive specifies that the following loop should not be fused with any other loop. For more information on the -LNO:fusion_peeling_limit command line option, see “-LNO:fusion_peeling_limit=n” in Chapter 2.

Example. Consider the following code:

DO I = 1,N
  DO J = 1,N
    ...
  END DO
END DO
DO I = 1,N
  DO J = 1,N
    ...
  END DO
END DO

Fusing the loops with a level of 1 results in the following loop nest:

DO I = 1,N
  DO J = 1,N
    ...
  END DO
  DO J = 1,N
    ...
  END DO
END DO

Fusing the loops with a level of 2 results in the following loop nest:

DO I = 1,N
  DO J = 1,N
    ...
    ...
  END DO
END DO
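The fusion examples above show only the resulting nests. As a sketch of how the request itself might be written, the following subroutine places a FUSE directive before the first of two adjacent nests, asking for 2 loops to be fused 2 levels deep. The subroutine name, arrays, and loop bodies are hypothetical.

```fortran
      SUBROUTINE FUSE2(A, B, N)
      IMPLICIT NONE
      INTEGER N, I, J
      REAL A(N,N), B(N,N)
! Ask for the next 2 loop nests to be fused 2 levels deep.
!*$* FUSE (2,2)
      DO I = 1, N
         DO J = 1, N
            A(J,I) = 0.0
         END DO
      END DO
      DO I = 1, N
         DO J = 1, N
            B(J,I) = A(J,I) + 1.0
         END DO
      END DO
      END
```

Because FUSEABLE is not specified, the compiler still performs its own validity test before fusing. Here each B(J,I) depends only on the A(J,I) computed in the same iteration, so fusing both levels preserves the result.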

Control Loop Interchange: INTERCHANGE and NOINTERCHANGE Directives

The loop interchange control directives specify whether or not the order of the following two or more loops should be interchanged. These directives apply to the loops that they immediately precede.

The formats of these directives are as follows:

!*$* INTERCHANGE (do_variable1,do_variable2[,do_variable3]...)
!*$* NOINTERCHANGE
do_variable

Specifies two or more do_variable names. The do_variable names can be specified in any order, and the compiler reorders the loops. The loops must be perfectly nested. If the loops are not perfectly nested, you may receive unexpected results.

The compiler reorders the loops such that the loop with do_variable1 is outermost, then loop do_variable2, then loop do_variable3.

The NOINTERCHANGE directive inhibits loop interchange on the loop that immediately follows the directive.
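As an illustration, the following sketch asks the compiler to reorder a perfectly nested pair of loops. The subroutine and array names are hypothetical.

```fortran
      SUBROUTINE SCALE2(A, M, N)
      IMPLICIT NONE
      INTEGER M, N, I, J
      REAL A(M,N)
! As written, I is the outer loop. The directive names J first, so
! the requested order is J outermost and I innermost.
!*$* INTERCHANGE (J,I)
      DO I = 1, M
         DO J = 1, N
            A(I,J) = 2.0 * A(I,J)
         END DO
      END DO
      END
```

With I as the innermost loop, consecutive iterations access consecutive elements of A, because Fortran stores arrays in column-major order.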

Control Prefetching for a Program Unit: PREFETCH Directive

The PREFETCH directive controls the MIPS IV prefetch instruction. Using this directive can increase performance in program units that are likely to encounter cache misses during execution. This directive applies only to the program unit in which it appears.

When the directive is specified, the compiler estimates the memory references that will be cache misses, inserts prefetches for the misses, and schedules the prefetches ahead of their corresponding references. You can specify different levels of prefetching aggressiveness for the primary and secondary cache.

The format of this directive is as follows:

!*$* PREFETCH (primary_cache[,secondary_cache])
primary_cache, secondary_cache 

For each of these, specify 0, 1, or 2. The number specified indicates the level of prefetching requested for the primary and secondary cache levels, respectively.

A 0 disables all prefetching. 1 requests conservative prefetching. 2 requests aggressive prefetching. By default, primary_cache and secondary_cache are both set to 1 when the -r10000 command line option is in effect, and they are set to 0 for all other processor settings.

This directive is recognized only if the -mips4 and -r10000 command line options are in effect.
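The following sketch shows one way the directive might appear in a program unit. The subroutine is hypothetical, and the chosen levels (conservative for the primary cache, aggressive for the secondary cache) are an arbitrary illustration rather than a tuning recommendation.

```fortran
      SUBROUTINE AXPY(X, Y, ALPHA, N)
      IMPLICIT NONE
      INTEGER N, I
      REAL(KIND=8) X(N), Y(N), ALPHA
! Conservative prefetching (1) for the primary cache and
! aggressive prefetching (2) for the secondary cache.
!*$* PREFETCH (1,2)
      DO I = 1, N
         Y(I) = Y(I) + ALPHA * X(I)
      END DO
      END
```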

Control Prefetching in a Subprogram: PREFETCH_MANUAL Directive

The PREFETCH_MANUAL directive specifies whether the PREFETCH_REF and the PREFETCH_REF_DISABLE directives, which perform manual prefetches, should be respected or ignored within a subprogram. This directive applies only to the program unit in which it appears.

The format of this directive is as follows:

!*$* PREFETCH_MANUAL (n)
n

Specify either 0 or 1 for n. 0 indicates that the compiler should ignore all prefetch directives. 1 indicates that all prefetch directives should be recognized. By default, all prefetch directives are recognized.

This directive is recognized only if the -mips4 and -r10000 command line options are in effect. For more information on the -mips4 option, see “-mipsn” in Chapter 2. For more information on the -r10000 option, see “-rprocessor” in Chapter 2.

Request Prefetching for an Array: PREFETCH_REF Directive

The PREFETCH_REF directive requests prefetching for a specific memory reference. This directive applies only to the loop nest that includes references to array, and the directive must immediately precede the loop nest.

When this directive is specified, all references to array in the subsequent loop nest are ignored by the automatic prefetcher (if enabled).

The format of this directive is as follows:

!*$* PREFETCH_REF=array[,stride=stride[,stride]][,level=level[,level]][,kind=rw][,size=size]
array

For array, specify identification information for the array. For example: A(I,J).

stride

Specify prefetching for every stride iterations of the loop. The default is 1.

level

Specify the level in the memory hierarchy to prefetch, either 1 or 2. A value of 1 specifies a prefetch from secondary cache to primary cache. A value of 2, the default, specifies a prefetch from memory to primary cache.

rw

Specify rd or wr. rd indicates that the location is read. wr indicates that the location is written. The default is wr.

size

Specify the size, in KB, of array. Must be a constant.

If size is specified, the automatic prefetcher (if enabled) reduces the effective cache size by that amount in its calculations. The compiler tries to issue one prefetch per stride iterations, but this cannot be guaranteed.

This directive generates a single prefetch instruction to a specified memory reference. It searches for array references that match the supplied reference in the current loop nest and takes the following actions:

  • If the reference is found, the reference is scheduled relative to the prefetch node, based on the miss latency for the specified level of the cache.

  • If no such reference is found, the prefetch is generated at the start of the loop body.

This directive is recognized only if the -mips4 and -r10000 command line options are in effect. For more information on the -mips4 option, see “-mipsn” in Chapter 2. For more information on the -r10000 option, see “-rprocessor” in Chapter 2.
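As a sketch under stated assumptions, the following subroutine requests a manual prefetch for a read-only array reference. The subroutine name is hypothetical. The stride of 16 assumes 8-byte elements and a 128-byte secondary cache line, so that roughly one prefetch is issued per cache line. A different element size or line size would call for a different stride.

```fortran
      SUBROUTINE SUMVEC(A, N, S)
      IMPLICIT NONE
      INTEGER N, I
      REAL(KIND=8) A(N), S
      S = 0.0D0
! One prefetch every 16 iterations, from memory (level 2), for a
! location that is only read.
!*$* PREFETCH_REF=A(I), stride=16, level=2, kind=rd
      DO I = 1, N
         S = S + A(I)
      END DO
      END
```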

Disable Prefetching for a Specific Array: PREFETCH_REF_DISABLE Directive

The PREFETCH_REF_DISABLE directive disables prefetching for all references to an array. This directive applies to all array references within the program unit.

The format of this directive is as follows:

!*$* PREFETCH_REF_DISABLE=array[, size=size]
array

For array, specify identification information for the array. For example: A(I,J).

If the automatic prefetcher is enabled, it ignores array.

size

Specify the size, in KB, of array. Must be a constant.

The size is used for volume analysis. Volume analysis is performed as part of prefetching analysis. In volume analysis, the compiler tries to determine the amount of data referenced by each loop or loop nest. This information is used when determining whether or not to prefetch memory references.

This directive is recognized only if the -mips4 and -r10000 command line options are in effect.
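The following sketch disables prefetching for a small lookup table that is expected to stay in cache. The subroutine and array names are hypothetical, and naming the array without subscripts is an assumption about the accepted form of array.

```fortran
      SUBROUTINE LOOKUP(T, V, N)
      IMPLICIT NONE
      INTEGER N, I
      REAL(KIND=8) T(64), V(N)
! T occupies 64 * 8 bytes = 512 bytes, so less than 1 KB; the
! size value of 1 below is a rounded-up assumption.
!*$* PREFETCH_REF_DISABLE=T, size=1
      DO I = 1, N
         V(I) = V(I) * T(MOD(I-1, 64) + 1)
      END DO
      END
```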

Request Loop Unrolling: UNROLL Directive

The UNROLL directive specifies loop unrolling. This directive applies to the loop that immediately follows the directive.

Inner loop unrolling occurs automatically when -O2 or -O3 is in effect. Non-inner loop unrolling (and jam) occurs when -O3 is in effect.

The format of this directive is as follows:

!*$* UNROLL (n)
n

Specifies the number of copies of the loop body to be generated, as follows:

  • When this directive precedes an inner loop, the compiler adds n - 1 copies of the loop body to the loop, for a total of n. This is standard loop unrolling.

  • When this directive precedes an outer loop, the compiler performs an unroll and jam operation on the loop.

The value of n must be at least 2 in order for unrolling to occur. If n = 1, no unrolling is performed.

Even with this directive specified, unrolling is not performed if the compiler determines that unrolling would be unsafe. To specify that the compiler unroll the loop regardless of its analysis, you must also specify a BLOCKABLE directive. For information on the BLOCKABLE directive, see “Permit Cache Blocking: BLOCKABLE Directive”.

Example. Assume that -O3 is specified and that the outer loop of the following nest will be unrolled by two:

!*$* UNROLL (2)
      DO I = 1, 10
        DO J = 1,100
              A(J,I) = B(J,I) + 1
        END DO
      END DO

With outer loop unrolling, the compiler produces the following nest, in which the two bodies of the inner loop are adjacent to each other:

      DO I = 1, 10, 2
        DO J = 1,100
              A(J,I) = B(J,I) + 1
        END DO
        DO J = 1,100
              A(J,I+1) = B(J,I+1) + 1
        END DO
      END DO

The compiler then jams, or fuses, the inner two loop bodies together, producing the following nest:

      DO I = 1, 10, 2
        DO J = 1,100
              A(J,I)   = B(J,I) + 1
              A(J,I+1) = B(J,I+1) + 1
        END DO
      END DO
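For comparison, standard unrolling of an inner loop can be pictured with the following sketch. The program is hypothetical. The second and third loops are a hand-written picture of what unrolling by 4 amounts to, including a cleanup loop for leftover iterations, and are not actual compiler output.

```fortran
      PROGRAM UNRL
      IMPLICIT NONE
      INTEGER N, I
      PARAMETER (N = 10)
      REAL A(N), B(N), C(N)
      B = 1.0
! Directive form: request 4 copies of the loop body in total.
!*$* UNROLL (4)
      DO I = 1, N
         A(I) = B(I) + 1.0
      END DO
! Hand-written picture of the same loop unrolled by 4.
      DO I = 1, N - 3, 4
         C(I)   = B(I)   + 1.0
         C(I+1) = B(I+1) + 1.0
         C(I+2) = B(I+2) + 1.0
         C(I+3) = B(I+3) + 1.0
      END DO
! Cleanup loop for the iterations left over when N is not a
! multiple of 4.
      DO I = N - MOD(N,4) + 1, N
         C(I) = B(I) + 1.0
      END DO
! Both forms compute the same values.
      PRINT *, ALL(A .EQ. C)
      END
```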

Argument Aliasing Directives (ASSERT ARGUMENTALIASING and ASSERT NOARGUMENTALIASING)

The ASSERT ARGUMENTALIASING and ASSERT NOARGUMENTALIASING directives allow the compiler to make assumptions about procedure dummy arguments when performing optimizations.

It is possible to call a procedure and specify the same variable or array element in two or more positions of the actual argument list. Within the procedure, two or more dummy argument names, which appear to refer to different memory locations, actually refer to the same location. This practice violates the Fortran standard. You can use the ASSERT ARGUMENTALIASING directive to force the compiler to be more conservative.

By default, ASSERT NOARGUMENTALIASING is in effect.

The formats for these directives are as follows:

!*$* ASSERT ARGUMENTALIASING
!*$* ASSERT NOARGUMENTALIASING

If these directives appear prior to Fortran source code in a file, they are applied to all program units in the file. If they appear in a program unit, they are applied to that program unit only. If one of these directives is encountered, it remains in effect until reset by the opposing directive.
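The following sketch shows the kind of call that these directives are concerned with. The subroutine and program names are hypothetical. The code is deliberately nonconforming, because the same array is passed in two argument positions and one of the dummy arguments is modified.

```fortran
! Placed before any program unit, so it applies to the whole file.
!*$* ASSERT ARGUMENTALIASING
      SUBROUTINE SHIFT(X, Y, N)
      IMPLICIT NONE
      INTEGER N, I
      REAL X(N), Y(N)
! If X and Y are the same array, each iteration reads the element
! written by the previous one.
      DO I = 2, N
         X(I) = Y(I-1) + 1.0
      END DO
      END

      PROGRAM ALIAS
      IMPLICIT NONE
      REAL A(8)
      A = 0.0
! The same array is passed in two argument positions.
      CALL SHIFT(A, A, 8)
      PRINT *, A
      END
```

With the default ASSERT NOARGUMENTALIASING in effect, the compiler may assume that X and Y do not overlap and may reorder the loads and stores in SHIFT, which can change the result of such a call.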

Symbol Storage Directives

The following directives control symbol storage:

  • ALIGN_SYMBOL

  • FILL_SYMBOL

  • FLUSH

  • SECTION_GP

  • SECTION_NON_GP

Control Symbol Alignment and Padding: ALIGN_SYMBOL and FILL_SYMBOL Directives

The ALIGN_SYMBOL and FILL_SYMBOL directives control the way symbols are stored.

The ALIGN_SYMBOL directive aligns the start of symbol at a specified alignment boundary.

The FILL_SYMBOL directive pads symbol with additional storage so that the symbol is assured not to overlap (even partially) with any other data item within the storage of the specified size. The additional padding required is divided between each end of the specified variable. For example, a FILL_SYMBOL(X,L1CACHELINE) directive guarantees that X does not suffer from false sharing for the primary cache line.

The formats for these directives are as follows:

!*$* ALIGN_SYMBOL (symbol[, storage]) 
!*$* FILL_SYMBOL (symbol[, storage])
symbol

Specify the name of a symbol. symbol can be a common block variable or a module name. symbol cannot be a component of a derived type, an array element, a common block, or blank common.

storage

Specify the storage size. Specify one of the following values for storage:

storage         Action

L1CACHELINE     Specifies the machine-specific first-level cache line size, typically 32 bytes.

L2CACHELINE     Specifies the machine-specific secondary cache line size, typically 128 bytes.

PAGE            Specifies a machine-specific page, typically 16 KB.

power-of-two    An integer value that is a power of 2. This is measured in bytes.

For common block variables, these directives are required at each declaration of the common block. Because the directives modify the allocated storage and its alignment for the named symbol, inconsistent directives can lead to undefined results.

The ALIGN_SYMBOL directive has no effect on fixed-size local symbols, such as simple scalars or arrays of known size (for example, symbols declared as REAL N or REAL A(3)). The directive continues to be effective for automatic arrays (stack-allocated arrays of dynamically determined size).

You cannot specify an ALIGN_SYMBOL directive and a FILL_SYMBOL directive for the same symbol.

Example:

! X IS A COMMON BLOCK VARIABLE
      COMMON X
      INTEGER(KIND=4) X
!*$* ALIGN_SYMBOL (X, 32)

!   X WILL START AT A 32-BYTE BOUNDARY.
!   WARNING: THE LAYOUT OF THE COMMON BLOCK WILL BE AFFECTED

!*$* ALIGN_SYMBOL (X, 2)
!   ERROR: CANNOT REQUEST AN ALIGNMENT LOWER THAN THE NATURAL
!   ALIGNMENT OF THE SYMBOL.

      REAL(KIND=8) Y
!   Y IS A COMMON BLOCK OR LOCAL VARIABLE
!*$* FILL_SYMBOL (Y, L2CACHELINE)


!   ALLOCATE EXTRA STORAGE BOTH BEFORE AND AFTER Y SO THAT
!   Y IS WITHIN AN L2CACHELINE (128 BYTES) ALL BY ITSELF.
!   THIS CAN BE USEFUL TO AVOID FALSE-SHARING BETWEEN MULTIPLE
!   PROCESSORS FOR THE CACHE LINE CONTAINING Y.

Declare a Synchronization Point: FLUSH Directive

The FLUSH directive identifies synchronization points at which thread-visible variables are written back to memory. This directive must appear at the precise point in the code at which the synchronization is required.


Note: This directive has the same effect as the FLUSH directive described in the OpenMP Fortran API. For more information on the OpenMP FLUSH directive, see “Read and Write Variables to Memory: FLUSH Directive” in Chapter 4.

Thread-visible variables include the following data items:

  • Globally visible variables (common blocks and modules).

  • Local variables that do not have the SAVE attribute but have had their address taken and saved or have had their address passed to another subprogram.

  • Local variables that do not have the SAVE attribute that are declared shared in a parallel region within the subprogram.

  • Dummy arguments.

  • All pointer dereferences.

This directive has the following format:

!*$* FLUSH [(var[, var] ...)]
var

Variables to be flushed.
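As a hedged sketch, the following subroutine uses FLUSH to publish a value before raising a flag. The subroutine, the common block, and the variable names are hypothetical, and the sketch assumes that another thread polls READY before reading PAYLD.

```fortran
      SUBROUTINE PUBLSH(VAL)
      IMPLICIT NONE
      INTEGER VAL
      INTEGER PAYLD, READY
      COMMON /MBOX/ PAYLD, READY
! Write the data, then make it visible before raising the flag.
      PAYLD = VAL
!*$* FLUSH (PAYLD)
      READY = 1
!*$* FLUSH (READY)
      END
```

Each directive sits at the exact point where the write must become visible, which is why the data is flushed before the flag is set.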

Specify Global Pointer Use: SECTION_GP and SECTION_NON_GP Directives

The MIPSpro 7 Fortran 90 compiler can reference global data by using the global pointer and an offset value. Using the global pointer (gp) is more efficient than constructing the address at each occurrence, but because the offset size is limited to 16 bits, only a limited set of elements can be referenced using the global pointer.

The compiler places global data in gp-relative or non-gp-relative sections, but you can use the SECTION_GP and SECTION_NON_GP directives to specify the variables to go within the gp-relative section and the variables that need to be addressed explicitly.

The formats for these directives are as follows:

!*$* SECTION_GP (symbol[, symbol] ...)
!*$* SECTION_NON_GP (symbol[, symbol] ...)
symbol

Enter one or more symbols. Separate multiple symbols with commas. Valid symbols are common block names, variables specified on SAVE statements, and module names. If a module name is specified, all storage in the module is affected. If a common block name is specified, it must be of the following form: /name/.
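The following sketch places a small, frequently used common block in the gp-relative section and a large one outside it. The subroutine and common block names are hypothetical, and placing the directives after the declarations is an assumption.

```fortran
      SUBROUTINE TALLY
      IMPLICIT NONE
      INTEGER HITS
      REAL(KIND=8) TABLE(100000)
      COMMON /CNTRS/ HITS
      COMMON /BIGDAT/ TABLE
! Small, frequently used data: request gp-relative addressing.
!*$* SECTION_GP (/CNTRS/)
! Large data: request explicit addressing instead.
!*$* SECTION_NON_GP (/BIGDAT/)
      HITS = HITS + 1
      END
```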

Inlining and IPA Directives (INLINE, NOINLINE, IPA, and NOIPA)

The following are the inlining and interprocedural analysis (IPA) directives:

  • INLINE, NOINLINE

  • IPA, NOIPA


Note: Neither inlining nor IPA is enabled by default. By default, the directives in this section, if present in your source code, are ignored. To enable the directives and turn on inlining and IPA, specify the -INLINE: option or the -IPA: option on your f90(1) command line. For more information on the command line interaction with these features, see Chapter 2, “Invoking MIPSpro 7 Fortran 90”, or see one of the following man pages: f90(1) or ipa(5).

Inlining is the process of replacing a procedure reference with a copy of the procedure's code. This eliminates procedure call overhead and exposes the relationships between the procedure code, the return value, and the surrounding code. The INLINE and NOINLINE directives allow you to specify procedures that should be inlined.

Interprocedural analysis (IPA) is a MIPSpro compiler feature that includes inlining, common block array padding, constant propagation, dead procedure elimination, dead variable elimination, and global name optimizations. For detailed information on the IPA feature, see the ipa(5) man page. The IPA and NOIPA directives allow you to control IPA.

The formats of these directives are as follows:

!*$* INLINE location[(name[,name] ...)]
!*$* NOINLINE location[(name[,name] ...)]
!*$* IPA location[(name[,name] ...)]
!*$* NOIPA location[(name[,name] ...)]
location

Specify one of the following for location:

location    Action

HERE        Specifies that routines named on the subsequent source code line should be inlined or should undergo IPA. Default.

ROUTINE     Specifies that the named function should be inlined or should undergo IPA everywhere it appears within the current routine.

GLOBAL      Specifies that the named function should be inlined or should undergo IPA throughout the source file.

name

For the inlining directives, each name specification represents one or more routines to be inlined. If no routines are named, all routines in the program are inlined.

For the IPA directives, each name specification represents one or more routines to undergo IPA. If no routines are named, all routines in the program undergo IPA.

Example. Consider the following code fragment:

      DO I = 1,N
!*$* INLINE HERE (BETA)
         CALL BETA(I,1)
      ENDDO
      CALL BETA(N,2)

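In this example, the HERE specifier applies the directive only to the source line that follows it, so only the call to BETA inside the loop is inlined. Using the specifier ROUTINE rather than HERE would inline both calls to BETA. Note that -INLINE:=ON must be specified on the f90(1) command line when this code is compiled in order for the inlining directive to be recognized.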
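A sketch of the ROUTINE variant follows. The enclosing subroutine DRIVER is hypothetical, BETA is assumed to be defined elsewhere, and placing the directive ahead of the first executable statement is an assumption.

```fortran
      SUBROUTINE DRIVER(N)
      IMPLICIT NONE
      INTEGER N, I
! ROUTINE scope: applies to every call to BETA in this routine.
!*$* INLINE ROUTINE (BETA)
      DO I = 1,N
         CALL BETA(I,1)
      ENDDO
      CALL BETA(N,2)
      END
```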