Chapter 10. OpenMP C/C++ API Multiprocessing Directives

This appendix discusses the multiprocessing directives that MIPSpro C and C++ compilers support. These directives are based on the OpenMP C/C++ Application Program Interface (API) standard. Programs that use these directives are portable and can be compiled by other compilers that support the OpenMP standard.

To enable recognition of the OpenMP directives, specify -mp on the cc or CC command line.
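
For example, assuming a hypothetical source file named myprog.c, the following command line compiles it with OpenMP directive recognition enabled:

cc -mp -o myprog myprog.c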

In addition to directives, the OpenMP C/C++ API describes several library functions and environment variables. Information on the library functions can be found on the omp_lock(3), omp_nested(3), and omp_threads(3) man pages. Information on the environment variables can be found on the pe_environ(5) man page.

This chapter contains the following sections:

  • “Using Directives”

  • “parallel Construct”

  • “Work-sharing Constructs”

  • “Combined Parallel Work-sharing Constructs”

  • “Master and Synchronization Constructs”

  • “Data Environment”

  • “Directive Binding”

  • “Directive Nesting”

Note: The Silicon Graphics multiprocessing directives, including the Origin series distributed shared memory directives, are outmoded. Their preferred alternatives are the OpenMP C/C++ API directives described in this chapter.


Using Directives

Each OpenMP directive starts with #pragma omp to reduce the potential for conflict with other #pragma directives that have the same names. Directives have the following form:

#pragma omp directive-name [clause[ clause] ...] new-line

Except for starting with #pragma omp, the directive follows the conventions of the C and C++ standards for compiler directives.

Directives are case-sensitive. The order in which clauses appear in directives is not significant. Only one directive name can be specified per directive.

An OpenMP directive applies to at most one succeeding statement, which must be a structured block.
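
For example, in the following minimal program the parallel directive applies to the compound statement that follows it, which is a structured block (the printf message is only illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
  #pragma omp parallel
  {
    /* structured block executed by every thread in the team */
    printf("hello from thread %d\n", omp_get_thread_num());
  }
  return 0;
}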

Conditional Compilation

The _OPENMP macro name is defined by OpenMP-compliant implementations as the decimal constant yyyymm, where yyyy and mm are the year and month of the approved specification. This macro must not be the subject of a #define or a #undef preprocessing directive.

#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif

If vendors define extensions to OpenMP, they may specify additional predefined macros.

If an implementation is not OpenMP-compliant, or if its OpenMP mode is disabled, it may ignore the OpenMP directives in a program. In effect, an OpenMP directive behaves as if it were enclosed within #ifdef _OPENMP and #endif. Thus, the following two examples are equivalent:

if(cond)
{
   #pragma omp flush (x)
}
x++;

if(cond)
   #ifdef _OPENMP
   #pragma omp flush (x)
   #endif
x++;

parallel Construct

The #pragma omp parallel directive defines a parallel region, which is a region of the program that is to be executed by multiple threads in parallel.

The #pragma omp parallel directive has the following syntax:

#pragma omp parallel [clause[ clause] ...] new-line
structured-block

clause is one of the following:

if (scalar-expression)
private (list)
firstprivate (list)
default (shared | none)
shared (list)
copyin (list)
reduction (operator: list)

For information on these data scope attribute clauses, see “Data Scope Attribute Clauses”.

When a thread encounters a parallel construct and no if clause is present, or the if expression evaluates to a nonzero value, a team of threads is created; the encountering thread becomes the master thread of the team, with a thread number of 0. The number of threads in the team is controlled by environment variables and library calls. If the value of the if expression is zero, the region is serialized.

The number of threads remains constant while that parallel region is being executed. It can be changed either explicitly by the user or automatically by the runtime system from one parallel region to another. The omp_set_dynamic(3) library function and the OMP_DYNAMIC environment variable can be used to enable and disable the automatic adjustment of the number of threads. For more information on environment variables, see the pe_environ(5) man page.
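
As a minimal sketch of this control, the following fragment disables dynamic adjustment and requests a fixed team size before entering a region; it uses the omp_set_dynamic and omp_set_num_threads library functions (the helper function name is hypothetical):

#include <omp.h>

void setup_team(void)
{
  omp_set_dynamic(0);      /* disable automatic adjustment of the team size */
  omp_set_num_threads(4);  /* request four threads for subsequent regions */
  #pragma omp parallel
  {
    /* executes with a team of four threads */
  }
}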

If a thread in a team executing a parallel region encounters another parallel construct, it creates a new team, and it becomes the master of that new team. Nested parallel regions are serialized by default. By default, a nested parallel region is executed by a team composed of one thread. The default behavior can be changed by using either the omp_set_nested runtime library function or the OMP_NESTED environment variable.
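
The following sketch shows the effect: without the omp_set_nested(1) call, each inner region would execute serially on a team of one thread (the function name is hypothetical):

#include <omp.h>

void nested_example(void)
{
  omp_set_nested(1);        /* enable nested parallelism */
  #pragma omp parallel      /* outer team */
  {
    #pragma omp parallel    /* each outer thread becomes master of a new team */
    {
      /* with nesting disabled, this team would contain one thread */
    }
  }
}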

The following restrictions apply to the #pragma omp parallel directive:

  • Only one if clause can appear on the directive.

  • It is unspecified whether any side-effects inside the if expression occur.

  • A throw executed inside a parallel region must cause execution to resume within the dynamic extent of the same structured block, and it must be caught by the same thread that threw the exception.

The parallel directive can be used in coarse-grain parallel programs. In the following example, each thread in the parallel region decides what part of the global array x to work on, based on the thread number.

#pragma omp parallel shared(x, npoints) private(iam, np, ipoints)
{
   iam = omp_get_thread_num();
   np = omp_get_num_threads();
   ipoints = npoints / np;
   subdomain(x, iam, ipoints);
}

Work-sharing Constructs

A work-sharing construct distributes the execution of the associated statement among the members of the team that encounter it. The work-sharing directives do not launch new threads, and there is no implied barrier on entry to a work-sharing construct.

The sequence of work-sharing constructs and barrier directives encountered must be the same for every thread in a team.

OpenMP defines the following work-sharing constructs:

  • for directive

  • sections directive

  • single directive

for Construct

The #pragma omp for directive identifies an iterative work-sharing construct that specifies a region in which the iterations of the associated loop should be executed in parallel. The iterations of the for loop are distributed across threads that already exist. The #pragma omp for directive has the following syntax:

#pragma omp for [clause[ clause] ...] new-line
for-loop

clause is one of the following:

private (list)
firstprivate (list)
lastprivate (list)
reduction (operator: list)
ordered
schedule (kind[, chunk_size])
nowait

For information on the private, firstprivate, lastprivate, and reduction clauses, see “Data Scope Attribute Clauses”.

The #pragma omp for directive places restrictions on the structure of the corresponding for loop, which must have the following canonical shape:

for (init-expr; var logical-op b; incr-expr)

init-expr 

One of the following:

var = lb
integer-type var = lb

incr-expr 

One of the following:

++var
var++
--var
var--
var += incr
var -= incr
var = var + incr
var = incr + var
var = var - incr

logical-op 

One of the following:

<
<=
>
>=

lb, b, and incr 

Loop invariant integer expressions. There is no synchronization during the evaluation of these expressions. Thus, any evaluated side effects produce indeterminate results.

The schedule clause specifies how iterations of the for loop are divided among threads of the team. The value of chunk_size, if specified, must be a loop invariant integer expression with a positive value. There is no synchronization during the evaluation of this expression, so any evaluated side effects produce indeterminate results. The schedule kind can be one of the following:

static 

When schedule(static,chunk_size) is specified, iterations are divided into chunks specified by chunk_size. The chunks are statically assigned to threads in the team in a round-robin fashion in the order of the thread number.

When chunk_size is not specified, the iteration space is divided into chunks that are approximately equal in size, with one chunk assigned to each thread.

dynamic 

When schedule(dynamic,chunk_size) is specified, each thread is assigned a chunk of chunk_size iterations. When a thread finishes its chunk, it is dynamically assigned another, until none remain. When chunk_size is not specified, it defaults to 1.

guided 

When schedule(guided,chunk_size) is specified, iterations are assigned to threads in chunks of decreasing size. When a thread finishes its chunk, it is dynamically assigned another, until none remain. Chunk sizes decrease exponentially down to chunk_size. When chunk_size is not specified, it defaults to 1.

runtime 

When schedule(runtime) is specified, the scheduling decision is deferred until run time. The schedule kind and size of the chunks can be chosen by setting the OMP_SCHEDULE environment variable. If OMP_SCHEDULE is not set, the resulting schedule is implementation-dependent.

The default schedule is implementation-dependent.

An OpenMP-compliant program should not rely on a particular schedule for correct execution. It is possible to have variations in the implementations of the same schedule kind across different compilers.

The ordered clause must be present when ordered directives are contained in the dynamic extent of the for construct.

There is an implicit barrier at the end of a for construct unless a nowait clause is specified.
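
For example, a loop whose iterations vary widely in cost might use a dynamic schedule with a small chunk size; the chunk size of 4, the process function, and the item array below are illustrative assumptions:

#pragma omp for schedule(dynamic, 4)
  for (i=0; i<n; i++)
    process(item[i]);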

The following restrictions apply to the #pragma omp for directive:

  • The for loop iteration variable must have a signed integer type.

  • The values of the loop control expressions of the for loop associated with a for directive must be the same for all the threads in the team.

  • Only one schedule clause can appear on a for directive.

  • Only one ordered clause can appear on a for directive.

  • Only one nowait clause can appear on a for directive.

  • It is unspecified if or how often any side effects within the chunk_size, lb, b, or incr expressions occur.

  • The value of the chunk_size expression must be the same for all threads in the team.

If there are multiple independent loops within a parallel region, you can use the nowait clause to avoid the implied barrier at the end of the for directive, as follows:

#pragma omp parallel
{
   #pragma omp for nowait
      for (i=1; i<n; i++)
         b[i] = (a[i] + a[i-1]) / 2.0;

   #pragma omp for nowait
      for (i=0; i<m; i++)
         y[i] = sqrt(z[i]);
}

sections Construct

The #pragma omp sections directive identifies a non-iterative work-sharing construct that specifies a set of constructs that are to be divided among threads in a team. Each section is executed once by a thread in the team. Each section is preceded by a section directive, although the section directive is optional for the first section.

The #pragma omp sections directive has the following syntax:

#pragma omp sections [clause[ clause] ...] new-line
{
[#pragma omp section new-line]
   structured-block
[#pragma omp section new-line
   structured-block
.
.
.]
}

clause is one of the following:

private(list)
firstprivate(list)
lastprivate(list)
reduction(operator: list)
nowait

For information on private, firstprivate, lastprivate, and reduction, see “Data Scope Attribute Clauses”.

There is an implicit barrier at the end of a sections construct, unless a nowait clause is specified.

The following restrictions apply to the sections construct:

  • A section directive must not be outside the lexical extent of the sections directive.

  • Only one nowait clause can appear on a sections directive.
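
As a sketch of the construct, the following fragment executes two independent, hypothetical functions, each exactly once, possibly concurrently:

#pragma omp parallel
{
  #pragma omp sections
  {
    #pragma omp section
      work_a();   /* executed once by some thread in the team */
    #pragma omp section
      work_b();   /* executed once, possibly concurrently with work_a */
  }               /* implicit barrier here because nowait is absent */
}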

single Construct

The #pragma omp single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread). The #pragma omp single directive has the following syntax:

#pragma omp single [clause[ clause] ...] new-line
structured-block

clause is one of the following:

private(list)
firstprivate(list)
nowait

For information on the private and firstprivate clauses, see “Data Scope Attribute Clauses”.

There is an implicit barrier after the single construct unless a nowait clause is specified.

The following restrictions apply to the #pragma omp single directive:

  • Only one nowait clause can appear on a single directive.

In the following example, only one thread (usually the first thread that encounters the single directive) prints the progress message. The user must not make any assumptions as to which thread will execute the single section. All other threads will skip the single section and stop at the barrier at the end of the single construct. If other threads can proceed without waiting for the thread executing the single section, a nowait clause can be specified on the single directive.

#pragma omp parallel
{
  #pragma omp single
    printf("Beginning work1.\n");
  work1();
  #pragma omp single
    printf("Finishing work1.\n");
  #pragma omp single nowait
    printf("Finished work1 and beginning work2.\n");
  work2();
}

Combined Parallel Work-sharing Constructs

Combined parallel work-sharing constructs are shortcuts for specifying a parallel region that contains only one work-sharing construct. The semantics of these directives are identical to those of explicitly specifying a parallel directive followed by a single work-sharing construct, as the sketch below illustrates.
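
For example, under this equivalence the following two fragments have identical semantics (the array a and bound n are assumed to be declared elsewhere):

#pragma omp parallel for
  for (i=0; i<n; i++)
    a[i] = 2 * i;

#pragma omp parallel
{
  #pragma omp for
    for (i=0; i<n; i++)
      a[i] = 2 * i;
}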

parallel for Construct

The parallel for directive is a shortcut for a parallel region that contains one for directive. It has the following syntax:

#pragma omp parallel for [clause[ clause] ...] new-line
for-loop

clause is one of the following:

if (scalar-expression)
private (list)
firstprivate (list)
lastprivate (list)
default (shared | none)
shared (list)
copyin (list)
reduction (operator: list)
ordered
schedule (kind[, chunk_size])

The following restrictions apply to the parallel for directive:

  • Only one if clause can appear on the directive.

  • It is unspecified whether any side-effects inside the if expression occur.

  • A throw executed inside a parallel region must cause execution to resume within the dynamic extent of the same structured block, and it must be caught by the same thread that threw the exception.

  • The for loop iteration variable must have a signed integer type.

  • The values of the loop control expressions of the for loop associated with a for directive must be the same for all the threads in the team.

  • Only one schedule clause can appear on a for directive.

  • Only one ordered clause can appear on a for directive.

  • It is unspecified if or how often any side effects within the chunk_size, lb, b, or incr expressions occur.

  • The value of the chunk_size expression must be the same for all threads in the team.
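
As a brief sketch, the following fragment combines several of these clauses; the arrays and the threshold of 1000 are illustrative assumptions:

#pragma omp parallel for if(n > 1000) schedule(static)
  for (i=1; i<n; i++)
    b[i] = (a[i] + a[i-1]) / 2.0;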

parallel sections Construct

The #pragma omp parallel sections directive provides a shortcut form for specifying a parallel region containing one sections directive. The parallel sections directive has the following syntax:

#pragma omp parallel sections [clause[ clause] ...] new-line
{
[#pragma omp section new-line]
   structured-block
[#pragma omp section new-line
   structured-block
.
.
.]
}

clause is one of the following:

if (scalar-expression)
private (list)
firstprivate (list)
lastprivate (list)
default (shared | none)
shared (list)
copyin (list)
reduction (operator: list)

In the following example, functions xaxis, yaxis, and zaxis can be executed concurrently. The first section directive is optional. Note that all section directives must appear in the lexical extent of the parallel sections construct.

#pragma omp parallel sections
{
   #pragma omp section
     xaxis();
   #pragma omp section
     yaxis();
   #pragma omp section
     zaxis();
}

Master and Synchronization Constructs

The following sections describe the master and synchronization constructs:

master Construct

The #pragma omp master directive identifies a construct that specifies a structured block that is executed by the master thread of the team. It has the following syntax:

#pragma omp master new-line
structured-block

Other threads in the team do not execute the associated statement. There is no implied barrier either on entry to or exit from the master section.
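
In the following sketch, every thread calls the hypothetical do_work function, but only the master thread prints the message, and no other thread waits for it:

#pragma omp parallel
{
  do_work();                    /* executed by all threads */
  #pragma omp master
    printf("work started\n");   /* master thread only; no implied barrier */
}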

critical Construct

The #pragma omp critical directive identifies a construct that restricts execution of the associated structured block to one thread at a time. It has the following syntax:

#pragma omp critical [(name)] new-line
structured-block

An optional name that has external linkage may be used to identify the critical region.

A thread waits at the beginning of a critical region until no other thread is executing a critical region with the same name. All unnamed #pragma omp critical directives map to the same unspecified name.

The following example includes several #pragma omp critical directives. It illustrates a queuing model in which a task is dequeued and worked on. To guard against multiple threads dequeuing the same task, the dequeuing operation must be in a #pragma omp critical section. Because the two queues in this example are independent, they are protected by #pragma omp critical directives with different names, xaxis and yaxis.

#pragma omp parallel shared(x, y) private(x_next, y_next)
{
   #pragma omp critical ( xaxis )
     x_next = dequeue(x);
   work(x_next);
   #pragma omp critical ( yaxis )
     y_next = dequeue(y);
   work(y_next);
}

barrier Directive

The #pragma omp barrier directive synchronizes all the threads in a team, each thread waiting until all other threads have reached this point. After all threads have been synchronized, they begin executing the statements after the barrier directive in parallel. The barrier directive has the following syntax:

#pragma omp barrier new-line
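
The following sketch uses a barrier so that no thread begins the hypothetical phase2 until every thread has completed phase1:

#pragma omp parallel
{
  phase1();
  #pragma omp barrier   /* all threads wait here */
  phase2();
}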

atomic Construct

The #pragma omp atomic directive ensures that a specific memory location is updated atomically. The atomic directive has the following syntax:

#pragma omp atomic new-line
expression-stmt

The expression-stmt must have one of the following forms:

x binop= expr
x++
++x
x--
--x

Where:

x is an lvalue expression with scalar type.
expr is an expression with scalar type, and it does not reference the object designated by x.
binop is not an overloaded operator and is one of +, *, -, /, &, ^, |, <<, or >>.

Although a conforming implementation can replace all #pragma omp atomic directives with critical directives that have the same unique name, the #pragma omp atomic directive permits better optimization. Often hardware instructions are available that can perform the atomic update with the least overhead.

Only the load and store of the object designated by x are atomic. To avoid race conditions, all updates of the location in parallel should be protected with the atomic directive, unless they are known to be free of race conditions.

The following restrictions apply to the #pragma omp atomic directive:

  • All atomic references to the storage location x throughout the program are required to have a compatible type.

Examples:

extern float a[], *p = a, b;
/* Protect against races among multiple updates.*/
#pragma omp atomic
a[index[i]] += b;
/* Protect against races with updates through a.*/
#pragma omp atomic
p[i] -= 1.0f;
extern union {int n; float x;} u;
/* ERROR - References through incompatible types.*/
#pragma omp atomic
u.n++;
#pragma omp atomic
u.x -= 1.0f;

flush Directive

The #pragma omp flush directive, explicit or implied, identifies precise synchronization points at which the implementation is required to provide a consistent view of certain objects in memory. This means that previous evaluations of expressions that reference those objects are complete and subsequent evaluations have not yet begun.

The flush directive has the following syntax:

#pragma omp flush [(list)] new-line
list 

A comma-separated list of variables that designate objects requiring synchronization. If a pointer is present in the list, the pointer itself is flushed, not the object to which the pointer refers.

If no list is specified, all shared objects except inaccessible objects with automatic storage duration are synchronized. A flush directive without a list is implied for the following directives:

  • barrier

  • At entry to and exit from critical

  • At entry to and exit from ordered

  • At exit from parallel

  • At exit from for

  • At exit from sections

  • At exit from single

The directive is not implied if a nowait clause is present.

A reference that accesses the value of an object with a volatile-qualified type behaves as if there were a flush directive specifying that object at the previous sequence point. A reference that modifies the value of an object with a volatile-qualified type behaves as if there were a flush directive specifying that object at the subsequent sequence point.

The following restriction applies to the flush directive:

  • A variable specified in a flush directive must not have a reference type.

The following example uses the flush directive for point-to-point synchronization of specific objects between pairs of threads:

#pragma omp parallel private(iam,neighbor) shared(work,sync)
{

  iam = omp_get_thread_num();
  sync[iam] = 0;
  #pragma omp barrier

  /*Do computation into my portion of work array */
  work[iam] = ...;

  /*  Announce that I am done with my work
   *  The first flush ensures that my work is made visible before sync.
   *  The second flush ensures that sync is made visible.
   */
  #pragma omp flush(work)
  sync[iam] = 1;
  #pragma omp flush(sync)

  /*Wait for neighbor*/
  neighbor = (iam>0 ? iam : omp_get_num_threads()) - 1;
  while (sync[neighbor]==0) {
    #pragma omp flush(sync)
  }

  /*Read neighbor's values of work array */
  ... = work[neighbor];
}

The following example distinguishes the shared objects affected by a flush directive with no list from the shared objects that are not affected:

int x, *p = &x;

void f1(int *q)
{
  *q = 1;
  #pragma omp flush
  // x, p, and *q are flushed
  //   because they are shared and accessible
}

void f2(int *q)
{
  *q = 2;
  #pragma omp barrier
  // a barrier implies a flush
  // x, p, and *q are flushed
  //   because they are shared and accessible
}

int g(int n)
{
  int i = 1, j, sum = 0;
  *p = 1;
  #pragma omp parallel reduction(+: sum)
  {
    f1(&j);
    // i and n were not flushed
    //   because they were not accessible in f1
    // j was flushed because it was accessible
    sum += j;
    f2(&j);
    // i and n were not flushed
    //   because they were not accessible in f2
    // j was flushed because it was accessible
    sum += i + j + *p + n;
  }
  return sum;
}

ordered Construct

A #pragma omp ordered directive must be within the dynamic extent of a for or parallel for construct that has an ordered clause. The structured-block following an ordered directive is executed in the same order as iterations in a sequential loop. It has the following syntax:

#pragma omp ordered new-line
structured-block

The following restrictions apply to an ordered directive:

  • It must not be in the dynamic extent of a for directive that does not have the ordered clause specified.

  • An iteration of a loop with a for construct must not execute the same ordered directive more than once, and it must not execute more than one ordered directive.

Ordered sections are useful for sequentially ordering the output from work that is done in parallel. The following program prints out the indexes in sequential order:

#pragma omp for ordered schedule(dynamic)
  for (i=lb; i<ub; i+=st)
    work(i);

void work(int k)
{
  #pragma omp ordered
    printf(" %d", k);
}

Data Environment

The #pragma omp threadprivate directive and data scope attribute clauses control the data environment during the execution of parallel regions.

threadprivate Directive

The #pragma omp threadprivate directive makes the named file-scope or namespace-scope variables specified in list private to a thread but global within the thread. In other words, each thread receives its own private copy of each variable named in a threadprivate directive, and that copy is available to it in any routine within the scope of the application.

The threadprivate directive has the following syntax:

#pragma omp threadprivate(list) new-line

A thread must not reference another thread's copy of a threadprivate object. During serial regions and master regions of the program, references will be to the master thread's copy of the object.

On entry to the first parallel region, data in the threadprivate objects should be assumed to be undefined unless a copyin clause is specified on the parallel directive. A threadprivate object that has an explicit initializer is initialized once, prior to its first use. For subsequent parallel regions, the data in the threadprivate objects are guaranteed to persist only if the dynamic threads mechanism has been disabled and the number of threads is the same for all the parallel regions. For more information on dynamic threads, see the omp_set_dynamic(3) library function and the OMP_DYNAMIC environment variable on the pe_environ(5) man page.
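
As a minimal sketch, the following fragment gives each thread its own copy of a file-scope counter (the names are illustrative):

int counter = 0;
#pragma omp threadprivate(counter)

void increment_counter(void)
{
  counter++;   /* updates the calling thread's private copy */
}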

The following restrictions apply to the threadprivate directive:

  • The threadprivate directive must appear at file scope or namespace scope, must appear outside of any definition or declaration, and must lexically precede all references to any of the variables in its list.

  • Each variable in the list of a threadprivate directive must have a file-scope or namespace-scope declaration that lexically precedes the directive.

  • If a variable is specified in a threadprivate directive in one translation unit, it must be specified in a threadprivate directive in every translation unit in which it is declared.

  • A threadprivate variable may appear only in the copyin, schedule, or if clause. It is not permitted in the private, firstprivate, lastprivate, shared, or reduction clauses. Threadprivate variables are not affected by the default clause.

  • The address of a threadprivate variable is not an address constant.

  • A threadprivate variable must not have an incomplete type or a reference type.

  • A threadprivate variable with non-POD class type must have an accessible, unambiguous copy constructor if it is declared with an explicit initializer (in case the initialization is implemented using a temporary shared object).

The following example shows how modifying a variable that appears in an initializer can cause unspecified behavior, and also how to avoid this problem by using an auxiliary object and a copy constructor:

    int x = 1;
    T a(x);
    const T b_aux(x); /*Capture value of x = 1 */
    T b(b_aux);
    #pragma omp threadprivate(a, b)
    
    void f(int n) {
      x++;
      #pragma omp parallel for
      /* In each thread:
       * Object a is constructed from x (with value 1 or 2?)
       * Object b is copy-constructed from b_aux
       */
       for (int i=0; i<n; i++) {
          g(a, b); /* Value of a is unspecified. */
       }
    }

Data Scope Attribute Clauses

Several directives accept clauses that allow a user to control the scope attributes of variables for the duration of the construct. Not all of the clauses in this section are allowed on all directives, but the clauses that are valid on a particular directive are included with the description of the directive. Usually, if no data scope clauses are specified for a directive, the default scope for variables affected by the directive is shared.

The following sections describe the data scope attribute clauses:

private Clause

The private clause declares the variables in list to be private to each thread in a team.

This clause has the following syntax:

private(list)

The behavior of a variable declared in a private clause is as follows:

  • A new object of the same type is declared once for each thread in the team. The new object is no longer associated with the storage of the original object.

  • All references to the original object in the lexical extent of the directive construct are replaced with references to the private object.

  • Variables defined as private are undefined for each thread on entering the construct, and the corresponding shared variable is undefined on exit from a parallel construct.

  • The contents of variables defined as private are undefined when they are referenced outside the lexical extent (but inside the dynamic extent) of the construct, unless they are passed as actual arguments to called functions.

The following restrictions apply to the private clause:

  • A variable with a class type that is specified in a private clause must have an accessible, unambiguous default constructor.

  • Unless it has a class type with a mutable member, a variable specified in a private clause must not have a const-qualified type.

  • A variable specified in a private clause must not have an incomplete type or a reference type.

  • Variables that are private within a parallel region cannot be specified in a private clause on an enclosed work-sharing or parallel directive. As a result, variables that are specified private on a work-sharing or parallel directive must be shared in the enclosing parallel region.

Example: The values of i and j in the following example are undefined on exit from the parallel region:

int i, j;
i = 1;
j = 2;
#pragma omp parallel private(i) firstprivate(j)
{
  i = 3;
  j = j + 2;
}
printf("%d %d\n", i, j);

firstprivate Clause

The firstprivate clause provides a superset of the functionality provided by the private clause.

This clause has the following syntax:

firstprivate(list)

In addition to the private clause semantics, each new private object is initialized as if there were an implied declaration inside the structured block, and the initializer is the value of the variable's original object. A copy constructor is invoked for a class object, if necessary.

The following restrictions apply to the firstprivate clause:

  • All restrictions for private apply, except for the restrictions about default constructors and about const-qualified types.

  • A variable with a class type that is specified as firstprivate must have an accessible, unambiguous copy constructor.
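
In the following sketch, each thread's private copy of offset is initialized with the value the original object had when the construct was encountered; offset and the arrays are illustrative assumptions:

offset = 100;
#pragma omp parallel for firstprivate(offset)
  for (i=0; i<n; i++)
    a[i] = offset + i;   /* every thread sees offset start at 100 */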

lastprivate Clause

The lastprivate clause provides a superset of the functionality provided by the private clause.

This clause has the following syntax:

lastprivate(list)

When a lastprivate clause appears on the directive that identifies a work-sharing construct, the value of each variable from the sequentially last iteration of the associated loop, or the lexically last section directive, is assigned to the variable's original object. Variables that are not assigned a value by the last iteration of the for or parallel for, or by the lexically last section of the sections or parallel sections directive, have indeterminate values after the construct. Unassigned subobjects also have an indeterminate value after the construct.

The following restrictions apply to the lastprivate clause:

  • All restrictions for private apply.

  • A variable that is specified as lastprivate must have an accessible, unambiguous copy assignment operator.

Example: Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables as arguments to a lastprivate clause so that the values of the variables are the same as when the loop is executed sequentially. In the following example, the value of i at the end of the parallel region will equal n-1, as in the sequential case.

#pragma omp parallel
{
   #pragma omp for lastprivate(i)
     for (i=0; i<n-1; i++)
       a[i] = b[i] + b[i+1];
}
a[i]=b[i];

shared Clause

This clause shares variables that appear in the list among all the threads in a team. All threads within a team access the same storage area for shared variables.

This clause has the following syntax:

shared(list)

default Clause

The default clause allows the user to specify a shared or none default scope attribute for all variables in the lexical extent of any parallel region. Threadprivate variables are not affected by this clause.

This clause has the following syntax:

default(shared | none)
shared 

Specifying default(shared) is equivalent to explicitly listing each currently visible variable in a shared clause. It is the default.

none 

Specifying default(none) declares that there is no implicit default as to whether variables are shared. In this case, the private, shared, firstprivate, lastprivate, or reduction attribute of each variable used in the lexical extent of the parallel region must be specified.

Only one default clause can be specified on a parallel directive.

The following example shows how variables can be exempted from a defined default by using the private, shared, firstprivate, lastprivate, or reduction clauses:

#pragma omp parallel for default(shared) firstprivate(i) private(x) private(r) lastprivate(i)

The following example distinguishes the variables that are affected by the default(none) clause from those that are not:

int x, y, z[1000];
#pragma omp threadprivate(x)

void fun(int a) {
  const int c = 1;
  int i = 0;

  #pragma omp parallel default(none) private(a) shared(z)
  {
     int j = omp_get_thread_num();
                // O.K.  - j is declared within parallel region
     a = z[j];   // O.K.  - a is listed in private clause
                //       - z is listed in shared clause
     x = c;     // O.K.  - x is threadprivate
                 //       - c has const-qualified type
     z[i] = y;   // Error - cannot reference i or y here

     #pragma omp for firstprivate(y)
     for (i=0; i<10 ; i++) {
         z[i] = y;  // O.K. - i is the loop control variable
                    //      - y is listed in firstprivate clause
     }
     z[i] = y;   // Error - cannot reference i or y here
  }
}

reduction Clause

This clause performs a reduction on the scalar variables that appear in list, with the operator op. This clause has the following syntax:

reduction(op:list)

A reduction is typically used in a statement with one of the following forms:

x = x op expr
x binop= expr
x = expr op x (except for subtraction)
x++
++x
x--
--x

Where:

x 

One of the reduction variables specified in the list.

list 

A comma-separated list of reduction variables.

expr 

An expression with scalar type that does not reference x.

op 

One of +, *, -, &, ^, |, &&, or ||.

binop 

One of +, *, -, &, ^, or |.

The following example shows how to use the reduction clause:

#pragma omp parallel for reduction(+: a, y) reduction(||: am)
for (i=0; i<n; i++) {
   a += b[i];
   y = sum(y, c[i]);
   am = am || b[i] == c[i];
}

Because the operator may be hidden inside a function call, ensure that the operator specified in the reduction clause matches the reduction operation.

The following table lists the operators that are valid and their canonical initialization values. The actual initialization value will be consistent with the data type of the reduction variable:

Operator     Initialization
+            0
*            1
-            0
&            ~0
|            0
^            0
&&           1
||           0

Any number of reduction clauses can be specified on the directive, but a variable can appear in at most one reduction clause for that directive.

The following example shows how variables that appear in the reduction clause must be shared in the enclosing context:

#pragma omp parallel private(y)
{ /* ERROR - private variable y cannot be specified in a reduction clause */
   #pragma omp for reduction(+: y)
   for (i=0; i<n; i++)
     y += b[i];
}

/* ERROR - variable x cannot be specified in both a shared and a reduction clause */
#pragma omp parallel for shared(x) reduction(+: x)

The following restrictions apply to the reduction clause:

  • The type of the variables in the reduction clause must be valid for the reduction operator except that pointer types and reference types are never permitted.

  • A variable that is specified in the reduction clause must not be const-qualified.

  • A variable that is specified in the reduction clause must be shared in the enclosing context.

copyin Clause

The copyin clause lets you assign the same value to threadprivate variables for each thread in the team executing the parallel region. For each variable specified, the value of the variable in the master thread of the team is copied to the threadprivate copies at the beginning of the parallel region.

This clause has the following syntax:

copyin(list)

The following restrictions apply to the copyin clause:

  • A variable that is specified in the copyin clause must have an accessible, unambiguous copy assignment operator.

  • A variable that is specified in the copyin clause must be a threadprivate variable.
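
The following sketch combines copyin with a threadprivate variable, so that every thread starts the region with the master thread's value (the names are illustrative):

int counter = 0;
#pragma omp threadprivate(counter)

void start_region(void)
{
  counter = 10;                         /* set in the master thread */
  #pragma omp parallel copyin(counter)
  {
    /* each thread's copy of counter begins at 10 */
    counter++;
  }
}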

Directive Binding

Some directives are bound to other directives. A binding specifies the way in which one directive is related to another. For instance, a directive is bound to a second directive if it can appear in the dynamic extent of that second directive. The following rules apply with respect to the dynamic binding of directives:

  • The for, sections, single, master, and barrier directives bind to the dynamically enclosing parallel directive, if one exists. If no parallel region is currently being executed, the directives have no effect.

  • The ordered directive binds to the dynamically enclosing for directive.

  • The atomic directive enforces exclusive access with respect to atomic directives in all threads, not just the current team.

  • The critical directive enforces exclusive access with respect to critical directives in all threads, not just the current team.

  • A directive cannot bind to a directive outside the closest enclosing parallel directive.

The directive binding rules call for a barrier directive to bind to the closest enclosing parallel directive. In the following example, the calls in main, to sub1 and sub2, are both valid, and the barrier in sub3 binds to the parallel region in sub2 in both cases. The effect is different, however, because in the call to sub1, the barrier affects only a subteam. The number of threads in a subteam is implementation-dependent if nested parallelism is enabled (with the OMP_NESTED environment variable), and otherwise is one (in which case the barrier has no real effect).

void sub1(int n);
void sub2(int k);
void sub3(int n);

int main()
{
  sub1(2);
  sub2(2);
  return 0;
}

void sub1(int n)
{
  int i;
  #pragma omp parallel private(i) shared(n)
  {
    #pragma omp for
    for (i=0; i<n; i++)
      sub2(i);
  }
}

void sub2(int k)
{
  #pragma omp parallel shared(k)
    sub3(k);
}

void sub3(int n)
{
  work1(n);
  #pragma omp barrier
  work2(n);
}

Directive Nesting

Dynamic nesting of directives must adhere to the following rules:

  • A parallel directive dynamically inside another parallel directive logically establishes a new team, which is composed of only the current thread, unless nested parallelism is enabled.

  • for, sections, and single directives that bind to the same parallel directive are not allowed to be nested inside each other.

  • critical directives with the same name are not allowed to be nested inside each other.

  • for, sections, and single directives are not permitted in the dynamic extent of critical, ordered, and master regions.

  • barrier directives are not permitted in the dynamic extent of for, ordered, sections, single, master, and critical regions.

  • master directives are not permitted in the dynamic extent of for, sections, and single directives.

  • ordered directives are not allowed in the dynamic extent of critical regions.

  • Any directive that is permitted when executed dynamically inside a parallel region is also permitted when executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the directive is executed with respect to a team composed of only the master thread.

The following program is correct because the inner and outer for directives bind to different parallel regions:

#pragma omp parallel default(shared)
{
  #pragma omp for
    for (i=0; i<n; i++) {
      #pragma omp parallel shared(i, n)
      {
        #pragma omp for
          for (j=0; j<n; j++)
            work(i, j);
      }
   }
}

The following variation of the preceding example is also correct:

#pragma omp parallel default(shared)
{
  #pragma omp for
    for (i=0; i<n; i++)
      work1(i, n);
}


void work1(int i, int n)
{
  int j;
  #pragma omp parallel default(shared)
  {
    #pragma omp for
      for (j=0; j<n; j++)
        work2(i, j);
  }
  return;
}