Chapter 1. 64-bit ABI and Compiler Overview

This chapter gives a brief overview of the 64-bit application binary interface (ABI) and describes the MIPSpro 7.3 32-bit, 64-bit, and high-performance 32-bit (N32) compilers.

64-bit ABI Overview

Three different ABIs are currently supported on IRIX platforms:

o32 

The old 32-bit ABI generated by the ucode compiler.

n32 

The 32-bit ABI generated by the MIPSpro 64-bit compiler. N32 is described in the MIPSpro N32 ABI Handbook.

n64 

The 64-bit ABI generated by the MIPSpro 64-bit compiler.

The 64-bit ABI was introduced in IRIX 6.0; it was designed to exploit the high performance capabilities and 64-bit virtual addressing provided by the MIPS R8000 processor. These capabilities include:

  • The ability to execute MIPS1 user code, compatible with the R3000.

  • The ability to execute MIPS2 instruction set extensions introduced in the R4000.

  • The ability to execute MIPS3 64-bit addressing and instructions introduced in the R4400.

  • The ability to execute new instructions that improve floating-point and integer performance (MIPS4 instructions).

The MIPS3 and MIPS4 64-bit capabilities provide both 64-bit virtual addressing and instructions that manipulate 64-bit integer data. Processor registers are 64 bits wide, and thirty-two 64-bit floating-point registers are available.

Table 1-1 compares the various ABIs.

Table 1-1. ABI Comparison Summary

                                  o32          n32            n64
    Compiler Used                 ucode        MIPSpro        MIPSpro
    Integer Model                 ILP32        ILP32          LP64
    Calling Convention            mips         new            new
    Number of FP Registers        16 (FR=0)    32 (FR=1)      32 (FR=1)
    Number of Argument Registers  4            8              8
    Debug Format                  mdbug        dwarf          dwarf
    ISAs Supported                mips1/2      mips3/4        mips3/4
    32/64 Mode                    32 (UX=0)    64 (UX=1) *    64 (UX=1)

* UX=1 implies 64-bit registers and also indicates that MIPS3 and MIPS4 instructions are legal. N32 uses 64-bit registers but restricts addresses to 32 bits.
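The Integer Model row of Table 1-1 can be checked from inside a program. The following is a minimal sketch, not part of the MIPSpro documentation: the function names integer_model and compiled_integer_model are hypothetical, and the classification relies only on the usual meaning of ILP32 (int, long, and pointers are all 32 bits) and LP64 (long and pointers are 64 bits, int stays 32 bits).

```c
#include <stddef.h>

/* Classify an integer model from the sizes (in bytes) of int, long,
 * and pointers. Returns "ILP32", "LP64", or "other". */
const char *integer_model(size_t int_size, size_t long_size, size_t ptr_size)
{
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return "ILP32";   /* o32 and n32 in Table 1-1 */
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return "LP64";    /* n64 in Table 1-1 */
    return "other";       /* any model not listed in Table 1-1 */
}

/* Report the model that this translation unit was compiled for. */
const char *compiled_integer_model(void)
{
    return integer_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Under these assumptions, a program built with -n32 or -32 would be expected to report ILP32 and one built with -64 to report LP64. Note that the test cannot tell o32 from n32, because both use the same integer model.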

Compatibility and Supported ABIs

All versions of IRIX 6.x support development for o32, n32 and n64 programs. All IRIX 6.x systems also support execution of o32 and n32 programs. However, in order to execute 64-bit programs you must be running on IRIX 6.4 or a 64-bit version of IRIX 6.2 or IRIX 6.5. IRIX 6.3 and the 32-bit version of IRIX 6.2 or IRIX 6.5 do not support execution of 64-bit programs. You can tell if you are running on a system capable of executing 64-bit programs by running the uname command. If it returns IRIX64, you are on a 64-bit version of IRIX. If it returns IRIX, you are on a 32-bit version.
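The same check can be made from a program with the POSIX uname() call, which fills in the system name that the uname command prints. This is a sketch only; the helper names sysname_is_irix64 and running_on_irix64 are hypothetical.

```c
#include <string.h>
#include <sys/utsname.h>

/* Returns 1 if sysname is the name a 64-bit version of IRIX reports
 * ("IRIX64"), 0 otherwise. */
int sysname_is_irix64(const char *sysname)
{
    return strcmp(sysname, "IRIX64") == 0;
}

/* Queries the running kernel with uname().
 * Returns 1 for IRIX64, 0 for any other system name, -1 on error. */
int running_on_irix64(void)
{
    struct utsname u;

    if (uname(&u) < 0)
        return -1;
    return sysname_is_irix64(u.sysname);
}
```

A return value of 0 covers both 32-bit IRIX and any non-IRIX system, so a caller that needs to distinguish the two would have to compare the system name against "IRIX" as well.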

On 64-bit versions of IRIX you can execute programs conforming to any of the following Application Binary Interfaces (ABIs):

  • An o32 program built under IRIX 5.x or IRIX 6.x (32-bit MIPS1 or MIPS2 ABI). COFF is no longer supported as of IRIX 6.2.

  • A 64-bit program (64-bit MIPS3 or MIPS4 ABI).

  • An N32 program (N32 MIPS3 or MIPS4 ABI).

Figure 1-1 illustrates the ABIs supported by IRIX 6.x.

Figure 1-1. ABIs supported by IRIX 6.x


More specifically, the execution and development environments under IRIX 6.x provide the following functionality:

  • 32-bit IRIX 5.x binaries and Dynamic Shared Objects (DSOs) execute under IRIX 6.x

  • IRIX 6.x has a set of compilers (32-bit Native Development Environment) that generate 32-bit code. You can mix objects and link with objects produced on IRIX 5.x. (There is no guarantee, however, that this code runs on IRIX 5.x systems.)

  • IRIX 6.x also has a set of compilers (64-bit Native Development Environment) that generate either 64-bit or N32 code. This code cannot run on IRIX 5.x systems.

  • You can specify which compiler you want to run by using the -64, -n32 or -32 (-o32) flags on the compiler command line.

    The compiler driver then executes the appropriate compiler binaries and links with the correct libraries. This also applies to the assembler, linker, and archiver. If these switches are not present, the driver checks for an /etc/compiler.defaults file and an environment variable, SGI_ABI, for these values. See the compiler man pages for more details.

  • All of the compiler-related tools (dbx, nm, dis) can work with 32-bit, N32, or 64-bit binaries. Prof functionality is rolled into the SpeedShop product line.

  • You cannot mix objects and DSOs produced by the 32-bit compilers with objects and DSOs produced by the 64-bit compilers. In Figure 1-1, this is illustrated by the lines separating the 32-bit, N32 and 64-bit libraries. Therefore, the following rules apply:

    • You cannot link 32-bit objects with 64-bit or N32 objects and shared libraries

    • You cannot link 64-bit objects with 32-bit or N32 objects and shared libraries

    • You cannot link N32 objects with 64-bit or 32-bit objects and shared libraries

  • The /usr/lib directory on IRIX 6.x systems contains the 32-bit libraries and .sos. The 64-bit .sos are located in /usr/lib64. The N32 .sos are located in /usr/lib32. The complete layout looks like this:

    32-bit: This is the IRIX 5.x /usr/lib, including compiler components:

    /usr/lib/
              *.so
              mips2/
                    *.so
    /usr/lib/
              cfe
              fcom
              ugen
              uopt
              as

    64-bit: These are the 64-bit-specific libraries:

    /usr/lib64/
              *.so
              mips3/
                    *.so
              mips4/
                    *.so

    N32: These are the N32-specific libraries and components:

    /usr/lib32/
              *.so
              mips3/
                    *.so
              mips4/
                    *.so
    /usr/lib32/cmplrs
              be
              fec
              mfef77
              as

Known Compatibility Issues

The following issues are known to cause errors for 32-bit programs running on IRIX 6.x:

  • Any access to kernel data structures, for example, through /dev/kmem. Many of these structures have changed in size. Programs making these kinds of accesses must be ported to 64-bit; in any case, 32-bit programs cannot access all of kernel memory.

  • Use of nlist() on any 64-bit .o or a.out file. nlist() does not work on these files; a new nlist64() is supplied for 64-bit ELF.

  • Any assumption that the page size is 4 Kbytes (for example, using mmap() and specifying the address). The page size is no longer 4 Kbytes. Programs must call getpagesize() to obtain it.

  • Ada programs which catch floating point exceptions do not work.

  • Any program using /proc must have some interfaces changed.
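For the page-size item in the list above, the usual fix is to compute sizes and alignments from the value getpagesize() returns rather than from a hard-coded 4096. The sketch below illustrates the idea; the helper names round_up_to_page and round_up_to_system_page are hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/* Round len up to a multiple of page_size. page_size must be greater
 * than zero; it does not have to be a power of two. */
size_t round_up_to_page(size_t len, size_t page_size)
{
    size_t rem = len % page_size;

    return rem == 0 ? len : len + (page_size - rem);
}

/* Same rounding, using the page size reported by the running system
 * instead of an assumed constant. */
size_t round_up_to_system_page(size_t len)
{
    return round_up_to_page(len, (size_t)getpagesize());
}
```

A length rounded this way stays page-aligned whatever page size the running kernel uses, which a constant baked in at compile time cannot guarantee.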

It is possible for a program to determine if it is running on a 64-bit capable kernel in order to work around the issues listed above. Use sysconf(_SC_KERN_POINTERS), which returns 32 or 64.
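A minimal sketch of that check follows. _SC_KERN_POINTERS is an IRIX-specific sysconf() name, so the call is guarded by the preprocessor to let the file build elsewhere; the function names kernel_pointer_bits and kernel_is_64bit are hypothetical.

```c
#include <unistd.h>

/* Returns the kernel pointer width in bits (32 or 64) where the
 * IRIX-specific _SC_KERN_POINTERS query exists, or -1 where this
 * system's <unistd.h> does not define it. */
long kernel_pointer_bits(void)
{
#ifdef _SC_KERN_POINTERS
    return sysconf(_SC_KERN_POINTERS);
#else
    return -1;   /* query not available on this system */
#endif
}

/* Interpret the result: 1 = 64-bit capable kernel, 0 = 32-bit kernel,
 * -1 = unknown. */
int kernel_is_64bit(long bits)
{
    if (bits == 64)
        return 1;
    if (bits == 32)
        return 0;
    return -1;
}
```

A program would typically call kernel_is_64bit(kernel_pointer_bits()) once at startup and select the 32-bit or 64-bit code path for kernel structure sizes accordingly.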

Compiler System Components

As explained earlier, the MIPSpro compiler system on IRIX 6.x consists of two independent compiler systems. One system supports the 64-bit and high performance 32-bit (N32) ABIs. The other supports the old 32-bit ABI. This section describes and compares them.

Fortran

The MIPSpro Fortran 77 compiler and the MIPSpro 7 Fortran 90 compiler support 32-bit, 64-bit, and N32 compiler modes. The following sections describe the components of these compiling systems. For more details about using the compilers, see the manuals provided with each compiling system.

Fortran 64-Bit and N32 System

The 64-bit Fortran compilers have the following components:

Fortran driver 

Fortran driver or command line: Executes the components below.

front end (fe) 

Parses the source file into an intermediate representation. It also performs scalar optimization and automatic parallelization.

back end (be) 

Generates code and assembles it into an object file. It also performs a variety of optimizations, including scalar and interprocedural optimizations. When used with the MIPSpro Auto-Parallelizing Option product, it automatically converts programs to parallel code.

dsm_prelink 

Prelinker for routines that use distributed shared memory. If a reshaped array is passed as a parameter to another subroutine, dsm_prelink automatically propagates the distribute_reshape directive to the called subroutine.

linker 

Links the object file(s) with any libraries.

When you run 64-bit compilations for single processor applications, the following components are executed by the compiler driver:

%f77 -64 foo.f
%f77 -64 -O foo.f
       fe --> be --> linker

When you run 64-bit compilations for multiprocessor applications, an additional step that invokes dsm_prelink runs just before the final linking step:

%f77 -64 -mp foo.f
%f77 -64 -pfa foo.f
       fe --> be --> dsm_prelink --> linker

With the MIPSpro 64-bit compiler, optimizations are performed in the back end. Note that -O3 is available with -c. Unlike the older ucode compilers, -O3 does not result in interprocedural optimizations being performed. Use the -IPA:... control group to perform interprocedural optimizations with the 64-bit compiler. See the ipa(5) man page for more details.

The -sopt switch is NOT supported on the 64-bit compiler. Use the -LNO: ... control group flags to perform the desired scalar optimizations. See the lno(5) man page for details.

The -mp switch is supported on the 64-bit compiler and causes the front end to recognize inserted parallelization directives.

Fortran 32-Bit System

The 32-bit (ucode) Fortran compiler systems contain the following components:

Fortran driver 

Fortran driver or command line: Executes the components below.

cpp 

C preprocessor: Handles #include statements and other cpp constructs such as #define, #ifdef, and so on, in the source file.

fopt 

Special scalar optimizer: Performs scalar optimization on the Fortran source.

fcom 

Fortran front end: Parses the source file into intermediate code (ucode).

uopt 

Optimizer: Performs optimizations on the intermediate file.

ugen 

Code generator: Generates binary assembly code from the intermediate file.

as1 

Binary assembler: Assembles the binasm code into an object file.

linker 

Linker: Links the object file(s) with any libraries.

When you run simple examples through the ucode Fortran compilers, the following components are executed by the compiler driver:

%f77 -32 foo.f
    cpp --> fcom --> ugen --> as1 --> linker

The command

%f77 -32 -O foo.f
    cpp --> fcom --> uopt --> ugen --> as1 --> linker

also invokes the ucode optimizer, uopt. The command

%f77 -32 -sopt foo.f
    cpp --> fopt --> fcom --> ugen --> as1 --> linker

invokes the scalar optimizer but does not invoke the ucode optimizer.

The -mp option signals fcom to recognize inserted parallelization directives:

%f77 -32 -mp foo.f
    cpp --> fcom --> ugen --> as1 --> linker

C

For C, the respective compiler systems are similar to their Fortran counterparts. The front ends, of course, are different in each system.

C 64-Bit and N32 System

The MIPSpro (64-bit) C compiler systems contain the following components:

cc 

C driver: Executes the appropriate components below.

fec 

C front end: Preprocesses the C file, and then parses the source file into an intermediate representation.

be 

Back end: Generates code and assembles it into an object file. It also performs a variety of optimizations, including scalar and interprocedural optimizations, which are described in Chapter 4, "Compilation Issues." With the MIPSpro Auto-Parallelizing Option product, it can also automatically convert programs to parallel code.

dsm_prelink 

Prelinker for routines that use distributed shared memory. If a reshaped array is passed as a parameter to another subroutine, dsm_prelink automatically propagates the distribute_reshape directive to the called subroutine.

linker 

Links the object file(s) with any libraries.

When you run simple examples through the 64-bit C compilers, the following components are executed by the compiler driver:

%cc -64 foo.c
%cc -64 -O foo.c
        fec --> be --> linker

When you run 64-bit compilations for multiprocessor applications, an additional step that invokes dsm_prelink runs just before the final linking step:

%cc -64 -mp foo.c
%cc -64 -pca foo.c
        fec --> be --> dsm_prelink --> linker

C 32-Bit System

The 32-bit (ucode) C compiler systems contain the following components:

cc 

C driver: Executes the appropriate components below.

acpp 

ANSI C preprocessor: Handles #include statements and other cpp constructs such as #define, #ifdef, and so on, in the source file.

cfe 

C front end: Preprocesses the C file, and then parses the source file into intermediate code (ucode).

mpc 

Interprets parallel directives.

copt 

C scalar optimizer: Performs scalar optimization.

uopt 

Optimizer: Performs optimizations on the intermediate file.

ugen 

Code Generator: Generates binary assembly code from the intermediate file.

as1 

Binary assembler: Assembles the binasm code into an object file.

linker 

Linker: Links the object file(s) with any libraries.

When you run simple examples through the ucode C compiler, the following components are executed by the compiler driver:

%cc -32 foo.c
    cfe --> ugen --> as1 --> linker


Note: cfe has a built-in C preprocessor.

The command

%cc -32 -O foo.c
    cfe --> uopt --> ugen --> as1 --> linker

also invokes the ucode optimizer, uopt.

The command

%cc -32 -sopt foo.c
    acpp --> copt --> cfe --> ugen --> as1 --> linker

invokes the scalar optimizer but does not invoke the ucode optimizer. The C preprocessor (acpp) has to be run first so that copt can do its source-to-source translation.

The preprocessor also runs first when mpc is used:

%cc -32 -mp foo.c
    acpp --> mpc --> cfe --> ugen --> as1 --> linker

-mp signals mpc to recognize inserted parallelization directives.

Interprocedural Analysis (IPA)

As of version 7.0, the MIPSpro 64-bit or N32 compilers can perform interprocedural analysis and optimization when invoked with the -IPA command line option. Current IPA optimizations include inlining, interprocedural constant propagation, and dead function, dead call, and dead variable elimination, among others. For more information about IPA and its optimization options, see the ipa(5) man page.

An important difference between the 64-bit compiler's use of -IPA and -c and the 32-bit compiler's use of -O3 and -j is that the intermediate files generated by the 64-bit compiler have the .o suffix. This can greatly simplify Makefiles. For example:

% cc -n32 -O -IPA -c main.c
% cc -n32 -O -IPA -c foo.c
% ls 
foo.c   foo.o   main.c  main.o
% cc -n32 -IPA main.o foo.o

An analogous 32-bit compilation would look like:

% cc -32 -O3 -j main.c
% cc -32 -O3 -j foo.c
% ls 
foo.c   foo.u   main.c  main.u
% cc -32 -O3 main.u foo.u


Note: Use of the non-standard -j option and non-standard .u (ucode) files leads to more complicated Makefiles.


Loop Nest Optimizer (LNO)

The loop nest optimizer performs high-level optimizations that can greatly improve program performance by exploiting instruction level parallelism and caches. LNO is run by default at the -O3 optimization level. LNO is integrated into the compiler back end (be) and is not a source-to-source preprocessor. As a result, LNO optimizes C++, C and Fortran programs, although C and C++ often include features that make them inherently more difficult to optimize. For more information about LNO and its optimization options, refer to the lno(5) man page.

To view the transformations that LNO performs, use the -CLIST:=ON or -FLIST:=ON option to generate a C or Fortran listing file, respectively. The listing file is generated with the .w2c.c (or .w2f.f) suffix. For example:

%cat bar.f
      subroutine bar(a,b,c,d,j)
      real*4 a(1024),b(1024),c(1024)
      real*4 d,e,f
      sum = 0
      do m= 1,j
        do i=1,1024
          b(i) = b(i) * d
        enddo
      enddo
      call foo(a,b,c)
      end

%f77 -64 -O3 -FLIST:=ON bar.f
%cat bar.w2f.f
C ***********************************************************
C Fortran file translated from WHIRL Fri May 17 12:07:56 1997
C ***********************************************************


        SUBROUTINE bar(a, b, c, d, j)
        IMPLICIT NONE
        REAL*4 a(1024_8)
        REAL*4 b(1024_8)
        REAL*4 c(1024_8)
        REAL*4 d
        INTEGER*4 j
C
C**** Variables and functions ****
C
        INTEGER*4 m
        INTEGER*4 i
        EXTERNAL foo
C
C**** Temporary variables ****
C
        INTEGER*4 wd_m
        INTEGER*4 i0
C
C**** statements ****
C

        DO m = 1, j + -1, 2
          DO i = 1, 1024, 1
            b(i) = (b(i) * d)
            b(i) = (b(i) * d)
          END DO
        END DO
        DO wd_m = m, j, 1
          DO i0 = 1, 1024, 1
            b(i0) = (b(i0) * d)
          END DO
        END DO
        CALL foo(a, b, c)
        RETURN
        END ! bar
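The listing above shows the outer loop unrolled by two, with the two copies of the body fused into a single inner loop and a cleanup nest for the leftover iteration. The following C sketch writes out the same transformation by hand so the two forms can be compared. It is an illustration only, not output from the compiler; the names scale_original, scale_unrolled, and BAR_N are hypothetical.

```c
#define BAR_N 1024

/* Original loop nest from bar(): scale every element of b by d,
 * j times over. */
void scale_original(float *b, float d, int j)
{
    int m, i;

    for (m = 1; m <= j; m++)
        for (i = 0; i < BAR_N; i++)
            b[i] = b[i] * d;
}

/* Hand-written equivalent of the listing: the outer loop advances two
 * iterations at a time and both copies of the body share one inner
 * loop. The second nest handles the remaining iteration when j is odd. */
void scale_unrolled(float *b, float d, int j)
{
    int m, i;

    for (m = 1; m <= j - 1; m += 2)
        for (i = 0; i < BAR_N; i++) {
            b[i] = b[i] * d;
            b[i] = b[i] * d;
        }
    for (; m <= j; m++)
        for (i = 0; i < BAR_N; i++)
            b[i] = b[i] * d;
}
```

Each element is still multiplied by d exactly j times and in the same order, so the results match the original. What changes is that each element is loaded and stored half as often across the outer iterations, which is where the cache benefit comes from.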

MIPSpro Auto-Parallelizing Option

The MIPSpro Auto-Parallelizing Option analyzes data dependence to guide automatic parallelization. For the 7.2 compiler release this functionality is implemented in the 64-bit and N32 compiler back end (be). It replaces KAP (Kuck and Associates Preprocessor), which was implemented as a separate preprocessor. An advantage of being built into the back end is that automatic parallelization is now available for C++ as well as the previously supported C, Fortran 77, and Fortran 90. Another advantage of this design is that a separate (and orthogonal) set of optimization options is no longer necessary.

Compiling with Automatic Parallelization

To compile with automatic parallelization you must obtain the MIPSpro Auto-Parallelizing Option and install its license. The syntax for compiling programs with automatic parallelization is as follows:

For Fortran 77, C, and Fortran 90 compilations use -apo on your compilation command line. For example:

%f77 -apo foo.f

If you link separately, you must also add -mp to the link line. See the apo(5) man page for details.

Automatic Parallelization Listings

The auto-parallelizer provides a listing mechanism when you use the -apo list option. This causes the compiler to generate a .l file. The .l file lists the original loops in the program along with messages telling if the loops were parallelized. For loops that were not parallelized, an explanation is given. For example:

%cat test.f
       program test
       real*8 a, x(100000),y(100000)
       do i = 1,2000
         y(i) =  y(i-1) + x(i)
       enddo
       do i = 1,2000
         call daxpy(3.7,x,y,100000)
        enddo
        stop
        end

        subroutine daxpy( a, x, y, nn)
        real*8 a, x(*), y(*)
        do i = 1, nn,1
          y(i) =  y(i) + a*x(i)
        end do
        return
        end
%f77 -64 -apo list test.f
%cat test.l
Parallelization Log for Subprogram MAIN__
    3: Not Parallel
        Array dependence from y on line 4 to y on line 4.
    6: Not Parallel
        Call daxpy on line 7.
Parallelization Log for Subprogram daxpy_
    14: PARALLEL (Auto) __mpdo_daxpy_1
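The log entries above distinguish a loop whose iterations depend on one another from a loop whose iterations are independent. One informal way to see the difference is to run the iterations in a different order, which is effectively what a parallel schedule does. The C sketch below does this for both loop bodies. It is not a description of the compiler's dependence analysis, and all four function names are hypothetical.

```c
/* Loop-carried dependence, as in y(i) = y(i-1) + x(i): iteration i
 * reads the value that iteration i-1 wrote. */
void prefix_forward(double *y, const double *x, int n)
{
    int i;

    for (i = 1; i < n; i++)
        y[i] = y[i - 1] + x[i];
}

/* Same statement with the iterations visited in the opposite order. */
void prefix_backward(double *y, const double *x, int n)
{
    int i;

    for (i = n - 1; i >= 1; i--)
        y[i] = y[i - 1] + x[i];
}

/* Independent iterations, as in daxpy: y(i) = y(i) + a*x(i). */
void daxpy_forward(double a, const double *x, double *y, int n)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}

/* Same statement with the iterations visited in the opposite order. */
void daxpy_backward(double a, const double *x, double *y, int n)
{
    int i;

    for (i = n - 1; i >= 0; i--)
        y[i] = y[i] + a * x[i];
}
```

Reversing the order changes the result of the first loop but not of the second. That is why the first loop in test.f is reported as Not Parallel while the loop in daxpy is parallelized. The loop that calls daxpy is rejected for a different reason: the log names the call itself as the obstacle.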

The -mplist option will, in addition to compiling your program, generate a .w2f.f file (for Fortran, .w2c.c file for C) that represents the program after the automatic parallelization phase. These programs should be readable and in most cases should be valid code suitable for recompilation. The -mplist option can be used to see what portions of your code were parallelized. Continuing our example from above:

%f77 -64 -apo -mplist test.f
%cat test.w2f.f
C ***********************************************************
C Fortran file translated from WHIRL Sat Jul 26 12:05:52 1997
C ***********************************************************


        PROGRAM MAIN
        IMPLICIT NONE
C
C          **** Variables and functions ****
C
        REAL*8 x(100000_8)
        REAL*8 y(100000_8)
        INTEGER*4 i
        
C
C       **** statements ****
C
        DO i = 1, 2000, 1
          y(i) = (x(i) + y(i + -1))
        END DO
        DO i = 1, 2000, 1
          CALL daxpy(3.7000000477, x, y, 100000)
        END DO
        STOP
        END ! MAIN


        SUBROUTINE daxpy(a, x, y, nn)
        IMPLICIT NONE
        REAL*8 a
        REAL*8 x(*)
        REAL*8 y(*)
        INTEGER*4 nn
C
C       **** Variables and functions ****
C
        INTEGER*4 i
        INTEGER*4 __mp_sug_numthreads_func$
        EXTERNAL  __mp_sug_numthreads_func$
C
C       **** statements ****
C
C       PARALLEL DO will be converted to SUBROUTINE __mpdo_daxpy_1
C$OMP PARALLEL DO if(((DBLE(__mp_sug_numthreads_func$()) *((DBLE(
C$&(__mp_sug_numthreads_func$())*1.23D+02)+2.6D+03)).LT. (DBLE((
C$&(__mp_sug_numthreads_func$()+ -1))*(DBLE(nn)*7.0D00)))), private
C$&(i), shared(y, x, a, nn)

        DO i = 1, nn, 1
          y(i) = (y(i) +(x(i) * a))
        END DO
        RETURN
        END ! daxpy

The -pfa keep option generates a .l file, a .anl file that is used by the Workshop ProMPF tool, and a .m file. The .m file is similar to the .w2f.f or .w2c.c file except that the file is annotated with some information used by Workshop ProMPF.

For Fortran 90 and C++, automatic parallelization happens after the source program has been converted into an internal representation. It is not possible to regenerate Fortran 90 or C++ after parallelization.

Multiprocessing Support

IRIX 6.x and the MIPSpro compilers support multiprocessing primitives for 32-bit, N32, and 64-bit applications. The 64-bit (and N32) multiprocessor programming environment is a superset of the 32-bit one and adds several enhancements.

MP Compatibility

This section describes 64-bit and 32-bit Fortran MP compiler compatibility:

  • The 64-bit Fortran compiler supports all of the parallelization directives (such as C$DOACROSS, C$&, C$MP_SCHEDTYPE, C$CHUNK, C$COPYIN) supported by the 32-bit Fortran compiler. In addition, the Fortran front end supports PCF style parallel directives, which are documented in the MIPSpro Fortran 77 Programmer's Guide.

  • The 64-bit Fortran compiler supports the same set of multiprocessing utility subroutine calls (such as mp_block and mp_unblock) as the 32-bit compiler.

  • The 64-bit Fortran compiler supports the same set of environment variables (such as MP_SET_NUMTHREADS and MP_BLOCKTIME) as the 32-bit compiler.

  • The -mp option is supported on both the 32-bit compilers and the 64-bit compilers.

    • -mp allows LNO to recognize manually-inserted parallelization directives in the 64-bit compiler.

    • -apo enables automatic parallelization by the MIPSpro Auto-Parallelizing Option (64-bit and N32).

MP Enhancements

The MIPSpro 64-bit Fortran MP I/O library has been enhanced to allow I/O from parallel regions. In other words, multiple threads can read and write to different files as well as read and write to the same file. The latter case, of course, encounters normal overhead due to file locking.

The MIPSpro 64-bit compilers also have been enhanced to allow parallel C and parallel Fortran programs to share a common runtime. This allows you to link parallel C routines with parallel Fortran routines and have a single master. Figure 1-2 illustrates this.

Figure 1-2. Running Parallel C and Parallel Fortran Programs Together


The MIPSpro 64-bit compilers are also enhanced to provide a variety of primitive synchronization operations. The operations are guaranteed to be atomic (typically achieved by implementing the operation using a sequence of load-linked/store-conditional instructions in a loop).

Associated with each operation are certain memory barrier properties that restrict the movement of memory references to visible data across the intrinsic operation (by either the compiler or the processor). For more information, see the MIPSpro Fortran 77 Programmer's Guide and the sync(3i) and sync(3c) man pages.
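The retry-loop shape of such an operation can be sketched in portable C. The example below uses the C11 <stdatomic.h> compare-and-exchange as a stand-in for a load-linked/store-conditional pair. It does not use the MIPSpro intrinsics documented in sync(3i) and is not their implementation; the function name fetch_and_add_retry is hypothetical.

```c
#include <stdatomic.h>

/* Atomically add delta to *p and return the previous value, written as
 * an explicit retry loop: read the current value, try to publish the
 * updated value, and start over if another thread changed *p first. */
int fetch_and_add_retry(atomic_int *p, int delta)
{
    int old = atomic_load(p);

    /* On failure, atomic_compare_exchange_weak stores the current
     * value of *p into old, so the next attempt uses fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + delta))
        ;
    return old;
}
```

These C11 operations use sequentially consistent ordering by default, which is stronger than many uses need. The barrier properties of the actual MIPSpro operations are described in the man pages cited above.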

New Directives for Tuning on Origin2000

The Origin2000 provides cache-coherent, shared memory in the hardware. Memory is physically distributed across processors. Consequently, references to locations in the remote memory of another processor take substantially longer (by a factor of two or more) to complete than references to locations in local memory. This can severely affect the performance of programs that suffer from a large number of cache misses.

The new programming support consists of extensions to the existing multiprocessing Fortran and C directives (pragmas) as well as support for C++. Also provided are intrinsic functions that can be used to manage and query the distribution of shared memory. For more information, see the MIPSpro Fortran 77 Programmer's Guide and MIPSpro C and C++ Pragmas.

OpenMP Support

Starting with the MIPSpro 7.2.1 release, the Fortran 77 and Fortran 90 64-bit and N32 compilers support the OpenMP application programming interface (API) when used in conjunction with the -mp flag. The -mp flag enables the processing of the original SGI/PCF directives as well as the OpenMP directives. To selectively disable one or the other set of directives, add one of the following -MP option group flags to the -mp flag:

-MP:old_mp=off

Disables processing of the original SGI/PCF directives, but retains processing of the OpenMP directives.

-MP:open_mp=off

Disables processing of the OpenMP directives, but retains processing of the original SGI/PCF directives.

To run OpenMP programs you must install the appropriate version of libmp.so. Refer to your IRIX Development Foundation Release Notes for more information. For more information about the OpenMP directives, see the MIPSpro Fortran 77 Programmer's Guide or the MIPSpro Fortran 90 Commands and Directives Reference Manual.

MP Application Testing

In general, to test 64-bit MP applications, follow these guidelines:

  • First, get the application to run with no parallelization at the highest optimization level.

  • When testing the parallel version, first run it with only one thread (either on a single CPU machine or by setting the environment variable MP_SET_NUMTHREADS to 1).

  • Go down to the -g optimization level for the first MP test, and run that version with one thread, then with multiple threads. Then go up the optimization scale, testing both single and multi-threaded versions.

You can, of course, skip as many steps as you like. In case of failure, however, this method of incremental iterations can help you narrow down and identify the problem.