Chapter 7. Linkage Conventions

This chapter gives rules and examples to follow when designing an assembly language program. The chapter includes a tutorial section that explains how calling sequences work. The tutorial approach is to write a skeleton version of your prospective assembly routine in a high-level language, and then compile it with the -S option to generate a human-readable assembly language file. The assembly language file can then be used as the starting point for coding your routine.

This assembler works in 32-bit, high-performance 32-bit (N32), or 64-bit compilation mode. These modes are very similar, but because data, register, and address sizes differ, the N32 and 64-bit assembler linkage conventions are not always the same as those for 32-bit mode. For details on some of these differences, see the MIPSpro 64-Bit Porting and Transition Guide and the MIPSpro N32 ABI Handbook.

The procedures and examples in this chapter, for the most part, describe 32-bit compilation mode. In some cases, specific differences necessitated by 64-bit mode are highlighted.

Introduction

When you write assembly language routines, you should follow the same calling conventions that the compilers observe, for two reasons:

  • Often your code must interact with compiler-generated code, accepting and returning arguments or accessing shared global data.

  • The symbolic debugger gives better assistance in debugging programs using standard calling conventions.

The conventions for the compiler system are a bit more complicated than some, mostly to enhance the speed of each procedure call. Specifically:

  • The compilers use the full, general calling sequence only when necessary; where possible, they omit unneeded portions of it. For example, the compilers avoid dedicating a register to the frame pointer whenever possible.

  • The compilers and debugger observe certain implicit rules rather than communicating via instructions or data at execution time. For example, the debugger looks at information placed in the symbol table by a ".frame" directive at compilation time, so that it can tolerate the lack of a register containing a frame pointer at execution time.

Program Design

This section describes some general areas of concern to the assembly language programmer:

  • Stack frame requirements on entering and exiting a routine.

  • The "shape" of data (scalars, arrays, records, sets) laid out by the various high-level languages.

For information about register format, and general, special, and floating-point registers, see Chapter 1.

The Stack Frame

This discussion of the stack frame, and the figures in particular, describes 32-bit operations. In 32-bit mode, restrictions such as stack addressing are enforced strictly. Although these restrictions are not enforced as rigidly for 64-bit stack frame usage, observing them is still good coding practice, especially if you rely on accurate debugging information.

The compilers classify each routine into one of the following categories:

  • Non-leaf routines, that is, routines that call other procedures.

  • Leaf routines, that is, routines that do not themselves execute any procedure calls. Leaf routines are of two types:

    • Leaf routines that require stack storage for local variables

    • Leaf routines that do not require stack storage for local variables.

You must decide the routine category before determining the calling sequence.

To write a program with proper stack frame usage and debugging capabilities, use the following procedure:

  1. Regardless of the type of routine, you should include a .ent pseudo-op and an entry label for the procedure. The .ent pseudo-op is for use by the debugger, and the entry label is the procedure name. The syntax is:

    .ent    procedure_name
    procedure_name:

  2. If you are writing a leaf procedure that does not use the stack, skip to step 3. For a leaf procedure that uses the stack, or for a non-leaf procedure, you must allocate all the stack space that the routine requires. The syntax to adjust the stack size is:

    subu    $sp,framesize

    where framesize is the size of frame required; framesize must be a multiple of 16. Space must be allocated for:

    • Local variables.

    • Saved general registers. Space should be allocated only for those registers saved. For non-leaf procedures, you must save $31, which is used in the calls to other procedures from this routine. If you use registers $16-$23, you must also save them.

    • Saved floating-point registers. Space should be allocated only for those registers saved. If you use registers $f20-$f30 (for 32-bit) or $f24-$f31 (for 64-bit), you must also save them.

    • Procedure call argument area. You must allocate the maximum number of bytes for arguments of any procedure that you call from this routine.


      Note: Once you have modified $sp, you should not modify it again for the rest of the routine.


  3. Now include a .frame pseudo-op:

    .frame framereg,framesize,returnreg

    The virtual frame pointer is a frame pointer as used in other compiler systems but has no register allocated for it. It consists of the framereg ($sp, in most cases) added to the framesize (see step 2 above). The following figures show the stack components for -32 and -n32 and -64.

    The returnreg specifies the register containing the return address (usually $31). These usual values may change if you use a varying stack pointer or are specifying a kernel trap routine.

    Figure 7-1. Stack Organization for -32

    Figure 7-2. Stack Organization for -n32 and -64

  4. If the procedure is a leaf procedure that does not use the stack, skip to step 7. Otherwise you must save the registers you allocated space for in step 2.

    To save the general registers, use the following operations:

    .mask    bitmask,frameoffset
    sw reg,framesize+frameoffset-N($sp)

    The .mask directive specifies the registers to be stored and where they are stored. A bit should be set in bitmask for each register saved (for example, if register $31 is saved, bit 31 should be 1 in bitmask). Bits are set in bitmask in little-endian order, even if the machine configuration is big-endian. The frameoffset is the offset from the virtual frame pointer (this number is usually negative). N should be 0 for the highest-numbered register saved and then incremented by four for each subsequently lower-numbered register saved. For example:

    sw    $31,framesize+frameoffset($sp)
    sw    $17,framesize+frameoffset-4($sp)
    sw    $16,framesize+frameoffset-8($sp)

    Figure 7-3 illustrates this example.

    Now save any floating-point registers that you allocated space for in step 2 as follows:

    .fmask    bitmask,frameoffset
    s.[sd]    reg,framesize+frameoffset-N($sp)

    Notice that saving floating-point registers is identical to saving general registers, except that the .fmask pseudo-op is used instead of .mask and the stores are of floating-point singles or doubles. The discussion regarding saving general registers applies here as well, but remember that N should be incremented by 8 for doubles. The stack framesize must be a multiple of 16.

    Figure 7-3. Stack Example

  5. This step describes parameter passing: how to access arguments passed into your routine and passing arguments correctly to other procedures. For information on high-level language-specific constructs (call-by-name, call-by-value, string or structure passing), refer to the MIPSpro Compiling and Performance Tuning Guide.

    As specified in step 2, space must be allocated on the stack for all arguments even though they may be passed in registers. This provides a saving area if their registers are needed for other variables.

    General registers must be used for passing arguments. For 32-bit compilations, general registers $4-$7 and floating-point registers $f12 and $f14 are used for passing the first four arguments (if possible). For floating-point arguments that appear in registers, you must allocate a pair of registers that starts with an even-numbered register, even for a single-precision argument.

    For 64-bit compilations, general registers $4-$11 and floating-point registers $f12 through $f19 are used for passing the first eight arguments (if possible).

    In Table 7-1 and Table 7-2, the "fN" arguments are considered single- and double-precision floating-point arguments, and "nN" arguments are everything else. The ellipses (...) mean that the rest of the arguments do not go in registers regardless of their type. The "stack" assignment means that you do not put this argument in a register. The register assignments occur in the order shown in order to satisfy optimizing compiler protocols:

    Table 7-1. Parameter Passing (-32)

    Argument List        Register and Stack Assignments
    f1, f2               $f12, $f14
    f1, n1, f2           $f12, $6, stack
    f1, n1, n2           $f12, $6, $7
    n1, n2, n3, n4       $4, $5, $6, $7
    n1, n2, n3, f1       $4, $5, $6, stack
    n1, n2, f1           $4, $5, ($6, $7)
    n1, f1               $4, ($6, $7)


    Table 7-2. Parameter Passing (-n32 and -64)

    Argument List                  Register and Stack Assignments
    d1,d2                          $f12, $f13
    s1,s2                          $f12, $f13
    s1,d1                          $f12, $f13
    d1,s1                          $f12, $f13
    n1,d1                          $4, $f13
    d1,n1,d1                       $f12, $5, $f14
    n1,n2,d1                       $4, $5, $f14
    d1,n1,n2                       $f12, $5, $6
    s1,n1,n2                       $f12, $5, $6
    d1,s1,s2                       $f12, $f13, $f14
    s1,s2,d1                       $f12, $f13, $f14
    n1,n2,n3,n4                    $4, $5, $6, $7
    n1,n2,n3,d1                    $4, $5, $6, $f15
    n1,n2,n3,s1                    $4, $5, $6, $f15
    s1,s2,s3,s4                    $f12, $f13, $f14, $f15
    s1,n1,s2,n2                    $f12, $5, $f14, $7
    n1,s1,n2,s2                    $4, $f13, $6, $f15
    n1,s1,n2,n3                    $4, $f13, $6, $7
    d1,d2,d3,d4,d5                 $f12, $f13, $f14, $f15, $f16
    d1,d2,d3,d4,d5,s1,s2,s3,s4     $f12, $f13, $f14, $f15, $f16, $f17, $f18, $f19, stack
    d1,d2,d3,s1,s2,s3,n1,n2,n3     $f12, $f13, $f14, $f15, $f16, $f17, $10, $11, stack


  6. Next, you must restore registers that were saved in step 4. To restore general purpose registers:

    lw reg,framesize+frameoffset-N($sp)

    To restore the floating-point registers:

    l.[sd] reg,framesize+frameoffset-N($sp)

    (Refer to step 4 for a discussion of the value of N.)

  7. Get the return address:

    lw $31,framesize+frameoffset($sp)

  8. Clean up the stack:

    addu    $sp,framesize

  9. Return:

    j $31

  10. To end the procedure:

    .end    procedure_name
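
The bookkeeping behind steps 2 and 4 can be sketched in C. This is only an illustrative calculation, not part of the assembler or the compilers. The helper names (round_up_16, frame_size, mask_for, save_slot) are hypothetical, and the sketch assumes 32-bit mode with 4-byte general register slots and 8-byte floating-point save slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to a multiple of 16, as step 2 requires. */
static size_t round_up_16(size_t nbytes)
{
    return (nbytes + 15u) & ~(size_t)15u;
}

/* Hypothetical helper: frame size from the four areas listed in step 2.
   Assumes 4-byte general registers and 8-byte floating-point slots. */
static size_t frame_size(size_t local_bytes, unsigned saved_gprs,
                         unsigned saved_fprs, size_t max_arg_bytes)
{
    size_t total = round_up_16(local_bytes + 4u * saved_gprs
                               + 8u * saved_fprs + max_arg_bytes);
    assert(total % 16 == 0);   /* framesize must be a multiple of 16 */
    return total;
}

/* Build a .mask bitmask: bit n is set when register $n is saved. */
static uint32_t mask_for(const unsigned *regs, size_t count)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(regs[i] < 32);
        mask |= (uint32_t)1 << regs[i];
    }
    return mask;
}

/* Offset from $sp of a general register's save slot, given its rank among
   the saved registers (rank 0 is the highest-numbered register saved).
   This is framesize+frameoffset-N with N = 4 * rank. */
static long save_slot(size_t framesize, long frameoffset, unsigned rank)
{
    return (long)framesize + frameoffset - 4L * (long)rank;
}
```

For instance, saving $31, $17, and $16 gives a bitmask of 0x80030000, and with a framesize of 32 and a frameoffset of -4 the three registers land at 28($sp), 24($sp), and 20($sp).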

The difference in stack frame usage for -n32 and -64 operations can be summarized as follows:

The portion of the argument structure beyond the initial eight doublewords is passed in memory on the stack, pointed to by the stack pointer at the time of call. The caller does not reserve space for the register arguments; the callee is responsible for reserving it if required (either adjacent to any caller-saved stack arguments if required, or elsewhere as appropriate). No requirement is placed on the callee either to allocate space and save the register parameters, or to save them in any particular place.
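
The pattern in Table 7-2, together with the stack rule above, can be sketched in C as a mapping from argument position to location. This is a simplified model that covers only the scalar cases shown in the table; it ignores aggregates and variable argument lists. The type and function names (arg_home, arg_loc, n32_arg_location) are hypothetical.

```c
#include <assert.h>

/* Where an argument lives at the time of the call. */
enum arg_home { HOME_GPR, HOME_FPR, HOME_STACK };

struct arg_loc {
    enum arg_home home;   /* general register, floating-point register, or stack */
    unsigned regno;       /* register number; unused when home is HOME_STACK */
};

/* Hypothetical helper: location of the argument in position `slot`
   (0-based). `kind` follows Table 7-2: 's' and 'd' are single- and
   double-precision floating-point values, 'n' is everything else.
   Each of the first eight positions owns one general register ($4-$11)
   and one floating-point register ($f12-$f19); the argument's type
   selects which of the two is used. Later positions go on the stack. */
static struct arg_loc n32_arg_location(char kind, unsigned slot)
{
    struct arg_loc loc = { HOME_STACK, 0 };

    assert(kind == 's' || kind == 'd' || kind == 'n');
    if (slot < 8) {
        if (kind == 'n') {
            loc.home = HOME_GPR;
            loc.regno = 4u + slot;
        } else {
            loc.home = HOME_FPR;
            loc.regno = 12u + slot;
        }
    }
    return loc;
}
```

Under this model the argument list n1,s1,n2,n3 maps to $4, $f13, $6, $7, matching the corresponding row of Table 7-2, and a ninth argument of any type is passed on the stack.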

The Shape of Data

In most cases, high-level language routines and assembly routines communicate via simple variables: pointers, integers, booleans, and single- and double-precision real numbers. Describing the details of the various high-level data structures (arrays, records, sets, and so on) is beyond the scope of this book. If you need to access such a structure as an argument or as a shared global variable, refer to the MIPSpro Compiling and Performance Tuning Guide.

Examples

This section contains examples that illustrate the program design rules. Each example shows a procedure written in C and its equivalent written in assembly language.

Example 7-1. Non-leaf procedure

The following example shows a non-leaf procedure. Notice that it creates a stackframe, and also saves its return address since it must put a new return address into register $31 when it invokes its callee:

float
nonleaf(i, j)
     int i, *j;
     {
     double atof();
     int temp;

     temp = i - *j;
     if (i < *j) temp = -temp;
     return atof(temp);
     }
           .globl       nonleaf
   #   1   float
   #   2   nonleaf(i, j)
   #   3   int i, *j;
   #   4   {
           .ent        nonleaf 2
   nonleaf:
           .cpload $25              ## Load $gp
           subu        $sp, 32      ## Create stackframe
           sw          $31, 20($sp) ## Save the return
                                    ## address
           sw          $gp, 24($sp) ## Save gp
           .mask       0x80000000, -12
           .frame    $sp, 32, $31
   #  5    double atof();
   #  6    int temp;
   #  7
   #  8    temp = i - *j;
           lw      $2, 0($5)        ## Arguments are in
                                    ## $4 and $5
           subu     $3, $4, $2
   #  9    if (i < *j) temp = -temp;
           bge      $4, $2, $32     ## Note: $32 is a label,
                                    ##  not a reg
           negu     $3, $3
$32:
   #  10   return atof(temp);
           move     $4, $3
           jal      atof
           cvt.s.d  $f0, $f0       ## Return value goes in $f0
           lw       $gp, 24($sp)   ## Restore gp
           lw       $31, 20($sp)   ## Restore return address
           addu     $sp, 32        ## Delete stackframe
           j        $31            ## Return to caller
           .end     nonleaf   

The -n32 code for the previous example is shown below. Note that this code is under .set noreorder, so be aware of delay slots.

  .set          noreorder
         # Program Unit: nonleaf
  .ent          nonleaf
  .globl        nonleaf
nonleaf:        # 0x0
  .frame        $sp, 32, $31
  .mask         0x80000000, -32
  lw $7,0($5)              # load *j
  addiu $sp,$sp,-32        #.frame.len.nonleaf
  sd $gp,8($sp)            # save $gp
  sd $31,0($sp)            # save $ra
  lui $31,%hi(%neg(%gp_rel(nonleaf+0))) #load new $gp
  addiu $31,$31,%lo(%neg(%gp_rel(nonleaf +0))) #
  addu $gp,$25,$31         #
  slt $1,$4,$7             # compare i to *j
  beq $1,$0,.L.1.1.temp    #
  subu $7,$4,$7            # i-*j, in delay slot of branch
  subu $7,$0,$7            # temp = -temp
.L.1.1.temp:     # 0x2c
  lw $25,%call16(atof)($gp)#
  jalr $25                 #atof
  or $4,$7,$0              # delay slot of jalr loads arg
  ld $31,0($sp)            # restore $ra
  cvt.s.d $f0,$f0          #
  ld $gp,8($sp)            # restore $gp
  jr $31                   #
  addiu $sp,$sp,32         # .frame.len.nonleaf
  .end   nonleaf



Example 7-2. Leaf Procedure

This example shows a leaf procedure that does not require stack space for local variables. Notice that it creates no stackframe, and saves no return address.

int
leaf(p1, p2)
    int p1, p2;
    {
    return (p1 > p2) ? p1 : p2;
    }
                .globl        leaf
   #    1       int
   #    2       leaf(p1, p2)
   #    3         int p1, p2;
   #    4         {
                .ent          leaf 2
leaf:
                .frame        $sp, 0, $31
   #    5         return (p1 > p2) ? p1 : p2;
                 ble          $4, $5, $32    ## Arguments in
                                             ##  $4 and $5
                 move         $3, $4
                 b            $33
$32:
                 move         $3, $5
$33:
                 move         $2, $3         ## Return value
                                             ##  goes in $2
                 j            $31            ## Return to
                                             ##  caller
   #    6          }
                 .end    leaf

The -n32 code for the previous example looks like this:

   .set    noreorder
   .ent    leaf
   .globl  leaf
leaf:   #0x0
   .frame  $sp, 0, $31
   slt $2,$5,$4           # compare p1 and p2
   beq $2, $0,.L.1.2.temp #
   or $9,$4,$0            # delay slot
   b .L.1.1.temp          #
   or $2,$9,$0            # delay slot, return p1
.L.1.2.temp:  # 0x14
   or $2,$5,$0            # return p2
.L.1.1.temp:  # 0x18
   jr $31                 #
   nop                    # delay slot
   .end     leaf

Writing Assembly Language Code

The rules and parameter requirements that exist between assembly language and other languages are varied and complex. The simplest approach to coding an interface between an assembly routine and a routine written in a high-level language is to do the following:

  • Use the high-level language to write a skeletal version of the routine that you plan to code in assembly language.

  • Compile the program using the -S option, which creates an assembly language (.s) version of the compiled source file (the -O option, though not required, reduces the amount of code generated, making the listing easier to read).

  • Study the assembly-language listing and then, imitating the rules and conventions used by the compiler, write your assembly language code.
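
As a minimal sketch of the first step, a skeleton might look like the following. The routine name scale_sum and its parameters are hypothetical, chosen only because they mix a pointer, an integer, and a double-precision argument, so the generated listing shows how each kind of parameter is passed and how a floating-point value is returned.

```c
/* A hypothetical skeleton for a routine that will later be hand-coded
   in assembly language. It has the name, argument types, and return
   type that the final routine will have, and a body simple enough to
   follow in the generated listing. */
double
scale_sum(const int *values, int count, double factor)
{
    double sum = 0.0;
    int i;

    /* Add up the array elements, then apply the scale factor. */
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum * factor;
}
```

Compiling a file like this with the -S option (optionally with -O) produces the .s listing to study. The entry sequence, register usage, and return sequence in that listing are the parts to imitate in the hand-written version.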