Review of Last Time

MIPS is a 32-bit processor: addresses are 32-bits (MIPS calls 32-bits a “word”), all registers are 32-bits in size, and all instructions are 32-bits as well.

The design of the MIPS pipeline, with its single MEM-ory access stage (after the EX-ecute stage) means that general instructions cannot access memory, and a value cannot be read from memory and then operated on (added, subtracted, etc.) in a single instruction. Thus, on MIPS, there are only two basic kinds of instructions which access memory:

Memory operands (which identify the addresses to read from/write to) are similarly more limited, having the form

    offset(register)

where offset is a constant, and register is one of the 32 registers. For example, if $s0 contains the address of an array of 32-bit integers, we can access array[5] by using

    20($s0)

(5 * 4 = 20, because each array element is 32-bits or 4 bytes). (For a load or store instruction, the arithmetic of adding the offset to the register is performed by the EX-ecute stage, just before the MEM-ory access stage.)

To make up for the inability of general instructions to access memory, MIPS has 32 general-purpose registers (although some of these are reserved for specific purposes: $sp$ is the stack pointer, $zero is always 0, $ra is the return address, etc.). The general flow of a MIPS program is to load as much as possible from memory into the registers, then operate on the values, then write the results back once processing is complete.

Hello, world! in MIPS looks like this:

###
### hello.asm
### Hello, world! in MIPS assembly
###

.data

msg:    .asciiz "Hello, world!"

.text

li $v0, 4               # Syscall 4 = print
la $a0, msg             # Argument in $a0 = address of NUL-terminated string
syscall

li $v0, 10              # Syscall 10 = end program
syscall

Syscalls are normally provided by the operating system, and the MIPS simulator we are using (MARS) is not running a “real” operating system, so it provides a simple set of syscalls to make interacting with our programs possible.

Some instructions are “real” instructions provided by the MIPS CPU, but many MIPS instructions are pseudo-instructions implemented as macros in the MIPS assembler. The MIPS CPU is designed for this: one of the registers, $at, is reserved for use by the assembler as a “temporary” register. (This doesn’t mean we can’t use it, but you have to be aware of any pseudo- instructions that also might be using it.)

For example, the li (Load Immediate) instruction that we use above to store the value 4 into register $v0 is a pseudo-instruction. It is actually implemented as

addi $v0, $zero, 4          # $v0 = 0 + 4

where $zero is the special zero register, whose value is always 0 (you can write to $zero but that doesn’t change its value).

Similar la $a0, msg (Load Address) is also a pseudo instruction; here, however, we have a problem. Addresses are 32-bits, registers are 32-bits, but immediates (the constants we can use in instructions) are limited to 16-bits. So la expands into two instructions: the assembler splits the address into its low and high 16-bits (msg_low and msg_high) and then loads them into the destination register $a0 like this:

lui $at, msg_high            # Load high 16-bits (low 16-bits are 0s)
ori $a0, $at, msg_low        # Bitwise OR with low 16-bits

lui loads a 16-bit immediate value into the high 16-bits of a register; to load a 16-bit immediate into the low 16-bits, we can use the addi instruction:

addi $a0, $zero, imm

Defining constants

The MARS equivalent to equ in YASM is .eqv:

.eqv   SIZE      1024

Now, wherever SIZE is used, it will be replaced with 1024. Unlike YASM, the “replacement” can be more than just a value:

.eqv    RETURN    jr $ra

Writing RETURN will expand into the instruction jr $ra. Only single-instruction expansions are supported; see .macro below for multi-line macros.

Branching, looping, and function calls

MIPS does not have the separate flags register that x86 has; similarly, it does not have a separate comparison (cmp, test) instruction. The reason for this is that the compare-branch two-instruction combo of x86 is problematic for the MIPS pipeline:

Branching instructions

Unlike x86-64, there is no separate comparison instruction; the branch instruction itself does the comparison. The basic forms of the branch instruction are

bCC r₁, r₂,  label    # Jump to label if (r₁ CC r₂)
bCC r₁, imm, label    # Jump to label if (r₁ CC imm)

where CC is a condition code describing a comparison between the two operands (either two registers, or a register and an immediate). Note that the register-immediate form of the branch instruction is a pseudo-instruction; it is a macro implemented by the assembler as:

add $at, $zero, imm          # Store imm in $at
bCC r₁,  $at,   label        # Jump to label if (r₁ CC $at)

The possible CCs (condition codes) are

CC Meaning
eq Equal
eqz Equal to 0
ge Greater-or-equal
geu Greater-or-equal (unsigned)
gt Greater-than
gtu Greater-than (unsigned)
le Less-than-or-equal
leu Less-than-or-equal (unsigned)
lt Less-than
ltu Less-than (unsigned)
ne Not equal
nez Not equal to 0
lez Less-than-or-equal to 0
gez Greater-than-or-equal to 0
ltz Less than 0
gtz Greater than 0

Note that many of the above condition codes are actually implemented as pseudo-instructions. In particular, any CC mentioning “zero” is implemented as a pseudo-instruction using the $zero register. E.g., bgtz reg, label (branch if greater-than 0) is really just

bgt reg, $zero, label

Some of the conditional branching instructions come in -al (and-link) forms which set $ra to the address of the following instruction (i.e., the return address). These are used for conditional function calls. For example,

bgezal reg, label

will jump to label if reg >= 0, but if it jumps, will also store the address of the next instruction immediately after the bgezal into the $ra (return address) register. The branch target code then can “return” by just doing

jr $ra           # Jump to return address

The unconditional branch instructions are

j  label            # Jump to label
jr r₁               # Jump to address in r₁

jal  label          # Store return address in $ra, jump to label
jalr r₁             # Store return address in $ra, jump to address in r₁  

Instead of branching on a condition, you can also set a register to 0/1 based on a condition:

sCC r₁, r₂, r₃       # Set r₁ to 1 if (r₂ CC r₃), otherwise set to 0

Example: Rewriting a string to uppercase

.data

msg:    .asciiz     "Hello, world!"

.text

    la $t0, msg
begin:
    ldb $t1, ($t0)             # Load byte into $t1
    beq $t1, 0, endloop        # At end of string?

    blt $t1, 'a', next         # Less than 'a'? skip
    bgt $t1, 'z', next         # Greater than 'z'? skip

    # $t1 is a lower-case character
    sub $t1, $t1, 32           # Convert to uppercase
    stb $t1, $(t0)             # Write modifier char to string
next:
    add $t0, $t0, 1            # Advance to next char
    j begin                    # and loop

endloop:
    # Syscall to print the string

    li $v0, 4                  # Syscall 4 = print
    la $a0, msg                # Address of string
    syscall            

    li $v0, 10                 # Syscall 10 = end program
    syscall

Comparing this to the equivalent x86 code, the main difference is that we cannot compare a value in memory. Thus, we load the character from the string into register $t1 and work with that, rather than using a memory operand in the comparison instruction(s). Again, on MIPS, only the ld instruction can read from memory; all other instructions read from registers or immediates (constants).

If-else structures

The combined comparison-and-branch instruction of MIPS makes some structures simpler than their x86 counterparts. For example, consider an if-else statement:

if( $t0 == $t1 ) {
    
}
else {
    
}

The MIPS equivalent would be

bne $t0, $t1, else
    ⋮
    j endif
else:
    ⋮
endif:

MIPS Function Calls

Push/pop

(We’re using the O32 MIPS calling convention, which was used on Unix/Linux.)

MIPS does not have dedicated push/pop instructions. Instead, a push is accomplished by

addi $sp, $sp, -4       # $sp -= 4
sw   r₁, ($sp)          # Store r₁'s value into top of stack

That is, subtract 4 from the stack pointer, then store the value of register r₁ into the address pointed to by $sp (which is now the new top of the stack). A pop is similarly implemented as two instructions:

lw  r₁, ($sp)           # Store top of stack value into r₁
addi $sp, $sp, 4        # Shift top of stack "down"

Of course, if you want to pop something off the stack and discard it (as opposed to storing it into a register), you can just move the stack pointer:

addi $sp, $sp, 8        # Pop 2 32-bit values off the stack, discarding them

Similarly, to reserve space for local variables in a function prologue, you can just subtract the required amount of space from $sp:

addi $sp, $sp, 16       # Reserve stack space for 4 32-bit values (locals)

As on x86, the MIPS stack grows down. The MIPS stack always pushes/pops 32-bits at a time, so the top of the stack is always aligned to a multiple of 4, however, the function calling convention requires that the top of the stack, as well the addresses of various sections within the function stack frame, be aligned to multiples of 8, so that 64-bit accesses will be valid. Hence, stack padding is required in a few places to keep the 8-byte alignment.

Basic function call example

Suppose we have two functions, f and g, and f calls g. This pattern would look like this:

f:
    ⋮
    jal g             # Save return addr. in $ra, jump to g
    ⋮


g:
    ⋮
    jr $ra            # "Return" by jumping to address in $ra

The jal g instruction performs two operations:

While g is executing, the value in $ra is the address of the instruction in f to return to, so to “return” from g we simply jump to the address in register $ra. This scheme avoids pushing the return address onto the stack as x86 does, leading to more efficient function calls and returns, as no memory access is required. Or is it?

What happens if we have a chain of three functions, f, g, h where f calls g, and then g calls h:

f:
    ⋮
    jal g

g: 
    ⋮
    jal h
    ⋮
    jr $ra

h:
    ⋮
    jr $ra

Will this work? No! The sequence of events is

  1. f stores its return address into $ra and then jumps to g

  2. g stores its return address into $ra overwriting f’s and then jumps to h.

  3. h “returns” by jumping to g‘s return address.

  4. When g tries to return, the value in $ra is not the return address in f, but rather the return address in g. So g effectively jumps into itself, getting stuck in an infinite loop.

In practice, only leaf functions (functions that do not call any other functions) can only use $ra; if a function calls other functions, then it must save the incoming $ra by pushing it onto the stack, and then restore it before returning.

Function call registers

As mentioned above, the registers that apply to function calls are

A function’s stack frame is divided into several sections, from the “bottom up” (i.e., from top down in memory order):

Each of these sections must be padded at the beginning so that their sizes are multiples of 8 bytes.

Thus, the prologue and epilogue of a function looks something like this:

f:
    addi $sp, $sp, -X         # Reserve space for local variables
                              # If necessary, pad size to a multiple of 8

    addi $sp, $sp, -8         # Save return address (in non-leaf functions)
    sw   $ra, ($sp)           

    addi $sp, $sp, -Y         # Reserve space for saved temp registers
    sw   $s0, ($sp)           # Save sN registers (as many as you use)
    sw   $s1, 4($sp) 
    ⋮

    addi $sp, $sp  -Z         # Reserve space for function call args (5-...)

    addi $sp, $sp, -16        # Reserve space for function cal args 0-4

    ...                       # Actual function body.

    addi $sp, $sp, (Z+16)     # Remove arguments

    lw   $s0, ($sp)           # Restore saved temp registers
    lw   $s1, 4($sp)
    ⋮
    addi $sp, $sp, Y          # Remove saved temp registers

    lw   $ra, $(sp)           # Restore return address

    addi $sp, $sp, X          # Remove local variables

    jr ($ra)                  # "Return" (jump to return address)

Not all of these sections are needed in every function. MIPS distinguishes between three classes of functions:

The argument section

One of the sections in the stack frame is reserved for arguments passed to called functions. The first four arguments are passed in registers $a0-$a4, while the remainder are passed on the stack. However, unlike on x86 where we push the arguments just before the function call (and then pop them off immediately after), on MIPS, we reserve space at the beginning of the function for any arguments we might need to pass on the stack.

This means that when you write a MIPS function, you must look through all the functions it calls, determine the number of arguments for each, and reserve space on the stack in the function prologue for the maximum number of arguments any called function takes. For example:

void f()
{
    
    g(1,2);
    
    h(4,5,6,7,8,9);
    
    k(5,6,7,8);
    
}

Here, of all the function calls, the maximum number of arguments is for h, which takes 6 arguments, so you would need to reserve space on the stack for all 6.

How much stack space would be reserved for arguments for this function:

void f()
{
    
    g(1,2)
    
    h(1,2,3)
    
}

This is a trick question: the minimum amount of space reserved for arguments is always 4, even though the first four arguments are passed not on the stack, but in $a0-$a4. Instead, this block of 4 32-bit values is used to save the incoming $a0-$a4, which are caller-preserved. That is, prior to calling g in the example above, we would place $a0 and $a1 into that region on the stack, and then restore them before returning.

A MIPS function will typically setup the stack entirely in the function prologue, and then not modify $sp until the function is ready to return. Again, this is different from x86 where we are often push-ing and pop-ing throughout the body of the function. On x86, we use rbp as a “fixed point of reference” to a function’s stack frame, because rsp is moving around. On MIPS, this is not really necessary, because once the stack is setup, $sp stays in place; all of the above sections can be accessed as $sp + offset (or in the assembly syntax, offset($sp). In this mode, $sp is a fixed point of reference to the end of the function’s stack frame.

It is possible to use $fp for the same purpose as rbp, as a fixed point-of-reference to the beginning of a function’s stack frame. To do this, at the very bottom (top, in memory terms) of a function’s stack frame, at the beginning of the prologue, save the (caller-preserved) existing $fp:

f:
    addi $sp, $sp, -4
    sw   $fp, ($sp)         # Save caller's fp
    move $fp, $sp 
    ⋮

    lw   $fp, ($sp)         # Restore caller's fp
    addi $sp, $sp, 4
    jr ($ra)

With this setup, $fp always points to the beginning of a function’s stack frame, while $sp points to the end of the its stack frame (i.e., the top of the stack). You can access values on the stack as relative from either end, either as +N($sp) (“up” from $sp) or as -N($fp) (down from $fp).

Typically, the only time you need a stack frame pointer is when the size of a function’s stack frame is dynamic. E.g., if you have something like this:

void f(int x)
{
    int arr[x];  // NOTE: This is not valid C++ (and barely valid C)
    
}

Then we don’t know how big the local variable section will be until runtime.

Macros

The MARS simulator’s assembler supports a macro system similar to, but more restrictive than, the YASM macro system. For example, here’s a macro definition which implements a sys_exit macro:

.macro sys_exit %status
    li $v0, 10
    li $a0, %status
    syscall
.end_macro

To use this, we’d write

sys_exit 0

Unlike YASM, macro parameters have names (%status) instead of just numbers. The MARS assembler automatically makes labels defined inside macros unique, so no “macro-local” labeling is needed. Macros can be overloaded on the number of parameters, but there’s no fancy default parameters, * parameters, etc. There’s no equivalent to the “context-local” labels of YASM, so it’s not possible to create “linked” macros like IF/ELSE, etc. (The MARS documentation gives an example of how something like a for loop can be done, by defining a new macro for each loop body you need.)