Review of Last Time

MIPS is a 32-bit processor: addresses are 32-bits (MIPS calls 32-bits a “word”), all registers are 32-bits in size, and all instructions are 32-bits as well.

The design of the MIPS pipeline, with its single MEM-ory access stage (after the EX-ecute stage) means that general instructions cannot access memory, and a value cannot be read from memory and then operated on (added, subtracted, etc.) in a single instruction. Thus, on MIPS, there are only two basic kinds of instructions which access memory:

Loads: (lw, lb) which read a value from memory into a register
Stores: (sw) which write a value from a register into memory.

Memory operands (which identify the addresses to read from/write to) are similarly more limited, having the form

    offset(register)

where offset is a constant, and register is one of the 32 registers. For example, if $s0 contains the address of an array of 32-bit integers, we can access array[5] by using

    20($s0)

(5 * 4 = 20, because each array element is 32-bits or 4 bytes). (For a load or store instruction, the arithmetic of adding the offset to the register is performed by the EX-ecute stage, just before the MEM-ory access stage.)

To make up for the inability of general instructions to access memory, MIPS has 32 general-purpose registers (although some of these are reserved for specific purposes: $sp$ is the stack pointer, $zero is always 0, $ra is the return address, etc.). The general flow of a MIPS program is to load as much as possible from memory into the registers, then operate on the values, then write the results back once processing is complete.

Hello, world! in MIPS looks like this:

###
### hello.asm
### Hello, world! in MIPS assembly
###

.data

msg:    .asciiz "Hello, world!"

.text

li $v0, 4               # Syscall 4 = print
la $a0, msg             # Argument in $a0 = address of NUL-terminated string
syscall

li $v0, 10              # Syscall 10 = end program
syscall

Syscalls are normally provided by the operating system, and the MIPS simulator we are using (MARS) is not running a “real” operating system, so it provides a simple set of syscalls to make interacting with our programs possible.

Some instructions are “real” instructions provided by the MIPS CPU, but many MIPS instructions are pseudo-instructions implemented as macros in the MIPS assembler. The MIPS CPU is designed for this: one of the registers, $at, is reserved for use by the assembler as a “temporary” register. (This doesn’t mean we can’t use it, but you have to be aware of any pseudo- instructions that also might be using it.)

For example, the li (Load Immediate) instruction that we use above to store the value 4 into register $v0 is a pseudo-instruction. It is actually implemented as

addi $v0, $zero, 4          # $v0 = 0 + 4

where $zero is the special zero register, whose value is always 0 (you can write to $zero but that doesn’t change its value).

Similar la $a0, msg (Load Address) is also a pseudo instruction; here, however, we have a problem. Addresses are 32-bits, registers are 32-bits, but immediates (the constants we can use in instructions) are limited to 16-bits. So la expands into two instructions: the assembler splits the address into its low and high 16-bits (msg_low and msg_high) and then loads them into the destination register $a0 like this:

lui $at, msg_high            # Load high 16-bits (low 16-bits are 0s)
ori $a0, $at, msg_low        # Bitwise OR with low 16-bits

lui loads a 16-bit immediate value into the high 16-bits of a register; to load a 16-bit immediate into the low 16-bits, we can use the addi instruction:

addi $a0, $zero, imm

Defining constants

The MARS equivalent to equ in YASM is .eqv:

.eqv   SIZE      1024

Now, wherever SIZE is used, it will be replaced with 1024. Unlike YASM, the “replacement” can be more than just a value:

.eqv    RETURN    jr $ra

Writing RETURN will expand into the instruction jr $ra. Only single-instruction expansions are supported; see .macro below for multi-line macros.

Branching, looping, and function calls

MIPS does not have the separate flags register that x86 has; similarly, it does not have a separate comparison (cmp, test) instruction. The reason for this is that the compare-branch two-instruction combo of x86 is problematic for the MIPS pipeline:

The pipeline is already loading instructions after the branch; the longer we take to determine whether the branch is taken or not, the more instructions are already in the pipeline and will need to be flushed. Flushed instructions = wasted CPU time.
We want to resolve branches (determine whether the branch will be taken) as early as possible, so that minimal CPU cycles are wasted working on instructions that end up not being executed. Because of this, MIPS resolves branches in the ID (instruction decode) stage.
Conditional branching instructions on MIPS combine both the comparison and its operands, and the label to jump to if the comparison succeeds. This is in contrast to x86 where the comparison is one instruction, and the branch target label is provided in the following instruction (again, delaying resolution of the branch).

Branching instructions

Unlike x86-64, there is no separate comparison instruction; the branch instruction itself does the comparison. The basic forms of the branch instruction are

bCC r₁, r₂,  label    # Jump to label if (r₁ CC r₂)
bCC r₁, imm, label    # Jump to label if (r₁ CC imm)

where CC is a condition code describing a comparison between the two operands (either two registers, or a register and an immediate). Note that the register-immediate form of the branch instruction is a pseudo-instruction; it is a macro implemented by the assembler as:

add $at, $zero, imm          # Store imm in $at
bCC r₁,  $at,   label        # Jump to label if (r₁ CC $at)

The possible CCs (condition codes) are

CC	Meaning
`eq`	Equal
`eqz`	Equal to 0
`ge`	Greater-or-equal
`geu`	Greater-or-equal (unsigned)
`gt`	Greater-than
`gtu`	Greater-than (unsigned)
`le`	Less-than-or-equal
`leu`	Less-than-or-equal (unsigned)
`lt`	Less-than
`ltu`	Less-than (unsigned)
`ne`	Not equal
`nez`	Not equal to 0
`lez`	Less-than-or-equal to 0
`gez`	Greater-than-or-equal to 0
`ltz`	Less than 0
`gtz`	Greater than 0

Note that many of the above condition codes are actually implemented as pseudo-instructions. In particular, any CC mentioning “zero” is implemented as a pseudo-instruction using the $zero register. E.g., bgtz reg, label (branch if greater-than 0) is really just

bgt reg, $zero, label

Some of the conditional branching instructions come in -al (and-link) forms which set $ra to the address of the following instruction (i.e., the return address). These are used for conditional function calls. For example,

bgezal reg, label

will jump to label if reg >= 0, but if it jumps, will also store the address of the next instruction immediately after the bgezal into the $ra (return address) register. The branch target code then can “return” by just doing

jr $ra           # Jump to return address

The unconditional branch instructions are

j  label            # Jump to label
jr r₁               # Jump to address in r₁

jal  label          # Store return address in $ra, jump to label
jalr r₁             # Store return address in $ra, jump to address in r₁

Instead of branching on a condition, you can also set a register to 0/1 based on a condition:

sCC r₁, r₂, r₃       # Set r₁ to 1 if (r₂ CC r₃), otherwise set to 0

Example: Rewriting a string to uppercase

.data

msg:    .asciiz     "Hello, world!"

.text

    la $t0, msg
begin:
    ldb $t1, ($t0)             # Load byte into $t1
    beq $t1, 0, endloop        # At end of string?

    blt $t1, 'a', next         # Less than 'a'? skip
    bgt $t1, 'z', next         # Greater than 'z'? skip

    # $t1 is a lower-case character
    sub $t1, $t1, 32           # Convert to uppercase
    stb $t1, $(t0)             # Write modifier char to string
next:
    add $t0, $t0, 1            # Advance to next char
    j begin                    # and loop

endloop:
    # Syscall to print the string

    li $v0, 4                  # Syscall 4 = print
    la $a0, msg                # Address of string
    syscall            

    li $v0, 10                 # Syscall 10 = end program
    syscall

Comparing this to the equivalent x86 code, the main difference is that we cannot compare a value in memory. Thus, we load the character from the string into register $t1 and work with that, rather than using a memory operand in the comparison instruction(s). Again, on MIPS, only the ld instruction can read from memory; all other instructions read from registers or immediates (constants).

If-else structures

The combined comparison-and-branch instruction of MIPS makes some structures simpler than their x86 counterparts. For example, consider an if-else statement:

if( $t0 == $t1 ) {
    ⋮
}
else {
    ⋮
}

The MIPS equivalent would be

bne $t0, $t1, else
    ⋮
    j endif
else:
    ⋮
endif:

MIPS Function Calls

Push/pop

(We’re using the O32 MIPS calling convention, which was used on Unix/Linux.)

MIPS does not have dedicated push/pop instructions. Instead, a push is accomplished by

addi $sp, $sp, -4       # $sp -= 4
sw   r₁, ($sp)          # Store r₁'s value into top of stack

That is, subtract 4 from the stack pointer, then store the value of register r₁ into the address pointed to by $sp (which is now the new top of the stack). A pop is similarly implemented as two instructions:

lw  r₁, ($sp)           # Store top of stack value into r₁
addi $sp, $sp, 4        # Shift top of stack "down"

Of course, if you want to pop something off the stack and discard it (as opposed to storing it into a register), you can just move the stack pointer:

addi $sp, $sp, 8        # Pop 2 32-bit values off the stack, discarding them

Similarly, to reserve space for local variables in a function prologue, you can just subtract the required amount of space from $sp:

addi $sp, $sp, 16       # Reserve stack space for 4 32-bit values (locals)

As on x86, the MIPS stack grows down. The MIPS stack always pushes/pops 32-bits at a time, so the top of the stack is always aligned to a multiple of 4, however, the function calling convention requires that the top of the stack, as well the addresses of various sections within the function stack frame, be aligned to multiples of 8, so that 64-bit accesses will be valid. Hence, stack padding is required in a few places to keep the 8-byte alignment.

Basic function call example

Suppose we have two functions, f and g, and f calls g. This pattern would look like this:

f:
    ⋮
    jal g             # Save return addr. in $ra, jump to g
    ⋮


g:
    ⋮
    jr $ra            # "Return" by jumping to address in $ra

The jal g instruction performs two operations:

Save the address of the following instruction into $ra as the return address.
Jump to the given label

While g is executing, the value in $ra is the address of the instruction in f to return to, so to “return” from g we simply jump to the address in register $ra. This scheme avoids pushing the return address onto the stack as x86 does, leading to more efficient function calls and returns, as no memory access is required. Or is it?

What happens if we have a chain of three functions, f, g, h where f calls g, and then g calls h:

f:
    ⋮
    jal g

g: 
    ⋮
    jal h
    ⋮
    jr $ra

h:
    ⋮
    jr $ra

Will this work? No! The sequence of events is

f stores its return address into $ra and then jumps to g
g stores its return address into $ra overwriting f’s and then jumps to h.
h “returns” by jumping to g‘s return address.
When g tries to return, the value in $ra is not the return address in f, but rather the return address in g. So g effectively jumps into itself, getting stuck in an infinite loop.

In practice, only leaf functions (functions that do not call any other functions) can only use $ra; if a function calls other functions, then it must save the incoming $ra by pushing it onto the stack, and then restore it before returning.

Function call registers

As mentioned above, the registers that apply to function calls are

$v0,$v1: Used for return values. If a function returns a single 32-bit value, then only $v0 is used.
$a0,$a1,$a2,$a3: Used for function arguments. Only use as many as are needed. If more than four arguments are used, the remainder are pushed onto the stack.
$s0-$s7: Callee-preserved temporaries. Only these, and the $_p registers are callee-preserved; all others are caller-preserved.
$ra: The function-call instruction jal (and related) set this to the return address. “Returning” from a function means
```
 jr $ra
```

A function’s stack frame is divided into several sections, from the “bottom up” (i.e., from top down in memory order):

Local variables
Saved return address ($ra) (padded to 8 bytes)
Saved temporary registers ($s0-$s7)
Arguments to called functions (at least 4)

Each of these sections must be padded at the beginning so that their sizes are multiples of 8 bytes.

Thus, the prologue and epilogue of a function looks something like this:

f:
    addi $sp, $sp, -X         # Reserve space for local variables
                              # If necessary, pad size to a multiple of 8

    addi $sp, $sp, -8         # Save return address (in non-leaf functions)
    sw   $ra, ($sp)           

    addi $sp, $sp, -Y         # Reserve space for saved temp registers
    sw   $s0, ($sp)           # Save sN registers (as many as you use)
    sw   $s1, 4($sp) 
    ⋮

    addi $sp, $sp  -Z         # Reserve space for function call args (5-...)

    addi $sp, $sp, -16        # Reserve space for function cal args 0-4

    ...                       # Actual function body.

    addi $sp, $sp, (Z+16)     # Remove arguments

    lw   $s0, ($sp)           # Restore saved temp registers
    lw   $s1, 4($sp)
    ⋮
    addi $sp, $sp, Y          # Remove saved temp registers

    lw   $ra, $(sp)           # Restore return address

    addi $sp, $sp, X          # Remove local variables

    jr ($ra)                  # "Return" (jump to return address)

Not all of these sections are needed in every function. MIPS distinguishes between three classes of functions:

“Non-leaf functions”: These are functions that call other functions. Thus, they need (probably) all the sections: they are required to save the return address, and the saved temporary registers, and they must reserve space on the stack for the arguments of the functions they intend to call. The only section they might not have is the section for local variables.
“Leaf-with-data functions”: These functions do not call other functions, but they do have local variables which are stored on the stack. They will have to reserve space for local variables, and then remove it before they return, but otherwise don’t need any of the other sections.

For example, consider this function
```
 int f()
 {
    int arr[32];
    ⋮
    return 0;
 }
```
This would be implemented as
```
 f:
    addi $sp, $sp, -128            # 32 * 4 = 128
    ⋮
    addi $sp, $sp, +128            # Remove the array
    li $v0, 0
    jr ($ra)
```
“Simple leaf functions”: These functions do not call other functions, and have no local variables. They do not need to use the stack at all. As an example, consider a function which computes the average of two integers:
```
 int avg(int x, int y) { return (x+y) / 2; }
```
(We’ll use a shift to implement the division.) Because this function has no local variables, and calls no other functions, it’s a simple leaf function. It’s assembly code is just
```
 avg:
    # Arguments in $a0, $a1
    # Return in $v0

    add $v0, $a0, $a1     # $v0 = $a0 + $a1
    sra $v0, $v0, 1       # Shift right by 1 (divide by 2)
    jr $ra                # And return.
```

The argument section

One of the sections in the stack frame is reserved for arguments passed to called functions. The first four arguments are passed in registers $a0-$a4, while the remainder are passed on the stack. However, unlike on x86 where we push the arguments just before the function call (and then pop them off immediately after), on MIPS, we reserve space at the beginning of the function for any arguments we might need to pass on the stack.

This means that when you write a MIPS function, you must look through all the functions it calls, determine the number of arguments for each, and reserve space on the stack in the function prologue for the maximum number of arguments any called function takes. For example:

void f()
{
    ⋮
    g(1,2);
    ⋮
    h(4,5,6,7,8,9);
    ⋮
    k(5,6,7,8);
    ⋮
}

Here, of all the function calls, the maximum number of arguments is for h, which takes 6 arguments, so you would need to reserve space on the stack for all 6.

How much stack space would be reserved for arguments for this function:

void f()
{
    ⋮
    g(1,2)
    ⋮
    h(1,2,3)
    ⋮
}

This is a trick question: the minimum amount of space reserved for arguments is always 4, even though the first four arguments are passed not on the stack, but in $a0-$a4. Instead, this block of 4 32-bit values is used to save the incoming $a0-$a4, which are caller-preserved. That is, prior to calling g in the example above, we would place $a0 and $a1 into that region on the stack, and then restore them before returning.

A MIPS function will typically setup the stack entirely in the function prologue, and then not modify $sp until the function is ready to return. Again, this is different from x86 where we are often push-ing and pop-ing throughout the body of the function. On x86, we use rbp as a “fixed point of reference” to a function’s stack frame, because rsp is moving around. On MIPS, this is not really necessary, because once the stack is setup, $sp stays in place; all of the above sections can be accessed as $sp + offset (or in the assembly syntax, offset($sp). In this mode, $sp is a fixed point of reference to the end of the function’s stack frame.

It is possible to use $fp for the same purpose as rbp, as a fixed point-of-reference to the beginning of a function’s stack frame. To do this, at the very bottom (top, in memory terms) of a function’s stack frame, at the beginning of the prologue, save the (caller-preserved) existing $fp:

f:
    addi $sp, $sp, -4
    sw   $fp, ($sp)         # Save caller's fp
    move $fp, $sp 
    ⋮

    lw   $fp, ($sp)         # Restore caller's fp
    addi $sp, $sp, 4
    jr ($ra)

With this setup, $fp always points to the beginning of a function’s stack frame, while $sp points to the end of the its stack frame (i.e., the top of the stack). You can access values on the stack as relative from either end, either as +N($sp) (“up” from $sp) or as -N($fp) (down from $fp).

Typically, the only time you need a stack frame pointer is when the size of a function’s stack frame is dynamic. E.g., if you have something like this:

void f(int x)
{
    int arr[x];  // NOTE: This is not valid C++ (and barely valid C)
    ⋮
}

Then we don’t know how big the local variable section will be until runtime.

Macros

The MARS simulator’s assembler supports a macro system similar to, but more restrictive than, the YASM macro system. For example, here’s a macro definition which implements a sys_exit macro:

.macro sys_exit %status
    li $v0, 10
    li $a0, %status
    syscall
.end_macro

To use this, we’d write

sys_exit 0

Unlike YASM, macro parameters have names (%status) instead of just numbers. The MARS assembler automatically makes labels defined inside macros unique, so no “macro-local” labeling is needed. Macros can be overloaded on the number of parameters, but there’s no fancy default parameters, * parameters, etc. There’s no equivalent to the “context-local” labels of YASM, so it’s not possible to create “linked” macros like IF/ELSE, etc. (The MARS documentation gives an example of how something like a for loop can be done, by defining a new macro for each loop body you need.)