Review of Last Time
MIPS is a 32-bit processor: addresses are 32-bits (MIPS calls 32-bits a “word”), all registers are 32-bits in size, and all instructions are 32-bits as well.
The design of the MIPS pipeline, with its single MEM-ory access stage (after the EX-ecute stage) means that general instructions cannot access memory, and a value cannot be read from memory and then operated on (added, subtracted, etc.) in a single instruction. Thus, on MIPS, there are only two basic kinds of instructions which access memory:
Loads: (
lw
,lb
) which read a value from memory into a registerStores: (
sw
) which write a value from a register into memory.
Memory operands (which identify the addresses to read from/write to) are similarly more limited, having the form
offset(register)
where offset is a constant, and register is one of the 32 registers. For
example, if $s0
contains the address of an array of 32-bit integers, we
can access array[5]
by using
20($s0)
(5 * 4 = 20, because each array element is 32-bits or 4 bytes). (For a load
or store instruction, the arithmetic of adding the offset
to the register
is performed by the EX-ecute stage, just before the MEM-ory access stage.)
To make up for the inability of general instructions to access memory, MIPS
has 32 general-purpose registers (although some of these are reserved for
specific purposes: $sp$
is the stack pointer, $zero
is always 0, $ra
is
the return address, etc.). The general flow of a MIPS program is to load as
much as possible from memory into the registers, then operate on the values,
then write the results back once processing is complete.
Hello, world! in MIPS looks like this:
###
### hello.asm
### Hello, world! in MIPS assembly
###
.data
msg: .asciiz "Hello, world!"
.text
li $v0, 4 # Syscall 4 = print
la $a0, msg # Argument in $a0 = address of NUL-terminated string
syscall
li $v0, 10 # Syscall 10 = end program
syscall
Syscalls are normally provided by the operating system, and the MIPS simulator we are using (MARS) is not running a “real” operating system, so it provides a simple set of syscalls to make interacting with our programs possible.
Some instructions are “real” instructions provided by the MIPS CPU, but many
MIPS instructions are pseudo-instructions implemented as macros in the
MIPS assembler. The MIPS CPU is designed for this: one of the registers,
$at
, is reserved for use by the assembler as a “temporary” register.
(This doesn’t mean we can’t use it, but you have to be aware of any pseudo-
instructions that also might be using it.)
For example, the li
(Load Immediate) instruction that we use above to store
the value 4 into register $v0
is a pseudo-instruction. It is actually
implemented as
addi $v0, $zero, 4 # $v0 = 0 + 4
where $zero
is the special zero register, whose value is always 0 (you can
write to $zero
but that doesn’t change its value).
Similar la $a0, msg
(Load Address) is also a pseudo instruction; here, however, we
have a problem. Addresses are 32-bits, registers are 32-bits, but immediates
(the constants we can use in instructions) are limited to 16-bits. So la
expands into two instructions: the assembler splits the address into its
low and high 16-bits (msg_low
and msg_high
) and then loads them into the
destination register $a0
like this:
lui $at, msg_high # Load high 16-bits (low 16-bits are 0s)
ori $a0, $at, msg_low # Bitwise OR with low 16-bits
lui
loads a 16-bit immediate value into the high 16-bits of a register;
to load a 16-bit immediate into the low 16-bits, we can use the addi
instruction:
addi $a0, $zero, imm
Defining constants
The MARS equivalent to equ
in YASM is .eqv
:
.eqv SIZE 1024
Now, wherever SIZE
is used, it will be replaced with 1024. Unlike YASM, the
“replacement” can be more than just a value:
.eqv RETURN jr $ra
Writing RETURN
will expand into the instruction jr $ra
. Only single-instruction
expansions are supported; see .macro
below for multi-line macros.
Branching, looping, and function calls
MIPS does not have the separate flags register that x86 has; similarly, it does
not have a separate comparison (cmp
, test
) instruction. The reason for this
is that the compare-branch two-instruction combo of x86 is problematic for the
MIPS pipeline:
The pipeline is already loading instructions after the branch; the longer we take to determine whether the branch is taken or not, the more instructions are already in the pipeline and will need to be flushed. Flushed instructions = wasted CPU time.
We want to resolve branches (determine whether the branch will be taken) as early as possible, so that minimal CPU cycles are wasted working on instructions that end up not being executed. Because of this, MIPS resolves branches in the ID (instruction decode) stage.
Conditional branching instructions on MIPS combine both the comparison and its operands, and the label to jump to if the comparison succeeds. This is in contrast to x86 where the comparison is one instruction, and the branch target label is provided in the following instruction (again, delaying resolution of the branch).
Branching instructions
Unlike x86-64, there is no separate comparison instruction; the branch instruction itself does the comparison. The basic forms of the branch instruction are
bCC r₁, r₂, label # Jump to label if (r₁ CC r₂)
bCC r₁, imm, label # Jump to label if (r₁ CC imm)
where CC is a condition code describing a comparison between the two operands (either two registers, or a register and an immediate). Note that the register-immediate form of the branch instruction is a pseudo-instruction; it is a macro implemented by the assembler as:
add $at, $zero, imm # Store imm in $at
bCC r₁, $at, label # Jump to label if (r₁ CC $at)
The possible CCs (condition codes) are
CC | Meaning |
---|---|
eq |
Equal |
eqz |
Equal to 0 |
ge |
Greater-or-equal |
geu |
Greater-or-equal (unsigned) |
gt |
Greater-than |
gtu |
Greater-than (unsigned) |
le |
Less-than-or-equal |
leu |
Less-than-or-equal (unsigned) |
lt |
Less-than |
ltu |
Less-than (unsigned) |
ne |
Not equal |
nez |
Not equal to 0 |
lez |
Less-than-or-equal to 0 |
gez |
Greater-than-or-equal to 0 |
ltz |
Less than 0 |
gtz |
Greater than 0 |
Note that many
of the above condition codes are actually implemented as pseudo-instructions.
In particular, any CC mentioning “zero” is implemented as a pseudo-instruction
using the $zero
register. E.g., bgtz reg, label
(branch if greater-than 0)
is really just
bgt reg, $zero, label
Some of the conditional branching instructions come in -al
(and-link) forms
which set $ra
to the address of the following instruction (i.e., the
return address). These are used for conditional function calls. For example,
bgezal reg, label
will jump to label
if reg >= 0
, but if it jumps, will also store the address
of the next instruction immediately after the bgezal
into the $ra
(return address) register. The branch target code then can “return” by just
doing
jr $ra # Jump to return address
The unconditional branch instructions are
j label # Jump to label
jr r₁ # Jump to address in r₁
jal label # Store return address in $ra, jump to label
jalr r₁ # Store return address in $ra, jump to address in r₁
Instead of branching on a condition, you can also set a register to 0/1 based on a condition:
sCC r₁, r₂, r₃ # Set r₁ to 1 if (r₂ CC r₃), otherwise set to 0
Example: Rewriting a string to uppercase
.data
msg: .asciiz "Hello, world!"
.text
la $t0, msg
begin:
ldb $t1, ($t0) # Load byte into $t1
beq $t1, 0, endloop # At end of string?
blt $t1, 'a', next # Less than 'a'? skip
bgt $t1, 'z', next # Greater than 'z'? skip
# $t1 is a lower-case character
sub $t1, $t1, 32 # Convert to uppercase
stb $t1, $(t0) # Write modifier char to string
next:
add $t0, $t0, 1 # Advance to next char
j begin # and loop
endloop:
# Syscall to print the string
li $v0, 4 # Syscall 4 = print
la $a0, msg # Address of string
syscall
li $v0, 10 # Syscall 10 = end program
syscall
Comparing this to the equivalent x86 code, the main difference is that we
cannot compare a value in memory. Thus, we load the character from the string
into register $t1
and work with that, rather than using a memory operand
in the comparison instruction(s). Again, on MIPS, only the ld
instruction
can read from memory; all other instructions read from registers or immediates
(constants).
If-else structures
The combined comparison-and-branch instruction of MIPS makes some structures simpler than their x86 counterparts. For example, consider an if-else statement:
if( $t0 == $t1 ) {
⋮
}
else {
⋮
}
The MIPS equivalent would be
bne $t0, $t1, else
⋮
j endif
else:
⋮
endif:
MIPS Function Calls
Push/pop
(We’re using the O32 MIPS calling convention, which was used on Unix/Linux.)
MIPS does not have dedicated push
/pop
instructions. Instead, a push
is
accomplished by
addi $sp, $sp, -4 # $sp -= 4
sw r₁, ($sp) # Store r₁'s value into top of stack
That is, subtract 4 from the stack pointer, then store the value of register
r₁
into the address pointed to by $sp
(which is now the new top of the stack).
A pop
is similarly implemented as two instructions:
lw r₁, ($sp) # Store top of stack value into r₁
addi $sp, $sp, 4 # Shift top of stack "down"
Of course, if you want to pop something off the stack and discard it (as opposed to storing it into a register), you can just move the stack pointer:
addi $sp, $sp, 8 # Pop 2 32-bit values off the stack, discarding them
Similarly, to reserve space for local variables in a function prologue, you can
just subtract the required amount of space from $sp
:
addi $sp, $sp, 16 # Reserve stack space for 4 32-bit values (locals)
As on x86, the MIPS stack grows down. The MIPS stack always pushes/pops 32-bits at a time, so the top of the stack is always aligned to a multiple of 4, however, the function calling convention requires that the top of the stack, as well the addresses of various sections within the function stack frame, be aligned to multiples of 8, so that 64-bit accesses will be valid. Hence, stack padding is required in a few places to keep the 8-byte alignment.
Basic function call example
Suppose we have two functions, f
and g
, and f
calls g
. This pattern
would look like this:
f:
⋮
jal g # Save return addr. in $ra, jump to g
⋮
g:
⋮
jr $ra # "Return" by jumping to address in $ra
The jal g
instruction performs two operations:
Save the address of the following instruction into
$ra
as the return address.Jump to the given label
While g
is executing, the value in $ra
is the address of the instruction in
f
to return to, so to “return” from g
we simply jump to the address in
register $ra
. This scheme avoids pushing the return address onto the stack as
x86 does, leading to more efficient function calls and returns, as no memory
access is required. Or is it?
What happens if we have a chain of three functions, f, g, h
where f
calls
g
, and then g
calls h
:
f:
⋮
jal g
g:
⋮
jal h
⋮
jr $ra
h:
⋮
jr $ra
Will this work? No! The sequence of events is
f
stores its return address into$ra
and then jumps tog
g
stores its return address into$ra
overwriting f’s and then jumps toh
.h
“returns” by jumping tog
‘s return address.When
g
tries to return, the value in$ra
is not the return address inf
, but rather the return address ing
. Sog
effectively jumps into itself, getting stuck in an infinite loop.
In practice, only leaf functions (functions that do not call any other
functions) can only use $ra
; if a function calls other functions, then it must
save the incoming $ra
by pushing it onto the stack, and then restore it
before returning.
Function call registers
As mentioned above, the registers that apply to function calls are
$v0,$v1
: Used for return values. If a function returns a single 32-bit value, then only$v0
is used.$a0,$a1,$a2,$a3
: Used for function arguments. Only use as many as are needed. If more than four arguments are used, the remainder are pushed onto the stack.$s0-$s7
: Callee-preserved temporaries. Only these, and the$_p
registers are callee-preserved; all others are caller-preserved.$ra
: The function-call instructionjal
(and related) set this to the return address. “Returning” from a function meansjr $ra
A function’s stack frame is divided into several sections, from the “bottom up” (i.e., from top down in memory order):
Local variables
Saved return address (
$ra
) (padded to 8 bytes)Saved temporary registers (
$s0-$s7
)Arguments to called functions (at least 4)
Each of these sections must be padded at the beginning so that their sizes are multiples of 8 bytes.
Thus, the prologue and epilogue of a function looks something like this:
f:
addi $sp, $sp, -X # Reserve space for local variables
# If necessary, pad size to a multiple of 8
addi $sp, $sp, -8 # Save return address (in non-leaf functions)
sw $ra, ($sp)
addi $sp, $sp, -Y # Reserve space for saved temp registers
sw $s0, ($sp) # Save sN registers (as many as you use)
sw $s1, 4($sp)
⋮
addi $sp, $sp -Z # Reserve space for function call args (5-...)
addi $sp, $sp, -16 # Reserve space for function cal args 0-4
... # Actual function body.
addi $sp, $sp, (Z+16) # Remove arguments
lw $s0, ($sp) # Restore saved temp registers
lw $s1, 4($sp)
⋮
addi $sp, $sp, Y # Remove saved temp registers
lw $ra, $(sp) # Restore return address
addi $sp, $sp, X # Remove local variables
jr ($ra) # "Return" (jump to return address)
Not all of these sections are needed in every function. MIPS distinguishes between three classes of functions:
“Non-leaf functions”: These are functions that call other functions. Thus, they need (probably) all the sections: they are required to save the return address, and the saved temporary registers, and they must reserve space on the stack for the arguments of the functions they intend to call. The only section they might not have is the section for local variables.
“Leaf-with-data functions”: These functions do not call other functions, but they do have local variables which are stored on the stack. They will have to reserve space for local variables, and then remove it before they return, but otherwise don’t need any of the other sections.
For example, consider this function
int f() { int arr[32]; ⋮ return 0; }
This would be implemented as
f: addi $sp, $sp, -128 # 32 * 4 = 128 ⋮ addi $sp, $sp, +128 # Remove the array li $v0, 0 jr ($ra)
“Simple leaf functions”: These functions do not call other functions, and have no local variables. They do not need to use the stack at all. As an example, consider a function which computes the average of two integers:
int avg(int x, int y) { return (x+y) / 2; }
(We’ll use a shift to implement the division.) Because this function has no local variables, and calls no other functions, it’s a simple leaf function. It’s assembly code is just
avg: # Arguments in $a0, $a1 # Return in $v0 add $v0, $a0, $a1 # $v0 = $a0 + $a1 sra $v0, $v0, 1 # Shift right by 1 (divide by 2) jr $ra # And return.
The argument section
One of the sections in the stack frame is reserved for arguments passed to
called functions. The first four arguments are passed in registers $a0-$a4
,
while the remainder are passed on the stack. However, unlike on x86 where
we push the arguments just before the function call
(and then pop them off
immediately after), on MIPS, we reserve space at the beginning of the function
for any arguments we might need to pass on the stack.
This means that when you write a MIPS function, you must look through all the functions it calls, determine the number of arguments for each, and reserve space on the stack in the function prologue for the maximum number of arguments any called function takes. For example:
void f()
{
⋮
g(1,2);
⋮
h(4,5,6,7,8,9);
⋮
k(5,6,7,8);
⋮
}
Here, of all the function calls, the maximum number of arguments is for h
,
which takes 6 arguments, so you would need to reserve space on the stack for
all 6.
How much stack space would be reserved for arguments for this function:
void f()
{
⋮
g(1,2)
⋮
h(1,2,3)
⋮
}
This is a trick question: the minimum amount of space reserved for arguments
is always 4, even though the first four arguments are passed not on the stack,
but in $a0-$a4
. Instead, this block of 4 32-bit values is used to save the
incoming $a0-$a4
, which are caller-preserved. That is, prior to calling g
in the example above, we would place $a0
and $a1
into that region on the
stack, and then restore them before returning.
A MIPS function will typically setup the stack entirely in the function prologue,
and then not modify $sp
until the function is ready to return. Again, this
is different from x86 where we are often push
-ing and pop
-ing throughout
the body of the function. On x86, we use rbp
as a “fixed point of reference”
to a function’s stack frame, because rsp
is moving around. On MIPS, this is
not really necessary, because once the stack is setup, $sp
stays in place;
all of the above sections can be accessed as $sp + offset
(or in the assembly
syntax, offset($sp
). In this mode, $sp
is a fixed point of reference to the
end of the function’s stack frame.
It is possible to use $fp
for the same purpose as rbp
, as a fixed
point-of-reference to the beginning of a function’s stack frame. To do this, at
the very bottom (top, in memory terms) of a function’s stack frame, at the
beginning of the prologue, save the (caller-preserved) existing $fp
:
f:
addi $sp, $sp, -4
sw $fp, ($sp) # Save caller's fp
move $fp, $sp
⋮
lw $fp, ($sp) # Restore caller's fp
addi $sp, $sp, 4
jr ($ra)
With this setup, $fp
always points to the beginning of a function’s stack
frame, while $sp
points to the end of the its stack frame (i.e., the top of
the stack). You can access values on the stack as relative from either end,
either as +N($sp)
(“up” from $sp
) or as -N($fp)
(down from $fp
).
Typically, the only time you need a stack frame pointer is when the size of a function’s stack frame is dynamic. E.g., if you have something like this:
void f(int x)
{
int arr[x]; // NOTE: This is not valid C++ (and barely valid C)
⋮
}
Then we don’t know how big the local variable section will be until runtime.
Macros
The MARS simulator’s assembler supports a macro system similar to, but more
restrictive than, the YASM macro system. For example, here’s a macro definition
which implements a sys_exit
macro:
.macro sys_exit %status
li $v0, 10
li $a0, %status
syscall
.end_macro
To use this, we’d write
sys_exit 0
Unlike YASM, macro parameters have names (%status
) instead of just numbers.
The MARS assembler automatically makes labels defined inside macros unique, so
no “macro-local” labeling is needed. Macros can be overloaded on the number of
parameters, but there’s no fancy default parameters, *
parameters, etc. There’s
no equivalent to the “context-local” labels of YASM, so it’s not possible to
create “linked” macros like IF/ELSE
, etc. (The MARS documentation gives an
example of how something like a for
loop can be done, by defining a new macro
for each loop body you need.)