Review
Again, the System V C ABI specification has all the details on how function calls actually work.
Calling C standard library functions
To call a function from the C standard library, we must

1. Declare it in our .s file as extern (e.g., extern printf). extern is the opposite of global: while global makes a symbol visible to things outside our program, extern says that our program is using a symbol defined somewhere outside it (and, in fact, printf must have been declared global, or the C equivalent, in order for us to be able to access it this way!).
2. push any caller-preserved registers (rax, any of the argument registers, r10, r11) onto the stack if you are using them.
3. Ensure that the stack is correctly aligned before calling any functions. See the next section. Note that stack (re)alignment must be done after pushing caller-preserved registers, but before pushing any stack-based arguments.
4. Place the arguments to the function in registers rdi, rsi, rdx, rcx, r8, and r9, from first to last. Place floating-point arguments in registers xmm0 through xmm7. If there are more than six integer or pointer arguments (or more than eight floating-point arguments), push the remainder onto the stack in right-to-left order (i.e., the 7th argument should be pushed last).
5. If you have any stack-based arguments, you will need to re-align the stack before pushing them. I.e., you have to compute an offset to subtract from rsp that will result in the stack being correctly aligned after all the pushes have taken place.
6. call the function. E.g., call printf. For variadic functions such as printf and scanf, first set al to the number of xmm registers used for arguments (0 if there are none).
7. The return value, if any, will be placed in rax. 128-bit return values are placed in rdx:rax. (Return values larger than 128 bits are handled specially: the caller passes an address to use for the result as an “invisible” first argument, in rdi. On return, the result has been written to this address, which is also “returned” in rax.) Floating-point return values will be in xmm0 (and xmm1 if big enough).
8. pop any registers you pushed in step 2. Remove any stack-based arguments and undo the alignment adjustment if needed.
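The register assignment for integer and pointer arguments can be sketched as a small lookup table in C. This is only an illustration: the name int_arg_location is made up here, and floating-point and struct arguments, which follow separate rules, are ignored.

```c
#include <stddef.h>

/* Hypothetical helper (not part of any library): where does the System V
   x86-64 calling convention place the index-th (0-based) integer or
   pointer argument? Floating-point and struct arguments are ignored. */
const char *int_arg_location(size_t index)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    const size_t nregs = sizeof regs / sizeof regs[0];

    /* The first six go in registers; everything after that is pushed. */
    return index < nregs ? regs[index] : "stack";
}
```

For example, int_arg_location(3) names rcx, and any index of 6 or more reports that the argument goes on the stack.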
Stack alignment
Immediately before call-ing any function, the stack pointer rsp must be aligned to a multiple of 16. The call instruction then pushes an 8-byte return address, so upon entry to any function (including your main function) rsp will be a multiple of 16, plus 8. Hence every function that calls other functions must normally begin with a prologue which either

- pushes rbp (the calling function’s stack frame pointer) onto the stack, or
- manually adjusts rsp by subtracting 8 from it.

If you call any functions which take stack-based arguments, or if you push any caller-preserved registers, you may need to re-align the stack before each function call, rather than just once at the beginning of the function.
Most functions don’t need a frame base pointer; it’s really only useful if you’re storing a dynamic amount of data on the stack, which most functions don’t do. Hence, the most common prologue is simply
sub rsp, 8
We subtract because the stack grows down in memory. Then, at the end of the
function, add rsp, 8 to revert the stack alignment before returning.
If a function uses some callee-preserved registers (rbp, rbx, r12-r15) then
those will have to be pushed onto the stack during the prologue. Depending on
the number of registers pushed, the extra alignment may or may not be needed.
(E.g., if you use rbx, you’ll have to push it, but doing so in addition
to rbp will give 16 extra bytes, so we’ll need to sub rsp, 8 in order to
get another 8 bytes of alignment.) It doesn’t matter whether you do the
alignment before or after pushing the callee-preserved registers, so long as
you undo in the opposite order before returning.
When we call a function, the CPU pushes the (qword) return address, i.e. the rip of the instruction following the call, onto the stack, leaving rsp at a multiple of 16 plus 8 on entry to the callee. Thus, every function
which intends to call other functions must perform this prologue.
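The alignment arithmetic can be checked with a small C sketch. It assumes the System V rule that rsp is 8 bytes past a multiple of 16 on entry to a function (the return address has just been pushed) and must be a multiple of 16 at every call. The name prologue_padding is made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: given the number of 8-byte registers pushed in the
   prologue, return how many extra bytes to subtract from rsp so that rsp
   is a multiple of 16 before this function calls anything.
   Assumes rsp = 16k + 8 on entry (the return address was just pushed). */
size_t prologue_padding(size_t pushed_registers)
{
    /* Distance below the last 16-byte boundary after the pushes. */
    size_t offset = 8 + 8 * pushed_registers;

    return (16 - offset % 16) % 16;
}
```

With no pushes this gives 8 (the plain sub rsp, 8 prologue), with one push (e.g. rbp) it gives 0, and with two pushes (e.g. rbp and rbx) it gives 8 again.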
Before returning, the rip pushed by call must be on top of the stack.
Hence, every function needs an epilogue, which is simply the reverse of the
prologue:
- pop any callee-preserved registers you pushed in the prologue.
- Either pop rbp or add rsp, 8.

Either way, rsp is restored to the value it had immediately
after the call, so that ret finds the return address on top of the stack.
The state of the stack during a function g called by a function f can be visualized as
| Position | Contents | Frame |
|---|---|---|
| 8n + 16 + rbp … 16 + rbp | stack argument n … stack argument 0 | function f (previous frame) |
| 8 + rbp | Return address | function g (current frame) |
| rbp | Previous rbp/unused 8 bytes | |
| rbp - 8 … rsp | Callee-saved registers … Top of stack | |
| rsp - 128 | “Red zone” | |
(The “red zone” is specific to the System V ABI: the 128 bytes just below rsp,
i.e. at lower addresses, beyond the top of the stack, are reserved for the
function’s use, and can be used freely without adjusting rsp. Of course, this
space will be overwritten by any called functions, so it is mainly useful in
leaf functions!)
As an example, suppose we have a function with declaration
long f(int a, // rdi
char* d, // rsi
int* e, // rdx
long f, // rcx
char g, // r8
unsigned long h, // r9
unsigned long i, // stack
int* j, // stack
long k // stack
);
How would we write a call to this function equivalent to
f(1,nullptr,nullptr,5,'6',7,8,nullptr,9);
In assembly, this would be
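; Assuming rsp is already a multiple of 16
mov rdi, 1
mov rsi, 0
mov rdx, 0
mov rcx, 5
mov r8, '6'
mov r9, 7
sub rsp, 8 ; Pad so rsp is a multiple of 16 again after the three pushes
push 9 ; k = 9
push 0 ; j = nullptr
push 8 ; i = 8
call f
add rsp, 32 ; Remove the three stack arguments and the padding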
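Note that stack-based arguments are pushed in reverse order. Because we are
pushing 3 qword arguments (24 bytes) onto the stack, we must subtract another
8 bytes to keep the stack pointer aligned at the call. However, the callee
will expect to find its stack arguments immediately above the return address
(at the next-higher addresses), so we must adjust the stack alignment first,
before pushing any stack-based arguments. After the call returns, the caller
removes the arguments and the padding again.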
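The padding and cleanup amounts for such a call site can be computed as in the following C sketch. It assumes that rsp is a multiple of 16 before any padding or arguments are pushed and that every stack argument occupies one qword. Both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers for a call site that passes `stack_args` qword
   arguments on the stack. Assumes rsp is a multiple of 16 before any
   padding or arguments are pushed. */

/* Bytes to subtract from rsp *before* pushing the arguments, so that rsp
   is a multiple of 16 again when the call instruction executes. */
size_t call_site_padding(size_t stack_args)
{
    return (16 - (8 * stack_args) % 16) % 16;
}

/* Bytes to add back to rsp after the call returns. */
size_t call_site_cleanup(size_t stack_args)
{
    return call_site_padding(stack_args) + 8 * stack_args;
}
```

For the three stack arguments in the example, this gives 8 bytes of padding (the sub rsp, 8) and 32 bytes to remove once f has returned.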
The stack layout within the callee will look like this:
| Stack | Addr |
|---|---|
| … | |
| 9 | |
| nullptr | |
| 8 | |
| rip | <- rsp |
After the callee sets up the base pointer, we will have
| Stack | Addr |
|---|---|
| … | |
| 9 | rbp+32 |
| nullptr | rbp+24 |
| 8 | rbp+16 |
| rip | rbp+8 |
| rbp | <- rbp <- rsp |
and hence the callee can access its stack arguments as [rbp + 16], [rbp + 24],
and [rbp + 32].
(Note that, while pushing arguments in reverse order might be counter-intuitive,
from the perspective of the callee it makes perfect sense: from left-to-right,
the 7th argument is directly above the return address in memory, the 8th above
that, and so forth.)
Note that, no matter the arguments’ sizes, the stack pointer rsp should
always be a multiple of 8. Thus, if we had a dword stack argument, we would
round its size up to a qword before pushing.
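Both rules, the rounding of argument sizes and the rbp-relative offsets, are simple enough to write down in C. The sketch below assumes the callee begins with push rbp / mov rbp, rsp and that each stack argument occupies exactly one 8-byte slot. The function names are made up for illustration.

```c
#include <stddef.h>

/* Round an argument's size up to a whole number of 8-byte stack slots. */
size_t stack_slot_size(size_t arg_size)
{
    return (arg_size + 7) / 8 * 8;
}

/* Offset from rbp of the i-th (0-based) stack argument, assuming the
   callee began with `push rbp` / `mov rbp, rsp` and every argument
   occupies one 8-byte slot: [rbp] holds the saved rbp and [rbp + 8]
   the return address, so the first stack argument is at [rbp + 16]. */
size_t stack_arg_offset(size_t i)
{
    return 16 + 8 * i;
}
```

A dword (4-byte) argument therefore takes a full 8-byte slot, and the third stack argument (k in the example) is found at [rbp + 32].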
Writing functions
Writing a “well-behaved” function is simply the other side of the process above:

1. On entry, rsp is a multiple of 16, plus 8 (the return address has just been pushed), so use the prologue to adjust it to a multiple of 16.
2. Save any of the callee-preserved registers you use on the stack.
3. Access any stack-based arguments by offset from rbp.
4. Do whatever your function does, including calling other functions.
5. Place the return value in rax (xmm0 if floating-point).
6. Pop callee-preserved registers.
7. De-align the stack.
8. ret

A function written in this way and declared global can be called from C/C++.
Note that if a function calls no other functions, and does not need to be
called from C/C++, then you can ignore many of these rules. That’s what we
did earlier in the semester. Similarly, a leaf function, one which calls no
other function, is free to use an unaligned stack.
Example: student grades
Here’s a simple example: we want to write a program, using the C standard library, which
Reads in a number of student grades
When a -1 is entered, stops reading grades and reports the highest and lowest grades entered.
In C, this looks like this:
#include <stdio.h>
#include <limits.h>
int main() {
long high = LONG_MIN, low = LONG_MAX;
long grade;
printf("Enter grades: ");
do {
scanf("%ld", &grade);
if(grade == -1)
break;
if(grade > high)
high = grade;
if(grade < low)
low = grade;
} while(1);
printf("Highest grade: %ld\n", high);
printf("Lowest grade: %ld\n", low);
return 0;
}
(We should be checking the return value of scanf, but I’m being lazy.)
Here’s a sample run:
Enter grades: 1 10 100 4 6 57 92 28 -1
Highest grade: 100
Lowest grade: 1
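For comparison, here is one way the loop could look if it did check the return value of scanf. This is a sketch rather than the program translated below: the function read_grades and its parameters are hypothetical, and it reads from an arbitrary FILE* (pass stdin for keyboard input).

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical variant of the grade loop that checks the scanf result.
   Reads grades from `in` until -1, end of input, or a malformed token.
   Stores the extremes in *high and *low and returns the number of
   grades read. */
int read_grades(FILE *in, long *high, long *low)
{
    long grade;
    int count = 0;

    *high = LONG_MIN;
    *low = LONG_MAX;

    /* fscanf returns the number of items converted: 1 on success here,
       0 on a matching failure, EOF at end of input. */
    while (fscanf(in, "%ld", &grade) == 1 && grade != -1) {
        if (grade > *high)
            *high = grade;
        if (grade < *low)
            *low = grade;
        ++count;
    }
    return count;
}
```

If no grades are read, *high and *low keep their initial LONG_MIN and LONG_MAX values, so a caller should check the returned count before printing them.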
To translate this into assembly, we have to carry out the following steps:

- Any string literals must be placed in the .data section, terminated with a nul character.
- Because scanf takes the address of a variable, we must have a qword somewhere in memory. We could use a global variable, declared in the .data section, but instead we’ll store it on the stack, in the “unused” space we would normally create to align rsp. The high and low variables, because we never use their addresses, can be stored in registers.
- Because we have a function call inside the loop, we will have to avoid using any of the caller-saved registers for values that must survive the call.
- The two if tests on grade can be done entirely with conditional moves; no jumps necessary!
- Any library functions used must be declared extern.
- Write main.
- Ensure that the resulting object file is linked with the C library. (The asm script will do this for you.)
;;;;
;;;; grades.s
;;;;
section .data
scanf_format: db "%ld", 0
scanf_result: dq 0
printf_prompt: db "Enter grades: ", 0
printf_high: db "Highest grade: %ld", 10, 0
printf_low: db "Lowest grade: %ld", 10, 0
; Note that we don't need string lengths, because the strings are
; nul-terminated
LONG_MIN: equ -9223372036854775808
LONG_MAX: equ 9223372036854775807
section .text
extern printf
extern scanf
global main
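main:
push r12 ; Save the callee-preserved registers we use
push r13
push r14
sub rsp, 16 ; Room for scanf's result; rsp is now a multiple of 16
mov r14, rsp ; [r14] receives the grade read by scanf
; r12 = high, r13 = low
mov r12, LONG_MIN
mov r13, LONG_MAX
; Print the prompt
mov rdi, printf_prompt
xor eax, eax ; Variadic call: al = number of xmm registers used (0)
call printf
.begin_loop:
; Call scanf to get input
mov rdi, scanf_format
mov rsi, r14
xor eax, eax
call scanf
; Check for -1
cmp qword [r14], -1
je .print_results
; Update low/high
cmp qword [r14], r12
cmovg r12, qword [r14]
cmp qword [r14], r13
cmovl r13, qword [r14]
jmp .begin_loop
.print_results:
; Print high/low grades
mov rdi, printf_high
mov rsi, r12
xor eax, eax
call printf
mov rdi, printf_low
mov rsi, r13
xor eax, eax
call printf
xor eax, eax ; return 0
add rsp, 16 ; Release the scanf slot
pop r14 ; Restore callee-preserved registers in reverse order
pop r13
pop r12
ret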
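The design of this program demonstrates an important facet of non-leaf
functions (functions which call other functions): when choosing registers for
values that must survive a call, choose callee-preserved registers first! We
use r12, r13, and r14 for our intermediate results, as these are
callee-preserved; scanf and printf must leave them unchanged, so we don’t have
to push/pop them around every call. (The price is that a function which
modifies callee-preserved registers must itself save them in its prologue and
restore them in its epilogue.)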
Calling C functions from assembly
You can call C functions that you write, not just the standard library
functions. The process is similar, except that you’ll have to link manually
instead of using asm:

1. Write your assembly and C code in .s and .c files.
2. Declare any C functions you want to call from assembly as extern.
3. Assemble your .s files; compile your .c files.
4. Link all the resulting object files together, using gcc. E.g., gcc -o my_program asm_part.o c_part.o.
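As a minimal sketch of the C side, the file below could play the role of c_part.c. The function sum3 is a made-up example, and the comment shows roughly what the matching assembly call site would look like under the calling convention described earlier.

```c
/* c_part.c: a hypothetical C function intended to be called from assembly.
   Under the System V x86-64 calling convention its arguments arrive in
   rdi, rsi, and rdx, and the result is returned in rax.

   Assembly side (sketch, with rsp already a multiple of 16):

       extern sum3
       ...
       mov rdi, 1
       mov rsi, 2
       mov rdx, 3
       call sum3        ; rax = 6
*/
long sum3(long a, long b, long c)
{
    return a + b + c;
}
```

Because the compiler generates the callee according to the same ABI, the assembly caller only has to follow the register and alignment rules above.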
Calling Assembly functions from C
To call an assembly function from a C program, use the opposite procedure:

1. Write your assembly and C code in .s and .c files.
2. In your C source code, write declarations for the assembly functions you wish to call. Remember that the compiler will use your declarations to determine how to pass the arguments, so make sure your declaration matches the arguments the function expects to receive!
3. In your assembly source code, declare the functions global. If you have a main in C, don’t write one in assembly also!
4. Assemble/compile the source files.
5. Link everything together with gcc.
As an example, let’s write a replacement for the strlen function in assembly.
strlen takes the address of a nul-terminated string in memory, and returns
its length, as an unsigned 64-bit integer. In C, its declaration
would be
size_t strlen_asm(char* s);
(We have to give our function a different name in order to avoid a name
collision with the standard library strlen when we link.)
Here’s the assembly function:
section .text
global strlen_asm
strlen_asm:
sub rsp, 8 ; Align stack
; rdi = addr of string
; Return: rax = Length
mov rax, 0
.begin_loop:
cmp byte [rdi + rax], 0
je .done
inc rax
jmp .begin_loop
.done:
add rsp, 8
ret
This is equivalent to the C/C++ loop
size_t strlen_asm(char* s)
{
size_t rax = 0;
while(*(s + rax) != 0)
++rax;
return rax;
}
(We’ll see later that there are a number of string-specific instructions that can accelerate this.)
We can save this as strlen_asm.s and assemble it, producing strlen_asm.o:
yasm -g dwarf2 -f elf64 strlen_asm.s
(Note that we use yasm manually to assemble this; if we used the asm script,
it would try to link and fail, because our .s file does not have either a
_start or main entry point.)
Meanwhile, we can write the following C code:
#include <stdio.h>
// In C++, declare this as `extern "C"`
size_t strlen_asm(char* s);
int main() {
size_t len = strlen_asm("Hello, world!");
printf("Length: %zu\n", len); // %zu is the conversion for size_t
return 0;
}
Save as strlen_test.c, compile it to strlen_test.o (e.g., with gcc -c strlen_test.c), and link the two together with
gcc -o strlen_test strlen_test.o strlen_asm.o
and then run with
./strlen_test