Calling Functions in Assembly

With branching control flow, we can accomplish a lot in RISC-V assembly. We can implement if statements, for loops, and so on. But one thing we can’t do yet is call functions. That’s what this lecture is about. This process is complex and multifaceted, so we will build up to real function calls in a series of simpler steps.

One important “design goal” throughout is that we want a standard protocol that different functions can all agree on when calling each other. We need functions in different source files, compiled by different compilers, or even written in different source languages to be able to interoperate. So we will concoct a standard set of rules that all compilers can follow to implement either side of the function call relationship (caller or callee).

This set of rules is known as a calling convention. It amounts to a contract between callers and callees. A given platform defines a calling convention as part of a broader standard called its application binary interface (ABI). Different combinations of operating systems and ISAs typically have their own ABIs. RISC-V defines a standard calling convention that we will use as our example here. But remember that the choices are somewhat arbitrary: the important thing is that everyone agrees on the standard so that functions can interoperate. There is nothing sacred about the standardized RISC-V calling convention; it is just a choice some humans made once and other humans agreed to follow.

Arguments & Return Values in Registers

Let’s try to implement this contrived C program:

#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 3;
    int y = 4;
    int sum = add(x, y);
    printf("%d\n", sum);
    return 0;
}

Ignoring the printf and the function call for now, we can write most of the assembly for the two functions:

add:
  add a0, a0, a1
  ??? return a0

main:
  li a0, 3
  li a1, 4
  ??? call add(a0, a1)

In add, we’ve chosen to put the variable a into register a0 and b into register a1. In main, we have put x into a0 and y into a1.

This is an especially lucky choice of registers, because in the function call add(x, y), we bind a to x and b to y. So in both cases, the main variable goes in the same register as the add parameter. It means we can make this function call work with a pair of j (unconditional jump) instructions:

add:
  add a0, a0, a1
  j .aftercall

main:
  li a0, 3
  li a1, 4
  j add
.aftercall:
  # call printf with a0 here

I’ve added an extra label, .aftercall, to mark the point where add should return to. (As a convention, I’ll use labels without a leading dot for functions and labels with a leading dot for locations within a function. But from the perspective of j instructions, there is no difference.)

This little example illustrates the first thing that any two functions need to agree on: which registers to use for arguments and return values. main needs to know where add will be looking for its a and b parameters, and it needs to know where to look for the returned value. A calling convention must include rules about this so that functions know how to communicate.

Let’s look at the standard RISC-V calling convention. It says that arguments go in the registers a0 through a7, which are also known as x10 through x17. (In fact, that’s why the fancy names start with the letter A: it’s for argument!) It also says that return values go in a0 and a1.
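
If it helps to see the numbering spelled out, here is a small C sketch that maps an argument's position to the x-register that carries it. The helper name arg_register is made up for this illustration; it is not part of any standard API, and it only models integer arguments that fit in one register.

```c
#include <assert.h>

/* Hypothetical helper: return the x-register number that holds the n-th
 * argument (0-based) under the standard RISC-V calling convention.
 * a0..a7 are x10..x17. Returns -1 when the argument is not passed in a
 * register. */
int arg_register(int n) {
    assert(n >= 0);          /* a negative position is a caller bug */
    if (n > 7) {
        return -1;           /* only eight argument registers exist */
    }
    return 10 + n;           /* a0 is x10, a1 is x11, and so on */
}
```

For example, arg_register(0) is 10 (a0) and arg_register(7) is 17 (a7).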

Return Address

Let’s make a seemingly small change to our C program by adding a second call to the same function:

#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 3;
    int y = 4;
    int sum = add(x, y);
    int other_sum = add(sum, x);
    printf("%d %d\n", sum, other_sum);
    return 0;
}

Let’s try the same “jump back and forth” strategy to implement this in assembly:

add:
  add a0, a0, a1
  j .aftercall

main:
  li a0, 3
  li a1, 4
  mv a3, a0  # preserve x for later
  j add
.aftercall:
  mv a4, a0  # preserve sum too
  mv a1, a3  # put x into the second argument position
  j add
.aftercall2:
  # call printf with a4 and a0

I’ve needed to insert a few mv instructions to save variables across function calls, freeing up a0 and a1 for implementing the function calls (more on this later). But I’ve immediately run into a problem: where should add return to? There are two different call sites, so there is no single j instruction that suffices to return to the right place for both calls.

We could duplicate the add function and hard-code the two different return labels. But this not only feels bad; it is extremely inefficient (we’d need one function per call) and impossible in general (what if you’re writing a library and you don’t know who will call you?).

This problem motivates the second component of a calling convention: functions need to agree on where to store the return address. For a given function call, the return address is the address of the next instruction to run in the caller after the callee is finished.

Calling conventions define which register to use for the return address. Returning to the standard RISC-V calling convention, it dictates that this register is ra, also known as x1. In very fake pseudo-assembly, then, this is what we want to do:

add:
  add a0, a0, a1
  j ra                 # WIP: we want to jump to the address in ra

main:
  li a0, 3
  li a1, 4
  mv a3, a0
  li ra, .aftercall    # WIP: put the address for .aftercall in ra
  j add
.aftercall:
  mv a4, a0
  mv a1, a3
  li ra, .aftercall2   # WIP: as above
  j add
.aftercall2:
  # call printf with a4 and a0

To make this work, we need instructions to do two things:

  • Put an instruction address into a register (namely, ra).
  • Jump to an address in a register (again, here we want the address in the register ra).

RISC-V has special instructions for both of these. They are:

  • jal rd, label: Put the address of the next instruction after the current one into rd and then jump to label. In other words, jal sets rd to PC+4 where PC is the current value of the program counter and then acts like j. (The mnemonic stands for jump and link.)
  • jalr rd, offset(rs1): As above, but jump to an address stored in a register (with an offset) instead of a constant label. (The r at the end of the mnemonic is for register.)

To implement function calls and returns, we use these two instructions in specific ways:

  • jal ra, <func>: Call the function at label <func>, and store the return address (the next instruction in the caller) in ra.
  • jalr x0, 0(ra): Return to the caller. We throw away the “return address” that would otherwise be computed by using x0 as the destination register.

In fact, these two patterns are so common that RISC-V gives us pseudo-instructions to make them easier to type:

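  • call func: Roughly shorthand for jal ra, func. (Assemblers may expand call into a two-instruction sequence so that it can reach functions that are far away, but the effect is the same: set ra and jump.)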
  • ret: Shorthand for jalr x0, 0(ra).
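
To make the jal and jalr semantics concrete, here is a toy model in C. It is only a sketch: the Cpu struct, the function names, and the call_then_ret helper are invented for this illustration, and the model tracks nothing except the program counter and the integer registers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: the program counter and 32 integer registers. */
typedef struct {
    uint64_t pc;
    uint64_t x[32];
} Cpu;

enum { X0 = 0, RA = 1 };

/* Write a register, keeping x0 hardwired to zero. */
static void write_reg(Cpu *cpu, int rd, uint64_t value) {
    assert(rd >= 0 && rd < 32);
    if (rd != X0) {
        cpu->x[rd] = value;
    }
}

/* jal rd, target: rd = pc + 4, then jump to target. */
void jal(Cpu *cpu, int rd, uint64_t target) {
    uint64_t link = cpu->pc + 4;
    cpu->pc = target;
    write_reg(cpu, rd, link);
}

/* jalr rd, offset(rs1): rd = pc + 4, then jump to x[rs1] + offset.
 * The hardware clears the lowest bit of the computed target. */
void jalr(Cpu *cpu, int rd, int rs1, int64_t offset) {
    assert(rs1 >= 0 && rs1 < 32);
    uint64_t link = cpu->pc + 4;
    uint64_t target = (cpu->x[rs1] + (uint64_t)offset) & ~(uint64_t)1;
    cpu->pc = target;
    write_reg(cpu, rd, link);
}

/* Usage example: call a function at func_addr from call_pc, return
 * immediately, and report where execution resumes. */
uint64_t call_then_ret(uint64_t call_pc, uint64_t func_addr) {
    Cpu cpu = {0};
    cpu.pc = call_pc;
    jal(&cpu, RA, func_addr);   /* call: ra = call_pc + 4 */
    jalr(&cpu, X0, RA, 0);      /* ret: jump to ra, discard the link */
    return cpu.pc;
}
```

With a call at address 0x1000, call_then_ret(0x1000, 0x2000) resumes at 0x1004, the instruction right after the call.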

Let’s use those pseudo-instructions to implement our two function calls:

add:
  add a0, a0, a1
  ret                  # jump to the address in `ra`

main:
  li a0, 3
  li a1, 4
  mv a3, a0
  call add             # set `ra` and jump to `add`
  mv a4, a0
  mv a1, a3
  call add             # as above
  # call printf with a4 and a0

Notice that we no longer need the .aftercall labels. The semantics of jal suffice to compute the return address using the current value of the PC.

The Stack

Let’s take this one step further by involving three functions. I apologize for the even more contrived example, but check this out:

#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int sub(int a, int b) {
    return add(a, ~b+1);
}

int main() {
    int x = 3;
    int y = 4;
    int diff = sub(x, y);
    printf("%d\n", diff);
    return 0;
}

Let’s use our strategy so far to translate this to assembly:

add:
  add a0, a0, a1
  ret

sub:
  not a1, a1
  addi a1, a1, 1
  call add
  ret

main:
  li a0, 3
  li a1, 4
  call sub
  # call printf with a0

Do you see the problem in this implementation?

The issue is that we’ll eventually execute two different call instructions that have different return addresses. The first call sets ra to the return address inside main, and then the second call (the one inside sub) will overwrite ra with a different return address inside sub. When sub eventually executes its ret, it will jump to that return address instead of the original one—creating an infinite loop.

Fundamentally, the issue is that the ra register can only hold one return address, and we need to keep track of many: one for every currently active function call. This sounds like we need a stack of return addresses, and indeed we’ll solve the problem by implementing the call stack.
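
As a mental model only (real hardware has no such structure; the stack lives in ordinary memory, as described next), here is a C sketch of a stack of return addresses. The ReturnStack type, its capacity, and the addresses in the example are all made up.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 64

/* A fixed-capacity stack of return addresses, one per active call. */
typedef struct {
    uint64_t addrs[MAX_DEPTH];
    int depth;
} ReturnStack;

/* On a call: remember where the caller should resume. */
void push_return(ReturnStack *s, uint64_t return_addr) {
    assert(s->depth < MAX_DEPTH);
    s->addrs[s->depth++] = return_addr;
}

/* On a return: resume at the most recently saved address. */
uint64_t pop_return(ReturnStack *s) {
    assert(s->depth > 0);
    return s->addrs[--s->depth];
}

/* Usage example: main calls sub, and sub calls add. Returns 1 if the
 * addresses come back in last-in, first-out order. */
int nested_calls_return_correctly(void) {
    ReturnStack s = { .depth = 0 };
    push_return(&s, 0x100);   /* return address inside main */
    push_return(&s, 0x200);   /* return address inside sub */
    return pop_return(&s) == 0x200 && pop_return(&s) == 0x100;
}
```

The inner call's address comes off first, so each ret lands in the function that made the matching call.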

Recall that the call stack is a region of memory. Every active function invocation gets a chunk of that region called a stack frame. We will use the corresponding stack frame to store each call’s return address: i.e., where ret should jump to when the function is done.

So, our next goal is to create (and destroy) a stack frame for every function call. The idea is that every function will start by allocating some space for a stack frame and end by releasing that same space. To do this, we need one crucial ingredient: the calling convention needs a way to keep track of the address of the top of the stack, which is called the stack pointer.

In the standard RISC-V calling convention, there is a special register designated for storing the stack pointer: sp, also known as x2. To create and destroy a stack frame, we just need to “move” the stack pointer by adding to it. Remember that the stack grows downward, so creating a stack frame entails adding a negative number to sp, and destroying it entails adding a positive number to move the pointer back up. Therefore, we can set up all our functions roughly like this:

func:
  addi sp, sp, -64
  ...
  addi sp, sp, 64
  ret

The function decides how many bytes it needs for its stack frame (64 in this example). At the beginning of the function, we subtract that many bytes from sp to create our stack frame; just before returning from the function, we add the same immediate to sp to restore the stack pointer to its old location. In the body of the function, we can load and store parts of our stack frame using positive offsets based on sp. For example, use sd x3, 0(sp) to store a value in the first “slot” of the function’s stack frame.

Finally, because every function call now has its own private region of memory, we can start storing our return addresses there. A simple strategy is to save the value of ra at the beginning of the function and restore it at the end, like this:

func:
  addi sp, sp, -64
  sd ra, 0(sp)
  ...
  ld ra, 0(sp)
  addi sp, sp, 64
  ret

Immediately after creating our stack frame, we store ra into it. Then, immediately before returning, we reload ra from the same location. (We use double-word loads and stores because ra holds a pointer, and pointers are always 64-bit (8-byte) words on our platform.) This way, the body of our function can muck about with ra as much as it likes, and we’ll still be sure to return to the right place.
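
The same save-and-restore pattern can be modeled in C. This is a sketch under simplifying assumptions: the Machine struct is invented for the illustration, the stack is a small byte array, and sp is an offset into that array rather than a real address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 1024

/* Toy machine: a byte-addressed stack region plus the sp and ra registers.
 * sp is an offset into `stack`; it starts at the top and moves down. */
typedef struct {
    uint8_t stack[STACK_BYTES];
    uint64_t sp;
    uint64_t ra;
} Machine;

/* Prologue: addi sp, sp, -size; sd ra, 0(sp) */
void prologue(Machine *m, uint64_t size) {
    assert(size >= sizeof m->ra && size <= m->sp);
    m->sp -= size;
    memcpy(&m->stack[m->sp], &m->ra, sizeof m->ra);
}

/* Epilogue: ld ra, 0(sp); addi sp, sp, size */
void epilogue(Machine *m, uint64_t size) {
    memcpy(&m->ra, &m->stack[m->sp], sizeof m->ra);
    m->sp += size;
}

/* Usage example: the body overwrites ra (as a nested call would), yet the
 * epilogue restores both ra and sp. Returns 1 on success. */
int frame_preserves_ra(void) {
    Machine m = { .sp = STACK_BYTES, .ra = 0x1004 };
    prologue(&m, 16);
    m.ra = 0xdead;            /* body clobbers ra */
    epilogue(&m, 16);
    return m.ra == 0x1004 && m.sp == STACK_BYTES;
}
```

Note that the epilogue adds back exactly the amount that the prologue subtracted, so sp ends where it started.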

Let’s apply this strategy to our full example above:

add:
  addi sp, sp, -16
  sd ra, 0(sp)

  add a0, a0, a1

  ld ra, 0(sp)
  addi sp, sp, 16
  ret

sub:
  addi sp, sp, -16
  sd ra, 0(sp)

  not a1, a1
  addi a1, a1, 1
  call add

  ld ra, 0(sp)
  addi sp, sp, 16
  ret

main:
  li a0, 3
  li a1, 4
  call sub
  # call printf with a0

I’ve omitted the stack-frame manipulation for main for simplicity. I’ve chosen a stack-frame size of 16 bytes for both sub and add: I only need to store the (8-byte) return address, and I’ve rounded up to 16 for alignment (see the next section).

Alignment

The standard RISC-V calling convention says that the stack pointer should be aligned to 16 bytes: in other words, sp should be a multiple of 16 when entering a function.

Therefore, if you need \(n\) bytes of stack space, you should decrement/increment sp by the smallest multiple of 16 greater than or equal to \(n\).
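
The rounding rule is easy to compute with a standard bit trick. Here is a minimal C sketch; the function name frame_size is just a label chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the smallest multiple of 16 that is greater
 * than or equal to it. Adding 15 and then clearing the low four bits
 * works because 16 is a power of two. */
uint64_t frame_size(uint64_t n) {
    assert(n <= UINT64_MAX - 15);   /* avoid wraparound in n + 15 */
    return (n + 15) & ~(uint64_t)15;
}
```

So a function that needs only 8 bytes (say, for ra) uses a 16-byte frame, and one that needs 24 bytes uses a 32-byte frame.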

Leaf Functions

In our example, sub really needs to save and restore ra because it contains a function call that will overwrite ra. However, add doesn’t contain a call, so it’s safe to skip this step. (You can call add a leaf function because it’s at a leaf of the call tree.) We could dispense with all the stack-pointer manipulation and just write:

add:
  add a0, a0, a1
  ret

Omitting the stack-frame stuff isn’t necessary—it’s just an optimization you are allowed to do for leaf functions if you want.

Local Variables

A reasonable question you might have had, somewhere along your assembly programming journey, is: what do you do if your function has more local variables than your ISA has registers?

The answer is that you can put them in your stack frame! Deciding to put a variable on the stack is sometimes called spilling the variable (because the register file has overflowed with data, I suppose). If you need extra space in your stack frame, just make sure it’s big enough and then load and store using offsets on sp. Something like this:

myfunc:
  addi sp, sp, -128  # lots of space!
  sd ra, 0(sp)

  li a5, 341
  sd a5, 32(sp)      # store stuff starting at byte 32 of my stack frame
  # ...
  ld a7, 32(sp)

  ld ra, 0(sp)
  addi sp, sp, 128
  ret

Arguments on the Stack

The standard RISC-V calling convention says that the first 8 arguments (assuming they fit within 8 bytes) go in the registers a0 through a7. But what if you need to pass more than 8 arguments?

They go on the stack. The caller puts them at the “end” (lowest addresses) of its stack frame, and the callee looks for them at positive offsets from the initial value of sp.

The callee can find its first stack-passed argument at 0(sp) before it adjusts sp to create its own stack frame. Assuming all 8-byte arguments, the second stack-passed argument is at 8(sp), the third is at 16(sp), and so on.
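
Here is that offset arithmetic as a small C sketch. The helper stack_arg_offset is hypothetical, and it bakes in the same assumption as the text: every argument is 8 bytes.

```c
#include <assert.h>

/* Hypothetical helper: for the k-th argument (0-based), assuming every
 * argument is 8 bytes, return its byte offset from sp at function entry
 * (before the callee adjusts sp). Returns -1 for the first eight
 * arguments, which travel in a0..a7 instead of on the stack. */
int stack_arg_offset(int k) {
    assert(k >= 0);
    if (k < 8) {
        return -1;            /* passed in a register */
    }
    return 8 * (k - 8);       /* ninth argument at 0(sp), tenth at 8(sp) */
}
```

For a ten-argument function, the ninth argument (k = 8) is at 0(sp) and the tenth (k = 9) is at 8(sp).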

Saved Registers

There’s one final decision that a calling convention dictates: which registers are guaranteed to retain their value across a function call. Let’s imagine a function like this:

void myfunc() {
    int myvar = 341;
    int sum = add(3, 4);
    // do more stuff with myvar and sum here
}

Here’s an implementation in RISC-V assembly:

myfunc:
  addi sp, sp, -32
  sd ra, 0(sp)

  li a5, 341        # myvar goes in a5

  li a0, 3
  li a1, 4
  call add          # call add(3, 4) and put result in a0

  # what is a5 here?

  # do more stuff with a5 and a0 here

  ld ra, 0(sp)
  addi sp, sp, 32
  ret

We’ve implemented our call to add using the standard RISC-V calling convention’s requirement to put the arguments in a0 and a1 and the return value in a0. The important question here is: what happens to a5, the register we’re using to store myvar? Can we rely on it still containing the value we put into it, or is add allowed to use that register itself and potentially overwrite our value?

The calling convention decides the contract between callers and callees that dictates which registers the callee is allowed to overwrite. Typically, a calling convention tells you that each register follows one of two rules:

  • Caller-saved or temporary: The callee is allowed to do whatever it wants with this register, including overwriting it with any data it likes. The caller must not depend on values being preserved across the function call.
  • Callee-saved or just plain saved: The callee must ensure that the register holds the same value on function return as it did when the function was called. The caller can rely on the value being preserved across calls. If the callee wants to use that register for other purposes, that’s fine, but it has to save the original value and restore it before returning.

In practice, both styles have their advantages in different situations. So most calling conventions have some caller-saved and some callee-saved registers.

Let’s return to the standard RISC-V calling convention. It tells us what the rule is for every register. Here are the most important observations:

  • Our argument/return registers, a0 through a7, are temporary (caller-saved).
  • There is a range of callee-saved registers called s0 through s11 (a.k.a. x8, x9, and x18 through x27). The S is for saved.
  • Similarly, there is a range of caller-saved registers t0 through t6 (a.k.a. x5 through x7 and x28 through x31). The T is for temporary.
  • The return address, ra, is temporary. That’s why functions must save ra in their stack frame if they need it.
  • The stack pointer, sp, is saved. That’s why functions must take care of restoring sp to its old value with addi before they return.
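
The list above can be restated as a lookup on the x-register number. The C sketch below uses invented names (RegClass, reg_class). It also adds a third category for three registers that the list does not cover: x0 (the hardwired zero register), x3 (gp), and x4 (tp), which ordinary functions do not use as scratch space.

```c
#include <assert.h>

typedef enum { CALLER_SAVED, CALLEE_SAVED, SPECIAL } RegClass;

/* Classify integer register x0..x31 under the standard RISC-V calling
 * convention. */
RegClass reg_class(int x) {
    assert(x >= 0 && x < 32);
    if (x == 1) return CALLER_SAVED;              /* ra */
    if (x == 2) return CALLEE_SAVED;              /* sp */
    if (x >= 5 && x <= 7) return CALLER_SAVED;    /* t0..t2 */
    if (x == 8 || x == 9) return CALLEE_SAVED;    /* s0, s1 */
    if (x >= 10 && x <= 17) return CALLER_SAVED;  /* a0..a7 */
    if (x >= 18 && x <= 27) return CALLEE_SAVED;  /* s2..s11 */
    if (x >= 28) return CALLER_SAVED;             /* t3..t6 */
    return SPECIAL;                               /* x0, x3 (gp), x4 (tp) */
}
```

For instance, a5 is x15, so reg_class(15) is CALLER_SAVED: a callee may overwrite it freely.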

The consequence for writing functions is this: if you want to use a callee-saved register in your function body, add code to save and restore it. (Using temporary registers is “free” and you don’t need to do anything special, with the downside that calling a function from inside your function might overwrite them.) An easy way to do this is to tack on this saving and restoring to the “prologue” and “epilogue” that start and end your function. Here’s an example:

myfunc:
  # Prologue:
  addi sp, sp, -32
  sd ra, 0(sp)      # save return address
  sd s4, 8(sp)      # save s4 so we can use it

  # In the function body, now we can freely use s4 and any of the tN
  # registers (and ra). But don't overwrite any of the other sN registers other
  # than s4!

  # ...function body...

  # Epilogue:
  ld ra, 0(sp)      # restore return address
  ld s4, 8(sp)      # restore s4 to its old value
  addi sp, sp, 32
  ret

Recap

In the above sections, we have identified several rules that a calling convention must describe:

  • Which registers to use for arguments and return values.
  • When a function ends, where it should find the return address to jump to.
  • Where to store the stack pointer (the address of the top of the stack).
  • Which registers are preserved across function calls (which are callee-saved and which are caller-saved).

For the 64-bit RISC-V architecture, you can find answers to each of these questions in the document describing the standard calling convention. Here is a brief summary:

  • For arguments and return values, use the aN registers. If you have more than 8 arguments, put them on the stack.
  • The return address goes in ra.
  • The stack pointer goes in sp and is aligned to 16 bytes.
  • sN and sp are saved (callee-saved). aN, ra, and tN are temporary (caller-saved).

And here is a sketch showing how to write the prologue and epilogue for a function that adheres to this calling convention:

func:
  # Prologue:
  addi sp, sp, -<size>   # how many bytes do you need in the stack frame (rounded up for alignment)?
  sd ra, 0(sp)           # save the return address if you need to call
  sd sX, Y(sp)           # stash every saved register you want to use

  # function body goes here

  # to call another function:
  mv aX, <argument>      # put arguments in aX registers
  sd tX, Y(sp)           # save any temporary registers we need to use
  call <func>            # sets ra and jumps to the callee
  ld tX, Y(sp)           # restore those temporary registers
  # return value is now in a0

  # Epilogue:
  ld ra, 0(sp)           # restore the return address, if we saved it
  ld sX, Y(sp)           # restore everything else we stashed too
  addi sp, sp, <size>    # must match the size in the prologue
  ret

Note

Some implementations of the call stack also use a frame pointer, called fp in RISC-V. The frame pointer points at the “other end” of the current frame from sp, i.e., its highest address. Frame pointers are not necessary to make function calls work; instead, they are useful for debugging and implementing exceptions. Therefore, using fp is optional in the RISC-V calling convention. In CS 3410, we do not use it (and you do not need to know about it for this class).

Try It Out

If a calling convention is a contract, then our C compiler should follow it too. This implies that we should be able to write a program in a mixture of C and assembly, and each one should be able to call the other. Let’s take our little example with add and sub from above and “port” the sub function from C to assembly.

We’ll start by deleting the body of sub and leaving just the declaration:

#include <stdio.h>

int sub(int a, int b);

int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 3;
    int y = 4;
    int diff = sub(x, y);
    printf("%d\n", diff);
    return 0;
}

Now let’s write sub in assembly, in a separate file:

.global sub

sub:
  addi sp, sp, -16
  sd ra, 0(sp)

  not a1, a1
  addi a1, a1, 1
  call add

  ld ra, 0(sp)
  addi sp, sp, 16
  ret

The .global directive in this assembly file makes the sub label visible to other files as a symbol. You can compile, assemble, and link these two pieces together with this command:

rv gcc add.c sub.s -o addsub

Try it out!

Another Example

Here’s another example you can try. It’s a recursive function to get the Nth element of a linked list:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

struct ilist {
    struct ilist* next;
    int64_t value;
};

long nth(struct ilist* list, uint64_t n) {
    if (n == 0) {
        return list->value;
    } else {
        return nth(list->next, n - 1);
    }
}

int main() {
    // Construct a 4-element list.
    struct ilist node1 = { NULL,   3 };
    struct ilist node2 = { &node1, 4 };
    struct ilist node3 = { &node2, 1 };
    struct ilist node4 = { &node3, 0 };

    int64_t res = nth(&node4, 2);
    printf("list[2] = %ld\n", res);

    return 0;
}

Try porting the nth function to assembly. You can try your work out using the technique above: start with the C file, delete the implementation of nth, and then link the C file and your new assembly file into an executable.

The important thing to notice here is that recursive calls work exactly the same as non-recursive calls. There is no magic: just like any other call, a recursive call must allocate a stack frame and preserve its return address.