Calling Functions in Assembly
Pseudo-Instructions
While assembly languages mostly have a 1-1 correspondence to some processor’s machine code, sometimes it’s helpful for the assembly language to have a few convenient features that just make it easier for humans to read and write. The primary such feature in RISC-V assembly is its pseudo-instructions. A pseudo-instruction is an assembly-language instruction that does not actually correspond to any distinct machine-code instruction (with its own opcode and such).
Here are some common pseudo-instructions:
- mv rd, rs1: Copy the value of register rs1 into register rd.
- li rd, imm: Put the immediate value imm into register rd.
- nop: A no-op: do nothing at all.
All three of these pseudo-instructions are equivalent to special cases of the addi instruction:
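- mv rd, rs1 does the same thing as addi rd, rs1, 0
- li rd, imm is addi rd, x0, imm (when imm fits in addi’s signed 12-bit immediate field; larger constants expand into a multi-instruction sequence, such as lui followed by addi)
- nop is addi x0, x0, 0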
Try to convince yourself that these addi instructions do in fact work to implement these pseudo-instructions’ semantics.
The RISC-V assembler translates pseudo-instructions into their equivalent real instructions for you. So you can write li x11, 42 and that will translate to exactly the same machine-code bits as addi x11, x0, 42.
Why doesn’t RISC-V implement these pseudo-instructions as real, distinct instructions? By keeping the number of instructions small, it simplifies the hardware—especially the decode stage—making it smaller, faster, and more efficient.
Functions in Assembly
With branching control flow, we can accomplish a lot in RISC-V assembly.
We can “fake” if statements, for loops, and so on.
But one thing we can’t do yet is call functions.
That’s what this lecture is about.
Here’s an example C program we can work with:
#include <stdio.h>
int addfn(int a, int b) {
return a + b;
}
int main() {
int sum1, sum2;
sum1 = addfn(1, 2);
sum2 = addfn(3, 4);
printf("sum1=%d and sum2=%d\n", sum1, sum2);
}
You already know how to implement the body of the addfn function in RISC-V.
But nothing we’ve done so far will let us call that code multiple times with different arguments, as main does in this example.
Calling a function is a multi-step process, and it requires collaboration between both the caller code and the callee code (the function being called). At a high level, every function call needs to follow these steps:
- The caller puts arguments in a place where the callee function can access them.
- The caller transfers control to the callee (i.e., it jumps to the first instruction in the function).
- The function creates a stack frame to hold its own local variables.
- The function actually does stuff: i.e., the function body.
- The function puts the return value in a place where the caller can access it. It also restores any registers it used to the state the caller expects. And finally, it releases the stack frame that holds its local variables.
- The callee returns control to the caller (i.e., jumps to the next instruction in the caller right after the function call).
The caller and callee need to agree on all the details for how this multi-step process works. For example, they must agree on which registers hold the arguments and which registers hold the return value. A standardized protocol for how to implement all these details is called a calling convention. RISC-V defines a standard calling convention (documented in the RISC-V ELF psABI specification), which we will learn about in this lecture. C compilers that generate RISC-V code also use the same calling convention to implement function definitions and function calls, and because it’s standardized, even functions compiled by different C compilers can call each other.
The RISC-V Calling Convention
We’ll break down the components next, but here are the most important parts of the RISC-V calling convention:
- Arguments go in registers a0 through a7 (a.k.a. x10 through x17). (In fact, that is why these registers have an alternative name starting with an “a”! It’s for argument.)
- Return values also go in registers a0 and a1. (Yes, this means that functions overwrite their arguments with their return values before they return.)
- Register ra (a.k.a. x1) holds the return address: the address of the next instruction to run after the function call finishes.
- Registers s1 through s11 (a.k.a. x9, and x18 through x27) are callee-saved registers. This means that callers can safely expect that, after they make a call and the call returns, these registers will have been carefully restored to the values they had before the call.
- Registers t0 through t6 (a.k.a. x5 to x7, and x28 through x31) are temporary registers. This means that callee functions can use these registers without saving them. If the caller needs the contents of these temporary registers after the callee returns, then the caller has to save them before making a function call to the callee. As a result, these temporary registers are called caller-saved registers.
Control Flow for Call and Return
Let’s start with the basic mechanism for transferring control:
jumping from the caller to the callee and then back.
The interesting thing is that the branch instructions we’ve seen so far, such as beq, won’t suffice.
The problem is that functions, by their very nature, can be called from multiple locations.
Like in our example above:
sum1 = addfn(1, 2);
sum2 = addfn(3, 4);
Imagine that we implemented both of these calls with a plain unconditional jump, j. Then the calls might look like this:
li a0, 1
li a1, 2
j addfn
mv <register containing sum1>, a0
li a0, 3
li a1, 4
j addfn
mv <register containing sum2>, a0
The li instructions set up the argument registers, and the mv instructions consume the return-value register.
We imagine here that addfn is an assembly-language label that points to the start of the addfn function’s instructions.
There’s a problem.
In the implementation of the addfn function, how do we know where to jump back to?
After each call is done, we need to transfer control to the next instruction after the jump.
Even if we put labels on those instructions, there is only a single block of instructions implementing addfn, and it would need to end with a j <label> to return.
But somehow it would need to pick a different label for each call, which is impossible!
The solution is to designate a register to hold the return address for the call.
Instead of just using j to call a function, we’ll do two things:
- Record the next instruction’s address as the return address, in register ra.
- Jump to the first instruction of the called function.
Then, to return, the function just needs to jump to the instruction address in register ra.
Regardless of who called the function, doing this will suffice to transfer control to the point right after the call.
RISC-V has instructions to support these strategies: both the call and the return.
For the call, you use the jal instruction (the mnemonic stands for jump and link):
jal rd, label
The jal instruction does the two things we need for a call:
- Put the address of the next instruction after the jal into register rd.
- Unconditionally jump to label.
So our function calls will generally look like jal ra, <function label>.
Then, to return from a function, we’ll use the jr instruction (the mnemonic means jump register):
jr rs1
The jr unconditionally jumps to the address stored in the register rs1.
So function returns generally look like jr ra.
In fact, this pattern is so common that RISC-V has pseudo-instructions for function calls and returns:
- jal label: short for jal ra, label
- call label: like the above, but with an extra auipc instruction so it supports larger PC offsets
- ret: short for jr ra
(Going one level deeper, it turns out that jr rs1 is itself a pseudo-instruction that is short for jalr x0, 0(rs1). But that’s not really important for learning about function calls.)
Managing the Stack
The notes below mention the frame pointer register fp – this is not covered this
semester (in Fall 2025), so you do not need to handle fp in your assignments.
Beyond just jumping around, functions also have another important responsibility: they need to keep track of their local variables. As you already know, local variables go in stack frames on the call stack. You also know that the stack is a region of memory that grows downward (from higher memory addresses to lower ones) when we call functions, and it shrinks when function calls return. This section is about the bookkeeping that functions must do to create and use their stack frames.
The central idea is that we must use a register to keep track of the address of our current stack frame.
According to the RISC-V calling convention, register sp (a.k.a. x2) contains the address of the top of the current stack frame (its smallest address, since the stack grows down). Further, the RISC-V calling convention has a frame pointer register, fp (a.k.a. x8 or s0), that contains the address of the bottom of the stack frame (fp holds a higher address than sp, since the stack grows down).
Code interacts with sp and fp in three main ways:
- At the beginning of the function, it will “push a stack frame onto the call stack” by moving sp downward to make space for its own stack frame. Remember, this stack frame will contain the function’s local variables.
- During the execution of the function, it will use (positive) offsets on sp to locate each of its local variables. So you’ll see stuff like ld a7, 16(sp) and sd a5, 40(sp) to load and store local variables using offsets from sp. Equivalently, negative offsets can be used with fp to access any local variable within a stack frame. The advantage of using fp rather than sp is that the offsets to values on the stack are constant relative to fp, whereas the offsets relative to sp may change if sp moves during the function. Note that according to the RISC-V calling convention, fp is optional.
- At the end of the function, before it returns, it will “pop the stack frame off the call stack” by moving sp back up to wherever it used to be, “destroying” its stack frame. No memory literally gets destroyed, of course, but adjusting sp back to its pre-call value indicates that we’re done using all our local variables, and it lets the caller locate its own stack frame.
This means that functions usually look like this:
func_label:
addi sp, sp, -16
sd ra, 8(sp)
sd fp, 0(sp)
addi fp, sp, 8
...
ld fp, 0(sp)
ld ra, 8(sp)
addi sp, sp, 16
ret
or, equivalently:
func_label:
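addi sp, sp, -16
sd fp, 0(sp)     # save the caller's fp before overwriting fp
addi fp, sp, 8
sd ra, 0(fp)     # same slot as 8(sp)
...
ld ra, 0(fp)     # reload ra while fp still points into this frame
ld fp, -8(fp)    # same slot as 0(sp): restores the caller's fp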
addi sp, sp, 16
ret
The addi at the top and bottom of the function “creates” and “destroys” (a.k.a. “push” and “pop”) the stack frame.
The function’s code must know how big its stack frame needs to be:
in this case, it’s 16 bytes, so we move the stack pointer down by 16 bytes at the beginning and back up by the same 16 bytes at the end.
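The stack frame needs to be big enough to hold the function’s local variables plus any registers it saves, such as the return address (ra) and frame pointer (fp).
C compilers compute this stack-frame size for you by adding up the sizes of all the local variables you declare and the registers the function needs to save. (The standard RISC-V ABI also requires sp to stay 16-byte aligned, so compilers round the frame size up to a multiple of 16; some examples in these notes skip that detail.)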
Further, when the stack frame is “created” (“pushed”), the return address, ra, and frame pointer, fp, are stored on the stack, then the ra and fp are restored before the stack frame is “destroyed” (“popped”).
- Why is ra stored on the stack? Storing ra on the stack allows functions to call other functions (including themselves, recursively). For instance, assume we did not store ra on the stack, and main calls addfn and addfn calls printf. What would happen to ra? When main executes jal addfn (or call addfn), ra will contain the return address in main. Then, when addfn calls printf, jal printf (or call printf) will overwrite ra. Next, when printf returns to addfn and addfn wants to return to main, the contents of ra will have been “clobbered”, and there will be no way for addfn to return to main. Fortunately, by storing ra on the stack, addfn can restore ra from the stack, which will contain the address back to main.
Passing Arguments
RISC-V provides a consistent way of passing arguments and receiving the result of a subroutine invocation.
In particular, registers a0 to a7 are used for arguments, and a0 and a1 are used for return values. Note that a0 and a1 are both argument and return-value registers; as a result, the contents of the argument registers are in general “clobbered” by a call and not preserved.
If a function has more than eight arguments, then the extra arguments are “spilled” to the stack. The calling convention allocates space for all arguments on the child stack frame: it places the first eight args in registers a0 to a7 and “spills” any remaining args to the child stack frame. This means that space is allocated on the stack for the first eight args, even though that space is not initially used since the arg registers are used instead. Allocating space on the stack for all args is particularly useful for functions that take a variable number of arguments, such as printf("Scores: %d %d %d\n", 1, 2, 3);, because the callee can then treat its arguments as an array in memory.
Let’s see an example for passing ten arguments:
#include <stdio.h>
int addfn(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j) {
return a + b + c + d + e + f + g + h + i + j;
}
int main(){
int sum = addfn(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
printf("%d\n", sum);
}
assembly for main calling addfn:
main:
li a0, 0
li a1, 1
...
li a7, 7
li t0, 8
sd t0, -16(sp)
li t0, 9
sd t0, -8(sp)
jal addfn
The stack with respect to the caller will look like:
-8(sp): 9
-16(sp): 8
-24(sp): space for a7
-32(sp): space for a6
-40(sp): space for a5
-48(sp): space for a4
-56(sp): space for a3
-64(sp): space for a2
-72(sp): space for a1
-80(sp): space for a0
In particular, the caller passes the first eight args in registers a0-a7, “spills” the ninth and tenth args to the stack, and makes room for all ten args on the stack. Further, note that the args are passed in the callee (child) stack frame.
Leaf Functions
Note that if a function does not call another function, then it is a leaf function. The addfn functions above are all leaf functions. It is possible for a leaf function not to push or pop a stack frame at all: that is, not to adjust sp, and not to save ra, fp, or any args on the stack. A leaf function can use temporary caller-save (t) registers, since they do not need to be saved before use. But a leaf function that does not have a stack frame cannot use callee-save (s) registers, since callee-save registers must be saved on the stack before they are used.
Calling Convention Example
Let’s go through a couple of calling convention examples. First, assume that we have the code below:
int test(int a, int b) {
int tmp = (a&b)+(a|b);
int s = sum(tmp,1,2,3,4,5,6,7,8);
int u = sum(s,tmp,b,a,b,a);
return u + a + b;
}
Next, let’s pretend that we are the RISC-V C compiler and write the assembly for the test function above. To proceed, we will complete the following steps:
- Write the assembly for the body of the function
- Determine the stack frame size
- Complete the prologue/epilogue that performs the stack frame push/pop
Calling Convention Body Example
In this first step, we will write the Body for test
# Prologue:
# stack frame size = sizeof(register) bytes x (2x args + 2x (ra/fp) + 0x callee-save registers [+ 1x temporary caller-save register stored on the stack])
# = 8 bytes x 5 = 40 bytes
#
# stack frame layout
# 32(sp): a1 (b)
# 24(sp): a0 (a)
# 16(sp): ra
# 8(sp): fp
# 0(sp): t0
# Body
# store args a and b
SD a0, 24(sp) # a
SD a1, 32(sp) # b
# int tmp = (a&b)+(a|b);
AND t0, a0, a1
OR t1, a0, a1
ADD t0, t0, t1
# store tmp
SD t0, 0(sp)
# int s = sum(tmp,1,2,3,4,5,6,7,8);
MV a0, t0
LI a1, 1
LI a2, 2
...
LI a7, 7
LI t1, 8
SD t1, -8(sp) # spill ninth arg to the child stack frame
JAL sum
# restore tmp, a, b
LD t0, 0(sp) # tmp
LD t1, 24(sp) # a
LD t2, 32(sp) # b
# int u = sum(s,tmp,b,a,b,a);
MV a0, a0 # s
MV a1, t0 # tmp
MV a2, t2 # b
MV a3, t1 # a
MV a4, t2 # b
MV a5, t1 # a
JAL sum
# restore a and b
LD t1, 24(sp) # a
LD t2, 32(sp) # b
# add u (a0), a (t1), b (t2)
ADD a0, a0, t1 # u + a
ADD a0, a0, t2 # u + a + b
# a0 = u + a + b
# Epilogue
Several notes on the above assembly for test:
- a and b were stored in the space allocated for them on the stack. They had to be reloaded several times because a0 and a1 are caller-save argument registers that each call to sum may overwrite; that is, after both the first and the second call to sum, a and b had to be reloaded.
- tmp, stored in t0, needed to be saved in the test stack frame, since t0 is a temporary caller-save register and tmp is needed after the first call to sum returns.
- The ninth argument (value 8) had to be spilled to the child stack frame. Instructions LI t1, 8 and SD t1, -8(sp) store the value 8 on the child stack frame.
Calling Convention Prologue/Epilogue Example
Next, let’s look at how to create and destroy (push and pop) the stack frame for test in the prologue and epilogue, respectively.
# stack frame layout
# 32(sp): b (a1)
# 24(sp): a (a0)
# 16(sp): ra
# 8(sp): fp
# 0(sp): t0
test:
# Prologue
ADDI sp, sp, -40 # allocate stack frame
SD ra, 16(sp) # save ra
SD fp, 8(sp) # save old fp
ADDI fp, sp, 32 # set new frame pointer
# Body
...
#Epilogue
LD fp, 8(sp) # restore fp
LD ra, 16(sp) # restore ra
ADDI sp, sp, 40 # dealloc frame
ret # JR ra
The test stack frame size is 40 bytes: space to store the two args (a and b), ra and fp, and the tmp variable. Further, only ra and fp are saved in the prologue and restored in the epilogue. The arguments for test (a and b) and tmp (t0) are stored on the stack in the # Body.
Another consideration is the total number of stores and loads in this implementation of test. Specifically, there are two stores and two loads in the prologue/epilogue, and four stores (including the spilled ninth argument) and five loads in the body, for a total of six stores (SD) and seven loads (LD).
Calling Convention Example 2
Now let’s look at a different implementation of test. It is the same C code for test, but a different assembly implementation. In this assembly, we will use callee-save (s) registers to avoid memory accesses and, hopefully, reduce the number of stores/loads (SD/LD). The stack frame size may increase because we need to save the callee-save registers before we use them, but there may be fewer stores/loads overall.
# Prologue
# stack frame size = sizeof(register) x (2x args + 2x (ra/fp) + 3x callee-save registers [+ 0x temporary caller-save registers stored on the stack])
# = 8 bytes x 7 = 56 bytes
#
# stack frame layout
# 48(sp): b
# 40(sp): a
# 32(sp): ra
# 24(sp): fp
# 16(sp): s3
# 8(sp): s2
# 0(sp): s1
# Body
# store args in callee-save registers s1 and s2
MV s1, a0 # a
MV s2, a1 # b
# int tmp = (a&b)+(a|b);
AND s3, a0, a1
OR t1, a0, a1
ADD s3, s3, t1 # store tmp in a callee-save register s3
# int s = sum(tmp,1,2,3,4,5,6,7,8);
MV a0, s3
LI a1, 1
LI a2, 2
...
LI a7, 7
LI t1, 8
SD t1, -8(sp) # spill ninth arg to the child stack frame
JAL sum
# int u = sum(s,tmp,b,a,b,a);
MV a0, a0 # s
MV a1, s3 # tmp
MV a2, s2 # b
MV a3, s1 # a
MV a4, s2 # b
MV a5, s1 # a
JAL sum
# add u (a0), a (s1), b (s2)
ADD a0, a0, s1 # u + a
ADD a0, a0, s2 # u + a + b
# a0 = u + a + b
# Epilogue
In this assembly, there is space allocated for args a and b; however, we use callee-save registers s1 and s2 for a and b instead. As a result, the body of test has one store (SD) and zero loads (LD) in the body. Note that test still needs to spill the ninth argument on the stack before calling sum.
Calling Convention Prologue/Epilogue Example 2
Now, let’s take a look at the prologue and epilogue to push and pop the test stack frame for this second implementation.
# stack frame layout
# 48(sp): b
# 40(sp): a
# 32(sp): ra
# 24(sp): fp
# 16(sp): s3
# 8(sp): s2
# 0(sp): s1
test:
# Prologue
ADDI sp, sp, -56 # allocate stack frame
SD ra, 32(sp) # save ra
SD fp, 24(sp) # save old fp
SD s3, 16(sp) # store callee-save reg s3
SD s2, 8(sp) # store callee-save reg s2
SD s1, 0(sp) # store callee-save reg s1
ADDI fp, sp, 48 # set new frame pointer
# Body
...
#Epilogue
LD s1, 0(sp) # restore s1
LD s2, 8(sp) # restore s2
LD s3, 16(sp) # restore s3
LD fp, 24(sp) # restore fp
LD ra, 32(sp) # restore ra
ADDI sp, sp, 56 # dealloc frame
ret # JR ra
In this assembly, the test stack frame size is 56 bytes: space to store the two args (a and b), ra and fp, and three callee-save (s) registers. We save s1-s3 so that we can use them to hold a, b, and the tmp variable.
In terms of the total number of stores and loads, there are five stores and five loads in the prologue/epilogue and one store and zero loads in the body for a total of six stores (SD) and five loads (LD), reducing the total number of loads by two compared to the prior assembly.
Summary and Cheat Sheet for the RISC-V Calling Convention
- First eight args passed in registers a0, a1, …, a7
- Space for args passed in child’s stack frame
- Return value (if any) in a0, a1
- Stack frame at sp
  - contains ra (clobbered on JAL to sub-functions)
  - contains fp
  - contains local vars (possibly clobbered by sub-functions)
  - contains space for incoming args
- Saved registers (callee-save regs) are preserved
- Temporary registers (caller-save regs) are not
- Global data accessed via gp

RISC-V Registers
- Return address: x1 (ra)
- Stack pointer: x2 (sp)
- Frame pointer: x8 (fp/s0)
- First eight arguments: x10-x17 (a0-a7)
- Return result: x10-x11 (a0-a1)
- Callee-save free regs: x9, x18-x27 (s1-s11)
- Caller-save free regs: x5-x7, x28-x31 (t0-t6)
- Global pointer: x3 (gp)
- Thread pointer: x4 (tp)