A7: Functions in Assembly
Instructions:
Remember, all assignments in CS 3410 are individual.
You must submit work that is 100% your own.
Remember to ask for help from the CS 3410 staff in office hours or on Ed!
If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.
The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.
Submission Requirements
Please submit the following files.
genai_survey.png: a screenshot of your completed gen ai surveyrecursive.s: implementation of a recursive Fibonacci functionmemoization.s: implementation of a memoized Fibonacci functiontail_recursive.s: implementation of a tail recursive Fibonacci functionopt_tail.s: implementation of a tail-call optimized Fibonacci function
Restrictions
- You must adhere to the specified implementation guidelines for each function.
- You may not make non-recursive function calls
- For a call to be considered recursive, the function must invoke itself by calling its own entry point/label.
- For the recursive implemetations, you must recurse on the label/entry point that defines the start of the function being implemented
Provided Files
We have provided you the following:
recursive.c,memoization.c,tail_recursive.c, andopt_tail.c, as the starter codecompare.c, a program that compares the performance of the different versions of the Fibonacci functionMakefile, which you can use to build executables for the above programs
Note about Stack Pointer
It is important to note that RISC-V architectures mandate a 16-byte alignment for the stack pointer. You are not required to maintain this alignment in this assignment, however. On an exam, you will be told whether to preserve the 16-byte aligned stack pointer. See the lecture slides for more information.
Getting Started
To get started, obtain the release code by cloning your assignment repository from GitHub:
git clone git@github.coecis.cornell.edu:cs3410-2025fa-student/<NETID>_asmfunc.git
Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.
Overview
This assignment will expand your understanding of RISC-V assembly programming with a focus on managing the call stack. You will get direct experience with defining and calling functions and adhering to RISC-V calling conventions. You will learn about optimizing recursive functions for performance.
Optimizing Fibonacci
In this assignment, you will implement several different versions of a function that calculates the nth number of the Fibonacci sequence.
We’ll start with a straightforward recursive implementation and then explore various performance optimizations.
Version A: Recursive Fibonacci
Here’s a straightforward recursive implementation of a Fibonacci function in C:
unsigned long r_fibonacci(int n) {
if (n == 0)
return 0;
else if (n == 1)
return 1;
else
return r_fibonacci(n - 2) + r_fibonacci(n - 1);
}
Your task is to translate this code into RISC-V assembly.
Put your implementation in a file called recursive.s.
Make sure to add the following line at the top of the file
.global r_fibonacci
This directive tells the assembler that the r_fibonacci label is a global symbol, meaning it’s accessible to other code.
You should have a .global directive at the top of your other fibonacci assembly implementations.
We have provided a main function you can use to test your code in recursive.c.
To test your code, type:
rv make recursive # Build the `recursive` executable.
rv qemu recursive 10 # Run it.
The recursive executable takes a command-line argument:
the index of the Fibonacci to calculate.
So qemu recursive 10 should print the 10th Fibonacci number, which is 55.
Version B: Memoized Fibonacci
The recursive implementation works, but it is very slow. Try timing the execution of a few Fibonacci calculations:
time rv qemu recursive 35
time rv qemu recursive 40
time rv qemu recursive 42
On my machine, calculating the 40th Fibonacci number took 4 seconds, and calculating the 42nd took 11 seconds. That suggests that the asymptotic complexity is pretty bad.
Part of the problem is that the recursive version recomputes the same answer many times.
For example, if you call r_fibonacci(4), it will eventually call r_fibonacci(2) twice:
once directly, and once indirectly via the recursive call to r_fibonacci(3).
This redundancy can waste a lot of work.
A popular way to avoid wasteful recomputation is memoization.
The idea is to maintain a memo table of previously-computed answers and to reuse them whenever possible.
For our function, the memo table can just be an array, where the ith index holds the ith Fibonacci number.
Here’s some Python code that illustrates the idea:
def m_fibonacci(n, memo_table, size):
# Check the memo table. A nonzero value means we've already computed this.
if n < size and memo_table[n] != 0:
return memo_table[n]
# We haven't computed this, so do the actual recursive computation.
if n == 0:
return 0
elif n == 1:
return 1
answer = (m_fibonacci(n - 2, memo_table, size) +
m_fibonacci(n - 1, memo_table, size))
# Save the answer in the memo table before returning.
if n < size:
memo_table[n] = answer
return answer
In C, the type of memo_table will be unsigned long*, i.e., an array of positive numbers.
size is the length of that array.
Here’s the function signature for our new function:
unsigned long m_fibonacci(int n, unsigned long* memo_table, int size);
Implement this m_fibonacci function in RISC-V assembly.
Put your code in memoization.s.
We have provided a memoization.c wrapper that you can use to test your code.
You can use the same procedure as above to try your implementation:
rv make memoization
followed by
rv qemu memoization <number>.
Notice how much faster the new implementation is! Take some number that was especially slow in the recursive implementation and time it using your memoized version:
time rv qemu memoization 42
On my machine, that takes just 0.5 seconds. That’s 22× faster!
Version C: Tail Recursive Fibonacci
While the new version is a lot faster, it still makes a lot of function calls. Some of those function calls turn out to be fast, because they just look up the answer in the memo table. But we can do better by changing the algorithm to need only one recursive call.
Again using Python syntax, here’s the algorithm for a faster recursive version:
def tail_r_fibonacci(n, a, b):
if n == 0:
return a
if n == 1:
return b
return tail_r_fibonacci(n - 1, b, a + b)
This version is called tail-recursive because the recursive call is the very last thing the function does before returning.
Marvel at the fact that this version makes only n recursive calls to calculate the nth Fibonacci number!
Here’s the function signature for this version:
unsigned long tail_r_fibonacci(int n, unsigned long a, unsigned long b);
Implement this tail_r_fibonacci function in tail_recursive.s.
As usual, we have provided a C wrapper so you can test your implementation:
rv make tail_recursive followed by rv qemu tail_recursive <number>.
Version D: Tail-Call Optimized Fibonacci
Making n recursive calls is pretty good, but is it possible to optimize this code to do no recursion at all?
That would mean that the algorithm uses O(1) stack space instead of O(n).
That’s the idea in tail-call optimization.
The plan is to exploit that, once the recursive call to tail_r_fibonacci is done, the caller has nothing more to do.
The callee puts its return value in a0, and that is exactly what the caller wants to return too.
Because there is no more work to do after the tail call,
we don’t need to waste time maintaining the stack frame for the caller.
We can just reuse the same stack frame for the recursive call!
Implement an optimized version of the tail-recursive Fibonacci algorithm in opt_tail.s.
Instead of using a jal (or call) instruction for the recursive call, you can just use a plain unconditional jump (j in RISC-V).
Be sure to carefully think through when and where you need to save and restore the return address to make this work.
Your function should be named opt_tail_fibonacci,
and it should have the same function parameters as the previous version.
As usual, opt_tail.c can help you test your implementation: rv make opt_tail followed by rv qemu opt_tail <number>.
Compare Performance
We have provided a program, in compare.c, that can compare the performance of these various optimizations more precisely than the time command.
(That was also measuring the time it takes to start the executable up, which can be slow, especially when it entails launching a Docker container.)
Build the tool and invoke it like this:
rv make compare
rv qemu compare <method> <n>
You can give it the name of a method (recursive, memoization, tail_recursive, or opt_tail) and a number n to measure the time taken to compute the nth Fibonacci number.
Or use the all method to compare all the implementations.
When I ran this once on my machine with n=20, it reported that the recursive implementation took about 2.6 seconds, memoization brought this down to just 7 milliseconds, tail recursion was even faster at 3 ms, and the optimized tail call version was blazingly fast at only half a millisecond.
Every computer is different, so your numbers will vary, but see if you observe the same overall performance trend.
There is nothing to turn in for this part—it’s just cool!
Submission
Submit all the files listed in Submission Requirements to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.
Gen AI survey
To help you remember to submit the Gen AI survey along with the assignment, you will now submit a screenshot called genai_survey.png. All you need to do is submit the A7 AI Survey on Gradescope, take a screenshot (all it needs to show is that you submitted it), rename the file to genai_survey.png, and include the file along with your code. You do not need to do this until you plan on making your final submission. The autograder will work correctly without it. The failed test that appears if you do not submit the file is simply to act as a reminder. Submitting this screenshot is worth 1 point.
Rubric
recursive.s: 10 pointsmemoization.s: 15 pointstail_recursive.s: 10 pointsopt_tail.s: 15 points