
The RISC-V ISA

So far, we have used the raw materials of switches and transistors to build circuits that can do arithmetic and store state. At this point I think it’s interesting to ask yourself a philosophical question: what is a “computer”? It’s clearly a subjective definitional question, so you can decide for yourself. Take a minute or two to ponder!

I would argue that we do not yet have a computer as it is missing a key aspect: programmability. One definition of a computer is a machine that can be programmed to automatically execute sequences of arithmetic or logical operations. But before we can program our processor, we need a language.

Instructions

Recall that we can manually control our arithmetic and state circuits by turning on certain bits/wires. For example, registers have an enable input that decides whether or not to store the new input. Multiplexers have a select bit which determines which input to output. Even the inputs to adders are simply sequences of bits. Ultimately, what the circuit does is wholly determined by which of these bits are set and which ones are not.

As you know by now, if we collected all of the “control” bits together we would get a number in binary. However, this number is special—it means something to our circuit. We call this special number an instruction as it tells the circuit what to do.

Machine Code

Instructions encode a single action: “add 2 to the value in register 1”, “store 42 in register 5”, etc. In a weird way, this view means we’ve defined a programming language. A really bad, primitive programming language.

This bit-level “programming language” exists in every processor in existence. It is called machine code, and it is how all software on the computer works. Every program you’ve ever run, and every program you’ve ever written in every language, eventually translates down to machine code for your processor.

Instruction Set Architecture

A machine code language is called an instruction set architecture (ISA). Some popular ISAs for “real” computers include:

  • RISC-V, which we are using in this course.
  • ARM, which your phone almost certainly uses and your laptop might use.
  • Intel’s x86, which your laptop might use.

Each of these ISAs defines a “meaning” for strings of bits. Then, processors interpret those bits to decide which actions to take.

RISC-V

We will now take a leap to a full-featured processor and a standard, popular ISA: RISC-V.

Like all ISAs, RISC-V is an extremely primitive programming language made of bits, and it has a textual assembly format that makes it easier to read and write than entering binary values manually. Each instruction is like an extremely simple statement in a different programming language, and it describes a single small action that the processor can take.

As a general-purpose ISA, RISC-V has enough instructions so that arbitrary C programs can be translated to RISC-V code. In fact, that’s what happened every time you typed gcc during this whole semester.

Why Learn Assembly Programming?

Understanding assembly is important because it is the language that the computer actually speaks. So while it would be infeasible in the modern age to write entire large software projects entirely in assembly, it remains relevant for the small handful of exceptional cases where higher levels of abstraction obscure important information. Here are some examples:

  • People hand-write assembly for extremely performance-sensitive loops. A classic example is audio/video encoding/decoding: the popular FFmpeg library, for example, is mostly written in C but contains hand-written RISC-V assembly for performance-critical functions. While modern compiler optimizations are amazing, humans can still sometimes beat them.
  • Operating system internals typically need some platform-specific assembly to deal with the edge cases that arise with controlling user processes.
  • Code that must be secure, such as encryption and decryption routines, is often written directly in assembly to avoid timing channels. If an encryption routine takes different amounts of time depending on the key, an attacker can learn the key by repeatedly measuring the time taken to encrypt or decrypt. By taking direct control over which instructions get executed, humans can sometimes ensure that the code takes a constant amount of time, so that the attacker can’t learn anything by timing it. This is hard to do by writing C because the compiler tries to be clever: by optimizing your code, it can “accidentally” make its timing input-dependent.
  • Even more commonly: reading assembly is an important diagnostic skill. When something goes wrong, sometimes reading the assembly is the only way to track down the root cause. If it’s a performance problem, for example, understanding the source code only gets you so far. If it’s a compiler bug (and compilers do have bugs!), then debugging is hopeless unless you can read assembly.

For these reasons and others, it is important to know how to read and write assembly code. We will program in RISC-V during this semester, but the skills you learn as a RISC-V programmer will translate to other ISAs such as ARM and x86.

Let’s See Some RISC-V Assembly

To get started, let’s look at some RISC-V assembly code. I mentioned already that, every time you have typed gcc so far this semester, you have been invoking a compiler whose job it is to translate your C into machine code. We can ask it to instead stop at the assembly and print that out using the -S command-line flag.

Let’s start with an extremely simple C program:

unsigned long mean(unsigned long x, unsigned long y) {
    return (x + y) / 2;
}

To see the assembly code, try a command like this:

$ rv gcc -O1 -S mean.c -o mean.s

The -S tells GCC to emit assembly, and -o mean.s determines the output file. I’m also using some optimizations, with -O1, that clean up the code somewhat: in addition to making the code faster, they also make the assembly more readable. The output, mean.s, is just a text file, so you can open it in the same editor you use to write C code. Try opening it up.

There’s a lot going on in this output, but let’s zoom in on these 3 lines:

add a0,a0,a1
srli a0,a0,1
ret

This is a sequence of 3 assembly instructions. Each one works like a statement in a “real” programming language, and it describes a single, small action for the program to take. Even though we don’t know what these instructions do, we can puzzle through what this code does:

  • add probably adds two numbers together. Which is good, because that’s what our original C program does first.
  • srli is a little more mysterious. It turns out that this mnemonic stands for shift right logical immediate. The important part is that this is a bitwise right shift. So the compiler has cleverly decided to use something like >> 1 instead of / 2.
  • ret returns from the function.
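To convince yourself that the shift really is a drop-in replacement for the division, here is a minimal C sketch. The helper names mean_div and mean_shift are hypothetical (only mean appears in the notes), and uint64_t stands in for unsigned long, which is 64 bits on the RISC-V target we are using. The equivalence holds for unsigned values; signed division rounds differently for negative numbers.

```c
#include <stdint.h>

/* The version we wrote: add, then divide by 2. */
uint64_t mean_div(uint64_t x, uint64_t y) {
    return (x + y) / 2;
}

/* Roughly what the compiler emitted: add, then shift right by 1 (srli). */
uint64_t mean_shift(uint64_t x, uint64_t y) {
    return (x + y) >> 1;
}
```

For any pair of unsigned inputs, the two functions should return the same value.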

The takeaway here is that our “second interpretation” of assembly code works for RISC-V too. We can think of it as an extremely primitive programming language and understand the code that way, forgetting about the fact that each instruction corresponds to some control bits that orchestrate the circuitry in a processor.

A Look at the Bits

Now let’s return to the first interpretation of assembly code: it’s a roughly 1-1 reflection of the (binary) machine code for a program that actually executes. Let’s look at those bits.

Object Files and Disassembly

We can translate our .s assembly code into machine code by assembling it. Try this command:

$ rv gcc -c mean.s -o mean.o

The -c flag instructs GCC to just compile the code to an object file (with the .o extension), and not to link the result into an executable. (You can also ask GCC to go all the way from C to a .o in one step if you want; just provide the .c file as the input and remember to use -c.)

You could look directly at this object file with xxd mean.o if you want, but that’s not very informative. It’s more useful to disassemble the code in this file so you can see the text form of the instructions. (Disassembling is the opposite of assembling: it’s a translation from machine code back to assembly code.) Our container comes with a tool called objdump that can do this:

$ rv objdump -d mean.o

The important part of the output is:

0000000000000000 <mean>:
   0:   00b50533    add   a0,a0,a1
   4:   00155513    srli  a0,a0,0x1
   8:   00008067    ret

Here’s how to read this output:

function address <function name>:
   addr:   machine code    assembly instruction

On the right, we see the same three instructions in the textual assembly format. On the left the tool is also printing out the hex form of the machine code (and the corresponding address). For example, the first instruction consists of the bytes 00b50533, starting at address 0. In RISC-V, every instruction is exactly 4 bytes long, so the next instruction starts at address 4.

Raw Machine Code

The .o object files that our compiler produces don’t just contain machine code; they also contain other metadata to make linking possible. Sometimes (like on this week’s assignment), it is useful to have a “raw” binary file just containing the instructions. In the CS 3410 container, we have provided a convenient command that makes it easy to produce these raw files, called asbin.

Let’s put just the instructions we want into a new file:

add a0, a0, a1
srli a0, a0, 1
ret

Try this command:

$ rv asbin mean.s

Then take a look at the bytes:

$ xxd mean.bin
00000000: 3305 b500 1355 1500 6780 0000            3....U..g...

You can see the bits for the same 4-byte instructions here, with a twist. The bytes are backward, for a reason we’ll explain next (named endianness).

For the curious only: our little asbin script just runs a couple of commands. You can run them yourself too:

$ as something.s -o something.o
$ objcopy something.o -O binary something.bin

The objcopy command is a powerful tool for converting between binary file formats, but we just need it to do this one thing. We just thought this was common enough in CS 3410 that it would be handy to have a single command to do it all.

Endianness

The instruction bytes appear backward in the file because of a concept called endianness or byte order. Different computers have different conventions for how to order the bytes within a multi-byte value. For example, in RISC-V, both int values and instructions are 4 bytes. In which order should we put those bytes into memory?

The options are:

  • Big endian: The “obvious” order. The most-significant byte goes at the lowest address.
  • Little endian: The other order. The least-significant byte goes at the lowest address.

Fortunately or unfortunately, most modern computers use little endian. That includes all of x86, ARM, and RISC-V (in their most common modes). That’s why the lowest byte in our instructions appears first when we look at the binary file with xxd. File I/O routines will hide this difference from you, so if you read an int from a file, the bytes will be in the right order by the time your program sees them.
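To make the byte order concrete, here is a small C sketch that lays out a 32-bit value in little-endian order by hand, and reads it back. The function names store_le32 and load_le32 are made up for illustration. The code uses shifts rather than pointer tricks, so it should behave the same no matter what byte order the host machine uses.

```c
#include <stdint.h>

/* Write a 32-bit value into 4 bytes, least-significant byte first. */
void store_le32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value & 0xff);         /* lowest address: lowest byte */
    out[1] = (uint8_t)((value >> 8) & 0xff);
    out[2] = (uint8_t)((value >> 16) & 0xff);
    out[3] = (uint8_t)((value >> 24) & 0xff); /* highest address: highest byte */
}

/* Reassemble the 32-bit value from 4 little-endian bytes. */
uint32_t load_le32(const uint8_t in[4]) {
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

Storing our add instruction, 0x00b50533, this way gives the bytes 33 05 b5 00, which matches the first four bytes in the xxd output.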

Why are these called big and little “endian”? It’s one of the all-time great examples of computer scientists being terrible at naming things: these names come from the 1726 novel Gulliver’s Travels by Jonathan Swift, from a part about a war between people who believe you should crack an egg on the big end or the little end.

RISC-V Assembly Basics

Let’s cover a few fundamental concepts that RISC-V will use for every instruction. We will break down this instruction from our example:

add a0, a0, a1

Registers

There are 32 registers. RISC-V names them x0 through x31. We’re using the 64-bit version of the RISC-V ISA, so each register holds a 64-bit value.

Alternative Names for Registers

While all the registers just hold bits, there are conventions about how each one is usually used. To help remind you of these purposes, RISC-V also gives the registers alternative symbolic names. Wikipedia has a detailed table with all of these names that I won’t reproduce here. Here are some register names that will be relevant immediately:

  • x0 is also known as zero. It is unique among all RISC-V registers because it cannot be written: it always holds the all-0s value. If you try to update this register, the write is ignored. Having quick access to “64 zeroes” turns out to be useful for many programs.
  • x10 through x17 are also known as a0 through a7.
  • x5, x6, x7, and x28 through x31 are also known as t0 through t6.
  • x8, x9, and x18 through x27 are also known as s0 through s11.

The latter 3 sets of registers (aN, tN, and sN) have subtly different conventions that have to do with function calls, which we’ll cover later. For now, however, you can think of them as interchangeable places to put values when we’re operating on them. You absolutely do not need to memorize the alternative names for every register—you just need to know that there are multiple names. This way, you know that our instruction above is exactly equivalent to:

add x10, x10, x11

…because it just uses different names for the same registers. These alternate names are just an assembly language phenomenon (i.e., for human readability), and the machine code for these two versions looks exactly the same.
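If it helps to see the naming scheme as arithmetic, here is a rough C sketch of the mapping from the aN, tN, and sN names listed above to xN register numbers. The function names are hypothetical, and each one returns -1 for an index outside the range given in the list.

```c
/* aN: a0..a7 are x10..x17. */
int a_to_x(int n) {
    return (n >= 0 && n <= 7) ? 10 + n : -1;
}

/* tN: t0..t2 are x5..x7, and t3..t6 are x28..x31. */
int t_to_x(int n) {
    if (n >= 0 && n <= 2) return 5 + n;
    if (n >= 3 && n <= 6) return 28 + (n - 3);
    return -1;
}

/* sN: s0..s1 are x8..x9, and s2..s11 are x18..x27. */
int s_to_x(int n) {
    if (n >= 0 && n <= 1) return 8 + n;
    if (n >= 2 && n <= 11) return 18 + (n - 2);
    return -1;
}
```

So a0 and a1 map to 10 and 11, which is why add a0, a0, a1 and add x10, x10, x11 are the same instruction.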

Three-Operand Form

Most RISC-V instructions take three operands, so they look like this:

<name> <operand>, <operand>, <operand>

The name tells us what operation the instruction should do, and the three operands tell us what values it will operate on. So our example is an add instruction, with three register operands: a0, a0, and a1.

In these three-operand instructions, the first one is the destination register and the second two are the source registers. You’ll sometimes see the format of the add instruction written like this:

add rd, rs1, rs2

The mnemonic is that r* are register operands, d means destination, and s means source. So our instruction add a0, a0, a1 adds the values in a0 and a1 and puts the result in a0. It is allowed, and extremely common, for the same register to be used both as a source and a destination.

Using the Manual

Working with assembly code entails reading the manual. A lot. In other languages, you can quickly build up an intuition for what all the basic components mean. In assembly languages, there are usually so many instructions that you need to look them up continuously. Expect to work with assembly with your code in one hand and the ISA manual in the other.

Navigate to this site’s RISC-V Assembly resource page. I recommend using the RISC-V reference card linked there all the time. In rare circumstances where you need more details, you can use the (very long) specification document. I’ll refer to the reference card here.

The first page of the reference card tells us what each instruction means. To understand our add instruction, we can find it on the list to see the format, a short English description, and a somewhat cryptic pseudocode description of the semantics.

The second page tells us how to encode the instruction as actual machine-code bits. We’ll cover the encoding strategy next.

Instruction Encodings

Every assembly instruction corresponds to a 32-bit value. This correspondence is called the instruction encoding.

For example, we know that the add instruction we’re working with, when assembled, encodes to the value 0x00b50533. Why those particular bits?

In RISC-V, instruction encodings use one of a few different formats, which it calls “types.” You can see a list of all the formats on the second page of the reference card: R-, I-, S-, B-, U-, and J-type (another list that you should not attempt to memorize). Each format comes with a little diagram mapping out the purpose of each bit in the 32-bit range.

Add Instruction

add is an R-type instruction (so named because all the operands are registers). Reading from the least-significant to most-significant bits, the map of the bits in an R-type instruction consists of:

  • 7 bits for the opcode. The opcode determines which instruction this is. The reference card tells us that the opcode for add is 0110011, in binary.
  • 5 bits for rd, the destination register. It makes sense that the register is 5 bits because there are a total of $2^5 = 32$ possible registers. So to use destination register x10, we’d put the binary value 01010 into this field.
  • 3 function bits. (We’ll come back to this in a moment.)
  • The first source register operand, rs1. Also 5 bits.
  • The second source register, rs2. 5 bits again.
  • 7 more function bits.

In RISC-V, the function bit fields (labeled funct3 and funct7) specify more about how the instruction should work. They’re kind of a supplement to the opcode. For example, the table tells us that add and sub (and many others) actually share an opcode, and the bits in funct3 and funct7 tell us which operation to perform. To encode an add, set all the bits in both fields to zero.
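One way to read the R-type bit map is as a series of shifts and masks. Here is a minimal C sketch that pulls the six fields out of a 32-bit instruction word; the struct and function names are my own inventions for illustration, not anything from the reference card.

```c
#include <stdint.h>

/* The six R-type fields, from least-significant to most-significant. */
struct rtype_fields {
    uint32_t opcode; /* bits 0..6   (7 bits) */
    uint32_t rd;     /* bits 7..11  (5 bits) */
    uint32_t funct3; /* bits 12..14 (3 bits) */
    uint32_t rs1;    /* bits 15..19 (5 bits) */
    uint32_t rs2;    /* bits 20..24 (5 bits) */
    uint32_t funct7; /* bits 25..31 (7 bits) */
};

/* Shift each field down to bit 0, then mask off its width. */
struct rtype_fields decode_rtype(uint32_t word) {
    struct rtype_fields f;
    f.opcode = word & 0x7f;
    f.rd     = (word >> 7) & 0x1f;
    f.funct3 = (word >> 12) & 0x07;
    f.rs1    = (word >> 15) & 0x1f;
    f.rs2    = (word >> 20) & 0x1f;
    f.funct7 = (word >> 25) & 0x7f;
    return f;
}
```

Decoding 0x00b50533 this way should give opcode 0110011, rd 10, rs1 10, rs2 11, and zeros in both function fields.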

So now we can describe exactly how to encode our example instruction, add x10, x10, x11. Again starting with the least-significant bits:

  • The opcode (7 bits): 0110011.
  • rd (5 bits): decimal 10, binary 01010.
  • funct3 (3 bits): 000.
  • rs1 (5 bits): decimal 10, binary 01010 (again).
  • rs2 (5 bits): decimal 11, binary 01011.
  • funct7 (7 bits): 0000000.

Try stringing these bits together and converting to hex. You should get the hex value the assembler produced for us, 0x00b50533. Some handy tools for doing these conversions include:

  • Bitwise, an interactive tool that runs in your terminal for experimenting with data encodings.
  • The macOS Calculator app. Press ⌘3 to switch to “programmer mode.”
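If you would rather let the machine string the bits together, here is a small C sketch of the same R-type encoding steps. The function name encode_rtype is hypothetical; the field positions follow the bit map described above.

```c
#include <stdint.h>

/* Pack the six R-type fields into one 32-bit instruction word.
 * Each field is masked to its width, then shifted into position. */
uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                      uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7f) << 25)
         | ((rs2    & 0x1f) << 20)
         | ((rs1    & 0x1f) << 15)
         | ((funct3 & 0x07) << 12)
         | ((rd     & 0x1f) << 7)
         |  (opcode & 0x7f);
}
```

Calling encode_rtype(0, 11, 10, 0, 10, 0x33) corresponds to add x10, x10, x11, where 0x33 is the opcode 0110011 written in hex.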

Add-Immediate Instruction

To try another format, consider this instruction:

addi a0, a1, 42

This add-immediate instruction is different from add because one of the operands isn’t a register, it’s an immediate integer. The reference card tells us that this instruction uses a different format: I-type (the I is for immediate). The distinguishing feature in this format is that the most-significant 12 bits are used for this immediate value. (This field replaces the funct7 and rs2 fields from the R-type format.)

If we assemble this instruction, we get the 32-bit value 0x02a58513. The interesting part is the top 12 bits, which are 0000 0010 1010 or, in decimal, 42.
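Here is a comparable C sketch for the I-type layout. The function name encode_itype is made up, and the opcode value 0x13 (binary 0010011) is simply what the low 7 bits of 0x02a58513 work out to. The immediate is masked to 12 bits, so a negative immediate ends up in its 12-bit two’s complement form.

```c
#include <stdint.h>

/* Pack I-type fields: a 12-bit immediate takes the place of funct7 and rs2. */
uint32_t encode_itype(int32_t imm, uint32_t rs1, uint32_t funct3,
                      uint32_t rd, uint32_t opcode) {
    /* Converting to uint32_t is well defined and keeps the low 12 bits
     * of the two's complement representation after masking. */
    uint32_t imm12 = (uint32_t)imm & 0xfff;
    return (imm12 << 20)
         | ((rs1    & 0x1f) << 15)
         | ((funct3 & 0x07) << 12)
         | ((rd     & 0x1f) << 7)
         |  (opcode & 0x7f);
}
```

With a1 as x11 and a0 as x10, encode_itype(42, 11, 0, 10, 0x13) corresponds to addi a0, a1, 42.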

Let’s Write an Assembly Program

Let’s try out our new reading-the-manual skills to write an assembly program from scratch. Our program will compute $(34 - 13) \times 2$. We’ll implement the multiplication with a left shift, so our program will work like the C expression (34 - 13) << 1.

When writing assembly, it can help to start by writing out some pseudocode where each statement is roughly the complexity of an instruction and all the variables are named like registers. Here’s a Python-like reformatting of that expression:

a0 = 34
a1 = a0 - 13
a2 = a1 << 1

I’ve used three different registers just for illustrative purposes; we could definitely have just reused a0.

Let’s translate this program to assembly one line at a time:

  1. We need to put the constant value 34 into register a0. Remember the add-immediate instruction? And remember the special x0 register that is always zero? We can combine these to do something like a0 = 0 + 34, which works just as well. The instruction is addi a0, x0, 34.
  2. Now we need to subtract 13. Let’s look at the reference card. There is no subtract-immediate instruction… but we can add a negative number. Let’s try the instruction addi a1, a0, -13.
  3. Finally, let’s look for a left-shift instruction in the reference card. We can find slli, for shift left logical immediate. The final instruction we need is slli a2, a1, 1.

Here’s our complete program:

addi a0, x0, 34
addi a1, a0, -13
slli a2, a1, 1

To try this out, we could compile it to machine code, but this would be a little hard to work with because we’d need to craft the assembly code to print stuff out. (We’ll cover more about how to do this over the coming weeks.) Instead, a handy resource that you can find linked from our RISC-V assembly resources page is this online RISC-V simulator. Try pasting this program into the web interface and clicking the “Run” or “Step” buttons to see if we got it right: i.e., that the program puts the result $(34 - 13) \times 2$ into register a2.
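As a sanity check before reaching for the simulator, you can also mirror the three instructions in C, one statement per instruction. This is only a sketch: the function name run_program is hypothetical, and plain local variables stand in for the registers.

```c
#include <stdint.h>

/* Mirror the three-instruction program, one C statement per instruction. */
int64_t run_program(void) {
    int64_t x0 = 0;           /* the always-zero register */
    int64_t a0 = x0 + 34;     /* addi a0, x0, 34  */
    int64_t a1 = a0 + (-13);  /* addi a1, a0, -13 */
    int64_t a2 = a1 << 1;     /* slli a2, a1, 1   */
    return a2;
}
```

The value returned here should match what the simulator shows in a2 after the last instruction.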

Logical Operations in RISC-V

RISC-V has a full complement of instructions to do bitwise logical operations. Remember using &, |, <<, and >> for masking and combining in bit packing code? These instructions implement those C-level constructs.

Basic Logic

To start with:

  • Bitwise and: and, andi
  • Bitwise or: or, ori
  • Bitwise exclusive or (xor): xor, xori

These are all three-operand instructions. All of these instructions operate on all 64 bits in the registers at once. They also all have a register version and an immediate version; the latter one has the i suffix. The forms of the instructions are like:

xor rd, rs1, rs2
xori rd, rs1, imm

So the first version takes two register inputs, while the second takes a register and an immediate.

What About Not?

There is no (real) bitwise “not” instruction. The reason is that ~x is equivalent to x ^ -1, i.e., XORing the value with the all-ones value. If you spend some quality time with the XOR truth table, you’ll notice that you can think of it this way:

  • The first input to the XOR is a bunch of bits. You want to flip some of these bits.
  • The second input contains 1s in all the places where you want to flip the bit in the first input. Where this input is zero, leave the other bits alone.

So XORing with an all-ones value means “flip all the bits.” Instead of a proper “not” instruction, you can use xori:

xori rd, rs1, -1

In fact, RISC-V has made your life somewhat easier: it lets you write a pseudo-instruction to mean this. So in assembly code, you can actually pretend there is a not instruction:

not rd, rs1

But there is no separate opcode for not; it is not a real instruction. The assembler will translate the line of assembly above into an xori instruction for you. Keeping the number of “real” instructions small—by eliminating needless instructions that can be easily implemented with other instructions—keeps processors small, simple, and efficient. This is the reduced instruction set computer (RISC) philosophy.
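The XOR trick is easy to check in C. In this sketch, the function names are hypothetical, and UINT64_MAX plays the role of the all-ones value that the immediate -1 becomes in a 64-bit register.

```c
#include <stdint.h>

/* Flip exactly the bits of x where mask has a 1; leave the rest alone. */
uint64_t flip_bits(uint64_t x, uint64_t mask) {
    return x ^ mask;
}

/* "not" built from XOR with all ones, like xori rd, rs1, -1. */
uint64_t not_via_xor(uint64_t x) {
    return flip_bits(x, UINT64_MAX);
}
```

For any input, not_via_xor(x) should agree with C’s own ~x on a 64-bit unsigned value.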

Aside: Extension and Truncation

We will frequently need to change the size (the number of bits) of various values. For example, we’ll need to take an 8-bit value and treat it as a 64-bit value, and we’ll need to take a 64-bit value and treat it as a 32-bit value. When you increase the number of bits, that’s called extension, and when you decrease the size, that’s called truncation. The goal in both situations is to avoid losing information whenever possible: that is, to keep the same represented integer value when converting between sizes.

Truncation

Truncation from m bits to n bits works by extracting the lowest (least significant) n bits from the value. There is, sadly, no way to avoid losing information in some cases. Here are some examples:

  • Let’s truncate the 64-bit value 0x00000000000000ab to 32 bits. In decimal, this number has the value 171. Truncating to 32 bits yields 0x000000ab. That’s also 171. Awesome!
  • Let’s truncate 0xffffffffffffffab to 32 bits. That’s the value -85 in two’s complement. Truncating yields 0xffffffab. That’s still -85. Excellent!
  • Now let’s truncate the bits 0x80000000000000ab (note the 8 in the most-significant hex digit). That’s a really big negative value, because the leading bit is 1. Truncating yields 0x000000ab, which represents 171. That’s bad—we now have a different value. But losing some information is inevitable when you lose some bits.
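In C, converting to a narrower unsigned type does exactly this kind of truncation. Here is a minimal sketch of the three examples above; the function name truncate_64_to_32 is made up, and I’m working with unsigned types so we can compare raw bit patterns without worrying about how signed conversions behave.

```c
#include <stdint.h>

/* Keep only the lowest 32 bits of a 64-bit value. */
uint32_t truncate_64_to_32(uint64_t value) {
    return (uint32_t)(value & 0xffffffffu);
}
```

The third example is the lossy one: 0x80000000000000ab and 0x00000000000000ab truncate to the same 32-bit pattern.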

Extension

There are two modes for extending from $m$ bits to $n$ bits. Both work by putting the value in the $m$ least-significant bits of the $n$-bit output. The difference is in what we do with the extra $n - m$ bits, which are the most-significant (upper) bits in the output.

  • Zero extension fills the upper bits with zeroes.
  • Sign extension fills them with copies of the most-significant bit in the input. (That is, the sign bit.)

Let’s see some examples.

  • Let’s zero-extend 0xffffffab (remember, that’s -85) to 64 bits. The result is 0x00000000ffffffab, a pretty big positive number (4294967211 in decimal). So we didn’t preserve the value.
  • Now let’s sign-extend the same value. Because the most significant bit in the 32-bit input is 1, we fill in the upper 32 bits with 1s. The output is 0xffffffffffffffab in hex, or -85 in decimal. So we preserved the value!

The moral of the story is: when extending unsigned numbers, use zero extension; when extending signed numbers, use sign extension.
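Here is a rough C sketch of both extension modes, from 32 to 64 bits. The function names are hypothetical. I’ve written sign extension out by hand, filling the upper bits explicitly, so the code follows the definition above rather than relying on how C converts between signed types.

```c
#include <stdint.h>

/* Zero extension: the upper 32 bits become 0. */
uint64_t zero_extend_32_to_64(uint32_t value) {
    return (uint64_t)value;
}

/* Sign extension: the upper 32 bits become copies of bit 31. */
uint64_t sign_extend_32_to_64(uint32_t value) {
    uint64_t result = (uint64_t)value;
    if (value & 0x80000000u) {
        result |= 0xffffffff00000000ull; /* sign bit is 1: fill with 1s */
    }
    return result;
}
```

Feeding 0xffffffab to each function reproduces the two examples above: one result has zeros in the upper half, the other has ones.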

Shifts

RISC-V has bit-shifting instructions to implement C’s << and >>. Here are the ones for shifting left:

  • slli rd, rs1, imm: Shift left by an immediate amount.
  • sll rd, rs1, rs2: Shift left by an amount in a register.

No surprises here. But for rightward shifts, RISC-V has twice as many versions:

  • srl and srli: Shift right logical.
  • sra and srai: Shift right arithmetic.

What is the difference between an arithmetic and a logical shift? It is similar to the difference between sign extension and zero extension: what matters is what you do with the most-significant n bits that weren’t there before. That is, if you shift right by n bits, you just drop the original value’s least significant n bits, but what should you put for the output value’s most significant n bits? The two versions differ in their answer:

  • Logical shift right: Fill in those n most-significant bits with 0s.
  • Arithmetic shift right: Fill them in with copies of the sign bit.

Say, for example, that you have a register containing the negative number -3410, in two’s complement.

  • If you use srai to do an arithmetic shift right, you fill in the top bit with a copy of the original number’s sign bit, which is a 1. So the result is still negative: -1705.
  • If you instead use srli to do a logical shift right, the most-significant bit of the output will be a 0. So the result will be a very large positive number.

As with sign- and zero-extension, you want to use logical right shifts for unsigned numbers and arithmetic right shifts for signed numbers.
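To see the two right shifts side by side, here is a C sketch that works on raw 64-bit patterns. The names srl64 and sra64 are made up. The arithmetic version fills in the upper bits by hand, because C leaves the result of >> on a negative signed value up to the implementation. Both functions assume a shift amount between 0 and 63.

```c
#include <stdint.h>

/* Logical shift right: the vacated upper bits become 0. */
uint64_t srl64(uint64_t x, unsigned n) {
    return x >> n;
}

/* Arithmetic shift right: the vacated upper bits copy the sign bit. */
uint64_t sra64(uint64_t x, unsigned n) {
    uint64_t shifted = x >> n;
    if (n > 0 && (x >> 63) != 0) {
        /* Sign bit was 1: set the top n bits of the result. */
        shifted |= ~(UINT64_MAX >> n);
    }
    return shifted;
}
```

Starting from the bit pattern for -3410, sra64 with a shift of 1 yields the pattern for -1705, while srl64 yields a value just under 2^63.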

Consider asking yourself: why is there no separate arithmetic left shift?

An Example

Imagine that x10 contains the value 0x34ff. What does x12 contain after you run these instructions?

slli x12, x10, 0x10
srli x12, x12, 0x08
and x12, x12, x10

Try working through the instructions one step at a time. It can save time to write the values in the registers in hex, if you can imagine the corresponding binary in your head.

The result value is 0x3400.
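If you want to check your work, here is the same sequence as a C sketch, with one statement per instruction. The function name shift_and_mask is hypothetical, and the intermediate values in the comments assume x10 starts out as 0x34ff.

```c
#include <stdint.h>

/* Mirror the three instructions; x10 is the input, x12 the result. */
uint64_t shift_and_mask(uint64_t x10) {
    uint64_t x12;
    x12 = x10 << 16;  /* slli x12, x10, 0x10: 0x34ff0000 */
    x12 = x12 >> 8;   /* srli x12, x12, 0x08: 0x0034ff00 */
    x12 = x12 & x10;  /* and  x12, x12, x10:  0x00003400 */
    return x12;
}
```

The net effect of the two shifts is a left shift by 8, so the final and keeps only the bits that 0x34ff00 and 0x34ff have in common.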