Introduction and C

Syllabus and Setup

Please carefully read over the syllabus. Seriously! There is a lot in there that you will want to know.

In case you didn’t already hear it via the rumor mill, CS 3410 is changing. We have updated the curriculum this time around to focus on the essential topics we believe are critical to anyone studying computer science. Among many other changes, this means that there is more focus on programming in C and assembly, we regretfully needed to sacrifice all the digital-design assignments that used the Logisim for visual circuit design, and there is much more of an emphasis on parallelism (because, in the modern era, all computers are parallel).

There are two things you need to do this week:

  • An introductory survey on Gradescope. This is due on Friday.
  • Set up the RISC-V infrastructure that you will need for all assignments. Please make sure you get this done before your first lab section. If you need help, please post on Ed or find a TA in office hours.

This week’s assignment is the printf assignment. It serves as an introduction to the C programming language and lets you exercise your skills with numerical representation, binary, and other bases.

As with every assignment in this class, the lab is there to help you get started on the assignment. The lab instructors will help guide you through “step 0” for the printf assignment; then, the rest is up to you.

Recap: Two’s Complement

Let’s use a six-bit two’s complement representation. What numbers (in standard decimal notation) do these bit patterns represent?

  • 011000
  • 111111
  • 111011

The answers are:

  • \(24\). For positive numbers (where the sign bit is 0), you don’t have to think much about two’s complement; just read the remaining bits as a normal binary number.
  • \(-1\). Remember the tip from last time: the all-ones bit pattern is always \(-1\).
  • \(-5\). There are many ways to get here. One option is to notice that this number is exactly \(100_2\) less than the all-ones bit pattern, so it’s \(-1 - 4\).

Hello, C!

Much of the work for CS 3410 will consist of programming in C. If you have mainly programmed in the other Cornell-endorsed languages (Python, Java, and OCaml), the main difference you’ll notice in C is that it operates at a much lower level of abstraction. It gives you a far greater level of control over exactly what the computer does.

While this kind of low-level control is undeniably inconvenient and verbose, it has some extremely important advantages. The most common reasons to use a low-level language like C are:

  • Performance. Higher-level languages trade off convenience for speed. Often, programming in a low-level language is the only way to get the kind of efficiency you need.
  • Interactions with hardware. When you’re writing an operating system, a device driver, or anything else that interacts with hardware directly, you really need a low-level language.

There are other low-level languages that have the same advantages, such as C++ and Rust. However, C is unique because of its central position in the modern computing landscape. We can confidently say that almost everything you’ve ever done with a computer has eventually relied on software written in C. As just a few examples:

  • The Linux kernel is written in C.
  • The primary implementation of Python is written in C.
  • The C standard library is the de facto standard way that software interacts with operating systems. Even Rust programs rely on C’s standard library for things like printing to the console and opening files.
  • In general, whenever two different languages want to talk to each other, they go through C (via a foreign function interface).

Getting Started

Let’s write the smallest possible C program:

int main() {
    return 0;
}

Even this minimal program brings up a few basic things about C:

  • In basic ways, the syntax looks a little like Java. There are curly braces and semicolons. There is even a type called int. (This is because the designers of Java based its syntax on C.)
  • Unlike Java, however, there is no class definition here. You just write a main function at the top level; it’s not a method on some class. In fact, C doesn’t have classes or objects at all.
  • C is a statically typed language (like Java but not like Python). This means that C makes you declare the types of everything you write down. This example shows one type: the return type of the main function is int.
  • That return 0 for main determines the exit status for your program.

Let’s run our program. The commands you see here will assume you have followed our guide to setting up 3410’s RISC-V infrastructure, including setting up the rv alias. The rv alias works as a prefix that gives you access to the tools you need, so you can type any command you like after it. For example, you can type:

$ rv ls

and you’ll see similar results to running plain old ls.

Let’s compile the program, like this:

$ rv gcc minimal.c

where minimal.c is the name of the source file. GCC is the name of the compiler we’ll be using in this course.

That worked, but we actually recommend providing some more command-line options to the compiler whenever you use it. You can copy and paste our recommended options from the C compilation page. Then, add -o minimal to tell GCC where to put the output file (if you don’t, GCC picks the name a.out). So here’s a complete command:

$ rv gcc -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c17 -o minimal minimal.c

That produces an executable file, minimal. Now let’s run it:

$ rv qemu minimal

That runs the QEMU emulator to execute the compiled minimal program. It won’t print anything at all!

Printing

Here’s a slightly more exciting program:

#include <stdio.h>

int main() {
    printf("Hello, 3410!\n");
    return 0;
}

We’ve added two lines:

The \n in the string is an escape sequence that means a newline character. That’s the same as in Java.

Now let’s declare and print a variable:

#include <stdio.h>

int main() {
    int n = 3410;
    printf("Hello, %d!\n", n);
    return 0;
}

We added a variable declaration of n, with type int. Read more about the basic types in C.

To print out the number, printf exploits format specifiers in the string that you pass to it. Format specifiers look like %d: they always start with %, followed by a few characters that tell printf how to format stuff. The d in this one stands for decimal, because that’s the base it uses.

If you have n format specifiers in your printf string, you should pass n extra arguments after the string to printf. It will print each extra argument using each specified format, in order.

Let’s try some other format specifiers. %b prints ints in binary, and %x prints them in hex:

#include <stdio.h>

int main() {
    int n = 3410;
    printf("Decimal: %d\n", n);
    printf("Binary: %b\n", n);
    printf("Hexadecimal: %x\n", n);
    return 0;
}

Read more about format specifiers for printf.

Playing with Numbers

C makes it easy to put our new knowledge about binary numbers and two’s complement into practice. We’ll use the int8_t type, which is an integer with exactly 8 bits. (In lots of “normal” code, you can just use int to get a default-sized integer—but for these examples, we really want to use just 8 bits.)

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 7;
    printf("n = %hhd\n", n);
    return 0;
}

The %hhd format specifier is for printing the int8_t type in decimal. We also need to #include the stdint.h library to get the int8_t type.

We can also write our 8-bit number in binary notation:

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 0b00000111;
    printf("n = %hhd\n", n);
    return 0;
}

This should also print 7. An important thing to reassure yourself is that, in the two programs above, the variable n contains exactly the same value. There is no difference between the same number specified in decimal notation and binary notation; the choice is just a convenience for the programmer, and the compiler will translate either one into exactly the same value for the computer. (And that value will be in binary because, of course, everything is bits.)

We can also use the sign bit. What’s this value if we flip the top bit of 7 from 0 to 1?

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 0b10000111;
    printf("n = %hhd\n", n);
    return 0;
}

That prints -121. Maybe you can convince yourself this is correct by thinking of the largest negative value in 8 bits.

A Little More C

Let’s try the inversion trick from last time: the identity that, in two’s complement, ~x + 1 is equal to -x.

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 7;
    printf("n (binary) = %hhd\n", n);
    printf("n (decimal) = %hhb\n", n);

    int8_t flipped = ~n + 1;
    printf("flipped (binary) = %hhd\n", flipped);
    printf("flipped (decimal) = %hhb\n", flipped);

    return 0;
}

That worked for 7. To see a little more of C, let’s try checking that this works for every number we can represent with a char.

#include <stdio.h>
#include <stdint.h>

int8_t flip(int8_t num) {
    return ~num + 1;
}

int main() {
    for (int8_t i = -128; i < 127; ++i) {
        printf("i = %hhd\n", i);
        int8_t negated = -i;
        int8_t flipped = flip(i);
        if (negated != flipped) {
            printf("mismatch!\n");
        }
    }
    return 0;
}

This example shows off C’s for loops and if conditions. If you’re familiar with Java, these should look pretty familiar. Read more about control flow in C.

It also demonstrates function definitions in C.

If you run this program, there are no mismatches! So we can be pretty sure this trick works for all the int8_t values, even if you don’t want to try doing the math.

Overflow

Computer representations of integers (usually) have a fixed width, i.e., the number of bits they use: for example, int8_t always has 8 bits. This has some fun consequences.

In our last example, we had to think through the minimum and maximum values you can store in an int8_t. What happens if you exceed this value?

The C language has pretty annoying rules about this. For signed numbers, it is actually a silent error (a concept known as undefined behavior) to exceed the maximum, e.g., to add 1 to the biggest possible signed number. But it’s legal to do this for unsigned numbers. So we’ll try it out with the type uint8_t, which is the unsigned (only-positive) version of our friend int8_t. Here’s a loop that just adds 1 to an int8_t value many times:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint8_t num = 0;
    for (int i = 0; i < 500; ++i) {
        num += 1;
        printf("num = %hhu\n", num);
    }
    return 0;
}

If you run this program, you’ll see the number counting up from 1. When we reach 255, adding 1 takes us right back down to 0.

It can be helpful to think about the bits. 255 is the all-ones bit pattern: in 8 bits, 1111 1111. (Sometimes it’s helpful to put spaces in your binary numbers to group together 4 bits, just for legibility.) Adding one to this will “carry” all the way across, setting every bit to zero. The last carry bit would go in position 9, but because this is an 8-bit representation, the computer just drops that bit. And so, the result of the addition 1111 1111 + 0000 0001 is 0000 0000.

This behavior is called integer overflow and it is the source of many fun bugs in all kinds of software. Memorably, YouTube originally used a signed 32-bit number (i.e., an int) to represent the number of views for a video. That meant that the largest number of views that any video could have was \(2^{32 - 1} - 1\), or 2,147,483,647 views. The first video to exceed this number of views was PSY’s “Gangnam Style.” YouTube made a cute announcement when they had to change that value to a 64-bit integer. That should be plenty of views for a long time (more than 9 quintillion views).

Prototypes, Headers, Libraries, and Linking

There is a lot more to explore about C programming that you will learn through doing assignments in 3410. But here is one more concept I think will be helpful to see early.

Declarations Must Precede Uses

Here’s a tiny program with one function call:

#include <stdio.h>

void greet(const char* name) {
    printf("Hello, %s!\n", name);
}

int main() {
    greet("Lefty");
}

(As an aside, void is the “return type” you use for functions that don’t return anything, and const char* is the type of a string literal. We’ll learn more about why the * is in there later in the course.)

A fun quirk about C is that it wants declarations to come before uses. That means that it won’t work to call greet before we define it, like in this broken program:

#include <stdio.h>

int main() {
    greet("Lefty");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Prototypes, a.k.a. Declarations

As you can imagine, this restriction can get frustrating, and unworkable if you need mutual recursion. The way to fix it is to use a prototype, a.k.a. a declaration. A function declaration looks a lot like a function definition but omits the body. So this program works:

#include <stdio.h>

void greet(const char *name);

int main() {
    greet("Lefty");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

We just need to copy and paste the “signature” part of the function definition, put it at the top of the file, and add a semicolon. That makes it a declaration that means that the call to greet is legal.

Header Files

The need for these declarations is so common that programmers typically put them in a whole separate C source code file, called a header file. Header files are C files that, by convention, end with a .h instead of a .c and mostly just contain declarations. So we might put the declaration in greet.h:

void greet(const char *name);

We can use this declaration by #include-ing it:

#include <stdio.h>
#include "greet.h"

int main() {
    greet("Lefty");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Notice the difference between the #include <stdio.h> line and the #include "greet.h" line. The angle brackets search for built-in library headers; the quotation marks are for header files you write yourself and tell the compiler to look in the same directory as the source file.

In either case, #include works a lot like just “copying and pasting” the entire text of the file into your source program. So #include-ing greet.h looks the same to the compiler as a version that just includes the declaration right there.

Separating Source Files

Headers are also part of the mechanism that lets you break up long .c source files. Let’s say we want to create a separate greet.c library that just contains our greeting function:

#include <stdio.h>
#include "greet.h"

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Then, our main.c can use the library like this:

#include <stdio.h>
#include "greet.h"

int main() {
    greet("Lefty");
}

By “copying and pasting” the contents of greet.h here, the #include sorta works as a way to “import” the greet function so we can use it in main.

Linking Multiple Files

Now, however, we need a way to combine the two .c files into a single executable. One option is to just give both source files on the command line:

$ rv gcc main.c greet.c -o main

Notice that we don’t list header files when compiling the whole thing: only .c files, not .h files. Header files are just for #include-ing into other files, so the compiler already sees the contents of those files implicitly.

There’s another way too: it can be useful to compile the .c files separately and then link them together. Here’s what that looks like:

$ rv gcc -c main.c -o main.o
$ rv gcc -c greet.c -o greet.o
$ rv gcc main.o greet.o -o main

The first two lines, with -c, compile the source files to object files that end in .o. Then, the last command links the two object files together into an executable.

Separating it out this way can save you time. If you only change greet.c, for example, then you only need to re-compile that file and then re-link; you can skip re-compiling the unchanged main.c.