Threads
The next several lectures will all be about doing multiple computations at once. As we saw in the previous lecture, real software needs to deal with concurrency (managing different events that might all happen at the same time) and parallelism (harnessing multiple processors to get work done faster than a single processor on its own). Compared to sequential code, concurrency and parallelism require fundamental changes to the way software works and how it interacts with hardware.
Here are some examples of software that needs concurrency or parallelism:
- A web server needs to handle concurrent requests from clients. It cannot control when requests arrive, so they may be concurrent.
- A web browser might want to issue concurrent requests to servers. This time, the software can control when requests happen—but for performance, it is a good idea to let requests overlap. For example, you can start a request to server A, start a request for server B, and only then wait for either request to finish. That’s concurrency.
- A machine learning application wants to harness multiple CPU cores to make its linear-algebra operations go faster: for example, by dividing a matrix across several cores and working on each partition in parallel.
Threads are an OS concept that a single process can use to exploit concurrency and parallelism.
What Is a Thread?
A thread is an execution state within a process. One process has one or more threads. Each thread has its own thread-specific state: the program counter, the contents of all the CPU registers, and the stack. However, all the threads within a process share a virtual address space, and they share a single heap.
One way to define a thread is to think of it as “like a process, but within a process.” That is, you already know that processes have separate code (so they can run separate programs), register states, separate program counters, and separate memory address spaces. Threads are similar, except that threads exist within a process, and all threads within a process share their virtual memory. All threads within a process are running the same program (they have the same text segment)—they may just execute different parts of that program concurrently. Threads also share the data segment and file descriptors.
When a process has multiple threads, it has multiple stacks in memory. Recall the typical memory layout for a process. When there are multiple threads, everything remains the same (the heap, text, and data segments are all unchanged) except that there are multiple stacks coexisting side-by-side in the virtual address space.
The threads within a process share a single heap. That means that threads can easily communicate through the heap: one thread can allocate some memory and put some data there and then simply let another thread read that data. This shared memory mechanism is both incredibly convenient and ridiculously error prone. (We will get more experience with the problems it can cause later.)
The thread’s state includes the registers (including the program counter and the stack pointer). The OS scheduler takes care of switching not only between processes but also between the threads in a process. When the computer has multiple CPU cores (as all modern machines do), the OS may also choose to schedule concurrent threads onto separate cores when there are multiple threads with work to do.
Why Threads?
You may be wondering why we might use threads. Further, do threads make sense with just a single core (spoiler: yes!)?
The key benefit of threads over processes is that all threads within a process run the same program and share virtual memory. This encourages a natural program structure: it would be rather clunky and tedious to fork() off separate child processes to update the screen, fetch data, and receive user input. Processes must use an inter-process communication mechanism (e.g., signals, pipes, files) to pass data between each other, and these mechanisms also tend to be significantly more expensive performance-wise.
Since they share memory, threads make it easy to write programs that must perform logically concurrent tasks. Even on a system with a single core, threads can make programs more responsive and efficient. One thread could be processing data in a buffer while another fetches new data to push onto the end of the same buffer. Yet another thread could be responsible for updating the screen.
pthreads
Now that we know what threads are and why they are important, how do we program with them?
Unsurprisingly, Unix provides a standard library, called POSIX Threads, or affectionately, pthreads, that contains procedures for managing threads and synchronizing them.
Next week we will dive deeper into the world of parallel programming, but for now, we will stick with the basics.
You can read the entire pthread.h header to see what’s available.
Spawning & Joining Threads
The pthread_create function launches a new thread. It’s a tiny bit like fork and exec for processes, but it creates a new thread within the current process instead of creating a new subprocess. Here’s its signature:
int pthread_create(pthread_t* thread, const pthread_attr_t* attr,
                   void* (*thread_func)(void*), void* arg);
We’ll revisit the other arguments next week, but the important ones for now are:
- The first argument, thread, is a pthread_t pointer to initialize. This value is what the parent will use to interact with its brand-new child thread.
- The third argument, thread_func, is a function pointer to the code to run in the new thread. The thread function has to have a specific signature: void* thread_func(void* arg). The void* argument and return types are C’s way of letting the thread function receive and return “anything.”
It’s OK (for now) to pass NULL for the other parameters.
So the basic recipe for spawning a new thread looks like this:
void* thread_func(void* arg) {
// code to run in a new thread!
}
// ...
pthread_t thread;
pthread_create(&thread, NULL, thread_func, NULL);
Whenever you spawn a thread, you will also want to wait for it to finish, a.k.a. join the thread. There is a pthreads call for that too: the pthread_join function:
int pthread_join(pthread_t thread, void** out_value);
We will again ignore the second parameter for a moment (it can be NULL). The first parameter is the pthread_t value that we previously initialized with pthread_create. The call to pthread_join blocks until the given thread finishes.
Putting it all together, here’s a complete program that launches a thread and then properly waits for it to finish:
#include <stdio.h>
#include <pthread.h>
void* my_thread(void* arg) {
printf("Hello from a child thread!\n");
return NULL;
}
int main() {
printf("Hello from the main thread!\n");
pthread_t thread;
pthread_create(&thread, NULL, my_thread, NULL);
pthread_join(thread, NULL);
printf("Main thread is done!\n");
return 0;
}
In order to compile this program, we need to include the -lpthread option to tell GCC to link the pthreads library:
$ gcc threads.c -o threads -lpthread
When we run the program, three messages are printed in order:
Hello from the main thread!
Hello from a child thread!
Main thread is done!