Lecture 5: Intro to synchronization

Finished up scheduling
- details for multilevel queue, realtime scheduling
Intro to synchronization
- design exercise: milk problem
- key terms: safety, liveness, fairness, race condition

Scheduling coda

We filled in the last details of the adaptive multilevel queue. High-priority jobs should have short quanta, because they are I/O bound and should not need long CPU allocations. Conversely, low-priority jobs, which tend to be CPU bound, should have long quanta.

Always running high-priority jobs first can starve low-priority jobs. Instead, we can cycle between the queues, spending a fixed amount of time in each queue before moving on to the next. For example, we may have four queues with quanta of 2, 4, 8, and 16 ms, and spend 16 ms in each of these queues before starting again. That is, assuming the queues are all full, we would run 8 high-priority jobs, then 4 medium-priority jobs, then 2 lower-priority jobs, and then 1 low-priority job. We would then repeat, running another 8 high-priority jobs, and so on.
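The cycling policy above can be sketched as follows. The sketch assumes that every job uses its entire quantum and that every queue stays full. It ignores the adaptive part of the scheduler, so jobs are never moved between queues. The function names and job labels are hypothetical.

```python
from collections import deque

def jobs_per_turn(quanta_ms, time_per_queue_ms):
    """How many jobs each queue runs during its time slice,
    assuming every job uses its entire quantum."""
    return [time_per_queue_ms // q for q in quanta_ms]

def run_one_cycle(queues, quanta_ms, time_per_queue_ms):
    """Visit each queue in priority order, spending time_per_queue_ms in it.

    queues[i] is a deque of job names at priority level i (0 = highest).
    Within a queue, jobs run round-robin: a job that has used its quantum
    goes to the back of its own queue (no promotion or demotion here).
    Returns the (level, job) pairs in the order they ran.
    """
    ran = []
    for level, (queue, quantum) in enumerate(zip(queues, quanta_ms)):
        remaining = time_per_queue_ms
        while queue and remaining >= quantum:
            job = queue.popleft()
            ran.append((level, job))
            remaining -= quantum
            queue.append(job)  # job still wants CPU; requeue it at the same level
    return ran

quanta = [2, 4, 8, 16]
print(jobs_per_turn(quanta, 16))  # [8, 4, 2, 1]

queues = [deque(f"q{level}-job{i}" for i in range(10)) for level in range(4)]
schedule = run_one_cycle(queues, quanta, 16)
counts = [sum(1 for lvl, _ in schedule if lvl == level) for level in range(4)]
print(counts)  # [8, 4, 2, 1]
```

The low-priority queue still gets 16 ms in every cycle, so its jobs cannot starve.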

We also briefly discussed real-time scheduling. Real-time schedulers allow processes to request scheduling guarantees, such as a CPU burst of 10 ms sometime within the next 100 ms. To provide these guarantees, the scheduler must perform admission control. It needs the ability to deny requests for resources, and the ability to kill or deschedule processes that try to use more resources than they requested.
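Here is a minimal sketch of admission control. It assumes that each request asks for a fixed CPU burst within every period of a fixed length. It also assumes the scheduler refuses any request that would push the total reserved CPU fraction above 100%. This simple utilization test is an assumption: whether it is sufficient depends on the scheduling algorithm actually used. `Reservation`, `AdmissionController`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Reservation:
    burst_ms: float   # CPU time requested ...
    period_ms: float  # ... within each window of this length

    @property
    def utilization(self) -> float:
        return self.burst_ms / self.period_ms

@dataclass
class AdmissionController:
    capacity: float = 1.0                         # fraction of one CPU available for reservations
    admitted: dict = field(default_factory=dict)  # pid -> Reservation

    def request(self, pid: int, res: Reservation) -> bool:
        """Admit only if the total reserved utilization stays within capacity."""
        total = sum(r.utilization for r in self.admitted.values())
        if total + res.utilization > self.capacity:
            return False  # deny: this guarantee could not be honored
        self.admitted[pid] = res
        return True

    def check_usage(self, pid: int, used_ms: float) -> bool:
        """False if pid used more CPU this period than it reserved; the
        scheduler would then kill or deschedule the process."""
        return used_ms <= self.admitted[pid].burst_ms

ac = AdmissionController()
print(ac.request(1, Reservation(10, 100)))  # True: 10% reserved
print(ac.request(2, Reservation(50, 100)))  # True: 60% reserved
print(ac.request(3, Reservation(50, 100)))  # False: would need 110%
print(ac.check_usage(1, 12))                # False: overran its 10 ms burst
```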

We will discuss other forms of admission control in more detail when we discuss deadlock avoidance and thrashing later in the course.

The milk problem

We spent the remainder of class working on the following problem. Suppose we wish to write code for two threads that ensures that, after either thread completes its code, a common resource will have been acquired once and only once.

For example, the threads may represent roommates who both wish to use some milk from a shared fridge. If the milk is gone, one of them should run to the store and purchase it, but we should avoid having both roommates purchase milk at the same time.

The tools at our disposal so far: threads share memory, so they can load and store values to shared variables. We can think of this as a shared notepad that the roommates can both read and write on.

Evaluation criteria

Whenever solving synchronization problems, we must consider three criteria:

safety: the code does not violate the functional spec. In the milk example, the code would violate safety if one of the threads completes without milk having been bought, or if milk is bought twice. Safety is often summarized by saying "bad things don't happen."
liveness: the code does not prevent threads from making progress. In the milk example, both roommates must eventually be able to complete the milk acquisition and go on to make their omelettes. Liveness is often summarized by "good things do happen."
fairness: the code should not favor one participant over another. A solution to the milk problem would not be fair if, for example, only one roommate ever bought milk.

These criteria are easy to satisfy independently; it is difficult to satisfy all of them together. For example, the following code is safe but not live:

safe but not live
Shared state: (none)

Thread one code:

1: while true:
2:   do nothing

Thread two code:

3: while true:
4:   do nothing

The following is live but not safe:

live but not safe
Shared state: (none)

Thread one code:

1: do nothing

Thread two code:

2: do nothing

The following code is safe and live but not fair:

live and safe but not fair
Shared state:

has_milk = False

Thread one code:

1: while not has_milk:
2:   do nothing

Thread two code:

3: buy_milk()
4: has_milk = True

First attempts

Perhaps the most obvious thing to try is the following:

First attempt (not safe)
Shared state:

has_milk = False

Thread one code:

1: if not has_milk:
2:   buy_milk()
3:   has_milk = True

Thread two code: (same)

4: if not has_milk:
5:   buy_milk()
6:   has_milk = True

seems safe: each thread calls buy_milk if and only if there is no milk.
not safe: consider the following sequence of events:

thread one executes line 1, discovers that has_milk is false, and continues to line 2.
a context switch to thread two occurs.
thread two executes line 4, discovers that has_milk is false, and so executes lines 5 and 6.
a context switch occurs, returning to thread one, which is about to execute line 2. Thread one executes lines 2 and 3.

The milk has been bought twice, violating safety.
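The interleaving above can be reproduced with real Python threads. In this sketch, `threading.Event` objects stand in for the scheduler and force the context switches to happen exactly where the trace puts them. The event names and the `purchases` counter are illustrative additions, not part of the original pseudocode.

```python
import threading

has_milk = False
purchases = 0

def buy_milk():
    global purchases
    purchases += 1

# Hypothetical events that force the "context switches" from the trace.
t1_checked = threading.Event()   # thread one has executed line 1
t2_finished = threading.Event()  # thread two has executed lines 4-6

def thread_one():
    global has_milk
    if not has_milk:             # line 1: sees False
        t1_checked.set()         # "context switch" to thread two
        t2_finished.wait()       # resume only after thread two is done
        buy_milk()               # line 2: buys a second time
        has_milk = True          # line 3

def thread_two():
    global has_milk
    t1_checked.wait()            # run only after thread one's check
    if not has_milk:             # line 4: still False
        buy_milk()               # line 5
        has_milk = True          # line 6
    t2_finished.set()

threads = [threading.Thread(target=thread_one), threading.Thread(target=thread_two)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("purchases:", purchases)  # 2: safety is violated
```

Without the events, the same bug still exists, but it only shows up when the real scheduler happens to switch threads between line 1 and line 2.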

One proposed idea is a "lock variable" that keeps a thread from going to the store while the other thread is working on the problem at all:

second attempt (still not safe)

Shared state:

has_milk     = False
someone_busy = False

Thread one code:

1: while someone_busy:
2:   do nothing
3: someone_busy = True
4: if not has_milk:
5:   buy_milk()
6:   has_milk = True
7: someone_busy = False

Thread two code: (same)

11: while someone_busy:
12:   do nothing
13: someone_busy = True
14: if not has_milk:
15:   buy_milk()
16:   has_milk = True
17: someone_busy = False

The intent is that only one thread at a time can execute the critical section between lines 3 and 7 (lines 13 and 17 for thread two). A thread that arrives while someone_busy is true will notice that someone is already in the critical section and spin in its loop on lines 1 and 2 (or 11 and 12).

Unfortunately this code is still not safe, because a context switch can occur after a thread finishes line 1 but before it executes line 3. Specifically:

thread one executes line 1. someone_busy is false, so it proceeds to line 3.
a context switch occurs; thread two is scheduled.
thread two executes line 11. someone_busy is still false, so it proceeds to execute lines 13 and 14. has_milk is false, so it is about to execute line 15.
a context switch occurs. Thread one (which was paused at line 3) executes lines 3 and 4. has_milk is still false, so it also executes lines 5, 6, and 7.
a context switch occurs, returning to thread two (which was paused on line 15). It executes lines 15, 16, and 17.

Again, milk has been bought twice, violating safety.
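Instead of hunting for a bad interleaving by hand, we can let a program enumerate all of them. The sketch below is a tiny explicit-state search: a breadth-first search over both program counters plus the shared variables. It assumes that each numbered line executes atomically. Thread two's lines 11-17 are modeled as lines 1-7 of the same code, tagged with a thread index. `step` and `find_double_purchase` are hypothetical names.

```python
from collections import deque

DONE = 8  # program counter value meaning "finished line 7"

def step(state, t):
    """Execute the line at thread t's program counter atomically.

    state = (pcs, has_milk, someone_busy, purchases); t is 0 (thread one)
    or 1 (thread two). Thread two's lines 11-17 are modeled as 1-7.
    """
    pcs, has_milk, busy, bought = state
    pcs = list(pcs)
    pc = pcs[t]
    if pc == 1:                   # while someone_busy:
        pcs[t] = 2 if busy else 3
    elif pc == 2:                 # do nothing, then re-test the loop
        pcs[t] = 1
    elif pc == 3:                 # someone_busy = True
        busy = True
        pcs[t] = 4
    elif pc == 4:                 # if not has_milk:
        pcs[t] = 5 if not has_milk else 7
    elif pc == 5:                 # buy_milk()
        bought += 1
        pcs[t] = 6
    elif pc == 6:                 # has_milk = True
        has_milk = True
        pcs[t] = 7
    elif pc == 7:                 # someone_busy = False
        busy = False
        pcs[t] = DONE
    return (tuple(pcs), has_milk, busy, bought)

def find_double_purchase():
    """Return the shortest schedule, as (thread, line) pairs, that buys
    milk twice, or None if no interleaving does."""
    start = ((1, 1), False, False, 0)
    parent = {start: None}        # state -> (previous state, move taken)
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state[3] >= 2:         # milk bought twice: safety violated
            trace = []
            while parent[state] is not None:
                state, move = parent[state]
                trace.append(move)
            return trace[::-1]
        for t in (0, 1):
            pc = state[0][t]
            if pc == DONE:
                continue
            nxt = step(state, t)
            if nxt not in parent:
                name = "thread one" if t == 0 else "thread two"
                parent[nxt] = (state, (name, pc + 10 * t))
                frontier.append(nxt)
    return None

for who, line in find_double_purchase():
    print(f"{who} executes line {line}")
```

The schedule it prints may differ from the one above, but it has the same shape: both threads pass their loop test before either one sets someone_busy.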

A third proposal was to use an operating-system-level lock to do the synchronization for us, perhaps by having the operating system deschedule the other thread:

a third attempt (defines away the problem)

Shared state:

has_milk     = False

Thread one code:

1: system_call_to_force_thread_2_to_wait()
2: if not has_milk:
3:   buy_milk()
4:   has_milk = True
5: system_call_to_wake_up_thread_2()

Thread two code: (symmetric)

11: system_call_to_force_thread_1_to_wait()
12: if not has_milk:
13:   buy_milk()
14:   has_milk = True
15: system_call_to_wake_up_thread_1()

However, since this is 4410, we can't just assume that our operating system magically works. If we think about how we would implement this, the handlers for the wait and wake-up system calls must solve a similar synchronization problem: access to the shared ready and waiting queues and to the TCBs needs to be carefully coordinated. On a single-processor machine, this can be done by disabling interrupts or by programming the ready and waiting queues very carefully. On a multiprocessor machine, however, it requires us to solve a problem equivalent to the original one.
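For comparison, language runtimes expose the kind of primitive the third attempt assumes. In Python, `threading.Lock` plays the role of the hypothetical wait and wake-up system calls, and the sketch below uses it to solve the milk problem. As argued above, this only moves the problem: the lock's own implementation, inside the interpreter and the operating system beneath it, must still coordinate the threads somehow. The name `fridge_lock` is illustrative.

```python
import threading

has_milk = False
purchases = 0
fridge_lock = threading.Lock()  # stands in for the OS-level wait/wake-up calls

def buy_milk():
    global purchases
    purchases += 1

def get_milk():
    global has_milk
    with fridge_lock:           # at most one thread at a time past this point
        if not has_milk:
            buy_milk()
            has_milk = True

threads = [threading.Thread(target=get_milk) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("purchases:", purchases)  # always 1
```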

Working code

The following solution is safe, live and fair. Note that it can be generalized to multiple threads, but it is not obvious how to do so.

a correct solution

Shared state:

has_milk  = False
working_1 = False
working_2 = False
turn      = 0

Thread one code:

 1: working_1 = True
 2: turn = 2
 3: while working_2 and turn == 2:
 4:   do nothing
 5: if not has_milk:
 6:   buy_milk()
 7:   has_milk = True
 8: working_1 = False

Thread two code: (symmetric)

11: working_2 = True
12: turn = 1
13: while working_1 and turn == 1:
14:   do nothing
15: if not has_milk:
16:   buy milk
17:   has_milk = True
18: working_2 = False

The idea behind this code is that neither thread can take control from the other. A thread can only yield control to the other, by setting turn to the other thread's number on line 2 (or 12).
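This solution is the two-thread algorithm commonly known as Peterson's algorithm. Below is a sketch of it using Python threads. Threads are numbered 0 and 1 instead of 1 and 2, and `MilkFridge` and `run_trial` are hypothetical names.

The sketch assumes that loads and stores to the shared fields take effect in program order. That holds in the standard CPython interpreter, because its global interpreter lock lets only one thread run Python bytecode at a time. In C on a modern multiprocessor, the same code would need atomic variables or memory fences, because the hardware may reorder the stores on lines 1-2 with the loads on line 3.

```python
import threading
import time

class MilkFridge:
    """Shared state for the milk problem, guarded by Peterson's algorithm."""

    def __init__(self):
        self.has_milk = False
        self.working = [False, False]  # working_1 and working_2 in the notes
        self.turn = 0                  # initial value is irrelevant: each thread sets it before the loop
        self.purchases = 0

    def buy_milk(self):
        self.purchases += 1

    def get_milk(self, me):
        other = 1 - me
        self.working[me] = True                             # line 1 / 11
        self.turn = other                                   # line 2 / 12: yield to the other thread
        while self.working[other] and self.turn == other:  # line 3 / 13
            time.sleep(0)                                   # line 4 / 14: spin, but let the other thread run
        if not self.has_milk:                               # line 5 / 15
            self.buy_milk()                                 # line 6 / 16
            self.has_milk = True                            # line 7 / 17
        self.working[me] = False                            # line 8 / 18

def run_trial():
    fridge = MilkFridge()
    threads = [threading.Thread(target=fridge.get_milk, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fridge.purchases

results = [run_trial() for _ in range(200)]
print("purchase counts seen:", set(results))
```

A plain `pass` in the spin loop would also work. `time.sleep(0)` just gives up the CPU so that the waiting thread wastes less time.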

This code is safe, live, and fair, although the argument is rather complicated:

safety: clearly, by the time either thread finishes, milk will have been bought at least once. However, we must show that it is bought at most once.
Suppose otherwise, that is, that both lines 6 and 16 are executed. This implies that thread one was on lines 5-7 at the same time that thread two was on lines 15-17 (otherwise, whichever thread checked has_milk second would have found it true). One of the two threads must have exited its while loop first. Without loss of generality, assume it was thread one. When it exited the loop on line 3, one of two things was true:
- working_2 was false. In this case thread two has not executed line 11 yet. Thread two cannot exit its loop on lines 13-14 until thread one executes line 8. By the time thread two reaches line 13, turn will be 1 (thread two sets it on line 12, and thread one never writes turn again) and working_1 will be true.
- working_2 was true but turn == 1. In this case thread two must have executed line 12 after thread one executed line 2, so turn can never become 2. Thus the only way that thread two can escape the loop on line 13 is if working_1 becomes false, which only happens when thread one executes line 8.
liveness: the only places the threads can get stuck are the spin loops on lines 3 and 13. However, the threads cannot both be stuck at the same time, because turn cannot be both 1 and 2. Once one of the threads gets past its spin loop, it will eventually set its working variable to false, which allows the other thread to exit its spin loop.
fairness: the code is completely symmetric, and thus fair.
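The safety and liveness arguments can also be checked mechanically for this small program. The sketch below enumerates every reachable state (both program counters plus the shared variables), assuming each numbered line executes atomically. It then checks two things:

- no reachable state has thread one on lines 5-7 while thread two is on lines 15-17, and milk is never bought twice;
- from every reachable state, some schedule lets both threads finish, so the threads can never get permanently stuck.

Fairness still rests on the symmetry argument. `step` and `check` are hypothetical names.

```python
from collections import deque

DONE = 9  # program counter value meaning "finished line 8"

def step(state, t):
    """Execute the line at thread t's program counter atomically.

    state = (pcs, working, turn, has_milk, purchases); t is 0 for thread
    one and 1 for thread two, whose lines 11-18 are modeled as 1-8.
    turn uses the notes' values: 0 initially, then 1 or 2.
    """
    pcs, working, turn, has_milk, bought = state
    pcs, working = list(pcs), list(working)
    other = 1 - t          # index of the other thread
    other_num = other + 1  # the other thread's number in the notes (1 or 2)
    pc = pcs[t]
    if pc == 1:            # working_me = True
        working[t] = True
        pcs[t] = 2
    elif pc == 2:          # turn = other thread's number
        turn = other_num
        pcs[t] = 3
    elif pc == 3:          # while working_other and turn == other:
        pcs[t] = 4 if (working[other] and turn == other_num) else 5
    elif pc == 4:          # do nothing, then re-test the loop
        pcs[t] = 3
    elif pc == 5:          # if not has_milk:
        pcs[t] = 6 if not has_milk else 8
    elif pc == 6:          # buy_milk()
        bought += 1
        pcs[t] = 7
    elif pc == 7:          # has_milk = True
        has_milk = True
        pcs[t] = 8
    elif pc == 8:          # working_me = False
        working[t] = False
        pcs[t] = DONE
    return (tuple(pcs), tuple(working), turn, has_milk, bought)

def in_critical_section(pc):
    return 5 <= pc <= 7    # about to run line 5, 6, or 7 (15-17 for thread two)

def check():
    start = ((1, 1), (False, False), 0, False, 0)
    edges = {}             # state -> list of successor states
    seen = {start}
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        edges[s] = [step(s, t) for t in (0, 1) if s[0][t] != DONE]
        for n in edges[s]:
            if n not in seen:
                seen.add(n)
                frontier.append(n)

    # Liveness: walk the edges backwards from the states where both threads are done.
    final = [s for s in seen if s[0] == (DONE, DONE)]
    preds = {s: [] for s in seen}
    for s, succs in edges.items():
        for n in succs:
            preds[n].append(s)
    can_finish, work = set(final), deque(final)
    while work:
        for p in preds[work.popleft()]:
            if p not in can_finish:
                can_finish.add(p)
                work.append(p)

    return {
        "reachable_states": len(seen),
        "mutual_exclusion": not any(
            in_critical_section(s[0][0]) and in_critical_section(s[0][1]) for s in seen),
        "at_most_one_purchase": all(s[4] <= 1 for s in seen),
        "milk_bought_when_done": all(s[4] == 1 for s in final),
        "never_stuck": can_finish == seen,
    }

for prop, value in check().items():
    print(f"{prop}: {value}")
```

Replacing `step` with the second attempt's lines makes the same search report `at_most_one_purchase: False`.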