Lecture 11: Deadlock mitigation

Deadlock prevention

We can design a system so that deadlock cannot occur by making any one of the four necessary conditions (mutual exclusion, no preemption, hold-and-wait, circular wait) impossible.

Breaking mutual exclusion

In some cases, deadlock can be mitigated by making resources more shareable. For example, using a reader/writer lock instead of a mutex can make deadlock less likely (since many readers can share the read lock). Using a lock-free data structure is another way to allow multiple threads to access a data structure simultaneously (without blocking).

However, many resources are inherently non-shareable (e.g. printers: can't print two documents simultaneously!). Mutual exclusion is a good condition to break if you can, but often you can't.

Breaking no-preemption

In some situations we can make resources preemptible. If a process tries to acquire a resource that is held by another process, we can allow the requesting process to steal the resource.

In order to do this, we need some mechanism for rollback: we need to be able to restore the program invariants that the resource was being held to protect.

For example, if the resource is a lock protecting a shared variable, we could roll back the thread that holds the lock by restoring the state of the shared variable to the state it held before the lock was acquired, and restarting the process that was performing the update.
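The rollback idea can be sketched in a few lines. This is a minimal illustration, not a full preemption mechanism: the class `RollbackCell` and its methods `begin`, `commit`, and `rollback` are hypothetical names, and the sketch assumes the protected state is small enough to snapshot with a deep copy.

```python
import copy


class RollbackCell:
    """Hypothetical shared variable whose in-progress update can be undone."""

    def __init__(self, value):
        self.value = value
        self._snapshot = None

    def begin(self):
        # Save the state as it was when the lock was acquired.
        self._snapshot = copy.deepcopy(self.value)

    def commit(self):
        # The update finished; the snapshot is no longer needed.
        self._snapshot = None

    def rollback(self):
        # Restore the pre-acquisition state. The preempted update
        # must be restarted from the beginning by its owner.
        self.value = self._snapshot
        self._snapshot = None


cell = RollbackCell({"balance": 100})
cell.begin()
cell.value["balance"] -= 30   # partial update by the thread holding the lock
cell.rollback()               # the resource is stolen; undo the partial update
```

After the rollback the cell holds its original contents again, so the thread that stole the resource sees a consistent state.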

Once we allow computations to be rolled back, we introduce the possibility that two threads can continue to preempt each other forever. Although the system is not deadlocked (both threads seem to be making forward progress), the system may never actually finish its tasks. This state is called livelock: when competing threads are continuously being rolled back before they can finish.

It is not possible to make all resources preemptible. I/O is a well-known impediment to rollback: once some output has been performed, it may be impossible to return to a consistent state. Once you tell the user you've started processing their order, you can't take it back.

Breaking hold-and-wait

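We can break hold-and-wait by requiring threads to acquire all of the locks they need at once: a thread that needs an additional lock first releases the locks it holds and then re-acquires them all together.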

Releasing locks may require rollback, which leads to the same issues described above.
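One way to get "all at once" behavior is an all-or-nothing acquire built from non-blocking lock attempts. The sketch below uses Python's `threading.Lock`; the helper names `try_acquire_all` and `release_all` are hypothetical, and the second half simulates contention from a single thread rather than starting real competing threads.

```python
import threading


def try_acquire_all(locks):
    """Acquire every lock in `locks`, or none of them. Returns True on success."""
    held = []
    for lock in locks:
        if lock.acquire(blocking=False):
            held.append(lock)
        else:
            # Could not get everything: give back what we hold
            # instead of holding it while we wait.
            for h in reversed(held):
                h.release()
            return False
    return True


def release_all(locks):
    for lock in reversed(locks):
        lock.release()


a, b = threading.Lock(), threading.Lock()

first = try_acquire_all([a, b])     # both locks are free, so this succeeds
b_was_held = b.locked()
release_all([a, b])

b.acquire()                         # simulate another thread holding b
second = try_acquire_all([a, b])    # fails, and must not keep holding a
a_left_free = not a.locked()
b.release()
```

A caller whose attempt fails has to retry later. If two threads retry in lockstep they can keep failing forever, which is the livelock problem described above; a randomized delay before retrying is a common way to make that unlikely.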

Monitors partially use this strategy to avoid hold-and-wait: calling wait on a condition variable automatically releases the lock, so that acquiring the monitor lock cannot cause deadlock. However, it is still possible to create a form of deadlock with a monitor where one thread needs to wait for some predicate before updating state in a way that satisfies another predicate, while a second thread waits for the second predicate before making the first predicate true.

Breaking circular wait: lock ordering

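An alternative approach is to impose a global order on locks and require every thread to acquire locks in that order. A thread then only ever waits for a lock that comes later in the order than every lock it already holds, so a cycle of waiting threads cannot form.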
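Lock ordering means that every thread acquires locks in one agreed global order, so no cycle of waiting threads can form. A minimal sketch follows, assuming that ordering locks by `id()` is acceptable within a single process; the `Account` class and the helpers `acquire_in_order` and `transfer` are hypothetical names chosen for illustration.

```python
import threading
from contextlib import ExitStack


def acquire_in_order(*locks):
    """Acquire locks in a fixed global order (here: by id()).

    Returns an ExitStack; leaving its `with` block releases the locks.
    """
    stack = ExitStack()
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)   # blocks until this lock is acquired
    return stack


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Both directions of transfer lock the two accounts in the same order.
    with acquire_in_order(src.lock, dst.lock):
        src.balance -= amount
        dst.balance += amount


x, y = Account(100), Account(100)
threads = [threading.Thread(target=transfer, args=(x, y, 1)) for _ in range(20)]
threads += [threading.Thread(target=transfer, args=(y, x, 1)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the sort, a transfer from x to y and a concurrent transfer from y to x could each hold one lock while waiting for the other.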

Deadlock detection

Another general strategy for dealing with deadlock is to simply detect it and respond if it occurs. One can respond by either killing a thread (and releasing all of its resources), or by forcing it to roll back.

A simple, practical way to detect deadlock is to put a time limit on the acquisition of resources and to treat a timeout as a suspected deadlock. You may end up killing threads that were merely slow, but code that is expected to run under deadlock detection needs to be able to handle thread death anyway, so killing a few extra threads isn't so bad (this is an example of an end-to-end argument, which we'll discuss in more detail when studying networking).
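In Python, a time limit on acquisition is available directly through the `timeout` argument of `threading.Lock.acquire`. The sketch below is only an illustration: the wrapper name `acquire_with_deadline` and the 0.05 second limit are arbitrary choices, and the "other holder" is simulated from the same thread (a `Lock` is not reentrant, so the second attempt cannot succeed).

```python
import threading


def acquire_with_deadline(lock, seconds):
    """Return True if the lock was acquired within `seconds`, else False."""
    return lock.acquire(timeout=seconds)


lock = threading.Lock()

lock.acquire()                                      # someone already holds the lock
timed_out = not acquire_with_deadline(lock, 0.05)   # gives up after 50 ms
lock.release()

got_it = acquire_with_deadline(lock, 0.05)          # lock is free now
if got_it:
    lock.release()
```

What to do on a timeout (kill the thread, roll back, or retry) is a policy decision that this sketch leaves open.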

A more precise method is to keep track of the resource allocation graph and check it for deadlock. This check can be done periodically or when a new resource is requested.

In order to detect deadlock, we can use the following algorithm:

  1. Make a copy of the resource allocation graph.
  2. While there is a process or resource with no outgoing edge:
    • erase it (this is like running it to completion),
    • and erase all incoming edges to it (since it relinquishes all its resources when it finishes).
  3. If you can remove all processes, there is no deadlock: you can run all processes to completion.
  4. If you can't, then there must be a cycle, so the system is deadlocked.
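The steps above can be sketched as follows. The representation is an assumption made for illustration: the graph is given as a list of `(source, destination)` edges, where an edge from a process to a resource means the process is waiting for it and an edge from a resource to a process means the resource is held by it. The function name `find_deadlocked` is hypothetical.

```python
def find_deadlocked(edges):
    """Reduce the graph; return the set of nodes that could not be erased."""
    # Step 1: build a private copy of the graph as node -> set of successors.
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    # Step 2: repeatedly erase nodes that have no outgoing edge.
    progress = True
    while progress:
        progress = False
        for node in list(graph):
            if not graph[node]:
                del graph[node]
                for successors in graph.values():
                    successors.discard(node)   # erase incoming edges
                progress = True

    # Steps 3 and 4: anything left lies on, or waits on, a cycle.
    return set(graph)


# P1 holds R1 and waits for R2; P2 holds R2 and waits for R1.
cyclic = [("R1", "P1"), ("P1", "R2"), ("R2", "P2"), ("P2", "R1")]
# P1 holds R1; P2 waits for R1.
acyclic = [("R1", "P1"), ("P2", "R1")]

stuck = find_deadlocked(cyclic)
free = find_deadlocked(acyclic)
```

In the first graph nothing can be erased, so all four nodes are reported; in the second, P1, then R1, then P2 are erased and the result is empty.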

Deadlock avoidance: Banker's algorithm

The banker's algorithm is a slight variation on deadlock detection: instead of detecting whether there is currently a deadlock, we keep track of the maximum potential requests that each process might make, and we block any request that, if granted, could lead to deadlock in the future should some processes go on to request their maximum allocation.

The idea behind the banker's algorithm is that we keep track of every process's maximum and current allocations, as well as the current number of unallocated resources. We maintain the invariant that the state is safe: there is some sequence of processes P1, P2, P3, ... such that we can

  • run P1 to completion using all of the currently available resources,
  • then run P2 to completion using both the currently available resources and P1's resources (which we can, since P1 is finished),
  • then run P3 using resources allocated to no-one, P1, or P2,
  • then run P4 using resources allocated to no-one, P1, P2, or P3,
  • etc.

Whenever a process requests a resource, we check whether granting that request would leave the system in a safe state or not. If it would, we grant the request. If not, we block the request until more resources become available.

Checking for safety is straightforward, because running a process to completion only frees up more resources for future processes. Thus, we can choose any completable process to run; we will not prevent ourselves from finding a safe schedule.

to check for safety:
  make a copy of the current allocation table and the available resources
  while processes remain in the copy:
    choose any process whose remaining need (maximum minus current allocation) fits within the available resources
      if there is none: the state is not safe
    add that process's allocated resources to the available resources
    remove that process from the copy

  if you complete the loop, the state is safe.
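A runnable sketch of the safety check, together with the request handling described earlier, might look like this. The data layout is an assumption: `available` is a list with one entry per resource type, and `allocation` and `maximum` map each process name to a list of the same length. The function names `is_safe` and `try_grant` and the example numbers are hypothetical.

```python
def is_safe(available, allocation, maximum):
    """Return True if some order lets every process run to completion."""
    free = list(available)          # work on copies, not the live tables
    remaining = dict(allocation)
    while remaining:
        runnable = next(
            (p for p in remaining
             if all(maximum[p][i] - remaining[p][i] <= free[i]
                    for i in range(len(free)))),
            None,
        )
        if runnable is None:
            return False            # nobody can finish: not safe
        held = remaining.pop(runnable)
        free = [f + h for f, h in zip(free, held)]   # it finishes and frees these
    return True


def try_grant(available, allocation, maximum, proc, request):
    """Grant `request` to `proc` only if the resulting state is safe."""
    if any(r > a for r, a in zip(request, available)):
        return False                # not enough free resources right now
    new_available = [a - r for a, r in zip(available, request)]
    new_allocation = dict(allocation)
    new_allocation[proc] = [h + r for h, r in zip(allocation[proc], request)]
    if any(h > m for h, m in zip(new_allocation[proc], maximum[proc])):
        raise ValueError("request exceeds the declared maximum")
    if not is_safe(new_available, new_allocation, maximum):
        return False                # caller should block and retry later
    available[:] = new_available    # commit the grant in place
    allocation[proc] = new_allocation[proc]
    return True


# One resource type, 12 units in total, 3 currently unallocated.
available = [3]
allocation = {"P1": [5], "P2": [2], "P3": [2]}
maximum = {"P1": [10], "P2": [4], "P3": [9]}

safe_now = is_safe(available, allocation, maximum)
grant_p3 = try_grant(available, allocation, maximum, "P3", [1])
grant_p2 = try_grant(available, allocation, maximum, "P2", [1])
```

In this example the starting state is safe (run P2, then P1, then P3). Giving P3 one more unit would leave only 2 free, enough to finish P2 but then not enough for P1 or P3, so that request is refused; giving P2 one unit keeps the same safe order available, so it is granted.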