Basic Definitions
- A spin lock guarantees that no two processes execute the same critical-section
code simultaneously.
- A barrier requires that all processes reach a particular point before any process
continues past that point.
|
Key Concepts
- The poor performance of spin locks and barriers significantly affects the overall
scalability of shared-memory multiprocessors.
- Special-purpose hardware is not necessary. By minimizing the number of remote
references, software can reduce synchronization contention to effectively zero.
- The software must be designed to complement the specific hardware architecture.
|
Algorithms
Spin Locks
- Each process repeatedly executes a Test_and_Set instruction to query a global flag and,
if possible, set the flag. When a process succeeds in setting the flag, it has the lock.
Optimizations adjust when and how often the polling occurs, but the approach is
fundamentally unfair and generates heavy traffic and contention.
- Each process waits for its turn to get the lock by holding a ticket or a place in an
array. The MCS lock uses a linked list instead of an array so that a small, constant
amount of space is needed per lock and coherent caching is not required for good
performance. These approaches create less contention than Test_and_Set, and they
also guarantee fairness.
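The Test_and_Set approach above can be sketched with C11 atomics; the read-only inner loop is the "test-and-test-and-set" polling optimization, which spins on a cached copy of the flag instead of issuing atomic writes. This is an illustrative toy (all names are assumptions, not the paper's code):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

static atomic_int tas_locked = 0;       /* 0 = free, 1 = held */
static int tas_counter = 0;             /* protected by the lock */

static void tas_acquire(void) {
    for (;;) {
        /* "test": plain read that stays in the local cache */
        while (atomic_load_explicit(&tas_locked, memory_order_relaxed))
            ;
        /* "set": one atomic swap; succeeds only if the lock was free */
        if (!atomic_exchange_explicit(&tas_locked, 1, memory_order_acquire))
            return;
    }
}

static void tas_release(void) {
    atomic_store_explicit(&tas_locked, 0, memory_order_release);
}

static void *tas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        tas_acquire();
        tas_counter++;                  /* critical section */
        tas_release();
    }
    return NULL;
}
```

Under contention, every release wakes all waiters at once and triggers a stampede of atomic swaps; that burst of traffic, and the lack of any ordering among winners, is exactly the unfairness/contention problem noted above.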
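The ticket-lock idea above can also be sketched with C11 atomics: each arriving thread atomically takes the next ticket, then spins with read-only loads until now_serving reaches its number, so the lock is granted in strict FIFO order. A toy sketch (names are assumptions, not the paper's code):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

static atomic_uint next_ticket = 0;     /* next ticket to hand out */
static atomic_uint now_serving = 0;     /* ticket currently holding the lock */
static int ticket_counter = 0;          /* protected by the lock */

static void ticket_acquire(void) {
    /* take a place in line with a single atomic increment */
    unsigned my = atomic_fetch_add_explicit(&next_ticket, 1,
                                            memory_order_relaxed);
    /* spin with plain reads; no further atomic read-modify-writes */
    while (atomic_load_explicit(&now_serving, memory_order_acquire) != my)
        ;
}

static void ticket_release(void) {
    /* pass the lock to the next ticket holder */
    atomic_fetch_add_explicit(&now_serving, 1, memory_order_release);
}

static void *ticket_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        ticket_acquire();
        ticket_counter++;               /* critical section */
        ticket_release();
    }
    return NULL;
}
```

Because tickets are served in order, no waiter can starve; the remaining cost is that all waiters still poll one shared location, which the MCS list-based queue avoids by giving each waiter its own spin variable.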
Barriers
- When a process arrives at the barrier, it decrements a global counter. The process
that reaches the barrier last (i.e., decrements the counter to zero) resets the counter
and flips a global Boolean flag to allow the other processes to continue.
- Processes pair up, synchronizing with a series of different partners in successive
rounds.
- Processes pair up in a tournament where, after both partners reach the barrier, one
of them advances to the next round.
- Each process is part of a tree structure. When the process and its children have
reached the barrier, the process's parent is informed. When all processes have reached
the barrier, the go-ahead is passed down from parents to children.
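The counter-based barrier above can be sketched with C11 atomics using the standard sense-reversing trick: the last arrival resets the counter and flips a shared flag, while everyone else spins until the flag matches their thread-local sense. A toy sketch (names and thread count are assumptions):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

#define SB_NPROC 4

static atomic_int sb_count = SB_NPROC;  /* processes yet to arrive */
static atomic_int sb_sense = 0;         /* flipped to release waiters */
static atomic_int sb_arrived[3];        /* per-phase tallies, for the demo only */

static void sb_wait(int *local_sense) {
    *local_sense = !*local_sense;       /* flag value that will mean "released" */
    if (atomic_fetch_sub_explicit(&sb_count, 1, memory_order_acq_rel) == 1) {
        /* last arrival: reset the counter, then release everyone */
        atomic_store_explicit(&sb_count, SB_NPROC, memory_order_relaxed);
        atomic_store_explicit(&sb_sense, *local_sense, memory_order_release);
    } else {
        while (atomic_load_explicit(&sb_sense, memory_order_acquire)
               != *local_sense)
            ;                           /* spin until the flag flips */
    }
}

static void *sb_worker(void *arg) {
    (void)arg;
    int sense = 0;
    for (int phase = 0; phase < 3; phase++) {
        atomic_fetch_add(&sb_arrived[phase], 1);
        sb_wait(&sense);
        /* no process may pass before all have arrived */
        assert(atomic_load(&sb_arrived[phase]) == SB_NPROC);
    }
    return NULL;
}
```

Sense reversal lets the same counter and flag be reused across consecutive barrier episodes without a separate reinitialization step.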
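The tree-based barrier above might be sketched as follows. As a simplification, the wakeup here is a single flag flip by the root rather than the parent-to-children pass described above, and the layout (binary tree, parent of i is (i-1)/2) and all names are illustrative assumptions, not the paper's exact structure:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

#define TB_NPROC 4

static atomic_int tb_child[TB_NPROC][2];   /* arrival reports from children */
static atomic_int tb_go = 0;               /* flipped by the root to release all */
static atomic_int tb_arrived[3];           /* per-phase tallies, for the demo only */

static void tb_wait(int id, int *local_sense) {
    int l = 2 * id + 1, r = 2 * id + 2;
    *local_sense = !*local_sense;
    /* arrival phase: gather reports from children, if any */
    if (l < TB_NPROC) while (!atomic_load(&tb_child[id][0])) ;
    if (r < TB_NPROC) while (!atomic_load(&tb_child[id][1])) ;
    atomic_store(&tb_child[id][0], 0);      /* re-arm slots for the next barrier */
    atomic_store(&tb_child[id][1], 0);
    if (id == 0) {
        /* root: every report has propagated up, so release everyone */
        atomic_store(&tb_go, *local_sense);
    } else {
        /* report to parent: odd ids are left children (slot 0), even are right */
        atomic_store(&tb_child[(id - 1) / 2][id % 2 ? 0 : 1], 1);
        while (atomic_load(&tb_go) != *local_sense)
            ;                               /* wait for the root's go-ahead */
    }
}

static void *tb_worker(void *arg) {
    int id = (int)(long)arg;
    int sense = 0;
    for (int phase = 0; phase < 3; phase++) {
        atomic_fetch_add(&tb_arrived[phase], 1);
        tb_wait(id, &sense);
        assert(atomic_load(&tb_arrived[phase]) == TB_NPROC);
    }
    return NULL;
}
```

The point of the tree shape is that each location is polled by at most one waiter and written by at most one or two children, so arrival causes O(log P) serialized steps instead of P processes hammering one counter.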
|
Experimental Results
- The algorithms were tested on two different multiprocessor architectures: one used
distributed shared memory, while the other had a cache-coherent shared bus. The authors
do not believe an efficient algorithm can be designed for 'dance-hall' architectures.
- The MCS lock (which the authors designed) proved to be the most efficient in competitive
environments. Ticket locks worked best when the hardware's atomic fetch operations were
slow and single-processor latency was a concern.
- The best barrier algorithm was the tree-based one the authors designed. If the number
of processes varies from barrier to barrier, however, the centralized approach is best.
|
Questions
- Can these algorithms be used to solve other performance problems?
- Is there a way to design the lock and barrier algorithms so that they are less dependent
on the type of underlying hardware?
- Does the cost of special-purpose hardware justify the use of fairly complex algorithms?
|