Multicore
One of the two motivations we used when introducing threads was the idea of harnessing parallel hardware to make computations go faster. Parallelism is important because the overwhelming majority of computers in the modern world are parallel. When was the last time (if ever) that you saw a laptop for sale with a single-core CPU? Core counts like 8 are much more common today. Even the Apple Watch has a dual-core processor, and the Samsung Galaxy Watch has five cores! At the other end of the spectrum, server processors have core counts like 96 and 192. The result is that, when performance matters, parallelism is the only way to take full advantage of the hardware.
Multicore processors are designed to enhance computing performance by incorporating multiple cores within a single chip. Each core can fetch and execute its own instruction stream independently, allowing separate tasks, or partitioned pieces of a single task, to run in parallel. This architecture is now the norm across modern computing devices, which require high performance for a wide range of applications.
Amdahl’s Law
However, Amdahl’s Law highlights a fundamental limit on performance improvement in parallel computing. It states that the overall performance gain from parallelism is bounded by the portion of the task that must be executed serially. This law serves as a caution against expecting unlimited scaling from adding more parallel resources. For example, suppose a matrix sum spends 80% of its computation in work that can be partitioned and performed in parallel, while the remaining 20% is scalar and must be performed serially. Eventually the serial portion dominates and limits performance: in this example, 5x speedup is the maximum no matter how finely we divide the parallel portion.
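In symbols, if a fraction p of a program’s work can be spread across n cores while the remaining 1 − p must run serially, Amdahl’s Law gives the achievable speedup:

$$ S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p} $$

For the matrix sum above, p = 0.8, so the speedup is capped at 1/(1 − 0.8) = 5 regardless of how many cores we add.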
Multicore and Parallelism
The need for multiple cores in devices like smartphones is driven by the demand for higher computing power. Moore’s Law, which predicts the exponential growth in the number of transistors on a chip, has been a guiding principle in the development of multicore processors. Increasing clock frequencies was once the primary strategy for improving performance, but it has hit its limits due to heat and power constraints. This is the breakdown of Dennard scaling: once transistors shrank past a certain point, power density no longer fell with feature size, so clock speeds could not keep climbing without chips running too hot.
Other methods to increase performance included instruction-level parallelism (ILP) techniques such as pipelining, multi-issue (also known as superscalar) processors, out-of-order execution, speculative execution, register renaming, and many others. Take CS 4420 (ECE 4750) to learn more. Ultimately, these techniques used too much power and dissipated too much heat. Instead, modern RISC-based processors with simple pipelines, and without many of these aggressive ILP approaches, have become dominant again for multicore processors because they better balance performance and power.
Threads and Synchronization
Parallel programming involves partitioning work so that all cores have tasks to execute. Coordination and synchronization are crucial to manage communication overhead and ensure efficient execution. Writing parallel programs requires careful consideration of the underlying architecture to optimize performance.
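As a concrete illustration, here is one common partitioning pattern in C with POSIX threads (a minimal sketch, not a tuned implementation; the array size and thread count are arbitrary choices): statically split an array sum across a fixed number of threads, then combine the per-thread partial sums.

```c
#include <pthread.h>
#include <stdio.h>

#define N        (1 << 20)   /* number of elements */
#define NTHREADS 4           /* assumed core count for this sketch */

static double data[N];

struct task {
    int    start, end;   /* half-open range [start, end) owned by one thread */
    double partial;      /* this thread's partial sum */
};

/* Each thread sums only its own slice: no sharing, so no locking needed. */
static void *sum_slice(void *arg) {
    struct task *t = arg;
    double s = 0.0;
    for (int i = t->start; i < t->end; i++)
        s += data[i];
    t->partial = s;
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    struct task tasks[NTHREADS];

    for (int i = 0; i < N; i++)
        data[i] = 1.0;

    /* Partition: give each thread a contiguous chunk of the array. */
    int chunk = N / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        tasks[i].start = i * chunk;
        tasks[i].end   = (i == NTHREADS - 1) ? N : (i + 1) * chunk;
        pthread_create(&threads[i], NULL, sum_slice, &tasks[i]);
    }

    /* Combine: the only serial step is joining and adding NTHREADS values. */
    double total = 0.0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(threads[i], NULL);
        total += tasks[i].partial;
    }
    printf("sum = %f\n", total);
    return 0;
}
```

Note that the remaining serial work, creating the threads and adding up NTHREADS partial sums, is exactly the serial fraction that Amdahl’s Law says will eventually dominate.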
Threads are a fundamental mechanism for exploiting parallelism. They allow multiple sequences of instructions to be executed concurrently. Synchronizing parallel programs involves using atomic instructions and hardware support to manage access to shared resources, preventing race conditions and ensuring correct execution.
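For example, C11 exposes the hardware’s atomic read-modify-write instructions through <stdatomic.h>. The following sketch (thread and iteration counts are arbitrary) contrasts an atomic counter with a plain counter that is deliberately updated with a racy read-modify-write.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS    1000000

static atomic_long safe_count = 0;   /* updated with atomic instructions */
static long        racy_count = 0;   /* plain load/add/store: a data race */

static void *worker(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        atomic_fetch_add(&safe_count, 1);  /* one indivisible hardware operation */
        racy_count++;                      /* read, add, store can interleave
                                              with other threads: increments lost */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    /* safe_count is always NTHREADS * ITERS; racy_count usually comes up short. */
    printf("atomic: %ld  racy: %ld\n", atomic_load(&safe_count), racy_count);
    return 0;
}
```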
Writing parallel programs requires understanding threads and processes, critical sections, race conditions, and mutual exclusion (enforced with mutexes). These concepts help in managing the execution of multiple threads and ensuring that they do not interfere with each other.
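When a critical section spans more than a single update, a mutex is the usual tool: a thread locks before entering and unlocks on the way out, so at most one thread executes the section at a time. A minimal sketch with POSIX threads (the account fields and counts are invented for illustration):

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared state: the two fields must always change together. */
static long balance      = 0;
static long transactions = 0;

static void *deposit(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);
        /* Critical section: without the mutex, another thread could observe
         * balance already updated but transactions not yet (a race condition). */
        balance      += 10;
        transactions += 1;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, NULL);
    pthread_create(&t2, NULL, deposit, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("balance=%ld transactions=%ld\n", balance, transactions);
    return 0;
}
```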
Cache Coherency
One of the challenges in multicore systems is cache coherency. When multiple cores cache the same shared data, a write by one core would, without intervention, leave stale copies in the other cores’ caches, so different cores could see different values for the same memory location. Ensuring cache coherency is essential for maintaining the integrity of data across all cores.
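Hardware keeps caches coherent automatically, but the coherence traffic itself has a visible cost that software can feel. A minimal sketch of that effect in C with POSIX threads (the 64-byte line size is an assumption, and timings vary by machine): two threads incrementing two different counters that happen to share a cache line force that line to bounce between cores, while padding the counters onto separate lines removes the contention.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000UL

/* Two counters packed into the same cache line: every write by one core
 * invalidates the line in the other core's cache ("false sharing"). */
static struct { volatile unsigned long a, b; } together;

/* Padding pushes each counter onto its own cache line (64 bytes assumed). */
static struct {
    volatile unsigned long a;
    char pad[64];
    volatile unsigned long b;
} apart;

static void *bump(void *arg) {
    volatile unsigned long *ctr = arg;
    for (unsigned long i = 0; i < ITERS; i++)
        (*ctr)++;
    return NULL;
}

/* Run two threads, each hammering one counter, and time the pair. */
static double run_pair(volatile unsigned long *x, volatile unsigned long *y) {
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, bump, (void *)x);
    pthread_create(&b, NULL, bump, (void *)y);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same cache line:      %.2fs\n", run_pair(&together.a, &together.b));
    printf("separate cache lines: %.2fs\n", run_pair(&apart.a, &apart.b));
    return 0;
}
```

On many machines the padded version runs noticeably faster, because each counter’s cache line stays resident in one core’s cache instead of ping-ponging through the coherence protocol.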