# CS 316: Pipelined Architectures

Kavita Bala Fall 2007 Computer Science Cornell University

### Announcements

- PA 3
  - Lectures on it this Tue/Thu/Fri
  - Due on the Friday after Fall break
- Don't wait till the last minute
  - We are happy to help
  - Hazards will take time
















# Stage 2: Decode

- Reads the IF/ID pipeline register, decodes instruction, and reads register file (specified by regA and regB of instruction bits)
  - Decode can be easy: just pass on the opcode and let later stages work out their own control signals for the instruction
- Write everything needed to complete execution to the pipeline register (ID/EX)
  - Pass on the offset field and destination register specifiers (or simply pass on the whole instruction!)
  - Pass on PC+1 even though decode didn't use it
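The decode bullets above can be sketched as a small function. This is only an illustrative model, not the course's datapath: the pipeline registers are plain dicts, and the field names (`instr`, `val_a`, `val_b`, `dest`, `pc_plus_1`) and the pre-split instruction fields are assumptions made for the example.

```python
def decode(if_id, regfile):
    """Build the ID/EX pipeline register from the IF/ID pipeline register.

    if_id is a dict holding 'instr' (hypothetical, already split into
    fields) and 'pc_plus_1'. regfile is a list of register values.
    """
    instr = if_id["instr"]
    return {
        "opcode": instr["opcode"],        # later stages derive their own control signals
        "val_a": regfile[instr["regA"]],  # contents of regA
        "val_b": regfile[instr["regB"]],  # contents of regB
        "offset": instr["offset"],        # passed on untouched
        "dest": instr["dest"],            # destination register specifier
        "pc_plus_1": if_id["pc_plus_1"],  # not used by decode, needed later for branches
    }
```

Passing the opcode through unchanged mirrors the "easy decode" option: no control signals are computed here.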



# Stage 3: Execute

- Design a datapath that performs the proper ALU operation for the instruction specified and values present in the ID/EX pipeline register
  - The inputs are the contents of regA and either the contents of regB or the offset field in the instruction
  - Also, calculate PC+1+offset, in case this is a branch
- Write everything needed to complete execution to the pipeline register (EX/Mem)
  - ALU result, contents of regB and PC+1+offset
  - Instruction bits for opcode and destReg specifiers
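A minimal sketch of the execute stage described above, under stated assumptions: pipeline registers are modeled as dicts with hypothetical field names, the datapath is taken to be 32 bits wide, and only `add`, `nand`, `ld` and `st` are modeled (any other opcode falls through to an addition).

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def execute(id_ex):
    """Build the EX/Mem pipeline register from the ID/EX pipeline register."""
    op = id_ex["opcode"]
    a = id_ex["val_a"]
    # Second ALU input: the offset field for memory instructions,
    # otherwise the contents of regB.
    b = id_ex["offset"] if op in ("ld", "st") else id_ex["val_b"]
    if op == "nand":
        alu_result = ~(a & b) & WORD_MASK
    else:
        alu_result = (a + b) & WORD_MASK
    return {
        "opcode": op,
        "alu_result": alu_result,
        "val_b": id_ex["val_b"],  # a store needs this in the memory stage
        # Computed for every instruction, in case this is a branch.
        "branch_target": id_ex["pc_plus_1"] + id_ex["offset"],
        "dest": id_ex["dest"],
    }
```

For a load with regA holding 100 and an offset of 8, the ALU result is the address 108, and the contents of regB ride along unchanged.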



# Stage 4: Memory Operation

- Design a datapath that performs the proper memory operation for the instruction specified and values present in the EX/Mem pipeline register
  - ALU result contains address for ld and st instructions
  - Opcode bits control memory R/W and enable signals
- Write everything needed to complete execution to the pipeline register (Mem/WB)
  - ALU result and MemData
  - Instruction bits for opcode and destReg specifiers
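The memory stage can be sketched the same way. The dict-based pipeline registers, their field names, and the use of a Python dict as data memory are all assumptions for illustration. The opcode check stands in for the R/W and enable signals.

```python
def memory_stage(ex_mem, data_mem):
    """Build the Mem/WB pipeline register from the EX/Mem pipeline register."""
    op = ex_mem["opcode"]
    addr = ex_mem["alu_result"]  # address for ld and st
    mem_data = None
    if op == "ld":               # memory enabled, read
        mem_data = data_mem[addr]
    elif op == "st":             # memory enabled, write
        data_mem[addr] = ex_mem["val_b"]
    # Any other opcode leaves memory disabled.
    return {
        "opcode": op,
        "alu_result": ex_mem["alu_result"],
        "mem_data": mem_data,
        "dest": ex_mem["dest"],
    }
```

Arithmetic instructions pass through this stage without touching memory, and their ALU result is simply forwarded to Mem/WB.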



# Stage 5: Write Back

- Design a datapath that completes the execution of this instruction, writing to the register file if required
  - Write MemData to destReg for ld instruction
  - Write ALU result to destReg for arithmetic/logic instructions
  - Opcode bits also control register write enable signal

Thanks to Sally McK
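A rough sketch of write back, assuming dict-based pipeline registers with hypothetical field names and treating `add` and `nand` as the arithmetic/logic instructions. The branches of the `if` play the role of the register write enable signal.

```python
ALU_OPS = ("add", "nand")  # assumption: the arithmetic/logic opcodes modeled here


def write_back(mem_wb, regfile):
    """Complete the instruction, writing the register file if required."""
    op = mem_wb["opcode"]
    if op == "ld":
        regfile[mem_wb["dest"]] = mem_wb["mem_data"]    # write MemData
    elif op in ALU_OPS:
        regfile[mem_wb["dest"]] = mem_wb["alu_result"]  # write ALU result
    # Any other opcode (for example st): write enable stays off.
    return regfile
```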





























# Time Graphs

| Time | 1     | 2      | 3       | 4       | 5         | 6         | 7         | 8         | 9         |
|------|-------|--------|---------|---------|-----------|-----------|-----------|-----------|-----------|
| add  | fetch | decode | execute | memory  | writeback |           |           |           |           |
| nand |       | fetch  | decode  | execute | memory    | writeback |           |           |           |
| lw   |       |        | fetch   | decode  | execute   | memory    | writeback |           |           |
| add  |       |        |         | fetch   | decode    | execute   | memory    | writeback |           |
| sw   |       |        |         |         | fetch     | decode    | execute   | memory    | writeback |
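The time graph follows a simple rule: instruction i (counting from 0) occupies stage s during cycle i + s + 1. The sketch below regenerates the rows under the assumption of an ideal pipeline with no stalls. The function name and the row layout are illustrative only.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def time_graph(instrs):
    """Return one (name, row) pair per instruction for a stall-free pipeline.

    row[c] is the stage occupied during cycle c + 1, or "" if the
    instruction is not in the pipeline during that cycle.
    """
    total_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instrs):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i enters stage s in cycle i + s + 1
        rows.append((name, row))
    return rows


if __name__ == "__main__":
    for name, row in time_graph(["add", "nand", "lw", "add", "sw"]):
        print(f"{name:5}" + "".join(f"{stage:10}" for stage in row))
```

With five instructions and five stages the graph spans 5 + 5 - 1 = 9 cycles, compared with 25 if each instruction had to finish before the next one started.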

# **Pipelining Recap**

- Powerful technique for masking latencies
  - Logically, instructions execute one at a time
  - Physically, instructions execute in parallel
    - Instruction level parallelism
- Decouples the processor model from the implementation
  - Interface vs. implementation
- BUT dependencies between instructions complicate the implementation

## What can go wrong?

- Structural hazards
  - Two instructions in the pipeline try to simultaneously access the same resource
- Data hazards
  - A required operand is not ready
  - Usually because a previous instruction in the pipeline has not committed it to the register file yet
- Control hazards
  - The next instruction to fetch cannot be determined
  - Usually because a jump or branch instruction has not determined the next PC yet



# Handling Data Hazards

- Avoidance
  - Make sure there are no hazards in the code
  - Some compilers have done this (Multiflow Trace)
- Detect and Stall
  - If hazards exist, stall the processor until they go away
  - Safe, but not great for performance
- Detect and Forward
  - If hazards exist, fix up the pipeline to get the correct value (if possible)
  - Most common solution for high performance

# Handling Data Hazards I

- Just Avoid the Problems
- Compiler problem
- Beyond scope of this class
- BUT:
  - Know it's an option
  - Know it's doable

# Handling Data Hazards II

- Detect and Stall
- Detection:
  - Compare regA with previous DestRegs
  - Compare regB with previous DestRegs
- Stall:
  - Insert a bubble in pipeline
  - Keep current instructions in fetch and decode
  - Pass a nop to execute
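The detect-and-stall steps above can be sketched as follows. The instruction dicts, the field names, and the idea of passing the earlier, still-uncommitted instructions as a list are assumptions for the example. A source or destination field of `None` means the instruction does not use that register.

```python
NOP = {"opcode": "noop", "regA": None, "regB": None, "dest": None}


def has_data_hazard(decoding, in_flight):
    """Compare regA and regB of the instruction in decode against the
    destRegs of earlier instructions that have not written back yet."""
    sources = {decoding["regA"], decoding["regB"]} - {None}
    dests = {instr["dest"] for instr in in_flight if instr["dest"] is not None}
    return bool(sources & dests)


def decode_step(decoding, in_flight):
    """Return (stall, instruction_sent_to_execute) for one cycle."""
    if has_data_hazard(decoding, in_flight):
        # Stall: fetch and decode keep their current instructions,
        # and a bubble (nop) goes to execute instead.
        return True, NOP
    return False, decoding
```

While `stall` is true, the caller would hold the PC and the IF/ID register unchanged and try again next cycle. The hazard clears once the producing instruction has left `in_flight`.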


# Handling Data Hazards III: Detect and Forward

- Detect: same as detect and stall
- Forward:
  - New bypass datapaths route computed data to where it is needed
  - New MUX and control to pick the right data
- Beware: Stalling may still be required even in the presence of forwarding
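The bypass MUX control for a single ALU operand might look like the sketch below. It is an illustration under assumptions, not the course's design: pipeline registers are dicts with hypothetical field names, the load mnemonic is taken to be `lw` as in the time graph, and a load's data is assumed to become available only at the end of its memory stage.

```python
def select_operand(src_reg, regfile_value, ex_mem, mem_wb):
    """Choose where one ALU operand comes from.

    Returns (source, value), where source is one of
    'EX/Mem', 'Mem/WB', 'regfile' or 'stall'.
    """
    if src_reg is not None and ex_mem["dest"] == src_reg:
        if ex_mem["opcode"] == "lw":
            # The load is still in its memory stage, so there is nothing
            # to forward yet: stalling is required despite forwarding.
            return "stall", None
        return "EX/Mem", ex_mem["alu_result"]
    if src_reg is not None and mem_wb["dest"] == src_reg:
        if mem_wb["opcode"] == "lw":
            return "Mem/WB", mem_wb["mem_data"]
        return "Mem/WB", mem_wb["alu_result"]
    return "regfile", regfile_value
```

EX/Mem is checked before Mem/WB because it holds the more recent instruction, so its value is the one the program order requires. The `stall` case is one concrete example of stalling still being needed with forwarding in place.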