# Pipeline Hazards

Hakim Weatherspoon CS 3410, Spring 2013

Computer Science
Cornell University

## Goals for Today

#### **Data Hazards**

- Revisit Pipelined Processors
- Data dependencies
- Problem, detection, and solutions
  - (delaying, stalling, forwarding, bypass, etc)
- Hazard detection unit
- Forwarding unit

#### Next time

Control Hazards

What is the next instruction to execute if a branch is taken? Not taken?

## MIPS Design Principles

#### Simplicity favors regularity

32-bit instructions

#### Smaller is faster

Small register file

#### Make the common case fast

Include support for constants

#### Good design demands good compromises

Support for different types of interpretations/classes

## Recall: MIPS instruction formats

All MIPS instructions are 32 bits long and use one of 3 formats



## Recall: MIPS Instruction Types

#### Arithmetic/Logical

- R-type: result and two source registers, shift amount
- I-type: 16-bit immediate with sign/zero extension

#### **Memory Access**

- load/store between registers and memory
- word, half-word and byte operations

#### Control flow

- conditional branches: pc-relative addresses
- jumps: fixed offsets, register absolute

## Recall: MIPS Instruction Types

#### Arithmetic/Logical

- ADD, ADDU, SUB, SUBU, AND, OR, XOR, NOR, SLT, SLTU
- ADDI, ADDIU, ANDI, ORI, XORI, LUI, SLL, SRL, SLLV, SRLV, SRAV, SLTI, SLTIU
- MULT, DIV, MFLO, MTLO, MFHI, MTHI

#### **Memory Access**

- LW, LH, LB, LHU, LBU, LWL, LWR
- SW, SH, SB, SWL, SWR

#### Control flow

- BEQ, BNE, BLEZ, BLTZ, BGEZ, BGTZ
- J, JR, JAL, JALR, BEQL, BNEL, BLEZL, BGTZL

#### Special

- LL, SC, SYSCALL, BREAK, SYNC, COPROC

## Pipelined Processor





## Example: Sample Code (Simple)

```
add r3, r1, r2;
nand r6, r4, r5;
lw r4, 20(r2);
add r5, r2, r5;
sw r7, 12(r3);
```

## Example: Sample Code (Simple)

Assume eight-register machine

Run the following code on a pipelined datapath

```
add  r3 r1 r2   ; reg 3 = reg 1 + reg 2
nand r6 r4 r5   ; reg 6 = ~(reg 4 & reg 5)
lw   r4 20(r2)  ; reg 4 = Mem[reg2+20]
add  r5 r2 r5   ; reg 5 = reg 2 + reg 5
sw   r7 12(r3)  ; Mem[reg3+12] = reg 7
```

## Clock cycle Time Graphs

Concurrency:
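
A clock-cycle timing graph for the sample code can be sketched in a few lines of Python. This is a minimal sketch, assuming a classic five-stage pipeline (IF, ID, EX, MEM, WB) and no stalls; the function name `timing_table` is hypothetical and not part of the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def timing_table(program):
    """Return one row per instruction: the stage it occupies in each cycle ('' if none)."""
    total_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, _ in enumerate(program):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            # Instruction i is fetched in cycle i+1, so it reaches stage s at index i+s.
            row[i + s] = stage
        rows.append(row)
    return rows

program = [
    "add r3, r1, r2",
    "nand r6, r4, r5",
    "lw r4, 20(r2)",
    "add r5, r2, r5",
    "sw r7, 12(r3)",
]

for instr, row in zip(program, timing_table(program)):
    print(f"{instr:<16}" + " ".join(f"{cell:<3}" for cell in row))
```

Five instructions finish in 5 + 4 = 9 cycles here, and in the middle cycle all five stages are busy with different instructions.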

























## **Takeaway**

Pipelining is a powerful technique to mask latencies and increase throughput

- Logically, instructions execute one at a time
- Physically, instructions execute in parallel
  - Instruction level parallelism

## Abstraction promotes decoupling

Interface (ISA) vs. implementation (Pipeline)

#### **Next Goal**

What about data dependencies (also known as data hazards in a pipelined processor)?

e.g. `add r3, r1, r2` followed by `sub r5, r3, r4`, where `sub` reads the r3 that `add` writes

## Data Hazards

#### Data Hazards

- register file reads occur in stage 2 (ID)
- register file writes occur in stage 5 (WB)
- next instructions may read values about to be written
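
With reads in stage 2 and writes in stage 5, an instruction that follows a producer by one, two, or three slots decodes no later than the producer writes back. The sketch below scans a program for such read-after-write pairs. It assumes a register file with no internal bypass (hence a window of 3) and treats r0 as a hard-wired zero; the tuple layout and the name `find_raw_hazards` are hypothetical.

```python
def find_raw_hazards(program, window=3):
    """program: list of (text, dest, sources).
    Returns (producer_index, consumer_index, register) triples."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None or dest == "r0":
            continue  # stores write no register; r0 is assumed constant
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # a newer write to the same register takes over
    return hazards

program = [
    ("add r3, r1, r2", "r3", ["r1", "r2"]),
    ("sub r5, r3, r5", "r5", ["r3", "r5"]),
    ("or r6, r3, r4",  "r6", ["r3", "r4"]),
    ("add r6, r3, r8", "r6", ["r3", "r8"]),
]

for producer, consumer, reg in find_raw_hazards(program):
    print(f"{program[consumer][0]!r} reads {reg} written by {program[producer][0]!r}")
```

Passing `window=2` models a register file that writes before it reads within the same cycle, which removes the distance-3 case.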

#### How to detect?

## **Detecting Data Hazards**



## **Takeaway**

Data hazards occur when an operand (register) depends on the result of a previous instruction that may not have been computed yet. A pipelined processor needs to detect data hazards.
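
In hardware, detection amounts to comparing the source registers of the instruction in ID against the destination registers of the older instructions still in flight. The following is a minimal sketch of that comparison, assuming a pipeline with no forwarding and no register file bypass. `StageReg` and `id_stage_hazard` are hypothetical names, and `we`/`rd` stand for the write-enable and destination fields carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers.

```python
from dataclasses import dataclass

@dataclass
class StageReg:
    we: bool = False  # will this instruction write a register?
    rd: int = 0       # destination register number

def id_stage_hazard(ra, rb, id_ex, ex_mem, mem_wb):
    """True if the instruction in ID reads a register that an older
    in-flight instruction has not written back yet."""
    for older in (id_ex, ex_mem, mem_wb):
        # Register 0 is assumed hard-wired to zero, so it never causes a hazard.
        if older.we and older.rd != 0 and older.rd in (ra, rb):
            return True
    return False

# sub r5, r3, r5 is in ID while add r3, r1, r2 is in EX.
print(id_stage_hazard(3, 5, StageReg(True, 3), StageReg(), StageReg()))
```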

## **Next Goal**

What to do if a data hazard is detected?

## How to stall an instruction in ID stage

- prevent IF/ID pipeline register update
  - stalls the ID stage instruction
- convert ID stage instr into nop for later stages
  - innocuous "bubble" passes through pipeline
- prevent PC update
  - stalls the next (IF stage) instruction
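
The three actions above can be sketched as one clock-edge update of the front of the pipeline. This is an illustrative model, not the lecture's circuit: `FrontEnd` and `clock_edge` are hypothetical names, instructions are plain strings, and the PC counts instructions rather than bytes.

```python
from dataclasses import dataclass

NOP = "nop"

@dataclass
class FrontEnd:
    pc: int = 0
    if_id: str = NOP  # instruction held in the IF/ID pipeline register
    id_ex: str = NOP  # instruction handed on to the EX stage

def clock_edge(state, imem, stall):
    """Advance the front of the pipeline by one cycle."""
    if stall:
        # PC and IF/ID keep their values; a bubble enters EX instead.
        return FrontEnd(pc=state.pc, if_id=state.if_id, id_ex=NOP)
    fetched = imem[state.pc] if state.pc < len(imem) else NOP
    return FrontEnd(pc=state.pc + 1, if_id=fetched, id_ex=state.if_id)

imem = ["add r3, r1, r2", "sub r5, r3, r5", "or r6, r3, r4"]
s = FrontEnd()
s = clock_edge(s, imem, stall=False)  # add is fetched
s = clock_edge(s, imem, stall=False)  # add moves to EX, sub is now in ID
s = clock_edge(s, imem, stall=True)   # sub is held in ID, EX receives a nop
print(s)
```

After the stalled cycle the state still has `pc == 2` and `sub` in IF/ID, so both the stalled instruction and the one behind it retry on the next edge.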

## **Detecting Data Hazards**



| Instruction    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| add r3, r1, r2 |   |   |   |   |   |   |   |   |
| sub r5, r3, r5 |   |   |   |   |   |   |   |   |
| or r6, r3, r4  |   |   |   |   |   |   |   |   |
| add r6, r3, r8 |   |   |   |   |   |   |   |   |

Columns are clock cycles (time runs left to right). Register values marked on the diagram: r3 = 10 (old value) and r3 = 20 (value produced by the first `add`).





# Stalling

## How to stall an instruction in ID stage

- prevent IF/ID pipeline register update
  - stalls the ID stage instruction
- convert ID stage instr into nop for later stages
  - innocuous "bubble" passes through pipeline
- prevent PC update
  - stalls the next (IF stage) instruction

## Takeaway

Data hazards occur when an operand (register) depends on the result of a previous instruction that may not have been computed yet. A pipelined processor needs to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID pipeline registers from changing, and (3) preventing writes to memory and the register file. Bubbles in the pipeline significantly decrease performance.
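
To see how much bubbles cost, the sketch below counts total cycles on a stall-only five-stage pipeline. It assumes in-order issue, no forwarding, register reads in ID, and register writes in WB. The flag `regfile_bypass` (a hypothetical name) selects whether a value written in WB can be read by ID in the same cycle. Programs are lists of `(dest, sources)` pairs and must be non-empty.

```python
def cycles_with_stalls(program, regfile_bypass=True):
    """Total cycles for a non-empty program on a 5-stage, stall-only pipeline."""
    writer_wb = {}  # register -> cycle in which its latest producer is in WB
    id_cycle = 1    # ID cycle of the previous instruction (none yet)
    for dest, sources in program:
        earliest = id_cycle + 1  # in-order: decode one cycle after the predecessor
        for reg in sources:
            if reg in writer_wb:
                ready = writer_wb[reg] + (0 if regfile_bypass else 1)
                earliest = max(earliest, ready)
        id_cycle = earliest
        if dest is not None:
            writer_wb[dest] = id_cycle + 3  # ID -> EX -> MEM -> WB
    return id_cycle + 3  # the last instruction finishes in its WB cycle

# add r3,r1,r2 / sub r5,r3,r5 / or r6,r3,r4 / add r6,r3,r8
program = [
    ("r3", ["r1", "r2"]),
    ("r5", ["r3", "r5"]),
    ("r6", ["r3", "r4"]),
    ("r6", ["r3", "r8"]),
]
print(cycles_with_stalls(program))         # 10
print(cycles_with_stalls(program, False))  # 11
```

Four independent instructions would take 4 + 4 = 8 cycles, so the single dependence on r3 costs two bubbles with a register file bypass and three without.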

# Next Goal: Resolving Data Hazards via Forwarding

What to do if a data hazard is detected?

- A) Wait/Stall
- B) Reorder in Software (SW)
- C) Forward/Bypass

## Forwarding

Forwarding bypasses some pipeline stages by sending a result directly to a dependent instruction's operand (register).

- Forwarding from Ex/Mem registers to Ex stage (M→Ex)
- Forwarding from Mem/WB register to Ex stage (W→Ex)
- RegisterFile Bypass



## Ex/MEM to EX Bypass

- EX needs ALU result that is still in MEM stage
- Resolve: add a bypass from EX/MEM.D to start of EX

How to detect? Logic in EX stage:

```
forward = (EX/MEM.WE && EX/MEM.Rd != 0 &&
           ID/EX.Ra == EX/MEM.Rd)
       || (same for Rb)
```



## Mem/WB to EX Bypass

- EX needs value being written by WB
- Resolve: add a bypass from WB final value to start of EX

How to detect? Logic in Ex Stage:

Check pg. 369
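
As a hedged sketch of how the two bypasses combine, the Python below selects the source of one ALU operand. The MEM/WB condition mirrors the EX/MEM one, and EX/MEM takes priority because it holds the more recent write. This follows the usual textbook formulation rather than anything stated on the slide. The names `StageReg`, `forward_select`, and `alu_operand` are hypothetical, and the EX/MEM producer is assumed to be an ALU instruction (a load's data is not yet available there).

```python
from dataclasses import dataclass

@dataclass
class StageReg:
    we: bool = False  # write-enable: the instruction will write register rd
    rd: int = 0       # destination register number
    value: int = 0    # ALU result (EX/MEM) or final write-back value (MEM/WB)

def forward_select(src, ex_mem, mem_wb):
    """Pick the source for one ALU operand: 'EX/MEM', 'MEM/WB', or 'REG'."""
    if ex_mem.we and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"  # most recent producer wins
    if mem_wb.we and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REG"         # no hazard: use the value read in ID

def alu_operand(src, reg_value, ex_mem, mem_wb):
    """Return the operand value after the forwarding mux."""
    choices = {"EX/MEM": ex_mem.value, "MEM/WB": mem_wb.value, "REG": reg_value}
    return choices[forward_select(src, ex_mem, mem_wb)]

# r3 was 10 in the register file; the instruction now in MEM computed r3 = 20.
print(alu_operand(3, 10, StageReg(True, 3, 20), StageReg()))
```

The same function is evaluated once for Ra and once for Rb, giving one mux select per ALU input.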



## Register File Bypass

Reading a value that is currently being written

#### **Detect:**

```
((Ra == MEM/WB.Rd) or (Rb == MEM/WB.Rd)) and (WB is writing a register)
```

#### **Resolve:**

Add a bypass around register file (WB to ID)

Better: (Hack) just negate register file clock

- writes happen at end of first half of each clock cycle
- reads happen during second half of each clock cycle
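
The write-then-read ordering within one cycle can be modeled with a tiny register file. This is a behavioral sketch only: the class and method names are hypothetical, and register 0 is assumed to stay zero.

```python
class RegisterFile:
    def __init__(self, n=32):
        self.regs = [0] * n

    def cycle(self, ra, rb, we, rd, wdata):
        """Model one clock cycle: WB writes first, then ID reads."""
        # First half of the cycle: the WB stage writes its result.
        if we and rd != 0:
            self.regs[rd] = wdata
        # Second half: the ID stage reads, so it sees the value just written.
        return self.regs[ra], self.regs[rb]

rf = RegisterFile()
rf.regs[3] = 10
# WB writes r3 = 20 in the same cycle in which ID reads r3.
print(rf.cycle(ra=3, rb=0, we=True, rd=3, wdata=20))  # (20, 0)
```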



# Forwarding Example

| Instruction    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|----------------|---|---|---|---|---|---|---|---|
| add r3, r1, r2 |   |   |   |   |   |   |   |   |
| sub r5, r3, r5 |   |   |   |   |   |   |   |   |
| or r6, r3, r4  |   |   |   |   |   |   |   |   |
| add r6, r3, r8 |   |   |   |   |   |   |   |   |

Columns are clock cycles (time runs left to right). Register values marked on the diagram: r3 = 10 (old value) and r3 = 20 (value produced by the first `add`).

# Forwarding Example 2

| Instruction    | 1  | 2  | 3  | 4  | 5 | 6 | 7 | 8 |
|----------------|----|----|----|----|---|---|---|---|
| add r3, r1, r2 | IF | ID | Ex | M  | W |   |   |   |
| sub r5, r3, r4 |    | IF | ID | Ex | M | W |   |   |
| lw r6, 4(r3)   |    |    |    |    |   |   |   |   |
| or r5, r3, r5  |    |    |    |    |   |   |   |   |
| sw r6, 12(r3)  |    |    |    |    |   |   |   |   |

# Tricky Example





## Takeaway

Data hazards occur when an operand (register) depends on the result of a previous instruction that may not have been computed yet. A pipelined processor needs to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID pipeline registers from changing, and (3) preventing writes to memory and the register file. Bubbles (nops) in the pipeline significantly decrease performance.

Forwarding bypasses some pipeline stages by sending a result directly to a dependent instruction's operand (register). This gives better performance than stalling.

## Administrivia

## Prelim1: next Tuesday, February 26th in the evening

- Time: We will start at 7:30pm sharp, so come early
- Prelim Review: This Thur 6-8pm in Upson B14 and Fri, 5-7pm in Phillips 203
- Closed Book
  - Cannot use electronic devices or outside material
- Practice prelims are online in CMS
- Material covers everything up to the end of this week
  - Appendix C (logic, gates, FSMs, memory, ALUs)
  - Chapter 4 (pipelined [and non-pipelined] MIPS processor with hazards)
  - Chapter 2 (Numbers / Arithmetic, simple MIPS instructions)
  - Chapter 1 (Performance)
  - HW1, HW2, Lab0, Lab1, Lab2

## Administrivia

#### HW2 is due tomorrow

- *Fill out Survey online*. Receive credit/points on homework for survey:
- Should have received email from Kathryn Dimiduk
- Survey is anonymous

### Project1 (PA1) due week after prelim

Continue working diligently. Use design doc momentum

#### Save your work!

- Save often. Verify file is non-zero. Periodically save to Dropbox, email.
- Beware of MacOSX 10.5 (leopard) and 10.6 (snow-leopard)

#### Use your resources

- Lab Section, Piazza.com, Office Hours, Homework Help Session,
- Class notes, book, Sections, CSUGLab

## Administrivia

## Check online syllabus/schedule

http://www.cs.cornell.edu/Courses/CS3410/2013sp/schedule.html

Slides and Reading for lectures

Office Hours

Homework and Programming Assignments

Prelims (in evenings):

- Tuesday, February 26th
- Thursday, March 28th
- Thursday, April 25th

Schedule is subject to change

# Collaboration, Late, Re-grading Policies

#### "Black Board" Collaboration Policy

- Can discuss approach together on a "black board"
- Leave and write up solution independently
- Do not copy solutions

#### Late Policy

- Each person has a total of four "slip days"
- Max of two slip days for any individual assignment
- Slip days deducted first for any late assignment, cannot selectively apply slip days
- For projects, slip days are deducted from all partners
- 25% deducted per day late after slip days are exhausted

#### Regrade policy

- Submit written request to lead TA, and lead TA will pick a different grader
- Submit another written request, lead TA will regrade directly
- Submit yet another written request for professor to regrade

## Quiz

Find all hazards, and say how they are resolved:

```
add r3, r1, r2
sub r3, r2, r1
nand r4, r3, r1
or r0, r3, r4
xor r1, r4, r3
sb r4, 1(r0)
```

# Memory Load Data Hazard



# Resolving Memory Load Hazard

#### Load Data Hazard

- Value not available until WB stage
- So: next instruction can't proceed if hazard detected

#### **Resolution:**

- MIPS 2000/3000: one delay slot
  - ISA says results of loads are not available until one cycle later
  - Assembler inserts nop, or reorders to fill delay slot
- MIPS 4000 onwards: stall
  - But really, programmer/compiler reorders to avoid stalling in the load delay slot
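
For the stall option, the check is usually made in ID: if the instruction now in EX is a load and its destination matches a source of the instruction in ID, hold ID for one cycle and let the loaded value be forwarded afterwards. The sketch below is a minimal version of that test. The signal names are hypothetical, and the condition follows the common textbook formulation rather than the slide.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_ra, if_id_rb):
    """True if the instruction in ID needs the result of a load that is now in EX."""
    return id_ex_mem_read and id_ex_rd != 0 and id_ex_rd in (if_id_ra, if_id_rb)

# Hypothetical pair: lw r4, 20(r2) followed immediately by add r5, r4, r5.
print(load_use_stall(True, 4, 4, 5))   # True: insert one bubble
# The same consumer behind an ALU instruction needs no stall; forwarding covers it.
print(load_use_stall(False, 4, 4, 5))  # False
```

On a machine with a load delay slot the hardware omits this check and the assembler or compiler must keep the dependent instruction out of the slot.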

## Quiz 2

```
add r3, r1, r2
nand r5, r3, r4
add r2, r6, r3
lw r6, 24(r3)
sw r6, 12(r2)
```

## Data Hazard Recap

#### Delay Slot(s)

Modify ISA to match implementation

#### Stall

Pause current and all subsequent instructions

#### Forward/Bypass

- Try to steal correct value from elsewhere in pipeline
- Otherwise, fall back to stalling or require a delay slot

#### Tradeoffs?