## PIPE DREAM

Out-of-Order Speculative CPU

Karthik Balakrishnan and Michal Karczmarek



### The Plan

- Out-of-order
- Speculative
- Physical register file
- Single-issue
- Non-blocking, in-order memory unit
- Branch prediction
- Precise exceptions


### The Basic Design: Data Flow




### The Basic Design: Branch Taken



## Register Rename Unit
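A minimal sketch of the usual map-table plus free-list scheme behind a physical register file, for illustration only. It assumes a MIPS-style ISA with 32 architectural registers and a hypothetical 48 physical registers; `RenameUnit`, `rename`, and `retire` are hypothetical names, not the authors' Bluespec interfaces.

```python
from collections import deque
from typing import List, Optional, Tuple

class RenameUnit:
    """Hypothetical renamer: architectural -> physical map table plus a free list."""

    def __init__(self, num_arch: int = 32, num_phys: int = 48):
        # At reset, architectural register i lives in physical register i.
        self.map_table = list(range(num_arch))
        # All remaining physical registers start out free.
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, srcs: List[int],
               dst: Optional[int]) -> Tuple[List[int], Optional[int], Optional[int]]:
        """Rename one instruction; returns (physical srcs, new physical dst, old physical dst)."""
        # Look up sources before remapping the destination (handles `add $3, $3, $1`).
        phys_srcs = [self.map_table[s] for s in srcs]
        if dst is None or dst == 0:      # MIPS $zero is never renamed (assumption)
            return phys_srcs, None, None
        if not self.free_list:
            raise RuntimeError("stall: no free physical register")
        new_p = self.free_list.popleft()
        old_p = self.map_table[dst]
        self.map_table[dst] = new_p
        # old_p is kept (e.g. in the ROB entry) and freed only at retirement.
        return phys_srcs, new_p, old_p

    def retire(self, old_phys_dst: Optional[int]) -> None:
        """At retirement, the previous mapping of the destination can be reused."""
        if old_phys_dst is not None:
            self.free_list.append(old_phys_dst)
```

The old physical register is held until retirement because in-flight instructions may still need it, and it is needed to restore state after a misprediction or exception. Restoring the map table itself (checkpointing, or walking the ROB backwards) is not shown.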




### Re-order Buffer
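A minimal sketch of a circular re-order buffer, assuming that retiring strictly in order from the head is what provides the precise exceptions listed in The Plan. The entry fields and method names are hypothetical, and the 12-entry default is borrowed from the compile-time anecdote under Interesting Problems.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RobEntry:
    pc: int
    done: bool = False        # result has been produced by Execute
    exception: bool = False   # instruction raised an exception

class ReorderBuffer:
    """Hypothetical circular ROB: allocate at the tail, retire in order from the head."""

    def __init__(self, size: int = 12):
        self.size = size
        self.entries: List[Optional[RobEntry]] = [None] * size
        self.head = 0            # slot of the oldest in-flight instruction
        self.count = 0           # number of occupied slots
        self.exception_pc: Optional[int] = None

    def allocate(self, pc: int) -> int:
        """Reserve the tail slot for a newly renamed instruction; returns its tag."""
        if self.count == self.size:
            raise RuntimeError("stall: ROB full")
        tag = (self.head + self.count) % self.size   # tail wraps around
        self.entries[tag] = RobEntry(pc)
        self.count += 1
        return tag

    def complete(self, tag: int, exception: bool = False) -> None:
        """Execute marks an entry finished (possibly with an exception)."""
        entry = self.entries[tag]
        entry.done = True
        entry.exception = exception

    def retire(self) -> List[int]:
        """Retire finished instructions in program order and return their PCs.

        Retirement stops at the first unfinished entry. If the oldest finished
        entry faulted, it and everything younger are squashed, so no younger
        instruction ever updates architectural state.
        """
        retired = []
        while self.count and self.entries[self.head].done:
            entry = self.entries[self.head]
            if entry.exception:
                self.exception_pc = entry.pc   # faulting PC, e.g. what would go into EPC
                self.flush()
                break
            retired.append(entry.pc)
            self.entries[self.head] = None
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return retired

    def flush(self) -> None:
        """Squash every in-flight entry (exception or branch mispredict)."""
        self.entries = [None] * self.size
        self.count = 0
```

An instruction that finishes early simply waits in its slot until everything older has retired.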




## Design Exploration: Data Flow





### Branch Predictor

#### Use a standard 2-bit predictor

- No global history (although it was originally planned)
- Carry a `branchTaken` bit with each instruction so Execute can check whether the prediction was correct
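A minimal sketch of the standard 2-bit saturating counter and of the Execute-side check that the carried `branchTaken` bit makes possible. The state names follow the usual strongly/weakly taken/not-taken convention; the function names are hypothetical.

```python
# 2-bit saturating counter states, from strongly not taken to strongly taken.
SNT, WNT, WT, ST = range(4)

def predict(counter: int) -> bool:
    """Fetch: predict taken in the two upper states (WT and ST)."""
    return counter >= WT

def update(counter: int, taken: bool) -> int:
    """Saturating increment on a taken branch, decrement on a not-taken one."""
    return min(counter + 1, ST) if taken else max(counter - 1, SNT)

def mispredicted(branch_taken: bool, actually_taken: bool) -> bool:
    """Execute: compare the prediction bit carried with the instruction to the real outcome."""
    return branch_taken != actually_taken

# A loop branch that is taken three times, then falls through.
counter = WT
for outcome in [True, True, True, False]:
    guess = predict(counter)
    print(guess, mispredicted(guess, outcome))  # only the final fall-through mispredicts
    counter = update(counter, outcome)
```

This checks direction only: a branch correctly predicted "taken" but fetched from the wrong target needs a separate target comparison.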

#### 2-bit Branch Predictor





- Hash on the bottom bits of the instruction PC
- Initialize every entry to WT
- Predict all branch instructions; always predict J and JAL as taken

[Figure: Branch History Table indexed by PC, holding 2-bit states such as SNT, WNT, and WT that move on take / don't take]
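One way such a table could be organized, as a sketch only. The low-bit indexing, WT initialization, and forced-taken J/JAL come from the slide; the 64-entry size, dropping the two always-zero PC bits (assuming word-aligned MIPS instructions), the opcode strings, and the class name are assumptions.

```python
SNT, WNT, WT, ST = range(4)

class BranchHistoryTable:
    """Hypothetical BHT: 2-bit counters indexed by the bottom bits of the PC."""

    def __init__(self, index_bits: int = 6):
        self.mask = (1 << index_bits) - 1
        self.table = [WT] * (1 << index_bits)  # every entry starts at weakly taken

    def index(self, pc: int) -> int:
        # Assumption: 4-byte aligned instructions, so skip the two low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int, opcode: str) -> bool:
        if opcode in ("J", "JAL"):
            return True                        # unconditional jumps are always taken
        return self.table[self.index(pc)] >= WT

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        c = self.table[i]
        self.table[i] = min(c + 1, ST) if taken else max(c - 1, SNT)
```

Because only the low bits are used, branches whose PCs differ only in higher bits share (alias to) the same counter.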



### Out-of-Order Scheduling

- Find which instructions are ready to go
- Dispatch memory operations in order (speculatively)
- Send stores to memory when they retire
- Use barrier instructions for COP0
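A rough sketch of single-issue select logic under these rules. It assumes the scheduling window holds every in-flight instruction (as a ROB would) and that a COP0 barrier may issue only once it is the oldest one; `Instr`, `select`, and the `is_barrier` flag are hypothetical. Holding stores until retirement, which keeps speculative memory operations squashable, is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Instr:
    seq: int                                       # program order: smaller = older
    srcs: List[int] = field(default_factory=list)  # physical source registers
    is_mem: bool = False                           # load or store
    is_barrier: bool = False                       # hypothetical COP0 barrier flag
    issued: bool = False

def select(window: List[Instr], ready: Set[int]) -> Optional[Instr]:
    """Single-issue select; `window` is assumed to hold every in-flight instruction."""
    def operands_ready(i: Instr) -> bool:
        return all(r in ready for r in i.srcs)

    in_order = sorted(window, key=lambda i: i.seq)
    if not in_order:
        return None
    oldest = in_order[0].seq
    # The only memory op allowed to go next is the oldest one not yet issued.
    next_mem = next((i for i in in_order if i.is_mem and not i.issued), None)

    for instr in in_order:                  # scan oldest first
        if instr.issued:
            continue
        if instr.is_barrier:
            # A barrier issues only as the oldest in-flight instruction,
            # and nothing younger may issue ahead of it.
            return instr if instr.seq == oldest and operands_ready(instr) else None
        if not operands_ready(instr):
            continue                        # wait; a younger instruction may go instead
        if instr.is_mem and instr is not next_mem:
            continue                        # memory ops leave in program order
        return instr
    return None
```

Non-memory instructions can overtake anything that is stalled, while loads and stores queue behind the oldest pending memory operation.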



- Schedule in-order and out-of-order instructions separately
- Be careful about the wrap-around
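Assuming the wrap-around refers to circular-buffer slot indices: once the tail wraps past the end, a smaller index no longer means an older instruction, so "oldest first" comparisons have to be made relative to the head. A minimal sketch with hypothetical helper names:

```python
from typing import Iterable, List

def age(slot: int, head: int, size: int) -> int:
    """Distance of a slot from the head: 0 is the oldest in-flight entry."""
    # Python's % is non-negative here. In hardware, compute slot + size - head
    # and subtract size if the result is >= size to get the same value.
    return (slot - head) % size

def is_older(a: int, b: int, head: int, size: int) -> bool:
    """True if slot a holds an older instruction than slot b."""
    return age(a, head, size) < age(b, head, size)

def program_order(slots: Iterable[int], head: int, size: int) -> List[int]:
    """Sort occupied slots oldest-first, correctly across the wrap point."""
    return sorted(slots, key=lambda s: age(s, head, size))

# Example: 12-entry buffer, head at slot 10, tail has wrapped to slots 0 and 1.
print(program_order([0, 11, 1, 10], head=10, size=12))  # [10, 11, 0, 1]
print(is_older(11, 0, head=10, size=12))                # True, although 11 > 0
```

With a power-of-two buffer size the modulo reduces to dropping the carry bit; a 12-entry buffer needs the explicit correction.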




### Place and Route





### Results

- Final clock period: 9.803 ns
- Final area: 458569.6 µm²
- qsort retires 24249 instructions (IPC ≈ 0.4)
- vvadd retires 18026 instructions (IPC ≈ 0.5)
- vvadd loop: 10 instructions in 16 cycles
- Using our own test harness; still adapting to Chris's
- Number of cycles to complete each benchmark:

| Benchmark | basic | bp | ooo | bp/ooo | fast mispredict | fast predict | speculative memory unit |
|-----------|-------|----|-----|--------|-----------------|--------------|-------------------------|
| qsort | 97909 | 101139 | 87675  | 72549  | 68861              | 66473           | 61275                      |
| vvadd | 96123 | 102127 | 100131 | 49196  | 49174              | 45170           | 36163                      |
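The IPC values in the Results bullets are retired instructions divided by cycles. A small worked check, assuming the retired counts come from the final "speculative memory unit" runs (the resulting IPCs match the reported 0.4 and 0.5). The speedup over "basic" is a derived figure that the slides do not state.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle."""
    return instructions / cycles

def speedup(baseline_cycles: int, new_cycles: int) -> float:
    """How many times faster the new configuration finishes than the baseline."""
    return baseline_cycles / new_cycles

# Cycle counts from the "basic" and "speculative memory unit" columns of the table;
# retired-instruction counts from the Results bullets (assumed to be the same runs).
benchmarks = {
    "qsort": {"retired": 24249, "basic": 97909, "final": 61275},
    "vvadd": {"retired": 18026, "basic": 96123, "final": 36163},
}

for name, b in benchmarks.items():
    print(f"{name}: IPC = {ipc(b['retired'], b['final']):.2f}, "
          f"speedup over basic = {speedup(b['basic'], b['final']):.2f}x")
# qsort: IPC = 0.40, speedup over basic = 1.60x
# vvadd: IPC = 0.50, speedup over basic = 2.66x

print(f"vvadd loop IPC = {ipc(10, 16):.3f}")       # 0.625
print(f"clock frequency = {1e3 / 9.803:.1f} MHz")  # 102.0 MHz
```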



## Interesting Problems

- Bluespec compile times explode: a 12-entry ROB compiles in 2 minutes, but a 14-entry ROB never finishes compiling
- RWires are necessary but EVIL
- Branch prediction: a bug with JR and JALR, where the predictor recorded them as "taken" but fetch went to the wrong target!
- Lots of bugs in decode and execute, all fixed once we passed self\_test, test\_spim, and test\_all (renaming and out-of-order execution didn't introduce any execution bugs)



### Special Thanks To:

- 6884-bluespec
- Chris

The End

### Slowest Data Path

[Figure: slowest data path, through `retireInstrQ` and `retireIPCQ`]



