### Constructive Computer Architecture: Data Hazards in Pipelined Processors

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

October 16, 2017

# Consider a different twostage pipeline



# Converting the old pipeline into the new one

#### rule doFetch;

```
... let instF = iMem.req(pc);
```

f2d.enq(Fetch2Execute{... inst: instF...}); ...

#### endrule

rule doExecute;

let x = f2e.first;

let instD=x.inst; let pcD=x.pc; let inEp=x.epoch;

... let dInst = decode (instD); instF

let rVal1 = rf.rd1(fromMaybe(?, dInst.src1));

let rVal2 = rf.rd2(fromMaybe(?, dInst.src2));

let eInst = exec(dInst, rVal1, rVal2, pcD, ppcD);

#### endrule

Not quite correct. Why?

#### Fetch is potentially reading stale values from rf

### Data Hazards



 $I_1 \qquad R1 \leftarrow R2 + R3 \\ I_2 \qquad R4 \leftarrow R1 + R2$ 

 $I_2$  must be stalled until  $I_1$  updates the register file

### Dealing with data hazards

- Keep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard
- Stall the Fetch from dispatching the instruction as long as RAW hazard prevails
- RAW hazard will disappear as the pipeline drains

Scoreboard: A data structure to keep track of the instructions in the pipeline beyond the Fetch stage

# Data Hazard

- Data hazard depends upon the match between the source registers of the fetched instruction and the destination register of an instruction already in the pipeline
- Both the source and destination registers must be Valid for a hazard to exist

# Scoreboard: Keeping track of instructions in execution

- Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage
  - method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded
  - method search1(src): searches the scoreboard for a data hazard
  - method search2(src): same as search1
  - method remove: deletes the oldest entry when an instruction commits

### 2-Stage-DH pipeline: Scoreboard and Stall logic



# 2-Stage-DH pipeline

#### module mkProc(Proc); EHR#(2,Addr) pc <- mkEHR(U); RFile rf <- mkRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkFifo; Reg#(Bool) epoch <- mkReg(False); Scoreboard#(n) sb <- mkScoreboard; // n, the number of slots in the sb must be ≥ // the number of instructions in the Execute // phse (including d2e)

rule doFetch ...
rule doExecute ...

#### Assume doFetch < doExecute

October 16, 2017

# 2-Stage-DH pipeline doFetch rule

| n to pc when Fetch stalls?                 |
|--------------------------------------------|
|                                            |
|                                            |
|                                            |
| <pre>sb.search2(dInst.src2);</pre>         |
|                                            |
| c: ppcF,                                   |
|                                            |
| ;                                          |
|                                            |
| pc should change only when the instruction |
|                                            |

# 2-Stage-DH pipeline doFetch rule *corrected*

```
rule doFetch;
let instF = iMem.req(pc[0]);
let ppcF = nap(pc[0]); pc[0] <= ppcF;
let dInst = decode(instF);
let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2);
if(!stall) begin
...fetch register values
d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
dLinst: dInst, epoch: fEpoch,
rVal1: rVal1, rVal2: rVal2});
sb.insert(dInst.rDst); pc[0] <= ppcF; end
endrule
```

To avoid structural hazards, scoreboard must allow two search ports

# 2-Stage-DH pipeline doExecute rule

```
rule doExecute;
    let x = d2e.first;
    let dInstE = x.dInst; let pcE = x.pc; let inEp = x.epoch;
    let rVal1E = x.rVal1; let rVal2E = x.rVal2;
    if(epoch == inEp) begin
      let eInst = exec(dInstE, rVallE, rVal2E, pcE);
      if (eInst.iType == Ld) eInst.data <-
        dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?});
      else if (eInst.iType == St) let d <+
        dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data});
      if (isValid(eInst.dst))
        rf.wr(fromMaybe(?, eInst.dst), eInst.data);
      let nextPC = eInst.brTaken ? eInst.addr : pcE + 4;
      if (x.ppc != nextPC) begin pc[1] <= eInst.addr;</pre>
                                  epoch <= !epoch; end</pre>
                        end
    d2e.deg; sb.remove;
                                             The same as before
endrule
```

### A correctness issue



◆ If the search by Decode does not see an instruction in the scoreboard, then its effect must have taken place. This means that any updates to the register file by that instruction must be visible to the subsequent register reads ⇒

- remove and wr should happen atomically
- search and rd1, rd2 should happen atomically

October 16, 2017

#### Concurrency and Performance doFetch < doExecute



October 16, 2017

# 2-Stage-DH pipeline doExecute < doFetch

```
rule doFetch;
      let instF = iMem.req(pc[1]);
      let ppcF = nap(pc[1]);;
      let dInst = decode(instF);
      let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2);
      if(!stall)
                                  begin
          ...fetch register values
         d2e.enq(Decode2Execute{pc: pc[1], ppc: ppcF,
               dIinst: dInst, epoch: fEpoch,
               rVal1: rVal1, rVal2: rVal2});
         sb.insert(dInst.rDst); pc[1] <= ppcF end</pre>
endrule
rule doExecute;
      the same as before ...
      if (x.ppc != nextPC) begin pc[0] <= eInst.addr;</pre>
                                   epoch <= !epoch; end
      end
             d2e.deg; sb.remove;
endrule
```

```
October 16, 2017
```

#### **Concurrency and Performance** doFetch < doExecute



# WAW hazards

- Can a destination register name appear more than once in the scoreboard ?
- If multiple instructions in the scoreboard can update the register which the current instruction wants to read, then the current instruction has to read the update for the youngest of those instructions
- This is not a problem in our design because
  - instructions are committed in order
  - the RAW hazard for the instruction at the decode stage will remain as long as the any instruction with the required destination is present in sb

# An alternative design for sb

One counter for each register in rf (Initially 0) VS One slot to hold rd for each instruction in the pipeline

- Insert: increment the counter for register rd
- Remove: decrement the counter for register rd
- Search: If the counter for the source register is >0, return True

This design takes less hardware for deep pipelines and is more efficient because it avoids associative searches

### Summary

- Instruction pipelining requires dealing with control and data hazards
- Speculation is necessary to deal with control hazards
- Data hazards are avoided by withholding instructions in the decode stage until the hazard disappears
- Performance issues are subtle
  - For instance, the value of having a bypass network depends on how frequently it is exercised by programs
  - Bypassing necessarily increases combinational path lengths which can slow down the clock
    - The rest of the slides will be discussed in the Recitation

### Normal Register File

```
module mkRFile(RFile);
```

```
Vector#(32,Reg#(Data)) rfile <- replicateM(mkReg(0));</pre>
```

```
method Action wr(RIndx rindx, Data data);
    if(rindx!=0) rfile[rindx] <= data;
    endmethod
    method Data rd1(RIndx rindx) = rfile[rindx];
    method Data rd2(RIndx rindx) = rfile[rindx];
endmodule</pre>
```

```
{rd1, rd2} < wr
```

### Bypass Register File using EHR

module mkBypassRFile(RFile);

Vector#(32,Ehr#(2, Data)) rfile <-</pre>

replicateM(mkEhr(0));

method Action wr(RIndx rindx, Data data); if(rindex!=0) (rfile[rindex])[0] <= data; endmethod method Data rd1(RIndx rindx) = (rfile[rindx])[1]; method Data rd2(RIndx rindx) = (rfile[rindx])[1]; endmodule

#### wr < {rd1, rd2}

October 16, 2017

# **Bypass Register File**

#### with external bypassing

begin rf.wr(bypass.first); bypass.deq end;

endrule

method Action wr(RIndx rindx, Data data);

```
if(rindex!=0) bypass.enq(tuple2(rindx, data));
```

endmethod

```
method Data rd1(RIndx rindx) =
```

return (!bypass.search1(rindx)) ? rf.rd1(rindx)

: bypass.read1(rindx);

method Data rd2(RIndx rindx) =

return (!bypass.search2(rindx)) ? rf.rd2(rindx)

: bypass.read2(rindx);

endmodule

October 16, 2017



rf

move

### Scoreboard implementation using searchable Fifos

```
function Bool isFound
        (Maybe#(RIndx) dst, Maybe#(RIndx) src);
  return isValid(dst) && isValid(src) &&
             (fromMaybe(?,dst) == fromMaybe(?,src));
endfunction
module mkCFScoreboard(Scoreboard#(size));
  SFifo#(size, Maybe#(RIndx), Maybe#(RIndx))
      f <- mkCFSFifo(isFound);</pre>
  method insert = f.enq;
  method remove = f.deq;
  method search1 = f.search1;
  method search2 = f.search2;
endmodule
```