#### Constructive Computer Architecture:

#### **Control Hazards**

Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

October 8, 2014

http://csg.csail.mit.edu/6.175

L12-1

### Killing fetched instructions

- In the simple design with combinational memory we have discussed so far, the mispredicted instruction was present in the f2d. So the Execute stage can atomically
  - Clear the f2d
  - Set the pc to the correct target

In highly pipelined machines there can be multiple mispredicted and partially executed instructions in the pipeline; it will generally take more than one cycle to kill all such instructions

Need a more general solution then clearing the f2d FIFO

October 8, 2014

# Epoch: a method for managing control hazards

- Add an epoch register in the processor state
- The Execute stage changes the epoch whenever the pc prediction is wrong and sets the pc to the correct value
- The Fetch stage associates the current epoch with every instruction when it is fetched
- The epoch of the instruction is examined when it is ready to execute. If the processor epoch has changed the instruction is thrown away



#### An epoch based solution

| rule doFetch ;                   | Can these rules execute concurrently?     |
|----------------------------------|-------------------------------------------|
| <b>let</b> instF=iMem.req(       | pc[0]);                                   |
| <b>let</b> ppcF=nextAddr(p       | c[0]); pc[0]<=ppcF;                       |
| f2d.enq(Fetch2Decod              | e{pc:pc[0],ppc:ppcF,epoch:epoch, yes      |
|                                  | <pre>inst:instF});</pre>                  |
| endrule                          | two values for epoch are sufficient       |
| rule doExecute;                  |                                           |
| <pre>let x=f2d.first; l</pre>    | <pre>et pcD=x.pc; let inEp=x.epoch;</pre> |
| <pre>let ppcD = x.ppc;</pre>     | <pre>let instD = x.inst;</pre>            |
| <pre>if(inEp == epoch) 1</pre>   | begin                                     |
| <b>let</b> dInst = deco          | de(instD); register fetch;                |
| <b>let</b> eInst = exec          | (dInst, rVal1, rVal2, pcD, ppcD);         |
| memory operat                    | ion                                       |
| rf update                        |                                           |
| <pre>if (eInst.mispredict)</pre> |                                           |
| pc[1] <= eIn                     | st.addr; epoch <= epoch + 1; end          |
|                                  | end                                       |
| f2d.deq; endrule                 |                                           |
| October 8, 2014 ht               | tp://csg.csail.mit.edu/6.175 L12-4        |

#### Discussion

- Epoch based solution kills one wrong-path instruction at a time in the execute stage
- It may be slow, but it is more robust in more complex pipelines, if you have multiple stages between fetch and execute or if you have outstanding instruction requests to the iMem
- It requires the Execute stage to set the pc and epoch registers simultaneously which may result in a long combinational path from Execute to Fetch

#### **Decoupled Fetch and Execute**



 In decoupled systems a subsystem reads and modifies only local state atomically

 In our solution, pc and epoch are read by both rules

 Properly decoupled systems permit greater freedom in independent refinement of subsystems

October 8, 2014

http://csg.csail.mit.edu/6.175

### A decoupled solution using epochs

#### fEpoch fetch

eEpoch execute

- Add fEpoch and eEpoch registers to the processor state; initialize them to the same value
- The epoch changes whenever Execute detects the pc prediction to be wrong. This change is reflected immediately in eEpoch and eventually in fEpoch via a message from Execute to Fetch



Associate the fEpoch with every instruction when it is fetched

In the execute stage, reject, i.e., kill, the instruction if its epoch does not match eEpoch

#### **Control Hazard resolution**

A robust two-rule solution



October 8, 2014

### Two-stage pipeline Decoupled code structure

#### module mkProc(Proc);

Fifo#(Fetch2Execute) f2d <- mkFifo; Fifo#(Addr) execRedirect <- mkFifo; Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False);</pre>

```
rule doFetch;
    let instF = iMem.req(pc);
    . . .
    f2d.eng(... instF ..., fEpoch);
  endrule
  rule doExecute;
    if(inEp == eEpoch) begin
      Decode and execute the instruction; update state;
      In case of misprediction, execRedirect.eng(correct pc);
                                   end
    f2d.deq;
  endrule
endmodule
```

October 8, 2014

#### The Fetch rule



L12-10

### The Execute rule

exec returns a flag if there was a fetch misprediction

rule doExecute;

let instD = f2d.first.inst; let pcF = f2d.first.pc; = f2d.first.epoch; **let** ppcD = f2d.first.ppc; **let** inEp if(inEp == eEpoch) begin **let** dInst = decode(instD); let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); let eInst = exec(dInst, rVal1, rVal2, pcD, ppcD); if (eInst.iType == Ld) eInst.data <dMem.req(MemReq{op: Ld, addr: eInst.addr, data: ?}); else if (eInst.iType == St) let d <dMem.req(MemReq{op: St, addr: eInst.addr, data: eInst.data}); if (isValid(eInst.dst)) rf.wr(validRegValue(eInst.dst), eInst.data); **if**(eInst.mispredict) **begin** execRedirect.eng(eInst.addr); eEpoch <= !inEp;</pre> end Can these rules execute concurrently? end f2d.deq; yes, assuming CF FIFOs endrule

October 8, 2014

http://csg.csail.mit.edu/6.175

#### Epoch mechanism is independent of the branch prediction scheme used. We will study sophisticated branch prediction schemes later

October 8, 2014

### Consider a different twostage pipeline



Suppose we move the pipeline stage from Fetch to after Decode and Register fetch for a better balance of work in two stages

Pipeline will still have control hazards

#### A different 2-Stage pipeline: 2-Stage-DH pipeline

Fetch, Decode, RegisterFetch Execute, Memory, WriteBack



#### Type Decode2Execute

The Fetch stage, in addition to fetching the instruction, also decodes the instruction and fetches the operands from the register file. It passes these operands to the Execute stage

typedef struct {

Addr pc; Addr ppc; Bool epoch; DecodedInst dInst; Data rVal1; Data rVal2; } Decode2Execute deriving (Bits, Eq);

values instead of register names

#### 2-Stage-DH pipeline

#### module mkProc(Proc);

| Reg#(Addr) | pc <-   | mkRegU;    |
|------------|---------|------------|
| RFile      | rf <-   | mkRFile;   |
| IMemory    | iMem <- | mkIMemory; |
| DMemory    | dMem <- | mkDMemory; |

Fifo#(Decode2Execute) d2e <- mkFifo;</pre>

Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkFifo;</pre>

rule doFetch ...
rule doExecute ...

### 2-Stage-DH pipeline doFetch rule *first attempt*

```
rule doFetch;
         let instF = iMem.req(pc);
         if (execRedirect.notEmpty) begin
           fEpoch <= !fEpoch; pc <= execRedirect.first;</pre>
           execRedirect.deq; end
         else
         begin
           let ppcF = nextAddrPredictor(pc); pc <= ppcF;</pre>
                                                              moved
           let dInst = decode(instF);
                                                              from
           let rVal1 = rf.rd1(validRegValue(dInst.src1));
           let rVal2 = rf.rd2(validRegValue(dInst.src2));
                                                              Execute
           d2e.enq(Decode2Execute{pc: pc, ppc: ppcF,
                    dlinst: dlnst, epoch: fEpoch,
                    rVal1: rVal1, rVal2: rVal2});
         end
     endrule
October 8, 2014
                          http://csg.csail.mit.edu/6.175
                                                                   L12 - 17
```

### 2-Stage-DH pipeline doExecute rule *first attempt*



#### Data Hazards



 $I_{1} = Add(R1,R2,R3)$   $I_{2} = Add(R4,R1,R2)$   $I_{2} = Mathematical I_{1} = I_{1} =$ 

timet0t1t2t3t4t5t6t7. . .FDstage $FD_1$  $FD_2$  $FD_2$  $FD_3$  $FD_4$  $FD_5$ . . .EXstage $EX_1$  $EX_2$  $EX_3$  $EX_4$  $EX_5$ 

#### Dealing with data hazards

- Keep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard
- Stall the Fetch from dispatching the instruction as long as RAW hazard prevails
- RAW hazard will disappear as the pipeline drains

Scoreboard: A data structure to keep track of the instructions in the pipeline beyond the Fetch stage

#### Data Hazard

Data hazard depends upon the match between the source registers of the fetched instruction and the destination register of an instruction already in the pipeline

 Both the source and destination registers must be Valid for a hazard to exist

## Scoreboard: Keeping track of instructions in execution

- Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage
  - method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded
  - method search1(src): searches the scoreboard for a data hazard
  - method search2(src): same as search1
  - method remove: deletes the oldest entry when an instruction commits

#### 2-Stage-DH pipeline: Scoreboard and Stall logic

