#### Microprogramming

#### Joel Emer, Computer Science and Artificial Intelligence Laboratory, M.I.T.

http://www.csg.csail.mit.edu/6.823

# ISA to Microarchitecture Mapping

- An ISA is often designed for a particular microarchitectural style, e.g.,
  - CISC  $\Rightarrow$  microcoded
  - RISC  $\Rightarrow$  hardwired, pipelined
  - VLIW  $\Rightarrow$  fixed latency in-order pipelines
  - JVM  $\Rightarrow$  software interpretation
- But an ISA can be implemented in any microarchitectural style
  - Core i7: hardwired pipelined CISC (x86) machine (with some microcode support)
  - This lecture: a microcoded RISC (MIPS) machine
  - Current IA-64 processors are hardwired, in-order pipelines
  - PicoJava: A hardware JVM processor

## Microarchitecture: Implementation of an ISA



*Behavior (dynamic):* how data moves between components

February 24, 2014


## Microcontrol Unit (Maurice Wilkes, 1954)

Embed the control logic state table in a memory array




# Microcoded Microarchitecture




# **MIPS Instruction Formats**



# A Bus-based Datapath for MIPS



# Memory Module



#### Assumption: Memory operates asynchronously and is slow as compared to Reg-to-Reg transfers


Execution of a MIPS instruction involves

1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)

plus the computation of the next instruction address

## Microprogram Fragments

instr fetch:

 $\begin{array}{l} \mathsf{MA} \leftarrow \mathsf{PC} \\ \mathsf{A} \leftarrow \mathsf{PC} \\ \mathsf{IR} \leftarrow \mathsf{Memory} \\ \mathsf{PC} \leftarrow \mathsf{A} + 4 \\ \mathsf{dispatch} \text{ on Opcode} \end{array}$ 

can be treated as a macro

ALU:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{Reg}[\mathsf{rs}] \\ \mathsf{B} \leftarrow \mathsf{Reg}[\mathsf{rt}] \\ \mathsf{Reg}[\mathsf{rd}] \leftarrow \mathsf{func}(\mathsf{A},\mathsf{B}) \\ \textit{do instruction fetch} \end{array}$ 

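ALUi:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{Reg}[\mathsf{rs}] \\ \mathsf{B} \leftarrow \mathsf{Imm} \quad \text{(sign extension ...)} \\ \mathsf{Reg}[\mathsf{rt}] \leftarrow \mathsf{Opcode}(\mathsf{A},\mathsf{B}) \\ \textit{do instruction fetch} \end{array}$ 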


## Microprogram Fragments (cont.)

LW:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{Reg}[\mathsf{rs}] \\ \mathsf{B} \leftarrow \mathsf{Imm} \\ \mathsf{MA} \leftarrow \mathsf{A} + \mathsf{B} \\ \mathsf{Reg}[\mathsf{rt}] \leftarrow \mathsf{Memory} \\ \textit{do instruction fetch} \end{array}$ 

J:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{PC} \\ \mathsf{B} \leftarrow \mathsf{IR} \\ \mathsf{PC} \leftarrow \mathsf{JumpTarg}(\mathsf{A},\mathsf{B}) \\ \textit{do instruction fetch} \end{array}$ 

where JumpTarg(A,B) = {A[31:28], B[25:0], 00}

beqz:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{Reg}[\mathsf{rs}] \\ \text{If zero?}(\mathsf{A}) \text{ then go to bz-taken} \\ \textit{do instruction fetch} \end{array}$ 

bz-taken:

 $\begin{array}{l} \mathsf{A} \leftarrow \mathsf{PC} \\ \mathsf{B} \leftarrow \mathsf{Imm} << 2 \\ \mathsf{PC} \leftarrow \mathsf{A} + \mathsf{B} \\ \textit{do instruction fetch} \end{array}$ 

Sanchez and Emer

# MIPS Microcontroller: first attempt



# Microprogram in the ROM worksheet

| State              | Op  | zero? | busy | Control points                 | next-state         |
|--------------------|-----|-------|------|--------------------------------|--------------------|
| fetch<sub>0</sub>  | *   | *     | *    | $MA \leftarrow PC$             | fetch<sub>1</sub>  |
| fetch<sub>1</sub>  | *   | *     | yes  |                                |                    |
| fetch<sub>1</sub>  | *   | *     | no   | $IR \leftarrow Memory$         | fetch<sub>2</sub>  |
| fetch<sub>2</sub>  | *   | *     | *    | $A \leftarrow PC$              | fetch<sub>3</sub>  |
| fetch<sub>3</sub>  | *   | *     | *    | $PC \leftarrow A + 4$          | ?                  |
| fetch<sub>3</sub>  | ALU | *     | *    | $PC \leftarrow A + 4$          | ALU<sub>0</sub>    |
|                    |     |       |      |                                |                    |
| ALU<sub>0</sub>    | *   | *     | *    | $A \leftarrow Reg[rs]$         | ALU<sub>1</sub>    |
| ALU<sub>1</sub>    | *   | *     | *    | $B \leftarrow Reg[rt]$         | ALU<sub>2</sub>    |
| ALU<sub>2</sub>    | *   | *     | *    | $Reg[rd] \leftarrow func(A,B)$ | fetch<sub>0</sub>  |

# Microprogram in the ROM

| State              | Op   | zero? | busy | Control points                 | next-state         |
|--------------------|------|-------|------|--------------------------------|--------------------|
| fetch<sub>0</sub>  | *    | *     | *    | $MA \leftarrow PC$             | fetch<sub>1</sub>  |
| fetch<sub>1</sub>  | *    | *     | yes  |                                | fetch<sub>1</sub>  |
| fetch<sub>1</sub>  | *    | *     | no   | $IR \leftarrow Memory$         | fetch<sub>2</sub>  |
| fetch<sub>2</sub>  | *    | *     | *    | $A \leftarrow PC$              | fetch<sub>3</sub>  |
| fetch<sub>3</sub>  | ALU  | *     | *    | $PC \leftarrow A + 4$          | ALU<sub>0</sub>    |
| fetch<sub>3</sub>  | ALUi | *     | *    | $PC \leftarrow A + 4$          | ALUi<sub>0</sub>   |
| fetch<sub>3</sub>  | LW   | *     | *    | $PC \leftarrow A + 4$          | LW<sub>0</sub>     |
| fetch<sub>3</sub>  | SW   | *     | *    | $PC \leftarrow A + 4$          | SW<sub>0</sub>     |
| fetch<sub>3</sub>  | J    | *     | *    | $PC \leftarrow A + 4$          | J<sub>0</sub>      |
| fetch<sub>3</sub>  | JAL  | *     | *    | $PC \leftarrow A + 4$          | JAL<sub>0</sub>    |
| fetch<sub>3</sub>  | JR   | *     | *    | $PC \leftarrow A + 4$          | JR<sub>0</sub>     |
| fetch<sub>3</sub>  | JALR | *     | *    | $PC \leftarrow A + 4$          | JALR<sub>0</sub>   |
| fetch<sub>3</sub>  | beqz | *     | *    | $PC \leftarrow A + 4$          | beqz<sub>0</sub>   |
|                    |      |       |      |                                |                    |
| ALU<sub>0</sub>    | *    | *     | *    | $A \leftarrow Reg[rs]$         | ALU<sub>1</sub>    |
| ALU<sub>1</sub>    | *    | *     | *    | $B \leftarrow Reg[rt]$         | ALU<sub>2</sub>    |
| ALU<sub>2</sub>    | *    | *     | *    | $Reg[rd] \leftarrow func(A,B)$ | fetch<sub>0</sub>  |


# Microprogram in the ROM Cont.

| State             | Op   | zero? | busy | Control points                  | next-state         |
|-------------------|------|-------|------|---------------------------------|--------------------|
| ALUi<sub>0</sub>  | *    | *     | *    | $A \leftarrow Reg[rs]$          | ALUi<sub>1</sub>   |
| ALUi<sub>1</sub>  | sExt | *     | *    | $B \leftarrow sExt_{16}(Imm)$   | ALUi<sub>2</sub>   |
| ALUi<sub>1</sub>  | uExt | *     | *    | $B \leftarrow uExt_{16}(Imm)$   | ALUi<sub>2</sub>   |
| ALUi<sub>2</sub>  | *    | *     | *    | $Reg[rt] \leftarrow Op(A,B)$    | fetch<sub>0</sub>  |
|                   |      |       |      |                                 |                    |
| J<sub>0</sub>     | *    | *     | *    | $A \leftarrow PC$               | J<sub>1</sub>      |
| J<sub>1</sub>     | *    | *     | *    | $B \leftarrow IR$               | J<sub>2</sub>      |
| J<sub>2</sub>     | *    | *     | *    | $PC \leftarrow JumpTarg(A,B)$   | fetch<sub>0</sub>  |
|                   |      |       |      |                                 |                    |
| beqz<sub>0</sub>  | *    | *     | *    | $A \leftarrow Reg[rs]$          | beqz<sub>1</sub>   |
| beqz<sub>1</sub>  | *    | yes   | *    | $A \leftarrow PC$               | beqz<sub>2</sub>   |
| beqz<sub>1</sub>  | *    | no    | *    | ....                            | fetch<sub>0</sub>  |
| beqz<sub>2</sub>  | *    | *     | *    | $B \leftarrow sExt_{16}(Imm)$   | beqz<sub>3</sub>   |
| beqz<sub>3</sub>  | *    | *     | *    | $PC \leftarrow A + B$           | fetch<sub>0</sub>  |

 $JumpTarg(A,B) = \{A[31:28], B[25:0], 00\}$ 
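The concatenation in JumpTarg can be sketched as plain bit manipulation. This is a minimal illustration, not part of the lecture's datapath: the function name `jump_targ` and the example PC and IR values are hypothetical.

```python
def jump_targ(a: int, b: int) -> int:
    """Form {A[31:28], B[25:0], 00} as a 32-bit value."""
    upper = a & 0xF0000000         # A[31:28], kept in place
    index = (b & 0x03FFFFFF) << 2  # B[25:0] followed by two zero bits
    return upper | index


# Hypothetical values: A holds the PC, B holds the fetched J instruction.
pc = 0x10000008
ir = 0x08000010  # opcode field 000010, 26-bit index field 0x10
print(hex(jump_targ(pc, ir)))  # prints 0x10000040
```

The top four bits come from the PC, so a J instruction can only reach targets inside the current 256 MB region.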

. . .


# Size of Control Store




# **Reducing Control Store Size**

#### Control store has to be *fast* $\Rightarrow$ *expensive*

- Reduce the ROM height (= address bits)
  - reduce inputs by extra external logic: each input bit doubles the size of the control store
  - reduce states by grouping opcodes: find common sequences of actions
  - condense input status bits: combine all exceptions into one, i.e., exception/no-exception
- Reduce the ROM width
  - restrict the next-state encoding
    - Next, Dispatch on opcode, Wait for memory, ...
  - encode control signals (vertical microcode)
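To make the height/width trade-off concrete, here is a rough sizing sketch. It assumes the ROM is addressed by the status/opcode inputs concatenated with the current state, and that each word holds the control signals plus the next state. The function name and the bit counts in the example (8 input bits, 6 state bits) are illustrative assumptions; 17 is the control-signal count quoted for the bus-based datapath.

```python
def control_store_bits(input_bits: int, state_bits: int, control_bits: int) -> int:
    """Total ROM bits = height (2^address bits) x width (bits per word)."""
    address_bits = input_bits + state_bits  # ROM height is 2^address_bits
    width = control_bits + state_bits       # control signals + next-state field
    return (2 ** address_bits) * width


# Hypothetical configuration: 6 opcode bits + zero? + busy = 8 input bits.
base = control_store_bits(input_bits=8, state_bits=6, control_bits=17)
one_more_input = control_store_bits(input_bits=9, state_bits=6, control_bits=17)
print(base)            # 376832 bits
print(one_more_input)  # 753664 bits: one extra input bit doubles the store
```

Under this model, removing an input bit halves the store, while narrowing the next-state field only trims the width.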

# MIPS Controller V2



# Jump Logic

μPCSrc = *Case* μJumpTypes

| μJumpType | μPCSrc                                  |
|-----------|-----------------------------------------|
| next      | μPC+1                                   |
| spin      | if (busy) then μPC else μPC+1           |
| fetch     | absolute                                |
| dispatch  | op-group                                |
| feqz      | if (zero) then absolute else μPC+1      |
| fnez      | if (zero) then μPC+1 else absolute      |
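The case table above can be read as a small next-μPC function. The sketch below is illustrative only: the function name `next_upc` and its parameters are assumptions, with `absolute` standing for the absolute target address and `op_group` for the address selected by the opcode dispatch.

```python
def next_upc(jump_type: str, upc: int, busy: bool, zero: bool,
             absolute: int, op_group: int) -> int:
    """Select the next microprogram counter value from the jump type."""
    if jump_type == "next":
        return upc + 1
    if jump_type == "spin":
        return upc if busy else upc + 1      # wait while memory is busy
    if jump_type == "fetch":
        return absolute                      # absolute jump to instruction fetch
    if jump_type == "dispatch":
        return op_group                      # entry point chosen by the opcode
    if jump_type == "feqz":
        return absolute if zero else upc + 1
    if jump_type == "fnez":
        return upc + 1 if zero else absolute
    raise ValueError(f"unknown jump type: {jump_type}")


# While memory is busy, a spin state holds the microprogram counter in place.
print(next_upc("spin", upc=1, busy=True, zero=False, absolute=0, op_group=4))   # 1
print(next_upc("spin", upc=1, busy=False, zero=False, absolute=0, op_group=4))  # 2
```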


## Instruction Fetch & ALU: MIPS-Controller-2

| State              | Control points                  | next-state |
|--------------------|---------------------------------|------------|
| fetch<sub>0</sub>  | $MA \leftarrow PC$              | next       |
| fetch<sub>1</sub>  | $IR \leftarrow Memory$          | spin       |
| fetch<sub>2</sub>  | $A \leftarrow PC$               | next       |
| fetch<sub>3</sub>  | $PC \leftarrow A + 4$           | dispatch   |
|                    |                                 |            |
| ALU<sub>0</sub>    | $A \leftarrow Reg[rs]$          | next       |
| ALU<sub>1</sub>    | $B \leftarrow Reg[rt]$          | next       |
| ALU<sub>2</sub>    | $Reg[rd] \leftarrow func(A,B)$  | fetch      |
|                    |                                 |            |
| ALUi<sub>0</sub>   | $A \leftarrow Reg[rs]$          | next       |
| ALUi<sub>1</sub>   | $B \leftarrow sExt_{16}(Imm)$   | next       |
| ALUi<sub>2</sub>   | $Reg[rt] \leftarrow Op(A,B)$    | fetch      |

#### Load & Store: MIPS-Controller-2

| State           | Control points                  | next-state |
|-----------------|---------------------------------|------------|
| LW<sub>0</sub>  | $A \leftarrow Reg[rs]$          | next       |
| LW<sub>1</sub>  | $B \leftarrow sExt_{16}(Imm)$   | next       |
| LW<sub>2</sub>  | $MA \leftarrow A + B$           | next       |
| LW<sub>3</sub>  | $Reg[rt] \leftarrow Memory$     | spin       |
| LW<sub>4</sub>  |                                 | fetch      |
|                 |                                 |            |
| SW<sub>0</sub>  | $A \leftarrow Reg[rs]$          | next       |
| SW<sub>1</sub>  | $B \leftarrow sExt_{16}(Imm)$   | next       |
| SW<sub>2</sub>  | $MA \leftarrow A + B$           | next       |
| SW<sub>3</sub>  | $Memory \leftarrow Reg[rt]$     | spin       |
| SW<sub>4</sub>  |                                 | fetch      |
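As a rough cycle count for a load under this controller, the sketch below walks the fetch and LW microprograms and charges extra cycles at every spin state. The helper `count_microcycles` and its `busy_cycles` parameter are hypothetical: the model assumes each memory access keeps busy asserted for a fixed number of cycles, which the slides do not specify.

```python
# (control points, jump type) pairs transcribed from the Controller-2 tables.
FETCH = [
    ("MA <- PC", "next"),
    ("IR <- Memory", "spin"),
    ("A <- PC", "next"),
    ("PC <- A + 4", "dispatch"),
]
LW = [
    ("A <- Reg[rs]", "next"),
    ("B <- sExt16(Imm)", "next"),
    ("MA <- A + B", "next"),
    ("Reg[rt] <- Memory", "spin"),
    ("", "fetch"),
]


def count_microcycles(program, busy_cycles: int) -> int:
    """Count cycles for one pass, repeating each spin state while memory is busy."""
    cycles = 0
    for _control_points, jump_type in program:
        cycles += 1                # every microinstruction takes one cycle
        if jump_type == "spin":
            cycles += busy_cycles  # assumed fixed wait per memory access
    return cycles


print(count_microcycles(FETCH + LW, busy_cycles=0))  # 9
print(count_microcycles(FETCH + LW, busy_cycles=3))  # 15
```

Even with an ideal memory, the load costs nine microcycles here, which is why the cycle time of the microcoded machine matters so much.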

#### Branches: MIPS-Controller-2

| State             | Control points                     | next-state |
|-------------------|------------------------------------|------------|
| BEQZ<sub>0</sub>  | $A \leftarrow Reg[rs]$             | next       |
| BEQZ<sub>1</sub>  |                                    | fnez       |
| BEQZ<sub>2</sub>  | $A \leftarrow PC$                  | next       |
| BEQZ<sub>3</sub>  | $B \leftarrow sExt_{16}(Imm<<2)$   | next       |
| BEQZ<sub>4</sub>  | $PC \leftarrow A + B$              | fetch      |
|                   |                                    |            |
| BNEZ<sub>0</sub>  | $A \leftarrow Reg[rs]$             | next       |
| BNEZ<sub>1</sub>  |                                    | feqz       |
| BNEZ<sub>2</sub>  | $A \leftarrow PC$                  | next       |
| BNEZ<sub>3</sub>  | $B \leftarrow sExt_{16}(Imm<<2)$   | next       |
| BNEZ<sub>4</sub>  | $PC \leftarrow A + B$              | fetch      |

#### Jumps: MIPS-Controller-2

| State             | Control points                  | next-state |
|-------------------|---------------------------------|------------|
| J<sub>0</sub>     | $A \leftarrow PC$               | next       |
| J<sub>1</sub>     | $B \leftarrow IR$               | next       |
| J<sub>2</sub>     | $PC \leftarrow JumpTarg(A,B)$   | fetch      |
|                   |                                 |            |
| JR<sub>0</sub>    | $A \leftarrow Reg[rs]$          | next       |
| JR<sub>1</sub>    | $PC \leftarrow A$               | fetch      |
|                   |                                 |            |
| JAL<sub>0</sub>   | $A \leftarrow PC$               | next       |
| JAL<sub>1</sub>   | $Reg[31] \leftarrow A$          | next       |
| JAL<sub>2</sub>   | $B \leftarrow IR$               | next       |
| JAL<sub>3</sub>   | $PC \leftarrow JumpTarg(A,B)$   | fetch      |
|                   |                                 |            |
| JALR<sub>0</sub>  | $A \leftarrow PC$               | next       |
| JALR<sub>1</sub>  | $B \leftarrow Reg[rs]$          | next       |
| JALR<sub>2</sub>  | $Reg[31] \leftarrow A$          | next       |
| JALR<sub>3</sub>  | $PC \leftarrow B$               | fetch      |


# **Implementing Complex Instructions**



Why is microprogramming good for complex instructions?

Amortize fetch cost, allow more operation parallelism


# **Complex Instructions**

 $\begin{array}{l} \textit{Reg-Memory-src ALU op:} \\ \textit{rd} \leftarrow \mathsf{M}[(\texttt{rs})] \ \texttt{op} \ (\texttt{rt}) \\ \hline \textit{Reg-Memory-dst ALU op:} \\ \textit{M}[(\texttt{rd})] \leftarrow (\texttt{rs}) \ \texttt{op} \ (\texttt{rt}) \\ \hline \textit{Mem-Mem ALU op:} \\ \textit{M}[(\texttt{rd})] \leftarrow \mathsf{M}[(\texttt{rs})] \ \texttt{op} \ \mathsf{M}[(\texttt{rt})] \\ \hline \textit{String instructions:} \\ \textit{M}[(\texttt{rd}):(\texttt{rd})+\texttt{rc}] \leftarrow \mathsf{M}[(\texttt{rs}):(\texttt{rs})+\texttt{rc}] \ \texttt{op} \ \mathsf{M}[(\texttt{rt}):(\texttt{rt})+\texttt{rc}] \\ \end{array}$ 

Complex instructions usually do not require datapath modifications in a microprogrammed implementation -- only extra space for the control program

Implementing these instructions using a hardwired controller is difficult without datapath modifications


#### Mem-Mem ALU Instructions: MIPS-Controller-2

Mem-Mem ALU op: $M[(rd)] \leftarrow M[(rs)] \text{ op } M[(rt)]$

| State              | Control points                   | next-state |
|--------------------|----------------------------------|------------|
| ALUMM<sub>0</sub>  | $MA \leftarrow Reg[rs]$          | next       |
| ALUMM<sub>1</sub>  | $A \leftarrow Memory$            | spin       |
| ALUMM<sub>2</sub>  | $MA \leftarrow Reg[rt]$          | next       |
| ALUMM<sub>3</sub>  | $B \leftarrow Memory$            | spin       |
| ALUMM<sub>4</sub>  | $MA \leftarrow Reg[rd]$          | next       |
| ALUMM<sub>5</sub>  | $Memory \leftarrow func(A,B)$    | spin       |
| ALUMM<sub>6</sub>  |                                  | fetch      |

## **Performance Issues**

Microprogrammed control $\Rightarrow$ multiple cycles per instruction

Cycle time?

 $t_{C} > \max(t_{reg-reg}, t_{ALU}, t_{\mu ROM}, t_{RAM})$ 

Given complex control,  $t_{ALU}$  &  $t_{RAM}$  can be broken into multiple cycles. However,  $t_{\mu ROM}$  cannot be broken down. Hence

 $t_{C} > \max(t_{reg-reg}, t_{\mu ROM})$ 

Suppose  $10 \times t_{\mu ROM} < t_{RAM}$. Then good performance, relative to the single-cycle hardwired implementation, can be achieved even with a CPI of 10.
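The comparison can be spelled out as a short worked inequality. It rests on two assumptions that the slide leaves implicit: the single-cycle hardwired machine must fit a memory access into its cycle, so its cycle time is at least $t_{RAM}$, and in the microcoded machine $t_{reg-reg} \le t_{\mu ROM}$, so its cycle time is close to $t_{\mu ROM}$. The symbols $T_{\mu coded}$ and $T_{single}$ (time per instruction) are introduced here only for illustration.

 $T_{\mu coded} = \text{CPI} \times t_{C} \approx 10 \times t_{\mu ROM} < t_{RAM} \le T_{single}$ 

So even at ten microcycles per instruction, the microcoded machine spends less time per instruction than the single-cycle design under these assumptions.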


## VAX 11-780 Microcode

[Figure: excerpt from the VAX 11-780 microcode listing for the procedure call instructions CALLG and CALLS. After probing the extent of the stack, the microcode strips the register mask to bits 11-0, pushes the registers, PC, FP, and AP onto the stack, then loads the new PC and starts fetching the subroutine.]


## Some more history ...

- IBM 360
- Microcoding through the seventies
- Microcoding now

# Microprogramming in IBM 360

|                         | M30   | M40   | M50   | M65   |
|-------------------------|-------|-------|-------|-------|
| Datapath width (bits)   | 8     | 16    | 32    | 64    |
| μinst width (bits)      | 50    | 52    | 85    | 87    |
| μcode size (K μinsts)   | 4     | 4     | 2.75  | 2.75  |
| μstore technology       | CCROS | TCROS | BCROS | BCROS |
| μstore cycle (ns)       | 750   | 625   | 500   | 200   |
| memory cycle (ns)       | 1500  | 2500  | 2000  | 750   |
| Rental fee (\$K/month)  | 4     | 7     | 15    | 35    |

Only the fastest models (75 and 95) were hardwired


# Microcode Emulation

- IBM initially miscalculated the importance of software compatibility with earlier models when introducing the 360 series
- Honeywell stole some IBM 1401 customers by offering translation software ("Liberator") for Honeywell H200 series machines
- IBM retaliated with optional additional microcode for 360 series that could emulate IBM 1401 ISA, later extended for IBM 7000 series
  - one popular program on 1401 was a 650 simulator, so some customers ran many 650 programs on emulated 1401s
    - (650 simulated on 1401 emulated on 360)


# Microprogramming thrived in 70's

- Significantly faster ROMs than DRAMs were available
- For complex instruction sets, datapath and controller were *cheaper and simpler*
- *New instructions*, e.g., floating point, could be supported without datapath modifications
- *Fixing bugs* in the controller was easier
- ISA compatibility across various models could be achieved easily and cheaply

#### Except for the cheapest and fastest machines, all computers were microprogrammed


# Horizontal vs Vertical µCode



[Figure axes: bits per μinstruction vs. number of μinstructions]

- Horizontal μcode has wider μinstructions
  - Multiple parallel operations per μinstruction
  - Fewer steps per macroinstruction
  - Sparser encoding  $\Rightarrow$  more bits
- Vertical μcode has narrower μinstructions
  - Typically a single datapath operation per μinstruction
    - separate μinstruction for branches
  - More steps per macroinstruction
  - More compact  $\Rightarrow$  fewer bits
- Nanocoding
  - Tries to combine best of horizontal and vertical μcode

# Nanocoding



- MC68000 had 17-bit μcode containing either 10-bit μjump or 9-bit nanoinstruction pointer
  - Nanoinstructions were 68 bits wide, decoded to give 196 control signals


#### Microprogramming: *early eighties*

- Evolution bred more complex micro-machines
  - Complex instruction sets led to the need for subroutine and call stacks in  $\mu$ code
  - Need for fixing bugs in control programs was in conflict with read-only nature of  $\mu$ ROM  $\Rightarrow$  WCS (B1700, QMachine, Intel432, ...)
- With the advent of VLSI technology, assumptions about ROM & RAM speed became invalid  $\Rightarrow$  more complexity
- Better compilers made complex instructions less important.
- Use of numerous micro-architectural innovations, e.g., pipelining, caches and buffers, made multiple-cycle execution of reg-reg instructions unattractive

# Microcode Pipelining

To compete against RISC pipelines, microcoded machines pipelined microcode execution




# Modern Usage

- *Microprogramming is far from extinct*
- Played a crucial role in micros of the Eighties
  - DEC uVAX, Motorola 68K series, Intel 386 and 486
- Microcode plays an assisting role in most modern CISC micros (AMD and Intel)
  - Most instructions are executed directly, i.e., with hard-wired control
  - Infrequently-used and/or complicated instructions invoke the microcode engine
- Patchable microcode is common for post-fabrication bug fixes, e.g., Intel Pentiums load μcode patches at bootup


# Writable Control Store (WCS)

- Implement control store with SRAM not ROM
  - MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)
  - Bug-free microprograms difficult to write
- User-WCS provided as option on several minicomputers
  - Allowed users to change microcode for each processor
- User-WCS failed
  - Little or no programming tools support
  - Difficult to fit software into small space
  - Microcode control tailored to original ISA, less useful for others
  - Large WCS part of processor state  $\Rightarrow$  expensive context switches
  - Protection difficult if user can change microcode
  - Virtual memory required *restartable* microcode


Thank you.


# A Bus-based Datapath for MIPS



Microinstruction: register to register transfer (17 control signals)
