## **Transient Side Channels**

Mengjia Yan Fall 2020

Based on slides from Christopher W. Fletcher





### Reminder

- 1st paper review due at midnight on 09/27 (before the next lecture)
- You will receive an invitation from HotCRP
  - https://mit-6888-fa20.hotcrp.com/

| Date       | Topic                                | Presenter | Reading |
|------------|--------------------------------------|-----------|---------|
| 9/28 (Mon) | Hardware to Enforce Non-interference | Mengjia   | Tiwari et al. Complete information flow tracking from the gates up. ASPLOS. 2009. Optional: Ferraiuolo et al. HyperFlow: A processor architecture for nonmalleable, timing-safe information flow security. CCS. 2018. |
| 9/30 (Wed) | Transient Execution Defenses         | Lindsey   | Yu et al. Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data. MICRO. 2019. Optional: Guarnieri et al. Hardware-Software Contracts for Secure Speculation. arXiv preprint. 2020. |

### Micro-architecture Side Channels

Figure: secret-dependent execution → a channel (a micro-architecture structure) → attacker.

Kiriansky et al. DAWG: A defense against cache timing attacks in speculative execution processors. MICRO. 2018.

## **Recap: 5-stage Pipeline**



## **5-stage Pipeline**



| time         | t0  | t1  | t2  | t3  | t4  | t5  | t6  | t7  | t8  |
|--------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| instruction1 | IF1 | ID1 | EX1 | MA1 | WB1 |     |     |     |     |
| instruction2 |     | IF2 | ID2 | EX2 | MA2 | WB2 |     |     |     |
| instruction3 |     |     | IF3 | ID3 | EX3 | MA3 | WB3 |     |     |
| instruction4 |     |     |     | IF4 | ID4 | EX4 | MA4 | WB4 |     |
| instruction5 |     |     |     |     | IF5 | ID5 | EX5 | MA5 | WB5 |

- In-order execution:
  - Instructions execute in program order
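The pipeline diagram follows one rule: in an ideal in-order pipeline with no stalls, instruction i enters IF in cycle t_i and advances one stage per cycle. A minimal C sketch of that rule, assuming no stalls or hazards; `stage_at` and `STAGES` are hypothetical names, and instructions are numbered from 0 (instruction1 is index 0).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STAGES 5

/* Stage names of the classic 5-stage pipeline. */
static const char *const STAGES[NUM_STAGES] = {"IF", "ID", "EX", "MA", "WB"};

/* Returns the stage that instruction `instr` (0-based, program order)
 * occupies in cycle `cycle`, or NULL if it is not in the pipeline then.
 * Assumes an ideal in-order pipeline: one new instruction per cycle and
 * no stalls, so instruction i enters IF in cycle i. */
static const char *stage_at(int instr, int cycle) {
    assert(instr >= 0 && cycle >= 0);
    int s = cycle - instr;
    if (s < 0 || s >= NUM_STAGES)
        return NULL;
    return STAGES[s];
}
```

For example, `stage_at(4, 8)` returns `"WB"`: instruction5 writes back in cycle t8.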

### **Data Hazard and Control Hazard**

```
Loop: ......
      LD(R1, 0, R2)      IF1 ID1 EX1 MA1 WB1
      ADD(R2, 10, R3)        IF2 ID2 EX2 MA2 WB2
      BNE(R3, Loop)              IF3 ID3 EX3 MA3 WB3
      ......
```

- Data hazard: ADD reads R2, which LD writes; BNE reads R3, which ADD writes
- Control hazard: which instruction to fetch after BNE is not known until the branch resolves
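To make the two hazard types concrete, here is a toy C model of the loop body. It assumes the destination register is the last operand (as the `LD`/`ADD` notation suggests) and treats literals as "no register"; the `Instr` encoding and function names are hypothetical, not a real ISA decoder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REG (-1)

/* Toy instruction encoding: source and destination register numbers. */
typedef struct {
    int src1, src2, dst;  /* NO_REG if the operand is absent or a literal */
    bool is_branch;       /* branches create control hazards */
} Instr;

/* Read-after-write (data) hazard: `later` reads a register `earlier` writes. */
static bool data_hazard(const Instr *earlier, const Instr *later) {
    assert(earlier != NULL && later != NULL);
    return earlier->dst != NO_REG &&
           (later->src1 == earlier->dst || later->src2 == earlier->dst);
}

/* Control hazard: the next PC depends on an unresolved branch. */
static bool control_hazard(const Instr *in) {
    assert(in != NULL);
    return in->is_branch;
}

/* The loop body, assuming the destination is the last operand. */
static const Instr LD  = {1, NO_REG, 2, false};      /* LD(R1, 0, R2)   */
static const Instr ADD = {2, NO_REG, 3, false};      /* ADD(R2, 10, R3) */
static const Instr BNE = {3, NO_REG, NO_REG, true};  /* BNE(R3, Loop)   */
```

In this model, `data_hazard(&LD, &ADD)` and `data_hazard(&ADD, &BNE)` are both true, and `control_hazard(&BNE)` is true.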

## **Resolving Hazards**

- Stall or bypass
- Speculation (e.g., branch predictor)
  - Guess a value and continue executing anyway
  - When the actual value is available, there are two cases:
    - Guessed correctly → do nothing
    - Guessed incorrectly → restart with the correct value (roll back)

### **Branch Predictor**

- Predict Taken/Not taken
  - Not taken: PC+4
  - Taken: need to know target address

- Predict target address
  - Branch target buffer (BTB)
  - Map <current PC, target PC>
- Use history information to set up the predictor
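One common way to "use history" is a table of 2-bit saturating counters for the taken/not-taken guess, plus a BTB for the target. The slides do not specify a scheme, so this C sketch assumes that textbook design; the table sizes, indexing, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PHT_ENTRIES 1024  /* hypothetical table sizes */
#define BTB_ENTRIES 512

/* 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken. */
static uint8_t pht[PHT_ENTRIES];

/* Branch target buffer: maps <current PC, target PC>. */
typedef struct {
    bool valid;
    uint32_t pc;      /* full PC kept as the tag, for simplicity */
    uint32_t target;
} BtbEntry;
static BtbEntry btb[BTB_ENTRIES];

static unsigned pht_index(uint32_t pc) { return (pc >> 2) % PHT_ENTRIES; }
static unsigned btb_index(uint32_t pc) { return (pc >> 2) % BTB_ENTRIES; }

/* Predict the next PC for the branch at `pc` (4-byte instructions assumed). */
static uint32_t predict_next_pc(uint32_t pc) {
    assert(pc % 4 == 0);
    bool taken = pht[pht_index(pc)] >= 2;
    const BtbEntry *e = &btb[btb_index(pc)];
    if (taken && e->valid && e->pc == pc)
        return e->target;  /* predicted taken and the target is known */
    return pc + 4;         /* predicted not taken, or no target known yet */
}

/* Train with the resolved outcome of the branch at `pc`. */
static void update_predictor(uint32_t pc, bool taken, uint32_t target) {
    uint8_t *ctr = &pht[pht_index(pc)];
    if (taken && *ctr < 3) (*ctr)++;
    if (!taken && *ctr > 0) (*ctr)--;
    if (taken) {
        BtbEntry *e = &btb[btb_index(pc)];
        e->valid = true;
        e->pc = pc;
        e->target = target;
    }
}
```

The 2-bit counter adds hysteresis: after three taken outcomes, a single not-taken outcome leaves the prediction at "taken".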

## **Complex In-order Pipeline**



## **Out-of-order Execution**

- When the pipeline is stalled, find something else to do
- When we execute out of order, we are speculating that older instructions do not cause exceptions
- If instruction n is speculative, then every younger instruction n+i (i > 0) is also speculative



## **Terminology**

A **speculative** instruction may be squashed.

When executed, it can change uArch state.

A **transient** instruction will be squashed, i.e., it will not commit.

A **non-transient** instruction will not be squashed, i.e., it will eventually retire.

That is, transient instructions are unreachable on a non-speculative microarchitecture.
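The distinction can be sketched as a toy C model of a reorder buffer (ROB). It assumes the only squash causes are an instruction that faults or a mispredicted branch, and that everything younger than such an instruction is squashed. Whether the faulting instruction itself counts as transient is a modeling choice; here it is not marked, because it reaches the ROB head and triggers the exception. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What each in-flight instruction will turn out to do once it resolves. */
typedef enum {
    RESOLVES_OK,      /* no exception, not a mispredicted branch */
    RAISES_FAULT,     /* e.g., a protection fault, handled at the ROB head */
    MISPREDICTED_BR   /* a branch whose prediction turns out to be wrong */
} Outcome;

/* rob[0] is the ROB head (oldest); entries follow program order.
 * Sets transient[i] to true if instruction i will be squashed, i.e., it
 * is younger than an instruction that faults or mispredicts. */
static void mark_transient(const Outcome *rob, size_t n, bool *transient) {
    assert(rob != NULL && transient != NULL);
    bool squash_younger = false;
    for (size_t i = 0; i < n; i++) {
        transient[i] = squash_younger;
        if (rob[i] != RESOLVES_OK)
            squash_younger = true;
    }
}
```

Before resolution, any entry behind an unresolved branch is speculative. Only the entries that `mark_transient` flags are transient.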

### **General Attack Schema**



- The difference between transient and non-transient side channels:
  - Whether the secret access or the transmitter executes transiently

# Meltdown & Spectre





## **Kernel/User Pages**

Figure: a process's virtual memory (starting at address 0x00000000) contains both kernel pages and user pages.

- In x86, a process's virtual address space includes kernel pages, but kernel pages are only accessible in kernel mode
  - For performance
  - Avoids switching page tables on every user/kernel transition (e.g., system calls and interrupts)

- What happens if user-mode code accesses a kernel address?
  - A protection fault

### Meltdown

- Problem: Speculative instructions can change uArch state, e.g., the cache
- Attack procedure:

1. Setup: Attacker allocates probe\_array with 256 cache lines and flushes all of them from the cache
2. Transmit: Attacker executes

```
Ld1: uint8_t byte = *kernel_address;
Ld2: uint8_t dummy = probe_array[byte*64];
```

Ld1 raises a protection fault, but exception handling is deferred until the instruction reaches the head of the ROB. Before that happens, Ld2 can execute transiently with the secret byte and bring line `byte` of probe\_array into the cache.

3. Receive: After handling the protection fault, attacker performs a cache side channel attack to figure out which line of probe\_array was accessed → recovers byte
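The three steps can be traced in a deterministic toy model. This is not an exploit: the "kernel byte" is an ordinary variable, the cache is a `bool` array, and the hit/miss latencies are made-up constants. A real receiver would time actual loads, for example with Flush+Reload. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_LINES 256
#define LINE_SIZE 64

/* Toy cache state for probe_array: line i is cached iff cached[i]. */
static bool cached[PROBE_LINES];

/* Hypothetical access latencies (cycles) seen by the receiver. */
enum { HIT_CYCLES = 40, MISS_CYCLES = 300 };

/* 1. Setup: flush every line of probe_array. */
static void flush_probe_array(void) {
    for (int i = 0; i < PROBE_LINES; i++)
        cached[i] = false;
}

/* Access probe_array[offset]; returns its latency and caches the line. */
static int probe_access(size_t offset) {
    size_t line = offset / LINE_SIZE;
    assert(line < PROBE_LINES);
    int latency = cached[line] ? HIT_CYCLES : MISS_CYCLES;
    cached[line] = true;
    return latency;
}

/* 2. Transmit: Ld1's permission check is deferred until it reaches the ROB
 * head, so Ld2 runs transiently with the secret byte. At the head, Ld1
 * faults and both loads are squashed, but the touched line stays cached. */
static void transmit(const uint8_t *kernel_address) {
    uint8_t byte = *kernel_address;                /* Ld1 */
    (void)probe_access((size_t)byte * LINE_SIZE);  /* Ld2 */
    /* The fault itself is not modeled; architectural results are discarded. */
}

/* 3. Receive: time one access per line; the only hit reveals the byte. */
static int receive(void) {
    for (int i = 0; i < PROBE_LINES; i++)
        if (probe_access((size_t)i * LINE_SIZE) == HIT_CYCLES)
            return i;
    return -1;
}
```

Calling `flush_probe_array()`, then `transmit()` on a byte holding 0x2A, then `receive()` returns 0x2A. Without the transmit step, every line misses and `receive()` returns -1.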

# **Meltdown Type Attacks**

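- Can be used to read arbitrary memory
- Leaks across privilege levels
  - OS ←→ Application
  - SGX ←→ Application (e.g., Foreshadow)
  - Etc.
- Mitigations:
  - Stall speculation: do not let instructions that depend on a faulting load execute until its permission check completes
  - Register poisoning: the faulting load forwards a dummy value, not the secret, to its dependents
- We generally consider it a design bug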
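Why both mitigations work can be reduced to one question: does the probe line that Ld2 touches depend on the secret? This toy C function answers that question for each policy. It assumes poisoning forwards the value 0, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { NO_MITIGATION, STALL_SPECULATION, REGISTER_POISONING } Policy;

/* Which probe_array line does Ld2 touch when Ld1 faults? -1 means none. */
static int line_touched(uint8_t secret, Policy policy) {
    switch (policy) {
    case NO_MITIGATION:
        return secret;  /* Ld1 forwards the real data to Ld2 */
    case STALL_SPECULATION:
        return -1;      /* Ld2 waits for Ld1's check; Ld1 faults, so Ld2 never runs */
    case REGISTER_POISONING:
        return 0;       /* Ld1 forwards a dummy value (assumed to be 0) */
    }
    assert(!"unknown policy");
    return -1;
}
```

With either mitigation, the touched line is the same for every secret, so the receive step learns nothing.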

# **Spectre Variant 1 – Bounds Check Bypass**

Consider the following kernel code, e.g., in a system call:

```
Br:  if (x < size_array1) {
Ld1:     secret = array1[x]*64;
Ld2:     y = array2[secret];
     }
```



An attacker can use this code to read arbitrary memory:

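1. Setup: Train the branch predictor so that Br is predicted taken (e.g., run the code repeatedly with in-bounds x)
2. Transmit: Trigger a branch misprediction with an out-of-bounds x chosen so that &array1[x] maps to the desired kernel address; Ld1 and Ld2 then execute transiently
3. Receive: Attacker probes the cache to infer which line of array2 was fetched → recovers array1[x]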
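The three steps can be replayed in a deterministic toy model. The branch predictor is a single 2-bit counter, the "kernel" secret sits in a small `memory` array just past array1, and the cache is one flag per array2 line. These are all simplifying assumptions, not how real hardware is organized, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_BYTES 64       /* toy address space: array1, then other (secret) data */
#define ARRAY2_LINES 256

static uint8_t memory[MEM_BYTES];         /* array1 occupies memory[0..15] */
static const size_t size_array1 = 16;
static bool array2_cached[ARRAY2_LINES];  /* toy cache state, one flag per line */
static int br_counter = 0;                /* 2-bit counter predicting Br */

/* Victim code. When Br is predicted taken but x is out of bounds, Ld1/Ld2
 * run transiently: their results are squashed, but the array2 line they
 * touched stays cached. */
static void victim(size_t x) {
    bool predicted = br_counter >= 2;
    bool actual = x < size_array1;
    if (predicted || actual) {
        assert(x < MEM_BYTES);          /* stay inside the toy memory */
        uint8_t value = memory[x];      /* Ld1: array1[x] */
        array2_cached[value] = true;    /* Ld2: array2[value*64] is line `value` */
    }
    if (actual && br_counter < 3) br_counter++;  /* Br resolves: train counter */
    if (!actual && br_counter > 0) br_counter--;
}

static int spectre_v1(size_t malicious_x) {
    for (size_t i = 0; i < 8; i++)      /* 1. Setup: train Br as taken */
        victim(i % size_array1);
    for (int i = 0; i < ARRAY2_LINES; i++)
        array2_cached[i] = false;       /* flush array2 */
    victim(malicious_x);                /* 2. Transmit: Br is mispredicted */
    for (int i = 0; i < ARRAY2_LINES; i++)  /* 3. Receive */
        if (array2_cached[i])
            return i;
    return -1;
}
```

If `memory[40]` holds 0x99, `spectre_v1(40)` returns 0x99 even though the bounds check on x = 40 fails architecturally.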


# **Spectre Variant 2 – Exploit Branch Target**

- Most BTBs store partial tags and targets:
  - <last n bits of current PC, target PC>
  - Branches whose PCs share the last n bits alias to the same BTB entry

Train the BTB with an aliasing branch → the victim's indirect branch is predicted to jump to an attacker-chosen gadget, which executes speculatively
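Aliasing through partial tags is easy to see in a toy BTB. This C sketch keeps only the last 12 bits of the branch PC as the tag; 12 is an arbitrary stand-in for n. The addresses in the usage note are made up, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_BITS 12   /* hypothetical n: the BTB keeps only the last 12 PC bits */
#define BTB_SETS 256

typedef struct {
    bool valid;
    uint32_t partial_tag;  /* last TAG_BITS bits of the branch PC */
    uint64_t target;       /* predicted target PC */
} BtbEntry;

static BtbEntry btb[BTB_SETS];

static uint32_t last_bits(uint64_t pc) { return (uint32_t)(pc & ((1u << TAG_BITS) - 1)); }
static size_t set_of(uint64_t pc) { return last_bits(pc) % BTB_SETS; }

/* Record a taken branch and its target. */
static void btb_train(uint64_t branch_pc, uint64_t target) {
    BtbEntry *e = &btb[set_of(branch_pc)];
    e->valid = true;
    e->partial_tag = last_bits(branch_pc);
    e->target = target;
}

/* On a hit, store the predicted target in *target and return true. */
static bool btb_lookup(uint64_t branch_pc, uint64_t *target) {
    assert(target != NULL);
    const BtbEntry *e = &btb[set_of(branch_pc)];
    if (!e->valid || e->partial_tag != last_bits(branch_pc))
        return false;
    *target = e->target;
    return true;
}
```

Training at 0x401234 and then looking up 0xffffffff81000234 hits, because both PCs end in 0x234. The second branch receives whatever target the first one stored.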



- Traditional (non-transient) attacks: hard to fix
  - Data-dependent program behavior
- Transient attacks
  - Meltdown = transient execution + deferred exception handling: "easy" to fix
  - Spectre = transient execution on wrong paths

# **Takeaways**

Transient execution attacks use (not "are") side/covert channels.

"Spectre" (wrong-path execution) is **fundamental**. Speculation/prediction is not perfect.

"Meltdown" (deferred exceptions) is not fundamental.

# Transient vs. Non-transient





## Classification



| Secret accessed | Transmitter   | Classification                    |
|-----------------|---------------|-----------------------------------|
| Non-transient   | Non-transient | Traditional side channels         |
| Transient       | Non-transient | Not possible on today's machines? |
| Non-transient   | Transient     | Spectre                           |
| Transient       | Transient     | Spectre                           |

### What can leak?

A subset of committed architectural state, at each point in the program's dynamic execution.

Program 1 (assume '+' is data independent):

```
secret <- load(0x5)
secret <- secret + 1
secret -> store(0x5)
```

Program 2:

```
secret <- load(0x5)
dummy <- load(secret)
```

Program 3:

```
secret <- load(0x5)
if (false)
  dummy <- load(secret)
```

| Case                                             | Program 1            | Program 2    | Program 3            |
|--------------------------------------------------|----------------------|--------------|----------------------|
| Non-transient secret + non-transient transmitter | secret does not leak | secret leaks | secret does not leak |
| Non-transient secret + transient transmitter     | secret does not leak | secret leaks | secret leaks (!)     |
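The table can be checked mechanically with a toy definition of "leaks": run a program twice with different secrets and compare the sets of addresses it loads (its cache footprint). This noninterference-style check is an assumption made for illustration. `wrong_path` models the predictor sending execution into the body of `if (false)`, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MEM 256

/* Toy observation: which addresses were accessed (cache footprint). */
typedef struct { bool touched[MEM]; } Footprint;

typedef void (*Program)(unsigned char *mem, Footprint *f, bool wrong_path);

static unsigned char load_observed(unsigned char *mem, Footprint *f, unsigned addr) {
    assert(addr < MEM);
    f->touched[addr] = true;
    return mem[addr];
}

static void prog_add_store(unsigned char *mem, Footprint *f, bool wrong_path) {
    (void)wrong_path;
    unsigned char secret = load_observed(mem, f, 0x5);
    secret = secret + 1;         /* assumed data independent */
    mem[0x5] = secret;           /* store to a public address */
    f->touched[0x5] = true;
}

static void prog_load_secret(unsigned char *mem, Footprint *f, bool wrong_path) {
    (void)wrong_path;
    unsigned char secret = load_observed(mem, f, 0x5);
    (void)load_observed(mem, f, secret);      /* transmitter: address = secret */
}

static void prog_if_false(unsigned char *mem, Footprint *f, bool wrong_path) {
    unsigned char secret = load_observed(mem, f, 0x5);
    if (false || wrong_path)                  /* architecturally never taken */
        (void)load_observed(mem, f, secret);  /* transient transmitter */
}

/* The secret leaks if two runs differing only in the secret look different. */
static bool leaks(Program p, bool wrong_path) {
    unsigned char mem_a[MEM] = {0}, mem_b[MEM] = {0};
    Footprint fa, fb;
    memset(&fa, 0, sizeof fa);
    memset(&fb, 0, sizeof fb);
    mem_a[0x5] = 10;   /* two arbitrary secret values */
    mem_b[0x5] = 200;
    p(mem_a, &fa, wrong_path);
    p(mem_b, &fb, wrong_path);
    return memcmp(&fa, &fb, sizeof fa) != 0;
}
```

With `wrong_path = false`, only `prog_load_secret` leaks (first row of the table). With `wrong_path = true`, `prog_if_false` also leaks (second row).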









# **Next Lecture:**

Tiwari et al. Complete information flow tracking from the gates up. ASPLOS. 2009.



