# Fuzzing and Formal Verification to Find Hardware Bugs

Mengjia Yan

Spring 2023





### What Are Errata?



### 8<sup>th</sup> and 9<sup>th</sup> Generation Intel<sup>®</sup> Core<sup>™</sup> Processor Family

**Specification Update** 

Supporting 8<sup>th</sup> Generation Intel<sup>®</sup> Core<sup>™</sup> Processor Families for S/H/U Platforms, formerly known as Coffee Lake

Supporting 9<sup>th</sup> Generation Intel® Core™ Processor Families Processors for S/H Platforms, formerly known as Coffee Lake Refresh

November 2019

Revision 002

It is a compilation of device and document errata and specification clarifications and changes, which is intended for hardware system manufacturers and for software developers of applications, operating system, and tools.

**Errata** are design defects or errors. Errata may cause the processor's behavior to deviate from published specifications. Hardware and software designed to be used with any given stepping must assume that all errata documented for that stepping are present on all devices.

# **Errata Table Example**

### **3.2 Errata Summary Information**

**Table 4-3. Errata Summary Table**

| ID  | S: B0 42 | S: U0 62 | S: P0 82 | S: R0 82 | H: U0 62 | H: R0 82 | U: D0 43e | Title |
|-----|----------|----------|----------|----------|----------|----------|-----------|-------|
| 001 | No Fix   | No Fix   | No Fix   | No Fix   | No Fix   | No Fix   | No Fix    | Reported Memory Type May Not Be Used to Access the VMCS and Referenced Data Structures |
| 002 | No Fix   | No Fix   | …        | …        | …        | …        | …         | … |
| 003 | …        | …        | …        | …        | …        | …        | …         | … |
| 004 | …        | …        | …        | …        | …        | …        | …         | … |

**001: Reported Memory Type May Not Be Used to Access the VMCS and Referenced Data Structures**

| Field       | Description |
|-------------|-------------|
| Problem     | … instead use the memory type that the memory-type range registers (MTRRs) specify for the physical address of the access. |
| Implication | Bits 53:50 of the IA32_VMX_BASIC MSR report that the write-back (WB) memory type will be used, but the processor may use a different memory type. |
| Workaround  | Software should ensure that the VMCS and referenced data structures are located at physical addresses that are mapped to WB memory type by the MTRRs. |
| Status      | For the steppings affected, refer to the Summary Table of Changes. |

### **More Errata**



Occasionally, AMD identifies product errata that cause the processor to deviate from published specifications. Descriptions of identified product errata are designed to assist system and software designers in using the processors described in this revision guide. This revision guide may be updated periodically.

### 298 L2 Eviction May Occur During Processor Operation To Set Accessed or Dirty Bit

#### **Description**

The processor operation to change the accessed or dirty bits of a page translation table entry in the L2 from 0b to 1b may not be atomic. A small window of time exists where other cached operations may cause the stale page translation table entry to be installed in the L3 before the modified copy is returned to the L2.

In addition, if a probe for this cache line occurs during this window of time, the processor may not set the accessed or dirty bit and may corrupt data for an unrelated cached operation.

#### **Potential Effect on System**

One or more of the following events may occur:

- Machine check for an L3 protocol error. The MC4 status register (MSR0000\_0410) is B2000000\_000B0C0Fh or BA000000\_000B0C0Fh. The MC4 address register (MSR0000\_0412) is 26h.
- Loss of coherency on a cache line containing a page translation table entry.
- Data corruption.

#### **Suggested Workaround**

BIOS should set MSRC001\_0015[3] (HWCR[TlbCacheDis]) to 1b and MSRC001\_1023[1] to 1b.

In a multiprocessor platform, the workaround above should be applied to all processors regardless of …



### **Errata Statistics**



The 2012 Core i7 processor with integrated graphics: 1,400M transistors

4 years; 136 errata; 3 bugs/month

A **4-fold increase** in bugs in Intel processor designs **per generation**. Approximately 8000 bugs designed into the Pentium 4 ('Willamette')

from https://www.cl.cam.ac.uk/~jrh13/slides/nijmegen-21jun02/slides.pdf

### **Outline**

- Hardware Bug Examples
  - What do they look like? How were they discovered? What was the impact?
  - #1: The famous Pentium FDIV bug
  - #2: SYSRET 64-bit OS privilege escalation vulnerability on Intel CPUs
  - #3: Branch history injection attack
- How to discover hardware bugs?
  - Manual efforts
  - Fuzzing
  - Formal verification

# **Bug #1: Pentium FDIV Bug**

- What is the specification for floating-point computation?
  - A floating-point number is encoded as $(1 + f) \times 2^e$, $0 \le f < 1$, $e \in \mathbb{Z}$
  - Example: $1/10 = 1.9999\ldots9\mathrm{a} \times 2^{-4}$ (significand in hexadecimal)
  - Floating-point computation always has rounding errors, because each floating-point number has a limited number of bits
- The specification allows errors only beyond a given bit position (the relative accuracy in the table below)

|                   | Single precision                    | Double precision                      | Extended precision                      |
|-------------------|-------------------------------------|---------------------------------------|-----------------------------------------|
| Word size in bits | 32                                  | 64                                    | 80                                      |
| Bits for f        | 23                                  | 52                                    | 63                                      |
| Bits for e        | 8                                   | 11                                    | 15                                      |
| Relative accuracy | $2^{-23} \approx 1.2 \cdot 10^{-7}$ | $2^{-52} \approx 2.2 \cdot 10^{-16}$  | $2^{-63} \approx 1.1 \cdot 10^{-19}$    |
| Approximate range | $2^{\pm 127} \approx 10^{\pm 38}$   | $2^{\pm 1023} \approx 10^{\pm 308}$   | $2^{\pm 16383} \approx 10^{\pm 4932}$   |
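The encoding above can be checked on a double-precision value. The following is a minimal sketch; `decompose` is a hypothetical helper introduced here for illustration, and it assumes a positive, normal input.

```python
import math
import sys

def decompose(x: float):
    """Return (f, e) such that x == (1 + f) * 2**e with 0 <= f < 1.

    Assumes x is a positive, normal double (hypothetical helper).
    """
    m, exp = math.frexp(x)       # x == m * 2**exp with 0.5 <= m < 1
    return 2 * m - 1, exp - 1    # rescale the significand into [1, 2)

f, e = decompose(0.1)
print(e)                                     # -4
print((0.1).hex())                           # 0x1.999999999999ap-4
print(sys.float_info.epsilon == 2.0 ** -52)  # True: double-precision relative accuracy
```

The hexadecimal significand printed by `float.hex()` is the same `1.999...9a` pattern as in the 1/10 example.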



# The Discovery Process #1: Nicely's Prime

- Thomas Nicely, a mathematics professor, was computing reciprocals of prime numbers, including p = 824,633,702,441
- The correct result:

$$1/p = 1.212659629408667 \times 10^{-12}$$

- But the new Pentium processor gives:

$$1/p = 1.212659624891158 \times 10^{-12}$$

- The two results differ after the 9th significant digit
- It took him four months to confirm that the problem was not in his program, the math libraries, the compiler, or the operating system, but in the hardware

# The Discovery Process #2: Kaiser's List

- Andreas Kaiser, a computer consultant
  - Generated 25 *billion* random integers and checked the accuracy of the computed reciprocals; 23 were incorrect.

```
3221224323     = 1.7ffff70600000 × 2^31
12884897291    = 1.7ffff70580000 × 2^33
206158356633   = 1.7ffff704c8000 × 2^37
824633702441   = 1.7fffff7052000 × 2^39
1443107810341  = 1.4fffedac25000 × 2^40
6597069619549  = 1.7fffff7057400 × 2^42
9895574626641  = 1.1fffc6bc2a200 × 2^43
13194134824767 = 1.7ffff704e7e00 × 2^43
26388269649885 = 1.7ffff704fdd00 × 2^44
52776539295213 = 1.7ffff7046f680 × 2^45
```

#### Patterns?

- Many start with 1.7ffff
- In other words, the first 20 bits after the leading bit are a single zero followed by at least 19 ones
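The pattern can be checked directly on entries from Kaiser's list. This is a small sketch; `fraction_bits` and `matches_pattern` are hypothetical helper names, and the integers are taken from the list above.

```python
def fraction_bits(n: int) -> str:
    """Return the 52 fraction bits of n when stored as a double (n < 2**53)."""
    hex_digits = float(n).hex().split(".")[1].split("p")[0]   # e.g. '7ffff70600000'
    return bin(int(hex_digits, 16))[2:].zfill(52)

def matches_pattern(n: int) -> bool:
    """True if the first 20 fraction bits are a zero followed by 19 ones."""
    return fraction_bits(n)[:20] == "0" + "1" * 19

for n in (3221224323, 824633702441, 1443107810341):
    print(n, float(n).hex(), matches_pattern(n))
```

The first two integers match the pattern; the third (significand 1.4fff...) does not, which is why the slide says "many" rather than "all".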

# The Discovery Process #3: Coe's Ratio

- Tim Coe, an electrical engineer who had designed floating-point chips

$$\frac{4{,}195{,}835}{3{,}145{,}727} = 1.33382044\ldots \text{ (correct)} \qquad 1.33373906\ldots \text{ (Pentium)}$$

- The two results differ after the 4th significant digit
- The errors involve y/x where the bit patterns of x and y conspire to excite the bug at an early stage in the division.
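To put Coe's ratio in perspective, the sketch below compares the relative error of the Pentium result with the relative accuracy expected of single precision. The Pentium value is the truncated figure quoted on the slide, so the computed error is approximate.

```python
correct = 4195835 / 3145727          # computed on a correct FPU
pentium = 1.33373906                 # flawed result as quoted above (truncated)

relative_error = abs(correct - pentium) / correct
single_precision_accuracy = 2.0 ** -23

print(f"relative error     ~ {relative_error:.2e}")
print(f"single precision   ~ {single_precision_accuracy:.2e}")
print(relative_error > single_precision_accuracy)   # True
```

The flawed quotient is less accurate than even single precision would allow, although the division was performed in a higher-precision format.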

# **Bug Explanation: FDIV**



A combination of trial and error, experience, pattern matching and luck.

- Old processors: choose each quotient digit from {0, 1}
- Faster radix-4 Sweeney, Robertson, and Tocher (SRT) algorithm:
  - Choose each quotient digit from {0, +1, -1, +2, -2}
  - If the current quotient digit is chosen slightly wrong, the next iteration can compensate for it
  - Guess the quotient digit based on the first few bits of the remainder and the divisor => use a 2D lookup table
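The signed-digit recurrence can be sketched in a few lines. This is an illustration under stated assumptions, not the Pentium's implementation: `srt_radix4_divide` is a hypothetical name, and the digit is selected exactly with rational arithmetic, whereas real hardware looks it up in a table indexed by the leading bits of the remainder and the divisor.

```python
from fractions import Fraction

def srt_radix4_divide(x, d, steps=12):
    """Approximate x / d for x, d in [1, 2) with digits drawn from {-2, ..., +2}."""
    x, d = Fraction(x), Fraction(d)
    assert 1 <= x < 2 and 1 <= d < 2
    remainder = x / 4                 # pre-scale so the first shifted remainder is x
    quotient = Fraction(0)
    digits = []
    for j in range(steps):
        shifted = 4 * remainder       # radix-4 shift
        # Assumption: exact digit selection stands in for the hardware lookup table.
        q = max(-2, min(2, round(shifted / d)))
        remainder = shifted - q * d   # a negative remainder is fixed by later digits
        quotient += Fraction(q, 4 ** j)
        digits.append(q)
    return quotient, digits

# Coe's operands, scaled by powers of two into [1, 2); the true ratio is 2 * quotient.
q, digits = srt_radix4_divide(Fraction(4195835, 2 ** 22), Fraction(3145727, 2 ** 21))
print(digits)
print(float(2 * q))
```

Negative digits show how an overshoot in one step is corrected in a later one. A table entry that returns the wrong digit breaks this recurrence, because the remainder leaves the range from which later digits can still recover.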

# **Bug Explanation: SRT Table**

[Figure: SRT quotient lookup table, indexed by the first 5 bits of the divisor and the first 7 bits of the remainder]

- 2048 cells in total
- 1066 cells in use
- 5 cells are not initialized
- When will the bug be triggered?



# How Frequently Can the Bug Be Triggered?

- Intel: an average spreadsheet user could encounter this flaw once in every 27,000 years, assuming 1,000 divisions per day.
- IBM: suspended sales of Pentium-based models and said the bug could cause as many as 20 mistakes per day.
- Who actually got affected?
  - Normal users?
  - Wall Street? Financial prediction programs? Did the Pentium bug flip a trading decision from buy to hold to sell?
  - Difficult to calibrate
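The rates quoted in this lecture can be converted into comparable units with simple arithmetic. The sketch below uses only figures from the slides; the two numbers measure different workloads (Intel's assumed spreadsheet user versus Kaiser's random-integer reciprocals), so the comparison is indicative only.

```python
# Intel's claim: one error every 27,000 years at 1,000 divisions per day.
divisions_per_error_intel = 1_000 * 365 * 27_000

# Kaiser's experiment: 23 incorrect reciprocals out of 25 billion random integers.
reciprocals_per_error_kaiser = 25_000_000_000 / 23

print(f"{divisions_per_error_intel:,}")            # 9,855,000,000
print(f"{round(reciprocals_per_error_kaiser):,}")  # 1,086,956,522
```

Both estimates put the error rate at roughly one in a billion or rarer per operation; the disagreement between Intel and IBM came mostly from how many divisions a user was assumed to perform per day.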

# Consequences/Impacts

- Intel's bad responses
  - Conditional replacement (customers had to show that they were affected by the bug) → disastrous press
  - No-questions-asked replacement → \$475M cost in 1994, 10% replacements
- Potential long-term impact:
  - Random testing is not a good idea; exhaustive testing has a scalability problem.
  - A marked increase in the use of formal verification and number theory in hardware design

#### Some humor for you:

Q: How many Pentium designers does it take to screw in a light bulb?

A: 1.99904274017, but that's close enough for non-technical people.

Q: What do you get when you cross a Pentium PC with a research grant?

A: A mad scientist.

Do you think it bothers x86 users that the 486 is a functional upgrade to the Pentium?

In response to the Pentium bug, PowerMac officials have announced that they will be adding the control panel "Pentium Switcher" that allows users to decide whether the PowerMac should emulate pre-Pentium or post-Pentium FDIV behaviour.

#### TOP TEN NEW INTEL SLOGANS FOR THE PENTIUM

-----

9.999973251 It's a FLAW, Dammit, not a Bug

8.9999163362 It's Close Enough, We Say So

7.9999414610 Nearly 300 Correct Opcodes

6.9999831538 You Don't Need to Know What's Inside

5.9999835137 Redefining the PC--and Mathematics As Well

4.999999021 We Fixed It, Really

3.9998245917 Division Considered Harmful

2.9991523619 Why Do You Think They Call It \*Floating\* Point?

1.9999103517 We're Looking for a Few Good Flaws

0.999999998 The Errata Inside

# **Bug #2: A SYSRET Bug**

64-bit x86 instruction set: AMD64, Intel 64

#### **SYSCALL**

- Hardware transitions from user mode to kernel mode
- Saves the userspace next-PC to the RCX register
- Jumps to a kernel syscall entry point



#### **SYSRET**

- Hardware transitions from kernel mode to user mode
- Restores the userspace next-PC from the RCX register

# **Two Different Specifications for SYSRET**

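| Step | AMD SYSRET | Intel SYSRET |
|------|------------|--------------|
| 1    | Hardware transitions from kernel mode to user mode | Restore the userspace next-PC from the RCX register |
| 2    | Restore the userspace next-PC from the RCX register | Hardware transitions from kernel mode to user mode |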



# **SYSRET Vulnerability**



If RCX holds a non-canonical address, SYSRET generates a #GP (general protection fault).

Canonical means that, given a 48-bit virtual address space, the high 16 bits (bits 63-48) of a virtual address have the same value as bit 47.
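The canonical-address rule is easy to state as a predicate. This is a minimal sketch; `is_canonical` is a hypothetical helper, and the 48-bit width is the assumption stated on the slide.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..va_bits of a 64-bit address all equal bit va_bits-1."""
    assert 0 <= addr < 2 ** 64
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones

print(is_canonical(0x0000_7FFF_FFFF_FFFF))   # True: highest user-half address
print(is_canonical(0x0000_8000_0000_0000))   # False: bit 47 set, bits 63-48 clear
print(is_canonical(0xFFFF_8000_0000_0000))   # True: lowest kernel-half address
```

The first non-canonical address, `0x0000_8000_0000_0000`, sits directly above the top of the user half.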

### **SYSRET Attack on Intel Processors**





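Before executing SYSRET, the kernel has already restored all registers, including rsp, from the user-mode context.

On Intel, the #GP is therefore handled in kernel mode with user-controlled registers. The handler assumes rsp points to the kernel stack and starts using it --> it can overwrite kernel data.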

### Who to blame?

- Intel claims it is not an erratum
  - Errata are design defects or errors that may cause ... behavior to deviate from published specifications.
  - This behavior is consistent with Intel's specification
  - So the problem is that the specification itself is incorrect
- The Intel SDM (Software Developer's Manual) is about 3,400 pages long. We cannot assume the specification is always correct.
- There are research efforts to verify ISA specifications

# **Bug #3: eIBRS Vulnerability**

- Recap Spectre v2
- eIBRS: Enhanced Indirect Branch Restricted Speculation. Advertised as a mitigation against Spectre v2.



#### **Specification:**

Do not let lower-privileged code interfere with the branch target prediction of higher-privileged code.

OR

Isolate BTB entries across privilege levels

What does this mean? Non-interference? A vague specification.

Barberis et al. Branch History Injection: On the Effectiveness of Hardware Mitigations Against Cross-Privilege Spectre-v2 Attacks. USENIX Security '22. https://www.vusec.net/projects/bhi-spectre-bhb/

### The Problem

- #1: Userspace code can trigger different system calls and let the kernel …
- #2: Userspace prediction history can affect kernel-space BTB prediction

Problem: security definitions are written in human language, which can be vague and interpreted in various ways.

# Summary

- Hardware bugs
  - Errata that deviate from the functional and security specification
  - Incorrect specification
  - Vague specification
- How to find hardware bugs?
  - Get ideas from the software

# **Software Bug Hunting/Fixing**

- Approach 1: Hire a lot of experts and stare at the code
  - Basically Intel was on it in the last few years without showing the code
  - Black-box hacking
- Approach 2: Fuzzing
- Approach 3: Formal verification



# **Fuzzing**

# **Fuzzing In A Nutshell**

- Automatically generate test inputs
- In 1999, Alan Cox at the University of Wales discovered a vulnerability in the Linux kernel by simply running a program that generated random inputs and fed them into the kernel
- Crashes are triggered by assertions/specification checks
- Simple yet effective
- Industry standard



# **Fuzzing Components**



- Random seeds
  - Sometimes formatted inputs are needed, e.g., for a PDF reader (the demo from last lecture)
- A criterion to check whether the outcome is as expected or not.
  - Specification
  - Security invariant (paper discussion SPECS)
  - Assertions (address sanitizer)
- Heuristics for generating new tests => feedback loop for better efficiency
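The three components (seeds, a checking criterion, and feedback-driven generation) fit into a short loop. The sketch below is a toy: `target`, `Crash`, and `fuzz` are hypothetical names, the planted bug is artificial, and the returned depth is a stand-in for real coverage instrumentation.

```python
import random

class Crash(Exception):
    """Raised when the checking criterion (an assertion in the target) fails."""

def target(data: bytes) -> int:
    """Hypothetical program under test; returns how many checks the input passed."""
    depth = 0
    if len(data) > 0 and data[0] == 0x42:
        depth = 1
        if len(data) > 1 and data[1] == 0x55:
            raise Crash("planted bug reached")
    return depth

def fuzz(max_iters: int = 200_000, seed: int = 0):
    rng = random.Random(seed)
    corpus = [bytes(2)]                  # initial seed input
    seen_coverage = {0}
    for i in range(max_iters):
        child = bytearray(rng.choice(corpus))
        child[rng.randrange(len(child))] = rng.randrange(256)   # mutate one byte
        child = bytes(child)
        try:
            coverage = target(child)
        except Crash:
            return child, i + 1          # crashing input and iterations used
        if coverage not in seen_coverage:    # feedback: keep inputs reaching new code
            seen_coverage.add(coverage)
            corpus.append(child)
    return None, max_iters

crashing_input, iterations = fuzz()
print(crashing_input, iterations)
```

Without the feedback step, both bytes would have to be guessed at once (about 1 in 65,536 per attempt); keeping inputs that pass the first check lets the fuzzer solve one byte at a time.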

# **Types of Fuzzing**

- Blackbox
- Greybox
- Whitebox



[Figure: fuzzing with collected coverage counters, e.g., 1 6 2 6 0 2 1 7]

# **Example: Hidden Instructions**

Hidden instructions: secret instructions that give backdoor or powerful access to processor internals

Secret processor functionality: Appendix H

- An example:
  - Pentium F00F bug: an invalid instruction freezes the CPU; discovered in 1997
  - A Ring 3 process can DoS (denial of service) the processor
  - The invalid instruction encoding is: F0 0F C7 [C8-CF]

### **Search for Hidden Instructions**

Instructions:

0F 6A 60 6A 79 6D C6 02 ...

**ISA** specification (excerpt of the one-byte opcode map):

|   | 0          | 1          | 2          | 3          | 4          | 5           | 6                     | 7                    |
|---|------------|------------|------------|------------|------------|-------------|-----------------------|----------------------|
| 0 | ADD Eb, Gb | ADD Ev, Gv | ADD Gb, Eb | ADD Gv, Ev | ADD AL, Ib | ADD rAX, Iz | PUSH ES<sup>i64</sup> | POP ES<sup>i64</sup> |
| 1 | ADC Eb, Gb | ADC Ev, Gv | ADC Gb, Eb | ADC Gv, Ev | ADC AL, Ib | ADC rAX, Iz | PUSH SS<sup>i64</sup> | POP SS<sup>i64</sup> |
| 2 | AND Eb, Gb | AND Ev, Gv | AND Gb, Eb | AND Gv, Ev | AND AL, Ib | AND rAX, Iz | SEG=ES (Prefix)       | DAA<sup>i64</sup>    |
| 3 | XOR Eb, Gb | XOR Ev, Gv | XOR Gb, Eb | XOR Gv, Ev | XOR AL, Ib | XOR rAX, Iz | SEG=SS (Prefix)       | AAA<sup>i64</sup>    |
| 4 | …          |            |            |            |            |             |                       |                      |

Each byte sequence falls into one of three classes:

- Valid instructions (in spec)
- Invalid instructions (#UD exception, invalid opcode)
- Hidden instructions (not in spec, but can execute, no #UD exception)

# **Challenges #1: Large Space**

- CISC: Variable length instructions
  - One-byte instruction: 0x40 -> inc eax
  - 15-byte instruction: 2e67f048 818480 23df067e 89abcdef -> lock add qword cs:[eax + 4 \* eax + 07e06df23h], 0efcdab89h
  - Worst-case exhaustive search: 256<sup>15</sup> ≈ 1.3 × 10<sup>36</sup> candidates
- Observation: the meaningful bytes of an x86 instruction impact either its length or its exception behavior
- A potential solution: depth-first search

```
0F 6A 60 6A 79 6D C6 02 ...
```
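The search idea can be sketched on a toy instruction set. Everything here is an assumption made for illustration: `toy_length` is a made-up decoder with instructions of at most 3 bytes, and `tunnel` is a simplified rendition of the idea of only varying the last byte of the current instruction and going deeper when the observed length changes. On real hardware the length would be measured rather than computed.

```python
def toy_length(code: bytes) -> int:
    """Hypothetical decoder: opcodes below 0xFE are 1 byte, 0xFE is 2 bytes, 0xFF is 3."""
    if code[0] < 0xFE:
        return 1
    return 2 if code[0] == 0xFE else 3

def tunnel(length_of, max_len: int):
    """Enumerate candidates, varying only the bytes that can change the length."""
    insn = bytearray(max_len)
    last_len = length_of(bytes(insn))
    marker = last_len - 1                 # index of the byte currently being varied
    tested = []
    while True:
        tested.append(bytes(insn[:last_len]))
        while marker >= 0 and insn[marker] == 0xFF:   # byte exhausted: back up
            insn[marker] = 0
            marker -= 1
        if marker < 0:
            return tested
        insn[marker] += 1
        new_len = length_of(bytes(insn))
        if new_len != last_len:           # length changed: descend to the new last byte
            last_len = new_len
            marker = new_len - 1

tested = tunnel(toy_length, max_len=3)
print(len(tested), "candidates instead of", 256 ** 3)
```

For this toy decoder the search tests 1,021 candidates rather than 16,777,216, because operand bytes that do not change the length are stepped through once instead of being combined exhaustively.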

# **Challenges #2: Measure Instruction Length**

- Trap flag
  - Execute an instruction, set the PC to the next instruction, and go to the trap handler
  - Inside the trap handler, observe the instruction length
- How to deal with privileged instructions?
  - They trap in user space, and the PC does not advance
- A potential solution: page fault analysis
  - Place the candidate bytes across a page boundary so that only the first k bytes are on an executable page
  - A page fault on the next page means the instruction is longer than the guessed k bytes
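The page-fault trick can be simulated without touching real page tables. In the sketch below, `toy_execute` is a hypothetical stand-in for the processor: it raises `PageFault` when decoding needs a byte beyond the ones placed on the executable page. The toy encoding (opcodes below 0xFE are 1 byte, 0xFE is 2 bytes, 0xFF is 3 bytes) is an assumption for illustration.

```python
class PageFault(Exception):
    """Instruction fetch touched the non-executable page after the boundary."""

def toy_execute(visible: bytes) -> None:
    """Simulated CPU: decode one instruction from the bytes before the page boundary."""
    def fetch(i: int) -> int:
        if i >= len(visible):
            raise PageFault(i)
        return visible[i]

    opcode = fetch(0)
    if opcode == 0xFE:        # one operand byte
        fetch(1)
    elif opcode == 0xFF:      # two operand bytes
        fetch(1)
        fetch(2)
    # Whatever happens after decoding (normal execution, #UD, #GP, ...) is not a
    # fetch fault on the next page, so decoding is treated as complete here.

def measure_length(candidate: bytes, max_len: int = 15):
    """Smallest k such that exposing only k bytes causes no fetch page fault."""
    for k in range(1, max_len + 1):
        try:
            toy_execute(candidate[:k])
            return k
        except PageFault:
            continue
    return None

print(measure_length(bytes([0x90, 0x00, 0x00])))   # 1
print(measure_length(bytes([0xFE, 0x00, 0x00])))   # 2
print(measure_length(bytes([0xFF, 0x00, 0x00])))   # 3
```

Because the length is inferred from the fetch fault rather than from the PC after execution, the measurement does not depend on the instruction completing, which is what makes the approach usable for instructions that trap in user space.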



# **Engineering Efforts to Survive**

- Hack the kernel to hook the page fault handler and catch the instruction
- Hack various fault handlers inside the kernel in case the hidden instruction traps
- A lot more...
  - Watch the talk; learn more in recitation #4 and lab 6

# SandSifter and Findings

```
sbb byte ptr [edi], ah
mov dword ptr [0x141a1726], eax
shr byte ptr [esi - 0x4fc2db6c], 0xfa
xor eax, 0xd1aa9221
adc dword ptr [esi + 0x46], 0x2084b8d1
jmp 0x15
stosb byte ptr es:[edi], al
scasd eax, dword ptr es:[edi]
imul esp, dword ptr [edx + ecx*4 - 0x17], -0x75
insd dword ptr es:[edi], dx
```

- Hidden instructions across Intel and AMD processors
- Software bugs in disassemblers, such as IDA, objdump, VS, etc.
- Hardware errata, something like F00F

### **Formal Verification**

- Limitations of fuzzing:
  - Coverage is heuristic, with no guarantee of completeness
  - Not good at certain types of bugs, where it wastes computation
- There are many formal verification techniques, most requiring formal-methods expertise (6.512)
  - We cover one automatic technique: symbolic execution

```
int obscure(int x, int y)
{
   if (x==hash(y))
     error();
   return 0;
}
```
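The `obscure` function shows why random inputs struggle with equality checks against computed values. The sketch below assumes 32-bit inputs and a made-up `toy_hash`, since the slide does not define `hash` at this point; with those assumptions each random trial hits the error branch with probability 2^-32.

```python
import random

MASK = 0xFFFFFFFF

def toy_hash(y: int) -> int:
    """Hypothetical 32-bit hash standing in for hash() on the slide."""
    return (y * 2654435761 + 12345) & MASK

def obscure(x: int, y: int) -> bool:
    """Return True when the error() branch would be reached."""
    return x == toy_hash(y)

def random_fuzz(trials: int, seed: int = 0) -> int:
    """Count how many uniformly random (x, y) pairs reach the error branch."""
    rng = random.Random(seed)
    return sum(obscure(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(trials))

print(random_fuzz(100_000), "hits in 100,000 random trials")
print("expected trials per hit:", 2 ** 32)

# A directed approach picks any y and computes the matching x instead of guessing it.
y = 7
print(obscure(toy_hash(y), y))   # True
```

Solving the branch condition for its inputs, rather than sampling them, is the idea behind the symbolic execution technique introduced next.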

**Fuzzing and Concrete Execution**

**Concrete instructions:**

0F 6A 60 6A 79 6D C6 02 ...

**ISA** specification (excerpt of the one-byte opcode map):

|   | 0          | 1          | 2          | 3          | 4          | 5           | 6                     | 7                    |
|---|------------|------------|------------|------------|------------|-------------|-----------------------|----------------------|
| 0 | ADD Eb, Gb | ADD Ev, Gv | ADD Gb, Eb | ADD Gv, Ev | ADD AL, Ib | ADD rAX, Iz | PUSH ES<sup>i64</sup> | POP ES<sup>i64</sup> |
| 1 | ADC Eb, Gb | ADC Ev, Gv | ADC Gb, Eb | ADC Gv, Ev | ADC AL, Ib | ADC rAX, Iz | PUSH SS<sup>i64</sup> | POP SS<sup>i64</sup> |
| 2 | AND Eb, Gb | AND Ev, Gv | AND Gb, Eb | AND Gv, Ev | AND AL, Ib | AND rAX, Iz | SEG=ES (Prefix)       | DAA<sup>i64</sup>    |
| 3 | XOR Eb, Gb | XOR Ev, Gv | XOR Gb, Eb | XOR Gv, Ev | XOR AL, Ib | XOR rAX, Iz | SEG=SS (Prefix)       | AAA<sup>i64</sup>    |
| 4 | …          |            |            |            |            |             |                       |                      |

- Valid instructions (in spec)
- Invalid instructions (#UD exception, invalid opcode)
- Hidden instructions (not in spec, but can execute, no #UD exception)

# **Symbolic Execution**

    // 4 x 2 priority encoder
    module priority_encoder(out, in);
      input [3:0] in;
      output reg [1:0] out;

      always @(in) begin
        casex (in)
          4'b0001: out = 2'b00;
          4'b001x: out = 2'b01;
          4'b01xx: out = 2'b10;
          4'b1xxx: out = 2'b11;
          default: out = 2'b00;
        endcase
      end
    endmodule



The symbolic execution engine takes the implementation (e.g., the Verilog above) and the **ISA** specification, and reports one of:

- Pass (implementation matches specification)
- Counterexample (e.g., hidden instructions)



# A Simple Example #1

### C code:

```
int hash(int z) {
    return (z + 10) * 2;
}

/* error() marks the bug we want to reach */
int obscure(int x, int y) {
    if (x == hash(y))
        error();
    return 0;
}
```

### Rosette code:

```
(define (hash z)
  (* (+ z 10) 2))
(define (obscure x y)
  (if (= x (hash y))
      (assert #f)   ; reaching this branch is the bug
      0))
```

# A Simple Example #2

```
int hash(int z);

/* error() marks the bug we want to reach */
int obscure(int x, int y) {
    if (x == hash(y))
        error();
    return 0;
}

int hash(int z) {
    if (z > 10)
        z = z - 10;
    return z;
}
```

- Build an execution tree with all the execution paths
- Each execution path has a logical formula that describes its path condition
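The execution tree for Example #2 has two paths through `hash`, each leading to an `error()` leaf under a different formula. The sketch below writes those path conditions out by hand and, as a stand-in for an SMT solver, searches a small bounded domain for a satisfying input; a real symbolic execution engine would derive the formulas automatically and hand them to a solver.

```python
from itertools import product

def hash_(z: int) -> int:
    """Python transcription of hash() from Example #2."""
    if z > 10:
        z = z - 10
    return z

def reaches_error(x: int, y: int) -> bool:
    """Concrete execution of obscure(): True when error() would be called."""
    return x == hash_(y)

# Path conditions under which obscure(x, y) reaches error(), one per path in hash().
ERROR_PATHS = {
    "y > 10 and x == y - 10": lambda x, y: y > 10 and x == y - 10,
    "y <= 10 and x == y":     lambda x, y: y <= 10 and x == y,
}

def find_witnesses(domain):
    """Brute-force stand-in for a solver: first (x, y) satisfying each formula."""
    witnesses = {}
    for formula, holds in ERROR_PATHS.items():
        for x, y in product(domain, repeat=2):
            if holds(x, y):
                witnesses[formula] = (x, y)
                break
    return witnesses

witnesses = find_witnesses(range(-20, 21))
for formula, (x, y) in witnesses.items():
    print(f"{formula}: x={x}, y={y}, error reached: {reaches_error(x, y)}")
```

Each witness is a concrete input that drives the program down one path to `error()`, which is the counterexample a solver-backed tool would report.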

# **Symbolic Execution**

- Convert the program into a large math formula and ask solvers to solve it.
- The usual pitfall: scalability issues

- Recitation #5:
  - Tool to lift Verilog code to Rosette code -> automatic hardware bug finding

# Summary

- Hardware bugs
  - Deviate from specification (errata)
  - Incorrect and vague specification

- Potential approaches to find hardware bugs
  - Manual analysis, testing
  - Fuzzing
  - Symbolic execution