CPU Fuzzing Lab

Overview: Pretty Secure Processor
Part 1: Mystery of the Faulty CPU (10%)
Part 2: Fuzzing for Faulty Inputs (30%)
Part 3: Fuzzing for a Hidden CPU Backdoor (45%)
Part 4: Using the Backdoor to Exploit a Remote System (15%)
Part 5: Bug Bounty (Optional, 10% Bonus)
Part 6: Challenge Problem (Optional, No Credit)

Lab Details

The CPU Fuzzing lab was released on April 26, and is due on May 16.

Collaboration Policy

Our full Academic Honesty policy can be found on the Course Information page of our website. As a reminder, all 6.S983 labs should be completed individually. You may discuss the lab at a high level with a classmate, but you may not work on code together or share any of your code.

Getting Started

Log into Unicorn: unicorn.csail.mit.edu. To connect via ssh run ssh your_username@unicorn.csail.mit.edu.

We are using git for this lab – instructions for setting up the git repository can be found on the labs page. In addition to submitting code, you are required to submit a project report containing your answers to Exercises/Discussion Questions to gradescope.

Warning: Start Early!

This lab is a potpourri of many systems level programming tasks, including writing low-level RISC-V assembly and C and automating tasks with Python. As there are a large number of open-ended things to do in this lab, we highly encourage you start early on this one.

Overview: Pretty Secure Processor

In this lab we will explore utilizing CPU fuzzing techniques to discover RTL bugs in a pipelined RISC-V CPU. Specifically, we will be executing all of our attacks on a modified version of Pretty Secure Processor, a CPU designed by your CA Joseph. This modified CPU was custom built specifically for this lab.

This page contains a complete description of the PSP platform- we highly encourage you to refer to it. Specifically, the sections on CSRs and serial IO will be useful.

What am I building in this lab?

In this lab you will implement a number of low-level CPU tasks in C and RISC-V assembly to build a CPU fuzzing engine.

Your code will run on the simulated CPU bare metal. That is, you have complete control of the hardware, and your code runs inside the CPU being tested. Your job is to write code that tests the hardware and reliably reports whether the hardware (which may have CPU bugs) is operating correctly or not.

At the end of the lab, you will use the CPU bugs you have discovered and the knowledge of return-oriented programming from the prefetch lab to build an end-to-end exploit on a remote Pretty Secure Processor.

What happens when I run my code?

All of your lab code runs on Pretty Secure Processor, a 5 stage pipelined RISC-V CPU. When your code is built, we send it to a cycle-accurate simulated Pretty Secure Processor core and let it run.

By simulated, we mean that the RTL (the SystemVerilog that defines the CPU) itself is simulated in a cycle-accurate simulator. Your code is not running in an emulator (like Qemu) nor an ISA simulator- we are simulating everything down to the cycle level! That means that microarchitectural properties like pipelining, hazard forwarding, etc. are all in play for this lab.

The entire CPU pipeline is simulated at a cycle accurate level: no emulation here!

(Note that this diagram is for demonstrative purposes only and does not show the entire pipeline. You do not need to understand this diagram in detail, it is provided purely to illustrate the core from a bird’s eye view).

Running your code

From the top directory of the lab code, there are two scripts- run.sh and gdb.sh. To run a given part, first cd into the part and compile it with make. You will need to rebuild each part with make whenever you change any of the code.

Then, from the lab root directory, you can run a given part with ./run.sh partX.

Adding Code

Our Makefile will automatically build and link any extra files you create. You can create new C files with the .c extension and assembly source files with the .s extension. Note that for symbols to be exported from an assembly file they need to be marked .global.

Debug Mode

To enable debug mode, edit debug.sh and fill in SHD_DEBUG_PORT with the debug port that was emailed to you.

Then, you can add --debug to the end of the run.sh invocation to start debug mode.

You should see something like the following:

$ ./run.sh part2 --debug
Waiting for debugger on port 1337...

You can now open a new terminal window and run gdb.sh.

You should see Debugger attached! printed in the original terminal, and GDB should give you a session like the following:

Reading symbols from obj_dir/kernel.elf...
Remote debugging using localhost:1337
start () at bringup.s:44
44      la x1, exception_handler_entry
(gdb)

Now you are in a GDB session running on PSP. You can set breakpoints, inspect registers, and step through the assembly to see how your code is behaving. In case you have forgotten the GDB commands, you can refer to the ASLR bypass lab cheat sheet.

Part 1: Mystery of the Faulty CPU (10%)

Congratulations! You landed your dream job as a CPU design engineer at the Pear Computer Company. Your manager has tasked you with using Pretty Secure Processor as the CPU for the next Pear Phone Pro.

The PearPhone Pro

You sent an early version of your phone out to the Maps test team so they could test the navigation algorithms. To your surprise, they reported that the new phone was producing different shortest paths than expected. The Maps team conducted a thorough review of their software, and found no possible errors- they have concluded it must be a hardware bug, and your manager agrees with them.

The Maps team provided the following example graph and expected output:

The expected output:

Distance of node 1=0x400
Path=1<-0
Distance of node 2=0x500
Path=2<-1<-0
Distance of node 3=0x600
Path=3<-2<-1<-0

The output from the faulty CPU:

Distance of node 1=0x400
Path=1<-0
Distance of node 2=0x500
Path=2<-1<-0
Distance of node 3=0x1000
Path=3<-0

The team is using Dijkstra’s Algorithm to find the shortest path between the nodes. They have tracked the bug to happening at this line in the code:

if(mindistance+cost[nextnode][i]<distance[i])

In this part, we will analyze the CPU to figure out what is going wrong.

Getting Familiar with the Codebase

Before we dig into the CPU internals, let’s get familiar with the lab infrastructure. In this part, you will be asked to navigate the codebase and answer several questions. No need to write any code.

1.1 Building and Running the Starter Code

Checkout the lab codebase into your account on Unicorn.
cd into the part1 directory and build it with make
Run the code with ./run.sh part1 from within the top level directory. You should see the incorrect output as shown above.

Now, open the part1 folder in your favorite code editor, and answer the following questions about the starter code in your report. These questions are intended to help guide you through the starter code- feel free to explore the code on your own too!

1-1 Discussion Question

Open bringup.s and read through the start method. What is the name of the C method that start jumps to when it finishes by running mret? Which CSR is used to hold that address?

start specifies an exception handler by writing to CSR_MTVEC. What is the name of the handler? What file is it implemented in?

linker.ld defines the memory map of the binary built by the build system. What section is the stack located in? How large is the stack?

Open asm_offsets.h. This file is automatically generated when you compile the project. What is the offset of the mepc field of saved_regs_t? Where is saved_regs_t defined?

Which RISC-V register is used as the return address in the calling convention? Which register is used as the stack pointer? (Hint: Refer to Table 18.2 of the RISC-V Calling Convention)

What privilege mode does the starter code run the CPU in?

1.2 Debugging the CPU

Let’s inspect the code from the Maps team to learn why the CPU is not finding the correct shortest path. Open dijkstra.c and navigate to line 55.

1-2 Exercise

Let’s debug the CPU using the basic approach: using printf. Insert printf statements into dijkstra.c before line 55 to print the values of the operations performed by the if statement. Print all information relevant to the operation- that is, print the value of mindistance, cost[nextnode][i], their sum as computed by the CPU, and distance[i].

Look at the CPU reported output of mindistance+cost[nextnode][i] and see whether you can capture the bug. We will not grade the coding for this part, since they are just a few printf.

1-3 Discussion Question

Based on the printf result, record what are the inputs (eg. values of mindistance and cost[nextnode][i]) that cause the addition instruction to return an incorrect sum? Does the operation fail for all inputs or just specific ones?

(Note: You do not need to exhaustively test the input space, just report the ones you observe from running the starter program).

Submission and Grading

Write the answers to the questions in your report. No code for this part.

Part 2: Fuzzing for Faulty Inputs (30%)

Impressed with your ability to understand incorrect CPU behavior, your boss has assigned you a new task- to write a program that can automatically test a given CPU to see if the addition bug is present, and if so, return what inputs cause the sum to fail.

In the industry, we call such a program a regression test used to ensure that known bugs do not accidentally get reintroduced to the RTL before tapeout (when the CPU design is sent to the factory).

We assume a bug is a set of inputs a and b that, when added together, produces an output that is not equal to a+b, just like we saw with the faulty results running Dijkstra’s algorithm above. We assume that the order of operands does not matter (that is, if a+b fails, then b+a fails too). We limit the search space of operands to [0x100,0x1FF] for both a and b (that is, only consider values of a and b from 256 to 511 inclusive).

Your task is to check the addition operator for all operands in the search space, and print out any operands that fail. Note that you cannot trust the CPU to perform the add instruction correctly (you can assume all other arithmetic instructions are bug-free). How do you know if the CPU made a mistake? Or alternatively, how do you implement the specification for the add instruction in your code? This is an open-ended problem with multiple possible solutions.

2-1 Discussion Question

What is your approach for finding the operands that can trigger the bug with the add instruction? Briefly describe it.

Starting from this point, you are free to modify any C or assembly file you’d like!

Additionally, you can add as many source files as you want- the build system will automatically build and link them for you!

2-2 Exercise

Navigate into the part2 folder and create two new files- fuzz.c and fuzz.h.

Create a method void part2_fuzz_inputs() and modify shd.c to call your new method from shd_main. (You may need to #include "fuzz.h" and define a function signature for your new method).

Your method should analyze all possible sums for operands between 256 and 511 (inclusive). If it finds any inputs that fail to produce the correct sum, it should print the input pair and what the erroneous sum was.

For interacting with our grading scripts, you should use the following format specifier to print out failing operands: printf("0x%X+0x%X=0x%X,0x%X\n",a,b,a+b,expected) (where a and b are the buggy inputs, and expected is what they should have added up to). It is ok if the same inputs appear twice in swapped order (eg. if you print both a,b,a+b,expected and b,a,b+a,expected).

Your output should look like:

+------------------+
| MIT SHD Fuzz Lab |
|      Part 2      |
+------------------+
0x10+0x10=0x30,0x20
0x15+0x15=0x16,0x2A
...

We will grade your lab on a CPU with random bugs!

Do not assume the bugs present on the lab handout CPU will be the same that your code is graded against- your code should exhaustively search the input space to find and report all bugs!

You should assume there can be zero or multiple addition bugs (inputs that cause the addition to fail) with randomly distributed operands.

2-3 Discussion Question (optional)

After completing part 2, please share with us: Did you encounter any bugs while implementing the fuzzer? How did you handle the possibility of add instructions being incorrect during control flow instructions (eg. loop condition checks)?

Suggested Strategies / Hints

We recommend testing your code by compiling it for the host Linux machine (gcc fuzz.c -o fuzz) and using that as a “Gold” CPU model. On the Linux host your code should report no bugs, and on Pretty Secure Processor it should report some bugs.
Note that you cannot trust the CPU hardware to perform addition correctly. What happens if your loop logic requires a sum that is computed incorrectly?

You can use Python or a shell script to automate generation of your test cases. If you do, please commit these scripts to your repository as well, and include instructions for how to use them in your report.

Submission and Grading

git add all the new files you created for Part 2, including any Python or shell scripts you wrote to automate generation of your code. Running ./run.sh part2 should create a printout of the bugs your fuzzer found. Your code should run for no longer than 10 minutes.

Part 3: Fuzzing for a Hidden CPU Backdoor (45%)

Bad news! This morning, a news report came out describing in the wild cases of PearPhone kernels being compromised by a new strain of malware. This malware is somehow capable of elevating the CPU privilege level from PSP_PRIV_USER to PSP_PRIV_MACHINE, bypassing all OS restrictions. While the CPU design team believes this is impossible, your boss has a suspicion- that someone inserted a malicious backdoor into the CPU itself!

Whenever the CPU attempts to execute an undefined instruction, it is supposed to throw an exception with mcause set to EXCEPTION_CAUSE_INVALID_INST (see defines.h:51). Your boss believes the CPU backdoor is a single instruction encoded in an undefined region of the RISC-V ISA instruction encoding space that, when executed, does not throw an exception, but instead sneakily switches the privilege mode of the pipeline. An attacker who knows the encoding of this hidden instruction can use this to bypass the PearPhone OS security mechanisms.

In this part, you will write a hidden instruction fuzzer to search automatically for backdoors in the CPU that elevate your privilege level to machine mode. We will start with searching for undocumented instructions, and then study their behavior.

The following provides an overview of the intended approach for finding undocumented instructions:

Whenever the CPU encounters an instruction it cannot execute, it will throw an invalid instruction exception. For every instruction in the search space, we attempt to execute it, and watch for an exception. If any of them don’t generate an exception, we know that the CPU “understands” it as a valid opcode, and therefore it is a good candidate for being a CPU backdoor.

3.1 Writing an Exception Handler

Before we can search for undefined instructions, we need to be able to handle exceptions. For actually undefined instructions, trying to run them will cause the exception handler to run. For any backdoors, the exception handler will not be called (as the backdoor is decoded as a valid instruction).

Here is a description of a baseline exception handler. It saves all registers (x1-x31 and mepc, which holds the pc value of the faulting instruction), handles the exception logic (possibly changing the values of some of the saved registers), and then loads the new CPU state before finally executing mret.

We have provided you the first half of the exception handler that 1) creates space on the stack for a new saved_regs_t, 2) stores the registers into it, and 3) passes the address of this struct to exception.c. We have also provided a test method named exception_test() that generates an illegal instruction exception and tests your exception handler to see whether it is saving and restoring the CPU state correctly. Start by modifying shd_main to call exception_test() (defined in exception_test.s). You should see something like the following:

+------------------+
| MIT SHD Fuzz Lab |
|      Part 3      |
+------------------+
Triggering exception...
=============================
Exception!
pc:  0x000005F0
x1:  0x11111111    x2:  0x00002FD0    x3:  0x33333333    x4:  0x44444444
x5:  0x55555555    x6:  0x66666666    x7:  0x77777777    x8:  0x88888888
x9:  0x99999999    x10: 0x10101010    x11: 0x11111111    x12: 0x12121212
x13: 0x13131313    x14: 0x14141414    x15: 0x15151515    x16: 0x16161616
x17: 0x17171717    x18: 0x18181818    x19: 0x19191919    x20: 0x20202020
x21: 0x21212121    x22: 0x22222222    x23: 0x23232323    x24: 0x24242424
x25: 0x25252525    x26: 0x26262626    x27: 0x27272727    x28: 0x28282828
x29: 0x29292929    x30: 0x30303030    x31: 0x31313131
TODO: You need to write the exception handler return code!

3-1 Exercise

Read through the starter provided code in exception_entry.s. Your task is to implement the second half of the exception handler assembly in exception_entry.s, and fill in exception.c with any logic your code needs.

First, remove the code that causes the handler to print TODO: You need to write the exception handler return code! and halt. Next, you should fill in the TODO’s in exception_entry.s to restore the saved state (the opposite of what the starter code does).

Finally, fill in the TODO in exception_handler in exception.c (the C method called by the handler entry assembly) to properly handle the fault. If the exception cause was an illegal instruction, at minimum this means incrementing the saved mepc by one RISC-V instruction length to resume execution at the next assembly instruction after the undefined one. You will add more logic here in the later parts.

If implemented correctly, exception_test() should print Passed exception test!.

3-2 Discussion Question

Why does the exception handler restore x2 after all the other registers?

Here is how RISC-V exception returns are handled by the Linux kernel.

Note that the starter code is just a suggested approach- feel free to remove it and rewrite it however you want!

This part is considered complete if exception_test reports your exception handler passed.

In later parts, you are allowed to modify your exception handler in any way you want, including in ways that make exception_test fail.

exception_test will report a failure if any register x1-x31 is changed by the exception handler. For this lab, you may actually want to modify registers in the saved_regs_t, which is acceptable!

You will not lose points if exception_test says your code fails. It is just a guideline to help you implement Part 3.

We recommend starting with an exception handler that passes exception_test to ensure your assembly is correct, and adjusting it later in the lab as you want. For this reason, we also recommend git commit‘ing your code at this current state, in case you later break your exception handler and want to restore it to the previously working version.

3.2 Searching for Undocumented Hidden Instructions

Now that we can catch illegal instruction exceptions, it’s time to scan for backdoors. Your boss wants you to only consider the custom-0 subspace of the instruction encoding space. custom-0 has an opcode of 0001011, or 0x0B (See Table 24.1 in Volume I of the RISC-V ISA Specification on page 129). The CPU does not officially support any custom-0 instructions, so if you find one that doesn’t throw an illegal instruction exception, it is a hidden instruction!

Your boss said that while the opcode is probably custom-0, the decode stage of the pipeline is likely stealthily rejecting all instructions with this opcode whose upper immediate bits do not match some secret. You should treat this backdoor instruction as a U type RISC-V instruction, where the upper immediate bits (bits 31 to 12) are a secret, and the opcode is custom-0.

Recall the RISC-V 32I instruction encodings (Volume I Figure 2.2 on Page 16):

In this part, you will loop over all custom-0 instructions with different possible upper immediate bits. That is, you should generate all instructions where the opcode field (bits 6:0) is set to custom-0, and the upper bits (31:12) are set to a guessed secret value. You will execute each instruction, and then check if a CPU illegal instruction exception occurred. If it did not, you should print out the full encoding that caused no exception.

You can calculate a single instruction encoding with the following:

OP_CUSTOM0 = 0x0B # See Table 24.1 in Volume I of the ISA Spec.
def encode_instruction(secret):
  return OP_CUSTOM0 | (secret << 12)

For this lab, only consider the cases where the upper immediate bits (secret value) are between 0 and 65535 inclusive.

High-level Guideline

You will need to test all possible instruction encodings in this region. If an instruction is invalid, it will trigger the execution of the invalid opcode exception handler. After executing the handler, we will return to the test code. If it is a hidden instruction, the code will just run through without executing the handler. You key task to identify a hidden instruction will be to distinguish between the case when an exception has happened and the case when an exception has not happened. At a high level, you will need to figure out a way for the invalid opcode exception handler to communicate information to the test code. This can be done via setting flags in memory or in specific registers or many other ways. You can pick the one you like to implement. It is ok if your design decision causes your exception handler to fail exception_test as long as you are able to discover the backdoor opcode.

Note that a key challenge in this part is that you will need to test many instructions. You will not want to write the code manually. Possible solutions include generating the test code automatically with Python, or making use of self-modifying code. If you write automation scripts, make sure to commit them to your repo, and include documentation of how they work in your report. (Only submitting a massive auto-generated assembly file is unacceptable).

For now, your test code runs with PSP_PRIV_MACHINE privileges (see bringup.s).

3-3 Discussion Question

Describe your design decisions for the exception handler. What does it do while trying to find the undocumented instruction? How does your exception handler communicate with the test code?

3-4 Exercise

Write a method that tries all 65536 possible custom-0 instructions, with the upper bits (bits 31 to 12) set to a value between [0,65535] inclusive. If it finds an instruction that does not throw an illegal instruction exception, it prints the opcode to serial with printf. If no instruction is found, it prints that it could not find any hidden instructions.

3-5 Discussion Question

Include the hidden backdoor instruction found by your code/script in the report.

This is an open ended programming task!

You are allowed to modify any and all parts of the starter code. You can implement this method in assembly or in C. You can create as many extra files as you need to. You will likely need to modify the exception handler as well to support your code.

A Few Assembly Tips

Here are a few assembly tips that may come in handy. Note that you do not necessarily need to use any or all of them.

Shared variables between C and ASM

Recall that assembly symbols can be exported to C with the .global keyword. So, to create a variable that is shared between C and assembly, I can write the following in assembly:
.global shared_var
shared_var: .word 0x1234
And then in C refer to it with the extern keyword:
extern uint32_t shared_var;

Emitting Arbitrary Instructions

You can emit an arbitrary encoding for an instruction with the .word keyword.

For example, here is how to add the instruction encoded by 0x41414141 to the program:
some_asm_method:
   add sp, sp, -4
   .word 0x41414141
   add sp, sp, 4
See exception_test.s:111 for an example!

Inline Assembly

You may find the inline assembly gcc features to be useful.

Hints/ Warnings

Recall that the CPU does have bugs with the add instruction, so be careful if your opcode enumeration code requires addition!
If your approach makes use of self modifying code, insert 5 nop instructions after your write instruction to flush the CPU pipeline (as if you try to write to an instruction and then immediately run it, a stale opcode may be in the pipeline).
Don’t forget to commit any extra files you add to your git repo!
You can automate parts of your solution by writing Python code to produce assembly in bulk (eg. print(f"la a0, {test_case}") will emit the assembly code to load a0 with the Python variable test_case). If you write Python code, commit the Python code to your repo.
Your exception handler needs to communicate with your test code in some way to report when exceptions occur. How this works is up to you.

This part is considered complete if your code finds and prints the opcode of the hidden backdoor instruction.

3.3 Elevating Privileges to Kernel Mode

Congrats on finding a hidden instruction! Your boss, amazed at your skills, wants to see if you can figure out how to activate the backdoor. You may have noticed that even after running the backdoor, you still cannot access privileged CSRs from PSP_PRIV_USER (try it!). Attempting to load a privileged CSR when the core is booted in PSP_PRIV_USER should throw a EXCEPTION_CAUSE_ILLEGAL_ACCESS (mcause=0x00000001) exception, even if the backdoor instruction is run first.

So, why doesn’t the backdoor work?

It seems that the authors of the backdoor were even sneakier than we thought, and added another condition that must be satisfied before the backdoor instruction elevates your privileges.

Your boss took the CPU to a focused-ion beam (FIB), and found a wire connecting the decode stage to register x10 that wasn’t in the original design. Your boss suspects that x10 must be loaded with a secret value before the backdoor is called.

So far, your code has been running in PSP_PRIV_MACHINE. Before continuing, update bringup.s to start your code with PSP_PRIV_USER instead. This way, we can check to see if we gained any extra privileges by running the backdoor instruction.

Additionally, for Part 3.3, you can hard-code the opcode you discovered in Part 3.2.

3-6 Discussion Question

Describe your approach for finding the correct x10 value. Does the exception handler behave differently when searching for the correct x10 value of that instruction compared to searching for hidden instructions?

3-7 Exercise

Write some code to try and load a privileged CSR (pick any CSR you want!) and run it without running the backdoor instruction. You should see that the load succeeds if bringup.s starts your code in PSP_PRIV_MACHINE mode, and throws an exception if it starts in PSP_PRIV_USER.

Now, while running in PSP_PRIV_USER, add a call to the backdoor instruction before the CSR read. You should still see an exception being thrown, even if you ran the backdoor instruction. This is because the backdoor is not called with the correct value for x10, and so it does not change the privilege level.

Modify your code to set x10 to values between [0,65535] inclusive before calling the backdoor. For each value in [0,65535], set x10 to the value, run the backdoor instruction discovered in Part 3.2, and then try to load a CSR. If the CSR load succeeds, print the value of x10 out. If it fails, try another value of x10. You may need to modify your exception handler code to detect when CSR loads cause faults.

When complete, your code should be able to start in PSP_PRIV_USER, call the backdoor with different x10 values until the CPU privilege level changes, and then read the contents of a privileged CSR.

3-8 Discussion Question

Include the correct x10 value found by your code/script in the report.

When a CSR load fails, the CPU throws a different kind of exception- EXCEPTION_CAUSE_ILLEGAL_ACCESS (0x00000001) (see defines.h:48). You can use this to distinguish CSR load failures from illegal instruction exceptions.

3.4 Putting it All Together

Now we will combine your code to discover hidden instructions with your code to fuzz those instructions. Your combined code should, in one go, search the undocumented instructions to find the backdoor instruction, fuzz the backdoor instruction for the correct x10 value, and then print both the correct opcode and x10 value to standard out.

3-9 Exercise: Putting it All Together

Combine your code from Parts 3.2 and 3.3, reporting both the discovered opcode and x10 value to the serial port.

3-10 Discussion Question (optional)

After completing part 3, please share with us:

Did you encounter any challenges while building the code? How did you overcome the challenges of the add instruction occasionally producing an incorrect result? Did you try anything that failed?

When grading Part 3 we will run your code on a different CPU with a different backdoor opcode and x10 encoding, so do not assume anything about the value of these constants besides they are within the bounds specified in this document.

You can assume that at most one backdoor instruction exists with exactly one correct x10 value corresponding to it.

Submission and Grading

git add all the new files you created for Part 3, including any Python or shell scripts you wrote to automate generation of your code. When we build and run the contents of your part3 directory, your code for Part 3.4 should immediately begin running, and report the correct opcode and x10 value discovered on the hardware (which again, will have different values when we grade than they are on your assigned CPU).

Your part 3 output should follow the following format (printf("backdoor=0x%X\n",backdoor) and printf("x10=0x%X\n",x10_val)):

+------------------+
| MIT SHD Fuzz Lab |
|      Part 3      |
+------------------+
backdoor=0x????00B
x10=0x????

Part 4: Using the Backdoor to Exploit a Remote System (15%)

For this final part, you will combine all of the bugs you have discovered so far to exploit a remote secure server. There is a Pretty Secure Processor running on the network hosting a C application running in unprivileged mode (PSP_PRIV_USER). This application is correctly written and suffers from no buffer overflow bugs during normal execution. However, the CPU suffers from the add bug as you have seen in the earlier parts of this lab. It also has the backdoor present which can be used to elevate your privileges to PSP_PRIV_MACHINE once code execution has been obtained.

The source code for this secure server is found in the part4 directory. We will begin by testing our exploit on a local CPU where we can run with PSP_PRIV_MACHINE privileges and use gdb for testing. First, we will manipulate the CPU to cause a buffer overflow by feeding it inputs that cause the add instruction to malfunction, bypassing the bounds check. Next, we will write some RISC-V assembly to dump a hidden secret from the CPU by reading from some special CSRs- this assembly is the code we want to run on the remote processor. Finally, we will implement a return to shellcode attack, storing the assembly we wrote in the input buffer and tricking the CPU into jumping to it by overwriting the saved return address.

This will allow us to send code to the remote CPU to run and cause it to be executed, granting us full control of the remote CPU.

This part will introduce you to several commonly used CTF tools. You have also seen these tools in recitation. We will guide you through the process of using these tools to implement a software-hardware exploit on a remote server.

Getting Started

First, cd into part4, build it with make, and run it. You should see something like the following:

+------------------+
| MIT SHD Fuzz Lab |
|      Part 4      |
+------------------+
inputbuf is at 0x2CD4
How long is your first name: 7
How long is your last name: 5
Ok, tell me your first name: Joseph
And your last name: Ravi
Your username is JosephRavi

Play around with the inputs and get a feel for how the application works. Note that the length inputs require you to add 1 byte for the newline character. That is, if your name is Joseph (6 chars), you would need to add a 7th character for the newline.

See if you can overflow the buffer by typing in a large name. You should be unable to cause a crash (if you did, let us know on Piazza!)

4.1 Redirecting Control Flow

Our primary objective of this part is to modify pc to point to an arbitrary address. We will accomplish this using a buffer overflow.

Open shd.c and take a look at the code. You should see the following bounds check:

if (fname_len + lname_len > sizeof(inputbuf)) {
    printf("Error: Your name is too large!\n");
    return;
}

This bounds check prevents the buffer from being overflown by normal inputs.

As we control both fname_len and lname_len, we can skip this bounds check by feeding in inputs that should fail the check, but instead pass (their sum is less than sizeof(inputbuf)) because of a bug in the add instruction. Revisit the bugs you found in Part 2 and pick a set of inputs that satisfy this condition and try them here. Can you bypass the name length check? (Try it in the terminal!)

Once you have found input lengths that bypass the bounds check, see if you can get the CPU to crash by feeding in a really long username. You should be able to achieve something like the following:

+------------------+
| MIT SHD Fuzz Lab |
|      Part 4      |
+------------------+
inputbuf is at 0x2CD4
How long is your first name: {REDACTED}
How long is your last name: {REDACTED}
Ok, tell me your first name: AAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
And your last name: Your username is AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
...
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
=============================
Exception!
pc:  0x41414141
x1:  0x41414141    x2:  0x00002FF0    x3:  0x00000000    x4:  0x00000000
x5:  0x00000000    x6:  0x00000000    x7:  0x00000000    x8:  0x41414141
x9:  0x00000000    x10: 0x0000000A    x11: 0x00002CD4    x12: 0x00000000
x13: 0x00000000    x14: 0x0000000A    x15: 0x00000000    x16: 0x00000000
x17: 0x00000000    x18: 0x00000000    x19: 0x00000000    x20: 0x00000000
x21: 0x00000000    x22: 0x00000000    x23: 0x00000000    x24: 0x00000000
x25: 0x00000000    x26: 0x00000000    x27: 0x00000000    x28: 0x00000000
x29: 0x00000000    x30: 0x00000000    x31: 0x00000000
TODO: You need to write the exception handler return code!

Notice how we crashed with pc=0x41414141. 0x41 is the ASCII encoding for the character A. This is great news! It means that our input somehow made its way into the pc register by overwriting the saved ra on the stack somewhere.

shd_main stores the return address on the stack, and restores it before exiting. Since we bypassed the buffer size check, our string was so large that it overwrote this value on the stack, meaning that when shd_main tried to return, it instead treated our input as an address and tried to jump to it.

You will see 0x41414141 in a lot of security blog posts as a “proof-of-concept” that a bug is exploitable.

Now open solve_part4.py. This is a Python script that uses the pwntools library to automate interacting with the processor. This script will allow us to easily send non-printable characters (like assembly bytes) into the CPU. We have filled in a starter skeleton for you to use to build your exploits. Feel free to write your exploit however you want, the starter skeleton is simply a guide.

We will use this script to automate overflowing the buffer and precisely redirecting code flow to our controlled data on the stack.

Note that while our earlier proof-of-concept (filling the buffer with AAAAAAA...) proved that we can gain control of pc, it did not tell us which part of our input eventually made its way into pc. That’s because every byte of our input is identical! (That is, we don’t know if it was the first A, one of the A’s in the middle, or the last A that made it into pc since they are all the same).

pwntools provides the method cyclic that allows us to fix this issue in a really efficient way. cyclic returns what’s called a de Bruijn sequence. The output of this is a series of unique 4 byte chunks that do not repeat (for a very long time, anyways). Give it a try:

>>> cyclic(100)
b'aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaauaaavaaawaaaxaaayaaa'

Generate a long username with cyclic and use that as your username. You should see a new crash pc value. This pc value will be the ASCII encoding of some unique part of the de Bruijn sequence. Since each 4 byte chunk is different, we now know exactly where in the input sequence pc is being read from.

For example pc could be 0x6161616F. Using this feedback, pwntools quickly allows you to look up the offset in the input string that controls pc with cyclic_find.

>>> cyclic_find(0x6161616F)
56

In this example, the offset 56 bytes into my overflow was what controlled pc.

4-1 Exercise

Open solve_part4.py.

Fill in FIRSTNAME_LEN and LASTNAME_LEN with the inputs you found above. Fill in the argument to cyclic_find with the pc value you found using cyclic. This will fill in buf with enough A’s to stop right before what gets filled into pc.

Now, the next 4 bytes input into buf will be put into pc when shd_main returns. These bytes must be packed as a 32 bit integer (little endian), which the p32 method will handle for you. Input some hex number into the p32 call and run the script- you should see a crash with pc set to that number.

Once you can reliably use solve_part4.py to change pc to an arbitrary address, you are done with Part 4.1!

4.2 Dumping the Flag

Now, we will write some assembly to read the flag from a sequence of CSRs. CSRs 0xFC0 to 0xFF0 contain a hidden secret flag that requires PSP_PRIV_MACHINE to read. The first ASCII char of the flag is stored in CSR 0xFC0, the next in 0xFC1, and so on.

4-2 Exercise

Create a new C or assembly file with a method that reads CSRs starting at 0xFC0 and ending at 0xFF0. If you are using assembly, don’t forget to mark your method as global with .global. For each CSR, write the value read to CSR 0x202 (the softserial output port) to display it in the terminal.

Modify shd.c to call your method (add a call to it somewhere in shd_main) and check that it works. You should see an example flag printed out to the terminal.

Once that is working, modify bringup.s to boot the CPU with PSP_PRIV_USER. Your flag dumper should no longer work. Add a call to the backdoor instruction you discovered in Part 3 to your flag dump routine, and try it again- the flag dumper should once again work.

4-3 Discussion Question

This part is complete once your flag dump routine can dump CSRs 0xFC0 to 0xFF0 to the console, even when bringup.s boots the CPU in PSP_PRIV_USER mode. You should see an example flag of the form MIT{...} printed. Include this flag in your report.

We recommend using assembly for this part. We also recommend automatically generating your assembly code with a Python script. Create a Python script called gen_dump_flag.py and fill it with print commands to output the assembly instructions you need to dump the flag.

Make sure to output a symbol and mark it as global so you can find it from C. (eg. .global flag_dump and flag_dump: before your instructions).

Refer to serial_csr.s for a refresher on how to read/ write CSRs! (Specifically, the serial port).

You may also find referring to the psp documentation page useful.

4.3 Writing the Exploit

We will now combine the primitives from parts 4.1 and 4.2 to exploit the remote CPU. To recap, in Part 4.1 we discovered how to modify control flow (eg. change the pc register) by causing a buffer overflow when the CPU miscomputes a bounds check with the buggy add instruction. In Part 4.2 we wrote some assembly code (called “shellcode”) that is able to elevate its own privileges using the CPU backdoor and dump the contents of some privileged CSRs to the console.

In Part 4.3, we will use the ability to control pc to cause the CPU to jump to our own input buffer (where the username field goes). We will fill that input buffer with the code from Part 4.2 and send it to the CPU over serial. Then, we will use the buffer overflow to trick the CPU into jumping to the buffer, running the shellcode and granting us machine privileges.

The following figure provides an overview of this attack:

There is a remote Pretty Secure Processor attached to the network running this same program in PSP_PRIV_USER mode. You can attach to it by hand with the following command:

nc localhost 31337

Give it a try! You should see the same app you are already familiar with. The difference is that nc connects to a remote CPU that you do not have access to, so you cannot change the code or dump the CSRs without writing an exploit. We are going to fill out the rest of solve_part4.py to automate sending the shellcode you wrote in 4.2 to the CPU. We will use this script to send this payload over the network to exploit the remote CPU and reveal a privileged flag.

4-4 Exercise

Uncomment lines 22 and 23 in solve_part4.py. These lines read in a binary file called dump_flag.bin and write it into a variable called shellcode. dump_flag.bin should contain the binary values (the assembled bytes) of the code you wrote in 4.2.

You can use the following to convert the object file (.o) into a binary dump (.bin) to be used in the exploit. Assuming your assembly file containing the flag dump routine is called dump_flag.s:
make                                                              # Assemble the assembly file into a .o object
riscv64-unknown-elf-objcopy -O binary dump_flag.o dump_flag.bin   # Dump the .o into a .bin
Now, in Python, you can use the following to read your assembly code in as a bytestring:
shellcode_file=open('dump_flag.bin', 'rb')
shellcode=shellcode_file.read()
We want to set buf (the Python bytestring we send to the CPU) to begin with shellcode, followed by padding to make it the correct length, followed by p32(&inputbuf). This way, the CPU will attempt to execute starting at the first byte of inputbuf, which contains our shellcode variable (the flag dumper).

The starter code handles most of this for you:
buf = shellcode
buf += b'A' * (cyclic_find(0) - len(buf)) # cyclic_find() should have been filled in in 4.1
buf += p32(0) # Replace 0 with the location of inputbuf as displayed by the program
Now, run solve_part4.py and you should see the flag printed to the console.

Finally, at the top of the program, comment out io=process(['./run.sh', 'part4']) and uncomment:
# Uncomment this to try on the remote CPU:
# io=remote('127.0.0.1', 31337)
Try solve_part4.py again and it should dump the flag of the remote CPU! This flag should be different to the one you see when running on the local CPU. Once you have leaked the flag from the remote CPU, you are done with this lab.

4-5 Discussion Question

Include the flag leaked from the remote CPU in your report.

Do not worry about cleaning up after your exploit- if the CPU crashes after dumping the flag, that is fine, as long as there is not an unreasonable amount of extra text generated after.

To clean your exploit up, you can add a j . instruction at the end of your flag dumper which will trigger an infinite loop, preventing any crashes or extra text generation.

Submission and Grading

git add all the new files you created for Part 4, including any Python or shell scripts you wrote to automate generation of your code. In your report, include the flag you leaked from the remote CPU. Also include in your report where to find the flag dumper method in your part4 directory.

Your solve_part4.py code should attempt to connect to the remote CPU and dump the flag to the console when run.

Part 5: Bug Bounty (Optional, 10% Bonus)

Did you find something wrong with the lab? Were you able to break the CPU in an unintentional way? (Eg. cause a deadlock, found a bug or race condition in other instructions, etc.). Did you find a way to crash our GDB server or a bug in the simulator?

Find a bug in the lab materials, let us know about it, and you may be rewarded with bonus points!

There is no intended solution for this part- we do not know of any bugs in our lab code.

However, we encourage you to try and find some!

Non-exhaustive list of possible bugs:

Deadlock condition that causes the CPU to stop executing instructions.
Memory corruption vulnerability in our GDB server implementation.
Incorrect pipeline hazard forwarding for system control instructions (eg. ecall, csrrw, mret, and the backdoor).
Incorrect handling of privilege level for instructions at different privilege levels simultaneously in the pipeline.
Simulator errors or incorrect reading of system state (single stepping causes multiple instructions to be executed instead of one, register state read by GDB is incorrect at a given instruction, memory contents as seen by GDB are incorrect or delayed by a few cycles).
Vestigial features (such as secure calls/ pointer HMACs, features the core used to support but we disabled- see the GitHub page) were incorrectly disabled and cause bugs.
System Management Core exploits (the IPI generator/ IPI ring)- what happens if userspace code triggers a self-IPI?

We are most interested in unintentional breaks of the RISC-V specification or issues where the core fails to comply with our stated documentation. An example of a non-issue is how some of our CSRs do not comply with the specification- we intentionally break the ISA in some limited documented situations.

Note that what issues qualify for the bonus is at the sole discretion of the instructor.

Part 6: Challenge Problem (Optional, No Credit)

Was this lab too easy for you? Here’s an optional for-fun challenge problem to further test your exploitation skills.

There’s another Pretty Secure Processor running on the network, except this time it has a different set of CPU bugs. You’ll need to port your fuzzer into your exploit payload. But how do you even get an exploit payload to run in the first place…?

nc localhost 31338.

6-1 Exercise (Optional)

Fill in your exploit in solve_part6.py.

That’s All, Folks!

That’s all we have for you this semester. Congrats on completing Secure Hardware Design!! We sincerely hope you enjoyed this class and found the labs to be worthwhile.

If there’s anything we can do to improve the course for future students, please let us know!

Have a great summer!

Acknowledgements

Made by Joseph Ravichandran and Mengjia Yan.

CPU Fuzzing Lab

Table of Contents

Lab Details

Collaboration Policy

Getting Started

Warning: Start Early!

Overview: Pretty Secure Processor

What am I building in this lab?

What happens when I run my code?

Running your code

Adding Code

Debug Mode

Part 1: Mystery of the Faulty CPU (10%)

Getting Familiar with the Codebase

1.1 Building and Running the Starter Code

1.2 Debugging the CPU

Submission and Grading

Part 2: Fuzzing for Faulty Inputs (30%)

Suggested Strategies / Hints

Submission and Grading

Part 3: Fuzzing for a Hidden CPU Backdoor (45%)

3.1 Writing an Exception Handler

3.2 Searching for Undocumented Hidden Instructions

High-level Guideline

A Few Assembly Tips

Hints/ Warnings

3.3 Elevating Privileges to Kernel Mode

3.4 Putting it All Together

Submission and Grading

Part 4: Using the Backdoor to Exploit a Remote System (15%)

Getting Started

4.1 Redirecting Control Flow

4.2 Dumping the Flag

4.3 Writing the Exploit

Submission and Grading

Part 5: Bug Bounty (Optional, 10% Bonus)

Part 6: Challenge Problem (Optional, No Credit)

That’s All, Folks!

Acknowledgements