Table of Contents
- References
- ISA Overview
- Calling Convention
- Example Function Call
- Privileged Extensions
- Hands-On Activities
RISC-V Warmup
In case your RISC-V assembly is a bit rusty, here’s a quick guide to the architecture that will come in handy in the fuzzing lab. We will be exploring the rv32i ISA.
References
If you want to learn more about the ISA, the official manual is the best place to look!
RISC-V ISA Volume I: Unprivileged Specification (regular instructions)
This walks through the base RISC-V instructions. The rv32i ISA is described in Chapter 2 (page 13).
RISC-V ISA Volume II: Privileged Specification (system instructions and CSRs)
This walks through the privileged ISA extensions (privilege transitions and system instructions).
RISC-V Calling Convention
This document walks through how to call C functions from assembly. Table 18.2 will be useful!
RISC-V Assembly Programmer’s Manual
This provides an overview of what the RISC-V assembler is doing “under the hood”, demystifying various pseudoinstructions.
ISA Overview
In RISC-V, there are 32 general purpose registers (x0 to x31) and a program counter (pc):
Figure 2.1 from the RISC-V ISA Volume I: RISC-V base unprivileged integer register state.
Here, XLEN is 32 (as we are on the 32-bit flavor of the ISA). x0 is hardcoded to always be zero when read, and the remaining registers (x1-x31) are free for programmer use. The ISA calling convention defines specific uses for each of these registers; more on that later.
In rv32i, instructions are 32 bits long (4 bytes) and must be aligned on a four-byte boundary. Figures 2.2-2.4 in the RISC-V ISA Volume I show the binary encoding of instructions. In general, the opcode occupies bits 6:0, and the meaning of the remaining bits depends on the kind of instruction being executed.
The program counter pc points to the current instruction being executed.
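As a rough illustration of the encoding described above, the following C sketch extracts the opcode (bits 6:0) and, for formats that have one, the destination register (bits 11:7) from an instruction word. The helper names are made up for this guide and are not part of any lab code. The example word 0x00150513 is the encoding of addi a0, a0, 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for this guide; not part of any lab code. */

/* The opcode occupies bits 6:0 of every 32-bit instruction. */
uint32_t rv_opcode(uint32_t insn) { return insn & 0x7Fu; }

/* For formats with a destination register, rd sits in bits 11:7. */
uint32_t rv_rd(uint32_t insn) { return (insn >> 7) & 0x1Fu; }

/* rv32i instructions live on four-byte boundaries. */
int rv_pc_is_aligned(uint32_t pc) { return (pc & 0x3u) == 0; }

/* Usage: decode 'addi a0, a0, 1' (0x00150513). */
void decode_demo(void) {
    uint32_t insn = 0x00150513u;
    assert(rv_opcode(insn) == 0x13u); /* OP-IMM opcode */
    assert(rv_rd(insn) == 10u);       /* x10, also known as a0 */
}
```

The remaining fields (funct3, rs1, the immediate, and so on) can be pulled out with the same shift-and-mask pattern once you know which format the opcode selects.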
Calling Convention
The RISC-V ISA defines a calling convention (application binary interface, or ABI), assigning meanings to the general purpose registers. The following is a breakdown of all of the registers and their meanings:
Table 18.2 from the RISC-V Calling Convention Specification: RISC-V calling convention register usage.
Note how each register (x1-x31) has a corresponding ABI name. In assembly, you can refer to the register using either. For example, x2 means exactly the same register as sp (one is simply more readable than the other).
Callee Save VS Caller Save
Since functions tend to modify registers as part of their execution, the ABI splits the registers into two kinds: callee-saved and caller-saved. Callee-saved registers must be preserved by the callee (the function being called): if a function modifies one, it must save the old value and restore it before returning. Caller-saved registers, on the other hand, may be changed freely by the function being called (it doesn’t need to save and restore them). As a result, a function that calls other functions should not assume these registers hold their values across function calls.
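To make the register naming and the callee-save split concrete, here is a small C lookup sketch. The table contents reflect the standard integer calling convention as best we understand it, so double check them against Table 18.2. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* ABI names for x0..x31, in register-number order. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* Hypothetical helper: map a register number to its ABI name. */
const char *abi_name(unsigned x) {
    assert(x < 32);
    return ABI_NAMES[x];
}

/* Hypothetical helper: returns 1 if the callee must preserve register x
 * (sp, s0-s1, and s2-s11), and 0 otherwise. */
int is_callee_saved(unsigned x) {
    assert(x < 32);
    return x == 2 || x == 8 || x == 9 || (x >= 18 && x <= 27);
}

/* Usage: x2 is sp (callee-saved); x10 is a0 (caller-saved). */
void abi_demo(void) {
    assert(strcmp(abi_name(2), "sp") == 0);
    assert(is_callee_saved(2));
    assert(strcmp(abi_name(10), "a0") == 0);
    assert(!is_callee_saved(10));
}
```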
Argument Passing
In RISC-V, arguments are passed via registers, spilling over to the stack if there aren’t enough registers. Refer to the calling convention document for the specifics, but at a high level, arguments are passed via a0, a1, a2, and so on (up to a7).
When a function is complete, the return value is passed via a0 (and a1 if needed).
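As a simplified sketch of the rule above, the following C helper reports where the n-th argument of a call would be placed. It assumes every argument is a plain integer or pointer that fits in a single register. The real convention has extra rules for wider and floating-point values, so treat this as an approximation. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "a0".."a7" for the first eight arguments,
 * and "stack" for any argument after that.
 * Assumes each argument fits in one register. */
void arg_location(unsigned index, char *out, size_t out_len) {
    if (index < 8) {
        snprintf(out, out_len, "a%u", index);
    } else {
        snprintf(out, out_len, "stack");
    }
}

/* Usage: the third argument (index 2) travels in a2. */
void arg_demo(void) {
    char buf[8];
    arg_location(2, buf, sizeof buf);
    assert(strcmp(buf, "a2") == 0);
}
```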
Function Call Linkage
The RISC-V instruction to call a function is jal (Jump and Link). jal allows the program to jump to a function (setting pc to the address of the function to execute), and records the address of the next instruction after jal (pc+4) into the return address register ra.
When a function is ready to exit, it executes the ret instruction. This is not a “real” instruction; rather, ret is a shorthand way of writing jalr x0, 0(ra) to jump to the return address (effectively undoing the original jal that brought us into the function!).
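The linkage described above can be modeled in a few lines of C. This is a toy sketch that tracks only pc and ra. The struct, the function names, and the addresses (0x1000, 0x2000) are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A toy model of the two pieces of state that matter for call linkage. */
struct linkage {
    uint32_t pc; /* program counter */
    uint32_t ra; /* return address register (x1) */
};

/* jal ra, target: remember the next instruction, then jump. */
void do_jal(struct linkage *s, uint32_t target) {
    s->ra = s->pc + 4;
    s->pc = target;
}

/* ret, i.e. jalr x0, 0(ra): jump to ra. The link value is written to x0,
 * which discards it. jalr also clears the lowest bit of the target. */
void do_ret(struct linkage *s) {
    s->pc = s->ra & ~(uint32_t)1;
}

/* Usage: call a function at 0x2000 from 0x1000, then return. */
void linkage_demo(void) {
    struct linkage s = { .pc = 0x1000u, .ra = 0u };
    do_jal(&s, 0x2000u);
    assert(s.pc == 0x2000u && s.ra == 0x1004u);
    do_ret(&s);
    assert(s.pc == 0x1004u); /* back at the instruction after the jal */
}
```

Note that a second do_jal before the do_ret would overwrite ra, which is why a function that calls other functions has to stash ra somewhere first.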
Stack Usage
A function must save any callee-saved registers it modifies to the stack and restore them before returning (along with ra, if it calls other functions). In the beginning, a function will make some space on the stack (by subtracting from sp, the stack pointer), and save any necessary registers in the new space. At the end, the function will tear down its stack frame by restoring any saved registers, and resetting sp to its original value before executing ret.
A function can also use the stack for storing local variables!
Example Function Call
Let’s put it all together and inspect a sample RISC-V function! Here’s one you will remember from the prefetch attacks lab:
#include <stdint.h>
#include <stdio.h>

int call_me_maybe(uint32_t a0, uint32_t a1, uint32_t a2) {
    if ((a0 & 0x02) != 0) {
        if (a1 == 2 * a0) {
            if (a2 == 1337) {
                printf("MIT{and_thats_all_folks!}\n");
                return 0;
            }
        }
    }
    printf("Incorrect arguments!\n");
    printf("You did call_me_maybe(0x%X, 0x%X, 0x%X);\n", a0, a1, a2);
    return -1;
}
Let’s call it and see what assembly is generated (with riscv64-unknown-elf-objdump):
int call_it() {
    return call_me_maybe(0,1,2) + 1;
}
# call_it() will return the result of: 'call_me_maybe(0,1,2) + 1'.
call_it:
    # Make space for 16 bytes on the stack for call_it()
    # We won't use all 16 bytes, but the compiler likes to keep the stack 16 byte aligned for
    # performance reasons, so we get 16 bytes on the stack :)
    addi sp,sp,-16
    # Save ra to the stack
    # Since we are going to call other functions, ra will be clobbered, so
    # we need to remember its current value for when we want to return from call_it()!
    sw ra,12(sp)
    # Load immediate values into a0, a1, and a2
    # These are how we pass the arguments to call_me_maybe
    # Note that these are caller-saved, so we are free to overwrite them here!
    li a2,2
    li a1,1
    li a0,0
    # Perform the call!
    # call_me_maybe(0,1,2)
    # Note that this will overwrite ra!
    jal call_me_maybe
    # Now, a0 contains the return value of call_me_maybe
    # call_it() wants to return call_me_maybe(0,1,2) + 1, so we can add 1 to a0
    # When we exit this function, a0 will be used as the return value
    addi a0,a0,1
    # Restore ra from the stack, undoing the clobber caused by calling call_me_maybe
    lw ra,12(sp)
    # Reset the stack pointer to its old value
    addi sp,sp,16
    # Return from call_it!
    # This jumps to ra, the return address, returning the return value in a0.
    ret
Note that ra contains a pointer read from the stack (lw ra, 12(sp)) and we are now jumping to it. We implicitly trusted that nothing overwrote our saved value of ra on the stack. If the stack were compromised, perhaps a malicious attacker could choose a new pc value… (HINT: This is foreshadowing what you will do in the fuzzing lab!)
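To see why the saved ra matters, here is a toy C model of the prologue and epilogue of call_it. It is a sketch under simplifying assumptions: the “stack” is a small byte array, sp is an offset into it, and corrupt_saved_ra is a hypothetical stand-in for whatever bug lets something write to the stack. None of these names come from the lab.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u

struct toy_cpu {
    uint32_t ra;                /* return address register */
    uint32_t sp;                /* stack pointer, as a byte offset into stack[] */
    uint32_t pc;                /* program counter */
    uint8_t stack[STACK_BYTES]; /* toy stack memory */
};

void toy_init(struct toy_cpu *c, uint32_t ra) {
    memset(c, 0, sizeof *c);
    c->ra = ra;
    c->sp = STACK_BYTES; /* the stack grows down from the top */
}

/* addi sp,sp,-16 ; sw ra,12(sp) */
void prologue(struct toy_cpu *c) {
    assert(c->sp >= 16u);
    c->sp -= 16u;
    memcpy(&c->stack[c->sp + 12u], &c->ra, sizeof c->ra);
}

/* lw ra,12(sp) ; addi sp,sp,16 ; ret */
void epilogue(struct toy_cpu *c) {
    memcpy(&c->ra, &c->stack[c->sp + 12u], sizeof c->ra);
    c->sp += 16u;
    c->pc = c->ra; /* ret jumps to whatever was loaded into ra */
}

/* Hypothetical stand-in for a bug that overwrites the saved-ra slot. */
void corrupt_saved_ra(struct toy_cpu *c, uint32_t value) {
    memcpy(&c->stack[c->sp + 12u], &value, sizeof value);
}
```

In this model, calling prologue and then epilogue returns to the original ra even if ra itself is clobbered in between. If corrupt_saved_ra runs between the two, pc ends up at the value that was written instead.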
Privileged Extensions
Take a look at the RISC-V ISA Volume II. This manual provides an overview of the privilege modes available to RISC-V systems and how to transition between them. As you will recall from earlier labs, the OS kernel runs at a higher privilege level than user programs; this volume describes that mechanism on RISC-V. In the fuzzing lab, you will be writing programs that run with higher privilege levels, and will write code to handle privilege transitions (AKA exceptions).
Privilege Levels
There are four privilege levels in the RISC-V ISA, two of which we will use in our labs.
Table 1.1 from the RISC-V ISA Volume II: RISC-V privilege levels.
For the sake of this class, we will use the convention that level 0 (U mode) is userspace, and level 3 (M mode) is kernelspace. In the fuzzing lab, these privilege modes will be referred to as PSP_PRIV_USER and PSP_PRIV_MACHINE, respectively.
Levels 1 and 2 are not used in this class.
Control and Status Registers (CSRs)
Control and Status Registers (CSRs for short) are special privileged CPU registers that configure how the CPU behaves. They can be read and written with the csrr, csrw, and csrrw instructions while operating in M mode.
| Instruction | Example | Usage |
|---|---|---|
| csrr - CSR Read | csrr x1, SOME_CSR | Read SOME_CSR into register x1. |
| csrw - CSR Write | csrw SOME_CSR, x1 | Write x1 into SOME_CSR. |
| csrrw - CSR Read and Write | csrrw x1, SOME_CSR, x2 | Read SOME_CSR into register x1 and simultaneously write x2 into SOME_CSR. |
One of the most important CSRs is mepc (CSR address 0x341). When an exception occurs, the current PC of the faulting instruction will be written into mepc. mepc can be read by system software using the csrr instruction.
mepc will contain whatever the PC was at the exception, regardless of what mode the CPU was previously in.
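The three CSR instructions can be sketched as a toy C model. This models only the data movement: it does not check the privilege level, and the struct and function names are invented for this guide. The mepc address 0x341 is the one given above.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_MEPC 0x341u /* address given in the text */

struct toy_hart {
    uint32_t x[32];     /* general purpose registers */
    uint32_t csr[4096]; /* CSR addresses are 12 bits wide */
};

/* Register writes go through here so that x0 always stays zero. */
static void write_reg(struct toy_hart *h, unsigned rd, uint32_t v) {
    assert(rd < 32);
    if (rd != 0) h->x[rd] = v;
}

/* csrr rd, csr: read the CSR into rd. */
void csrr(struct toy_hart *h, unsigned rd, unsigned csr) {
    assert(csr < 4096);
    write_reg(h, rd, h->csr[csr]);
}

/* csrw csr, rs: write rs into the CSR. */
void csrw(struct toy_hart *h, unsigned csr, unsigned rs) {
    assert(csr < 4096 && rs < 32);
    h->csr[csr] = h->x[rs];
}

/* csrrw rd, csr, rs: the old CSR value goes to rd, and rs goes to the CSR. */
void csrrw(struct toy_hart *h, unsigned rd, unsigned csr, unsigned rs) {
    assert(csr < 4096 && rs < 32);
    uint32_t old_value = h->csr[csr];
    uint32_t new_value = h->x[rs]; /* read rs first, so rd == rs swaps */
    h->csr[csr] = new_value;
    write_reg(h, rd, old_value);
}
```

One consequence of the csrrw semantics: using the same register for rd and rs (for example, csrrw x2, SOME_CSR, x2) swaps the register with the CSR in a single instruction.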
Exception Conditions
Whenever a RISC-V CPU encounters an exception condition (for example, a usermode program attempts to perform an illegal access, an undefined instruction is executed, or a system call is requested with ecall), the CPU will perform a privilege transition into machine mode. Note that, unlike on x86, integer division by zero does not raise an exception on RISC-V.
Recall from lecture when Mengjia introduced the SYSRET bug on x86_64 machines:
This is how x86_64 machines perform privilege transitions (specifically, when a system call is requested). On RISC-V, there exists a very similar mechanism for privilege transitions!
The exception handler is the code in the kernel that is executed on an exception condition. In the fuzzing lab, you will write one!
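As a rough picture of what the hardware does on an exception, here is a toy C model. It is heavily simplified. The handler address is taken from mtvec, a machine-mode CSR that this guide does not otherwise cover, and the model ignores the other CSRs involved in a real trap (such as mcause and mstatus). The type and function names are invented for illustration and are not the lab’s PSP_PRIV_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Privilege levels used in this class (numbering from Table 1.1). */
enum toy_priv { TOY_PRIV_U = 0, TOY_PRIV_M = 3 };

struct toy_trap_state {
    uint32_t pc;        /* current program counter */
    enum toy_priv priv; /* current privilege level */
    uint32_t mepc;      /* receives the PC of the faulting instruction */
    uint32_t mtvec;     /* address of the exception handler (assumed set up) */
};

/* Simplified trap entry: record where we were, then enter the handler. */
void take_exception(struct toy_trap_state *s) {
    s->mepc = s->pc;      /* same behavior whatever mode we came from */
    s->priv = TOY_PRIV_M; /* exceptions are handled in machine mode */
    s->pc = s->mtvec;     /* start executing the exception handler */
}

/* Usage: a user program at 0x400 faults; the handler lives at 0x100. */
void trap_demo(void) {
    struct toy_trap_state s = {
        .pc = 0x400u, .priv = TOY_PRIV_U, .mepc = 0u, .mtvec = 0x100u
    };
    take_exception(&s);
    assert(s.mepc == 0x400u);
    assert(s.priv == TOY_PRIV_M);
    assert(s.pc == 0x100u);
}
```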
Hands-On Activities
Check out the hands-on activities repository for four introductory RISC-V programming activities. These exercises will get you familiar with writing RISC-V assembly and the calling convention, as these topics will be useful in the fuzzing lab.