Guide to Binary Exploitation
Beginner Concepts ⭐️
🎉 Goal: Learn what binaries are, how to break them, and their protections.
By finding vulnerabilities in binaries, we can gain access or control of various places in a program to conduct privilege scalation or code execution.
What’s a Binary?
- Definition: A binary is compiled code.
- Types of Binaries
- Executable and Linking Format (ELF): Used on Linux and UNIX systems
- Portable Executable Format (PE): Used on Windows
Inside a Binary: Below I will describe some sections which organize the binary’s data and code.
.text: Holds the program’s instructions (Should be read only. Writing here will definitely cause a segfault-!).data: Contains global and static variables that are explicitly initialized by the program.bss: Uninitialized data (filled with zeros by default)- Heap: Dynamic memory that grows from the end of the
.bsssection to higher memory addresses - Stack: A Last-In-First-Out (LIFO) structure that stores the return address, parameters, and local variables.
For a quick assembly review, check out: Introduction to Assembly
Common Vulnerabilities
- Buffer Overflows: Occur when a program writes more data into a buffer than it can hold.
- Vulnerable Functions: These are examples of some C functions which don’t check input size.
gets: reads user input without any limitstrcpy: copies strings without checking boundssprintf: formats strings without size limitsscanf: can be misused to overflow buffersstrcat: appends strings without checking available space
- Format String Vulnerabilities: Misusing
printfwith the string format specifier%s, allowing an attacker to leak addresses in memory (Peek the pwn.college dojo if you want to practice some challenges 👀).
Binary Protections
Binary protections, or mitigations, are mechanisms used to protect binaries against arbitrary code execution and other attacks. In the picture below we see the output of the command checksec. To use this outside of gdb-peda (see important links below), you can enter checksec --file=<binary_name>.
The output of checksec gives us a starting point for exploiting our binary.
Stack Canary: a random value put on the stack before the saved return address, if this is overwritten the program will exit (crash) NX (no eXecute): prevents parts of the program that aren’t code to be executed (ex. stack, heap) ASLR (address space layout randomization): randomizes memory addresses to make it difficult to predict executable places in code PIE (position independent executable): enables ASLR for the binary RELRO (relocation read-only): protects the global offset table (GOT), full RELRO is the most secure
However, even with these mitigations in place, binaries can be exploited!
Once you understand basic stack buffer overflows, there are many more advanced techniques to bypass protections.
For a more in-depth explanation of binary protections, check out this article
Stack Buffer Overflows
This basic type of overflow occurs when a program writes more bytes into a buffer than it can hold! Doing so corrupts the rest of the stack, overflowing the extra bytes into other registers, such as rip (pointer to the next instruction).
ex.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
void win() {
printf("yuppee!");
system("/bin/sh");
}
void vuln() {
char buffer[16];
printf("enter: ");
gets(buffer);
printf("you entered: %s\\n", buffer);
}
int main() {
vuln();
return 0;
}
With an example like this, if we enter more than 16 bytes when prompted, we will overwrite saved registers and eventually the return address. This is because gets(buffer) is reading the user input without actually checking the length. Knowing this, we can now learn more in-depth exploitation techniques.
Intermediate Concepts 🌟
🎉 Goal: Learn more techniques for exploiting binaries, learn how to use a disassembler to analyze code, and get familiar with using gdb to pinpoint addresses to create good exploits.
I will only go over a few basic techniques. Refer to the binary exploitation guide Nightmare to view more CTF-related examples, and the pwn.college binary exploitation course to learn more about memory errors and shellcode injections.
Return Oriented Programming (ROP)
When binary protections such as NX are enabled, we cannot arbitrarily execute shellcode on the stack. Still, this doesn’t stop us from creating exploits. ROP involves chaining together gadgets, which are parts of existing code, to redirect program execution.
Refer to Red Team Notes - ROP Chaining for some practical examples and diagrams.
Ret2Win (return-to-win)
Looking at our Stack Buffer Overflow example from before, we see a function win(). Because we could easily find out the address of the win function, we can calculate the offset needed to overwrite the return address (use gdb-peda + pattern generation, and overwrite the return address with the address of win().
Ret2Lib (return-to-library)
We use return-oriented programming (ROP) to call functions from a loaded library like libc.
This technique is primarily used when NX is enabled, meaning that injecting shellcode in the program is not allowed. However, we can use existing like system() to pop a shell (ex. system("/bin/sh")).
Stack Pivoting
We redirect execution by manipulating ebp **(base pointer) to control the stack. To do this, we will use the **gadget leave; ret
ex. leave gadget
1
2
pop ebp
ret ; return to address on stack=
Off-By-One Exploits
An exploit where we alter the least significant byte (LSB) of ebp — used in scenarios where we cannot completely overwrite the register but can only increment/decrement to direct program execution.
ex.
1
2
ebp = 0x41424344
ebp = 0x41424300
This playlist by LiveOverflow is one of my favorite resources for learning binary exploitation. Binary Exploitation/Memory Corruption
Advanced Concepts 💫
🎉 Goal: Learn basics about heap exploitation and another advanced exploitation method — SROP — when traditional exploitation methods don’t suffice.
Heap Exploitation
Similar to stack buffer overflow, heap overflows occur when more data is read into a buffer than what it can hold
Use After Free: use of a free‘d object in program
I took an example code snippet from the CTF101 Guide:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
typedef struct string {
unsigned length;
char *data;
} string;
int main() {
struct string* s = malloc(sizeof(string));
puts("Length:");
scanf("%u", &s->length);
s->data = malloc(s->length + 1);
memset(s->data, 0, s->length + 1);
puts("Data:");
read(0, s->data, s->length);
free(s->data);
free(s);
char *s2 = malloc(16);
memset(s2, 0, 16);
puts("More data:");
read(0, s2, 15);
// Now using s again, a UAF
puts(s->data);
return 0;
}
Sigreturn Oriented Programming
When traditional ROP just doesn’t cut it, we can still bypass NX by using signal handler mechanisms. Additionally, SROP can be useful when a program is not linked to libc (we don’t have system() that we can just jump to).
SROP vs. ROP
SROP exploits are similar to ROP in that they both use gadgets, however, having ASLR enabled limits the use of ROP chaining as finding gadgets is extremely difficult without a given leaked address (by a format string exploitation, for example). Additionally, SROP requires a minimal amount of gadgets that are always present in memory (see the wiki).
Signals in POSIX-like Systems
Here’s a snippet from Nightmare explaining how the signal handler mechanism works:
“When the kernel delivers a signal from a program, it creates a frame on the stack before it is passed to the signal handler. Then after that is done that frame is used as context for a sigreturn syscall to return code execution to where it was interrupted. It does this by popping values off of the top of the stack into registers, which were stored there so execution could continue after the signal is dealt with. The syscall itself takes a single argument in the
rdiregister (however for our use, it’s not important in this context). We can tell that a sigreturn syscall is being made since it pops the value 0xf into theraregister before making the syscall to specify a sigreturn syscall.”
For more reading, navigate to the AMRIUNIX SROP Guide
Using the sigreturn() system call, we can set up and take control of registers such as eip (the instruction pointer) with just one single gadget. Sometimes, this gadget can have a fixed address within the program. With control over the registers, we can alter them to call whichever syscall we would like (ex. write to print the contents of a flag).
We can use pwntool’s method of SigreturnFrame() to automate building a sigreturn frame, all we have to do is prepare the registers for the syscall.