Exploitation of Integer Overflow in Apache 2.2.19 mod_setenvif

The main goals in creating the exploit were:

Exploit has to be triggerable via HTTP GET requests only
Exploit data has to be free of 0-bytes to remain valid HTTP
No alternative way of heap spraying is used, e.g. GET + content-length. All variants I knew of were far too inefficient
Use libc for ROP, although all libc addresses start with a 0-byte, which cannot be sent via HTTP
Rely only on a libc address guess, not on a heap/stack address guess, unless the guess could be made nearly 100% reliable
Use the already open HTTP connections and turn them into command connections on the fly
Keep the exploit below 256 bytes
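
The 0-byte constraint from the list above can be illustrated with a small check. This is only a sketch: the helper names are made up for illustration, the first address is the _IO_file_seekoff address quoted further down in this text, and the second address is a hypothetical value above 16MB chosen for contrast.

```python
import struct

def pack_addr32(addr: int) -> bytes:
    """Pack a 32-bit address as little-endian bytes, the x86 in-memory layout."""
    return struct.pack("<I", addr)

def is_zero_byte_free(data: bytes) -> bool:
    """Return True if the data contains no 0x00 byte."""
    return b"\x00" not in data

# Address below 16MB: the most significant byte is 0x00.
low_addr = pack_addr32(0x00736FF7)
# Hypothetical address above 16MB (assumption, not taken from the target system).
high_addr = pack_addr32(0xB7736FF7)

print(low_addr.hex(), is_zero_byte_free(low_addr))
print(high_addr.hex(), is_zero_byte_free(high_addr))
```

Any address below 0x01000000 fails the check, which is why the libc addresses could not be placed directly into the request data.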

Two different exploit layouts were developed. The first one used multiple threads, so that one thread overwrote the data of the second thread before hitting the end of the memory area. Precise timing was essential to get shell access.

The second one used a more carefully crafted substitution expression, stopping the copy in a single thread by modifying the regular expression currently processed in that thread. Since there is no race condition involved, this exploit was far more reliable than the first one.

First Exploitation Attempt

Due to the aforementioned requirements, an exploit with about 30% success rate for apache2-mpm-worker 2.2.19 on Ubuntu Oneiric 32-bit was developed, with the aim of learning alternative programming techniques in a hands-on approach. To get hold of crucial apache data structures, the programs outlined here tried to exploit concurrent access to already corrupted data structures before the otherwise inevitable apache crash. Due to the additional

Step 0:

Create an .htaccess file that will copy more than 4GB of data into a 16MB buffer. Usually ap_pregsub will copy data to the buffer, overwriting the whole apr-pool memory until the copy hits the upper mapped memory boundary (see the second while-loop in ap_pregsub from server/util.c). Since the .htaccess file will cause ap_pregsub to fill the whole heap with repeating copies of the HTTP header block, this will also circumvent the heap randomization. The copies span such a large portion of the heap that a pointer into the heap will always hit one of them.
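
The size mismatch described above is what a length that wraps around in 32-bit arithmetic looks like. The following is a minimal sketch of that arithmetic only. The concrete numbers are illustrative values chosen to match the "more than 4GB into 16MB" description, not values taken from ap_pregsub itself.

```python
MASK32 = 0xFFFFFFFF
MB = 1 << 20
GB = 1 << 30

def wrap32(n: int) -> int:
    """Reduce a length to what a 32-bit unsigned variable would hold."""
    return n & MASK32

# Illustrative: the number of bytes the substitution really produces.
required = 4 * GB + 16 * MB
# What a 32-bit length computation yields, and therefore what gets allocated.
allocated = wrap32(required)

print(f"required:  {required} bytes")
print(f"allocated: {allocated} bytes ({allocated // MB} MB)")
print(f"overrun:   {required - allocated} bytes")
```

The allocation is sized from the wrapped value while the copy loop runs over the real length, so the copy runs past the buffer by the difference.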

Step 1:

To get a chance to execute code, the ap_pregsub copy process has to be stopped without SEGV. Two ways are possible:

The copy process also overwrites the regular expression that defines which data is currently copied. By overwriting the expression with a stop sequence ($9$9..), execution will leave the ap_pregsub function without SEGV, but will continue using the corrupted heap. Since there is no function pointer call nearby, most of the execution branches will lead to a crash. Only one path allows constructing an endless loop in apr_palloc.
Terminate the overwriting process before it reaches the end, using another apache thread and the already corrupted heap. Since the data copy is quite fast, the race between the two apache threads is very hard to win.

To extend the window of opportunity, an ap_pregsub stop sequence is sent first using SendTrigger-SingleThreadAprPallocEndlessLoop.c. This adds a 16MB race buffer and slows down the server by sending one thread into an endless loop. Both effects help to extend the race window in step 3 to 100ms on an 800MHz CPU, which would also be sufficient for remote TCP exploitation.

Step 2:

Send traffic that will make apache use one function pointer more frequently. For reproducibility it was important to send data that makes apache loop over just a very limited part of the whole apache binary code. Otherwise a wide variety of crashes at different code positions was observed. The RequestFlood.c program will open multiple connections to apache, send GET /AAA and then continue to send AAAA every 100ms, thus making the URL data on the server side longer in each iteration. Due to the long time between the sending of GET header bytes, it is quite likely that the heap is overwritten by the thread started in step 3 while the current thread is in apr_socket_timeout_set (srclib/apr/network_io/unix/sockopt.c). apr_socket_timeout_set has one other advantage: it will pick up the sock pointer from the corrupted heap and write the timeout value to that location, thus giving the opportunity to write the first MSB 0-byte and to use that value as a function pointer later on in ap_get_brigade from server/util_filter.c. The RequestFlood program has to be running before sending the remote shell trigger in step 3.

Step 3:

Send a trigger request that will overwrite the apr-heap, similar to the request from step 1, but without any stop sequence. The HTTP request data will overwrite the heap, so the threads from step 2 can pick it up. This request also contains the remote shell code, but there are a few obstacles blocking code execution:

Stack is not overwritten at all, so standard ROP cannot work
Heap is overwritten with 0-byte free data, but ROP usually needs quite a few 0-bytes. This makes it impossible to use any test xxx; jz yyyy; branches or use small positive array indices.
Heap is not executable

Workarounds for these problems are collected in SendTrigger-RemoteShell.c:

No stack control: Jump to sscanf in such a way that sscanf will overwrite some values on the stack, including a return address. This is made easier since sscanf has an integer overflow when parsing the offsets for the argument skipping syntax. Hence it is possible to access values above and below the current stack position using offsets near 2^30, as seen in the scan string %1073741815$32c%3s%4hx%1x%1x%1073741815$7s. sscanf was also used to add some 0-bytes on the heap.
No full stack control: sscanf stack editing is painful and eats up a lot of payload space. So use the return address to jump to pop esp; ret to have the stack pointing to the heap.
Non-executable heap: Jump to mprotect and make the heap executable
Avoid back-connect: Loop over open file descriptors and fork a shell for every descriptor. See ForkPayload.c for assembly code. The code uses the return address from mprotect to calculate the dup2, fork and execv addresses, thus avoiding the need for some more 0-bytes. The whole remote shell loop code is just 93 bytes.
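
Why an offset near 2^30 can reach memory close to the current position becomes clearer when the arithmetic is written out. The sketch below assumes that the argument index is scaled by a 4-byte slot size in 32-bit arithmetic. That scaling is an assumption made for illustration, since the text above only states that the offset parsing overflows.

```python
def to_signed32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n & 0x80000000 else n

SLOT_SIZE = 4        # assumption: one 32-bit argument slot per index
index = 1073741815   # offset used in the scan string above

byte_offset = to_signed32(index * SLOT_SIZE)

print(f"index       = 2^30 - {2**30 - index}")
print(f"byte offset = {byte_offset}")
```

Under this assumption the huge index wraps to a small negative byte offset, while a plain small index gives a small positive one.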

The trigger program takes the libc start address as an argument. If it is possible to place a symbolic link to /proc/self/maps on the host, simple renaming of the link will defeat the NoFollowSymlinks option and allow reading the offsets from /proc/self/maps, thus defeating ASLR. If the address is not known, it has to be guessed using different values in SendTrigger-RemoteShell --LibcMapPos 0xxxxxx.
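
For reference, the libc start address is simply the lowest mapping address of the libc file in maps-format text. The following sketch parses such text. The sample content is hypothetical (library path, inode numbers and all end addresses are made up), and only the start value 0x6e5000 is borrowed from the example run shown later in this text.

```python
def find_libc_base(maps_text: str):
    """Return the lowest start address of a libc mapping, or None if absent."""
    bases = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mappings carry no path field
        name = fields[5].rsplit("/", 1)[-1]
        if name.startswith("libc-") or name.startswith("libc.so"):
            start_hex = fields[0].split("-")[0]
            bases.append(int(start_hex, 16))
    return min(bases) if bases else None

# Hypothetical sample in /proc/[pid]/maps format, for illustration only.
SAMPLE_MAPS = """\
006e5000-0085d000 r-xp 00000000 08:01 1234 /lib/i386-linux-gnu/libc-2.13.so
0085d000-0085f000 r--p 00178000 08:01 1234 /lib/i386-linux-gnu/libc-2.13.so
08048000-080b0000 r-xp 00000000 08:01 5678 /usr/lib/apache2/mpm-worker/apache2
bf8a0000-bf8c1000 rw-p 00000000 00:00 0 [stack]
"""

base = find_libc_base(SAMPLE_MAPS)
print(hex(base) if base is not None else "libc not found")
```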

Step 4:

To reduce the code size, the remote command connection does not return stdout. To get stdout, the first command sent to the remote side should be exec 1>&0. Since SendTrigger-RemoteShell.c does not implement a nice shell gui, one can also telnet to apache before starting step 3. The telnet connection will turn into a remote shell connection while it is open.

Second Exploitation Attempt

In contrast to the first attempt, this exploit overwrites the currently interpreted regex with a crafted stop sequence to terminate buffer overwriting before reaching the upper heap limit. It requires only a single apache thread to give a remote shell and could also be used to take over the process on non-mpm-worker apache servers. Steps:

Step 0:

Create an .htaccess file in /var/www that will copy more than 4GB of data into a 16MB buffer. The new variable value expression is designed in such a way that, when apache is copying data to the destination buffer and overwriting the variable definition data itself, the new definition corresponds to a variable size of zero. Hence the buffer-overflowing copy process is stopped as soon as the variable definition data is overwritten.

Step 1:

Mix up heap data to get a layout favorable to our tasks. This can be done by just sending a normal GET request for an existing file via a Keep-Alive connection and using that connection to send the trigger afterwards.

Step 2:

The most stable server code/data flow leading to successful exploitation would be one using a function pointer near the point where the overflow began. Otherwise the exploit code will also depend on other loaded modules or platform configuration parameters (the first attempt used a function pointer after mod_setenvif processing was completed). To achieve that, apr_table_setn is used. The function usually would store the new variable value to a hash table. Since it is operating on an overwritten table data structure, it can be used to create 0-bytes at appropriate locations and finally trigger an invalid allocation. This leads apr_palloc to call an abort function, and this function pointer can be controlled.

At the moment of this function call, only the call destination can be controlled. The content of all other registers cannot be used to call a suitable target function directly. Since the stack was not overwritten, standard ROP methods do not work. As a workaround, a part of the _IO_file_seekoff function can be used:

0x00736ff7 <_IO_file_seekoff+407>: mov 0x8(%ebp),%ecx
0x00736ffa <_IO_file_seekoff+410>: mov 0x14(%ebp),%edx
0x00736ffd <_IO_file_seekoff+413>: mov 0x4c(%ecx),%eax
0x00737000 <_IO_file_seekoff+416>: mov %esi,0x4(%esp)
0x00737004 <_IO_file_seekoff+420>: mov %edx,0xc(%esp)
0x00737008 <_IO_file_seekoff+424>: mov %edi,0x8(%esp)
0x0073700c <_IO_file_seekoff+428>: mov %ecx,(%esp)
0x0073700f <_IO_file_seekoff+431>: call *0x40(%eax)

At the end of the sequence the stack will contain one pointer of our choice and the value 1, and the call will go to a controllable destination. That stack layout matches the function call of __libc_dlopen_mode, the internal symbol for dlopen(). A sample program to achieve this is TriggerRemoteShell.c.

Step 3:

Since the attack assumes that an attacker could place an .htaccess file on the server, it is also sensible to assume that he could put a second file there as well. This second file should be a shared library loaded by the dlopen call. The library contains the _init symbol. This function is called during loading of the library, hence activating the exploit code. The library itself is not very special: it just tries to identify all open socket connections using a getsockopt call and forks a shell for every connection, e.g. ExploitLib.c. When the library is loaded, the open connection of TriggerRemoteShell.c is turned into a remote shell. Since the apache server on Ubuntu Oneiric uses ASLR and the exploit needs the correct libc memory locations, the TriggerRemoteShell program can be started with the correct libc mapping information for testing. In real-world examples, one might guess or try to get access to the /proc/[pid]/maps file before sending the exploit, using an apache symlink time race.

buildhost-ubuntuoneiric1110:~$ ./TriggerRemoteShell --LibcMapPos 0x6e5000
Using libc map pos at 0x6e5000
Opening ...
HTTP/1.1 200 OK
Date: Thu, 22 Dec 2011 08:56:36 GMT
Server: Apache/2.2.20 (Ubuntu)
Last-Modified: Sun, 20 Nov 2011 22:55:18 GMT
ETag: "1b76-4-4b23276d097f8"
Accept-Ranges: bytes
Content-Length: 4
Vary: Accept-Encoding
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/plain
AAAA
Linux buildhost-ubuntuoneiric1110 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:50:42 UTC 2011 i686 athlon i386 GNU/Linux
/usr/sbin/apache2 -k start: Completed: not found
sh: turning off NDELAY mode
ls
bin
boot
dev
...

With a different payload library, the lower-privileged www-data process can modify the shared memory scoreboard data in a way to trigger an invalid free/gcc-lib load in the root-priv master process, see ApacheScoreboardInvalidFreeOnShutdown.

Thinking About Security

Due to my limited programming skills, getting this first and far-from-good POC exploit to work was not easy. Some apache, libc and linux software design decisions made it easier for me:

apache: Common use of function pointers in apache. Function pointers allow implementation of apache as a flexible, modularized web server, but simplify arbitrary code execution.
apache: Parts of the apache code have no checks on reasonable sizes or return values, hence allowing abnormally large data structures, e.g. for heap spraying or to cause resource starvation.
apache: apr_palloc is quite fast but uses a very simple data structure. Once one apr_memnode_t structure is under control, apr_palloc can be used to introduce 0-bytes when first_avail is incremented by a known value, which was the only other way besides apr_socket_timeout_set to add 0-bytes
All lib addresses start with a 0-byte: The whole POC would be much smaller if library addresses did not contain 0-bytes. This advantage is only relevant for small applications, where all libraries and modules fit into the lower 16MB
libc: sscanf accepts negative arg pointer offsets in the arg skipping syntax, thus allowing the use of stack addresses before and after the current stack position. What is that feature good for?
linux: No stack-start randomization at byte granularity, allowing sscanf stack editing by modifying only the lowest byte of an address
linux: The mprotect syscall does not force unused protection mode flag bits to be 0, making it quite likely that during ROP a stack value has the right bits set (x executable).
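
The last point can be made concrete with a small bit test. The sketch below uses the Linux values of the protection flags and assumes, as stated above, that bits outside these flags are not rejected. The helper name and the sample word are made up for illustration.

```python
# Linux protection flag values from <sys/mman.h>.
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

def grants_exec(prot_value: int) -> bool:
    """True if the executable bit is set, regardless of any other bits."""
    return bool(prot_value & PROT_EXEC)

# Hypothetical leftover stack word that is not a clean flag combination.
leftover_word = 0x0804A3F7

# Share of all 8-bit low-byte patterns that carry the executable bit.
exec_share = sum(grants_exec(v) for v in range(256)) / 256

print(grants_exec(PROT_READ | PROT_WRITE))
print(grants_exec(leftover_word))
print(exec_share)
```

If the upper bits are ignored, half of all possible values qualify, so an arbitrary value found on the stack has a fair chance of being usable as the protection argument.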
