Linux Kernel-Panic in FPU handling on AMD (was ... in vm86 Syscall During Task Switch)

Problem description:

The initial observation was that the Linux vm86 syscall, which lets userspace use virtual-8086 mode to emulate old 8086 software (as dosemu does), was prone to triggering FPU errors. Closer analysis showed that, in general, the handling of the FPU control register and of unhandled FPU exceptions could trigger CPU exceptions at unexpected locations, including in ring-0 code. The key player is the emms instruction, which faults when, for example, cr0 has bits set due to unhandled errors. This affects kernels on some processor architectures only; currently only AMD K7/K8 seems to be relevant.

Virtual86SwitchToEmmsFault.c was the first POC; it triggers a kernel panic via the vm86 syscall. Depending on task layout and kernel scheduler timing, the program might just cause an OOPS without heavy side effects on the system. The OOPS might happen up to 1 min after invocation, depending on scheduler operation and on which of the other tasks are using the FPU. Sometimes it causes recursive page faults, locking up the entire machine.

To allow reproducible tests on at least a local machine, the random code execution test tool (Virtual86RandomCode.c) might be useful. It still uses the vm86 syscall but executes random code, so that FPU use and task scheduling trigger a multitude of faults and lock up the system faster. When the tool is run over the network, the random data it executes can be recorded and replayed even when the target machine locks up completely. Network test:

socat TCP4-LISTEN:1234,reuseaddr=1,fork=1 EXEC:./Virtual86RandomCode,nofork=1
tee TestInput < /dev/urandom | socat - TCP4:x.x.x.x:1234 > ProcessedBlocks

An improved version brings the FPU into the same state without using the vm86 syscall. The key instruction is fldcw (floating point unit load control word). When one process enables exceptions just before exit, a later task switch between two other processes might fail. It seems that due to that failure, task->nsproxy ends up being NULL, causing a NULL-pointer dereference in exit_shm during do_exit.
When the NULL page is mapped, the NULL dereference can be used to fake an rw-semaphore data structure. In exit_shm, the kernel attempts to down_write the semaphore, which adds the value 0xffff0001 at a user-controllable location. Since the NULL dereference does not allow arbitrary reads, the task memory layout is unknown, so the standard change of the EUID of the running task is not possible. Apart from that, we are in do_exit, so we would have to change another task. A suitable target is the shmem_xattr_handlers list, which is at an address known from System.map. Usually it contains two valid handlers and a NULL value to terminate the list. As luck would have it, the value after NULL is 1, so adding 0xffff0001 at the position of the NULL value plus 2 turns the NULL into 0x10000 (the first address above mmap_min_addr) and the following 1 into NULL, thus terminating the handler list correctly again.
The code to perform those steps can be found in FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c.

The modification of the shmem_xattr_handlers list is completely silent (it could make a nice data-only backdoor) until someone performs a getxattr call on a mounted tmpfs. Since such a file system is mounted by default at /run/shm, another program can turn this into arbitrary ring-0 code execution. To avoid searching the process list to give EUID=0, an alternative approach was tested. When the xattr handlers are invoked, a single integer write to another static address known from System.map (modprobe_path) changes the default modprobe userspace helper pathname from /sbin/modprobe to /tmp//modprobe. When unknown executable formats or network protocols are requested, the program /tmp//modprobe is executed as root. This demo just adds a script there that turns /bin/dd into a SUID binary; dd could then be used to modify libc to plant another backdoor there. The code to perform those steps can be found in ManipulatedXattrHandlerForPrivEscalation.c.

A closer analysis of the initial vm86-syscall problem showed that the root cause was missing handling of FPU exceptions at the emms instruction during task switch. That was confirmed by Borislav Petkov. According to the discussion on LKML, the problem should affect only AMD CPUs, in both i386 and amd64 mode; see also the patch.

Exploitation of the bug made it possible to kill tasks at random on all affected kernels under test, leading to a denial of service when, for example, init or the idle task is killed. At least on i386 architectures without SMP (Debian Sid i486 kernel), the bug also triggers a NULL-pointer dereference, which proved useful for gaining root privileges when the NULL page was mapped; see the POC.

Since only some architectures are affected and local exploitation usually results only in DoS (mmap_min_addr set to zero should be rare on most systems, so escalation should be very uncommon), this vulnerability should not have a high impact on total system security.

Timeline:

20131228: Discovery, report to LKML and full-disclosure
20140107: Local-root privilege POC, working both on native CPU and within VirtualBox
20140112: First patch
20140114: CVE-2014-1438 assigned

Test tools and output (sorted chronologically):

Virtual8086-Mode emms-instruction fault: Virtual86SwitchToEmmsFault.c
Random code test tool: Virtual86RandomCode.c
Serial console output: SerialConsoleOutput.txt
Local-root escalation: Modify xattr-handler (FpuStateTaskSwitchShmemXattrHandlersOverwriteWithNullPage.c), execute ring-0 code via xattr-handler (ManipulatedXattrHandlerForPrivEscalation.c)

Mailing-list reports:

Full disclosure: 20131228 initial report on i386 vm86 syscall
LKML: 20131228 same as above to LKML
LKML: 20140108 initial disclosure of local-root privilege escalation
LKML: 20140109 confirmation by Borislav Petkov
LKML: 20140111 patch
oss-security: 20140112 CVE request
oss-security: 20140114 CVE assigned

Bug reports:

Debian: 733551
Red Hat: 1052914, 1053599 (tracking issue)

Others:

Patch: git
CVE: CVE-2014-1438
OSVDB: 101515
SCIP: 11669
Secunia: SA56406
