2012-10-26

How (not) to use readdir_r(3)

TL;DR: always use readdir(3) instead of readdir_r(3).

I spend most of my time fixing bugs. Often I feel like there's something instructive to say, but I don't want to be that guy who's known for pointing out faults in other people's code, as if his own doesn't have bugs.

Today I have a great one, because although there's a lot of blame to go around, the bug that actually caused enough trouble to get reported is my own.

Here's an AOSP patch: libc: readdir_r smashed the stack. Some engineers point out that FAT32 lets you store 255 UTF-16 characters, Android's struct dirent has a 256-byte d_name field, so if you have the right kind of filename on a FAT32 file system, it won't fit in a struct dirent.

But first, let's back up and look at the APIs involved here: readdir(3), readdir_r(3), and getdents(2).

Traditionally, userspace used readdir(3). You'd get a pointer back to a struct dirent that you don't own. POSIX explicitly makes the following guarantee: "The pointer returned by readdir() points to data which may be overwritten by another call to readdir() on the same directory stream. This data is not overwritten by another call to readdir() on a different directory stream.", so there isn't the usual thread-safety problem here. Despite that, readdir_r(3) was added, which lets you supply your own struct dirent, but – critically – not the size of that buffer. So you need to know in advance that your buffer is "big enough". getdents(2) is similar, except there you do pass the size of the buffer, and the kernel gives you as many entries as will fit in your buffer.

If you use a tool like Google's cpplint.py, it'll actually complain if you use readdir(3) and suggest readdir_r(3) instead. This is bad advice. For one thing, POSIX guarantees that you're not sharing some static buffer with other threads on the system. So in the typical case where your function calls opendir(3), readdir(3), closedir(3) and the DIR* never escapes, you're fine.

At this point, if you've been paying attention, you'll be wondering about the size of struct dirent. Isn't readdir(3) a liability if your struct dirent isn't big enough for your file system's longest names? In theory, yes, that's a problem. In practice, readdir_r(3) is the bigger liability.

In practice, you don't have a problem with readdir(3) because Android's bionic, Linux's glibc, and OS X and iOS' libc all allocate per-DIR* buffers, and return pointers into those; in Android's case, that buffer is currently about 8KiB. If future file systems mean that this becomes an actual limitation, we can fix the C library and all your applications will keep working.

In practice, you do have a problem with readdir_r(3) because (a) you can't tell the C library how big your buffer is, so it can't protect you against your own bugs, and (b) it's actually quite hard to get the right buffer size. Most code actually just allocates a regular struct dirent on the stack and passes the pointer to that, so in practice most users of readdir_r(3) are demonstrably less safe than the equivalent readdir(3) user. What you actually have to do is allocate a large enough buffer on the heap. But how large is "large enough"? The glibc man page tries to help, suggesting the following:

           len = offsetof(struct dirent, d_name) +
                     pathconf(dirpath, _PC_NAME_MAX) + 1
           entryp = malloc(len);
But that's not quite right because there's a race condition. You probably want to use fpathconf(3) and dirfd(3) so you know you're talking about the same directory that was opened with opendir(3).

So let's look at Android. How many of the readdir_r(3) calls are correct? Here's an AOSP tree as of now:

~/aosp$ find . -name *.c* -print0 | xargs -0 grep -lw readdir_r | sort
./bionic/libc/bionic/opendir.cpp
./dalvik/vm/Thread.cpp
./external/bluetooth/glib/gio/gunixmounts.c
./external/chromium/base/file_util_posix.cc
./external/clang/lib/Basic/FileManager.cpp
./external/dbus/dbus/dbus-sysdeps-util-unix.c
./external/dbus/dbus/dbus-sysdeps-util-win.c
./external/linux-tools-perf/builtin-script.c
./external/linux-tools-perf/util/event.c
./external/linux-tools-perf/util/parse-events.c
./external/wpa_supplicant_6/wpa_supplicant/src/common/wpa_ctrl.c
./external/wpa_supplicant_8/src/common/wpa_ctrl.c
./hardware/libhardware_legacy/wifi/wifi.c
./libcore/luni/src/main/native/java_io_File.cpp
./system/core/debuggerd/backtrace.c
./system/core/debuggerd/tombstone.c
./system/vold/CommandListener.cpp
./system/vold/VolumeManager.cpp
~/aosp$ 
(The match in bionic is the implementation of readdir_r, and the match in clang is a comment.)

The following allocate a struct dirent on the stack: dalvik, external/bluetooth, external/chromium, external/linux-tools-perf, external/wpa_supplicant, hardware/libhardware_legacy/wifi, libcore, and system/core/debuggerd.

The following use malloc: external/dbus (uses sizeof(dirent), so this is just trades the above stack smashing bug for an equivalent heap smashing bug), and system/vold, which uses the probably-right glibc recommendation.

The dalvik and libcore bugs, at least, are my fault. And for years I've changed readdir(3) to readdir_r(3) when people bring it up in code reviews, because I never had a strong argument against. But now I do. Using readdir(3) is simpler, safer, and its correctness is the C library maintainer's problem, not yours. The key to understanding why this is so (if you've skipped the rest of this article) is that the dirent* you get isn't a pointer to a regular fixed-size struct dirent, it's a pointer into a buffer that was large enough to contain the directory entry in question. We know this because the kernel getdents(2) API is sane, takes a buffer size, and won't scribble outside the lines.