2009-07-16

Why find(1) doesn't like big directories

A friend had trouble recently trying to use find(1) on a directory containing millions of entries. He expected it to start pumping out results straight away, but found instead that it went away for a long time. A quick strace(1) showed it steadily reading directory entries.

We could speculate about what's going on, or we could look at the source. Let's start by finding find(1):

$ which find
/usr/bin/find

Then we ask the Debian packaging system which package installed that file:

$ dpkg -S /usr/bin/find
findutils: /usr/bin/find

And then we ask for the source code for that package:

$ cd /tmp
$ apt-get source findutils

A quick search for "opendir", "readdir", and "closedir" – the three POSIX functions responsible for directory iteration – suggests lib/savedirinfo.c as a likely candidate, and if you look at xsavedir in that file, you'll see that it does indeed iterate over the entire directory and returns an array of all the entries to its caller.

Easily fixed, right? If we're not sorting, there's no reason to return a collection rather than an iterator, and any caller who wants to sort can still implement that on top of the iterator. Conversely, you can't implement a low-memory or low-latency iterator on top of a collection (because you've already paid for all the memory and all the latency while building the collection), nor can you implement a cheap "isEmpty" that reads at most three directory entries (three because you may get "." and "..", neither of which count).
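
The "isEmpty" case is worth spelling out, because it's where the iterator wins most clearly: with direct access to readdir(3) you can stop as soon as you've seen anything other than "." and "..", so you never need more than three reads. A sketch (error handling aside):

#include <dirent.h>
#include <string.h>

/* Returns 1 if the directory contains nothing but "." and "..",
   0 if it contains at least one real entry, -1 on error.
   Never needs more than three calls to readdir(3). */
int dir_is_empty(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int empty = 1;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0) {
            empty = 0;  /* Found a real entry; no point reading any further. */
            break;
        }
    }
    closedir(dir);
    return empty;
}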

In the single-directory case, an iterator probably is the way to go. But things aren't so simple in general, which is why most modern APIs, though built on the POSIX iterator primitives, offer a collection.

Suppose you're implementing find(1) using iterators. And suppose, for simplicity's sake, you're doing a depth-first traversal. What do you do when you come to a child directory? Well, the important part is that you create a new iterator for that child's entries. And if it has child directories, you'll do the same for them. So what?
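
Concretely, the traversal might look like the sketch below (paths are handled crudely and errors ignored; I've used lstat(2) to spot directories rather than the non-portable d_type field). The thing to notice is that each level of recursion holds its own open DIR stream:

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Depth-first traversal with one open DIR stream per level of recursion. */
static void walk(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return;

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[4096];
        snprintf(child, sizeof(child), "%s/%s", path, entry->d_name);
        puts(child);

        struct stat st;
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            walk(child);  /* Our own DIR stream stays open while we descend. */
    }
    closedir(dir);
}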

Memory isn't the scarce resource here. File descriptors are. Specifically, the one opendir(3) needs for each directory between the directory you started in and the deepest directory. Your new pathological case, then, is a deep directory tree. How deep? The default per-process limit on my Ubuntu 9.04 box appears to be 1024 file descriptors. And remember that your program is probably already using a bunch of file descriptors, and that most code doesn't cope well with running out of them.

You can increase this limit (the separate system-wide limit is high enough that you probably don't need to worry about it), but then every program that uses the iterator-based code to walk a file tree needs to remember to do so.
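
"Remember to do so" means a dance like the following in each of those programs. This is a sketch using getrlimit(2)/setrlimit(2); note that an unprivileged process can only raise its soft limit as far as its hard limit:

#include <stdio.h>
#include <sys/resource.h>

/* Raise the per-process file descriptor limit as far as the hard limit
   allows; going beyond that needs privilege (or a friendly administrator). */
static void raise_fd_limit(void)
{
    struct rlimit limit;
    if (getrlimit(RLIMIT_NOFILE, &limit) != 0) {
        perror("getrlimit");
        return;
    }
    printf("soft limit %llu, hard limit %llu\n",
           (unsigned long long) limit.rlim_cur,
           (unsigned long long) limit.rlim_max);

    limit.rlim_cur = limit.rlim_max;
    if (setrlimit(RLIMIT_NOFILE, &limit) != 0)
        perror("setrlimit");
}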

You can try the trick of keeping an explicit stack of iterators, closing old ones when you feel you've used too many and re-creating them as you come back up. See telldir(3) and seekdir(3) for how you might implement this, or nftw(3) for an API that actually works like this. Note that the OpenBSD nftw(3) just ignores the file descriptor limit (i.e. it uses the naive approach), while glibc 2.9 takes a hybrid approach: it collects all the remaining entries of each iterator (that is, each file descriptor) it has to close.
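
That glibc-style spill is essentially the collect-everything loop from earlier, applied to a stream you've already half consumed and followed by a closedir(3) to give the descriptor back before descending. A sketch of just that step (allocation failures ignored; the real code is considerably more involved):

#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Read whatever is left in an open directory stream into memory and close
   the stream, freeing its file descriptor for deeper levels of the walk.
   The caller finishes this directory from the array instead of readdir(3). */
static char **spill_remaining_entries(DIR *dir, size_t *count)
{
    char **entries = NULL;
    size_t used = 0, allocated = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (used == allocated) {
            allocated = allocated ? 2 * allocated : 16;
            entries = realloc(entries, allocated * sizeof(*entries));
        }
        entries[used++] = strdup(entry->d_name);
    }
    closedir(dir);  /* This is the point: the descriptor comes back. */

    *count = used;
    return entries;
}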

So, yeah, it's a bit more awkward than it looks.

Still, there's little excuse for a non-recursive, non-sorting ls(1) to have the same behavior, but it does (and if you look at its source, you'll see pretty much the same collect-all-the-entries-before-touching-any-of-them code as in find(1)). Isn't the main reason for "ls -U" that sometimes you have a really big directory and just want the names as quickly as possible?