2005-07-03

File Alteration Monitoring

Failed detective work

I set out one morning a few weeks ago to find out how Spotlight works. I don't have the answer, but I have some information, and I thought I'd share it.

lsof(1) can show you a process's open file descriptors. Guessing that mds is the Spotlight MetaDataServer, I got this:

hydrogen:~$ sudo lsof | grep mds
mds 21133 root cwd VDIR 14,2 1190 2 /
mds 21133 root txt VREG 14,2 563136 2668589 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Support/mds
mds 21133 root txt VREG 14,2 81316 2665889 /System/Library/CoreServices/CharacterSets/CFUnicodeData-B.mapping
mds 21133 root txt VREG 14,2 17688 2665888 /System/Library/CoreServices/CharacterSets/CFUniCharPropertyDatabase.data
mds 21133 root txt VREG 14,2 352454 2665887 /System/Library/CoreServices/CharacterSets/CFCharacterSetBitmaps.bitmap
mds 21133 root txt VREG 14,2 17852 2937913 /System/Library/Caches/com.apple.IntlDataCache.tecx
mds 21133 root txt VREG 14,2 82136 2682142 /System/Library/CoreServices/Tokenizers/ja.tokenizer/Contents/MacOS/ja
mds 21133 root txt VREG 14,2 9826240 2672194 /usr/share/icu/icudt32b.dat
mds 21133 root txt VREG 14,2 1079968 2671992 /usr/lib/dyld
mds 21133 root txt VREG 14,2 4213200 3497948 /usr/lib/libSystem.B.dylib
mds 21133 root txt VREG 14,2 1227572 3497950 /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
mds 21133 root txt VREG 14,2 1455656 3498058 /usr/lib/libicucore.A.dylib
mds 21133 root txt VREG 14,2 801160 3497957 /usr/lib/libobjc.A.dylib
mds 21133 root txt VREG 14,2 3350216 3497951 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
mds 21133 root txt VREG 14,2 861592 3498039 /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SearchKit.framework/Versions/A/SearchKit
mds 21133 root txt VREG 14,2 3182204 3498002 /System/Library/Frameworks/Security.framework/Versions/A/Security
mds 21133 root txt VREG 14,2 252180 3497963 /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration
mds 21133 root txt VREG 14,2 3744936 3497959 /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
mds 21133 root txt VREG 14,2 203956 3497966 /System/Library/Frameworks/DirectoryService.framework/Versions/A/DirectoryService
mds 21133 root txt VREG 14,2 103664 3498082 /System/Library/PrivateFrameworks/DSObjCWrappers.framework/Versions/A/DSObjCWrappers
mds 21133 root 0r VCHR 3,2 0t0 43420804 /dev/null
mds 21133 root 1w VCHR 3,2 0t700 43420804 /dev/null
mds 21133 root 2r PSXSHM 0x041c5824 4096 obj=0x02d0e270
mds 21133 root 3r PSXSHM 0x02bd6c04 4096 obj=0x02c6a658
mds 21133 root 4u VREG 14,2 194596864 3262066 /.Spotlight-V100/store.db
mds 21133 root 5u VREG 14,2 194596864 3262067 /.Spotlight-V100/.store.db
mds 21133 root 6u VREG 14,2 0 3262068 /.Spotlight-V100/.journalHistoryLog
mds 21133 root 7u VREG 14,6 151552 6930 /Volumes/Elliott Hughes’s iPod/.Spotlight-V100/store.db
mds 21133 root 8u 0x029b4980 file struct, ty=0x7, op=0x369370
mds 21133 root 9u VREG 14,6 151552 6931 /Volumes/Elliott Hughes’s iPod/.Spotlight-V100/.store.db
mds 21133 root 10u VREG 14,6 0 6933 /Volumes/Elliott Hughes’s iPod/.Spotlight-V100/.journalHistoryLog
mds 21133 root 11r VREG 14,2 936 3498501 /private/var/run/utmp
mds 21133 root 12u KQUEUE 0x029b8ec0 count=0, state=0x2
mds 21133 root 13u IPv4 0x0439cf58 0t0 TCP localhost:cisco-tdp->localhost:netinfo-local (ESTABLISHED)
mds 21133 root 14u VREG 14,2 150994944 3262069 /.Spotlight-V100/ContentIndex.db
mds 21133 root 15u VREG 14,6 6144 6949 /Volumes/Elliott Hughes’s iPod/.Spotlight-V100/ContentIndex.db
hydrogen:~$

Which looks plausible. It's got the right files open. Notice that lsof(1) knows relatively little about fd 8. If we point fs_usage(1) at mds and modify a file, we see something like this:

12:39:13.450 read F=8 B=0x4c 0.005875 W mds

That F=8? That's the bad news I was half expecting to hear. mds gets its notifications via this mystery kind of file descriptor.

A tour of file alteration monitoring techniques

You may have heard around the time that 10.4 was released that Spotlight used BSD's kqueue notification mechanism. Apple posted sample code called filesystem_examples during WWDC 2005, and one of the examples, kqueue_fragment.c, shows you how to use kqueue(2).

The trouble with this is that you have to supply an open file descriptor for each file you want vnode events on, and we want to monitor an entire file system (or a significant subtree of it). There's no constant in sys/event.h to let you monitor all files, and trying the obvious values of 0 and -1 doesn't work. (That 0 fails is unsurprising, because it's a perfectly valid file descriptor that a process might have open. Still, given the number of processors where 0 sometimes means the literal and sometimes means the register, it seemed worth a try.) Running lsof(1) on Apple's sample doesn't show anything like the mysterious file descriptor 8 of mds either, so it really does look like this isn't the mechanism.
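To make the one-descriptor-per-file limitation concrete, here's a rough sketch using Python's select.kqueue wrapper rather than the C API in Apple's sample. The helper names watch_file and pending_changes are my own invention, and the code only runs on systems that actually have kqueue (Mac OS and the BSDs).

```python
import os
import select

def watch_file(path):
    """Register interest in one file. Returns (kq, fd); the caller closes both.

    Hypothetical helper. Note that the file has to be opened first: a vnode
    filter is identified by an open file descriptor, not by a pathname.
    """
    fd = os.open(path, os.O_RDONLY)
    kq = select.kqueue()
    interest = select.kevent(
        fd,
        filter=select.KQ_FILTER_VNODE,
        flags=select.KQ_EV_ADD | select.KQ_EV_CLEAR,
        fflags=select.KQ_NOTE_WRITE | select.KQ_NOTE_DELETE | select.KQ_NOTE_RENAME,
    )
    kq.control([interest], 0, 0)  # register only; don't wait for events yet
    return kq, fd

def pending_changes(kq, timeout=1.0):
    """Return the fflags bitmask of each queued vnode event (empty on timeout)."""
    return [event.fflags for event in kq.control(None, 8, timeout)]
```

Covering a whole tree this way means calling something like watch_file once per file and keeping every one of those descriptors open, which is exactly the cost we were hoping to avoid.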

The kernel event queue mechanism, then, looks rather like Linux's F_NOTIFY (see fcntl(2) for details), which shares the limitation of requiring you to register an interest in all the files and directories you find, and to take special note of any new files created. The BSD mechanism simply trades off inventing a new mechanism for kernel/process communication against the ugliness of re-using signals (which is what Linux does). Linux 2.6 offers inotify, which at least addresses the problems of using too many file descriptors and using signals as a communication mechanism.

As usual, still other Unixes have their own mechanisms. SGI had their own imon(7) inode monitor device, but they also had the good sense to write fam(1), the File Alteration Monitor. This offers a portable interface to the file alteration monitoring facilities of any Unix. The implementation will use whatever the system's best alternative is, even falling back to polling on a system that's too primitive to support anything better. (The potential advantage being that if multiple clients care about the same file, only one process need poll it. Plus you can write your application one way and know that it'll work as well as possible on any given Unix.)
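For a feel of what the polling fallback amounts to, here's a minimal sketch. It is not fam's actual implementation, just the general technique: stat everything, remember the results, and compare against the next pass. The function names snapshot and diff are mine, and for simplicity it tracks regular directory entries that aren't directories themselves.

```python
import os

def snapshot(root):
    """Map each non-directory path under root to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state

def diff(old, new):
    """Return (created, deleted, changed) as sorted lists of paths."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, changed
```

A poller would call snapshot in a loop with a sleep between passes and report whatever diff returns. The cost is one stat per file per pass, which is why it's only ever the last resort.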

fam(1) is really clever in that if it's asked to monitor a file on an NFS mount, it will try to contact the daemon on the NFS server (and fall back to polling remote files otherwise). Again, this is transparent to your application.

Debian Linux seems to ship with fam(1), apparently because GNOME uses it. Mac OS doesn't. I don't know about Solaris.

There are two problems, though, above and beyond the question of whether it's running on your system. The first problem is the usual one of only working on individual files and directories, and not directory trees. The second is that there's a limit of 1000 monitoring requests per process. So if you need to monitor a tree of 25,000 files, you're going to need some extra cunning. There are also security concerns associated with running the daemon as root.
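A quick way to see whether a given tree is even in range is to count how many requests it would take. This sketch assumes that one request per directory is enough to hear about the files directly inside it, and takes the limit of 1000 from the figure quoted above. The names requests_needed and fits_under_limit are hypothetical.

```python
import os

REQUEST_LIMIT = 1000  # per-process limit quoted in the text above

def requests_needed(root):
    """Count one monitoring request per directory in the tree, root included."""
    return sum(1 for _ in os.walk(root))

def fits_under_limit(root, limit=REQUEST_LIMIT):
    """True if the whole tree could be covered without exceeding the limit."""
    return requests_needed(root) <= limit
```

Even counting directories rather than files, a large source tree can blow through the limit, and the count has to be redone whenever a directory is created or removed.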

There's an inotify-based rewrite in the works, but it's not obvious why that's a better idea than patching FAM. At least it uses an extended subset of the same API, so if you stick to the intersection, your application should just work. The rewrite's lack of support for NFS seems like a step backwards.

MS Windows has FindFirstChangeNotification, which takes a boolean that specifies whether to monitor the given directory or to include its subdirectories too. Unfortunately, as far as I know, this doesn't work on remote file systems, so although it's the interface I want, it doesn't necessarily have the implementation to back it up. And, of course, it's specific to MS operating systems which – though I try to support them – I don't actually use.

The Unix API is fine if your application is to monitor a configuration file, say, and automatically re-read it if it changes. But it sucks if you're an editor, say, and you want to automatically respond to changes to files. Perhaps the user's version control system has just added or removed a few files, and you want to update your index. Or a file has changed outside of the editor for whatever reason, and you want to update your live search results. These things are hard to do, and that's a pity, because they're exactly the things I want to do. (At the moment, my editor knows when it has modified a file and will update its search results accordingly, but it has no clue about external modifications.) Before Spotlight, virus scanning was perhaps the best-known application for the style of notification that includes subtrees. This could be why MS Windows has the best support for it.

If you're unfamiliar with file system or protocol implementation, you might be wondering why this is all so challenging. Surely you just need to do a few string comparisons? The problems include the fact that most implementations won't actually have the pathname conveniently available (the notion of "file handle" goes all the way down, and isn't just a wrapper for the name), the existence of links (both hard and symbolic), and the security concerns about giving away information without going through the usual open(2) path. That's one reason why it's so common to require a file descriptor: it acts as proof that you're allowed to know something about the file or directory in question.
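The trouble links cause for string comparison is easy to demonstrate. In this small sketch (the file names and the helpers same_file and demo are made up for illustration), three different pathnames refer to one file, and a write through one name changes what you read through another.

```python
import os
import tempfile

def same_file(path_a, path_b):
    """True if both paths resolve to the same (device, inode) pair."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def demo():
    """Create a file plus a hard link and a symlink to it, then write via the hard link."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "notes.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("hello")
        os.link(original, hard)     # second name for the same inode
        os.symlink(original, soft)  # a name that points at a name
        with open(hard, "a") as f:  # modify via the hard link only
            f.write(" world")
        with open(original) as f:
            content = f.read()
        return same_file(original, hard), same_file(original, soft), content
```

Anything watching for the string "notes.txt" would miss that write entirely, because the name never appeared in it. The kernel sees the inode, not the path you happened to use.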

A system that might become important in future is Dazuko. (The name is Germanic, in the style of Flak or Kripo.) It offers a very nice API, but at the time of writing it supports neither Mac OS nor MS Windows, and lacks the ubiquity that would make it really useful. Because of its intended security applications, Dazuko even lets you veto file accesses.

For now, though, I guess I'm stuck. None of this is really suitable for my application, and I can't even knock up a quick hack for the benefit of Mac OS users.