When AppKit methods return nil...

...things fall apart very quietly.

I mentioned my little "NSSpell" hack a couple of years ago in "NSSpellChecker, heap(1), and faking ispell(1) in 89 lines" on Mac OS. I and others have been using it ever since for as-you-type spelling checking in Java programs.

A year ago, I decided to "fix" NSSpell so that Spin Control.app didn't claim that it was hung all the time. I talked about that in "Spin Control and non-GUI NSApplication programs", where I presented a version of the code that, it turns out, contained a race condition.

That post was dated 2005-05-23. It's now just a few days past the problem's first birthday, and I'm back with a fix. That's not to say that this was an especially hard problem; it wasn't. I just hadn't realized until last week how much more painful this is on a single-processor machine, such differences being part of the nature of race conditions.

The symptom was this: every now and again I'd start a program that uses our PTextArea text component, and the check-as-you-type spelling checking would report every word as misspelled. Run it again, and it would be fine.

If you've never written native code for Mac OS, you might not know that Apple's documentation (like most documentation) is vague, incomplete, and lacking both as tutorial and as reference; their sample code (like most sample code) is often broken, uninformative, or badly-written; and worst of all (unlike Java, GTK+, the STL, the Linux kernel, Firefox, Ethereal, and all the home-grown stuff I use) you don't get to examine the source.

I had to bite my tongue recently during the storm in a teacup over whether the Darwin/x86 kernel source is available or not. A guy from the "Apple is always right" side of the fence claimed that we shouldn't care because the source wasn't a very useful thing to have anyway. A kernel extension writer corrected him, saying "You must not write kernel extensions for a living [to believe that]", at which point the apologist made me laugh out loud by claiming that "[having] source is just treating a symptom of inadequate docs, not the real issue".

"Obviously not a golfer", as the dude would say.

Java may have lots of breakage (it's a large codebase, so of course it does), but you do get to read the source. Heaven knows how much time having src.zip has saved me. Cocoa, though it's sometimes much better put-together, falls apart when it doesn't "just work". And here's an example.

My original code looked like this:

int main(int, char*[]) {
    ScopedAutoReleasePool pool;

    // Pretend to be ispell in a new thread, so we can enter the normal event
    // loop and prevent Spin Control from mistakenly diagnosing us as a hung
    // GUI application.
    // It seems to be important that we invoke an instance method rather than
    // a class method.
    Ispell* ispell = [[Ispell alloc] init];
    [NSThread detachNewThreadSelector:@selector(ispellRunLoop:) toTarget:ispell withObject:nil];

    [[NSApplication sharedApplication] run];
}

If you look at Apple's documentation for NSApplication, you'll see that they claim Xcode automatically generates a main function like this:

void NSApplicationMain(int argc, char *argv[]) {
    [NSApplication sharedApplication];
    [NSBundle loadNibNamed:@"myMain" owner:NSApp];
    [NSApp run];
}

Their description of sharedApplication says "Your program should invoke this method as one of the first statements in main(); this invoking is done for you if you create your application with Xcode. To retrieve the NSApplication instance after it has been created, use the global variable NSApp or invoke this method". The important point to notice is that the global variable NSApp is just a cache of the method's return value: it isn't initialized until something has invoked sharedApplication, and nothing checks it before use. Our accidental experiment suggests that plenty of Apple code uses the global rather than the safer method.

I say "safer" rather than "safe" because there's no indication that the method's thread-safe, and we can't see the source so we'll have to assume that it isn't.

Much though the global disgusts me, I now use it in NSSpell, both to be closer to the official Xcode idiom (the only thing guaranteed to work) and to hopefully remind myself that Apple have left broken glass on the lawn.
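To make the hazard concrete, here's a minimal sketch in plain C rather than Objective-C. It models the pattern only, and says nothing about how AppKit is actually implemented. The names App, g_app, shared_app, and global_is_usable are all made up for illustration: g_app plays the part of NSApp, and shared_app plays the part of sharedApplication.

```c
#include <stddef.h>

/* Hypothetical stand-in for NSApplication; nothing here is AppKit's. */
typedef struct {
    const char *name;
} App;

/* The global cache, playing the part of NSApp. NULL until shared_app() runs. */
static App *g_app = NULL;

/* The accessor, playing the part of +sharedApplication: it creates the
 * object on first use and, as a side effect, fills in the global cache.
 * Like the real thing (as far as we can tell), it is not thread-safe. */
App *shared_app(void) {
    static App the_app = { "model application" };
    if (g_app == NULL) {
        g_app = &the_app;
    }
    return g_app;
}

/* What code that reads the global directly would find: 1 if the cache has
 * been filled in, 0 if nobody has called shared_app() yet. */
int global_is_usable(void) {
    return g_app != NULL;
}
```

Before the first call to shared_app(), global_is_usable() returns 0; afterwards it returns 1. Anything that reads g_app directly is only correct if someone else has already called the accessor.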

It turns out that if you haven't invoked [NSApplication sharedApplication] before you invoke [NSSpellChecker sharedSpellChecker], the latter will return nil. That's the race: in my original code, the newly detached ispell thread could get to sharedSpellChecker before the main thread had got as far as sharedApplication. The documentation, as you've guessed, doesn't mention this, and, as you'll recall, we don't have the source. (Judging by a WebKit commit, this was a problem in Safari too, which suggests either that Safari has a similar race condition, which seems unlikely, or that there are other circumstances in which sharedSpellChecker can return nil. Not only do Apple keep their source secret, they keep their bugs secret, so we can't see what the WebKit committer was trying to fix, even though we know the Apple bug number.)

At this point, if you're not an Objective-C programmer, you're probably wondering why this was such a problem. Surely the program would have thrown the equivalent of a NullPointerException or dumped core? Not quite. Sending a message to nil isn't an error in Objective-C: nothing happens, and the caller gets zero back. See the article Nil by "ridiculous fish" (actually by far the best Mac OS programming blog I've come across). For NSSpell, silently returning 0 in r3 (the PowerPC register that holds the return value) meant all words were considered misspelled, with no suggested corrections.
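The effect on NSSpell can be modeled in a few lines of plain C. This is only a sketch: Checker, checker_accepts, and count_misspelled are hypothetical names, and the explicit NULL test stands in for what the Objective-C runtime does when the receiver is nil. It isn't NSSpell's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical spelling checker that knows exactly one word. */
typedef struct {
    const char *known_word;
} Checker;

/* Models sending an "is this word spelled correctly?" message. A NULL
 * receiver doesn't crash: the call quietly yields 0, much as a message
 * to nil does. */
int checker_accepts(const Checker *receiver, const char *word) {
    if (receiver == NULL) {
        return 0;
    }
    return strcmp(receiver->known_word, word) == 0;
}

/* The caller can't tell "no" apart from "there was no checker", so with a
 * NULL checker every word is counted as misspelled. */
int count_misspelled(const Checker *checker, const char *const *words, size_t count) {
    int misspelled = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        if (!checker_accepts(checker, words[i])) {
            ++misspelled;
        }
    }
    return misspelled;
}
```

With a real Checker, a list of known words produces a count of zero; with NULL, the same list produces a count equal to its length, and nothing anywhere complains. A single explicit check for a missing checker at startup would turn the quiet failure into a loud one.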

Although the "nil" article says "__objc_nilReceiver is there to let you replace the default message-to-nil behavior with your own! How cool is that!?", without the AppKit source I'm unconvinced that I could get away with a nil receiver that aborts any program trying to message nil. How much of Apple's code relies on this weird behavior? The prospect of introducing a host of crashing pseudo-bugs isn't particularly appealing.

Don't you just hate it when your tools conspire against you? Just as in Ruby, dynamism costs me something in a situation where I've absolutely no use for it. I've little against people adding this kind of dangerous nonsense, but it does annoy me when it's on by default and I can't turn it off.


On a more positive note, I thought I'd finish by mentioning the race-condition equivalent of "printf-debugging". It's a commonly-used technique, but I don't remember seeing it in any programming text I've read. Perhaps because we're still pretending concurrency isn't a basic part of computer programming. It's a shame, because it's simple, useful, and often a lot more productive than just staring at the code.

There are two possible goals. The simple one is to prove there is a race. The more complicated one is to find out where the race is, assuming it's not obvious. The idea is to insert a sleep (a big one, of seconds, that even a human would notice) in one of the participants. If you're in roughly the right area, you should make the race failure 100% repeatable. You can then move the sleep back until the race stops being 100% repeatable; now you know the crucial piece of code.
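Here's a small sketch of the technique in plain C with POSIX threads. Everything in it is invented for illustration (g_ready, Worker, run_trial, sleep_ms): one thread does some "setup" while a worker thread checks whether setup has happened yet, which is roughly the shape of the NSSpell bug. The two delay parameters are the injected sleeps.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep under strict -std= modes */

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* 0 until the setup step has run. Atomic so the race is about ordering,
 * not about undefined behavior. */
static atomic_int g_ready;

typedef struct {
    unsigned delay_ms;  /* injected sleep before the worker looks */
    int saw_ready;      /* what the worker observed */
} Worker;

static void sleep_ms(unsigned ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

static void *worker_main(void *arg) {
    Worker *w = arg;
    sleep_ms(w->delay_ms);
    w->saw_ready = atomic_load(&g_ready);
    return NULL;
}

/* Runs one trial of the race. Returns 1 if the worker saw the setup as
 * done, 0 if it got there first, and -1 if the thread couldn't start. */
int run_trial(unsigned setup_delay_ms, unsigned worker_delay_ms) {
    Worker w = { worker_delay_ms, 0 };
    pthread_t thread;

    atomic_store(&g_ready, 0);
    if (pthread_create(&thread, NULL, worker_main, &w) != 0) {
        return -1;
    }
    sleep_ms(setup_delay_ms);   /* the injected sleep in the setup path */
    atomic_store(&g_ready, 1);  /* the "setup" the worker depends on */
    pthread_join(thread, NULL);
    return w.saw_ready;
}
```

With no sleeps, run_trial(0, 0) can go either way depending on scheduling, which is exactly what makes such bugs intermittent. Delaying the setup, as in run_trial(200, 0), should make the worker lose every time; delaying the worker, as in run_trial(0, 200), should make it win every time. When debugging by hand, use whole seconds rather than milliseconds so that you notice the pause yourself.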