2005-06-23

NetNewsWire

I bought NetNewsWire a while back. I'd been putting it off for years, saying I wasn't going to buy it until it could synchronize between my PowerBook and PowerMac. (Ideally, I'd like a Linux version too, and to be able to synchronize with that.)

So they got me on a technicality. NetNewsWire 2.0 has synchronization. Two kinds, in fact. But I can't use either. One kind is Rendezvous (sorry, but "Bonjour" is just too stupid a name; even ZeroConf is better than "Bonjour"). The other kind is FTP. Yeah, FTP. Unless I've accidentally slipped back in time to 1995, that's a pretty weird choice. And I'm not making it up, either: here's proof.

Who has a server they can log in to in plaintext, and upload files to? And who wants to use it, even if they do? Presumably Cocoa has great support for ftp: URLs that made it more convenient than scp(1). (I tried and failed to write a GUI ssh-askpass for Mac OS. I followed the instructions in the man page, but I just couldn't get my program to be called. I'd imagine that was the stumbling block.)

What else don't I like? Well, there's a nasty parochial US date format, for one thing, blithely ignoring my configured ISO date format.

And it's a bit slow on my 2001-vintage 666 MHz PowerBook, but that's a slow machine anyway, and I'm just hoping it will hold out until the first round of Intel-based laptops arrives.

What's good? In brief, "everything else".

If you're stuck using, say, SharpReader on MS Windows, you have no idea how good NetNewsWire is. Perhaps not good enough on its own to justify getting a Mac, but almost. (If you're a Mac user, you have no idea how bad SharpReader is. Though I'm sure it seems fine if you've never used a decent RSS reader.)

I really like the disclosure triangle view, which you don't get in the free NetNewsWire Lite. Being able to open posts in tabs in NetNewsWire is more convenient than opening new browser windows, too, though ideally I'd like to have the WebView instances embedded in the disclosure triangle view, rather than on their own tabs. For some reason I never remember to close the tabs until I've got too many. I tend to go "back" by clicking on the next feed.

The HTML differencing is great, too. You know how it is when someone updates a good post, and you care about what's changed, but they haven't clearly marked the changes? You can pretty much forget about that. NetNewsWire will show removed text struck through, and show added text highlighted. Simple but effective. (It uses this script, which can be fooled, but seems to work pretty well most of the time.)

I really hope the next version of NetNewsWire gives me scp(1)-based synchronization, or something equally convenient and yet secure on the public internet, but other than that, I'm pretty happy. Everything else is good enough that, especially considering that I could be stuck with RSSOwl or SharpReader instead, it seems almost churlish to complain.

[A couple of people have suggested rsync over ssh, or stunnel and FTP, but that's not convenient or fool-proof enough for me. The rsync solution in particular falls short because it makes me responsible for only running NetNewsWire on one machine at a time, and for not running it at all while the rsync is in progress. This should be NetNewsWire's responsibility.]

2005-06-22

Boost Logo Contest

If you haven't heard of boost, I can only assume you're not a C++ programmer. Boost is a lot of what makes C++ programming bearable. The sooner its libraries become part of Standard C++, the better. (Hell, it would be a start if Linux and Mac OS even shipped with them. Debian has a libboost-dev package, but the presence of boost ought to be guaranteed on every machine that has g++(1) present.)

Anyway: the logo contest. Check out the entries. If you look at them, some of them are awesome. Others are excellent. Still others are adequate, but the kind of thing anyone could have done. The rest are laughably bad. So bad you wonder what possessed anyone to submit them. I can only assume they were the result of some kind of dare along the lines of "make a Google image search for your name come up with a hideous abomination that you'd be embarrassed by even if it were your two-year-old kid's first drawing on a computer".

I won't give any examples of which entries I think belong in the categories I just made up, though I will say that I think there are about ten entries I'd rank above the winner (one of which is the disqualified winner), which I think only belongs in the "adequate" category.

I'm not sure which I find more odd: that there were some terrible entries, or that there were any really good ones. Google suggests that many of the best entries (as judged by me) were created by people who are actually programmers. So now I'm wondering whether I suck at graphic design compared to other C++ programmers, whether boost holds some particular attraction for people with graphic design skills, whether inkscape can turn any C++ programmer into a graphic designer, or whether I'm just sufficiently like other boost users that I think their designs are much better than Joe Sixpack would judge them.

Thinking about it, I can't really see Joe liking any of the entries: not one involves a nation's flag, sports apparatus, bikini-clad women with "enhanced" breasts, devices with internal combustion engines, or offensive language.

C++ static versus namespace

It really annoyed me that namespace-scope static declarations were deprecated in C++. Not because I have any particular love for the myriad meanings of "static", but because I have a dislike for namespaces, and anonymous namespaces in particular.

Why don't I like C++ namespaces? Partly because they're a less good solution than Java's packages, a more disciplined form of the same thing. Partly because I see people use them as a substitute for a class. Instead of implementing singleton the right way, they think they can get away with a namespace full of namespace-scope functions. Partly because – despite the braces – C++ programmers seem congenitally unable to indent them correctly, and not averse to having multiple namespaces in the same file, with the result that it's often hard to know where you are. (This, I guess, is just a symptom of the lack of discipline I mentioned by contrast to Java.)

Today I came across a practical reason to resist the language lawyers' complaints about using deprecated features, though:

hydrogen:/tmp$ cat > anon.cpp
namespace {
    void unused1() {
    }
}
static void unused2() {
}
hydrogen:/tmp$ g++-4.0 -W -Wall -pedantic -c anon.cpp
anon.cpp:6: warning: `void unused2()' defined but not used
hydrogen:/tmp$

I need all the help I can get to prevent the silting up of large codebases, so I won't be giving up on static for a while. I can't even see it's worth fixing g++(1) and submitting a patch, because I never did get anyone to even look at my last GCC patch, which warned about the "most vexing parse". (Yes, I spelled "Meyers" wrong.)

2005-06-20

Java Threads

Tim Bray wrote a long post called On Threads, which touches on whether mainstream programmers are up to writing correct concurrent programs. Here's an excerpt that caught my attention:

Problem: Java Mutexes The standard APIs that came with the first few versions of Java were thread safe; some might say fanatically, obsessively, thread-safe. Stories abound of I/O calls that plunge down through six layers of stack, with each layer posting a mutex on the way; and venerable standbys like StringBuffer and Vector are mutexed-to-the-max. That means if your app is running on next year’s hot chip with a couple of dozen threads, if you’ve got a routine that’s doing a lot of string-appending or vector-loading, only one thread is gonna be in there at a time.

One thing the Java people need to do is put big loud blinking messages in all that Javadoc saying Using this class may impair performance in multi-threaded environments! You can drop in ArrayList for Vector and StringBuilder for StringBuffer. Hey, I just noted that the StringBuffer Javadoc does have such a warning; good stuff, but we need to be doing more evangelism on this front.

On the other hand, those mutexes were there for a reason. Nobody’s saying “Ignore thread-safety” but rather “Thread-safety is expensive, don’t do it unless you need to.”

To my mind, this is one of the biggest concurrency fallacies. Those mutexes may have been there (in classes such as Vector) for a reason, but not the reason many people naively expect.

I'm not saying "ignore thread safety", but the trouble is that you can't usually push all your concurrency concerns all the way down into the library where they're cured by the library pixies. (The best of those library pixies can do some pretty cool stuff for you, an example of which you'll see later.)

Compare these two snippets:

// Traditional code, using Vector.
// Unsafe.
public void scan(Vector<String> strings) {
    for (int i = 0; i < strings.size(); ++i) {
        String s = strings.get(i);
        doSomething(s);
    }
}

// Modern code, using ArrayList.
// Unsafe.
public void scan(ArrayList<String> strings) {
    for (String s : strings) {
        doSomething(s);
    }
}

If you buy the "those mutexes are there for a reason" argument, you might think that the former is more safe in a concurrent situation. After all, ArrayList isn't synchronized, but Vector is.

The trouble is, Vector's synchronization isn't at a useful level. You know that any individual call sees a consistent internal state (which isn't necessarily true with ArrayList), but you have no such guarantee across calls. So you can have checked that your index is less than the current size, for example, but when you come to get that item, the size has changed. Both size and get saw consistent states, but they saw different consistent states. And what you really meant was probably:

// Safe.
public void scan(Vector<String> strings) {
    synchronized (strings) {
        for (int i = 0; i < strings.size(); ++i) {
            String s = strings.get(i);
            doSomething(s);
        }
    }
}

In fact, of the original two alternatives, the ArrayList version was better. It wasn't any more correct, but it was still to be preferred to the other incorrect code. Why? Because the new-style for loop was implicitly invoking iterator on the ArrayList, and that gives you an instance of AbstractList.Itr, and that throws ConcurrentModificationException if it notices that you keep using it after the collection has been mutated.

It does this by keeping a sequence number inside the collection. This number is incremented on each mutation that changes the collection's structure [see the documentation for more details]. Each iterator remembers the sequence number that was current when the iterator was created. If you're in an iterator method and the container's number doesn't match the iterator's... boom!
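
Here's a minimal sketch of that machinery tripping (a made-up example, not code from anywhere above); it uses a single thread because that's deterministic, but the same bookkeeping is what catches the multi-threaded case:

import java.util.ArrayList;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        strings.add("one");
        strings.add("two");
        strings.add("three");
        for (String s : strings) {
            // This add bumps the list's modification count, so the
            // iterator's next call to next() throws
            // ConcurrentModificationException.
            strings.add(s + "!");
        }
    }
}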

Some people belittle Java and its contributions to the state of the art, but I think that's a mistake born of snobbery. The individual contributions may be small, but that doesn't stop them being significant. The fail-fast iterators, even if I don't much like their name, are one such small but significant contribution. (Remember: they don't make your code correct, but they can help prove to you that it isn't.)

You can rewrite the original Vector code to use the new-style for loop instead of the old-style counted for loop, and it'll then have the same property, because Vector is also a subclass of AbstractList. It still won't be correct, though.

This is the big point that people miss. The synchronization in Vector is at too low a level to ensure that your application is thread-safe. It's ensuring that the collection itself is. This kind of correctness is almost never what's important to you as an application developer, and is usually made irrelevant by the protection you have to put in at a higher level.

The same is true if you use Collections.synchronizedCollection, and the library pixies explain this in the method's documentation.
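
Here's roughly the pattern that documentation asks for (the class and method names in this sketch are mine, purely for illustration):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedWrapperExample {
    private final List<String> strings =
        Collections.synchronizedList(new ArrayList<String>());

    public void add(String s) {
        strings.add(s); // Safe on its own; the wrapper takes the lock for you.
    }

    public void scan() {
        // Without this block, the iteration is just as broken as the
        // unsynchronized Vector loop above.
        synchronized (strings) {
            for (String s : strings) {
                doSomething(s);
            }
        }
    }

    private void doSomething(String s) {
        System.out.println(s);
    }
}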

The C++ STL collection classes, for example, have no protection. And anyone who suggested it would be laughed at. "Those mutexes are there for a [good] reason" just isn't true. If you had Smalltalk-like internal iteration, where (in Java terms) you give the collection an object to invoke a method on for each contained item, then the collection could usually give you the protection you need, but that's not what we have in Java. Because of that, you almost always need a mutex around any iteration.
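
In Java terms, internal iteration might look something like this sketch (none of these types exist in java.util; they're invented for illustration):

import java.util.ArrayList;

public class InternallyIterableList<T> {
    public interface Visitor<E> {
        void visit(E item);
    }

    private final ArrayList<T> items = new ArrayList<T>();

    public synchronized void add(T item) {
        items.add(item);
    }

    // Because the whole traversal happens inside one call, the collection
    // can hold its own lock for the duration; the caller never sees an
    // inconsistent state and never needs to know there's a mutex at all.
    public synchronized void forEach(Visitor<T> visitor) {
        for (T item : items) {
            visitor.visit(item);
        }
    }
}

The obvious price is that your visitor runs while the collection's lock is held, which brings its own problems, but at least the locking decision would be the collection's to make.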

The SGI STL implementors say as much in SGI STL Thread-Safety. They explicitly use std::vector as an example of where the naive "synchronize all the methods" approach doesn't work. They do use synchronization where they need to, though, despite the pain it must have caused because Standard C++ has no notions of threads or locks, so it's not like they were simply ignoring the issue, or ignorant of it.

Why do the library pixies care? They care because:

  • it makes them less likely to get blamed for users' bugs (true of both fail-fast iterators and all-synchronized classes, though a failed iteration gives a clear "your code blows goats; I have proof" that all-synchronized classes don't).

  • even inadequate protection will catch (for fail-fast iterators) or avoid (for all-synchronized classes) some problems; see the above point about not being blamed.

  • all-synchronized classes simplify a class' behavior in that (since being synchronized is part of the interface) some of the behavior is made less dependent on the implementation; a synchronized interface gives you some minimal guarantee.


The (strictly correct, but practically not very useful) "thread safety" of classes like Vector can be a danger in itself, because programmers unused to concurrency think they've done enough by using these classes, rather than asking for help from someone more experienced. And with a class like Hashtable, assuming they don't iterate over the keys or values, they'll get away with it. An associative container is substantially less likely to be iterated over, so you can still mostly ignore concurrency. But only "mostly".

I really don't like seeing classes with lots of synchronized methods, because it often means the problem hasn't really been thought about. "If I just sprinkle my magic thread-safety dust around, everything will be lovely." A class with just a couple, though, often means that the problem has been thought about, and carefully contained in just a couple of points, and those points have then been protected.

For classes as low-level as ArrayList and Vector, I think having no synchronized methods is always the right choice. There's almost a parallel with exceptions where, for lack of the ability to do the right thing at some low level, you pass the problem on up until you're at a level where it can be handled (preferably with some way to make sure it's obvious when the situation hasn't been properly handled).

[Though they shouldn't be blamed for any remaining problems, thanks to Olivier Lefevre for his repeated (correct) insistence that what I'd written was unclear, to Ed Porter for his succinct distillation of what I wanted to say (rather than what I did say), and to Martin Dorey for his additions relating to the STL and the library pixies' motivation.]

2005-06-18

"All She Was Worth"

I mentioned Natsuo Kirino's "Out" recently, saying it was the first modern Japanese story I'd both enjoyed and felt I understood. Miyuki Miyabe offered me the second in "All She Was Worth". It wasn't as well written as "Out", at least to my western eyes: it was far too didactic at times, stopping with a screech of brakes to explain things at length. Nice ending, though.

I wonder why the original title was "Kasha"? What does that mean?

2005-06-12

Better JNI through C++

The last time I had anything to do with JNI, I was a recovering C programmer. My expectations were low, and JNI was about what I expected. Since then I've written a lot of C++, so my expectations of how pleasant it should be to write native code have increased greatly.

We had a really nasty bug in Terminator this week. It only showed itself on Linux 2.4, presumably because of SIGCHLD-related changes between 2.4 and 2.6 (which most of us run). What had happened was that we'd factored out the FindClass/ThrowNew code into a global function called throwJavaException. The word 'throw' in Java-related C++ was obviously enough to fool all three of us who'd read the code that an exception really was being thrown, and that the call stack would be unwound rather than the rest of the calling function being executed.

Wrong.

It got worse, though. It turns out that GetFieldID fails if called with a pending exception. Worse still, we were using the resulting jfieldID on the assumption that it was good. This was causing the Java VM to crash later, claiming a VM error (and not pointing at our code at all).

(This is one reason why Joel Spolsky is wrong to dislike exceptions. With exceptions, this couldn't have happened. GetFieldID would have thrown an exception, and we'd have stopped. At the right time to see what was wrong.)

It took hours to find this, and I didn't want any of us to have to waste more time on similar problems in future. We can't very well fix the Java VM (though if anyone's looking for ways to make Free VMs better, having them go to greater lengths to protect themselves against incorrect JNI code would be one place to start), but we can improve our code.

My idea was to make use of C++ exceptions throughout the implementation, but catch them and convert them to Java exceptions in the crufty extern "C" global functions that JNI forces on us. Two problems: I didn't want to have to write that code out myself in every JNI function I write, and I also needed a place for the real code (the code that throws exceptions for these stubs to catch) to live.

My answer was javahpp, which is to C++ what Sun's javah is to C. It generates stub JNI functions that instantiate a C++ class corresponding to the Java class, and invoke the matching member function on that instance. The C++ class also has a data member for each field in the Java class. These data members proxy for the Java fields, and let you use assignment to set them, and offer a get method to get their current value.

Rather than invent bogus return values for the JNI C functions when an exception is thrown, javahpp insists that your native methods return void. This is safe and simple and not a limitation in practice because you're doing as much work as possible in Java anyway (right?).

Any of the C++ code can throw a C++ exception and go straight back to the JNI C function, where a Java exception is "thrown". There's no clever translation from C++ exception classes to Java exception classes, but there could easily be. If I had more JNI code to write, I might be tempted. At the moment, we always throw java.lang.RuntimeException. (Exceptions thrown by native code are inherently unchecked anyway, so even if we were checked-exception people, which we're not, this would still have been a good choice.)

We don't use C++ namespaces because it's not easy to see what use they'd be, and it would have been slightly more work to use them than it was not to.

You can see a real-life example of this in Terminator, and javahpp is in salma-hayek. It only supports what we've needed so far, so it's not yet a complete solution to writing JNI code in C++, but if you look at Terminator's native code I'm sure you'll agree that it's useful as it stands. If you'd like to use it but need support for, say, static fields (which aren't supported at the time of writing), let me know. Time permitting, I'm interested in extending this, even if I don't have an awful lot of use for JNI personally.

I don't claim that any of the ideas presented here aren't obvious, but I do claim that they're useful, and better than I've seen elsewhere.

2005-06-11

Class.getName is criminally insane

Examples from the JavaDoc for Class.getName:

String.class.getName()
    returns "java.lang.String"
byte.class.getName()
    returns "byte"
(new Object[3]).getClass().getName()
    returns "[Ljava.lang.Object;"
(new int[3][4][5][6][7][8][9]).getClass().getName()
    returns "[[[[[[[I"

So if you've got an array, you get the encoded name for the element type (such as "Z", "J", or "Ljava.lang.Class;"). But if you've got a non-array type, whether primitive or class, you get the source name (such as "boolean", "long", or "java.lang.Class"). And, as far as I can see, there's no method to get the encoded name. And the depraved mind responsible for this mess isn't even prepared to face the world:

* @author unascribed

Java has too many names for any given type. Take java.lang.String, for example, which might be "String", "java.lang.String", "java/lang/String", "Ljava/lang/String;", or "class java.lang.String", depending on what you're doing.
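
For what it's worth, here's a quick sketch of where some of those forms come from (getSimpleName is new in Java 5; the "java/lang/String" internal form is what you hand to JNI's FindClass, and "Ljava/lang/String;" is the descriptor form used in field and method signatures, neither of which Class will give you directly):

public class NameForms {
    public static void main(String[] args) {
        System.out.println(String.class.getSimpleName()); // String
        System.out.println(String.class.getName());       // java.lang.String
        System.out.println(String.class.toString());      // class java.lang.String
    }
}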

Oh well. Back to writing a method to give me the encoded name for a Class...
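
For the record, here's the kind of method I mean; it's just a sketch (not from the JDK), with the primitive encodings written out by hand because there's no API to ask for them:

public class ClassNames {
    // Returns the JVM-style encoded name for a class, using '.' rather
    // than '/' as the package separator, to match what getName returns
    // for array element types.
    public static String encodedNameOf(Class<?> c) {
        if (c.isArray()) {
            // Arrays are the one case getName already encodes.
            return c.getName();
        }
        if (c.isPrimitive()) {
            if (c == boolean.class) return "Z";
            if (c == byte.class) return "B";
            if (c == char.class) return "C";
            if (c == double.class) return "D";
            if (c == float.class) return "F";
            if (c == int.class) return "I";
            if (c == long.class) return "J";
            if (c == short.class) return "S";
            return "V"; // The only primitive left is void.
        }
        return "L" + c.getName() + ";";
    }
}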

Mac OS X/Intel

There were two points in the 2005 WWDC keynote where I wanted someone to heckle. The first was where Steve Jobs said that performance per Watt was more important than performance.

"Says who, laptop boy?"

The other moment begging for a question from the audience was where the guy from Mathematica was telling us what an enormous program Mathematica is, "a beast", but that only 20 lines needed changing to port it to Mac OS/Intel.

"What went wrong?"

You see, Mathematica may be "a beast", but it's a cross-platform beast. Apple presumably wanted to showcase Mathematica because it's well known that it's huge, and most programmers are rightly scared of numeric code anyway. So obviously, Mathematica must be really hard to port, and if it was easy for them, it's therefore going to be easy for everyone else?

But that logic misses what seems to me a very important point; one I'm surprised not to have heard anyone pick up on. According to Wolfram Research's list of supported platforms, Mathematica runs on IA-32 (x86), IA-64 (the only Intel architecture I've ever had any affection for), x86-64 (amd64), PowerPC, UltraSPARC, Alpha, PA-RISC, Power, and MIPS. Sometimes on more than one OS per architecture. So you'd expect the beastliness to be mostly tamed by now. I can only assume that those 20 lines that needed changing were in Mac-only code.

So what were these 20 lines? And how much Mac-only code is there in Mathematica?

-*-

Some things make more sense now the switch to Intel is public. I didn't understand why Intel CEO Otellini came out the other week saying Macs are safer. Now it seems embarrassingly obvious.

I also didn't understand why 10.4 was supposed to appeal to anyone but developers. Dashboard's stupid, Spotlight's not finished yet (its speed is crippled on my Mac by its insistence on waiting to spin up my iPod's disk, during which the UI is frozen; that alone makes it pretty useless to me), Mail seems to have got more ugly rather than more functional... The built-in dictionary's really useful (though let down by not one but three fairly weak interfaces, each with different problems from the other two), but other than that I can't think of a single thing that would sell 10.4 to a non-developer. Real people don't care about Java 1.5, for example, even though that alone was enough to secure my pre-order.

But CoreImage seems like a great way to add extra portability at the same time as extra performance (assuming you're using Objective C++). Maybe that was just the first prod in the direction of transition.

Maybe 10.4 really was just for developers.

-*-

I'm pretty ambivalent about the switch to Intel, myself. Any time there's a Mac twice as fast as my current Mac, I'll buy a new one. So it's probably going to be another two years before I want a new one anyway. I'm slightly bummed that my world is going to turn little-endian, and that I'm going to be staring at Intel assembler in gdb(1) or objdump(1) output. (Sorry, I mean otool(1), because fat binaries are still Mach-O, worse luck!) I'm also bummed that I'm going to pay those costs without the benefit of some of the good Linux tools that I've had to live without. Though valgrind(1) should be a lot easier to port now!

Weird to me that Apple were so careful to say Intel. Didn't it strike everyone else as a type error to have "PowerPC" and "Intel" as the choices in the Xcode build dialog? Presumably part of the deal the two companies made.

I for one welcome Intel, our new little-endian, Z80-imitating, hardware DRM (Distributor's Rip-off Mechanism) pushing overlords. I just hope their other 32 bits show up soon; be that courtesy of IA-64 or amd64.

2005-06-06

"Crash"

If you liked Magnolia, I think you'd like Crash. It's not as funny, but it's not meant to be. It's not as long, either. But it is very good.

Although a few questions are left unanswered, I'd have preferred a few more red herrings along the way. I like bits that don't join up and aren't meaningful. In part because life itself is like that, but also because it makes it harder to guess what's going to happen when lots of what you've seen is actually irrelevant.

Nice to see Marina Sirtis as someone other than Counselor Troi for a change.