Implementing Java listeners

The Observer pattern lets you ensure that when an object changes state, all interested objects are notified automatically. In Java, observers are called listeners, and are widely used in the JDK. They have details that are not widely understood, though.

First, a quick reminder: an object you can listen to – called a Subject in "Design Patterns", but given no explicit name and implementing no specific interface in Java – will have public addXListener and removeXListener methods. The Subject's implementation will have a listenerList, or several listener lists if it supports different kinds of listeners (and chooses to keep them separately, which is uncommon). The Subject's implementation will also have a fire method corresponding to each method in the XListener interface.
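For concreteness, here's a minimal sketch of such a Subject. Thermometer, TemperatureListener and TemperatureEvent are hypothetical names invented for illustration; the real JDK pieces are java.util.EventListener, java.util.EventObject and javax.swing.event.EventListenerList, the class behind the listenerList field that Swing components inherit from JComponent. Note the two fire methods, one per method in the listener interface.

```java
import java.util.EventListener;
import java.util.EventObject;
import javax.swing.event.EventListenerList;

// Hypothetical listener interface with two methods, so two fire methods.
interface TemperatureListener extends EventListener {
    void temperatureRose(TemperatureEvent e);
    void temperatureFell(TemperatureEvent e);
}

// Hypothetical event carrying the new reading.
class TemperatureEvent extends EventObject {
    private final double degrees;

    TemperatureEvent(Object source, double degrees) {
        super(source);
        this.degrees = degrees;
    }

    double getDegrees() {
        return degrees;
    }
}

// The Subject: no special interface, just add/remove methods and fire methods.
class Thermometer {
    private final EventListenerList listenerList = new EventListenerList();
    private double degrees;

    public void addTemperatureListener(TemperatureListener l) {
        listenerList.add(TemperatureListener.class, l);
    }

    public void removeTemperatureListener(TemperatureListener l) {
        listenerList.remove(TemperatureListener.class, l);
    }

    public void setDegrees(double newDegrees) {
        double oldDegrees = degrees;
        degrees = newDegrees;
        if (newDegrees > oldDegrees) {
            fireTemperatureRose();
        } else if (newDegrees < oldDegrees) {
            fireTemperatureFell();
        }
    }

    protected void fireTemperatureRose() {
        TemperatureEvent e = new TemperatureEvent(this, degrees);
        TemperatureListener[] listeners = listenerList.getListeners(TemperatureListener.class);
        // Last-added listener first, the usual Swing convention.
        for (int i = listeners.length - 1; i >= 0; --i) {
            listeners[i].temperatureRose(e);
        }
    }

    protected void fireTemperatureFell() {
        TemperatureEvent e = new TemperatureEvent(this, degrees);
        TemperatureListener[] listeners = listenerList.getListeners(TemperatureListener.class);
        for (int i = listeners.length - 1; i >= 0; --i) {
            listeners[i].temperatureFell(e);
        }
    }
}
```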

You knew all this.

Why are there so many listeners on a component's list?
Many components have behavior that is implemented using listeners, so you can expect to see more listeners than just the ones your application added. (The reason for implementing a component like this is the belief that if you can use the same interface internally that you offer externally, you should: it makes your external interface better-tested, helps you spot any design flaws in the external interface early on, and reduces duplication.)

What order should listeners be notified in?
In reverse order of addition. (In the common case where you append each new listener to a list, this means that you should iterate backwards over that list.) The reason for this is that it's the only way for client code to modify behavior you implemented using listeners. You can get away with notifying listeners in the wrong order for a long time, but the first person who tries to override existing behavior will be really annoyed when they can't.
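Here's a sketch of the append-then-iterate-backwards approach, using a hypothetical ClickableWidget whose built-in behavior is itself a listener added in its constructor. Because notification runs last-added-first, a listener added later by client code sees each event before the built-in one does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface.
interface ClickListener {
    void clicked();
}

// Hypothetical component whose own behavior is implemented as a listener.
final class ClickableWidget {
    private final List<ClickListener> listeners = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    ClickableWidget() {
        // Built-in behavior, registered before any client can add a listener.
        addClickListener(new ClickListener() {
            public void clicked() {
                log.add("built-in");
            }
        });
    }

    void addClickListener(ClickListener l) {
        listeners.add(l); // new listeners are appended...
    }

    void removeClickListener(ClickListener l) {
        listeners.remove(l);
    }

    void fireClicked() {
        // ...so notify in reverse order of addition: most recent first.
        ClickListener[] snapshot = listeners.toArray(new ClickListener[0]);
        for (int i = snapshot.length - 1; i >= 0; --i) {
            snapshot[i].clicked();
        }
    }
}
```

Iterating over a snapshot array rather than the live list is a choice of this sketch: it lets a listener remove itself during notification without upsetting the loop.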

How does a listener say "I've handled this; no more listeners should pay attention"?
By invoking consume on the event you were passed. See Apple Technical QA1363: Unsolicited About Boxes for an example of what might happen if you don't invoke a consume method (the method in question isn't actually called consume, which is another indication that this isn't a well-traveled path).

So if I invoke consume, no other listeners are notified?
Incorrect. The rest of the listeners will still be notified. Every listener needs to check isConsumed on the event. Usually, you shouldn't respond to a consumed event. You may never have seen a listener that actually makes this check. Luckily, DefaultCaret contains a good example of listener code that does check. Of particular interest is that the mousePressed code needs to take some action when given a consumed event; it just behaves differently given an unconsumed event.

What this means is that not only do you have to implement your notification code correctly, you have to rely on all of your listeners being implemented correctly. And you can't take the protective short-cut of terminating the iteration over the listeners as soon as one invokes consume on your event, because some listeners might still need the notification.
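A sketch of that protocol, assuming a hypothetical KeyTypedEvent whose consume and isConsumed methods are loosely modeled on AWT's. The notifier deliberately keeps going after a consume; each listener decides whether a consumed event still matters to it. Here the built-in insertion listener respects consumption, while a keystroke counter doesn't care.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical consumable event.
class KeyTypedEvent {
    private final char key;
    private boolean consumed;

    KeyTypedEvent(char key) { this.key = key; }
    char getKey() { return key; }
    void consume() { consumed = true; }
    boolean isConsumed() { return consumed; }
}

interface KeyTypedListener {
    void keyTyped(KeyTypedEvent e);
}

// Hypothetical text field whose insertion behavior is itself a listener.
final class SimpleTextField {
    private final List<KeyTypedListener> listeners = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    SimpleTextField() {
        addKeyTypedListener(new KeyTypedListener() {
            public void keyTyped(KeyTypedEvent e) {
                // Built-in behavior: don't respond to an event someone else handled.
                if (!e.isConsumed()) {
                    text.append(e.getKey());
                }
            }
        });
    }

    void addKeyTypedListener(KeyTypedListener l) { listeners.add(l); }

    void fireKeyTyped(KeyTypedEvent e) {
        KeyTypedListener[] snapshot = listeners.toArray(new KeyTypedListener[0]);
        // Last-added first, and no early exit on consume: later listeners may still need it.
        for (int i = snapshot.length - 1; i >= 0; --i) {
            snapshot[i].keyTyped(e);
        }
    }
}
```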

Is there anything a component author can do to help clients write efficient code?
Offer bulk update methods that translate to a single bulk fire (see AbstractTableModel's fire methods and TableModelEvent for an example). For a hypothetical addElement method, you should have an addElements too, so a client can use your component in a more efficient manner.
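A sketch of pairing a single-element method with a bulk one, where each call produces exactly one event. ElementList and ElementsAddedListener are hypothetical; the event reports an inclusive index range, in the spirit of AbstractTableModel.fireTableRowsInserted(firstRow, lastRow).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical listener: reports the inclusive range of indices just added.
interface ElementsAddedListener {
    void elementsAdded(int firstIndex, int lastIndex);
}

final class ElementList<E> {
    private final List<E> elements = new ArrayList<>();
    private final List<ElementsAddedListener> listeners = new ArrayList<>();

    void addElementsAddedListener(ElementsAddedListener l) { listeners.add(l); }

    int size() { return elements.size(); }

    // One element, one event.
    void addElement(E element) {
        elements.add(element);
        int index = elements.size() - 1;
        fireElementsAdded(index, index);
    }

    // Many elements, still one event.
    void addElements(Collection<? extends E> newElements) {
        if (newElements.isEmpty()) {
            return;
        }
        int first = elements.size();
        elements.addAll(newElements);
        fireElementsAdded(first, elements.size() - 1);
    }

    private void fireElementsAdded(int first, int last) {
        ElementsAddedListener[] snapshot = listeners.toArray(new ElementsAddedListener[0]);
        for (int i = snapshot.length - 1; i >= 0; --i) {
            snapshot[i].elementsAdded(first, last);
        }
    }
}
```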

Another choice is to offer a getValueIsAdjusting method like ListSelectionEvent. The client can then invoke setValueIsAdjusting (see JList) around its mutation. If a client knows its work is expensive, the information it gets from getValueIsAdjusting lets it hold off until you say that you're done changing for now.
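Here's roughly what that looks like with the real DefaultListSelectionModel (the model behind JList's selection). The mutating code brackets its changes with setValueIsAdjusting(true) and setValueIsAdjusting(false); the listener, standing in for something expensive, ignores events for which getValueIsAdjusting returns true. ExpensiveSelectionListener and AdjustingDemo are hypothetical names. As I understand DefaultListSelectionModel, the listener still receives the intermediate events, but only the final one reports that adjusting has finished.

```java
import javax.swing.DefaultListSelectionModel;
import javax.swing.event.ListSelectionEvent;
import javax.swing.event.ListSelectionListener;

// A listener whose work is expensive defers it until the selection settles.
class ExpensiveSelectionListener implements ListSelectionListener {
    int expensiveUpdates = 0;

    public void valueChanged(ListSelectionEvent e) {
        if (e.getValueIsAdjusting()) {
            return; // more changes are on the way; wait for the final event
        }
        expensiveUpdates++; // stand-in for the real, costly work
    }
}

final class AdjustingDemo {
    private AdjustingDemo() {
    }

    // The mutating client brackets its changes with setValueIsAdjusting.
    static ExpensiveSelectionListener selectSeveral() {
        DefaultListSelectionModel model = new DefaultListSelectionModel();
        ExpensiveSelectionListener listener = new ExpensiveSelectionListener();
        model.addListSelectionListener(listener);
        model.setValueIsAdjusting(true);
        model.setSelectionInterval(0, 0);
        model.addSelectionInterval(2, 2);
        model.addSelectionInterval(4, 4);
        model.setValueIsAdjusting(false);
        return listener;
    }
}
```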

"Design Patterns" touches on this problem (point 3 in the "Implementation" section), but offers an inferior error-prone solution: making clients responsible for invoking a fire method. These choices are similar, in that they require a well-behaved client, but it's significantly easier to write these kinds of well-behaved clients, because the extra burden is only placed on the client in the unusual case. The first suggestion is the better, because a good programmer will look for those methods first.

When a listener is added, should I inform them of the current state?
No. It might seem useful, and it often is, but it's not the Java idiom. Any listener that wants to know the current state when it's added will have to ask. The listener's constructor is often a convenient place to do this. You can also move the addXListener boilerplate inside the listener by making the constructor private and exposing a static method that creates and registers the listener (though this is an uncommon idiom).
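A sketch of the static-factory idiom, using java.beans.PropertyChangeSupport for the subject's plumbing. Volume and VolumeDisplay are hypothetical. The private constructor asks for the current state, and attachTo both creates and registers the listener, so a caller can't forget either step.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical subject: a volume setting that reports changes.
class Volume {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);
    private int level;

    int getLevel() { return level; }

    void setLevel(int newLevel) {
        int oldLevel = level;
        level = newLevel;
        support.firePropertyChange("level", oldLevel, newLevel);
    }

    void addPropertyChangeListener(PropertyChangeListener l) { support.addPropertyChangeListener(l); }

    void removePropertyChangeListener(PropertyChangeListener l) { support.removePropertyChangeListener(l); }
}

// Hypothetical listener that needs the current state as soon as it's attached.
final class VolumeDisplay implements PropertyChangeListener {
    private int shownLevel;

    private VolumeDisplay(Volume volume) {
        // The subject won't tell us the current state, so ask for it.
        shownLevel = volume.getLevel();
    }

    // Creates the listener and registers it in one step.
    static VolumeDisplay attachTo(Volume volume) {
        VolumeDisplay display = new VolumeDisplay(volume);
        volume.addPropertyChangeListener(display);
        return display;
    }

    public void propertyChange(PropertyChangeEvent e) {
        if ("level".equals(e.getPropertyName())) {
            shownLevel = (Integer) e.getNewValue();
        }
    }

    int getShownLevel() { return shownLevel; }
}
```

A side benefit of registering in the factory rather than in the constructor is that `this` never escapes from a half-constructed object.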

What about "Chain of Responsibility"?
Chain of Responsibility, another pattern from "Design Patterns", can be thought of as a degenerate case of Observer, where the assumption is that at most one listener (called a Handler in that pattern) is interested in the notification. Strangely, the "Related Patterns" sections for these two patterns fail to mention each other. A recent JavaWorld article demonstrates why you don't want to use Chain of Responsibility, and fails to give any reason why you'd use this (uncommon) idiom in favor of the more common and better Observer pattern.

The modifications suggested in the article move the Chain of Responsibility pattern even closer to Observer, taking iteration back out of the hands of the individual listeners, but because the iteration is implemented through inheritance, you lose the ability to have listener interfaces: you need abstract classes instead. This is a poor choice in a language with single inheritance, such as Java.


Jikes 1.21 on Mac OS 10.2.8

Because of a broken SuperDrive, my PowerBook is stuck on Mac OS 10.2.8. This didn't matter to me, because I haven't done any development on it in over a year, but right now I'm moving, and the PowerBook is the only machine I can guarantee I'll have to hand for a while. So I care again.

If I try something like this on Mac OS 10.2.8 with Jikes built today from CVS:

titanium:~$ cat test.java
public class test {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; ++i) {
            // ...
        }
    }
}
titanium:~$ ~/OpenSource/jikes/src/jikes -bootclasspath /System/Library/Frameworks/JavaVM.framework/Classes/classes.jar test.java

I get crashes like this:

Date/Time: 2004-08-29 22:20:48 +0100
OS Version: 10.2.8 (Build 6R73)
Host: titanium

Command: jikes
PID: 21023

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_INVALID_ADDRESS (0x0001) at 0x4f244fe4

Thread 0 Crashed:
#0 0x000bd8dc in Utf8LiteralTable::FindOrInsert(char const*, int) (lookup.cpp:1383)
#1 0x00144db4 in Zip::ProcessSubdirectoryEntries(DirectorySymbol*, char*, int) (zip.cpp:518)
#2 0x000e8308 in Zip::ReadDirectory() (zip.cpp:163)
#3 0x000e1ebc in Control::ProcessBootClassPath() (system.cpp:595)
#4 0x000e18dc in Control::ProcessPath() (system.cpp:445)
#5 0x0004830c in Control::Control[unified](char**, Option&) (control.cpp:154)
#6 0x000b6aec in JikesAPI::compile(char**) (jikesapi.cpp:211)
#7 0x000b6320 in main (jikes.cpp:116)
#8 0x00001fb8 in _start (crt.c:267)
#9 0x00001e38 in start

PPC Thread State:
srr0: 0x000bd8dc srr1: 0x0000d030 vrsave: 0x00000000
xer: 0x20000000 lr: 0x000bd880 ctr: 0x00000000 mq: 0x00000000
r0: 0x13c913f9 r1: 0xbffff720 r2: 0x00000046 r3: 0x01070cc0
r4: 0x01081b20 r5: 0x00000008 r6: 0xffffffe0 r7: 0xffffffc0
r8: 0xffffff80 r9: 0x00000000 r10: 0x00000008 r11: 0x00000000
r12: 0x00000000 r13: 0x01070914 r14: 0x010708f4 r15: 0x01070a48
r16: 0x010708bc r17: 0x010709b8 r18: 0x01081b00 r19: 0x01070884
r20: 0x01081b00 r21: 0x010818b0 r22: 0x010c002e r23: 0x01081b20
r24: 0x4f244fe4 r25: 0x00000008 r26: 0x01070cc0 r27: 0x13c913f9
r28: 0x00000008 r29: 0x01082180 r30: 0x01081b20 r31: 0x000bd880

The symbol in question seems to be "META-INF". If I scp the .jar to my Mac OS 10.3 machine, everything's fine. I can't reproduce the problem there.

So I rebuilt Jikes on the PowerBook (which took about 9 minutes; this is why I do no development on this machine) after manually changing the CFLAGS to remove the -O2, so that I'd get a clearer picture of what was going on in the debugger. And now Jikes works fine.

Until someone convinces me there's a problem with Jikes, rather than with gcc 3.1 20020420 on Mac OS 10.2.8, I've stopped caring. I can live with an unoptimized build. Using Jikes instead of javac, and make(1) instead of Ant, makes this a usable development machine.


C#/.NET doesn't look right either

I've heard people poke fun at Java by asking the question "how many Java applications do you run?". The question doesn't work with me because my editor, my terminal emulator, my revision history browser and check-in tool are all written in Java. These are the programs I use the most.

On the other hand, I hadn't ever run a C# application until a couple of weeks back. I wouldn't even have been able to name one. Which is perhaps a good thing. In an ideal world, you wouldn't be able to distinguish applications on the basis of the tools used to build them.

But I can now name a C# application. It's called SharpReader, and it's a news aggregator like NetNewsWire. Only not nearly as good. I'm not particularly interested in the details of this particular application, and why NetNewsWire is better: what interests me is that a C# application doesn't automatically look and feel like a Windows application.

At this point, I realize that the very idea of a consistent "Windows" look and feel doesn't seem right. That isn't the impression Windows leaves me with. Even two different Microsoft applications will happily use scroll bars that behave differently, different menu bar implementations, and the like. I'd always assumed that these things weren't implemented by each application, but on Windows that doesn't seem to be the case. And even when you get past things like that, there's a flagrant disregard for good taste and even Microsoft's Windows User Experience guidelines (they exist, they're just not widely followed, not even by Microsoft themselves).

I can't really explain how disorientating an experience it was to configure my parents' wireless network card on their PC. Mac users will already know what I mean, and I can only assume Windows users just don't notice that no two applications look or feel alike. There's no graphical language, no spacing or typographical conventions, no color conventions, nothing at all to help you steady yourself in an artificial world where anything is possible. Unless you've had something better, you probably think this is how computers have to be. My mum thought her laptop was light until she was holding my PowerBook in her weak left hand and realized she couldn't lift hers without using both hands, and when it dawned on her, she said "you've spoilt it for me now". So maybe Unix users shouldn't bang on about stability and reliability and scriptability, and Mac users shouldn't bang on about beauty and consistency. Maybe we're not helping anyone?

Anyway, back to what SharpReader taught me.

C# doesn't seem to ensure that your application's buttons look like buttons. Or that your application's dialogs have any kind of Windows-like spacing. Or that your application won't happily retain 20MB of heap. Or that your application won't swap back in really slowly. (I never understood why Windows users leveled that criticism at Java applications: Windows just seems to cope much worse in such situations than Unix, and it seems to get itself into such situations much more easily. And experience with GNOME suggests that the C programmers have nothing to laugh about either. The application that takes longest to swap back in after a heavy C++ link on my Linux box? It's none of the Java applications: it's Evolution, which is a real dog. Thunderbird, which I've recently switched to, has no such problem on the same mailboxes.)

Don't get me wrong: I'm sure writing C# is a better experience than writing C++. But if the cost is being tied to Windows, and encouraging the dominance of bad products whose main selling point is being cheap (if only because they come "free" with your computer or you steal them because "everyone" does)... what do I get in return?


Ethereal's quit dialog

If you've used Ethereal, you'll have seen its quit dialog. It used to be one of those really confusing "yes/no/cancel" dialogs, where you had to read the text a few times to work out what actions "yes" and "no" corresponded to. My most recent apt-get upgrade brought me a new version, and with it a new quit dialog.

Here's my rendition of it:

Save capture file before program quit?

If you quit the program without saving, your capture data will be discarded.

[Continue without Saving] [Cancel] [Save]

Even without the strange "before program quit" construction and the distracting explanation of the consequences in terms that don't map directly to buttons, this would be a work of art because ...

... when they say "continue", they actually mean "quit".

It just goes to show: you can give developers simple rules like "don't use yes/no/cancel; explain the actions", but you can't be sure they'll follow them in a sensible manner.

The developers of Ethereal (like a more frequent Ethereal user I work with) have probably internalized the rule as "click on the left-most button" and don't care what the dialog or the buttons say. But for those of us who don't use this program on a daily basis, we've gone from a dialog that helps us accidentally choose the wrong option to a dialog that leaves us sitting there wondering which button lets us quit.

Even if we've been happy with Ethereal up to that point, the memory we'll take away is one of frustration.


Stepford Safari

Was Nicole Kidman using her 43.18cm PowerBook in "Stepford Wives" the first time we've seen someone use Safari on the big screen?

Speaking of women with taste, I liked the way that in Jennifer Garner's character's perfect future in "13 going on 30", all the computers were Macs.

I wonder if all the C programmers were dead, too?


Jikes bug #3947

I haven't had time to do anything with Jikes for months now, but I finally got round to fixing bug #3947 last night. That's the one about whether we should warn if a local in a static method has the same name as an instance variable:

class C {
    int i;
    static void m() {
        int i = 0; // Is this worth warning about?
    }
}

One argument says that we shouldn't warn, because the instance variable isn't accessible from the static method.

Another argument says that we should warn, because it's confusing to have a local with the same name as an instance variable regardless of what kind of method it's in.

The best argument I came up with in favor of not warning is that it's quite common practice to write static methods that return an instance that they construct after preparing the constructor arguments in local variables. The usual idiom for constructor arguments is that they have the same names as the instance variables they correspond to, so it's likely that they'll be the same names used in the static method. Something like:

class C {
    int i;
    private C(int i) {
        this.i = i;
    }
    static C fromString(String s) {
        int i = Integer.parseInt(s);
        return new C(i);
    }
}

I also looked at what g++ does with equivalent C++, and found that it doesn't warn.

As of a few minutes ago, when I had a chance to run the regression tests and found that the change doesn't break anything, CVS Jikes doesn't warn either.


Save As...

There aren't many things I hate about Mac OS, but there's one thing that drives me mad. If I say save TagsUpdater.patch in /tmp, I mean it. And I'd much rather be asked if I really want to replace the file of that name that's already there than have Mac OS silently pick a different name and use that instead. And not even own up to what it's done!

So the file I asked to be created is there, but it has different contents. And the content I wanted is there too, but with a name different than the one I thought we agreed to. Which can be really confusing and annoying when the file in question is a patch.

Damn you, whoever thought up this idiocy! How am I supposed to trust something that lies to me?


Action replay

I've just added my blog to java.blogs and I'm feeling kind of guilty that most of my recent posts have been about software in general rather than specifically about Java. So, in an effort to convince people that there has been, and will continue to be, serious Java stuff here, here are a few of my favorites from last month:

[Unfortunately, Blogger's busted again for Mac users, so I can't use "Preview" in Camino, and can't publish in Safari: it gives an error when you hit "Publish Post", but does actually do the next step; you just have to start Camino to publish. So right now I'm living without a preview. Hopefully this will come out okay.]

Tracking all instances of a class

In Diagnosing AWT thread issues, I made use of the fact that java.awt.Frame gives you access to all the extant frames. Last night, while adding a menu item to JTextComponentSpellingChecker to accept a spelling, I needed something similar: a way to tell all spelling checkers to re-check their documents.

One way to do this would have been to have a static listener list and static add and remove methods. I didn't want to do this because I'm sick of writing listener list code and wanted to solve this problem once and for all, and because with no C++-like destructors in Java, removing yourself isn't as comfortable as I'd like it to be. One of the goals of my spelling checker (like most of my library code) is that the behavior should be as close to 'free' as possible. If it takes more than a single line to add all the spelling checking functionality to a JTextComponent, I would consider that a failure.

I thought I knew what I wanted: a mapping from Class to Set<WeakReference>, and one method to add a maplet, and another to get a list of the instances of a given class. I thought I'd check out Frame first, though, and see if there was anything I should be aware of.

There are three methods and one field involved. The methods add to, remove from, and return a copy of the list.

The first unusual aspect is the field that each Frame has:

transient private WeakReference weakThis;

This is the WeakReference that's kept in the list, and the instance keeps hold of it so that Frame.finalize can remove it from the list. The Frame itself isn't the problem: it can be garbage collected regardless. It's the WeakReference instance that would otherwise be leaked. Two weak references to the same object are not equal (WeakReference doesn't override equals), so a new one couldn't be used to find the old one in the list. (I don't know why they didn't want to iterate over the list, invoking WeakReference.get and using ==.)
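Because WeakReference inherits Object's identity-based equals, two references to the same object only compare equal if they're the same WeakReference. Here's a sketch of the alternative Frame could have used: find entries by comparing WeakReference.get() with ==. WeakRegistry is a hypothetical name, not Frame's actual code; as a bonus, the removal loop also drops entries whose referents have already been collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class WeakRegistry {
    private final List<WeakReference<Object>> references = new ArrayList<>();

    synchronized void add(Object o) {
        references.add(new WeakReference<Object>(o));
    }

    // Compare referents with ==; a fresh WeakReference to o would never be
    // equal to the one already in the list.
    synchronized void remove(Object o) {
        for (Iterator<WeakReference<Object>> it = references.iterator(); it.hasNext(); ) {
            Object referent = it.next().get();
            if (referent == o || referent == null) {
                it.remove();
            }
        }
    }

    synchronized int size() {
        return references.size();
    }
}
```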

An unusual aspect of the implementation, to my eyes, is that the methods don't use the synchronized modifier. Why not? Because they're not static, so a synchronized modifier would lock the individual Frame rather than anything shared by all frames. Why aren't they static? Because they access weakThis. One design decision forces another (although not really; weakThis could always be passed as a parameter). Presumably for reasons of consistency, the getFrames method, which is static, also uses a synchronized block rather than the synchronized modifier.

The really weird bit is getFrames. It has to filter out the weak references that now point to null. It does this by taking each instance from its list, and inserting it into a Frame[], and then copying the used part of that array into a new array of the right size. This is how we used to program, before Java got to be as fast as it is today. We used to worry about each and every allocation, and we used to use primitive arrays and System.arraycopy. To be honest, I'd almost forgotten!

Given that I can't imagine this being used in performance-critical code, I went with a plain ArrayList, because that lets me say in 10 lines what Sun's code from the days of Java 1.2 took 35 lines to say. Being able to see an entire method at a glance is worth its weight in gold.

I toyed with the idea of having something you subclass if you want your class' instances to be tracked; it would have worked fine for the application I had in mind, but it wouldn't have been possible to use anything similar for Frame. It's usually best to avoid subclassing if you can, even in languages with multiple inheritance.

I'm also planning on waiting until leaking WeakReference instances becomes a problem before doing anything about it. But I don't want explicit removal. I'd rather have getInstancesOfClass use a proper Iterator and remove ones I find pointing to null. That, or have a daemon thread.

Anyway, here's my current general-purpose implementation, in its entirety:

package e.util;

import java.lang.ref.*;
import java.util.*;

/**
 * Lets you keep track of all the instances of a given class. Just add a call
 * to InstanceTracker.addInstance to the class' constructor, and you can
 * later use InstanceTracker.getInstancesOfClass to get a list of the instances
 * that are still extant.
 * @author Elliott Hughes
 */
public class InstanceTracker {
    private static final HashMap CLASS_TO_INSTANCES_MAP = new HashMap();

    public static synchronized void addInstance(Object instance) {
        ArrayList instances = instancesOfClass(instance.getClass());
        instances.add(new WeakReference(instance));
    }

    private static synchronized ArrayList instancesOfClass(Class klass) {
        ArrayList instances = (ArrayList) CLASS_TO_INSTANCES_MAP.get(klass);
        if (instances == null) {
            instances = new ArrayList();
            CLASS_TO_INSTANCES_MAP.put(klass, instances);
        }
        return instances;
    }

    public static synchronized Object[] getInstancesOfClass(Class klass) {
        ArrayList result = new ArrayList();
        List weakReferences = instancesOfClass(klass);
        for (int i = 0; i < weakReferences.size(); ++i) {
            WeakReference weakReference = (WeakReference) weakReferences.get(i);
            Object instance = weakReference.get();
            // Skip references whose instances have been garbage collected.
            if (instance != null) {
                result.add(instance);
            }
        }
        return result.toArray(new Object[result.size()]);
    }

    private InstanceTracker() {
    }
}

This is now part of the library available from my home page. As is the JTextComponent spelling checking.
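Here's a sketch of the Iterator-based pruning I said I'd prefer to explicit removal. PruningInstanceTracker is a hypothetical name, and the generic List<T> return type is a change from the e.util version: getInstancesOfClass removes each WeakReference it finds pointing to null, so dead entries are cleaned up the next time anyone asks for that class's instances.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class PruningInstanceTracker {
    private static final Map<Class<?>, List<WeakReference<Object>>> CLASS_TO_INSTANCES_MAP =
            new HashMap<>();

    private PruningInstanceTracker() {
    }

    public static synchronized void addInstance(Object instance) {
        instancesOfClass(instance.getClass()).add(new WeakReference<Object>(instance));
    }

    // Callers already hold the class lock.
    private static List<WeakReference<Object>> instancesOfClass(Class<?> klass) {
        List<WeakReference<Object>> instances = CLASS_TO_INSTANCES_MAP.get(klass);
        if (instances == null) {
            instances = new ArrayList<>();
            CLASS_TO_INSTANCES_MAP.put(klass, instances);
        }
        return instances;
    }

    public static synchronized <T> List<T> getInstancesOfClass(Class<T> klass) {
        List<T> result = new ArrayList<>();
        Iterator<WeakReference<Object>> it = instancesOfClass(klass).iterator();
        while (it.hasNext()) {
            Object instance = it.next().get();
            if (instance == null) {
                it.remove(); // the instance is gone, so drop its WeakReference too
            } else {
                result.add(klass.cast(instance));
            }
        }
        return result;
    }
}
```

The daemon-thread alternative would typically block on a java.lang.ref.ReferenceQueue, passing the queue to each WeakReference's constructor so that cleared references are delivered to it.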


Arts student logic

Slashdot had a link to a vacuous page about Java's recent comeback that contained a great example of why we shouldn't let arts students loose with numbers:

Before the inevitable complaints ("But it never went anywhere!") start, let's remember that everything is relative. A "Googlefight" on, say, Java vs .NET tells us that all has not necessarily gone Java's way just recently. A "mere" 66 million "Java" hits...versus 388 million for "NET" - but that may all be about to change.

By this logic, the THE operating system (the first operating system to use semaphores) is more popular than Linux. Indeed: a mere 95 million "Linux" hits versus 5.8 billion for "THE" — but that may all be about to change.

Here's a rule of thumb that many – arts students and science students alike – should bear in mind: if a result seems unlikely, it's probably wrong.

iPod update 3.0.1

I just updated the software on my iPod. I was involved for two or three seconds: after clicking on "Update", I watched a progress bar flash by (FireWire is fast), and that was it. The iPod rebooted itself, and booted slowly with a progress bar. When I say 'slowly', it was still quicker than my BlackBerry boots under normal circumstances. This, RIM, is how to do it. Be quick, ask no questions, provide good feedback — which part of this is non-obvious?


24 Hours

There are 24 hours in a day. Not two sets of 12 hours.

It might have made sense to pretend that there are two 12-hour halves to the day when analog clock faces were all we had, but I doubt it. I've seen analog clocks with a single hand that display how far through the day we are. If what you're doing is unimportant enough that you can use an analog timepiece, what do you care if you can't distinguish seconds or even minutes? That analog clock's probably not NTP synchronized anyway, so the extra precision won't give you any more accuracy.

If you tell Evolution to use the 24 hour clock for its calendar it will do so, though that isn't the default setting. Unfortunately, Evolution will continue to use the 12 hour clock for your mail. So sent/received times use the 12 hour clock, and there's no way to change that.

Blogger lets you choose a date format for posts (as you see, I use ISO 8601 dates), but it doesn't automatically use that format for its interface. At the bottom of the text area I'm composing this in (which, for reasons I don't understand given the really cool stuff DHTML/CSS gurus can do, doesn't stretch to anywhere near the bottom of the window) there's an option to "Change Time & Date" that reads 11:00 PM Aug 12 2004. The rest of the interface is similar: 12 hour clocks and weird US date formats. At least I'm a native English speaker and the month is given as a word. Otherwise I'd be really confused.

Mac OS lets me ask for metric units, and ISO 8601 dates and 24-hour times, but it insists on a distinction between "long dates" and "short dates", and won't let me specify ISO 8601 for the latter. The nearest I can get is 2004-Aug-12. And not all applications pay attention to these settings. Address Book, for example, shows birthdays in the form 11 August 2004 (for someone born yesterday).

It doesn't look like it's going to get better any time soon, either. The movie "I, Robot" relies in part on us using the 12 hour clock in 2035. Will Smith is almost killed because of this system that's out of date even now.

We have 31 years left in which to prevent this. Fix these broken programs now, and stop writing new ones.
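In Java, at least, getting this right is a one-liner. A minimal sketch using java.time (Java 8 and later, so long after this was written): DateTimeFormatter.ISO_LOCAL_DATE gives ISO 8601 dates, and the pattern letters HH give the 00-23 hour. IsoTimes is a hypothetical helper, and the space between date and time is a readability choice; strict ISO 8601 would use a "T".

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class IsoTimes {
    // "uuuu" is the proleptic year; "HH" is the hour of day, 00-23.
    private static final DateTimeFormatter DATE_AND_24_HOUR_TIME =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm");

    private IsoTimes() {
    }

    // An ISO 8601 calendar date, e.g. for a birthday.
    static String date(LocalDate date) {
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // An ISO 8601-style date followed by a 24-hour time.
    static String dateAndTime(LocalDateTime dateTime) {
        return dateTime.format(DATE_AND_24_HOUR_TIME);
    }
}
```

With this, Blogger's 11:00 PM Aug 12 2004 would come out as 2004-08-12 23:00, and the Address Book birthday as 2004-08-11.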


Little touches: Safari downloads

So I'm downloading Apple's Java 1.4.2 update 1, and I've got Safari's "Downloads" window open, but I can also see enough of the desktop to see the icon for the file that's being downloaded. And I can see that the icon is a generic file icon but with a Safari logo and a progress bar. And the progress bar is changing as the file downloads.

Implementing stuff like that is what makes programming such fun.

Software Update versus cron

Don't let cron tidy /tmp while Software Update is running. Guess where it downloads the updates to?

The error message is less helpful than it could be; it asks you to check that you have write permission on a directory in /tmp that doesn't exist. But "directory in /tmp that doesn't exist" is quite a good clue as to what's gone wrong.


Writing iTunes' XML

Is iTunes' "iTunes Music Library.xml" write-only from iTunes' perspective? I've found that bad metadata is annoying even for non-classical music, and I've got a couple of CDs where the artist and song name are in each others' fields. I've also got several collection CDs where the artist and song name have both been encoded in the song name field, with "Artist" given as "Various".

I decided that the easiest way to fix problems where fields are transposed would be to use a text editor. iTunes' interface isn't designed for large numbers of edits, and it has no direct support for fixing this (surprisingly common) particular class of error.

The trouble is, I can't make iTunes read its XML file. Not even if I also change the modified dates (in ISO 8601 format) for the affected entries. And if I rm the binary file iTunes seems to really use, it re-scans the .mp3 files in its folder, but not the ones that live elsewhere. Which means a bunch of stuff (basically, the non-ripped stuff) disappears.

Google gave no obvious indication that anyone's got iTunes to read its XML file; just that they've been able to get their own programs to read it. This is unfortunate because it looks like I'll have to do a lot of manual fixing of metadata using iTunes' less-than-stellar built-in facilities.

Alternatively, I could write a script to parse the XML file, make sure that all the files not in the iTunes folder are moved in, update the tags in the files themselves based on the XML data, delete iTunes' binary library file and start iTunes to do a re-scan. But that's a lot more involved than what I thought I was letting myself in for.
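The parsing part of that script is the easy bit. Here's a sketch that reads the Tracks section of the XML file into one map per track, using only the JDK's DOM parser. It assumes the usual property-list layout, where a dict holds alternating key and value elements and the top-level "Tracks" key maps track IDs to per-track dicts; ITunesLibraryReader is a hypothetical name. It doesn't touch the tags in the music files, which is the hard part.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ITunesLibraryReader {
    private ITunesLibraryReader() {
    }

    // Returns one map per track, e.g. "Name" -> "...", "Artist" -> "Various".
    static List<Map<String, String>> readTracks(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Resolve the plist DOCTYPE to nothing rather than fetching Apple's DTD.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Element plist = builder.parse(source).getDocumentElement();
        List<Map<String, String>> tracks = new ArrayList<>();
        List<Element> top = elementChildren(plist);
        Element tracksDict = top.isEmpty() ? null : valueForKey(top.get(0), "Tracks");
        if (tracksDict == null) {
            return tracks;
        }
        // "Tracks" maps each track ID key to a per-track dict.
        List<Element> entries = elementChildren(tracksDict);
        for (int i = 0; i + 1 < entries.size(); i += 2) {
            Map<String, String> track = new LinkedHashMap<>();
            List<Element> fields = elementChildren(entries.get(i + 1));
            for (int j = 0; j + 1 < fields.size(); j += 2) {
                track.put(fields.get(j).getTextContent(), fields.get(j + 1).getTextContent());
            }
            tracks.add(track);
        }
        return tracks;
    }

    // In a plist dict, keys and values alternate as sibling elements.
    private static Element valueForKey(Element dict, String key) {
        List<Element> children = elementChildren(dict);
        for (int i = 0; i + 1 < children.size(); i += 2) {
            if (key.equals(children.get(i).getTextContent())) {
                return children.get(i + 1);
            }
        }
        return null;
    }

    // Element children only, skipping whitespace text nodes between them.
    private static List<Element> elementChildren(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }
}
```

From there, finding the collection tracks would be a matter of looking for an Artist of "Various" and a Name containing some separator, though which separator depends on how each CD was encoded.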

So why the XML file? Nothing more than buzzword compliance?



A mixed start for my iPod. It looks great, though the metal half of the case isn't obviously a good idea; the plastic half is much easier to keep in pristine condition. I like the way the backlight comes on instantly but fades out. And the white backlight (with no buzzing) is far classier than the BlackBerry's buzzy green backlight. The font's nice and you soon get used to the interface.

The first trick was working out that I was supposed to rub my finger around the gray circle rather than press the back/forward buttons.

The second was working out the volume control. I failed to find it amongst the menu items when I was looking for it, but my finger accidentally found it as I handed the iPod to my mum for a listen. This resulted in some head-scratching as I tried to turn the volume back up to an audible level!

iTunes just did the right thing, both the first time I connected my iPod and also the second time, after I'd removed/edited a few songs. I wish it was clearer to me how/if I'll be able to work with my current situation of two Macs with two different iTunes music libraries: one for classical music and the other for dance/trance/pop.

The bad part is that it's only a day old and it's crashed once already. It hung with the drive spinning. I couldn't get it to respond to me, so I followed the instructions in the manual to reboot it. Instructions which seemed a bit awkward — do I really have to be near a wall socket with the power adapter to hand? But then, I shouldn't ever have to do this again. Right?

Drilling down to the track you want to play really works well, at least when the metadata is good. I'm going to have to do some work on my classical library.

My first impression, though, assuming that it isn't going to hang on a regular basis, is that everyone who has any kind of music collection should have an iPod. This is the portable implementation music collections have been waiting for, just as iTunes (with its search-as-you-type) is the implementation that sessile music collections have been waiting for.


Twelfth Night act 3 scene 1

I try not to listen to song lyrics. They annoy me by their silliness, or confuse me if I try to extract meaning from their meaningless twitter. I excuse opera and Lieder because with them I get to practice a language other than English, but they're every bit as bad as popular music.

One song I've particularly enjoyed lately (and I don't care to ask Google how far behind the times this makes me; I think it's a couple of years old now) is a vocal trance version of Mario Lopez' "Blind". I thought the lyrics went (in their entirety) "Because you can't turn a drop of water // Into an ocean".

This, I thought, was a very succinct way of putting the point made by Viola in Twelfth Night:

VIOLA I pity you.

OLIVIA That's a degree to love.

VIOLA No, not a grize; for 'tis a vulgar proof,
That very oft we pity enemies.

(I was embarrassed to find myself at a concert the other week talking about the instrument, the viola, using the pronunciation of the Shakespearean heroine Viola. I'm used to using the operatic pronunciation of Desdemona, but this is worse!)

Anyway, Google says I'm wrong. Google thinks it's can, not can't. It's not sure though, with only a handful of matches in 4.2 billion pages.

I'm not sure what "Because you can turn a drop of water // Into an ocean" is supposed to mean. Maybe they meant "in to" instead? Or maybe we're supposed to take it literally, and picture Ray Mears making practical use of osmosis.

This is what I meant about confusing myself. I can't help but get sucked into the exegesis.

I wonder if there's a self-help group?

GNOME Panel and ~elliotth/

I'd long been embarrassed by the absence of any Java equivalent to NSString's stringByExpandingTildeInPath, which returns a string made by replacing any initial "~" or "~user" component with the corresponding home directory. This is useful because it and its companion, stringByAbbreviatingWithTildeInPath, let the user use the same paths within your application that their shell lets them use.

Eventually, it annoyed me so much that I wrote something to translate "~" to System.getProperty("user.home"). And then I found myself assuming that because "~" worked in my Java applications, "~user" should work too. So I came up with code that assumes that "~user" is equivalent to "~/../user/", and for some time now I've been happy.
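Here is a minimal sketch of that approach, assuming Unix-style "/" separators. The class and method names (TildePaths, expandTilde, abbreviateWithTilde) are hypothetical, made up for illustration. Like the heuristic described above, it treats "~user" as "~/../user" rather than looking the user up properly, and the abbreviation method is only a hypothetical companion in the spirit of stringByAbbreviatingWithTildeInPath.

```java
import java.io.File;

public class TildePaths {
    // Expands a leading "~" or "~user" component (hypothetical helper).
    // ASSUMPTION: "~user" is a sibling of the current user's home directory,
    // i.e. "~/../user". This is a heuristic, not a real password-database lookup.
    public static String expandTilde(String path) {
        if (!path.startsWith("~")) {
            return path;
        }
        String home = System.getProperty("user.home");
        int slash = path.indexOf('/');
        String user = (slash == -1) ? path.substring(1) : path.substring(1, slash);
        String rest = (slash == -1) ? "" : path.substring(slash);
        if (user.isEmpty()) {
            return home + rest; // "~" or "~/..." means our own home directory.
        }
        String parent = new File(home).getParent(); // e.g. "/home" for "/home/alice"
        if (parent == null) {
            parent = ""; // Home is the root directory, so "~user" becomes "/user".
        }
        return parent + "/" + user + rest;
    }

    // Hypothetical companion: replaces a leading home-directory prefix with "~".
    public static String abbreviateWithTilde(String path) {
        String home = System.getProperty("user.home");
        if (path.equals(home)) {
            return "~";
        }
        if (path.startsWith(home + "/")) {
            return "~" + path.substring(home.length());
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(expandTilde("~/Projects"));
        System.out.println(expandTilde("~elliotth/Projects/edit/edit"));
        System.out.println(abbreviateWithTilde(expandTilde("~/Projects"))); // round-trips to "~/Projects"
    }
}
```

Checking for home + "/" rather than just home keeps a sibling directory such as "/home/alicex" from being mistaken for a path inside "/home/alice".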

I might have preferred that Java did this for me. I might have preferred that File was called something more honest like Pathname, and was a bit more useful (some understanding of symbolic links, say). But given that I'm not in a position to fix the JDK, I was relatively content.

I knew Objective-C programmers had it better, and I knew GTK+ programmers had it worse. (Where did the plus come from? The same place as "Professional" or "Advanced" in conjunction with MS Windows, I suppose.) But I assumed that they'd at least copy and paste the tilde-handling code into each of their great fat C slugs.

Not GNOME Panel, though. I tried to launch "~elliotth/Projects/edit/edit" and was informed that it didn't exist. I knew it did. But I checked anyway. I'm a developer; I make a living from questioning my assumptions. It existed. "Interesting", I thought, "that it doesn't show me the pathname it handed to exec; it's generally good practice to present user-friendly forms to the user, but in an error message you should show exactly what you're using ... maybe that is what they're using?"

They were. Nice! By the time Linux is ready for the desktop (and it will be years yet), it won't be recognizably Unix any longer. And those of us who love Unix will be running Mac OS, I guess.

Which is fine by me.


When /./ should match '\n'

Every now and again, I come across a situation where I want to match any character, including '\n'. It's happened often enough that I now recognize, as I type ".", that it isn't what I want. Usually, I end up solving my problem a different way.

The other day, though, I was having trouble processing a file. Henry Spencer's POSIX regular expressions library contains a file regcomp.c that has a C-style comment containing a brace. My code was getting confused by this, and it looked like the easiest solution was to strip out C-style comments, like I was already stripping out escaped characters, character and string literals, and C++-style comments.

C++ comments are easy, because you actually want anything but a newline. I could have written .* instead of using a character class here, but I felt that was less obvious than being explicit:

text = text.replaceAll("//[^\\n]*", "_"); // Remove C++ comments.
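To compare the explicit character class with the terser .* alternative, here's a small sketch (the class and method names are made up for illustration). One subtlety worth knowing: without DOTALL or UNIX_LINES, Java's "." excludes every line terminator, including '\r', so the two patterns agree on '\n'-terminated text but not on "\r\n" text.

```java
public class CppComments {
    // The explicit form: "//" followed by anything but a newline.
    static String stripExplicit(String text) {
        return text.replaceAll("//[^\\n]*", "_");
    }

    // The terser alternative: without DOTALL, '.' already refuses to match '\n'
    // (and also '\r' and the other line terminators).
    static String stripWithDot(String text) {
        return text.replaceAll("//.*", "_");
    }

    public static void main(String[] args) {
        String unix = "int x; // a { brace\nint y;";
        System.out.println(stripExplicit(unix).equals(stripWithDot(unix))); // true
        String dos = "int x; // a { brace\r\nint y;";
        System.out.println(stripExplicit(dos).equals(stripWithDot(dos)));   // false: '.' stops at '\r'
    }
}
```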

C comments are trickier, though, because they can span multiple lines. What you want is Java's Pattern.DOTALL flag, but it would be a pain to switch from String.replaceAll to an explicitly compiled Pattern just to use the flag. Luckily, you can use an embedded flag expression, (?s), which turns on DOTALL mode from that point in the pattern onwards:

text = text.replaceAll("/\\*(?s).*?\\*/", "_"); // Remove C comments.
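A small sketch showing that the embedded (?s) behaves like Pattern.DOTALL, and that without either, a multi-line C comment survives untouched. The class and method names are hypothetical; Pattern.compile(String, int), Pattern.DOTALL and Matcher.replaceAll are standard java.util.regex API.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Without DOTALL, '.' refuses to match '\n', so a multi-line comment never matches.
    static String withoutDotAll(String text) {
        return text.replaceAll("/\\*.*?\\*/", "_");
    }

    // The embedded flag expression (?s) turns on DOTALL from that point in the pattern.
    static String withEmbeddedFlag(String text) {
        return text.replaceAll("/\\*(?s).*?\\*/", "_");
    }

    // The equivalent using the Pattern.DOTALL constant, which needs an explicit Pattern.
    private static final Pattern C_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static String withFlagConstant(String text) {
        return C_COMMENT.matcher(text).replaceAll("_");
    }

    public static void main(String[] args) {
        String source = "int x; /* a {\n brace */ int y;";
        System.out.println(withoutDotAll(source));    // unchanged: the comment spans a newline
        System.out.println(withEmbeddedFlag(source)); // int x; _ int y;
        System.out.println(withFlagConstant(source)); // int x; _ int y;
    }
}
```

The reluctant .*? matters as much as the flag: a greedy .* would run from the first "/*" to the last "*/" in the file, swallowing any code between two comments.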

That's all there is to it. And I'm sure I've known this before. I'm hoping that writing it down will help me remember, but I'm not sure I'm not more likely to forget something after I've written it down. And Google isn't indexing me yet.


Evolution search

I've been meaning to say something nice about Evolution for a while. I keep talking about the bad bits, and I never mention the good bits. One thing I wanted to mention was the check-as-you-type spell-checking, which is okay: nothing special, but MS Outlook doesn't do it for me (though I'm told newer versions do).

I also wanted to mention the search, which I was going to describe as good. It's certainly nice and fast. And it'll let you search the entire message, even though that isn't the default (which is odd, because searching entire messages is perfectly fast enough). People who don't remember old Unix mailers' grep-like speed, and have only known MS Outlook, have no idea how fast a decent mailbox search is.

But it won't let me do what Apple's Mail does by default: search entire messages in all mailboxes. In fact, it won't let me do any kind of search across all mailboxes. I'm not sure why I should care which mailbox a message is in (and why I couldn't just ask to restrict my search on the rare occasion when I might care), but Evolution insists.

This wasted my time this morning: it found plausible mails in "Trash", but not the one I'd been led to believe I should have received. That one was still in my "Inbox", a much smaller mailbox than the one it had searched in no noticeable time.

Have some people learned nothing about why Google is so great?


Everyone hates classical music

I've finally got round to ripping my classical music collection with iTunes. I'd resisted for a long time because the CDDB – though not too bad for dance/trance – is full of the lowest quality data imaginable for classical music CDs.

A glance at the iTunes window to the right shows *"Klavierksonata" and *"Introduzione, Meastoso Ed Adagio" next to one another. So there are spelling mistakes.

The Klaviersonata in question, Beethoven's "Appassionata", has three different artists: "Allegro Assai", "Andante con moto", "Allegro , ma non troppo". So there is some confusion about capitalization, spacing, and what to do with a movement name.

Because most classical music wasn't written by native English-speakers, there are a lot of non-ASCII characters needed. Unfortunately, CDDB seems totally ignorant of encoding. Sometimes (presumably when it was entered as ISO-Latin-1) it's fine, but often non-ASCII characters are mangled or simply missing. Other times, the clueless have stripped accents themselves, presumably not realizing their importance or not knowing how to type them. (I have to admit that I couldn't face entering the data for ΜΙΚΡΟΥΤΣΙΚΟΣ ΘΑΝΟΣ' "The Return of Helen".)

And I hope you're not an opera fan, or a fan of longer works. If you have anything split over multiple CDs, you'll find that completely different people will have entered the details for each CD, and they'll differ wildly in spelling ability, degree of understanding of the words they're copying from the back of the CD case, ideas about which bit of information goes in which field (confusing "Artist" and "Composer" is particularly common), and even what the "Album" is called. Woe betide you if you have 20 CDs of Haydn string quartets that you were hoping would appear with any kind of uniformity.

CDDB hates classical music.

And don't try to be clever and use iTunes to change all the tracks on multiple CDs at once; if you do, it'll mix all the tracks up and you'll have to rip them again. (There may be another way to repair the damage, but I haven't found one.) I foresee some editing of the XML metadata directly. Grim.

iTunes hates classical music.

Of course, iTunes' killer feature is the "Search" field. It's the only way to make 3497 songs (12.1 days, 19.63 GB of music) navigable. You can get by with it if you treat each row in the table of "songs" as free-form text, because most of the information is usually in there somewhere. You might have a Naxos recording of Brahms' first symphony listed with nothing to suggest that it's a symphony or by Brahms, but you can still find most stuff.

Until, presumably, you stick your music on an iPod. At which point the lack of a keyboard kills you. You can no longer rely on the Google approach to finding your music. You need to go through the hierarchy. Except it's busted because someone out there thinks the "Artist" field for your favorite recording of Berlioz' Symphonie Fantastique is "Jhon Eliot Gardinder". I kid you not. They missed the "Sir", the scum!

iPod hates classical music.

The worst part is that the problem's not easy to fix. Neither in terms of "what should I do?" nor "how should I do it?". Part of the reason that the data is so bad is that it's not obvious how you should encode the information about a classical piece. The fields, the idea of getting by with just fields and no higher-level groupings (no "this is a movement of a string quartet, this is a string quartet"), and the user interface all suggest that no-one thought of classical music. And even if you could get everyone to agree, how would you clean up CDDB?

The "hating" of classical music here is mostly transitive, though. Suppose CDDB weren't full of rubbish. Then iTunes wouldn't require you to fiddle with "Album" titles, and you wouldn't find out that it mixes up the tracks if you modify more than one CD at once. And you wouldn't wish quite so strongly that the iPod had a keyboard, because – though it might never work as well as a keyboard – you would at least be able to find stuff if you tried hard enough.

In the meantime, I can't help wishing that iTunes had some way of letting me tidy up large amounts of metadata. If you're editing a field in-place rather than in the "Get Info" dialog, you can't even move directly to editing the next field, as you would in a spreadsheet, say. Just that would ease a lot of my pain right now.