2012-01-01

Beware convenience methods

The Android documentation links from various methods to Beware the default locale. It's great advice, but most people don't know they need it.

I've never been a great fan of convenience methods (or default parameters), and methods that supply a default locale are a prime example of when not to offer a [supposed] convenience method.

The problem here is that most developers don't even understand the choice that's being made for them. Developers tend to live in a US-ASCII world and think that, even if there are special cases for other languages, they don't affect the ASCII subset of Unicode. This is not true. Turkish distinguishes between dotted and dotless I. This means they have two capital 'I's (one with and one without a dot) and two lowercase 'i's (one with and one without a dot). Most importantly, it means that "i".toUpperCase() does not return "I" in a Turkish locale.

The funny thing is, toLowerCase and toUpperCase just aren't that useful in a localized context. How often do you want to perform these operations other than for reflective/code generation purposes? (If you answered "case-insensitive comparison", go to the back of the class; you really need to use String.equalsIgnoreCase or String.CASE_INSENSITIVE_ORDER to do this correctly.)

So given that you're doing something reflective like translating a string from XML/JSON to an Enum value, toUpperCase will give you the wrong result if your user's device is in a Turkish locale and your string contains an 'i'. You need toUpperCase(Locale.ROOT) (or toUpperCase(Locale.US) if Locale.ROOT isn't available). Why doesn't Enum.valueOf just do the right thing? Because enum values are only uppercase by convention, sadly.

(The exception you'll see thrown includes the bad string you passed in, which usually contains a dotted capital I, but you'd be surprised how many people are completely blind to that. Perhaps because monitor resolutions are so high that it looks like little more than a dirt speck: I versus İ.)

In an ideal world, the convenience methods would have been deprecated and removed long ago, but Sun had an inordinate fondness for leaving broken glass lying in the grass.

The rule of thumb I like, that would have prevented this silliness in the first place, is similar to Josh Bloch's rule of thumb for using overloading. In this case, it's only acceptable to offer a convenience method/default parameter when there's only one possible default. So the common C++ idiom of f(Type* optional_thing = NULL) is usually reasonable, but there are two obvious default locales: the user's locale (which Java confusingly calls the "default" locale) and the root locale (that is, the locale that gives locale-independent behavior).

If you think you're safe from these misbegotten methods because you'd never be silly enough to use them, you still need to watch out for anything that takes a printf(3)-style format string. Sun made the mistake of almost always offering those methods in pairs, one of which uses the user's locale. Which is fine for formatting text for human consumption, but not suitable for formatting text for computer consumption. Computers aren't tolerant of local conventions regarding the use of ',' as the decimal separator (in Germany, for example, "1,234.5" would be "1.234,5"), and it's surprisingly easy to write out a file yourself and then be unable to read it back in! (There are a lot more of these locales, though, so in my experience these bugs get found sooner. The Enum.valueOf bug pattern in particular regularly makes it into shipping code.)

Where locales are concerned, there's really no room for convenience methods. Sadly, there's lots of this broken API around, so you should be aware of it. Especially if you're developing for Android where it's highly likely that you actually have many users in non-en_US locales (unlike traditional Java which ran on your server where you controlled the locale anyway).