Far-east Asian fonts with Java 7 on Ubuntu

If you read JDK6の新しい国際化機能についての記事, which my Mac reliably informs me means "The article concerning the internationalization function whose JDK6 is new", and you're using Java 7 on your Ubuntu box (as I am), you might have wondered why this guy's fonts are working and yours aren't.

Step 1: Actually install some non-English language support.
Choose "System" > "Administration" > "Language Support" from the menu, and check the languages you're interested in. This will net you all kinds of things (such as spelling checker dictionaries), but most importantly, this is how you make sure you have the right fonts actually installed. By default, if you chose "English" when you installed, you'll only have support for English. (Fonts supporting English will typically cover other European languages too, but you won't get extras such as the spelling dictionaries for those languages unless you explicitly select them.)

Step 2: Tell Java to make use of your new fonts.
In the olden days, when everyone only read and wrote New Jersey Unix English and encoded it in ASCII, fonts would contain a glyph for every one of those glorious 95 printable characters, and maybe one extra for all the unprintable ones. 96 is a nice small number, but when Unicode came along, 65,536 seemed like a lot of glyphs. And a German font designer was unlikely to have designed new Hangul glyphs as part of his font, while a Korean font designer probably didn't worry too much about the glyphs needed for English.

The composite font (or "logical font") was invented to make up for this. Pike and Thompson's "Hello World or Καλημέρα κόσμε or こんにちは 世界" was the first place I read about this idea. (TeX had used a more general form of the same idea, virtual fonts, for some time, but they were significantly more complex.)

The idea of a composite font is that it assigns different subfonts (or "physical fonts") to use for different Unicode ranges. So you get your Latin range from this physical font but your Greek range from this physical font and so on. Sometimes these physical fonts are designed to look good together, and sometimes they're not. But an ugly juxtaposition is better than not being able to see the character at all.
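As a rough illustration of that idea (not how Java's own CompositeFont is implemented), a composite font can be thought of as a lookup from a character's script to the name of a physical font. The class name and all four font names below are hypothetical placeholders, and HAN is mapped to the Japanese font purely for the sake of the example.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.Map;

// A toy "composite font": picks a physical font name per Unicode script.
class CompositeFontSketch {
    // Hypothetical font names; substitute fonts that are installed on your system.
    static final String DEFAULT_FONT = "Latin Sans";
    static final Map<UnicodeScript, String> BY_SCRIPT = new EnumMap<>(UnicodeScript.class);
    static {
        BY_SCRIPT.put(UnicodeScript.LATIN, "Latin Sans");
        BY_SCRIPT.put(UnicodeScript.GREEK, "Greek Sans");
        BY_SCRIPT.put(UnicodeScript.HIRAGANA, "Japanese Gothic");
        BY_SCRIPT.put(UnicodeScript.KATAKANA, "Japanese Gothic");
        BY_SCRIPT.put(UnicodeScript.HAN, "Japanese Gothic");
        BY_SCRIPT.put(UnicodeScript.HANGUL, "Korean Gothic");
    }

    // Returns the physical font to use for one code point, falling back to
    // the default for scripts (and punctuation, digits, spaces) not listed above.
    static String physicalFontFor(int codePoint) {
        String name = BY_SCRIPT.get(UnicodeScript.of(codePoint));
        return name != null ? name : DEFAULT_FONT;
    }
}
```

So physicalFontFor('A') and physicalFontFor('こ') name different physical fonts, which is exactly the juxtaposition described above.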

In Java, the composite fonts aren't hard-coded. They're defined in (these days) fontconfig.properties files in the JVM's jre/lib/ directory. You can find more details of the actual search in Sun's Font Configuration Files documentation, but you'd expect (especially given the recent announcements) that there would be an Ubuntu-specific file. There isn't, at least not as of Java 7-b11. I've raised Sun bug 6551584 for this.
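If you want to see which font configuration files your own JVM has, a small sketch like the following lists them. It assumes the files sit in the lib directory under whatever the java.home system property points at (which is the jre directory for a Java 7 JDK); the class name is made up for this example, and an empty result just means nothing matched.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Lists files named fontconfig* in the running JVM's lib directory.
class FontConfigFiles {
    static List<Path> find() {
        // Assumption: font configuration files live in ${java.home}/lib.
        Path lib = Paths.get(System.getProperty("java.home"), "lib");
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(lib)) {
            return found;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(lib, "fontconfig*")) {
            for (Path file : files) {
                found.add(file);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Could not list " + lib, e);
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) {
        for (Path file : find()) {
            System.out.println(file);
        }
    }
}
```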

So, you need to get yourself a "fontconfig.properties" appropriate to Ubuntu. You can write your own, as I did the day before reading Naoto Sato's post, or you can just copy the one that's packaged with the sun-java6-jdk package.

Now, as long as you only use one of Java's logical fonts ("Dialog" or "Monospaced" and so on) you'll see glyphs for the far-east Asian languages too. If you use a physical font such as "Lucida Sans Typewriter", you'll only see glyphs for the range that the physical font covers. This is in contrast to Mac OS, where everything just works without configuration, even if you ask for a physical font.

What if my application needs a physical font with fallbacks?
If you use new Font("Verdana", Font.PLAIN, 12), say, you get just that physical font. You might think there would be a convenient way to say "I'd like a composite font, please, with sensible fallback fonts and this primary physical font", but you'd be wrong. Now, if your physical font is proportionally-spaced, there's good news. There's a private, undocumented API in sun.font.FontManager that lets you create a new CompositeFont (which is-a java.awt.Font). The composite font is hard-coded to use the "Dialog" logical font fallbacks, and it's the mechanism used by Swing to fit in with the native platform's font.

It turns out that this API is actually exposed by javax.swing.text's StyleContext class, in the shape of the getFont method. If you call StyleContext.getDefaultStyleContext().getFont(...) instead of new Font(...), you'll get a composite font. The only thing to watch out for is that StyleContext's so-called "cache" doesn't have an eviction policy. So if you're creating lots of randomized fonts, this might cause problems.
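In code, the swap looks roughly like this. The wrapper class and method names are invented for the example; the only real API used is StyleContext's public getFont(String, int, int), and whether the result really is a composite font depends on the undocumented behavior described above.

```java
import java.awt.Font;
import javax.swing.text.StyleContext;

// Thin wrapper: ask StyleContext for the font instead of calling new Font directly.
class FallbackFonts {
    static Font withFallbacks(String family, int style, int size) {
        // Relies on undocumented behavior: StyleContext is expected to hand back
        // a composite font with "Dialog" fallbacks for a physical font.
        return StyleContext.getDefaultStyleContext().getFont(family, style, size);
    }
}
```

A call such as FallbackFonts.withFallbacks("Verdana", Font.PLAIN, 12) then stands in for new Font("Verdana", Font.PLAIN, 12). Since every distinct family, style, and size combination stays in StyleContext's cache, it's best kept to a small, fixed set of fonts.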

Obviously, returning a composite font isn't documented behavior of this method, but it would hurt Swing to regress, so it's unlikely to be broken. And you can always reflect FontManager.getCompositeFontUIResource if the worst comes to the worst.

What if my application needs a monospaced physical font with fallbacks?
If your physical font is monospaced, you're in trouble. The use of the "Dialog" logical font for fallbacks is hard-coded, so you can't just ask for "Monospaced" for your fallbacks instead. I've tried using reflection on both CompositeFont.replaceComponentFont and FontManager.replaceFont, but without success.

The only reliable work-around I can think of would be to rewrite your text rendering to use multiple fonts, choosing the right one for each run of characters in any given Unicode range. (That is, duplicate the work of CompositeFont in your application's code.) I've raised Sun bug 6551615 for this.
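For what it's worth, the first half of that work-around (deciding which font each run of characters should use) might be sketched as below. This is only an outline under my own assumptions: each candidate font is represented by a coverage test on code points, listed in order of preference, and the class and method names are made up for the example. The actual drawing of each run is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Splits text into runs, each tagged with the first font that covers its characters.
class FontRuns {
    // A run of text plus the index of the font chosen for it (-1 if no font covers it).
    static final class Run {
        final String text;
        final int fontIndex;

        Run(String text, int fontIndex) {
            this.text = text;
            this.fontIndex = fontIndex;
        }
    }

    // Index of the first font whose coverage test accepts the code point, or -1.
    static int pickFont(int codePoint, List<IntPredicate> coverage) {
        for (int i = 0; i < coverage.size(); i++) {
            if (coverage.get(i).test(codePoint)) {
                return i;
            }
        }
        return -1;
    }

    static List<Run> split(String text, List<IntPredicate> coverage) {
        List<Run> runs = new ArrayList<>();
        int runStart = 0;
        int runFont = -1;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            int font = pickFont(codePoint, coverage);
            // Close the current run whenever the chosen font changes.
            if (i > 0 && font != runFont) {
                runs.add(new Run(text.substring(runStart, i), runFont));
                runStart = i;
            }
            runFont = font;
            i += Character.charCount(codePoint); // step over surrogate pairs as one unit
        }
        if (runStart < text.length()) {
            runs.add(new Run(text.substring(runStart), runFont));
        }
        return runs;
    }
}
```

With real fonts, each coverage test could be something like cp -> font.canDisplay(cp) for a java.awt.Font, with your monospaced physical font first and the fallbacks after it. You would then draw each run with its own font, advancing the x position by that run's width.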