2006-10-21

Using HTML in Swing Components

If you look at Sun's "Using HTML in Swing Components" page in the Swing tutorial, you'd be forgiven for thinking that the correct way to use HTML in a Swing component (a JLabel or JButton, say) is to give it a string starting with "<html>" followed by an arbitrary snippet of HTML.

One reason why you might come to this conclusion is that it's what all of their examples do:

button = new JButton("<html><b><u>T</u>wo</b><br>lines</html>");
...
b1 = new JButton("<html><center><b><u>D</u>isable</b><br>"
...
b3 = new JButton("<html><center><b><u>E</u>nable</b><br>"

The page is allegedly copyright 2006, but given how poorly maintained the tutorial is, I suspect that date was updated by a script.

In a Java 6 source tree I happen to have lying around, several Sun classes use the same idiom: there are 13 instances of it. Seemingly all the Swing code outside javax.swing uses it, while all the code inside javax.swing uses what turns out to be the preferred idiom.

Anecdotally, the "<html>" idiom also seems to be the more common one in the other code I've seen.

But is this the right idiom?

If you follow Swing's implementation, you'll find that your "HTML" must start with exactly "<html>", with no leading whitespace, to be recognized as HTML (the comparison is case-insensitive); javax.swing.plaf.basic.BasicHTML.isHTMLString enforces that.
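As I read it, the check amounts to little more than the following. This is my own stand-alone sketch, not Sun's code (the real method is BasicHTML.isHTMLString, which you can call directly), and the class and method names are made up.

```java
public final class HtmlPrefixCheck {
    private HtmlPrefixCheck() {}

    /**
     * Sketch of the prefix test: true only if s begins with exactly "<html>",
     * ignoring case. Leading whitespace, or "<html " followed by attributes,
     * does not count.
     */
    public static boolean isHtmlString(String s) {
        // regionMatches returns false (rather than throwing) when s is shorter than 6 chars.
        return s != null && s.regionMatches(true, 0, "<html>", 0, 6);
    }

    public static void main(String[] args) {
        String[] samples = {"<html>hi", "<HTML><body>hi", " <html>hi", "<html", "plain"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isHtmlString(s));
        }
    }
}
```

The leading-space case is the one that tends to surprise people: " <html>..." is displayed as literal text.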

So starting with "<html>" is certainly necessary (there's an "html.disable" client property that you can set to Boolean.TRUE to disable HTML parsing, but no property to explicitly force HTML). The question is whether it's sufficient.
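Setting that opt-out property looks something like this. It's a minimal sketch based on my reading of the Swing source: the basic UI delegates consult "html.disable" when the text changes (so I set it before setting the text), and when they do render HTML they store the resulting View as a client property under BasicHTML.propertyKey. The class and method names are made up.

```java
import javax.swing.JLabel;
import javax.swing.SwingUtilities;
import javax.swing.plaf.basic.BasicHTML;

public final class HtmlDisableDemo {
    private HtmlDisableDemo() {}

    /** Builds a label whose "<html>..." text should be shown literally, not rendered. */
    public static JLabel literalLabel(String text) {
        JLabel label = new JLabel();
        // Opt out of HTML before setting the text, so the UI delegate sees the
        // property when it reacts to the "text" property change.
        label.putClientProperty("html.disable", Boolean.TRUE);
        label.setText(text);
        return label;
    }

    public static void main(String[] args) throws Exception {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeAndWait(() -> {
            JLabel label = literalLabel("<html><b>not bold</b>");
            // No HTML View should have been installed under the "html" key.
            boolean rendered = label.getClientProperty(BasicHTML.propertyKey) != null;
            System.out.println("HTML view installed: " + rendered);
        });
    }
}
```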

Following the implementation further down, you'll find that the HTML is parsed by javax.swing.text.html.parser.Parser, a parser which, according to its own doc comment, "attempts to parse most HTML files".

The parser keeps track of whether we're in <html>, <head>, or <body>, and uses complicated and buggy state machines to try to parse what it sees. One of the bugs (bug 4827083) means that a line starting with '/' isn't handled correctly until we're inside <body>. This bug catches me every couple of years, because I tend to display code in my Swing components, and code in many important languages often has lines starting with "//" or "/*".

But if you read the bug report, you'll see that Sun says it isn't a bug: we're supposed to use "<html><body>" as the prefix, and if we don't, Swing may silently drop all text before the end of the next tag.
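To compare the two prefixes on your own JDK, you can push both through HTMLEditorKit, which (if I'm reading BasicHTML correctly) drives the same javax.swing.text.html.parser machinery that labels and buttons use. This is a rough sketch rather than an exact reproduction of what a JLabel does, the class and method names are mine, and since I won't promise what the bare "<html>" case produces on any particular JDK, the program just prints both results side by side.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public final class PrefixComparison {
    private PrefixComparison() {}

    /** Parses html into an HTMLDocument and returns the document's plain text. */
    public static String parsedText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            // Neither should happen for an in-memory reader read at offset 0.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String code = "// hello\nint x = 1;";
        for (String prefix : new String[] {"<html>", "<html><body>"}) {
            System.out.printf("%-14s -> [%s]%n", prefix, parsedText(prefix + code).trim());
        }
    }
}
```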

I wish developers would be honest and leave bugs open, even with no real intention of fixing them, rather than closing them as "not a bug". Firefox does the right thing with the admittedly ill-formed "<html>// hello", and Swing's interpretation of that first '/' is just plain wrong.

Closing this bug without even fixing the documentation (or the tutorial) was also wrong.

Anyway. There may be other bugs in Swing's HTML parsing, but as far as I know, you're safe if you always start HTML you pass to Swing with "<html><body>".
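If you build labels from arbitrary text (code, say), a small helper keeps the prefix consistent and escapes the characters the parser would otherwise treat as markup. This is a hypothetical helper, not part of Swing; it only escapes &, < and > and turns newlines into <br>, which is enough for simple labels but isn't a general-purpose HTML encoder.

```java
public final class SwingHtml {
    private SwingHtml() {}

    /** Wraps plain text for Swing using the "<html><body>" prefix rather than bare "<html>". */
    public static String wrap(String plain) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < plain.length(); i++) {
            char ch = plain.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;  // must be escaped so entities aren't misread
                case '<': sb.append("&lt;"); break;   // would otherwise start a tag
                case '>': sb.append("&gt;"); break;
                case '\n': sb.append("<br>"); break;  // keep line breaks visible
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "<html><body>// a comment<br>if (a &lt; b &amp;&amp; c &gt; d)"
        System.out.println(wrap("// a comment\nif (a < b && c > d)"));
    }
}
```

Usage is then just new JLabel(SwingHtml.wrap(source)), and a leading "//" in the source lands safely inside <body>.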