2005-09-10

Linux Java and Apple's X11

Although I've never had any trouble running Java programs on one Linux box and displaying them on another Linux box's display via X11, I'd never tried to display on a Mac until this week. I used ssh -X to connect to a machine at work from home, and tried xclock(1). Worked fine. Then I tried the program I actually wanted: SCM's check-in tool. Its window appeared, with the correct text in the title bar, but no visible window content.

If I ran a simple program, though, such as salma-hayek's DualTimeClock (which just has a JLabel for San Jose time and a JLabel for Bracknell time in a FlowLayout), it worked fine.

Whatever the problem is, it seems to be specific to Apple's X11 server, and it doesn't happen with all Swing programs. It's not just a network thing, though, because I had the same problem the next day at work when I tried to display on a Mac there from a Linux machine connected to the same switch.

Running Java 6 build 51 didn't help, either. (I had wondered if it could be anything to do with the X11 event loop problems mentioned in Sun bug 6193066.)

I'll have to investigate further when I've got a non-Mac at home, which should be as soon as next week...

...which is now. 2005-09-18. The first thing to do was to stop using ssh -X, because with it all Ethereal could tell me was that there was encrypted traffic. Getting a naked connection to work was harder than I expected. xhost + wasn't sufficient. I know the X server on Debian Linux doesn't listen on the network by default these days, but netstat(1) showed that Apple's X11 was there on *:6000, tcp4 and tcp6. macosxhints to the rescue again, via Google, with the factoid that Apple's default IP firewall rules don't allow X11 traffic. (And Apple's firewall has no UI other than the one for configuring it, nor does it keep logs by default.)

Normally, there's no traffic on the NETGEAR LAN. The Ultra 20 is the only device, and it connects to an IMAP server once a minute, but that's about it. So seeing hundreds of packets per second once I started the Java program was a good clue that it wasn't just stuck. I left it for about 50 seconds to see if the traffic would come to a stop, and at that point a window appeared on the Mac's display. So it does actually work; it's just insanely slow.

A look at the capture shows a few TCP retransmissions, which are probably due to the wireless link between the NETGEAR and the Mac. But the real problem looks to be 6,160 ListFonts X11 requests out of the 15,760 packets. (Remember that there will be 6,160 ListFonts replies, too.)

Most of the ListFonts replies have a replylength of 0.
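
To put those numbers in proportion, here is a back-of-the-envelope calculation in Python. It assumes each ListFonts request and each reply occupies exactly one packet, and that the capture spans roughly the 50 seconds the window took to appear. Neither assumption was checked against the capture, so treat the results as rough.

```python
# Figures quoted in the post; the 50-second duration is only approximate.
TOTAL_PACKETS = 15_760
LISTFONTS_REQUESTS = 6_160
ELAPSED_SECONDS = 50


def listfonts_share(total_packets, requests):
    """Fraction of packets that are ListFonts requests or replies.

    Assumes one packet per request and one packet per reply.
    """
    return 2 * requests / total_packets


def packets_per_second(total_packets, seconds):
    """Average packet rate over the capture."""
    return total_packets / seconds


print(f"ListFonts share: {listfonts_share(TOTAL_PACKETS, LISTFONTS_REQUESTS):.1%}")
print(f"Average rate: {packets_per_second(TOTAL_PACKETS, ELAPSED_SECONDS):.0f} packets/s")
```

Under those assumptions, ListFonts round trips account for roughly 78% of the capture, and the average rate of about 315 packets per second fits the "hundreds of packets per second" seen on the LAN.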

Anyway, running java in gdb on my Ultra 20 with a breakpoint on XListFonts, I saw that most of the backtraces were identical, and were in MToolkit. The two commonest backtraces (determined unscientifically; I haven't learned dtrace(1) yet) were this one:

#0 0xee757612 in XListFonts () from /usr/openwin/lib/libX11.so.4
#1 0xee331c75 in get_font_from_maxfonts () from /usr/openwin/lib/locale/common/xomLTRTTB.so.2
#2 0xee33220f in parse_fontname () from /usr/openwin/lib/locale/common/xomLTRTTB.so.2
#3 0xee332484 in create_fontset () from /usr/openwin/lib/locale/common/xomLTRTTB.so.2
#4 0xee332bfc in create_oc () from /usr/openwin/lib/locale/common/xomLTRTTB.so.2
#5 0xee756b29 in XCreateOC () from /usr/openwin/lib/libX11.so.4
#6 0xee75665d in XCreateFontSet () from /usr/openwin/lib/libX11.so.4
#7 0xeec778b2 in AWTIsHeadless () from /usr/jdk/instances/jdk1.5.0/jre/lib/i386/motif21/libmawt.so

And this one:

#0 0xee757612 in XListFonts () from /usr/openwin/lib/libX11.so.4
#1 0xeec7e5e8 in AWTCountFonts () from /usr/jdk/instances/jdk1.5.0/jre/lib/i386/motif21/libmawt.so
#2 0xee6a1c55 in Java_sun_font_NativeFont_haveBitmapFonts () from /usr/jdk/instances/jdk1.5.0/jre/lib/i386/libfontmanager.so
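
Picking the commonest backtraces could be made less unscientific by saving gdb's output to a file and tallying the traces by their innermost frames. The Python sketch below is one way to do that, not what was actually done here. It assumes each backtrace starts with a #0 line and that frames look like the ones above. The function name tally_backtraces and the depth parameter are made up for illustration.

```python
import re
from collections import Counter

# Matches gdb frame lines such as:
#   #0 0xee757612 in XListFonts () from /usr/openwin/lib/libX11.so.4
# The address is optional, because gdb omits it for some frames.
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def tally_backtraces(log_text, depth=3):
    """Count backtraces, keyed by their innermost `depth` function names."""
    counts = Counter()
    current = []
    for line in log_text.splitlines():
        stripped = line.strip()
        match = FRAME.match(stripped)
        if not match:
            continue  # skip blank lines and any other gdb chatter
        if stripped.startswith("#0 ") and current:
            # Frame #0 begins a new backtrace, so record the previous one.
            counts[tuple(current[:depth])] += 1
            current = []
        current.append(match.group(1))
    if current:
        counts[tuple(current[:depth])] += 1
    return counts
```

Given the saved text in a string, tally_backtraces(text).most_common(2) would return the two most frequent call paths and how often each was hit.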

A quick export AWT_TOOLKIT=XToolkit later (see here), and I can display Java applications on my Mac without an insane amount of network traffic and an insanely long start-up time. At which point, I'm unconvinced that this is the problem I saw on Linux at work, because XToolkit is the default there. So now I guess I need to look at a network capture of that.

Update: Hans Liss from Uppsala University in Sweden explained the likely cause of my other problem. He points to Sun bug 4374153 where Dmitri Trembovetski mentions that you need ForwardX11Trusted set in your SSH configuration (somewhere under /etc, depending on your Unix). It turns out that some systems have this on by default while others default to off.
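
For reference, a minimal sketch of what that setting might look like in an OpenSSH client configuration. The host alias workbox is made up, and this uses the per-user ~/.ssh/config rather than the system-wide file under /etc. Check ssh_config(5) on your own system before relying on it.

```
# Hypothetical entry in ~/.ssh/config; "workbox" is a placeholder host name.
Host workbox
    ForwardX11 yes
    ForwardX11Trusted yes
```

With OpenSSH, running ssh -Y instead of ssh -X requests trusted X11 forwarding for a single connection without editing any configuration.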