2005-08-24

Porting JNI code to Win32 with cygwin

Once upon a time, in the dark days when Win32 was the best place to be for a good Java experience (before Sun committed to a Linux JVM, and before Apple had switched to Unix), I was a user. As soon as Sun and Apple gave me choices, though, things went back to normal, and I went back to various flavors of Unix.

When people have asked for a Win32 port of Terminator, I've quoted Dilbert and said "here's a nickel, kid" (misquoted, as it happens: I've always said "proper computer").

My co-conspirator Martin Dorey's a bit more caring, though, and has ported Terminator to Win32. More than once.

Unlike our other projects, Terminator uses a bit of JNI. You just can't do the whole child-spawning dance in Java, so we do it in C++ and write to a private field in FileDescriptor objects allocated by the Java side. We used to have a separate executable, based on W Richard Stevens' "pty" example. Martin ported that the same week that Phil Norman rewrote it all as JNI code.
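To give a flavor of what that dance involves, here's a minimal sketch of the native half on a POSIX system: open a pseudo-terminal, fork, make the slave side the child's controlling terminal, and exec. This is not Terminator's actual code. The names PtyChild, spawnOnPty, readUntilClosed, and waitForExit are made up for illustration, the JNI glue that would hand masterFd back to a Java FileDescriptor is left out, and it's only been thought through for Linux-like systems, not cygwin.

```cpp
#include <cstdlib>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical names for illustration; not Terminator's actual code.
struct PtyChild {
    int masterFd;  // the parent's end of the pseudo-terminal
    pid_t pid;     // process id of the spawned child
};

// Starts argv[0] with a fresh pseudo-terminal as its controlling terminal.
PtyChild spawnOnPty(char* const argv[]) {
    const int masterFd = posix_openpt(O_RDWR | O_NOCTTY);
    if (masterFd == -1) {
        throw std::runtime_error("posix_openpt failed");
    }
    const char* name = nullptr;
    if (grantpt(masterFd) == -1 || unlockpt(masterFd) == -1 ||
        (name = ptsname(masterFd)) == nullptr) {
        close(masterFd);
        throw std::runtime_error("couldn't prepare the pty slave");
    }
    const std::string slaveName(name);

    const pid_t pid = fork();
    if (pid == -1) {
        close(masterFd);
        throw std::runtime_error("fork failed");
    }
    if (pid == 0) {
        // Child: start a new session so the slave can become our terminal.
        setsid();
        const int slaveFd = open(slaveName.c_str(), O_RDWR);
        if (slaveFd == -1) {
            _exit(127);
        }
#ifdef TIOCSCTTY
        // Explicitly claim the terminal on systems where open() alone doesn't.
        ioctl(slaveFd, TIOCSCTTY, 0);
#endif
        dup2(slaveFd, STDIN_FILENO);
        dup2(slaveFd, STDOUT_FILENO);
        dup2(slaveFd, STDERR_FILENO);
        if (slaveFd > STDERR_FILENO) {
            close(slaveFd);
        }
        close(masterFd);
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    return PtyChild{masterFd, pid};
}

// Collects everything the child writes until its side of the pty is closed.
std::string readUntilClosed(int masterFd) {
    std::string output;
    char buffer[4096];
    ssize_t n;
    // On Linux, read() fails with EIO once the child has gone; treat as EOF.
    while ((n = read(masterFd, buffer, sizeof(buffer))) > 0) {
        output.append(buffer, static_cast<size_t>(n));
    }
    return output;
}

// Returns the child's exit status, or -1 if it didn't exit normally.
int waitForExit(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

The important detail is the fork in the middle: everything between fork and exec has to run in the child, which is exactly the part Java gives you no way to express, and the part that depends on fork(3) working properly.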

(We realized we might be making porting to new platforms more difficult, because JNI tends to be far more intricately connected to the Java side than a separate process can be, and you lose a lot of room to maneuver when you agree to the tighter coupling. That results in problems such as having to make cygwin work in the JVM. We also knew that we would be increasing the chances that a mistake in our native code would kill the JVM. On the other side of the equation, we gained clarity, efficiency, and finer control over what was going on. One of the focuses of our C++ JNI wrappers is to ensure proper error checking. We didn't know about -Xcheck:jni at the time, and we're still not exactly sure what checks it enables. Anyway, our C++ classes constrain us to use the favored idioms.)

The first Terminator port was effected by just building with the cygwin tools. We didn't even have to stub modern POSIX functions, as we did on Mac OS 10.3, which didn't offer them.

The second port wasn't expected to be as difficult as it was. The hope was that the JNI code would just need to be built under cygwin. If you ask Google about cygwin-using JNI code on Win32, though, you'll soon learn that it isn't that easy.

Pages that mention cygwin almost all use -mno-cygwin or the mingw compiler. The only page Martin could find about building cygwin JNI DLLs (linked with cygwin1.dll) was out of date, and not convincingly correct. And anyway, he only found that while supplying me with material for this post.


  • Google's first match http://www.inonit.com/cygwin/ uses -mno-cygwin as if it's the only option.
  • The author of http://sources.redhat.com/ml/cygwin/2005-05/msg00532.html stumbled at the first problem Martin ran into (see below) but got no reply that was of any use in the general case.
  • BEA have an example http://dev.bea.com/codelibrary/code/jni.jsp that actually uses cygwin natively. It claims to have been updated in 2004, but it uses __cygwin_noncygwin_dll_entry@12 which seems to have rusted away. It talks about cygwin b20 which timestamps at sunsite.bilkent.edu.tr suggest dates from 1998. It mentions EGCS-1.1.2 which is long since obsolete. One makefile mentions jdk1.1.7A (there's a mention of 1.4.0 in the other). Martin got it to build against 1.5.0 with a few hacks, but neither of the example programs worked for him. One caused cygwin to think that something is using an incompatible copy of cygwin1.dll (though nothing is) and the other is linked with javai.dll — the JDK1.1 name for jvm.dll. They do seem to have at least an inkling of the right idea, though — that you need to use the Java Invocation API and compile your invoker with cygwin. But we're getting ahead of ourselves.


Problem one
If there's no other cygwin-using process running, then dynamically loading a DLL which depends (at load-time) on cygwin causes the process to deadlock waiting for a "calibration thread" to complete while the Win32 kernel is preventing other threads from running in this process. This is the problem the mailing-list guy had.

Problem two
Dynamically loading cygwin's DLL, or a DLL which uses it, from a non-cygwin executable overwrites the bottom 4 KiB of the stack. This is documented, and it's possible to work around it by spawning a thread to do the loading, with that thread taking measures to ensure that the overwritten stack doesn't matter. This can be done from an independent, small, simple, non-cygwin DLL that's loaded before any cygwin-using DLLs that you actually want to use. But this isn't a suitable solution because...

Problem three
...with the work-around from problem two, cygwin's fork(3) still doesn't work, and neither does other, more basic stuff, including cygwin's stdio. Which makes debugging difficult, to say the least.

(The reason why cygwin's fork(3) doesn't work is that it runs the same executable again and subverts its startup code, which it can easily do if the application is a cygwin one — the startup code calls into the cygwin DLL. fork(3)'s already so difficult to implement on Win32 that mere mortals would call it impossible. Making it work for a non-cygwin executable like the JVM would have eaten into their day of rest.)

Solution
The way to solve the cygwin JNI problem robustly is to compile a launcher executable with cygwin. The Java Invocation API works from cygwin and allows a JVM to be launched. This can then System.loadLibrary the cygwin-using JNI DLL, and that can then fork without problem.

If you look in your JDK directory, you'll find the source to the java(1) launcher. You'll only find the source to the version for the operating system you're on, though: Sun's habit of _md files for machine-dependent code means you can't compile it on just any supported OS. Moreover, they don't include all the necessary source files (though Google's cache has the missing ones). Another potential problem on the horizon is that the JDK6.0 drops don't seem to include the files. I don't know whether this is an oversight or indicative of an intention to remove them from the distribution.

Worse still, I don't really know what we're allowed to do with the example launcher source. It's nice that most of the intelligence is in the shared object/DLL, but there's still a fair bit of code in the launcher, in particular the code that locates the shared object/DLL.

But if you've got some JNI to port to Win32 and it looks like the only sensible way to do it is to use cygwin, then we think this is your only choice. Feel free to use the source to a Win32 cygwin Java launcher that's now in salma-hayek.

Terminator now works on Win32, and you can ssh(1) to a Unix machine, run vim(1), and everything works right when you resize the window. Quite a testament to cygwin, I think. And to Martin's stubbornness.