2012-10-26

How (not) to use readdir_r(3)

TL;DR: always use readdir(3) instead of readdir_r(3).

I spend most of my time fixing bugs. Often I feel like there's something instructive to say, but I don't want to be that guy who's known for pointing out faults in other people's code, as if his own doesn't have bugs.

Today I have a great one, because although there's a lot of blame to go around, the bug that actually caused enough trouble to get reported is my own.

Here's an AOSP patch: libc: readdir_r smashed the stack. Some engineers point out that FAT32 lets you store 255 UTF-16 characters but Android's struct dirent only has a 256-byte d_name field, so if you have the right kind of filename on a FAT32 file system, it won't fit in a struct dirent.

But first, let's back up and look at the APIs involved here: readdir(3), readdir_r(3), and getdents(2).

Traditionally, userspace used readdir(3). You'd get a pointer back to a struct dirent that you don't own. POSIX explicitly makes the following guarantee: "The pointer returned by readdir() points to data which may be overwritten by another call to readdir() on the same directory stream. This data is not overwritten by another call to readdir() on a different directory stream.", so there isn't the usual thread-safety problem here. Despite that, readdir_r(3) was added, which lets you supply your own struct dirent, but – critically – not the size of that buffer. So you need to know in advance that your buffer is "big enough". getdents(2) is similar, except there you do pass the size of the buffer, and the kernel gives you as many entries as will fit in your buffer.

If you use a tool like Google's cpplint.py, it'll actually complain if you use readdir(3) and suggest readdir_r(3) instead. This is bad advice. For one thing, POSIX guarantees that you're not sharing some static buffer with other threads on the system. So in the typical case where your function calls opendir(3), readdir(3), closedir(3) and the DIR* never escapes, you're fine.
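
For what it's worth, that typical pattern looks something like this (a minimal sketch; the function name is mine and error handling is kept to a minimum):
#include <dirent.h>
#include <iostream>

// A typical "DIR* never escapes" use of readdir(3): the returned dirent*
// points into the DIR*'s own buffer, so there's nothing for us to size or free.
bool ListDirectory(const char* path) {
  DIR* dir = opendir(path);
  if (dir == NULL) {
    return false;
  }
  dirent* entry;
  while ((entry = readdir(dir)) != NULL) {
    std::cout << entry->d_name << std::endl;
  }
  closedir(dir);
  return true;
}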

At this point, if you've been paying attention, you'll be wondering about the size of struct dirent. Isn't readdir(3) a liability if your struct dirent isn't big enough for your file system's longest names? In theory, yes, that's a problem. In practice, readdir_r(3) is the bigger liability.

In practice, you don't have a problem with readdir(3) because Android's bionic, Linux's glibc, and OS X and iOS' libc all allocate per-DIR* buffers, and return pointers into those; in Android's case, that buffer is currently about 8KiB. If future file systems mean that this becomes an actual limitation, we can fix the C library and all your applications will keep working.

In practice, you do have a problem with readdir_r(3) because (a) you can't tell the C library how big your buffer is, so it can't protect you against your own bugs, and (b) it's actually quite hard to get the right buffer size. Most code actually just allocates a regular struct dirent on the stack and passes the pointer to that, so in practice most users of readdir_r(3) are demonstrably less safe than the equivalent readdir(3) user. What you actually have to do is allocate a large enough buffer on the heap. But how large is "large enough"? The glibc man page tries to help, suggesting the following:

           len = offsetof(struct dirent, d_name) +
                     pathconf(dirpath, _PC_NAME_MAX) + 1
           entryp = malloc(len);
But that's not quite right because there's a race condition. You probably want to use fpathconf(3) and dirfd(3) so you know you're talking about the same directory that was opened with opendir(3).
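
If you do use readdir_r(3), the race-free version of that allocation looks roughly like this (my sketch, not code from any of the projects mentioned below):
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

// Size the buffer from the directory we actually opened, via dirfd(3) and
// fpathconf(3), rather than racing against a pathconf(3) on the path.
// Returns NULL on allocation failure.
static struct dirent* AllocateDirentFor(DIR* dir) {
  long name_max = fpathconf(dirfd(dir), _PC_NAME_MAX);
  if (name_max == -1) {
    name_max = 255;  // No limit defined, or an error; take a guess.
  }
  size_t len = offsetof(struct dirent, d_name) + name_max + 1;
  return (struct dirent*) malloc(len);
}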

So let's look at Android. How many of the readdir_r(3) calls are correct? Here's an AOSP tree as of now:

~/aosp$ find . -name *.c* -print0 | xargs -0 grep -lw readdir_r | sort
./bionic/libc/bionic/opendir.cpp
./dalvik/vm/Thread.cpp
./external/bluetooth/glib/gio/gunixmounts.c
./external/chromium/base/file_util_posix.cc
./external/clang/lib/Basic/FileManager.cpp
./external/dbus/dbus/dbus-sysdeps-util-unix.c
./external/dbus/dbus/dbus-sysdeps-util-win.c
./external/linux-tools-perf/builtin-script.c
./external/linux-tools-perf/util/event.c
./external/linux-tools-perf/util/parse-events.c
./external/wpa_supplicant_6/wpa_supplicant/src/common/wpa_ctrl.c
./external/wpa_supplicant_8/src/common/wpa_ctrl.c
./hardware/libhardware_legacy/wifi/wifi.c
./libcore/luni/src/main/native/java_io_File.cpp
./system/core/debuggerd/backtrace.c
./system/core/debuggerd/tombstone.c
./system/vold/CommandListener.cpp
./system/vold/VolumeManager.cpp
~/aosp$ 
(The match in bionic is the implementation of readdir_r, and the match in clang is a comment.)

The following allocate a struct dirent on the stack: dalvik, external/bluetooth, external/chromium, external/linux-tools-perf, external/wpa_supplicant, hardware/libhardware_legacy/wifi, libcore, and system/core/debuggerd.

The following use malloc: external/dbus (uses sizeof(dirent), so this just trades the above stack smashing bug for an equivalent heap smashing bug), and system/vold, which uses the probably-right glibc recommendation.

The dalvik and libcore bugs, at least, are my fault. And for years I've changed readdir(3) to readdir_r(3) when people bring it up in code reviews, because I never had a strong argument against. But now I do. Using readdir(3) is simpler, safer, and its correctness is the C library maintainer's problem, not yours. The key to understanding why this is so (if you've skipped the rest of this article) is that the dirent* you get isn't a pointer to a regular fixed-size struct dirent, it's a pointer into a buffer that was large enough to contain the directory entry in question. We know this because the kernel getdents(2) API is sane, takes a buffer size, and won't scribble outside the lines.

2012-08-31

How to read Dalvik SIGQUIT output

If you're a long-time Java developer you're probably used to sending SIGQUIT to a Java process (either via kill -3 or hitting ctrl-\) to see what all the threads are doing. You can do the same with Dalvik (via adb shell kill -3), and if you're ANRing, the system server will be sending you SIGQUIT too, in which case the output will end up in /data/anr/traces.txt (see the logcat output for details).

Anyway, I've found that very few people actually know what all the output means. I only knew a few of the more important bits until I became a Dalvik maintainer. This post should hopefully clear things up a little.

To start with, here's an example from my JNI Local Reference Changes in ICS post:

    "Thread-10" prio=5 tid=8 NATIVE
      | group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248
      | sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304
      | schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8
      at MyClass.printString(Native Method)
      at MyClass$1.run(MyClass.java:15)

Ignore the Java stack trace for now. If there's demand, I'll come back to that later, but there's nothing interesting in this particular example. Let's go through the other lines...

First, though, a quick note on terminology because there are a lot of different meanings of "thread" that you'll have to keep clear. If I say Thread, I mean java.lang.Thread. If I say pthread, I mean the C library's abstraction of a native thread. If I say native thread, I mean something created by the kernel in response to a clone(2) system call. If I say Thread*, I mean the C struct in the VM that holds all these things together. And if I say thread, I mean the abstract notion of a thread.

"Thread-10" prio=5 tid=8 NATIVE

The thread name comes first, in quotes. If you gave a name to a Thread constructor, that's what you'll see here. Otherwise there's a static int in Thread that's a monotonically increasing thread id, used solely for giving each thread a unique name. These thread ids are never reused in a given VM (though theoretically you could cause the int to wrap).

The thread priority comes next. This is the Thread notion of priority, corresponding to the getPriority and setPriority calls, and the MIN_PRIORITY, NORM_PRIORITY, and MAX_PRIORITY constants.

The thread's thin lock id comes next, labelled "tid". If you're familiar with Linux, this might confuse you; it's not the tid in the sense of the gettid(2) system call. This is an integer used by the VM's locking implementation. These ids come from a much smaller pool, so they're reused as threads come and go, and will typically be small integers.

The thread's state comes last. These states are similar to, but a superset of, the Thread thread states. They can also change from release to release. At the time of writing, Dalvik uses the following states (found in enum ThreadStatus in vm/Thread.h):

    /* these match up with JDWP values */
    THREAD_ZOMBIE       = 0,        /* TERMINATED */
    THREAD_RUNNING      = 1,        /* RUNNABLE or running now */
    THREAD_TIMED_WAIT   = 2,        /* TIMED_WAITING in Object.wait() */
    THREAD_MONITOR      = 3,        /* BLOCKED on a monitor */
    THREAD_WAIT         = 4,        /* WAITING in Object.wait() */
    /* non-JDWP states */
    THREAD_INITIALIZING = 5,        /* allocated, not yet running */
    THREAD_STARTING     = 6,        /* started, not yet on thread list */
    THREAD_NATIVE       = 7,        /* off in a JNI native method */
    THREAD_VMWAIT       = 8,        /* waiting on a VM resource */
    THREAD_SUSPENDED    = 9,        /* suspended, usually by GC or debugger */

You won't see ZOMBIE much; a thread is only in that state while it's being dismantled. RUNNING is something of a misnomer; the usual term is "runnable", because whether or not the thread is actually scheduled on a core right now is out of the VM's hands. TIMED_WAIT corresponds to an Object.wait(long, int) call. Note that Thread.sleep and Object.wait(long) are currently both implemented in terms of this. WAIT, by contrast, corresponds to a wait without a timeout, via Object.wait(). MONITOR means that the thread is blocked trying to synchronize on a monitor, either because of a synchronized block or an invocation of a synchronized method (or, theoretically, a call to JNIEnv::MonitorEnter).

The INITIALIZING and STARTING states are aspects of the current (at the time of writing) implementation of the thread startup dance. As an app developer, you can probably just chunk these two as "too early to be running my code". NATIVE means that the thread is in a native method. VMWAIT means that the thread is blocked trying to acquire some resource that isn't visible to managed code, such as an internal lock (that is, a pthread_mutex). SUSPENDED means that the thread has been told to stop running and is waiting to be allowed to resume; as the comment says, typically as an app developer you'll see this because there's a GC in progress or a debugger is attached.

Not shown in this example, a daemon thread will also say "daemon" at the end of the first line.

| group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248

The Thread's containing ThreadGroup name comes next, in quotes.

The sCount and dsCount integers relate to thread suspension. The suspension count is the number of outstanding requests for suspension for this thread; this is sCount. The number of those outstanding requests that came from the debugger is dsCount, recorded separately so that if a debugger detaches then sCount can be reset appropriately (since there may or may not have been outstanding non-debugger suspension requests, we can't just reset sCount to 0 if a debugger disconnects).

(If there's demand, I'll talk more about thread suspension in another post, including when suspension can occur, and what suspension means for unattached threads and threads executing native methods.)

The address of the Thread comes next, labeled obj.

The address of the Thread* comes next, labeled self. Neither of these addresses is likely to be useful to you unless you're attaching gdb(1) to a running dalvikvm process.

| sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304

The kernel's thread id comes next, labeled sysTid. You can use this if you're poking around in /proc/pid/task/tid. This is usually the only useful item on this line.

The kernel's nice value for the process comes next, labeled nice. This is as returned by the getpriority(2) system call.

The pthread scheduler policy and priority come next, labeled sched. This is as returned by the pthread_getschedparam(3) call.

The cgrp is the name of the thread's scheduler group, pulled from the appropriate cgroup file in /proc.

The pthread_t for the pthread corresponding to this thread comes next, labeled handle. This is not much use unless you're in gdb(1).

| schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8

The schedstat data is pulled from the per-process schedstat files in /proc. The format is documented in the Linux kernel tree (Documentation/scheduler/sched-stats.txt):

      1) time spent on the cpu
      2) time spent waiting on a runqueue
      3) # of timeslices run on this cpu
If your kernel does not support this, you'll see "schedstat=( 0 0 0 )".

The user-mode and kernel-mode jiffies come next, labeled utm and stm. These correspond to the utime and stime fields of the per-thread stat files in /proc. On sufficiently new versions of Dalvik, you'll also see something like "HZ=100", so you can double-check that jiffies are the length you expect. (At HZ=100 a jiffy is 10ms, so the utm=12 above is roughly 120ms of user-mode CPU time.) These numbers aren't much use in isolation, except for seeing which threads are taking all the CPU time (if any).

The number of the core this thread last ran on comes next, labeled core.

2012-04-04

gettid on Mac OS

The Linux kernel has a gettid(2) call that returns the current thread's thread id. These numbers can be handy. They're useful for debugging/diagnostic purposes, they're useful in conjunction with other tools, and they're useful for pulling stuff out of /proc/<pid>/task/<tid>/.

But what about code that needs to run on Mac OS too?

If your program's single-threaded, you can use getpid(3) instead. This might sound silly, but you might well find that limiting your Mac build to a single thread lets you avoid all kinds of Mac OS pthread woe, and lets you get on with more important stuff. But this won't suit everyone.

If you poke around, you'll see that the Darwin/xnu kernel actually has a gettid(2) system call. But before you get excited, you'll find that it's completely unrelated to the Linux gettid(2). It returns the "per-thread override identity", which is a uid and gid that a thread might be operating under (like a per-thread setuid(2) kind of facility). No use to us.

If you poke a bit further, you'll find modern kernels have a thread_selfid(2) system call. This gives you the closest equivalent, but the numbers are going to be a lot larger than you're used to on Linux (they're 64-bit integers, and quite high). And this doesn't work on 10.5 or earlier (the system call was first implemented in 10.6).

Speaking of 10.6, there's also a non-portable pthread_threadid_np(3) that turns a pthread_t into a uint64_t. This gives the same values you'd get from thread_selfid(2). Again, this is unsupported in 10.5 and earlier.

So then there's always pthread_self(3). Sure, it returns an opaque type, but you know it's either going to be a thread id itself or, more likely, a pointer to some struct. So cast it to a suitably-sized integer and you're done. You might complain that the numbers are big and unwieldy, but so are your other choices. And at least these ones are portable, not just to old versions of Mac OS but to other OSes too. The values are somewhat useful in gdb(1) too.

The pthread_self(3) pthread_t isn't useful for a managed runtime's thin lock id, but then neither is Linux's gettid(2). If you really need something like that, you're going to have to follow your threads' life cycles and allocate and free ids yourself (either pid style or fd style, depending on whether you value avoiding reuse or smaller values more). So that's another option to consider if you'd like prettier, smaller "thread ids", albeit ones that have no meaning to other tools or parts of the system.
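
If you do go down that road, fd-style allocation (always hand out the lowest free id, and reuse ids as threads exit) only takes a few lines. A rough sketch, with locking and life-cycle tracking left to the caller:
#include <stdint.h>
#include <vector>

// fd-style small-id allocation: the lowest free id is always handed out and
// ids are reused, so values stay small. The caller must provide locking and
// must call ReleaseId when the corresponding thread exits.
class ThreadIdAllocator {
 public:
  uint32_t AllocateId() {
    for (size_t i = 0; i < in_use_.size(); ++i) {
      if (!in_use_[i]) {
        in_use_[i] = true;
        return static_cast<uint32_t>(i) + 1;  // 1-based so 0 can mean "no id".
      }
    }
    in_use_.push_back(true);
    return static_cast<uint32_t>(in_use_.size());
  }

  void ReleaseId(uint32_t id) {
    in_use_[id - 1] = false;
  }

 private:
  std::vector<bool> in_use_;
};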

Anyway, here's some example code:
#include <errno.h>
#include <iostream>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

int main() {
  std::cout << "getpid()=" << getpid() << std::endl;

  // Portable, but opaque: on Mac OS this is really a pointer.
  std::cout << "pthread_self()=" << pthread_self() << std::endl;

  // Both of the following need 10.6 or later.
  uint64_t tid;
  pthread_threadid_np(NULL, &tid);
  std::cout << "pthread_threadid_np()=" << tid << std::endl;

  std::cout << "syscall(SYS_thread_selfid)=" << syscall(SYS_thread_selfid) << std::endl;
  return 0;
}

And here's some corresponding example output from a 10.7 system:
getpid()=97626
pthread_self()=0x7fff7932e960
pthread_threadid_np()=2750350
syscall(SYS_thread_selfid)=2750350

2012-03-09

operator<< and function pointers

This one fools me all the time. Maybe if I write it down I'll remember.

What does this code output?
#include <iostream>
int main() {
 std::cout << main << std::endl;
 return 0;
}

When I see a function pointer shown as "1", my first thought is "Thumb2 bug". ARM processors use odd addresses to mean "there's Thumb2 code at (address & ~1)", so a code address of 1 looks like someone accidentally did (NULL | 1).

What's really happened here, though, is that several design decisions have conspired to screw you. Firstly, function pointers aren't like normal pointers. If you reinterpret_cast<void*>(main), say, you'll get an address. (Though what exactly you get the address of in C++ can get quite interesting.) Then the ancient evil of the implicit conversion to bool comes into play, and since your function pointer is non-null, you have true. Then there's an operator<<(std::ostream&, bool), but the default stream flags in the C++ library show bool values as 1 and 0. You need to explicitly use std::boolalpha to get true and false.
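
If you actually wanted the address, you have to ask for it explicitly. A quick sketch:
#include <iostream>

void f() {}

int main() {
  std::cout << f << std::endl;                           // "1": converted to bool.
  std::cout << std::boolalpha << f << std::endl;         // "true": still just a bool.
  std::cout << reinterpret_cast<void*>(f) << std::endl;  // The actual address.
  return 0;
}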

So what you're really seeing here is "you have a non-null function pointer". Which is almost never what you intended.

2012-01-01

Beware convenience methods

The Android documentation links from various methods to Beware the default locale. It's great advice, but most people don't know they need it.

I've never been a great fan of convenience methods (or default parameters), and methods that supply a default locale are a prime example of when not to offer a [supposed] convenience method.

The problem here is that most developers don't even understand the choice that's being made for them. Developers tend to live in a US-ASCII world and think that, even if there are special cases for other languages, they don't affect the ASCII subset of Unicode. This is not true. Turkish distinguishes between dotted and dotless I. This means they have two capital 'I's (one with and one without a dot) and two lowercase 'i's (one with and one without a dot). Most importantly, it means that "i".toUpperCase() does not return "I" in a Turkish locale.

The funny thing is, toLowerCase and toUpperCase just aren't that useful in a localized context. How often do you want to perform these operations other than for reflective/code generation purposes? (If you answered "case-insensitive comparison", go to the back of the class; you really need to use String.equalsIgnoreCase or String.CASE_INSENSITIVE_ORDER to do this correctly.)

So given that you're doing something reflective like translating a string from XML/JSON to an Enum value, toUpperCase will give you the wrong result if your user's device is in a Turkish locale and your string contains an 'i'. You need toUpperCase(Locale.ROOT) (or toUpperCase(Locale.US) if Locale.ROOT isn't available). Why doesn't Enum.valueOf just do the right thing? Because enum values are only uppercase by convention, sadly.

(The exception you'll see thrown includes the bad string you passed in, which usually contains a dotted capital I, but you'd be surprised how many people are completely blind to that. Perhaps because monitor resolutions are so high that it looks like little more than a dirt speck: I versus İ.)

In an ideal world, the convenience methods would have been deprecated and removed long ago, but Sun had an inordinate fondness for leaving broken glass lying in the grass.

The rule of thumb I like, that would have prevented this silliness in the first place, is similar to Josh Bloch's rule of thumb for using overloading. In this case, it's only acceptable to offer a convenience method/default parameter when there's only one possible default. So the common C++ idiom of f(Type* optional_thing = NULL) is usually reasonable, but there are two obvious default locales: the user's locale (which Java confusingly calls the "default" locale) and the root locale (that is, the locale that gives locale-independent behavior).

If you think you're safe from these misbegotten methods because you'd never be silly enough to use them, you still need to watch out for anything that takes a printf(3)-style format string. Sun made the mistake of almost always offering those methods in pairs, one of which uses the user's locale. Which is fine for formatting text for human consumption, but not suitable for formatting text for computer consumption. Computers aren't tolerant of local conventions regarding the use of ',' as the decimal separator (in Germany, for example, "1,234.5" would be "1.234,5"), and it's surprisingly easy to write out a file yourself and then be unable to read it back in! (There are a lot more of these locales, though, so in my experience these bugs get found sooner. The Enum.valueOf bug pattern in particular regularly makes it into shipping code.)

Where locales are concerned, there's really no room for convenience methods. Sadly, there's lots of this broken API around, so you should be aware of it. Especially if you're developing for Android where it's highly likely that you actually have many users in non-en_US locales (unlike traditional Java which ran on your server where you controlled the locale anyway).

2011-09-22

How can I get a thread's stack bounds?

I've had to work out how to get the current thread's stack bounds a couple of times lately, so it's time I wrote it down.

If you ask the internets, the usual answer you'll get back is to look at the pthread_attr_t you supplied to pthread_create(3). One problem with this is that the values you get will be the values you supplied, not the potentially rounded and/or aligned values that were actually used. More importantly, it doesn't work at all for threads you didn't create yourself; this can be a problem if you're a library, for example, called by arbitrary code on an arbitrary thread.

The real answer is pthread_getattr_np(3). Here's the synopsis:
#include <pthread.h>

int pthread_getattr_np(pthread_t thread, pthread_attr_t* attr);

You call pthread_getattr_np(3) with an uninitialized pthread_attr_t, make your queries, and destroy the pthread_attr_t as normal.
pthread_attr_t attributes;
errno = pthread_getattr_np(thread, &attributes);
if (errno != 0) {
  PLOG(FATAL) << "pthread_getattr_np failed";
}

void* stack_address;
size_t stack_size;
errno = pthread_attr_getstack(&attributes, &stack_address, &stack_size);
if (errno != 0) {
  PLOG(FATAL) << "pthread_attr_getstack failed";
}

errno = pthread_attr_destroy(&attributes);
if (errno != 0) {
  PLOG(FATAL) << "pthread_attr_destroy failed";
}

Note that you don't want the obsolete pthread_attr_getstackaddr(3) and pthread_attr_getstacksize(3) functions; they were sufficiently poorly defined that POSIX replaced them with pthread_attr_getstack(3).

Note also that the returned address is the lowest valid address, so your current stack pointer is (hopefully) a lot closer to stack_address + stack_size than stack_address. (I actually check that my current sp is in bounds as a sanity check that I got the right values. Better to find out sooner than later.)
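
That sanity check only takes a couple of lines; something like this sketch, where the address of a local stands in for the current stack pointer:
#include <stddef.h>
#include <stdint.h>

// Rough check that [stack_address, stack_address + stack_size) really does
// cover the current thread's stack: any local's address should be in range.
bool LooksLikeMyStack(void* stack_address, size_t stack_size) {
  uintptr_t lo = reinterpret_cast<uintptr_t>(stack_address);
  uintptr_t hi = lo + stack_size;
  int local = 0;
  uintptr_t sp_ish = reinterpret_cast<uintptr_t>(&local);
  return sp_ish >= lo && sp_ish < hi;
}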

Why should you call pthread_getattr_np(3) even if you created the thread yourself? Because what you ask for and what you get aren't necessarily exactly the same, and because you might want to know what you got without wanting to specify anything other than the defaults.

Oh, yeah... The "_np" suffix means "non-portable". The pthread_getattr_np(3) function is available on glibc and bionic, but not (afaik) on Mac OS. But let's face it; any app that genuinely needs to query stack addresses and sizes is probably going to contain an #ifdef or two anyway...

2011-05-04

signal(2) versus sigaction(2)

Normally, I'm pretty gung-ho about abandoning old API. I don't have head space for every crappy API I've ever come across, so any time there's a chance to clear out useless old junk, I'll take it.

signal(2) and sigaction(2) have been an interesting exception. I've been using the former since the 1980s, and I've been hearing that it's not portable and that I should be using the latter since the 1990s, but it was just the other day, in 2010, that I first had an actual problem. (I also knew that sigaction(2) was more powerful than signal(2), but had never needed the extra power before.) If you've also been in the "there's nothing wrong with signal(2)" camp, here's my story...

The problem

I have a bunch of pthreads, some of which are blocked on network I/O. I want to wake those threads forcibly so I can give them something else to do. I want to do this by signalling them. Their system calls will fail with EINTR, my threads will notice this, check whether this was from "natural causes" or because I'm trying to wake them, and do the right thing. So that the signal I send doesn't kill them, I call signal(2) to set a dummy signal handler. (This is distinct from SIG_IGN: I want my userspace code to ignore the signal, not for the kernel to never send it. I might not have any work to do in the signal handler, but I do want the side-effect of being signalled.)

So imagine my surprise when I don't see EINTR. I check, and the signals are definitely getting sent, but my system calls aren't getting interrupted. I read the Linux signal(2) man page and notice the harsh but vague:
The only portable use of signal() is to set a signal's disposition to
SIG_DFL or SIG_IGN. The semantics when using signal() to establish a
signal handler vary across systems (and POSIX.1 explicitly permits this
variation); do not use it for this purpose.

POSIX.1 solved the portability mess by specifying sigaction(2), which
provides explicit control of the semantics when a signal handler is
invoked; use that interface instead of signal().

It turns out that, on my system, using signal(2) to set a signal handler is equivalent to using the SA_RESTART flag with sigaction(2). (The Open Group documentation for sigaction(2) actually gives an example that's basically the code you'd need to implement signal(2) in terms of sigaction(2).) The SA_RESTART flag basically means you won't see EINTR "unless otherwise specified". (For a probably untrue and outdated list of exceptions on Linux, see "man 7 signal". The rule of thumb would appear to be "anything with a timeout fails with EINTR regardless of SA_RESTART", presumably because any moral equivalent of TEMP_FAILURE_RETRY is likely to lead to trouble in conjunction with any syscall that has a timeout parameter.)

Anyway, switching to sigaction(2) and not using the SA_RESTART flag fixed my problem, and I'll endeavor to use sigaction(2) in future.
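
For the record, the fix boils down to something like this (a sketch; the choice of SIGUSR1 and the names are mine, not necessarily what actually shipped):
#include <signal.h>
#include <string.h>

// A no-op handler: SIG_IGN is no good (an ignored signal never interrupts the
// syscall) and the default disposition would kill us; we just want blocked
// system calls to fail with EINTR.
static void WakeupHandler(int) {}

bool InstallWakeupHandler() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = WakeupHandler;
  sa.sa_flags = 0;  // Crucially, no SA_RESTART.
  return sigaction(SIGUSR1, &sa, NULL) == 0;
}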

Assuming I can't stay the hell away from signals, that is.

Alternative solutions

At this point, you might be thinking I'm some kind of pervert, throwing signals about like that. But here's what's nice about my solution: I use a doubly-linked list of pthreads blocked on network I/O, and the linked list goes through stack-allocated objects, so I've got no allocation/deallocation, and O(1) insert and remove overhead on each blocking I/O call. A close is O(n) in the number of threads currently blocked, but in my system n is currently very small anyway. Often zero. (There's also a global lock to be acquired and released for each of these three operations, of course.) So apart from future scalability worries, that's not a bad solution.

One alternative would be to dump Linux for that alt chick BSD. The internets tell me that at least some BSDs bend over backwards to do the right thing: close a socket in one thread and blocked I/O on that socket fails, courtesy of a helpful kernel. (Allegedly. I haven't seen BSD since I got a job.) Given Linux's passive-aggressive attitude to userspace, it shouldn't come as a surprise that Linux doesn't consider this to be its problem, but changing kernel is probably not an option for most people.

Another alternative would be to use shutdown(2) before close(2), but that has slightly different semantics regarding SO_LINGER, and can be difficult to distinguish from a remote close.

Another alternative would be to use select(2) to avoid actually blocking. You may, like me, have been laboring under the misapprehension that the third fd set, the one for "exceptional conditions", is for reporting exactly this kind of thing. It isn't. (It's actually for OOB data or reporting the failure of a non-blocking connect.) So you either need to use a timeout so that you're actually polling, checking whether you should give up between each select(2), or you need to have another fd to select on, which your close operation can write to. This costs you up to an fd per thread (I'm assuming you try to reuse them, rather than opening and closing them for each operation), plus at least all the bookkeeping from the signal-based solution, plus it doubles the number of system calls you make (not including the pipe management, or writing to the pipe/pipes when closing an fd). I've seen others go this route, but I'd try incest and morris dancing first.

I actually wrote the first draft of this post last August, and the SA_RESTART solution's been shipping for a while. Still, if you have a better solution, I'd love to hear it.

2009-08-21

Java on a thousand cores

Cliff Click wrote a recent blog post that started "warning: no technical content", which wasn't true. Buried 3/4 of the way through was a link to the PDF slides for the "Java on 1000 cores" talk he's been giving recently. I was in the audience of the specific instance of the talk he mentioned, but if you weren't, you can now watch the video on YouTube.

I tend to avoid talks because so many of them are a complete waste of time. This one, though, was so good it had me thinking about why I've mostly given up on talks.

For one thing, most speakers speak far too slowly. I know one of the rules everyone's taught about public speaking is to speak more slowly. The idea is that because you're nervous you'll speak faster than you realize, and that sounds like you're rushing, and that's bad. Which might have made sense in the 1700s when public speaking likely meant boring your congregation to sleep every Sunday morning, where the point was less the content than the submission to authority. Going slow might make sense if you're trying to educate beginners (though there I think repetition and usage is what matters, not slow presentation). I can even admit that going slow is downright necessary if you're walking someone through a physical procedure, but it's an awful way to present summaries or reviews (in the sense of "review paper"). And most talks are, one way or another, summaries or reviews.

As if looking for a less literal way in which they can speak too slowly, few speakers have a good feel for how many words any particular point deserves. You know these people. You've been in their talks. These people have their four obvious bullet points on their slide, four bullet points that you grasp in the second or two after the slide is up, and somehow they manage to spend five minutes laboriously wading their way through these points while adding nothing to your understanding. And all the time, you stare at the screen willing the next slide to appear. Wishing you'd brought your Apple remote. (They're not paired by default, you know.)

Don't be afraid to have a lot of stuff on your slides, either. The people who tell you not to have more than a couple of points per slide have nothing to say. At best, they're salesmen. (See Steve Jobs' slides and presentations. Things of beauty both, but designed for selling, not transfer of technical information and ideas.) The naysayers also assume you're making all the usual mistakes too, when they should instead tell you to speak fast and only speak useful words, in which case it won't matter how much stuff is on each slide, because no-one will be bored enough that they're sat there just reading the slides. If they're not listening, taking away the slides is not a fix for your problem.

Another bad habit is not asking anything of your audience. I don't mean questions. I mean prerequisites. Seriously, those five people at your talk about building STL-like containers who don't know what the STL is? Tell them to fuck off and stop wasting everybody's time. Or, if you think that's a bit much, just start with "this talk assumes a basic familiarity with X, Y, and Z; if you don't have this, I'm afraid you probably won't get much out of this talk". You're not helping anyone by spending ten minutes rambling on about stuff that anyone who's going to get anything out of your talk already knows anyway. Unless you're someone like Steve Jones, you probably don't need to explain science to the layman. It's much more likely you're talking to an audience of your peers, and you should respect their time as you'd expect them to respect yours.

Also, please don't waste my time with talks where the only thing I learn is how much you like the sound of your own voice, or talks that only exist because you get some kind of merit badge for having given a talk. In the former case, get a blog. Then people can decide for themselves whether they like the sound of your voice, and subscribe if they do and ignore you if they don't. In the merit badge case, corporate life is full of bogus achievements; I'm sure you can find one that minimizes waste of other people's time. Hell, if it helps, I'll make you a little "I didn't waste anyone's time giving a gobshite talk" certificate.

Et cetera.

Anyway, getting back to Cliff Click's talk... his was a great example of what talks should be like. His content was interesting and concentrated. He assumed his audience knew the basics of the areas he was talking about. He spoke fast. Fast enough that I was often still digesting his previous point while he was on to the next. (And for all you MBAs: this is a good thing. Better too much content than too little. I can always re-read the slides/re-watch the video afterwards. And sometimes thinking about the last thing I was interested in helps me coast through a bit I'm less interested in.)

One thing I particularly like about Cliff Click's stuff in general is his practitioner's point of view. He asks the important practical questions: "Have you successfully built one?", "Did it do what you expected?", "What is it good for?", "What isn't it good for?", "Is it worth it?".

As for this particular talk, there were several new ideas to me (I hadn't heard of the optimistic escape detection [as opposed to escape analysis] he mentioned in passing, for example), several things I found surprising (that write buffers turned out to be unimportant with Azul's CLZ [unrelated to ARM's CLZ], for example), and a lot of familiar stories about mixed hardware-software designs, interesting mainly because it hadn't occurred to me that they might be common to all hardware-software companies.

Plus it was a model of how to give a good talk.