2012-10-26

How (not) to use readdir_r(3)

TL;DR: always use readdir(3) instead of readdir_r(3).

I spend most of my time fixing bugs. Often I feel like there's something instructive to say, but I don't want to be that guy who's known for pointing out faults in other people's code, as if his own doesn't have bugs.

Today I have a great one, because although there's a lot of blame to go around, the bug that actually caused enough trouble to get reported is my own.

Here's an AOSP patch: libc: readdir_r smashed the stack. Some engineers point out that FAT32 lets you store 255 UTF-16 characters but Android's struct dirent only has a 256-byte d_name field, so if you have the right kind of filename on a FAT32 file system, it won't fit in a struct dirent. (d_name holds bytes, and a non-ASCII character can take several bytes of UTF-8, so 255 characters can need a lot more than 256 bytes.)

But first, let's back up and look at the APIs involved here: readdir(3), readdir_r(3), and getdents(2).

Traditionally, userspace used readdir(3). You'd get a pointer back to a struct dirent that you don't own. POSIX explicitly makes the following guarantee: "The pointer returned by readdir() points to data which may be overwritten by another call to readdir() on the same directory stream. This data is not overwritten by another call to readdir() on a different directory stream.", so there isn't the usual thread-safety problem here. Despite that, readdir_r(3) was added, which lets you supply your own struct dirent, but – critically – not the size of that buffer. So you need to know in advance that your buffer is "big enough". getdents(2) is similar, except there you do pass the size of the buffer, and the kernel gives you as many entries as will fit in your buffer.

If you use a tool like Google's cpplint.py, it'll actually complain if you use readdir(3) and suggest readdir_r(3) instead. This is bad advice. For one thing, POSIX guarantees that you're not sharing some static buffer with other threads on the system. So in the typical case where your function calls opendir(3), readdir(3), closedir(3) and the DIR* never escapes, you're fine.
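
To make that typical case concrete, here's a minimal sketch of the pattern (the function name and the lack of error reporting are mine):
#include <dirent.h>
#include <stdio.h>

// Lists a directory with plain readdir(3). The dirent* points into the
// DIR*'s own buffer, so there's nothing for the caller to size or free.
void list_directory(const char* path) {
  DIR* dir = opendir(path);
  if (dir == NULL) {
    return;
  }
  dirent* entry;
  while ((entry = readdir(dir)) != NULL) {
    printf("%s\n", entry->d_name);
  }
  closedir(dir);
}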

At this point, if you've been paying attention, you'll be wondering about the size of struct dirent. Isn't readdir(3) a liability if your struct dirent isn't big enough for your file system's longest names? In theory, yes, that's a problem. In practice, readdir_r(3) is the bigger liability.

In practice, you don't have a problem with readdir(3) because Android's bionic, Linux's glibc, and OS X and iOS' libc all allocate per-DIR* buffers, and return pointers into those; in Android's case, that buffer is currently about 8KiB. If future file systems mean that this becomes an actual limitation, we can fix the C library and all your applications will keep working.

In practice, you do have a problem with readdir_r(3) because (a) you can't tell the C library how big your buffer is, so it can't protect you against your own bugs, and (b) it's actually quite hard to get the right buffer size. Most code actually just allocates a regular struct dirent on the stack and passes the pointer to that, so in practice most users of readdir_r(3) are demonstrably less safe than the equivalent readdir(3) user. What you actually have to do is allocate a large enough buffer on the heap. But how large is "large enough"? The glibc man page tries to help, suggesting the following:

           len = offsetof(struct dirent, d_name) +
                     pathconf(dirpath, _PC_NAME_MAX) + 1;
           entryp = malloc(len);
But that's not quite right because there's a race condition. You probably want to use fpathconf(3) and dirfd(3) so you know you're talking about the same directory that was opened with opendir(3).
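
Something like this sketch is closer (the function name and the fallback for the "no limit" case are mine):
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

// Sizes a readdir_r(3) buffer from the directory you actually opened,
// rather than from a path that might have changed underneath you.
dirent* allocate_dirent(DIR* dir) {
  long name_max = fpathconf(dirfd(dir), _PC_NAME_MAX);
  if (name_max == -1) {
    name_max = 255;  // No limit was reported; guess.
  }
  size_t len = offsetof(struct dirent, d_name) + name_max + 1;
  return static_cast<dirent*>(malloc(len));
}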

So let's look at Android. How many of the readdir_r(3) calls are correct? Here's an AOSP tree as of now:

~/aosp$ find . -name *.c* -print0 | xargs -0 grep -lw readdir_r | sort
./bionic/libc/bionic/opendir.cpp
./dalvik/vm/Thread.cpp
./external/bluetooth/glib/gio/gunixmounts.c
./external/chromium/base/file_util_posix.cc
./external/clang/lib/Basic/FileManager.cpp
./external/dbus/dbus/dbus-sysdeps-util-unix.c
./external/dbus/dbus/dbus-sysdeps-util-win.c
./external/linux-tools-perf/builtin-script.c
./external/linux-tools-perf/util/event.c
./external/linux-tools-perf/util/parse-events.c
./external/wpa_supplicant_6/wpa_supplicant/src/common/wpa_ctrl.c
./external/wpa_supplicant_8/src/common/wpa_ctrl.c
./hardware/libhardware_legacy/wifi/wifi.c
./libcore/luni/src/main/native/java_io_File.cpp
./system/core/debuggerd/backtrace.c
./system/core/debuggerd/tombstone.c
./system/vold/CommandListener.cpp
./system/vold/VolumeManager.cpp
~/aosp$ 
(The match in bionic is the implementation of readdir_r, and the match in clang is a comment.)

The following allocate a struct dirent on the stack: dalvik, external/bluetooth, external/chromium, external/linux-tools-perf, external/wpa_supplicant, hardware/libhardware_legacy/wifi, libcore, and system/core/debuggerd.

The following use malloc: external/dbus (uses sizeof(dirent), so this just trades the above stack-smashing bug for an equivalent heap-smashing bug), and system/vold, which uses the probably-right glibc recommendation.

The dalvik and libcore bugs, at least, are my fault. And for years I've changed readdir(3) to readdir_r(3) when people bring it up in code reviews, because I never had a strong argument against. But now I do. Using readdir(3) is simpler, safer, and its correctness is the C library maintainer's problem, not yours. The key to understanding why this is so (if you've skipped the rest of this article) is that the dirent* you get isn't a pointer to a regular fixed-size struct dirent; it's a pointer into a buffer that was large enough to contain the directory entry in question. We know this because the kernel getdents(2) API is sane, takes a buffer size, and won't scribble outside the lines.

2012-08-31

How to read Dalvik SIGQUIT output

If you're a long-time Java developer you're probably used to sending SIGQUIT to a Java process (either via kill -3 or hitting ctrl-\) to see what all the threads are doing. You can do the same with Dalvik (via adb shell kill -3), and if you're ANRing, the system server will be sending you SIGQUIT too, in which case the output will end up in /data/anr/traces.txt (see the logcat output for details).

Anyway, I've found that very few people actually know what all the output means. I only knew a few of the more important bits until I became a Dalvik maintainer. This post should hopefully clear things up a little.

To start with, here's an example from my JNI Local Reference Changes in ICS post:

    "Thread-10" prio=5 tid=8 NATIVE
      | group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248
      | sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304
      | schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8
      at MyClass.printString(Native Method)
      at MyClass$1.run(MyClass.java:15)

Ignore the Java stack trace for now. If there's demand, I'll come back to that later, but there's nothing interesting in this particular example. Let's go through the other lines...

First, though, a quick note on terminology because there are a lot of different meanings of "thread" that you'll have to keep clear. If I say Thread, I mean java.lang.Thread. If I say pthread, I mean the C library's abstraction of a native thread. If I say native thread, I mean something created by the kernel in response to a clone(2) system call. If I say Thread*, I mean the C struct in the VM that holds all these things together. And if I say thread, I mean the abstract notion of a thread.

"Thread-10" prio=5 tid=8 NATIVE

The thread name comes first, in quotes. If you gave a name to a Thread constructor, that's what you'll see here. Otherwise there's a static int in Thread that's a monotonically increasing thread id, used solely for giving each thread a unique name. These thread ids are never reused in a given VM (though theoretically you could cause the int to wrap).

The thread priority comes next. This is the Thread notion of priority, corresponding to the getPriority and setPriority calls, and the MIN_PRIORITY, NORM_PRIORITY, and MAX_PRIORITY constants.

The thread's thin lock id comes next, labelled "tid". If you're familiar with Linux, this might confuse you; it's not the tid in the sense of the gettid(2) system call. This is an integer used by the VM's locking implementation. These ids come from a much smaller pool, so they're reused as threads come and go, and will typically be small integers.

The thread's state comes last. These states are similar to, but a superset of, the Thread thread states. They can also change from release to release. At the time of writing, Dalvik uses the following states (found in enum ThreadStatus in vm/Thread.h):

    /* these match up with JDWP values */
    THREAD_ZOMBIE       = 0,        /* TERMINATED */
    THREAD_RUNNING      = 1,        /* RUNNABLE or running now */
    THREAD_TIMED_WAIT   = 2,        /* TIMED_WAITING in Object.wait() */
    THREAD_MONITOR      = 3,        /* BLOCKED on a monitor */
    THREAD_WAIT         = 4,        /* WAITING in Object.wait() */
    /* non-JDWP states */
    THREAD_INITIALIZING = 5,        /* allocated, not yet running */
    THREAD_STARTING     = 6,        /* started, not yet on thread list */
    THREAD_NATIVE       = 7,        /* off in a JNI native method */
    THREAD_VMWAIT       = 8,        /* waiting on a VM resource */
    THREAD_SUSPENDED    = 9,        /* suspended, usually by GC or debugger */

You won't see ZOMBIE much; a thread is only in that state while it's being dismantled. RUNNING is something of a misnomer; the usual term is "runnable", because whether or not the thread is actually scheduled on a core right now is out of the VM's hands. TIMED_WAIT corresponds to an Object.wait(long, int) call. Note that Thread.sleep and Object.wait(long) are currently both implemented in terms of this. WAIT, by contrast, corresponds to a wait without a timeout, via Object.wait(). MONITOR means that the thread is blocked trying to synchronize on a monitor, either because of a synchronized block or an invocation of a synchronized method (or theoretically, on a call to JNIEnv::MonitorEnter).

The INITIALIZING and STARTING states are aspects of the current (at the time of writing) implementation of the thread startup dance. As an app developer, you can probably just chunk these two as "too early to be running my code". NATIVE means that the thread is in a native method. VMWAIT means that the thread is blocked trying to acquire some resource that isn't visible to managed code, such as an internal lock (that is, a pthread_mutex). SUSPENDED means that the thread has been told to stop running and is waiting to be allowed to resume; as the comment says, typically as an app developer you'll see this because there's a GC in progress or a debugger is attached.

Not shown in this example, a daemon thread will also say "daemon" at the end of the first line.

| group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248

The Thread's containing ThreadGroup name comes next, in quotes.

The sCount and dsCount integers relate to thread suspension. The suspension count is the number of outstanding requests for suspension for this thread; this is sCount. The number of those outstanding requests that came from the debugger is dsCount, recorded separately so that if a debugger detaches then sCount can be reset appropriately (since there may or may not have been outstanding non-debugger suspension requests, we can't just reset sCount to 0 if a debugger disconnects).

(If there's demand, I'll talk more about thread suspension in another post, including when suspension can occur, and what suspension means for unattached threads and threads executing native methods.)

The address of the Thread comes next, labeled obj.

The address of the Thread* comes next, labeled self. Neither of these addresses is likely to be useful to you unless you're attaching gdb(1) to a running dalvikvm process.

| sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304

The kernel's thread id comes next, labeled sysTid. You can use this if you're poking around in /proc/pid/task/tid. This is usually the only useful item on this line.

The kernel's nice value for the process comes next, labeled nice. This is as returned by the getpriority(2) system call.

The pthread scheduler policy and priority come next, labeled sched. This is as returned by the pthread_getschedparam(3) call.

The cgrp is the name of the thread's scheduler group, pulled from the appropriate cgroup file in /proc.

The pthread_t for the pthread corresponding to this thread comes next, labeled handle. This is not much use unless you're in gdb(1).

| schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8

The schedstat data is pulled from the per-process schedstat files in /proc. The format is documented in the Linux kernel tree (Documentation/scheduler/sched-stats.txt):

      1) time spent on the cpu
      2) time spent waiting on a runqueue
      3) # of timeslices run on this cpu
If your kernel does not support this, you'll see "schedstat=( 0 0 0 )".

The user-mode and kernel-mode jiffies come next, labeled utm and stm. These correspond to the utime and stime fields of the per-thread stat files in /proc. On sufficiently new versions of Dalvik, you'll also see something like "HZ=100", so you can double-check that jiffies are the length you expect. (With HZ=100, for example, each jiffy is 10ms, so the utm=12 above corresponds to roughly 120ms of user-mode CPU time.) These numbers aren't much use in isolation, except for seeing which threads are taking all the CPU time (if any).

The number of the core this thread last ran on comes next, labeled core.

2012-04-04

gettid on Mac OS

The Linux kernel has a gettid(2) call that returns the current thread's thread id. These numbers can be handy. They're useful for debugging/diagnostic purposes, they're useful in conjunction with other tools, and they're useful for pulling stuff out of /proc/<pid>/task/<tid>/.

But what about code that needs to run on Mac OS too?

If your program's single-threaded, you can use getpid(3) instead. This might sound silly, but you might well find that limiting your Mac build to a single thread lets you avoid all kinds of Mac OS pthread woe, and lets you get on with more important stuff. But this won't suit everyone.

If you poke around, you'll see that the Darwin/xnu kernel actually has a gettid(2) system call. But before you get excited, you'll find that it's completely unrelated to the Linux gettid(2). It returns the "per-thread override identity", which is a uid and gid that a thread might be operating under (like a per-thread setuid(2) kind of facility). No use to us.

If you poke a bit further, you'll find modern kernels have a thread_selfid(2) system call. This gives you the closest equivalent, but the numbers are going to be a lot larger than you're used to on Linux (they're 64-bit integers, and quite high). And this doesn't work on 10.5 or earlier (the system call was first implemented in 10.6).

Speaking of 10.6, there's also a non-portable pthread_threadid_np(3) that turns a pthread_t into a uint64_t. This gives the same values you'd get from thread_selfid(2). Again, this is unsupported in 10.5 and earlier.

So then there's always pthread_self(3). Sure, it returns an opaque type, but you know it's either going to be a thread id itself or, more likely, a pointer to some struct. So cast it to a suitably-sized integer and you're done. You might complain that the numbers are big and unwieldy, but so are your other choices. And at least these ones are portable, not just to old versions of Mac OS but to other OSes too. The values are somewhat useful in gdb(1) too.

The pthread_self(3) pthread_t isn't useful for a managed runtime's thin lock id, but then neither is Linux's gettid(2). If you really need something like that, you're going to have to follow your threads' life cycles and allocate and free ids yourself (either pid style or fd style, depending on whether you value avoiding reuse or smaller values more). So that's another option to consider if you'd like prettier, smaller "thread ids", albeit ones that have no meaning to other tools or parts of the system.

Anyway, here's some example code:
#include <errno.h>
#include <iostream>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

int main() {
  std::cout << "getpid()=" << getpid() << std::endl;

  // On Mac OS, pthread_t is a pointer, so this prints an address.
  std::cout << "pthread_self()=" << pthread_self() << std::endl;

  // 10.6 and later.
  uint64_t tid;
  pthread_threadid_np(NULL, &tid);
  std::cout << "pthread_threadid_np()=" << tid << std::endl;

  // Also 10.6 and later; gives the same number as pthread_threadid_np.
  std::cout << "syscall(SYS_thread_selfid)=" << syscall(SYS_thread_selfid) << std::endl;
  return 0;
}

And here's some corresponding example output from a 10.7 system:
getpid()=97626
pthread_self()=0x7fff7932e960
pthread_threadid_np()=2750350
syscall(SYS_thread_selfid)=2750350
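
If you want a single call that works on both Mac OS and Linux, a sketch might look like this (GetThreadId is my name; the Mac branch assumes 10.6 or later, and the fallback just reuses the pthread_self(3) trick above):
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

// Returns a kernel-level thread id where one is available, falling back to
// the pthread_t otherwise. The values only make sense within one OS.
uint64_t GetThreadId() {
#if defined(__APPLE__)
  uint64_t tid;
  pthread_threadid_np(NULL, &tid);  // 10.6 and later.
  return tid;
#elif defined(__linux__)
  return syscall(SYS_gettid);
#else
  return (uintptr_t) pthread_self();
#endif
}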

2012-03-09

operator<< and function pointers

This one fools me all the time. Maybe if I write it down I'll remember.

What does this code output?
#include <iostream>
int main() {
 std::cout << main << std::endl;
 return 0;
}

When I see a function pointer shown as "1", my first thought is "Thumb2 bug". ARM processors use odd addresses to mean "there's Thumb2 code at (address & ~1)", so a code address of 1 looks like someone accidentally did (NULL | 1).

What's really happened here, though, is that several design decisions have conspired to screw you. Firstly, function pointers aren't like normal pointers. If you reinterpret_cast<void*>(main), say, you'll get an address. (Though what exactly you get the address of in C++ can get quite interesting.) Then the ancient evil of the implicit conversion to bool comes into play, and since your function pointer is non-null, you have true. Then there's an operator<<(std::ostream&, bool), but the default stream flags in the C++ library show bool values as 1 and 0. You need to explicitly use std::boolalpha to get true and false.

So what you're really seeing here is "you have a non-null function pointer". Which is almost never what you intended.
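
If you did want the address, you have to do the cast yourself, as in this sketch (casting a function pointer to void* is only conditionally supported by the standard, but it's fine on POSIX systems):
#include <iostream>

void f() {
}

int main() {
  // Converts to bool and prints "1".
  std::cout << f << std::endl;
  // Selects operator<<(const void*) and prints an address.
  std::cout << reinterpret_cast<void*>(f) << std::endl;
  return 0;
}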

2012-01-01

Beware convenience methods

The Android documentation links from various methods to Beware the default locale. It's great advice, but most people don't know they need it.

I've never been a great fan of convenience methods (or default parameters), and methods that supply a default locale are a prime example of when not to offer a [supposed] convenience method.

The problem here is that most developers don't even understand the choice that's being made for them. Developers tend to live in a US-ASCII world and think that, even if there are special cases for other languages, they don't affect the ASCII subset of Unicode. This is not true. Turkish distinguishes between dotted and dotless I. This means they have two capital 'I's (one with and one without a dot) and two lowercase 'i's (one with and one without a dot). Most importantly, it means that "i".toUpperCase() does not return "I" in a Turkish locale.

The funny thing is, toLowerCase and toUpperCase just aren't that useful in a localized context. How often do you want to perform these operations other than for reflective/code generation purposes? (If you answered "case-insensitive comparison", go to the back of the class; you really need to use String.equalsIgnoreCase or String.CASE_INSENSITIVE_ORDER to do this correctly.)

So given that you're doing something reflective like translating a string from XML/JSON to an Enum value, toUpperCase will give you the wrong result if your user's device is in a Turkish locale and your string contains an 'i'. You need toUpperCase(Locale.ROOT) (or toUpperCase(Locale.US) if Locale.ROOT isn't available). Why doesn't Enum.valueOf just do the right thing? Because enum values are only uppercase by convention, sadly.

(The exception you'll see thrown includes the bad string you passed in, which usually contains a dotted capital I, but you'd be surprised how many people are completely blind to that. Perhaps because monitor resolutions are so high that it looks like little more than a dirt speck: I versus İ.)

In an ideal world, the convenience methods would have been deprecated and removed long ago, but Sun had an inordinate fondness for leaving broken glass lying in the grass.

The rule of thumb I like, that would have prevented this silliness in the first place, is similar to Josh Bloch's rule of thumb for using overloading. In this case, it's only acceptable to offer a convenience method/default parameter when there's only one possible default. So the common C++ idiom of f(Type* optional_thing = NULL) is usually reasonable, but there are two obvious default locales: the user's locale (which Java confusingly calls the "default" locale) and the root locale (that is, the locale that gives locale-independent behavior).

If you think you're safe from these misbegotten methods because you'd never be silly enough to use them, you still need to watch out for anything that takes a printf(3)-style format string. Sun made the mistake of almost always offering those methods in pairs, one of which uses the user's locale. Which is fine for formatting text for human consumption, but not suitable for formatting text for computer consumption. Computers aren't tolerant of local conventions regarding the use of ',' as the decimal separator (in Germany, for example, "1,234.5" would be "1.234,5"), and it's surprisingly easy to write out a file yourself and then be unable to read it back in! (There are a lot more of these locales, though, so in my experience these bugs get found sooner. The Enum.valueOf bug pattern in particular regularly makes it into shipping code.)

Where locales are concerned, there's really no room for convenience methods. Sadly, there's lots of this broken API around, so you should be aware of it. Especially if you're developing for Android where it's highly likely that you actually have many users in non-en_US locales (unlike traditional Java which ran on your server where you controlled the locale anyway).

2011-12-23

ThinkPad "plugged in, not charging"

The web is full of all kinds of advice if you find yourself with a ThinkPad running Windows 7 that refuses to charge. The message "plugged in, not charging" being what you see in the tool tip if you hover over the battery icon in the system tray. (The Windows battery icon, that is, not the Lenovo one next door. I love the way Lenovo ship their own battery and wifi widgets, so you get two of each, and I really love the way that -- unlike the Microsoft ones -- the Lenovo ones get stretched and blurry at screen resolutions that Lenovo ship. Nice touch. With attention to detail like that from their competitors, it's no wonder Apple has such a hard time in the laptop market.)

Anyway, none of the advice I saw on the web helped (though all the BIOS updating and other fun was hopefully not completely wasted). The trick turned out to be that there's a little catch on the underside of the laptop, next to the battery. When you push the battery into place, that catch goes from "unlocked" to somewhere in the middle, just by the mechanical force of battery insertion. But you actually have to physically move the catch all the way over to the "locked" side before you can charge. There's nothing in the UI to indicate this, not even if you run the system tests.

Like I said: with attention to detail like that from their competitors, it's no wonder Apple has such a hard time in the laptop market.

2011-09-22

How can I get a thread's stack bounds?

I've had to work out how to get the current thread's stack bounds a couple of times lately, so it's time I wrote it down.

If you ask the internets, the usual answer you'll get back is to look at the pthread_attr_t you supplied to pthread_create(3). One problem with this is that the values you get will be the values you supplied, not the potentially rounded and/or aligned values that were actually used. More importantly, it doesn't work at all for threads you didn't create yourself; this can be a problem if you're a library, for example, called by arbitrary code on an arbitrary thread.

The real answer is pthread_getattr_np(3). Here's the synopsis:
#include <pthread.h>

int pthread_getattr_np(pthread_t thread, pthread_attr_t* attr);

You call pthread_getattr_np(3) with an uninitialized pthread_attr_t, make your queries, and destroy the pthread_attr_t as normal.
// pthread functions return their error code directly rather than setting
// errno; stashing the result in errno lets PLOG report it.
pthread_attr_t attributes;
errno = pthread_getattr_np(thread, &attributes);
if (errno != 0) {
  PLOG(FATAL) << "pthread_getattr_np failed";
}

void* stack_address;
size_t stack_size;
errno = pthread_attr_getstack(&attributes, &stack_address, &stack_size);
if (errno != 0) {
  PLOG(FATAL) << "pthread_attr_getstack failed";
}

errno = pthread_attr_destroy(&attributes);
if (errno != 0) {
  PLOG(FATAL) << "pthread_attr_destroy failed";
}

Note that you don't want the obsolete pthread_attr_getstackaddr(3) and pthread_attr_getstacksize(3) functions; they were sufficiently poorly defined that POSIX replaced them with pthread_attr_getstack(3).

Note also that the returned address is the lowest valid address, so your current stack pointer is (hopefully) a lot closer to stack_address + stack_size than stack_address. (I actually check that my current sp is in bounds as a sanity check that I got the right values. Better to find out sooner than later.)
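
That check can be as simple as this sketch (the function name is mine; the address of a local variable is a good-enough stand-in for the current stack pointer here):
#include <stddef.h>
#include <stdint.h>

// Rough check that the current stack pointer lies within the reported bounds.
bool IsStackPointerInBounds(void* stack_address, size_t stack_size) {
  int local_on_stack;
  uintptr_t sp = reinterpret_cast<uintptr_t>(&local_on_stack);
  uintptr_t lo = reinterpret_cast<uintptr_t>(stack_address);
  return sp >= lo && sp < lo + stack_size;
}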

Why should you call pthread_getattr_np(3) even if you created the thread yourself? Because what you ask for and what you get aren't necessarily exactly the same, and because you might want to know what you got without wanting to specify anything other than the defaults.

Oh, yeah... The "_np" suffix means "non-portable". The pthread_getattr_np(3) function is available on glibc and bionic, but not (afaik) on Mac OS. But let's face it; any app that genuinely needs to query stack addresses and sizes is probably going to contain an #ifdef or two anyway...
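
On the Mac side, the nearest equivalents I know of are pthread_get_stackaddr_np(3) and pthread_get_stacksize_np(3), so the #ifdef might look something like this sketch (GetStackBounds is my name, error handling is omitted, and, as far as I can tell, the Mac call reports the high end of the stack, so it's converted to the lowest-valid-address convention here):
#include <pthread.h>
#include <stddef.h>

// Gets the current thread's stack as [stack_address, stack_address + stack_size).
void GetStackBounds(void** stack_address, size_t* stack_size) {
#if defined(__APPLE__)
  pthread_t self = pthread_self();
  *stack_size = pthread_get_stacksize_np(self);
  *stack_address = static_cast<char*>(pthread_get_stackaddr_np(self)) - *stack_size;
#else
  pthread_attr_t attributes;
  pthread_getattr_np(pthread_self(), &attributes);
  pthread_attr_getstack(&attributes, stack_address, stack_size);
  pthread_attr_destroy(&attributes);
#endif
}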

2011-05-29

Language-induced brain damage is better than the alternative

Whenever old uncle Edsger had had a skinful, he'd rant about the harmful effects of bad languages on programmers. As a kid, I was never particularly convinced. He was too old to be all that concerned about C, and Ada hadn't even been invented yet, let alone Java, so his rants about Fortran, PL/I, Cobol and APL all seemed comically anachronistic. The languages he'd talk about were all pretty moribund anyway, at least for the purposes of a software engineer (as opposed to, say, a physicist).

BASIC, his other bête noire, never really seemed that bad. I grew up with a pretty good BASIC, the main deficiency of which seemed to be the lack of a garbage collected heap (except for strings, which were special). Even as a kid, it was pretty clear that fixed-size arrays were evil, so it was distressing that one's language forced one into the habit. But to me, that disproved Ed's claim: I was clearly a child of the BASIC era, and yet wasn't I sophisticated enough to recognize the problems? Wasn't this ability to recognize the flaws of one's language almost a test of one's latent aptitude, and thus useful in distinguishing those with real potential from those without?

In the years since, I've used a lot of languages. It's hard to imagine a well-rounded programmer who hasn't been exposed to an assembly language, a Lisp, Haskell or an ML, SQL, C, C++, and a Java-like managed language. And it's probably a shame too few encounter a Smalltalk. Even if you don't like these languages, and even if you wouldn't use them commercially, I think they each influence the way we think about computation and programming. And I think that that makes us better at doing our jobs, regardless of which language we're actually using.

(I deliberately omitted logic programming languages -- both deductive and the even less common inductive -- because if they did have an effect on me or my thinking, I've absolutely no idea what it was, and if they didn't I've absolutely no idea what I've missed.)

So it seems to me like there's a trade-off. Yes, learning a new class of language will change the way you think, but it will be both for better and worse. I don't think you can avoid this, and I think that deliberately remaining ignorant is worse than just accepting the mental scarring as a fact of life. Hell, I even think that learning absolutely appalling languages like Ada, S, and Javascript is an important learning experience. Those who cannot remember the past are condemned to repeat it.

But what I think is really interesting, and another reason it was hard to believe Ed's claim, is that pretty much by definition you can't see the damage a language does to you as clearly as you can see the good. You're likely to remember that language X taught you Y, but you don't even know that it failed to expose you to Z. So back in my BASIC days, I never bemoaned the lack of a sequence type or a map type. I almost missed the former, but would have been over-specific in my demands: I wanted to dynamically size arrays. What I thought I wanted was something like C's realloc(3), not C++'s std::vector. It wasn't until I was a C programmer and had realloc(3) that I realized how small an advance that is, and it wasn't until I was a C++ programmer that I realized that, really, I wanted a managed heap. (Not always, of course, because someone has to implement the managed language's runtime, but another thing that learning plenty of languages teaches you is the importance of always using the highest-level one you can afford for any given task.)

I was reminded of this topic recently when someone sent round a link to a Javascript x86 emulator. The interesting part to me was Javascript Typed Arrays. Javascript is very definitely in the class of languages that I'd never voluntarily use, but that doesn't mean I'm not interested to see what they're up to. And, as maintainer of an implementation of java.nio buffers, I was interested to see the equivalent functionality that Javascript users are getting.

If you don't know java.nio buffers, they're one of Java's many ill-conceived APIs. I say this as a fan of managed languages in general, and a long-time Java user, but having both used and implemented java.nio buffers, there's very little love lost between me and them. They're almost exactly not what I would have done. Surprisingly to me, given my admitted dislike of Javascript, Javascript's typed arrays are pretty much exactly what I would have done.

If I were asked to point to the most damaging design error in java.nio buffers, it would be one that I think was a side-effect of the kind of brain damage that C causes. Specifically, I contend that C programmers don't usually have a clear mental distinction between containers and iterators. I think that was one of the things that C++'s STL really taught us: that containers and iterators (and algorithms) are distinct, and that it's important to maintain these distinctions to get a high-quality library. The design of ICU4C suffers greatly from an ignorance of this idea (ICU4C is the C/C++ equivalent of the heinous java.text classes and such all-time API war crimes as java.util.Calendar, brought to you by the same people).

Java programmers ought not to be ignorant of this important lesson, but it took two attempts to get half-decent collections in the library (three if you count the late addition of generics), and iteration has been such a dog's breakfast in Java that I don't think the lesson to students of Java is nearly as clear as it is to students of C++.

(Dog's breakfast? Enumeration versus Iterator versus int indexes; raw arrays versus collections; the awful and verbose Iterator interface; and most of all the modern Iterable turd which makes the "enhanced" for loop less generally useful than it should have been and encourages the confusion between collections and iterators because the modern style involves an anonymous and invisible iterator. From arguments I've had with them, I think those responsible were hampered by the brain damage inflicted by C and their ignorance of C++, an ignorance of which they're bizarrely boastful.)

But java.nio buffers are far far worse. There, rather than offering any kind of iterators, the collections themselves (that is, the buffers) have an implicit position. (Buffers have other state that really belongs in an iterator, and that is inconsistently inherited by related buffers, but that's beyond the scope of this discussion.) You can simulate iterators by making new buffers (with Buffer.duplicate, say) but it's awkward and ass-backward, leading to ugly and intention-obscuring calling code, and leading to another generation of programmers with this particular kind of brain damage.

(At this point you might argue that the ideal model is one of collections and ranges rather than iterators, since C++ iterators do tend to come in pairs, and from there you might argue that a range is really just another way of expressing a view, and from there that a view is best expressed as a collection, and from there that the containers-are-iterators model I'm complaining about actually makes sense. It's one of those "how did we get into Vietnam"-style arguments, where any individual step isn't entirely unreasonable in itself, but where the final result is utterly fucked. The problem here being not so much a land war in Asia but having all collections have an implicit position to support iterator-style usage. Which in practice means that you've got a new orthogonal axis of "constness" to worry about, and that it's a lot harder to share containers. It's actively working against what I think most people consider to be one of the big lessons Java taught us: design for immutability. In a functional language, always working with views and relying on referential transparency might be fine, but Java is not that language, and many of the mistakes in the collections API are, I think, down to trying to pretend that it is. Which I hope makes it clear that I'm not blaming C any more than I'm blaming Haskell: I'm just giving examples of mistakes caused by transferring concepts into situations where they no longer make sense.)

The Javascript DataView class is sorely missing from java.nio's mess too. It's a really common use case that's very poorly served by java.nio. My java.nio implementation has something similar internally, but it's really nice to see the Javascript guys exposing what appears to be a lean, sane, task-focused API.

I do think there's a really nice comparative programming languages book to be written, and I think one of the most interesting chapters would be the one about iteration styles. I don't know whether it's surprising that something so fundamental should differ so wildly between languages (I didn't even touch on the Smalltalk style, which is something completely different again from any of the styles I did touch on), or whether it's exactly in the fundamentals that you'd expect to find the greatest differences.

If this is the brain-damage uncle Ed was so keen to warn me about, all I can say is "bring it on!". As far as I can tell, more different kinds of brain damage seems to lead to better outcomes than staying safe at home with the first thing that ever hit you over the head.

2011-05-19

Fuck Sony

So the only reason I use my PS3 any more is for Netflix. It sucks hard when it comes to anything else. It's a crap DVD player compared to the 360, cross-platform games are usually better on the 360, the platform exclusives are usually more interesting on the 360, and the PS3 doesn't even work as reliably with my Harmony remote as the 360 does.

And that's ignoring the PS3's UI shitfest.

There's nothing wrong with the 360's Netflix player (though it does tend to lag behind the PS3's because it's beholden to Microsoft's slow OS update schedule rather than downloading each time you start it). I use the PS3 for Netflix because I resent paying Microsoft $60/year protection money for Netflix on the Xbox 360.

The flaw in my logic is that paying Microsoft's protection money only upsets me one day each year. Using the PS3 upsets me every single night.

If I'm not being forced to install some stupid update that benefits Sony but not me, I'm being forced to fail to log in twice to their broken online gaming network so I can use Netflix, the only thing on the PS3 that doesn't suck.

Tonight, I'm told my password is no longer valid. (A not unreasonable thing to declare when your broken online gaming network has just lost millions of your users' passwords and credit card numbers.) I'm told that a link has been mailed to my email address. I wait for it to arrive, click it, and get:
PLAYSTATION



Site Maintenance Notice

The server is currently down for maintenance.

We apologize for the inconvenience. Please try again later.



メンテナンスのお知らせ

現在、サーバーのメンテナンス中です。

大変申し訳ございませんが、しばらくしてから再度接続してください。

So now my PS3 is completely useless. I've no idea when they'll fix this, but if it's anything like as fast as they fixed their last network problem, I've got about 30 days to wait.

Typical fucking Sony bullshit. And I'm finally sick of it. The PS3 is dead to me. I don't know whether I'm actually going to support Microsoft's protection racket, but I'd rather watch Netflix on my Nexus S than put up with any more of the PS3's passive-aggressive nonsense.

2011-05-15

Ubuntu 11.04

If Mac OS is the continuing evolution of Steve Jobs' vision of how we should use our computers, it's becoming increasingly clear that Ubuntu is Mark Shuttleworth's indirect request that we all just fuck off and get ourselves an OS from someone who actually gives a shit.

Rise
I was a big fan of Ubuntu in the beginning. I liked Debian in principle, but hated having to choose between the "stable" and "testing" branches, the former of which was literally years out of date, while the latter was too unstable for my taste (leading me to dub the choice "stale" or "broken"). Ubuntu at the time seemed to strike a happy medium: a reasonably well-tested 6-month snapshot of Debian "testing". As far as I recall, my only real complaint in the early days was that its color scheme had been decided upon by someone we can only assume was legally blind. Turd brown with safety orange highlights: no sighted person's first choice.

It also seemed, in those early days, as if Canonical was adding some value. They were acting as editors, shielding us from the petty internecine freetard religious wars. So, for example, those of us who just wanted to be able to play our mp3s didn't have to have an opinion on exactly which of the 100 not-quite-shippable music apps to choose, nor did we have to trawl through them all trying to find one that we'd consider acceptable: someone at Canonical had made a good-enough choice for us.

Decline...
Then things turned bad. Each release was less stable than the last. Only the LTS ("Long Term Support") releases were even half-way reasonable, and then they started fucking them up too, changing major components shortly before release, swapping in things that couldn't be considered stable. (And, of course, the user who restricts themselves to LTS releases gets to relive the old Debian "stable" days. Given that Debian is no longer as pathologically bad at shipping as it once was, such a user would have to ask themselves "why not Debian?".)

The usual volunteer disease afflicted Ubuntu too: people would only work on stuff that interested them. Which basically means that the same three components (window manager, desktop, music system) get rewritten over and over and over, each one being replaced before it can actually mature to the state where an impartial observer might call it good.

...and Fall
And now we have Ubuntu 11.04. The worst release yet. A release so bad even noted free software apologist Ryan Paul hates it.

I've no idea what the underlying stuff is like, because the surface layer of crap is so bad that it's taken away all my will to use it, and I'm spending my time surfing the web trying to decide which other distro to jump ship for. (Presumably Debian, but if I'm going to go to all the trouble of reinstalling, I may as well do the legwork.)

Misguided netbook focus
What sucks? There's yet another implementation of a dock from someone who appears to know nothing of the competition that can't be gleaned from screenshots. The old task-bar-thing has moved to the top of the screen (and apparently can't be moved back). The old menus are gone, and so are the buttons representing windows (the latter of which never worked very well anyway, compared to Mac OS or Windows). My system monitor and weather thingies disappeared (and if they can be added back, it's not in any way I can find), the rather nice world map used for the world clock is gone, my launcher for Chrome was replaced by Firefox and random crap I've never used like OpenOffice (and if I can add my own launchers, I couldn't work out how). The replacement for the apps menu appears to be an enormous search box that -- despite using almost a quarter of the area of my 30" display -- somehow only manages to show four apps at a time.

(Despite all this upheaval, there's no attempt to introduce users to the new system.)

The reason for moving the task-bar-thing to the top of the screen is because they've tried to switch to a Mac-style screen menu bar (rather than a Windows-style per-window menu bar). Unfortunately, this doesn't work with any of the apps I actually use. The only thing I've found that it did work with was the on-line help which I tried to use, but that inexplicably starts in some kind of full-screen mode, making it really frustrating to try to actually follow the instructions in the help.

I'm sure some of this must be semi-reasonable on a 10" netbook screen, but can only assume that none of the freetards responsible was able to get their mothers to buy them a 30" display. For example, even on Mac OS, the per-screen menu doesn't work very well on a 30" display. The screen's just too damned big.

ChromeOS: the netbook done right
But why would I be running Ubuntu on a netbook? Why wouldn't I be running ChromeOS? The only reason I can think of is if the netbook was my only computer. But that would be pretty stupid for the kind of person who even considers Linux. Sure, I have the Linux kernel on my Android phone, my Android tablet, my ChromeOS netbook (sorry, "Chromebook"), and my big-ass make -j16 desktop. But there's only one of those devices I'd consider using a Linux distro or desktop on, and honestly that's only for lack of an alternative.

I was hugely skeptical of ChromeOS until I acquired a Cr-48 and started using it. It's replaced my MacBook Pro at work. It hasn't replaced any of my Android devices, nor my work or home desktops, but that's fine and hardly unexpected. An Android-based netbook might be an interesting idea, but it would represent a different trade-off. For example, ChromeOS' multi-account model is its multi-user model. Pro: you can safely let friends or strangers log in to your Chromebook. Con: if you personally have multiple accounts (one for work, one for talking to the wife, and one for talking to the mistress, say), it's awkward to switch between them because you have to actively log back in. Android doesn't have a multi-user model, but supports multiple accounts being logged in simultaneously. Pro: you don't have to log in and out. Con: you can't log in and out, so an Android device is something you no more want to hand out than you would your wallet.

This whole Ubuntu netbook mania just seems like a way to screw your real users with no realistic hope of gaining new users. Not happy ones, anyway. Sadly, it looks like we're going to have this stuff forced down our throats whether we like it or not; GNOME Shell looks to be pretty much the same.

A work-around
As a work-around until you install something less lossy, here's how to go back to the pre-11.04 desktop. Click the "power off" button to get to "System Settings". Why wasn't I able to find that myself? I must be stupid, not trying the "power off" button!