2012-08-31

How to read Dalvik SIGQUIT output

If you're a long-time Java developer you're probably used to sending SIGQUIT to a Java process (either via kill -3 or hitting ctrl-\) to see what all the threads are doing. You can do the same with Dalvik (via adb shell kill -3), and if you're ANRing the system server will be sending you SIGQUIT too, in which case the output will end up in /data/anr/traces.txt (see the logcat output for details).

Anyway, I've found that very few people actually know what all the output means. I only knew a few of the more important bits until I became a Dalvik maintainer. This post should hopefully clear things up a little.

To start with, here's an example from my JNI Local Reference Changes in ICS post:

    "Thread-10" prio=5 tid=8 NATIVE
      | group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248
      | sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304
      | schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8
      at MyClass.printString(Native Method)
      at MyClass$1.run(MyClass.java:15)

Ignore the Java stack trace for now. If there's demand, I'll come back to that later, but there's nothing interesting in this particular example. Let's go through the other lines...

First, though, a quick note on terminology because there are a lot of different meanings of "thread" that you'll have to keep clear. If I say Thread, I mean java.lang.Thread. If I say pthread, I mean the C library's abstraction of a native thread. If I say native thread, I mean something created by the kernel in response to a clone(2) system call. If I say Thread*, I mean the C struct in the VM that holds all these things together. And if I say thread, I mean the abstract notion of a thread.

"Thread-10" prio=5 tid=8 NATIVE

The thread name comes first, in quotes. If you gave a name to a Thread constructor, that's what you'll see here. Otherwise there's a static int in Thread that's a monotonically increasing thread id, used solely for giving each thread a unique name. These thread ids are never reused in a given VM (though theoretically you could cause the int to wrap).

The thread priority comes next. This is the Thread notion of priority, corresponding to the getPriority and setPriority calls, and the MIN_PRIORITY, NORM_PRIORITY, and MAX_PRIORITY constants.

The thread's thin lock id comes next, labelled "tid". If you're familiar with Linux, this might confuse you; it's not the tid in the sense of the gettid(2) system call. This is an integer used by the VM's locking implementation. These ids come from a much smaller pool, so they're reused as threads come and go, and will typically be small integers.

The thread's state comes last. These states are similar to, but a superset of, the Thread thread states. They can also change from release to release. At the time of writing, Dalvik uses the following states (found in enum ThreadStatus in vm/Thread.h):

    /* these match up with JDWP values */
    THREAD_ZOMBIE       = 0,        /* TERMINATED */
    THREAD_RUNNING      = 1,        /* RUNNABLE or running now */
    THREAD_TIMED_WAIT   = 2,        /* TIMED_WAITING in Object.wait() */
    THREAD_MONITOR      = 3,        /* BLOCKED on a monitor */
    THREAD_WAIT         = 4,        /* WAITING in Object.wait() */
    /* non-JDWP states */
    THREAD_INITIALIZING = 5,        /* allocated, not yet running */
    THREAD_STARTING     = 6,        /* started, not yet on thread list */
    THREAD_NATIVE       = 7,        /* off in a JNI native method */
    THREAD_VMWAIT       = 8,        /* waiting on a VM resource */
    THREAD_SUSPENDED    = 9,        /* suspended, usually by GC or debugger */

You won't see ZOMBIE much; a thread is only in that state while it's being dismantled. RUNNING is something of a misnomer; the usual term is "runnable", because whether or not the thread is actually scheduled on a core right now is out of the VM's hands. TIMED_WAIT corresponds to an Object.wait(long, int) call. Note that Thread.sleep and Object.wait(long) are currently both implemented in terms of this. WAIT, by contrast, corresponds to a wait without a timeout, via Object.wait(). MONITOR means that the thread is blocked trying to synchronize on a monitor, Either because of a synchronized block or an invoke of a synchronized method (or theoretically, on a call to JNIEnv::MonitorEnter).

The INITIALIZING and STARTING states are aspects of the current (at the time of writing) implementation of the thread startup dance. As an app developer, you can probably just chunk these two as "too early to be running my code". NATIVE means that the thread is in a native method. VMWAIT means that the thread is blocked trying to acquire some resource that isn't visible to managed code, such as an internal lock (that is, a pthread_mutex). SUSPENDED means that the thread has been told to stop running and is waiting to be allowed to resume; as the comment says, typically as an app developer you'll see this because there's a GC in progress or a debugger is attached.

Not shown in this example, a daemon thread will also say "daemon" at the end of the first line.

| group="main" sCount=0 dsCount=0 obj=0xf5f77d60 self=0x9f8f248

The Thread's containing ThreadGroup name comes next, in quotes.

The sCount and dsCount integers relate to thread suspension. The suspension count is the number of outstanding requests for suspension for this thread; this is sCount. The number of those outstanding requests that came from the debugger is dsCount, recorded separately so that if a debugger detaches then sCount can be reset appropriately (since there may or may not have been outstanding non-debugger suspension requests, we can't just reset sCount to 0 if a debugger disconnects).

(If there's demand, I'll talk more about thread suspension in another post, including when suspension can occur, and what suspension means for unattached threads and threads executing native methods.)

The address of the Thread comes next, labeled obj.

The address of the Thread* comes next, labeled self. Neither of these addresses is likely to be useful to you unless you're attaching gdb(1) to a running dalvikvm process.

| sysTid=22299 nice=0 sched=0/0 cgrp=[n/a] handle=-256476304

The kernel's thread id comes next, labeled sysTid. You can use this if you're poking around in /proc/pid/task/tid. This is usually the only useful item on this line.

The kernel's nice value for the process comes next, labeled nice. This is as returned by the getpriority(2) system call.

The pthread scheduler policy and priority come next, labeled sched. This is as returned by the pthread_getschedparam(3) call.

The cgrp is the name of the thread's scheduler group, pulled from the appropriate cgroup file in /proc.

The pthread_t for the pthread corresponding to this thread comes next, labeled handle. This is not much use unless you're in gdb(1).

| schedstat=( 153358572 709218 48 ) utm=12 stm=4 core=8

The schedstat data is pulled from the per-process schedstat files in /proc. The format is documented in the Linux kernel tree (Documentation/scheduler/sched-stats.txt):

      1) time spent on the cpu
      2) time spent waiting on a runqueue
      3) # of timeslices run on this cpu
If your kernel does not support this, you'll see "schedstat=( 0 0 0 )".

The user-mode and kernel-mode jiffies come next, labeled utm and stm. These correspond to the utime and stime fields of the per-thread stat files in /proc. On sufficiently new versions of Dalvik, you'll also see something like "HZ=100", so you can double-check that jiffies are the length you expect. These numbers aren't much use in isolation, except for seeing which threads are taking all the CPU time (if any).

The cpu number of the core this thread was last executed on comes next, labeled core.