The main tool I use for both is Vogar. Think of it as the "run stuff" side of an IDE, but for the command line: you can point it at a Java source file with a main method, at a particular class in a JAR file, at a source file containing JUnit tests, at a source file containing Caliper benchmarks, or a bunch of other stuff... Vogar will just do the right thing. By default, since it was written by Android developers for our own use, code will run on your attached Android device, but you can also use --mode jvm to run on a desktop JVM, or --mode host to run on a desktop dalvikvm (assuming you've built one). (We might want to make that default sensitive to whether you have the Android SDK on your path; the current default makes Vogar seem more Android-specific than it actually is.)
At its simplest, then, it's a replacement for javac test.java && java test that also works on Android (doing all the dexing and pushing and adb shelling behind the scenes). But it goes way beyond that, and all without requiring you to think. You just give it the path to the source, and it works out what to do.
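To make that concrete, here's the sort of single-file program I have in mind. The class name and its contents are made up for illustration, and nothing in it is Vogar-specific; it just prints the name of the VM it ran on, which is a handy way to confirm where your code actually executed.

```java
// HelloVogar.java: a hypothetical single-file program (the name is made up).
class HelloVogar {
    // Kept separate from main so the logic is easy to check on its own.
    static String greeting(String vmName) {
        return "hello from " + vmName;
    }

    public static void main(String[] args) {
        // java.vm.name is a standard system property; its value differs between VMs.
        System.out.println(greeting(System.getProperty("java.vm.name")));
    }
}
```

Done by hand on the desktop, that's javac HelloVogar.java && java HelloVogar. With Vogar you'd hand it the path to the source file instead, adding --mode jvm if you want a desktop JVM rather than your attached device.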
Caliper is the best way to write Java benchmarks. There are examples in the Caliper project itself, but what I want to mention here is the collection of benchmarks in the code.google.com Dalvik project. That's kind of a misleading name for the project; the Dalvik VM doesn't live there. (If that's what you're looking for, the Dalvik VM lives here instead.) The code.google.com project is a dumping ground for other source from the Dalvik and core libraries team. In particular, that's where our benchmarks live.
As you may have noticed, we rewrote the Designing for Performance documentation for Froyo. Previously it was a bunch of stuff that may have been true at some point, but had long ceased to bear any relationship to reality. In Froyo, every single claim in the document is backed by a benchmark to prove (or, in future, disprove) it. You can peruse the "Designing For Performance" benchmarks in your browser.
When we're not writing documentation, we also do a lot of performance work. To guide our efforts, and to be able to recognize whether we're making forward progress, we write Caliper benchmarks. You can see the currently checked-in regression benchmarks.
These aren't always completely up-to-date. I've spent the last couple of weeks rewriting our nio implementation, for example, and my benchmark is huge. But until Sunday it wasn't checked in. That said, because the code.google.com dalvik project is on no particular schedule, you'll likely have access to any given benchmark before you have access to the improvements it inspired/helped us develop.
I think this collection of benchmarks is interesting for anyone who's interested in Caliper (and anyone writing Java benchmarks – for any platform, not just Android – should be using Caliper), and interesting for Android developers to see what benchmarks we're using to improve the core libraries.
If you want to file a particularly convincing performance bug against the Android core libraries, you can't do much better than providing us with a Caliper benchmark we can point Vogar at.
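For a rough idea of the shape such a benchmark takes, here's a sketch. It assumes the Caliper convention of a method whose name starts with "time" and which takes a repetition count. To keep it self-contained I've left out the Caliper dependency entirely, so this is not a real Caliper benchmark: the class name and the thing being measured are made up, and the main method is only a crude stand-in for Caliper's runner, with none of its warmup or statistics.

```java
// Hypothetical example: compares two ways of building a 100-character string.
// NOT a real Caliper benchmark: it has no Caliper dependency, and main()
// is only a crude stand-in for Caliper's runner.
class StringBuildingSketch {
    static final int PIECES = 100;

    // Repeat the work `reps` times and return a value derived from it,
    // so the VM is less likely to discard the loop as dead code.
    static int timeStringBuilder(int reps) {
        int total = 0;
        for (int i = 0; i < reps; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < PIECES; j++) {
                sb.append('x');
            }
            total += sb.length();
        }
        return total;
    }

    // Same work, done with repeated string concatenation.
    static int timeStringConcat(int reps) {
        int total = 0;
        for (int i = 0; i < reps; i++) {
            String s = "";
            for (int j = 0; j < PIECES; j++) {
                s += 'x';
            }
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        int reps = 10000;
        long start = System.nanoTime();
        int a = timeStringBuilder(reps);
        long mid = System.nanoTime();
        int b = timeStringConcat(reps);
        long end = System.nanoTime();
        System.out.println("StringBuilder: " + (mid - start) / reps + " ns/rep (check " + a + ")");
        System.out.println("String +=: " + (end - mid) / reps + " ns/rep (check " + b + ")");
    }
}
```

The point of the shape is that the benchmark says only what to measure and leaves how many times to run it to the caller. In a real Caliper benchmark the runner picks the repetition count for you, which is a large part of why the numbers are more trustworthy than a hand-rolled timing loop like the one in main above.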