HashMap type safety leaves something to be desired

I hate puzzles, so I'll tell you straight out that though you might think this code shouldn't compile, it does, and prints "null" twice:

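    import java.awt.Color;
    import java.util.HashMap;

    public class test {
        public static void main(String[] args) {
            HashMap<Color, String> m = new HashMap<Color, String>();
            m.put(Color.RED, "red");
            // A String is not a Color, yet this compiles and prints "null".
            System.out.println(m.get("red"));

            HashMap<Long, String> n = new HashMap<Long, String>();
            n.put(100L, "red");
            // The literal 100 is boxed to an Integer, not a Long: "null" again.
            System.out.println(n.get(100));
        }
    }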

I've actually seen someone make this mistake, and then ask me to explain why in hell the Java compiler let them pass an object that wasn't of type K to HashMap<K, V>.get. Wasn't protection from exactly this kind of mistake, they asked, the very reason they rewrote their code to use generics?

The reason this compiles is that although put has the signature you'd expect, get, containsKey, containsValue, and remove don't. They all take parameters of type Object.
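To see how far this goes, here's a minimal sketch (the class and method names are mine, purely for illustration) that pokes a Map<Long, String> with arguments of the wrong type. Only the put is rejected; the other four calls compile and quietly report "not found":

```java
import java.util.HashMap;
import java.util.Map;

class LooseLookups {
    // Probes a Map<Long, String> with arguments of the wrong type and
    // reports what each loosely typed method returns.
    static String probe() {
        Map<Long, String> m = new HashMap<Long, String>();
        m.put(100L, "red");
        // m.put("100", "red");  // rejected by the compiler: put takes a K

        String key = "100";      // a String, not a Long
        return m.get(key) + " "
             + m.containsKey(key) + " "
             + m.containsValue(100L) + " "  // a Long, but the values are Strings
             + m.remove(key) + " "
             + m.size();
    }

    public static void main(String[] args) {
        System.out.println(probe());  // prints "null false false null 1"
    }
}
```

The map still has its one entry at the end, so nothing was damaged. Nothing was caught, either.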

It's the usual reason something's broken: backwards compatibility. Neal Gafter brushes it off thus:

The reason the argument type is Object and not K is that existing code depends on the fact that passing the "wrong" key type is allowed, causes no error, and simply results in the key not being found in the map. This is no worse than "accidentally" trying to get() with a key of the right type but the wrong value. Since none of these methods place the key into the map, it is entirely typesafe to use Object as the method's parameter.

Backwards compatibility is all well and good, but I can hear Stroustrup wondering why we have to pay for things we don't use. I keep my code up-to-date, and yet I have to suffer this breakage for the benefit of the guy who lost his source in 1997? Why don't my "-source 1.5 -target 1.5" arguments to the compiler let me have libraries without bugs or infelicities from years ago?

The "no worse" bit is particularly cheeky, slippery language-lawyer talk, since the two classes of error are not comparable. Looking up a non-existent key isn't even necessarily an error, whereas passing a key of the wrong type is something the compiler should catch for us; no C++ compiler would accept incorrect code like the above. Gafter implies that the sole purpose of generics, and the only guarantee from "type safety", is that you won't have a malformed collection. Which is a good thing, and better than nothing, but disregards the concerns of the users (rather than the authors) of the collections classes.

It's especially cheeky when you consider the second example, where our arch-enemy autoboxing lends a hand. You might think the exact example (autoboxing choosing the wrong type for the literal) is unlikely, and I'd be inclined to agree (though I wouldn't agree that that's an excuse for the compiler to miss it), but I've seen someone make a similar mistake, where they started with an index into an indexed collection and should have got the corresponding element and looked the element up in a hash, but accidentally tried to look up the index in the hash instead. Nonsense code of exactly the kind that generics and static typing are supposed to protect you from, and not so much as a run-time error; just wrong behavior.
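The index-for-element slip looks something like the following sketch. The names (NAMES, ROLES, roleIntended, roleMistaken) are invented for illustration and aren't the code I actually saw, but the shape of the mistake is the same:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexNotElement {
    static final List<String> NAMES = Arrays.asList("alice", "bob");
    static final Map<String, String> ROLES = new HashMap<String, String>();
    static {
        ROLES.put("alice", "admin");
        ROLES.put("bob", "guest");
    }

    // Intended: fetch the element at index i, then look that element up.
    static String roleIntended(int i) {
        return ROLES.get(NAMES.get(i));
    }

    // Mistake: looks up the index itself. The int is boxed to an Integer,
    // which get(Object) accepts, so this compiles and returns null.
    static String roleMistaken(int i) {
        return ROLES.get(i);
    }

    public static void main(String[] args) {
        System.out.println(roleIntended(0));  // prints "admin"
        System.out.println(roleMistaken(0));  // prints "null"
    }
}
```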

Update: Neal Gafter responds:

We would love to have made Map.get and remove take K instead of Object, and we tried that. Unfortunately, it broke lots of genuine, correct, production customer code which depends on the existing behavior, which was after all specified in the interface before it was generified.

He also provides one example of the kind of code that was already out there:

One kind of example goes something like this: You have a system in which you handle lots of objects of type Foo. There is some particular subtype of Foo called Bar that you sometimes cache in a collection. Sometimes some of your Foo objects become "out of date" and have to be flushed from any caches. So you have a

Collection<Bar> barCache;
Collection<Foo> outOfDateThings;

and you just

for (Foo foo : outOfDateThings) barCache.remove(foo);

(Ignoring the fact that I used the new looping construct) This kind of code was clearly correct and typesafe before the collection classes were generified. The code used the collection classes in ways that only depended on specified behavior.
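For anyone who wants to try it, a runnable version of that example might look like this. Foo and Bar are empty placeholder classes, and the flush method and its return value are my additions so there's something to print:

```java
import java.util.ArrayList;
import java.util.Collection;

class CacheFlush {
    static class Foo {}
    static class Bar extends Foo {}

    // Removes every out-of-date Foo from the Bar cache and returns how many
    // entries are left. remove(Object) accepts a Foo even though the cache
    // is declared as a Collection<Bar>.
    static int flush(Collection<Bar> barCache, Collection<Foo> outOfDateThings) {
        for (Foo foo : outOfDateThings) barCache.remove(foo);
        return barCache.size();
    }

    public static void main(String[] args) {
        Bar stale = new Bar();
        Bar fresh = new Bar();
        Collection<Bar> barCache = new ArrayList<Bar>();
        barCache.add(stale);
        barCache.add(fresh);

        Collection<Foo> outOfDateThings = new ArrayList<Foo>();
        outOfDateThings.add(stale);      // a Bar, held as a Foo
        outOfDateThings.add(new Foo());  // a plain Foo, never cached

        System.out.println(flush(barCache, outOfDateThings));  // prints "1"
    }
}
```

With a remove that took a Bar, the loop body would need a cast or an instanceof test before it compiled.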

You have to decide for yourself if such examples are convincing.

His most interesting comment, though, was this:

By the way, what you lose in "checking" you gain in flexibility; folks using the (existing) relaxed Map API can actually do useful things that the (hypothetical) stricter version doesn't allow.

Which has me shaking my head again, and Stroustrup wondering why we're all paying for something even when we don't need it. I still think this is an odd way to look at it, because it's the kind of thing you usually hear from the dynamic typing crowd. The word 'checking' in quotes? Argument by appeal to flexibility?

Ricky Clarkson and Thomas Hawtin both think it's an important decision because it makes wildcards more useful than they would otherwise be. Even if there were no legacy code, they want code like



    List<? super Integer>.contains(5)

to work. Hawtin imagines an alternate universe where we could have our cake and eat it, and say something like this:

    public V get (? super K key) { // not valid Java
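Back in the universe we actually live in, the wildcard case they have in mind can be sketched like this (hasFive and the sample lists are made up for illustration). Because contains takes Object, the call compiles against a List<? super Integer> whatever the list's real element type turns out to be:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WildcardContains {
    // Accepts a list of Integer or of any supertype of Integer.
    static boolean hasFive(List<? super Integer> list) {
        return list.contains(5);  // contains takes Object, so this compiles
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<Number>(Arrays.<Number>asList(1, 5, 2.5));
        List<Object> objects = new ArrayList<Object>(Arrays.<Object>asList("five", 5L));
        System.out.println(hasFive(numbers));  // prints "true"
        System.out.println(hasFive(objects));  // prints "false": 5L is a Long
    }
}
```

Note that the second call is the autoboxing trap again: the list holds a Long 5, the argument is an Integer 5, and the answer is a silent "false".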

Given the number of times I've talked about C++ above, it's worth mentioning Stroustrup and Dos Reis' Specifying C++ Concepts, which talks about the authors' attempts to describe "concepts" to the computer. (In the C++ world, "concept" is the name for a set of type requirements. At the moment these are described in English, in documentation, for human consumption only. You'll get a compile-time error if a template instantiation leads to invalid code, perhaps because a type you used doesn't support a required operation. Other times you'll just get run-time errors, similar to the problems you can get in Java from bad equals or hashCode implementations.)

My real concern for Java, though, as I said above, is about Java's future when there's this strong promise of backwards compatibility but no corresponding mechanism for fixing old decisions. It would be a shame to have to move house just because the toilet won't flush.

On a lighter note, I was caught out recently myself, too. As I typed something analogous to the following, I thought my editor's code coloring was broken:

    public class C {
        public static void main(String[] args) {
            /* System.err.println("*/"); */
        }
    }

The compiler confirmed, though, that there was an unterminated string. The temptation is to say "stupid compiler!", because there "obviously" isn't: anyone can see that there's a valid string literal in there that just happens to contain the characters that terminate Java block comments.

Neal Gafter recently wrote, in the context of a proposal to add closures to Java, about "Gilad's insistence that we comply with Tennent's Correspondence and Abstraction Principles (described in detail in the now out-of-print book Principles of Programming Languages by R.D.Tennent)". [Embarrassing that a great book you remember from university should now be out of print, even if it was a decade old in your day.] So here, for example, you'd want to be able to block-comment or uncomment any piece of code without changing its well-formedness.

Unfortunately, the compiler doesn't magically know where a comment ends. So it has no choice but to just munch through characters until it next sees "*/", causing it to read the body of main as a comment followed by the unterminated string literal "); */ (Java string literals can't contain unescaped newlines, so the lexical analyzer recognizes the problem at that point). Commenting isn't really an abstraction mechanism, but you can see how this might upset Tennent. There's no good solution, though, because block comments are often required precisely because the code they contain isn't well-formed. So we can't expect the compiler to parse the content of block comments looking for string literals.
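Here's a toy version of that munching, assuming nothing cleverer than a search for the next "*/". The class and method are hypothetical and certainly not javac's actual code, but they split the offending line at the same place:

```java
class CommentScan {
    // Returns the index just past the first "*/" that follows the opening
    // "/*" at position start, or -1 if the comment is never closed. Like the
    // real lexer, it pays no attention to quotes inside the comment.
    static int endOfBlockComment(String source, int start) {
        int close = source.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }

    public static void main(String[] args) {
        String line = "/* System.err.println(\"*/\"); */";
        int end = endOfBlockComment(line, 0);
        System.out.println(line.substring(0, end));  // prints: /* System.err.println("*/
        System.out.println(line.substring(end));     // prints: "); */
    }
}
```

The second line of output is what the compiler is left holding once the comment is over: a quote, some punctuation, and no closing quote before the end of the line.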

There's a work-around, of course, that gives you code that works with or without the block comment, but this kind of intrusion is always unwelcome, even when there's no solution (as far as I'm aware) to the underlying problem:

    public class C {
        public static void main(String[] args) {
            /* System.err.println("*" + "/"); */
        }
    }