2008-07-19

Review: "Chances Are: Adventures in Probability"

Statistics and probability have a hard life. They're often counter-intuitive (i.e. "hard"); neither their principles, their methods, nor their valid and invalid implications are well understood by the general public; they're often abused to lend an air of "science" to snake-oil salesmen and mistruth-peddlers who want to mislead more convincingly; and many of their truths are deeply unsatisfying to humans because they don't give us the simple, intelligible insight we crave.

Especially if you're young, I think, and especially if you can program a computer, this can be hard to take. Any "statistical" or "probabilistic" algorithm is naturally suspect, seen either as a cop-out or as just impractical, because you don't have the amount of data to work with that the Amazons, Googles, and Netflixes of the world do. As a kid, I personally grew up thinking that these "approximations" were just temporary stop-gaps until we really understood, really had the insight.

Now that I'm an old man, I accept that there are many situations in which statistics and probability offer us the best answers we can hope for, or the best answers we could want. I'll never know whether this book, had I read it as a teenager, might have offered me a shortcut between there and here, but I can say that it's an enjoyable read. I've never previously been taken through the chronological development of the field, never been shown the wrong turns that were made, and never understood just how recent so much of the field actually is. (I also never knew why "Student's t-test" had such a silly name. The actual reason is pretty silly in its own way, but probably nothing you'd guess.)

This book won't teach you statistics or probability, but it will give you a better appreciation of the field, its history, and its relevance. It's over-written at times, making too much of an effort to be literary to no great advantage, and with some resulting awkwardness, but this affects just a handful of sentences in a 300-page book. I also wonder how well the book works if you go in knowing nothing of the subject matter. As ever, there's a conflict between telling things in the order in which they happened and the order in which they're easiest to understand, and in this case there's the added dimension of the order that makes for the best story.

As it happens, the missus bought this book for herself. I don't think I'd have bothered to take a closer look in a bookshop, but since it was lying around at home there was nothing to lose, and I'm glad I took the time.

If you're too lazy to read a book but would like to be convinced of the importance of the field, try subscribing to Netflix. Marvel at their "viewers like you" ratings, which, once you rate enough movies, "know" you better than you know yourself, and are significantly more useful than the average rating of all users. The book doesn't talk about clustering algorithms, but it does touch on this kind of distinction. Netflix does this without being able to give you any insight beyond "you are not a beautiful and unique snowflake", and yet no-one could question its usefulness. Sometimes, kid, insight is neither necessary nor possible, and this can be a good thing, not just an unfortunate roadblock on the path to enlightenment. If you won't take my word for it, read the book.