2009-01-21

<boost/algorithm/string.hpp>

Item 55 of Scott Meyers' (excellent) "Effective C++" is "Familiarize yourself with Boost". Good advice. The trouble is, those crafty Boost buggers keep adding new stuff.

In 2004, for example, they added <boost/algorithm/string.hpp> to Boost version 1.32. I didn't find out until this past weekend, when I needed several of the usual string utilities and didn't have any of the many implementations I've written before (or that were written for me by other C++ programmers at the various companies I've worked for).

If you've used C++ and Java, you'll know that C++'s std::basic_string is full of orthogonal but mostly useless methods (find_last_not_of, say) and missing most of the really useful stuff you actually want from your strings (no starts_with, say). The sort of string an academic might design.

Java's java.lang.String, on the other hand, is a big bag of random really useful stuff (startsWith, say) whose non-orthogonality can drive you mad (no startsWithIgnoreCase, say). The sort of string people who actually write programs would arrive at.

Boost being what it is, you can now have a big bag of orthogonal useful stuff.

So there's starts_with and ends_with, just like Java, but there's istarts_with and iends_with too.

And you don't just get contains, you get icontains too — something I wanted from Java just yesterday.

There are split functions too, including a regular-expression variant. There's even a join in case you change your mind and want to put Humpty back together again.

There's a full family of nicely orthogonal trim functions. Not that I personally have ever needed anything more than Java's plain old trim, but it brings a gleam to my shiny icicle of a heart to see all the possible variants laid out before me with sane names, neatly arranged like German surgical instruments designed as a coherent set by someone with a plan. This is in stark contrast to the usual slowly-accreted collection of stains in some god-forsaken header file in a directory called something like "common" or "utils", squirted there by a thousand unthinking deadbeat parents who barely remember their progeny themselves.

Boost's case conversion stuff looks a bit dodgy, so you might end up with upset Turks and Azerbaijanis on your hands if you use it (or, by implication, any of the case-insensitive functions) without checking that out first, but otherwise what I used looked pretty sound. (And if you care about languages other than Seventh Edition New Jersey ASCII, you're probably writing something that should be in Java rather than C++.)

Being Boost, these are all templates too, so they'll work just fine with your home-grown basic_string-that-doesn't-own-the-memory-it-points-to, whatever you call it. (I still haven't found that class in Boost anywhere, which is surprising, because I keep tripping over them everywhere else. Google's variant is called StringPiece, which will have British readers rolling in the aisles. The wide-character variant is sadly not called STGaryGlitter.)

Anyway, I'll endeavor not to write my own C++ starts_with ever again, especially on Linux, where it's trivial to add a build dependency on Boost. And if you read this thinking "I knew that already", why didn't you tell me?