2004-08-04

When /./ should match '\n'

Every now and again, I come across a situation where I want to match any character, including '\n'. It's happened often enough that I now recognize as I type . that it isn't what I want. Usually, I end up solving my problem a different way.

The other day, though, I was having trouble processing a file. Henry Spencer's POSIX regular expressions library contains a file regcomp.c that has a C-style comment containing a brace. My code was getting confused by this, and it looked like the easiest solution was to strip out C-style comments, like I was already stripping out escaped characters, character and string literals, and C++-style comments.

C++ comments are easy, because you actually want anything but a newline. I could have written .* instead of using a character class here, but I felt that was less obvious than being explicit:

text = text.replaceAll("//[^\\n]*", "_"); // Remove C++ comments.
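To make the "anything but a newline" point concrete, here is a small sketch comparing the explicit character class with a plain .* on the same input. The class and method names (LineCommentDemo, stripWithClass, stripWithDot) are made up for illustration; only the first pattern comes from the line above.

```java
final class LineCommentDemo {
    // Explicit character class, as in the post: everything up to the next '\n'.
    static String stripWithClass(String text) {
        return text.replaceAll("//[^\\n]*", "_");
    }

    // The less explicit alternative: by default '.' does not match line terminators.
    static String stripWithDot(String text) {
        return text.replaceAll("//.*", "_");
    }

    public static void main(String[] args) {
        String source = "int a; // first\nint b; // second\n";
        // On '\n'-terminated input both versions leave the newlines in place.
        System.out.print(stripWithClass(source));
        System.out.print(stripWithDot(source));
    }
}
```

The two are not quite interchangeable: Java's . also stops at '\r' and a few other line terminators, so on a file with "\r\n" line endings the character class swallows the '\r' while .* leaves it alone.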

C comments are trickier, though, because they can span multiple lines. What you want is Java's Pattern.DOTALL flag, which makes . match line terminators too, but it would be a pain to switch from String.replaceAll just to use the flag. Luckily, you can use an embedded flag expression, (?s), which turns on DOTALL from that point in the pattern:

text = text.replaceAll("/\\*(?s).*?\\*/", "_"); // Remove C comments.
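For comparison, here is a rough sketch of the three variants side by side: the pattern without any flag, the embedded (?s) version from above, and the longer route through Pattern.compile with Pattern.DOTALL. The class name, method names, and sample input are invented for the example.

```java
import java.util.regex.Pattern;

final class BlockCommentDemo {
    // No flag: '.' cannot cross a line break, so multi-line comments survive.
    static String stripPlain(String text) {
        return text.replaceAll("/\\*.*?\\*/", "_");
    }

    // Embedded flag expression, as in the post.
    static String stripEmbedded(String text) {
        return text.replaceAll("/\\*(?s).*?\\*/", "_");
    }

    // The same thing spelled out with an explicit Pattern and the DOTALL flag.
    private static final Pattern C_COMMENT =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static String stripCompiled(String text) {
        return C_COMMENT.matcher(text).replaceAll("_");
    }

    public static void main(String[] args) {
        String source = "a /* one\n { two */ b /* three */ c";
        System.out.println(stripPlain(source));    // only the one-line comment goes
        System.out.println(stripEmbedded(source)); // both comments go
        System.out.println(stripCompiled(source)); // same as the embedded flag
    }
}
```

The reluctant .*? matters as much as the flag: with a greedy .* and DOTALL on, the match would run from the first /* to the last */ in the file.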

That's all there is to it. And I'm sure I've known this before. I'm hoping that writing it down will help me remember, but I'm not sure I'm not more likely to forget something after I've written it down. And Google isn't indexing me yet.