2004-07-31

A Java implementation of Ruby's gsub

In Ruby, gsub is like Java's String.replaceAll. Ruby has first-class blocks, though, and when you have those, your programming style changes. Kent Beck's exhortation to "do the simplest thing that could possibly work" sounds different when you have first-class blocks. The simplest thing has a habit of being the most general, and it's only when you get tired of some idiomatic usage that you come back and write yourself a convenience routine.

Because of this, Ruby's gsub is a lot more general than Java's String.replaceAll. As an alternative to a replacement string (which is often all you need), Ruby lets you pass a block of code whose result will be the replacement. It's similar to, but nicer than, Perl's /e modifier on the s/// operator, and similar to, but a lot more concise than, the sample code in the JavaDoc for Matcher. It also makes for a much better implementation of the example code in the Java Almanac, a cut-and-paster's compendium of terrible code (an amusing exercise, if it weren't for copyright law, would be to take every example on that site and demonstrate how it should really be done).
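For comparison, here is roughly what the hand-written version of a computed replacement looks like when you drive Matcher yourself. This is a sketch of the general appendReplacement/appendTail idiom, not a copy of the JavaDoc's sample; the names ManualGsub and upcaseTokens are made up for illustration, and the pattern is borrowed from the Java Almanac example mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class ManualGsub {
    // Upper-cases every run of letters that is immediately followed by digits.
    static String upcaseTokens(CharSequence input) {
        Pattern pattern = Pattern.compile("([a-zA-Z]+[0-9]+)");
        Matcher matcher = pattern.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1).toUpperCase();
            // quoteReplacement stops '$' and '\' in the computed text
            // from being read as group references or escapes.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(upcaseTokens("ab12 cd efg34")); // prints "AB12 cd EFG34"
    }
}
```

All of that scaffolding exists to express one idea, "upper-case each match", which is the part a Ruby block would let you write on its own.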

Here's my current best attempt to come close to this idiom in Java, spoiled mainly by Java's lack of regular expression literals:

package e.util;

import java.util.regex.*;

/**
 * A rewriter does a global substitution in the strings passed to its
 * 'rewrite' method. It uses the pattern supplied to its constructor,
 * and is like 'String.replaceAll' except for the fact that its
 * replacement strings are generated by invoking a method you write,
 * rather than from another string.
 *
 * This class is supposed to be equivalent to Ruby's 'gsub' when given
 * a block. This is the nicest syntax I've managed to come up with in
 * Java so far. It's not too bad, and might actually be preferable if
 * you want to do the same rewriting to a number of strings in the same
 * method or class.
 *
 * See the example 'main' for a sample of how to use this class.
 *
 * @author Elliott Hughes
 * @author Roger Millington
 */
public abstract class Rewriter {
    private Pattern pattern;
    private Matcher matcher;

    /**
     * Constructs a rewriter using the given regular expression;
     * the syntax is the same as for 'Pattern.compile'.
     */
    public Rewriter(String regularExpression) {
        this.pattern = Pattern.compile(regularExpression);
    }

    /**
     * Returns the input subsequence captured by the given group
     * during the previous match operation.
     */
    public String group(int i) {
        return matcher.group(i);
    }

    /**
     * Overridden to compute a replacement for each match. Use
     * the method 'group' to access the captured groups.
     */
    public abstract String replacement();

    /**
     * Returns the result of rewriting 'original' by invoking the method
     * 'replacement' for each match of the regular expression supplied
     * to the constructor.
     */
    public String rewrite(CharSequence original) {
        return rewrite(original, new StringBuffer(original.length())).toString();
    }

    /**
     * Returns the result of appending the rewritten 'original' to
     * 'destination'. We have to use StringBuffer rather than the more
     * obvious and general Appendable because of Matcher's interface
     * (Sun bug 5066679). Most users will prefer the single-argument
     * rewrite, which supplies a temporary StringBuffer itself.
     *
     * Not thread-safe: the current Matcher is kept in a field so that
     * 'group' can reach it from 'replacement'.
     */
    public StringBuffer rewrite(CharSequence original, StringBuffer destination) {
        this.matcher = pattern.matcher(original);
        while (matcher.find()) {
            // Append the text before the match with an empty replacement,
            // then add the computed text verbatim so that any '$' or '\'
            // in it is not interpreted by appendReplacement.
            matcher.appendReplacement(destination, "");
            destination.append(replacement());
        }
        matcher.appendTail(destination);
        return destination;
    }

    public static void main(String[] arguments) {
        // Rewrite an ancient unit of length in SI units.
        String result = new Rewriter("([0-9]+(\\.[0-9]+)?)[- ]?(inch(es)?)") {
            public String replacement() {
                float inches = Float.parseFloat(group(1));
                return Float.toString(2.54f * inches) + " cm";
            }
        }.rewrite("a 17 inch display");
        System.out.println(result);

        // The "Searching and Replacing with Non-Constant Values Using a
        // Regular Expression" example from the Java Almanac.
        result = new Rewriter("([a-zA-Z]+[0-9]+)") {
            public String replacement() {
                return group(1).toUpperCase();
            }
        }.rewrite("ab12 cd efg34");
        System.out.println(result);

        // Rewrite US cents as whole dollars (integer division truncates).
        // The literal '$' is safe because 'rewrite' appends it verbatim.
        result = new Rewriter("([0-9]+) US cents") {
            public String replacement() {
                long dollars = Long.parseLong(group(1))/100;
                return "$" + dollars;
            }
        }.rewrite("5000 US cents");
        System.out.println(result);

        // Rewrite durations in milliseconds in ISO 8601 format.
        Rewriter rewriter = new Rewriter("(\\d+)\\s*ms") {
            public String replacement() {
                long milliseconds = Long.parseLong(group(1));
                return TimeUtilities.msToIsoString(milliseconds);
            }
        };
        result = rewriter.rewrite("232341243 ms");
        System.out.println(result);

        for (String argument : arguments) {
            System.out.println(rewriter.rewrite(argument));
        }
    }
}
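
One detail of 'rewrite' worth spelling out is why it calls appendReplacement with an empty string and then appends the computed text itself. Matcher.appendReplacement treats '$' and '\' in its replacement argument specially, so a computed replacement such as "$50" from the cents example would be read as a reference to a group that doesn't exist and fail with an exception. The sketch below contrasts the two approaches; the class and method names (DollarSignDemo, interpreted, literal) are invented for this illustration and are not part of Rewriter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class DollarSignDemo {
    private static final Pattern CENTS = Pattern.compile("([0-9]+) US cents");

    // Hands the text to appendReplacement, which parses "$n" as group n.
    static String interpreted(String input, String replacement) {
        Matcher matcher = CENTS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(out, replacement);
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // The approach 'rewrite' takes: empty replacement, then a verbatim append.
    static String literal(String input, String replacement) {
        Matcher matcher = CENTS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(out, "");
            out.append(replacement);
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(interpreted("5000 US cents", "$1")); // prints "5000"
        System.out.println(literal("5000 US cents", "$1"));     // prints "$1"
        System.out.println(literal("5000 US cents", "$50"));    // prints "$50"
    }
}
```

Wrapping the computed text in Matcher.quoteReplacement would be another way to get the literal behaviour; appending directly just avoids building the escaped string.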