Tuesday, October 23, 2007

SandR, Java, Perl and Regular Expression

A couple of years ago, I wrote a small utility for searching and replacing text in files. At the time I was looking for such a tool and found none that suited my needs. So I wrote one and released it as open source software (OSS), so that others can freely download it, use it and, if needed, modify it. I called it SandR (pronounced sand-arr). It was never hugely popular; people downloaded it sparingly. As of today there have been 1,288 downloads in total. I would love to see that number go up, but it is satisfying to know that at least some people found it useful.

I released it as a pre-alpha version, which in the software business means, "Feel free to use it, but expect to see bugs and crashes." Not many bugs were reported in the last two years, so I decided to upgrade it to "Production/Stable" status. In the process I tweaked the code for a few minor enhancements. Today I requested a release.

The distinguishing feature of SandR is that it auto-detects file encoding. I used the Java port of Mozilla's character detection algorithm to detect the character encoding of the files. SandR also supports regular expressions for the search string, although some other similar OSS utilities provide regex support as well.
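For readers who have not used regex search and replace in Java, here is a minimal sketch of the idea using the standard java.util.regex package. It is not SandR's actual code; the class and method names (SearchReplaceSketch, replaceRegex, replaceLiteral) are made up for illustration, and it works on an in-memory string rather than on files.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the SandR source.
class SearchReplaceSketch {

    // Treat `search` as a regular expression; `replacement` may use $1, $2, ...
    static String replaceRegex(String text, String search, String replacement) {
        Pattern pattern = Pattern.compile(search);
        Matcher matcher = pattern.matcher(text);
        return matcher.replaceAll(replacement);
    }

    // Treat both `search` and `replacement` as plain text, with no special characters.
    static String replaceLiteral(String text, String search, String replacement) {
        Pattern pattern = Pattern.compile(Pattern.quote(search));
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Reorder a date using capture groups; prints 23/10/2007
        System.out.println(replaceRegex("2007-10-23", "(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1"));

        // In literal mode the dot matches only a dot; prints $x a+b
        System.out.println(replaceLiteral("a.b a+b", "a.b", "$x"));
    }
}
```

Keeping a separate literal mode matters in a tool like this, because characters such as `.` and `$` are common in ordinary text and would otherwise be interpreted as regex syntax.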


It's really very useful that Java now supports regex, or regular expressions. Previously regex was a power tool mostly for Perl programmers. GNU had a C library for regex, but it was really the forte of Perl. So when Java 1.4 started supporting regex, programmers welcomed it enthusiastically. However, as we delved deeper into it, we found some differences between Perl and Java regex, though nothing major. Anyone conversant in one will have absolutely no difficulty understanding and using the other.

But why? Why do there have to be two flavors of the same utility, however small the difference may be? Techies and programmers have been using regex for ages, and they have become very conversant with the Perl type. Then why, oh why, introduce a minor variation? This is so Microsofty. Sun can do better. I haven't tried Java 6 yet, since I do not use Java regularly in my day job, but I doubt Sun has changed the regex implementation. I don't know the plans for the upcoming Java 7 release. But let's request Sun to abolish whatever minor differences there are between the Java implementation of regex and its Perl counterpart. You can do it, Sun.
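To make the kind of difference I mean concrete, here is a small sketch. It relies on two things I am fairly sure of: the Java documentation for java.util.regex.Pattern lists Perl's conditional construct `(?(condition)X|Y)` among the constructs it does not support, and `String.matches()` requires the whole string to match, unlike Perl's `=~`, which finds a match anywhere. The class name RegexFlavorSketch and the helper compiles() are made up for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration of a few Perl vs. Java regex differences.
class RegexFlavorSketch {

    // Returns true if Java's regex engine accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Everyday Perl-style syntax carries over unchanged.
        System.out.println(compiles("(?i)\\bcolou?r\\b"));

        // A Perl conditional construct, which Java's Pattern does not support.
        System.out.println(compiles("(a)?(?(1)b|c)"));

        // matches() must cover the entire input, so this prints false.
        System.out.println("hello world".matches("world"));

        // find() behaves like Perl's =~ and looks for a match anywhere; prints true.
        System.out.println(Pattern.compile("world").matcher("hello world").find());
    }
}
```

None of this is hard to work around, which is exactly the point: the differences are small enough to be annoying rather than useful.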
