
When is a String not a String?


So I've been stumbling along writing this book and today I wrote some code where, rather than using good old Strings, I instead chose to use CharSequences. Why?

Well, for one thing, I wanted to be able to process both Strings and StringBuffers alike and, thanks to JDK 1.4 and JSR-51, that's possible as they both implement CharSequence. Very nifty indeed. More importantly though, I wanted to be able to process data from files as well as in-memory.
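To make that concrete, here's a minimal sketch of a method written against CharSequence rather than String. The class and method names (VowelCounter, countVowels) are made up purely for illustration.

```java
final class VowelCounter {
    // Accepts any CharSequence: String, StringBuffer, CharBuffer, and so on.
    static int countVowels(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if ("aeiouAEIOU".indexOf(text.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("hello"));                   // a String
        System.out.println(countVowels(new StringBuffer("world"))); // a StringBuffer
    }
}
```

The same method works unchanged for either argument because it only ever calls length() and charAt().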

Now, admittedly, I could have written my code to process Readers or InputStreams instead and used a StringReader, etc. But really, when it came down to it, the algorithm I was writing suited strings and characters, not bytes and buffers and all that hoo-ha, and I wanted to keep it that way.

So, I thought to myself, I would do the opposite. I would write a CharSequence that wrapped a Reader. That shouldn't be too difficult now, should it? But then another thought occurred to me. Before I leap into writing a whole bunch of code that I'd need to test and *gasp* maintain, maybe someone else had already done the hard work for me; maybe even the JDK!?

A quick flick of the wrist and IntelliJ brought up a class hierarchy for me and what do you know? Along with String and StringBuffer was the little-used CharBuffer from the NIO package. Another little present courtesy of the JSR-51 group.

A CharBuffer facilitates, among other things, memory-mapped I/O, meaning you can address a file on disk as if it were a contiguous array of characters in memory. And, because it conveniently implements CharSequence, it can be passed into my code just like any of the purely in-memory implementations.
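For the curious, one way this might look is sketched below. It assumes the file holds big-endian UTF-16 text with no byte-order mark, since asCharBuffer() simply views each pair of bytes as one char. It also uses the java.nio.file API, which arrived well after JDK 1.4. The MappedChars name is hypothetical.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedChars {
    // Maps the whole file read-only and views its bytes as chars.
    // Assumption: the file is UTF-16BE without a byte-order mark.
    static CharBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping, once established, does not depend on the channel staying open.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
                          .asCharBuffer();
        }
    }
}
```

For any other encoding you would have to decode the bytes first, which copies them into memory and rather defeats the purpose.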

The other thing I like about CharSequence is that it's an interface. This makes it possible to decorate them and gather performance statistics really easily. Yes, yes, all you AOP weenies sit down and stop waving your arms about. I know your new (ok not so new anymore) fandangled whizzbang toy can do that too. But I don't have to learn a new language or tools to do it. One day perhaps but not this one in particular. Believe me, my paradigm has been shifted quite enough and I'm giving it a rest for a while ;-)
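As a rough sketch of that kind of decorator, the hypothetical CountingCharSequence below wraps any CharSequence and counts calls to charAt(). What you actually measure is up to you; the call count is just an example.

```java
final class CountingCharSequence implements CharSequence {
    private final CharSequence delegate;
    private int charAtCalls;

    CountingCharSequence(CharSequence delegate) {
        this.delegate = delegate;
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public char charAt(int index) {
        charAtCalls++; // record the access, then pass it straight through
        return delegate.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Note: accesses made through the returned subsequence are not counted.
        return delegate.subSequence(start, end);
    }

    @Override
    public String toString() {
        return delegate.toString();
    }

    int charAtCalls() {
        return charAtCalls;
    }
}
```

Wrap the input before handing it to the algorithm, then read charAtCalls() afterwards.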

The only thing missing is that, surprisingly, the Character class doesn't implement CharSequence. I'm not sure how useful that really would be but I like the idea, if only for completeness' sake. It certainly wouldn't be hard.
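Just to show how little is involved, here's a sketch of a one-character CharSequence. SingleCharSequence is a made-up name, not anything in the JDK.

```java
final class SingleCharSequence implements CharSequence {
    private final char value;

    SingleCharSequence(char value) {
        this.value = value;
    }

    @Override
    public int length() {
        return 1;
    }

    @Override
    public char charAt(int index) {
        if (index != 0) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return value;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > 1 || start > end) {
            throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
        }
        // Either the empty sequence (0,0 or 1,1) or the whole thing (0,1).
        return start == end ? "" : this;
    }

    @Override
    public String toString() {
        return String.valueOf(value);
    }
}
```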

Anyway, I think I'll be looking for opportunities to use my new best friend where I would usually have used a String. The only caveat is that a String is guaranteed to be immutable while a CharSequence isn't. Oh and you can't be guaranteed that equals will work between them either. No matter, I can still see the glint of those shiny golden nails from here :)
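The equals caveat is easy to trip over, so here's a small sketch of comparing by content instead. The ContentCompare helper is hypothetical; String.contentEquals(CharSequence), available from Java 5, does the same job when one side is already a String.

```java
final class ContentCompare {
    // Compares two sequences character by character, whatever their concrete types.
    static boolean sameContent(CharSequence a, CharSequence b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence text = "abc";
        CharSequence buffer = new StringBuffer("abc");
        System.out.println(text.equals(buffer));       // false: String.equals only matches Strings
        System.out.println(sameContent(text, buffer)); // true
    }
}
```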

Comments

thanks for that Simon ! great to know.

Very nifty, now that I think of it. More and more often we are finding ourselves dealing with newfangled XML in these modern timeses, and I am often at a loss to decide how to take input as XML without knowing all the usages of a class up front.

CharSequence, you may become my new best friend.

One thing that does worry me though is that, as you pointed out, CharBuffer can work with memory-mapped I/O.

CharSequence specifies toString() which will then push it all out as a dirty old immutable string.

Lazy or tired people (such as myself) might just go "bugger it" and do a "toString" just to get by, and watch the lights dim as the file is memory mapped and then put into a dirty big String for later garbage collection, which is partly what you wanted to avoid in the first place by not passing dirty big strings around.

Michael,

Interesting point. Maybe you could decorate the CharSequence to barf with an UnsupportedOperationException if toString() is called? Then pass it around to whatever methods/classes you like and be comfortable in the knowledge that you'll find out pretty quickly if it's being abused :)

Cheers,

Simon
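A minimal sketch of the decorator suggested above, assuming you also want subsequences to stay guarded. NoToStringCharSequence is a hypothetical name.

```java
final class NoToStringCharSequence implements CharSequence {
    private final CharSequence delegate;

    NoToStringCharSequence(CharSequence delegate) {
        this.delegate = delegate;
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public char charAt(int index) {
        return delegate.charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Wrap the result too, so the guard survives slicing.
        return new NoToStringCharSequence(delegate.subSequence(start, end));
    }

    @Override
    public String toString() {
        throw new UnsupportedOperationException(
                "toString() would copy the whole sequence into a String");
    }
}
```

Bear in mind that some library code calls toString() quite legitimately (string concatenation, logging, and Matcher.group() on a subsequence, for instance), so this is probably best kept as a test-time or debugging aid.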

Yes that would do the trick for sure.
I am probably worried about nothing, but CharSequence is a great idea.

If they had named it something like DynamicString or similar, perhaps it would have grabbed more headlines than it did.

Looks like your book will be interesting - if nothing else, flushing out some neat corners of the basic J2SE API that the rest of us are too busy to explore - will be very useful. There is so much in the basics, yet we seem to spend so much time looking for exotic ways (*cough* AOP *cough*) to solve problems that could be done in an easier fashion.

Honestly, the book isn't that interesting. I think it's a book that is well needed but the bits and pieces such as CharSequence are really incidental.

As for solving problems in an easier fashion, I'm with you 100%.

When I was writing an HTTP parser for NIO, I needed to implement CharSequence over a plain old ByteBuffer. The problem with CharBuffer is it requires 2-byte characters, which isn't appropriate for any sort of protocol parsing. Fortunately my ByteBufferCharSequence made parsing ridiculously easy - I could just scan through the ASCII input using regex.
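The class described above isn't shown, so what follows is only a guess at the general shape: a CharSequence view over a ByteBuffer that treats each byte as one character, which is fine for ASCII input. The names AsciiByteBufferCharSequence and requestTarget, and the request-line pattern, are all made up for this sketch.

```java
import java.nio.ByteBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AsciiByteBufferCharSequence implements CharSequence {
    private static final Pattern REQUEST_LINE =
            Pattern.compile("^(\\w+) (\\S+) HTTP/(\\d\\.\\d)");

    private final ByteBuffer bytes;
    private final int start; // absolute index of the first byte in view
    private final int end;   // absolute index one past the last byte in view

    AsciiByteBufferCharSequence(ByteBuffer bytes) {
        this(bytes, bytes.position(), bytes.limit());
    }

    private AsciiByteBufferCharSequence(ByteBuffer bytes, int start, int end) {
        this.bytes = bytes;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // One byte per char; the absolute get leaves the buffer's position alone.
        return (char) (bytes.get(start + index) & 0xFF);
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from: " + from + ", to: " + to);
        }
        return new AsciiByteBufferCharSequence(bytes, start + from, start + to);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    // Example use: pull the request target out of an HTTP request line, or null if none.
    static String requestTarget(ByteBuffer buffer) {
        Matcher m = REQUEST_LINE.matcher(new AsciiByteBufferCharSequence(buffer));
        return m.find() ? m.group(2) : null;
    }
}
```

The regex engine only needs length(), charAt() and subSequence(), so nothing is copied until a group is actually extracted.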

Internationalisation has not extended to text-based protocols (SMTP, HTTP, etc.) as yet!
(Thankfully, goode olde ASCII still reigns.)


Remember that you can always use
CharBuffer cb = Charset.forName("ASCII").newDecoder().decode(byteBuffer);
to decode the ByteBuffer using a specific encoding. Although, to do it properly, you'd need to pass through buffering parameters, and wrap it in a class...

Oh well.

On thinking, why weren't the sequence classes able to be implemented as chains of operations? Or am I missing something...
