August 30, 2005

Attack of the Killer GPUs

After reading about the latest PlayStation, XBox, GameCube, etc., I was struck by how much raw processing power these machines have and how they manage to deliver such massively parallel computing at relatively low prices. For example, according to an article I read recently in Popular Science, the latest PlayStation sports nine dual-core processors, Rambus XDR RAM (apparently supporting data rates of 25.6 GB/sec) and a Rambus IO chip that supposedly moves data around at 76.8 GB/sec! All for what, a couple of hundred dollars?

Of course these machines sell in the millions and you pay through the nose for the games themselves but even still.

Then this morning, my brother forwarded me a link to an interesting article on Nvidia:

...in a recent contest to build the world's fastest database server, the winner was a university professor who ported SQL software to run on an Nvidia GPU...

I can't vouch for the facts of said article but if that quote is anything to go by, maybe all that wasted (oops I mean un-tapped) processing power might get some interesting use.

Pull Me Push Me Anyway You Want Me

I recently needed some code to parse text streams. Conceptually, the logic was pretty simple: break a stream into words and combine consecutive words into "phrases". So for example, the stream "have a nice day" might be broken into the phrases: "have", "have a", "a", "have a nice", "a nice", "nice", "a nice day", "nice day" and "day".

The problem is that I'm not going to own this code in the end, someone else is (permission was given to publish the code) and after numerous consulting gigs over the years, I've become very careful to avoid leaving behind any Alien Artifacts. So I produced two versions, each producing identical output but based on very different designs, in the hope that at least one of them would fly.

I'll start with my first approach which, IMHO, is pretty easy to understand--well, I wrote it so I guess it is for me anyway. It begins with an interface that accepts tokens as they are parsed (a Visitor of sorts):

public interface TokenVisitor {
    public void onToken(String token);
}

Classes that implement TokenVisitor will be notified anytime a token becomes available for processing.

Next, we want to create phrases from tokens and print them to the console, so we also need a class that takes the tokens it receives, creates phrases of between 1 and maximumPhraseSize tokens and sends them to another visitor (a Decorator):

import java.util.LinkedList;
import java.util.Queue;

public class PhraseBuilder implements TokenVisitor {
    private final Queue<StringBuilder> builders = new LinkedList<StringBuilder>();
    private final TokenVisitor output;
    private final int maximumPhraseSize;

    public PhraseBuilder(TokenVisitor output, int maximumPhraseSize) {
        assert output != null : "output can't be null";
        assert maximumPhraseSize > 0 : "maximumPhraseSize can't be < 1";

        this.output = output;
        this.maximumPhraseSize = maximumPhraseSize;
    }

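    public void onToken(String token) {
        assert token != null : "token can't be null";

        // Extend every phrase in progress with the new token and emit it.
        for (StringBuilder builder : this.builders) {
            builder.append(' ').append(token);
            this.output.onToken(builder.toString());
        }

        // The token on its own is also a (one-word) phrase.
        this.output.onToken(token);

        // Start a new phrase with this token and drop the oldest phrase
        // once it has reached maximumPhraseSize tokens.
        this.builders.add(new StringBuilder(token));
        if (this.builders.size() == this.maximumPhraseSize) {
            this.builders.remove();
        }
    }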
}

And finally, some sample usage:

    Lexer lexer = new Lexer(stream, new PhraseBuilder(new TokenVisitor() {
        public void onToken(String token) {
            System.out.println(token);
        }
    }, 3));

    lexer.run();

Here, an instance of the Lexer class (not shown for the sake of brevity) simply breaks stream into words (tokens) and calls TokenVisitor.onToken() passing each one in turn. The phrase builder acts as the first level visitor and passes its results on to the next level visitor, an anonymous inner class that simply prints each token to the console.
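
Since the Lexer isn't shown, here is a rough idea of what a push-style one might look like. This is only a sketch under assumptions: it takes stream to be a java.io.Reader and treats any run of whitespace as a word separator, neither of which the original code confirms. TokenVisitor is repeated (without the public modifier) so the snippet compiles on its own.

```java
import java.io.Reader;
import java.util.Scanner;

// Repeated from above so that this sketch is self-contained.
interface TokenVisitor {
    void onToken(String token);
}

// Hypothetical stand-in for the Lexer that isn't shown in the post.
// Assumption: words are separated by whitespace and nothing else.
final class Lexer {
    private final Reader stream;
    private final TokenVisitor output;

    Lexer(Reader stream, TokenVisitor output) {
        if (stream == null) {
            throw new IllegalArgumentException("stream can't be null");
        }
        if (output == null) {
            throw new IllegalArgumentException("output can't be null");
        }
        this.stream = stream;
        this.output = output;
    }

    // Reads the whole stream, pushing each word to the visitor as it is found.
    void run() {
        Scanner scanner = new Scanner(this.stream);
        while (scanner.hasNext()) {
            this.output.onToken(scanner.next());
        }
    }
}
```

Note that the lexer keeps control for the entire run: the caller hands over a visitor, calls run() and only gets control back once the stream is exhausted.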

For many people it seems, this push style of processing is not only unfamiliar, but downright peculiar. When I show this style of code to developers--especially junior developers and non-technical people--they find it hard to grasp. Not so much the logic in PhraseBuilder but what stumps many people is the "complexity" of the overall "pattern" and in particular the usage. For many it seems, this approach is all a bit "backwards".

So for comparison, here's an example of a more conventional pull mechanism, starting with the interface:

public interface TokenStream {
    public String nextToken();
}

This time we have an interface which we can call to get (pull) the next token rather than be notified (push) as in the previous example. The method nextToken() returns null to signify the end of the stream--no more tokens.

Next up, the phrase builder. Again, we'll implement the interface--to allow chaining--but this time we're relying on a pull rather than push mechanism to get tokens:

import java.util.LinkedList;
import java.util.Queue;

public class PhraseBuilder implements TokenStream {
    private final Queue<StringBuilder> builders = new LinkedList<StringBuilder>();
    private final Queue<String> phrases = new LinkedList<String>();
    private final TokenStream input;
    private final int maximumPhraseSize;

    public PhraseBuilder(TokenStream input, int maximumPhraseSize) {
        assert input != null : "input can't be null";
        assert maximumPhraseSize > 0 : "maximumPhraseSize can't be < 1";
        
        this.input = input;
        this.maximumPhraseSize = maximumPhraseSize;
    }

    public String nextToken() {
        return this.hasNextToken() ? this.phrases.remove() : null;
    }

    private boolean hasNextToken() {
        if (this.phrases.isEmpty()) {
            makePhrasesWithToken(this.input.nextToken());
        }

        return !this.phrases.isEmpty();
    }

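    private void makePhrasesWithToken(String token) {
        if (token != null) {
            // An empty builder becomes the one-word phrase for this token.
            this.builders.add(new StringBuilder());

            // Extend every phrase in progress and queue the result for nextToken().
            for (StringBuilder builder : this.builders) {
                if (builder.length() > 0) {
                    builder.append(' ');
                }
                builder.append(token);

                this.phrases.add(builder.toString());
            }

            // Drop the oldest phrase once it has reached maximumPhraseSize tokens.
            if (this.builders.size() == this.maximumPhraseSize) {
                this.builders.remove();
            }
        }
    }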
}

Holy schmokes! That's a whole lotta code with multiple queues and extra private methods. Surely it can't be that complicated?

Ok, so how about some sample usage:

    PhraseBuilder builder = new PhraseBuilder(new Lexer(stream), 3);
    String token;
    while ((token = builder.nextToken()) != null) {
        System.out.println(token);
    }

Sheesh! That's pretty simple. Far simpler than the code in the first example and pretty obvious what it's doing really and when I show this kind of code to people, they tend to respond with "Oh, I see. That makes sense."
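
For completeness, a pull-style Lexer could be sketched along the following lines. Again this is an assumption rather than the original class: stream is taken to be a java.io.Reader and words are split on whitespace only. TokenStream is repeated (without the public modifier) so the snippet stands alone.

```java
import java.io.Reader;
import java.util.Scanner;

// Repeated from above so that this sketch is self-contained.
interface TokenStream {
    String nextToken();
}

// Hypothetical pull-style Lexer; the real one isn't shown in the post.
// Assumption: words are separated by whitespace and nothing else.
final class Lexer implements TokenStream {
    private final Scanner scanner;

    Lexer(Reader stream) {
        if (stream == null) {
            throw new IllegalArgumentException("stream can't be null");
        }
        this.scanner = new Scanner(stream);
    }

    // Returns the next word, or null once the stream is exhausted.
    public String nextToken() {
        return this.scanner.hasNext() ? this.scanner.next() : null;
    }
}
```

Here the position in the input lives inside the Scanner between calls, so the caller decides when (and whether) to ask for the next word.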

Considering the relative complexity of the second phrase builder to the first, I find this all somewhat odd: the original example took me about ten minutes to code up and test; the second probably around twenty. At first I tried to do it from scratch but I gave up in the end and resorted to a brute-force conversion approach not dissimilar to that required to convert a recursive algorithm to an iterative one.

In fact, push versus pull is very similar to recursive versus iterative: people tend to have the same comprehension difficulties with recursion as they do with a push-style calling mechanism. It's a strange thing really because although conceptually simpler, an iterative approach can often be much harder to implement than a recursive one; similarly, a pull-mechanism can be harder to implement than a push-mechanism.
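
To make the recursive-versus-iterative comparison concrete, here is a small illustration that is not from the original code: an in-order walk over a binary tree, written both ways. Node and InOrder are made-up names for the sake of the example. The recursive version lets the call stack remember where it is; the iterative version has to carry that state around itself, much as the pull-style PhraseBuilder needed its extra queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical binary tree node, for illustration only.
final class Node {
    final String value;
    final Node left;
    final Node right;

    Node(String value, Node left, Node right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }
}

final class InOrder {
    // Recursive: the call stack keeps track of the nodes still to be finished.
    static void recursive(Node node, List<String> out) {
        if (node == null) {
            return;
        }
        recursive(node.left, out);
        out.add(node.value);
        recursive(node.right, out);
    }

    // Iterative: the same walk, with the pending nodes held in an explicit stack.
    static List<String> iterative(Node root) {
        List<String> out = new ArrayList<String>();
        Deque<Node> pending = new ArrayDeque<Node>();
        Node current = root;
        while (current != null || !pending.isEmpty()) {
            // Descend as far left as possible, remembering the way back.
            while (current != null) {
                pending.push(current);
                current = current.left;
            }
            current = pending.pop();
            out.add(current.value);
            current = current.right;
        }
        return out;
    }
}
```

Both methods visit the nodes in the same order; the difference is purely in who has to look after the bookkeeping.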

It should be obvious by now that I have a preference for recursion and push-style processing. For one thing it removes lots of getXxx() methods which is no doubt why I like closures in languages such as Smalltalk, Ruby, Groovy, JavaScript, etc. I also find it forces me to create lots of little classes that do one thing and do it well.

That said, I can also see why (and under what circumstances) pull is more attractive: it's usually easier to manage flow-control than with push. Often the difference between the two comes down to where state is being maintained: In the case of pull, state is maintained inside the stream; for push, state is maintained inside the parser. No doubt why many people have switched from using push-parsing to pull-parsing for XML.

Updated (1 September 2005): Thanks to Kris for pointing out the typos in my code. I had copied the examples and made some on-the-fly modifications to reduce their size but it seems I missed a few things. Oops.

August 25, 2005

Godwin's Law of Java

While reading A case against Annotations, I couldn't help but laugh-out-loud at this particular response:

RoR is quickly becoming the Godwin's Law of Java language related discussions:

"As an online Java discussion grows longer, the probability of a comparison involving Ruby or RoR approaches 1 (i.e. certainty)." -- Marc Stock

Some others off the top of my head:

  • Any editor and Emacs;
  • Any programming language and Smalltalk;
  • Windows and Linux;
  • ...?

Having never heard of the Law before, I did a bit of reading and found, among other things, an FAQ and a paper by Mike Godwin himself.

August 24, 2005

PGP for Mac Mail

If you've ever needed (perhaps need is too strong a word, how about wanted) to digitally sign—or encrypt for that matter—your emails from within the Mac Mail client, it's pretty simple. Even though there are plenty of mail applications that support PGP, I've grown fond of Mail.app so this morning—for no real good reason—I installed a few plugins, etc. and was up and running in literally 5 mins (ok 10 mins—I didn't have a clue what kind of signature I should use).

First go and get GPGMail. The download is a .DMG file and the instructions on the web site are easy to follow. In addition to GPGMail, you'll also need GNU PG for the Mac (GPGMac and GPG Keychain Access as a minimum).

I also found I needed to go into the Preferences>PGP>Composing and switch on "By default, use OpenPGP/MIME". This allows signed messages to be sent using MIME rather than the old-school format which surrounds the text in the signature and makes it look as though your email was sent through some kind of full-on nerdifier—a bit scary for mum.

Now that I have the ability to sign emails I'll just, well, probably never use it really—quite frankly, I can't imagine anyone being bothered to impersonate me in an email—but at least I feel "safer" LOL.

So anyway, here is my PGP key (valid until August 22, 2006) for anyone who cares:

Key ID: 0xD56C95B07EA87B26
Key Type: RSA
Expires: 2006-08-23
Key Size: 2048
Fingerprint: AF94 4D3E F229 1A40 4F79 17DD D56C 95B0 7EA8 7B26
UserID: Simon Harris <simon@redhillconsulting.com.au>

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: GnuPG v1.4.1 (Darwin)

mQELBEMLpCoBCAD1qKqEmXhnbg8unu7n3wL8d8qyqu9M7iIhLMn6ZIxHPe91vXCi
3NxCu9J5p+nenKcwyPV4i/TA4G6lr6hGAv6yOkCKXg/kMX/oPoqVALBnzz+NNXXO
v9xZ1/DnQuyMXj/JP3u/rMrGErO5BEq+RtxQ3BwdbF8TsEktaaIrPPG/ZRWlSSCn
FhTbS4F+J8bNNgmB2M8AYbk5F2FJc68ikSDDKz28F/pF1Xal+O/s3nVtXgQkf6o5
sV/YnIGogDP419XQ8C9tb4dqoe8oR8v25g6umNEDcUMtOS3Utyds2q7mfdlczjAu
YpixCLk8QMbEfsaQqGJU7b7YjK9iaVJqS/Z3AAYptC1TaW1vbiBIYXJyaXMgPHNp
bW9uQHJlZGhpbGxjb25zdWx0aW5nLmNvbS5hdT6JATcEEwECACEFAkMLpCoFCQHf
4gAGCwkIBwMCAxUCAwMWAgECHgECF4AACgkQ1WyVsH6oeyZa4gf9Fus1SDOwBYG6
RLiQomXWhfHibGZnrssw9ECemI6I81kgKC6rd+srxbiKit09TIMIUzZ/oecNVtxg
80rgbYsOT4EGniq/As5c6xfYNcxwgGW00Xf6txvMGCRzkierHWlE0KajOW94AnuA
tzHC9vsPVxTjt4dM08IHWS9VeuqGq8ULokcHh9uF4e24s/maJFUGikYm2dACKS8v
NPImFHUcV17pTm4gptag4bm1+KmFWS1wnUS/I1jfmUc+xJOrXFRadkEbiYJEi1aa
PyvgyvXzupia6RDyMvPfUILZLO1L4be/KGf6R6jdhE4T+9U4dNHbGnukIqkIQLRx
gNqhtklgkA==
=2DOr
-----END PGP PUBLIC KEY BLOCK-----

Of course it occurs to me that someone could also spoof my blog and change the public key here but again, I'm thinking people have better things to waste their valuable time on like say, RAD for example ;-).

August 23, 2005

Invasion of the Battery-Life Snatchers

Over the past 6-9 months, I've been doing a lot of development using my PowerBook running on battery. For the most part it works really well: The performance is just fine (with some tweaking of the power-saving settings), giving me around 2 hours of editing, browsing, emailing, etc. That is unless I'm developing code using IntelliJ.

When running in the foreground, IntelliJ seems to use anywhere between 3.0% and 6.0% of the CPU. Not too bad you might think but that is when I'm not doing anything with it. Even when IntelliJ is in the background -- either hidden or minimized -- it still uses around 3.0% of the CPU as this screen-shot from top clearly shows:

     PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
    1259 top         14.5%  0:29.70   1    18    22   776K   372K  2.62M  26.9M 
-->  670 idea         2.7% 11:11.22  26   >>>   485   270M  28.3M   231M   831M <--
    1248 Safari       0.2%  0:06.22   6   127   238  8.89M  27.9M  18.8M   243M 
    1247 SyncServer   0.0%  0:03.65   2    53    48  12.2M  3.36M  15.1M  47.3M
    1225 mdimport     0.0%  0:01.11   4    66    67  1.39M  3.27M  5.16M  39.7M
     668 bash         0.0%  0:00.02   1    14    17   220K   820K   904K  27.1M

A quick check of the Java version indicates that I am running JDK 1.5 by default:

simon$ java -version
java version "1.5.0_02"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-56)
Java HotSpot(TM) Client VM (build 1.5.0_02-36, mixed mode, sharing)

And a quick check of the About Box in IntelliJ confirms this.

Now admittedly I haven't tried any other Java applications so I'm not too sure if it's a Java issue, a Java on OS X issue, an IntelliJ issue or what but it is driving me nuts because it pretty much turns my 2+ hours of battery life into 30-45 minutes.

So if anyone has any idea how I might get IntelliJ to stop using CPU when it's idle, please, please, please let me know.

Oh and Jeaves, "use Eclipse" and "get a real computer" will not be considered helpful answers ;-)

Update (30 August 2005): Vote for the bug.

August 22, 2005

Scraps of JavaScript

Not much today but some little bits-and-pieces of stuff I've picked up over the last two weeks. It's been a steep learning curve going from no JavaScript to writing a character-based terminal emulator and it's sure been fun.

Now that I have a modicum of JavaScript under my belt, I think I'll finally take Big Daz' advice and have another look at prototype. I had a quick look initially—on his recommendation—but I was so new to the language that none of it made much sense. FWIW, thanks again to Big Daz, I also spent a lot of time reading quirksmode.org.

Overall, DHTML works really well. The browsers seem to handle running JavaScript pretty well -- the performance is quite impressive -- and it's not that difficult to get things to work cross-browser.

So, here we go...

Rather than report an error, most browsers seem to silently fail or at best give a rather less than helpful message -- either by way of a pop-up or a message to the JavaScript console.

The error messages in Mozilla -- sent to the JavaScript Console -- are far more useful than those generated by Safari, which are also sent to the JavaScript Console; MSIE is woeful when reporting (by way of a pop-up) errors in JavaScript files that have been included via <script language="javascript" src="..." type="text/javascript" />.

The debugger for Mozilla works a treat.

Methods can't be named the same as fields—they're really just the same thing anyway. Not really a problem but I was translating some code to JavaScript and it didn't work out as I had planned ;-). Either use an underscore (_) for field names; make sure your method names are always prefixed with a verb such as get/is/etc.; or "allow" direct access to fields. I say "allow" because strictly speaking, it seems that field values are pretty much always accessible anyway.

Closures usually require that you define a variable with a value of this to ensure you can always refer back to the object that owns the function being called:

    var self = this;
    orders.each(function(order) {
        self.process(order);
    });

To ensure your onkeypress event handler is called with an event object, use something like the following to capture the event and then delegate:

    var self = this;
    document.onkeypress = function(event) {
        return self.onkeypress(event ? event : window.event);
    };

To have a keystroke ignored seems to require the following code in your onkeypress event:

    event.cancelBubble = true;
    event.returnValue = false;
    return false;

This works for most everything with the notable exception of F1 in MSIE which displays help on the browser. To prevent this, try:

    document.onhelp = function() {
        return false;
    };

The Mac generates very odd key codes for things such as Up (63232), Down (63233), Left (63234), Right (63235), etc. I say odd only because I'm used to the ones generated on PCs (38, 40, 37, 39, ...). Ok, so maybe they're not odd just different ;-)

MSIE seems only to allow you to modify the content (DHTML) of a div.

Even though the HTTP protocol allows you to send and receive binary data -- using Content-Type: application/octet-stream and Content-Transfer-Encoding: binary for example -- none of the browsers I tested would reliably allow the JavaScript code to receive that data as a string of characters, even though the browser would quite happily download the content to a file on my hard-disk and allow me to manually construct a string with identical content -- using String.fromCharCode(0x1b) for example.

You can simulate Swing's invokeLater by using window.setTimeout() with a time-out value of zero:

    var self = this;
    window.setTimeout(function() {
        self.doSomething(...);
    }, 0);

Most of the browsers I tested didn't seem to support for .. in ..; they all accepted the syntax but produced kooky results when used.

All browsers I tested support using innerHTML to replace the content:

    document.getElementById(id).innerHTML = html;

Using a span with CSS classes is the simplest way to inline style changes:

    <span class="important">...</span>

Handling errors (and for that matter state changes) when using XMLHttpRequest (or in the case of MSIE, ActiveXObject("Microsoft.XMLHTTP")) differs between browsers:

  • Safari and MSIE seem to always set request.status and request.statusText;
  • Netscape/Mozilla seem to sometimes set these variables, yet other times throw exceptions due to the variable having not been defined;
  • Most will allow any old value for request method and URL and notify you via onreadystatechange if there was an error -- such as 404 Not Found for example -- though sometimes (under what circumstances I don't recall) they will throw an exception on open() and sometimes on send().

Both Netscape/Mozilla and MSIE append a CRLF (0x0d0a) to the end of any content you send, leaving the Content-Length field two-bytes short; Safari seems to leave the content as-is. Not really a problem but interesting as the data already had the CRLF as usually recommended for sending content via HTTP.

To change the colour of a horizontal-rule (<hr class="a_style" />) in a browser-neutral manner, you need to set your CSS style as:

hr.a_style {
    background-color: #NNNNNN;
    color: #MMMMMM;
    border: 0;
    height: 1px;
}

You can call a method using a string for the name, allowing a switch-like calling mechanism:

    var methodName = (this.insertMode) ? "insert" : "overwrite";
    this[methodName](aCharacter);

More to come I'm sure. Add any more you can think of or let me know of better ways to do these things as I'm truly ignorant in this space.

August 19, 2005

Getting Soft or Getting Smarter?

Sixteen or so years have passed since I started my first job -- a programming gig writing language interpreters and compilers -- even though I didn't really know how to program. On day one I was given an Intel 80386 machine code instruction manual, a System/370 machine code instruction manual, a desktop PC, a copy of Microsoft assembler, access to a mainframe via a 3270 terminal emulator and told to pretty much work it out myself. And that's pretty much what I did.

Sixteen years later, I'm doing some work for the same company. And while a lot of the industry has moved on (or was that gone 'round in circles?) they're still developing the same software all written in assembler and their proprietary development languages. They do great business selling software for 7-digit figures whilst competing with the likes of IBM, CA, etc. The proprietary language they use -- all written in assembler, remember -- runs on Windows, Linux, OS/2 and zOS and has many of the features that "modern" languages have: dynamic dispatch, loose typing, etc. Not bad for a company with three people, doing it all The Wrong Way™.

My job has been two-fold: write some DHTML and JavaScript communicating back and forth with a web server written in the proprietary language; and add some new features to said language. And, all-in-all it's been pretty good fun. The DHTML and JavaScript stuff is easy peasy (like we need an acronym for this stuff, sheesh!) and getting back into assembler programming after all these years of Java has been downright good geeky fun. That is of course until something goes wrong.

JavaScript debugging is still a bit lame, even with the Mozilla plugin but that's easy enough to fix with a few carefully placed calls to alert(). The biggest issue with JavaScript unfortunately is the difference in browser behaviour, especially with respect to keyboard events and XMLHttpRequest. No matter though, we overcame those issues pretty easily and moved on to other things: adding new features to the proprietary language.

It's probably been 10+ years since I did any assembler programming and I feel it; I've become soft. I make my changes and run the application. It touches a bit of memory it shouldn't and BOOM, memory violation. Right. Register dump. Ick! I remember those. Urgh. Start up the debugger—gdb—and let's try that again. BOOM. This time though we're inside gdb so I can start poking around. Right. Where are we? Hmm...let's look at where eip points—linux on pentium hardware. Ok, where is that relative to the load-point for the module. Ok, <snip>

My forensic skills have become soft. I've become too accustomed to exception handling, stack traces, automatic buffer overrun detection, garbage collection, no pointer arithmetic, unlimited numbers of variables, no little- vs. big-endian issues, etc. No peeking into memory to see what the processor stack might look like anymore. Instead, just look at the line numbers and there's most of what you need to know already in front of you.

I'm not complaining mind you -- I like not needing to think so hard about debugging -- but it is interesting to see how my skills have changed over the years and how my current development ideas (and ideology?) have been shaped (for better or for worse) by having a knowledge of the underlying execution architecture. Forget 80x86 or System/370 processors, these days Java, .Net (and no doubt countless others) are built on virtual machines with their own instruction sets, stacks, etc. How many developers actually understand the workings of the underlying VM? How much does a developer gain (or lose) by having this understanding?

Update (23rd August 2005): Just to show you how soft I'm getting, I changed the design to one that didn't involve me needing to use gdb LOL.

August 17, 2005

Prevent Mac Droppings on Network Drives

In my current gig, I'm using my PowerBook in an all M$ Windoze (with the exception of Linux servers) environment and it's working a treat. Except for one thing: all those .DS_Store files.

Finder in Mac OS X (and probably previous versions too I imagine) creates .DS_Store files whenever you browse a directory. The file seems pretty harmless - apparently it contains little more than window preferences, etc. - and Finder hides them from view.

Unfortunately, it does get the back up of some of the other developers. Not really because the files exist, but more because, for some reason, the files get created with a timestamp in the future which causes all manner of problems for the guys writing their MFC applications - with asserts turned on the applications barf all over the place.

So, I did a quick hunt on Google and found this article that explains how to prevent the behaviour. It's a pretty simple fix and involves entering the following at the command-line (possibly followed by a re-boot?):

defaults write com.apple.desktopservices DSDontWriteNetworkStores true

Now if someone could only tell me why I seem to end up with all these ._XXX files lying around that don't appear in Finder, nor when running ls -la but do end up in my zip and tar balls.