
October 31, 2004

Death To Blog Spam Arrgghhh


I've been using MT-Blacklist for some time now and while it does a good job of moderating the spam, I'd rather it didn't even get that far. So in a last-ditch effort to eradicate comment spam altogether, I've just installed a different kind of solution. This plugin puts up a security code graphic that you must enter in order to submit the comment. Although there have been some complaints about this technique on the grounds that it is discriminatory towards people with impaired vision, I'm going to give it a whirl anyway and see how it goes. Apparently the guy who wrote the plugin has also recently written a Bayesian filter, but personally, as with MT-Blacklist, I don't have the time to sift through all the comments deleting the spam.

Update: 1st November 2004 - Seems to be working a treat. I've had not one blog spam comment in the last 24 hours but people have successfully commented manually. I usually get around 6-10 spam comments in the same period.

Update: 4th November 2004 - It's amusing to look at my web logs and see all the access attempts from dodgy sites, no doubt attempting to post comment spam and failing dismally!

October 26, 2004

When Corporates Embrace Open Source


It is common for organisations to justify the use of popular Open Source Frameworks on the basis that developers with these skills are easy to come by. In addition, because the source code is readily accessible, it's easy to make bug fixes and patches whenever needed. This is clearly justification enough that no analysis need be performed in order to ascertain if said framework actually fits the technical requirements of the application.

The next step always seems to be to download the source code and check it into a local repository. Then, have a core group of developers maintain it internally. This team will be responsible for checking out the source code, building it and distributing it to all the other teams ensuring that changes are controlled and all teams keep up to date with the correct version.

After using the framework for a few months, it becomes obvious that the way the code was originally written is either: broken; wrong; or doesn't quite fit with The Way We Do Projects Here ™. This then requires massive changes to "simplify" the design and add enhancements wherever "necessary", like masking all those pesky exceptions that get thrown and returning null instead.

Of course now that so many changes have been made, and coupled with the requirement that all projects be uniform in quality, it becomes necessary to ensure that project teams cannot and will not use the version(s) available from the original project site but instead are forced to use the highly tailored internal version. In fact it's probably a good idea to make the framework a "black box". I mean, why would the non-core developers need or want access to the source code? The core team are providing a service after all, and that is all that's important, so access to the internal repository must be on an as-needs basis.

And finally, after 12 months of development and hard work, it is customary to allow The Architect who made ALL the proprietary changes (to the supposedly open framework) to go on 4 weeks' holiday just prior to delivery to System Test, leaving the project team to fend for themselves, so that when a bug is found the only solution is to fork the code (again) and check it into the project repository on the proviso that the changes make their way back into the core ASAP.

October 23, 2004

Why Type When I Can Skype


Throw out Yahoo! Messenger: if you're not using Skype, I'm no longer your "buddy" :P. I've tried voice chat before but nothing even close to as good as this. I can't believe I've never heard (pardon the pun) of it before.

I plugged my headphones in, "called up" a friend and started speaking at my laptop (there's a mic there somewhere though I've no idea where). The sound quality is astonishingly good. My friend might as well have been sitting next to me.

So I started calling up everyone I could. My brother travels a lot for work and has two kids and I figured it would be really useful for him. "brb (be right back)" he says so I start playing some of my newly ripped CDs as "hold music". When he got back I asked him what the sound quality was like. "I thought a CD had started playing on my computer" he replied.

Apparently you can have up to a 4-way chat. Though I've not tried, if you're prepared to pay, it allows you to make international, and possibly even local calls. I'd be interested to hear from anyone that has. And even better it has a text based IM client built in so if you like to type instead you can; though why would you bother, Jon? :P

The next bit is to work out how to get VNC working over a VPN across The 'Net so that James and I can do a bit of remote pair programming...mmmm

Now if only I could find a way to have multiple voice chat conversations going at once without having my brain explode. Oh well I guess it's a little too early to throw away the IM client after all. DOH!

October 21, 2004

Beware The Cross-Product Join


An interesting discussion started on the Drools user mailing list regarding some problems writing a rule. The particular problem is not unique to business rules though. RETE-based inference engines share much in common with relational databases, and in fact this particular problem can affect SQL queries in the same way as it affects business rules.

Let's say we wanted to find all pairs of people that were maternal siblings (i.e. that had the same mother). In SQL we could write a query like this*:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId

If we imagine we have only two children in our database, Bob (childId = 1) and Mary (childId = 2), both having the same mother, this query would generate four rows:

  • Bob, Mary
  • Mary, Bob
  • Bob, Bob
  • Mary, Mary

This is called a cross-product: within each mother, every child is paired with every child, itself included. This results in rows we're not interested in: Bob, Bob and Mary, Mary. So the first thing we would do is try to exclude rows where both children are the same:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId
AND c1.childId != c2.childId

Which results in:

  • Bob, Mary
  • Mary, Bob

The next thing you'll notice is that we still have redundant rows: rows that mean the same thing. There are a few "tricks" to avoiding this, and they really come down to knowledge of the underlying attributes of the tables involved. The simplest in our case is to change the condition:

SELECT * FROM Child c1, Child c2
WHERE c1.motherId = c2.motherId
AND c1.childId < c2.childId

By imposing an arbitrary ordering, we prevent rows being joined to themselves and ensure that for any two siblings, we only get one row. Best of all, this technique translates directly into the implementation of business rules.
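To see why the ordering guard works, here is a minimal sketch of the same self-join written as two nested loops in plain Java, which is roughly what happens when a rule matches two facts of the same type. The Child class, the Guard enum and siblingPairs are hypothetical names for illustration only; this is not Drools syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SiblingJoin {
    // Hypothetical stand-in for one row of the Child table.
    static final class Child {
        final int childId;
        final String name;
        final int motherId;

        Child(int childId, String name, int motherId) {
            this.childId = childId;
            this.name = name;
            this.motherId = motherId;
        }
    }

    // Which extra condition to apply on top of c1.motherId = c2.motherId.
    enum Guard { NONE, NOT_EQUAL, LESS_THAN }

    // Nested-loop self-join mirroring SELECT * FROM Child c1, Child c2 WHERE ...
    static List<String> siblingPairs(List<Child> children, Guard guard) {
        List<String> rows = new ArrayList<>();
        for (Child c1 : children) {
            for (Child c2 : children) {
                if (c1.motherId != c2.motherId) {
                    continue;
                }
                if (guard == Guard.NOT_EQUAL && c1.childId == c2.childId) {
                    continue; // drops Bob, Bob and Mary, Mary
                }
                if (guard == Guard.LESS_THAN && c1.childId >= c2.childId) {
                    continue; // also drops the mirrored Mary, Bob
                }
                rows.add(c1.name + ", " + c2.name);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Child> children = Arrays.asList(
                new Child(1, "Bob", 100),
                new Child(2, "Mary", 100));
        System.out.println(siblingPairs(children, Guard.NONE));      // 4 rows
        System.out.println(siblingPairs(children, Guard.NOT_EQUAL)); // 2 rows
        System.out.println(siblingPairs(children, Guard.LESS_THAN)); // [Bob, Mary]
    }
}
```

Note that the loops still visit every pair; the guard only filters the results. With n children sharing a mother, NONE yields n² rows, NOT_EQUAL n(n-1) and LESS_THAN n(n-1)/2, which is why the extra tuples matter for a rule engine.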

Not only do cross-products produce redundant and possibly incorrect results, the extra tuples (rows) generated as a consequence can cause your rule engine to grind to a halt.

* I realise that no one is going to model Children and Mothers in different tables but please cut me some creative slack ;-)

October 20, 2004

Project Risks


A few weeks ago I gave a lecture to some second year university students here in Melbourne. The talk was titled "e-Business In The Real World" but really it was me yabbering on about my experiences delivering software. Anyway, a couple of people have asked me to publish the slides so here they are, all done using NeoOffice/J on my brand spanking new PowerBook. They're not much, nothing fancy, but they really summarise the risks associated with delivering software.

If I could sum it all up I would say that if your problems are largely imposed by entities external to The Team then that's about normal; you just have to identify the risks and mitigate them somehow. If, on the other hand, your major problems are technical, i.e. within The Team, you're in deep doggie doodoo; fire them all and start again ;-P

Updated 18 May 2005: Having been asked to present again, I revised the slides slightly using Keynote. The content may be much the same (a few changes here and there) but it sure does look sexier now ;-). Unfortunately Keynote produces an enormous PDF, so I actually exported to PPT, then imported into NeoOffice/J and re-exported to PDF, producing a file that is less than 10% of the size!

October 06, 2004

Drools Schmokes! - Part II


So once we'd worked out what the major hot spot in Drools was, it was time to find an alternative method of conflict resolution.

As a bit of background, in simple terms, as facts are asserted, new items (or activations) are added to the agenda. In the general sense, all agenda items are equal. But some are more equal than others.

Although you should stay away from attempting to infer or impose ordering on rules, sometimes it is necessary. Sometimes you just need a couple of "cleanup" or "setup" rules, that are guaranteed to fire before or after all others. In Drools (and JESS) this is known as salience. In JRules it's called priority.

There are other reasons to order the agenda and Drools has a number of different strategies: Random; Complexity; Load Order; etc. These are then chained together. Each Resolver then gets a chance to add the item to the agenda. If it succeeds, no more resolvers are called. If however the item conflicts with one or more existing ones, all are returned and passed to the next resolver to, well, resolve LOL.

Confused? Here's a better explanation.

Looking at the implementation it was apparent that the complexity was O(n^2). Each resolver seemed to be doing a similar thing. It had also been optimised quite a bit, which meant there was necessarily some duplicated code.
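To make the chaining and its cost concrete, here is one simplified way to model it. This is a hypothetical sketch, not the Drools ConflictResolver API: each resolver receives the activations still tied for first place and keeps only those that are best on its own attribute, and the next resolver sees only what remains. Because every selection rescans the agenda, draining n activations costs O(n^2) in this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

public class ResolverChain {
    // Hypothetical agenda item carrying the attributes our resolvers inspect.
    static final class Item {
        final String name;
        final int salience;
        final int loadOrder;

        Item(String name, int salience, int loadOrder) {
            this.name = name;
            this.salience = salience;
            this.loadOrder = loadOrder;
        }
    }

    // Hypothetical resolver: given the items still tied, keep only the best ones.
    interface Resolver {
        List<Item> narrow(List<Item> tied);
    }

    // A resolver that keeps the items with the highest score: O(n) per call.
    static Resolver keepHighest(ToIntFunction<Item> score) {
        return tied -> {
            int best = Integer.MIN_VALUE;
            for (Item item : tied) {
                best = Math.max(best, score.applyAsInt(item));
            }
            List<Item> kept = new ArrayList<>();
            for (Item item : tied) {
                if (score.applyAsInt(item) == best) {
                    kept.add(item);
                }
            }
            return kept;
        };
    }

    // Pass the ties down the chain, stopping once a single winner remains.
    static Item selectNext(List<Item> agenda, List<Resolver> chain) {
        List<Item> tied = agenda;
        for (Resolver resolver : chain) {
            if (tied.size() <= 1) {
                break;
            }
            tied = resolver.narrow(tied);
        }
        return tied.isEmpty() ? null : tied.get(0);
    }

    public static void main(String[] args) {
        List<Item> agenda = new ArrayList<>(Arrays.asList(
                new Item("ruleA", 0, 2),
                new Item("ruleB", 0, 1),
                new Item("cleanup", -10, 0)));
        // Salience first, then earliest load order (negated so "highest" wins).
        List<Resolver> chain = Arrays.asList(
                keepHighest(i -> i.salience),
                keepHighest(i -> -i.loadOrder));
        // Prints ruleB, then ruleA, then cleanup; each pass rescans what is left.
        while (!agenda.isEmpty()) {
            Item next = selectNext(agenda, chain);
            System.out.println(next.name);
            agenda.remove(next);
        }
    }
}
```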

My initial gut feeling was that a priority queue was what we needed but how would we do the chaining of the different concerns?

Maybe something like a Red-Black Tree would be useful. Maybe we could implement a comparator for each strategy. Conceptually at least, we could use the first comparator to insert into the tree until we found items that were equal, then continue inserting using the next comparator, and so on. This seemed too complicated and I don't do complicated very well. Makes my head hurt.

It seemed that each of the strategies was really just using a different dimension or aspect of the item to perform a sort. It was like a composite key. So what's the easiest way to sort on a composite key? Use a composite comparator. Something like:

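import java.util.Comparator;
import java.util.List;

// Sorts on a composite key: the first comparator decides, and each later one
// is only consulted when all the earlier ones report a tie.
public class CompositeComparator implements Comparator {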
    private final Comparator[] _comparators;

    public CompositeComparator(List comparators) {
        this((Comparator[]) comparators.toArray(new Comparator[comparators.size()]));
    }

    public CompositeComparator(Comparator[] comparators) {
        _comparators = comparators;
    }

    public int compare(Object o1, Object o2) {
        int result = 0;

        // Stop at the first comparator that can tell the two items apart.
        for (int i = 0; result == 0 && i < _comparators.length; ++i) {
            result = _comparators[i].compare(o1, o2);
        }

        return result;
    }
}

I tried it out using a TreeSet but it performed just as badly. Maybe I was wrong, I thought to myself. So I jumped online and chatted to some of the Drools guys, Mark Proctor in particular. I described my ideas and he seemed to like them.

We did a bit of searching around for implementations we could use. I found one here but the license wasn't right. Next we thought of Doug Lea's stuff but it was overkill. Finally Peter Royal suggested looking at the commons-collections stuff and voila, there it was - PriorityBuffer - and it took a Comparator!

Hackedy, hackedy, hack and we'd replaced the original stuff with the priority queue. Time to give it a whirl.
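For anyone wanting to try the same idea today, here is a minimal sketch of a priority-queue agenda. It swaps in the JDK's java.util.PriorityQueue rather than commons-collections' PriorityBuffer, since both are heap-based and accept a Comparator. On Java 8 and later, Comparator.thenComparing does the same chaining as the CompositeComparator above: the next comparator is only consulted on a tie. The PriorityAgenda and Activation classes here are hypothetical, not Drools types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PriorityAgenda {
    // Hypothetical activation: the rule name, its salience and when it was added.
    static final class Activation {
        final String rule;
        final int salience;
        final long sequence;

        Activation(String rule, int salience, long sequence) {
            this.rule = rule;
            this.salience = salience;
            this.sequence = sequence;
        }
    }

    // Higher salience first; equal salience falls back to insertion order, because
    // PriorityQueue makes no promise about the order of elements that compare equal.
    private final PriorityQueue<Activation> queue = new PriorityQueue<>(
            Comparator.comparingInt((Activation a) -> a.salience).reversed()
                    .thenComparingLong(a -> a.sequence));

    private long nextSequence = 0;

    // O(log n): sift the new activation into place in the heap.
    void add(String rule, int salience) {
        queue.add(new Activation(rule, salience, nextSequence++));
    }

    // O(log n): remove and return the highest-priority rule, or null if empty.
    String fireNext() {
        Activation next = queue.poll();
        return next == null ? null : next.rule;
    }

    public static void main(String[] args) {
        PriorityAgenda agenda = new PriorityAgenda();
        agenda.add("ruleA", 0);
        agenda.add("cleanup", -10);
        agenda.add("ruleB", 0);
        agenda.add("setup", 10);

        List<String> fired = new ArrayList<>();
        for (String rule = agenda.fireNext(); rule != null; rule = agenda.fireNext()) {
            fired.add(rule);
        }
        System.out.println(fired); // [setup, ruleA, ruleB, cleanup]
    }
}
```

Each add and fireNext is O(log n), so handling n activations is O(n log n) overall, and adding another strategy means appending one more thenComparing call rather than writing another resolver.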

The first step was to run the queue with a simple Comparator. Although it doesn't really do anything much, it would at least allow us to see what the basic overhead of the queue implementation was:

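import java.util.Comparator;

// Ignores its arguments so that only the queue's own overhead is measured.
// Always answering -1 breaks the Comparator contract (compare(a, b) and
// compare(b, a) should have opposite signs), so it is only a benchmarking stub.
public class ApatheticComparator implements Comparator {
    public int compare(Object o1, Object o2) {
        return -1;
    }
}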

Hit run. Damn that's quick! Once more to be sure. Yup. Hmmm. Still not convinced. Add a breakpoint and run in the debugger. Sure enough it's being called. Cool! Ok now to try LoadOrder and Salience.

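import java.util.Comparator;

public class SalienceComparator implements Comparator {
    public int compare(Object o1, Object o2) {
        int s1 = ((Activation) o1).getRule().getSalience();
        int s2 = ((Activation) o2).getRule().getSalience();
        // Explicit comparison rather than s1 - s2, which can overflow for extreme values.
        return s1 < s2 ? -1 : (s1 == s2 ? 0 : 1);
    }
}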

Same deal. All works just fine and after implementing a few more I was convinced that this was going to be a winner.

Now we have O(n log n). Even with all the comparators chained in, the performance doesn't change one bit. What's more, the different strategies are simple one-liners, making implementing new strategies almost trivial!

So once more I must applaud the Drools guys for a flexible and performant design!

October 05, 2004

Paste Your Code


Anyone who's used TinyURL will understand how cool this is. One of the guys (Mark Proctor) over at the Codehaus put me on to it.

As the title suggests, it allows you to paste your code and generate a unique URL for it. You can select a language, choose a "nickname", enter a description and even convert tabs to spaces if you so desire.

The result is formatted code, with line numbers, that you can easily share with others. Pretty neat.

October 04, 2004

Drools Schmokes!


We're about to open source a new rule-based project and up until now, we'd been using various closed source rule engines to get us going. Of course this won't cut it once we open source, so we hoped that Drools would come to our rescue.

And it did. With some caveats, I can safely say that Drools is incredibly fast. Not bad for a code base that by their own admission has, quite rightly, favoured stability over performance and as such has had little or no profiling done.

Luckily we had built joodi, short for Java-Based O-O Design Inferometer (just had to get the word Inferometer into a project somehow!), test-first, and as such the guts of the app were based on interfaces, so cutting over to Drools was pretty easy. It took me about an hour, I guess, to convert the application, rules, tests and all, to run with Drools. We fired it up. All tests passed. Hooray! How happy were we!?

Next to run a "benchmark". We ran the application over the Struts classes using the closed source engine first and it finished in around 9 seconds. COOL! Performance had been one of our unknowns and this was certainly well within tolerances.

Then we switched over to Drools and ran the same test. 20 minutes later it still hadn't finished. Another ten minutes I'd say and I was fast asleep. So when morning came around I leapt up and ran into the lounge to see if it had finished. It had. In 78 minutes!!!

Yikes, we thought. This ain't going to cut it. Elation turned to dismay. But no real profiling of Drools had been done, so surely there was room for improvement?

After a bit of chatting with the peeps in da haus, I decided to check-out the source and use JMP to do some profiling. Run it, we thought, find the lowest hanging fruit, fix it, then keep doing that until we've done all the obvious stuff.

So I cranked it up and it didn't take long to find a hot spot. In fact it appeared that nearly 50% of the time was being spent in one small area: conflict resolution. A quick look at the source code was all that was needed to confirm my suspicions. Lots of unnecessary iteration. But again, I'm not taking anyone to task over it. I'd rather it was stable and functional first.

Looking more closely at the code, I realised that the functionality provided by the classes under scrutiny was not actually necessary, yet, for me to get joodi running. Thankfully, due to the thoughtful design, it was pretty easy to stub out without even touching the Drools source code.

Time to run again...holy cow! 5 seconds! That can't be right. Run it again. Nope, 5 seconds again. Quick look at the output to verify it was actually working correctly. Yup. Run all the joodi unit tests just to be sure. Yup, they run just fine. It had gone from being more than 500 times slower (78 minutes versus 9 seconds) to almost twice as fast!

Damn. I'll try running joodi against another, bigger, project: Xerces. With Drools plugged in, joodi ran in around 9 seconds. With the closed source product I gave up after 5 minutes and stopped it.

So hats off to the Drools team. Damn fine job! I'll be submitting my patches ASAP and hope to see some of that other code refactored soon :-)

October 02, 2004

Care-Factor Nine Mister Spock


This started as a reply to a very pertinent comment on a blog entry of mine but it grew to the point where I thought it deserved an entry of its own.

First to the original comment: I always appreciate a good rant. How could I not LOL. And I agree wholeheartedly with the sentiment. I don't tend to blog about my personal life because, well, it's personal hehehe. I don't really get much from writing about my life experiences, yet. Maybe one day, but until then I do get a lot from writing about software development. It's an area of my life where lots of discussion and debate seems to make a big difference.

So for the curious, I teach and train martial arts most week nights. I spend most weekends with my family except for the occasional geek session here and there. I work for 9 months of the year and take 3 months off mostly to travel - I've lived a total of 3 years in Japan off and on over the past 17 years. I speak Japanese. I ride my motorbike whenever I can. I ride my mountain bike whenever the weather permits.... But rather than bore you with my "I'm a Leo I enjoy cooking and dancing" story, let me summarise by saying that I do believe that life is about living and NOT about software development.

Don't get me wrong, I don't dislike software development. As far as a job goes it's the best one I could hope for right now. It's interesting. It's challenging. It keeps my mind active. And I get to meet loads of interesting people in the process. But every year I go to Japan to train or I go hiking in New Zealand and I don't miss the internet nor email nor mobile phones nor any technology to be honest. When it comes down to it, if I were independently wealthy I could turn my back on computers and never look back.

But that was not and is not the point. The point is that no matter whether it be software development, house keeping, whatever, all I ask is that you GIVE A SHIT about what it is you are doing and that you take some care and some responsibility. If you don't, won't or can't, then STOP, CEASE, DESIST! You will do more harm than good so please go away, we don't need nor want you.

My Aikido instructor is famous for ripping shreds through students correcting their technique. Hearing him scream "DAME!" (Japanese for "wrong") across the mat can be a bit much for some students. But he once said to us that "there are only two reasons you'll never receive a DAME from me. Either you're so good that you don't need it; or you're so bad I've given up and I don't care about you anymore."

So I hope you'll understand that I intend to continue ranting and writing about software development, and anything else I feel passionate and enthusiastic about, BECAUSE I GIVE A SHIT. :-)

October 01, 2004

Stop Calling Me Shirley


The lack of documentation is disturbing. Requirements in the form of code or, often, reverse engineered from the code. Phooey! Seemingly ad hoc changes to the spec by the architects. Cowboy developers making changes here and there whenever they feel like it to hack in some new feature. Dependencies between developers forcing them to pair up to write code. What a ludicrous idea! Nothing seems to get done until the last minute. We'll be lucky to limp across the line. Whatever that line may be. With no real acceptance criteria, how does anyone know when we're finished?

But wait a minute...it's a waterfall project. Oops! XP was used on the previous project. Let's try that again shall we?

All those tests slowing down my build. How dare they make me ensure my code works. All those story cards on the Wiki to read. Boring! Would you believe I was even *gasp* forced to understand what I was doing by consulting with the business rep. Sheesh gimme a break. Imagine allowing the customer to change their mind at the last minute and still delivering on time. Bah! And what's with asking me for revised estimates every day? I signed up for anarchy. Instead I got micro-management!