November 27, 2004

Assistant Orange Peelers

My father is a commercial airline pilot. He's been flying since around the time I was born, 32 or so years now. He's flown everything from light aircraft through 707s and 767s right up to the latest 747-400 - the ones with the winglets and beds for two crews, enabling them to fly non-stop Sydney to London.

Pilots must undergo medical and practical examinations each year in order to maintain their rating on a specific aircraft, and apart from actually flying, he has at various times been a training captain working out of Boeing's Washington base. From there he teaches and re-trains pilots on simulators, either to gain accreditation on an aircraft they haven't flown before or simply to renew their existing accreditation. Obviously he has lots of stories to tell, but by far and away the most interesting group of pilots he's ever had to train were Russian commercial airline pilots.

Back in the bad old Cold War days, defections to The West were commonplace and of great concern to Russian authorities, especially when it came to pilots. You can imagine that it wouldn't take much for a commercial airliner to continue on to Japan or parts of Northern Europe. So, no doubt in an effort to counter this, and also possibly to give as many people as possible a job, Russian commercial airliners would sometimes have as many as 6 crew members in the cockpit, each one assigned and trained for a specific task. My father used to joke that they would have a Navigator; Radio Operator; Flight Engineer; Pilot; Co-Pilot; Orange Peeler; and an Assistant Orange Peeler. Re-training them on aircraft that require at most a 2-man crew was challenging, to say the least.

This kind of "siloing" occurs all too often in IT shops as well, where each person has a specific task: The GUI Guy; Miss Middleware; Database Dude; Build Master; Application Deployer. Not such a bad thing at first glance, I suppose - it's always good to have someone on the team who knows about these things. For example, if you ever need to know something about how the Object-Relational Mapping works, go see Billy Bob.

Another argument put forward is that concentrating responsibility and accountability removes the "burden" from the rest of the team. In practice, though, this approach seems to lead inexorably to demarcation disputes, which in turn result in much finger-pointing and scapegoating. Cries of "Can you fix that, it's in your code" or "Who's been making changes to Blah.jsp without asking me?" can often be heard. What's worse is how this can materially affect the productivity of the whole team: "She's not in today so we'll have to wait until tomorrow to make those changes" or "You'll have to wait until the Build Master gets back from lunch before we can do another drop for you."

It seems odd that a modern software developer cannot, or will not, become multi-skilled. The days of being solely a JSP guru or COM expert are gone, replaced by collective code ownership and the idea that moving people around is a good thing.

Or perhaps it is management worried about losing control? Perhaps a team whose members have become self-reliant may decide that they can make it on their own and "defect"?

November 21, 2004

Using Lucene To Find A Date

For the next 3 weeks (and for the past few), I'm the DefectController. I get to watch the defects roll in, assess them, and hand them out to the appropriate developer (which may be me). Last week I saw a rather odd defect pass by:

org.apache.lucene.queryParser.ParseException: Too many boolean clauses when performing date range search.

My first reaction was puzzlement, replaced shortly thereafter by shock as I thought through the problem. It occurred to me that the most obvious cause would be the unthinkable: the developer must have enumerated every possible date in the range and included them ALL in one gigantic OR condition.

A bit of grokking later, shock turned to horror. Fortunately, the developer had not done as I suspected. They had done the correct thing and generated the correct Lucene criteria in the form:

dateOfBirth:[19700801 TO 20030615]

Unfortunately, that left only one option: It must be Lucene!

Two minutes on Google and the BuildController turned up, among others, this link. Yes indeed, it seems, Lucene does enumerate ALL possible dates. In fact, depending on the granularity, it will end up enumerating all possible seconds! Apparently this is neither a bug nor even a feature but a "known behaviour".

So now here's the thing that puzzles me. It would appear, from the documentation, that string ranges are also supported allowing us to find say, people where:

name:[Albert TO Betty]

This being the case, does Lucene enumerate ALL possible names? I find that hard to fathom. If it does, then I give up now. If not, then couldn't we just encode the dates as unambiguous, comparable strings? Something like:

dateOfBirth:[19700801 TO 20030615]

Look familiar? It should. I just copied and pasted the original example. But if this time around we consider the dates as strings of the form yyyyMMdd instead of attributing any special notion of date, wouldn't that solve the problem? Wouldn't that also easily allow us to perform partial range searches that include say only the year or year and month?
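To make the idea concrete, here is a minimal sketch in plain Java that deliberately avoids any Lucene classes. The names `DateTerms`, `encode`, `inRange` and `partialRange` are hypothetical. It shows why fixed-width yyyyMMdd strings compare in chronological order, and how a year-only or year-and-month search turns into an ordinary inclusive string range. It assumes four-digit years.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateTerms {
    // yyyyMMdd is fixed-width, zero-padded and most-significant-field-first,
    // so comparing the strings character by character orders them chronologically.
    // Assumes four-digit years, otherwise the fixed width breaks.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    static String encode(LocalDate date) {
        return date.format(DAY);
    }

    // Inclusive string range check, i.e. what [lower TO upper] means for plain strings.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Hypothetical helper: widen a partial date ("1970" or "197008") into an inclusive
    // range by padding the lower bound with 0s and the upper bound with 9s.
    static String[] partialRange(String prefix) {
        StringBuilder lower = new StringBuilder(prefix);
        StringBuilder upper = new StringBuilder(prefix);
        while (lower.length() < 8) {
            lower.append('0');
            upper.append('9');
        }
        return new String[] {lower.toString(), upper.toString()};
    }

    public static void main(String[] args) {
        String dob = encode(LocalDate.of(1985, 3, 9));             // "19850309"
        System.out.println(inRange(dob, "19700801", "20030615"));  // true
        String[] year = partialRange("1985");
        System.out.println("dateOfBirth:[" + year[0] + " TO " + year[1] + "]");
        // prints dateOfBirth:[19850000 TO 19859999]
    }
}
```

Padding with 0s and 9s means "1970" becomes [19700000 TO 19709999], which takes in every real day of 1970 without needing to know how many days each month has.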

A Lucene expert I am not, but all the links we found suggesting various other "work-arounds" (one of which suggested upping the limit on the number of clauses!) seemed little more than hacks. So, please, please, please tell me I've missed something obvious, because the solution really does seem that simple to my feeble bwain.

November 19, 2004

Where's The Problem?

One of the biggest problems I see over and over again is the difficulty support and maintenance teams have diagnosing problems. It's unfortunate, but developers have a knack for writing code for Happy Days scenarios: when the software works, it works well; when it fails, it can be a disaster.

A cow-orker and I were recently discussing the use of assertions in production code. He had previously been discussing the topic with one of his colleagues who had suggested that their existence was a smell; that it indicated a lack of testing.

Now, enough has been said on the topic of unit testing, so suffice it to say that the great thing about unit testing is that it's easy to ensure your components work as advertised. You can even, and in many cases should, test what the behaviour would be given invalid input such as NULLs. Then of course we move into integration, functional, acceptance, regression and other testing to prove that the application hangs together as a whole.

The problem is that tests don't necessarily prove that the software does what it's supposed to *GASP!*. Rather, tests prove that the software works for the given scenarios and assumptions, and the plain fact is that these do not always match reality. We may have the greatest, most comprehensive test suite in the known universe, but if it's testing the wrong things, it matters little. Sure, in a system over which you have complete control, high levels of functional/integration test coverage can compensate, but even then, 100% code coverage doesn't equate to 100% accuracy. In particular, it is difficult at best to test for, and therefore prevent, clients of your code passing invalid parameters. In fact, if you've ever written and published a public API, you'll know that, by definition, it's impossible.

The further into a system a problem propagates, the more difficult it becomes to diagnose and the greater the likelihood of "damage". One of the major benefits of assertions in production code (as opposed to test code) is that at run-time we can detect and reject unexpected scenarios as early as possible, thereby preventing them from propagating. Maintenance developers and those familiar with the Fail Fast axiom will appreciate how important this is in a production environment.
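A minimal sketch of what such production-code checks might look like in Java, assuming a hypothetical `Transfer` class. It uses explicit checks rather than the `assert` keyword, because Java assertions are disabled unless the JVM is started with `-ea`, so they cannot be relied on in production. Bad input is rejected at construction time, before it can travel any further into the system.

```java
import java.util.Objects;

// Hypothetical domain class, used only to illustrate fail-fast checks in production code.
class Transfer {
    private final String fromAccount;
    private final String toAccount;
    private final long amountInCents;

    Transfer(String fromAccount, String toAccount, long amountInCents) {
        // Explicit checks instead of `assert`: they run whether or not -ea is set.
        this.fromAccount = Objects.requireNonNull(fromAccount, "fromAccount");
        this.toAccount = Objects.requireNonNull(toAccount, "toAccount");
        if (amountInCents <= 0) {
            throw new IllegalArgumentException("amountInCents must be positive: " + amountInCents);
        }
        if (fromAccount.equals(toAccount)) {
            throw new IllegalArgumentException("cannot transfer to the same account: " + fromAccount);
        }
        this.amountInCents = amountInCents;
    }

    long amountInCents() {
        return amountInCents;
    }
}
```

The failure then surfaces with a message naming the bad value, at the point where it entered, rather than somewhere downstream where the original cause has long since been lost.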

November 16, 2004

Environmentally-Friendly Configuration

Dave came over to me the other day and said he wanted a new build check written that searches the entire source base for "C:". It's a familiar problem where supposedly automated scripts refer to specific directories, paths, etc. Move the script, run it in another environment, and BLAMMO!
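A sketch of the build check Dave asked for, assuming it is written as a small Java program that the build runs and that a non-zero exit code fails the build. `HardCodedPathCheck` and `findOccurrences` are hypothetical names. It is a plain substring search, so it will also flag harmless text such as "ABC:", which is arguably acceptable for a check like this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HardCodedPathCheck {
    // Returns "file:line: text" for every line under root that contains the token.
    static List<String> findOccurrences(Path root, String token) {
        List<String> hits = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                // ISO-8859-1 maps every byte to a char, so binary files cannot make the read fail.
                List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
                for (int i = 0; i < lines.size(); i++) {
                    if (lines.get(i).contains(token)) {
                        hits.add(file + ":" + (i + 1) + ": " + lines.get(i).trim());
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> hits = findOccurrences(root, "C:");
        hits.forEach(System.out::println);
        // A non-zero exit code is what makes the build fail.
        if (!hits.isEmpty()) {
            System.exit(1);
        }
    }
}
```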

I recall some colleagues telling me of a problem they once had where the system had been running fine for months in development and then one day it mysteriously started failing, every time, on every developer's machine. It turned out that ALL developer machines had been configured to communicate with the message queue on another developer's machine because that's what had been checked in to CVS.

More recently, we encountered a problem where the UAT environment worked but System Test didn't. Again, it turned out that the default configuration for a remote URL was, lo and behold, set for use in a UAT environment. Each time the System Test deployment was run, it was communicating with the wrong server. No one noticed at first because, aside from the server name, the string of request params was the same in all cases.

These days, as a bare minimum, we strive to have ALL "default" configuration values set to something along the lines of "THIS_IS_WHERE_THE_VALUE_OF_X_NEEDS_TO_GO". This way it sticks out like the proverbial dog's bits when you've forgotten to tailor something for a specific environment.

Then we use build properties such as configuration=development, configuration=uat, etc. specified on the command-line that allow the build scripts to substitute in all the various values appropriate for the intended target environment. This, coupled with some Build Watermarking, almost guarantees that this class of configuration problem is a thing of the past.
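Here is a minimal sketch of the placeholder and per-environment ideas together, assuming the defaults and the environment-specific values (selected by configuration=uat and friends) are ordinary Java properties. `EnvironmentConfig`, the `THIS_IS_WHERE_` prefix test and the example keys and host are hypothetical. After overlaying the environment values on the defaults, anything still holding a placeholder is reported so the build can refuse to produce a deployable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

class EnvironmentConfig {
    // Hypothetical: every checked-in default starts with this prefix.
    static final String PLACEHOLDER_PREFIX = "THIS_IS_WHERE_";

    // Overlay the environment-specific values (e.g. for configuration=uat) on the defaults.
    static Properties resolve(Properties defaults, Properties environment) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(environment);
        return merged;
    }

    // Keys still holding a placeholder; the build should stop if this is non-empty.
    static List<String> unresolvedKeys(Properties config) {
        List<String> missing = new ArrayList<>();
        for (String key : config.stringPropertyNames()) {
            if (config.getProperty(key).startsWith(PLACEHOLDER_PREFIX)) {
                missing.add(key);
            }
        }
        Collections.sort(missing);
        return missing;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("remote.url", "THIS_IS_WHERE_THE_VALUE_OF_REMOTE_URL_NEEDS_TO_GO");
        defaults.setProperty("mq.host", "THIS_IS_WHERE_THE_VALUE_OF_MQ_HOST_NEEDS_TO_GO");

        Properties uat = new Properties();
        uat.setProperty("remote.url", "http://uat-server.example/service");

        System.out.println(unresolvedKeys(resolve(defaults, uat)));  // prints [mq.host]
    }
}
```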

That is of course until one of the developers comes to you and proudly explains that they have "discovered the problem. There were these strange values in the reference data scripts, so I changed them all to sensible defaults and checked it in."

November 11, 2004

Unit Tests As Complexity Sponge

A number of people have variously commented that unit tests may in fact be more about design than actual testing. Many others (the links elude me at present) have also complained about the undue burden imposed by a large number of unit tests and that, because of this and other very sound reasons, they prefer acceptance tests. If I was ever in any doubt about the importance of acceptance tests, I was certainly convinced after the last project, where acceptance tests would fail where no unit test had, due no doubt in large part to the fact that the acceptance tests also acted as integration tests.

One thing I did notice, however, was that over and above their usefulness as a design tool, unit tests seemed to act as yet another positive constraint helping reduce the overall complexity of the code. Because developers were forced to write unit tests, they were forced to produce relatively simple, testable code. Much simpler, I believe, than would have been the case otherwise. The downside to this testability was that in many cases, the corresponding unit tests were rather more complex than we would have liked. And, as noted previously, complex tests tend to be brittle and this has a knock-on effect with respect to maintenance. But does this really matter?

Ultimately what is important is working software (for which you have acceptance tests) and clean, easy to understand code that is hopefully cheaper to maintain. You could choose to throw away all those "dirty" unit tests once you reach production and rely solely on your acceptance tests; or you might choose to buy new ones through refactoring/re-writing; or you may decide that the unit tests are worth the extra effort to maintain.

Whatever the course of action, it seems to me that, yet again, unit tests have benefits beyond simply (or not as the case may be) producing "correct" code.

November 10, 2004

Something Tells Me This Could Be Bad

ADGU3163I: Suppressing console message display from server because the arrival rate of 38.76
per second exceeds the threshhold rate of 10

November 07, 2004

CVS Saves My Life Once Again

Some time ago I trashed my Linux machine by running rm -rf on a logical mount that, for reasons too mundane to discuss, was pointed at the root partition. Yes yes, snigger snigger hehe, but tell me you've never done something similar :P. Anyway, a day or three later, I had resurrected my machine and restored all my files from those non-existent backups that we all vow to make...one day.

I deploy stuff to my various websites using Ant scripts. Each time I deploy a new version of a product or project, or even just make a change to some static HTML, it's automatically shipped using JSch, then some shell commands are run over SSH to move stuff around and configure things as necessary.

Up until about an hour ago, I had been using a semicolon (;) as the command delimiter with no issues at all. There is a problem with this that I had never considered: if any of the commands fails, the shell keeps on executing the remaining commands! Now in my case, one of those commands changes directories and is followed immediately by, you guessed it, rm -rf *. Not a problem if everything goes to plan, but I had just recently renamed said directory! Needless to say, it wasn't pretty.

It then occurred to me that what I should have been using was the double-ampersand (&&), which stops execution at the first failure. Even better would be to simply rename (move) the existing directory and create a new one: my newly adopted strategy.
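A minimal sketch of the safer command string, assuming the deployment builds the remote command in Java before handing it to the SSH step. `RemoteCommands`, `failFast` and `replaceDirectory` are hypothetical, and the directory name is assumed to contain no spaces or shell metacharacters. Joining with `&&` means a failed `mv` stops the chain, and moving the old directory aside avoids `rm -rf *` altogether.

```java
import java.util.Arrays;
import java.util.List;

class RemoteCommands {
    // Joined with "&&", the remote shell stops at the first command that fails,
    // so a failed step can no longer be followed by a destructive one.
    static String failFast(List<String> commands) {
        return String.join(" && ", commands);
    }

    // Hypothetical helper for the "move aside, then recreate" strategy.
    // Assumes dir and suffix are simple names with no spaces or shell metacharacters.
    static List<String> replaceDirectory(String dir, String suffix) {
        return Arrays.asList(
                "mv " + dir + " " + dir + "." + suffix,
                "mkdir " + dir);
    }

    public static void main(String[] args) {
        // prints: mv site site.20041107 && mkdir site
        System.out.println(failFast(replaceDirectory("site", "20041107")));
    }
}
```

On the very first deployment, when there is no existing directory to move, the `mv` fails and `mkdir` never runs. That is the fail-fast behaviour working as intended, but it does mean the first deployment needs the directory created by hand.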

Thankfully, all my projects and web sites are in CVS, so restoring them is never a problem. That got me thinking about all the bits of code I've seen commented out, or the unused classes left lying around because, like all those off-cuts of building material you're keeping in the shed, "who knows when I might need that."

More often than not CVS is used purely as a central repository that all developers can access but it can and should be more than that. Having everything in CVS allows greater fluidity in development. It allows developers to try something out and if it ends up completely borked, well, we can always just roll back. James even suggested (now that he's a CVS guru along with Jon :P) using CVS branching to do some speculative changes without disturbing the guys on the trunk whilst still allowing us to check-in the code. This is certainly something I would have been extremely reluctant to do even 3 months ago but having seen it work out (so far) for one of the projects on which we depend, I'm rather more inclined to give it a go.

As developers, we need to feel comfortable with and trust that all of this is possible. I doubt that many people I've met actually understand all the subtleties and features of CVS, myself included. One thing I know for sure is that resurrecting dead files in CVS isn't nearly as simple as it sounds. I'm hoping Subversion will address this but right now, something tells me that after everything I've just said, using beta-software for SCM on my critical projects is, perhaps, not the smartest thing one could do.

November 06, 2004

McAppy Time

Something appeals to me about the way Mac OS X applications are distributed. Sometimes there is an installer (boo!) but more often than not there is only one "file" to drag into /Applications. I say "file" because although it looks like a single file, it's actually an entire directory structure with a special folder, Contents, containing meta-info that the Mac OS X GUI understands. Sort of like an unpacked Jar file but for native applications. Best of all, if I decide to remove the application (I've played with about 10 IM clients so far), I just delete it and whoosh, it's gone. In most cases about the only thing left lying around is the preferences in ~/Library/Preferences.
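To illustrate the "directory that looks like a file" idea, here is a minimal sketch in Java that lays out an empty bundle skeleton. `BundleSkeleton` and `create` are hypothetical, and the Info.plist holds only three standard keys (CFBundleName, CFBundleExecutable, CFBundlePackageType). It is not a launchable application: there is no executable in Contents/MacOS and none of the Java launcher settings a real bundle-building Ant task would generate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BundleSkeleton {
    // Creates Name.app/Contents/{MacOS,Resources} plus a bare-bones Info.plist
    // and returns the path of the Name.app directory. Layout only, not launchable.
    static Path create(Path parent, String name, String executable) {
        Path contents = parent.resolve(name + ".app").resolve("Contents");
        String plist = String.join("\n",
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
                "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\""
                        + " \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">",
                "<plist version=\"1.0\">",
                "<dict>",
                "  <key>CFBundleName</key><string>" + name + "</string>",
                "  <key>CFBundleExecutable</key><string>" + executable + "</string>",
                "  <key>CFBundlePackageType</key><string>APPL</string>",
                "</dict>",
                "</plist>",
                "");
        try {
            Files.createDirectories(contents.resolve("MacOS"));
            Files.createDirectories(contents.resolve("Resources"));  // icons and other resources live here
            Files.write(contents.resolve("Info.plist"), plist.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return contents.getParent();
    }
}
```

Calling, say, `BundleSkeleton.create(Paths.get("dist"), "Simian", "SimianLauncher")` would produce dist/Simian.app, and deleting that one directory removes everything inside it.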

So, finally understanding all that, I thought I'd try my hand at building a deployable application - called a bundle. I found a web site that documented most of the steps required. I also found an Ant target that allows you to generate a bundle from a build. The next step was to build an application.

Being the Java Snob that I apparently am, I naturally decided to try deploying a Swing app. The fact that Mac OS X comes with Java out of the box is pretty cool. What's more, the deployment mechanism supports Java apps directly, meaning that besides some minor L&F issues here and there, Java applications slide right in amongst native applications.

So, I thought I'd put a "Launcher" GUI on Simian just for fun. Nothing special. Certainly nothing to replace the IDE plugins others have written for it. And here's a picture. Makes me want to work on a Swing project even more. Though I believe JavaScript is all the rage now ;-)

One thing I didn't do was specify an application icon, but that's as simple as converting an image and dropping it into the Resources folder inside the bundle, something that can be achieved by using the icon parameter on the Ant task.

The other thing I need to do is to find an ant task that can create disk images - the standard way Mac applications seem to be distributed.

And finally, just as I was about to post this, I stumbled across a three-part series on making your Swing applications play nicely on the Mac.

Not sure if any of this is any better, worse or really just the same as a Jar file. But it's kinda fun in that geeky kinda way.

(Image: Bundle Info, showing file system info on the generated application bundle.)

November 05, 2004

The Sound Of One Man Snapping

Nothing like waking up after a night of disturbing dreams of zombies drinking bottles of warmed-up Coca-Cola. It's official. Last night's blog entry was me losing the plot. It's happened twice now in the last couple of weeks and is a sign of me becoming someone I despise. It's an indication that my ability to cope with being asked to be responsible for things over which I have little or no control is non-existent. In all my years of software development, I've honestly never felt this way. It's certainly not in my nature. I woke up this morning wishing I could have the last week all over again.

So to all those who were offended by it, I unreservedly apologise. I have deleted the entry and will make sure I go and have a beer instead next time.

FWIW, I make mistakes. Everyone makes mistakes. Almost everything I've ever blogged about I've done at some stage as well. That's why I write about it. I grew up with the belief that it's not a person that is wrong/stupid/whatever but the things they do. This is why I try so hard to document here all the stupid things I have done in the hope that others won't repeat them.

November 03, 2004

Don't Panic!

Apparently, "after hours" batch jobs don't require load testing. Yes, you heard it. Supposedly jobs that run when no users are logged in are pretty much free to do whatever they like, all 57 of them! Is it some weird side effect of Heisenberg's Uncertainty Principle that I've never heard of, whereby it's possible to be either using the system interactively or have limitless computing power, but not both at the same time? Who knows, but excuse me for suggesting otherwise.

You'll also be relieved to learn that there is no need to load test your applications together on the same box even if they will be co-located in production because we can extrapolate from the results obtained by running a single application stand-alone. That'll be a good cost saving I'm sure.

Oh and as for including the generation and downloading of PDF documents, bah! That does nothing more than test all that pesky "network bandwidth stuff". There's nothing we can do about that anyway so why bother testing it right?

Phew! That's a load off my mind (no pun intended). I had thought that we might end up doubling the load on the production box but it seems I was somewhat misguided. Glad they've got it all sorted out. With less than 6 weeks till go-live and the application only just now limping into System Test, I was beginning to worry. Silly me. What was I thinking?

Now where did I put my double pair of Joo Janta 200 Super-Chromatic Peril Sensitive Sunglasses? I'm sure they're around here somewhere...

Mac OS X House Keeping

Having been a Linux weenie for a few years now, I had become accustomed to running various housekeeping jobs on a regular basis, and I wanted to do the same thing on my new PowerBook.

In particular, I use locate for quickly finding files, which, to be of any use, requires the indexer (updatedb) to be run periodically. A quick grep through the man pages and I discovered the OS X version was /usr/libexec/locate.updatedb, so the next step was to get it to run as a batch job.

Whilst searching for the appropriate place to put my daily system cron jobs (/etc/daily.local), I ran across this little gem in /etc/daily:

# Clean up NFS turds.  May be useful on NFS servers.

November 02, 2004

Occam Need Not Apply

Wanted: Software developers for long-term, large-scale enterprise application project. Complex solutions to complex problems. Ability to justify largely redundant framework development to senior management a must.

Why is it that when left to their own devices, and given more than one way to implement something, we developers will almost certainly choose the most complicated?

Speculative Optimisation

or pre-factoring, as Dave likes to call it, is a common practice. It's an easy trap to fall into. Take a look at any piece of code and I'm sure you will see a way to make it run faster. The problem is that performance bottlenecks are almost never where you would expect them to be. Sure, we might be able to double the speed of a piece of code, but if it only accounts for less than 1% of the overall running time, then it doesn't really matter. Just recently I had someone recommend that we add in some caching of database results because "It will be a performance problem." The question I had was, when compared with what?

Performance optimisation often (but not always) involves obfuscating the code in some way to achieve the desired performance. Maybe we need to inline some code or unroll a loop here or there. Whatever form it takes, it can lead to code that is hard to read and hard to understand and, as we have discussed before, therefore hard to maintain. Ironically, our so-called optimisations can potentially lead to worse performance. If the algorithm is difficult to understand or the code simply hard to follow, we might actually introduce unnecessary overhead without even realising it. If we have no baseline, no benchmark with which to compare our results, we will never know if we are improving or degrading the performance.
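A crude baseline can be as simple as timing the code before and after a change. The sketch below assumes a hypothetical `Baseline.medianNanos` helper: it runs the task a number of times untimed to give the JIT a chance to warm up, then reports the median of the timed runs. It does nothing to guard against dead-code elimination or garbage-collection pauses, so treat its numbers as rough; it tells you whether something got faster or slower, not where the time goes.

```java
import java.util.Arrays;

class Baseline {
    // Runs the task `warmups` times untimed, then times `runs` executions
    // and returns the median duration in nanoseconds.
    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Record this number before the "optimisation", re-run after it, and compare like with like.
        long median = medianNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
        }, 1_000, 101);
        System.out.println("median: " + median + " ns");
    }
}
```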

For this we need a profiler. There are plenty around, some free and some you'd have to sell the kids to afford. Quest have a free version of JProbe for use with Linux and Windows that James and I have been using to profile Drools. It's missing some features but certainly nothing we can't live without (how many negatives can a man use in one sentence?). There really is no magic involved. Run it, see where the biggest slice of the pie is and start there. Keep doing that until you've knocked off all the big ticket items. Chances are that'll get you most of the way. Anything beyond that probably requires a fundamental shift in the design. But hopefully, because you have a clean design, that shouldn't be too much of a problem ;-)

Interestingly, one of the simplest things you can do with your design is to make things as close to immutable as possible. So, for example, rather than have lots of JavaBeans with setters, use constructors. Mark your fields final. Not because that in itself is a performance enhancement (although it may be) but to ensure that the state of your objects is as stable as possible. It also makes it much easier to find out who's messing with the state. To achieve this, you may find you need to decompose those monolithic classes into smaller ones. I've found it helpful to introduce Builders to accumulate state before constructing the objects. You can think of mutable objects as having many moving parts, and the more moving parts a system has, the harder it is to work out what's happening and the harder it will be to re-factor when you finally perform your profiling.
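A minimal sketch of the immutable-plus-Builder approach, using a hypothetical `Customer` class. All fields are final and there are no setters; the nested `Builder` is the only mutable part, accumulating state before handing it to a private constructor exactly once. "Changing" a value produces a new object, so nobody can mess with an existing one behind your back.

```java
import java.util.Objects;

// Hypothetical value object: no setters, all fields final, state fixed at construction.
final class Customer {
    private final String name;
    private final String email;
    private final int creditLimit;

    private Customer(Builder builder) {
        this.name = Objects.requireNonNull(builder.name, "name");
        this.email = Objects.requireNonNull(builder.email, "email");
        this.creditLimit = builder.creditLimit;
    }

    String name() { return name; }
    String email() { return email; }
    int creditLimit() { return creditLimit; }

    // "Changing" a value yields a new object; the original is never modified.
    Customer withCreditLimit(int newLimit) {
        return new Builder().name(name).email(email).creditLimit(newLimit).build();
    }

    // The Builder is the only mutable part: it accumulates state, then hands it over once.
    static final class Builder {
        private String name;
        private String email;
        private int creditLimit;

        Builder name(String name) { this.name = name; return this; }
        Builder email(String email) { this.email = email; return this; }
        Builder creditLimit(int creditLimit) { this.creditLimit = creditLimit; return this; }

        Customer build() { return new Customer(this); }
    }
}
```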

Experience has taught me over and over again that correct code is much easier to optimise than clever code. This is why I'm a firm believer in Make It Work, Make It Right, Then Make It Fast.