« December 2006 | Main | March 2007 »

February 28, 2007

Pulse on PostgreSQL

Listen to this articleListen to this article

By default, my current toy (Pulse) stores its meta-data in an HSQLDB database. Which is fine (it's a darn good database) but I already have plenty of infrastructure around periodically dumping all PostgreSQL databases on my system for backups. Wouldn't it be nice if Pulse ran on PostgreSQL.

Thankfully, the fellas at Zutubi (purveyors of the afore-mentioned toy) have written up a quick HOWTO on migrating to PostgreSQL and it worked like a charm.

About the only downside is that each time I upgrade, I need to remember to add the postgres.jar to the lib directory. Hmmm. Perhaps I need to add it to the classpath before starting the server...?

February 21, 2007

Rails, PostgreSQL and Case-Sensitivity

Listen to this articleListen to this article

Possibly the only thing I like about MySQL is when performing a search, the values 'SIMON' and 'sImOn' are considered equal—case-insensitive searching. PostgreSQL on the other hand considers them to be different—case-sensitive searching. Now I don't know about you but for %99.999~ of the applications I've ever written, I'd rather 'Australia' and 'AuStRaLiA' weren't considered different countries.

The "standard" approach to solving this problem is to change a query from this:

SELECT * FROM countries WHERE name = ?;

To something like this:

SELECT * FROM countries WHERE LOWER(name) = LOWER(?);

Thereby forcing the database to perform a pseudo case-insensitive search. The only problem is that all those nice indexes you've created to ensure fast, efficient searching are totally ignored. (Who can spell full-table scan?) Performance issues aside (I mean after all we know that premature optimisation is the root of all evil right?) what's just as annoying is that I can't actually guarantee uniqueness, which is pretty much the whole point! No, even with a unique index on countries.name, the database will still quite happily allow me to:

INSERT INTO countries (name) VALUES ('Australia');
INSERT INTO countries (name) VALUES ('AUSTRALIA');
INSERT INTO countries (name) VALUES ('aUsTrAlIa');

So when I perform a case-insensitive search as previously discussed, I'll end up with three (count 'em 3) records. Thankfully, there is a solution (of sorts): expression indexes.

PostgreSQL allows you to create indexes based on expressions, say for example LOWER(name), allowing us to create a unique, case-insensitive index as simply as:

CREATE UNIQUE INDEX index_countries_on_name ON countries (LOWER(name));

Ok, so perhaps you knew this already and you're wondering what all this has to do with Rails? Well I'm glad you asked.

Rails (as of 1.2) has a new option for validates_uniqueness_of named, oddly enough, case_sensitive. This is assumed to be true by default (meaning all searches are case-sensitive). Set it to false however and you'll magically get validation queries that look like:

SELECT * FROM countries WHERE (LOWER(countries.name) = 'australia') LIMIT 1;

To compliment this feature, I've recently enhanced the RedHill on Rails Plugins in two interesting (and hopefully useful) ways.

The first is in the core and supports the creation of case-insensitive indexes during schema migration:

add_index :countries, [:name], :unique => true, :case_sensitive => false

The second is in schema validations and causes case_sensitive => false to be passed as an option to validates_uniqueness_of whenever a case-insensitive index is detected.

(I also looked at the possibility of automagically surrounding query parameters, etc. with LOWER() inside find methods but given the myriad forms queries can take, it seems altogether too difficult for my feeble mind at this point.)

The upshot of all this is that at the very least, it should now be possible to add case-insensitivity to your queries and be assured that (bugs not withstanding) the performance of your application won't suddenly plummet as a consequence.

February 19, 2007

WYSIWYL

Listen to this articleListen to this article

With the rising popularity of video podcasts as training material, I thought we needed a suitable acronym:

What You See Is What You Learn.

If only Google weren't around to prove that I wasn't the first to think of it. D'oh!

On the positive side, it's apparently a method with "...guaranteed results."

February 14, 2007

Semantic APIs

Listen to this articleListen to this article

I was chatting with my brother today (somewhat of a professional student with degrees in neuroscience, physics and maths) about software development. I was explaining how yesterday had a been a rather unpleasant day working out how to integrate with Crystal Reports. You see, what I wanted was an interface that looked something like this:

CrystalReport report = new CrystalReport("report.rpt");
report.setParameter("Posting Year", "2007");
report.setParameter("Account Number", "5678");

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

report.writeTo(outputStream);

outputStream.close();

Only instead what I got was this:

ReportClientDocument document = new ReportClientDocument();
document.open(_reportName, OpenReportOptions._openAsReadOnly);

DataDefController dataDefController = document.getDataDefController();
ParameterFieldController parameterFieldController = dataDefController.getParameterFieldController();

parameterFieldController.setCurrentValue("", "Posting Year", "2007");
parameterFieldController.setCurrentValue("", "Account Number", "5678");

InputStream inputStream = document.getPrintOutputController().export(ReportExportFormat.PDF);

int b;

while ((b = inputStream.read()) != -1) {
    outputStream.write(b);
}

inputStream.close();

document.close();

The first is nice and semantic; it's pretty obvious what the code is doing. The second requires you to read, very carefully, each line in order to work out what is going on. Talk about leaky abstractions. Apparently my Document's connected to my, DataDefController; my DataDefController's connected to my, ParamaterFieldController; ...

My brother drew an analogy with explaining to someone how a telephone works. In the first case, we've gone through a very simple explanation with just enough information to allow someone to have a go themselves; in the second example, we're now explaining how the spin of each electron determines the probability of it going down the wire and thus contributing to the current that ultimately makes the phone call possible.

At first I was concerned: Having just recently ranted (for the umpty umpth time) that developers don't seem to understand even the most fundamental principles of software development, here I was lamenting the fact that most APIs weren't simple enough. My brother then asserted that perhaps the reason software is (in general) so poorly written is precisely because the APIs we are forced to use are so primitive. That because we are forced to follow so many steps in achieving something that is conceptually so simple (such as producing a report) that the likely-hood of failure is much greater.

The scary thing is, this is certainly not an isolated example. Have you ever tried using the javax.mail packages? JNDI anyone?

In the end, I wrote a class named, unsurprisingly, CrystalReport with an interface exactly as in the first example and implemented almost exactly as in the second. But I seem to need to do this quite a lot when dealing in the "Enterprise" Java world.

Design Improvement Workshop

Listen to this articleListen to this article

My mate Marty Andrews is running a Design Improvement Workshop on March 24th here in Melbourne as part of Cogent's Easy Access Training (EAT). The content is in part, a follow on to the TDD Workshop with emphasis on identifying and removing code smells using refactoring tools and techniques.

One interesting point about these courses is that rather than being a fixed price, places are allocated by auction with the final price determined by the lowest bid! So, if you think some first-class developer training is worth $100, get in quick before you're outbid.

February 12, 2007

Hear Me Sort

Listen to this articleListen to this article

James has a home-grown GTD system which he'll hopefully write about if not release to the world one day. It's totally command-line driven using a combination of bash, perl and ruby to manipulate plain text files and, from what I can tell, it's not only blindingly fast, it seems to work seriously well.

One of the daily tasks that James performs is to prioritise the things he wants to (attempt) to get done and what better time to do that than on the 45 minute train trip to work. As you can well imagine, prioritising 40 items takes a fair amount of time not only due to the sheer number but also, being vision-impaired, reading (and re-reading) takes a bit of concentration. So, we came up with a nifty solution in two parts.

The first part was quite simple: use binary insertion. That way, instead of O(n^2) comparisons, we'd get O(n log n). So far so good. But the that was pretty obvious. The next bit was the real doozy!

Rather than have each pair printed out so that James could read them and make a decision, we instead chose to use some Text-To-Speech (TTS) to literally speak each pair; something along the lines of:

"Is calling your boss to discuss staff pay reviews more important than send wife a bunch of flowers for her birthday?"

In addition, we actually run the TTS in another thread so that we can easily interrupt it, primarily because James: has already chosen a suitable answer; needs to repeat the question; or wants to cancel the whole thing.

About the only problem we encountered was that, on his Powerbook, the say command (built-in to Mac OS X) ran a little too slowly for our liking so we simply aliased it to use the much faster swift command that comes as part of the commercial Cepstral William voice he uses for all his Text-To-Speech.

Now James can listen to the minimal set of questions, making the appropriate decision for each by way of a simple keystroke, and have it all done within 15 minutes all the while free to sit and enjoy the view (such that it is on a suburban train during peak hour) rather than staring at the screen.

Yes, the rumours are true.

Listen to this articleListen to this article

Oh, you haven't heard? Well then, let me be the first to tell you: I'm getting married on April 14th to the loverly Jesseca ;-). We're then heading to Japan (possibly via Borneo) in early June to spend 12 months in a little country town where we will hopefully spend our time telecommuting for work, travelling and partying. Oh and of course all I can eat Aikido training. Woohoo!

So if anyone knows of any conferences, geeky get-togethers or the like I'd love to get a little taste of the software development community in Japan.

Just Tell 'Em Joe Sent You

Listen to this articleListen to this article

A customer walks into a bank and, after a waiting in the queue, reaches the teller:

Teller: How may I help you?
Customer: I'd like to transfer a million, billion, squintillion dollars from Jo Blogg's account into mine please.
Teller: Certainly but first I'll need some proof of identification.
Customer: enters account number and PIN and clicks enter.
Teller: Ok, now I'll need to authorise the transfer, please wait a moment.
Customer: Oh, that's ok, Joe gave me permission.
Teller: I'll get to it immediately then.

As ridiculous (and somewhat contrived) as this scenario seems, it's a pretty apt reflection of how many large companies implement security in their web applications. Indeed many VERY large companies. Companies that, if you knew who they were, would certainly have a lot of explaining to do.

The problem as I see it is that developers in general don't understand security. Actually, no, the problem is a little more fundamental than that. Developers in general don't understand the technology stack with which they work day in, day out.

I can appreciate how difficult it might be to accept that you're forking out in excess of $80k a year for a developer that doesn't seem to understand that just because the user can't see a hidden field on a form doesn't mean they can't look at the page source; that just because a user typically interacts with the server using a browser, doesn't preclude the use of something as simple as a telnet client. In fact, I'm willing to bet that if you asked something along the lines of "have you tried seeing if you can connect to the server using telnet" you'll get blank stares all around.

The sad truth is that using hidden form fields, magic request parameters, default credentials, and even sending permissions to the client (even if that means web server as a client of EJB) on login and then trusting the client to send those same permissions back with each request in order to, I don't know, perhaps "save database lookups" or something, is disturbingly common in practise.

Heck, I don't even claim to be a security expert but I'm pretty sure that not using SQL bind parameters is generally considered a bad idea. Then again, perhaps it's not so much of a problem when all you're doing is reading clear-text passwords out of a database?!

February 11, 2007

Integrating with MacPorts

Listen to this articleListen to this article

When I used to run FreeBSD, I loved the ports system that comes as the standard mechanism for installing most freely available (and some commercial) software. When switched to the Mac, I immediately starting using DarwinPortsMacPorts as my main method for obtaining and installing all my "geekware": PostgreSQL; Subversion; Ruby; and even Ant to name but a few.

Installing a new bit of software using MacPorts is usually pretty trivial. Something like:

sudo port install subversion

is all that's needed to install Subversion and all its dependencies.

Sure, there are Mac-specific GUI-based installers for some software that I use but I generally find it easier to use MacPorts for managing dependencies, versioning, updates, etc.

There are however, other command-line based software such as Pulse that, although relatively simple to install, still take some time and thinking not only to install but to install in such a way as to not turn my Mac into a glorified Windoze machine with all those lovely security holes. If only there was a way to create a MacPort installation for these too.

Well, there is: Create one yourself. So this morning I decided to see how difficult it would be to do just that.

Reading the quick-start guide, I soon learned that writing a Portfile—Literally a TCL script named Portfile—is often all you need to do. Unfortunately, the "quick" in "quick-start" is pretty much just that, leaving quite a lot to the imagination.

So, after reading a few existing Portfiles and following a bit of experimentation, I managed to get a relatively clean (and importantly, working) Portfile for installing Pulse which I figured I'd not only share with you but I would use as a guide to creating your own.

Now, before we continue, if you haven't already done so, I highly recommend reading the quick-start guide. It's ok, I'll wait...

The start of a Portfile looks something like this:

PortSystem 1.0
name              pulse
version           1.2.14
categories        java
maintainers       simon@redhillconsulting.com.au
description       Pulse automated build server
long_description  Pulse is an automated build or continuous integration server. \
                  Pulse regularly checks out your project's source code from your \
                  SCM, builds the project and reports on the results. A project \
                  build typically involves compiling the source code and running \
                  tests to ensure the quality of the code. By automating this \
                  process, pulse allows you to constantly monitor the health of \
                  your project.
homepage          http://www.zutubi.com/products/pulse/
master_sites      http://www.zutubi.com/download/

All pretty self explanatory really: The name and version of the software; what exactly it is and why I'd want to use it; who to blame when something goes wrong; and where to download the installation (in this case .tar.gz) file from.

Following on, we have:

checksums         md5 dcebf03b7a7099a476371b8142ba7624
depends_lib       bin:java:kaffe

These specify the MD5 checksum for the installation file to ensue we download the correct one, and any dependencies; in this case we need a Java runtime which comes pre-installed on most Macs anyway but I figured it doesn't hurt to make sure.

Next, we set up some variables for use later on in the script, again all pretty self explanatory:

set pulseuser     pulse
set pulsegroup    pulse
set home          ${prefix}/share/java/${name}
set bin           ${home}/bin
set executable    ${bin}/pulse
set dbdir         ${prefix}/var/db/${name}

(${prefix} is pre-defined by the port system as the path to the base of the installation—usually /opt/local.)

Now for something kinda cool: launchctl integration. If you don't know, launchctl is the preferred method for starting, stopping and otherwise controlling server processes under OS X. It can be a little tricky to get right with various XML configuration files, etc. however MacPorts comes to the rescue. With a few simple TCL commands, the installer will magically do all the heavy-lifting for us:

startupitem.create  yes
startupitem.init    "PULSE_HOME=${home}"
startupitem.start   "su ${pulseuser} -c \"${executable} start\""
startupitem.stop    "su ${pulseuser} -c \"${executable} shutdown\""

Neat huh? Now when we install the port, we'll automagically have scripts generated in all the right places so the pulse server can be started and stopped by the system.

We're almost up to the installation part but before we get to that, we need to stub out a few things.

You see, the whole installation process caters not only for unpacking and deploying of files but in the case of C/C++, etc. also configuring, patching and compiling, etc. In our case though, not only are these latter steps unnecessary, they're not even possible—we're dealing with a binary distribution—so we need to override them to do nothing:

configure {}
build {}

At last we get to the actual deployment of the files, creation of users, setting of permissions, etc.:

destroot {
  # Create the Pulse user
  addgroup ${pulsegroup}
  set gid [existsgroup ${pulsegroup}]
  adduser ${pulseuser} shell=/bin/sh gid=${gid} home=${dbdir} realname=Pulse\ Server

  # Ensure we have the needed directories
  xinstall -m 755 -d ${destroot}${home}

  # Copy the files
  system "cp -R ${worksrcpath}/ ${destroot}${home}"

  # Keep empty directories
  destroot.keepdirs-append ${destroot}${home}/logs ${destroot}${home}/versions

  # Fix ownership of some directories pulse really needs to write to
	system "chown -R ${osuser}:${osgroup} ${destroot}${home}/logs"
	system "chown -R ${osuser}:${osgroup} ${destroot}${home}/versions"

  # Add a symlink from bin directory to the pulse script
  system "ln -fs ${executable} ${destroot}${prefix}/bin/pulse"
}

Once again, pretty straight forward except for that odd looking ${destroot}. Why is it being used in some cases and not in others you may ask.

This actually got me for a while to until I realised that what the destroot step actually does is create an "image" of the installed software in a working directory—${destroot}—after which the predefined install step copies the staged files into their final destination. Not a bad idea really when you think about it. Certainly saves screwing up the production environment when something goes horribly wrong—like when you're playing around with things to see what happens ;-).

Last but not least, a little bit of post-installation goodness to instruct the user with any manual (or perhaps optional) processes, or even just a nice message to say "why thanks for using our software":

post-install {
  ui_msg "#"
  ui_msg "# The script ${executable} has been installed to facilitate starting and"
  ui_msg "# stopping ${name} as a true daemon process. It must be run as ${osuser}."
  ui_msg "# For example:"
  ui_msg "#"
  ui_msg "#   sudo su pulse -c "${executable} start"
  ui_msg "#"
  ui_msg "# This script assumes it is run from ${home}. To run from outside this"
  ui_msg "# directory, you must set the value of PULSE_HOME to the absolute path"
  ui_msg "# of this directory. For example:"
  ui_msg "#"
  ui_msg "#   PULSE_HOME=${home} sudo su pulse -c "${executable} start"
  ui_msg "#"
  ui_msg "# You will also need To create the directory ${dbdir} if it does not"
  ui_msg "# already exist:"
  ui_msg "#"
  ui_msg "#   sudo mkdir -p ${dbdir}"
  ui_msg "#   sudo chown ${osuser}:${osgroup} ${dbdir}"
  ui_msg "#"
}

And there you have it. Not too bad really and although we didn't cover quite a number of the other features the port system provides, I think you will have seen enough to get you started and on your way to integrating your favourite software into the ports system.

It sure beats rememberingforgetting a lot of manual steps each time you need to install/re-install software.