« February 2004 | Main | April 2004 »

March 31, 2004

Tests and Refactoring

Listen to this articleListen to this article

TDD means no writing production code unless there is a breaking test right? Most practitioners agree on this. However the topic of TDD in the context of refactoring came up in a conversation last night. One of my colleagues challenged that because refactoring wasn't changing the behaviour just the implementation, it can largely be done without a breaking test.

I'm not convinced that is the case. If I think about why we refactor there really is only one possibility - something is wrong with the code. Otherwise, why would we bother. It seems to me this "wrongness" then breaks down into three main causes:

  • Missing or superfluous functionality;
  • Deprecated implementation; and;
  • Smelly code.

The first case is easy to justify writing tests for - clearly, any change in functionality necessitates a test.

The second is more subtle but again, clearly there was some reason for changing the implementation (such as persisting to a database versus in memory) in which case we would need to add some kind of test(s) to verify this.

The third case is a little harder. On what basis do I determine that the code smells and therefore needs a cleanup? In the specific case from last night, Dave had renamed an abstract class to AbstractXXX and extracted an interface. But he had written no extra tests. Admittedly, I usually write a test to ensure that my class implements the necessary interface(s) so he could have done that. However the reason he made the change in the first place was because a checkstyle check was barfing. Why was it barfing? Because it had detected a likely code smell. To my mind, this is a breaking test. It's a test for code quality as opposed to functionality.

Unfortunately, not all code smells can be picked up by machine (though I'm trying very hard lol). But I thought it was interesting to challenge my own notions of what constitutes a test and, getting back to a previous entry (as is my habit), what kind of test is appropriate in various circumstances.

I also want to see if I can come up with many cases where it is justifiable to modify code without some kind of test to prove that the change is in fact necessary. If I can't prove that a change is required (and what better way than with a breaking test) why would I make it?

March 26, 2004

More Broken Tests

Listen to this articleListen to this article

Following on from my entry yesterday on broken tests I came across another interesting problem I see from time to time in peoples test code.

Here's a very simple example to demonstrate. As far as I'm concerned it's a big fat broken test:

public class FileBuilderTest extends TestCase {
    public void testHeaderIsWritten() throws Exception {
        StringWriter writer = new StringWriter();
        FileBuilder builder = new FileBuilder(writer);

        BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));

        String line = reader.readLine();
        assertNotNull("too few lines", line);
        assertEquals(FileBuilder.HEADER, line);

        ...
    }
}
 
public class FileBuilder {
    public static final String HEADER = "# Created by " + FileBuilder.class.getName();
    
    private final Writer _writer;

    public FileBuilder(Writer writer) throws IOException {
        if (writer == null) {
            throw new IllegalArgumentException("writer can't be null");
        }
        _writer = writer;
        
        writer.write(HEADER);
    }
}
 

Seems pretty sensible right? We check to make sure that the header gets written to the stream. Putting aside the are tests for testing or design debate for a moment, one thing remains clear - a test should break if I change the class in anyway that breaks the intended behaviour.

Right?

If so, then this test is truly crippled. Don't believe me? Well just change the value of header to some absolute garbage and re-run the test.

Did it break?

No. Why? Because the test never checks that what gets written is what is supposed to be written. Instead, it is simply checking that whatever header the FileBuilder has defined is what gets written. So anyone can inadvertently (or deliberately as we have done) change that header and the test never breaks.

I'd rather see this:

public class FileBuilderTest extends TestCase {
    public void testHeaderIsWritten() throws Exception {
        StringWriter writer = new StringWriter();
        FileBuilder builder = new FileBuilder(writer);

        BufferedReader reader = new BufferedReader(new StringReader(writer.toString()));

        String line = reader.readLine();
        assertNotNull("too few lines", line);
        assertEquals("# Created by " + FileBuilder.class.getName(), line);

        ...
    }
}
 
public class FileBuilder {
    private static final String HEADER = "# Created by " + FileBuilder.class.getName();
    
    private final Writer _writer;

    public FileBuilder(Writer writer) throws IOException {
        if (writer == null) {
            throw new IllegalArgumentException("writer can't be null");
        }
        _writer = writer;
        
        writer.write(HEADER);
    }
}
 

I recall one of the things I found disturbing when reading (one of) Mr. Becks TDD book being his suggestion that our first example is in fact good practice because we are removing duplication. Personally, I don't care about duplication in tests nearly (if at all) as much as I do in production code.

An interesting variation I heard about involved a developer essentially coding the same algorithm in both the production class and the test so they could verify the results.

At any rate, hopefully you can see the similarity in both cases - the tests weren't explicit about what the expected results should be instead using a "calculated" value. I'd much rather see the test explicitly state exactly what it expected to be written. That way if someone accidentally does say a find+replace within our FileBuilder code, we will catch it.

Build Speed

Listen to this articleListen to this article

I've inherited the poorest performing machine on the team. Only fitting I suppose as I'm probably the poorest performing member (it's a pretty good team!) But the main problem I have is that it takes so long (between 10 and 17 minutes depending on how long it's been since I rebooted windows) to do a pre-checkin build (to make sure I haven't screwed anything up) that by the time it finishes, I really need to to another CVS update. This invariably brings back a number of changes that good old paranoid me thinks necessitates another local build. This vicious cycle continues until I give up and just check in my code.

Of course I then need to wait until the integration build on the cruise machine completes before I can be happy I've not broken anything. Now the chances are that there is already a build in progress and, on average, about half-way through. The integration build itself takes about 20 minutes all up (and that's on a fast machine). So, it's now about an hour by the time I've got anything checked in and I'm happy I can go home.

This morning, Andy Trigg removed the generation of the test reports from the local machine build as this was taking around 2-3 minutes on my machine and besides, it's really only needed if the tests break right?

But that didn't really help all that much and I finally cracked it and decided to take a look at what was causing the build to take so long. How long can it take to compile and run 1200 unit tests and around 250 functional tests? I mean sheesh! A lot of time and efforts has gone into ensuring that the unit tests are as close to "real" unit tests as possible and not just functional tests in disguise so they should be blindingly fast to run.

Closer examination of the build file revealed that the tests were being forked. This was because there is some bug in the build process that causes OutOfMemoryErrors. This was clearly going to slow things down but not much I can do about it as far as quick wins goes.

The next thing I noticed was that version 1.6 of ant has a new option on the JUnit task (reloading) that allows you to confugure if a new class loader will be used for running each test class. I quickly set this to false!

And lastly, I discovered that if I ran all the unit tests in IntelliJ they took around 30 seconds but run from the Ant script they take around 3 minutes!

Then I remembered playing around with test suites some time ago. I vagualy recalled that putting tests into a suite ran orders of magnitude faster than if you ran them one at a time. A few minutes using good old of find, sed, grep, etc. and I created a class named Suite that looked something like this:

public class Suite extends TestCase {
    public static Test suite() {
        TestSuite suite = new TestSuite();
        suite.addTestSuite(BlahBlahTest.class);
        suite.addTestSuite(FooBarTest.class);
        ...
        return suite;
    }
}
 

Then I changed the build script to run my suite instead of each individual file, making sure that I forked it just in case.

Well blow me down. The pre-checkin-build now runs in 3 1/2 minutes! That's a full clean and rebuild plus unit and acceptances tests. Whats more amazing is the actual unit tests now take an amazing 9.4 seconds to run. A clean unit test build now only takes 1 minute and an incremental build a mere 30 seconds. Woohooo! I can't wait to see how the cruise box performs now as it was already at least twice as fast as my machine.

The only real issue was then the fact that we would have to maintain the suite to ensure we didn't leave out any tests. Stuff that I thought so I quickly whipped up an ant task to generate it for me and away we went. I also suggested we could remove the need to compile the suite by generating the byte-code directly to which James just frowned dissaprovingly and accused me of being "such a geek!" Maybe he's right? hehehe

Next week I'll see if I can get similar improvements in the acceptance and regression tests.

Oh yeah the test report generation now takes a whole 1.5 seconds to run so it might make it's way back into the local build again but I doubt it. Every second counts! :-)

March 25, 2004

Code Normalisation

Listen to this articleListen to this article

I've been doing some incidental code cleaning as I implement functionaility for the system I'm currently working on. It's something I really enjoy. Code quality is somewhat of an obsession. I often wonder if it's actually worth it. But I dislike leaving "ugly" code lying around so I fix it anyway. I'm sure some of the developers wished I'd just shut up. Some of them have even taken to not letting me actually look at the code when answering a problem for fear that I'll get distracted :-)

Last year, James Ross and I wrote some 20+ custom checkstyle checks (most have made it into the core distribution). They were brought about by a project we were both working on at the time. We would manually identify code smells and then set about developing some heuristics for automagically finding all of them.

I've been applying most of those same checks to this project and, I'm happy to say, although they turned up some "violations", for the most part the code base is pretty clean. I still come up with more checks I'd like to implement when I decide to make the time and last night I made the time and wrote a very simple check for "missing abstractions" or more properly, opportunities to normalise.

There are clearly many ways we determine that there are missing abstractions but there is one thing I've noticed on many projects: code containing common prefixes and/or suffixes is often a good candidate for extracting a class. Some very simple yet recurring examples might be date ranges (dateFrom and dateTo) and addresses (addressStreet, addressCity, addressPostcode), etc. I'm sure you can think of other less obvious examples.

The check simply identifies clusters of instance variables or parameters that share a common prefix and/or suffix. When I ran it against the code base this morning I was amazed to find hundreds of examples! Probably 1/3rd were in constants: MINIMUM_AGE, MAXIMUM_AGE, etc. with the remainder largely instance variables and a small number of parameters.

After trawling through the results, I'd say I had probably found a dozen or more absolute, no-bones-about-it cases for refactoring which I thought was a pretty good result. What concerned me was there seemed to be a lot of noise. But then again, was it really noise? Would it not be better to have minimum and maximum values actually represented as a Range? If so, how far do you perform this kind of normalisation? As James pointed out, humans really do have a left and a right arm. Do we really need to make an Arms class with getLeft() and getRight()? Probably not. It could be argued though that constants sharing the same prefix/suffix such as _ACTION_NAME or DISPATCH_ are really just introducing an implicit namespace that might be better moved to a separate, explicit, enumerated type?

I'd be interested to know what you think and if you'd be interested in running the check against your own code to generate some feedback. To do so, download the jar file and add it to your build class path. Then, to run the check, simply add the following code fragment to your checkstyle configuration as a child of TreeWalker:

<module name="au.com.redhillconsulting.jamaica.tools.checkstyle.MissingAbstractionCheck"/>  

You'll also find some more checks (all in the same package) that have yet to make it into the checkstyle distribution:

  • ClassDataAbstractionCouplingCheck
  • ClassFanOutComplexityCheck
  • DuplicateLiteralCheck
  • FinalFieldCheck
  • GuardingIfCheck
  • NPathComplexityCheck
  • NullPointerCheck
  • ReturnFromCatchCheck
  • ReturnFromFinallyCheck

Broken Tests

Listen to this articleListen to this article

As I mentioned, I've been doing some code cleaning up as I add new functionality to the system I'm currently working on. I have to agree with Andy Trigg and say the delete refactor is one of my favourites. It's gratifying to replace great swathes of code with something much simpler and yet more functional. The best is when I refactor some code that allows me to delete a test. This is usually because responsibility for some behaviour has been smushed over a number of classes which leads to duplication among several test classes.

If it weren't for all those failing tests, I'm sure I would have screwed up pretty early on. As I was fixing tests here and there, I noticed something interesting. Some of the tests were breaking due to assertion failures. Just what you would hope for. However, rather more were failing due to exceptions. Usually a NullPointerException. What's worse is the NPEs were in the test case code, not code the test was calling.

Tests should assert expected behaviour. If you expect a return value not to be null, be explicit about it. If you expect an iterator to have more values, assert it. Don't wait for the underlying code to blow up in your face unless you're actually testing for that behaviour of course.

I also like to keep tests as small, simple and obvious as possible. If a test case checks for a limited set of conditions, its much easier to work out where to fix the code when it breaks. Heavily factored tests may appear small, but if you were to inline all the test code, you'd find that it's definitely not simple nor obvious.

What I've found most interesting is that the worst tests are generally the ones written after the fact. The "oh yeah I'm gonna need to test this" kind. They stand out like the proverbial k9's gonads. They're usually long (the tests not the testes), difficult to follow, test more than just a few simple concepts and tend to be very brittle. That last point seems counter intuitive at first but is definitely backed up by experience.

So what's the point? Tests aren't just for keeping jCoverage happy! In fact crappy tests are a liability. They will end up being a maintence nightmare if left unchecked (pardon the pun).

March 13, 2004

Software Ghettos

Listen to this articleListen to this article

I was sticking my nose into a code review James Ross was performing the other day. Among other things, naturally, he was checking for code duplication. Now my informal analysis of open source projects and commercial projects I have been brought onto suggests that, a figure of anywhere between 10% and 20% code duplication is pretty standard. As horrifying as that may seem (to me at least), it was rather amusing to think that the final report might include words to the effect that, "your software falls within industry norms with respect to duplication of source code."

This lead me to question what is acceptable. Why is it that one developers clean code is anothers quagmire? James suggested that over and above the obvious differences in experience and "smarts", it may come down to how much "smell" we can tolerate. He likened it to cleaning the house - Some people (ok me!) can leave dishes in the kitchen for days while others (like James) apparently can't go to bed if there's even one dirty dish lying around.

James also recalled an article he had read on Broken window syndrome and we wondered if the same thing applied to software via accumulated technical debt? That maybe as the density of code smell increases, there is some point beyond which there is no escaping. A point beyond which your team subconciously stops caring anymore because "well the rest of the code is crap so I'll just add this hack in here."

Maybe that's why Windows™ is so very broken? ;-)

I recall my mum being infuriated that as kids, we would simply walk past the rubbish or toys on the floor as if they weren't there. She couldn't understand why it never ocurred to us to pick up the Lego and put it away. Even now I find myself noticing a piece of paper on the floor and forcing myself not to just leave it.

So while your team might start to understand the value of design patterns, TDD, whatever, it's really important to get them to be more sensitive to identifying and eliminating rubbish as they walk through the code and not just leave it there for mum to pick up.

Naming Tests

Listen to this articleListen to this article

Your business analysts have no doubt spent days coming up with test cases for an interest calculation. Writing it all out in nice, simple, plain english so you can understand what you need to implement. They've probably produced half-a-dozen scenarios for you to test against. Don't go and spoil it all by naming your method testInterestCalculation. I want to see testInterestCalculationWithFixedRateFor24Months or something similar.

I know there will be those who say that it's redundant information to encode into the test name. I mean just look at the code right? Phooey! Ever written a method named remove that actually added something instead? If the answer is yes, go directly to jail. Do not pass go. Do not collection a hundred dollars! If, like most sane people, the answer is no, that's because we name methods to indicate intention. We then look at the code to see how it's been implemented.

Ever spent what seemed like hours trying to come up with a good name for your class? Why bother? Again, because it's so important to communicate not only to others but also to yourself the role of the class, it's purpose for being. Without an appropriate name, it's often hard to know what the classess responsibilities should be.

If I go onto a project and need to get an idea of what the system does, the first place I look is the test suite. It should tell me almost everything I need to know about the implementation.

So do me a favour and spend a bit of time naming your test cases something meaningful. Please! I beg you. It'll stop me wasting your time asking lots of stupid questions ;-)

March 09, 2004

JRules - A brief overview

Listen to this articleListen to this article

As requested, my brief overview of JRules. I'm sure I will have failed to answer many questions so feel free to ask. If I know the answer, I'll add it to the entry, othwerwise I'll either cop out and say I don't know or go and find out. As always, if I've made a mistake or three, feel free to point them out.

First off, I found JRules very easy to use. It comes with very good documentation. As I mentioned in a previous entry, foremost in my mind was to prove testability. I can safely say that rules can be fully unit tested. So there are no excuses! ;-)

Like many rule engines, JRules usess a Rete network. The cool thing about the Rete algorithm is that rules with similar conditions (or subsets of conditions) are identified and evaluated together meaning that the conditions need only be tested the minimum number of times. This differs from java coded rules where typically every condition in every rule must be evaluated regardless of whether we might already have tested for it in previous rule.

Continue reading "JRules - A brief overview" »

March 08, 2004

re: Spiking and TDD

Listen to this articleListen to this article

Jon - Six Million Dollar Man - Eaves raises some interesting points on spiking versus TDD. A very good summary on a tried and true approach for learning new stuff. While I agree totally with the article (I do this myself quite often so it must be good hehehe), I've really started to try and use JUnit as a mechanism for doing spikes as well.

Just recently, as part of a new 3-month gig (many thanks to David Pattinson and James Ross and much to the obvious displeasure of one who shall remain nameless) I started doing some investigation into ILog JRules, something I had never used before. My primary responsibility was to mitigate any technical risk associated with using JRules. Naturally, testability was one of the main concerns for the team.

Now usually in this kind of scenario, one would usually whip up a quick-and-dirty main method, and start writing some code to both learn about JRules and to prove what it can/can't do. Instead, I started off by writing a test. Of course I had no idea what the test would do but I wanted to force myself to do it anyway.

At first it was a very simple test to work out what jar files I would need to just get the engine up and running. Then as I had more and more dicsussions with various team members about what our overall requirements would be and we thrashed out the different ways we could achieve the same thing with JRules, I simply added more and more sophisticated tests to test out our assumptions.

Doing my spiking as fully-blown tests, forced me to think much more critically about what I was doing rather than just spew stuff to stdout and look at the results. I was forced to clearly state my assumptions and the tests proved conclusively if these assumptions were right or wrong.

The great thing now is that we have a full compliment of integration tests that both prove our assumptions about, and demonstrate our chosen use of, JRules. This will be extremely valuable when the rest of the team start to write their own rules as part of implementing user stories. It also means we can happily upgrade to newer versions of JRules if/when they are made available with much less worry about breaking our code. This is of course why you all write integration tests isn't it ;-)

A few days ago, James and I did much the same thing when doing some swing development. I can't say I've perfected it. I still do some debugging and println but in the context of writing a test. But, I'm forcing myself to be as "pure" about it as I can to see how far it's practical to push the technique.

As a complete aside, I found JRules pretty damn easy to use and test. The documentation is very good. If anyone is interested, I'll post a brief overview.

March 05, 2004

It wasn't me, I was framed!

Listen to this articleListen to this article

People who know me well, know well (and it's not even christmas) my disdain for frameworks in general. I must have missed it but at some point the ideal of component based development was hijaked by the framework. Was this because it didn't actually work or because someone decided to play a cruel joke on us to see how long we, as an industry, could keep going around in circles re-inventing the proverbial wheel?

Almost every large-scale project I've ever seen starts of with lofty goals of re-using the framework dejour and soon realises that it just wont cut it. Then comes the process of deciding if it is better to stick with it and forego the application you really want/need, roll-your own for maximum flexibilty or bastardise it so as to render your application largely incompatible with any future releases of said framework.

Sourceforge is littered with every-man-and-his-dogs attempt to build a better mousetrap. And they're just the open sourced ones. I dread to think how many companies have their own flavours!

I've had much success building applications from scratch with no framework, just some user stories and test cases. Then letting the infrastructure requirements emerge and hunting around for libraries that will satisfy my needs. This seems to work much better than the times I've tried to decide up front which frameworks and tools I'd like to use, invariably leading to more work than it was worth. The key point here is identifying real requirements and addressing them. "We will need struts" is not a requirement!

Off the top of my head, I can think of maybe half-a-dozen libraries that I've been using regularly that don't lock me into someone elses idea of how my application should be structured. Some for persistence, some for rendering, others for business rules, etc.

Sometimes there are so-called "strategic" reasons for using a particular tool. Unfortunately, the average whiteboard architect has usually been drinking way too much Gartner Koolaide. To be re-usable, software components should solve a particular problem without trying to dictate my overall design. And more importantly, they should solve the problem well.

Yikes! Is that bile I can smell? Hmmmm