Tests and Refactoring
Listen to this article
TDD means no writing production code unless there is a breaking test right? Most practitioners agree on this. However the topic of TDD in the context of refactoring came up in a conversation last night. One of my colleagues challenged that because refactoring wasn't changing the behaviour just the implementation, it can largely be done without a breaking test.
I'm not convinced that is the case. If I think about why we refactor there really is only one possibility - something is wrong with the code. Otherwise, why would we bother. It seems to me this "wrongness" then breaks down into three main causes:
- Missing or superfluous functionality;
- Deprecated implementation; and;
- Smelly code.
The first case is easy to justify writing tests for - clearly, any change in functionality necessitates a test.
The second is more subtle but again, clearly there was some reason for changing the implementation (such as persisting to a database versus in memory) in which case we would need to add some kind of test(s) to verify this.
The third case is a little harder. On what basis do I determine that the code smells and therefore needs a cleanup? In the specific case from last night, Dave had renamed an abstract class to AbstractXXX and extracted an interface. But he had written no extra tests. Admittedly, I usually write a test to ensure that my class implements the necessary interface(s) so he could have done that. However the reason he made the change in the first place was because a checkstyle check was barfing. Why was it barfing? Because it had detected a likely code smell. To my mind, this is a breaking test. It's a test for code quality as opposed to functionality.
Unfortunately, not all code smells can be picked up by machine (though I'm trying very hard lol). But I thought it was interesting to challenge my own notions of what constitutes a test and, getting back to a previous entry (as is my habit), what kind of test is appropriate in various circumstances.
I also want to see if I can come up with many cases where it is justifiable to modify code without some kind of test to prove that the change is in fact necessary. If I can't prove that a change is required (and what better way than with a breaking test) why would I make it?
Comments
Simon,
This might be a case of reductio ad absurdum but let's take one of the simplest Fowler-condoned refactorings you can apply: renaming. I frequently come across lots of parameter, method and attribute names that don't convey their full raison d'etre. When I come across these in code I am working on, I rename them without writing a test first.
I use the philosophy that I'll TDD stuff which can break, and a rename is not one of these beasties (unless I use the thing I'm renaming as a literal somewhere my IDE won't notice) given the code won't compile if I've renamed a method at declaration but not in the third location I've called it, for example.
From the perspective of pure refactoring, I don't think the term "necessary" is appropriate. Refactoring is never needed (in the "it'll break otherwise" sense), but has so many positive side effects that it's damm "nice to have".
Of course, if Checkstyle could tell me that my method names are inappropriate, I'd have it's children.
Posted by: Andy Marks | March 31, 2004 05:52 PM
I don't think having a failing CheckStyle rule or raucous complaint from simian 'proves' that I have to make a change.
A functional test documents a requirement - and a failing one proves that a requirement has not been met.
On the other hand if I find a smelly piece of code and write a CheckStyle rule to expose it - that doesn't prove that it needs to change, it just proves that it exists. Its a circular argument to suggest that failing checkstyle proves I need to change something when I just wrote the rule to complain about it.. to prove that it needs to change I would somehow have to write a test that proves the code is bad. ( perhaps by writing an AI and getting it to perform maintenance tasks on the code, and showing that it finds it harder with the 'smelly' code ).
Ok so im getting silly now, but basically Im not sure I see a checkstyle rule as a test in the TDD sense, but rather a more efficient automated 'nose' for finding code smells.
or something
Posted by: Perryn Fowler | March 31, 2004 06:23 PM
Yup I'll buy that example in part. Though the "which can break" part I am less convinced about.
It's like speeding in a car. You wouldn't speed if you thought you were going to kill someone right? You speed because you think you can handle the speed. And the speed you think you can safely drive at is probably lower than my 18 year old cousing even though you can probably safely drive faster because you have more exprience.
And therin lies the problem. Letting people arbitrarily decide when it's appropriate is dangerous especially on a team of inexperienced developers who by definition don't know what they don't know.
But rename maybe one to add to the list. I'm sure there are many but it seems that no one has actually articulated what they might be it's more of a "gut feel" thing which in many ways I understand but as I've just stated, can be dangerous.
Posted by: Simon Harris | March 31, 2004 06:29 PM
That's like saying that even though we've come up with a better way of detecting breast cancer we can safely assume that anyone we checked before with the older less reliable method is to be considered free of cancer.
I take the view that I have a check in there to identify "bad code". Even if I add it post-hoc.
I think my reference to TDD was probably unfortunate. As the title says it's more about extending the maxim about tests from pure TDD to refactoring.
Posted by: Simon Harris | March 31, 2004 06:34 PM
Ok so lets exchange the word "prove" with "justify" ;-)
Rather tha "prove" you need to make the change I want you to "justify" the code change with a breaking test of some kind (preferably automated) if at all possible.
Posted by: Simon Harris | March 31, 2004 06:37 PM
Andy, just to make a silly point: if you rename a class or method, you _have_ to write a new test. You typically do it at the same time, particularly with a refactoring tool, but it you still get a new test. Otherwise your test won't compile (or run, for dynamically typed languages). Your old test, of course, is "deleted" by removing it.
I've always taken the idea that refactoring can preserve tests unchanged as silly. Refactoring, to me, intrinsically involves deciding what level of breakage you are willing to accept. Sometimes you don't need a new test to drive that (ExtractMethod, for example, is a good one), but sometimes your tests break and you need to decide if the broken test represents a problem ("fix the code") or not ("fix the test"). I typically do this with a mental "black box" image of what is involved in the refactoring.
I also find that when I do an ExtractMethod/Class (or MoveMethod) refactoring, I also tend to shuffle tests around. The reason is simple: although the tests still cover me, I get annoyed if I'm working in one area, and the breaking test is in another. I'm frequently lazy, in that I only run tests that I've deemed "local" on a regular basis, and leave the bigger ones until I'm nearly done. If I don't move tests around, then moving code breaks this pattern of working.
It's probably worth pointing out here that refactoring actually requires a higher level of tests than pure TDD would give you. Arguably, with TDD, you can stop testing once your design is teased out, but at that point you wouldn't always have enough tests to CYA with refactoring. That's why I still prefer the second 'D' as meaning 'Development', not 'Design'.
Posted by: Robert Watkins | March 31, 2004 09:48 PM
I'm usually more concerned about how to improve the judgement of those "inexperienced developers" rather than just train them to respond to a machine. There are forms of non-obvious duplication that Simian doesn't detect and what tool can tell you that something is not communicating intent clearly.
In the end, it is an arbitrary decision on whether any particular refactoring is appropriate based on our experience. The machine is just a tool to assist.
In general, I consider it more dangerous to not correct the fundamental problem of the developers not having good judgement. Do reviews and/or setup automated checks as a safety net. But I wouldn't want to not also focus on mentoring by pairing, etc.
Posted by: Jason Yip | March 31, 2004 11:37 PM
Jason is quite right. I think a balance of both mentoring of inexperience developers and automated safety checks are required. Developers are not crack smoking monkeys if they were I would recommend electric shock treatment on test failures ;-)
To determine the best balance we need to consider the team dynamics and make up.
Posted by: Michael Lee | April 1, 2004 05:56 AM
Nope I don't like 'justify' either ;)
I wouldn't add a checkstyle rule to justify my removing some smelly code - I write it to see if there is any similarly smelly code somewhere else in
the code and to raise alerts if some gets added in the future.
To use the cancer analogy, I wouldn't use my new test on someone who I already know has cancer.
Nevertheless this is a fairly silly argument since we both agree that its a good idea to add a check for similar problems when removing a code smell.
And I would add it before I did the refactor - mostly to prove that the check worked.
Posted by: Perryn Fowler | April 1, 2004 09:57 AM
Yup. fair enough. But I still want to use the new test on that person at some stage in the future to see if the cancer has returned. Maybe I even want to use it now to validate that my original diagnosis was correct (stretching the analogy a little hehehe).
See my next post. It mught (I hope) clear up what I was actually trying to communicate with the entry.
Good discussion though :-)
Posted by: Simon Harris | April 1, 2004 11:13 AM