
Comparing Collections


After a long week, Achilles finds he has too much time on his hands. His friend the Tortoise takes pity and indulges him with a bit of IM'ing.

Achilles:I've done nothing but read blog entries this weekend.
Tortoise:You must be bored! Anything interesting?
Achilles:I just read an entry that reminded me of some stuff I refactored during the week.
Tortoise:Do you ever get any real work done?
Achilles:Now that Java has a LinkedHashSet can you think of any reason to use a simple List except for "performance" reasons?
Tortoise:Won't it just look like a List?
Achilles:Sort of but, importantly, it's also a Set. Since when do you actually mean to add the same item to a List more than once? I'm being pedantic here.
Tortoise:But it happens, life is full of duplicates.
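
To make the point about LinkedHashSet concrete, here is a minimal sketch. The class name LinkedHashSetDemo and the helper addNames are made up for illustration; the behaviour shown relies only on the documented contracts of ArrayList and LinkedHashSet.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

// Hypothetical demo class, not from the original post.
final class LinkedHashSetDemo {

    // Adds the same three names, one of them repeated, to whatever Collection it is given.
    static Collection<String> addNames(Collection<String> target) {
        target.add("alpha");
        target.add("beta");
        target.add("alpha"); // duplicate
        return target;
    }

    public static void main(String[] args) {
        // A List keeps the duplicate.
        System.out.println(addNames(new ArrayList<>()));     // [alpha, beta, alpha]
        // A LinkedHashSet keeps insertion order but silently drops the duplicate.
        System.out.println(addNames(new LinkedHashSet<>())); // [alpha, beta]
    }
}
```

Iterating the LinkedHashSet "looks like a List", as the Tortoise says, but the type also tells the reader that duplicates are not expected.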

Achilles:I'm sure it does, but I can't think of many examples where that's actually what you want. It seems that people too often use Lists when they should be using a Set. Clearly, Lists are useful, but ArrayList has to be the most abused Collection class around.

Tortoise:People generally think in terms of Lists - it's a simple concept.
Achilles:Yes, but people also think that AND and OR mean exactly the opposite. What we think and what we mean aren't always the same and programming is about expressing what you mean.
Tortoise:Do people really think about the correct Collection type to use?
Achilles:No, they probably don't but they should.
Tortoise:I try to but I can't guarantee that I won't be lazy and default to ArrayList.
Achilles:Exactly! And then the code ends up iterating over stuff and assuming a particular order on things that have no order. I see it all the time and this damn CollectionUtils.isEquals(Collection, Collection) just makes it worse. It's ludicrous. It basically allows you to compare a List with a Set and see if the contents are the same. Which is just wrong! A List and a Set are not the same thing. They are semantically very different, and thinking that it's just a matter of comparing the contents is, IMHO, flawed.
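
A small sketch of the distinction Achilles is drawing. The helper looselyEqual below is a hypothetical contents-only comparison written for this example; it is not the utility method mentioned in the conversation, whose implementation is not shown here. The built-in equals calls follow the documented List and Set contracts.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not from the original post.
final class ListVersusSet {

    // Illustrative contents-only comparison (an assumption, for contrast only).
    static boolean looselyEqual(Collection<?> a, Collection<?> b) {
        return a.size() == b.size() && a.containsAll(b) && b.containsAll(a);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("x", "y");
        Set<String> set = new LinkedHashSet<>(list);

        // The List and Set contracts both refuse to treat the other kind as equal.
        System.out.println(list.equals(set));        // false
        System.out.println(set.equals(list));        // false

        // A contents-only helper hides the semantic difference.
        System.out.println(looselyEqual(list, set)); // true
    }
}
```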
Tortoise:Which takes us back to your original question - if you want to allow duplicates then you can't use a Set, so when would you want to allow dups?
Achilles:Very rarely, I suspect. In fact, how often do you ever want to allow duplicates, and how often does order really matter? Part of the problem, I think, is a misunderstanding of what equals(Object) actually means. It implies substitutability and therefore must be symmetric. But many people don't realise that their equals(Object) method isn't, so we end up with a.equals(b) being true but b.equals(a) being false.
Tortoise:I've not seen that happen.
Achilles:It usually happens with inheritance and using instanceof instead of comparing classes with getClass().
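
The following sketch shows one common way this asymmetry arises. Point and ColorPoint are hypothetical classes invented for the example; they are not taken from the code under discussion.

```java
// Hypothetical base class whose equals uses instanceof.
class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false; // accepts any subclass too
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

// Hypothetical subclass that adds state and tightens equals.
class ColorPoint extends Point {
    final String color;

    ColorPoint(int x, int y, String color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ColorPoint)) return false; // rejects a plain Point
        return super.equals(o) && color.equals(((ColorPoint) o).color);
    }
}

final class AsymmetricEquals {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColorPoint cp = new ColorPoint(1, 2, "red");
        System.out.println(p.equals(cp)); // true
        System.out.println(cp.equals(p)); // false
    }
}
```

Comparing getClass() in both equals methods would make both calls return false, which restores symmetry at the cost of never treating a subclass instance as equal to a base-class instance.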
Tortoise:You must have looked at a lot of shitty code!
Achilles:You mean you can't tell? Why do you think I bitch so much :-)
Tortoise:Ok what if I have a situation where it is possible to have more than one object of the same type and content? That Collection could not be stored in a Set, correct?
Achilles:Correct. So you just want a Collection, not a List. I repeat NOT A List.
Tortoise:Then what implementation class do I use?
Achilles:The implementation can be a List but the variable should be a Collection as in Collection things = new ArrayList(); because a List implies ordering and so far you haven't mentioned anything about order being important.
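
As a rough illustration of declaring the variable as a Collection, here is a sketch using generics (which the one-liner above omits). The class DeclareAsCollection and the method totalLength are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;

// Hypothetical demo class, not from the original post.
final class DeclareAsCollection {

    // Relies only on what Collection promises: iteration, with no index and no order.
    static int totalLength(Collection<String> things) {
        int total = 0;
        for (String thing : things) {
            total += thing.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // Duplicates are allowed, but nothing here claims that order matters.
        Collection<String> things = new ArrayList<>();
        things.add("ab");
        things.add("ab");
        things.add("c");
        System.out.println(totalLength(things)); // 5

        // Because callers see only Collection, the implementation can change later.
        Collection<String> unique = new LinkedHashSet<>(things);
        System.out.println(totalLength(unique)); // 3
    }
}
```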
Tortoise:Ok so then I decide that ordering is important.
Achilles:Sure, make it a List, but the key thing is that you don't just assume that order is important, because then people will write tests assuming something about the order, and then they'll build screens assuming something about the order, and so on.
Tortoise:I've just remembered... I added a method to compare Collections (for that domain object) to see if there had been any changes. There is no check to see if they are the same implementation of Collection, so I could be iterating over a List and a Set.
Achilles:Why can't you just call Collection.equals(Object)? That's what it's for.
Tortoise:On the Collection?
Achilles:Yes. I see people writing "convenience" methods for comparing Collections all the time when they already have an equals(Object) method that does a perfectly good job.
Tortoise:I assumed that it wouldn't do a deep comparison.
Achilles:It iterates over the contents, calling equals(Object) and/or checking object identity (whatever is appropriate for the Collection). I use assertEquals(Object, Object) on Collections all the time.
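
A brief sketch of what "whatever is appropriate for the Collection" means for the standard List and Set implementations. The class name BuiltInEquals and its helper are hypothetical. Note that the by-value behaviour comes from the List and Set contracts; a custom Collection that is neither may simply inherit identity-based equals from Object.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not from the original post.
final class BuiltInEquals {

    // Thin wrapper so the comparison under discussion has a name.
    static boolean sameAccordingToContract(Collection<?> a, Collection<?> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        List<String> arrayBacked = Arrays.asList("x", "y");
        List<String> linked = new LinkedList<>(Arrays.asList("x", "y"));
        List<String> reversed = Arrays.asList("y", "x");

        // Lists: same elements in the same order, regardless of implementation class.
        System.out.println(sameAccordingToContract(arrayBacked, linked));   // true
        System.out.println(sameAccordingToContract(arrayBacked, reversed)); // false

        Set<String> hashed = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> sorted = new TreeSet<>(Arrays.asList("y", "x"));

        // Sets: same elements, order irrelevant.
        System.out.println(sameAccordingToContract(hashed, sorted));        // true
    }
}
```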

Tortoise:Hmmm, that didn't get picked up in the tech review.
Achilles:Probably because everyone on the project uses CollectionUtils.isEqual(Collection, Collection)!

Comments

Thanks for the useful reminders - far too often is something more specific than necessary used with Collections. I'll admit to being guilty of abusing List; it really does imply order & it's often used where no order has been defined.

By the way - good use of GEB. I lost my copy during a move & had been meaning to pick it up again.

Cheers,
-Ryan

It is a great book. One that can be read many times over one's life.

Cheers,

Simon

What's GEB ?

Gödel, Escher, Bach: An Eternal Golden Braid, Douglas R. Hofstadter
