« Be gone ye foul smelling custom type-safe collections! | Main | Generics »

Immutable Collections

Listen to this articleListen to this article

Oh what joy. Back on the last week of my project from a brief break for xmas and straight to the inevitable cvs update. Yikes!

This is probably old hat to a lot of you but for all others, Collections.unmodifiableCollection() and its siblings don't actually make a collection immutable!

By way of example, the following code is broken:

public class Component  {
    private final Collection _subComponents;

    public Component(Collection subComponents) {
        Assert.isTrue(subComponents != null, "subComponents can't be null!");
        _subComponents = Collections.unmodifiableCollection(subComponents);
    }

    public Collection getSubComponents() {
        return _subComponents;
    }
}

Looks pretty good but unfortunately, the following test breaks:

public void testImmutability() {
    Collection originalSubComponents = new HashSet();

    Component component = new Component(originalSubComponents);

    Collection returnedSubComponents = component.getSubComponents();

    assertNotNull(returnedSubComponents);

    // First, we test to see if the returned collection is immutable
    try {
        returnedSubComponents.add(new Object());
        fail("sub components should be unmodifiable");
    } catch (UnsupportedOperationException e) {
        // Pass
    }

    // Next, we test to see if we can subvert the immutability
    // by addding to the original collection
    assertTrue(returnedSubComponents.isEmpty());

    originalSubComponents.add(new Object());

    assertTrue("sub components should be a defensive copy", returnedSubComponents.isEmpty());
}

We haven't made a defensive copy! So let's add a little more code to get the test to pass:

public class Component  {
    private final Collection _subComponents;

    public Component(Collection subComponents) {
        Assert.isTrue(subComponents != null, "subComponents can't be null!");
        _subComponents = Collections.unmodifiableCollection(Arrays.asList(subComponents.toArray()));
    }

    public Collection getSubComponents() {
        return _subComponents;
    }
}

Remember, anytime a caller passes you a mutable object (such as Collection, Date, Calendar, etc.), you need to make a defensive copy if you're going to hold on to it. This will ensure no one accidentally subverts your own immutability.

Why did I use Arrays.asList() to create the copy? Well I'm glad you asked:

  • Because I know the collection will never be modified (that's what immutable means after all), I don't need to maintain the semantics of the original collection (which may have been a Set for example);
  • The resulting List will be have been created with exactly the right size to accomodate the contents of the collection, meaning no re-allocating buffers. As a reader points out the standard ArrayList constructor that takes a collection as an argument will allocate an additional 10%;
  • Arrays List implementation performs well when iterating; and;
  • The iteration order is preserved (if that was important).

It is also possible to use new ArrayList(int) followed by ArrayList.addAll(Collection), which is probably easier to read?:

public class Component  {
    private final Collection _subComponents;

    public Component(Collection subComponents) {
        Assert.isTrue(subComponents != null, "subComponents can't be null!");
        List temp = new ArrayList(subComponents.size());
        temp.addAll(subComponents);
        _subComponents = Collections.unmodifiableCollection(temp);
    }

    public Collection getSubComponents() {
        return _subComponents;
    }
}

As one reader has demonstrated, on his version of the JDK it uses an iterator which is the way I had assumed it would be implemented but I was thrown off when I looked at the source code for the version of the JDK I have and discovered that it creates a temporary array. Either way this post wasn't really about performance so I guess I'm getting a little off track now.

P.S. Anyone see the funny side of this test given my previous post?

Comments

not that it makes much difference (what's 10% between friends) but ArrayList will allocate an additional 10% to it's size in the event that there will be additions to it... Arrays.asList(subcomponents.toArray()) should achieve the same end without the additional overhead.

Well there ya go. Guess I'll read the javadoc next time :-) Thanks for the tip. I've updated the entry as suggested. Gotta love collaboration -- Cheers, Simon

Yeesh! Turn a collection into an array, then immediately back into a collection? Ugly, unclear, and innefficient! Try this:
Collection theCopy = new ArrayList(subComponents.size());
theCopy.addAll(subComponents);
_subComponents = Collections.unmodifiableCollection(theCopy);
Clear - you're copying the collection. Efficient - exact memory is used. Not as ugly.

Of course, the best way to do it, really, would probably be to use reflection to see if the underlying collection had a public clone() method, and use that to clone the collection, falling back on the above method if it didn't (I believe all the jdk standard collection implementations do have a public clone() method). But admittedly, that would more work - would it be worth it?

I think you've missed the point. The immutableXXX methods are there as a efficiency hack to let you avoid copying collections all the time. What's the point of making a copy unmodifiable? If you pass a copy of a collection out of your object, in order to ensure encapsulation, external code can do whatever it likes to that collection without affecting your object. So the collection doesn't need to be immutable.

Anonymous, the point being that I don't want anyone else even internally to modify the collection. I want to ensure that the object is guranateed to be unmodifiable. Ie. thread-safe.

Hey will, You know I first thought that's what I would do too but then I looked at the source code for the JDK and realised that it's actually less efficient would you believe! It creates an additional array that would need to be garbage collected.

So I looked up the source code for ArrayList.addAll(Collection), and I don't see anywhere where it allocates a new array:
public boolean addAll(Collection c) {
modCount++;
int numNew = c.size();
ensureCapacity(size + numNew);

Iterator e = c.iterator();
for (int i=0; i<numNew; i++)
elementData[size++] = e.next();

return numNew != 0;
}
(I guess it won't be to readable with html syntax disabled, shucks). Only the single call the ensure capacity could possibly allocate a new array, and it won't because the array is already big enough. So where is it? By the way, my comments are digressing on a minor point, overall - Nice blog entry! I liked it.

It seems to me that it is some times useful to have an unmodifiable copy, and it is some times useful to just have a unmodifiable wrapper around the original.

unmodifiable copy - when the programmer is not sure if any other code has a handle to the original. (Protection: When a collection is pass as an argument, and you don't know where it's been.)

unmodifiable wrapper - when the programmer is sure that no other code has a handle to the original. (Performance: I run in to this situation frequently in static initializers because I instantiate the collection, populate it, then assign a static member it's unmodifiable wrapper. I don't need to/want to make a copy of it because I'm positive no other code has a reference to the original.)

Given that both cases are very useful, and need to get desired performance and protection, I think the Java collection methods are near perfect. Also the javadoc clearly states that it's only "an unmodifiable view of the specified collection", not a copy.

It's great that you are educating developers who don't read the Java doc, but I would emphasize that this is good and intended by the designers. It's flexible and well documented.

Will, Thanks for getting back to me. It's really weird. I've looked at the source code for JDK 1.4.2_2 and it looks like this: public boolean addAll(Collection c) {
Object[] a = c.toArray();
int numNew = a.length;
ensureCapacity(size + numNew); // Increments modCount
System.arraycopy(a, 0, elementData, size, numNew);
size += numNew;
return numNew != 0;
}. Notice the .toArray(). So maybe it changed between releases?

Andres, I agree. If you're creating the collection yourself then there's no need to make a copy. I really was just ranting about shitty code I saw when I got back to work. No amount of reading javadoc makes up for thinking clearly hehehe.

Oops I meant JDK 1.4.2_03

Ok, I've looked into it. Yes, the implementation HAS changed between jdk1.4.1_03 and jdk1.4.2._03. Go figure. I figured that there's a reason for this additional array allocation, and that's probably speed. So I went about testing it, using both System.currentTimeMillis() and a native timing suite I downloaded some time back (I believe from JavaWorld) that has far better resolution than 10 milliseconds on the windows platform. And...my results are inconclusive. The results depend on 1. The jdk used 2. Whether a "warm up run" is done before timing. 3. The size of the data set. 4. The phase of the moon 5. Whether it's a Tuesday... By the way, here's the line to call the public clone() method on the collection, if it has a public clone() method: _subComponents = (List) subComponents.getClass().getMethod("clone", null).invoke(subComponents, null); - of course you would have to add code to handle if the collection didn't have a public clone() method. So after all this, I stick with my original suggestion of using Collection.addAll() because that code would be the clearest to read, while not sacrificing any permanent memory.

Yup I'm gonna have to agree with you. Definitely clearer to use addAll(). I'll change my entry to reflect that.

I must admit that I wanted to use clone but it would have been nice if the Collection interface had mandated its implementation. I guess that opens up another can of worms. :-) Thanks again. -- Cheers, Simon

Coming in a little late and all... but I would have thought this was obvious from the javadoc:

"Returns an unmodifiable view of the specified collection" It's a view, not a copy; you'd expect it to see the changes automatically.

It may or may not be "old hat", but at least the doco didn't lie. :)

I've always only used it for passing data out, or to get a constant collection on construction (as it's not enough to declare it final). It's fine for those purposes.

Using the wrappers is a lot better than creating defensive copies for handing data out, as you're only creating one extra copy. Of course, taking data back in is a different story...

Yup exactly. But it's a subtle problem as you'll see from the examples and manifests when you accept a collection. It often doesn't occur to developers that having accepted a collection requires a defensive copy.

Post a comment