Simian bake-off
Well, you asked for it, and here it is: results from running the native C# flavour of Simian against the Java flavour.
As I mentioned earlier, I originally ran the comparison on my Linux machine using Mono. As many people pointed out, this was far from a "fair" comparison. Some even suggested that simply porting the code would result in poor performance. To this I reply: fooey.
The test (and I use the term loosely) was performed on a 2.0GHz Dell Inspiron 4150 with 512MB of RAM running Microsoft Windows XP Pro, against the JDK 1.4.1_01 source.
And the winner is... I'll let you be the judge:
The Java version, using Sun JDK 1.4.2_03, ran in 64MB:
> java -jar simian.jar -recurse=*.java > java.txt
Similarity Analyser 2.1.0 - http://www.redhillconsulting.com.au/products/simian/index.html
Copyright (c) 2003-04 RedHill Consulting, Pty. Ltd. All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
Loading (recursively) *.java from C:\jdk1.4.1_01\src
{ignoreCurlyBraces=true, ignoreModifiers=true, ignoreStringCase=true, threshold=9}
...
Found 40880 duplicate lines in 2339 blocks in 872 files
Processed a total of 369957 significant (1187603 raw source) lines in 3889 files
Processing time: 18.337sec
The C# version ran natively in 61MB:
> simian.exe -recurse=*.java > csharp.txt
Similarity Analyser 2.1.0 - http://www.redhillconsulting.com.au/products/simian/index.html
Copyright (c) 2003-04 RedHill Consulting, Pty. Ltd. All rights reserved.
Simian is not free unless used solely for non-commercial or evaluation purposes.
Loading (recursively) *.java from C:\jdk1.4.1_01\src
{ignoreCurlyBraces=True, ignoreModifiers=True, ignoreStringCase=True, threshold=9}
...
Found 40880 duplicate lines in 2339 blocks in 872 files
Processed a total of 369957 significant (1187603 raw source) lines in 3889 files
Processing time: 12.628sec
Running the Java version with -server gains about 8% in performance, but that is still nothing like the roughly 30% needed to catch up with the native .Net version.
Surprisingly, the Java version running under BEA JRockit 1.4.2_03 used 235MB and took around 35 seconds with all default settings. The disk seemed to be thrashing, but we made no attempt to tune performance using JRockit options.
The C# version running under Mono on the same hardware used 250MB and took around 78 seconds. Unfortunately, we couldn't get any of the optimize options to work on the Windows version of Mono. Besides, we figure this comparison is rather moot; it makes more sense to compare Java against Mono on Linux.
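As a rough check of that gap, here is the arithmetic from the two "Processing time" figures above. It assumes the 8% -server gain applies directly to the 18.337 second Java figure, which is an approximation rather than a measured result.

$$\frac{18.337 - 12.628}{18.337} = \frac{5.709}{18.337} \approx 0.311$$

$$18.337 \times 0.92 \approx 16.870, \qquad \frac{16.870 - 12.628}{16.870} \approx 0.251$$

So the Java run would need to shave off about 31% of its time to match the C# run, and still about 25% even with -server. Put the other way round, the Java run takes roughly 1.45 times as long (18.337 / 12.628 ≈ 1.452).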
So here are the results on a 1.8GHz Dell Inspiron 8200 with 1GB of RAM running Gentoo Linux (2.4 kernel), against the JDK 1.4.2_03 source.
The Java version, using Sun JDK 1.4.2_03, ran in around 60MB and took 25 seconds.
The C# version under mono (with -O=all) ran in around 90MB and took 34 seconds.
Amusingly, nay astoundingly, the .Net version running natively on Windows inside VMware on Linux is faster than Mono on either straight Windows or straight Linux!! Go figure.
Interesting, to say the least. I wait with bated breath for the ensuing storm of abuse from the Java community this entry generates. Hehehe. Though it'll make a change from receiving a serve from the .Net community.
Now the task is to see if we can pinpoint what accounts for the difference. Unfortunately, I fear that because it's a direct port (i.e. line by line), any improvements I make to the Java version will likely carry over to the .Net version as well.
Performance aside, .Net is still not my bag, baby. It still feels a little clumsy. But then I've had years of practice getting my Java up to scratch.
Comments
Hmm, I'm also curious about the numbers when using Mono. It seems you didn't publish them anywhere in the earlier entries. Can I ask you this little favour? ;)
Also, running with BEA JRockit should be interesting (and give Java a better score, I hope).
Posted by: Carlos Villela | January 25, 2004 08:34 PM
Good call, Carlos. I'll do that. Anything I can do to get the Java numbers up ;-)
Posted by: Simon Harris | January 25, 2004 09:36 PM
I can only hope no one will cast you down for blasphemy; at least you are providing real-world results and not conceptual arguments.
What will be very interesting is seeing how much scouring is performed on the Java version of the Simian source code... let's hope it doesn't become another Pet Store!
Posted by: R.J. | January 26, 2004 08:30 AM
Interestingly, it performs better on Linux. That may not be so surprising, given that Mono is still in its very early days.
I have to say that the source code is pretty much tuned to within an inch of its life. The fact that I did a simple port (i.e. line by line) to C# tells me that no matter what I do to the Java version, the same change will make the .Net version faster too.
Now, don't get me wrong. I would love for Java to be faster, but even if it's not, nothing could make me want to use C# as my primary language anytime soon.
Posted by: Simon Harris | January 26, 2004 02:32 PM
See how the Java figures go under other VMs. IBM? JRockit? But since yours is a client-side app, you really have to deal with the most widespread VM... but I would be curious about the figures.
Posted by: Daniel Sheppard | January 26, 2004 03:46 PM
Ignore me while I actually READ the entry properly and notice you're already using JRockit.
Posted by: Daniel Sheppard | January 26, 2004 04:52 PM
I've since run some more tests over much larger code bases, which therefore take much longer to run. I'm sad to say the results were the same: the C# version was still around 30% faster. DOH!
Posted by: Simon Harris | January 31, 2004 10:37 AM
One possible reason for the speed difference could be Java's collections. Did you use generics for C#? C#'s generics are able to use primitives, whereas Java's collection classes have to wrap primitives in objects. If collection usage is high, this will have a significant impact on performance.
You can get collections that are some 2 to 6x faster, with comparable memory savings, by using Sebastiano Vigna's fastutil collections. For example, IntOpenHashSet in fastutil uses 6x less memory and performs at least 5x faster than a plain HashSet of Integers.
Posted by: Nipsu | February 3, 2004 09:18 AM
Nipsu,
No, I didn't use generics in C#. In fact, I implemented my own collection classes that are optimised not only for holding primitives but also for the specific usage in the app.
As you might expect, I simply ported these to C# as well so that, unfortunately, doesn't explain the difference in performance.
Cheers,
Simon
Posted by: Simon Harris | February 3, 2004 12:17 PM
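To illustrate the boxing point raised in the last two comments, here is a minimal Java sketch of a hash set that stores primitive ints directly in an int[] instead of as boxed Integer objects inside a java.util.HashSet. The class name IntOpenSet and its methods are hypothetical: this is not Simian's own collection code (which isn't shown here) and not fastutil's IntOpenHashSet, just one instance of the general technique of open addressing with linear probing over a primitive array.

```java
import java.util.Arrays;

/**
 * Hypothetical minimal hash set of primitive ints (illustration only).
 * Values live directly in an int[], so there is no Integer boxing and
 * no per-entry node object, unlike HashSet<Integer>.
 */
public final class IntOpenSet {
    // Sentinel marking an empty slot; this one value therefore cannot be stored.
    private static final int EMPTY = Integer.MIN_VALUE;
    private int[] slots;
    private int size;

    public IntOpenSet(int expected) {
        int cap = 16;
        // Power-of-two capacity, sized so the load factor starts at or below 0.5.
        while (cap < expected * 2) cap <<= 1;
        slots = new int[cap];
        Arrays.fill(slots, EMPTY);
    }

    // Multiplicative hash so that consecutive ints spread across the table.
    private static int mix(int x) {
        x *= 0x9E3779B9;
        return x ^ (x >>> 16);
    }

    /** Adds the value; returns true if it was not already present. */
    public boolean add(int value) {
        if (value == EMPTY) throw new IllegalArgumentException("sentinel value not supported");
        if ((size + 1) * 2 > slots.length) grow(); // keep load factor <= 0.5
        int mask = slots.length - 1;
        int i = mix(value) & mask;
        while (slots[i] != EMPTY) {      // linear probing
            if (slots[i] == value) return false;
            i = (i + 1) & mask;
        }
        slots[i] = value;
        size++;
        return true;
    }

    public boolean contains(int value) {
        if (value == EMPTY) return false;
        int mask = slots.length - 1;
        int i = mix(value) & mask;
        while (slots[i] != EMPTY) {
            if (slots[i] == value) return true;
            i = (i + 1) & mask;
        }
        return false;
    }

    public int size() {
        return size;
    }

    // Doubles the table and re-inserts every stored value.
    private void grow() {
        int[] old = slots;
        slots = new int[old.length * 2];
        Arrays.fill(slots, EMPTY);
        size = 0;
        for (int v : old) {
            if (v != EMPTY) add(v);
        }
    }

    public static void main(String[] args) {
        IntOpenSet s = new IntOpenSet(4);
        for (int i = 0; i < 1000; i++) s.add(i * 7);
        System.out.println(s.size() + " " + s.contains(700) + " " + s.contains(701));
    }
}
```

The design trade-off is the usual one for this kind of collection: keeping the table at most half full means probe sequences stay short and always terminate at an empty slot, at the cost of some unused array space, and reserving Integer.MIN_VALUE as the empty marker avoids a separate "occupied" array. As Simon notes, if the Java and C# versions already share hand-rolled primitive collections like this, boxing alone would not explain the gap.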