Simian - Similarity Analyser

Purpose

Simian (Similarity Analyser) identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic and Groovy source code, and even in plain text files. In fact, Simian can be used on any human-readable file, such as INI files, deployment descriptors, you name it.

Especially on large enterprise projects, it can be difficult for any one developer to keep track of all the features (classes, methods, etc.) of the system, which makes it easy for the same logic to end up in more than one place.

Simian runs natively in any environment that supports .NET 1.1 or higher and on any Java 1.4 or higher virtual machine, meaning Simian can be run on just about any hardware and operating system you can hope for. Both the Java and .NET runtimes are included as part of the distribution.

Simian can be used as part of the build process during development or as a guide when refactoring. Think of Simian as an independent pair of eyes that will assist in raising the quality of your software.

Within minutes, Simian can save you literally thousands of dollars in time spent performing maintenance, debugging and refactoring.

With licensing options for personal, project and enterprise use, Simian is ideally suited to your project.

Why do I need Simian?

Imagine for example that a bug is discovered in a method somewhere in a project. The developer duly writes a test case, makes the necessary code changes, ensures the test passes, checks the code in and considers the job finished!

Right?

Wrong!

Unknown to the developer, some weeks earlier a fellow teammate had come across the same piece of code and realised that it did almost everything they needed to solve a problem they were working on at the time. So they copied the 15 lines of code into their new method, added some more code to provide the extra functionality required and checked in the changes.

Of course, what they didn't know at the time was that the code they were copying had a bug in it! In fact, at the time no one knew this. So now the original bug has been fixed, but unfortunately none of the copies have been, because no one knew they even existed.

Copying and pasting isn't the only way for this to occur. Duplicate code can also creep in through developers independently implementing similar features.

Simian catches these and other instances of duplication and can be configured either to flag them as warnings or even to "break the build", ensuring that copying and pasting never again causes you or your project problems.

Performance

Run against a large source base such as the entire JDK 1.5.0_13 source (390,309 LOC* in 4,242 files, or 1.2 million lines of raw source), Simian identified 66,375 duplicate LOC* in 1,260 files in less than 10 seconds, using as little as 48 MB of heap**!

* A line of code is any line considered to be significant. Blank lines, comments, etc. do not count towards this figure.

** Results may vary depending on factors such as hardware, operating system, processing options, etc.
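To make the footnote's idea of a "significant" line concrete, here is a minimal Java sketch of how such lines might be counted. It is not Simian's implementation: the class name SignificantLineCounter is hypothetical, and it assumes Java-style comments only, skipping blank lines, lines that start with "//", and lines that lie entirely inside a /* ... */ block comment (comment markers inside string literals are not handled).

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration of counting "significant" lines.
 * Assumes Java-style comments; this is NOT how Simian itself works.
 */
public class SignificantLineCounter {

    public static int countSignificant(List<String> lines) {
        int count = 0;
        boolean inBlockComment = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlockComment) {
                // Still inside a /* ... */ comment: look for where it ends.
                int end = line.indexOf("*/");
                if (end < 0) {
                    continue;
                }
                inBlockComment = false;
                line = line.substring(end + 2).trim();
            }
            if (line.startsWith("/*")) {
                // A block comment opens at the start of the line.
                int end = line.indexOf("*/", 2);
                if (end < 0) {
                    inBlockComment = true;
                    continue;
                }
                line = line.substring(end + 2).trim();
            }
            if (line.isEmpty() || line.startsWith("//")) {
                continue; // blank or comment-only line: not significant
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "/**",
            " * Adds two numbers.",
            " */",
            "",
            "int add(int a, int b) {",
            "    // sum them",
            "    return a + b;",
            "}");
        System.out.println(countSignificant(source)); // prints 3
    }
}
```

Of the eight raw lines in the example, only the three containing code count, which is why a LOC figure can be much smaller than the raw line count quoted alongside it.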

What's in a name?

The dictionary defines Simian to mean "Relating to, characteristic of, or resembling an ape or a monkey."

What does this have to do with code duplication? Well, we thought that when people copied and pasted code, they were acting like monkeys. In fact, the Checkstyle plugin is called SimianCheck :-).

Or you could think of Simian as helping you with some of the monkey work. The Ant task is called SimianTask.

In any event, it only occurred to us later that Simian could stand for SIMIlarity ANalyser.


Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

.NET and all .NET-based marks are trademarks or registered trademarks of Microsoft® in the United States and other countries.

Copyright (c) 2003-08 RedHill Consulting Pty. Ltd. All rights reserved.