Beware The Cross-Product Join
Listen to this article
An intersting discussion started on the Drools user mailing list regarding some problems writing a rule. The particular problem is not unique to business rules though. RETE-based inferences engines share much in common with relational databases and in fact this particular problem can affect SQL queries in the same way as it affects business rules.
Let's say we wanted to find all pairs of people that were maternal siblings (ie that had the same mother). In SQL we could write a query like this*:
SELECT * FROM Child c1, Child c2 WHERE c1.motherId = c2.motherId
If we imagine we have only two children in our database, Bob (childId = 1) and Mary (childId = 2), both having the same mother, this query would generate four rows:
- Bob, Mary
- Mary, Bob
- Bob, Bob
- Mary, Mary
This is called a cross-product; every row is joined to every other row. This results in rows we're not interested in: Bob, Bob and Mary, Mary. So the first thing we would do is try and ignore rows where the child was the same:
SELECT * FROM Child c1, Child c2 WHERE c1.motherId = c2.motherId AND c1.childId != c2.childId
Which results in:
- Bob, Mary
- Mary, Bob
The next thing you'll notice is that we still have redundant rows - rows that mean the same thing. There are a few "tricks" to avoiding this and really come down to a knowledge of the underlying attributes of the tables involved. The simplest in our case is to change the condition:
SELECT * FROM Child c1, Child c2 WHERE c1.motherId = c2.motherId AND c1.childId < c2.childId
By imposing an arbitrary ordering, we prevent rows being joined to themselves and ensure that for any two siblings, we only get one row. Best of all, this technique translates directly into the implementation of business rules.
Not only do cross-products produce redundant and possibly incorrect results, the extra tuples (rows) generated as a consequence can cause your rule engine to grind to a halt.
* I realise that no one is going to model Children and Mothers in different tables but please cut me some creative slack ;-)
Comments
This is a great article Simon. About once a month on the drools mailing list someone asks about this problem, and this entry is always referenced (could almost automate it). The SQL analogy is great, as for "young players" this proves that it is not something wierd about rules engines.
Posted by: Michael Neale | October 14, 2005 11:28 AM
Just for reference (for when I direct people here in future),
in rules terms, the equivalent of doing somethingId < somethingElseId
is:
System.identityHashCode(obj1) < System.identityHashCode(obj2)
Posted by: Michael Neale | January 23, 2006 02:54 PM
Hey Michael,
One thing to note however is that works consistently for a working memory that is never serialized; that is, if A B. Of course it may still be that A < B. The point is you can't be sure.
Therefore be careful if you have any logic that requires consistent behaviour of the reltionship across serialized instances of a Working Memory.
Cheers,
Simon
Posted by: Simon Harris
|
January 23, 2006 03:00 PM
I saw you hovercraft in internewt and I am serious to buy one. Would you mind to send anything what you have, any product of hovercraft for 2-3 person, do you have also a mini hovercraft?, we use not in the sand, but only in the grass, or water/ lake/ sea. please send me also your full data of price list, weight, height, what engine do you use, price list + include franco Singapore, thanks
Posted by: budiman | August 23, 2006 10:29 AM