JRules Memory Leak Gotcha
About six months ago we profiled our application to make sure it had no memory leaks. We found a few and were able to fix them almost immediately. Today, though, I happened to be chatting with a colleague who is investigating a memory leak in another application, and it sounded scarily similar. So, for the benefit of all you JRules developers, here's a little gotcha.
JRules maintains a binary (two-way), one-to-many association between the rule set (IlrRuleSet) and the possibly many instances of the working memory (IlrContext), which the JRules documentation also refers to as a "rule engine". I'll spare you my diatribe on binary associations for now; suffice it to say that if you weren't aware of this little "feature" (or you were but simply hadn't given it much thought), you're in for a nasty surprise.
When you're done with an IlrContext, the natural thing to do is simply remove all application references to it and let it be garbage collected. Unfortunately, because the relationship is two-way, this doesn't have the expected effect. The rule set still holds a reference to the context, so the context will NEVER be garbage collected, at least not while the rule set itself is reachable, which in most applications means for the life of the application.
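To see why dropping your own reference is not enough, here is a minimal sketch of the association in plain Java. RuleSetModel and ContextModel are hypothetical stand-ins for IlrRuleSet and IlrContext, not the JRules API. They model only the back-reference described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for IlrRuleSet and IlrContext; these are NOT the
// JRules classes, they only model the two-way association.
class RuleSetModel {
    // rule set -> contexts: strong references (e.g. so that contexts can be
    // notified when the rules change)
    private final List<ContextModel> contexts = new ArrayList<>();

    void attach(ContextModel ctx) { contexts.add(ctx); }

    int attachedContexts() { return contexts.size(); }
}

class ContextModel {
    // context -> rule set
    private final RuleSetModel ruleSet;

    ContextModel(RuleSetModel ruleSet) {
        this.ruleSet = ruleSet;
        ruleSet.attach(this); // the back-reference that keeps this object alive
    }
}

class LeakDemo {
    public static void main(String[] args) {
        RuleSetModel ruleSet = new RuleSetModel();
        for (int i = 0; i < 1000; i++) {
            ContextModel ctx = new ContextModel(ruleSet);
            // ... assert facts and fire rules, then drop our only reference ...
        }
        // Every context is still strongly reachable through the rule set,
        // so none of them can be garbage collected.
        System.out.println(ruleSet.attachedContexts()); // prints 1000
    }
}
```

The application never keeps a context past the loop body, yet the rule set's list grows without bound.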
To combat this problem, ILOG thankfully provided a somewhat innocuous-looking method, IlrContext.end(). To quote the documentation:
Prepares this rule engine instance for garbage collection. After this call, the engine will not keep any reference to this rule engine instance. The rule engine instance will be detached from the ruleset and will no longer be notified of modifications on the rules. The rule engine instance will also disconnect all its tools and all the related resources will be released. If the application does not keep this object, it is then subject to garbage collection.
In other words, whenever you have finished with a context and want it to become a candidate for garbage collection, be sure to call end(), or be prepared for a slow and painful application death as the heap fills up.
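One way to make sure end() is never skipped, even when rule execution throws, is a try/finally block. The sketch below uses hypothetical stand-ins (RuleSetStub for IlrRuleSet, EngineStub for IlrContext). Their end() only models the detach behaviour quoted above and is not the JRules implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins, not the JRules API.
class RuleSetStub {
    private final Set<EngineStub> engines = new HashSet<>();

    void attach(EngineStub e) { engines.add(e); }
    void detach(EngineStub e) { engines.remove(e); }
    int attachedEngines() { return engines.size(); }
}

class EngineStub {
    private final RuleSetStub ruleSet;

    EngineStub(RuleSetStub ruleSet) {
        this.ruleSet = ruleSet;
        ruleSet.attach(this);
    }

    void fireAllRules() {
        // placeholder for real rule execution
    }

    // Modelled on IlrContext.end(): detach from the rule set so that, once the
    // application drops its own references, the engine can be collected.
    void end() { ruleSet.detach(this); }
}

class EndDemo {
    static void runOnce(RuleSetStub ruleSet) {
        EngineStub engine = new EngineStub(ruleSet);
        try {
            engine.fireAllRules();
        } finally {
            engine.end(); // runs even if rule execution throws
        }
    }

    public static void main(String[] args) {
        RuleSetStub ruleSet = new RuleSetStub();
        for (int i = 0; i < 1000; i++) {
            runOnce(ruleSet);
        }
        System.out.println(ruleSet.attachedEngines()); // prints 0
    }
}
```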
One final tip: if you pool contexts, be sure to also call IlrContext.reset() before returning a context to the pool. This removes all references to your application objects within the context.
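Below is a minimal single-threaded pool sketch that resets each context on the way back in and uses try/finally so that a borrowed context is always returned. PooledEngine and EnginePool are hypothetical names. PooledEngine.reset() only models the documented effect of IlrContext.reset(), namely dropping references to application objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a pooled IlrContext (not the JRules API).
class PooledEngine {
    private final List<Object> workingMemory = new ArrayList<>();

    void assertFact(Object fact) { workingMemory.add(fact); }
    void reset() { workingMemory.clear(); } // drop application objects
    int factCount() { return workingMemory.size(); }
}

// Hypothetical, minimal, NOT thread-safe pool.
class EnginePool {
    private final Deque<PooledEngine> idle = new ArrayDeque<>();

    EnginePool(int size) {
        for (int i = 0; i < size; i++) idle.push(new PooledEngine());
    }

    PooledEngine borrow() {
        PooledEngine e = idle.poll();
        return e != null ? e : new PooledEngine();
    }

    void giveBack(PooledEngine e) {
        e.reset(); // clear references to application objects before pooling
        idle.push(e);
    }

    int idleCount() { return idle.size(); }
}

class PoolDemo {
    public static void main(String[] args) {
        EnginePool pool = new EnginePool(2);
        PooledEngine engine = pool.borrow();
        try {
            engine.assertFact("order-42"); // hypothetical application fact
            // ... fire rules ...
        } finally {
            pool.giveBack(engine); // returned even if rule execution throws
        }
        System.out.println(engine.factCount()); // prints 0
    }
}
```

Pooled contexts are long-lived, so end() belongs wherever the pool permanently discards one, not in giveBack().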
<blatant-plug>If you're in the market for a cheaper alternative, you might like to try out the latest version of Drools.</blatant-plug>
P.S. If anyone from ILOG is listening, this is exactly the kind of problem WeakReferences (and WeakHashMaps in particular) are designed to prevent :)
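For illustration, here is a hypothetical rule set that tracks its contexts weakly, via Collections.newSetFromMap over a java.util.WeakHashMap. This is a sketch of the suggestion only and is not how JRules behaves. An abandoned context is no longer kept alive merely because the rule set knows about it, although exactly when it disappears from the set is up to the garbage collector.

```java
import java.util.Collections;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical sketch, not the JRules API: WeakHashMap holds its keys weakly,
// and newSetFromMap exposes those keys as a Set.
class WeakRuleSet {
    private final Set<WeakContext> contexts =
            Collections.newSetFromMap(new WeakHashMap<>());

    void attach(WeakContext ctx) { contexts.add(ctx); }

    // Notify every context that is still alive, e.g. after a rule change.
    int notifyRuleChange() {
        int notified = 0;
        for (WeakContext ctx : contexts) {
            ctx.onRulesChanged();
            notified++;
        }
        return notified;
    }

    int liveContexts() { return contexts.size(); }
}

class WeakContext {
    private int ruleChanges;

    WeakContext(WeakRuleSet ruleSet) { ruleSet.attach(this); }

    void onRulesChanged() { ruleChanges++; }

    int ruleChanges() { return ruleChanges; }
}

class WeakDemo {
    public static void main(String[] args) {
        WeakRuleSet ruleSet = new WeakRuleSet();
        WeakContext kept = new WeakContext(ruleSet);
        new WeakContext(ruleSet); // abandoned immediately
        System.gc(); // only a hint; collection timing is not guaranteed
        // 1 or 2, depending on whether the abandoned context was collected yet
        System.out.println(ruleSet.liveContexts());
        System.out.println(kept.ruleChanges());
    }
}
```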
Comments
Correct me if I am wrong, but this doesn't really affect you if you are pooling IlrContext (rule engine) instances.
Posted by: Ravi | December 4, 2004 04:25 AM
Normally I'd say yes, you are correct, but the term "pool" can mean different things to different people.
For a start, pools can often have high and low "tide" marks - after a period, the pool will discard any unused objects down to some preset minimum.
Secondly, a pool may have a reset option that simply clears its contents.
Thirdly, the pool may hand out objects expecting the client code to return them to the pool. If the client code fails to do so for some reason (say, poor exception handling), the pool may lose track of the object.
And lastly, even if all your code is bullet-proof, you may, as we did, implement ThreadLocal pooling. Unfortunately, WebSphere 4/IBM JDK 1.3 has an annoying bug where it periodically drops a thread along with all of its ThreadLocal contents. In this case, you have absolutely no control over when you will lose your rule engine contexts. The only choice is to wrap them (not extend!) and call end() from the wrapper's finalize() method.
IMHO, this last option is a good thing to do anyway.
Cheers,
Simon
Posted by: Simon Harris | December 4, 2004 09:12 AM
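Simon's wrap-and-finalize approach can be sketched on a modern JDK (9 or later) with java.lang.ref.Cleaner in place of finalize(), which has since been deprecated. This is a swapped-in technique rather than the one described in the comment. RawEngine and EngineHandle are hypothetical names, and RawEngine.end() merely records that it was called, standing in for IlrContext.end().

```java
import java.lang.ref.Cleaner;

// Hypothetical stand-in for IlrContext (not the JRules API).
class RawEngine {
    private volatile boolean ended;

    void end() { ended = true; } // stands in for IlrContext.end()

    boolean isEnded() { return ended; }
}

// Wrapper (composition, not inheritance) that guarantees end() is called:
// explicitly via close(), or by the Cleaner once the wrapper is unreachable.
final class EngineHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    private final RawEngine engine;
    private final Cleaner.Cleanable cleanable;

    EngineHandle(RawEngine engine) {
        this.engine = engine;
        // The cleanup action must not capture 'this', or the wrapper could
        // never become unreachable; it captures only the wrapped engine.
        this.cleanable = CLEANER.register(this, engine::end);
    }

    RawEngine engine() { return engine; }

    @Override
    public void close() {
        cleanable.clean(); // invokes end() at most once
    }
}

class HandleDemo {
    public static void main(String[] args) {
        RawEngine raw = new RawEngine();
        try (EngineHandle handle = new EngineHandle(raw)) {
            // ... use handle.engine() to assert facts and fire rules ...
        }
        System.out.println(raw.isEnded()); // prints true
    }
}
```

As with finalize(), the safety net only works if application code holds the wrapper rather than the raw engine.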
We basically load the ruleset, build IlrContext(s) and pool them (all of this happens during server start-up). The business logic gets an IlrContext from the pool, asserts facts and fires rules. Before returning the IlrContext to the pool, we call releaseContext(). After reading this post, it makes sense to call reset() as well. I have been reading all your posts on JRules and Drools; they are really interesting and informative. I tried replacing JRules with Drools at home on the project I'm currently working on, but I am facing a lot of issues, especially around RuleFlow. We use JRules RuleFlow extensively and I don't know how to move this to Drools, since it affects the whole flow. Is Drools planning to introduce a RuleFlow feature? I also heard about your new JOODI project; is there any news on that?
Posted by: Ravi | December 5, 2004 05:29 AM
Nice update. Does this problem happen with Drools? Does anyone have a comparison of the different rule engines (Jess, ABLE, Drools, JRules, etc.)?
Thanks
Posted by: Mr.Investigator | January 5, 2006 07:26 AM
Simon, have you seen something similar happening at start-up, while the rules are loading? We saw it eat up the heap (see below) during deployment when the Java heap was set to 3GB. Strangely, the problem doesn't appear when it is set to 2GB or lower. Is there some memory limitation in JRules, or do you think it was caused by something else?
#### > <> (ZipFile.java:112)
at java.util.jar.JarFile.<init>(JarFile.java:127)
at java.util.jar.JarFile.<init>(JarFile.java:65)
at weblogic.servlet.internal.WebAppModule.loadDescriptor(WebAppModule.java:512)
at weblogic.j2ee.J2EEApplicationContainer.loadDescriptors(J2EEApplicationContainer.java:1398)
at weblogic.j2ee.J2EEApplicationContainer.prepare(J2EEApplicationContainer.java:1223)
at weblogic.j2ee.J2EEApplicationContainer.prepare(J2EEApplicationContainer.java:1070)
at weblogic.j2ee.J2EEApplicationContainer.prepare(J2EEApplicationContainer.java:823)
at weblogic.management.deploy.slave.SlaveDeployer$Application.prepare(SlaveDeployer.java:3029)
at weblogic.management.deploy.slave.SlaveDeployer.prepareAllApplications(SlaveDeployer.java:967)
at weblogic.management.deploy.slave.SlaveDeployer.resume(SlaveDeployer.java:349)
at weblogic.management.deploy.DeploymentManagerServerLifeCycleImpl.resume(DeploymentManagerServerLifeCycleImpl.java:229)
at weblogic.t3.srvr.SubsystemManager.resume(SubsystemManager.java:131)
at weblogic.t3.srvr.T3Srvr.resume(T3Srvr.java:966)
at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:361)
at weblogic.Server.main(Server.java:32)
--------------- nested within: ------------------
weblogic.management.ManagementException: [Deployer:149233]An unexpected error was encountered during the deployment process. - with nested exception:
[java.lang.OutOfMemoryError]
at weblogic.management.deploy.slave.SlaveDeployer.convertThrowable(SlaveDeployer.java:1017)
at weblogic.management.deploy.slave.SlaveDeployer.access$500(SlaveDeployer.java:124)
at weblogic.management.deploy.slave.SlaveDeployer$Application.prepare(SlaveDeployer.java:3034)
at weblogic.management.deploy.slave.SlaveDeployer.prepareAllApplications(SlaveDeployer.java:967)
at weblogic.management.deploy.slave.SlaveDeployer.resume(SlaveDeployer.java:349)
at weblogic.management.deploy.DeploymentManagerServerLifeCycleImpl.resume(DeploymentManagerServerLifeCycleImpl.java:229)
at weblogic.t3.srvr.SubsystemManager.resume(SubsystemManager.java:131)
at weblogic.t3.srvr.T3Srvr.resume(T3Srvr.java:966)
at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:361)
at weblogic.Server.main(Server.java:32)
Thanks much!
-Jiin Joo
Posted by: Jiin Joo | December 3, 2006 05:54 PM
Eek, the < and > got eaten. This was the first line:
####<Dec 2, 2006 4:00:20 PM SGT> <Error> <Deployer> <App2> <Server4> <main> <<WLS Kernel>> <> <BEA-149205> <Failed to initialize the application jrules-bres-management-WL81 due to error weblogic.management.ManagementException: [Deployer:149233]An unexpected error was encountered during the deployment process. - with nested exception:
Posted by: Jiin Joo | December 3, 2006 05:55 PM