Friday, March 13, 2009

FOW: Fighting against the memory leak

Today's feature of the week is a bit more technical. As you might know, BibSonomy is based on a MySQL/Tomcat architecture. Usually BibSonomy runs very stably, but from time to time the Java virtual machine stops with a "java.lang.OutOfMemoryError: PermGen space" error. This mostly happens after a redeploy of the BibSonomy project on Tomcat. Why does this happen? The simple answer is: because the Java VM does not have enough memory for the so-called permanent generation space. This space is used to hold the Java classes in main memory. A simple solution is to give the JVM more PermGen space, but this does not solve the underlying problem, since the JVM usually has enough PermGen space. The only result of giving it more memory is that the error happens a bit later and not directly after the redeploy.

So we decided to search for the cause of the memory leak. We soon found out that some classes from the web application could not be removed from the PermGen space because they were "linked" to (that is, still referenced from) classes loaded by the standard classloader. As long as such a reference exists, the webapp's classloader and all classes it loaded cannot be garbage collected. There can be several causes for that. Using the right tools (jmap and jhat from the JDK) plus a small program to find reference chains, we found the culprits:

* MySQL Connector/J (see http://bugs.mysql.com/bug.php?id=36565)
* iBatis (see https://issues.apache.org/jira/browse/IBATIS-540)
* JabRef
* Tomcat (see https://issues.apache.org/bugzilla/show_bug.cgi?id=46221)
* and some we could fix by just moving some JARs to the right places (see also here and here).
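
To illustrate what such a "link" does, here is a minimal sketch in plain Java. It is not part of BibSonomy: the class ClassLoaderLeakDemo, the field LONG_LIVED_REGISTRY and the method isCollected are hypothetical names, the empty URLClassLoader merely stands in for a webapp classloader, and the static list stands in for a static field of a class loaded by the standard classloader.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from BibSonomy or Tomcat.
final class ClassLoaderLeakDemo {

    // Stands in for a static field of a class loaded by the standard classloader.
    static final List<Object> LONG_LIVED_REGISTRY = new ArrayList<Object>();

    /** Returns true if the referent was garbage collected after a few GC hints. */
    static boolean isCollected(WeakReference<?> ref) {
        for (int i = 0; i < 5 && ref.get() != null; i++) {
            System.gc(); // only a hint, the JVM is free to ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Stands in for the webapp classloader.
        ClassLoader webappLoader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> probe = new WeakReference<ClassLoader>(webappLoader);

        // The "link": something long-lived keeps a reference to the loader.
        LONG_LIVED_REGISTRY.add(webappLoader);
        webappLoader = null; // simulate the undeploy

        System.out.println("collected while linked: " + isCollected(probe));

        // Removing the link makes the loader eligible for collection.
        LONG_LIVED_REGISTRY.clear();
        System.out.println("collected after unlinking: " + isCollected(probe));
    }
}
```

While the registry holds the loader, it stays strongly reachable and cannot be collected. After the registry is cleared the loader becomes eligible for collection, although System.gc() is only a hint and the second result is therefore not guaranteed.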

Identifying the culprits was an iterative task: fixing one leak made the next one appear ... At the beginning we did not know that there were so many candidates. We could fix iBatis by switching to a newer version; MySQL, JabRef and Tomcat were a little harder to fix.
For JabRef we had to modify the source code such that it does not start AWT. Additionally, a Tomcat LifecycleListener kills the java.util.prefs.FileSystemPreferences sync timer after webapp shutdown using awful Java reflection hacks:

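import java.lang.reflect.Field;
import java.util.Timer;
// Inside the listener's shutdown handling; the checked reflection exceptions must be caught or declared.
final Class<?> clazz = CleanupListener.class.getClassLoader().loadClass("java.util.prefs.FileSystemPreferences");
final Field f = clazz.getDeclaredField("syncTimer"); // private static field
f.setAccessible(true);
final Timer timer = (Timer) f.get(null); // static field, so no instance is needed
timer.cancel(); // stops the timer thread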
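
The same trick can be written a bit more defensively. The following is only a sketch under assumptions and not the actual BibSonomy listener: StaticTimerCanceller, cancelStaticTimer and TimerHolder are hypothetical names, and TimerHolder stands in for a class like FileSystemPreferences so that the example runs without touching JDK internals.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Timer;

// Hypothetical helper, not taken from the BibSonomy sources.
final class StaticTimerCanceller {

    /**
     * Cancels the java.util.Timer stored in the named static field of owner.
     * Returns true if a timer was found and cancelled, false otherwise.
     */
    static boolean cancelStaticTimer(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            if (!Modifier.isStatic(field.getModifiers())) {
                return false; // only static fields can be read without an instance
            }
            field.setAccessible(true);
            Object value = field.get(null);
            if (value instanceof Timer) {
                ((Timer) value).cancel();
                return true;
            }
            return false; // field is null or holds something else
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Field missing or access denied: treat it as nothing to clean up.
            return false;
        }
    }
}

// Hypothetical stand-in for a library class that owns a static timer.
final class TimerHolder {
    private static Timer syncTimer = new Timer("demo-sync-timer", true);
}
```

A call such as StaticTimerCanceller.cancelStaticTimer(TimerHolder.class, "syncTimer") returns true once the timer has been cancelled. Returning false instead of throwing keeps a shutdown hook from failing when the field was renamed in another JDK version or when access is denied.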

To fix the MySQL bug, the listener ensures at startup of the web application that the MySQL connection class is loaded before the web app and by the standard classloader, so that the cancellation timer thread (which is the cause of the leak) does not block unloading of the webapp. The loggers from the StandardContext in Tomcat (which are loaded via the webapp's classloader, for whatever reason) are also killed by the listener.
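
The preloading idea can be sketched roughly as follows. This is an illustration under assumptions, not our listener code: DriverPreloader and preload are hypothetical names, and the class to preload is passed in as a string because the exact Connector/J class name is not spelled out here. The sketch also switches the thread's context classloader during loading, since threads started in a static initializer inherit it from the creating thread.

```java
// Hypothetical helper, not taken from the BibSonomy sources.
final class DriverPreloader {

    /**
     * Loads and initializes className through the given parent loader.
     * Returns the loaded class, or null if the parent loader cannot see it.
     */
    static Class<?> preload(String className, ClassLoader parent) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        // Threads created during class initialization inherit this loader.
        current.setContextClassLoader(parent);
        try {
            // initialize = true runs static initializers right away.
            return Class.forName(className, true, parent);
        } catch (ClassNotFoundException e) {
            return null;
        } finally {
            // Always restore the original context classloader.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        // java.util.Timer is only a harmless stand-in for the driver class.
        Class<?> loaded = preload("java.util.Timer", ClassLoader.getSystemClassLoader());
        System.out.println("preloaded: " + loaded);
    }
}
```

For this to help in Tomcat, the JAR containing the class must be visible to the parent classloader and not only to the webapp, which fits with moving JARs to the right places as mentioned above.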

After several weeks of work we have a leak-free application. The bad thing is that every library we use can bring back a leak, and if we are not careful the leak will be back sooner than we would like. Unfortunately, we are not aware of a mechanism that we could put into Tomcat or into our application that simply checks for memory leaks.

Hope you found this interesting and good luck with your own applications ...