Wednesday, December 9, 2009

Weblogic Related Issues/Solutions - Case 1

Problem:
Platform running Weblogic 8.1 on Sun V880 servers, with 32 GB of RAM on the machine.
2 GB is assigned to the managed server JVM heap. JDK 1.4.

Initial settings:

-XX:+AggressiveHeap -Xms2048m -Xmx2048m  -XX:SurvivorRatio=32 -XX:MaxPermSize=128m
But there are still 20 full GCs per hour at peak times before the server crashes.
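To make the full-GC rate measurable before and after any tuning, GC logging can be switched on in the same start script. A minimal set of flags, assuming JDK 1.4.2 (the log file path is only an example):

-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/managed1_gc.log

Counting the "Full GC" entries per hour in that log shows whether a flag change actually moves the number.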


Analysis
  • 1. It was decided to reduce the SurvivorRatio to 4 and restart with some additional flags.
    The size of ONE survivor space is calculated as
    SurvivorSpace = NewSize / (SurvivorRatio + 2)
    A SurvivorRatio of 32 makes the survivor spaces too small to hold the objects copied out of Eden, so objects get promoted to the Old generation prematurely. Reducing the ratio to 4 allows for noticeably larger survivor spaces (see the calculation sketch after this list).
  • 2. As per Sun Bug ID: 6218833, setting -XX:+AggressiveHeap before the heap size flags (-Xms and -Xmx) can confuse the JVM. Reorder the options so that -Xms and -Xmx come before -XX:+AggressiveHeap, or do not use AggressiveHeap at all.
  • 3. The application has 180+ EJBs with pools of beans. Hence set -Dsun.rmi.dgc.client.gcInterval=3600000 (1 hour) instead of the default 60000 (1 minute), so that RMI distributed garbage collection no longer forces a full collection every minute (see the flag example after this list). More on this here: http://docs.sun.com/source/817-2180-10/pt_chap5.html
  • 4. The site is restarted once a week at 4:30 AM. The GC pattern stays normal for about two days and then degrades into repeated full GCs.
  • 5. The Old generation is almost permanently full; at every minor collection it first has to be cleaned up before promotion from Young to Old can take place.
  • 6. The Permanent generation is also almost full and the JVM keeps loading more and more classes (could the growing number of JSPs per release be the cause?).
    Hence we increased the permanent generation (MaxPermSize) from 128 MB to 256 MB.
  • 7. Ensure we are running the server JVM by using the -server flag.
  • 8. Use OptimizeIt or a similar profiling tool to inspect memory usage and find code bottlenecks.
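As a rough illustration of the formula in point 1, the sketch below plugs in the 512 MB young generation used in the final settings; the exact sizes the JVM picks can differ slightly, so treat the numbers as indicative only.

public class SurvivorSize {

    // Each survivor space = NewSize / (SurvivorRatio + 2), because the young
    // generation is divided into Eden plus two survivor spaces.
    static long survivorSpaceBytes(long newSizeBytes, int survivorRatio) {
        return newSizeBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long newSize = 512L * 1024 * 1024; // -XX:NewSize=512m
        long mb = 1024 * 1024;
        System.out.println("SurvivorRatio=32 -> about "
                + (survivorSpaceBytes(newSize, 32) / mb) + " MB per survivor space");
        System.out.println("SurvivorRatio=4  -> about "
                + (survivorSpaceBytes(newSize, 4) / mb) + " MB per survivor space");
    }
}

At a ratio of 32 each survivor space is only about 15 MB, compared with about 85 MB at a ratio of 4, which is why short-lived objects were spilling into the Old generation too early.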
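For point 3, the interval is simply a system property on the managed server's Java command line. A sketch, assuming the same one-hour interval also suits the server-side counterpart (sun.rmi.dgc.server.gcInterval), which governs the equivalent periodic collection on the RMI server side:

-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000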
The settings now were
-server -Xms2048m -Xmx2048m  -XX:MaxNewSize=512m -XX:NewSize=512m -XX:SurvivorRatio=4 -XX:MaxPermSize=256m -Xincgc -XX:+DisableExplicitGC -XX:+AggressiveHeap -XX:-OmitStackTraceInFastThrow
This reduced the Full GCs to one a day.

Error Logs

As the server runs out of memory just before a crash, the logs fill up with repeated errors of this sort (up to 100 repetitions in a row), with no stack trace attached:

java.lang.NullPointerException
<>
The stack traces are missing because, by default, the HotSpot JIT replaces an exception that is thrown very frequently from the same compiled code with a preallocated exception object that carries no stack trace (the "fast throw" optimization). Adding the -XX:-OmitStackTraceInFastThrow flag turns this optimization off. The root cause of the NPE itself still has to be tracked down, but we no longer have logs full of these repeated, trace-less exceptions.
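A minimal sketch that reproduces the fast-throw behaviour described above (class and field names are made up). On a server JVM it typically prints the "disappeared" message after some thousands of iterations, while running it with -XX:-OmitStackTraceInFastThrow keeps the trace on every iteration:

public class FastThrowDemo {

    private static String value; // stays null, so every call below throws an NPE

    public static void main(String[] args) {
        for (int i = 0; i < 200000; i++) {
            try {
                value.length(); // implicit NullPointerException
            } catch (NullPointerException e) {
                if (e.getStackTrace().length == 0) {
                    // The JIT has switched to its preallocated "fast throw"
                    // exception, which carries no stack trace.
                    System.out.println("Stack trace disappeared at iteration " + i);
                    return;
                }
            }
        }
        System.out.println("Stack trace was present on every iteration");
    }
}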

With the flag in place we could now see the full stack trace:

java.lang.NullPointerException
at java.util.StringTokenizer.<init>(StringTokenizer.java:117)
at java.util.StringTokenizer.<init>(StringTokenizer.java:133)
at jsp_servlet._framework._security.__login._jspService(login.jsp:294)
at weblogic.servlet.jsp.JspBase.service(JspBase.java:27)
at weblogic.servlet.internal.ServletStubImpl$ServletInvocation
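The trace points into the StringTokenizer constructor itself, which throws a NullPointerException when the string it is given is null. A likely reproduction of what login.jsp is doing, assuming it tokenizes a value that can legitimately be null (the parameter and variable names here are purely illustrative):

import java.util.StringTokenizer;

public class LoginTokenizerNpe {

    public static void main(String[] args) {
        // In the JSP this would be something like request.getParameter(...),
        // which returns null when the parameter is missing; simulated here
        // with a plain null.
        String rolesParam = null;

        // Throws NullPointerException inside the StringTokenizer constructor,
        // matching the two <init> frames in the trace above.
        StringTokenizer tokens = new StringTokenizer(rolesParam, ",");
        System.out.println(tokens.countTokens());
    }
}

If this is indeed the cause, the fix belongs in the JSP: null-check (or default) the value before constructing the tokenizer.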
