Wednesday, June 26, 2013

Thread Dump Analysis


Thread Dump Analysis - High level steps
1. Make sure at least 3 Thread Dumps, 10 seconds apart, are collected when the server hang issue is observed.
• That’s a way to identify patterns among threads and see if some end up moving with their tasks
2. Collect Thread Dumps from all the servers involved
• Other clustered managed servers
• Managed Servers from other domains which may be interacting with your domain
3. Look first at default and custom execute threads
4. Check if some execute threads are stuck in the same state across multiple thread dumps
5. Ignore the idle threads
6. Identify the execute threads that are not moving along and try to understand the reason behind that by looking at the last method and java class invoked
7. In most cases, WebLogic internal threads such as Muxer threads, listen threads and other threads spawned by the Application should be ignored when investigating a problem from a WLS perspective.
8. Check all the thread dumps from different managed servers to find potential relationship between the stuck threads
Example: A thread from one server could be waiting on another thread on another managed server

Thread states
 Idle Threads: Available threads to process work request for any work manager
Stuck Threads: Threads blocked by other threads. They might be waiting for some other threads to complete their operations. Deadlock condition results in stuck threads.
 Hogging Threads: Threads that are taking a very long time. WLS will generally declare a thread as HOGGING before it is declared as stuck (after the configured timeout is reached).
 Standby Threads: Threads that are in Sleep. Those threads won't do anything as it is in a state of monitoring.
 Active Threads: Threads actively working on requests.


Tools
• Samurai is a light weight open source tool which runs as a Java Web Start application or from a command prompt. It allows to identify thread dumps from stdout and visualize them. It also helps in visualizing heap usage and GC activities (verbose:gc data) http://yusuke.homeip.net/samurai/en/index.html
• TDA (Thread Dump Analyzer) a is a Swing GUI for analyzing Thread Dumps and Heap Information http://java.net/projects/tda/pages/Home
• ThreadLogic built upon TDA by adding logic for common patterns found in application servers.


What does a Thread Dump contain?
• Date and time the dump was created as well as JVM detailed info (version, architecture, mode…)
• Each thread is dumped with its complete java stack that indicates the packages, classnames and method calls involved
• All this gives an overview of the thread execution and the nature of each method invocation
• What is getting executed by the thread can be understood even by non developers

How to Read Thread dumps?
• Thread dumps could contain a lot of data. Trying to read them in Text editor can be difficult
• Different JVM Vendors display the data in different formats (markers for start/end of thread dumps, reporting of locks and thread states, method signatures) but the underlying data (stack) exposed remains the same across vendors
• Many free tools help analyzing thread dumps

Thread Dumps
• If you found some stuck threads in the server logs, then Thread Dumps are the most important piece of information required to narrow down where the problem lies
• A Java Thread Dump is a snapshot that reports what every thread in the process is doing at a particular point in time. It’s an essential debugging tool
• Thread dumps do not give a direct solution but they help in identifying problems and hotspots

When Threads need to be dumped
• The server does not respond to new requests
• Requests time out
• Requests take longer and longer to process
• The server is no longer reported as running in the Admin Console but its state changed to unknown or failed

Different ways to take thread dumps…
Ref: oracle document id [ID 1098691.1]

Observed Issues: Examples


Thread Dumps - Examples of thread hanging on SocketRead/SocketWrite
[STUCK] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'" id=76 idx=0x128 tid=15140 prio=1 alive, in native, daemon
at jrockit/net/SocketNativeIO.readBytesPinned(Ljava/io/FileDescriptor;[BIII)I(Native Method)
at jrockit/net/SocketNativeIO.socketRead(SocketNativeIO.java:46)[optimized]
at java/net/SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I(SocketInputStream.java)[inlined]
at java/net/SocketInputStream.read(SocketInputStream.java:129)[optimized]
at java/io/FilterInputStream.read(FilterInputStream.java:111)[inlined]
at com/wily/introscope/agent/probe/net/ManagedSocketInputStream.read(ManagedSocketInputStream.java:214)[opti mized]
at com/sun/mail/util/TraceInputStream.read(TraceInputStream.java:97)
at java/io/BufferedInputStream.fill(BufferedInputStream.java:218)[inlined]
at java/io/BufferedInputStream.read(BufferedInputStream.java:235)[optimized]
^-- Holding lock: java/io/BufferedInputStream@0x291bf2b8[thin lock]
at com/sun/mail/util/LineInputStream.readLine(LineInputStream.java:75)
at com/sun/mail/smtp/SMTPTransport.readServerResponse(SMTPTransport.java:1440)
at com/sun/mail/smtp/SMTPTransport.openServer(SMTPTransport.java:1260)
at com/sun/mail/smtp/SMTPTransport.protocolConnect(SMTPTransport.java:370)
at javax/mail/Service.connect(Service.java:275)
at javax/mail/Service.connect(Service.java:156)
at javax/mail/Service.connect(Service.java:105)
at javax/mail/Transport.send0(Transport.java:168)

Thread Dumps - Example of thread stuck in JDBC call
<[STUCK] ExecuteThread: '4' for queue: 'weblogic.kernel.Default (self-tuning)' id=76 idx=0x128 tid=15140 prio=1 alive, in native, daemon
java.net.SocketInputStream.socketRead0(Native Method)
java.net.SocketInputStream.read(SocketInputStream.java:129)
oracle.net.ns.Packet.receive(Unknown Source)
oracle.net.ns.DataPacket.receive(Unknown Source)
oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source)
oracle.net.ns.NetInputStream.read(Unknown Source) 
oracle.jdbc.driver.T4CMAREngine.unmarshalUB1(T4CMAREngine.java:1099)
oracle.jdbc.driver.T4CMAREngine.unmarshalSB1(T4CMAREngine.java:1070)
oracle.jdbc.driver.T4CTTIoauthenticate.receiveOauth(T4CTTIoauthenticate.java:751)
oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:362)
oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:420)
oracle.jdbc.driver.T4CConnection.(T4CConnection.java:165)
oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:35)
oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:801)
java.sql.DriverManager.getConnection(DriverManager.java:525)
java.sql.DriverManager.getConnection(DriverManager.java:140)
com.foo.sbm.util.ConnectionPool.createConnection(Unknown Source)


Thread Dumps - Examples of thread stuck in RMI call

"ExecuteThread: '52' for queue: 'default'" daemon prio=5 tid=0x4b3e40b0 nid=0x1170 waiting on monitor[0x4c74f000..0x4c74fdbc] at java.lang.Object.wait(Native Method) at weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:72) at weblogic.rjvm.ResponseImpl.getTxContext(ResponseImpl.java:97) at weblogic.rmi.internal.BasicOutboundRequest.sendReceive(BasicOutboundRequest.java:80) at weblogic.rmi.cluster.ReplicaAwareRemoteRef.invoke(ReplicaAwareRemoteRef.java:262) at weblogic.rmi.cluster.ReplicaAwareRemoteRef.invoke(ReplicaAwareRemoteRef.java:229) atweblogic.rmi.internal.ProxyStub.invoke(ProxyStub.java:35) at $Proxy6.lookup(Unknown Source) at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:341) atweblogic.servlet.internal.HttpServer.lookupROIDS(HttpServer.java:789) at weblogic.servlet.security.internal.SecurityModule.getCurrentUser(SecurityModule.java:207) at weblogic.servlet.security.internal.SecurityModule.checkAuthenticate(SecurityModule.java:235) at weblogic.servlet.security.ServletAuthentication.weak(ServletAuthentication.java:271 at weblogic.servlet.security.internal.ServletSecurityManager.checkAccess(ServletSecurityManager.ja va:124)at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.jav a:2518) at 

weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2260) at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:139) at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:120)


Thread Dumps - Examples of thread stuck in LDAP call awaiting LDAP response


"[STUCK] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x000000004c8a2000 nid=0x7559 in Object.wait() [0x000000004296c000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaaeced3718> (a netscape.ldap.LDAPSearchListener)
at java.lang.Object.wait(Object.java:485)
at netscape.ldap.LDAPMessageQueue.waitForMessage(LDAPMessageQueue.java:200)
- locked <0x00002aaaeced3718> (a netscape.ldap.LDAPSearchListener)
at netscape.ldap.LDAPMessageQueue.waitFirstMessage(LDAPMessageQueue.java:101)
- locked <0x00002aaaeced3718> (a netscape.ldap.LDAPSearchListener)
at netscape.ldap.LDAPConnection.sendRequest(LDAPConnection.java:1796)
- locked <0x00002aaaeced3818> (a netscape.ldap.LDAPConnection)
at netscape.ldap.LDAPConnection.search(LDAPConnection.java:2566)
at weblogic.security.providers.authentication.LDAPAtnDelegate.getDNForUser(LDAPAtnDelegate.java:3782)
at weblogic.security.providers.authentication.LDAPAtnDelegate.authenticate(LDAPAtnDelegate.java:3625)
at weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl.login(LDAPAtnLoginModuleImpl.java:220)
at com.bea.common.security.internal.service.LoginModuleWrapper$1.run(LoginModuleWrapper.java:110)
at java.security.AccessController.doPrivileged(Native Method)

Dumps - Examples of idle thread
"[ACTIVE] ExecuteThread: '96' for queue: 'weblogic.kernel.Default (self-tuning)'" id=313028 idx=0x1d0 tid=4686 prio=5 alive, in native, waiting, daemon
-- Waiting for notification on: weblogic/work/ExecuteThread@0x739b42e0[fat lock]


at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
at jrockit/vm/Locks.wait(Locks.java:2229)[inlined]
at java/lang/Object.wait(Object.java:474)[inlined]
at weblogic/work/ExecuteThread.waitForRequest(ExecuteThread.java:156)[inlined]
at weblogic/work/ExecuteThread.run(ExecuteThread.java:177)[optimized]
^-- Lock released while waiting: weblogic/work/ExecuteThread@0x739b42e0[fat lock]
at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
-- end of trace


Thread Requesting Locks
• If the lock is not held by another thread then the thread becomes the new holder of the lock until it relinquishes it
• If the lock is currently held then the thread requesting it joins the blocked list of contenders for this lock
• Threads holding locks should relinquish them as soon as they stop needing them to avoid impairing on other threads

Deadlocks
• Deadlocks describe a situation where two or more threads are blocked waiting for each other (circular dependency). Neither block will ever end because each thread is waiting for the other
• Deadlocks can occur when multiple threads need the same locks, at the same time, but obtain them in different order
• Restart of the server is the only option

Open Lock Chains
• JRockit JVM automatically detects deadlocked, blocked, and open lock chains among the running threads
• Open lock chains represent a straight dependency, thread A is waiting for B which is waiting for C, and so on. If you have long open lock chains, your application might be wasting time waiting for locks. You may then want to reconsider how locks are used for synchronization in your application Deadlock detection with HotSpot

• Java HotSpot VM also provides a deadlock detection utility invoked by a Ctrl+\ on the command line while an application is running
• If any deadlocks are detected, additional information after the thread dump on each deadlocked thread is printed
• The utility detects Java-platform-level deadlocks, including locking done from JNI, JVMPI, and JVMDI.

Locking Issues
• Analyze the dependencies between code and locks
• Resolving bottlenecks due to locking
• In case of deadlocks, check the locking and try to establish an order of locking or to avoid the locks entirely

Stuck Threads
• Check health of backend database, network communication, external services (e.g. LDAP) or other running servers that could lead to latencies in responding to requests
• Threads stuck in database SQL read/execute can indicate db response issue or SQL tuning issue
• Threads stuck on socket read calls indicate slow read or backend services or resources not responding in a timely fashion or socket closures due to firewall or other bad socket handling
• Threads stuck in socket connect calls indicate an unavailable backend service or incorrect connection info

Other causes of the stuck thread(s)
• Once the stuck thread(s) is/are identified, look at the thread stack to understand what work is being done. It could be any of the following, but not limited to:
• EJB or WebService Call
• HTTP servlet or JSP

• JDBC call using WLS pooling connectivity
• A WebService invoking another WS
• A HTTP servlet invoking an EJB
• Application Deadlock


Origin of the problematic class
• If the thread is stuck on a class that’s part of your application then you should look at the method invoked and debug the class
• If the thread is stuck on a class outside your control then you should check for known defects or contact the vendor

GC and heap
• Make sure that GC cycles do not create bottlenecks
• Use –verbose:GC combined with - XX:+PrintGCDetails -XX:+PrintGCTimeStamps to to log GC stats about each GC
• The -verbose:gc option can be dynamically enabled at runtime using the management API or JVM tool interface (TI)

Heap size
• Insure enough java heap is available especially at peak times
• Not enough heap can lead to OutOfMemoryError exceptions and impact the performance of the Java application

JRockit/JRA
• Use JRockit Runtime Analyzer (JRA) recordings to gather general data about the application (GCs, Heap, Optimizations, Objects stats etc…)

JConsole
• Java Monitoring and Management Console included in JDK 5.0
• Heap Memory
• Threads
• Classes
• CPU Usage
• VM Summary
• Mbeans
TDA can be used as JConsole or VisualVM plug-in