Tuesday, September 29, 2009

Analyzing Java Thread Dump - "main" thread

A Sample Thread Dump

Below is a sample thread dump from a single-threaded application that is using the Sun JVM 1.4.1.

  • The thread named "main" is the main application thread.
  • All other threads are created by the JVM to do housekeeping work.
  • While analyzing application-level problems, we generally focus only on the application threads.
Let's analyze the stack trace of the "main" thread from below TD

"main" prio=5 tid=0x002358B8 nid=0x7f8 runnable [6f000..6fc40] at test.method1(test.java:10) at test.main(test.java:5)

From this code snippet you can see that a thread stack trace has a
name,
thread priority (prio=5),
state (runnable),
source code line numbers,
and the method calls.

You can conclude from this stack trace that the thread "main" is executing some code in the method "method1" of class "test". The call to this method came from the method "main" of the same class. You can also see the exact source code line numbers in those methods.

Before moving on to analyzing thread dumps from more complex scenarios, let's discuss the different state of threads that we normally see in the thread dumps and what they mean.

  • Runnable: Either running or ready to run when it gets its CPU turn. JRockit thread dumps refer to this state as ACTIVE.
  • Waiting on monitor: Either sleeping or waiting on an object for said period of time or waiting to be notified by another thread. This will happen when any of the sleep() methods on Thread object or wait() method on an Object is called.
  • Waiting for monitor entry: Waiting to lock an object (some other thread may be holding the lock). This happens if two or more threads try to execute some synchronized code blocks/methods of an object at the same time. Please note that the lock is always for an object and not for individual methods. That is, if a thread has to execute a synchronized method of an object, it has to first lock that object.
For example, in BEA WebLogic Server the idle execute threads are in this condition and they wait until a socket reader thread notifies them of some new work to be done. The stack trace will look like this:

"ExecuteThread: '2' for queue: 'weblogic.admin.RMI'" daemon prio=5 tid=0x1752F040 nid=0x180c in Object.wait() [1887f000..1887fd8c] at java.lang.Object.wait(Native Method) waiting on <04134d98> (a weblogic.kernel.ExecuteThread) at java.lang.Object.wait(Object.java:426)

Some other versions of the JVM call this state CW. JRockit and refers to this state as WAITING.

The following is a sample stack trace of a thread in "Waiting for monitor entry" condition:

"ExecuteThread: '24' for queue: 'DisplayExecuteQueue'" daemon prio=5 tid=0x5541b0 nid=0x3b waiting for monitor entry [49b7f000..49b7fc24] at weblogic.cluster.replication.ReplicationManager.createSecondary (ReplicationManager.java:908)
- waiting to lock <6c4b9130> (a java.lang.Object) at weblogic.cluster.replication.ReplicationManager.updateSecondary (ReplicationManager.java:715)

In this snippet you can see that this thread holds the lock on an object (6c408700) and is waiting to lock another object (6c4b9130).

Some other version of a JVM may not give the object IDs lock information in the stack trace. In those cases, we can guess that the thread is waiting to lock the object from the state of the thread. The same state may also be called as "MW". JRockit refers to this state as LOCKED.

Now, let's see some thread dumps from different scenarios and analyze them to find the problem in the application.

Tuesday, September 15, 2009

Tools For Thread Dump Analysis

If an application seems stuck, or is running out of resources, a thread dump will reveal the state of the server.

Tools for Thread dump Analysis

a)Samurai
http://yusuke.homeip.net/samurai/en/index.html

b) Tda
https://tda.dev.java.net/

Friday, September 11, 2009

Analyzing Java Thread Dumps - "weblogic.socket.Muxer"

What are these weblogic.socket.Muxer threads seen in thread dumps ?

Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue.

There are 3 socket reader threads by default "weblogic.socket.Muxer".

One socket reader thread is usually in the poll function, while the others are available to process requests.


"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x016b2148 nid=0x42 waiting for monitor entry [5997f000..5997fc28]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:91)
- waiting to lock <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x00683c28 nid=0x41 waiting for monitor entry [59a7f000..59a7fc28]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:91)
- waiting to lock <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x0079e5b0 nid=0x40 runnable [59b7f000..59b7fc28]
at weblogic.socket.PosixSocketMuxer.poll(Native Method)
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:100)
- locked <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)


Should we worry about the Weblogic.socket.Muxer threads which always show 2 threads waiting for lock and 3rd thread locking the same object?
No.

weblogic.socket.Muxer threads are not a cause of stuck threads.



Why do they mostly show as being Stuck in Samurai TD analyzer ?




The muxer threads are shown as being Stuck since they're all locked on the same object.

"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'"
- waiting to lock <0xd9b61098> (a java.lang.String)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'"
- waiting to lock <0xd9b61098> (a java.lang.String)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'"
- locked <0xd9b61098> (a java.lang.String)


But you will see the same in any Thread dump even on a development instance with no requests.
The locks mentioned do show up as red in Samurai - but they aren't deadlocks just regular locks.

A thread gains an exclusive lock on an object to perform some action, then frees it allowing the next thread to gain access.

Additionally, if you look at the thread dumps over time, you'll see that these specific locks are not always present - they are moving between the threads which is indicative of their transitory nature.


I want to know more details on Muxers

The socket Muxer manages the server’s existing socket connections. It first determines which sockets have incoming requests waiting to be processed. It then reads enough data to determine the protocol and dispatches the socket to an appropriate runtime layer based on the protocol.
In the runtime layer, the socket muxer threads determine which execute thread queue to be used and delegates the request accordingly.



Weblogic has two versions of the socket muxer, one is the Java version and the other uses a native library which makes better use of operating system calls. The Enable Native IO checkbox on the server’s configuration settings tells the server which version to use. This is ON by default for most platforms.

Native muxers provide superior scalability because they implement a non-blocking thread model. When a native muxer is used, the server creates a fixed number of threads dedicated to reading incoming requests. Oracle recommends using the default setting of true for the Enable Native IO parameter which allows the server to automatically select the appropriate muxer to use.

You must ensure that to use Native I/O, the native library must be present in the server’s shared library path . This is set up with the default scripts. When the server does not find the native library, it throws an error java.lang.UnsatisfiedLinkError: no muxer in java.library.path and then loads the Java version of the muxer.

Confirm the LD library path is okay and pointing to the Solaris LD path. Check the startup log when starting a managed server.

What is the value of
java.library.path?

This is where the JVM actually get's the library from.


http://m-button.blogspot.com/2008/08/how-does-weblogic-handle-socket-muxers.html has a good example of how to identify Native vs Java muxer in a thread dump.

The Thread Dump I’ve used in my examples above uses the Native muxer (weblogic.socket.PosixSocketMuxer) on Solaris.

Solaris has another Native muxer called the weblogic.socket.DevPollSocketMuxer
An example TD using this muxer is shown below.


"ExecuteThread: '4' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" RUNNABLE native
weblogic.socket.DevPollSocketMuxer.doPoll(Native Method)
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:96)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)



weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)


To change the number of Muxers from the default, follow the instructions given at http://e-docs.bea.com/wls/docs92/ConsoleHelp/taskhelp/tuning/TuningSocketReaders.html


Another ERROR pattern associated with Muxers is shown below


#### <>

Monitored the File Descriptors available to this Weblogic server using

Here the Weblogic pid = 8359

/usr/proc/bin/pfiles 8359 | grep rlimit


1024

ls /proc/8359/fd | wc –l


140


The first command shows that 1024 FDs are allocated to this process, and the second shows how many are being used in real-time.

A plot of the second value during high load times shows the FDs go up to 900+ in use and hence a resolution used in this case is to increase the number of FDs.

A recommended value is to increase it to 8192.