Thursday, December 10, 2009

Analyzing Java Thread Dump - weblogic.kernel.Default

Sample Thread Dump JDK 1.5/WL9.2

Below is snippet from JDK1.5/Weblogic 9.2 thread dump

"[ACTIVE] ExecuteThread: '53' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x008c78c8 nid=0x38f4d in Object.wait() [0xd45ff000..0xd45ff9f0]
at java.lang.Object.wait(Native Method)
- waiting on <0xeade2668> (a weblogic.work.ExecuteThread)
at java.lang.Object.wait(Object.java:474)
at weblogic.work.ExecuteThread.waitForRequest(ExecuteThread.java:165)
- locked <0xeade2668> (a weblogic.work.ExecuteThread)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:186)


"[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" daemon prio=10 tid=0x00efa208 nid=0x15 runnable [0xd93ff000..0xd93ffa70]
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
- locked <0xe9606518> (a java.net.PlainDatagramSocketImpl)
at java.net.PlainDatagramSocketImpl.receive(PlainDatagramSocketImpl.java:136)
- locked <0xe9606518> (a java.net.PlainDatagramSocketImpl)
at java.net.DatagramSocket.receive(DatagramSocket.java:712)
- locked <0xf31a4af0> (a java.net.DatagramPacket)
- locked <0xe92f9968> (a java.net.MulticastSocket)
at weblogic.cluster.FragmentSocket.receive(FragmentSocket.java:202)
at weblogic.cluster.MulticastManager.run(MulticastManager.java:400)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:209)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:181)

Wednesday, December 9, 2009

Find User unix Command - fuser

=== Find User ===

  • fuser displays the PIDs of processes using the specified files or file systems.
  • fuser command is very useful in getting rid of the .nfs files created when processes are killed.
  • The .nfs* names refer to files that have been deleted but are still being held open by a running process.

You can find .nfs files using ls -la

rm does not work in removing .nfs files - use fuser to find the process that's locking the file and then kill the process

 /usr/sbin/fuser {filename} 
kill -9 {pid}

If you stop the processes that have them open, the files will be tidied-up and removed by the file server.

General background Info:

Most operating systems, including UNIX, operate a policy of not actually removing a deleted file (and freeing up it's data blocks) until the last process that has the file open closes it. So, if a running process has a file open and you use the rm(1) command to delete the file the data blocks on the disk will not be freed up by the OS until the process that has the file open closes it.

On a UNIX host using local disk store this behaviour can manifest itself it some seemingly confusing situations. For example, you may wish to free up some space on a file system that's used 900 MB of it's 1GB quota. You have a large file, 200MB say, named myjava.jar that you believe is no longer required, but is actually currently open in your WebLogic server. Not knowing this you delete myjava.jar and do an ls(1) command to see that the file is no longer listed. However, when you use the df(1) command it still reports that 900 MB of it's 1GB quota is used because your WebLogic server still has the file open. When you shutdown the WebLogic server or the server closes the file the disk space will be released and df(1) will report 700MB of it's 1GB is used.

If the file that is removed is on NFS mounted store then it is possible for a file to be deleted on one client whilst still being open on another client. In this situation the same rule of not actually deleting the file until the last process with it open closes it still applies. However, in order for the NFS file handle used by the client that still has the file open not to be broken a filename reference must be maintained. In order to achieve this and remove the files name from the directory ( so it doesn't show up in an ls command output) the NFS file server renames the deleted file to a name beginning '.nfs'. These are the files you are seeing. When the last process with these files open dies or closes them they will be tidied up and removed. Trying to delete them before they are closed will only result in the file being renamed again.

Useful Unix Commands

=== Find Class within Jar file ===

To find a class within a binary jar file

 for i in `ls *.jar`; do (jar tf $i  grep '{classname}' ) && echo $i;done   

If you want to search within all subdirectories use this

 for i in `find . -name ‘*.jar’`; do (jar tf $i  grep '{CLASSNAME}' ) && echo $i;done 

=== Find string within file of name ===

find . -name *.xml -exec grep  {} \;

example

find . -name web.xml -exec grep -i servlet {} \;

=== Grep within zip file without unzipping it ===

gzip -c -d {file}.gz | grep {string}

Example

gzip -c -d admin_access.log0001_4Dec08_0147.gz | grep -i adq | wc -l


=== Find process using the port ===

The easy way to do this is using netstat passing the port number

netstat -a | grep 61014

If that does not help getting a pid, run this line below

for i in `ls /proc`; do pfiles $i | grep AF_INET | grep 61014 ; done

Output:

pfiles: permission denied: 12363
sockname: AF_INET 0.0.0.0 port: 61014
pfiles: permission denied: 12384

this shows the process appearing in between pids 12363 and 12384 uses that port.


Weblogic Related Issues/Solutions - Case 1

Problem:
Platform running Weblogic 8.1 on Sun V880 servers. Total RAM of 32 Gb on the machine.
2 Gb assigned to the managed server JVM heap. JDK 1.4

Initial settings:

-XX:+AggressiveHeap -Xms2048m -Xmx2048m  -XX:SurvivorRatio=32 -XX:MaxPermSize=128m
But still there are 20 Full GCs per hour in peak times, before the server crashes.


Analysis
  • 1. It was decided to reduce the SurvivorRatio to 4 and restart with some more flags.
  • The size of ONE Survivor Space is calculated as
  • SurvivorSpace = NewSize / (SurvivorRatio + 2)
  • Keeping SurvivorRatio as 32 means the Survivor spaces are too small for promoting stuff from Eden. Hence we reduce this to 4 which allows for larger Survivor spaces.
  • 2. As per Sun Bug ID: 6218833, setting AggressiveHeap set before Heapsize (Xmx and Xms) can confuse the JVM. Revert the order to have -Xms and -Xmx to come before -XX:+AggressiveHeap or not use it
  • 3. The application has 180+ EJBs with pools of beans. Hence set the -Dsun.rmi.dgc.client.gcInterval=3600000 (1 hour) instead of the default 60000 (1 min). More on this here: http://docs.sun.com/source/817-2180-10/pt_chap5.html
  • 4. The site is restarted once a week at 4:30AM. The patterns stays normal for 2 days – and then degrades into full GC.
  • 5. The Old space is pretty much full – at every minor collection – the Old space must be cleared up for promotion from Young to Old to take place.
  • 6. Permanent space is pretty much full – keeps loading classes and classes ( could that be a problem – the difference between the number of JSP’s per Release?)
  • Hence we increased the PermSpace from 128M to 256M
  • 7. Ensure we are running the server JVM by using the -server flag
  • 8. Use OptimizeIt or similar profiling tool to see the memory usage and find code bottlenecks.
The settings now were
-server -Xms2048m -Xmx2048m  -XX:MaxNewSize=512m -XX:NewSize=512m -XX:SurvivorRatio=4 -XX:MaxPermSize=256m -Xincgc -XX:+DisableExplicitGC -XX:+AggressiveHeap -XX:-OmitStackTraceInFastThrow
This reduced the Full GCs to one a day.

Error Logs

At the time of the server going out of memory prior to a crash, the logs are filled with repeated errors (up to 100 repetitions) of this sort

java.lang.NullPointerException
<>
Adding the -XX:-OmitStackTraceInFastThrow flag resolves this problem, the root cause of the NPE it self has to be tracked down but we do not have any longer the issue of huge recursive exception strings.

We could now see the stack trace as

java.lang.NullPointerException
at java.util.StringTokenizer.(StringTokenizer.java:117)
at java.util.StringTokenizer.(StringTokenizer.java:133)
at jsp_servlet._framework._security.__login._jspService(login.jsp:294)
at weblogic.servlet.jsp.JspBase.service(JspBase.java:27)
at weblogic.servlet.internal.ServletStubImpl$ServletInvocation

Java Object Life Cycle and Garbage Collection

Java object life cycle and garbage collection occurs like this:




  • Objects when created are always first allocated to Eden.
  • When Eden fills up, a fast but not comprehensive GC (minor collection) is run over the young generation only.
  • All surviving objects are moved from Eden into one Survivor Space.
  • In consequent minor collections, new objects move from Eden into the other Survivor Space, plus everything from the first Survivor Space (survivors from the previous minor collection) is also moved into the second Survivor Space. Thus one survivor should be empty at that time.
  • When objects in Survivor Space are old enough (or survivor fills up), they are moved to Tenured. By default the long-lived objects may be copied up to 31 times between the Survivor Spaces before they are finally promoted to the Old generation.
  • When tenured fills up, a Full GC collection is run that is comprehensive: the entire heap is analyzed, all objects that can be destroyed are killed and memory is reclaimed.
  • Note: the above lifecycle changes slightly when advanced options such as ConcurrentMarkSweep etc are enabled.

Tuesday, September 29, 2009

Analyzing Java Thread Dump - "main" thread

A Sample Thread Dump

Below is a sample thread dump from a single-threaded application that is using the Sun JVM 1.4.1.

  • The thread named "main" is the main application thread.
  • All other threads are created by the JVM to do housekeeping work.
  • While analyzing application-level problems, we generally focus only on the application threads.
Let's analyze the stack trace of the "main" thread from below TD

"main" prio=5 tid=0x002358B8 nid=0x7f8 runnable [6f000..6fc40] at test.method1(test.java:10) at test.main(test.java:5)

From this code snippet you can see that a thread stack trace has a
name,
thread priority (prio=5),
state (runnable),
source code line numbers,
and the method calls.

You can conclude from this stack trace that the thread "main" is executing some code in the method "method1" of class "test". The call to this method came from the method "main" of the same class. You can also see the exact source code line numbers in those methods.

Before moving on to analyzing thread dumps from more complex scenarios, let's discuss the different state of threads that we normally see in the thread dumps and what they mean.

  • Runnable: Either running or ready to run when it gets its CPU turn. JRockit thread dumps refer to this state as ACTIVE.
  • Waiting on monitor: Either sleeping or waiting on an object for said period of time or waiting to be notified by another thread. This will happen when any of the sleep() methods on Thread object or wait() method on an Object is called.
  • Waiting for monitor entry: Waiting to lock an object (some other thread may be holding the lock). This happens if two or more threads try to execute some synchronized code blocks/methods of an object at the same time. Please note that the lock is always for an object and not for individual methods. That is, if a thread has to execute a synchronized method of an object, it has to first lock that object.
For example, in BEA WebLogic Server the idle execute threads are in this condition and they wait until a socket reader thread notifies them of some new work to be done. The stack trace will look like this:

"ExecuteThread: '2' for queue: 'weblogic.admin.RMI'" daemon prio=5 tid=0x1752F040 nid=0x180c in Object.wait() [1887f000..1887fd8c] at java.lang.Object.wait(Native Method) waiting on <04134d98> (a weblogic.kernel.ExecuteThread) at java.lang.Object.wait(Object.java:426)

Some other versions of the JVM call this state CW. JRockit and refers to this state as WAITING.

The following is a sample stack trace of a thread in "Waiting for monitor entry" condition:

"ExecuteThread: '24' for queue: 'DisplayExecuteQueue'" daemon prio=5 tid=0x5541b0 nid=0x3b waiting for monitor entry [49b7f000..49b7fc24] at weblogic.cluster.replication.ReplicationManager.createSecondary (ReplicationManager.java:908)
- waiting to lock <6c4b9130> (a java.lang.Object) at weblogic.cluster.replication.ReplicationManager.updateSecondary (ReplicationManager.java:715)

In this snippet you can see that this thread holds the lock on an object (6c408700) and is waiting to lock another object (6c4b9130).

Some other version of a JVM may not give the object IDs lock information in the stack trace. In those cases, we can guess that the thread is waiting to lock the object from the state of the thread. The same state may also be called as "MW". JRockit refers to this state as LOCKED.

Now, let's see some thread dumps from different scenarios and analyze them to find the problem in the application.

Tuesday, September 15, 2009

Tools For Thread Dump Analysis

If an application seems stuck, or is running out of resources, a thread dump will reveal the state of the server.

Tools for Thread dump Analysis

a)Samurai
http://yusuke.homeip.net/samurai/en/index.html

b) Tda
https://tda.dev.java.net/

Friday, September 11, 2009

Analyzing Java Thread Dumps - "weblogic.socket.Muxer"

What are these weblogic.socket.Muxer threads seen in thread dumps ?

Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue.

There are 3 socket reader threads by default "weblogic.socket.Muxer".

One socket reader thread is usually in the poll function, while the others are available to process requests.


"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x016b2148 nid=0x42 waiting for monitor entry [5997f000..5997fc28]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:91)
- waiting to lock <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x00683c28 nid=0x41 waiting for monitor entry [59a7f000..59a7fc28]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:91)
- waiting to lock <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=5 tid=0x0079e5b0 nid=0x40 runnable [59b7f000..59b7fc28]
at weblogic.socket.PosixSocketMuxer.poll(Native Method)
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:100)
- locked <0x94846b40> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:219)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:178)


Should we worry about the Weblogic.socket.Muxer threads which always show 2 threads waiting for lock and 3rd thread locking the same object?
No.

weblogic.socket.Muxer threads are not a cause of stuck threads.



Why do they mostly show as being Stuck in Samurai TD analyzer ?




The muxer threads are shown as being Stuck since they're all locked on the same object.

"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'"
- waiting to lock <0xd9b61098> (a java.lang.String)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'"
- waiting to lock <0xd9b61098> (a java.lang.String)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'"
- locked <0xd9b61098> (a java.lang.String)


But you will see the same in any Thread dump even on a development instance with no requests.
The locks mentioned do show up as red in Samurai - but they aren't deadlocks just regular locks.

A thread gains an exclusive lock on an object to perform some action, then frees it allowing the next thread to gain access.

Additionally, if you look at the thread dumps over time, you'll see that these specific locks are not always present - they are moving between the threads which is indicative of their transitory nature.


I want to know more details on Muxers

The socket Muxer manages the server’s existing socket connections. It first determines which sockets have incoming requests waiting to be processed. It then reads enough data to determine the protocol and dispatches the socket to an appropriate runtime layer based on the protocol.
In the runtime layer, the socket muxer threads determine which execute thread queue to be used and delegates the request accordingly.



Weblogic has two versions of the socket muxer, one is the Java version and the other uses a native library which makes better use of operating system calls. The Enable Native IO checkbox on the server’s configuration settings tells the server which version to use. This is ON by default for most platforms.

Native muxers provide superior scalability because they implement a non-blocking thread model. When a native muxer is used, the server creates a fixed number of threads dedicated to reading incoming requests. Oracle recommends using the default setting of true for the Enable Native IO parameter which allows the server to automatically select the appropriate muxer to use.

You must ensure that to use Native I/O, the native library must be present in the server’s shared library path . This is set up with the default scripts. When the server does not find the native library, it throws an error java.lang.UnsatisfiedLinkError: no muxer in java.library.path and then loads the Java version of the muxer.

Confirm the LD library path is okay and pointing to the Solaris LD path. Check the startup log when starting a managed server.

What is the value of
java.library.path?

This is where the JVM actually get's the library from.


http://m-button.blogspot.com/2008/08/how-does-weblogic-handle-socket-muxers.html has a good example of how to identify Native vs Java muxer in a thread dump.

The Thread Dump I’ve used in my examples above uses the Native muxer (weblogic.socket.PosixSocketMuxer) on Solaris.

Solaris has another Native muxer called the weblogic.socket.DevPollSocketMuxer
An example TD using this muxer is shown below.


"ExecuteThread: '4' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" RUNNABLE native
weblogic.socket.DevPollSocketMuxer.doPoll(Native Method)
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:96)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)



weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)



"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" waiting for lock java.lang.String@4edf4f BLOCKED
weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:95)
weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)


To change the number of Muxers from the default, follow the instructions given at http://e-docs.bea.com/wls/docs92/ConsoleHelp/taskhelp/tuning/TuningSocketReaders.html


Another ERROR pattern associated with Muxers is shown below


#### <>

Monitored the File Descriptors available to this Weblogic server using

Here the Weblogic pid = 8359

/usr/proc/bin/pfiles 8359 | grep rlimit


1024

ls /proc/8359/fd | wc –l


140


The first command shows that 1024 FDs are allocated to this process, and the second shows how many are being used in real-time.

A plot of the second value during high load times shows the FDs go up to 900+ in use and hence a resolution used in this case is to increase the number of FDs.

A recommended value is to increase it to 8192.

Sunday, February 15, 2009

Visiting places in Boston


Top Visiting Places

New England Aquarium
-- $20 (adult) (Winter 9-5pm, Summer 9-6pm)
-- http://www.neaq.org/index.php

Whale Watch
-- $36 (adult) (Winter 9-5pm, Summer 9-6pm)
-- http://www.neaq.org/visit_planning/tickets_and_hours/index.php

Quincy Market/Faneuil Hall -- Shopping mall
-- http://www.faneuilhallmarketplace.com/

USS Constitution
--http://www.ussconstitution.navy.mil/

Museum of Science
-- $23.50
-- http://www.mos.org/

Skywalk Observatory (the Prudential shop center)
-- $11 (Summer 10:00 a.m. to 10:00 p.m)
-- http://www.prudentialcenter.com/shop/shop_detail.php?id=64

Public Garden/Swan boats
-- April 18, 2009 - Sep 20, 2009
-- http://www.swanboats.com/
-- Spring (April to June 20) 10 am to 4 pm
--Summer (June 21 to Labor Day) 10 am to 5 pm
--Fall (After Labor Day) 12 pm to 4 pm Weekdays
--10 am to 4 pm Weekends
--Adults $2.75

Take DUCK TOURS - $29 per head
(
http://www.bostonducktours.com/)

Websites
1. City View Trolley Tours (Hop on & off bus)- $29 -- http://cityviewtrolleys.com/tour.html
2. Duck Tours - $29 -- http://www.bostonducktours.com/
3. Boston Harborcruises -- http://www.bostonharborcruises.com/
4. Boston Visiting places Map - http://cityviewtrolleys.com/schedule.html
5. Boston Harbor Cruises-- http://www.bostonharborcruises.com/

Address for GPS
1. Quincy Market/Faneuil Hall -- 5 broad street,Boston MA, 02109 (parking at quincy market)
$13 flat rate on Saturday and Sunday (http://www.75statestreetgarage.com/).

Tips & Recommendations
Take City View Trolley Tours
By Car ---> Park your car in 75 StateStreet garage which is close to Quincy market/Boston Harbor cruises/ New england Acquarium. Then buy tickets for Hop on & off bus tickets.
By train ---> Take train to South station. Then buy tickets for Hop ON & off bus.

Take Duck Tour -- $29 per head
By Car -->

Indian Restaurants
Mela
578 Tremont Street
South End • Boston, MA
617.859-4805 Fax: 617.859.4808
Open daily 11:30AM - 11:00 pm

Bukhra
701 Centre Street
Jamaica Plain, MA 02130
(617) 522-2195 Fax: (617) 971-0890

Diva
246 Elm Street
Davis Square • Somerville, MA 02144
(617) 629–4963 Fax: (617) 629-4966
Open daily 11:30 am - 11:00 pm

cafe of india
52A Brattle Street
Harvard Square • Cambridge, MA 02138
(617) 661–0683 Fax: (617) 661-5922
Open daily 11:30 am - 11:00 pm