Friday, August 19, 2011

Calculating Memory Utilization on Linux

Below are two ways to calculate the "actual" memory utilization: historically, from a "sar" data file, and on a live system with free.

Method 1:

Fetch the memory utilization data from the sar files using the -r switch.

sar -r -f <sar data file>

It will display data in this format

04:40:01 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
04:50:01 PM 258788 16372260 98.44 12696 13912224 12484624 94144 0.75 420

The information displayed here is somewhat misleading. According to it, 98.44% of memory was utilized at 04:50 PM, but that figure is simply kbmemused / (kbmemused + kbmemfree).

In reality that was not the case. To get the actual memory utilization, subtract kbbuffers and kbcached from kbmemused and then apply the same formula. In this case:
Memory utilization = (16372260 - 12696 - 13912224) / (258788 + 16372260) ≈ 14.7%

The reason is that Linux treats unused memory as a wasted resource, so it uses as much RAM as it can to cache process and kernel data.

Here, Buffers = amount of physical memory used as buffers for block-device I/O
Cached = amount of physical memory used as page cache for file data read from disk
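
As a rough sketch, the same "actual" figure can be pulled out of a sar file with awk. This assumes the 12-hour AM/PM column layout shown above (timestamp in $1/$2, kbmemfree=$3, kbmemused=$4, kbbuffers=$6, kbcached=$7), and the file name is only an example:

sar -r -f /var/log/sa/sa19 | awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
    printf "%s %s actual memory used: %.2f%%\n", $1, $2, ($4 - $6 - $7) * 100 / ($3 + $4)
}'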

Method 2:
I prefer this way

free -m command


# free -m
                total       used       free     shared    buffers     cached
Mem:         12011       9825       2186          0        243       5829
-/+ buffers/cache:       3752       8259
Swap:        16378        313      16065      
Real free memory = buffers + cached + free
                 = 243 + 5829 + 2186
                 ≈ 8259 MB (the free value shown on the "-/+ buffers/cache" line)
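
The same figure can be computed directly from the Mem: line as a rough one-liner (column positions assume the older free layout shown above, where free=$4, buffers=$6 and cached=$7):

free -m | awk '/^Mem:/ {print $4 + $6 + $7 " MB effectively free"}'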

Tuesday, August 16, 2011

Examine NFS Performance nfsstat

nfsstat

nfsstat can be used to examine NFS performance.

nfsstat -s reports server-side statistics. In particular, the following are important:

  • calls: Total RPC calls received.
  • badcalls: Total number of calls rejected by the RPC layer.
  • nullrecv: Number of times an RPC call was not available even though it was believed to have been received.
  • badlen: Number of RPC calls with a length shorter than that allowed for RPC calls.
  • xdrcall: Number of RPC calls whose header could not be decoded by XDR (External Data Representation).
  • readlink: Number of times a symbolic link was read.
  • getattr: Number of attribute requests.
  • null: Null calls are made by the automounter when looking for a server for a filesystem.
  • writes: Data written to an exported filesystem.

Sun recommends the following tuning actions for some common conditions:

  • writes > 10%: Write caching (either array-based or host-based, such as a Prestoserv card) would speed up operation.
  • badcalls >> 0: The network may be overloaded and should be checked out. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround.
  • readlink > 10%: Replace symbolic links with directories on the server.
  • getattr > 40%: The client attribute cache can be increased by setting the actimeo mount option. Note that this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases, mount the filesystems with the noac option (see the mount examples after this list).

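Purely as an illustration of the mount options just mentioned (the server name, export and mount point are placeholders, not from the original post):

mount -o actimeo=60 nfsserver:/export/data /mnt/data
mount -o noac nfsserver:/var/mail /var/mail
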
nfsstat -c reports client-side statistics. The following statistics are of particular interest:

  • calls: Total number of calls made.
  • badcalls: Total number of calls rejected by RPC.
  • retrans: Total number of retransmissions. If this number is larger than 5% of the total calls, the requests are not reaching the server consistently. This may indicate a network or routing problem (see the quick check after this list).
  • badxid: Number of times a duplicate acknowledgement was received for a single request. If this number is roughly the same as badcalls, the network is congested. The rsize and wsize mount options can be set on the client side to reduce the effect of a noisy network, but this should only be considered a temporary workaround.
    If on the other hand, badxid=0, this can be an indication of a slow network connection.
  • timeout: Number of calls that timed out. If this is roughly equal to badxid, the requests are reaching the server, but the server is slow.
  • wait: Number of times a call had to wait because a client handle was not available.
  • newcred: Number of times the authentication was refreshed.
  • null: A large number of null calls indicates that the automounter is retrying the mount frequently. The timeo parameter should be changed in the automounter configuration.

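A quick way to check that retransmission rate, assuming the Linux nfs-utils output layout in which a "calls retrans authrefrsh" header line precedes the client RPC counters:

nfsstat -c | awk '/^calls/ {getline; if ($1 > 0) printf "retrans: %.2f%% of %d calls\n", $2 * 100 / $1, $1}'
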
nfsstat -m (from the client) provides server-based performance data.

  • srtt: Smoothed round-trip time. If this number is larger than 50ms, the mount point is slow.
  • dev: Estimated deviation.
  • cur: Current backed-off timeout value.
  • Lookups: If cur>80 ms, the requests are taking too long.
  • Reads: If cur>150 ms, the requests are taking too long.
  • Writes: If cur>250 ms, the requests are taking too long.

Monday, August 15, 2011

Unix netstat -an Output Explained

$ netstat -an


TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
*.* *.* 0 0 49152 0 IDLE
10.128.27.50.16043 192.168.170.31.54871 49640 0 49640 0 ESTABLISHED
10.128.27.50.14353 10.128.14.211.44398 24820 0 49640 0 ESTABLISHED
127.0.0.1.57958 127.0.0.1.57957 49152 0 49152 0 TIME_WAIT
10.128.27.50.57959 10.128.27.108.11919 49640 0 49640 0 TIME_WAIT
10.128.27.50.16041 10.128.17.34.62013 49640 0 49640 0 TIME_WAIT
10.128.27.50.62393 10.128.27.51.16001 49640 0 49640 0 CLOSE_WAIT
10.128.27.50.36198 10.128.27.39.11035 49640 0 49640 0 ESTABLISHED
10.128.27.50.16035 *.* 0 0 49152 0 LISTEN


Useful Command : netstat -na | awk '{print $7}' | sort | uniq -c | sort -n

Swind/Rwind are the send/receive TCP window sizes.

Send-Q/Recv-Q are the send/receive queues, i.e. how much data is queued waiting to be sent or to be read by the application.

LISTEN- Waiting to receive a connection

ESTABLISHED- A connection is active

CLOSE_WAIT:
This is the state in which a socket waits for the application to execute close().
CLOSE_WAIT is not something that can be tuned, whereas TIME_WAIT can be set through tcp_time_wait_interval (the attribute tcp_close_wait_interval has nothing to do with the CLOSE_WAIT state and was renamed to tcp_time_wait_interval starting with Solaris 7).
A socket can remain in the CLOSE_WAIT state indefinitely until the application closes it.
Typical faulty scenarios are a file descriptor leak or a server never executing close() on the socket, leading to a pile-up of CLOSE_WAIT sockets. (At the Java level, this manifests as a "Too many open files" error.)

TIME_WAIT:
When a TCP socket closes, the side that initiates the close puts the socket into the TIME_WAIT state. This should last only a minute or two, after which the socket is released. The socket pair cannot be reused as long as the TIME_WAIT state persists; it is simply a time-based wait before the connection is torn down permanently.
Under most circumstances, sockets in TIME_WAIT are nothing to worry about.
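
On Solaris, for example, that interval can be inspected and shortened with ndd (the value is in milliseconds; adjust with care):

# ndd -get /dev/tcp tcp_time_wait_interval
# ndd -set /dev/tcp tcp_time_wait_interval 60000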

A netstat will show the sockets in the TIME_WAIT state. The following is an example of the TIME_WAIT sockets generated while benchmarking. Each client connection has a unique ephemeral port and the server always uses its public port:

Typical Benchmarking Netstat
unix> netstat
...
tcp 0 0 localhost:25033 localhost:8080 TIME_WAIT
tcp 0 0 localhost:25032 localhost:8080 TIME_WAIT
tcp 0 0 localhost:25031 localhost:8080 TIME_WAIT
tcp 0 0 localhost:25030 localhost:8080 TIME_WAIT
tcp 0 0 localhost:25029 localhost:8080 TIME_WAIT
tcp 0 0 localhost:25028 localhost:8080 TIME_WAIT

The socket will remain in the TIME_WAIT state for a system-dependent time, generally 120 seconds, but usually configurable. Since there are fewer than 32k ephemeral ports available to the client, the client will eventually run out and start seeing connection failures. On some operating systems, including Red Hat Linux, the default limit is only 4k ports. The full 32k ports with a 120 second timeout limit the client to roughly 32768/120 ≈ 270 new connections per second.
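
On Linux, the ephemeral port range that determines this ceiling can be checked and widened (the values below are illustrative):

# cat /proc/sys/net/ipv4/ip_local_port_range
# sysctl -w net.ipv4.ip_local_port_range="1024 65535"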

If mod_caucho or isapi_srun are misconfigured, they can use too many connections and run into the TIME_WAIT limits. Using keepalives effectively avoids this problem. Since keepalive connections are reused, they won't go into the TIME_WAIT state until they're finally closed. A site can maximize the keepalives by setting thread-keepalive large and setting live-time and request-timeout to large values. thread-keepalive limits the maximum number of keepalive connections. live-time and request-timeout will configure how long the connection will be reused.

TCP Parameters on Different Platforms:



# ndd /dev/tcp \? - lists the TCP tuning parameters on Solaris 10.

On Linux, all TCP/IP tuning parameters are located under /proc/sys/net/.
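
For example, a Linux parameter can be read from /proc or adjusted with sysctl (tcp_fin_timeout is only an illustrative parameter; pick the one relevant to your tuning):

# cat /proc/sys/net/ipv4/tcp_fin_timeout
# sysctl -w net.ipv4.tcp_fin_timeout=30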

Friday, August 5, 2011

Restricting Direct Access to Weblogic and Jboss by IP and Port

How do you prevent direct access to Weblogic and Jboss?

Weblogic

In order to prevent direct access to the port, we can implement IP filtering.

The steps to do so are:

1) Login into the Admin Console.
2) Click on the Domain on the left panel.
3) On the right side, Navigate to Security --> Filter tab.
4) Enable 'Connection Logger Enabled'
5) Enter the class name in the Connection Filter field as 'weblogic.security.net.ConnectionFilterImpl'
6) Now, under Connection Filter Rules, add the IP address; the rule can be set to either 'allow' or 'deny' for that IP address.

The format is: Accessing_Client Hosting_Server Port Action Protocol

For example:

With this rule, I'll block any http requests coming from ip 10.157.152.62 to managed server located on socket: 10.157.153.161:7003

10.157.152.62 10.157.153.161 7003 deny http

With this other one, I'll grant access to all the http requests coming from 10.157.152.62 to the managed server located on socket: 10.157.153.161:7004

10.157.152.62 10.157.153.161 7004 allow http

When trying to access the denied socket, a message like this should appear in the browser and in the log file of the server being accessed.

The Server is not able to service this request: [Socket:000445]Connection rejected, filter blocked Socket, weblogic.security.net.FilterException: [Security:090220]rule 1

For further information, please refer to documentation: http://download.oracle.com/docs/cd/E12839_01/web.1111/e13711/con_filtr.htm#i1029317


JBoss

Open $JBOSS_HOME/server/$PROFILE/deploy/$JBOSSWEB/server.xml and locate the Host element:


<Host name="localhost"
      autoDeploy="false" deployOnStartup="false" deployXML="false"
      configClass="org.jboss.web.tomcat.security.config.JBossContextConfig">

Add the following as a child of that Host element:

<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="192.168.0.1" />


The allow attribute is a comma-delimited series of regular expressions, so:


<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="192\.168\.0\..*,192\.168\.1\..*" />


Would allow access to all computers in those ranges. The list can also contain additional IP addresses and ranges, given as comma-separated values.


One can also specify a deny attribute to deny access from particular addresses or hosts, and use the RemoteHostValve instead of the RemoteAddrValve to match on host names, like so:

<Valve className="org.apache.catalina.valves.RemoteHostValve" allow=".*\.mydomain\.com" />

would allow connections from any host under mydomain.com.

Monitoring your Connection Pool in Jboss 5 using JMX

Each datasource translates into several MBeans that you can interact with in the JMX Console. All the datasource-related objects are in the jboss.jca domain. You can find them by searching through the JMX Console page, or by using jboss.jca:* as the query filter.
Supposing you want to monitor your OracleDS datasource from the previous example: you could use a more specific filter, such as jboss.jca:name=OracleDS,*, to see only the OracleDS entries. In either case, four MBeans will be related to the OracleDS datasource:



name=OracleDS,service=DataSourceBinding
name=OracleDS,service=LocalTxCM
name=OracleDS,service=ManagedConnectionFactory
name=OracleDS,service=ManagedConnectionPool



While each plays a critical role in providing the datasource functionality in JBoss, you are most likely to need to interact with the connection pool. Click the connection pool MBean to expose the management attributes and operations.

The datasource file we've been using specifies a minimum connection pool size of 2 and a maximum pool size of 10. You'll see those values reflected in the MinSize and MaxSize attributes. You can change the values in the running server by adjusting them and clicking Apply Changes.



Setting the values here affects the connection pool only in memory. To change the configuration permanently, update the datasource file. Try setting the pool sizes there. When you save the file, JBoss will redeploy the datasource and the new pool sizes will be displayed when you reload the page.



You might occasionally want to adjust the pool size to account for usage; you are more likely to be curious how much of the connection pool is being used. The ConnectionCount attribute shows how many connections are currently open to the database.



However, open connections are not necessarily in use by application code. The InUseConnectionCount attribute shows how many of the open connections are in use. Viewing the statistic from the other direction, AvailableConnectionCount shows how much room is left in the pool.



Finally, the MBean has several statistics that track connection pool usage over the pool's lifetime. ConnectionCreatedCount and ConnectionDestroyedCount keep running totals of the number of connections created and destroyed by the pool. If IdleTimeout is greater than 0, connections will eventually time out, be destroyed, and be replaced by fresh connections; this causes the created and destroyed counts to rise steadily. The MaxConnectionsInUseCount attribute keeps track of the highest number of connections in use at any one time.



If you notice anything awkward in the connection pool, or you just want to reset the statistics, you can flush the connection pool using the flush operation on the MBean. This will cause a new connection pool to be created, abandoning the previous connection pool.
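
The same attributes and the flush operation can also be reached from the command line with the twiddle client shipped with JBoss 5. This is only a sketch; the object name assumes the OracleDS datasource used in this example:

$JBOSS_HOME/bin/twiddle.sh get "jboss.jca:name=OracleDS,service=ManagedConnectionPool" ConnectionCount InUseConnectionCount AvailableConnectionCount MaxSize
$JBOSS_HOME/bin/twiddle.sh invoke "jboss.jca:name=OracleDS,service=ManagedConnectionPool" flush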





How to monitor JBoss CPU usage and Thread Usage

Search Issue in Liferay

1) Search results not showing up in Liferay


This could be an index issue. You may need to re-index the individual parts (Blogs, Document Library and Web Content) to solve this. How to do it? See the image below.





2) Blog-related search results not showing up in Liferay


This is due to the blog re-index failing. When attempting to re-index blogs, you may run into an error like the one below:

2011-08-05 13:00:50,647 ERROR [com.liferay.portlet.admin.action.EditServerAction] (ajp-) com.liferay.portal.kernel.search.SearchException: com.liferay.portal.NoSuchGroupException: No Group exists with the primary key 193844
com.liferay.portal.kernel.search.SearchException: com.liferay.portal.NoSuchGroupException: No Group exists with the primary key 193844


Then run the following query:

select * from BlogsEntry where groupId not in (select groupId from Group_ where parentGroupId = 0 or parentGroupId in (select groupId from Group_));

This returned 3 results. This tells us that there are 3 oprhaned blog entries. We went into the database and deleted those 3 entries. Once we did that we were able to index blogs as well