Here are a few things you need to know before starting. Following the links is not required; they are provided for reference.
Since a Java program may start many threads, each executing some program code, it is necessary to understand which threads are using CPUs. On Linux, top -H will show you CPU usage on a per-thread basis. Here is an example.
First, use the top command to identify the process (PID) consuming the most resources.
top - 16:32:29 up 10:29,  3 users,  load average: 1.08, 0.64, 0.56
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 48.7%us, 51.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1476708k used,    23340k free,    62012k buffers
Swap:  4128764k total,    75356k used,  4053408k free,   919836k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7599 oracle    20   0 1151m  28m 9100 S 85.5  1.9   0:12.67 java
In this case, PID 7599 is utilizing 85.5% of a CPU core. To dig deeper, we view the specific threads within that process using top -H -p 7599:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7601 oracle    20   0 1151m  28m 9100 R 85.9  1.9   7:19.98 java
 7602 oracle    20   0 1151m  28m 9100 S  1.0  1.9   0:02.95 java
 7599 oracle    20   0 1151m  28m 9100 S  0.0  1.9   0:00.01 java
We can now see that a single execution thread, PID 7601, is responsible for the CPU spike.
In the HotSpot JVM on Linux, there is a one-to-one mapping between OS threads and Java threads. To see what the threads are doing internally, use the jstack utility.
[oracle@oel6u4-2 test]$ jstack 7599
Partial Thread Dump Output:
"Low Memory Detector" daemon prio=10 tid=0x00007f05c4088000 nid=0x1db8 runnable
"C2 CompilerThread1" daemon prio=10 tid=0x00007f05c4085800 nid=0x1db7 waiting on condition
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
        at java.security.SecureRandom.getInstance(SecureRandom.java:254)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
        at java.security.SecureRandom.<init>(SecureRandom.java:133)
        at RandomUser.main(RandomUser.java:9)
The OS identifies threads using decimal IDs, while Java thread dumps identify them using hexadecimal "native IDs" (nid).
7601 in decimal is 0x1db1 in hexadecimal, which corresponds to nid=0x1db1. By searching the thread dump for 0x1db1, we find the culprit is the "main" thread. We can now see the exact stack trace where the execution is hanging or looping:
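The decimal-to-hex conversion can be done directly in the shell with printf, for example:

```shell
# Convert the OS thread ID (decimal, from top) to the hex "nid"
# format used in jstack thread dumps
printf '%#x\n' 7601    # prints 0x1db1

# And back again: hex nid to decimal TID
printf '%d\n' 0x1db1   # prints 7601
```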
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.security.SecureRandom.getInstance(SecureRandom.java:254)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
        at RandomUser.main(RandomUser.java:9)
This indicates the application is spending excessive CPU cycles generating random numbers via SecureRandom.
To get a stack trace for a specific thread, we combine jstack with awk. Since the OS provides a decimal thread ID (TID) and Java uses a hexadecimal native ID (nid), we use printf to perform the conversion on the fly:
jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'
Resulting output:
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:220)
        at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
        at RandomUser.main(RandomUser.java:9)
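The awk pattern used here is a range: /A/,/B/ prints every line from a match of A through the next match of B. Since jstack separates thread entries with blank lines, matching from the nid line to the next empty line (/^$/) extracts exactly one thread's block. A minimal sketch with made-up dump lines:

```shell
# Simulated two-thread dump; the range pattern keeps only the
# block belonging to nid=0x1db1 (up to the first blank line)
printf 'header\n"main" nid=0x1db1 runnable\n  at Foo.bar\n\n"other" nid=0x1db2\n' \
  | awk '/ nid=0x1db1 /,/^$/'
# prints the "main" line and its frame, but not "other"
```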
Single snapshots can be misleading. To see whether a thread is truly stuck or just moving through a complex code path, use the watch command to refresh the stack trace every half-second:
watch -n .5 "jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'"
When a thread is executing rapidly, visual monitoring isn't enough. We need to aggregate samples to find where the thread spends the most time. The following script, prof.sh, samples the thread multiple times and counts the frequency of unique stack traces.
This script automates the sampling and uses awk to normalize memory addresses (replacing them with <address>) so that identical code paths are grouped together during the sort and uniq phase:
#!/bin/bash
P_PID=$1
P_NID=$2
P_SLEEP=${P_SLEEP:-0.5}
P_CNT=${P_CNT:-10}

echo Sampling PID=$P_PID every $P_SLEEP seconds for $P_CNT samples

if [ "$P_NID" == "" ]; then
  CMD="awk '//'"
else
  CMD="awk '/ nid='"$(printf '%#x' $P_NID)"' /,/^$/'"
fi

for i in `seq $P_CNT`; do
  jstack $P_PID | eval $CMD
  sleep $P_SLEEP
done | awk '
  BEGIN { x = 0; s = "" }
  /nid=/ { x = 1; }
  // { if (x == 1) { s = s ", " gensub(/<0x\w*>/, "<address>", "g") } }
  /^$/ { if (x == 1) { print s; x = 0; s = ""; } }
' | sort | uniq -c | sort -n -r | head -10 \
  | sed -e 's/$/\n/g' -e 's/\t/\n\t/g' -e 's/,//g'
Running the script identifies which code path is the "hottest." In the example below, 6 out of 10 samples show the thread blocked on FileInputStream.readBytes, suggesting an I/O or entropy bottleneck:
[oracle@oel6u4-2 test]$ ./prof.sh 7599 7601
Sampling PID=7599 every 0.5 seconds for 10 samples
      6 "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.io.FileInputStream.readBytes(Native Method)
        ...
      2 "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:268)
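The core of the script's aggregation is the classic sort | uniq -c | sort -n -r pipeline: identical lines are grouped, counted, and ranked by frequency. A small sketch with made-up frame names shows the mechanics:

```shell
# Frequency-counting pipeline used by prof.sh: group identical
# stack traces (sort), count duplicates (uniq -c), rank by count
# descending (sort -n -r)
printf 'readBytes\nimplNextBytes\nreadBytes\nreadBytes\n' \
  | sort | uniq -c | sort -n -r
# prints a count of 3 for readBytes, then 1 for implNextBytes
```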
With these few simple tools you can understand quite a lot in almost any Java-related situation: by sampling thread dumps you can find the most frequent stack trace, and with that knowledge it is usually easy to understand why an issue is happening. In my test case, the application generates random numbers continuously without pausing, and a single thread occupies one CPU core.