Here are a few things you need to know before starting. Following the links is not required; they are provided for reference.
Since a Java program may start many threads, each executing some program code, it is necessary to understand which threads are using CPUs. On Linux, top -H will show you CPU usage on a per-thread basis. Here is an example.
First, use the top command to identify the process (PID) consuming the most resources.
top - 16:32:29 up 10:29,  3 users,  load average: 1.08, 0.64, 0.56
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 48.7%us, 51.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   1500048k total,  1476708k used,    23340k free,    62012k buffers
Swap:  4128764k total,    75356k used,  4053408k free,   919836k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7599 oracle    20   0 1151m  28m 9100 S 85.5  1.9   0:12.67 java
In this case, PID 7599 is utilizing 85.5% of a CPU core. To dig deeper, we view the specific threads within that process using top -H -p 7599:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7601 oracle    20   0 1151m  28m 9100 R 85.9  1.9   7:19.98 java
 7602 oracle    20   0 1151m  28m 9100 S  1.0  1.9   0:02.95 java
 7599 oracle    20   0 1151m  28m 9100 S  0.0  1.9   0:00.01 java
We can now see that a single execution thread, PID 7601, is responsible for the CPU spike.
In the HotSpot JVM on Linux, there is a one-to-one mapping between OS threads and Java threads. To see what the threads are doing internally, use the jstack utility.
[oracle@oel6u4-2 test]$ jstack 7599
Partial Thread Dump Output:
"Low Memory Detector" daemon prio=10 tid=0x00007f05c4088000 nid=0x1db8 runnable
"C2 CompilerThread1" daemon prio=10 tid=0x00007f05c4085800 nid=0x1db7 waiting on condition
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable [0x00007f05c82f4000]
   java.lang.Thread.State: RUNNABLE
        at java.security.SecureRandom.getInstance(SecureRandom.java:254)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
        at java.security.SecureRandom.<init>(SecureRandom.java:133)
        at RandomUser.main(RandomUser.java:9)
The OS identifies threads using decimal IDs, while Java thread dumps identify them using hexadecimal "native IDs" (nid).
7601 in decimal is 0x1db1 in hexadecimal, which corresponds to nid=0x1db1. By searching the thread dump for 0x1db1, we find the culprit is the "main" thread. We can now see the exact stack trace where the execution is hanging or looping:
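The decimal-to-hex conversion can be done directly in the shell with printf, for example:

```shell
# Convert the OS thread ID (decimal, from top) to the hex "nid"
# format used in jstack thread dumps
printf '%#x\n' 7601    # prints 0x1db1

# And back again: hex nid to decimal TID
printf '%d\n' 0x1db1   # prints 7601
```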
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.security.SecureRandom.getInstance(SecureRandom.java:254)
        at java.security.SecureRandom.getDefaultPRNG(SecureRandom.java:176)
        at RandomUser.main(RandomUser.java:9)
This indicates the application is spending excessive CPU cycles generating random numbers via SecureRandom.
To get a stack trace for a specific thread, we combine jstack with awk. Since the OS provides a decimal thread ID (TID) and Java uses a hexadecimal native ID (nid), we use printf to perform the conversion on the fly:
jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'
Resulting output:
"main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:220)
        at sun.security.provider.NativePRNG$RandomIO.readFully(NativePRNG.java:185)
        at RandomUser.main(RandomUser.java:9)
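The awk pattern used here is a range: /A/,/B/ prints every line from a match of A through the next match of B. Since jstack separates thread entries with blank lines, matching from the nid line to the next empty line (/^$/) extracts exactly one thread's block. A minimal sketch with made-up dump lines:

```shell
# Simulated two-thread dump; the range pattern keeps only the
# block belonging to nid=0x1db1 (up to the first blank line)
printf 'header\n"main" nid=0x1db1 runnable\n  at Foo.bar\n\n"other" nid=0x1db2\n' \
  | awk '/ nid=0x1db1 /,/^$/'
# prints the "main" line and its frame, but not "other"
```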
Single snapshots can be misleading. To see whether a thread is truly stuck or just moving through a complex code path, use the watch command to refresh the stack trace every half-second:
watch -n .5 "jstack $PID | awk '/ nid='"$(printf '%#x' $TID)"' /,/^$/'"
When a thread is executing rapidly, visual monitoring isn't enough. We need to aggregate samples to find where the thread spends the most time. The following script, prof.sh, samples the thread multiple times and counts the frequency of unique stack traces.
This script automates the sampling and uses awk to normalize memory addresses (replacing them with <address>) so that identical code paths are grouped together during the sort and uniq phase:
#!/bin/bash
P_PID=$1
P_NID=$2
P_SLEEP=${P_SLEEP:-0.5}
P_CNT=${P_CNT:-10}

echo Sampling PID=$P_PID every $P_SLEEP seconds for $P_CNT samples

if [ "$P_NID" == "" ]; then
  CMD="awk '//'"
else
  CMD="awk '/ nid='"$(printf '%#x' $P_NID)"' /,/^$/'"
fi

for i in `seq $P_CNT`; do
  jstack $P_PID | eval $CMD
  sleep $P_SLEEP
done | awk '
  BEGIN { x = 0; s = "" }
  /nid=/ { x = 1; }
  // { if (x == 1) { s = s ", " gensub(/<0x\w*>/, "<address>", "g") } }
  /^$/ { if (x == 1) { print s; x = 0; s = ""; } }
' | sort | uniq -c | sort -n -r | head -10 \
  | sed -e 's/$/\n/g' -e 's/\t/\n\t/g' -e 's/,//g'
Running the script identifies which code path is the "hottest." In the example below, 6 out of 10 samples show the thread blocked on FileInputStream.readBytes, suggesting an I/O or entropy bottleneck:
[oracle@oel6u4-2 test]$ ./prof.sh 7599 7601
Sampling PID=7599 every 0.5 seconds for 10 samples
      6 "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at java.io.FileInputStream.readBytes(Native Method)
        ...
      2 "main" prio=10 tid=0x00007f05c4007000 nid=0x1db1 runnable
        at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:268)
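The core of the script's aggregation is the classic sort | uniq -c | sort -n -r pipeline: identical lines are grouped, counted, and ranked by frequency. A small sketch with made-up frame names shows the mechanics:

```shell
# Frequency-counting pipeline used by prof.sh: group identical
# stack traces (sort), count duplicates (uniq -c), rank by count
# descending (sort -n -r)
printf 'readBytes\nimplNextBytes\nreadBytes\nreadBytes\n' \
  | sort | uniq -c | sort -n -r
# prints a count of 3 for readBytes, then 1 for implNextBytes
```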
With these few simple tools you can understand quite a lot in almost any Java-related situation: by sampling thread dumps you can find the most frequent stack trace, and with that knowledge it is usually easy to understand why an issue is happening. In my test case, the application generates random numbers continuously without pausing, and a single thread occupies one CPU core.