View the per-thread resource usage of a process (15047 is the process PID):
ps -Lp 15047 cu
top -H -p 15047
1. First, find out which processes have high CPU usage, using the command: ps ux
$ps ux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
admin 1502 0.0 0.0 51172 1032 ? S 11:04 0:00 sshd: admin@pts/1
admin 1503 0.0 0.0 68136 1512 pts/1 Ss 11:04 0:00 -bash
admin 1555 0.0 0.0 96640 3356 pts/1 S+ 11:04 0:00 vim jstack15047.12.2
admin 1993 0.0 0.0 51172 1032 ? S 11:06 0:00 sshd: admin@pts/2
admin 1994 0.0 0.0 68136 1492 pts/2 Ss 11:06 0:00 -bash
admin 2038 0.0 0.0 65576 912 pts/2 R+ 11:06 0:00 ps ux
admin 10191 0.2 0.4 670904 23880 ? Sl 09:31 0:13 /usr/alibaba/httpd/bin/httpd -d /home/admin/run/deploy
admin 10756 0.2 0.4 670476 23092 ? Sl 09:32 0:12 /usr/alibaba/httpd/bin/httpd -d /home/admin/run/deploy
admin 14467 0.2 0.4 671700 24436 ? Sl 09:47 0:10 /usr/alibaba/httpd/bin/httpd -d /home/admin/run/deploy
admin 15037 0.0 0.0 65908 1168 ? S Nov30 0:00 /bin/sh /usr/alibaba/jboss/bin/run.sh -Djboss.server.home.dir=/home/admin/run/deploy/../.myjboss -Djboss.server.home.url=file:/home/admi
admin 15047 25.4 42.9 2915448 2252040 ? Sl Nov30 312:31 /usr/alibaba/java/bin/java -Dprogram.name=run.sh -server -Xmx2g -Xms2g -Xmn256m -XX:PermSize=196m -Xss256k -XX:+DisableExplicitGC -XX:+U
admin 15834 0.0 0.0 3840 472 ? S Nov30 0:00 /usr/alibaba/cronolog/sbin/cronolog /home/admin/out/logs/443-error_log.%w
admin 15835 0.0 0.0 3840 480 ? S Nov30 0:00 /usr/alibaba/cronolog/sbin/cronolog /home/admin/out/logs/cookie_logs/%w/cookie_log
admin 15836 0.0 0.0 58900 612 ? S Nov30 0:00 /usr/bin/logger -p local2.info
admin 15837 0.0 0.0 3840 476 ? S Nov30 0:07 /usr/alibaba/cronolog/sbin/cronolog /home/admin/out/logs/jk_logs/%w/mod_jk.log
admin 16316 0.2 0.4 669448 21740 ? Sl 09:53 0:10 /usr/alibaba/httpd/bin/httpd -d /home/admin/run/deploy
admin 27702 0.0 0.0 51320 1060 ? S 10:39 0:00 sshd: admin@pts/0
admin 27703 0.0 0.0 68136 1524 pts/0 Ss+ 10:39 0:00 -bash
2. Check the CPU usage of each thread in the Java process found above (PID 15047), using the command: ps -Lp 15047 cu
[admin@us-escrow-web4.hst.scl.en.alidc.net ~]
$ps -Lp 15047 cu
USER PID LWP %CPU NLWP %MEM VSZ RSS TTY STAT START TIME COMMAND
...... (output omitted)
admin 15047 25491 70.8 285 42.9 2915448 2252032 ? Rl 10:29 22:35 java
admin 15047 25495 71.0 285 42.9 2915448 2252032 ? Rl 10:29 22:34 java
admin 15047 25499 0.0 285 42.9 2915448 2252032 ? Sl 10:29 0:00 java
admin 15047 25500 0.0 285 42.9 2915448 2252032 ? Sl 10:29 0:00 java
admin 15047 25517 0.0 285 42.9 2915448 2252032 ? Sl 10:30 0:00 java
admin 15047 25521 0.0 285 42.9 2915448 2252032 ? Sl 10:30 0:00 java
admin 15047 25540 72.4 285 42.9 2915448 2252032 ? Rl 10:30 22:31 java
admin 15047 25541 0.0 285 42.9 2915448 2252032 ? Sl 10:30 0:00 java
admin 15047 25542 0.0 285 42.9 2915448 2252032 ? Sl 10:30 0:00 java
admin 15047 25741 70.7 285 42.9 2915448 2252032 ? Rl 10:31 21:33 java
admin 15047 25766 0.0 285 42.9 2915448 2252032 ? Sl 10:31 0:00 java
admin 15047 26022 0.0 285 42.9 2915448 2252032 ? Sl 10:31 0:00 java
admin 15047 26032 69.6 285 42.9 2915448 2252032 ? Rl 10:32 20:38 java
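As a convenience for step 2, the busy threads can be pulled out of the listing mechanically rather than by eye. The following is only a minimal sketch and not part of the original procedure: top_threads is a hypothetical helper name, and it assumes the column layout shown above (LWP in column 3, %CPU in column 4).

```shell
# Hypothetical helper (not from the original notes): reads `ps -Lp <pid> cu`
# output on stdin and prints "LWP %CPU" for the N busiest threads, busiest first.
# Assumes the column layout shown above: LWP is column 3, %CPU is column 4.
top_threads() {
    n="${1:-5}"
    # Skip the header line, drop idle threads, sort by %CPU descending.
    awk 'NR > 1 && $4 + 0 > 0 { print $3, $4 }' | sort -k2,2nr | head -n "$n"
}

# Usage (15047 is the PID from the text):
#   ps -Lp 15047 cu | top_threads 5
```

Each LWP printed this way is a candidate for the jstack lookup in the next step.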
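3. Trace into the thread to find the cause of the high load, using the command: jstack 15047.
Take thread 25495 as an example. First convert 25495 to hexadecimal, which gives 6397; this is the value that appears as nid=0x6397 in the jstack output. Then capture the jstack output several times and follow what thread 25495 is doing from one dump to the next.
"ActiveMQ Session Task" prio=10 tid=0x000000004a598000 nid=0x6397 runnable [0x0000000044948000]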
java.lang.Thread.State: RUNNABLE
at Ice.ConnectionI.sendRequest(ConnectionI.java:519)
- locked <0x00002aaac2877ff8> (a Ice.ConnectionI)
at IceInternal.Outgoing.invoke(Outgoing.java:72)
at AliIMInterface._WWMessageInterfaceDelM.SendNotifyMessage(_WWMessageInterfaceDelM.java:36)
at AliIMInterface.WWMessageInterfacePrxHelper.SendNotifyMessage(WWMessageInterfacePrxHelper.java:40)
at AliIMInterface.WWMessageInterfacePrxHelper.SendNotifyMessage(WWMessageInterfacePrxHelper.java:18)
"ActiveMQ Session Task" prio=10 tid=0x000000004a598000 nid=0x6397 runnable [0x0000000044948000]
java.lang.Thread.State: RUNNABLE
at IceInternal.Outgoing.invoke(Outgoing.java:72)
at AliIMInterface._WWMessageInterfaceDelM.SendNotifyMessage(_WWMessageInterfaceDelM.java:36)
at AliIMInterface.WWMessageInterfacePrxHelper.SendNotifyMessage(WWMessageInterfacePrxHelper.java:40)
at AliIMInterface.WWMessageInterfacePrxHelper.SendNotifyMessage(WWMessageInterfacePrxHelper.java:18)
"ActiveMQ Session Task" prio=10 tid=0x000000004a598000 nid=0x6397 runnable [0x0000000044947000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
- locked <0x00002aaab53435e8> (a IceInternal.LocalExceptionWrapper)
at java.lang.Throwable.<init>(Throwable.java:181)
at java.lang.Exception.<init>(Exception.java:29)
at IceInternal.LocalExceptionWrapper.<init>(LocalExceptionWrapper.java:16)
at Ice.ConnectionI.sendRequest(ConnectionI.java:530)
- locked <0x00002aaac2877ff8> (a Ice.ConnectionI)
at IceInternal.Outgoing.invoke(Outgoing.java:72)
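The decimal-to-hex conversion and the lookup by nid used in step 3 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original author's tooling: lwp_to_nid and thread_stack are hypothetical helper names, and thread_stack assumes the dump separates threads with blank lines, as jstack output normally does.

```shell
# Hypothetical helper: convert a decimal LWP id (from ps -L) into the
# hexadecimal form that jstack prints in the nid= field.
lwp_to_nid() {
    printf '0x%x\n' "$1"
}

# Hypothetical helper: read a jstack dump on stdin and print only the
# block belonging to the given decimal LWP id.
# Assumes thread blocks are separated by blank lines (awk paragraph mode).
thread_stack() {
    nid="$(lwp_to_nid "$1")"
    # The trailing space keeps 0x6397 from also matching 0x63971.
    awk -v RS='' -v pat="nid=$nid " 'index($0, pat)'
}

# Usage (PID and LWP taken from the text):
#   lwp_to_nid 25495                  # prints 0x6397
#   jstack 15047 | thread_stack 25495
```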
4. Reading the code path shown by jstack alongside the available source code is usually enough to locate where the infinite loop is.
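Steps 3 and 4 depend on having several dumps taken over time, so that the same thread can be compared across them. A minimal sketch of collecting such dumps is shown here. The helper name sample_stacks, the JSTACK_CMD override, and the output file naming are all assumptions made for illustration, not something the original notes prescribe.

```shell
# Hypothetical helper: take COUNT dumps of PID, INTERVAL seconds apart,
# writing each one to jstack<PID>.<i> in the current directory.
# JSTACK_CMD is an assumed override so another dump command can be swapped in;
# it defaults to the real jstack tool.
sample_stacks() {
    pid="$1"
    count="${2:-3}"
    interval="${3:-5}"
    i=1
    while [ "$i" -le "$count" ]; do
        ${JSTACK_CMD:-jstack} "$pid" > "jstack$pid.$i"
        # Wait between dumps, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Usage (15047 is the PID from the text): three dumps, five seconds apart.
#   sample_stacks 15047 3 5
```

If the same nid shows the same few frames in every file, as thread 0x6397 does above, that code path is the likely location of the loop.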