Monday, October 04, 2010
Top 10 Java Issues in Production
10 Heap/network issues
Tools: dtrace, hprof, introscope, jconsole, visualvm, yourkit, azul zvision
Invasive tools : bci, jvmti, jvmdi
OS Tools : dtrace, osprofile, vtune
Network/DISK tools: ganglia, iostat, lsof, nagios, netstat, snoop
Gotcha:
- instrumentation is not cheap
- avoid expensive heap walks
- thread dumps
- asynchronous logging
- finish task and then increment performance counter
- jconsole is cheap
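The "finish task and then increment performance counter" advice can be sketched as below. This is a minimal illustration, not code from the talk: the class name TaskCounters and its methods are hypothetical, and LongAdder is just one low-contention counter choice.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper for illustration; names are not from the original notes.
final class TaskCounters {
    private final LongAdder completed = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    /** Runs the task first, then records counters once it has finished. */
    void runAndCount(Runnable task) {
        long start = System.nanoTime();
        task.run();                               // do the real work first
        long elapsed = System.nanoTime() - start;
        completed.increment();                    // counters are touched only after the task is done
        totalNanos.add(elapsed);
    }

    long completedTasks() { return completed.sum(); }

    long totalNanos() { return totalNanos.sum(); }

    public static void main(String[] args) {
        TaskCounters counters = new TaskCounters();
        for (int i = 0; i < 3; i++) {
            counters.runAndCount(() -> Math.sqrt(42.0));
        }
        System.out.println("completed=" + counters.completedTasks());
    }
}
```

Counters like these can be read by a monitoring thread at any time; a task that throws is simply not counted in this sketch.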
9 Leaks
symptoms :
- app consumes all memory
- heap usage trends upward in a ramping sawtooth
- finally Out of Memory
Tools : yourkit, hprof, eclipse mat, jconsole, jhat, jps, visualvm, azul zvision
Theory:
- Allocated vs. live objects
- Finalizers, Classloaders
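One cheap way to watch for the ramping sawtooth, without a heap walk, is to sample used heap through the standard MemoryMXBean and check whether the low points keep rising. The sketch below is an assumption-laden illustration: the class LiveHeapSampler and the "strictly rising floor" rule are hypothetical, and a real check would take its samples right after garbage collections.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper for illustration; the heuristic is deliberately crude.
final class LiveHeapSampler {
    /** Current used heap in bytes; this includes garbage that is not yet collected. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /**
     * Returns true when every sample is larger than the one before it.
     * Fed with used-heap values taken after each GC, a rising floor
     * suggests that the live set (not just allocation) is growing.
     */
    static boolean floorIsRising(long[] postGcUsedBytes) {
        if (postGcUsedBytes.length < 2) {
            return false;
        }
        for (int i = 1; i < postGcUsedBytes.length; i++) {
            if (postGcUsedBytes[i] <= postGcUsedBytes[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("used heap bytes: " + usedHeapBytes());
        System.out.println("rising: " + floorIsRising(new long[] {100, 150, 210}));
    }
}
```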
8 I/O
I/O Serialization
Symptoms :
- Multi-node scale-out does not scale linearly
- Time is spent in both CPU and I/O
Tools :
- cpu profiling
- io profiling
Solution
- pick a high performance serialization library
- Avro, Kryo
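To see why the choice of serialization format matters, the sketch below compares default Java serialization with a hand-written compact encoding of the same object. It does not use any of the libraries named above (they are external dependencies); DataOutputStream stands in for a compact format, and the Point class is a hypothetical example type.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Illustrative comparison only; a real system would use a serialization library.
final class SerializationSize {
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Default Java serialization: writes a stream header and class metadata plus the fields. */
    static byte[] javaSerialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Compact stand-in encoding: writes only the two int fields (8 bytes). */
    static byte[] compactSerialize(Point p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("java serialization bytes: " + javaSerialize(p).length);
        System.out.println("compact encoding bytes:   " + compactSerialize(p).length);
    }
}
```

The per-object overhead of the default format is paid in both CPU and network bytes on every node-to-node call, which is one plausible reason scale-out stops being linear.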
I/O Limits/Tuning
Symptoms:
- Too many file descriptors, cursors etc
- inconsistent response times
Tools:
- Nagios
- pkg
- rpm
- info
- ulimit
- yum
Solution
- check OS patches
- check user and process limits
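Process limits can also be checked from inside the JVM. The sketch below reads the open and maximum file descriptor counts through com.sun.management.UnixOperatingSystemMXBean, which is assumed to be available (HotSpot-style JVMs on Unix-like systems); on other platforms it returns null. The class name FileDescriptorCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; relies on a JDK-specific management interface.
final class FileDescriptorCheck {
    /** Returns {open, max} descriptor counts, or null when the JVM does not expose them. */
    static long[] openAndMaxDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, or the bean is not supported
    }

    public static void main(String[] args) {
        long[] counts = openAndMaxDescriptors();
        if (counts == null) {
            System.out.println("file descriptor counts not available on this JVM");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

Logging these two numbers periodically shows how close the process is running to the limit reported by ulimit.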
I/O Sockets, Files, DB
Symptoms:
- socket open/close takes a long time
- JRMP timeouts, long JDBC calls
- running out of files, cursors
Tools:
- dbms tools, du, iostat, gmon, lsof, netstat
Workaround
- ping/telnet tests
7 Locks and Synchronization
Symptoms:
- Adding users/threads/CPUs causes slow down
- High lock acquire times and contention
- race conditions, dead locks
- I/O under load
Tools:
- dtrace, lockstat, azul zvision
- thread dumps
- IBM visual analyzer ( j.u.c in eclipse )
Solution:
- Use non-blocking collections
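As one example of a non-blocking collection from the JDK, the sketch below has several threads append to a java.util.concurrent.ConcurrentLinkedQueue, which is lock-free, instead of a synchronized list. The class name and the thread and item counts are hypothetical; the notes do not say which collections the speaker had in mind.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch; the collection choice is an assumption.
final class NonBlockingQueueDemo {
    /** Has each thread offer itemsPerThread items and returns the final queue size. */
    static int produce(int threads, int itemsPerThread) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < itemsPerThread; i++) {
                    queue.offer(i); // lock-free append, no monitor to contend on
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("items queued: " + produce(4, 1000));
    }
}
```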
6 Endless Compilation
symptoms:
- Time in compiling
- Time in interpreter
Tools:
- -XX:+PrintCompilation
- CPU profiler
5 Endless Exceptions
symptoms:
- Application spending time filling stack trace
tools:
- CPU profiler, zvision
- thread dumps
- Track caller/callee
- repeated kill -3
Solution:
- don't throw; return a value instead
- JVMs don't optimize exception paths
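A small sketch of "return instead of throw": a parser on a hot path that reports bad input through an empty OptionalInt rather than a NumberFormatException, so no stack trace is filled in. The class and method names are hypothetical, and the nine-digit cap is an assumption made here to rule out int overflow.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration.
final class ParseWithoutThrowing {
    /** Parses up to nine ASCII digits; returns empty for anything else instead of throwing. */
    static OptionalInt tryParseNonNegative(String text) {
        if (text == null || text.isEmpty() || text.length() > 9) {
            return OptionalInt.empty();
        }
        int value = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty(); // bad input is an ordinary result, not an exception
            }
            value = value * 10 + (c - '0');
        }
        return OptionalInt.of(value);
    }

    public static void main(String[] args) {
        System.out.println(tryParseNonNegative("123"));
        System.out.println(tryParseNonNegative("12x"));
    }
}
```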
4 Fragmentation
Symptoms:
- performance degrades over time
- "Full GC" makes problem go away
- Lot of free memory, but in tiny fragments
Tools:
- GC logging flags, for CMS -XX:PrintFLSStatistics=2 -XX:+PrintCMSInitializationStatistics
- Fragger
Solution:
- Upgrade to latest CMS
- Azul Zing and its generational pauseless GC (GPGC)
- pooling similar sized and similar aged objects together
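The pooling idea can be sketched as a pool that hands out buffers of a single fixed size, so that freed and reused objects are always interchangeable. This is a minimal illustration under assumptions: the class FixedSizeBufferPool is hypothetical, the pool is unbounded, and buffers are not cleared between uses.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical pool for illustration; one pool per buffer size.
final class FixedSizeBufferPool {
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    FixedSizeBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Reuses a pooled buffer when one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] buffer = free.poll();
        return buffer != null ? buffer : new byte[bufferSize];
    }

    /** Returns a buffer to the pool; buffers of any other size are ignored. */
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.offer(buffer);
        }
    }

    public static void main(String[] args) {
        FixedSizeBufferPool pool = new FixedSizeBufferPool(4096);
        byte[] first = pool.acquire();
        pool.release(first);
        System.out.println("reused: " + (pool.acquire() == first));
    }
}
```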
3 GC Tuning
Symptoms:
- Entropy(GC) = number of GC flags
- Too many free parameters
- 64-bit / large heap is no solution
- Constant 40%-60% CPU utilization by GC
- Scheduled reboot before full GC
- Full-time engineer working on GC flags
Workarounds:
- Ask JVM vendor for one flag solution
- G1 GC, Zing GPGC
2 Spikes in Load
Symptoms:
- Rush hour traffic, tax day, black friday
- outage under spikes, power law of networks
Solution:
- Measure
- Test with realistic load and realistic multi-node setup
- build redundancy
1 Versionitis
Symptoms:
- different nodes have different configuration, stack components, versions
- classpath dist/* , -verbose
- hard to reproduce
Solutions:
- Method
- Version control
- rigor
0 Collapsing under load
j.u.c (java.util.concurrent) profiling