Solaris 10 Performance Tuning – Analyzing CPU

Tools:
psrinfo
uptime
vmstat
sar -u
sar -q (CPU run queue, same as vmstat kthr:r)
mpstat
prstat
dtrace



$ psrinfo -vp
The physical processor has 8 virtual processors (32-39)
  SPARC64-VII (portid 1056 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (40-47)
  SPARC64-VII (portid 1064 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (48-55)
  SPARC64-VII (portid 1072 impl 0x7 ver 0x91 clock 2400 MHz)
The physical processor has 8 virtual processors (56-63)
  SPARC64-VII (portid 1080 impl 0x7 ver 0x91 clock 2400 MHz)



$ uptime
1:04pm up 114 day(s), 14:15, 6 users, load average: 4.41, 4.39, 3.76

The load averages are computed over 1-, 5-, and 15-minute intervals.



$ vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 sd sd in sy cs us sy id
0 0 0 107752976 11913152 74 296 901 2 2 0 0 2 2 -0 -349 5960 727992 5910 9 2 89
0 0 0 106811152 8983688 8 31 2 0 0 0 0 0 0 0 2 8818 615705 13764 13 2 85
0 0 0 106701904 8884808 0 14 0 0 0 0 0 0 0 0 2 14250 584633 24692 12 3 85
0 0 0 106535552 8730784 9 87 0 0 0 0 0 9 9 0 0 9247 379414 14642 12 2 87
0 0 0 106403648 8603144 0 10 0 0 0 0 0 0 0 0 0 6668 550100 9664 12 2 87
0 0 0 106296328 8504712 0 5 0 0 0 0 0 0 0 0 0 10490 471906 17677 12 2 85
0 0 0 106277472 8510232 1 3 6 0 0 0 0 0 0 0 0 12011 535986 20477 11 2 87
0 0 0 106277472 8527712 3 37 0 0 0 0 0 0 0 0 0 13685 668057 23689 9 2 89


The first line is the summary since boot. vmstat reads its statistics from kstat,
which maintains CPU utilization statistics for each CPU.

kthr:r : the total number of threads on the ready-to-run queues; a sustained nonzero value indicates CPU saturation.

faults:in : number of interrupts per second
faults:sy : number of system calls per second
faults:cs : number of context switches per second

cpu:us : percent user time (time spent running user-mode threads)
cpu:sy : percent system time
cpu:id : percent idle
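
Since vmstat reads these from kstat, the same raw counters can be inspected directly. A minimal sketch, assuming the standard cpu_stat kstat module (idle, user, and kernel are the cumulative tick counters that vmstat converts into percentages):

$ kstat -p -m cpu_stat -s idle

Running it twice over an interval and differencing the values reproduces vmstat's per-interval percentages for each CPU.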


sar : enable the sar service if it is not already online.

$ svcs -a | grep sar
online Jan_22 svc:/system/sar:default
$ svcs -l svc:/system/sar:default
fmri svc:/system/sar:default
name system activity reporting package
enabled true
state online
next_state none
state_time Fri Jan 22 22:08:22 2010
logfile /var/svc/log/system-sar:default.log
restarter svc:/system/svc/restarter:default
dependency require_all/none svc:/system/filesystem/minimal (online)
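
If the service is disabled, it can be enabled through SMF (the FMRI comes from the svcs output above):

$ svcadm enable svc:/system/sar:default

Note that the periodic sa1/sa2 collection entries in the sys crontab are usually commented out by default and must be uncommented for historical sar data to accumulate.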



$ sar 5 10

SunOS betapm 5.10 Generic_142900-03 sun4u 05/17/2010

13:07:48 %usr %sys %wio %idle
13:07:53 13 4 0 84
13:07:58 11 4 0 85
13:08:03 9 3 0 88
13:08:08 9 3 0 88
13:08:13 11 4 0 85
13:08:18 11 3 0 86
13:08:23 10 3 0 86
13:08:28 9 3 0 88
13:08:33 9 3 0 88
13:08:38 11 3 0 86

Average 10 3 0 86


%usr vs %sys : as a rule of thumb, a healthy split is roughly 70% user to 30% system; a larger system share suggests time is going to the kernel (system calls, locks, I/O) rather than application work.

%wio is always reported as 0 in Solaris 10; wait-for-I/O accounting was removed from the kernel, and that time now simply appears as idle.


$ sar -q 2 10

SunOS betapm 5.10 Generic_142900-03 sun4u 05/17/2010

13:11:34 runq-sz %runocc swpq-sz %swpocc
13:11:36 1.0 50 0.0 0
13:11:38 0.0 0 0.0 0
13:11:40 0.0 0 0.0 0
13:11:42 0.0 0 0.0 0
13:11:44 0.0 0 0.0 0
13:11:46 0.0 0 0.0 0
13:11:48 0.0 0 0.0 0
13:11:50 1.0 50 0.0 0
13:11:52 0.0 0 0.0 0
13:11:54 0.0 0 0.0 0

Average 1.0 10 0.0 0


runq-sz: run queue length, same metric as vmstat kthr:r

%runocc: the percentage of time the run queue was occupied; useful for spotting short bursts of run queue activity

swpq-sz: the number of swapped-out threads

%swpocc: swap queue occupancy, the percentage of time there were swapped-out threads
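
Because %runocc exposes bursts that a long interval averages away, sampling at a shorter interval makes them easier to catch, e.g.:

$ sar -q 1 30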


$ mpstat 3

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
32 0 0 18 460 126 257 0 3 10 0 2858 0 0 0 99
33 0 0 14 131 34 180 1 7 49 0 740 5 1 0 94
34 0 0 5 134 45 161 1 6 3 0 879 0 0 0 100
35 0 0 60 78 33 93 0 7 54 0 457 0 0 0 100
36 0 0 3 199 91 204 0 4 2 0 1125 0 0 0 100
37 0 0 105 49 13 55 0 3 17 0 164 0 0 0 100
38 0 0 2 124 56 126 0 2 1 0 1716 0 0 0 99
39 30 0 21 5 0 7 0 2 4 0 13 0 0 0 100
40 0 0 1 6 2 3 0 1 0 0 50 0 0 0 100
41 0 0 0 3 0 5 0 3 5 0 0 0 0 0 100
42 0 0 10 22 10 0 0 0 0 0 0 0 0 0 100
43 0 0 1 7 1 10 0 5 12 0 1 0 0 0 100
44 0 0 0 2 0 0 0 0 0 0 0 0 0 0 100
45 0 0 1 6 0 8 0 4 7 0 2 0 0 0 100
46 0 0 29 68 30 0 6 0 510 0 795891 72 28 0 0
47 0 0 2 5 1 2 0 1 4 0 0 0 0 0 100
48 0 0 26 346 153 331 1 6 8 0 4467 1 0 0 99
49 0 0 0 3 0 7 0 4 8 0 0 0 0 0 100
50 0 0 53 129 92 22 0 0 3 0 8 0 0 0 100
51 0 0 65 131 42 142 0 7 61 0 568 1 0 0 99
52 0 0 23 878 671 382 0 4 22 0 3765 1 2 0 97
53 0 0 43 43 32 15 0 3 9 0 0 0 0 0 100
54 0 0 0 22 6 27 0 0 0 0 39 0 0 0 100
55 0 0 91 237 108 241 0 6 11 0 213 0 0 0 100
56 0 0 38 303 143 231 0 5 2 0 3360 1 1 0 99
57 0 0 8 59 14 81 0 2 19 0 352 3 0 0 97
58 0 0 1 5 1 3 0 1 0 0 7 0 0 0 100
59 0 0 9 61 18 57 0 3 13 0 62 0 1 0 99
60 0 0 23 212 96 183 1 6 2 0 1159 0 0 0 99
61 0 0 25 134 38 169 0 5 81 0 921 5 0 0 95
62 0 0 0 19 3 27 0 0 2 0 27 0 0 0 100
63 0 0 6 62 22 62 0 6 7 0 271 0 0 0 100


csw : the number of context switches (both voluntary and involuntary); a voluntary
switch occurs when a thread performs a blocking system call, e.g. an I/O, and
sleeps until the event completes.

icsw: the number of involuntary context switches; a high value indicates there
were more threads ready to run than available physical processors.

migr: the number of thread migrations between processors, i.e. how often the
scheduler moves a ready-to-run thread to an idle processor. The scheduler
normally keeps a thread on the processor it last ran on; if that processor is
busy, the thread migrates. Migration is bad for performance because the thread
loses its warm CPU cache.

intr: the number of interrupts taken on the CPU.

ithr: the number of interrupts handled as threads; an interrupt handler is
converted to a full thread when it blocks on a mutex or synchronization event,
e.g. while processing inbound network packets.


smtx: the number of kernel mutex contention events (a CPU failed to acquire a mutex on the first attempt); high values show up as high %sys time.

srw : the number of reader-writer lock contention events in the kernel.
Excessive reader-writer lock contention results in nonscaling performance and
systems that cannot use all the available CPU resources (the symptom is idle time).
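
When smtx or srw is high, lockstat(1M) can identify which kernel locks are contended. A minimal sketch, recording lock contention events (lockstat's default mode) while sleep runs for 5 seconds:

$ lockstat sleep 5

The report lists the hottest adaptive mutexes and reader-writer locks along with their callers.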


$ mpstat -ap 5 20
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 183 1 1186 874 207 1104 7 158 156 0 4229 0 1 0 99 16
1 112 104 2329 5085 2222 4805 215 710 1958 187 723869 17 4 0 80 16
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 107 0 1505 995 323 1128 5 78 368 0 3755 2 1 0 97 16
1 0 0 374 3011 1364 2399 7 68 35 0 38155 0 0 0 99 16
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 4 0 244 1010 322 1170 4 74 304 0 3549 2 0 0 98 16
1 0 0 472 2907 1313 2303 5 62 25 0 45064 0 0 0 99 16
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 1 0 290 990 323 1156 4 79 322 0 3577 2 1 0 97 16
1 0 0 265 2600 1119 2080 6 51 37 0 48610 1 0 0 99 16
SET minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl sze
0 74 0 715 1074 323 1275 6 90 287 0 3816 3 1 0 96 16
1 0 0 325 2695 1203 2077 6 67 31 0 59557 1 0 0 99 16
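
With -a, mpstat aggregates the per-CPU rows by processor set, and the sze column shows the number of CPUs in each set. The sets themselves can be listed with psrset(1M):

$ psrset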


$ prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
29302 sybase 11G 11G sleep 59 0 195:35:53 0.1% dataserver/259
29347 sybase 11G 11G sleep 59 0 151:18:42 0.1% dataserver/259
29348 sybase 11G 11G sleep 59 0 153:15:41 0.0% dataserver/259
29340 sybase 11G 11G sleep 59 0 147:23:56 0.0% dataserver/259
29341 sybase 11G 11G sleep 59 0 153:08:31 0.0% dataserver/259
29342 sybase 11G 11G sleep 59 0 149:55:02 0.0% dataserver/259
29346 sybase 11G 11G sleep 59 0 149:56:52 0.0% dataserver/259
29334 sybase 11G 11G sleep 59 0 153:15:53 0.0% dataserver/259
29349 sybase 11G 11G sleep 59 0 182:37:54 0.0% dataserver/259
29343 sybase 11G 11G sleep 59 0 148:44:13 0.0% dataserver/259
29345 sybase 11G 11G sleep 59 0 152:04:00 0.0% dataserver/259
29333 sybase 11G 11G sleep 59 0 154:10:37 0.0% dataserver/259
29332 sybase 11G 11G sleep 59 0 146:54:46 0.0% dataserver/259
29335 sybase 11G 11G sleep 59 0 156:39:37 0.0% dataserver/259
29344 sybase 11G 11G sleep 59 0 144:58:21 0.0% dataserver/259
1540 sybase 14M 11M sleep 59 0 0:00:00 0.0% xpserver/1
29320 sybase 10M 9152K sleep 59 0 0:00:15 0.0% backupserver/1
28892 sybase 4296K 2136K sleep 59 0 0:00:06 0.0% SYBASE/1
29304 sybase 10M 6872K sleep 59 0 0:00:00 0.0% backupserver/1
28891 sybase 3440K 1880K sleep 59 0 0:00:06 0.0% lmgrd/1
Total: 20 processes, 3890 lwps, load averages: 1.02, 1.73, 2.12

Quick notes for prstat:
The CPU field is the thread's pr_pctcpu value.
The load averages are the same as those reported by 'uptime'.
When the load average exceeds the number of CPUs in the system (psrinfo),
the system is very likely overloaded; a quick check follows below.
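
A minimal sketch of that check (psrinfo prints one line per virtual processor):

$ psrinfo | wc -l
$ uptime

If the 1-minute load average stays above the count reported by psrinfo, runnable threads are queuing for CPU.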

$ prstat -mL ; turn on microstate accounting ('-m') and report per-thread statistics ('-L')
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
14853 sjing 21 72 0.7 0.0 0.0 0.0 6.5 0.0 73 73 78K 0 prstat/1
12047 betapm 34 0.8 0.0 0.0 0.0 0.0 65 0.1 773 21 12K 0 MathKernel/1
29302 sybase 22 4.5 0.0 0.0 0.0 0.0 74 0.0 227 71 .3M 9 dataserver/1
29349 sybase 20 3.7 0.0 0.0 0.0 0.0 76 0.0 234 75 .2M 7 dataserver/1
29335 sybase 17 3.4 0.0 0.0 0.0 0.0 79 0.0 240 67 .2M 7 dataserver/1
29333 sybase 17 3.4 0.0 0.0 0.0 0.0 79 0.0 242 67 .2M 8 dataserver/1
29341 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 244 66 .2M 9 dataserver/1
29348 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 243 67 .2M 8 dataserver/1
29334 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 242 67 .2M 7 dataserver/1
29345 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 242 66 .2M 7 dataserver/1
29347 sybase 17 3.1 0.0 0.0 0.0 0.0 80 0.0 243 67 .2M 8 dataserver/1
29342 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 245 66 .2M 11 dataserver/1
29346 sybase 17 3.2 0.0 0.0 0.0 0.0 80 0.0 244 67 .2M 8 dataserver/1
29343 sybase 17 3.0 0.0 0.0 0.0 0.0 80 0.0 244 67 .2M 8 dataserver/1
29340 sybase 17 3.1 0.0 0.0 0.0 0.0 80 0.0 245 66 .2M 8 dataserver/1
Total: 141 processes, 4479 lwps, load averages: 0.87, 1.24, 1.83
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
29302 sybase 10 4.5 0.0 0.0 0.0 0.0 85 0.0 367 9 .3M 126 dataserver/1
29347 sybase 9.9 4.0 0.0 0.0 0.0 0.0 86 0.0 373 8 .2M 127 dataserver/1
12047 betapm 9.1 0.4 0.0 0.0 0.0 0.0 90 0.1 536 6 8K 0 MathKernel/1
29348 sybase 1.0 0.4 0.0 0.0 0.0 0.0 99 0.0 303 1 29K 12 dataserver/1
29340 sybase 0.8 0.4 0.0 0.0 0.0 0.0 99 0.0 302 2 26K 9 dataserver/1
14853 sjing 0.3 0.8 0.0 0.0 0.0 0.0 99 0.0 17 1 727 0 prstat/1
14831 dbop 0.0 0.9 0.0 0.0 0.0 0.0 99 0.1 826 2 1K 0 find/1
29341 sybase 0.6 0.3 0.0 0.0 0.0 0.0 99 0.0 297 1 19K 7 dataserver/1
29342 sybase 0.6 0.2 0.0 0.0 0.0 0.0 99 0.0 298 0 17K 4 dataserver/1
29349 sybase 0.5 0.2 0.0 0.0 0.0 0.0 99 0.0 298 1 16K 5 dataserver/1
12063 betapm 0.2 0.4 0.0 0.0 0.0 0.0 99 0.0 505 1 6K 0 MathAddon/1
29346 sybase 0.4 0.2 0.0 0.0 0.0 0.0 99 0.0 297 1 14K 4 dataserver/1
29333 sybase 0.4 0.2 0.0 0.0 0.0 0.0 99 0.0 297 0 11K 3 dataserver/1
29334 sybase 0.4 0.2 0.0 0.0 0.0 0.0 99 0.0 299 0 12K 3 dataserver/1
29335 sybase 0.4 0.2 0.0 0.0 0.0 0.0 99 0.0 297 1 9K 2 dataserver/1
Total: 141 processes, 4479 lwps, load averages: 0.86, 1.23, 1.82

LAT : (latency) the percentage of time the thread spent waiting for a CPU.
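
To see which processes are consuming the CPU cycles (as opposed to waiting on a run queue, per LAT), DTrace can sample on-CPU activity. A classic one-liner using the profile provider at 997 Hz, counting samples by process name (Ctrl-C prints the aggregation):

$ dtrace -n 'profile-997 { @[execname] = count(); }'

Each process's sample count approximates its share of on-CPU time during the run.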