System Utilization Monitoring Tools

System and Application Performance, Monitoring, and Tuning

Brought to you by Sun Microsystems and the engineers from

Solaris Performance Monitoring Tools: System Utilization

This page helps you pick the right monitoring tool based on the system component you want to observe.

This section discusses the following tools:

Name | Level | Use Case | UI Tech. | Access | Comments
uptime | beginner | initial assessment of past load | command line | any Solaris | The tool to start any system utilization assessment
perfbar | intermediate | analyze multi proc. (CMT!) systems visually | graphical | separate download | a must for any CMT scaling test. Very intuitive
cpubar | advanced | detailed visual analysis of load | graphical | separate download | much more visual information compared to perfbar
vmstat | advanced | analyze system load, io, paging and more | command line | any Solaris | the standard tool to get a detailed understanding of the system utilization
mpstat | advanced | system utilization at processor level | command line | any Solaris | a must for CMT systems. What happens at a per processor level?
zonestat | advanced | get statistics at a zone level | command line | separate download | Very useful in virtualized environments

uptime - Print Load Average

This standard Solaris tool (/bin/uptime) is the easiest way to get an overview of:

  • how long the system has been running.
  • the current CPU load averages.
  • how many users are logged in.

uptime - print CPU load averages

The numbers printed to the right of "load average: " are the 1-, 5- and 15-minute load averages of the system. The load average measures the number of runnable and running threads, so it has to be put in relation to the number of active CPUs in the system. For example, a load average of three (3) on a single-CPU system indicates some CPU overloading, while the same load average on a thirty-two (32) way system indicates a lightly loaded system.
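The relation between load average and CPU count can be sketched in a few lines of Python. This is only an illustration: the function names and the sample line are made up for this example, and the exact layout of the uptime output may differ between releases.

```python
import re

def parse_load_averages(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from one line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load averages found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

def load_per_cpu(load_average, active_cpus):
    """Normalise a load average by the number of active CPUs."""
    if active_cpus < 1:
        raise ValueError("active_cpus must be at least 1")
    return load_average / active_cpus

# Hypothetical sample line, not captured from a real system.
sample = "  2:15pm  up 12 day(s),  3:04,  2 users,  load average: 3.00, 2.50, 1.75"

one_min, five_min, fifteen_min = parse_load_averages(sample)
print(load_per_cpu(one_min, 1))    # single-CPU system: 3.0 threads per CPU
print(load_per_cpu(one_min, 32))   # 32-way system: 0.09375 threads per CPU
```

A value well above one per CPU suggests that threads are queuing for a processor; a value well below one suggests spare capacity.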

perfbar [Tools CD] - A lightweight CPU Meter

perfbar is a tool that displays a bar graph that color-codes system activity. The colors are as follows:

  • Blue = idle time.
  • Red = system time.
  • Green = user time.
  • Yellow = I/O wait (obsolete on Solaris 10 and later).

perfbar - sample output of a system with 16 CPU cores


perfbar has been enhanced in version 1.2 to provide better support for servers with many CPUs through a multi-line visualisation, as in the following example.
perfbar: Visualisation of a Sun T5240 System with 128 strands (execution units) without any load

perfbar can be called without specifying any command line arguments. It provides a large number of options, which can be viewed with the -h option:

$ perfbar -h

perfbar 1.2 is being maintained by Ralph Bogendörfer based on the original perfbar by: Joe Eykholt, George Cameron, Jeff Bonwick, Bob Larson

Usage: perfbar [X-options] [tool-options]
supported X-options:
    -display <display> or -disp <display>
    -geometry <geometry> or -geo <geometry>
    -background <background> or -bg <background>
    -foreground <foreground> or -fg <foreground>
    -font <font> or -fn <font>
    -title <title> or -t <title>
    -iconic or -icon
    -decoration or -deco
supported tool-options:
    -h, -H, -? or -help: this help
    -v or -V: verbose
    -r or -rows: number of rows to display, default 1
    -bw or -barwidth: width of CPU bar, default 12
    -bh or -barheight: height of CPU bar, default 180
    -i or -idle: idle color, default blue
    -u or -user: user color, default green
    -s or -system: system color, default red
    -w or -wait: wait color, default yellow
    -int or -interval: interval for display updates (in ms),default 100
    -si or -statsint: interval for stats updates (in display intervals), default 1
    -avg or -smooth: number of values for average calculation, default 8

There are also a number of key strokes understood by the tool:

  • Q or q: Quit
  • R or r: Resize - this changes the window to the default size according to the number of CPU bars, rows and the chosen bar width and height.
  • Number keys 1 - 9: Display this number of rows.
  • + and -: Increase or decrease number of rows displayed.

perfbar is currently available as a beta in version 1.2. This latest version is not yet part of the Performance Tools CD 3.0. The engineers from the Sun Solution Center in Langen, Germany made it available for free as a separate download.

cpubar [Tools CD] - A CPU Meter, showing Swap, and Run Queue

cpubar displays one bar graph for each processor, with the processor speed(s) displayed on top. Each bar graph is divided into four areas (top to bottom):

  • Blue - CPU is available.
  • Yellow - CPU is waiting for one or more I/O to complete (N/A on Solaris 10 and later).
  • Red - CPU is running in kernel space.
  • Green - CPU is running in user space.

As with netbar and iobar, a red marker and a dashed black-and-white marker show the maximum and average used ratios, respectively.

The bar graphs labeled 'r', 'b' and 'w' display the run, blocked and wait queues. A non-empty wait queue is usually a symptom of a previous persistent RAM shortage. The total number of processes is displayed on top of these three bars.

The bar graph labeled 'p/s' displays the process creation rate per second.

The bar graph labeled 'RAM' displays the RAM usage (red=kernel, yellow=user, blue=free); the total RAM is displayed on top.

The bar graph labeled 'sr' displays the scan rate on a logarithmic scale (a high scan rate is usually a symptom of RAM shortage).

The bar graph labeled 'SWAP' displays the SWAP (a.k.a. Virtual Memory) usage (red=used, yellow=reserved, blue=free); the total SWAP space is displayed on top.

cpubar-sample output

vmstat - System Glimpse

The vmstat tool provides a glimpse of the current system behavior in a one-line summary that covers both CPU utilisation and saturation. vmstat is part of the standard Solaris shipment (/bin/vmstat).

In its simplest form, the command vmstat <interval> (e.g. vmstat 5) reports one line of statistics every <interval> seconds. The first line can be ignored because it is the summary since boot; all other lines report statistics for the samples taken every <interval> seconds. The underlying statistics collection mechanism is based on kstat (see kstat(1)).

Let's run two (2) copies of a CPU-intensive application (cc_usr) and look at the output of vmstat 5. First start the two instances of the cc_usr program.
two (2) instances of cc_usr started
Now let's run vmstat 5 and watch its output.

vmstat 5
First observe the cpu:id column which represents the system idle time (here 0%). Then look at the kthr:r column which represents the total number of runnable threads on dispatcher queues (here 1).
From this simple experiment, one can conclude that the system idle time for the five-second samples was always 0, indicating 100% utilisation. On the other hand, kthr:r was mostly one and sustained, indicating modest saturation for this single-CPU system (remember we launched two (2) CPU-intensive applications).
A couple of notes with regard to CPU utilisation:

  • 100% utilisation may be fine for your system. Think about a high-performance computing job: the aim will be to maximise utilisation of the CPU.
  • Values of kthr:r greater than zero indicate some CPU saturation (i.e. more jobs would like to run but cannot because no CPU is available). However, performance degradation should be gradual.
  • The sampling interval is important. Too short an interval produces noisy samples, while too long an interval averages away short bursts of load.

vmstat reports some additional information that can be of interest, such as:

Column Comments
in Number of interrupts per second.
sy Number of system calls per second (faults group).
cs Number of context switches per second (both voluntary and involuntary).
us Percent user time: time the CPUs spent processing user-mode threads.
sy Percent system time (cpu group): time the CPUs spent processing system calls on behalf of user-mode threads, plus the time spent processing kernel threads.
id Percent idle time: time the CPUs spent waiting for runnable threads.
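The reading of the vmstat columns described above can be sketched as a small Python parser. It is a minimal illustration under a few assumptions: the function names are invented here, the sample text is made up rather than captured from a real system, and the parser takes the first three fields as the kthr columns and the last six as the faults and cpu columns so that it does not depend on how many disk columns are printed.

```python
def parse_vmstat(text):
    """Parse `vmstat <interval>` output into a list of dicts, one per sample."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the two header lines (their fields are not numeric).
        if len(fields) < 9 or not all(f.isdigit() for f in fields):
            continue
        values = [int(f) for f in fields]
        samples.append({
            "r": values[0], "b": values[1], "w": values[2],
            "in": values[-6], "syscalls": values[-5], "cs": values[-4],
            "us": values[-3], "sy": values[-2], "id": values[-1],
        })
    # The first data line is the summary since boot, so drop it.
    return samples[1:]

def assess(sample):
    """Return (utilisation in percent, True if threads were waiting for a CPU)."""
    return 100 - sample["id"], sample["r"] > 0

# Made-up sample output for illustration only.
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 1200000 400000 3  12  0  0  0  0  0  0  0  0  0  310  150  120  2  1 97
 1 0 0 1190000 390000 0   5  0  0  0  0  0  0  0  0  0  320  400  140 99  1  0
 1 0 0 1190000 390000 0   4  0  0  0  0  0  0  0  0  0  318  390  139 99  1  0
"""

for sample in parse_vmstat(SAMPLE):
    utilisation, saturated = assess(sample)
    print(utilisation, saturated)   # 100 True for both interval samples
```

Treating any non-zero kthr:r as saturation mirrors the note above; in practice the value should be judged against the number of CPUs and over several samples.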

mpstat - Report per-Processor or per-Processor Set Statistics

The mpstat command reports processor statistics in tabular form. It is part of the standard Solaris shipment (/bin/mpstat). Each row of the table represents the activity of one processor. The first table summarizes all activity since boot. Each subsequent table summarizes activity for the preceding interval. The output table includes:

Column Comments
CPU Prints processor ID.
minf Minor faults (per second).
mjf Major faults (per second).
xcal Inter-processor cross-calls (per second).
intr Interrupts (per second).
ithr Interrupts as threads (not counting clock interrupt) (per second).
csw Context switches (per second).
icsw Involuntary context switches (per second).
migr Thread migrations (to another processor) (per second).
smtx Spins on mutexes (lock not acquired on first try) (per second).
srw Spins on readers/writer locks (lock not acquired on first try) (per second).
syscl System calls (per second).
usr Percent user time.
sys Percent system time.
wt Always 0.
idl Percent idle time.

The reported statistics can be broken down into the following categories:

  • Processor utilisation: see columns usr, sys and idl for a measure of CPU utilisation on each CPU.
  • System call activity: see the syscl column for the number of system calls per second on each CPU.
  • Scheduler activity: see columns csw and icsw. The closer the ratio icsw/csw comes to one (1), the more often threads are preempted, either by higher-priority threads or because their time quantum expired. The migr column displays the number of times the OS scheduler moved ready-to-run threads to an idle processor. If possible, the OS tries to keep a thread on the last processor on which it ran. If that processor is busy, the thread migrates.
  • Locking activity: column smtx indicates the number of mutex contention events in the kernel. Column srw indicates the number of reader-writer lock contention events in the kernel.

Now, consider the following sixteen-way (16) system used for testing. This time four (4) instances of the cc_usr program were started and the output of vmstat 5 and mpstat 5 was recorded.
Below, observe the output of processor information, then the starting of the four (4) copies of the program, and last the output of vmstat 5.


vmstat - vmstat 5 output on sixteen way system
As expected, vmstat reports a user time of 25% because one-fourth (¼) of the system is used (remember 4 programs started, 16 available CPUs, i.e. 4/16 or 25%).
Now let's look at the output of mpstat 5.

mpstat - mpstat 5 sample output on sixteen way system
In the above output (two sets of statistics), one can clearly identify the four running instances of cc_usr on CPUs 1, 3, 5 and 11. All these CPUs are reported with 100% user time.
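The link between the per-processor view of mpstat and the system-wide 25% reported by vmstat can be sketched in Python. The snippet below is an illustration only: the helper names and the 90% threshold are arbitrary choices made for this example, and the table is generated synthetically to mimic the sixteen-way scenario rather than taken from a real run.

```python
def parse_mpstat(text):
    """Split mpstat output into tables; each table is a list of per-CPU dicts."""
    tables, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # A header line starts a new table (one table per interval).
            header = fields
            tables.append([])
        elif header is not None and len(fields) == len(header):
            tables[-1].append(dict(zip(header, (int(f) for f in fields))))
    return tables

def busy_cpus(table, threshold=90):
    """IDs of CPUs whose user time is at or above threshold percent."""
    return [row["CPU"] for row in table if row["usr"] >= threshold]

def mean_user_time(table):
    """Average user time over all CPUs, comparable to the vmstat cpu:us column."""
    return sum(row["usr"] for row in table) / len(table)

def involuntary_ratio(row):
    """icsw/csw for one CPU; None when there were no context switches."""
    return row["icsw"] / row["csw"] if row["csw"] else None

# Synthetic table: four fully busy CPUs out of sixteen (values are invented).
HEADER = "CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl"
BUSY = {1, 3, 5, 11}
rows = []
for cpu in range(16):
    if cpu in BUSY:
        rows.append(f"{cpu} 0 0 0 210 100 12 10 0 0 0 5 100 0 0 0")
    else:
        rows.append(f"{cpu} 0 0 0 205 100 20 0 0 0 0 8 0 0 0 100")
SAMPLE = "\n".join([HEADER] + rows)

table = parse_mpstat(SAMPLE)[-1]               # last table = most recent interval
print(busy_cpus(table))                        # [1, 3, 5, 11]
print(mean_user_time(table))                   # 25.0
print(round(involuntary_ratio(table[1]), 2))   # 0.83
```

In this invented data the busy CPUs show an icsw/csw ratio close to one, which is what one would expect when CPU-bound threads are mostly switched out involuntarily.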

vmstat - Monitoring paging Activity

The vmstat command can also report on system paging activity with the -p option. Using this form of the command, one can quickly get a clear picture of whether the system is paging because of file I/O (OK) or because of a physical memory shortage (BAD).
Use the command as follows: vmstat -p <interval in seconds>. The output format includes the following information:

Column Description
swap Available swap space in Kbytes.
free Amount of free memory in Kbytes.
re Page reclaims - number of page reclaims from the cache list (per second).
mf Minor faults - number of pages attached to an address space (per second)
fr Page frees in Kbytes per second.
de Calculated anticipated short-term memory shortfall in Kbytes.
sr Scan rate - number of pages scanned by the page scanner per second.
epi Executable page-ins in Kbytes per second.
epo Executable page-outs in Kbytes per second.
epf Executable page-frees in Kbytes per second.
api Anonymous page-ins in Kbytes per second.
apo Anonymous page-outs in Kbytes per second.
apf Anonymous page-frees in Kbytes per second.
fpi File system page-ins in Kbytes per second.
fpo File system page-outs in Kbytes per second.
fpf File system page-frees in Kbytes per second.

As an example of vmstat -p output, let's try the following command:
find / > /dev/null 2>&1
and then monitor paging activity with:
vmstat -p 5
As can be seen from the output, the system is showing paging activity because of file system read I/O (column fpi).


vmstat - sample output reporting on paging activity
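The distinction between paging caused by file I/O and paging caused by memory shortage can be sketched as a simple classifier over the vmstat -p columns. This is a rough heuristic written for illustration, not an official rule: the function names are invented, the sample numbers are made up, treating any non-zero scan rate or anonymous page-out as a shortage signal is an assumption, and the first data line is assumed to be the since-boot summary as with plain vmstat.

```python
COLUMNS = ["swap", "free", "re", "mf", "fr", "de", "sr",
           "epi", "epo", "epf", "api", "apo", "apf", "fpi", "fpo", "fpf"]

def parse_vmstat_p(text):
    """Parse `vmstat -p <interval>` output into a list of dicts, one per sample."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines are skipped because their fields are not numeric.
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(COLUMNS, map(int, fields))))
    # Assumption: the first data line is the summary since boot.
    return samples[1:]

def classify_paging(sample):
    """Heuristic: page scanning or anonymous page-outs hint at a memory shortage."""
    if sample["sr"] > 0 or sample["apo"] > 0:
        return "memory shortage suspected"
    if sample["fpi"] > 0 or sample["fpo"] > 0:
        return "file system I/O"
    return "no paging activity"

# Made-up sample output for illustration only.
SAMPLE = """\
     memory           page          executable      anonymous      filesystem
   swap  free  re  mf  fr  de  sr  epi  epo  epf  api  apo  apf  fpi  fpo  fpf
 1500000 600000 5  20   0   0   0    0    0    0    0    0    0   10    0    0
 1490000 580000 2   8   0   0   0    0    0    0    0    0    0 2400    0    0
 1200000  20000 9  30 800   0 450    0    0    0   16  640  640    0    0    0
"""

for sample in parse_vmstat_p(SAMPLE):
    print(classify_paging(sample))
# file system I/O
# memory shortage suspected
```

The first interval sample resembles the find example above (only fpi is active), while the second shows the scanner running and anonymous pages being written out.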

zonestat [OpenSolaris.org] - Monitoring Resource Consumption within Zones

Jeff Victor developed an Open Source Perl script to measure utilization within zones. The tool is freely available for download on the OpenSolaris.org project pages.

It may be called with the following syntax:

zonestat [-l] [interval [count]]

The output looks like:




Copyright 1994-2009 Sun Microsystems, Inc.