DTrace Quick Start Guide
The DTrace Quick Start Guide: Observing Native and Web Applications in Production [PDF] is a small book (36 pages) to get you started using DTrace to collect information about your system or application. DTrace is especially well suited for collecting data about applications running on a live system under a real workload.
This guide provides example DTrace scripts, including examples that examine an AMP stack and Web 2.0 applications. These examples are included here on this page so that you can copy them more easily. Since these are the examples that come with the book, please do not change these, but of course feel free to add your own.
Many more sample scripts are available in /usr/demo/dtrace on your Solaris or OpenSolaris system and in /opt/DTT (DTraceToolkit) on your OpenSolaris system. See the Preface of this Quick Start Guide for many references to DTrace information.
Thank you to Angelo Rajadurai, Sun ISV Engineer, for providing the content for this guide. See also Angelo's presentation at CommunityOne West 2009: "Observing Applications with DTrace" [PDF] [video]
Developing a DTrace Script
Using Aggregate Functions
In the following example, the $1 is a pid that you enter when you invoke the script. Aggregations collect the output in tables in memory and output a summary. In this example, the output table lists each function only one time, with the number of times that function was entered, instead of listing each function each time it was entered.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
@count_table[probefunc] = count() ;
}
Other popular aggregation functions include average(), min(), max(), and sum().
Calculating Time Spent in Each Function
The following example gives the function name and the total time spent in that function during the time the script ran.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
ts[probefunc] = timestamp;
}
pid$1:::return
{
@func_time[probefunc] = sum(timestamp - ts[probefunc]);
ts[probefunc] = 0;
}
In the following modification, functions that were already entered before the script started are not counted. The predicate (ts[probefunc]) ignores a function return if there was no enter for that function.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
ts[probefunc] = timestamp;
}
pid$1:::return
/ts[probefunc]/
{
@func_time[probefunc] = sum(timestamp - ts[probefunc]);
ts[probefunc] = 0;
}
The following modification handles multithreaded applications. Anything that you add to the self variable is made thread local.
#!/usr/sbin/dtrace -s
pid$1:::entry
{
self->ts[probefunc] = timestamp;
}
pid$1:::return
/self->ts[probefunc]/
{
@func_time[probefunc] = sum(timestamp - self->ts[probefunc]);
self->ts[probefunc] = 0;
}
Collecting Information About an Application
The following script counts the number of times libc functions are called from a specified command:
#!/usr/sbin/dtrace -s
pid$target:libc::entry
{
@[probefunc]=count();
}
Use the -c option to specify the target command, as in this example:
# ./myDscript.d -c "man dtrace"
Scripts for Developers
The syscall Provider
The following script displays the stack trace when a program makes a write(2) system call.
#!/usr/sbin/dtrace -s
syscall::write:entry
{
@[ustack()]=count();
}
The sysinfo Provider
The following script counts the number of times various processes get to run in the CPU.
#!/usr/sbin/dtrace -s
sysinfo:::pswitch
{
@[execname] = count();
}
The proc Provider
The proc:::exec-success event occurs when a new process is started successfully. The -q option suppresses output such as column headings and number of probes matched.
#!/usr/sbin/dtrace -qs
proc:::exec-success
{
printf("%s(pid=%d) started by uid - %d\n",execname, pid, uid);
}
Scripts for System Administrators
The syscall Provider
The syscall provider is probably the most important provider to learn and use because system calls are the primary communication channel between user level applications and the kernel.
The following script identifies the process that wrote to a particular process:
#!/usr/sbin/dtrace -s
syscall::write:entry
{
trace(pid);
trace(execname);
}
The following script reports how much time your web server is spending at read(2). The BEGIN probe is a place to put actions that you want to perform only once at the beginning of the script.
#!/usr/sbin/dtrace -qs
BEGIN
{
printf("size\ttime\n");
}
syscall::read:entry
/execname == "httpd"/
{
self->start = timestamp;
}
syscall::read:return
/self->start/
{
printf("%d\t%d\n", arg0, timestamp - self->start);
self->start = 0;
}
The proc Provider
The following script traces all signals sent to all processes currently running on the system. The -w option permits
destructive actions.
#!/usr/sbin/dtrace -wqs
proc:::signal-send
{
printf("%d was sent to %s by ", args[2], args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
Add the conditional statement (/args[2] == SIGKILL/) to the script and send SIGKILL signals to different processes from different users.
#!/usr/sbin/dtrace -wqs
proc:::signal-send
/args[2] == SIGKILL/
{
printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
The sched Provider
The following script determines the amount of time the CPU spends on I/O wait and working. It also breaks down the I/O process and indicates the data that was retrieved during the I/O wait time by StarOffice.
#!/usr/sbin/dtrace -qs sched:::on-cpu /execname == "soffice.bin"/ { self->on = vtimestamp; } sched:::off-cpu /self->on/ { @time["<on cpu>"] = sum(vtimestamp - self->on); self->on = 0; } io:::wait-start /execname == "soffice.bin"/ { self->wait = timestamp; } io:::wait-done /self->wait/ { @io[args[2]->fi_name] = sum(timestamp - self->wait); @time["<I/O wait>"] = sum(timestamp - self->wait); self->wait = 0; } END { printf("Time breakdown (milliseconds):\n"); normalize(@time, 1000000); printa(" %-50s %15@d\n", @time); printf("\nI/O wait breakdown (milliseconds):\n"); normalize(@io, 1000000); printa(" %-50s %15@d\n", @io); }
The io Provider
The following script traces which files are being accessed on which device and determines whether the task being performed is a read or a write. This example script is for UFS file systems, not for ZFS file systems.
#!/usr/sbin/dtrace -qs
BEGIN
{
printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW");
}
io:::start
{
printf("%10s %58s %2s\n", args[1]->dev_statname, args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W");
}
Web 2.0 Applications
Observing Multiple Layers
The following script reports how much time you are spending on the different layers in an AMP stack (Apache/MySQL/PHP). The output is %s of time spent in Apache, Java,MySQL, the browser, and the operating system. This script is like a TOP tool for AMP.
#!/usr/sbin/dtrace -qs
BEGIN
{
total=mysqlcnt=httpcnt=phpcnt=javacnt=ffxcnt=othercnt=0;
printf("%10s %10s %10s %10s %10s %10s\n","% MYSQL","% APACHE","% FIREFOX","% PHP","% Java","% OTHER");
}
php*:::request-startup
{
inphp[pid,tid]=1;
}
php*:::request-shutdown
{
inphp[pid,tid]=0;
}
profile-1001
{
total++;
(execname=="mysqld")?mysqlcnt++:(execname=="httpd")?(inphp[pid,tid]==1?phpcnt++:httpcnt++): \
(execname=="java")?javacnt++:(execname=="firefox-bin")?ffxcnt++:othercnt++;
}
tick-30s
{
printf("%10s %10s %10s %10s %10s %10s\n","% MYSQL","% APACHE","% FIREFOX","% PHP","% Java","% OTHER");
}
tick-2s
{
printf("%10d %10d %10d %10d %10d %10d\n",mysqlcnt*100/total,httpcnt*100/total,ffxcnt*100/total, \
phpcnt*100/total,javacnt*100/total,othercnt*100/total);
total=mysqlcnt=httpcnt=phpcnt=ffxcnt=javacnt=othercnt=0;
}
Observing the MySQL Database
The following very simple D script to print out the SQL that are executed in a live running instance of MySQL:
#!/usr/sbin/dtrace -qs
#pragma D option strsize=1024
pid$1::*dispatch_command*:entry
{
printf("%d::%s\n",tid,copyinstr(arg2));
}
Observing Drupal in Production
The following D script shows the total time spent in each Drupal module and the number of calls to the module:
#!/usr/sbin/dtrace -Zs
#pragma D option quiet
BEGIN
{
start_time = timestamp;
printf("Collecting data, press ^c to end...\n");
}
php*:::function-entry
/arg0/
{
self->pfunc[arg0]=timestamp;
}
php*:::function-return
/arg0 && self->pfunc[arg0]/
{
@php_times[dirname(copyinstr(arg1))]=sum(timestamp - self->pfunc[arg0]);
@php_calls[dirname(copyinstr(arg1))]=count();
}
END
{
printf("\n====================================\n");
printf("---Elapsed time (usec): %d\n", (timestamp - start_time) / 1000);
printf("====================================\n\n");
normalize(@php_times, 1000);
printf("%-40.40s %12.12s %12.12s\n", "DIR", "TOTAL USEC", "CALLS");
printa("%-40.40s %@12d %@12d\n", @php_times, @php_calls);
printf("----------------\n");
}
The following D script shows you how much time is spent on each hook of a given Drupal module.
#!/usr/sbin/dtrace -Zs
#pragma D option quiet
BEGIN
{
start_time = timestamp;
printf("Collecting data, press ^c to end...\n");
}
php*:::function-entry
/arg0 && dirname(copyinstr(arg1))==$$1/
{
self->pfunc[arg0]=timestamp;
}
php*:::function-return
/arg0 && self->pfunc[arg0]/
{
@php_times[copyinstr(arg0)]=sum(timestamp - self->pfunc[arg0]);
@php_calls[copyinstr(arg0)]=count();
}
END
{
printf("\n====================================\n");
printf("---Elapsed time (usec): %d\n", (timestamp - start_time) / 1000);
printf("====================================\n\n");
normalize(@php_times, 1000);
printf("%-40.40s %12.12s %12.12s\n", "HOOK", "TOTAL USEC", "CALLS");
printa("%-40.40s %@12d %@12d\n", @php_times, @php_calls);
printf("----------------\n");
}
The following is a sample call to the above script:
# ./php_hook_times.d /htfs/htdocs/drupal-6-10/modules/user