Student Guide
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and
decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Sun, Sun Microsystems, the Sun logo, Solaris, and OpenBoot are trademarks or registered trademarks of Sun Microsystems, Inc., in the U.S.
and other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc., in the U.S. and
other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Federal Acquisitions: Commercial Software Government Users Subject to Standard License Terms and Conditions
Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other
countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to
You. You will not export or re-export to entities on the most current U.S. export exclusions lists or to any country subject to U.S. embargo
or terrorist controls as specified in the U.S. export laws. You will not use or provide Products, Services, or technical data for nuclear, missile,
or chemical biological weaponry end uses.
THIS MANUAL IS DESIGNED TO SUPPORT AN INSTRUCTOR-LED TRAINING (ILT) COURSE AND IS INTENDED TO BE
USED FOR REFERENCE PURPOSES IN CONJUNCTION WITH THE ILT COURSE. THE MANUAL IS NOT A STANDALONE
TRAINING TOOL. USE OF THE MANUAL FOR SELF-STUDY WITHOUT CLASS ATTENDANCE IS NOT RECOMMENDED.
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Table of Contents
About This Course
  Course Goals
  Topics Not Covered
  How Prepared Are You?
  Introductions
  How to Use Course Materials
  Conventions
    Typographical Conventions
DTrace Fundamentals
  Objectives
  Relevance
  Additional Resources
  DTrace Features
    Transient Failures
    Debugging Transient Failures
    DTrace Capabilities
  DTrace Architecture
    Probes and Probe Providers
    DTrace Components
  DTrace Tour
    Listing Probes
    Writing D Scripts
Using DTrace
  Objectives
  Relevance
  Additional Resources
  DTrace Performance Monitoring Capabilities
    Features of the DTrace Performance Monitoring Capabilities
    Aggregations
  Examining Performance Problems Using the vminfo Provider
    The vminfo Probes
Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
    Finding the Source of Page Faults Using vminfo Probes
  Examining Performance Problems Using the sysinfo Provider
    The sysinfo Probes
    Using the quantize Aggregation Function With the sysinfo Probes
    Finding the Source of Cross-Calls
  Examining Performance Problems Using the io Provider
    The io Probes
    Information Available When io Probes Fire
    Finding I/O Problems
  Obtaining System Call Information
    The syscall Provider
    D Language Variables
    Associative Arrays
    Thread-Local Variables
    Timing a System Call
    Following a System Call
  Creating D Scripts That Use Arguments
    Built-in Macro Variables
    PID Argument Example
    Executable Name Argument Example
    Custom Monitoring Tools
Debugging Applications With DTrace
  Objectives
  Relevance
  Additional Resources
  Application Profiling
    The pid Provider
    The profile Provider
  Application Variables
    Displaying Process Global Variables
    Displaying Library Global Variables
  The plockstat Provider
  Transient System Call Errors
    User Stack Traces on System Call Failures
    Processes Using a Lot of System Time
  Open Files
    Accessing System Call Pointer Arguments
    Displaying Names of Files Being Opened
Finding System Problems With DTrace
  Objectives
  Relevance
  Additional Resources
  Accessing Kernel Variables
    The void trace(expression) Action
    The void tracemem(address, size_t nbytes) Action
    The void printf(string format, ...) Action
    The printa Action
    The stack() Action
    The ustack() Action
  Destructive Actions
    Process Destructive Actions
    Kernel Destructive Actions
  Special Actions
    Actions Associated With Speculative Tracing
    The void exit(int status) Action
  Subroutines
    The void *alloca(size_t size) Subroutine
    The string basename(char *str) Subroutine
    The void bcopy(void *src, void *dest, size_t size) Subroutine
    The string cleanpath(char *str) Subroutine
    The void *copyin(uintptr_t addr, size_t size) Subroutine
    The string copyinstr(uintptr_t addr) Subroutine
    The string dirname(char *str) Subroutine
    The size_t msgdsize(mblk_t *mp) Subroutine
    The size_t msgsize(mblk_t *mp) Subroutine
    The int mutex_owned(kmutex_t *mutex) Subroutine
    The kthread_t *mutex_owner(kmutex_t *mutex) Subroutine
    The int mutex_type_adaptive(kmutex_t *mutex) Subroutine
    The int progenyof(pid_t pid) Subroutine
    The int rand(void) Subroutine
    The int rw_iswriter(krwlock_t *rwlock) Subroutine
    The int rw_write_held(krwlock_t *rwlock) Subroutine
    The int speculation(void) Subroutine
    The string strjoin(char *str1, char *str2) Subroutine
    The size_t strlen(string str) Subroutine
D Built-in and Macro Variables
  Built-in Variables
  Macro Variables
D Operators
  Arithmetic Operators
  Relational Operators
Preface
Course Goals
Upon completion of this course, you should be able to:
- Describe the features and architecture of the Solaris Dynamic Tracing (DTrace) facility
- Use the DTrace facility to find the source of intermittent problems
- Use DTrace to help debug applications
- Use DTrace to look at the cause of performance problems
- Troubleshoot DTrace script problems
Course Map
The following course map enables you to see what you have
accomplished and where you are going in reference to the course goals.
[Figure: course map; the surviving labels read "Troubleshooting DTrace" and "Troubleshooting DTrace Problems".]
Refer to the Sun Educational Services catalog for specific information and
registration.
Introductions
Now that you have been introduced to the course, introduce yourself to
the other students and the instructor, addressing the following items:
- Name
- Company affiliation
- Title, function, and job responsibility
- Experience related to topics presented in this course
- Reasons for enrolling in this course
- Expectations for this course
Conventions
The following conventions are used in this course to represent various
training elements and alternative learning resources.
Icons
Note: Indicates additional information that can help students but is not
crucial to their understanding of the concept being described. Students
should be able to understand the concept or complete the task without
this information. Examples of notational information include keyword
shortcuts and minor system adjustments.
Typographical Conventions
Courier is used for the names of commands, files, directories,
programming code, and on-screen computer output; for example:
Use ls -al to list all files.
system% You have mail.
Courier bold is used for characters and numbers that you type; for
example:
To list the files in this directory, type:
# ls
Courier bold is also used for each line of programming code that is
referenced in a textual description; for example:
1 import java.io.*;
2 import javax.servlet.*;
3 import javax.servlet.http.*;
Notice the javax.servlet interface is imported to allow access to its
life cycle methods (Line 2).
Palatino italics is used for book titles, new words or terms, or words that
you want to emphasize; for example:
Read Chapter 6 in the User's Guide.
These are called class options.
Module 1
DTrace Fundamentals
Objectives
Upon completion of this module, you should be able to:
- Describe the features of the Solaris Dynamic Tracing (DTrace) facility
- Describe the DTrace architecture
- List and enable probes, and create action statements and D scripts
Relevance
Additional Resources
DTrace Features
DTrace is a comprehensive dynamic tracing facility that is bundled into
the Solaris 10 Operating System (Solaris 10 OS). It is intended for use by
system administrators, service support personnel, kernel developers,
application program developers, and users who are given explicit access
permission to the DTrace facility.
Transient Failures
DTrace helps you find the causes of transient failures. A transient
failure is any unacceptable behavior that does not result in fatal failure of
the system. You might have a clear, specific failure, such as:
- read(2) is returning EIO errno values on a device that is not reporting any errors.
- An application occasionally does not receive its expected timer signal.
- A thread is missing a condition variable wakeup.
In these situations, you must understand the problem and either eliminate
the performance inhibitors or reset your expectations. Eliminating the
performance inhibitors could involve:
- Adding more resources, such as memory or central processing units (CPUs)
- Reconfiguring existing resources, for example, tuning parameters or rewriting software
- Lessening the load
If existing tools cannot find the root cause of a transient failure, then you
must use more invasive techniques. Typically this means developing
custom instrumentation for the failing user program, the kernel, or both.
This can involve using the Trace Normal Form (TNF) facility. You then
reproduce the problem using the instrumented binaries. This technique
requires:
- Running the instrumented binaries in production
- Reproducing a transient problem in a development environment
Such invasive techniques are undesirable because they are slow,
error-prone, and often ineffective.
Relying on the existing static TNF trace points found in the kernel, which
you can enable with the prex(1) command, is also unsatisfactory. The
number of TNF trace points in the kernel is limited and the overhead is
substantial.
DTrace Capabilities
The DTrace framework allows you to enable tens of thousands of tracing
points called probes. When these instrumentation points are hit, you can
display arbitrary data in the kernel (or user process).
Using DTrace, you can explore all aspects of the Solaris 10 OS to:
- Understand how the software works
- Determine the root cause of performance problems
- Examine all layers of software sequentially from the user level to the kernel
- Track down the source of aberrant behavior
DTrace also provides a mechanism to trace during boot and to retrieve all
traced data from a kernel crash dump.
DTrace Architecture
DTrace helps you understand a software system by enabling you to
dynamically modify the operating system kernel and user processes to
record additional data that you specify at locations of interest called
probes.
DTrace Components
DTrace has the following components: probes, providers, consumers, and
the D programming language. The entire DTrace framework resides in the
kernel. Consumer programs access the DTrace framework through a
well-defined application programming interface (API).
Probes
These four attributes define a 4-tuple that uniquely identifies each probe:
provider:module:function:name
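As a minimal sketch of how the four fields appear at run time, the following script assumes the built-in variables probeprov, probemod, probefunc, and probename and prints them in 4-tuple order each time read(2) is entered. The choice of the read system call is only for illustration.

```d
#!/usr/sbin/dtrace -s

/* provider:module:function:name, with the module field left blank */
syscall::read:entry
{
    /* For syscall probes the module field is typically empty. */
    printf("%s:%s:%s:%s\n", probeprov, probemod, probefunc, probename);
}
```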
Providers
Note: You should check the Solaris Dynamic Tracing Guide, part number
817-6223, regularly for the addition of any new DTrace providers.
Consumers
D Programming Language
Architecture Summary
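[Figure: DTrace architecture. In userland, the DTrace consumers dtrace(1M), lockstat(1M), intrstat(1M), and plockstat(1M) sit on top of libdtrace(3LIB), which communicates through dtrace(7D) with the DTrace framework in the kernel.]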
DTrace Tour
In this section you tour the DTrace facility and learn to perform the
following tasks:
- List the available probes using various criteria:
  - Probes associated with a particular function
  - Probes associated with a particular module
  - Probes with a specific name
  - All probes from a specific provider
- Explain how to enable probes
- Explain default probe output
- Describe action statements
- Create a simple D script
Listing Probes
You can list all DTrace probes with the -l option of the dtrace(1M)
command:
# dtrace -l
   ID   PROVIDER   MODULE   FUNCTION   NAME
    1   dtrace                         BEGIN
    2   dtrace                         END
    3   dtrace                         ERROR
    4   syscall             nosys      entry
    5   syscall             nosys      return
    6   syscall             rexit      entry
    7   syscall             rexit      return
    8   syscall             forkall    entry
    9   syscall             forkall    return
   10   syscall             read       entry
   11   syscall             read       return
   12   syscall             write      entry
   13   syscall             write      return
   14   syscall             open       entry
   15   syscall             open       return
...
The previous output shows that for each probe, the following is
displayed:
- The probe's uniquely assigned probe ID (the probe ID is unique only within a given release or patch level of Solaris)
- The provider name
- The module name (if applicable)
- The function name (if applicable)
- The probe name
Elements of the 4-tuple can be left off from the left-hand side. For example,
open:entry matches probes from all providers and kernel modules that
have a function name of open and a probe name of entry:
# dtrace -l -n open:entry
   ID   PROVIDER   MODULE    FUNCTION   NAME
   14   syscall              open       entry
 7386   fbt        genunix   open       entry
Enabling Probes
As you can see from the output, the default action displays the CPU
where the probe fired, the DTrace assigned probe ID, the function where
the probe fired, and the probe name.
To enable the entry probe in the clock function (which should fire every
1/100th second):
# dtrace -n clock:entry
dtrace: description 'clock:entry' matched 1 probe
CPU ID FUNCTION:NAME
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
^C
DTrace Actions
For now, you will use D expressions that consist only of built-in D
variables. The following are some of the most useful built-in D variables.
See Appendix B for a complete list of the D built-in variables.
pid         The current process ID
execname    The current executable name
timestamp   The time since boot in nanoseconds
curthread   A pointer to the kthread_t structure that represents the current thread
probemod    The current probe's module name
probefunc   The current probe's function name
There are also many built-in functions that perform actions. Appendix A,
"Actions and Subroutines," provides the complete list of D built-in
functions. Start with the trace() function, which records the result of a D
expression to the trace buffer. For example:
- trace(pid) traces the current process ID.
- trace(execname) traces the name of the current executable.
- trace(curthread->t_pri) traces the t_pri field of the current thread.
- trace(probefunc) traces the function name of the probe.
In the last example the process identification number (PID) appears in the
last column of output.
The next action example traces the time of entry to each system call:
# dtrace -n 'syscall:::entry {trace(timestamp)}'
dtrace: description 'syscall:::entry ' matched 226 probes
CPU ID FUNCTION:NAME
0 312 portfs:entry 157088479572713
0 98 ioctl:entry 157088479637542
0 98 ioctl:entry 157088479674339
0 234 sysconfig:entry 157088479767243
0 234 sysconfig:entry 157088479774432
0 168 sigaction:entry 157088479993155
0 168 sigaction:entry 157088480229390
0 98 ioctl:entry 157088480318855
0 234 sysconfig:entry 157088480398692
0 38 brk:entry 157088480422525
0 38 brk:entry 157088480438097
0 98 ioctl:entry 157088480794819
0 98 ioctl:entry 157088480959666
0 98 ioctl:entry 157088480986498
0 98 ioctl:entry 157088481033225
0 60 fstat:entry 157088481050686
0 60 fstat:entry 157088481074680
...
The following example traces the executable name in every entry to the
pagefault function:
# dtrace -n 'fbt::pagefault:entry {trace(execname)}'
dtrace: description 'fbt::pagefault:entry ' matched 1 probe
CPU ID FUNCTION:NAME
0 2407 pagefault:entry dtrace
0 2407 pagefault:entry dtrace
0 2407 pagefault:entry dtrace
0 2407 pagefault:entry sh
0 2407 pagefault:entry sh
0 2407 pagefault:entry sh
0 2407 pagefault:entry sh
0 2407 pagefault:entry sh
...
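More than one built-in variable can be traced in the same clause. The sketch below is an illustration only: it reuses the pagefault probe from the previous example and traces both the process ID and the executable name.

```d
#!/usr/sbin/dtrace -s

fbt::pagefault:entry
{
    /* Each trace() call appends one value to the output line. */
    trace(pid);
    trace(execname);
}
```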
Writing D Scripts
Complicated DTrace enablings become difficult to manage on the
command line. The dtrace(1M) command supports scripts, specified
with the -s option. Alternatively, you can create executable DTrace
interpreter files. Interpreter files always begin with:
#!/usr/sbin/dtrace -s
Executable D Scripts
For example, you can write a script to trace the executable name upon
entry to each system call as follows:
# cat syscall.d
syscall:::entry
{
    trace(execname);
}
By convention, D scripts end with a .d suffix. You can run this D script as
follows:
# dtrace -s syscall.d
dtrace: script 'syscall.d' matched 226 probes
CPU ID FUNCTION:NAME
0 312 pollsys:entry java
0 98 ioctl:entry dtrace
0 98 ioctl:entry dtrace
0 234 sysconfig:entry dtrace
0 234 sysconfig:entry dtrace
0 168 sigaction:entry dtrace
0 168 sigaction:entry dtrace
0 98 ioctl:entry dtrace
0 234 sysconfig:entry dtrace
0 38 brk:entry dtrace
^C
If you give the syscall.d file execute permission and add a first line to
invoke the interpreter, you can run the script by entering its name on the
command line as follows:
# cat syscall.d
#!/usr/sbin/dtrace -s
syscall:::entry
{
    trace(execname);
}
# chmod +x syscall.d
# ls -l syscall.d
-rwxr-xr-x 1 root other 62 May 12 11:30 syscall.d
# ./syscall.d
dtrace: script './syscall.d' matched 226 probes
CPU ID FUNCTION:NAME
0 98 ioctl:entry java
0 98 ioctl:entry java
0 312 pollsys:entry java
0 312 pollsys:entry java
0 312 pollsys:entry java
0 98 ioctl:entry dtrace
0 98 ioctl:entry dtrace
0 234 sysconfig:entry dtrace
0 234 sysconfig:entry dtrace
D Literal Strings
The D language supports literal strings that you can use with the trace
function as follows:
# cat string.d
#!/usr/sbin/dtrace -s
fbt::bdev_strategy:entry
{
    trace(execname);
    trace(" is initiating a disk I/O\n");
}
The \n at the end of the literal string produces a new line. To run this
script, enter the following:
# dtrace -s string.d
dtrace: script 'string.d' matched 1 probe
CPU ID FUNCTION:NAME
0 9215 bdev_strategy:entry bash is initiating a disk I/O
The quiet mode option, -q, in dtrace(1M) tells DTrace to record only the
actions explicitly stated. This option suppresses the default output
normally produced by the dtrace command. The following example
shows the use of the -q option on the string.d script:
# dtrace -q -s string.d
ls is initiating a disk I/O
cat is initiating a disk I/O
fsflush is initiating a disk I/O
vi is initiating a disk I/O
vi is initiating a disk I/O
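Quiet mode can also be requested from inside a script. The sketch below is a variant of string.d that assumes the `#pragma D option quiet` directive, which has the same effect as passing -q on the command line, and uses printf in place of the two trace calls.

```d
#!/usr/sbin/dtrace -s

/* Equivalent to running dtrace with the -q option. */
#pragma D option quiet

fbt::bdev_strategy:entry
{
    /* One formatted line per disk I/O request. */
    printf("%s is initiating a disk I/O\n", execname);
}
```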
The simple dtrace provider has only three probes: BEGIN, END, and
ERROR. The BEGIN probe fires before all others and is used for
pre-processing steps. For example, it enables you to initialize variables
and to display headings for output that is produced by actions
that occur later. The END probe fires after all other probes have fired and
enables you to perform post-processing. The ERROR probe fires when
a runtime error occurs in your D program. The following example shows
a simple use of the BEGIN and END probes of the dtrace provider:
# cat beginEnd.d
#!/usr/sbin/dtrace -s
BEGIN
{
    trace("This is a heading\n");
}
END
{
    trace("This should appear at the END\n");
}
# ./beginEnd.d
dtrace: script './beginEnd.d' matched 2 probes
CPU ID FUNCTION:NAME
0 1 :BEGIN This is a heading
^C
0 2 :END This should appear at the END
Note: The END probe does not fire until you interrupt (^C) the dtrace
command.
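If waiting for an interrupt is inconvenient, a script can end itself. The following sketch assumes the tick-5sec probe of the profile provider and the exit action listed in Appendix A; the five-second interval is an arbitrary choice.

```d
#!/usr/sbin/dtrace -s

BEGIN
{
    trace("This is a heading\n");
}

/* Fires once every five seconds. */
profile:::tick-5sec
{
    /* Stop tracing; the END probe fires as part of the shutdown. */
    exit(0);
}

END
{
    trace("This should appear at the END\n");
}
```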
Module 2
Using DTrace
Objectives
Upon completion of this module, you should be able to:
- Describe the DTrace performance monitoring capabilities
- Examine performance problems using the vminfo provider
- Examine performance problems using the sysinfo provider
- Examine performance problems using the io provider
- Use DTrace to obtain information about system calls
- Create D scripts that use arguments
Relevance
Additional Resources
Aggregations
Aggregated data is more useful than individual data points in answering
performance-related questions. For example, if you want to know the
number of page faults by process, you do not necessarily care about each
individual page fault. Rather, you want a table that lists the process names
and the total number of page faults.
DTrace does not need to store the entire set of data items for an
aggregation; it keeps a running result and needs only the current
intermediate result and the new element. Intermediate results are kept per
central processing unit (CPU), which makes the implementation scalable
because no locks are required.
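A minimal sketch of the page-fault table described above, assuming that the as_fault probe of the vminfo provider is a reasonable proxy for "page faults" and using the count aggregating function keyed by executable name:

```d
#!/usr/sbin/dtrace -s

vminfo:::as_fault
{
    /* One running counter per executable name; no per-event record is kept. */
    @faults[execname] = count();
}
```

When the script is interrupted, dtrace prints the aggregation as a table of names and totals without any explicit print statement.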
Function Name   Arguments            Result
lquantize       scalar expression,   A linear frequency distribution, sized by the
                lower bound,         specified range, of the values of the specified
                upper bound,         expression. Increments the value in the highest
                step value           bucket that is less than or equal to the
                                     specified expression.
quantize        scalar expression    A power-of-two frequency distribution of the
                                     values of the specified expression. Increments
                                     the value in the highest power-of-two bucket
                                     that is less than or equal to the specified
                                     expression.
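The two distribution functions can be compared side by side. This sketch assumes that arg2 of the write(2) entry probe holds the requested byte count; the range of 0 to 1024 and the step of 128 passed to lquantize are arbitrary illustration values.

```d
#!/usr/sbin/dtrace -s

syscall::write:entry
{
    /* Power-of-two buckets: 1, 2, 4, 8, ... bytes. */
    @pow2[execname] = quantize(arg2);

    /* Linear buckets from 0 to 1024 bytes in steps of 128. */
    @linear[execname] = lquantize(arg2, 0, 1024, 128);
}
```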
# ./writes.d
dtrace: script 'writes.d' matched 1 probe
^C
dtrace 1
date 1
bash 3
grep 20
file 197
ls 201
# ./writes2.d
dtrace: script 'writes2.d' matched 1 probe
^C
dtrace 1
bash 27
date 29
file 37
grep 60
ls 68
# kstat -n vm
module: cpu instance: 0
name: vm class: misc
anonfree 0
anonpgin 4
anonpgout 0
as_fault 157771
cow_fault 34207
crtime 0.178610697
dfree 56
execfree 0
execpgin 3646
execpgout 0
fsfree 56
fspgin 16257
fspgout 57
hat_fault 0
kernel_asflt 0
maj_fault 6743
pgfrec 34215
pgin 9188
pgout 36
pgpgin 19907
pgpgout 57
pgrec 34216
pgrrun 4
pgswapin 0
pgswapout 0
prot_fault 39794
rev 0
scan 28668
snaptime 349429.087071013
softlock 165
swapin 0
swapout 0
zfod 12835
anonfree Probe that fires when an unmodified anonymous page is freed as part of
paging activity. Anonymous pages are those that are not associated with
a file; memory containing such pages includes heap memory, stack
memory, or memory obtained by explicitly mapping zero(7D).
anonpgin Probe that fires when an anonymous page is paged in from a swap
device.
anonpgout Probe that fires when a modified anonymous page is paged out to a swap
device.
as_fault Probe that fires when a fault is taken on a page and the fault is neither a
protection fault nor a copy-on-write fault.
cow_fault Probe that fires when a copy-on-write fault is taken on a page. The arg0
argument contains the number of pages that are created as a result of the
copy-on-write.
dfree Probe that fires when a page is freed as a result of paging activity. When
dfree fires, exactly one of the anonfree, execfree, or fsfree probes
also subsequently fires.
execfree Probe that fires when an unmodified executable page is freed as a result of
paging activity.
execpgin Probe that fires when an executable page is paged in from the backing
store.
execpgout Probe that fires when a modified executable page is paged out to the
backing store. Most paging of executable pages occurs in terms of the
execfree probe; the execpgout probe can only fire if an executable page
is modified in memory, an uncommon occurrence in most systems.
fsfree Probe that fires when an unmodified file system data page is freed as part
of paging activity.
softlock Probe that fires when a page is faulted as a part of placing a software lock
on the page.
swapin Probe that fires when a swapped-out process is swapped back in.
swapout Probe that fires when a process is swapped out.
zfod Probe that fires when a zero-filled page is created on demand.
Here the pi column denotes the number of kilobytes paged in per second.
The vminfo provider makes it easy to discover more about the source of
these page-ins. The following example uses an anonymous aggregation:
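A minimal sketch of such an aggregation, assuming that the vminfo pgin probe is the one of interest; this is an illustration rather than the course's own listing:

```d
#!/usr/sbin/dtrace -s

/*
 * Sketch only: count page-in events per executable. The aggregation
 * has no name (it is written as a bare @), which is what makes it an
 * anonymous aggregation.
 */
vminfo:::pgin
{
	@[execname] = count();
}
```

When the script is interrupted with Control-C, DTrace prints one line per executable name with the number of times the probe fired.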
This output shows that the find command is responsible for most of the
page-ins. For a more complete picture of the find command in terms of
vm behavior, you can enable all vminfo probes. Before doing this,
however, you must introduce a filtering capability of DTrace called a
predicate.
Predicates
You might wonder why, with such a large memory load, scans do not
show up in the output of the dtrace command. This is because the
pageout daemon is running during scans, not the find user process. The
following example shows this behavior.
# cat mem.d
#!/usr/sbin/dtrace -s
vminfo:::
{
@vm[execname,probename] = count();
}
END
{
printa("%16s\t%16s\t%@d\n", @vm);
}
mkfile prot_fault 1
find prot_fault 1
dtrace pgrec 1
mkfile execpgin 2
mkfile kernel_asflt 2
vmstat prot_fault 2
rm zfod 3
find execpgin 3
sleep zfod 3
mkfile zfod 3
sendmail anonpgin 3
mkfile cow_fault 4
rm cow_fault 4
bash anonpgin 4
rm maj_fault 4
sendmail pgfrec 4
sleep cow_fault 4
find cow_fault 4
sendmail pgrec 4
...
bash pgrec 205
pageout fspgout 293
pageout anonpgout 293
pageout pgpgout 293
pageout pgout 293
pageout execpgout 293
pageout pgrec 293
pageout anonfree 360
pageout execfree 510
bash as_fault 519
pageout fsfree 519
sched dfree 523
sched pgrec 523
sched pgout 523
sched pgpgout 523
sched anonpgout 523
sched anonfree 523
sched execpgout 523
sched execfree 523
pageout dfree 803
rm pgrec 1388
rm pgfrec 1388
find maj_fault 5067
find fspgin 5085
find pgin 5088
find pgpgin 5088
The sysinfo provider probes fire immediately before the sys named
kstat is incremented. The following example displays the sys named
kstat.
# kstat -n sys
module: cpu instance: 0
name: sys class: misc
bawrite 112
bread 6359
bwrite 1401
canch 374
cpu_ticks_idle 2782331
cpu_ticks_kernel 46571
cpu_ticks_user 12187
cpu_ticks_wait 30197
cpumigrate 0
...
syscall 3991217
sysexec 1088
sysfork 1043
sysread 131334
sysvfork 47
syswrite 676775
trap 266286
ufsdirblk 1027383
ufsiget 1086164
ufsinopage 873613
ufsipage 2
wait_ticks_io 30197
writech 5144172931
xcalls 0
xmtint 0
# dtrace -s read.d
dtrace: script 'read.d' matched 5 probes
CPU ID FUNCTION:NAME
0 36754 :tick-10sec
bash
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
2 | 0
file
value ------------- Distribution ------------- count
-1 | 0
0 | 2
1 | 0
2 | 0
4 | 6
8 | 0
16 | 0
32 | 6
64 | 6
128 |@@ 16
256 |@@@@ 30
512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 199
1024 | 0
2048 | 0
4096 | 1
8192 | 1
16384 | 0
grep
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@ 99
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 1
256 |@@@@ 25
512 |@@@@ 23
1024 |@@@@ 24
2048 |@@@@ 22
4096 | 4
8192 | 3
16384 | 0
The xcal and syscl columns display relatively high numbers, which
might be affecting the system's performance. Yet the system is relatively
idle, and is not spending time waiting on input/output (I/O). The xcal
numbers are per-second and are read from the xcalls field of the sys
kstat. To see which executables are responsible for the xcalls, enter the
following dtrace(1M) command:
# dtrace -n 'xcalls {@[execname] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
find 2
cut 2
snmpd 2
mpstat 22
sendmail 101
grep 123
bash 175
dtrace 435
sched 784
xargs 22308
file 89889
#
The xargs and file commands appear to be part of a custom user shell
script. You can locate this script as follows:
# find / -name findtxt
/users1/james/findtxt
# cat /users1/james/findtxt
#!/bin/sh
find / -type f | xargs file | grep text | cut -f1 -d: >/tmp/findtxt$$
cat /tmp/findtxt$$ | xargs grep $1
rm /tmp/findtxt$$
#
You can gather more details on which kernel code is involved in all of the
cross-calls while the file and xargs commands are running. The
following example uses the stack() built-in DTrace function as the
aggregation key to show which kernel code is requesting the cross-call.
The aggregation counts how many times each unique kernel stack trace occurs.
# dtrace -n 'xcalls {@[stack()] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`hat_unload_callback+0x6ec
unix`memscrub_scan+0x298
unix`memscrubber+0x308
unix`thread_start+0x4
2
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`sfmmu_tlb_demap+0x118
unix`sfmmu_hblk_unload+0x368
unix`hat_unload_callback+0x534
unix`memscrub_scan+0x298
unix`memscrubber+0x308
unix`thread_start+0x4
2
...
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`hat_unload_callback+0x6ec
genunix`anon_private+0x204
genunix`segvn_faultpage+0x778
genunix`segvn_fault+0x920
genunix`as_fault+0x4a0
unix`pagefault+0xac
unix`trap+0xc14
unix`utl0+0x4c
2303
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`sfmmu_tlb_range_demap+0x190
unix`sfmmu_chgattr+0x2e8
genunix`segvn_dup+0x3d0
genunix`as_dup+0xd0
genunix`cfork+0x120
unix`syscall_trap32+0xa8
7175
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`sfmmu_chgattr+0x2f0
genunix`segvn_dup+0x3d0
genunix`as_dup+0xd0
genunix`cfork+0x120
unix`syscall_trap32+0xa8
11492
As this output shows, the majority of the cross-calls are the result of a
significant number of fork(2) system calls. (Shell scripts are notorious for
abusing their fork(2) privileges.) Page faults of anonymous memory are
also involved, which probably accounts for the large number of minor
page faults seen in the mpstat output.
The io Probes
Table 2-4 describes the io probes.
start Probe that fires when an I/O request is about to be made to a disk
device or to an NFS server. The buf(9S) structure corresponding to the
I/O request is pointed to by the args[0] argument. The devinfo_t
structure of the device to which the I/O is being issued is pointed to
by the args[1] argument. The fileinfo_t structure of the file that
corresponds to the I/O request is pointed to by the args[2]
argument. Note that file information availability depends on the file
system making the I/O request.
done Probe that fires after an I/O request has been fulfilled. The buf(9S)
structure corresponding to the I/O request is pointed to by the
args[0] argument. The devinfo_t structure of the device to which the
I/O was issued is pointed to by the args[1] argument. The
fileinfo_t structure of the file that corresponds to the I/O request is
pointed to by the args[2] argument.
The io provider uses three I/O structures: the buf(9S) structure, the
devinfo_t structure, and the fileinfo_t structure.
When the io probes fire, the following arguments are made available:
args[0] Set to point to the buf(9S) structure corresponding to the
I/O request.
args[1] Set to point to the devinfo_t structure of the device to
which the I/O was issued.
args[2] Set to point to the fileinfo_t structure containing file
system related information regarding the issued I/O request.
The buf(9S) structure is the abstraction that describes an I/O request. The
address of this structure is made available to your D programs through
the args[0] argument. Here is its definition:
struct buf {
    int b_flags;          /* flags */
    size_t b_bcount;      /* number of bytes */
    caddr_t b_addr;       /* buffer address */
    uint64_t b_blkno;     /* expanded block # on device */
    uint64_t b_lblkno;    /* block # on device */
    size_t b_resid;       /* # of bytes not transferred */
    size_t b_bufsize;     /* size of allocated buffer */
    caddr_t b_iodone;     /* I/O completion routine */
    int b_error;          /* expanded error field */
    dev_t b_edev;         /* extended device */
};
The b_flags member indicates the state of the I/O buffer and consists of
the bitwise OR of different state values.
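Because b_flags is a bit mask, a D predicate can test a single state bit with the bitwise AND operator. The following sketch, which assumes you want read and write byte totals per device, uses the B_READ flag together with the b_bcount and dev_statname members; the aggregation names are illustrative only:

```d
#!/usr/sbin/dtrace -s

/* Sketch only: total bytes requested per device, split by direction. */
io:::start
/args[0]->b_flags & B_READ/
{
	/* B_READ is set: the request reads from the device */
	@readbytes[args[1]->dev_statname] = sum(args[0]->b_bcount);
}

io:::start
/!(args[0]->b_flags & B_READ)/
{
	/* B_READ is clear: the request writes to the device */
	@writebytes[args[1]->dev_statname] = sum(args[0]->b_bcount);
}
```

Using two clauses with complementary predicates keeps each aggregation simple and avoids mixing read and write totals in one table.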
Table 2-5 shows the valid state values for the b_flags field.
Table 2-6 shows the field descriptions for the buf(9S) structure.
Field Description
Table 2-7 shows the field descriptions for the devinfo_t structure.
Field Description
Table 2-8 shows the field descriptions for the fileinfo_t structure.
Field Description
This output indicates that a large amount of data is being read from disk
drive sd2 and written to disk drive sd0. Someone appears to be
transferring many megabytes of data between these two drives. Both
disks are consistently over 50% busy. Is someone running a file transfer
command such as tar(1), cpio(1), cp(1), or dd(1M)? The iosnoop.d D
script enables you to determine who is performing this I/O.
The following D script displays data that enables you to determine which
commands are running, what type of I/O those commands are
performing, and which disk devices are involved.
# cat iosnoop.d
#!/usr/sbin/dtrace -qs
BEGIN
{
    printf("%16s %5s %40s %10s %2s %7s\n", "COMMAND", "PID", "FILE",
        "DEVICE", "RW", "MS");
}

io:::start
{
    start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
    command[args[0]->b_edev, args[0]->b_blkno] = execname;
    mypid[args[0]->b_edev, args[0]->b_blkno] = pid;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
    elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
    printf("%16s %5d %40s %10s %2s %3d.%03d\n", command[args[0]->b_edev,
        args[0]->b_blkno], mypid[args[0]->b_edev, args[0]->b_blkno],
        args[2]->fi_pathname, args[1]->dev_statname,
        args[0]->b_flags & B_READ ? "R" : "W", elapsed/1000000,
        (elapsed/1000)%1000);
    start[args[0]->b_edev, args[0]->b_blkno] = 0; /* free memory */
    command[args[0]->b_edev, args[0]->b_blkno] = 0; /* free memory */
    mypid[args[0]->b_edev, args[0]->b_blkno] = 0; /* free memory */
}
The following output results from running the previous iosnoop.d script.
It clearly shows who is performing the I/O operations. Someone is
copying the shared object files from /usr/lib on drive sd2 to a backup
directory on drive sd0.
# ./iosnoop.d
COMMAND PID FILE DEVICE RW MS
bash 725 /usr/bin/bash sd2 R 9.471
bash 725 /usr/lib sd2 R 7.128
bash 725 /usr/lib sd2 R 3.193
bash 725 /usr/lib sd2 R 11.283
bash 725 /lib/libc.so.1 sd2 R 7.696
bash 725 /lib/libnsl.so.1 sd2 R 10.293
bash 768 /lib/libnsl.so.1 sd2 R 0.582
cp 768 /lib/libc.so.1 sd2 R 10.154
cp 768 /lib/libc.so.1 sd2 R 7.262
cp 768 /lib/libc.so.1 sd2 R 9.914
cp 768 /usr/lib/0@0.so.1 sd2 R 9.270
cp 768 /usr/lib/0@0.so.1 sd2 R 13.654
cp 768 /mnt/lib.backup/0@0.so.1 sd0 W 2.431
cp 768 /usr/lib/ld.so sd2 R 6.890
cp 768 /usr/lib/ld.so sd2 R 7.085
cp 768 /usr/lib/ld.so sd2 R 0.376
cp 768 /mnt/lib.backup/ld.so sd0 W 6.698
cp 768 /mnt/lib.backup/ld.so sd0 W 6.437
cp 768 /mnt/lib.backup/ld.so.1 sd0 W 4.394
cp 768 <unknown> sd2 R 2.206
cp 768 /mnt/lib.backup/ld.so.1 sd0 W 8.479
cp 768 /mnt/lib.backup/ld.so.1 sd0 W 8.440
cp 768 /usr/lib/lib300.so.1 sd2 R 5.771
cp 768 /usr/lib/lib300.so.1 sd2 R 6.003
cp 768 /usr/lib/lib300.so.1 sd2 R 0.530
cp 768 /usr/lib/lib300.so.1 sd2 R 7.912
cp 768 <unknown> sd2 R 3.014
cp 768 /mnt/lib.backup/lib300.so sd0 W 7.861
^C
The probe for return from the read(2) system call is:
syscall::read:return
Note that the module name is undefined for the syscall provider probes.
The system call names are usually, but not always, the same as those
documented in Section 2 of the Solaris 10 OS manual pages. The actual
names are listed in the /etc/name_to_sysnum system file. Examples of
system call names that do not match the manual pages are:
rexit for exit(2)
gtime for time(2)
semsys for semctl(2), semget(2), semids(2), and semtimedop(2)
signotify, which has no manual page, and is used for POSIX.4
message queues
Large file system calls such as:
creat64 for creat(2)
lstat64 for lstat(2)
open64 for open(2)
mmap64 for mmap(2)
For the entry probes, the arguments (arg0, arg1, ... argn) are the
arguments to the system call. For return probes, both arg0 and arg1
contain the same value: the return value from the system call. You can check
system call failure in the return probe by referencing the errno D
variable. The following example shows which system calls are failing for
which applications and with what errno value.
# cat errno.d
#!/usr/sbin/dtrace -qs
syscall:::return
/arg0 == -1 && execname != "dtrace"/
{
printf("%-20s %-10s %d\n", execname, probefunc, errno);
}
# ./errno.d
sac read 4
ttymon pause 4
ttymon read 11
nscd lwp_kill 3
in.routed ioctl 12
in.routed ioctl 12
tty open 2
tty stat 2
bash setpgrp 13
bash waitsys 10
bash stat64 2
snmpd ioctl 12
^C
The errno.d D program has a predicate that uses the AND operator:
&&. The predicate states that the return from the system call must be -1,
which is how all system calls indicate failure, and that the process
executable name cannot be dtrace. The printf built-in D function uses
the %-20s and %-10s format specifications to left-justify the strings in the
given minimum column width.
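The same predicate can feed an aggregation instead of printing one line per failure. The following sketch is a variation on errno.d, not part of the course scripts; the aggregation name @fail and the column widths are assumptions:

```d
#!/usr/sbin/dtrace -qs

/* Sketch only: count failing system calls per executable and errno. */
syscall:::return
/arg0 == -1 && execname != "dtrace"/
{
	@fail[execname, probefunc, errno] = count();
}

END
{
	/* One row per (executable, system call, errno) tuple */
	printa("%-20s %-10s %-5d %@d\n", @fail);
}
```

Summarizing this way keeps the output short on a busy system, at the cost of losing the order in which the failures occurred.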
The following simple D script counts the number of system calls being
issued system-wide.
# cat syscall.d
#!/usr/sbin/dtrace -qs
syscall:::entry
{
@[probefunc] = count();
}
# ./syscall.d
^C
mmap64 1
mkdir 1
umask 1
getloadavg 1
getdents64 2
...
stat 1754
ioctl 1956
close 2708
write 2733
mmap 3006
read 3880
sigaction 7886
brk 12695
The output indicates that the majority of the system calls are setting up
signal handling (sigaction(2)) or growing the heap (brk(2)). The
following D script enables you to discover who is making the brk(2)
system calls.
# cat brk.d
#!/usr/sbin/dtrace -qs
syscall::brk:entry
{
@[execname] = count();
}
# ./brk.d
^C
dtrace 6
prstat 22
nroff 48
cat 48
tbl 142
eqn 144
rm 166
ln 166
col 222
expr 332
head 492
fgrep 492
dirname 581
grep 722
instant 738
sh 917
nawk 984
sgml2roff 1259
nsgmls 13296
# ps -ef | grep nsgmls
root 591 590 2 07:56:32 pts/2 0:00 /usr/lib/sgml/nsgmls -
gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
# man nsgmls
No manual entry for nsgmls.
# man -k sgml
sgml sgml (5) - Standard Generalized Markup Language
solbook sgml (5) - Standard Generalized Markup Language
The ptree command returns no results because the nsgmls process is too
short-lived for the command to be run on it. You have learned, however,
that the problem is not a long-lived process causing a memory leak. Now
write a quick D script to print out the ancestry. The script must follow
the chain of parent pointers one process at a time, because many of the
other processes involved are also short-lived.
Note: This particular D script fails if an ancestor does not exist, because
the top ancestor, the sched process, has no parent. You cannot
harm the kernel even if a D script uses a bad pointer. The intent of this
example is to show how you can quickly create custom D scripts to
answer questions about system behavior. Many of your D scripts will be
throw-away scripts that you will not re-use. You can fix the script by
testing each parent pointer with a predicate before printing. You will see
this fix later with the ancestors3.d D script.
# cat ancestors.d
#!/usr/sbin/dtrace -qs
syscall::brk:entry
/execname == "nsgmls"/
{
    printf("process: %s\n",
        curthread->t_procp->p_user.u_psargs);
    printf("parent: %s\n",
        curthread->t_procp->p_parent->p_user.u_psargs);
    printf("grandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_user.u_psargs);
    printf("greatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_user.u_psargs);
    printf("greatgreatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
    printf("greatgreatgreatgrandparent: %s\n",
        curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
}
# ./ancestors.d
process: /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
parent: /usr/lib/sgml/instant -d -c/usr/share/lib/sgml/locale/C/transpec/roff.cmap -s/u
grandparent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman4/rt_dptbl.4
greatgrandparent: sh -c cd /usr/share/man; /usr/lib/sgml/sgml2roff
/usr/share/man/sman4/rt_dptbl.
greatgreatgrandparent: catman
greatgreatgreatgrandparent: bash
2326 -sh
2332 bash
2333 catman
17232 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4;
ln -s ../cat4/e
17235 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4;
ln -s ../cat4/e
The previous output indicates that all of the brk(2) system calls resulted
from the catman(1M) command, creating many short-lived children that
issued this system call.
Figure 2-1 shows a diagram of the kernel data structures being accessed
by this example.
[Figure 2-1: The curthread pointer references a kthread_t structure
(/usr/include/sys/thread.h; members shown: t_state, t_pri, t_lwp, t_procp).
The t_procp member points to a proc_t structure (/usr/include/sys/proc.h;
members shown: p_exec, p_as, p_cred, p_parent, p_tlist, p_user). Each proc_t
points to its parent proc_t through p_parent and contains a user_t structure
(/usr/include/sys/user.h; members shown: u_start, u_ticks, u_psargs[],
u_cdir) as its p_user member.]
hypothesis->instrumentation->data gathering->analysis->hypothesis
D Language Variables
The D language has five basic variable types:
Scalar variables: Have fixed-size values such as integers, structures,
and pointers.
Associative arrays: Store values indexed by one or more keys,
similar to aggregations.
Thread-local variables: Have one name, but storage is local to each
separate kernel thread. These variables are prefixed with the self->
keyword.
Clause-local variables: Appear when an action block is entered;
storage is reclaimed after leaving the probe clause. These variables
are prefixed with the this-> keyword.
Kernel external variables: DTrace has access to all kernel global and
static variables. These variables are prefixed with a backquote (`).
You can access kernel global and static variables within your D programs.
To access these external variables, you prefix the global kernel variable
with the ` (backquote or grave accent) character. For example, to
reference the freemem kernel global variable, use `freemem. If the
variable is part of a kernel module and its name conflicts with variable
names in other modules, use the ` character between the module name and the
variable name. For example, sd`sd_state references the sd_state
variable within the sd kernel module.
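As a minimal sketch of the backquote syntax, the following script prints the kernel's freemem variable once per second. The one-second interval and the output wording are arbitrary choices made for this illustration:

```d
#!/usr/sbin/dtrace -qs

/* Sketch only: sample the kernel global freemem once per second. */
tick-1sec
{
	/* The backquote marks freemem as a kernel symbol, not a D variable */
	printf("free pages: %d\n", `freemem);
}
```

The value is reported in pages, because that is the unit in which the kernel maintains freemem.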
Associative Arrays
Associative arrays enable the storing of scalar values in elements of an
array (or table) that are identified by one or more sequences of comma-
separated key fields (an n-tuple). The keys can be any combination of
strings or integers. The following code example shows the use of an
associative array to track how often any command issues more than a
given number of any single system call:
# cat assoc2.d
#!/usr/sbin/dtrace -qs
syscall:::entry
{
    ++namesys[pid,probefunc];
    x = namesys[pid,probefunc] > 5000 ? 1 : 0;
}
syscall:::entry
/x && execname != "dtrace"/
{
    printf("Process: %d %s has just made more than 5000 %s calls\n",
        pid, execname, probefunc);
    namesys[pid,probefunc] = 0; /* reset the count */
}
# ./assoc2.d
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 acl calls
Process: 14854 ls has just made more than 5000 lstat64 calls
^C
Thread-Local Variables
Thread-local variables are useful when you wish to enable a probe and
mark with a tag every thread that fires the probe. Thread-local variables
share a common name but refer to separate data storage associated with
each thread. Thread-local variables are referenced with the special
keyword self followed by the two characters ->, as shown in the
following example:
syscall::read:entry
{
self->read = 1;
}
syscall::read:return
/self->read/
{
printf("Same thread is returning from read\n");
}
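A common use of this pattern is to store a timestamp instead of a flag, so that the return probe can compute how long the same thread spent in the call. The following sketch assumes that only read(2) calls made by grep are of interest; it is an illustration and not one of the course scripts:

```d
#!/usr/sbin/dtrace -qs

/* Sketch only: time each read(2) issued by grep, per thread. */
syscall::read:entry
/execname == "grep"/
{
	/* Each thread gets its own copy of self->start */
	self->start = timestamp;
}

syscall::read:return
/self->start/
{
	/* arg0 is the return value of read(2); the time is in nanoseconds */
	printf("%d\t%d\n", arg0, timestamp - self->start);
	self->start = 0;	/* assigning zero releases the storage */
}
```

Assigning zero to the thread-local variable at the end matters on a long trace, because DTrace can then reclaim the space used for that thread.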
# ./timegrep.d
size time
8192 7108972
319 1526616
0 12112
3293 5663329
0 18816
^C
The next example uses an associative array to time every system call
performed by the grep command.
# cat -n timesys.d
1 #!/usr/sbin/dtrace -qs
2 BEGIN
3 {
4 printf("System Call Times for grep:\n\n");
Syscall Microseconds
mmap 50
resolvepath 47
resolvepath 67
stat 37
open 46
stat 34
open 32
...
brk 25
open64 43
read 8126
brk 20
brk 28
read 24
close 26
^C
Predictably, the system call that took the most time was read, because of
the disk I/O wait time (the second read was of 0 bytes).
The fbt provider probe clause has an empty action, so DTrace applies its
default action: it traces every entry to and return from each kernel
function involved in a read(2) system call until that call completes.
Option -F of the
dtrace(1M) command indents the output of each nested function call and
shows this with the -> symbol; it un-indents the output when that
function returns back up the call tree and shows this with the <- symbol.
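A script of the kind described above could look like the following sketch. The thread-local flag name and the restriction to the grep command are assumptions made for illustration; the course's own script may differ:

```d
#!/usr/sbin/dtrace -s

/* Sketch only: follow one read(2) call through the kernel. */
syscall::read:entry
/execname == "grep"/
{
	self->follow = 1;	/* mark this thread as being traced */
}

/* Empty action: DTrace records each function entry and return */
fbt:::
/self->follow/
{
}

syscall::read:return
/self->follow/
{
	self->follow = 0;
	exit(0);		/* stop after the first traced call */
}
```

The predicate on the fbt clause keeps the trace limited to the thread that entered read(2), even though every kernel function entry and return probe is enabled.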
# dtrace -F -s follow.d
dtrace: script './follow.d' matched 38108 probes
CPU FUNCTION
0 -> read32
0 <- read32
0 -> read
0 -> getf
0 -> set_active_fd
0 <- set_active_fd
0 <- getf
...
0 <- ufs_rwlock
0 -> fop_read
0 <- fop_read
0 -> ufs_read
0 -> ufs_lockfs_begin
...
0 -> rdip
0 -> rw_write_held
0 <- rw_write_held
0 -> segmap_getmapflt
0 -> get_free_smp
0 -> grab_smp
0 -> segmap_hashout
...
0 <- sfmmu_kpme_lookup
0 -> sfmmu_kpme_sub
...
0 <- page_unlock
0 <- grab_smp
0 -> segmap_pagefree
0 -> page_lookup_nowait
0 -> page_trylock
...
0 <- segmap_hashin
0 -> segkpm_create_va
0 <- segkpm_create_va
0 -> fop_getpage
0 -> ufs_getpage
0 -> ufs_lockfs_begin_getpage
0 -> tsd_get
...
0 <- page_exists
0 -> page_lookup
0 <- page_lookup
0 -> page_lookup_create
0 <- page_lookup_create
0 -> ufs_getpage_miss
0 -> bmap_read
0 -> findextent
0 <- findextent
0 <- bmap_read
0 -> pvn_read_kluster
0 -> page_create_va
0 -> lgrp_mem_hand
...
0 <- page_add
0 <- page_create_va
0 <- pvn_read_kluster
0 -> pagezero
0 -> ppmapin
0 -> sfmmu_get_ppvcolor
0 <- sfmmu_get_ppvcolor
0 -> hat_memload
0 -> sfmmu_memtte
0 <- sfmmu_memtte
...
0 -> xt_some
0 <- xt_some
0 <- xt_sync
...
0 <- sema_init
0 <- pageio_setup
0 -> lufs_read_strategy
0 -> logmap_list_get
0 <- logmap_list_get
0 -> bdev_strategy
0 -> bdev_strategy_tnf_probe
0 <- bdev_strategy_tnf_probe
0 <- bdev_strategy
0 -> sdstrategy
0 -> getminor
...
0 <- drv_usectohz
0 -> timeout
0 <- timeout
0 -> timeout_common
...
0 <- getminor
0 -> scsi_transport
0 <- scsi_transport
0 -> glm_scsi_start
0 -> ddi_get_devstate
...
0 <- ddi_get_soft_state
0 -> pci_pbm_dma_sync
0 <- pci_pbm_dma_sync
0 <- pci_dma_sync
0 <- glm_start_cmd
0 <- glm_accept_pkt
0 <- glm_scsi_start
0 <- sd_start_cmds
0 <- sd_core_iostart
0 <- xbuf_iostart
0 <- lufs_read_strategy
0 -> biowait
0 -> sema_p
0 -> disp_lock_enter
0 <- disp_lock_enter
0 -> thread_lock_high
0 <- thread_lock_high
0 -> ts_sleep
0 <- ts_sleep
0 -> disp_lock_exit_high
0 <- disp_lock_exit_high
0 -> disp_lock_exit_nopreempt
0 <- disp_lock_exit_nopreempt
0 -> swtch
0 -> disp
0 -> disp_lock_enter
0 <- disp_lock_enter
0 -> disp_lock_exit
0 <- disp_lock_exit
0 -> disp_getwork
0 <- disp_getwork
0 <- disp
0 <- swtch
0 -> resume
0 <- resume
0 -> disp_lock_enter
...
0 <- hat_page_getattr
0 <- segmap_getmapflt
0 -> uiomove
0 -> xcopyout
0 <- xcopyout
0 <- uiomove
0 -> segmap_release
0 -> get_smap_kpm
...
0 <- ufs_imark
0 <- ufs_itimes_nolock
0 <- rdip
...
0 <- cv_broadcast
0 <- releasef
0 <- read
0 -> read
Although more than half of the functions were removed from the
previous output, the example shows that a great many functions are
required to perform a disk file read. Some of the key functions are
described below:
read: read(2) system call entered
ufs_read: UFS file being read
segmap_getmapflt: Find segmap page for the I/O
segmap_pagefree: Free underlying previous physical page tied to this
segmap virtual page onto the cachelist (this policy replaced the old
priority paging)
ufs_getpage: Ask UFS to retrieve the page
page_lookup: First check to see if the page is in memory (it is not)
page_create_va: Get new physical page for the I/O
hat_memload: Map the virtual page to the physical page
xt_some: Issue cross-trap call to some CPUs
sdstrategy: Issue Small Computer System Interface (SCSI) command to read
page from disk into segmap page
timeout: Prepare for SCSI timeout of disk read request
glm_scsi_start: In glm host bus adapter driver
biowait: Wait for block I/O
sema_p: Use semaphore to wait for I/O
ts_sleep: Put timesharing (TS) thread on sleep queue
swtch: Do a context switch (have thread give up the CPU while it waits
for the I/O)
disp_getwork: Find another thread to run while this thread waits for
its I/O
resume: I/O has completed and CPU is returned to resume running
uiomove: Move data from kernel buffer (page) to user-land buffer
segmap_release: Release segmap page for use by another I/O later
read: Read operation ends
You can specify other options to the dtrace(1M) command on this line; be
sure, however, to use only one dash (-) followed by the options, with s
being last:
#!/usr/sbin/dtrace -qvs
You can also specify all options to the dtrace(1M) command by using
#pragma lines inside the D script:
# cat mem2.d
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option verbose

vminfo:::
{
    @[execname,probename] = count();
}

END
{
    printa("%-20s %-15s %@d\n", @);
}
Note: For the list of option names used in #pragma lines, see the Solaris
Dynamic Tracing Guide, part number 817-6223-10.
# ./params.d 1 2 fubar 4 5 6 7 8 9 10 1
name of script: ./params.d
pid of script: 5363
9th arg passed to script: 9
# ./params.d 1 2 3 4 5 6 7 8 9 10 11
^C
The last invocation of the script did not output anything because the
value of the first argument did not match the value of the eleventh
argument. The following invocations show that the type and number of
arguments must match those referenced inside the D script. This is an
example of the error-checking capability of the DTrace facility:
# ./params.d 1 2 3 4 5 6 7 8 9
dtrace: failed to compile script ./params.d: line 5: macro argument $11
is not defined
# ./params.d 1 2 3 4 5 6 7 8 9 10 11 12 13
dtrace: failed to compile script ./params.d: line 12: extraneous argument
'13' ($13 is not referenced)
# ./params.d a b c d e f g h i j k
dtrace: failed to compile script ./params.d: line 5: failed to resolve a:
Unknown variable name
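The macro-argument rules shown by these error messages can be tried with a much smaller script. The following sketch is hypothetical and is not the params.d script; it references $1 as an integer and $$2 as a string, so it must be invoked with exactly two arguments:

```d
#!/usr/sbin/dtrace -qs

/*
 * Sketch only. Usage (hypothetical file name): ./bypid.d pid syscall
 * $1 is substituted as written, so it must be a valid D expression.
 * $$2 is substituted as a quoted string literal.
 */
syscall:::entry
/pid == $1 && probefunc == $$2/
{
	@[execname] = count();
}
```

Passing fewer than two arguments leaves a referenced macro undefined, and passing a third argument leaves one unreferenced; both cases stop the script at compile time, as in the transcripts above.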
# ./args.d
dtrace: failed to compile script args.d: line 10: macro argument $1 is
not defined
# dtrace -x defaultargs -qs args.d
x: 5
name:
# dtrace -x defaultargs -qs args.d 2 3 4
dtrace: failed to compile script args.d: line 20: extraneous argument '4'
($3 is not referenced)
# pgrep vi
2208
# ./syscalls2.d 2208
rexit 1
setpgrp 1
creat 1
getpid 1
open 1
lstat64 1
stat64 1
fdsync 1
unlink 2
close 2
alarm 2
lseek 3
sigaction 5
ioctl 45
read 143
write 178
You can run the same script with a different process name and system
call, which shows the power of being able to pass in arguments to a D
script:
# ./ancestors2.d vi sigaction
process: vi /etc/system
parent: bash
grandparent: -sh
greatgrandparent: /usr/sbin/in.telnetd
greatgreatgrandparent: /usr/lib/inet/inetd start
greatgreatgreatgrandparent: /sbin/init
# cat ancestors3.d
#!/usr/sbin/dtrace -qs

syscall::$2:entry
/execname == $$1/
{
    printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("parent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("grandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent;
}

syscall::$2:entry
/(execname == $$1) && nextpaddr/
{
    printf("greatgreatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
    exit(0);
}
36 ++exec;
37 }
38
39 sysinfo:::readch
40 {
41 rchar = rchar + arg0;
42 }
43
44 sysinfo:::writech
45 {
46 wchar = wchar + arg0;
47 }
48
49 tick-1sec
50 {
51 ++i;
52 }
53
54 tick-1sec
55 /i == $1/
56 {
57 ++n;
58 printf("%10d %10d %10d %10d %10d %10d %10d\n", scall/i,
59 sread/i, swrit/i, fork/i, exec/i, rchar/i, wchar/i);
60 i = 0;
61 scall = 0;
62 sread = 0;
63 swrit = 0;
64 fork = 0;
65 exec = 0;
66 rchar = 0;
67 wchar = 0;
68 }
69
70 tick-1sec
71 /n == $2/
72 {
73 exit(0);
74 }
# ./sar-c.d 5 6
scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
43 0 0 0 0 0 15
70 1 2 0 0 1 32
42 2 2 0 0 2 17
75 0 1 0 0 351 39
436 26 34 3 3 3329 317
38 0 0 0 0 0 15
# cat vm.d
#!/usr/sbin/dtrace -qs
/*
 * Usage: vm.d interval count
 */

BEGIN
{
    printf("%8s %8s %8s\n", "free", "re", "sr");
}

tick-1sec
{
    ++i;
    @free["freemem"] = sum(8*`freemem);
}

vminfo:::pgrec
{
    ++re;
}

vminfo:::scan
{
    ++sr;
}

tick-1sec
/i == $1/
{
    normalize(@free, $1);
    printa("%8@d ", @free);
    printf("%8d %8d\n", re/i, sr/i);
    ++n;
    i = 0;
    re = 0;
    sr = 0;
    clear(@free);
}

tick-1sec
/n == $2/
{
    exit(0);
}
# ./vm.d 5 12
free re sr
385296 0 0
385296 0 0
385296 0 0
385296 0 0
316180 2 0
22297 1 19040
1976 2 31727
1964 3 31727
1971 2 31727
1968 3 31727
1964 3 31727
1955 4 31728
Like the vmstat(1M) command, the vm.d script expects two arguments:
the interval value and a count value. The i, re, sr, and n variables are D
global scalar variables used for counting. Note the special reference to the
kernel's freemem variable: `freemem. The script multiplies `freemem by 8
because it sums in units of kilobytes, not pages, and the assumption is
that a page is 8 Kbytes in size. The script uses the sum() aggregation with
the normalize() built-in function, which divides the current sum by the
interval value to get per-second averages. The script also clears the
running sum of freemem every interval with the clear() built-in
function. The printa() built-in function, which is covered in detail in
Appendix A, prints the value of the sum() aggregation.
Because you are using integer-truncated arithmetic, you can lose some
data. This is also true when using the vmstat(1M) command. For
example, if there are only four page reclaims in the five-second interval,
then the average per second shows as 0. This output shows that the
system is experiencing sustained scanning of memory by the page
daemon, as indicated by the consistently high number of scans per
second. It also shows that someone has used most of the free memory
within a short period of time, which explains the high scan rates.
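The effect of integer truncation described above can be reproduced in isolation. The following sketch uses the figures from the text (four reclaims in a five-second interval) and shows one possible workaround, scaling by 10 before dividing to recover a single decimal digit; the workaround is a suggestion for illustration and is not used by vm.d:

```d
#!/usr/sbin/dtrace -qs

/* Sketch only: D integer division discards the fractional part. */
BEGIN
{
	re = 4;		/* page reclaims counted in the interval */
	i = 5;		/* interval length in seconds */

	/* Plain integer division prints 0 */
	printf("truncated: %d\n", re / i);

	/* Scaling first keeps one decimal digit: prints 0.8 */
	printf("one decimal: %d.%d\n", re / i, (re * 10 / i) % 10);

	exit(0);
}
```

The same scaling idea applies to any of the per-second counters in the script, as long as the scaled value does not overflow the variable's integer type.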
Objectives
Upon completion of this module, you should be able to:
Use DTrace to profile an application
Use DTrace to access application variables
Use DTrace to find transient system call errors in an application
Use DTrace to determine the names of files being opened
Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Additional Resources
Application Profiling
DTrace provides tools for understanding the behavior of user processes. It
can help you to:
Debug applications
Analyze application performance problems
Understand the behavior of a complex application
These tools can be used alone to determine the cause of problems with
application program behavior, or as an adjunct to traditional debugging
tools such as the mdb(1) debugger.
This module describes the DTrace facilities used to trace user process
activity. It also provides examples of how to use those facilities.
The pid provider defines a class of providers; any process can have its
own associated pid provider. You trace a process with process
identification number (PID) 1234, for example, by using the pid1234
provider.
Unlike most other providers, the pid provider creates probes on demand
based on the probe descriptions found in your D programs. As a result,
you do not see any pid probes listed in the output of the dtrace -l
command until you have enabled them. This is shown in the following
example:
# dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
proc
profile
sched
sdt
syscall
sysinfo
vminfo
#
In the following example, you enable all of the function entry probes for
the shell:
# echo $$
8586
# dtrace -n 'pid8586:::entry'
dtrace: description 'pid8586:::entry' matched 6653 probes
^C
You name the object using only the file name portion, not the complete
path name. You can also omit the suffixes. The following names describe
the same probe:
pid8586:libc.so.1:strcmp:entry
pid8586:libc.so:strcmp:entry
pid8586:libc:strcmp:entry
For the executable load object, use either the file name of the executable or
a.out. The following two probe descriptions name the same probe:
pid8586:bash:main:return
pid8586:a.out:main:return
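The naming rule for load objects can be modeled in a few lines of C. This is only a sketch of the rule as stated above (full file name, or file name with trailing suffixes omitted); the function name names_same_object is hypothetical, and this is not DTrace's own matching code:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if spec names the load object: either the full file name
 * or the file name with one or more trailing ".suffix" parts omitted.
 * An empty module field, which DTrace treats as a wildcard, is not
 * modeled here and is reported as no match.
 */
bool names_same_object(const char *object, const char *spec)
{
    size_t n = strlen(spec);

    if (n == 0 || strncmp(object, spec, n) != 0)
        return false;
    /* The match must end at the end of the name or at a '.' boundary. */
    return object[n] == '\0' || object[n] == '.';
}
```

Under this model, libc.so.1, libc.so, and libc all name the libc.so.1 object, while lib does not.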
# echo $$
8567
# dtrace -n 'pid8567:libc:strcmp:entry'
dtrace: description 'pid8567:libc:strcmp:entry' matched 1 probe
CPU ID FUNCTION:NAME
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
0 45136 strcmp:entry
The simplest mode of operation for the pid provider is as the user-level
analogue to the fbt provider. The following example traces all function
entries and returns made from a given function. The tracecalls.d D
script takes two command-line arguments: $1 for the PID of the process
being traced, and $2 for the function name from which you want to trace
all function calls. The simple C program that the script is going to trace is
shown below. This C program calls one function after another, performing
simple arithmetic operations:
# cat -n calls.c
1 int f5(int a, int b)
2 {
3 return (a+b);
4 }
5
6 int f4(int a, int b)
7 {
8 int r;
9
10 r = f5(a,b)+13;
11 return(r);
12 }
13
14 int f3(int a)
15 {
16 int r;
17
18 usleep(650);
19 r = f4(a-3, a+3);
20 return(r);
21 }
22
23 int f2(int a)
24 {
25 return(f3(5*a));
26 }
27
28 int f1(int a, int b)
29 {
30 int r;
31
32 usleep(90);
33 r = f2(a-b);
34 return(r);
35 }
36
37 main()
38 {
39 int x;
40
41 x = f1(13,6);
42 printf("%d\n", x);
43 x = f1(17,5);
44 printf("%d\n", x);
45 }
# gcc calls.c -o calls
# calls
83
133
# cat -n tracecalls.d
1 #!/usr/sbin/dtrace -s
2
3 pid$1:calls:$2:entry
4 {
5 self->trace = 1;
6 }
7
8 pid$1:calls:$2:return
9 /self->trace/
10 {
11 self->trace = 0;
12 }
13
14 pid$1:calls::entry,
15 pid$1:calls::return
16 /self->trace/
17 {
18 }
You start the calls application in a second window through the mdb(1)
debugger. This enables you to stop it as soon as possible in the start-up
function that calls the main() function. The _start:b command sets a
breakpoint in the _start function where the application starts running.
The :r command starts the process running; it immediately hits the
breakpoint and stops. You then escape from the debugger by using the
!ps command to find the PID of the calls process:
# mdb calls
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start: clr %fp
> !ps
PID TTY TIME CMD
8916 pts/3 0:00 ps
8914 pts/3 0:00 calls
8586 pts/3 0:01 bash
8915 pts/3 0:00 sh
8580 pts/3 0:00 sh
8913 pts/3 0:00 mdb
You can now run the dtrace command in the first terminal window to
trace the function calls, starting with the f1 function. You must also
continue the process with the :c mdb command after starting the dtrace
command:
# dtrace -F -s tracecalls.d 8914 f1
dtrace: script 'tracecalls.d' matched 16 probes
By adding a line to the tracecalls.d script, you can print the arguments
to the functions as well as return value information. Arguments to
functions are represented with arg0, arg1, arg2, and so on. The function
return value is placed in the arg1 argument, with the arg0 argument
containing the offset within the function where the return occurred. The
following D script example prints the arguments to functions:
# cat -n tracecalls2.d
1 #!/usr/sbin/dtrace -s
2
3 pid$1:calls:$2:entry
4 {
5 self->trace = 1;
6 }
7
8 pid$1:calls:$2:return
9 /self->trace/
10 {
11 self->trace = 0;
12 }
13
14 pid$1:calls::entry,
15 pid$1:calls::return
16 /self->trace/
17 {
18 printf("%d %d", arg0, arg1);
19 }
The following commands are entered in the mdb(1) window which started
the calls program. On return from a function, the arg0 argument is the
offset within the function where the restore instruction executed to leave
the function, and the arg1 argument is the return value, as follows:
> f5+0t40/i
f5+0x28:
f5+0x28: restore
> f5+0x24,2/i
f5+0x24:
f5+0x24: ret
f5+0x28: restore
> f2+0t48,2/i
f2+0x30:
f2+0x30: ret
f2+0x34: restore
>
30
31 r = f2(a-b);
32 return(r);
33 }
34
35 main()
36 {
37 int x;
38
39 x = f1(13,6);
40 printf("%d\n", x);
41 }
# cat -n traceall.d
1 #!/usr/sbin/dtrace -qs
2 #pragma D option flowindent
3
4 pid$1::$2:entry
5 {
6 self->trace = 1;
7 }
8
9 pid$1:::entry, pid$1:::return, fbt:::
10 /self->trace/
11 {
12 printf("%s\n", curlwpsinfo->pr_syscall ? "K" : "U");
13 }
14
15 pid$1::$2:return
16 /self->trace/
17 {
18 self->trace = 0;
19 }
The traced calls follow. Many of the function calls are for setting up the
dynamic binding to the library functions on first call. The following
example shows a portion of the output of this script:
# traceall.d 12861 main
CPU FUNCTION
0 -> main U
0 -> f1 U
0 -> f2 U
0 -> f3 U
0 -> f4 U
0 -> f5 U
0 <- f5 U
0 <- f4 U
0 <- f3 U
0 <- f2 U
0 <- f1 U
0 -> elf_rtbndr U
0 -> elf_bndr U
0 -> enter U
0 -> rt_bind_guard U
0 <- rt_bind_guard U
0 -> _ti_bind_guard U
0 <- _ti_bind_guard U
0 -> rt_mutex_lock U
0 <- rt_mutex_lock U
0 -> _lwp_mutex_lock U
0 <- _lwp_mutex_lock U
0 <- enter U
0 -> lookup_sym U
0 -> elf_hash U
0 <- elf_hash U
0 -> callable U
0 <- callable U
0 -> elf_find_sym U
0 -> strcmp U
...
0 <- elf_bndr U
0 <- elf_rtbndr U
0 -> printf U
0 -> _flockget U
0 -> mutex_lock U
0 <- mutex_lock U
0 -> mutex_lock_impl U
0 <- mutex_lock_impl U
0 <- _flockget U
0 -> _setorientation U
0 <- _setorientation U
0 -> _ndoprnt U
0 -> elf_rtbndr U
0 -> elf_bndr U
0 -> enter U
0 -> rt_bind_guard U
...
0 -> _write U
0 -> pre_syscall K
0 -> syscall_mstate K
0 <- syscall_mstate K
0 <- pre_syscall K
0 -> write32 K
0 <- write32 K
0 -> write K
0 -> getf K
0 -> set_active_fd K
...
0 <- clear_active_fd K
0 -> cv_broadcast K
0 <- cv_broadcast K
0 <- releasef K
0 <- write K
0 -> post_syscall K
0 -> clear_stale_fd U
0 <- clear_stale_fd U
0 -> syscall_mstate U
0 <- syscall_mstate U
0 <- post_syscall U
0 <- _xflsbuf U
0 -> ferror_unlocked U
0 <- ferror_unlocked U
0 <- _ndoprnt U
0 -> ferror_unlocked U
0 <- ferror_unlocked U
0 -> mutex_unlock U
0 <- mutex_unlock U
0 <- printf U
0 <- main U
^C
You can use the pid provider to trace any instruction in any user function.
Upon demand, the pid provider creates a probe for every instruction in a
function. The name of each probe is the offset in hexadecimal of the
corresponding instruction in the function. The following example traces
the instruction 10 (hexadecimal) bytes into the strcmp function while the
bash shell runs the date(1) command:
# dtrace -n 'pid28845:libc:strcmp:10'
dtrace: description 'pid28845:libc:strcmp:10' matched 1 probe
CPU ID FUNCTION:NAME
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
^C
You see this instruction near the beginning of the strcmp C library
function, where it is executed 14 times when the bash shell runs the date(1)
command. You can see which instructions within the strcmp C library
function are executed by tracing all of the function's instructions, as
follows:
# dtrace -n 'pid28845:libc:strcmp:'
dtrace: description 'pid28845:libc:strcmp:' matched 128 probes
CPU ID FUNCTION:NAME
0 39494 strcmp:entry
0 39495 strcmp:0
0 39496 strcmp:4
0 39497 strcmp:8
0 39498 strcmp:c
0 39492 strcmp:10
0 39499 strcmp:14
0 39500 strcmp:18
0 39511 strcmp:44
0 39512 strcmp:48
0 39513 strcmp:4c
0 39582 strcmp:160
0 39583 strcmp:164
0 39584 strcmp:168
0 39585 strcmp:16c
0 39586 strcmp:170
0 39587 strcmp:174
0 39588 strcmp:178
0 39589 strcmp:17c
0 39597 strcmp:19c
0 39598 strcmp:1a0
0 39599 strcmp:1a4
0 39600 strcmp:1a8
0 39601 strcmp:1ac
0 39602 strcmp:1b0
0 39603 strcmp:1b4
0 39604 strcmp:1b8
0 39605 strcmp:1bc
0 39606 strcmp:1c0
0 39607 strcmp:1c4
0 39618 strcmp:1f0
0 39619 strcmp:1f4
0 39493 strcmp:return
0 39494 strcmp:entry
0 39495 strcmp:0
0 39496 strcmp:4
# ./timespent.d 8950
^C
...
usleep
value ------------- Distribution ------------- count
1048576 | 0
2097152 |@@@@@@@@@@ 1
4194304 |@@@@@@@@@@ 1
8388608 |@@@@@@@@@@@@@@@@@@@@ 2
16777216 | 0
...
f4
value ------------- Distribution ------------- count
16384 | 0
32768 |@@@@@@@@@@@@@@@@@@@@ 1
65536 |@@@@@@@@@@@@@@@@@@@@ 1
131072 | 0
...
f1
value ------------- Distribution ------------- count
4194304 | 0
8388608 |@@@@@@@@@@@@@@@@@@@@ 1
16777216 |@@@@@@@@@@@@@@@@@@@@ 1
33554432 | 0
...
main
value ------------- Distribution ------------- count
16777216 | 0
33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
67108864 | 0
^C
Currently you cannot specify a time interval less than 200 microseconds
with the profile provider, as the following example shows:
# dtrace -q -n 'profile-199us {printf("%d\n", timestamp)}'
dtrace: invalid probe specifier profile-199us {printf("%d\n",
timestamp)}: probe description :::profile-199us does not match any probes
# dtrace -q -n 'profile-200us {printf("%d\n", timestamp)}'
275328143837997
275328144030602
275328144229696
275328144431022
^C
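The n part of a profile-n probe name is either a rate per second (a bare number or an hz suffix) or a time interval with a unit suffix. The following C sketch converts it to nanoseconds so the two forms can be compared. The function name interval_ns is hypothetical, and only a subset of the suffixes is handled:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Converts the n part of a profile-n probe name into an interval in
 * nanoseconds. A bare number or an "hz" suffix is a rate per second.
 * Only the ns, us, ms, and sec interval suffixes are handled in this
 * sketch. Returns -1 for anything unrecognized.
 */
long long interval_ns(const char *n)
{
    char *unit;
    long long value = strtoll(n, &unit, 10);

    if (unit == n || value <= 0)
        return -1;
    if (*unit == '\0' || strcmp(unit, "hz") == 0)
        return 1000000000LL / value;
    if (strcmp(unit, "ns") == 0)
        return value;
    if (strcmp(unit, "us") == 0)
        return value * 1000LL;
    if (strcmp(unit, "ms") == 0)
        return value * 1000000LL;
    if (strcmp(unit, "sec") == 0)
        return value * 1000000000LL;
    return -1;
}
```

For example, interval_ns("199us") is 199000, which falls below the 200000-nanosecond floor demonstrated in the session above, while interval_ns("200us") is exactly 200000.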
The following D script samples 109 times per second to see which
processes are running. The count indicates which processes have run the
most often during the interval that the script runs:
# cat -n running.d
1 #!/usr/sbin/dtrace -qs
2
3 profile-109
4 /pid != 0/
5 {
6 @[pid, execname] = count();
7 }
8
9 END
10 {
11 printf("%-8s %-40s %s\n", "PID", "CMD", "COUNT");
12 printa("%-8d %-40s %@d\n", @);
13 }
# ./running.d
^C
PID CMD COUNT
9190 grep 1
9191 bash 1
9190 bash 1
9189 bash 1
9188 uptime 2
8586 bash 2
9191 vi 12
3 fsflush 24
9192 find 80
# ./profilepri.d 8586
^C
bash
value ------------- Distribution ------------- count
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@ 271
10 |@@@@@@ 63
20 |@@@@ 48
30 |@@@ 32
40 |@ 15
50 |@@ 24
60 | 0
In the following example, you see the results of running the next
invocation of the script when the shell is running in its more normal mode
of executing a few interactive commands:
# ./profilepri.d 8586
^C
bash
value ------------- Distribution ------------- count
30 | 0
40 |@@@@@@@@@@@@@ 1
50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
60 | 0
This shows that the shell's priority is higher when run interactively, where
it spends most of its time waiting on input; the small counts indicate that
it was not running frequently.
Like profile-n probes, tick-n probes fire at a fixed interval at high
interrupt level. However, the tick-n probes fire on only one CPU per
interval, rather than on every CPU like the profile-n probes. These
probes should not be used to profile an application because it may run
on any CPU at any instant in time. You specify the n suffix just as you do
for the profile-n probes. For example, tick-20ms fires every 20
milliseconds, but only on one CPU. One use of the tick-n probes is to
provide periodic output or to take periodic action. You saw this usage in
Module 2 with the custom monitoring tools.
You can use the arguments to the profile probes to determine if the
executing thread is currently in kernel mode and, if it is not, where within
its process address space it is executing when the probe fires. The
program counter (PC) register's value is made available when the
profile probes fire. The arguments are set as follows:
The arg0 argument: The PC register value in the kernel at the time
the probe fired, or 0 if the current thread was not executing in the
kernel at the time that the probe fired
The arg1 argument: The PC register value in the user-level process
at the time the probe fired, or 0 if the current thread was executing in
the kernel at the time the probe fired
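The way these two arguments encode the execution mode can be sketched in C as follows. The function names sample_mode and sample_pc are hypothetical and serve only to illustrate the rule stated above:

```c
#include <stdint.h>

/*
 * Classifies one profile sample. A nonzero arg0 means the thread was
 * executing in the kernel; otherwise it was executing in user mode.
 */
const char *sample_mode(uint64_t arg0, uint64_t arg1)
{
    (void)arg1; /* not needed to decide the mode */
    return arg0 != 0 ? "kernel" : "user";
}

/* Returns whichever PC value applies to the sample. */
uint64_t sample_pc(uint64_t arg0, uint64_t arg1)
{
    return arg0 != 0 ? arg0 : arg1;
}
```

A sample with arg0 equal to 0 and a nonzero arg1 is a user-mode sample whose PC is arg1; a sample with a nonzero arg0 is a kernel-mode sample.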
You can learn whether your application is executing within its own
process address space or within the kernel space by using the arg0 and
arg1 arguments, which are set when the profile probes fire. The following
D script samples the PC slightly faster than every millisecond. The script
runs for 10 seconds on a compute-bound application. It also shows how
many time intervals, out of the total that occurred in 10 seconds, the
application used:
# cat -n profile.d
1 #!/usr/sbin/dtrace -qs
2
3 profile-1009
4 {
5 ++t;
6 }
7
8 profile-1009
9 /pid == $1/
10 {
11 @pc[arg1] = count();
12 @mode[arg0 ? "kernel" : "user"] = count();
13 ++n;
14 }
15
16 tick-10sec
17 /n/
18 {
19 printa("%-10x\t%@u\n", @pc);
20 printf("Total: %u out of %u\n", n, t);
21 exit(0);
22 }
# ./profile.d 9240
ff3163ac 1
0 5
107f8 60
10810 60
10710 64
1084c 64
10754 65
10734 66
10824 69
1083c 69
10738 69
1081c 71
10820 73
106f4 75
10730 75
10728 76
10744 77
10814 77
1074c 79
1074c 79
106e4 79
106d8 79
1075c 80
10770 80
10828 80
1072c 82
10760 83
106f0 86
10758 86
106dc 87
106d4 88
106d0 92
ff2a11e8 132
10764 134
20ac8 137
20acc 141
ff2a11ec 142
10840 144
20ac4 147
10834 172
106cc 306
106e0 562
10714 611
107fc 623
ff2a11e4 716
ff2a11e0 3723
Total: 9887 out of 10002
kernel 5
user 9882
In the previous example, the high count in user mode versus kernel mode
indicates that this process is compute-bound. By using the mdb(1)
debugger as shown in the following example, you can tell where the
process is spending most of its time:
> ff2a11e0/i
libc.so.1`.umul:
libc.so.1`.umul:umul %o0, %o1, %o0
> ff2a11e4/i
libc.so.1`.umul+4: rd %y, %o1
> 107fc/i
mod+0x34: cmp %o0, %o1
> 10714/i
prod+0x1c: cmp %o0, %o1
> 106e0/i
sum+0x14: add %o0, %o1, %o0
This output shows that this process spent most of its time in the C library
multiply function: .umul. It spent most of the remaining time in its own
mod, prod, and sum functions. The programmer should investigate
compiler options to have the multiplication occur with hardware
instructions instead of in software. This program was compiled with the
gcc compiler with no optimizations.
You can use the timespent2.d D script to obtain a graph of the time
spent in each function of this process. A special macro, $target, is set to
the process ID of the application that is started for you with the -c option
to the dtrace(1M) command. The command after the -c must be quoted
if it contains arguments:
# cat -n timespent2.d
1 #!/usr/sbin/dtrace -qs
2
3 pid$target:::entry
4 {
5 self->t[probefunc] = timestamp;
6 }
7
8 pid$target:::return
9 /self->t[probefunc]/
10 {
11 this->elapsed = timestamp - self->t[probefunc];
12 @[probefunc] = quantize(this->elapsed);
13 self->t[probefunc] = 0; /* frees memory */
14 }
# dtrace -s timespent2.d -c ./pgm
dtrace: script 'timespent2.d' matched 5836 probes
^C
...
.rem
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15
16384 | 0
memchr
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@ 5
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 11
32768 | 0
.div
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@ 5
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 24
16384 |@ 1
32768 | 0
mutex_lock
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
32768 | 0
...
sum
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13986319
16384 | 15890
32768 | 419
65536 | 14174
131072 | 426
262144 | 282
524288 | 59
1048576 | 57
2097152 | 24
...
prod
value ------------- Distribution ------------- count
17179869184 | 0
34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14
68719476736 |@@@ 1
137438953472 | 0
...
.umul
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 27699230
16384 | 37144
32768 | 943
65536 | 30290
131072 | 864
262144 | 579
524288 | 111
1048576 | 157
2097152 | 74
4194304 | 3
This output shows that the process is spending an average of only 8 to 16
microseconds in both the sum and the .umul functions, but they are being
called significantly more often than the other functions. The process spent
between 34 and 68 seconds in the prod function on 14 of the times that it
was called, and between 68 and 137 seconds the other time it was called.
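The quantize() rows in this output are power-of-two buckets: each row label is the lower bound of its bucket, so the 8192 row counts elapsed times from 8192 through 16383 nanoseconds. The following C sketch computes the row for a positive value; the function name bucket_floor is ours, and values of zero or less are not modeled:

```c
#include <stdint.h>

/*
 * Lower bound of the power-of-two bucket for a positive value: the
 * largest power of two that is less than or equal to the value.
 * Returns 0 for values of 0 or less, which this sketch does not model.
 */
int64_t bucket_floor(int64_t value)
{
    int64_t bound = 1;

    if (value <= 0)
        return 0;
    /* Comparing against value / 2 avoids overflowing bound. */
    while (bound <= value / 2)
        bound *= 2;
    return bound;
}
```

A 12-microsecond call (12000 nanoseconds) lands in the 8192 row, and a 40-second call (40000000000 nanoseconds) lands in the 34359738368 row.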
Application Variables
Accessing process address space information is more difficult than
accessing kernel information because DTrace actions run in the kernel.
Therefore, to access process data such as application variables or system
call argument strings (for example, path names), you must copy the
information from the process address space to the kernel. DTrace provides
two built-in functions to accomplish this:
void *copyin(uintptr_t addr, size_t size)
The copyin function copies the specified size in bytes from the
specified user address into a DTrace scratch buffer and returns the
address of this buffer. The user address is interpreted as being within
the address space of the process associated with the currently
running thread when the probe fires.
string copyinstr(uintptr_t addr)
The copyinstr function copies a null-terminated C string from the
specified user address into a scratch buffer and returns its address.
18
19 int f3(int a)
20 {
21 int r;
22
23 usleep(650);
24 r = f4(a-3, a+3);
25 z = r*y;
26 return(r);
27 }
28
29 int f2(int a)
30 {
31 return(f3(5*a));
32 }
33
34 int f1(int a, int b)
35 {
36 int r;
37
38 usleep(90);
39 r = f2(a-b);
40 y = z*r;
41 return(r);
42 }
43
44 main()
45 {
46 int x;
47
48 x = f1(13,6);
49 printf("x=%d y=%d z=%d\n", x, y, z);
50 x = f1(17,5);
51 printf("x=%d y=%d z=%d\n", x, y, z);
52 }
# calls3
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
The nm(1) command is used to display the symbol table entry for the z
variable in the calls3 executable file.
# /usr/ccs/bin/nm calls3 | grep '|z$'
[70] | 133952| 4|OBJT |GLOB |0 |16 |z
The libvars.d D script was run while the bash shell performed the
following loop:
# while :; do cd /fubar; done
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory
bash: cd: /fubar: No such file or directory
This shows that the first errno at address 0xff3ee670 is the one set as a
result of the cd command failing in the bash shell. The No such file
or directory error message corresponds to an errno value of 2.
The next example monitors readers/writer lock activity for the vold
process. The -p option to dtrace(1M) attaches to a running process and
sets the $target macro to its PID:
# pgrep vold
1098
# dtrace -n 'plockstat$target:::rw* {trace(timestamp)}' -p 1098
dtrace: description 'plockstat$target:::rw* ' matched 11 probes
CPU ID FUNCTION:NAME
0 51474 rwlock_lock:rw-block 1529287107214473
0 51494 rwlock_lock:rw-acquire 1529287107231728
When a system call returns -1, the C library interface sets a global user
variable named errno to a positive error code, as shown in the following
example. These errno values are documented in the Intro(2) manual
page and in the /usr/include/sys/errno.h header file.
# cat -n errno.d
1 #!/usr/sbin/dtrace -qs
2 syscall:::return
3 /arg0 == -1 && pid != $pid/
4 {
5 printf("%-20s %-10s %d\n",execname,probefunc,errno);
6 }
# ./errno.d
svc.startd portfs 62
nscd lwp_park 62
fmd lwp_park 62
svc.startd portfs 62
svc.startd portfs 62
bash stat64 2
bash chdir 2
bash chdir 2
bash stat64 2
nscd lwp_kill 3
find open 2
find stat 2
bash setpgrp 13
bash waitsys 10
date open 2
date stat 2
ls open 2
ls stat 2
bash setpgrp 13
bash waitsys 10
nscd lwp_kill 3
^C
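The chdir lines with an errno value of 2 in the output above correspond to what the application itself sees after the failed call. A minimal C sketch of that application-side view follows; the function name chdir_errno is hypothetical:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Attempts chdir(2) to path. Returns 0 on success, or the errno value
 * set by the failed call.
 */
int chdir_errno(const char *path)
{
    errno = 0;
    if (chdir(path) == -1)
        return errno;
    return 0;
}
```

Calling chdir_errno() on a directory that does not exist returns ENOENT, whose value is 2 (No such file or directory).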
# ./errno2.d
bash setpgrp 13
libc.so.1`_syscall6+0x1c
35c6c
34fa8
bash`execute_command_internal+0x414
bash`execute_command+0x50
bash`reader_loop+0x220
bash`main+0x90c
bash`_start+0x108
svc.startd portfs 62
libc.so.1`_portfs+0x4
svc.startd`wait_thread+0x30
libc.so.1`_lwp_start
svc.startd portfs 62
libc.so.1`_portfs+0x4
svc.startd`wait_thread+0x30
libc.so.1`_lwp_start
bash waitsys 10
libc.so.1`_waitid+0x8
libc.so.1`waitpid+0x60
410a0
41004
libc.so.1`__sighndlr+0xc
libc.so.1`call_user_handler+0x3b8
libc.so.1`__lwp_sigmask+0x30
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20
bash`make_child+0x254
35c6c
34fa8
bash`execute_command_internal+0x414
bash`execute_command+0x50
bash`reader_loop+0x220
bash`main+0x90c
bash`_start+0x108
bash stat64 2
libc.so.1`stat64+0x4
bash`sh_canonpath+0x258
63638
bash`cd_builtin+0x364
352a0
35a8c
34fc8
bash`execute_command_internal+0x414
bash`execute_command+0x50
bash`reader_loop+0x220
bash`main+0x90c
bash`_start+0x108
find open 2
ld.so.1`__open+0x4
ld.so.1`elf_config+0x120
ld.so.1`setup+0xc20
ld.so.1`_setup+0x37c
ld.so.1`_rt_boot+0x88
Hexadecimal addresses are shown on the stack trace output when the
dtrace command cannot resolve the PC value to a symbol. To find what
transient system call errors are occurring in a specific application and
where, you simply change the errno2.d script to pass in the PID of the
application.
You can obtain more details on the unknown process by using the
following command:
# prstat -m -p 12663
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
12663 root 43 57 0.0 0.0 0.0 0.0 0.0 0.3 0 129 .12 0 unknown/1
read 940592
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
^C
# grep 89 /usr/include/sys/errno.h
/* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
* (c) 1983,1984,1985,1986,1987,1988,1989 AT&T.
#define ENOSYS 89 /* Unsupported file system operation */
# pkill unknown
You can again get details on what system calls are being made, as follows:
# dtrace -n 'syscall:::entry /pid == 12745/ { @syscalls[probefunc] = count();}'
dtrace: description 'syscall:::entry ' matched 225 probes
^C
stat 6
open 6
write 6
close 6
read 760747
# truss -p 12745
read(3, "\b", 1) = 1
read(3, "92", 1) = 1
read(3, "10", 1) = 1
read(3, "\0", 1) = 1
read(3, "14", 1) = 1
read(3, " @", 1) = 1
read(3, "\0", 1) = 1
read(3, "82", 1) = 1
read(3, " #", 1) = 1
read(3, "90", 1) = 1
^C
This application appears to be reading all of the files under the /usr/lib
directory one byte at a time. The programmer might not realize that using
the standard I/O library functions to buffer reads is more efficient than
issuing system call reads of one character at a time. The OS is reading the
disk in blocks, as the iosnoop.d D script output indicates, but the
application is only extracting the information from the kernel buffers one
byte at a time.
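The cost difference can be seen by counting read(2) calls for different buffer sizes. The following C sketch does this directly with read(2); the function name count_reads is hypothetical. The standard I/O library achieves a similar effect by reading into a larger internal buffer on the application's behalf:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Reads the file at path to end-of-file using read(2) with a buffer of
 * bufsize bytes. Returns the number of read(2) calls that returned
 * data, or -1 on error.
 */
long count_reads(const char *path, size_t bufsize)
{
    char *buf;
    long calls = 0;
    ssize_t n;
    int fd;

    if (bufsize == 0 || (buf = malloc(bufsize)) == NULL)
        return -1;
    if ((fd = open(path, O_RDONLY)) == -1) {
        free(buf);
        return -1;
    }
    while ((n = read(fd, buf, bufsize)) > 0)
        calls++;
    close(fd);
    free(buf);
    return n == -1 ? -1 : calls;
}
```

For a 10000-byte regular file, a 1-byte buffer needs 10000 data-returning read(2) calls, while an 8192-byte buffer needs 2.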
Open Files
In this section you learn how to display the path names of files being
opened. Note that in DTrace it is more difficult to display pointer
arguments passed to system calls than integer arguments.
Examples of system calls that take pointer arguments are open(2), stat(2),
unlink(2), and chmod(2), each of which takes a path name string argument.
There are also system calls that pass the address of a structure, for
example, sigaction(2). You must use the appropriate copyinstr() and
copyin() built-in functions to display the actual strings or structures being
passed to the kernel.
You might try to display these strings using the following D script:
# cat -n write.d
1 #!/usr/sbin/dtrace -s
2
3 syscall::write:entry
4 /pid == $target/
5 {
6 printf("%s\n", stringof(arg1));
7 }
# dtrace -s write.d -c writemsg
dtrace: script 'write.d' matched 1 probe
This is some text being written to standard output to prove a point
dtrace: pid 1532 exited with status 1
The arg1 argument used in the write.d D script is the second argument
to the write(2) system call, which in this case is the address of the string
you want to display. It is a process address, however, and DTrace is
running the action statements in the kernel's address space. The
stringof() built-in function only converts the write(2) system call argument
to the string type; it does not copy the data out of the process. For the
script to work, you must use the copyinstr() or copyin() built-in DTrace
functions shown previously. The following example shows the correct way to
access the process's string arguments:
# cat -n write2.d
1 #!/usr/sbin/dtrace -s
2
3 syscall::write:entry
4 /pid == $target/
5 {
6 printf("%s\n", copyinstr(arg1));
7 }
# dtrace -s write2.d -c writemsg
dtrace: script 'write2.d' matched 1 probe
This is some text being written to standard output to prove a point
dtrace: pid 1537 exited with status 1
CPU ID FUNCTION:NAME
0 12 write:entry This is some text being
6 printf("%s\n", copyinstr(arg1));
7 }
# ./write3.d
dtrace: script './write3.d' matched 1 probe
CPU ID FUNCTION:NAME
ore--ion, name)
iption specifiers (provider, module, func-
e
describes how to use
4maction]]
You received garbage output because the write(2) system call does not
necessarily write out null-terminated strings. The copyin() built-in function is
more appropriate here because it lets you specify the size of the write:
# cat -n write4.d
1 #!/usr/sbin/dtrace -s
2
3 syscall::write:entry
4 /pid != $pid/
5 {
6 printf("%s\n", stringof(copyin(arg1, arg2)));
7 }
# ./write4.d
dtrace: script './write4.d' matched 1 probe
CPU ID FUNCTION:NAME
0 914 write:entry p
0 914 write:entry w
0 914 write:entry d
0 914 write:entry
0 914 write:entry d
0 914 write:entry a
0 914 write:entry t
0 914 write:entry e
0 914 write:entry
^C
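The same pitfall exists in ordinary C: a buffer passed to write(2) carries a length, not a terminator, so printing it as a string requires copying exactly that many bytes. The following sketch is a C analogy to the sized copy above; the function name sized_copy is hypothetical and it is not a model of DTrace internals:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copies exactly len bytes from buf and appends a terminating null, so
 * the result is safe to print with %s even when buf itself has no
 * terminator. The caller frees the result. Returns NULL on failure.
 */
char *sized_copy(const void *buf, size_t len)
{
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, buf, len);
    out[len] = '\0';
    return out;
}
```

Given the four unterminated bytes d, a, t, e and a length of 4, sized_copy() returns the string date; with a length of 1 it returns d, which matches the one-character writes in the output above.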
The following example shows how to detect when an open(2) system call
fails and how to display the information needed to determine the
problem:
# cat -n failedopen.d
1 #!/usr/sbin/dtrace -qs
2
3 syscall::open*:entry
4 /pid == $1/
5 {
6 self->path = copyinstr(arg0);
7 self->entry = 1;
8 }
9
10 syscall::open*:return
11 /self->entry && arg0 == -1/
12 {
13 printf("open for '%s' failed, errno=%d", self->path, errno);
14 ustack();
15 self->entry = 0;
16 }
# failedopen.d 13026
open for '/usr/openwin/lib/X11/XtErrorDB' failed, errno=2
febbcf78
febb05a0
fec97b38
fec97a78
fedbbffc
fedbbeac
fedbbe40
fedc0220
fedc037c
fed8fb6c
fed8f2f8
fed8f290
cf3f8
3f648
d1c98
5c658
^C
A breakpoint was set on the C library open function and the dtmail
utility was continued in the debugger to hit the breakpoint. The $c mdb
command was used to display the stack trace symbolically after the
breakpoint hit.
The next example shows the failedopen2.d D script run on the cat(1)
command while it opens a non-existent file. This script assumes that
dtrace(1M) will start the command.
# cat -n failedopen2.d
1 #!/usr/sbin/dtrace -qs
2
3 syscall::open*:entry
4 /pid == $target/
5 {
6 self->path = copyinstr(arg0);
7 self->entry = 1;
8 }
9
10 syscall::open*:return
Objectives
Upon completion of this module, you should be able to:
Use DTrace to access kernel variables
Use DTrace to obtain information about read calls
Use DTrace to perform anonymous tracing
Use DTrace to perform speculative tracing
Explain the privileges necessary to run DTrace operations
To display the value of this variable, you can write the D statement:
printf("%x\n", `kmem_flags);
DTrace associates each kernel symbol with the type used for it in the
operating system C code, providing source-based access to the native
operating system data structures. Because kernel symbol names are kept
in a separate namespace from D variables and function identifiers, naming
conflicts are not an issue.
When you prefix a variable with a backquote, the D compiler searches the
known kernel symbols in order, using the list of loaded modules to find a
matching variable definition. Because the Solaris OS kernel supports
dynamically loaded modules with separate symbol namespaces, the same
variable name or function name can be used more than once in the kernel.
You resolve this conflict by preceding the variable or function name with
the kernel module name and the backquote character as a separator. For
example, you refer to the _init(9E) function in the sd module as follows:
sd`_init
You can apply any of the D operators to external kernel variables, except
those that modify values. When you launch DTrace, the D compiler loads
the set of variable names corresponding to active kernel modules, so
declarations of these variables are not required.
# ./monitor.d
Processes Threads Free Memory
41 232 322mb
42 232 306mb
41 232 322mb
53 242 320mb
47 249 251mb
41 232 252mb
41 232 252mb
41 232 232mb
41 232 111mb
47 235 110mb
47 241 110mb
The first two structures are part of the proc(4) interface and are used by
commands like ps(1) and prstat(1M). These variables provide access to
kernel state information at the time any probe fires. The following
examples define the data structures.
Another built-in D variable that is set when a probe fires is the curthread
variable, which you used in the ancestry.d D script in Module 2. The
curthread variable points to the kthread_t kernel structure of the
currently running thread. Using the curthread pointer to access
information in the kthread_t structure (or most other kernel data
structures) provides a less stable interface than using the lwpsinfo_t and
psinfo_t structures. The reason for this is that the psinfo_t and
lwpsinfo_t structures are abstractions of process and thread information
as advertised by the proc(4) interface. In contrast, curthread gets at the
actual kernel implementation of this information which may change. For
more details on the stability of DTrace interfaces, see the Solaris Dynamic
Tracing Guide, part number 817-6223-10. The dtrace(1M) command has a
-v option that will tell you the stability of a D program.
# ./ps.d bdev_strategy
TID PID PPID UID PRI COMMAND
1 4640 4639 0 55 find / -type f
1 4640 4639 0 55 find / -type f
1 4698 4641 0 51 file /var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-root/install/
1 4640 4639 0 55 find / -type f
1 4698 4641 0 51 file /var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-root/install/
^C
# ps.d nanosleep
TID PID PPID UID PRI COMMAND
11 279 1 0 59 /usr/sbin/nscd
12 279 1 0 59 /usr/sbin/nscd
21 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
17 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd
13 279 1 0 59 /usr/sbin/nscd
1 2120 2119 0 59 sleep 5
12 279 1 0 59 /usr/sbin/nscd
11 279 1 0 59 /usr/sbin/nscd
13 279 1 0 59 /usr/sbin/nscd
14 279 1 0 59 /usr/sbin/nscd
15 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd
17 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
21 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
TID PID PPID UID PRI COMMAND
17 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd
^C
Note: Refer to the Solaris Dynamic Tracing Guide for details on the probes
provided by the sched provider.
The following D script uses the on-cpu sched probe to display the name
of the executable process starting to run on a CPU and the priority of its
thread:
# cat -n start2run.d
1 #!/usr/sbin/dtrace -qs
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 printf("Thread %d from: %s starting on CPU %d at priority %d\n",
7 curlwpsinfo->pr_lwpid, curpsinfo->pr_psargs, curcpu->cpu_id,
8 curlwpsinfo->pr_pri);
9 }
# ./start2run.d
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 1 from: bash starting on CPU 0 at priority 59
Thread 1 from: bash starting on CPU 2 at priority 49
Thread 1 from: pgm starting on CPU 1 at priority 49
Thread 1 from: pgm starting on CPU 1 at priority 29
Thread 1 from: pgm starting on CPU 1 at priority 29
Thread 1 from: pgm starting on CPU 1 at priority 19
Thread 1 from: pgm starting on CPU 1 at priority 9
Thread 1 from: pgm starting on CPU 1 at priority 9
Thread 1 from: pgm starting on CPU 1 at priority 0
Thread 6 from: /lib/svc/bin/svc.startd starting on CPU 0 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 2 at priority 59
Thread 2 from: /usr/lib/autofs/automountd starting on CPU 2 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 1 from: /usr/lib/sendmail -bd -q15m starting on CPU 0 at priority 59
Thread 1 from: bash starting on CPU 0 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 0 at priority 59
Thread 1 from: fsflush starting on CPU 0 at priority 60
The following D script uses the on-cpu sched probe with an aggregation
to display a summary of who has recently been running on what CPU:
# cat -n whorun.d
1 #!/usr/sbin/dtrace -qs
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
7 }
8
9 END
10 {
11 printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
12 printa("%-30s %4d %@6d\n", @);
13 }
# ./whorun.d
^C
Command CPU Count
/usr/lib/fm/fmd/fmd 1 1
uptime 2 1
find / -name fubar 3 2
/usr/lib/autofs/automountd 2 3
-sh 2 3
-sh 1 3
/usr/lib/picl/picld 1 4
/usr/lib/fm/fmd/fmd 3 6
/usr/sbin/nscd 3 8
/usr/lib/fm/fmd/fmd 2 11
/usr/sbin/nscd 2 14
/usr/sbin/nscd 0 15
/usr/lib/sendmail -bd -q15m 0 16
ls -lR / 1 18
/usr/sfw/sbin/snmpd 0 18
-sh 3 20
/usr/lib/utmpd 0 20
/usr/lib/sendmail -bd -q15m 2 20
/usr/lib/picl/picld 0 32
/usr/sfw/sbin/snmpd 3 44
/usr/sfw/sbin/snmpd 2 55
fsflush 0 72
/usr/sbin/nscd 1 77
find / -name fubar 1 152
/usr/sbin/vold 2 152
/usr/sfw/sbin/snmpd 1 237
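A count shows how often a command was placed on a CPU, not how long it stayed there. As a minimal sketch (not one of the course scripts), pairing the on-cpu probe with the sched provider's off-cpu probe and a thread-local timestamp gives a distribution of on-CPU time per command. The names self->ts and @oncpu are illustrative choices.

```d
#!/usr/sbin/dtrace -qs

/* Remember when this thread was placed on a CPU. */
sched:::on-cpu
/pid != $pid && pid != 0/
{
	self->ts = timestamp;
}

/* When the thread leaves the CPU, record how long it ran (nanoseconds). */
sched:::off-cpu
/self->ts/
{
	@oncpu[execname] = quantize(timestamp - self->ts);
	self->ts = 0;
}
```

Because the script runs with the -q option and has no END clause, the aggregation is printed automatically when you press Control-C.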
Note: See the Solaris Dynamic Tracing Guide for details on other lockstat
provider probes.
The following D script displays CPU, thread, process, wait time, and stack
trace information related to a thread blocking on an adaptive mutex:
# cat -n mutex.d
1 #!/usr/sbin/dtrace -qs
2
3 lockstat:::adaptive-block
4 {
5 printf("\nCPU\tTID\tPID\tUID\tWAIT TIME\tCOMMAND\n");
6 printf("%d\t%d\t%d\t%d\t%d\t\t%s\n", curcpu->cpu_id,
7 curlwpsinfo->pr_lwpid, curpsinfo->pr_pid,
8 curpsinfo->pr_uid, arg1, curpsinfo->pr_psargs);
9 stack();
10 }
Test the mutex.d D script by starting four instances of the readchar user
application, which reads every file in the current directory one byte at a
time using the read(2) system call:
# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&
[1] 2323
[2] 2325
# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&
[3] 2327
[4] 2329
# ./mutex.d
^C
# mpstat 2
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2 0 0 409 307 45 8 0 0 0 65534 13 14 0 73
0 3 0 0 401 301 54 31 0 0 0 103605 21 79 0 0
0 0 0 0 406 305 50 30 0 0 0 100905 20 80 0 0
0 1 0 0 402 302 55 32 0 0 0 104497 21 79 0 0
^C
genunix`clock+0x3f0
genunix`cyclic_softint+0xa4
unix`cbe_level10+0x8
unix`intr_thread+0x144
genunix`clock+0x3f0
genunix`cyclic_softint+0xa4
unix`cbe_level10+0x8
unix`intr_thread+0x144
sd`sdintr+0x14
glm`glm_doneq_empty+0x144
glm`glm_intr+0xf4
pcipsy`pci_intr_wrapper+0x9c
unix`intr_thread+0x144
genunix`clock+0x3f0
genunix`cyclic_softint+0xa4
unix`cbe_level10+0x8
unix`intr_thread+0x144
genunix`clock+0x3f0
genunix`cyclic_softint+0xa4
unix`cbe_level10+0x8
unix`intr_thread+0x144
^C
genunix`kmem_cache_free+0x4c
uata`atapi_tran_destroy_pkt+0x58
scsi`scsi_destroy_pkt+0x14
sd`sd_return_command+0x16c
sd`sdintr+0x224
uata`ghd_doneq_process+0x64
unix`intr_thread+0x144
The following output results from running four instances of the readchar
process on a four-processor server. In this case you do not run the extra
find and ls -lR commands, as you did on the uniprocessor system.
There is significantly more mutex contention, as indicated by the smtx
column. (Always ignore the first set of numbers that the mpstat(1M)
command prints, because it summarizes activity since boot.) There is also
significantly more frequent output from the mutex.d D script:
# mpstat 2
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 1 0 3 4 1 65 0 1 8 0 27 0 1 0 99
1 1 0 3 7 4 30 0 1 8 0 29 0 0 0 100
2 1 0 3 4 1 28 0 1 8 0 28 0 0 0 100
3 1 0 3 214 111 15 0 0 9 0 28 0 0 0 100
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2 0 5 21 1 56 17 8 74478 0 225870 14 81 0 4
1 0 0 5 29 4 67 22 8 76857 0 228291 11 83 0 6
2 0 0 1 53 1 150 49 5 83973 0 224372 16 74 0 9
# ./mutex.d
ufs`rdip+0x150
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
ufs`rdip+0x150
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
ufs`ufs_lockfs_end+0x70
ufs`ufs_read+0x25c
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
unix`syscall_trap32+0xa8
The previous output shows that the mutex contention is in the UNIX File
System (UFS) code. The sleep times are only between 21 and 29 microseconds.
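When the mutex.d output scrolls by too quickly to read, an aggregation can summarize it. The following is a sketch under the same assumption that mutex.d makes, namely that arg1 of the adaptive-block probe is the wait time. The aggregation names @wait and @blocks and the five-frame stack depth are arbitrary choices.

```d
#!/usr/sbin/dtrace -qs

/* Sum the wait time (arg1) and count the blocks per kernel stack. */
lockstat:::adaptive-block
{
	@wait[stack(5)] = sum(arg1);
	@blocks[stack(5)] = count();
}

END
{
	printf("Total wait time by stack:\n");
	printa(@wait);
	printf("\nNumber of blocks by stack:\n");
	printa(@blocks);
}
```

Stacks that appear near the bottom of each list are the ones that contribute most to contention.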
The system() built-in function allows you to run shell commands anytime
a probe fires. This general capability provides great power in that any
probe event can trigger the execution of any command. You can use
format specifications similar to the printf() built-in function to
parameterize the shell command you wish to invoke. The system()
function requires destructive actions to be enabled, either with the -w
option to the dtrace(1M) command or with the #pragma D option
destructive statement inside the script.
The following script uses the signal-send probe as well as the built-in
system() function to display what user account is sending the SIGKILL
signal and to which process:
# cat -n whosend.d
1 #!/usr/sbin/dtrace -s
2
3 #pragma D option destructive
4 #pragma D option quiet
5
6 proc:::signal-send
7 /args[2] == SIGKILL/
8 {
9 printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
10 system("getent passwd %d | cut -d: -f5", uid);
11 }
# ./whosend.d
SIGKILL was sent to vi by Super-User
SIGKILL was sent to bash by Mary Smith
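The same pattern works with other probes. The sketch below, which is not part of the course material, runs the ptree(1) command each time a chosen program starts. The command name "vi" is only a placeholder for whatever you want to watch.

```d
#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

/*
 * "vi" is a hypothetical command of interest; substitute your own.
 * The shell command runs when the dtrace command processes its
 * buffer, not at the instant the probe fires, so a short-lived
 * process may already be gone by the time ptree(1) runs.
 */
proc:::exec-success
/execname == "vi"/
{
	printf("%s started as PID %d\n", execname, pid);
	system("ptree %d", pid);
}
```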
20 self->started = 0;
21 ++nlines;
22 }
23
24 syscall::read:return, syscall::pread*:return
25 /nlines > 20/
26 {
27 printf("FD\tREQUEST\tACTUAL\tCOMMAND\n");
28 nlines = 0;
29 }
# ./reads.d
FD REQUEST ACTUAL COMMAND
0 1 1 bash
0 1 1 bash
0 1 1 bash
0 1 1 bash
3 877 877 date
0 1 1 bash
0 1 1 bash
...
0 1 1 bash
3 152 152 uptime
4 8192 4092 uptime
4 8192 0 uptime
3 877 877 uptime
0 1 1 bash
0 1 1 bash
...
0 1 1 bash
0 1 1 bash
0 1 1 bash
3 8192 8192 grep
3 8192 200 grep
3 8192 0 grep
1 8192 1006 init
1 8192 0 init
1 8192 1006 init
1 8192 0 init
...
FD REQUEST ACTUAL COMMAND
5 1024 61 nscd
5 8192 4464 utmpd
6 336 336 utmpd
6 336 336 utmpd
6 336 336 utmpd
5 8192 0 utmpd
1 24 -1 sac
2 8 8 ttymon
2 8 -1 ttymon
1 24 24 sac
...
FD REQUEST ACTUAL COMMAND
0 1 1 bash
0 1 1 bash
0 128 4 sh
0 128 3 sh
0 128 4 sh
...
4 416 416 ps
4 416 416 ps
11 336 336 svc.startd
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
^C
Using the previous output (and help from the truss(1) command), you
can determine the following:
The date(1) command reads a time zone (US/Mountain)
configuration file of size 877 bytes when it starts.
The ps(1) command reads the psinfo_t structure of size 416 bytes
many times.
The init(1M) command re-reads the /etc/inittab file periodically.
The grep(1) command reads its file one page (8192 bytes) at a time.
The sh(1) command reads a whole line from standard input into a
128-byte buffer.
The bash(1) command reads standard input one byte at a time
(probably to implement command line editing).
The uptime(1) command reads the same time zone configuration file
as the date(1) command.
The sac(1M) and ttymon(1M) commands issued reads that failed.
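The observations above depend on comparing the requested size with the size actually returned. As a minimal sketch of that technique (a simplified stand-in, not the reads.d script itself), the entry probe can save the file descriptor and request size in thread-local variables and the return probe can print them next to the return value:

```d
#!/usr/sbin/dtrace -qs

/* Save the file descriptor and requested size on entry. */
syscall::read:entry
{
	self->fd = arg0;
	self->req = arg2;
	self->started = 1;
}

/* On return, arg0 holds the byte count actually read, or -1 on failure. */
syscall::read:return
/self->started/
{
	printf("%d\t%d\t%d\t%s\n", self->fd, self->req, arg0, execname);
	self->fd = 0;
	self->req = 0;
	self->started = 0;
}
```

Assigning zero to the thread-local variables on return lets DTrace free their storage.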
18 printa("%d\t%-24s\t%@d\n", @);
19 }
# ./readsummary.d
^C
4 instant 0
2 more 1
0 vi 1
4 readchar 1
0 bash 1
2 ttymon 8
4 rup 23
1 sac 24
5 nscd 59
19 sgml2roff 119
3 rup 413
3 rpc.rstatd 413
4 ps 416
3 uptime 514
1 init 540
3 man 550
3 ps 687
5 rup 787
4 vi 803
3 grep 845
3 date 877
3 vi 1492
4 nroff 2221
4 uptime 2232
0 nroff 3479
0 tbl 3861
3 cat 3861
6 nsgmls 3894
0 eqn 3914
0 col 3979
0 instant 4072
3 instant 4442
3 nsgmls 4459
6 rpc.rstatd 4464
3 more 4842
5 man 5802
4 nsgmls 6606
0 grep 6815
By changing the aggregation function from avg() to sum(), you can obtain
the total number of bytes read by file descriptor and process name:
# ./totalread.d
^C
4 instant 0
0 vi 6
2 more 8
5 nscd 61
0 bash 121
11 svc.startd 336
10 svc.startd 336
3 man 550
6 readchar 671
3 date 877
4 ls 877
3 vi 2984
19 sgml2roff 3214
1 init 4324
23 readchar 4572
19 readchar 10276
6 nsgmls 11684
5 man 17408
4 nroff 17771
3 more 17876
7 readchar 18064
20 readchar 18116
3 cat 18435
0 tbl 18435
4 vi 18880
10 readchar 19500
14 readchar 19500
0 eqn 20356
0 nroff 20356
0 col 22496
8 readchar 28252
0 grep 30095
3 nsgmls 33799
3 instant 53314
11 readchar 56616
0 instant 160360
4 nsgmls 171763
21 readchar 192636
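Both summaries can also come from a single run. The following sketch is an illustration rather than the course's readsummary.d or totalread.d script. It keeps an avg() and a sum() aggregation side by side, and the aggregation names @avgsz and @total are arbitrary.

```d
#!/usr/sbin/dtrace -qs

syscall::read:entry
{
	self->fd = arg0;
	self->started = 1;
}

/* Successful read: arg0 is the number of bytes returned. */
syscall::read:return
/self->started && arg0 != -1/
{
	@avgsz[self->fd, execname] = avg(arg0);
	@total[self->fd, execname] = sum(arg0);
}

/* Clear the thread-local state whether or not the read succeeded. */
syscall::read:return
/self->started/
{
	self->fd = 0;
	self->started = 0;
}

END
{
	printf("Average bytes per read:\n");
	printa("%d\t%-24s\t%@d\n", @avgsz);
	printf("\nTotal bytes read:\n");
	printa("%d\t%-24s\t%@d\n", @total);
}
```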
Most DTrace users do not need this feature, but because boot problems
are particularly difficult to debug, anonymous tracing can prove valuable
for kernel and device driver developers.
Reboot your system. While the system is booting, messages appear on the
console describing the anonymous enabling.
After the machine boots, claim the anonymous state by specifying the -a
option to the dtrace(1M) command. By default the -a option claims the
anonymous state, processes the existing data, and continues to run. To
process the anonymous state data and exit, add the -e option to the
dtrace(1M) command.
# dtrace -A -m conskbd
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# tail /etc/system
* chapter of the Solaris Dynamic Tracing Guide for details.
*
forceload: drv/systrace
forceload: drv/sdt
forceload: drv/profile
forceload: drv/lockstat
forceload: drv/fbt
forceload: drv/fasttrap
forceload: drv/dtrace
* ^^^^ Added by DTrace
# reboot
...
# grep enabling /var/adm/messages
Feb 27 07:34:22 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling
probe 0 (:kmdb::)
Feb 27 07:34:22 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling
probe 1 (dtrace:::ERROR)
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling
probe 0 (:conskbd::)
Feb 27 07:45:27 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling
probe 1 (dtrace:::ERROR)
# dtrace -ae
CPU ID FUNCTION:NAME
0 25339 conskbd_attach:entry
0 25340 conskbd_attach:return
0 25327 conskbdopen:entry
0 25328 conskbdopen:return
0 25331 conskbduwput:entry
0 25332 conskbduwput:return
0 25345 conskbdioctl:entry
0 25346 conskbdioctl:return
0 25327 conskbdopen:entry
0 25328 conskbdopen:return
0 25331 conskbduwput:entry
0 25332 conskbduwput:return
0 25345 conskbdioctl:entry
0 25346 conskbdioctl:return
0 25329 conskbdclose:entry
0 25330 conskbdclose:return
0 25327 conskbdopen:entry
0 25328 conskbdopen:return
0 25329 conskbdclose:entry
0 25330 conskbdclose:return
The next example focuses only on those functions called from the
conskbd_attach() function in the conskbd module:
# cat -n cons.d
1 #!/usr/sbin/dtrace -s
2
3 fbt::conskbd_attach:entry
4 {
5 self->trace = 1;
6 }
7
8 fbt:::
9 /self->trace/
10 {
11 }
12
13 fbt::conskbd_attach:return
14 {
15 self->trace = 0;
16 }
0 <- ddi_driver_major
0 -> strcmp
0 <- strcmp
0 -> derive_devi_class
0 -> i_ddi_devi_class
0 <- i_ddi_devi_class
0 -> strncmp
0 <- strncmp
...
0 <- kstat_compare_bykid
0 -> kstat_zone_compare
0 <- kstat_zone_compare
0 <- avl_find
0 <- kstat_hold
0 <- kstat_hold_bykid
0 <- kstat_install
0 -> kstat_rele
0 -> cv_broadcast
0 <- cv_broadcast
0 <- kstat_rele
0 <- conskbd_attach
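The thread-local flag used in cons.d is a general pattern for following one code path through the kernel. As a sketch of the same idea applied to a system call (the command name "cat" is only an example), the following script traces every kernel function executed on behalf of a read(2) call, with the flowindent option producing the indented arrows shown above:

```d
#!/usr/sbin/dtrace -s

#pragma D option flowindent

/* Start following this thread when it enters read(2). */
syscall::read:entry
/execname == "cat"/
{
	self->trace = 1;
}

/* An empty clause records only the default probe information. */
fbt:::
/self->trace/
{
}

/* Stop following when the system call returns. */
syscall::read:return
/self->trace/
{
	self->trace = 0;
}
```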
39 syscall::open*:return
40 /self->spec && arg0 != -1/
41 {
42 /* Throw away data recorded in speculative buffer */
43 discard(self->spec);
44 self->spec = 0;
45 }
# ./spec.d
dtrace: script './spec.d' matched 40768 probes
CPU FUNCTION
0 <= open64 Open failed with errno: 2
0 -> open64
0 <- open64
0 -> copen
0 -> falloc
0 -> ufalloc
0 <- ufalloc
0 -> ufalloc_file
0 -> fd_find
...
0 <- cv_broadcast
0 <- setf
0 -> unfalloc
0 -> crfree
0 <- crfree
0 <- unfalloc
0 -> kmem_cache_free
0 <- kmem_cache_free
0 -> set_errno
0 <- set_errno
0 <- copen
^C
It appears that the spec.d D script never starts a new open speculation
until the current open returns and the current speculation is either
committed or discarded. This is not the case, however, if an open blocks
and does not return before another open is started. You learn in a lab
exercise how to tune the number of speculative buffers.
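For orientation, the core of the speculation technique fits in a few clauses. The sketch below is a reduced illustration, not the spec.d script. It records only the path name speculatively, and it sets the nspec and specsize options to arbitrary example values so that several opens can be in flight at once.

```d
#!/usr/sbin/dtrace -s

#pragma D option nspec=8
#pragma D option specsize=1m

/* Create a speculation and record the path name into it. */
syscall::open*:entry
{
	self->spec = speculation();
	speculate(self->spec);
	printf("%s", copyinstr(arg0));
}

/* The open failed: copy the speculative data to the principal buffer. */
syscall::open*:return
/self->spec && arg0 == -1/
{
	commit(self->spec);
	self->spec = 0;
}

/* The open succeeded: throw the speculative data away. */
syscall::open*:return
/self->spec && arg0 != -1/
{
	discard(self->spec);
	self->spec = 0;
}
```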
0 42318 malloc:20
0 42319 malloc:24
0 42320 malloc:28
0 42321 malloc:2c
0 42327 malloc:44
0 42328 malloc:48
0 42329 malloc:4c
0 42330 malloc:50
0 42331 malloc:54
0 42332 malloc:58
0 42333 malloc:5c
0 42334 malloc:60
0 42335 malloc:64
0 42336 malloc:68
0 42337 malloc:6c
0 42309 malloc:return
...
# mdb myapp
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start: clr %fp
> malloc::nm
Value Size Type Bind Other Shndx Name
0xff2d1cf0|0x00000070|FUNC |GLOB |0x0 |9 |libc.so.1`malloc
> 70%4=x
1c
> malloc,1c/ai
libc.so.1`malloc:
libc.so.1`malloc: save %sp, -0x60, %sp
libc.so.1`malloc+4: mov %o7, %i3
libc.so.1`malloc+8: call +8 <libc.so.1`malloc+0x10>
libc.so.1`malloc+0xc: sethi %hi(0x92400), %i2
libc.so.1`malloc+0x10: add %i2, 0x180, %i2
libc.so.1`malloc+0x14: add %i2, %o7, %i4
libc.so.1`malloc+0x18: mov %i3, %o7
libc.so.1`malloc+0x1c: ld [%i4 + 0xec8], %i5
libc.so.1`malloc+0x20: ld [%i5], %i1
libc.so.1`malloc+0x24: cmp %i1, 0
libc.so.1`malloc+0x28: bne +0x1c <libc.so.1`malloc+0x44>
libc.so.1`malloc+0x2c: nop
libc.so.1`malloc+0x30: call +0x93624 <PLT:___errno>
libc.so.1`malloc+0x34: mov 0x30, %l7
libc.so.1`malloc+0x38: st %l7, [%o0]
libc.so.1`malloc+0x3c: ret
libc.so.1`malloc+0x40: restore %g0, 0, %o0
libc.so.1`malloc+0x44: call +0x657d4
<libc.so.1`assert_no_libc_locks_held>
libc.so.1`malloc+0x48: nop
libc.so.1`malloc+0x4c: call +0x6437c <libc.so.1`lmutex_lock>
libc.so.1`malloc+0x50: ld [%i4 + 0xec0], %o0
libc.so.1`malloc+0x54: call +0x1c <libc.so.1`_malloc_unlocked>
libc.so.1`malloc+0x58: mov %i0, %o0
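The instruction-level probes listed above can also be aggregated instead of traced one by one. As a sketch, leaving the probe name empty in a pid provider description matches the entry probe, the return probe, and every instruction offset in the function, so a count per probe name shows which instructions actually executed:

```d
#!/usr/sbin/dtrace -qs

/*
 * An empty probe name matches entry, return, and every instruction
 * offset in malloc(). Run with -c or -p so that $target is defined.
 */
pid$target:libc:malloc:
{
	@hits[probename] = count();
}
```

For example, you might run the script with the -c ./myapp option. Offsets that never appear in the output, such as the error path in the disassembly, were not executed.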
DTrace Privileges
By default, only the super-user can use DTrace. This is because DTrace
enables visibility into all aspects of the system, including:
User-level functions
System calls
Kernel functions
Kernel data
Kernel-Destructive Actions
Only the super-user can perform kernel-destructive actions. You enable
such actions by running the dtrace(1M) command with the -w option.
Three built-in DTrace functions cause kernel-destructive actions:
The breakpoint() function: Induces a kernel breakpoint, causing the
system to stop, with control passing to OpenBoot PROM or kmdb(1),
depending on how the system was booted.
The panic() function: Induces a kernel panic, with crash files
normally being created for postmortem analysis.
The chill() function: Causes DTrace to spin for the specified number
of nanoseconds. Intended for dealing with race condition situations.
The following examples show the effect of setting the three DTrace
specific privileges.
user2::::defaultpriv=basic,dtrace_proc
user3::::defaultpriv=basic,dtrace_user
user4::::defaultpriv=basic,dtrace_kernel
user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ id
uid=1001(user1) gid=101(users)
$ /usr/sbin/dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$ echo $$
919
$ /usr/sbin/dtrace -n pid919:::
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$
This example shows the DTrace features available to a user with the
dtrace_proc privilege:
$ id
uid=1002(user2) gid=101(users)
$ dtrace -l
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
$ echo $$
9447
$ dtrace -n pid9447:::entry
dtrace: description 'pid9447:::entry' matched 3179 probes
^C
$ ps -ef | grep vi
user2 1534 1528 0 09:48:20 pts/1 0:00 grep vi
user5 1531 1452 0 09:47:55 pts/2 0:00 vi resume
$ dtrace -n pid1531:::
dtrace: invalid probe specifier pid1531:::: failed to grab pid 1531: permission denied
$ dtrace -n syscall::read:
dtrace: invalid probe specifier syscall::read:: probe description syscall::read: does
not match any probes
$
$ id
uid=1006(user6) gid=101(users)
$ grep user6 /etc/user_attr
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ ps -ef | grep vi
user6 650 637 0 09:41:30 pts/1 0:00 grep vi
user5 649 630 0 09:41:16 pts/2 0:00 vi resume
$ /usr/sbin/dtrace -n pid649:::entry
dtrace: description 'pid649:::entry' matched 3951 probes
CPU ID FUNCTION:NAME
0 42548 peekkey:entry
0 42544 getkey:entry
0 42546 getbr:entry
0 42548 peekkey:entry
0 42544 getkey:entry
0 42546 getbr:entry
...
This example shows the DTrace features available to a user with the
dtrace_user privilege:
$ id
uid=1003(user3) gid=101(users)
$ grep user3 /etc/user_attr
user3::::defaultpriv=basic,dtrace_user
$ echo $$
1171
$ dtrace -n pid1171:::entry
dtrace: invalid probe specifier pid1171::: probe description pid1171::: does not match
any probes
$ pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
f: 2041630813 p: 749994 q: -42775360 m: -23
The dtrace_user privilege only allows the use of the syscall and
profile providers on processes owned by the user. Even though there
are many system calls occurring in the system, the above output shows
only the system calls made by the sh, sleep, and pwd commands.
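As an illustration of what remains useful under the dtrace_user privilege, the following sketch (not taken from the course labs) counts system calls by command and call name. When run by a user who holds only dtrace_user, the results should cover just that user's own processes.

```d
#!/usr/sbin/dtrace -qs

/* Count each system call by command name and call name. */
syscall:::entry
{
	@calls[execname, probefunc] = count();
}

END
{
	printa("%-16s %-24s %@8d\n", @calls);
}
```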
This example shows the DTrace features available to a user with the
dtrace_kernel privilege:
$ id
uid=1004(user4) gid=101(users)
$ grep user4 /etc/user_attr
user4::::defaultpriv=basic,dtrace_kernel
$ dtrace -qn 'sched:::on-cpu {printf("Starting to run: %s\n", execname)}'
Starting to run: sched
Starting to run: sched
Starting to run: fsflush
Starting to run: svc.configd
Starting to run: inetd
Starting to run: svc.startd
Starting to run: fmd
Starting to run: dtrace
Starting to run: sched
Starting to run: sched
^C
$ dtrace -qn 'io:::start {printf("Starting an I/O: %s\n", execname)}'
Starting an I/O: bash
Starting an I/O: bash
Starting an I/O: bash
Starting an I/O: fsflush
Starting an I/O: find
Starting an I/O: find
Starting an I/O: find
Starting an I/O: find
^C
$ echo $$
6711
$ dtrace -n pid6711:a.out::entry
dtrace: invalid probe specifier pid6711:a.out::entry: probe description
pid6711:bash::entry does not match any probes
6736
$ dtrace -n 'pid6736:a.out::entry'
dtrace: description 'pid6736:a.out::entry' matched 211 probes
^C
$ dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
pid6736
proc
profile
sched
sdt
syscall
sysinfo
vminfo
$
^C
The following interactive session shows the use of the ppriv(1) command
to give a shell specific DTrace privileges. Look at privileges(5) for
details:
$ id
uid=1001(user1) gid=101(users)
$ /usr/sbin/dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$ echo $$
1774
$ ppriv -s A+dtrace_proc 1774
1774: ppriv: Not owner
$ su
Password:
# ppriv -s A+dtrace_proc 1774
# exit
$ /usr/sbin/dtrace -l
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83
133
dtrace: pid 1787 exited with status 1
CPU ID FUNCTION:NAME
0 28355 _start:entry
0 28362 _init:entry
0 28361 main:entry
0 28360 f1:entry
0 28359 f2:entry
0 28358 f3:entry
0 28357 f4:entry
0 28356 f5:entry
0 28360 f1:entry
0 28359 f2:entry
0 28358 f3:entry
0 28357 f4:entry
0 28356 f5:entry
0 28363 _fini:entry
$ ppriv $$
1774: -sh
flags = <none>
E: basic,dtrace_proc
I: basic,dtrace_proc
P: basic,dtrace_proc
L: all
$ bash
bash-2.05b$ ppriv $$
1789: bash
flags = <none>
E: basic,dtrace_proc
I: basic,dtrace_proc
P: basic,dtrace_proc
L: all
bash-2.05b$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83
133
dtrace: pid 1850 exited with status 1
CPU ID FUNCTION:NAME
0 28355 _start:entry
0 28362 _init:entry
0 28361 main:entry
0 28360 f1:entry
...
bash-2.05b$ echo $$
1789
bash-2.05b$ su
more
ff2bcb58
15684
149a4
13ad8
12780
1201c
115cc
genunix`str_cv_wait+0x28
genunix`strwaitq+0x238
genunix`strread+0x174
genunix`read+0x274
unix`syscall_trap32+0xcc
DTrace Privileges
Privilege Level    Providers    Actions    Variables    Address Spaces
Objectives
Upon completion of this module, you should be able to:
Describe how to lessen the performance impact of DTrace
Describe how to use and tune DTrace buffers
Debug DTrace scripts
Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Additional Resources
You should also be careful when using the pid provider, because it can
instrument every instruction of an application. This can result in millions
of probes being enabled in the application, slowing the target process to a
crawl.
Nevertheless, there are many conditions in which you must enable a large
number of probes to answer a question. DTrace has been designed with
this in mind. Enabling a large number of probes can slow down the
system substantially, but it can never induce fatal failure of the machine.
You should therefore not hesitate to enable many probes if necessary.
Using Aggregations
DTrace aggregations provide a scalable method of aggregating data.
Although associative arrays appear to offer similar functionality, they are
global, general-purpose variables that cannot provide the linear scalability
of aggregations. Aggregating functions allow for intermediate results to
be kept per-CPU instead of in a shared global data structure. When a
system-wide result is required, the aggregating function may then be
applied to the set consisting of the per-CPU intermediate results. You
should therefore use aggregations rather than associative arrays whenever
possible. For example, you should avoid performing the action shown in
the following script:
syscall:::entry
{
    ++totals[execname];
}
syscall::rexit:entry
{
    printf("%40s %d\n", execname, totals[execname]);
    totals[execname] = 0;
}
Instead, use an aggregation:
syscall:::entry
{
    @totals[execname] = count();
}
END
{
    printa("%40s %@d\n", @totals);
}
When enabling many probes, you tend to use predicates of a form that
identifies a specific thread or threads of interest, such as
/self->traceme/ or /pid == 12345/. Many of these predicates evaluate to
the same (false) value for most threads in most probes, but the evaluation
itself can become costly when done for every function entry and return
point in the kernel.
Uncacheable:
fbt:::
/follow[pid, tid]/
{}
syscall::read:return
/follow[pid, tid]/
{follow[pid, tid] = 0;}
Cacheable:
fbt:::
/self->follow/
{}
syscall::read:return
/self->follow/
{
self->follow = 0;
}
Because they use global variables, none of these predicates is
cacheable:
/execname == one_to_watch/
/traceme[execname]/
/pid == pid_i_care_about/
/self->traceme == my_global/
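One way to restructure such a script, sketched here as an illustration rather than as a prescribed method, is to make the expensive decision once in a less frequently fired probe and leave only a thread-local test in the predicates of the heavily enabled probes. The command name "myprogram" is hypothetical.

```d
#!/usr/sbin/dtrace -s

/*
 * Decide once, in a cheaper probe, whether this thread is interesting.
 * "myprogram" is a hypothetical command name; comparing against a
 * string constant keeps the predicate free of global variables.
 */
syscall:::entry
/execname == "myprogram"/
{
	self->traceme = 1;
}

/* The heavily enabled probes test only the thread-local flag. */
fbt:::
/self->traceme/
{
}

syscall:::return
/self->traceme/
{
	self->traceme = 0;
}
```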
Principal Buffers
The buffer most fundamental to DTrace operation is the principal buffer.
The principal buffer is present in every DTrace invocation, and is the
buffer to which tracing actions record their data by default. These actions
include:
exit()
printf()
trace()
ustack()
printa()
stack()
This support is implemented with the bufpolicy option, and can be set
on a per-consumer basis.
You can also specify the bufsize option with the -b flag to the
dtrace(1M) command:
# dtrace -b 2g -n zfod
The consumer controls the rate at which the buffer is read out (and thus
switched) by using the switchrate option. As with any rate option,
switchrate can be specified with any time suffix, but defaults to
rate-per-second.
Dropped Data
Under the switch policy, if a given enabled probe would trace more data
than there is space available in the active principal buffer, the data is
dropped and a per-CPU drop count is incremented. In the event of one or
more drops, the dtrace(1M) command displays this message or a similar
one:
dtrace: 11 drops on CPU 0
The switch policy allocates scratch space for the copyin(), copyinstr(),
and alloca() commands out of the active buffer.
3 fbt:::
4 {
5 trace(timestamp);
6 }
7
8 tick-5sec
9 {
10 exit(0);
11 }
# ./stress.d >/var/tmp/stress.d.out
dtrace: script './stress.d' matched 38665 probes
dtrace: 451660 drops on CPU 0
dtrace: 1100596 drops on CPU 0
dtrace: 1028767 drops on CPU 0
dtrace: 1103521 drops on CPU 0
# ls -l /var/tmp/stress.d.out
-rw-r--r-- 1 root root 86004878 Mar 13 14:58
/var/tmp/stress.d.out
The drops result from the limited buffer space, the low switchrate value,
or both. The default buffer size for the principal buffer is 4 Mbytes and the
default switchrate is once per second. In the next invocation of the script
you increase the buffer size significantly:
# dtrace -x bufsize=300m -s stress.d >/var/tmp/stress.d.out
dtrace: script 'stress.d' matched 38665 probes
dtrace: buffer size lowered to 150m
# ls -l /var/tmp/stress.d.out
-rw-r--r-- 1 root root 18177752 Mar 13 15:03
/var/tmp/stress.d.out
Note that DTrace lowers the setting for buffer size because there is not
enough memory. By increasing the buffer size, you eliminated all drops
and created 18 Mbytes of trace data. In the next example you use a
smaller buffer size, but with an increased switchrate value:
# dtrace -x bufsize=64m -x switchrate=16 -s stress.d >/var/tmp/stress.d.out
dtrace: script 'stress.d' matched 38665 probes
^C
# ls -l /var/tmp/stress.d.out
-rw-r--r-- 1 root root 33052791 Mar 13 15:06
/var/tmp/stress.d.out
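The -x options used on the command line can also be recorded in the script itself with #pragma D option statements, so that the tuning travels with the script. The following sketch repeats the clauses of stress.d with the same two settings. It assumes that a bare switchrate of 16 means 16 times per second, which is written here with an explicit hz suffix.

```d
#!/usr/sbin/dtrace -s

#pragma D option bufsize=64m
#pragma D option switchrate=16hz

/* Trace a timestamp at every kernel function entry and return. */
fbt:::
{
	trace(timestamp);
}

/* Stop after five seconds. */
tick-5sec
{
	exit(0);
}
```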
Under the fill buffer policy, tracing continues until an enabled probe is
about to trace more data than there is space in the principal buffer. At this
time, the buffer is marked as filled and the consumer is notified that at
least one of its per-CPU buffers has filled. When the dtrace(1M) utility
detects a single filled buffer, tracing is stopped, all buffers are processed,
and dtrace exits. Note that no further data is traced to a filled buffer,
even if the data would fit in the buffer.
To use the fill policy, set the bufpolicy option to fill. For example,
the following invocation of DTrace traces every system call entry into a
per-CPU 2-Kbyte buffer with the buffer policy set to fill:
# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill
To allow for END tracing in fill buffers, DTrace calculates beforehand the
amount of space potentially consumed by END probes and subtracts this
from the size of the principal buffer. If the net size is negative, DTrace
refuses to start, and the dtrace(1M) utility outputs a corresponding error
message:
dtrace: END enablings exceed size of principal buffer
CPU ID FUNCTION:NAME
0 9808 disp_lock_enter_high:entry 810424080584641
0 9809 disp_lock_enter_high:return 810424080586093
0 2288 setfrontdq:return 810424080588595
0 668 generic_enq_thread:entry 810424080590727
0 669 generic_enq_thread:return 810424080592504
0 14298 ts_preempt:return 810424080594241
...
With the ring buffer policy, the dtrace(1M) utility does not display any
output until the process terminates; at that time the ring buffer is
consumed and processed.
Note that if a given record cannot fit in the buffer (that is, if the record is
larger than the buffer size), the record is dropped regardless of buffer
policy. By adding the following two lines to a D script, you can enable
ring buffering with a specific buffer size:
#pragma D option bufpolicy=ring
#pragma D option bufsize=16k
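A ring buffer is most useful when you care only about the events that lead up to some condition. As a sketch of a complete script built around these two options, the following keeps the most recent kernel function entries and stops when a process of interest exits. The command name "myapp" is a placeholder.

```d
#!/usr/sbin/dtrace -s

#pragma D option bufpolicy=ring
#pragma D option bufsize=16k

/* Older records are overwritten as the ring buffer wraps. */
fbt:::entry
{
	trace(timestamp);
}

/* "myapp" is hypothetical: stop when the process of interest exits. */
proc:::exit
/execname == "myapp"/
{
	exit(0);
}
```

Only the records still in each per-CPU ring buffer at exit are displayed.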
Other Buffers
Principal buffers exist in every DTrace enabling. In addition to principal
buffers, some DTrace consumers have additional in-kernel data buffers: an
aggregation buffer, a number of speculative buffers, or both. You tune the
aggregation buffer size with the aggsize option, and you tune the
speculative buffer size with the specsize option. You can tune the size of
each buffer on a per-consumer basis. Note that each buffer size setting
denotes the size of that buffer on each CPU. Moreover, for the switch
buffer policy, bufsize denotes the individual sizes of the active and
inactive buffers on each CPU.
The policy is set with the bufresize option, and defaults to auto. Under
the auto buffer resize policy, the size of a buffer is halved until a
successful allocation occurs. The dtrace(1M) utility emits a message if a
buffer as allocated is smaller than the requested size:
# dtrace -P syscall -b 4g
dtrace: description 'syscall' matched 450 probes
dtrace: buffer size lowered to 128m
# dtrace -n 'fbt:::entry {@a[probefunc] = count()}' -x aggsize=1g
dtrace: description 'fbt:::entry ' matched 16250 probes
dtrace: aggregation size lowered to 128m
Alternatively, you can set the buffer resize policy to be manual by setting
bufresize to manual. Under this policy, a failure to allocate causes
DTrace to fail to start:
# dtrace -P syscall -x bufsize=500m -x bufresize=manual
dtrace: description 'syscall' matched 450 probes
dtrace: could not enable tracing: Not enough space
The bufresize option dictates the buffer resizing policy of all buffers:
principal, speculative, and aggregation.
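These options can be combined in a script when you would rather have DTrace refuse to start than run with smaller buffers than you asked for. The sizes in the following sketch are arbitrary examples, not recommendations.

```d
#!/usr/sbin/dtrace -s

#pragma D option bufsize=16m
#pragma D option aggsize=16m
#pragma D option bufresize=manual

/* With bufresize=manual, a failed buffer allocation prevents startup. */
fbt:::entry
{
	@a[probefunc] = count();
}
```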
# cat comments.d
/* This D script counts the number of read system calls */
#!/usr/sbin/dtrace -s
syscall::read:entry
{
@["Number of reads:"] = count();
}
# ./comments.d
./comments.d: line 1: /bin: is a directory
./comments.d: line 3: syscall::read:entry: command not
found
./comments.d: line 5: syntax error near unexpected token
`('
./comments.d: line 5: ` @["Number of reads:"] = count();'
# ./comments2.d
dtrace: failed to compile script ./comments2.d: line 7:
end-of-file encountered before matching */
If you have more than one statement in a probe clause, make sure you end
each one with a semicolon:
...
BEGIN
{
a=$1
b=$2
c=$3
}
...
# ./badstart2.d 1 2 3
dtrace: failed to compile script ./badstart2.d: line 6:
syntax error near "b"
When comparing values, make sure that you use the == relational
operator and not =:
# cat test5.d
#!/usr/sbin/dtrace -s
fbt::sema_init:entry
/arg1 = 1/
{
trace(timestamp);
}
# ./test5.d
dtrace: failed to compile script ./test5.d: line 4:
operator = can only be applied to a writable variable
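A corrected version of the predicate, shown here as a sketch, uses the == operator so that the expression compares arg1 with 1 instead of attempting to assign to it:

```d
#!/usr/sbin/dtrace -s

/* Compare with ==; the = operator would try to assign to arg1. */
fbt::sema_init:entry
/arg1 == 1/
{
	trace(timestamp);
}
```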
BEGIN
{
vp = `rootdir;
i = 5;
}
tick-1sec
{
i = *vp;
}
# ./test8.d
dtrace: failed to compile script ./test8.d: line 11:
operands have incompatible types: "int" = "vnode_t"
tick-5sec
/`freemem < `lotsfree/
{
`lotsfree = `lotsfree*2;
}
# ./test6.d
dtrace: failed to compile script ./test6.d: line 6:
operator = can only be applied to a writable variable
If you specify other options on the first line of a D script, be sure the -s
option is last:
# head badstart3.d
#!/usr/sbin/dtrace -sq
BEGIN
{
a=$1
b=$2
c=$3
}
tick-1sec
# ./badstart3.d
dtrace: failed to open q: No such file or directory
Make sure that you pass the correct number of arguments expected by the
script (unless you explicitly set the defaultargs option). For example,
the badstart4.d script expects three command-line arguments:
# ./badstart4.d
dtrace: failed to compile script ./badstart4.d: line 5:
macro argument $1 is not defined
# dtrace -x defaultargs -s badstart4.d
dtrace: script 'badstart4.d' matched 2 probes
CPU ID FUNCTION:NAME
0 36401 :tick-1sec
BEGIN
{
a=$1;
b=$2;
}
tick-1sec
/execname == $3/
# ./badstart5.d 1 2 init
dtrace: failed to compile script ./badstart5.d: line 10:
failed to resolve init: Unknown variable name
# ./badstart5.d 1 2 '"init"'
^C
syscall
{
trace(timestamp);
}
# ./test2.d
dtrace: failed to compile script ./test2.d: line 3: probe
description :::syscall does not match any probes
When using the printf() and printa() built-in functions, make sure that
the arguments match the format specifiers in type and number:
# cat -n test3.d
1 #!/usr/sbin/dtrace -qs
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 @[curpsinfo->pr_psargs, curcpu->cpu_id] =
count();
7 }
8
9 END
10 {
11 printf("%-30s %4s %6s\n", "Command", "CPU");
12 printa("%-30s %4d %@6d\n", @);
13 }
# ./test3.d
dtrace: failed to compile script ./test3.d: line 11:
printf( ) prototype mismatch: conversion #3 (%s) is missing
a corresponding value argument
# cat -n test3a.d
1 #!/usr/sbin/dtrace -qs
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 @[curpsinfo->pr_psargs, curcpu->cpu_id] =
count();
7 }
8
9 END
10 {
11 printf("%-30s %4s %6s\n", "Command", "CPU",
"Count");
12 printa("%-30s %4s %@6d\n", @);
13 }
# ./test3a.d
dtrace: failed to compile script ./test3a.d: line 12:
printa( ) argument #3 is incompatible with conversion #2
prototype:
conversion: %s
prototype: char [] or string (or use stringof)
argument: processorid_t
# cat test4.d
#!/usr/sbin/dtrace -s
syscall::open:entry
{
printf("%s was opening: %s\n", execname, arg0);
}
# ./test4.d
dtrace: failed to compile script ./test4.d: line 5: printf(
) argument #3 is incompatible with conversion #2 prototype:
conversion: %s
Remember that pointer arguments to system calls are user addresses, not
kernel addresses. You must use the copyinstr() built-in function to
retrieve the strings:
# cat test4a.d
#!/usr/sbin/dtrace -s
syscall::open:entry
{
printf("%s was opening: %s\n", execname, stringof(arg0));
}
# ./test4a.d
dtrace: script './test4a.d' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3d79d3) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3ed570) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3ef6d0) in action #2
^C
# cat test4b.d
#!/usr/sbin/dtrace -s
syscall::open:entry
{
printf("%s was opening: %s\n", execname, copyinstr(arg0));
}
# ./test4b.d
dtrace: script './test4b.d' matched 1 probe
CPU ID FUNCTION:NAME
0 37 open:entry ls was opening: /var/ld/ld.config
Avoid enabling probes that generate too much data, causing drops:
# cat drop.d
#!/usr/sbin/dtrace -s
entry
{
printf("%s %s %s\n", probeprov,
probemod, probefunc);
}
# ./drop.d > /tmp/drop.out
dtrace: script './drop.d' matched 19579 probes
dtrace: 29569 drops on CPU 0
dtrace: 903839 drops on CPU 0
^Cdtrace: 448991 drops on CPU 0
# ./test9.d
dtrace: script './test9.d' matched 2 probes
dtrace: error on enabled probe ID 2 (ID 36402:
profile:::tick-3sec): divide-by-zero in action #1 at DIF
offset 20
Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Default Action
A clause need not contain an action; it may instead consist simply of
manipulation of variable state, or of any combination of actions and
manipulations of variable state. If a clause contains no actions and no D
manipulation (that is, if a clause is empty), the default action is taken. The
default action is to trace the enabled probe identifier (EPID) to the
principal buffer.
Using the default action allows for simple use of the dtrace(1M)
command. For example, you can enable all probes in the TS module with
the default action by using:
# dtrace -m TS
The printf() action tells DTrace to trace the data associated with each
argument after the first argument, and then to format the results using the
rules described by the first printf() argument, known as a format string.
The format string is a regular string that contains any number of format
conversions, each beginning with the % character, which describe how to
format the corresponding argument. The first conversion in the format
string corresponds to the second printf() argument, the second
conversion to the third argument, and so on. All of the text between
conversions is printed verbatim. The character following the conversion
character describes the format to use for the corresponding argument.
Unlike printf(3C), the DTrace printf() action is implemented as a built-in
function that is recognized by the D compiler. The D compiler provides
several useful services for the DTrace printf() action that are not found
in the C library printf():
• The D compiler compares the arguments to the conversions in the
  format string. If an argument's type is incompatible with the format
  conversion, the D compiler produces an error message explaining
  the problem.
• The D compiler does not require the use of size prefixes with
  printf() format conversions. The C printf() routine requires that
  you indicate the size of arguments by adding prefixes, such as %ld
  for long or %lld for long long. The D compiler knows the size and
  type of your arguments, so these prefixes are not required in your D
  printf() statements.
• DTrace provides additional format characters that are useful for
  debugging and observability; for example, the %a format conversion
  can be used to print a pointer as a symbol name and offset.
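As an illustration of the last two points, the following sketch prints a 64-bit probe argument with a plain %d conversion and formats the caller built-in variable with %a. The choice of the kmem_alloc() entry probe is an assumption made for this example; any fbt probe with an integer first argument would serve equally well.

```d
#!/usr/sbin/dtrace -qs

/* Assumed probe for illustration: the fbt entry probe for kmem_alloc(). */
fbt::kmem_alloc:entry
{
	/*
	 * arg0 is an int64_t, but no ll size prefix is needed with %d.
	 * %a prints the caller address as a symbol name and offset.
	 */
	printf("%d bytes requested by %a\n", arg0, caller);
}
```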
Conversion Specifications
field width can also be specified as an asterisk (*), in which case the
field width is set dynamically based on the value of an additional
argument of type int.
• An optional precision that provides one of the following:
  - The minimum number of digits to appear for the d, i, o, u, x,
    and X conversions (the field is padded with leading zeroes)
  - The number of digits to appear after the radix character for the
    e, E, and f conversions
  - The maximum number of significant digits for the g and G
    conversions
  - The maximum number of bytes to be printed from a string by
    the s conversion
  The precision takes the form of a period (.) followed by either an
  asterisk (*), as described in the Width and Precision Specifiers
  subsection, or by a decimal digit string.
• An optional sequence of size prefixes that indicate the size of the
  corresponding argument (described in the Size Prefixes
  subsection). The size prefixes are not necessary in D and are
  provided solely for compatibility with the C printf() function.
• A conversion specifier (described in the following subsection) that
  indicates the type of conversion to be applied to the argument.
Flag Specifiers
You enable the printf() conversion flags by specifying one or more of the
following characters, which can appear in any order:
(') The integer portion of the result of a decimal conversion (%i,
%d, %u, %f, %g, or %G) is formatted with thousands grouping
characters using the non-monetary grouping character. Not all
locales, including the POSIX C locale, provide non-monetary
grouping characters for use with this flag.
(-) The result of the conversion is left-justified within the field. The
conversion will be right-justified if this flag is not specified.
(+) The result of signed conversion always begins with a sign (+ or
-). If this flag is not specified, the conversion begins with a sign only
when a negative value is converted.
You can specify the minimum field width as a decimal digit string
following any flag specifier, as described previously, in which case the
field width is set to the specified number of columns. You can also specify
the field width as asterisk (*), in which case an additional argument of
type int is accessed to determine the field width. For example, to print an
integer x in a field width determined by the value of the int variable w,
you write this D statement:
printf("%*d", w, x);
The precision for the conversion can be specified as a decimal digit string
following a period (.) or by an asterisk (*) following a period. If an
asterisk is used to specify the precision, an additional argument of type
int prior to the conversion argument is accessed to determine the
precision. If both width and precision are specified as asterisks, the order
of arguments to printf() for the conversion should appear in the order:
width, precision, value.
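The argument ordering described above can be sketched as follows. The variable names w, p, and x are illustrative only; each is assigned an integer constant, so each is of type int as the asterisk form requires.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
	w = 8;		/* field width, type int */
	p = 3;		/* precision, type int */
	x = 42;		/* value to format */

	/* Dynamic width only: the width argument precedes the value. */
	printf("[%*d]\n", w, x);

	/* Dynamic width and precision: width, precision, then value. */
	printf("[%*.*d]\n", w, p, x);

	exit(0);
}
```

Because the precision for a d conversion gives the minimum number of digits, the second line should show the value padded with a leading zero inside the eight-column field.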
Size Prefixes
The size prefixes can be placed just before the format conversion name
and after any flags, widths, and precision specifiers. The size prefixes are:
• Optional h specifies that a following d, i, o, u, x, or X conversion
  applies to a short or unsigned short
• Optional l specifies that a following d, i, o, u, x, or X conversion
  applies to a long or unsigned long
• Optional ll specifies that a following d, i, o, u, x, or X conversion
  applies to a long long or unsigned long long
• Optional L specifies that a following e, E, f, g, or G conversion
  applies to a long double
• Optional l specifies that a following c conversion applies to a
  wint_t argument; an optional l specifies that a following s
  conversion character applies to a pointer to a wchar_t argument
Conversion Formats
The rules for specifying conversions in the format string are as follows:
The format conversions must match the tuple signature used to
create the aggregation. Each tuple element can only appear once. For
example, suppose you aggregate a count using the following D
statements:
@a["hello", 123] = count();
@a["goodbye", 456] = count();
If you then add the D statement printa(format-string, @a) to a
probe clause, the dtrace utility snapshots the aggregation data and
produces output as if you had entered the statements for each tuple
defined in the aggregation, such as:
printf(format-string, "hello", 123);
printf(format-string, "goodbye", 456);
Unlike printf(), the format string you use for printa() need not
include all elements of the tuple (that is, you can have a tuple of
length 3 and only one format conversion). Therefore you can omit
any tuple keys from your printa() output by changing your
aggregation declaration to move the ones you want to omit to the
end of the tuple and then omitting corresponding conversion
specifiers for them from the printa() format string.
The aggregation result itself can be included in the output by using
the additional @ format flag character, which is only valid when used
with printa(). The @ flag can be combined with any appropriate
format conversion specifier, and can appear more than once in a
format string. This means that your aggregation result can appear
anywhere in the output and can appear more than once. The set of
conversion specifiers that can be used with each aggregating
function is implied by the aggregating function's result type, listed
below:
uint64_t avg()
uint64_t count()
int64_t lquantize()
uint64_t max()
uint64_t min()
int64_t quantize()
uint64_t sum()
For example, to format the results of avg(), you can apply the %d, %i,
%o, %u, or %x format conversions. The quantize() and lquantize()
functions format their results as an ASCII table rather than as a
single value.
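A minimal sketch of these rules follows, assuming an aggregation named @calls (a name chosen for this example) that is keyed by execname and probefunc. The two %s conversions match the two string tuple keys in order, and the @ flag places the count() result in the final column.

```d
#!/usr/sbin/dtrace -qs

syscall:::entry
{
	/* Tuple signature: (string, string); result type: uint64_t. */
	@calls[execname, probefunc] = count();
}

END
{
	printf("%-20s %-20s %8s\n", "EXECNAME", "SYSCALL", "COUNT");

	/* One conversion per tuple key, then %@ for the aggregation result. */
	printa("%-20s %-20s %@8d\n", @calls);
}
```

To drop the system call name from the report without changing the aggregation, you could shorten the format string to "%-20s %@8d\n", because trailing tuple keys can be left without a conversion.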
END
{
printa("%@8u %a\n", @a);
}
If you use the dtrace command to execute this program, then wait a few
seconds and type Control-C, you see output similar to the following:
# dtrace -s printa.d
^C
CPU ID FUNCTION:NAME
1 2 :END 1 0x1
1 ohci`ohci_handle_root_hub_status_change+0x148
1 specfs`spec_write+0xe0
1 0xff14f950
1 genunix`cyclic_softint+0x588
1 0xfef2280c
1 genunix`getf+0xdc
1 ufs`ufs_icheck+0x50
1 genunix`infpollinfo+0x80
1 genunix`kmem_log_enter+0x1e8
...
The stack() action records a kernel stack trace to the directed buffer. The
kernel stack is nframes in depth. If you do not provide nframes, the
number of stack frames recorded is the number specified by the
stackframes option. For example:
# dtrace -n 'uiomove:entry{stack()}'
CPU ID FUNCTION:NAME
0 12200 uiomove:entry
ufs`rdip+0x338
ufs`ufs_read+0x208
genunix`vn_rdwr+0x1c0
elfexec`getelfphdr+0xa4
elfexec`elf32exec+0x7a0
genunix`gexec+0x324
genunix`exec_common+0x278
genunix`exece+0xc
unix`syscall_trap32+0xcc
0 12200 uiomove:entry
ufs`ufs_readlink+0x11c
genunix`pn_getsymlink+0x40
genunix`lookuppnvp+0x414
genunix`lookuppnat+0x120
genunix`resolvepath+0x50
unix`syscall_trap32+0xcc
...
The stack() action differs from other actions in that it can also be used as
a key to an aggregation:
genunix`installctx+0xc
genunix`schedctl+0x5c
unix`syscall_trap+0xac
1
genunix`schedctl_shared_alloc+0xc0
genunix`schedctl+0x18
unix`syscall_trap+0xac
1
unix`lgrp_shm_policy_set+0x168
genunix`segvn_create+0x82c
genunix`as_map+0xf0
genunix`schedctl_map+0x98
genunix`schedctl_shared_alloc+0x8c
genunix`schedctl+0x18
unix`syscall_trap+0xac
1
...
sd`xbuf_iostart+0x7c
ufs`log_roll_write_bufs+0x100
ufs`log_roll_write+0xe4
ufs`trans_roll+0x2f8
unix`thread_start+0x4
16
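A script that uses stack() as an aggregation key might look like the following sketch. The probe and the frame depth are assumptions chosen for illustration; they are not the enabling that produced the output shown above.

```d
#!/usr/sbin/dtrace -s

/* Assumed probe for illustration: count callers of kmem_alloc(). */
fbt::kmem_alloc:entry
{
	/* Each distinct five-frame kernel stack becomes its own key. */
	@[stack(5)] = count();
}
```

When you press Control-C, dtrace prints each distinct stack followed by the number of times that it was seen.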
The ustack() action records a user stack trace to the directed buffer. The
user stack is nframes in depth. If you do not specify nframes, the number
of stack frames recorded is the number specified by the ustackframes
option. Although ustack() can determine the address of the calling
frames when the probe fires, the stack frames are not translated into
symbols until the ustack() action is processed at user-level by the DTrace
consumer. Note that some functions are static and therefore do not have
entries in the symbol table; call sites in these functions are displayed with
their hexadecimal address. Also, because ustack() symbol translation
does not occur until after the data is recorded, there exists a possibility
that the process in question has exited, making stack frame translation
impossible. In this case, the dtrace utility emits a warning, followed by
the hexadecimal stack frames. For example:
dtrace: failed to grab process 100941: no such process
c7b834d4
c7bca95d
c7bca1a4
c7bd4374
c7bc2528
8047efc
libc.so.1`_brk_unlocked+0x4
libc.so.1`sbrk+0x24
vi`morelines+0x4
vi`append+0xc4
vi`vdoappend+0x2c
vi`fixzero+0x28
vi`ovbeg+0x30
vi`vop+0x158
vi`commands+0x13d0
vi`main+0xf24
vi`_start+0x108
1
...
libc.so.1`_brk_unlocked+0x4
libc.so.1`sbrk+0x24
vi`morelines+0x4
vi`append+0xc4
vi`put+0xe4
vi`vremote+0x64
vi`vmain+0x1670
vi`vop+0x25c
vi`commands+0x13d0
vi`main+0xf24
vi`_start+0x108
35
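Like stack(), the ustack() action can serve as an aggregation key. The following is only a sketch of an enabling that could produce output of the kind shown above; the probe and the predicate on vi are inferred from that output and are assumptions rather than the original command.

```d
#!/usr/sbin/dtrace -s

/* Assumed enabling: count user stacks that lead to brk(2) calls in vi. */
syscall::brk:entry
/execname == "vi"/
{
	@[ustack()] = count();
}
```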
Destructive Actions
Some actions are destructive in that they change the state of the system.
Although they change the system in a well-defined way, they change it
nonetheless. You cannot use destructive actions unless you have explicitly
enabled them. In the dtrace(1M) command, you enable destructive
actions with the -w option. If you attempt to use destructive actions in the
dtrace(1M) command without explicitly enabling them, dtrace fails,
returning an error message similar to:
dtrace: could not enable tracing: Destructive actions
not allowed
The stop() action forces the process that hit the enabled probe to stop
when it next leaves the kernel, as if stopped by a proc(4) action. You can
use the prun(1) utility to resume a process that has been stopped by the
stop() action. You can use the stop() action to stop a process at any
DTrace probe point; this allows you to capture a program in a very
particular state (which is difficult to achieve with a simple breakpoint).
You can then attach a traditional debugger (such as mdb(1)) to examine the
program's state, or use the gcore(1) utility to capture that state in a core
file for later analysis.
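The following sketch shows one way that stop() might be used. The process name myapp is hypothetical, and the open(2) entry probe is chosen only for illustration. The -w option appears on the interpreter line because stop() is a destructive action, and the s option is kept last.

```d
#!/usr/sbin/dtrace -ws

/* "myapp" is a hypothetical process name used for illustration. */
syscall::open:entry
/execname == "myapp"/
{
	printf("stopping pid %d; resume it with prun(1)\n", pid);
	stop();		/* the process stops when it next leaves the kernel */
	exit(0);	/* stop tracing after the first match */
}
```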
The raise() action sends the specified signal to the currently running
process. This is similar to using the kill(1) command to send a process a
signal; however, you can use the raise() action to send a signal at a
precise point in a process's execution.
The copyout() action copies nbytes from the buffer specified by buf to
the address specified by addr in the address space of the process
associated with the current thread. If the user-space address does not
correspond to a valid, faulted-in page in the current address space, an
error is generated.
The copyoutstr() action copies the string specified by str to the address
specified by addr in the address space of the process associated with the
current thread. If the user-space address does not correspond to a valid,
faulted-in page in the current address space, an error is generated. The
string length is limited to the value set by the strsize option.
proc:::signal-send
/args[2] == SIGINT/
{
printf("SIGINT sent to %s by ", args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
# ./whosend.d
SIGINT sent to run-mozilla.sh by Mary Smith
^C
On Solaris running on x86, you might see the following on the console:
dtrace: breakpoint action at probe
fbt:genunix:clock:entry (ecb d2b97060)
stopped at int20+0xb: ret
kadb[0]:
The address following the probe description is the address of the enabling
control block (ECB) within DTrace. You can use it to learn more details
about the probe enabling that induced the breakpoint action.
Note that a mistake with the breakpoint() action can cause it to be called
far more often than intended. This can in turn prevent you from even
terminating the DTrace consumer that is inducing the breakpoint actions.
If you find yourself in this situation, set the kernel integer variable
dtrace_destructive_disallow to 1. This disallows all destructive
actions on the machine. This setting should be used only if you find
yourself in this particular situation.
If you are using the kadb(1M) debugger on x86, follow these steps:
1. Use the 4-byte write modifier (W) with the / formatting dcmd:
kadb[0]: dtrace_destructive_disallow/W 1
dtrace_destructive_disallow: 0x0 = 0x1
kadb[0]:
2. Continue by entering :c:
kadb[0]: :c
The panic() action induces a kernel panic when triggered. Use this action
to force a system crash dump at a time of interest. The panic() action can
be used together with ring buffering and postmortem analysis to
understand a problem. When you use the panic() action, you see a panic
message that denotes the probe inducing the panic. For example:
panic[cpu0]/thread=30001830b80: dtrace: panic action at probe
syscall::mmap:entry (ecb 300000acfc8)
The message buffer of the crash dump will also contain the probe and
ECB responsible for the panic() action.
The chill() action causes DTrace to spin for the specified number of
nanoseconds. This action is primarily useful for exploring problems that
might be timing related. For example, you can use it to open race
condition windows, or to bring periodic events into or out of phase with
one another.
Because interrupts are disabled while in DTrace probe context, any use of
the chill() action induces interrupt latency, scheduling latency, dispatch
latency, and so on. The chill() action can, therefore, cause strange
systemic effects, and should not be used indiscriminately. Moreover,
because the liveness of the system relies on being able to periodically
handle interrupts, DTrace refuses to implement the chill() action for
longer than 500 milliseconds within any given one-second interval, and
instead reports an illegal operation error:
# dtrace -w -n 'syscall::open:entry {chill(500000001)}'
dtrace: description 'syscall::open:entry ' matched 1 probe
dtrace: allowing destructive actions
dtrace: error on enabled probe ID 2 (ID 18022:
syscall::open:entry): illegal operation in action #1
The cap is enforced even if the time is spread across multiple calls to
chill(), or if the time is spread across multiple DTrace consumers for a
single probe.
Special Actions
Some actions do not fall into either the data recording action or the
destructive action category. These other special actions fall into one of two
sets. The first set contains those actions associated with speculative tracing.
The second set contains the exit() action.
When the exit() action is called, only DTrace actions already underway
on other CPUs are taken; no subsequent actions are taken on any CPU.
The only exception to this is the END probe, which is called after the
DTrace consumer has processed the exit() action and has indicated that
tracing should stop.
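The interaction between exit() and the END probe can be sketched as follows. The aggregation name @reads and the five-second interval are arbitrary choices made for this example.

```d
#!/usr/sbin/dtrace -qs

syscall::read:entry
{
	@reads = count();
}

tick-5sec
{
	/* Ask the consumer to stop tracing; no new actions are taken. */
	exit(0);
}

END
{
	/* END still fires after the consumer has processed exit(). */
	printa("read calls in about 5 seconds: %@d\n", @reads);
}
```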
Subroutines
Subroutines differ from actions in that they generally only affect internal
DTrace state. There is therefore no such thing as a destructive subroutine,
and subroutines never trace data into buffers. Many subroutines have
analogs in Section 9F or Section 3C of the manual pages; see Intro(9F)
and Intro(3), respectively.
Note that the collapsing of /../ elements is naive in that the parent
component is collapsed without regard to symbolic links. As a result, the
cleanpath() subroutine might take a valid path and return a shorter,
invalid one. For example, if the path specified by str were
/foo/../bar, and /foo were a symbolic link to /net/foo/export,
then cleanpath() would return the string /bar even though bar might
only be in /net/foo, not in /. This limitation is due to the fact that
cleanpath() is called in the context of a firing probe, where full symbolic
link resolution or arbitrary names are not possible. The returned string is
allocated out of scratch memory, and is therefore valid only for the
duration of the clause. If insufficient scratch space is available, cleanpath
aborts and an error is generated.
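The behavior described above can be observed with a sketch such as the following, which passes the path from the example directly to cleanpath(). No symbolic link is consulted, so the result is the same whether or not /foo is a link.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
	/* The /../ element is collapsed textually, giving "/bar". */
	printf("%s\n", cleanpath("/foo/../bar"));
	exit(0);
}
```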
Built-in Variables
You have seen a number of special built-in D variables in the example
programs, including timestamp, pid, and others. All of these variables
are scalar global variables; currently D does not define thread-local
variables, clause-local variables, or built-in associative arrays. Table B-1
shows the complete list of D built-in variables.
Type and Name              Description
int64_t arg0, ..., arg9    The first ten input arguments to a probe represented as raw
                           64-bit integers. If fewer than ten arguments are passed to
                           the current probe, the remaining variables return zero.
args[]                     The typed arguments to the current probe, if any. The
                           args[] array is accessed using an integer index, but each
                           element is defined to be the type corresponding to the
                           given probe argument. For example, if args[] is
                           referenced by a read(2) system call probe, args[0] is of
                           type int, args[1] is of type void *, and args[2] is of
                           type size_t.
uintptr_t caller           The program counter location of the current thread just
                           before entering the current probe.
lwpsinfo_t *curlwpsinfo    The lightweight process (LWP) state of the LWP associated
                           with the current thread. This structure is described in
                           further detail in proc(4).
psinfo_t *curpsinfo        The process state of the process associated with the current
                           thread. This structure is described in further detail in
                           proc(4).
kthread_t *curthread       The address of the operating system kernel's internal data
                           structure for the current thread, the kthread_t structure.
                           The kthread_t is defined in <sys/thread.h>.
string cwd                 The name of the current working directory of the process
                           associated with the current thread.
epid                       The enabled probe ID (EPID) for the current probe. This
                           integer uniquely identifies a particular probe that is
                           enabled with a specific predicate and set of actions.
int errno                  The error value returned by the last system call executed
                           by this thread.
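Several of these built-in variables can be combined in a single clause. The sketch below is only an illustration: it reports failed open(2) calls using errno and cwd from the table above, together with the execname and pid variables used in earlier examples.

```d
#!/usr/sbin/dtrace -qs

syscall::open:return
/errno != 0/
{
	/* All four values are built-in variables; none is declared here. */
	printf("%s (pid %d) in %s: open failed with errno %d\n",
	    execname, pid, cwd, errno);
}
```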
Macro Variables
The D compiler defines a set of built-in macro variables that you can use
when writing D programs or interpreter files. Macro variables are
identifiers that are prefixed with a dollar sign ($) and are expanded once
by the D compiler when processing your input file. Table B-2 shows the
complete list of D macro variables.
D Operators
Arithmetic Operators
D provides the standard arithmetic operators for use in your programs.
These operators all have the same meaning as they do in ANSI-C for
integer operands. Table C-1 shows the D binary arithmetic operators.
Operator Meaning
+ Integer addition
- Integer subtraction
* Integer multiplication
/ Integer division
% Integer modulus
Relational Operators
D provides binary relational operators for use in your programs. These
operators all have the same meaning as they do in ANSI-C. Table C-2
shows the D relational operators.
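Operator Meaning
< Left-hand operand is less than right-hand operand
<= Left-hand operand is less than or equal to right-hand operand
> Left-hand operand is greater than right-hand operand
>= Left-hand operand is greater than or equal to right-hand operand
== Left-hand operand is equal to right-hand operand
!= Left-hand operand is not equal to right-hand operand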
Logical Operators
D provides binary logical operators for use in your programs. Table C-3
shows the D logical operators. The first two are equivalent to the
corresponding ANSI-C operators.
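Operator Meaning
&& Logical AND: true if both operands are true
|| Logical OR: true if one or both operands are true
^^ Logical XOR: true if exactly one operand is true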
Bitwise Operators
D provides binary operators for manipulating individual bits inside of
integer operands. These operators all have the same meaning as they do
in ANSI-C. Table C-4 shows the D bitwise operators.
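Operator Meaning
& Bitwise AND
| Bitwise OR
^ Bitwise XOR
<< Shift the left-hand operand left by the number of bits given by the right-hand operand
>> Shift the left-hand operand right by the number of bits given by the right-hand operand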
You use the binary & operator to clear bits from an integer operand. You
use the binary | operator to set bits in an integer operand. The binary ^
operator returns 1 in each bit position where exactly one of the
corresponding operand bits is set.
You use the shift operators to move bits left or right in a given integer
operand. Shifting left fills empty bit positions on the right-hand side of
the result with zeroes. Shifting right using an unsigned integer operand
fills empty bit positions on the left-hand side of the result with zeroes.
Shifting right using a signed integer operand (an action known as an
arithmetic shift operation) fills empty bit positions on the left-hand side
with the value of the sign bit.
In addition to the binary logical operators, you can use the unary ~
operator to perform a bitwise negation of a single operand: it converts
each 0 bit in the operand into a 1 bit, and each 1 bit in the operand into a
0 bit.
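The bitwise operators described above can be tried out with a short script such as the following sketch. The variable name flags and the constants are arbitrary values chosen for illustration.

```d
#!/usr/sbin/dtrace -qs

BEGIN
{
	flags = 0x0f;

	printf("clear low bits:  0x%x\n", flags & ~0x3);	/* & with a negated mask */
	printf("set high bits:   0x%x\n", flags | 0x30);	/* | sets bits */
	printf("toggle all bits: 0x%x\n", flags ^ 0xff);	/* ^ flips bits set in one operand */
	printf("shift left by 4: 0x%x\n", flags << 4);		/* zero fill on the right */
	printf("signed shift:    %d\n", -16 >> 2);		/* sign bit fills on the left */

	exit(0);
}
```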
Assignment Operators
D provides the following binary assignment operators for modifying D
variables. Remember that you can only modify D variables and arrays:
kernel data objects and constants cannot be modified using the D
assignment operators. The assignment operators have the same meaning
as they do in ANSI-C. Table C-5 shows the D assignment operators.
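Operator Meaning
= Set the left-hand operand equal to the right-hand expression value
+= Increment the left-hand operand by the right-hand expression value
-= Decrement the left-hand operand by the right-hand expression value
*= Multiply the left-hand operand by the right-hand expression value
/= Divide the left-hand operand by the right-hand expression value
%= Modulo the left-hand operand by the right-hand expression value
|= Bitwise OR the left-hand operand with the right-hand expression value
&= Bitwise AND the left-hand operand with the right-hand expression value
^= Bitwise XOR the left-hand operand with the right-hand expression value
<<= Shift the left-hand operand left by the number of bits given by the right-hand expression value
>>= Shift the left-hand operand right by the number of bits given by the right-hand expression value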
Increment and Decrement Operators
x += 1; y = ++x;
y = x;
If the operator appears after the variable name, the variable is modified
after its current value is returned for use in the expression. For example,
the following two expressions produce identical results:
y = x; y = x--;
x -= 1;
You can use the increment and decrement operators to create new
variables without declaring them. If you omit a variable declaration and
apply the increment or decrement operator to a variable, the variable is
implicitly declared to be of type int64_t.
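As a brief sketch of implicit declaration, the script below never declares the variable nreads (a name chosen for this example). Applying ++ to it in the first clause is enough for the D compiler to create it.

```d
#!/usr/sbin/dtrace -qs

syscall::read:entry
{
	/* nreads is not declared anywhere; ++ creates it as an int64_t. */
	nreads++;
}

tick-5sec
{
	printf("read calls seen: %d\n", nreads);
	exit(0);
}
```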
Conditional Expressions
Although D does not provide support for if-then-else constructs, it does
provide support for simple conditional expressions using the ? and :
operators. These operators permit a triplet of expressions to be associated
where the first expression is used to conditionally evaluate one of the
other two. For example, the following D statement can be used to set a
variable x to one of two strings, depending on the value of i:
x = i == 0 ? "zero" : "non-zero";
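A conditional expression is often useful for choosing an aggregation key. The following sketch, which is an illustration rather than part of the course scripts, counts read(2) calls by outcome using the errno built-in variable.

```d
#!/usr/sbin/dtrace -s

syscall::read:return
{
	/* The ?: operators select the key, since D has no if-then-else. */
	@[errno == 0 ? "succeeded" : "failed"] = count();
}
```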