Dynamic Performance Tuning and
Troubleshooting With DTrace

SA-327-S10
Student Guide
Sun Microsystems, Inc.

UBRM05-104
500 Eldorado Blvd.
Broomfield, CO 80021
U.S.A.
Revision A
March 18, 2005 11:30 am
Copyright 2005 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and
decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Sun, Sun Microsystems, the Sun logo, Solaris, and OpenBoot are trademarks or registered trademarks of Sun Microsystems, Inc., in the U.S.
and other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc., in the U.S. and
other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Federal Acquisitions: Commercial Software - Government Users Subject to Standard License Terms and Conditions.
Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other
countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to
You. You will not export or re-export to entities on the most current U.S. export exclusions lists or to any country subject to U.S. embargo
or terrorist controls as specified in the U.S. export laws. You will not use or provide Products, Services, or technical data for nuclear, missile,
or chemical biological weaponry end uses.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND
WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE
LEGALLY INVALID.
THIS MANUAL IS DESIGNED TO SUPPORT AN INSTRUCTOR-LED TRAINING (ILT) COURSE AND IS INTENDED TO BE
USED FOR REFERENCE PURPOSES IN CONJUNCTION WITH THE ILT COURSE. THE MANUAL IS NOT A STANDALONE
TRAINING TOOL. USE OF THE MANUAL FOR SELF-STUDY WITHOUT CLASS ATTENDANCE IS NOT RECOMMENDED.
Export Control Classification Number EAR99 assigned: 10 September 2004
Table of Contents
About This Course (Preface-xi)
  Course Goals (Preface-xi)
  Topics Not Covered (Preface-xiii)
  How Prepared Are You? (Preface-xiv)
  Introductions (Preface-xv)
  How to Use Course Materials (Preface-xvi)
  Conventions (Preface-xvii)
    Typographical Conventions (Preface-xviii)
DTrace Fundamentals (1-1)
  Objectives (1-1)
  Relevance (1-2)
  Additional Resources (1-3)
  DTrace Features (1-4)
    Transient Failures (1-4)
    Debugging Transient Failures (1-5)
    DTrace Capabilities (1-6)
  DTrace Architecture (1-7)
    Probes and Probe Providers (1-7)
    DTrace Components (1-8)
  DTrace Tour (1-12)
    Listing Probes (1-12)
    Writing D Scripts (1-21)
Using DTrace (2-1)
  Objectives (2-1)
  Relevance (2-2)
  DTrace Performance Monitoring Capabilities (2-4)
    Features of the DTrace Performance Monitoring Capabilities (2-4)
    Aggregations (2-4)
  Examining Performance Problems Using the vminfo Provider (2-8)
    The vminfo Probes (2-9)
    Finding the Source of Page Faults Using vminfo Probes (2-11)
  Examining Performance Problems Using the sysinfo Provider (2-17)
    The sysinfo Probes (2-18)
    Using the quantize Aggregation Function With the sysinfo Probes (2-21)
    Finding the Source of Cross-Calls (2-22)
  Examining Performance Problems Using the io Provider (2-26)
    The io Probes (2-26)
    Information Available When io Probes Fire (2-27)
    Finding I/O Problems (2-32)
  Obtaining System Call Information (2-36)
    The syscall Provider (2-36)
    D Language Variables (2-43)
    Associative Arrays (2-44)
    Thread-Local Variables (2-45)
    Timing a System Call (2-46)
    Following a System Call (2-48)
  Creating D Scripts That Use Arguments (2-53)
    Built-in Macro Variables (2-54)
    PID Argument Example (2-56)
    Executable Name Argument Example (2-57)
    Custom Monitoring Tools (2-60)
Debugging Applications With DTrace (3-1)
  Objectives (3-1)
  Relevance (3-2)
  Application Profiling (3-4)
    The pid Provider (3-4)
    The profile Provider (3-19)
  Application Variables (3-30)
    Displaying Process Global Variables (3-30)
    Displaying Library Global Variables (3-34)
  The plockstat Provider (3-36)
  Transient System Call Errors (3-38)
    User Stack Traces on System Call Failures (3-39)
    Processes Using a Lot of System Time (3-41)
  Open Files (3-45)
    Accessing System Call Pointer Arguments (3-45)
    Displaying Names of Files Being Opened (3-48)
Finding System Problems With DTrace (4-1)
  Objectives (4-1)
  Relevance (4-2)
  Accessing Kernel Variables (4-4)
    Using the D Language to Access Kernel Symbols (4-4)
    Monitoring Kernel Variables (4-5)
    Accessing Kernel Data Structures (4-6)
    Accessing Lock Contention Information (4-12)
    The proc Provider and the system() Function (4-18)
  Displaying Read Call Information (4-19)
    Tracing Read Calls System-Wide (4-19)
    Tracing Read Calls Using the iosnoop.d D Script (4-22)
    Aggregating Read Data (4-22)
  Using the Anonymous Tracing Facility (4-25)
    Creating an Anonymous Enabling (4-25)
    Performing Anonymous Tracing (4-25)
  Using the Speculative Tracing Facility (4-30)
    Speculative Tracing Functions (4-31)
    Speculative Tracing Example (4-32)
    Application Debugging With Speculative Tracing (4-34)
  DTrace Privileges (4-37)
    Using the Least Privilege Facility (4-37)
    Kernel-Destructive Actions (4-38)
    Setting DTrace User Privileges (4-38)
    Setting DTrace Process Privileges (4-44)
    Summarizing the DTrace Privilege Levels (4-47)
Troubleshooting DTrace Problems (5-1)
  Objectives (5-1)
  Relevance (5-2)
  Minimizing DTrace Performance Impact (5-4)
    Limiting Enabled Probes (5-4)
    Using Aggregations (5-5)
    Using Cacheable Predicates (5-5)
  Using and Tuning DTrace Buffers (5-8)
    Principal Buffers (5-8)
    Principal Buffer Policies (5-8)
    DTrace Option Settings (5-9)
    The switch Buffer Policy (5-10)
    The fill Buffer Policy (5-12)
    The ring Buffer Policy (5-13)
    Other Buffers (5-14)
    Buffer Resizing Policy (5-14)
  Debugging DTrace Scripts (5-15)
    Avoiding Syntax Errors in D Scripts (5-15)
    Avoiding Run-Time Errors in D Scripts (5-18)
Actions and Subroutines (A-1)
  Default Action (A-2)
  Data Recording Actions (A-3)
    The void trace(expression) Action (A-3)
    The void tracemem(address, size_t nbytes) Action (A-3)
    The void printf(string format, ...) Action (A-3)
    The printa Action (A-10)
    The stack() Action (A-12)
    The ustack() Action (A-13)
  Destructive Actions (A-16)
    Process Destructive Actions (A-16)
    Kernel Destructive Actions (A-18)
  Special Actions (A-21)
    Actions Associated With Speculative Tracing (A-21)
    The void exit(int status) Action (A-22)
  Subroutines (A-22)
    The void *alloca(size_t size) Subroutine (A-22)
    The string basename(char *str) Subroutine (A-23)
    The void bcopy(void *src, void *dest, size_t size) Subroutine (A-23)
    The string cleanpath(char *str) Subroutine (A-23)
    The void *copyin(uintptr_t addr, size_t size) Subroutine (A-24)
    The string copyinstr(uintptr_t addr) Subroutine (A-24)
    The string dirname(char *str) Subroutine (A-25)
    The size_t msgdsize(mblk_t *mp) Subroutine (A-25)
    The size_t msgsize(mblk_t *mp) Subroutine (A-25)
    The int mutex_owned(kmutex_t *mutex) Subroutine (A-25)
    The kthread_t *mutex_owner(kmutex_t *mutex) Subroutine (A-25)
    The int mutex_type_adaptive(kmutex_t *mutex) Subroutine (A-26)
    The int progenyof(pid_t pid) Subroutine (A-26)
    The int rand(void) Subroutine (A-26)
    The int rw_iswriter(krwlock_t *rwlock) Subroutine (A-26)
    The int rw_write_held(krwlock_t *rwlock) Subroutine (A-27)
    The int speculation(void) Subroutine (A-27)
    The string strjoin(char *str1, char *str2) Subroutine (A-27)
    The size_t strlen(string str) Subroutine (A-27)
D Built-in and Macro Variables (B-1)
  Built-in Variables (B-2)
  Macro Variables (B-4)
D Operators (C-1)
  Arithmetic Operators (C-2)
  Relational Operators (C-3)
  Logical Operators (C-4)
  Bitwise Operators (C-5)
  Assignment Operators (C-6)
  Increment and Decrement Operators (C-8)
  Conditional Expressions (C-9)
Copyright 2005 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Preface
About This Course
Course Goals
Upon completion of this course, you should be able to:
Describe the features and architecture of the Solaris Dynamic Tracing (DTrace) facility
Use the DTrace facility to find the source of intermittent problems
Use DTrace to help debug applications
Use DTrace to look at the cause of performance problems
Troubleshoot DTrace script problems
Course Map
The following course map enables you to see what you have
accomplished and where you are going in reference to the course goals.
Understanding and Using the DTrace Facility: DTrace Fundamentals; Using DTrace
Using DTrace to Debug Applications and Find System Problems: Debugging Applications With DTrace; Finding System Problems With DTrace
Troubleshooting DTrace: Troubleshooting DTrace Problems
Topics Not Covered

This course does not cover the following topic, which is covered in other
courses offered by Sun Educational Services:
Performance management
Refer to the Sun Educational Services catalog for specific information and
registration.
How Prepared Are You?

To be sure you are prepared to take this course, can you answer yes to the
following questions?
Do you have some previous programming experience?
Can you use the truss command to diagnose application problems?
Do you know the basics of the kernel structure?
Are you familiar with basic troubleshooting concepts?
Introductions
Now that you have been introduced to the course, introduce yourself to
the other students and the instructor, addressing the following items:
Name
Company affiliation
Title, function, and job responsibility
Experience related to topics presented in this course
Reasons for enrolling in this course
Expectations for this course
How to Use Course Materials

To enable you to succeed in this course, these course materials contain a
learning module that is composed of the following components:
Goals: You should be able to accomplish the goals after finishing this course and meeting all of its objectives.
Objectives: You should be able to accomplish the objectives after completing a portion of instructional content. Objectives support goals and can support other higher-level objectives.
Lecture: The instructor presents information specific to the objective of the module. This information helps you learn the knowledge and skills necessary to succeed with the activities.
Activities: The activities take various forms, such as review questions, labs, discussion, and demonstration. Activities help facilitate the mastery of an objective.
Visual aids: The instructor might use several visual aids to convey a concept, such as a process, in a visual form. Visual aids commonly contain graphics, animation, and video.
Conventions
The following conventions are used in this course to represent various
training elements and alternative learning resources.
Icons
Additional resources: Indicates other references that provide additional information on the topics described in the module.
Discussion: Indicates that a small-group or class discussion on the current topic is recommended at this time.
Note: Indicates additional information that can help students but is not crucial to their understanding of the concept being described. Students should be able to understand the concept or complete the task without this information. Examples of notational information include keyword shortcuts and minor system adjustments.
Caution: Indicates that there is a risk of personal injury from a nonelectrical hazard, or risk of irreversible damage to data, software, or the operating system. A caution indicates the possibility (as opposed to the certainty) of a hazard, depending on the action of the user.
Caution: Indicates that either personal injury or irreversible damage to data, software, or the operating system will occur if the user performs this action. This caution does not indicate potential events; if the action is performed, catastrophic events will occur.
Typographical Conventions
Courier is used for the names of commands, files, directories,
programming code, and on-screen computer output; for example:
Use ls -al to list all files.
system% You have mail.
Courier is also used to indicate programming constructs, such as class
names, methods, and keywords; for example:
The getServletInfo method is used to get author information.
The java.awt.Dialog class contains the Dialog constructor.
Courier bold is used for characters and numbers that you type; for
example:
To list the files in this directory, type:
# ls
Courier bold is also used for each line of programming code that is
referenced in a textual description; for example:
1 import java.io.*;
2 import javax.servlet.*;
3 import javax.servlet.http.*;
Notice the javax.servlet interface is imported to allow access to its
life cycle methods (Line 2).
Courier italics is used for variables and command-line placeholders
that are replaced with a real name or value; for example:
To delete a file, use the rm filename command.
Courier italic bold is used to represent variables whose values are to
be entered by the student as part of an activity; for example:
Type chmod a+rwx filename to grant read, write, and execute
rights for filename to the owner, the group, and all other users.
Palatino italics is used for book titles, new words or terms, or words that
you want to emphasize; for example:
Read Chapter 6 in the User's Guide.
These are called class options.
Module 1
DTrace Fundamentals
Objectives
Upon completion of this module, you should be able to:
Describe the features of the Solaris Dynamic Tracing (DTrace) facility
Describe the DTrace architecture
List and enable probes, and create action statements and D scripts
Relevance
Discussion: The following questions are relevant to understanding DTrace:
Would it be beneficial to be able to turn on trace points in most of the functions in the kernel?
Would it be useful to know who is issuing kill(2) system calls?
Additional Resources
Additional resources: The following references provide additional
information on the topics described in this module:
Sun Microsystems, Inc. Solaris Dynamic Tracing Guide, part number
817-6223-10.
The /usr/demo/dtrace directory contains all of the sample scripts
from the Solaris Dynamic Tracing Guide.
Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal.
"Dynamic Instrumentation of Production Systems." Paper presented
at the 2004 USENIX Annual Technical Conference.
BigAdmin System Administration Portal
[http://www.sun.com/bigadmin/content/dtrace].
The dtrace(1M) manual page.
DTrace Features
DTrace is a comprehensive dynamic tracing facility that is bundled into
the Solaris 10 Operating System (Solaris 10 OS). It is intended for use by
system administrators, service support personnel, kernel developers,
application program developers, and users who are given explicit access
permission to the DTrace facility.
DTrace has the following features:
Enables dynamic modification of the system to record arbitrary data
Promotes tracing on live systems
Is completely safe: its use cannot induce fatal failure
Allows tracing of both the kernel and user-level programs
Functions with low overhead when tracing is enabled and zero overhead when tracing is not being performed
Transient Failures
DTrace helps you find the causes of transient failures. A transient
failure is any unacceptable behavior that does not result in fatal failure of
the system. You might have a clear, specific failure, such as:
read(2) is returning EIO errno values on a device that is not reporting any errors.
An application occasionally does not receive its expected timer signal.
A thread is missing a condition variable wakeup.
The transient failure can also be based on your own definition of
unacceptable system operation:
"We were expecting to accommodate 100 users per CPU, but we cannot support more than 60 users per CPU."
"Why does system time go way up when I run application X?"
"Every morning between 9:30 a.m. and 10:00 a.m. the system performs poorly."
In these situations, you must understand the problem and either eliminate
the performance inhibitors or reset your expectations. Eliminating the
performance inhibitors could involve:
Adding more resources, such as memory or central processing units (CPUs)
Reconfiguring existing resources, for example, tuning parameters or rewriting software
Lessening the load
Debugging Transient Failures

DTrace was developed to provide a more efficient and cost-effective
method of diagnosing transient failures. Historically, users have debugged
transient failures using process-centric tools such as truss(1), pstack(1),
or prstat(1M). These tools were not designed to debug systemic
problems. The tools that were intended for debugging systemic problems,
such as mdb(1) and the Solaris Crash Analysis Tool (Solaris CAT), are
designed for postmortem analysis.
Debugging Using Postmortem Analysis
You can use postmortem analysis to debug transient problems by
inducing fatal failure during the period of transient failure. This technique
has the following disadvantages:
It requires inducing fatal failure, which nearly always results in more downtime than the transient failure.
It requires solving a dynamic problem from a static snapshot of the system's state.
Debugging Using Invasive Techniques
If existing tools cannot find the root cause of a transient failure, then you
must use more invasive techniques. Typically this means developing
custom instrumentation for the failing user program, the kernel, or both.
This can involve using the Trace Normal Form (TNF) facility. You then
reproduce the problem using the instrumented binaries. This technique
requires either:
Running the instrumented binaries in production, or
Reproducing a transient problem in a development environment
Such invasive techniques are undesirable because they are slow,
error-prone, and often ineffective.
Relying on the existing static TNF trace points found in the kernel, which
you can enable with the prex(1) command, is also unsatisfactory. The
number of TNF trace points in the kernel is limited and the overhead is
substantial.
DTrace Capabilities
The DTrace framework allows you to enable tens of thousands of tracing
points called probes. When these instrumentation points are hit, you can
display arbitrary data in the kernel (or user process).
An example of a probe provided by the DTrace framework is entry into
any kernel function. Information that you can display when this probe
fires includes:
Any argument to the function
Any global variable in the kernel
A nanosecond timestamp of when the function was called
A stack trace to indicate what code called this function
The process that was running when the function was called
The thread that made the call to this function
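As a rough user-space analogy only (this is not DTrace, and its overhead is far higher), the following Python sketch uses the standard sys.setprofile hook, which the interpreter calls on entry to and exit from each Python function. On entry to a hypothetical function named target(), it records the same kinds of data listed above: the arguments, a nanosecond-resolution timestamp, the calling stack, the process, and the thread.

```python
import os
import sys
import threading
import time
import traceback
from typing import Any, Dict, List

# Records captured by the hook: one dict per entry into target().
records: List[Dict[str, Any]] = []


def on_call(frame, event, arg):
    """Profile hook: the interpreter calls this on function entry and exit."""
    # React only to entry into the (hypothetical) function of interest.
    if event != "call" or frame.f_code.co_name != "target":
        return
    records.append({
        "args": dict(frame.f_locals),         # arguments at entry
        "timestamp_ns": time.monotonic_ns(),  # nanosecond-resolution clock
        "stack": [f.name for f in traceback.extract_stack(frame)],  # callers
        "pid": os.getpid(),                   # process that was running
        "thread": threading.get_ident(),      # thread that made the call
    })


def target(x, y):
    return x + y


def caller():
    return target(2, 3)


sys.setprofile(on_call)
try:
    caller()
finally:
    sys.setprofile(None)  # always remove the hook afterward

print(records[0]["args"], records[0]["stack"][-2:])
```

The last two stack entries show that caller() invoked target(), which corresponds to the "what code called this function" item in the list.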
Using DTrace, you can explore all aspects of the Solaris 10 OS to:
Understand how the software works
Determine the root cause of performance problems
Examine all layers of software sequentially from the user level to the kernel
Track down the source of aberrant behavior
DTrace comes with powerful data management primitives to eliminate
the need for postprocessing of gathered data. Unwanted data is pruned as
close to the source as possible to avoid the overhead of generating and
later filtering unwanted data.
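The idea of pruning data at its source can be sketched in ordinary code. The following Python model is a hypothetical illustration, not how DTrace is implemented; the Event type, the event stream, and aggregate_at_source are invented for the example. Each event is tested against a predicate and folded into a running count the moment it arrives, so what survives is a small summary keyed by call name rather than a log of every event that must be filtered afterward.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical trace event: which process made which call."""
    pid: int
    call: str


def aggregate_at_source(events: Iterable[Event],
                        predicate: Callable[[Event], bool]) -> Counter:
    """Filter and count each event as it arrives.

    Events that fail the predicate are dropped immediately; matching events
    only increment a counter and are never stored, so memory grows with the
    number of distinct keys rather than with the number of events.
    """
    counts: Counter = Counter()
    for event in events:
        if predicate(event):
            counts[event.call] += 1
    return counts


stream = [
    Event(100, "read"), Event(200, "write"), Event(100, "read"),
    Event(100, "open"), Event(300, "read"), Event(100, "read"),
]
summary = aggregate_at_source(stream, lambda e: e.pid == 100)
for call, n in summary.most_common():
    print(f"{call:<6} {n}")
```

Roughly speaking, the predicate plays the role of a D predicate and the counter that of an aggregation, both of which are covered later in the course.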
DTrace also provides a mechanism to trace during boot and to retrieve all
traced data from a kernel crash dump.
DTrace Architecture
DTrace helps you understand a software system by enabling you to
dynamically modify the operating system kernel and user processes to
record additional data that you specify at locations of interest called
probes.
Probes and Probe Providers

A probe is a program location or activity (for example, every system
clock tick) to which DTrace can bind a request to perform a set of actions,
such as recording a stack trace, a timestamp, or the argument to a
function.
How Probes Work
Probes are like programmable sensors inserted at strategic points of your
Solaris 10 OS. You use DTrace to program the appropriate sensors to
record the information that you want. As each probe fires, DTrace gathers
the data from your probes and reports it back to you. If you do not specify
any actions for a probe, DTrace simply records each time the probe fires
and on what CPU.
DTrace provides tens of thousands of probes of various types. Probes are
implemented by probe providers. A provider is a kernel module that
enables a requested probe to fire when it is hit. An example of a provider
is the function boundary tracing, or fbt, provider. It provides entry and
return probes for almost every function in every kernel module.
How Probes Are Enabled
You specify probes and actions using a programming language called D,
which is based on the C programming language. Usually D programs are
placed in script files ending in a .d suffix. The D programs are passed to a
DTrace consumer. The primary, generic DTrace consumer is the
dtrace(1M) command.
The user-specified D program is compiled by the DTrace consumer into a
form referred to as D Intermediate Format (DIF), which is then sent to the
DTrace framework within the kernel for execution. There, the probes that
are named within the D program are enabled, and the corresponding
provider performs the instrumentation required to activate them.

DTrace Components
DTrace has the following components: probes, providers, consumers, and
the D programming language. The entire DTrace framework resides in the
kernel. Consumer programs access the DTrace framework through a well-
defined application programming interface (API).
Probes
A probe has the following attributes:
It is made available by a provider.
It identifies the module and function that it instruments.
It has a name.
Together, the provider, module, function, and name define a 4-tuple that
uniquely identifies each probe:
provider:module:function:name
In addition, DTrace assigns a unique integer identifier to each probe.
Providers
A provider represents a methodology for instrumenting the system.
Providers make probes available to the DTrace framework. A provider
receives information from DTrace regarding when a probe is to be enabled
and transfers control to DTrace when an enabled probe is hit.
DTrace offers the following providers:

The function boundary tracing (fbt) provider can dynamically trace
the entry and return of every function in the kernel.
The syscall provider can dynamically trace the entry and return of
every Solaris system call.
The lockstat provider can dynamically trace the kernel
synchronization primitives to observe lock contention and hold
times.
The plockstat provider makes probes available for user-level
synchronization primitives including lock contention and hold times.
The sched provider can dynamically trace key scheduling events.
The profile provider enables you to add a configurable-rate timer
interrupt to the system.
The dtrace provider enables pre-processing and post-processing (as
well as D program error-processing) capabilities.
The pid provider enables function boundary tracing within a process
as well as tracing of any instruction in the virtual address space of
the process.
The statically defined tracing (sdt) provider creates probes at sites that a
programmer has explicitly designated in the source code.
The vminfo provider makes available probes that correspond to the
kernel's virtual memory statistics.
The sysinfo provider makes available probes that correspond to the
kernel's sys statistics.
The proc provider makes available probes that pertain to process and
thread creation and termination as well as signals.
The mib provider makes available probes that correspond to counters
in the Solaris management information bases (MIBs), which are used
by the simple network management protocol (SNMP).
The io provider makes available probes giving details related to disk
input and output (I/O).
The fpuinfo provider makes available probes that correspond to the
simulation of floating point instructions on SPARC-based
microprocessors.
Note: You should check the Solaris Dynamic Tracing Guide, part number
817-6223, regularly for the addition of any new DTrace providers.
Consumers
A DTrace consumer is a process that interacts with DTrace. There is one
main DTrace consumer called dtrace(1M). It acts as a generic front-end to
the DTrace facility. Most other consumers are rewrites of previously
existing utilities such as lockstat(1M).
There is no limit on the number of concurrent consumers. That is, many
users can simultaneously run the dtrace(1M) command. DTrace handles
the multiplexing.

D Programming Language
The D programming language enables you to specify probes of interest
and bind actions to those probes. To do this, you construct scripts called
D scripts. The nature of D scripts is similar to awk(1)'s pattern-action
pairs. The D programming language also borrows heavily from the C
programming language.
Even if you have no experience with the C programming language or
with awk(1), D programs are fairly easy to write and understand.
Features of the D language include the following:
Provides complete access to kernel C types, such as vnode_t
Provides complete access to kernel static and global variables
Provides complete support for American National Standards
Institute (ANSI)-C operators
Supports strings as a built-in type (unlike C, which uses the
ambiguous char * or char[] types).
Architecture Summary
To summarize, the DTrace facility consists of user-level consumer
programs such as dtrace(1M), providers packaged as kernel modules,
and a library interface for the consumer programs to access the DTrace
facility through the dtrace(7D) kernel driver.

Figure 1-1 shows the overall DTrace architecture.
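Figure 1-1 DTrace Architecture (figure not reproduced). From top to
bottom, it shows D program source files (a.d, b.d); DTrace consumers
(dtrace(1M), lockstat(1M), plockstat(1M), intrstat(1M)); the
libdtrace(3LIB) library in userland; and, in the kernel, the dtrace(7D)
driver, the DTrace framework, and DTrace providers (sysinfo, vminfo, io,
syscall, profile, fbt, sched, ...).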

DTrace Tour
In this section you tour the DTrace facility and learn to perform the
following tasks:
List the available probes using various criteria:
Probes associated with a particular function
Probes associated with a particular module
Probes with a specific name
All probes from a specific provider
Explain how to enable probes
Explain default probe output
Describe action statements
Create a simple D script
Listing Probes
You can list all DTrace probes with the -l option of the dtrace(1M)
command:
# dtrace -l
   ID  PROVIDER  MODULE    FUNCTION  NAME
    1  dtrace                        BEGIN
    2  dtrace                        END
    3  dtrace                        ERROR
    4  syscall             nosys     entry
    5  syscall             nosys     return
    6  syscall             rexit     entry
    7  syscall             rexit     return
    8  syscall             forkall   entry
    9  syscall             forkall   return
   10  syscall             read      entry
   11  syscall             read      return
   12  syscall             write     entry
   13  syscall             write     return
   14  syscall             open      entry
   15  syscall             open      return
...

You can use an additional option to list specific probes, as follows:

In a specific function: -f function
# dtrace -l -f cv_wait
12921 fbt genunix cv_wait entry
12922 fbt genunix cv_wait return
In a specific module: -m module

# dtrace -l -m sd
17147 fbt sd sdopen entry
17148 fbt sd sdopen return
17149 fbt sd sdclose entry
17150 fbt sd sdclose return
17151 fbt sd sdstrategy entry
17152 fbt sd sdstrategy return
...
With a specific name: -n name
# dtrace -l -n BEGIN
1 dtrace BEGIN
From a specific provider: -P provider
# dtrace -l -P lockstat
469 lockstat genunix mutex_enter adaptive-acquire
470 lockstat genunix mutex_enter adaptive-block
471 lockstat genunix mutex_enter adaptive-spin
472 lockstat genunix mutex_exit adaptive-release
473 lockstat genunix mutex_destroy adaptive-release
474 lockstat genunix mutex_tryenter adaptive-acquire
...
Realize that a specific function or module can be supported by many
providers:
# dtrace -l -f read
10 syscall read entry
11 syscall read return
4036 sysinfo genunix read readch
4040 sysinfo genunix read sysread
7885 fbt genunix read entry
7886 fbt genunix read return

The previous output shows that for each probe, the following is
displayed:
The probe's uniquely assigned probe ID (the probe ID is unique only
within a given release or patch level of Solaris).
The provider name.
The module name (if applicable).
The function name (if applicable).
The probe name.
Specifying Probes in DTrace
Probes are fully specified by separating each component of the 4-tuple
with a colon:
provider:module:function:name
Empty components match anything. For example, fbt::alloc:entry
specifies a probe with the following attributes:
From the fbt provider
In any module
In the alloc function
Named entry
Elements of the 4-tuple can be left off from the left-hand side. For example,
open:entry matches probes from all providers and kernel modules that
have a function name of open and a probe name of entry:
# dtrace -l -n open:entry
14 syscall open entry
7386 fbt genunix open entry
Probe descriptions also support a pattern matching syntax similar to the
shell File Name Generation syntax described in sh(1). The special characters
*, ?, and [ ] are all supported. For example, the syscall::open*:entry
probe description matches both the open and open64 system calls. The ?
character matches any single character in the name, and the [ ] characters
let you specify a set of characters, any one of which can match at that
position in the name.
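The matching rules above can be modeled with a short C sketch. It is a user-space illustration only, not DTrace's implementation: it assumes the POSIX fnmatch(3) function as a stand-in for DTrace's own pattern matcher, and the struct probe type, the match_desc() and count_matches() helpers, and the sample probe list are hypothetical. It shows three rules from this section: omitted components are filled in from the left, empty components match anything, and non-empty components are shell-style patterns.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical probe record: the provider:module:function:name 4-tuple. */
struct probe {
    const char *tuple[4];
};

/*
 * Return 1 if the description desc matches probe p, else 0.
 * Omitted components are filled in from the left ("open:entry" is treated
 * as "::open:entry"), an empty component matches anything, and non-empty
 * components are shell-style patterns (*, ?, [ ]) checked with fnmatch(3).
 */
static int match_desc(const char *desc, const struct probe *p)
{
    char buf[256];
    const char *fields[4];
    const char *parts[4] = { "", "", "", "" };
    int n = 0;
    size_t len = strlen(desc);

    if (len >= sizeof (buf))
        return 0;                   /* too long for this sketch */
    memcpy(buf, desc, len + 1);

    /* Split on ':' while keeping empty fields, as in "fbt::alloc:entry". */
    char *s = buf;
    for (;;) {
        char *colon = strchr(s, ':');
        if (n == 4)
            return 0;               /* more than four components */
        fields[n++] = s;
        if (colon == NULL)
            break;
        *colon = '\0';
        s = colon + 1;
    }

    /* Right-align the fields: the last one is always the probe name. */
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; i++)
        parts[4 - n + i] = fields[i];

    for (int i = 0; i < 4; i++) {
        if (parts[i][0] != '\0' && fnmatch(parts[i], p->tuple[i], 0) != 0)
            return 0;
    }
    return 1;
}

/* Count the probes in a list that a description matches. */
static int count_matches(const char *desc, const struct probe *probes, int nprobes)
{
    int hits = 0;
    for (int i = 0; i < nprobes; i++)
        hits += match_desc(desc, &probes[i]);
    return hits;
}

/* Sample probes modeled on those named in this section; syscall probes have no module. */
static const struct probe sample[] = {
    { { "dtrace",  "",        "",       "BEGIN"  } },
    { { "syscall", "",        "open",   "entry"  } },
    { { "syscall", "",        "open",   "return" } },
    { { "syscall", "",        "open64", "entry"  } },
    { { "fbt",     "genunix", "open",   "entry"  } },
    { { "fbt",     "genunix", "read",   "entry"  } },
};
#define NSAMPLE ((int)(sizeof (sample) / sizeof (sample[0])))
```

With this sample list, count_matches("open:entry", sample, NSAMPLE) returns 2 (the syscall and fbt open entry probes), which agrees with the dtrace -l -n open:entry listing above, and "syscall::open*:entry" also returns 2 (the open and open64 entry probes).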

Enabling Probes
Probes are enabled with the dtrace(1M) command by specifying them
without the -l option. When enabled in this way, DTrace performs the
default action when the probe fires. The default action indicates only that
the probe fired. No other data is recorded. For example, the following
code example enables every probe in the sd module:
# dtrace -m sd
CPU ID FUNCTION:NAME
0 17329 sd_media_watch_cb:entry
0 17330 sd_media_watch_cb:return
0 17167 sdinfo:entry
0 17168 sdinfo:return
0 17151 sdstrategy:entry
0 17152 sdstrategy:return
0 17661 ddi_xbuf_qstrategy:entry
0 17662 ddi_xbuf_qstrategy:return
0 17649 xbuf_iostart:entry
0 17341 sd_xbuf_strategy:entry
0 17385 sd_xbuf_init:entry
0 17386 sd_xbuf_init:return
0 17342 sd_xbuf_strategy:return
0 17177 sd_mapblockaddr_iostart:entry
0 17178 sd_mapblockaddr_iostart:return
0 17179 sd_pm_iostart:entry
0 17365 sd_pm_entry:entry
0 17366 sd_pm_entry:return
0 17180 sd_pm_iostart:return
0 17181 sd_core_iostart:entry
0 17407 sd_add_buf_to_waitq:entry
...
As you can see from the output, the default action displays the CPU
where the probe fired, the DTrace assigned probe ID, the function where
the probe fired, and the probe name.

To enable probes provided by the syscall provider:

# dtrace -P syscall
dtrace: description 'syscall' matched 452 probes
0 99 ioctl:return
0 98 ioctl:entry
0 99 ioctl:return
0 98 ioctl:entry
0 99 ioctl:return
0 234 sysconfig:entry
0 235 sysconfig:return
0 168 sigaction:entry
0 169 sigaction:return
0 168 sigaction:entry
0 169 sigaction:return
0 98 ioctl:entry
0 99 ioctl:return
0 38 brk:entry
0 39 brk:return
...
To enable probes named zfod:

# dtrace -n zfod
dtrace: description 'zfod' matched 3 probes
0 4080 anon_zero:zfod
0 4080 anon_zero:zfod
^C
To enable probes provided by the syscall provider in the open function,
use the -n option with the 4-tuple syntax (the empty name component
matches both the entry and return probes):
# dtrace -n syscall::open:
dtrace: description 'syscall::open:' matched 2 probes
0 14 open:entry
0 15 open:return
0 14 open:entry
0 15 open:return
0 14 open:entry
^C

To enable the entry probe in the clock function (which should fire every
1/100th second):
# dtrace -n clock:entry
dtrace: description 'clock:entry' matched 1 probe
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
0 4198 clock:entry
^C
DTrace Actions
Actions are user-programmable statements that are executed within the
kernel by the DTrace virtual machine. The following are properties of
actions:
Actions are taken when a probe fires.
Actions are completely programmable (in the D language).
Most actions record some specified state in the system.
Some actions can change the state of the system in a well-defined
way.
These are called destructive actions.
Destructive actions are not allowed by default.
Many actions use expressions in the D language.
For now, you will use D expressions that consist only of built-in D
variables. The following are some of the most useful built-in D variables.
See Appendix B for a complete list of the D built-in variables.
pid The current process ID
execname The current executable name
timestamp The current value of a nanosecond counter that increments
from an arbitrary point in the past (use it for relative time computations)
curthread A pointer to the kthread_t structure that represents
the current thread
probemod The current probe's module name
probefunc The current probe's function name
probename The current probe's name
There are also many built-in functions that perform actions. Appendix A,
"Actions and Subroutines," provides the complete list of D built-in
functions. Start with the trace() function, which records the result of a D
expression to the trace buffer. For example:
trace(pid) traces the current process ID.
trace(execname) traces the name of the current executable.
trace(curthread->t_pri) traces the t_pri field of the current
thread.
trace(probefunc) traces the function name of the probe.
Actions are indicated by following a probe specification with
{ action }. For example:
# dtrace -n 'readch {trace(pid)}'
dtrace: description 'readch ' matched 4 probes
0 4036 read:readch 2040
...
In the last example the process identification number (PID) appears in the
last column of output.

The following example traces the executable name:

# dtrace -m 'ufs {trace(execname)}'
dtrace: description 'ufs ' matched 889 probes
0 14977 ufs_lookup:entry ls
0 15748 ufs_iaccess:entry ls
0 15749 ufs_iaccess:return ls
0 14978 ufs_lookup:return ls
...
0 15005 ufs_rwunlock:entry utmpd
0 15006 ufs_rwunlock:return utmpd
0 14963 ufs_close:entry utmpd
0 14964 ufs_close:return utmpd
0 15007 ufs_seek:entry utmpd
0 15008 ufs_seek:return utmpd
0 14963 ufs_close:entry utmpd
^C

The next action example traces the time of entry to each system call:
# dtrace -n 'syscall:::entry {trace(timestamp)}'
dtrace: description 'syscall:::entry ' matched 226 probes
0 312 portfs:entry 157088479572713
0 98 ioctl:entry 157088479637542
0 98 ioctl:entry 157088479674339
0 234 sysconfig:entry 157088479767243
0 234 sysconfig:entry 157088479774432
0 168 sigaction:entry 157088479993155
0 168 sigaction:entry 157088480229390
0 98 ioctl:entry 157088480318855
0 234 sysconfig:entry 157088480398692
0 38 brk:entry 157088480422525
0 38 brk:entry 157088480438097
0 98 ioctl:entry 157088480794819
0 98 ioctl:entry 157088480959666
0 98 ioctl:entry 157088480986498
0 98 ioctl:entry 157088481033225
0 60 fstat:entry 157088481050686
0 60 fstat:entry 157088481074680
...
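Because timestamp values are large nanosecond counts, the difference between two of them is usually more informative than either value alone. The following C sketch is only arithmetic on the sample output, not part of DTrace: it computes the gaps between the first five timestamps shown above. The ts[] array and the gap_ns() and print_gaps() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The first five syscall:::entry timestamps (ns) from the output above. */
static const uint64_t ts[] = {
    157088479572713ULL,   /* portfs:entry    */
    157088479637542ULL,   /* ioctl:entry     */
    157088479674339ULL,   /* ioctl:entry     */
    157088479767243ULL,   /* sysconfig:entry */
    157088479774432ULL,   /* sysconfig:entry */
};
#define NTS (sizeof (ts) / sizeof (ts[0]))

/* Nanoseconds elapsed between entry i - 1 and entry i (requires i >= 1). */
static uint64_t gap_ns(size_t i)
{
    assert(i >= 1 && i < NTS);
    return ts[i] - ts[i - 1];
}

/* Print each gap in nanoseconds and in microseconds (1 us = 1000 ns). */
static void print_gaps(void)
{
    for (size_t i = 1; i < NTS; i++)
        printf("gap %zu: %llu ns (%.3f us)\n", i,
            (unsigned long long)gap_ns(i), gap_ns(i) / 1000.0);
}
```

For example, the first gap, between the portfs and ioctl entries, is 64829 ns (about 65 microseconds).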
Multiple actions can be specified; they must be separated by semicolons:

# dtrace -n 'zfod {trace(pid);trace(execname)}'
dtrace: description 'zfod ' matched 3 probes
0 4080 anon_zero:zfod 2195 dtrace
0 4080 anon_zero:zfod 2197 bash
0 4080 anon_zero:zfod 2207 vi
0 4080 anon_zero:zfod 2207 vi
...

The following example traces the executable name in every entry to the
pagefault function:
# dtrace -n 'fbt::pagefault:entry {trace(execname)}'
dtrace: description 'fbt::pagefault:entry ' matched 1 probe
0 2407 pagefault:entry dtrace
0 2407 pagefault:entry sh
...
Writing D Scripts
Complicated DTrace enablings become difficult to manage on the
command line. The dtrace(1M) command supports scripts, specified
with the -s option. Alternatively, you can create executable DTrace
interpreter files. Interpreter files always begin with:
#!/usr/sbin/dtrace -s
Executable D Scripts
For example, you can write a script to trace the executable name upon
entry to each system call as follows:
# cat syscall.d
syscall:::entry
{
trace(execname);
}

By convention, D scripts end with a .d suffix. You can run this D script as
follows:
# dtrace -s syscall.d
dtrace: script 'syscall.d' matched 226 probes
0 312 pollsys:entry java
0 98 ioctl:entry dtrace
0 234 sysconfig:entry dtrace
0 168 sigaction:entry dtrace
0 168 sigaction:entry dtrace
0 38 brk:entry dtrace
^C
If you give the syscall.d file execute permission and add a first line to
invoke the interpreter, you can run the script by entering its name on the
command line as follows:
# cat syscall.d
#!/usr/sbin/dtrace -s
syscall:::entry
{
trace(execname);
}
# chmod +x syscall.d
# ls -l syscall.d
-rwxr-xr-x 1 root other 62 May 12 11:30 syscall.d
# ./syscall.d
dtrace: script './syscall.d' matched 226 probes
0 98 ioctl:entry java
0 98 ioctl:entry java

D Literal Strings
The D language supports literal strings that you can use with the trace
function as follows:
# cat string.d
fbt::bdev_strategy:entry
{
trace(execname);
trace(" is initiating a disk I/O\n");
}
The \n at the end of the literal string produces a new line. To run this
script, enter the following:
# dtrace -s string.d
dtrace: script 'string.d' matched 1 probe
0 9215 bdev_strategy:entry bash is initiating a disk I/O
0 9215 bdev_strategy:entry vi is initiating a disk I/O
0 9215 bdev_strategy:entry sched is initiating a disk I/O
The quiet mode option, -q, in dtrace(1M) tells DTrace to record only the
actions explicitly stated. This option suppresses the default output
normally produced by the dtrace command. The following example
shows the use of the -q option on the string.d script:
# dtrace -q -s string.d
ls is initiating a disk I/O
cat is initiating a disk I/O
fsflush is initiating a disk I/O
vi is initiating a disk I/O
vi is initiating a disk I/O

The BEGIN and END Probes
The simple dtrace provider has only three probes: BEGIN, END, and
ERROR. The BEGIN probe fires before any other probe, so you can use it
for pre-processing steps: for example, to initialize variables or to
display headings for output that is produced by actions that run later.
The END probe fires after all other probes have fired and enables you to
perform post-processing. The ERROR probe fires when a runtime error
occurs in your D program. The following example shows a simple use of
the BEGIN and END probes of the dtrace provider:
# cat beginEnd.d
#!/usr/sbin/dtrace -s
BEGIN
{
trace("This is a heading\n");
}
END
{
trace("This should appear at the END\n");
}
# ./beginEnd.d
dtrace: script './beginEnd.d' matched 2 probes
0 1 :BEGIN This is a heading
^C
0 2 :END This should appear at the END
# dtrace -qs beginEnd.d

This is a heading
^C
This should appear at the END
Note: The END probe does not fire until you interrupt (^C) the dtrace
command.
Module 2
Using DTrace
Objectives
Describe the DTrace performance monitoring capabilities
Examine performance problems using the vminfo provider
Examine performance problems using the sysinfo provider
Examine performance problems using the io provider
Use DTrace to obtain information about system calls
Create D scripts that use arguments
Relevance
Discussion: The following questions are relevant to understanding how
to use DTrace:
What performance monitoring tools exist in the Solaris 10 OS?
Would it be useful to know which process is making which system
calls?
What advantage does the ability to pass arguments to a D script
provide?


Cantrill, Bryan M., Michael W. Shapiro, and Adam H. Leventhal.
"Dynamic Instrumentation of Production Systems." Paper presented
at the 2004 USENIX Conference.
Sun Microsystems, Inc. Solaris Dynamic Tracing Guide (Beta), part
number 817-6223-10.
dtrace(1M) manual page in the Solaris 10 OS manual pages, Solaris
10 Reference Manual Collection.

DTrace Performance Monitoring Capabilities

A number of the DTrace providers implement probes that correspond to
existing Solaris OS performance monitoring tools:
The vminfo provider: Implements probes that correspond to the
vmstat(1M) tool
The sysinfo provider: Implements probes that correspond to the
mpstat(1M) tool
The io provider: Implements probes that correspond to the
iostat(1M) tool
In addition, the syscall provider implements probes that correspond to
the truss(1) command.
Features of the DTrace Performance Monitoring Capabilities
Using the DTrace facility, you can extract the same information that the
bundled tools provide, with significant added flexibility. DTrace enables
you to gather only the specific information you need to diagnose the
aberrant behavior. It also provides additional related information such as
process and thread identification, stack traces, and other arbitrary kernel
information available at the time the probes fire.
Aggregations
Aggregated data is more useful than individual data points in answering
performance-related questions. For example, if you want to know the
number of page faults by process, you do not necessarily care about each
individual page fault. Rather, you want a table that lists the process names
and the total number of page faults.
DTrace provides several built-in aggregating functions. An aggregating
function has this property: if it is applied to subsets of a collection of
gathered data and then applied again to the results, it returns the same
result as it does when applied to the whole collection. Examples of
aggregating functions are count(), sum(), min(), and max(). A median
function would not be considered an aggregating function because it lacks
this property.
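The property can be checked with a small C sketch. It splits a made-up data set into three subsets (think of one per CPU), applies a function to each subset, combines the partial results, and compares the outcome with applying the function to the whole set. For sum, min, and max, the same function serves as the combiner; the partial results of count are combined by summing them. Median is included as the counterexample. The data values and all function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long (*agg_fn)(const long *v, size_t n);

static long agg_count(const long *v, size_t n)
{
    (void)v;
    return (long)n;
}

static long agg_sum(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static long agg_min(const long *v, size_t n)
{
    assert(n > 0);
    long m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] < m)
            m = v[i];
    return m;
}

static long agg_max(const long *v, size_t n)
{
    assert(n > 0);
    long m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Median of an odd-length array; sorts a copy so the input is unchanged. */
static long median(const long *v, size_t n)
{
    long tmp[64];
    assert(n % 2 == 1 && n <= 64);
    memcpy(tmp, v, n * sizeof (long));
    qsort(tmp, n, sizeof (long), cmp_long);
    return tmp[n / 2];
}

/* Hypothetical data split into three subsets. */
static const long part0[] = { 1, 2, 3 };
static const long part1[] = { 4, 50, 60 };
static const long part2[] = { 5, 70, 80 };
static const long whole[] = { 1, 2, 3, 4, 50, 60, 5, 70, 80 };
#define NWHOLE (sizeof (whole) / sizeof (whole[0]))

/* Apply f to each subset, then apply combine to the three partial results. */
static long two_pass(agg_fn f, agg_fn combine)
{
    long partial[3] = { f(part0, 3), f(part1, 3), f(part2, 3) };
    return combine(partial, 3);
}
```

With this data, sum, min, and max give the same result either way (275, 1, and 80), and count gives 9 when its partial counts are summed. The median of the per-subset medians (2, 50, 70) is 50, while the median of the whole set is 5, so median cannot be computed from partial results this way.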

DTrace is not required to store the entire set of data items for
aggregations; it keeps a running result, needing only the current
intermediate result and the new element. Intermediate results are kept per
central processing unit (CPU), which enables a scalable implementation
because no locks are required.
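The running-result idea can be sketched in C. In this single-threaded simulation, each hypothetical CPU owns a constant-size state that is updated in constant time per new value, and the per-CPU states are merged only when a result is needed. The struct agg_state type, the NCPU constant, and the function names are assumptions of the sketch, which models the idea described above rather than DTrace's actual data structures. Note that avg is not stored directly; it is derived from the merged sum and count.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4   /* hypothetical CPU count for this sketch */

/* Constant-size intermediate result: enough for count, sum, avg, min, max. */
struct agg_state {
    uint64_t count;
    int64_t  sum;
    int64_t  min;
    int64_t  max;
};

/* One private state per CPU; each is touched only by "its" CPU, so no lock. */
static struct agg_state percpu[NCPU];

/* Called when a probe fires on CPU cpu with value v. */
static void agg_update(int cpu, int64_t v)
{
    assert(cpu >= 0 && cpu < NCPU);
    struct agg_state *s = &percpu[cpu];
    if (s->count == 0 || v < s->min)
        s->min = v;
    if (s->count == 0 || v > s->max)
        s->max = v;
    s->sum += v;
    s->count++;
}

/* Merge the per-CPU states into one result, for example when printing. */
static struct agg_state agg_merge(void)
{
    struct agg_state out = { 0, 0, 0, 0 };
    for (int c = 0; c < NCPU; c++) {
        const struct agg_state *s = &percpu[c];
        if (s->count == 0)
            continue;                 /* this CPU saw no data */
        if (out.count == 0 || s->min < out.min)
            out.min = s->min;
        if (out.count == 0 || s->max > out.max)
            out.max = s->max;
        out.sum += s->sum;
        out.count += s->count;
    }
    return out;
}

/* avg is computed from the merged sum and count. */
static double agg_avg(const struct agg_state *s)
{
    return s->count ? (double)s->sum / (double)s->count : 0.0;
}
```

For example, after agg_update(0, 27), agg_update(1, 29), agg_update(1, 37), and agg_update(3, 60), agg_merge() reports a count of 4, a sum of 153, a minimum of 27, and a maximum of 60, and agg_avg() returns 38.25.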
DTrace Aggregation Syntax
The general form of a DTrace aggregation is:

@name[ keys ] = aggfunc( args );
These variables are defined as follows:
name: The name of the aggregation, preceded by the @ character
keys: A comma-separated list of D expressions
aggfunc: One of the DTrace aggregating functions
args: A comma-separated list of arguments appropriate to the
aggregating function
DTrace Aggregating Functions
Table 2-1 lists the DTrace aggregating functions.
Table 2-1 DTrace Aggregating Functions
Function   Arguments                Result
count      none                     The number of times called.
sum        scalar expression        The total value of the specified expressions.
avg        scalar expression        The arithmetic average (mean) of the specified
                                    expressions.
min        scalar expression        The smallest value of the specified
                                    expressions.
max        scalar expression        The largest value of the specified
                                    expressions.
lquantize  scalar expression,       A linear frequency distribution, sized by
           lower bound,             the specified range, of the values of the
           upper bound,             specified expression. Increments the value
           step value               in the highest bucket that is less than or
                                    equal to the specified expression.
quantize   scalar expression        A power-of-two frequency distribution of the
                                    values of the specified expression. Increments
                                    the value in the highest power-of-two bucket
                                    that is less than or equal to the specified
                                    expression.
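The bucket-selection rules in Table 2-1 can be sketched in C. quantize_bucket() returns the highest power of two that is less than or equal to a value, and lquantize_index() returns which linear bucket a value falls into. Both functions are hypothetical illustrations of the rules stated in the table, not DTrace's implementation. The sketch also makes assumptions of its own: quantize handles non-negative values only, step must divide the range evenly, and values outside the range go to an underflow bucket (index 0) or an overflow bucket (index n + 1).

```c
#include <assert.h>
#include <stdint.h>

/*
 * quantize: return the highest power of two that is <= v (0 for v == 0).
 * Non-negative values only; negative buckets are not modeled here.
 */
static int64_t quantize_bucket(int64_t v)
{
    assert(v >= 0);
    if (v == 0)
        return 0;
    int64_t b = 1;
    while (b <= v / 2)          /* comparing with v / 2 avoids overflowing b * 2 */
        b *= 2;
    return b;
}

/*
 * lquantize: buckets [lower, lower + step), [lower + step, lower + 2 * step), ...
 * Returns 0 for v < lower (underflow), 1..n for the linear buckets, and
 * n + 1 for v >= upper (overflow), where n = (upper - lower) / step.
 * This sketch assumes step divides (upper - lower) evenly.
 */
static int lquantize_index(int64_t v, int64_t lower, int64_t upper, int64_t step)
{
    assert(step > 0 && upper > lower && (upper - lower) % step == 0);
    int64_t n = (upper - lower) / step;
    if (v < lower)
        return 0;
    if (v >= upper)
        return (int)(n + 1);
    return (int)((v - lower) / step) + 1;
}
```

For example, quantize_bucket(1000) returns 512, because 512 <= 1000 < 1024, and with a lower bound of 0, an upper bound of 100, and a step of 10, the value 99 lands in linear bucket 10 while 100 goes to the overflow bucket.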
Example Use of Aggregating Function
In the following example, the count aggregating function is used to count
the number of write(2) system calls per process:
# cat writes.d
#!/usr/sbin/dtrace -s
syscall::write:entry
{
@numWrites[execname] = count();
}
# ./writes.d
dtrace: script './writes.d' matched 1 probe
^C
dtrace 1
date 1
bash 3
grep 20
file 197
ls 201
Note: No data is output from the aggregation until dtrace(1M) is
terminated. The output data is a summary up to that point.
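To make the @name[keys] = aggfunc(args) form concrete, the following C sketch mimics @numWrites[execname] = count(): a small table keyed by executable name, incremented once per simulated probe firing, and printed in ascending order of value, as in the output above. The fixed-capacity table, the MAXKEYS and key-length limits, and the count_by_key(), agg_lookup(), and agg_print() helpers are hypothetical; DTrace manages its aggregation buffers internally and differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXKEYS 64   /* hypothetical fixed capacity for this sketch */

struct agg_entry {
    char key[32];    /* the aggregation key (here, an execname) */
    long value;      /* the running count for that key */
};

static struct agg_entry table[MAXKEYS];
static int nkeys;

/* Equivalent of "@numWrites[key] = count();" for one probe firing. */
static void count_by_key(const char *key)
{
    for (int i = 0; i < nkeys; i++) {
        if (strcmp(table[i].key, key) == 0) {
            table[i].value++;
            return;
        }
    }
    assert(nkeys < MAXKEYS);
    /* New key: snprintf truncates names longer than the key buffer. */
    snprintf(table[nkeys].key, sizeof (table[nkeys].key), "%s", key);
    table[nkeys].value = 1;
    nkeys++;
}

/* Current value for a key, or 0 if the key has never been seen. */
static long agg_lookup(const char *key)
{
    for (int i = 0; i < nkeys; i++)
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
    return 0;
}

static int by_value(const void *a, const void *b)
{
    const struct agg_entry *x = a, *y = b;
    return (x->value > y->value) - (x->value < y->value);
}

/* Sort by ascending value and print one line per key. */
static void agg_print(void)
{
    qsort(table, (size_t)nkeys, sizeof (table[0]), by_value);
    for (int i = 0; i < nkeys; i++)
        printf("  %-16s %8ld\n", table[i].key, table[i].value);
}
```

Calling count_by_key("ls"), count_by_key("bash"), count_by_key("ls"), and count_by_key("date") leaves three keys, with ls at 2; agg_print() then lists ls last because it has the largest count.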

Arguments Supplied by Providers
The syscall provider gives you access to a system call's arguments
through the built-in variables arg0, arg1, arg2, and so on, which hold the
function's first, second, third, and subsequent arguments. These argument
values are of type int64_t. You can also refer to the correctly typed
arguments through the args[] array: args[0], args[1], and so on. For
write(2), arg2 is the number of bytes to write, so the following example
displays the average write size per process:
# cat writes2.d
#!/usr/sbin/dtrace -s
syscall::write:entry
{
@avgSize[execname] = avg(arg2);
}
# ./writes2.d
dtrace: script './writes2.d' matched 1 probe
^C
dtrace 1
bash 27
date 29
file 37
grep 60
ls 68

Examining Performance Problems Using the vminfo Provider
The vminfo provider makes available probes that correspond to the virtual
memory (vm) kernel statistics (kstats) kept by the kernel kstat facility.
You can use this provider to investigate unexplained behavior observed in
the vm-specific output of the vmstat(1M) command. A probe provided by
the vminfo provider fires immediately before the corresponding vm kstat
value is incremented. To display both the names and the current values
(counts) of the vm named kstats, you can use the kstat(1M) command as
shown in the following command example.
# kstat -n vm
module: cpu instance: 0
name: vm class: misc
anonfree 0
anonpgin 4
anonpgout 0
as_fault 157771
cow_fault 34207
crtime 0.178610697
dfree 56
execfree 0
execpgin 3646
execpgout 0
fsfree 56
fspgin 16257
fspgout 57
hat_fault 0
kernel_asflt 0
maj_fault 6743
pgfrec 34215
pgin 9188
pgout 36
pgpgin 19907
pgpgout 57
pgrec 34216
pgrrun 4
pgswapin 0
pgswapout 0
prot_fault 39794
rev 0
scan 28668

snaptime 349429.087071013
softlock 165
swapin 0
swapout 0
zfod 12835
The vminfo Probes

Table 2-2 describes the vminfo probes.
Table 2-2 The vminfo Probes
Probe Name Description
anonfree Probe that fires when an unmodified anonymous page is freed as part of
paging activity. Anonymous pages are those that are not associated with
a file; memory containing such pages includes heap memory, stack
memory, and memory obtained by explicitly mapping zero(7D).
anonpgin Probe that fires when an anonymous page is paged in from a swap
device.
anonpgout Probe that fires when a modified anonymous page is paged out to a swap
device.
as_fault Probe that fires when a fault is taken on a page and the fault is neither a
protection fault nor a copy-on-write fault.
cow_fault Probe that fires when a copy-on-write fault is taken on a page. The arg0
argument contains the number of pages that are created as a result of the
copy-on-write.
dfree Probe that fires when a page is freed as a result of paging activity. When
dfree fires, exactly one of the anonfree, execfree, or fsfree probes
also subsequently fires.
execfree Probe that fires when an unmodified executable page is freed as a result of
paging activity.
execpgin Probe that fires when an executable page is paged in from the backing
store.
execpgout Probe that fires when a modified executable page is paged out to the
backing store. Most paging of executable pages occurs in terms of the
execfree probe; the execpgout probe can only fire if an executable page
is modified in memory, an uncommon occurrence in most systems.
fsfree Probe that fires when an unmodified file system data page is freed as part
of paging activity.
fspgin Probe that fires when a file system page is paged in from the backing store.
fspgout Probe that fires when a file system page is paged out to the backing store.
kernel_asflt Probe that fires when a page fault is taken by the kernel on a page in its
own address space. When the kernel_asflt probe fires, it is immediately
preceded by a firing of the as_fault probe.
maj_fault Probe that fires when a page fault is taken that results in input/output
(I/O) from a backing store or swap device. Whenever maj_fault fires, it
is immediately preceded by a firing of the pgin probe.
pgfrec Probe that fires when a page is reclaimed from the free page list.
pgin Probe that fires when a page is paged in from the backing store or from a
swap device. This differs from the maj_fault probe in that the
maj_fault probe only fires when a page is paged in as a result of a page
fault; the pgin probe fires when a page is paged in, regardless of the
reason.
pgout Probe that fires when a page is paged out to the backing store or to a swap
device.
pgpgin Probe that fires when a page is paged in from the backing store or from a
swap device. The only difference between the pgpgin probe and the pgin
probe is that the pgpgin probe contains the number of pages paged in as
the arg0 argument. (The pgin probe always contains 1 in the arg0
argument.)
pgpgout Probe that fires when a page is paged out to the backing store or to a swap
device. The only difference between the pgpgout probe and the pgout
probe is that the pgpgout probe contains the number of pages paged out
as the arg0 argument. (The pgout probe always contains 1 in the arg0
argument.)
pgrec Probe that fires when a page is reclaimed.
pgrrun Probe that fires when the pager is scheduled.
pgswapin Probe that fires when a process is swapped in.
pgswapout Probe that fires when a process is swapped out.
prot_fault Probe that fires when a page fault is taken due to a protection violation.
rev Probe that fires when the page daemon begins a new revolution through
all pages.
scan Probe that fires when the page daemon examines a page.
softlock Probe that fires when a page is faulted as a part of placing a software lock
on the page.
swapin Probe that fires when a swapped-out process is swapped back in.
swapout Probe that fires when a process is swapped out.
zfod Probe that fires when a zero-filled page is created on demand.
Finding the Source of Page Faults Using vminfo Probes
Consider the following example output, obtained by running the vmstat
command.
# vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s2 s1 -- in sy cs us sy id
0 0 0 648560 437016 3 11 13 0 0 0 8 0 1 0 0 406 42 50 0 0 100
0 0 0 598912 396136 0 11 27 0 0 0 0 0 0 0 0 615 113 67 0 0 100
0 0 0 598888 396112 0 1 0 0 0 0 0 0 0 0 0 604 69 47 0 0 100
0 0 0 598864 396088 0 1 0 0 0 0 0 0 0 0 0 616 69 72 0 0 100
0 0 0 598864 396088 0 0 0 0 0 0 0 0 0 0 0 619 73 89 0 0 100
0 1 0 598104 393456 4 45 3588 0 0 0 0 0 474 0 0 2014 5138 1013 3 17 79
0 0 0 595224 381544 0 2 5273 0 0 0 0 0 698 0 0 2593 7545 1448 3 31 66
0 0 0 592024 368832 0 1 5509 0 0 0 0 0 725 0 0 2674 7840 1503 3 26 71
0 0 0 588792 362640 1 3 3679 0 0 0 0 0 485 0 0 2009 5259 1027 3 20 77
0 0 0 587984 361848 0 3 4 0 0 0 0 0 0 0 0 605 80 70 0 0 100
0 0 0 587960 361800 0 4 20 0 0 0 0 0 2 0 0 624 74 91 0 0 100
0 0 0 587944 361768 0 1 0 0 0 0 0 0 0 0 0 614 76 78 0 0 100
0 0 0 587920 361744 0 1 0 0 0 0 0 0 0 0 0 616 69 80 0 0 100
0 0 0 587848 361672 0 1 0 0 0 0 0 0 18 0 0 689 69 69 0 0 100
0 0 0 587832 361656 0 1 0 0 0 0 0 0 0 0 0 611 74 67 0 0 100
0 0 0 587808 361632 0 5 0 0 0 0 0 0 0 0 0 611 71 66 0 0 100
0 0 0 587784 361608 40 193 844 0 0 0 0 0 107 0 0 953 905 260 3 5 92
0 0 0 588184 362576 0 1 0 0 0 0 0 0 0 0 0 611 69 71 0 0 100
Here the pi column denotes the number of kilobytes paged in per second.
Executable Causing Page Faults
The vminfo provider makes it easy to discover more about the source of
these page-ins. The following example uses an anonymous aggregation:
# dtrace -n 'pgin {@[execname] = count()}'

dtrace: description 'pgin ' matched 1 probe
^C
utmpd 2
in.routed 2
init 2
snmpd 5
automountd 5
vi 5
vmstat 17
sh 23
grep 33
dtrace 35
bash 62
file 198
find 4551
This output shows that the find command is responsible for most of the
page-ins. For a more complete picture of the find command in terms of
vm behavior, you can enable all vminfo probes. Before doing this,
however, you need a DTrace filtering capability called a predicate.
Predicates
A D program consists of a set of probe clauses. A probe clause has the
following general form:
probe descriptions
/ predicate /
{
action statements
}
Predicates are D expressions enclosed in slashes (/ /) that are evaluated
when the probe fires to determine whether the associated actions should be
executed. A predicate that evaluates to zero is false; one that evaluates
to any non-zero value is true. Predicates are optional, but if you use one,
you must place it between the probe description and the action statements.
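The evaluation rule can be modeled in ordinary C. The sketch below is an analogy only, not how DTrace evaluates clauses internally; the type struct count_agg and the functions agg_init(), agg_count(), agg_lookup(), and clause_fire() are hypothetical. It mimics a clause whose predicate is execname == "find" and whose action is @[probename] = count(): the action runs only when the predicate evaluates to non-zero.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of a D clause with a predicate, such as
 *     vminfo::: /execname == "find"/ { @[probename] = count(); }
 * This is an analogy for illustration, not DTrace's implementation. */

#define MAX_KEYS 64
#define KEY_LEN  32

struct count_agg {
    char keys[MAX_KEYS][KEY_LEN];   /* aggregation keys (probe names) */
    long counts[MAX_KEYS];          /* count() result for each key */
    int  nkeys;                     /* number of keys in use */
};

void agg_init(struct count_agg *agg)
{
    memset(agg, 0, sizeof *agg);
}

/* The action: count() adds one to the entry for this key,
 * creating the entry the first time the key is seen. */
void agg_count(struct count_agg *agg, const char *key)
{
    for (int i = 0; i < agg->nkeys; i++) {
        if (strcmp(agg->keys[i], key) == 0) {
            agg->counts[i]++;
            return;
        }
    }
    assert(agg->nkeys < MAX_KEYS);
    strncpy(agg->keys[agg->nkeys], key, KEY_LEN - 1);
    agg->keys[agg->nkeys][KEY_LEN - 1] = '\0';
    agg->counts[agg->nkeys] = 1;
    agg->nkeys++;
}

/* Current count for a key, or 0 if the key was never counted. */
long agg_lookup(const struct count_agg *agg, const char *key)
{
    for (int i = 0; i < agg->nkeys; i++)
        if (strcmp(agg->keys[i], key) == 0)
            return agg->counts[i];
    return 0;
}

/* One probe firing. The predicate is just an expression: zero means
 * false (skip the action), any non-zero value means true. */
void clause_fire(struct count_agg *agg,
                 const char *execname, const char *probename)
{
    int predicate = (strcmp(execname, "find") == 0);
    if (predicate != 0)
        agg_count(agg, probename);
}
```

For instance, three firings of a pgin probe, two from find and one from bash, leave the pgin count at 2 in this model, because the bash firing fails the predicate.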

Details About the Executable Causing Page Faults
The following example examines the system's detailed vm behavior while
the find command runs:
# cat find.d
vminfo:::
/execname == "find"/
{ @[probename] = count(); }
Before running this D program, run a find command in the background
while another utility uses up a substantial portion of the system's
memory, as shown in the following example.
# (sleep 10 ; find / -name fubar & mkfile 300m /tmp/junk)&
[1] 840
# ps
PID TTY TIME CMD
615 pts/2 0:00 sh
841 pts/2 0:00 sleep
625 pts/2 0:00 bash
840 pts/2 0:00 bash
842 pts/2 0:00 ps
# ps
PID TTY TIME CMD
615 pts/2 0:00 sh
843 pts/2 0:02 find
625 pts/2 0:00 bash
840 pts/2 0:00 bash
845 pts/2 0:00 ps
844 pts/2 0:02 mkfile
# ps
PID TTY TIME CMD
615 pts/2 0:00 sh
843 pts/2 0:08 find
625 pts/2 0:00 bash
846 pts/2 0:00 ps
[1]+ Done ( sleep 10 ; find / -name fubar & mkfile 300m /tmp/junk )
# ps
PID TTY TIME CMD
615 pts/2 0:00 sh
847 pts/2 0:00 ps
625 pts/2 0:00 bash
The following dtrace command was started in another terminal window
immediately after the above command group was started in the
background.
# dtrace -s find.d
dtrace: script 'find.d' matched 44 probes
^C
prot_fault 2
cow_fault 8
softlock 11
execpgin 15
kernel_asflt 40
zfod 52
as_fault 170
pgrec 5417
pgfrec 5417
maj_fault 18068
fspgin 18103
pgpgin 18118
pgin 18118
You might wonder why, with such a large memory load, scans do not
show up in the output of the dtrace command. This is because page
scanning is performed by the pageout daemon, not by the find user
process. The following example shows this behavior.
# cat mem.d
vminfo:::
{
@vm[execname,probename] = count();
}
END
{
printa("%16s\t%16s\t%@d\n", @vm);
}
# dtrace -qs mem.d

^C
sleep prot_fault 1
rm prot_fault 1
pageout rev 1
dtrace pgfrec 1
bash kernel_asflt 1
in.routed anonpgin 1
mkfile prot_fault 1
find prot_fault 1
dtrace pgrec 1
mkfile execpgin 2
mkfile kernel_asflt 2
vmstat prot_fault 2
rm zfod 3
find execpgin 3
sleep zfod 3
mkfile zfod 3
sendmail anonpgin 3
mkfile cow_fault 4
rm cow_fault 4
bash anonpgin 4
rm maj_fault 4
sendmail pgfrec 4
sleep cow_fault 4
find cow_fault 4
sendmail pgrec 4
...
bash pgrec 205
pageout fspgout 293
pageout anonpgout 293
pageout pgpgout 293
pageout pgout 293
pageout execpgout 293
pageout pgrec 293
pageout anonfree 360
pageout execfree 510
bash as_fault 519
pageout fsfree 519
sched dfree 523
sched pgrec 523
sched pgout 523
sched pgpgout 523
sched anonpgout 523
sched anonfree 523
sched execpgout 523
sched execfree 523
pageout dfree 803
rm pgrec 1388
rm pgfrec 1388
find maj_fault 5067
find fspgin 5085
find pgin 5088
find pgpgin 5088
pageout scan 78852
The printa() built-in formatting function gives you increased control
over the output of an aggregation. For example, consider the following
code line:
{
printa("%16s\t%16s\t%@d\n", @vm);
}
It provides these formatting instructions:
%16s prints one element of the aggregation key right justified in a
16-character-wide column; the two %16s specifications print the first
and second key elements (execname and probename).
\t outputs a <Tab>.
%@d prints the aggregation value as a decimal number.
\n ends the line.
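Because printa() format strings resemble those of printf(3C), the same column layout can be reproduced in C. In this sketch the helper format_row() is hypothetical, and %ld stands in for %@d, which has no C counterpart.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper that lays out one aggregation record the way
 *     printa("%16s\t%16s\t%@d\n", @vm);
 * does in mem.d. C has no %@d conversion, so %ld stands in for the
 * aggregation value. Returns the number of characters produced. */
int format_row(char *buf, size_t len,
               const char *execname, const char *probename, long value)
{
    assert(buf != NULL && execname != NULL && probename != NULL);
    /* Each %16s right-justifies one key in a 16-character column. */
    return snprintf(buf, len, "%16s\t%16s\t%ld\n", execname, probename, value);
}
```

With the keys find and pgin and a value of 5088, each key is padded on the left to 16 characters and the three fields are separated by tabs, for 39 characters in all including the newline.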
Note: Appendix A provides more details on the format letters available
to the printa() function and the more general printf() function (which
resembles the printf(3C) function from the Standard C Library).

Examining Performance Problems Using the sysinfo Provider
The sysinfo provider makes available probes that correspond to the
sys kernel statistics. Because these statistics provide the input for
system monitoring utilities such as mpstat(1M), the sysinfo provider
enables you to quickly explore aberrant behavior observed with those
utilities.
The sysinfo provider probes fire immediately before the sys named
kstat is incremented. The following example displays the sys named
kstat.
# kstat -n sys
module: cpu instance: 0
name: sys class: misc
bawrite 112
bread 6359
bwrite 1401
canch 374
cpu_ticks_idle 2782331
cpu_ticks_kernel 46571
cpu_ticks_user 12187
cpu_ticks_wait 30197
cpumigrate 0
...
syscall 3991217
sysexec 1088
sysfork 1043
sysread 131334
sysvfork 47
syswrite 676775
trap 266286
ufsdirblk 1027383
ufsiget 1086164
ufsinopage 873613
ufsipage 2
wait_ticks_io 30197
writech 5144172931
xcalls 0
xmtint 0
The sysinfo Probes

Table 2-3 describes the sysinfo probes.
Table 2-3 The sysinfo Probes
bawrite Probe that fires when a buffer is about to be asynchronously written
out to a device.
bread Probe that fires when a buffer is physically read from a device. The
bread probe fires after the buffer has been requested from the
device, but before blocking pending its completion.
bwrite Probe that fires when a buffer is about to be written out to a device
synchronously or asynchronously.
cpu_ticks_idle Probe that fires when the periodic system clock has determined that
a CPU is idle. Note that this probe fires in the context of the system
clock and therefore fires on the CPU running the system clock; one
must examine the cpu_t argument (arg2) to determine the CPU
that has been deemed idle.
cpu_ticks_kernel Probe that fires when the periodic system clock has determined that
a CPU is executing in the kernel. Note that this probe fires in the
context of the system clock and therefore fires on the CPU running
the system clock; one must examine the cpu_t argument (arg2) to
determine the CPU that has been deemed to be executing in the
kernel.
cpu_ticks_user Probe that fires when the periodic system clock has determined that
a CPU is executing in user mode. Note that this probe fires in the
context of the system clock and therefore fires on the CPU running
the system clock; one must examine the cpu_t argument (arg2) to
determine the CPU that has been deemed to be running in user-
mode.
cpu_ticks_wait Probe that fires when the periodic system clock has determined that
a CPU is otherwise idle, but on which some threads are waiting for
I/O. Note that this probe fires in the context of the system clock and
therefore fires on the CPU running the system clock; one must
examine the cpu_t argument (arg2) to determine the CPU that has
been deemed waiting on I/O.
idlethread Probe that fires when a CPU enters the idle loop.
intrblk Probe that fires when an interrupt thread blocks.
inv_swtch Probe that fires when a running thread is forced to involuntarily
give up the CPU.
lread Probe that fires when a buffer is logically read from a device.
lwrite Probe that fires when a buffer is logically written to a device.
modload Probe that fires when a kernel module is loaded.
modunload Probe that fires when a kernel module is unloaded.
msg Probe that fires when a msgsnd(2) or msgrcv(2) system call is made,
but before the message queue operations have been performed.
mutex_adenters Probe that fires when an attempt is made to acquire an owned
adaptive lock. If this probe fires, one of the lockstat provider
probes (adaptive-block or adaptive-spin) also fires.
namei Probe that fires when a name lookup is attempted in the file system.
nthreads Probe that fires when a thread is created.
phread Probe that fires when a raw I/O read is about to be performed.
phwrite Probe that fires when a raw I/O write is about to be performed.
procovf Probe that fires when a new process cannot be created because the
system is out of process table entries.
pswitch Probe that fires when a CPU switches from executing one thread to
executing another.
readch Probe that fires after each successful read, but before control is
returned to the thread performing the read. A read can occur
through the read(2), readv(2), or pread(2) system calls. The arg0
argument contains the number of bytes that were successfully read.
rw_rdfails Probe that fires when an attempt is made to read-lock a
readers/writer lock when the lock is either held by a writer, or
desired by a writer. If this probe fires, the lockstat provider's rw-
block probe also fires.
rw_wrfails Probe that fires when an attempt is made to write-lock a
readers/writer lock when the lock is held either by some number of
readers or by another writer. If this probe fires, the lockstat
provider's rw-block probe also fires.
sema Probe that fires when a semop(2) system call is made, but before any
semaphore operations have been performed.
sysexec Probe that fires when an exec(2) system call is made.
sysfork Probe that fires when a fork(2) system call is made.
sysread Probe that fires when a read(2), readv(2) or pread(2) system call is
made.
sysvfork Probe that fires when a vfork(2) system call is made.
syswrite Probe that fires when a write(2), writev(2), or pwrite(2) system
call is made.
trap Probe that fires when a processor trap occurs. Note that some
processors (in particular, UltraSPARC variants) handle some
lightweight traps through a mechanism that does not cause this
probe to fire.
ufsdirblk Probe that fires when a directory block is read from the UFS file
system. See the ufs(7FS) man page for details on UFS.
ufsiget Probe that fires when an inode is retrieved. See the ufs(7FS) man
page for details on UFS.
ufsinopage Probe that fires after an in-core inode without any associated data
pages has been made available for reuse. See the ufs(7FS) man
page for details on UFS.
ufsipage Probe that fires after an in-core inode with associated data pages
has been made available for reuse and therefore after the associated
data pages have been flushed to disk. See the ufs(7FS) man page
for details on UFS.
wait_ticks_io Probe that fires when the periodic system clock has determined that
a CPU is otherwise idle, but on which some threads are waiting for
I/O. Note that this probe fires in the context of the system clock and
therefore fires on the CPU running the system clock; one must
examine the cpu_t argument (arg2) to determine the CPU that has
been deemed waiting on I/O. Note that there is no semantic
difference between wait_ticks_io and cpu_ticks_wait;
wait_ticks_io exists purely for historical reasons.
writech Probe that fires after each successful write, but before control is
returned to the thread performing the write. A write can occur
through the write(2), writev(2), or pwrite(2) system calls. The
arg0 argument contains the number of bytes that were successfully
written.
xcalls Probe that fires when a cross-call is about to be made. A cross-call is
the operating system's mechanism for one CPU to request
immediate work from another.
Using the quantize Aggregation Function With the sysinfo Probes
The quantize aggregation function displays a power-of-two frequency
distribution bar graph of its argument. The following example shows how
you can determine the size of reads being performed by all processes over
a 10-second period. For the sysinfo probes, the arg0 argument specifies
the amount by which the statistic is incremented; it is 1 for most sysinfo
probes. Two exceptions are the readch and writech probes, for which arg0
is set to the actual number of bytes read or written, respectively.
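Each readch firing passes a byte count in arg0, and quantize() files that value under a power-of-two row. The following C sketch models only the row selection, under the assumption that a row labeled v counts values from v up to, but not including, 2v, that 0 has its own row, and that negative values mirror the positive rows; quantize_bucket() is a hypothetical name, not a DTrace interface.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the label of the power-of-two row a value falls in:
 * 0 for 0; for positive x, the largest power of two <= x (so 700
 * lands in the 512 row, which covers 512..1023); for negative x,
 * the negation of the row for -x. A sketch of the idea only. */
int64_t quantize_bucket(int64_t x)
{
    if (x == 0)
        return 0;
    if (x < 0) {
        assert(x != INT64_MIN);      /* -x would overflow */
        return -quantize_bucket(-x);
    }
    int64_t bucket = 1;
    while (bucket <= x / 2)          /* double while 2 * bucket <= x */
        bucket *= 2;
    return bucket;
}
```

Under this model, a 700-byte read is counted in the 512 row, the row that holds most of the file command's reads in the output that follows.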
# cat read.d
#!/usr/sbin/dtrace -s
sysinfo:::readch
{
    @[execname] = quantize(arg0);
}

tick-10sec
{
    exit(0);
}
# dtrace -s read.d
dtrace: script 'read.d' matched 5 probes
0 36754 :tick-10sec
bash
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13
2 | 0
file
-1 | 0
0 | 2
1 | 0
2 | 0
4 | 6
8 | 0
16 | 0
32 | 6
64 | 6
128 |@@ 16
256 |@@@@ 30
512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 199
1024 | 0
2048 | 0
4096 | 1
8192 | 1
16384 | 0
grep
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@ 99
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 1
256 |@@@@ 25
512 |@@@@ 23
1024 |@@@@ 24
2048 |@@@@ 22
4096 | 4
8192 | 3
16384 | 0
Finding the Source of Cross-Calls

Consider the following output from the mpstat(1M) command:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2189 0 1302 14 1 215 12 54 28 0 12995 13 14 0 73
1 3385 0 1137 218 104 195 13 58 33 0 14486 19 15 0 66
2 1918 0 1039 12 1 226 15 49 22 0 13251 13 12 0 75
3 2430 0 1284 220 113 201 10 50 26 0 13926 10 15 0 75
The xcal and syscl columns display relatively high numbers, which
might be affecting the system's performance. Yet the system is relatively
idle and is not spending time waiting on input/output (I/O). The xcal
numbers are per-second values read from the xcalls field of the sys
kstat. To see which executables are responsible for the cross-calls, enter
the following dtrace(1M) command:
# dtrace -n 'xcalls {@[execname] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
find 2
cut 2
snmpd 2
mpstat 22
sendmail 101
grep 123
bash 175
dtrace 435
sched 784
xargs 22308
file 89889
#
This output indicates the source of the cross-calls: some number of
file(1) and xargs(1) processes are inducing the majority of them. You
can find these processes using the pgrep(1) and ptree(1) commands:
# pgrep xargs
15973
# ptree 15973
204 /usr/sbin/inetd -s
5650 in.telnetd
5653 -sh
5657 bash
15970 /bin/sh ./findtxt configuration
15971 cut -f1 -d:
15973 xargs file
16686 file /usr/bin/tbl /usr/bin/troff /usr/bin/ul
/usr/bin/vgrind /usr/bin/catman
The xargs and file commands appear to be part of a custom user shell
script. You can locate this script as follows:
# find / -name findtxt
/users1/james/findtxt
# cat /users1/james/findtxt
#!/bin/sh
find / -type f | xargs file | grep text | cut -f1 -d: >/tmp/findtxt$$
cat /tmp/findtxt$$ | xargs grep $1
rm /tmp/findtxt$$
#
The script runs many processes concurrently, with much interprocess
communication occurring through pipes. The script appears to be quite
resource intensive: it tries to find every text file in the system and
then searches each one for some specific text. You would expect these
processes to run concurrently on this system's four processors while they
send data to each other.
Stack Trace xcall Details
You can gather more details about which kernel code is involved in the
cross-calls while the file and xargs commands are running. The
following example uses the stack() built-in DTrace function as the
aggregation key to show which kernel code is requesting each cross-call.
The aggregation counts how many times each unique kernel stack trace
occurs.
# dtrace -n 'xcalls {@[stack()] = count()}'
dtrace: description 'xcalls ' matched 3 probes
^C
SUNW,UltraSPARC-IIIi`send_mondo_set+0x9c
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`hat_unload_callback+0x6ec
unix`memscrub_scan+0x298
unix`memscrubber+0x308
unix`thread_start+0x4
2
unix`xt_some+0xc4
unix`sfmmu_tlb_demap+0x118
unix`sfmmu_hblk_unload+0x368
unix`hat_unload_callback+0x534
unix`memscrub_scan+0x298
unix`memscrubber+0x308

2
...
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`hat_unload_callback+0x6ec
genunix`anon_private+0x204
genunix`segvn_faultpage+0x778
genunix`segvn_fault+0x920
genunix`as_fault+0x4a0
unix`pagefault+0xac
unix`trap+0xc14
unix`utl0+0x4c
2303
unix`xt_some+0xc4
unix`sfmmu_tlb_range_demap+0x190
unix`sfmmu_chgattr+0x2e8
genunix`segvn_dup+0x3d0
genunix`as_dup+0xd0
genunix`cfork+0x120
unix`syscall_trap32+0xa8
7175
unix`xt_some+0xc4
unix`xt_sync+0x3c
unix`sfmmu_chgattr+0x2f0
genunix`segvn_dup+0x3d0
genunix`as_dup+0xd0
genunix`cfork+0x120
11492
As this output shows, the majority of the cross-calls are the result of a
significant number of fork(2) system calls. (Shell scripts are notorious for
abusing their fork(2) privileges.) Page faults of anonymous memory are
also involved, which probably accounts for the large number of minor
page faults seen in the mpstat output.
Examining Performance Problems Using the io Provider

The io provider makes available probes related to disk input and output
(I/O). The io provider is designed to enable quick exploration of behavior
observed through I/O monitoring tools such as iostat(1M). The io
provider describes the nature of the system's I/O by providing data such
as the following:
Device
I/O type
Process ID
Application name
File name
File offset
The io Probes
Table 2-4 describes the io probes.
Table 2-4 The io Probes
start Probe that fires when an I/O request is about to be made to a disk
device or to an NFS server. The buf(9S) structure corresponding to the
I/O request is pointed to by the args[0] argument. The devinfo_t
structure of the device to which the I/O is being issued is pointed to
by the args[1] argument. The fileinfo_t structure of the file that
corresponds to the I/O request is pointed to by the args[2]
argument. Note that file information availability depends on the file
system making the I/O request.
done Probe that fires after an I/O request has been fulfilled. The buf(9S)
structure corresponding to the I/O request is pointed to by the
args[0] argument. The devinfo_t structure of the device to which the
I/O was issued is pointed to by the args[1] argument. The
fileinfo_t structure of the file that corresponds to the I/O request is
pointed to by the args[2] argument.
wait-start Probe that fires immediately before a thread begins to wait pending
completion of a given I/O request. The buf(9S) structure
corresponding to the I/O request for which the thread will wait is
pointed to by the args[0] argument. The devinfo_t structure of the
device to which the I/O was issued is pointed to by the args[1]
argument. The fileinfo_t structure of the file that corresponds to the
I/O request is pointed to by the args[2] argument. Some time after the
wait-start probe fires, the wait-done probe fires in the same thread.
wait-done Probe that fires immediately after a thread wakes up from waiting for a
pending completion of a given I/O request. The buf(9S) structure
corresponding to the I/O request for which the thread was waiting is
pointed to by the args[0] argument. The devinfo_t structure of the
device to which the I/O was issued is pointed to by the args[1]
argument. The fileinfo_t structure of the file that corresponds to the
I/O request is pointed to by the args[2] argument. Some time after the
wait-start probe fires, the wait-done probe fires in the same thread.
Information Available When io Probes Fire

The io probes fire for all I/O requests to disk devices, and for all file read
and file write requests to an NFS server (except for metadata requests,
such as readdir(3C)).
The io provider uses three I/O structures: the buf(9S) structure, the
devinfo_t structure, and the fileinfo_t structure.
When the io probes fire, the following arguments are made available:
args[0] Set to point to the buf(9S) structure corresponding to the
I/O request.
args[1] Set to point to the devinfo_t structure of the device to
which the I/O was issued.
args[2] Set to point to the fileinfo_t structure containing file
system related information regarding the issued I/O request.
The buf(9S) Structure
The buf(9S) structure is the abstraction that describes an I/O request. The
address of this structure is made available to your D programs through
the args[0] argument. Here is its definition:
struct buf {
    int      b_flags;      /* flags */
    size_t   b_bcount;     /* number of bytes */
    caddr_t  b_addr;       /* buffer address */
    uint64_t b_blkno;      /* expanded block # on device */
    uint64_t b_lblkno;     /* block # on device */
    size_t   b_resid;      /* # of bytes not transferred */
    size_t   b_bufsize;    /* size of allocated buffer */
    caddr_t  b_iodone;     /* I/O completion routine */
    int      b_error;      /* expanded error field */
    dev_t    b_edev;       /* extended device */
};
The b_flags member indicates the state of the I/O buffer and consists of
a bitwise OR of different state values.
Table 2-5 shows the valid state values for the b_flags field.
Table 2-5 The b_flags Field Values
Flag Value Description
B_DONE Indicates the data transfer has completed.

B_ERROR Indicates an I/O transfer error. It is set in conjunction with
the b_error field.
B_PAGEIO Indicates the buffer is being used in a paged I/O request.
See the description of the b_addr field (Table 2-6) for more
information.
B_PHYS Indicates the buffer is being used for physical (direct) I/O
to a user data area.
B_READ Indicates that data is to be read from the peripheral device
into main memory.
B_WRITE Indicates that the data is to be transferred from main
memory to the peripheral device.
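Because b_flags is a bitwise OR of state values, each state is tested with a bitwise AND, which is how the iosnoop.d script in this module chooses between R and W. The numeric flag values in this C sketch are assumptions that follow the illumos <sys/buf.h> layout; use the definitions from your own system's header. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are assumptions for illustration (illumos <sys/buf.h>
 * layout); real code should include the system header instead. */
#define B_DONE   0x0002
#define B_ERROR  0x0004
#define B_PAGEIO 0x0010
#define B_PHYS   0x0020
#define B_READ   0x0040
#define B_WRITE  0x0100

/* Transfer direction, computed the same way as in iosnoop.d:
 * anything without the B_READ bit is reported as a write. */
char io_direction(int b_flags)
{
    return (b_flags & B_READ) ? 'R' : 'W';
}

/* B_PHYS or B_PAGEIO (or neither) may be set, but never both. */
bool buf_flags_consistent(int b_flags)
{
    return !((b_flags & B_PHYS) && (b_flags & B_PAGEIO));
}
```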

Table 2-6 shows the field descriptions for the buf(9S) structure.
Table 2-6 The buf(9S) Structure Field Descriptions
Field Description
b_bcount Indicates the number of bytes to be transferred as part of
the I/O request.
b_addr Indicates the virtual address of the I/O request, unless
B_PAGEIO is set. The address is a kernel virtual address
unless B_PHYS is set, in which case it is a user virtual
address. If B_PAGEIO is set, the b_addr field contains
kernel private data. Note that either B_PHYS or B_PAGEIO
or neither can be set, but not both.
b_lblkno Identifies which logical block on the device is to be
accessed. The mapping from a logical block to a physical
block (cylinder, track, and so on) is defined by the device.
b_resid Indicates the number of bytes not transferred because of
an error.
b_bufsize Contains the size of the allocated buffer.
b_iodone Identifies a specific routine in the kernel that is called when
the I/O is complete.
b_error Holds an error code returned from the driver in the event
of an I/O error. b_error is set in conjunction with the
B_ERROR bit set in the b_flags member.
b_edev Contains the major and minor device numbers of the
device accessed. Consumers can use the D built-in
functions getmajor() and getminor() to extract the major
and minor device numbers from the b_edev field.
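The getmajor() and getminor() functions unpack the two halves of an expanded device number. A C sketch of that unpacking follows, under the assumption (taken from the 64-bit device number layout in illumos <sys/mkdev.h>, not from this guide) that the major number occupies the upper 32 bits and the minor number the lower 32 bits. The ex_ functions and macros are hypothetical stand-ins, not the D built-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit expanded device layout: major number in the upper
 * 32 bits, minor number in the lower 32 bits. Verify against your
 * system's headers before relying on it. */
#define EX_NBITSMINOR 32
#define EX_MAXMIN     0xffffffffULL

/* Pack a major/minor pair into an expanded device number. */
uint64_t ex_makedev(uint64_t major, uint64_t minor)
{
    assert(minor <= EX_MAXMIN);
    return (major << EX_NBITSMINOR) | minor;
}

/* Counterparts of the D getmajor() and getminor() built-ins. */
uint64_t ex_getmajor(uint64_t dev)
{
    return dev >> EX_NBITSMINOR;
}

uint64_t ex_getminor(uint64_t dev)
{
    return dev & EX_MAXMIN;
}
```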
The devinfo_t Structure
The devinfo_t structure provides information about a device. A pointer
to this structure is available to D programs through the args[1]
argument. Its members are as follows:
typedef struct devinfo {
    int    dev_major;       /* major number */
    int    dev_minor;       /* minor number */
    int    dev_instance;    /* instance number */
    string dev_name;        /* name of device */
    string dev_statname;    /* name of device + instance/minor */
    string dev_pathname;    /* pathname of device */
} devinfo_t;
Table 2-7 shows the field descriptions for the devinfo_t structure.
Table 2-7 The devinfo_t Structure Field Descriptions
Field Description
dev_major Indicates the major number of the device; see
getmajor(9F).
dev_minor Indicates the minor number of the device; see
getminor(9F).
dev_instance Indicates the instance number of the device. The
instance of a device is different from the minor
number: where the minor number is an abstraction
managed by the device driver, the instance number is
a property of the device node. Device node instance
numbers can be displayed with the prtconf(1M)
command.
dev_name Indicates the name of the device driver that manages
the device. (Device driver names can be viewed with
the -D option to prtconf(1M).)
dev_statname Indicates the name of the device as reported by the
iostat(1M) command. This name also corresponds to
the name of the device as reported by the kstat(1M)
command. This field is provided to enable aberrant
iostat or kstat output to be correlated to actual I/O
activity.
dev_pathname Indicates the complete path of the device.

The fileinfo_t Structure
The fileinfo_t structure provides information about a file. The file to
which an I/O corresponds is pointed to by the args[2] argument in the
start, done, wait-start, and wait-done probes. Note that file
information is contingent upon the file system providing this information
when dispatching I/O requests; some file systems, especially third-party
file systems, do not provide the information. Moreover, I/O requests for
which there is no file information can emanate from the file system. For
example, I/O to file system metadata is not associated with a specific file.
Following is the definition of the fileinfo_t structure:
typedef struct fileinfo {
    string   fi_name;       /* name (basename of fi_pathname) */
    string   fi_dirname;    /* directory (dirname of fi_pathname) */
    string   fi_pathname;   /* full pathname */
    offset_t fi_offset;     /* offset within file */
    string   fi_fs;         /* filesystem */
    string   fi_mount;      /* mount point of file system */
} fileinfo_t;
Table 2-8 shows the field descriptions for the fileinfo_t structure.
Table 2-8 The fileinfo_t Structure Field Descriptions
Field Description
fi_name Contains the name of the file without any directory
components. If there is no file information associated
with an I/O, the fi_name field is set to the string
<none>. In rare cases, the pathname associated with
a file is unknown; in this case, the fi_name field is set
to the string <unknown>.
fi_dirname Contains only the directory component of the file
name. As with fi_name, this can be set to <none> if
there is no file information present, or to <unknown>
if the pathname associated with the file is not known.
fi_pathname Contains the complete pathname to the file. As with
fi_name, this can be set to <none> if there is no file
information present, or to <unknown> if the
pathname associated with the file is not known.
fi_offset Contains the offset within the file, or -1 if file
information is not present or if the offset is otherwise
unspecified by the file system.
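The relationship among fi_pathname, fi_name, and fi_dirname is that of a path to its basename and dirname. The hypothetical split_pathname() helper below sketches that relationship in C, passing the placeholder strings <none> and <unknown> through to both outputs. It is not how the io provider computes these fields, and it does not handle trailing slashes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive fi_name-style and fi_dirname-style
 * strings from a full pathname. Placeholders are copied unchanged. */
void split_pathname(const char *pathname,
                    char *name, size_t name_len,
                    char *dir, size_t dir_len)
{
    assert(pathname != NULL && name != NULL && dir != NULL);

    if (strcmp(pathname, "<none>") == 0 ||
        strcmp(pathname, "<unknown>") == 0) {
        snprintf(name, name_len, "%s", pathname);
        snprintf(dir, dir_len, "%s", pathname);
        return;
    }

    const char *slash = strrchr(pathname, '/');
    if (slash == NULL) {                 /* no directory component */
        snprintf(name, name_len, "%s", pathname);
        snprintf(dir, dir_len, ".");
    } else if (slash == pathname) {      /* file in the root directory */
        snprintf(name, name_len, "%s", slash + 1);
        snprintf(dir, dir_len, "/");
    } else {
        snprintf(name, name_len, "%s", slash + 1);
        /* Copy only the characters before the last slash. */
        snprintf(dir, dir_len, "%.*s", (int)(slash - pathname), pathname);
    }
}
```

For /usr/lib/ld.so this yields ld.so and /usr/lib, matching the basename and dirname comments in the fileinfo_t definition.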
Finding I/O Problems

Consider the following output from the iostat(1M) command.
extended device statistics
device r/s w/s kr/s kw/s wait actv svc_t %w %b
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 2.5 168.7 20.0 10937.7 0.0 3.7 21.7 0 75
sd2 106.6 0.0 4319.9 0.0 0.0 0.7 6.5 0 54
sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 0.5 168.7 4.0 16162.5 0.0 9.6 56.9 0 72
sd2 80.9 0.0 7570.5 0.0 0.0 1.1 13.2 0 68
sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 1.0 166.3 8.1 18973.0 0.0 24.5 146.5 1 88
sd2 43.8 0.0 10949.6 0.0 0.0 0.9 20.4 0 62
sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
fd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
sd0 1.0 189.5 8.0 11047.6 0.0 2.7 14.4 0 67
sd2 129.5 0.5 2836.3 14.5 0.0 0.7 5.6 0 59
sd15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0
^C
This output indicates that a large amount of data is being read from disk
drive sd2 and written to disk drive sd0. Someone appears to be
transferring many megabytes of data between these two drives. Both
disks are consistently over 50% busy. Is someone running a file transfer
command such as tar(1), cpio(1), cp(1), or dd(1M)? The iosnoop.d D
script enables you to determine who is performing this I/O.

The iosnoop.d D Script
The following D script displays data that enables you to determine which
commands are running, what type of I/O those commands are
performing, and which disk devices are involved.
# cat iosnoop.d
#!/usr/sbin/dtrace -qs
BEGIN
{
    printf("%16s %5s %40s %10s %2s %7s\n", "COMMAND", "PID", "FILE",
        "DEVICE", "RW", "MS");
}

io:::start
{
    start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
    command[args[0]->b_edev, args[0]->b_blkno] = execname;
    mypid[args[0]->b_edev, args[0]->b_blkno] = pid;
}

io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
    /* clause-local, so done probes firing on other CPUs cannot
       overwrite the value between assignment and use */
    this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
    printf("%16s %5d %40s %10s %2s %3d.%03d\n", command[args[0]->b_edev,
        args[0]->b_blkno], mypid[args[0]->b_edev, args[0]->b_blkno],
        args[2]->fi_pathname, args[1]->dev_statname,
        args[0]->b_flags & B_READ ? "R" : "W", this->elapsed / 1000000,
        (this->elapsed / 1000) % 1000);
    start[args[0]->b_edev, args[0]->b_blkno] = 0;   /* free memory */
    command[args[0]->b_edev, args[0]->b_blkno] = 0; /* free memory */
    mypid[args[0]->b_edev, args[0]->b_blkno] = 0;   /* free memory */
}
You can decipher this D script as follows:
You use the BEGIN probe to print column headings.
You use associative arrays, keyed by device and block number, to store
the nanosecond timestamp of when a particular I/O starts. You must also
store the executable name and PID of the command issuing the I/O
request; this information is not available at I/O completion time
because the done probe typically fires in the context of an interrupt
handler rather than in the thread that issued the request.
When the I/O is done, you determine the elapsed time and then print
out the relevant information.
You retrieve the file undergoing the I/O from the fileinfo_t
structure; the args[2] argument is set up to point to the
fileinfo_t structure when the done probe fires.
You retrieve the iostat-compatible device name from the
devinfo_t structure, which is pointed to by the args[1] argument.
You use a D conditional expression to display R or W based on
testing the B_READ bit in the b_flags field of the buf structure,
which is pointed to by the args[0] argument.
You use the D modulo operator (%) to determine the fractional portion
of the time in milliseconds.
Finally, you set the associative array elements to zero. Setting an
associative array element to zero de-allocates the underlying
dynamic memory that was being used. This avoids potential dynamic
variable drops.
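The bookkeeping in iosnoop.d can be mirrored in C to show the arithmetic. In this sketch a small fixed-size table stands in for the associative arrays keyed by device and block number, and the names io_record_start(), io_record_done(), and format_ms() are hypothetical. Freeing a slot plays the role of assigning 0 to an array element, and format_ms() uses the same integer arithmetic as the script: whole milliseconds from elapsed / 1000000 and three fractional digits from (elapsed / 1000) % 1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the [b_edev, b_blkno] associative arrays. */
#define IO_SLOTS 128

struct io_slot {
    bool     used;
    uint64_t dev, blkno;    /* the key */
    uint64_t start_ns;      /* value stored at io:::start */
};

static struct io_slot io_table[IO_SLOTS];

/* Index of the slot holding (dev, blkno), or -1 if there is none. */
static int io_find(uint64_t dev, uint64_t blkno)
{
    for (int i = 0; i < IO_SLOTS; i++)
        if (io_table[i].used && io_table[i].dev == dev &&
            io_table[i].blkno == blkno)
            return i;
    return -1;
}

/* io:::start analogue: record (or overwrite) the start time. Returns
 * false if the table is full and the sample must be dropped. */
bool io_record_start(uint64_t dev, uint64_t blkno, uint64_t now_ns)
{
    int i = io_find(dev, blkno);
    if (i < 0) {
        for (int j = 0; j < IO_SLOTS; j++) {
            if (!io_table[j].used) {
                i = j;
                break;
            }
        }
    }
    if (i < 0)
        return false;
    io_table[i] = (struct io_slot){ true, dev, blkno, now_ns };
    return true;
}

/* io:::done analogue: compute the elapsed time and free the slot.
 * Returns false when no start was recorded, like the predicate failing. */
bool io_record_done(uint64_t dev, uint64_t blkno, uint64_t now_ns,
                    uint64_t *elapsed_ns)
{
    int i = io_find(dev, blkno);
    if (i < 0)
        return false;
    *elapsed_ns = now_ns - io_table[i].start_ns;
    io_table[i].used = false;
    return true;
}

/* Nanoseconds as milliseconds with three decimals, as in iosnoop.d. */
int format_ms(uint64_t elapsed_ns, char *buf, size_t len)
{
    unsigned long long ms   = elapsed_ns / 1000000;
    unsigned long long frac = (elapsed_ns / 1000) % 1000;
    return snprintf(buf, len, "%llu.%03llu", ms, frac);
}
```

An I/O that completes 9,471,000 ns after it started is formatted as 9.471, and one taking 582,000 ns as 0.582; the %03 zero padding keeps the fractional digits aligned.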
The following output results from running the previous iosnoop.d script.
It clearly shows who is performing the I/O operations. Someone is
copying the shared object files from /usr/lib on drive sd2 to a backup
directory on drive sd0.
# ./iosnoop.d
COMMAND PID FILE DEVICE RW MS
bash 725 /usr/bin/bash sd2 R 9.471
bash 725 /usr/lib sd2 R 7.128
bash 725 /lib/libc.so.1 sd2 R 7.696
bash 725 /lib/libnsl.so.1 sd2 R 10.293
bash 768 /lib/libnsl.so.1 sd2 R 0.582
cp 768 /lib/libc.so.1 sd2 R 10.154
cp 768 /usr/lib/0@0.so.1 sd2 R 9.270
cp 768 /usr/lib/0@0.so.1 sd2 R 13.654
cp 768 /mnt/lib.backup/0@0.so.1 sd0 W 2.431
cp 768 /usr/lib/ld.so sd2 R 6.890
cp 768 /mnt/lib.backup/ld.so sd0 W 6.698
cp 768 /mnt/lib.backup/ld.so sd0 W 6.437
cp 768 /mnt/lib.backup/ld.so.1 sd0 W 4.394
cp 768 <unknown> sd2 R 2.206
cp 768 /usr/lib/lib300.so.1 sd2 R 5.771
cp 768 /mnt/lib.backup/lib300.so sd0 W 7.861
cp 768 /mnt/lib.backup/lib300.so.1 sd0 W 6.794
cp 768 /usr/lib/lib300s.so.1 sd2 R 3.326
cp 768 /mnt/lib.backup/lib300s.so sd0 W 2.996
cp 768 /mnt/lib.backup/lib300s.so.1 sd0 W 1.970
...
cp 768 /usr/dt/lib/libXm.so.3 sd2 R 32.020
cp 768 <none> sd0 R 10.184
cp 768 /mnt/lib.backup/libXm.so.1.2 sd0 W 9.777
cp 768 /mnt/lib.backup/libXm.so.3 sd0 W 5.353
...
cp 768 /usr/lib/libgtk-x11-2.0.so.0.100.0 sd2 R 2.374
cp 768 /mnt/lib.backup/libgthread-2.0.so.0 sd0 W 7.732
cp 768 /mnt/lib.backup/libgthread-2.0.so.0.7.0 sd0 W 7.605
cp 768 <none> sd2 R 10.678
cp 768 /mnt/lib.backup/libgtk-x11-2.0.so sd0 W 44.225
cp 768 /mnt/lib.backup/libgtk-x11-2.0.so sd0 W 42.075
^C
Obtaining System Call Information

System calls serve as the main interface between user-level applications
and the kernel. You can learn much about the system by knowing the
system calls that are being issued by the set of running applications.
Note: System calls are documented in Section 2 of the Solaris 10 OS
manual pages.
Traditionally, the system calls made by an application were determined
using the truss(1) command. The DTrace syscall provider, however, enables you
to quickly gather more detailed data with which to analyze aberrant
behavior related to system calls. For example, not only can DTrace show
you the system calls being issued by a given application, but it can also
indicate which applications are issuing a given system call. In addition,
you can time (in nanoseconds) how long a particular system call takes,
such as a read(2). These operations cannot be performed with the
truss(1) command.
The syscall Provider

The syscall provider makes available a probe at the entry and return of
every system call in the system. An example of a fully-specified probe
description for the entry probe of the read(2) system call is:
syscall::read:entry
The probe for return from the read(2) system call is:
syscall::read:return
Note that the module name is undefined for the syscall provider probes.

System Call Names
The system call names are usually, but not always, the same as those
documented in Section 2 of the Solaris 10 OS manual pages. The actual
names are listed in the /etc/name_to_sysnum system file. Examples of
system call names that do not match the manual pages are:
rexit for exit(2)
gtime for time(2)
semsys for semctl(2), semget(2), semids(2), and semtimedop(2)
signotify, which has no manual page, and is used for POSIX.4
message queues
Large file system calls such as:
creat64 for creat(2)
lstat64 for lstat(2)
open64 for open(2)
mmap64 for mmap(2)
Arguments for entry and return Probes
For the entry probes, the arguments (arg0, arg1, ... argn) are the
arguments to the system call. For return probes, both arg0 and arg1
contain the same value: return value from the system call. You can check
system call failure in the return probe by referencing the errno D
variable. The following example shows which system calls are failing for
which applications and with what errno value.
# cat errno.d
#!/usr/sbin/dtrace -qs
syscall:::return
/arg0 == -1 && execname != "dtrace"/
{
printf("%-20s %-10s %d\n", execname, probefunc, errno);
}
# ./errno.d
sac read 4
ttymon pause 4
ttymon read 11
nscd lwp_kill 3
in.routed ioctl 12
in.routed ioctl 12
tty open 2
tty stat 2
bash setpgrp 13
bash waitsys 10
bash stat64 2
snmpd ioctl 12
^C
The errno.d D program has a predicate that uses the logical AND operator,
&&. The predicate requires that the system call return -1, which is how
system calls indicate failure, and that the process executable name not be
dtrace, so that the script does not report its own activity. The printf
built-in D function uses the %-20s and %-10s format specifications to
left-justify the strings in columns of the given minimum width.
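For comparison, the same failure convention can be observed from inside a program. The following C sketch (the stat_errno() and format_failure() helpers are hypothetical names used only for illustration) calls stat(2) on a path, returns the errno value when the call returns -1, and formats a line with the same %-20s %-10s %d layout that errno.d uses. A missing file yields ENOENT, error number 2, which is the value shown for the failing stat and open calls in the output above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 0 if stat(2) succeeds, or the errno value if it returns -1. */
int stat_errno(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return errno;           /* the value errno.d prints in its last column */
    return 0;
}

/* Formats one line in the errno.d layout: two left-justified columns, then errno. */
int format_failure(char *buf, size_t len, const char *execname,
                   const char *func, int err)
{
    return snprintf(buf, len, "%-20s %-10s %d", execname, func, err);
}
```

Unlike errno.d, this approach reports only the failures of the one program that contains the check, which is why the D script is the better tool for a system-wide survey.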
D Script Example Using the syscall Provider
The following simple D script counts, by system call name, the system calls
being issued system-wide.
# cat syscall.d
syscall:::entry
{
@[probefunc] = count();
}
# ./syscall.d
^C
mmap64 1
mkdir 1
umask 1
getloadavg 1
getdents64 2
...
stat 1754
ioctl 1956
close 2708
write 2733
mmap 3006
read 3880
sigaction 7886
brk 12695

The output indicates that the majority of the system calls are setting up
signal handling (sigaction(2)) or growing the heap (brk(2)). The
following D script shows which processes are making the brk(2)
system calls.
# cat brk.d
syscall::brk:entry
{
@[execname] = count();
}
# ./brk.d
^C
dtrace 6
prstat 22
nroff 48
cat 48
tbl 142
eqn 144
rm 166
ln 166
col 222
expr 332
head 492
fgrep 492
dirname 581
grep 722
instant 738
sh 917
nawk 984
sgml2roff 1259
nsgmls 13296
# ps -ef | grep nsgmls
root 591 590 2 07:56:32 pts/2 0:00 /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
# man nsgmls
No manual entry for nsgmls.
# man -k sgml
sgml sgml (5) - Standard Generalized Markup Language
solbook sgml (5) - Standard Generalized Markup Language
Apparently some process is working with the Standard Generalized
Markup Language (SGML). Use the ptree command to see which process is
creating it:
# ptree 591
#
The ptree command returns no results because the nsgmls process exits
before the command can examine it. You have learned, however, that the
problem is not a long-lived process causing a memory leak. Now write a
quick D script to print the ancestry. The script itself must follow each
parent pointer in turn, because many of the other processes involved are
also too short-lived to examine with ptree.
Note This particular D script fails if an ancestor does not exist,
because the top ancestor, the sched process, has no parent. You cannot
harm the kernel even if a D script uses a bad pointer. The intent of this
example is to show how you can quickly create custom D scripts to
answer questions about system behavior. Many of your D scripts will be
throw-away scripts that you will not re-use. You can fix the script by
testing each parent pointer with a predicate before printing. You will see
this fix later with the ancestors3.d D script.
# cat -n ancestors.d
2 syscall::brk:entry
3 /execname == "nsgmls"/
4 {
5 printf("process: %s\n",
6 curthread->t_procp->p_user.u_psargs);
7 printf("parent: %s\n",
8 curthread->t_procp->p_parent->p_user.u_psargs);
9 printf("grandparent: %s\n",
10 curthread->t_procp->p_parent->p_parent->p_user.u_psargs);
11 printf("greatgrandparent: %s\n",
12 curthread->t_procp->p_parent->p_parent->p_parent->p_user.u_psargs);
13 printf("greatgreatgrandparent: %s\n",
14 curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
15 printf("greatgreatgreatgrandparent: %s\n",
16 curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
17 }
# ./ancestors.d
process: /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
parent: /usr/lib/sgml/instant -d -c/usr/share/lib/sgml/locale/C/transpec/roff.cmap -s/u
grandparent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman4/rt_dptbl.4
greatgrandparent: sh -c cd /usr/share/man; /usr/lib/sgml/sgml2roff /usr/share/man/sman4/rt_dptbl.
greatgreatgrandparent: catman
greatgreatgreatgrandparent: bash
# ps -ef | grep catman
root 2333 2332 1 08:26:05 pts/1 0:03 catman
root 16984 2880 0 08:41:10 pts/2 0:00 grep catman
# ptree 2333
299 /usr/sbin/inetd -s
2324 in.rlogind
2326 -sh
2332 bash
2333 catman
17232 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4; ln -s ../cat4/e
17235 sh -c cd /usr/share/man; rm -f /usr/share/man/cat4/variables.4; ln -s ../cat4/e
The previous output indicates that all of the brk(2) system calls resulted
from the catman(1M) command, creating many short-lived children that
issued this system call.
The curthread built-in D variable contains the address of the running
kernel thread. As in the C language, the D language accesses members of
a structure through a pointer with the -> operator. Through this pointer to
the kernel kthread_t structure, you can reach the process's proc_t
structure (t_procp), whose p_user member holds the process name and
arguments (u_psargs). By following the p_parent pointers back, you can
reach the same information for the parent, grandparent,
great-grandparent, and so on. Refer to the <sys/thread.h>, <sys/proc.h>,
and <sys/user.h> header files for details of these fields.
Figure 2-1 shows a diagram of the kernel data structures being accessed
by this example.
[Figure 2-1 Thread and Process Data Structures: curthread points to a kthread_t (/usr/include/sys/thread.h; members include t_state, t_pri, t_lwp, and t_procp). t_procp points to the thread's proc_t (/usr/include/sys/proc.h; members include p_exec, p_as, p_cred, p_parent, p_tlist, and p_user). Each p_parent points to the parent's proc_t, and each p_user is a user_t (/usr/include/sys/user.h; members include u_start, u_ticks, u_psargs[], and u_cdir).]
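The pointer walk that ancestors.d performs can be sketched in C. The structures below are hypothetical, heavily simplified stand-ins for proc_t and user_t (only p_parent, p_user, and u_psargs are kept), and print_ancestry() is an illustrative helper, not a kernel function. Because C has loops, the sketch follows p_parent until it reaches a NULL pointer; D has no loop construct, which is why the D scripts in this module repeat one pointer dereference per generation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for user_t: only u_psargs is kept. */
struct user_sketch {
    char u_psargs[80];              /* process name and arguments */
};

/* Hypothetical, simplified stand-in for proc_t. */
struct proc_sketch {
    struct proc_sketch *p_parent;   /* NULL above the top ancestor (sched) */
    struct user_sketch p_user;      /* embedded, as p_user is in proc_t */
};

/*
 * Prints p and up to maxgen - 1 of its ancestors, testing each parent
 * pointer before following it.  Returns the number of lines printed.
 */
int print_ancestry(const struct proc_sketch *p, int maxgen)
{
    int gen = 0;

    while (p != NULL && gen < maxgen) {
        printf("generation %d: %s\n", gen, p->p_user.u_psargs);
        p = p->p_parent;            /* NULL here ends the walk safely */
        gen++;
    }
    return gen;
}
```

Testing the pointer before each dereference is the same fix that the ancestors3.d script applies with predicates.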
New Approach to Analyzing Transient Failures
As the previous example demonstrates, each result obtained from using
the DTrace facility can lead to further questions, which are answered with
available commands or with new D programs that you can write quickly.
In this way, the DTrace facility significantly shortens the diagnostic loop:
hypothesis->instrumentation->data gathering->analysis->hypothesis
This tightened loop introduces a new paradigm for diagnosing transient
failures. Because instrumenting the system takes so little effort, you can
shift your effort from building instrumentation to forming and testing
hypotheses, which is less labor intensive.

D Language Variables
The D language has five basic variable types:
Scalar variables Have fixed-size values such as integers, structures
and pointers
Associative arrays Store values indexed by one or more keys,
similar to aggregations
Thread-local variables Have one name, but storage is local to each
separate kernel thread. These variables are prefixed with the self->
keyword.
Clause-local variables Appear when an action block is entered;
storage is reclaimed after leaving the probe clause. These variables
are prefixed with the this-> keyword.
Kernel external variables DTrace has access to all kernel global and
static variables. These variables are prefixed with a backquote (`).
Associative arrays (start, command, and mypid) were used in the
iosnoop.d script. Clause-local variables are similar to automatic or local
variables in the C language. The elapsed variable in the iosnoop.d
script was a global scalar variable, but it could have been made a
clause-local variable, which is slightly more efficient. Clause-local
variables come into existence when an action block (tied to a specific
probe) is entered, and their storage is reclaimed when the action block is
left. They help save storage and are quicker to access than associative
arrays.
Note For more information on D variables, refer to the Solaris Dynamic
Tracing Guide, part number 817-6223-10.
You can access kernel global and static variables within your D programs.
To access these external variables, you prefix the kernel variable name
with the ` (backquote or grave accent) character. For example, to
reference the freemem kernel global variable, use `freemem. If the
variable belongs to a kernel module and its name conflicts with variable
names in other modules, place the ` character between the module name
and the variable name. For example, sd`sd_state references the sd_state
variable within the sd kernel module.
Associative Arrays
Associative arrays enable you to store scalar values in the elements of an
array (or table) that are identified by a comma-separated list of one or
more key fields (an n-tuple). The keys can be any combination of
strings or integers. The following code example uses an associative
array to report whenever any process issues more than a given number
of calls to any single system call:
# cat -n assoc2.d
2 syscall:::entry
3 {
4 ++namesys[pid,probefunc];
5 x = namesys[pid,probefunc] > 5000 ? 1 : 0;
6 }
7 syscall:::entry
8 /x && execname != "dtrace"/
9 {
10 printf("Process: %d %s has just made more than 5000 %s calls\n",
11 pid, execname, probefunc);
12 namesys[pid,probefunc] = 0; /* reset the count */
13 }
# ./assoc2.d
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14837 find has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 lstat64 calls
Process: 14854 ls has just made more than 5000 acl calls
Process: 14854 ls has just made more than 5000 lstat64 calls
^C

The assoc2.d D program uses an associative array indexed by the
unique combination of process ID (PID) and system call name. The ++
operator increments the array element by one each time a process
with that PID makes that system call. The array element, like all
variables (except clause-local variables), is initialized to 0. The second
statement in the action block uses a conditional expression that has three
parts:
expression ? value1 : value2
A conditional expression has the value of value1 when the D
expression is nonzero (true), and has the value of value2 when the
expression is zero (false). Therefore, in the assoc2.d D program, the
global scalar variable x is 1 when that element of the associative array is
greater than 5000, and 0 otherwise. The next action block is executed
only if x is not 0 and the executable name is not dtrace. After printing
the command that made more than 5000 calls to a given system call, the
program resets the array element to 0 to begin counting again. Note that
a comment is used in this D program. Like comments in the C language, a
comment in the D language is text that is enclosed between /* and */.
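To make the counting and reset logic concrete, the following C sketch mirrors one firing of assoc2.d. It is an analogy under stated assumptions, not how DTrace stores associative arrays: a small fixed-size table (MAXKEYS is a hypothetical limit) stands in for the array, and each element is keyed by the (pid, system call name) tuple. The hypothetical record_call() function increments the element, evaluates the same conditional expression, and resets the count when the threshold is crossed.

```c
#include <string.h>
#include <sys/types.h>

#define MAXKEYS   64        /* arbitrary table size for this sketch */
#define THRESHOLD 5000      /* same limit as assoc2.d */

/* One element of the "associative array": the key is the (pid, name) tuple. */
struct entry {
    pid_t pid;
    char  func[32];
    long  count;            /* starts at 0, like a D array element */
};

static struct entry table[MAXKEYS];
static int nentries;

/* Finds or creates the element for (pid, func); NULL if the table is full. */
static struct entry *lookup(pid_t pid, const char *func)
{
    for (int i = 0; i < nentries; i++)
        if (table[i].pid == pid && strcmp(table[i].func, func) == 0)
            return &table[i];
    if (nentries == MAXKEYS)
        return NULL;
    table[nentries].pid = pid;
    strncpy(table[nentries].func, func, sizeof table[nentries].func - 1);
    table[nentries].count = 0;
    return &table[nentries++];
}

/*
 * Mirrors one firing of assoc2.d: increment, evaluate the conditional
 * expression, and reset the element when the threshold is crossed.
 * Returns 1 when assoc2.d would print its "more than 5000" line.
 */
int record_call(pid_t pid, const char *func)
{
    struct entry *e = lookup(pid, func);
    int x;

    if (e == NULL)
        return 0;
    ++e->count;
    x = e->count > THRESHOLD ? 1 : 0;
    if (x)
        e->count = 0;       /* reset the count */
    return x;
}
```

Calling record_call() 5,001 times for the same pid and name returns 1 on the last call, the point at which assoc2.d would print its message; the 10,002nd call returns 1 again because the count restarted at 0.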
Thread-Local Variables
Thread-local variables are useful when you want to enable a probe and
tag every thread that fires it. Thread-local variables share a common
name but refer to separate data storage associated with each thread.
Thread-local variables are referenced with the special keyword self
followed by the two characters ->, as shown in the following example:
syscall::read:entry
{
self->read = 1;
}
syscall::read:return
/self->read/
{
printf("Same thread is returning from read\n");
self->read = 0; /* release the thread-local storage */
}
Timing a System Call

Thread-local variables enable you to determine the amount of time a
thread spends in any particular system call. The following example times
how long the grep(1) command takes in each read(2) system call. It also
displays the number of bytes read (arg0 is the return value of read).
# cat -n timegrep.d
2 BEGIN
3 {
4 printf("size\ttime\n");
5 }
6 syscall::read:entry
7 /execname == "grep"/
8 {
9 self->start = timestamp;
10 }
11 syscall::read:return
12 /self->start/
13 {
14 printf("%d\t%d\n", arg0, timestamp - self->start);
15 self->start = 0;
16 }
# ./timegrep.d
size time
8192 7108972
319 1526616
0 12112
3293 5663329
0 18816
^C
The first read took 7,108,972 nanoseconds, or 7.1 milliseconds, which is
reasonable for an 8-Kbyte disk read. As you might expect, the first read of
0 bytes took only about 12 microseconds (12,112 nanoseconds).
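The entry-to-return subtraction in timegrep.d can be compared with timing the same call from inside the program. The following C sketch assumes that clock_gettime(CLOCK_MONOTONIC) is an acceptable stand-in for the D timestamp variable (both are nanosecond counters measured from an arbitrary starting point); the now_ns() and timed_read() helpers are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(2), used by callers */
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds from a monotonic clock, playing the role of D's timestamp. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Times one read(2) the way timegrep.d does: record a start time on entry
 * and subtract it on return.  Stores read's return value (the byte count
 * that timegrep.d prints from arg0) in *nbytes and returns elapsed ns.
 */
uint64_t timed_read(int fd, void *buf, size_t len, ssize_t *nbytes)
{
    uint64_t start = now_ns();          /* like self->start = timestamp */

    *nbytes = read(fd, buf, len);
    return now_ns() - start;            /* like timestamp - self->start */
}
```

The practical difference is that this version works only after the program is modified and recompiled, whereas timegrep.d measures an unmodified grep(1) process.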
The next example uses an associative array to time every system call
performed by the grep command.
# cat -n timesys.d
2 BEGIN
3 {
4 printf("System Call Times for grep:\n\n");
5 printf("%20s\t%10s\n", "Syscall", "Microseconds");
6 }
7 syscall:::entry
8 /execname == "grep"/
9 {
10 self->name[probefunc] = timestamp;
11 }
12 syscall:::return
13 /self->name[probefunc]/
14 {
15 printf("%20s\t%10d\n", probefunc,
16 (timestamp - self->name[probefunc])/1000);
17 self->name[probefunc] = 0; /* free memory */
18 }
# ./timesys.d
System Call Times for grep:
Syscall Microseconds
mmap 50
resolvepath 47
resolvepath 67
stat 37
open 46
stat 34
open 32
...
brk 25
open64 43
read 8126
brk 20
brk 28
read 24
close 26
^C
Predictably, the system call that took the most time was read, because of
the disk I/O wait time (the second read was of 0 bytes).
Following a System Call

You can follow a system call from entry into the kernel through all
subsequent internal kernel function calls and returns back to the original
point of entry of the system call function. You do this by using the
syscall and fbt providers together with a thread-local variable. The
following example traces all of the functions involved in the read(2)
system call as issued by the grep(1) command:
# cat -n follow.d
2 syscall::read:entry
3 /execname == "grep"/
4 {
5 self->start = 1;
6 }
7
8 syscall::read:return
9 /self->start/
10 {
11 exit(0);
12 }
13
14 fbt:::
15 /self->start/
16 {
17 }
The fbt provider probe clause has an empty action, so DTrace performs its
default action for every fbt probe that fires while self->start is set: it
reports each entry to and return from every kernel function involved in the
read(2) system call until the call returns. The -F option of the
dtrace(1M) command indents the output of each nested function call and
marks the entry with the -> symbol; it un-indents the output when that
function returns back up the call tree and marks the return with the <- symbol.
# dtrace -F -s follow.d
dtrace: script './follow.d' matched 38108 probes
CPU FUNCTION
0 -> read32
0 <- read32
0 -> read
0 -> getf
0 -> set_active_fd
0 <- set_active_fd
0 <- getf
...
0 <- ufs_rwlock
0 -> fop_read
0 <- fop_read
0 -> ufs_read
0 -> ufs_lockfs_begin
...
0 -> rdip
0 -> rw_write_held
0 <- rw_write_held
0 -> segmap_getmapflt
0 -> get_free_smp
0 -> grab_smp
0 -> segmap_hashout
...
0 <- sfmmu_kpme_lookup
0 -> sfmmu_kpme_sub
...
0 <- page_unlock
0 <- grab_smp
0 -> segmap_pagefree
0 -> page_lookup_nowait
0 -> page_trylock
...
0 <- segmap_hashin
0 -> segkpm_create_va
0 <- segkpm_create_va
0 -> fop_getpage
0 -> ufs_getpage
0 -> ufs_lockfs_begin_getpage
0 -> tsd_get
...
0 <- page_exists
0 -> page_lookup
0 <- page_lookup
0 -> page_lookup_create
0 <- page_lookup_create
0 -> ufs_getpage_miss
0 -> bmap_read
0 -> findextent
0 <- findextent
0 <- bmap_read
0 -> pvn_read_kluster
0 -> page_create_va
0 -> lgrp_mem_hand
...
0 <- page_add
0 <- page_create_va
0 <- pvn_read_kluster
0 -> pagezero
0 -> ppmapin
0 -> sfmmu_get_ppvcolor
0 <- sfmmu_get_ppvcolor
0 -> hat_memload
0 -> sfmmu_memtte
0 <- sfmmu_memtte
...
0 -> xt_some
0 <- xt_some
0 <- xt_sync
...
0 <- sema_init
0 <- pageio_setup
0 -> lufs_read_strategy
0 -> logmap_list_get
0 <- logmap_list_get
0 -> bdev_strategy
0 -> bdev_strategy_tnf_probe
0 <- bdev_strategy_tnf_probe
0 <- bdev_strategy
0 -> sdstrategy
0 -> getminor
...
0 <- drv_usectohz
0 -> timeout
0 <- timeout
0 -> timeout_common
...
0 <- getminor
0 -> scsi_transport
0 <- scsi_transport
0 -> glm_scsi_start
0 -> ddi_get_devstate
...
0 <- ddi_get_soft_state
0 -> pci_pbm_dma_sync
0 <- pci_pbm_dma_sync
0 <- pci_dma_sync
0 <- glm_start_cmd
0 <- glm_accept_pkt
0 <- glm_scsi_start
0 <- sd_start_cmds
0 <- sd_core_iostart
0 <- xbuf_iostart
0 <- lufs_read_strategy
0 -> biowait
0 -> sema_p
0 -> disp_lock_enter
0 <- disp_lock_enter
0 -> thread_lock_high
0 <- thread_lock_high
0 -> ts_sleep
0 <- ts_sleep
0 -> disp_lock_exit_high
0 <- disp_lock_exit_high
0 -> disp_lock_exit_nopreempt
0 <- disp_lock_exit_nopreempt
0 -> swtch
0 -> disp
0 <- disp_lock_enter
0 -> disp_lock_exit
0 <- disp_lock_exit
0 -> disp_getwork
0 <- disp_getwork
0 <- disp
0 <- swtch
0 -> resume
0 <- resume
...
0 <- hat_page_getattr
0 <- segmap_getmapflt
0 -> uiomove
0 -> xcopyout
0 <- xcopyout
0 <- uiomove
0 -> segmap_release
0 -> get_smap_kpm
...
0 <- ufs_imark
0 <- ufs_itimes_nolock
0 <- rdip
...
0 <- cv_broadcast
0 <- releasef
0 <- read
0 -> read
Although more than half of the functions were removed from the
previous output, the example shows that a great many functions are
required to perform a disk file read. Some of the key functions are
described below:
read read(2) system call entered
ufs_read UFS file being read
segmap_getmapflt Find segmap page for the I/O
segmap_pagefree Free underlying previous physical page tied to
this segmap virtual page onto the cachelist (this policy replaced the
old priority paging)
ufs_getpage Ask UFS to retrieve the page
page_lookup First check to see if the page is in memory (it is not)
page_create_va Get new physical page for the I/O
hat_memload Map the virtual page to the physical page
xt_some Issue cross-trap call to some CPUs
sdstrategy Issue Small Computer System Interface (SCSI)
command to read page from disk into segmap page
timeout Prepare for SCSI timeout of disk read request
glm_scsi_start In glm host bus adapter driver
biowait Wait for block I/O
sema_p Use semaphore to wait for I/O
ts_sleep Put timesharing (TS) thread on sleep queue
swtch Do a context switch (have thread give up the CPU while it
waits for the I/O)
disp_getwork Find another thread to run while this thread waits
for its I/O
resume I/O has completed and CPU is returned to resume
running
uiomove Move data from kernel buffer (page) to user-land buffer
segmap_release Release segmap page for use by another I/O
later
read Read operation ends
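The indentation that the -F option adds to the follow.d output amounts to simple depth bookkeeping: each entry is printed and then increases the depth, and each return decreases the depth before it is printed. The following C sketch approximates that layout for a list of fbt-style events; the flow_event structure and flow_indent() function are hypothetical, and the real dtrace(1M) output also includes the CPU column and other details this sketch ignores.

```c
#include <stdio.h>
#include <string.h>

/* One fbt-style event: a kernel function name plus entry or return. */
struct flow_event {
    const char *func;
    int is_entry;       /* 1 for an entry probe, 0 for a return probe */
};

/*
 * Approximates the -F layout: an entry is printed with "->" and then
 * deepens the indentation; a return backs out one level and is printed
 * with "<-" at the depth of its matching entry.  Writes newline-separated
 * lines into out and returns the deepest nesting level seen.
 */
int flow_indent(const struct flow_event *ev, int n, char *out, size_t outlen)
{
    int depth = 0, maxdepth = 0;
    size_t used = 0;

    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!ev[i].is_entry && depth > 0)
            depth--;                    /* returning: back out one level */
        int w = snprintf(out + used, outlen - used, "%*s%s %s\n",
                         2 * depth, "", ev[i].is_entry ? "->" : "<-",
                         ev[i].func);
        if (w < 0 || (size_t)w >= outlen - used)
            break;                      /* output buffer is full */
        used += (size_t)w;
        if (ev[i].is_entry && ++depth > maxdepth)
            maxdepth = depth;           /* entering: nest one level deeper */
    }
    return maxdepth;
}
```

For the events read, getf, and set_active_fd entering in that order and then returning in reverse order, flow_indent() produces lines nested like the beginning of the read trace above and returns a maximum depth of 3.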

Creating D Scripts That Use Arguments

As with shell scripts and programs written in other interpreted languages,
such as perl(1), you can use the dtrace(1M) command to create executable
interpreter files. The file must start with the following line and must
have execute permission:
#!/usr/sbin/dtrace -s
You can specify other options to the dtrace(1M) command on this line; be
sure, however, to use only one dash (-) followed by the options, with s
being last:
#!/usr/sbin/dtrace -qvs
You can also specify all options to the dtrace(1M) command by using
#pragma lines inside the D script:
# cat -n mem2.d
2
3 #pragma D option quiet
4 #pragma D option verbose
5
6 vminfo:::
7 {
8 @[execname,probename] = count();
9 }
10
11 END
12 {
13 printa("%-20s %-15s %@d\n", @);
14 }
Note For the list of option names used in #pragma lines, see the Solaris
Dynamic Tracing Guide, part number 817-6223-10.
Built-in Macro Variables

The D compiler defines a set of built-in macro variables that you can refer
to inside a D script. These macro variables include:
$pid Process ID of dtrace interpreter running script
$ppid Parent process ID of dtrace interpreter running script
$uid Real user ID of user running script
$gid Real group ID of user running script
$0 Name of script
$1, $2, $3, and so on First, second, third command-line arguments
passed to script
$$1, $$2, $$3, and so on First, second, third command-line
arguments converted to double-quoted (" ") strings
The complete list of D macro variables is given in Appendix B. The
following D script uses some of these D macro variables:
# cat -n params.d
3
4 tick-2sec
5 /$1 == $11 && $$3 == "fubar"/
6 {
7 printf("name of script: %s\n", $0);
8 printf("pid of script: %d\n", $pid);
9 printf("9th arg passed to script: %s\n", $$9);
10 exit(0);
11 }
# ./params.d 1 2 fubar 4 5 6 7 8 9 10 1
name of script: ./params.d
pid of script: 5363
9th arg passed to script: 9
# ./params.d 1 2 3 4 5 6 7 8 9 10 11
^C

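The last invocation of the script did not output anything because the
value of the first argument did not match the value of the eleventh
argument (and the third argument was not fubar), so the predicate was
never true. The following invocations show that the number and type of
the arguments must match the way they are referenced inside the D
script. In the last case, the unquoted $1 reference is replaced with the
identifier a, which the compiler cannot resolve. This is an example of the
error-checking capability of the DTrace facility: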
# ./params.d 1 2 3 4 5 6 7 8 9
dtrace: failed to compile script ./params.d: line 5: macro argument $11
is not defined
# ./params.d 1 2 3 4 5 6 7 8 9 10 11 12 13
dtrace: failed to compile script ./params.d: line 12: extraneous argument
'13' ($13 is not referenced)
# ./params.d a b c d e f g h i j k
dtrace: failed to compile script ./params.d: line 5: failed to resolve a:
Unknown variable name
The defaultargs option to the dtrace(1M) command sets the values of
$1, $2, and so on to zero when the user does not supply the
corresponding arguments when invoking the script. The $$1, $$2, and so
on references become empty strings when the user does not supply those
arguments. Options can be specified on the dtrace(1M) command line as
an argument to the -x option. The following examples show these
features:
# cat -n args.d
2 BEGIN
3 {
4 x = 5;
5 }
6
7 tick-2sec
8 {
9 x = x + $1;
10 name = $$2;
11 }
12
13 tick-11sec
14 {
15 printf("x: %d\n", x);
16 printf("name: %s\n", name);
17 exit(0);
18 }
# ./args.d 2 foo
x: 15
name: foo
# ./args.d
dtrace: failed to compile script args.d: line 10: macro argument $1 is
not defined
# dtrace -x defaultargs -qs args.d
x: 5
name:
# dtrace -x defaultargs -qs args.d 2 3 4
dtrace: failed to compile script args.d: line 20: extraneous argument '4'
($3 is not referenced)
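The rules for resolving macro arguments, and the arithmetic behind the args.d output, can be summarized in a short C sketch. The helpers macro_int(), macro_str(), and args_d_x() are hypothetical: a missing argument is reported as -1 here, where dtrace(1M) instead refuses to compile the script, and the sketch does not model the other checks shown earlier, such as extraneous arguments or a non-numeric value used where $n is expected. args_d_x() assumes that tick-2sec fires five times before tick-11sec ends the script, which matches the x: 15 output for an argument of 2.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helpers mimicking how $n and $$n resolve for a script run
 * with argv[1] .. argv[argc - 1] as its arguments (argv[0] is like $0).
 * A missing argument returns -1 unless defaultargs is set, in which case
 * $n becomes 0 and $$n becomes the empty string.
 */
int macro_int(int argc, char *argv[], int n, int defaultargs, long *val)
{
    if (n < argc) {
        *val = strtol(argv[n], NULL, 10);   /* assumes a numeric argument */
        return 0;
    }
    if (!defaultargs)
        return -1;      /* dtrace: "macro argument $n is not defined" */
    *val = 0;
    return 0;
}

int macro_str(int argc, char *argv[], int n, int defaultargs, const char **val)
{
    if (n < argc) {
        *val = argv[n];
        return 0;
    }
    if (!defaultargs)
        return -1;
    *val = "";
    return 0;
}

/*
 * Reproduces the args.d arithmetic, assuming tick-2sec fires five times
 * (at about 2, 4, 6, 8, and 10 seconds) before tick-11sec prints x.
 */
long args_d_x(long arg1)
{
    long x = 5;                 /* set in the BEGIN clause */

    for (int t = 2; t < 11; t += 2)
        x += arg1;              /* x = x + $1 in the tick-2sec clause */
    return x;
}
```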
PID Argument Example

The following example passes the PID of a running vi process to the
syscalls2.d D script. You use the pgrep command to determine the PID
of the vi process. The D script terminates when the vi command exits.
# cat -n syscalls2.d
2
3 syscall:::entry
4 /pid == $1/
5 {
6 @[probefunc] = count();
7 }
8 syscall::rexit:entry
9 /pid == $1/
10 {
11 exit(0);
12 }
# pgrep vi
2208
# ./syscalls2.d 2208
rexit 1
setpgrp 1
creat 1
getpid 1
open 1
lstat64 1
stat64 1
fdsync 1
unlink 2
close 2
alarm 2
lseek 3
sigaction 5
ioctl 45
read 143
write 178
Executable Name Argument Example

In the following example the ancestors.d D script is modified to make it
more general. Remember that this script was created because the
processes involved were too short-lived for a ptree command to be
executed on them. The modified script can retrieve the ancestry back to
the great-great-great-grandparent of any process you catch making any
specified system call. The $$1 reference supplies the first command-line
argument as a quoted string, and the unquoted $2 reference supplies the
system call name in the probe description.
# cat -n ancestors2.d
2 syscall::$2:entry
3 /execname == $$1/
4 {
5 printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
6 printf("parent: %s\n", curthread->t_procp->p_parent->p_user.u_psargs);
7 printf("grandparent: %s\n",
8 curthread->t_procp->p_parent->p_parent->p_user.u_psargs);
9 printf("greatgrandparent: %s\n",
10 curthread->t_procp->p_parent->p_parent->p_parent->p_user.u_psargs);
11 printf("greatgreatgrandparent: %s\n",
12 curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
13 printf("greatgreatgreatgrandparent: %s\n",
14 curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent->p_user.u_psargs);
15 exit(0);
16 }
# ./ancestors2.d nsgmls brk
process: /usr/lib/sgml/nsgmls -gl -m/usr/share/lib/sgml/locale/C/dtds/catalog -E0 /usr/s
parent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
grandparent: /bin/sh /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
greatgrandparent: sh -c cd /usr/share/man; /usr/lib/sgml/sgml2roff /usr/share/man/sman2/fork.2
greatgreatgrandparent: catman
greatgreatgreatgrandparent: bash
You can run the same script with a different process name and system
call, which shows the power of being able to pass in arguments to a D
script:
# ./ancestors2.d vi sigaction
process: vi /etc/system
parent: bash
grandparent: -sh
greatgrandparent: /usr/sbin/in.telnetd
greatgreatgrandparent: /usr/lib/inet/inetd start
greatgreatgreatgrandparent: /sbin/init
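The ancestors2.d script fails when an ancestor does not exist, as the
following cron example shows. The ancestors3.d D script fixes this
problem by testing each parent pointer in a predicate before printing it: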
# ./ancestors2.d cron read
dtrace: error on enabled probe ID 1 (ID 10: syscall::read:entry): invalid
address (0x0) in action #4
# cat -n ancestors3.d
2
3 syscall::$2:entry
4 /execname == $$1/
5 {
6 printf("process: %s\n", curthread->t_procp->p_user.u_psargs);
7 nextpaddr = curthread->t_procp->p_parent;
8 }
9
10 syscall::$2:entry
11 /(execname == $$1) && nextpaddr/
12 {
13 printf("parent: %s\n", nextpaddr->p_user.u_psargs);
14 nextpaddr = curthread->t_procp->p_parent->p_parent;
15 }
16
17 syscall::$2:entry
18 /(execname == $$1) && nextpaddr/
19 {
20 printf("grandparent: %s\n", nextpaddr->p_user.u_psargs);
21 nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent;
22 }
23
24 syscall::$2:entry
25 /(execname == $$1) && nextpaddr/
26 {
27 printf("greatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
28 nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent;
29 }
30
31 syscall::$2:entry
32 /(execname == $$1) && nextpaddr/
33 {
34 printf("greatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
35 nextpaddr = curthread->t_procp->p_parent->p_parent->p_parent->p_parent->p_parent;
36 }
37
38 syscall::$2:entry
39 /(execname == $$1) && nextpaddr/
40 {
41 printf("greatgreatgreatgrandparent: %s\n", nextpaddr->p_user.u_psargs);
42 exit(0);
43 }
# ./ancestors3.d cron read
process: /usr/sbin/cron
parent: /sbin/init
grandparent: sched
parent: /sbin/init
grandparent: sched
parent: /sbin/init
grandparent: sched
parent: /sbin/init
grandparent: sched
^C
Custom Monitoring Tools

The vminfo, sysinfo, and io providers are intended mainly for further
investigating potential problems shown by the output of the existing
Solaris monitoring tools such as vmstat(1M), sar(1), mpstat(1M), and
iostat(1M). The following two examples show that you can also use
these providers to create custom versions of the existing monitoring tools.
They also show the arithmetic capabilities of the D language.
Example of a Custom Tool Resembling the sar -c Command
The following D script uses the sysinfo provider to implement a tool
similar to the sar -c command.
# cat -n sar-c.d
2 /*
3 * Usage: ./sar-c.d interval count
4 */
5
6 BEGIN
7 {
8 printf("%10s %10s %10s %10s %10s %10s %10s\n", "scall/s",
9 "sread/s", "swrit/s", "fork/s", "exec/s", "rchar/s", "wchar/s");
10 rchar = 0;
11 wchar = 0;
12 }
13
14 syscall:::entry
15 {
16 ++scall;
17 }
18
19 sysinfo:::sysread
20 {
21 ++sread;
22 }
23
24 sysinfo:::syswrite
25 {
26 ++swrit;
27 }
28
29 sysinfo:::sysfork
30 {
31 ++fork;
32 }
33
34 sysinfo:::sysexec
35 {
36 ++exec;
37 }
38
39 sysinfo:::readch
40 {
41 rchar = rchar + arg0;
42 }
43
44 sysinfo:::writech
45 {
46 wchar = wchar + arg0;
47 }
48
49 tick-1sec
50 {
51 ++i;
52 }
53
54 tick-1sec
55 /i == $1/
56 {
57 ++n;
58 printf("%10d %10d %10d %10d %10d %10d %10d\n", scall/i,
59 sread/i, swrit/i, fork/i, exec/i, rchar/i, wchar/i);
60 i = 0;
61 scall = 0;
62 sread = 0;
63 swrit = 0;
64 fork = 0;
65 exec = 0;
66 rchar = 0;
67 wchar = 0;
68 }
69
70 tick-1sec
71 /n == $2/
72 {
73 exit(0);
74 }
# ./sar-c.d 5 6
scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
43 0 0 0 0 0 15
70 1 2 0 0 1 32
42 2 2 0 0 2 17
75 0 1 0 0 351 39
436 26 34 3 3 3329 317
38 0 0 0 0 0 15
Example of a Custom Tool Resembling the vmstat(1M) Command
The following D script uses the vminfo provider to implement a tool
similar to the vmstat(1M) command. It displays three fields from the
vmstat(1M) command:
free field Displays the system's average value of freemem in
kilobytes
re field Displays the average page reclaims per second
sr field Displays the average page scans per second performed by
the page daemon

# cat -n vm.d
2 /*
3 * Usage: vm.d interval count
4 */
5
6 BEGIN
7 {
8 printf("%8s %8s %8s\n", "free", "re", "sr");
9 }
10
11 tick-1sec
12 {
13 ++i;
14 @free["freemem"] = sum(8*`freemem);
15 }
16
17 vminfo:::pgrec
18 {
19 ++re;
20 }
21
22 vminfo:::scan
23 {
24 ++sr;
25 }
26
27 tick-1sec
28 /i == $1/
29 {
30 normalize(@free, $1);
31 printa("%8@d ", @free);
32 printf("%8d %8d\n", re/i, sr/i);
33 ++n;
34 i = 0;
35 re = 0;
36 sr = 0;
37 clear(@free);
38 }
39
40 tick-1sec
41 /n == $2/
42 {
43 exit(0);
44 }
# ./vm.d 5 12
free re sr
385296 0 0
385296 0 0
385296 0 0
385296 0 0
316180 2 0
22297 1 19040
1976 2 31727
1964 3 31727
1971 2 31727
1968 3 31727
1964 3 31727
1955 4 31728
Like the vmstat(1M) command, the vm.d script expects two arguments:
the interval value and a count value. The i, re, sr, and n variables are D
global scalar variables used for counting. Note the special reference to the
kernel's freemem variable: `freemem. The script multiplies freemem by 8
because it sums in units of kilobytes, not pages, and assumes that a page
is 8 Kbytes in size. That assumption holds on SPARC systems but not on
x86 systems, where the base page size is 4 Kbytes. The script uses the
sum() aggregation with the normalize() built-in function, which divides
the current sum by the interval value to get per-second averages. The
script also clears the running sum of freemem every interval with the
clear() built-in function. The printa() built-in function, which is covered
in detail in Appendix A, prints the value of the sum() aggregation.
Because the script uses integer arithmetic, which truncates any fractional
part, you can lose some data. This is also true when using the vmstat(1M)
command. For example, if there are only four page reclaims in the
five-second interval, then the average per second shows as 0. This output
shows that the system is experiencing sustained scanning of memory by
the page daemon, as indicated by the consistently high number of scans
per second. It also shows that most of the free memory was consumed
within a short period of time, which explains the high scan rates.
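The truncation described above is ordinary integer division. The following C sketch (per_second(), per_second_rounded(), and pages_to_kb() are hypothetical helpers) reproduces the vm.d arithmetic for the reclaim count and the freemem conversion, and adds a rounded alternative, which vm.d does not use, for comparison.

```c
/*
 * Hypothetical helpers reproducing the vm.d arithmetic.  Integer division
 * discards the fractional part, as D does when vm.d computes re/i and sr/i.
 */
long per_second(long total, long interval)
{
    return total / interval;
}

/* A rounded alternative (not what vm.d does), for comparison. */
long per_second_rounded(long total, long interval)
{
    return (total + interval / 2) / interval;   /* assumes total >= 0 */
}

/* vm.d multiplies freemem by 8, assuming 8-Kbyte pages; here it is a parameter. */
long pages_to_kb(long pages, long page_kb)
{
    return pages * page_kb;
}
```

With four reclaims in a five-second interval, per_second(4, 5) is 0, as the text describes, while per_second_rounded(4, 5) is 1. Either way the fractional part is lost; rounding changes only the direction of the error.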

Module 3
Debugging Applications With DTrace
Objectives
Use DTrace to profile an application
Use DTrace to access application variables
Use DTrace to find transient system call errors in an application
Use DTrace to determine the names of files being opened
Relevance
Discussion: The following questions are relevant to understanding how
to use DTrace for application debugging:
Would it be useful to follow the software stack sequentially from the
application into the kernel?
Would it be useful to display path names being passed to system
calls while an application is running?
Would it be useful to know where an application is spending the
majority of its time?
Additional resources: Solaris Dynamic Tracing Guide, part number
817-6223-10.
Debugging Applications With DTrace 3-3

Application Profiling
DTrace provides tools for understanding the behavior of user processes. It
can help you to:
Debug applications
Analyze application performance problems
Understand the behavior of a complex application
These tools can be used alone to determine the cause of problems with
application program behavior, or as an adjunct to traditional debugging
tools such as the mdb(1) debugger.
This module describes the DTrace facilities used to trace user process
activity. It also provides examples of how to use those facilities.
The pid Provider

The pid provider can trace the entry and return of any function in a user
application. It can also trace any instruction of the running application as
specified by its virtual address, which can be given numerically or as a
function name plus offset. The pid provider imposes no probe effect
(no overhead) when its probes are not enabled.
The pid provider defines a class of providers; any process can have its
own associated pid provider. You trace a process with process
identification number (PID) 1234, for example, by using the pid1234
provider.
Unlike most other providers, the pid provider creates probes on demand
based on the probe descriptions found in your D programs. As a result,
you do not see any pid probes listed in the output of the dtrace -l
command until you have enabled them. This is shown in the following
example:
# dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
proc
profile
sched
sdt
syscall
sysinfo
vminfo
#
Enabling pid Probes
In the following example, you enable all of the function entry probes for
the shell:
# echo $$
8586
# dtrace -n 'pid8586:::entry'
dtrace: description 'pid8586:::entry' matched 6653 probes
^C
# dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
pid8586
proc
profile
sched
sdt
syscall
sysinfo
vminfo

Naming pid Probes
The module portion of the probe description refers to an object loaded in
the address space of the corresponding process. You can list the objects
using the mdb(1) debugger, as shown in the following example:
# mdb -p 8586
Loading modules: [ ld.so.1 libc.so.1 ]
> ::objects
BASE LIMIT SIZE NAME
10000 a4000 94000 /usr/bin/bash
ff3b0000 ff3da000 2a000 /lib/ld.so.1
ff350000 ff37a000 2a000 /lib/libcurses.so.1
ff320000 ff32c000 c000 /lib/libsocket.so.1
ff200000 ff290000 90000 /lib/libnsl.so.1
ff3a0000 ff3a2000 2000 /lib/libdl.so.1
ff100000 ff1d2000 d2000 /lib/libc.so.1
ff2d0000 ff2d4000 4000 /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
> $q
You name the object using only the file name portion, not the complete
path name. You can also omit the suffixes. The following names describe
the same probe:
pid8586:libc.so.1:strcmp:entry
pid8586:libc.so:strcmp:entry
pid8586:libc:strcmp:entry
For the executable load object, use either the file name of the executable or
a.out. The following two probe descriptions name the same probe:
pid8586:bash:main:return
pid8586:a.out:main:return
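The naming rule can be illustrated with a small C sketch. It does not reproduce how DTrace matches module names internally. It reduces a load-object path, as printed by ::objects, to the shortest form accepted in a probe description, and then assembles a pid probe description. The function names short_module_name() and pid_probe() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: reduce a load-object path such as "/lib/libc.so.1"
 * to the short module name used in a pid probe description ("libc").
 */
void short_module_name(const char *path, char *out, size_t outlen)
{
    assert(path != NULL && out != NULL && outlen > 0);

    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;  /* keep the file name portion */

    size_t n = strlen(base);
    const char *so = strstr(base, ".so");
    if (so != NULL)
        n = (size_t)(so - base);              /* drop ".so" and any version */

    snprintf(out, outlen, "%.*s", (int)n, base);
}

/* Hypothetical helper: build "pid<PID>:<module>:<function>:<name>". */
void pid_probe(int pid, const char *module, const char *func,
               const char *name, char *out, size_t outlen)
{
    snprintf(out, outlen, "pid%d:%s:%s:%s", pid, module, func, name);
}
```

For example, short_module_name("/lib/libc.so.1", ...) yields libc. pid_probe(8586, "libc", "strcmp", "entry", ...) yields pid8586:libc:strcmp:entry, which is the third of the equivalent descriptions listed above.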
Tracing Library Functions
The following example shows that executing a simple date(1) command
in the bash shell results in 14 strcmp function calls. You run the dtrace
command from one bash shell (PID 8577, as echo $$ shows) and trace a
different bash shell (PID 8567), in which you run the date command:
# ps -ef | grep bash
root 8567 8561 0 07:36:26 pts/1 0:00 bash
root 8577 8571 0 07:37:03 pts/2 0:00 bash
root 8586 8580 0 07:37:31 pts/3 0:01 bash
root 8888 8577 0 14:14:25 pts/2 0:00 grep bash
# echo $$
8577
# dtrace -n 'pid8567:libc:strcmp:entry'
dtrace: description 'pid8567:libc:strcmp:entry' matched 1 probe
0 45136 strcmp:entry
Tracing User Functions
The simplest mode of operation for the pid provider is as the user-level
analogue to the fbt provider. The following example traces all function
entries and returns that occur while a given function is active. The
tracecalls.d D script takes two command-line arguments: $1 for the PID of
the process being traced, and $2 for the name of the function from which
you want to trace all function calls. The script traces the following simple
C program, which calls a chain of functions that perform simple
arithmetic:
# cat calls.c
#include <stdio.h>
#include <unistd.h>

int f5(int a, int b)
{
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    return(r);
}

int f3(int a)
{
    int r;

    usleep(650);
    r = f4(a-3, a+3);
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    usleep(90);
    r = f2(a-b);
    return(r);
}

int main(void)
{
    int x;

    x = f1(13,6);
    printf("%d\n", x);
    x = f1(17,5);
    printf("%d\n", x);
    return (0);
}
# gcc calls.c -o calls
# calls
83
133
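# cat tracecalls.d
/* Start tracing when the function named by $2 is entered. */
pid$1:calls:$2:entry
{
    self->trace = 1;
}

/* Stop tracing when that function returns. */
pid$1:calls:$2:return
/self->trace/
{
    self->trace = 0;
}

/* Empty action: the default action records each matching probe. */
pid$1:calls::entry,
pid$1:calls::return
/self->trace/
{
}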
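You can model the effect of the self->trace flag in plain C. In the following sketch, which is a hypothetical single-thread simulation and not DTrace code, an array of entry and return events for one thread is walked. The script's three clauses are applied to each event in the order in which they appear in tracecalls.d. The names struct event and simulate_tracecalls() are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* One traced event: a function entry (is_entry = 1) or return (is_entry = 0). */
struct event {
    int is_entry;
    const char *func;
};

/*
 * Hypothetical single-thread simulation of tracecalls.d. For each event,
 * the three clauses are evaluated in the order they appear in the script:
 *   1. entry to the target function sets the flag;
 *   2. return from the target function clears the flag (if set);
 *   3. any entry or return is printed while the flag is set.
 * Returns the number of events printed.
 */
int simulate_tracecalls(const struct event *ev, int nev, const char *target)
{
    int trace = 0;      /* plays the role of self->trace */
    int printed = 0;

    for (int i = 0; i < nev; i++) {
        int is_target = (strcmp(ev[i].func, target) == 0);

        if (ev[i].is_entry && is_target)            /* clause 1 */
            trace = 1;
        if (!ev[i].is_entry && is_target && trace)  /* clause 2 */
            trace = 0;
        if (trace) {                                /* clause 3 */
            printf("%s %s\n", ev[i].is_entry ? "->" : "<-", ev[i].func);
            printed++;
        }
    }
    return printed;
}
```

For example, the sequence main entry, f1 entry, f2 entry, f2 return, f1 return, main return, traced with target f1, prints three events: the f1 entry and the f2 entry and return. The f1 return is suppressed because clause 2 runs before clause 3.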
You start the calls application in a second window through the mdb(1)
debugger. This enables you to stop it as soon as possible in the start-up
function that calls the main() function. The _start:b command sets a
breakpoint in the _start function where the application starts running.
The :r command starts the process running; it immediately hits the
breakpoint and stops. You then use the ! shell escape to run the ps
command from within the debugger and find the PID of the calls process:
# mdb calls
> _start:b
> :r
mdb: stop at _start
mdb: target stopped at:
_start: clr %fp
> !ps
PID TTY TIME CMD
8916 pts/3 0:00 ps
8914 pts/3 0:00 calls
8586 pts/3 0:01 bash
8915 pts/3 0:00 sh
8580 pts/3 0:00 sh
8913 pts/3 0:00 mdb
You can now run the dtrace command in the first terminal window to
trace the function calls, starting with the f1 function. You must also
continue the process with the :c mdb command after starting the dtrace
command:
# dtrace -F -s tracecalls.d 8914 f1
dtrace: script 'tracecalls.d' matched 16 probes
In the second terminal window, you continue the process:
> :c
83
133
mdb: target has terminated
> $q

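The call sequence is shown in the first (dtrace) terminal window. The
return from f1 itself does not appear, because the f1 return clause, which
comes earlier in the script, clears self->trace before the generic return
clause is evaluated: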

CPU FUNCTION
0 -> f1
0 -> f2
0 -> f3
0 -> f4
0 -> f5
0 <- f5
0 <- f4
0 <- f3
0 <- f2
0 -> f1
0 -> f2
0 -> f3
0 -> f4
0 -> f5
0 <- f5
0 <- f4
0 <- f3
0 <- f2
^C
Tracing Function Arguments
By adding a line to the tracecalls.d script, you can print the arguments
to the functions as well as return value information. Arguments to
functions are represented with arg0, arg1, arg2, and so on. The function
return value is placed in the arg1 argument, with the arg0 argument
containing the offset within the function where the return occurred. The
following D script example prints the arguments to functions:
# cat tracecalls2.d
pid$1:calls:$2:entry
{
    self->trace = 1;
}

pid$1:calls:$2:return
/self->trace/
{
    self->trace = 0;
}

/* On entry, arg0 and arg1 are the first two arguments; on return,
   arg0 is the return offset and arg1 is the return value. */
pid$1:calls::entry,
pid$1:calls::return
/self->trace/
{
    printf("%d %d", arg0, arg1);
}
# dtrace -F -s tracecalls2.d 8944 f1
dtrace: script 'tracecalls2.d' matched 16 probes
CPU FUNCTION
0 -> f1 13 6
0 -> f2 7 7
0 -> f3 35 35
0 -> f4 32 38
0 -> f5 32 38
0 <- f5 40 70
0 <- f4 56 83
0 <- f3 68 83
0 <- f2 52 83
0 -> f1 17 5
0 -> f2 12 12
0 -> f3 60 60
0 -> f4 57 63
0 -> f5 57 63
0 <- f5 40 120
0 <- f4 56 133
0 <- f3 68 133
0 <- f2 52 133
^C
The following commands are entered in the mdb(1) window that started
the calls program. On return from a function, the arg0 argument is the
offset within the function of the restore instruction that executed to leave
the function, and the arg1 argument is the return value. The following
commands display the instructions at those offsets:
> f5+0t40/i
f5+0x28:
f5+0x28: restore
> f5+0x24,2/i
f5+0x24:
f5+0x24: ret
f5+0x28: restore
> f2+0t48,2/i
f2+0x30:
f2+0x30: ret
f2+0x34: restore
>
The f5+0t40 address represents 40 decimal bytes into the f5 function,
which the trace output shows was placed in the arg0 argument when the
f5 function returned. For arg1, the return value from the f5 function on
the first return was 70; on the second return it was 120. The f5+0x24,2/i
command in the mdb(1) debugger displays two instructions starting at
address f5+0x24. SPARC functions typically return with this pair of
instructions: ret, followed by restore, which executes in the delay slot of
ret. All SPARC instructions are four bytes in length. Similarly, the f2
return probe reported an offset of 52 (0x34), and the f2+0t48,2/i command
shows that f2+0x34 is also a restore instruction.
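The conversion between the decimal offset in arg0 and the address form that mdb(1) prints is simple arithmetic. The following C sketch uses the hypothetical helper name return_site(). It formats an offset the way the f5+0x28 display does and computes the instruction's index within the function, assuming fixed 4-byte SPARC instructions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SPARC_INSN_BYTES 4  /* every SPARC instruction is 4 bytes long */

/*
 * Hypothetical helper: format the decimal offset reported in a pid return
 * probe's arg0 as "func+0x<hex>" (the form mdb displays) and return the
 * zero-based index of that instruction within the function.
 */
unsigned long return_site(const char *func, unsigned long offset,
                          char *out, size_t outlen)
{
    assert(offset % SPARC_INSN_BYTES == 0);   /* offsets are instruction-aligned */
    snprintf(out, outlen, "%s+0x%lx", func, offset);
    return offset / SPARC_INSN_BYTES;
}
```

With the offsets from the trace above, return_site("f5", 40, ...) produces f5+0x28 (instruction 10), and return_site("f2", 52, ...) produces f2+0x34 (instruction 13).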
Tracing Calls Into the Kernel
In the following example, you trace a simpler version of the calls
program into the kernel:
# cat calls2.c
#include <stdio.h>

int f5(int a, int b)
{
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    return(r);
}

int f3(int a)
{
    int r;

    r = f4(a-3, a+3);
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    r = f2(a-b);
    return(r);
}

int main(void)
{
    int x;

    x = f1(13,6);
    printf("%d\n", x);
    return (0);
}
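# cat traceall.d
#!/usr/sbin/dtrace -s
#pragma D option flowindent

/* Start tracing when the function named by $2 is entered. */
pid$1::$2:entry
{
    self->trace = 1;
}

/* Trace every user function and every kernel (fbt) function while active;
   print K if the thread is in a system call, U otherwise. */
pid$1:::entry, pid$1:::return, fbt:::
/self->trace/
{
    printf("%s\n", curlwpsinfo->pr_syscall ? "K" : "U");
}

/* Stop tracing when the named function returns. */
pid$1::$2:return
/self->trace/
{
    self->trace = 0;
}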
The traceall.d D script uses the #pragma D option flowindent statement,
the equivalent of the -F option of the dtrace(1M) command, to indent the
function calls. The curlwpsinfo built-in variable points to the lwp
information structure of the current thread. Its pr_syscall field is 0 when
the thread is not executing a system call; otherwise, it holds the number of
the system call in progress. You use this field to indicate whether you are
tracing user code (U) or kernel code (K). The indication is approximate:
kernel functions that run late in post_syscall(), after pr_syscall has been
cleared, are marked U in the output.
Many of the traced calls set up the dynamic binding of library functions
on their first call. The following example shows a portion of the output of
this script:
# traceall.d 12861 main
CPU FUNCTION
0 -> main U
0 -> f1 U
0 -> f2 U
0 -> f3 U
0 -> f4 U
0 -> f5 U
0 <- f5 U
0 <- f4 U
0 <- f3 U
0 <- f2 U
0 <- f1 U
0 -> elf_rtbndr U
0 -> elf_bndr U
0 -> enter U
0 -> rt_bind_guard U
0 <- rt_bind_guard U
0 -> _ti_bind_guard U
0 <- _ti_bind_guard U
0 -> rt_mutex_lock U
0 <- rt_mutex_lock U
0 -> _lwp_mutex_lock U
0 <- _lwp_mutex_lock U
0 <- enter U
0 -> lookup_sym U
0 -> elf_hash U
0 <- elf_hash U
0 -> callable U
0 <- callable U
0 -> elf_find_sym U
0 -> strcmp U
...
0 <- elf_bndr U
0 <- elf_rtbndr U
0 -> printf U
0 -> _flockget U
0 -> mutex_lock U
0 <- mutex_lock U
0 -> mutex_lock_impl U
0 <- mutex_lock_impl U
0 <- _flockget U
0 -> _setorientation U
0 <- _setorientation U
0 -> _ndoprnt U
0 -> elf_rtbndr U
0 -> elf_bndr U
0 -> enter U
0 -> rt_bind_guard U
...
0 -> _write U
0 -> pre_syscall K
0 -> syscall_mstate K
0 <- syscall_mstate K
0 <- pre_syscall K
0 -> write32 K
0 <- write32 K
0 -> write K
0 -> getf K
0 -> set_active_fd K
...
0 <- clear_active_fd K
0 -> cv_broadcast K
0 <- cv_broadcast K
0 <- releasef K
0 <- write K
0 -> post_syscall K
0 -> clear_stale_fd U
0 <- clear_stale_fd U
0 -> syscall_mstate U
0 <- syscall_mstate U
0 <- post_syscall U
0 <- _xflsbuf U
0 -> ferror_unlocked U
0 <- ferror_unlocked U
0 <- _ndoprnt U
0 -> ferror_unlocked U
0 <- ferror_unlocked U
0 -> mutex_unlock U
0 <- mutex_unlock U
0 <- printf U
0 <- main U
^C

Tracing Arbitrary Instructions
You can use the pid provider to trace any instruction in any user function.
On demand, the pid provider creates a probe for every instruction in a
function. The name of each probe is the offset in hexadecimal of the
corresponding instruction in the function. The following example traces
the instruction 10 (hexadecimal) bytes into the strcmp function while the
bash shell runs the date(1) command:
# dtrace -n 'pid28845:libc:strcmp:10'
dtrace: description 'pid28845:libc:strcmp:10' matched 1 probe
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
0 39492 strcmp:10
^C
This instruction, near the beginning of the strcmp C library function,
executes 14 times when the bash shell runs the date(1) command. You can
see which instructions within the strcmp C library function are executed
by tracing all of the function's instructions, as follows:
# dtrace -n 'pid28845:libc:strcmp:'
dtrace: description 'pid28845:libc:strcmp:' matched 128 probes
0 39495 strcmp:0
0 39496 strcmp:4
0 39497 strcmp:8
0 39498 strcmp:c
0 39492 strcmp:10
0 39499 strcmp:14
0 39500 strcmp:18
0 39511 strcmp:44
0 39512 strcmp:48
0 39513 strcmp:4c
0 39582 strcmp:160
0 39583 strcmp:164
0 39584 strcmp:168
0 39585 strcmp:16c
0 39586 strcmp:170
0 39587 strcmp:174
0 39588 strcmp:178
0 39589 strcmp:17c
0 39597 strcmp:19c
0 39598 strcmp:1a0
0 39599 strcmp:1a4
0 39600 strcmp:1a8
0 39601 strcmp:1ac
0 39602 strcmp:1b0
0 39603 strcmp:1b4
0 39604 strcmp:1b8
0 39605 strcmp:1bc
0 39606 strcmp:1c0
0 39607 strcmp:1c4
0 39618 strcmp:1f0
0 39619 strcmp:1f4
0 39493 strcmp:return
0 39495 strcmp:0
0 39496 strcmp:4
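The probe count in the previous listing is consistent with one probe per 4-byte instruction plus the entry and return probes. The following C sketch lists instruction-probe names for a function of a given size. The size 0x1f8 used in the example is an assumption inferred from the last instruction probe shown (strcmp:1f4); it is not read from the binary. list_insn_probes() and insn_probe_name() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INSN_BYTES 4   /* assumption: fixed-length 4-byte SPARC instructions */

/*
 * Hypothetical sketch: print "func:<offset>" for every instruction probe
 * of a function that is func_size bytes long. Probe names are offsets in
 * hexadecimal without a 0x prefix. Returns the number of instruction
 * probes; the entry and return probes are not included.
 */
int list_insn_probes(const char *func, unsigned long func_size)
{
    int count = 0;
    for (unsigned long off = 0; off < func_size; off += INSN_BYTES) {
        printf("%s:%lx\n", func, off);
        count++;
    }
    return count;
}

/* Format a single instruction probe name, e.g. offset 16 -> "10". */
void insn_probe_name(unsigned long off, char *out, size_t outlen)
{
    assert(off % INSN_BYTES == 0);
    snprintf(out, outlen, "%lx", off);
}
```

Under that assumption, 0x1f8 / 4 gives 126 instruction probes. Adding strcmp:entry and strcmp:return gives the 128 probes that dtrace reported.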
The previous output shows the strcmp function executing instructions
sequentially through strcmp+0x18, after which execution continues at
strcmp+0x44. You can display some of the assembly instructions using
the mdb(1) debugger:
# mdb -p 8567
> libc`strcmp,14/ai
libc.so.1`strcmp:
libc.so.1`strcmp: subcc %o0, %o1, %o2
libc.so.1`strcmp+4: be +0xac <libc.so.1`strcmp+0xb0>
libc.so.1`strcmp+8: sethi %hi(0x1010000), %o5
libc.so.1`strcmp+0xc: andcc %o0, 3, %o3
libc.so.1`strcmp+0x10: or %o5, 0x101, %o5
libc.so.1`strcmp+0x14: be +0x30 <libc.so.1`strcmp+0x44>
libc.so.1`strcmp+0x18: sll %o5, 7, %o4
libc.so.1`strcmp+0x1c: sub %o3, 4, %o3
libc.so.1`strcmp+0x20: ldub [%o1 + %o2], %o0
libc.so.1`strcmp+0x24: ldub [%o1], %g1
libc.so.1`strcmp+0x28: subcc %o0, %g1, %o0
libc.so.1`strcmp+0x2c: bne +0x1c4 <libc.so.1`strcmp+0x1f0>
libc.so.1`strcmp+0x30: addcc %o0, %g1, %g0
libc.so.1`strcmp+0x34: be +0x1bc <libc.so.1`strcmp+0x1f0>
libc.so.1`strcmp+0x38: addcc %o3, 1, %o3
libc.so.1`strcmp+0x3c: bne -0x1c <libc.so.1`strcmp+0x20>
libc.so.1`strcmp+0x40: add %o1, 1, %o1
libc.so.1`strcmp+0x44: andcc %o1, 3, %o3
libc.so.1`strcmp+0x48: be +0x118 <libc.so.1`strcmp+0x160>
libc.so.1`strcmp+0x4c: cmp %o3, 2
The instruction at strcmp+0x18 is a shift left logical (sll) instruction in
the delay slot of the conditional branch (be) at strcmp+0x14. The
delay-slot instruction executes before the branch target at strcmp+0x44,
even when the branch is taken, as it was in this execution. Another
conditional branch, the be at strcmp+0x48, was also taken: its delay-slot
instruction at strcmp+0x4c executed, and execution then continued at
strcmp+0x160.
DTrace enables you to trace, instruction by instruction, the actual
execution flow through the logic of a program. This is an improvement
over the traditional debugging techniques of inserting print statements in
your application or of running the application under a debugger and
setting breakpoints where appropriate.
Determining Time Spent in Functions
Using a thread-local associative array and the quantize() aggregating
function, you can determine the amount of time spent in every function of
an application. The following D script displays a power-of-two
distribution of how much time (in nanoseconds) is spent in every function
of the calls application. A clause-local variable, this->elapsed, holds the
elapsed time:
# cat timespent.d
#!/usr/sbin/dtrace -qs

/* Record the entry time of each function, per thread. */
pid$1:::entry
{
    self->t[probefunc] = timestamp;
}

/* On return, aggregate the elapsed time into a power-of-two histogram. */
pid$1:::return
/self->t[probefunc]/
{
    this->elapsed = timestamp - self->t[probefunc];
    @[probefunc] = quantize(this->elapsed);
    self->t[probefunc] = 0; /* frees memory */
}
# ./timespent.d 8950
^C
...
usleep
1048576 | 0
2097152 |@@@@@@@@@@ 1
4194304 |@@@@@@@@@@ 1
8388608 |@@@@@@@@@@@@@@@@@@@@ 2
16777216 | 0
...
f4
16384 | 0
32768 |@@@@@@@@@@@@@@@@@@@@ 1
65536 |@@@@@@@@@@@@@@@@@@@@ 1
131072 | 0
...
f1
4194304 | 0
8388608 |@@@@@@@@@@@@@@@@@@@@ 1
16777216 |@@@@@@@@@@@@@@@@@@@@ 1
33554432 | 0
...
main
16777216 | 0
33554432 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
67108864 | 0
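In a quantize() histogram, each row is labeled with a power of two. A measurement is counted in the row with the largest power of two that does not exceed it. The following C sketch reproduces that rule for non-negative values. quantize_bucket() is a hypothetical name, and the sketch ignores the negative buckets that quantize() also provides.

```c
#include <stdint.h>

/*
 * Sketch of power-of-two bucketing for non-negative values: return the
 * label of the row in which a quantize() histogram would count value.
 * Zero has its own row, labeled 0.
 */
uint64_t quantize_bucket(uint64_t value)
{
    if (value == 0)
        return 0;
    uint64_t bucket = 1;
    while (bucket <= value / 2)   /* doubling cannot overflow past value */
        bucket *= 2;
    return bucket;
}
```

For example, an elapsed time of 3,000,000 ns (a made-up value) falls in the 2097152 row, which covers 2097152 through 4194303 ns.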
The profile Provider

The profile provider provides unanchored probes: probes that are not
associated with any particular point of execution. When you specify these
probes, you leave off both the module and the function portion of the
probe description. Instead of being tied to a specific program location, the
profile probes are associated with an asynchronous, time-based
interrupt that fires at a fixed, specified time interval. You can use these
probes to sample an aspect of system state at the specified interval. For
example, you can sample the state of the current thread, the state of a
central processing unit (CPU), or the current machine instruction. You can
then use the samples to infer system behavior.

Using the profile-n Probes
A profile-n probe fires at a fixed interval on every CPU at high
interrupt level. These probes are suited to profiling the execution of an
application because you do not know which CPU it is running on at any
instant. The profile-n probes fire n times per second. You can add the
following suffixes to change the time units: ns for nanoseconds, us for
microseconds, ms for milliseconds, s for seconds, m for minutes, h for
hours, or d for days. For example, the following probes fire at the same
rate:
profile-200 Fires 200 times per second on every CPU
profile-5ms Fires every 5 milliseconds on every CPU
profile-5000us Fires every 5000 microseconds on every CPU
The following probes fire once per day:
profile-1d
profile-24h
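The rate and interval forms are related by simple arithmetic. The following C sketch is not DTrace's own parser. It converts the n part of a profile-n or tick-n probe name into a period in nanoseconds. probe_period_ns() is a hypothetical helper, and it handles only the suffixes listed above plus sec, which the tick-10sec probe later in this module uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Sketch: convert the n of profile-n / tick-n to a period in nanoseconds.
 * A bare number is a rate in firings per second. Returns -1 for a
 * malformed spec or an unknown suffix.
 */
long long probe_period_ns(const char *spec)
{
    assert(spec != NULL);

    char *end;
    long long n = strtoll(spec, &end, 10);
    if (end == spec || n <= 0)
        return -1;

    if (*end == '\0')                         /* e.g. profile-200: rate in Hz */
        return NSEC_PER_SEC / n;
    if (strcmp(end, "ns") == 0) return n;
    if (strcmp(end, "us") == 0) return n * 1000LL;
    if (strcmp(end, "ms") == 0) return n * 1000000LL;
    if (strcmp(end, "s") == 0 || strcmp(end, "sec") == 0)
        return n * NSEC_PER_SEC;
    if (strcmp(end, "m") == 0)  return n * 60 * NSEC_PER_SEC;
    if (strcmp(end, "h") == 0)  return n * 3600 * NSEC_PER_SEC;
    if (strcmp(end, "d") == 0)  return n * 86400 * NSEC_PER_SEC;
    return -1;
}
```

With this helper, 200, 5ms, and 5000us all give 5000000 ns. A rate of 211 gives 4739336 ns (about 4.7 ms), and a rate of 1009 gives 991080 ns, slightly under 1 ms.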
The following command should print timestamps that increase by
approximately one million nanoseconds (one millisecond) from line to line:
# dtrace -q -n 'profile-1ms {printf("%d\n", timestamp)}'
274817618640560
274817619628282
274817620626998
274817621624780
274817622624686
^C
Currently you cannot specify a time interval less than 200 microseconds
with the profile provider, as the following example shows:
# dtrace -q -n 'profile-199us {printf("%d\n", timestamp)}'
dtrace: invalid probe specifier profile-199us {printf("%d\n",
timestamp)}: probe description :::profile-199us does not match any probes
# dtrace -q -n 'profile-200us {printf("%d\n", timestamp)}'
275328143837997
275328144030602
275328144229696
275328144431022
^C

Sampling Process Activity
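The following D script samples 109 times per second to see which
processes are running. The predicate pid != 0 skips samples taken while a
CPU is running kernel threads or is idle, both of which belong to process 0.
The counts show which processes were running most often while the script
ran: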
# cat running.d
#!/usr/sbin/dtrace -qs

/* Sample 109 times per second on every CPU, skipping process 0. */
profile-109
/pid != 0/
{
    @[pid, execname] = count();
}

END
{
    printf("%-8s %-40s %s\n", "PID", "CMD", "COUNT");
    printa("%-8d %-40s %@d\n", @);
}
# ./running.d
^C
PID CMD COUNT
9190 grep 1
9191 bash 1
9190 bash 1
9189 bash 1
9188 uptime 2
8586 bash 2
9191 vi 12
3 fsflush 24
9192 find 80
You can use profile-n probes to sample information about a specific
process. The following script samples the priority of the shell thread 211
times per second, that is, about every 4.7 milliseconds (slightly more often
than every 5 milliseconds), while the shell runs an infinite loop:
# echo $$
8586
# while : ; do : ; done

In another window, run the following D script:

# cat profilepri.d
#!/usr/sbin/dtrace -qs
/* 211 times per second, record the priority of the thread of process $1. */
profile-211
/pid == $1/
{
    @[execname] = lquantize(curlwpsinfo->pr_pri, 0, 100, 10);
}
# ./profilepri.d 8586
^C
bash
< 0 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@ 271
10 |@@@@@@ 63
20 |@@@@ 48
30 |@@@ 32
40 |@ 15
50 |@@ 24
60 | 0
In the previous example, the curlwpsinfo built-in variable points to a
structure (lwpsinfo_t) containing information about the current lwp. This
structure is described in the proc(4) manual page. The output shows the
bias of the Solaris timesharing scheduler toward lower priorities (toward
zero) for compute-bound threads. The high counts indicate that this thread
was running much more often than other threads on the system.
The following output is from a second run of the script while the shell is
in its more typical mode of executing a few interactive commands:
# ./profilepri.d 8586
^C
bash
30 | 0
40 |@@@@@@@@@@@@@ 1
50 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
60 | 0
This shows that the shell's priority is higher when it runs interactively,
spending most of its time waiting for input; the small counts indicate that
it was rarely running.
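profilepri.d uses lquantize() rather than quantize(), so its rows are linear. The following C sketch uses the hypothetical name lquantize_bucket(). It computes the lower bound of the row in which lquantize(value, lo, hi, step) counts a value. The values it returns for the underflow and overflow rows (lo - 1 and hi) are conventions of this sketch, not DTrace output.

```c
#include <assert.h>

/*
 * Sketch of linear bucketing for lquantize(value, lo, hi, step).
 * Returns the lower bound of the row; values below lo are reported as
 * lo - 1 (the "< lo" row) and values at or above hi as hi (the ">= hi" row).
 */
int lquantize_bucket(int value, int lo, int hi, int step)
{
    assert(step > 0 && lo < hi);
    if (value < lo)
        return lo - 1;
    if (value >= hi)
        return hi;
    return lo + ((value - lo) / step) * step;
}
```

With lo = 0, hi = 100, and step = 10, as in profilepri.d, a priority of 47 is counted in the 40 row, and a priority of 100 or more would be counted in the overflow row.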
Using the tick-n Probes
Like profile-n probes, tick-n probes fire at a fixed interval at high
interrupt level. However, a tick-n probe fires on only one CPU per
interval, rather than on every CPU like a profile-n probe. Do not use these
probes to profile an application, because the application may run on any
CPU at any instant. You specify the n suffix just as you do for the
profile-n probes. For example, tick-20ms fires every 20 milliseconds, but
only on one CPU. One use of the tick-n probes is to provide periodic
output or to take periodic action. You saw this usage in Module 2 with the
custom monitoring tools.
Using Arguments to the profile Provider
You can use the arguments to the profile probes to determine whether the
executing thread is currently in kernel mode and, if it is not, where within
its process address space it is executing when the probe fires. The value
of the program counter (PC) register is made available when the profile
probes fire. The arguments are set as follows:
The arg0 argument: the PC register value in the kernel at the time the
probe fired, or 0 if the current thread was not executing in the kernel at
the time that the probe fired
The arg1 argument: the PC register value in the user-level process at the
time the probe fired, or 0 if the current thread was executing in the kernel
at the time the probe fired
Profiling an Application Using the profile Provider
You can learn whether your application is executing within its own
process address space or within the kernel by using the arg0 and arg1
arguments of the profile probes. The following D script samples the PC
1009 times per second, slightly more often than once per millisecond, and
runs for 10 seconds against a compute-bound application. It also reports
how many of all the samples taken during those 10 seconds found the
application running:
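# cat profile.d
#!/usr/sbin/dtrace -qs

/* Count every sample on every CPU. */
profile-1009
{
    ++t;
}

/* For samples that catch process $1, record the user PC (arg1) and
   whether the thread was in kernel or user mode. */
profile-1009
/pid == $1/
{
    @pc[arg1] = count();
    @mode[arg0 ? "kernel" : "user"] = count();
    ++n;
}

/* After 10 seconds, print the PC counts and the totals, then exit. */
tick-10sec
/n/
{
    printa("%-10x\t%@u\n", @pc);
    printf("Total: %u out of %u\n", n, t);
    exit(0);
}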
# ./profile.d 9240
ff3163ac 1
0 5
107f8 60
10810 60
10710 64
1084c 64
10754 65
10734 66
10824 69
1083c 69
10738 69
1081c 71
10820 73
106f4 75
10730 75
10728 76
10744 77
10814 77
1074c 79
106e4 79
106d8 79
1075c 80
10770 80
10828 80
1072c 82
10760 83
106f0 86
10758 86
106dc 87
106d4 88
106d0 92
ff2a11e8 132
10764 134
20ac8 137
20acc 141
ff2a11ec 142
10840 144
20ac4 147
10834 172
106cc 306
106e0 562
10714 611
107fc 623
ff2a11e4 716
ff2a11e0 3723
Total: 9887 out of 10002
kernel 5
user 9882
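The mode split and the running share in this output follow directly from arg0 and the two counters. The following C sketch mirrors the @mode[arg0 ? "kernel" : "user"] aggregation over an array of samples. It also computes a share in hundredths of a percent using integer arithmetic. struct sample, count_modes(), and share_hundredths() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One profile-probe sample: arg0 = kernel PC, arg1 = user PC. */
struct sample {
    uint64_t arg0;
    uint64_t arg1;
};

/* Count samples the way @mode[arg0 ? "kernel" : "user"] does:
   a nonzero arg0 means the thread was in the kernel. */
void count_modes(const struct sample *s, int n, int *kernel, int *user)
{
    *kernel = 0;
    *user = 0;
    for (int i = 0; i < n; i++) {
        if (s[i].arg0 != 0)
            (*kernel)++;
        else
            (*user)++;
    }
}

/* Share of part in total, in hundredths of a percent, rounded down. */
long share_hundredths(long part, long total)
{
    assert(total > 0);
    return part * 10000 / total;
}
```

Applied to the totals above, share_hundredths(9887, 10002) gives 9885, meaning that the process was running for about 98.85% of the samples. share_hundredths(9882, 9887) gives 9994, meaning that about 99.94% of those samples were in user mode.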
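In the previous example, nearly all of the samples that caught the
process (9882 of 9887) were taken in user mode, and the process was
running for 9887 of the 10002 samples, which indicates that this process
is compute-bound in its own code. By using the mdb(1) debugger as
shown in the following example, you can tell where the process is
spending most of its time: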
> ff2a11e0/i
libc.so.1`.umul:
libc.so.1`.umul: umul %o0, %o1, %o0
> ff2a11e4/i
libc.so.1`.umul+4: rd %y, %o1
> 107fc/i
mod+0x34: cmp %o0, %o1
> 10714/i
prod+0x1c: cmp %o0, %o1
> 106e0/i
sum+0x14: add %o0, %o1, %o0

This output shows that the process spent most of its time in .umul, the C
library's unsigned multiply routine. It spent most of the remaining time in
its own mod, prod, and sum functions. The programmer should investigate
compiler options that generate multiply instructions inline instead of
calling the .umul routine for every multiplication. This program was
compiled with the gcc compiler without optimization.

Determining Time Spent in Functions
You can use the timespent2.d D script to obtain a graph of the time
spent in each function of this process. A special macro, $target, is set to
the process ID of the application that is started for you with the -c option
to the dtrace(1M) command. The command after the -c must be quoted
if it contains arguments:
# cat timespent2.d
/* Record the entry time of each function, per thread. */
pid$target:::entry
{
    self->t[probefunc] = timestamp;
}

/* On return, aggregate the elapsed time into a power-of-two histogram. */
pid$target:::return
/self->t[probefunc]/
{
    this->elapsed = timestamp - self->t[probefunc];
    @[probefunc] = quantize(this->elapsed);
    self->t[probefunc] = 0; /* frees memory */
}
# dtrace -s timespent2.d -c ./pgm
dtrace: script 'timespent2.d' matched 5836 probes
^C
...
.rem
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15
16384 | 0
memchr
4096 | 0
8192 |@@@@@@@@@@@@@ 5
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 11
32768 | 0
.div
2048 | 0
4096 |@@@@@@@ 5
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 24
16384 |@ 1
32768 | 0
mutex_lock
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
32768 | 0
...
sum
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13986319
16384 | 15890
32768 | 419
65536 | 14174
131072 | 426
262144 | 282
524288 | 59
1048576 | 57
2097152 | 24
...
prod
17179869184 | 0
34359738368 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14
68719476736 |@@@ 1
137438953472 | 0
...
.umul
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 27699230
16384 | 37144
32768 | 943
65536 | 30290
131072 | 864
262144 | 579
524288 | 111
1048576 | 157
2097152 | 74
4194304 | 3

This output shows that nearly every call to the sum and .umul functions
takes only 8 to 16 microseconds (the 8192-nanosecond row), but these
functions are called far more often than the other functions. The prod
function took between 34 and 68 seconds on 14 of the calls shown and
between 68 and 137 seconds on the remaining call.
Finally, the following command builds a table of which functions of an
application are called the most frequently:
# dtrace -n 'pid$target:::entry {@[probefunc] = count()}' -c ./pgm
dtrace: description 'pid$target:::entry ' matched 2931 probes
^C
...
main 1
hdl_create 1
elf_entry_pt 1
unused 1
rtld_db_postinit 1
call_init 1
munmap 1
...
printf 3
.rem 3
mod 3
free 4
prod 4
defrag 4
strncpy 5
plt_full_range 5
strlen 5
...
strcmp 39
rt_bind_clear 42
sum 3549598
.umul 6249598

Application Variables
Accessing process address space information is more difficult than
accessing kernel information because DTrace actions run in the kernel.
Therefore, to access process data such as application variables or system
call argument strings (for example, path names), you must copy the
information from the process address space to the kernel. DTrace provides
two built-in functions to accomplish this:
void *copyin(uintptr_t addr, size_t size)
The copyin function copies the specified size in bytes from the
specified user address into a DTrace scratch buffer and returns the
address of this buffer. The user address is interpreted as being within
the address space of the process associated with the currently
running thread when the probe fires.
string copyinstr(uintptr_t addr)
The copyinstr function copies a null-terminated C string from the
specified user address into a scratch buffer and returns its address.
Displaying Process Global Variables

The following example shows how to display global variables from an
application when a probe fires. Two global variables have been added to
the calls.c C program you saw previously:
# cat calls3.c
#include <stdio.h>
#include <unistd.h>

int y = 15;
int z = 8;

int f5(int a, int b)
{
    ++z;
    return (a+b);
}

int f4(int a, int b)
{
    int r;

    r = f5(a,b)+13;
    y = z+r;
    return(r);
}

int f3(int a)
{
    int r;

    usleep(650);
    r = f4(a-3, a+3);
    z = r*y;
    return(r);
}

int f2(int a)
{
    return(f3(5*a));
}

int f1(int a, int b)
{
    int r;

    usleep(90);
    r = f2(a-b);
    y = z*r;
    return(r);
}

int main(void)
{
    int x;

    x = f1(13,6);
    printf("x=%d y=%d z=%d\n", x, y, z);
    x = f1(17,5);
    printf("x=%d y=%d z=%d\n", x, y, z);
    return (0);
}
# calls3
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
The following D script is passed three arguments:
$1 The virtual address of a global variable
$2 The global variable's size, in bytes
$$3 The name of the variable (the $$ prefix makes D treat the argument as a string)
You have dtrace(1M) start the process by using the -c option, and
dtrace(1M) sets the $target macro to the PID of that process. The script
displays the value of the global variable on entry to and return from every
function in the executable that is called once the main function has been
entered.
# cat uservariables.d
/* Start reporting once main() has been entered. */
pid$target:a.out:main:entry
{
    started = 1;
}

/* On every function entry and return, copy in $2 bytes at address $1
   and print them as an int. */
pid$target:a.out::entry
/started/
{
    v = (int *)copyin($1, $2);
    printf("On entry to %s: %s=%d\n", probefunc, $$3, *v);
}

pid$target:a.out::return
/started/
{
    v = (int *)copyin($1, $2);
    printf("On return from %s: %s=%d\n", probefunc, $$3, *v);
}

/* Stop when main() returns. */
pid$target:a.out:main:return
{
    exit(0);
}
The (int *) in front of the copyin function is called a cast, a feature
taken from the C language. A cast converts one data type into another
data type. In this case, the data type is converted from void *, which is the
type of the buffer address into which the variable is copied, to an integer
pointer, because you are copying in an integer. The * in front of the v
variable in the printf statements dereferences the pointer, yielding the
integer to which it points.
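The same cast-then-dereference pattern can be shown in plain C. In the following sketch, copy_to_scratch() is a hypothetical stand-in for copyin(). It copies bytes within a single process into a heap buffer, whereas DTrace copies from the traced process's address space into a scratch buffer in the kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for copyin(): copy size bytes from addr into a
 * newly allocated buffer and return its address as an untyped void *.
 */
void *copy_to_scratch(const void *addr, size_t size)
{
    void *buf = malloc(size);
    if (buf != NULL)
        memcpy(buf, addr, size);
    return buf;
}

/* Read an int through the copy, mirroring *(int *)copyin(addr, size). */
int read_int(const int *global)
{
    int *v = (int *)copy_to_scratch(global, sizeof *global);  /* the cast */
    assert(v != NULL);
    int value = *v;                                           /* dereference */
    free(v);
    return value;
}
```

In C, the explicit (int *) cast from void * is optional. It is written here to mirror the D script.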
The nm(1) command is used to display the symbol table entry for the z
variable in the calls3 executable file.
# /usr/ccs/bin/nm calls3 | grep '|z$'
[70] | 133952| 4|OBJT |GLOB |0 |16 |z
# dtrace -qs uservariables.d -c calls3 133952 4 z
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
On entry to main: z=8
On entry to f1: z=8
On entry to f2: z=8
On entry to f3: z=8
On entry to f4: z=8
On entry to f5: z=8
On return from f5: z=9
On entry to f1: z=7636
On return from main: z=1033410
You can easily display the y variable, as follows:
# /usr/ccs/bin/nm calls3 | grep '|y$'
[67] | 133948| 4|OBJT |GLOB |0 |16 |y
# dtrace -qs uservariables.d -c calls3 133948 4 y
x=83 y=633788 z=7636
x=133 y=137443530 z=1033410
On entry to main: y=15
On entry to f1: y=15
On return from f5: y=15
On return from main: y=137443530
Displaying Library Global Variables
The following example displays, every 211 milliseconds, the value of one
of the errno variables in the libraries linked with the bash shell while that
shell runs an infinite loop of cd commands that fail. The expectation is that
errno is set to 2 (ENOENT, No such file or directory) when the cd
commands fail:
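# cat libvars.d
#!/usr/sbin/dtrace -qs

/* Every 211 ms, if process $1 is on the CPU where the tick fires,
   copy in $3 bytes at address $2 and print them as an int. */
tick-211ms
/pid == $1/
{
    v = (int *)copyin($2, $3);
    printf("The value of %s=%d\n", $$4, *v);
}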
# ps -ef | grep bash
root 9593 9587 0 15:35:27 pts/2 0:00 bash
root 9583 9577 0 15:35:04 pts/1 0:00 bash
# echo $$
9593
# mdb -p 9583
> ::objects
BASE LIMIT SIZE NAME
10000 b2000 a2000 /usr/bin/bash
ff3b0000 ff3dc000 2c000 /lib/ld.so.1
ff350000 ff37a000 2a000 /lib/libcurses.so.1
ff320000 ff32c000 c000 /lib/libsocket.so.1
ff200000 ff292000 92000 /lib/libnsl.so.1
ff3a0000 ff3a2000 2000 /lib/libdl.so.1
ff100000 ff1d4000 d4000 /lib/libc.so.1
ff2d0000 ff2d4000 4000 /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
> ::nm ! grep '|errno$'
0xff3ee670|0x00000004|OBJT |LOCL |0x2 |21 |errno
0xff1ec03c|0x00000004|OBJT |GLOB |0x0 |21 |errno
> $q
# ./libvars.d 9583 0xff3ee670 4 errno
The value of errno=2
^C
# ./libvars.d 9583 0xff1ec03c 4 errno
^C
The libvars.d D script was run while the bash shell performed the
following loop:
# while :; do cd /fubar; done
bash: cd: /fubar: No such file or directory
This shows that the first errno at address 0xff3ee670 is the one set as a
result of the cd command failing in the bash shell. The No such file
or directory error message corresponds to an errno value of 2.
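The address arguments in these examples use two notations. uservariables.d received decimal values copied from nm(1) output, and libvars.d received hexadecimal values copied from mdb(1) output. Both are plain integers. The following C sketch uses the hypothetical helper parse_address(), which accepts either notation by letting strtoul() detect the 0x prefix.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch: parse an address written as a C integer constant, so that
 * "133952" (decimal, as nm prints it) and "0x20b40" (hexadecimal, as mdb
 * prints it) give the same value. Base 0 also treats a leading 0 as octal.
 * Returns 0 for malformed input.
 */
unsigned long parse_address(const char *s)
{
    assert(s != NULL);
    char *end;
    unsigned long addr = strtoul(s, &end, 0);
    return (end != s && *end == '\0') ? addr : 0;
}
```

For example, parse_address("133952") and parse_address("0x20b40") return the same value. The address of y (133948) is 4 bytes below that of z (133952), matching the 4-byte size that nm reports.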

The plockstat Provider

The plockstat provider gives you details about user-level locking
events. You identify the process to be traced in the same way as with the
pid provider. For example, plockstat1234 traces user-level lock events for
the process with PID 1234. The three types of lock events are hold events,
contention events, and error events. Hold events occur when a lock is
acquired or released; contention events occur when an application thread
must wait for a lock; error events report errors detected in lock
operations. The following example shows how to monitor all lock events
for a particular process:
# pgrep sendmail
1196
# dtrace -n 'plockstat1196::: {trace(timestamp)}'
dtrace: description 'plockstat1196::: ' matched 39 probes
0 51440 lmutex_lock:mutex-acquire 1523449860253331
0 51460 lmutex_unlock:mutex-release 1523449860271845
^C
The next example monitors readers/writer lock activity for the vold
process. The -p option to dtrace(1M) attaches to a running process and
sets the $target macro to its PID:
# pgrep vold
1098
# dtrace -n 'plockstat$target:::rw* {trace(timestamp)}' -p 1098
dtrace: description 'plockstat$target:::rw* ' matched 11 probes
0 51474 rwlock_lock:rw-block 1529287107214473
0 51494 rwlock_lock:rw-acquire 1529287107231728
0 51496 __rw_unlock:rw-release 1529287107252733
The plockstat(1M) command is a DTrace consumer that uses the
plockstat provider to show detailed application lock usage information.
The plockstat(1M) command is comparable to the lockstat(1M)
command, which shows detailed contention information for kernel locks.

Transient System Call Errors

The following D program displays pertinent information whenever any
process's system call fails. A failing system call returns a value of -1,
which is placed in the arg0 argument when a syscall return probe fires.
The predicate excludes system call errors made by the dtrace command
itself by comparing the PID of the process whose system call failed with
$pid, the PID of the dtrace process. When a system call fails, the C library
interface returns -1 and sets a global user variable named errno to a
positive error code. The D built-in errno variable holds the same error code
for the current thread, so the following script can print it directly. The
errno values are documented in the Intro(2) manual page and in the
/usr/include/sys/errno.h header file.
# cat errno.d
#!/usr/sbin/dtrace -qs

syscall:::return
/arg0 == -1 && pid != $pid/
{
    printf("%-20s %-10s %d\n", execname, probefunc, errno);
}
# ./errno.d
svc.startd portfs 62
nscd lwp_park 62
fmd lwp_park 62
bash stat64 2
bash chdir 2
bash chdir 2
bash stat64 2
nscd lwp_kill 3
find open 2
find stat 2
bash setpgrp 13
bash waitsys 10
date open 2
date stat 2
ls open 2
ls stat 2
bash setpgrp 13
bash waitsys 10
nscd lwp_kill 3
^C
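Seen from the application, each line of errno.d output corresponds to a call like the one in the following C sketch. open_errno() is a hypothetical helper written for illustration: when open(2) fails it returns -1, and the C library leaves the reason in errno. The value 2 reported above for find, date, and ls is ENOENT (No such file or directory), which has the same value on Solaris and Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Tries to open path read-only. Returns 0 on success; otherwise open(2)
 * has returned -1 and the C library has stored the reason in errno,
 * which is returned here (ENOENT, value 2, for a missing file). */
int open_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```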

User Stack Traces on System Call Failures

By using the ustack() built-in DTrace function, you can also display a
stack trace of the application code that issued the failed system call:
# cat errno2.d
#!/usr/sbin/dtrace -qs

syscall:::return
/arg0 == -1 && pid != $pid/
{
    printf("\n%-20s %-10s %d", execname, probefunc, errno);
    ustack();
}
# ./errno2.d
bash setpgrp 13
libc.so.1`_syscall6+0x1c
35c6c
34fa8
bash`execute_command_internal+0x414
bash`execute_command+0x50
bash`reader_loop+0x220
bash`main+0x90c
bash`_start+0x108
libc.so.1`_portfs+0x4
svc.startd`wait_thread+0x30
libc.so.1`_lwp_start
libc.so.1`_portfs+0x4
svc.startd`wait_thread+0x30
libc.so.1`_lwp_start
bash waitsys 10
libc.so.1`_waitid+0x8
libc.so.1`waitpid+0x60
410a0
41004
libc.so.1`__sighndlr+0xc
libc.so.1`call_user_handler+0x3b8
libc.so.1`__lwp_sigmask+0x30
libc.so.1`pthread_sigmask+0x1b4
libc.so.1`sigprocmask+0x20

bash`make_child+0x254
35c6c
34fa8
bash`main+0x90c
bash`_start+0x108
bash stat64 2
libc.so.1`stat64+0x4
bash`sh_canonpath+0x258
63638
bash`cd_builtin+0x364
352a0
35a8c
34fc8
bash`main+0x90c
bash`_start+0x108
find open 2
ld.so.1`__open+0x4
ld.so.1`elf_config+0x120
ld.so.1`setup+0xc20
ld.so.1`_setup+0x37c
ld.so.1`_rt_boot+0x88
Hexadecimal addresses appear in the stack trace output when the
dtrace command cannot resolve the PC value to a symbol. To find which
transient system call errors a specific application encounters, and
where, change the predicate in the errno2.d script to match the PID of
that application (for example, /arg0 == -1 && pid == $1/) and pass the
PID as an argument to the script.

Processes Using a Lot of System Time

Suppose you saw the following prstat(1M) command output:
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
12663 root 1104K 672K run 0 0 0:00:13 47% unknown/1
12662 root 4736K 4392K cpu0 59 0 0:00:00 0.2% prstat/1
278 root 2976K 1832K sleep 59 0 0:00:15 0.0% nscd/23
9593 root 2840K 2096K sleep 59 0 0:00:01 0.0% bash/1
12577 root 2808K 2056K sleep 59 0 0:00:00 0.0% bash/1
478 root 4696K 1312K sleep 59 0 0:00:21 0.0% sendmail/1
451 root 10M 5016K sleep 59 0 0:00:09 0.0% snmpd/1
517 root 2016K 472K sleep 59 0 0:00:00 0.0% ttymon/1
434 root 3624K 1464K sleep 59 0 0:00:00 0.0% snmpXdmid/2
9584 root 4520K 2200K sleep 59 0 0:00:00 0.0% in.telnetd/1
422 root 2280K 824K sleep 59 0 0:00:00 0.0% snmpdx/1
426 root 4920K 1168K sleep 59 0 0:00:00 0.0% dtlogin/1
439 root 2968K 1584K sleep 59 0 0:00:00 0.0% vold/3
476 root 2032K 720K sleep 59 0 0:00:00 0.0% ttymon/1
433 root 3048K 1032K sleep 59 0 0:00:00 0.0% dmispd/1
353 root 1872K 136K sleep 59 0 0:00:00 0.0% smcboot/1
339 root 1200K 472K sleep 59 0 0:00:01 0.0% utmpd/1
329 root 1560K 488K sleep 59 0 0:00:00 0.0% powerd/2
281 root 2616K 1200K sleep 59 0 0:00:00 0.0% inetd/1
265 root 2520K 792K sleep 59 0 0:00:00 0.0% cron/1
251 root 3800K 1432K sleep 59 0 0:00:01 0.0% automountd/3
260 root 3784K 1568K sleep 59 0 0:00:00 0.0% syslogd/13
171 root 2096K 1016K sleep 59 0 0:00:16 0.0% in.routed/1
185 daemon 2424K 584K sleep 59 0 0:00:00 0.0% rpcbind/1
189 root 2384K 352K sleep 59 0 0:00:00 0.0% keyserv/2
68 root 3128K 56K sleep 59 0 0:00:00 0.0% picld/4
65 daemon 3544K 1208K sleep 59 0 0:00:00 0.0% kcfd/3
59 root 2368K 152K sleep 59 0 0:00:00 0.0% syseventd/14
Total: 38 processes, 109 lwps, load averages: 0.14, 0.11, 0.09
You can obtain more details on the unknown process by using the
following command:
# prstat -m -p 12663
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
12663 root 43 57 0.0 0.0 0.0 0.0 0.0 0.3 0 129 .12 0 unknown/1

The unknown process is using a lot of system time. The following D
program can determine what system calls are being made:
# dtrace -n 'syscall:::entry /pid == 12663/ { @syscalls[probefunc] = count();}'
^C
read 940592
This process appears to be stuck in an endless loop of read(2) system
calls. The following truss(1) command confirms this, and shows that the
reads are failing:
# truss -p 12663
read(3, 0xFFBFFD0B, 1) Err#89 ENOSYS
...
The errno2.d D script shows further evidence of a runaway loop of
failing read(2) system calls:
# ./errno2.d
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
unknown read 89
libc.so.1`_read+0x8
unknown`main+0x134
unknown`_start+0x5c
^C
# grep 89 /usr/include/sys/errno.h
/* Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T */
* (c) 1983,1984,1985,1986,1987,1988,1989 AT&T.
#define ENOSYS 89 /* Unsupported file system operation */
# pkill unknown
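The source code of the unknown program is not available, so the following C sketch is only a guess at the kind of loop that produces this trace, together with a safer version. A loop that ignores the -1 returned by read(2) calls it forever and burns system time, as prstat reported. The hypothetical drain_fd() function reads one byte at a time like the traced process, but stops at end of file or on any error other than EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Reads fd one byte at a time until end of file.
 *
 * A runaway version would be:  while (read(fd, &c, 1) != 0) ;
 * which never terminates if read(2) keeps returning -1 (for example,
 * with errno set to ENOSYS as in the truss output above).
 *
 * This version returns the number of bytes read, or -1 with errno set
 * if read(2) fails for any reason other than an interrupted call. */
long drain_fd(int fd)
{
    long total = 0;
    char c;

    for (;;) {
        ssize_t n = read(fd, &c, 1);

        if (n == 1)
            total++;
        else if (n == 0)
            return total;              /* end of file */
        else if (errno != EINTR)
            return -1;                 /* report the error instead of spinning */
    }
}
```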

Suppose you saw the following similar prstat(1M) command output:

# prstat -m -p 12745
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
12745 root 17 81 0.0 0.0 0.0 0.0 0.0 1.5 0 132 .5M 0 readchar/1
You can again get details on what system calls are being made, as follows:
# dtrace -n 'syscall:::entry /pid == 12745/ { @syscalls[probefunc] = count();}'
^C
stat 6
open 6
write 6
close 6
read 760747
# truss -p 12745
read(3, "\b", 1) = 1
read(3, "92", 1) = 1
read(3, "10", 1) = 1
read(3, "\0", 1) = 1
read(3, "14", 1) = 1
read(3, " @", 1) = 1
read(3, "\0", 1) = 1
read(3, "82", 1) = 1
read(3, " #", 1) = 1
read(3, "90", 1) = 1
^C
As its name implies, this readchar process is reading a single character at
a time. Now run the iosnoop.d D script from Module 2 to get details on
the disk input/output (I/O):
# ./iosnoop.d
readchar 12745 /usr/lib/nss_ldap.so.1 sd2 R 6.492
readchar 12745 <none> sd2 R 6.398
readchar 12745 /usr/lib/passwdutil.so.1 sd2 R 0.729
readchar 12745 /usr/lib/watchmalloc.so.1 sd2 R 6.622
readchar 12745 /usr/lib/cpp sd2 R 7.896
readchar 12745 <unknown> sd2 R 5.968
readchar 12745 /usr/lib/libz.so.1 sd2 R 2.075
readchar 12745 /usr/lib/libz.so.1 sd2 R 5.438
readchar 12745 /usr/lib/llib-lz sd2 R 7.249
readchar 12745 /usr/lib/llib-lz.ln sd2 R 0.586
readchar 12745 /lib/libm.so.2 sd2 R 4.507
^C
This application appears to be reading all of the files under the /usr/lib
directory one byte at a time. The programmer apparently does not
realize that the standard I/O library functions buffer their reads, which
is far more efficient than issuing a read(2) system call for every
character. The OS is reading the
disk in blocks, as the iosnoop.d D script output indicates, but the
application is only extracting the information from the kernel buffers one
byte at a time.
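The cost difference is easy to quantify. The following C sketch reads a file with a caller-chosen buffer size and counts the read(2) calls it makes; count_bytes() is a hypothetical helper, and a buffer size of 1 reproduces the readchar pattern, while a larger buffer approximates what the standard I/O library does internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

#define MAX_BUF 8192

/* Counts the bytes in path by calling read(2) with a bufsize-byte buffer
 * and stores the number of read(2) calls made in *calls.
 * bufsize == 1 behaves like readchar; a larger buffer is roughly what
 * the standard I/O library does on your behalf.
 * Returns the byte count, or -1 on error. */
long count_bytes(const char *path, size_t bufsize, long *calls)
{
    char buf[MAX_BUF];
    long total = 0;
    int fd;

    if (bufsize == 0 || bufsize > sizeof buf)
        return -1;
    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;
    *calls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);

        (*calls)++;
        if (n <= 0) {                  /* 0 = end of file, -1 = error */
            close(fd);
            return n == 0 ? total : -1;
        }
        total += n;
    }
}
```

For a 10,000-byte file, a 1-byte buffer costs 10,001 read(2) calls (the last one returns 0 at end of file), while an 8-Kbyte buffer needs only 3: 8192 bytes, 1808 bytes, and end of file. Each call is a trip into the kernel, which is where readchar spends most of its time.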

Open Files
In this section you learn how to display the path names of files being
opened. Note that in DTrace it is more difficult to display pointer
arguments passed to system calls than those passed as integer arguments.
Examples of system calls that take pointer arguments are open(2), stat(2),
unlink(2), and chmod(2), which each take path name string arguments.
There are also system calls that pass the address of structures, for
example, sigaction(2). You must use the appropriate copyinstr() and
copyin() built-in functions to display the actual strings or structures being
passed to the kernel.
Accessing System Call Pointer Arguments

Suppose you knew an application was writing out literal strings using the
write(2) system call, as follows:
# cat writemsg.c
#include <unistd.h>

main()
{
    write(1, "This is some text being", 23);
    write(1, " written to standard output", 27);
    write(1, " to prove a point\n", 18);
}
# gcc writemsg.c -o writemsg

# writemsg
This is some text being written to standard output to prove a point
#
You might try to display these strings using the following D script:
# cat write.d
syscall::write:entry
/pid == $target/
{
    printf("%s\n", stringof(arg1));
}
# dtrace -s write.d -c writemsg
dtrace: script 'write.d' matched 1 probe
dtrace: pid 1532 exited with status 1

dtrace: error on enabled probe ID 1 (ID 12: syscall::write:entry): invalid address (0x10000) in action #1
^C
The arg1 argument used in the write.d D script is the second argument
to the write(2) system call, which in this case is the address of the string
you want to display. However, it is an address in the process's address
space, and DTrace runs the action statements in the kernel's address
space. The stringof() built-in function only converts the write(2) system
call argument to the D string type; it does not copy any data from the
process. For the script to work, you must use the copyinstr() or copyin()
built-in DTrace functions mentioned previously.
The following example shows the correct way to access the processs
string arguments:
# cat write2.d
syscall::write:entry
/pid == $target/
{
    printf("%s\n", copyinstr(arg1));
}
# dtrace -s write2.d -c writemsg
dtrace: script 'write2.d' matched 1 probe
0 12 write:entry This is some text being
0 12 write:entry written to standard output
0 12 write:entry to prove a point
The following changes to the D script enable it to work on all system-wide
write(2) system calls (except those issued by the dtrace(1M) command):
# cat write3.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid != $pid/
{
    printf("%s\n", copyinstr(arg1));
}
# ./write3.d
dtrace: script './write3.d' matched 1 probe
ore--ion, name)
iption specifiers (provider, module, func-
e
describes how to use
4maction]]
0 914 write:entry sys61#

./write2.ddwrite2.dted token `newline'
ctory
_________________________________________________________________________
_________________________________________________________________________
_____________________________________________________
0 914 write:entry pys61#

./write2.ddwrite2.dted token `newline'
ctory
_________________________________________________________________________
_________________________________________________________________________
_____________________________________________________
You received garbage output because the buffer passed to write(2) is not
necessarily a null-terminated string, while copyinstr() keeps copying until
it finds a null byte. The copyin() function, which copies a given number
of bytes, is the more appropriate function to use because you can pass it
the size of the write (arg2):
# cat write4.d
#!/usr/sbin/dtrace -s

syscall::write:entry
/pid != $pid/
{
    printf("%s\n", stringof(copyin(arg1, arg2)));
}
# ./write4.d
dtrace: script './write4.d' matched 1 probe
0 914 write:entry p
0 914 write:entry w
0 914 write:entry d
0 914 write:entry
0 914 write:entry /var/dtrace/mod3
0 914 write:entry d
0 914 write:entry a
0 914 write:entry t
0 914 write:entry e
0 914 write:entry
0 914 write:entry Sun Jun 13 16:55:28 MDT 2004
^C
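The difference between the two DTrace functions can be mimicked in user-level C. In the following sketch, copy_counted() is a hypothetical name; it honors an explicit byte count the way copyin(arg1, arg2) does, so it never depends on a null byte being present in the caller's buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-level analogue of stringof(copyin(arg1, arg2)):
 * copies exactly len bytes (the byte count passed to write(2)) and
 * appends the null byte that the original buffer may lack.
 * copyinstr(arg1) instead keeps copying until it meets a null byte,
 * which is why write3.d printed whatever followed each buffer.
 * Returns dst, or NULL if dst cannot hold len bytes plus the null. */
char *copy_counted(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz)
        return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```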
Displaying Names of Files Being Opened

The following example shows how to display the names of files being
opened systemwide:
# cat open.d
#!/usr/sbin/dtrace -qs

syscall::open*:entry
{
    printf("%s opening %s\n", execname, copyinstr(arg0));
}
# ./open.d
init opening /etc/inittab
init opening /etc/svc/volatile/init-next.state
init opening /etc/svc/volatile/init-next.state
init opening /etc/inittab
man opening /var/ld/ld.config
man opening /lib/libc.so.1
man opening /usr/share/man/man.cf
man opening /usr/share/man/windex
man opening /usr/share/man/sman1m/dtrace.1m
sh opening /var/ld/ld.config
sh opening /lib/libc.so.1
more opening /var/ld/ld.config
more opening /lib/libcurses.so.1
more opening /lib/libc.so.1
more opening /usr/share/lib/terminfo//x/xterm
utmpd opening /var/adm/utmpx
utmpd opening /var/adm/utmpx
utmpd opening /proc/12571/psinfo
utmpd opening /proc/9587/psinfo
date opening /var/ld/ld.config
date opening /lib/libc.so.1
date opening /usr/share/lib/zoneinfo/US/Mountain
vi opening /var/ld/ld.config
vi opening /usr/lib/libmapmalloc.so.1
vi opening /lib/libcurses.so.1
vi opening /lib/libc.so.1
vi opening /lib/libgen.so.1
vi opening /usr/share/lib/terminfo//x/xterm
vi opening //.exrc
vi opening /var/tmp/ExTcaqBz
vi opening /var/tmp/ExUcaqBz
vi opening /etc/system
^C
Displaying Path Names When open System Calls Fail
The following example shows how to detect a failed open(2) system call
and display the information you need to diagnose the problem. The PID of
the process to watch is passed to the script as its first argument ($1):
# cat failedopen.d
#!/usr/sbin/dtrace -qs

syscall::open*:entry
/pid == $1/
{
    self->path = copyinstr(arg0);
    self->entry = 1;
}

syscall::open*:return
/self->entry && arg0 == -1/
{
    printf("open for '%s' failed, errno=%d", self->path, errno);
    ustack();
    self->entry = 0;
}
# failedopen.d 13026
open for '/usr/openwin/lib/X11/XtErrorDB' failed, errno=2
febbcf78
febb05a0
fec97b38
fec97a78
fedbbffc
fedbbeac
fedbbe40
fedc0220
fedc037c
fed8fb6c
fed8f2f8
fed8f290
cf3f8
3f648
d1c98
5c658
^C
Displaying a Symbolic Stack Trace
The failedopen.d D script was run on the dtmail graphical user
interface (GUI) utility as it was started up over a telnet session. The
dtrace(1M) command could not translate the addresses in the stack trace
into symbols. This can happen when the application exits before the
dtrace(1M) consumer has a chance to read its symbol table. You can
use the mdb(1) debugger to display the PC locations symbolically:
# mdb /usr/dt/bin/dtmail
> _start:b
> :r
mdb: stop at _start
_start: clr %fp
> !ps
PID TTY TIME CMD
13025 pts/1 0:00 mdb
13027 pts/1 0:00 sh
12571 pts/1 0:00 sh
13026 pts/1 0:00 dtmail
13028 pts/1 0:00 ps
12577 pts/1 0:06 bash
> :c
libSDtMail: Error: Xt Error: Can't open display: 129.150.33.103:0.0
mdb: target has terminated
> 5c658/i
_start+0x108:
_start+0x108: call +0x75618 <main>
> d1c98/i
main+0x28:
main+0x28: jmpl %i1, %o7
> 3f648/i
__0fHRoamAppKinitializePiPPc+0x310:
__0fHRoamAppKinitializePiPPc+0x310: call +0x8fd24
<__0fLApplicationKinitializePiPPc>
> cf3f8/i
__0fLApplicationKinitializePiPPc+0x8c:
__0fLApplicationKinitializePiPPc+0x8c: call +0x52718
<PLT:XtAppInitialize>
> fed8f290/i
libXt.so.4`XtAppInitialize+0x54:
libXt.so.4`XtAppInitialize+0x54:call +0x56800
<PLT:XtOpenApplication>
> fed8f2f8/i
libXt.so.4`XtOpenApplication+0x48: call +0x56774
<PLT:_XtAppInit>
> fed8fb6c/i
libXt.so.4`_XtAppInit+0x138: call +0x553cc <PLT:XtErrorMsg>
> febbcf78/i
libc.so.1`__open+4: ta 8
> febbcf78:b
> :c
mdb: stop at libc.so.1`__open+4
libc.so.1`__open+4: ta 8
> $c
libc.so.1`__open+4(ff2893ec, 2000, 1b6, 38e70, ff3b3508, febe2264)
libc.so.1`open+0x64(ff2893ec, 2000, 1b6, ff3ea0f8, ff3ec46c, 0)
libnsl.so.1`__nsl_fopen+0x8c(ff2893ec, ff2893fc, ff24fbb4, ff3ea0f8,
ff3ec46c, ff2893fc)
libnsl.so.1`getnetlist+0x20(0, 69bcc, ff292690, 0, 0, ff290f30)
libnsl.so.1`setnetconfig+0x38(0, ff294a58, ff292690, 0, 763a8, febea4c0)
libnsl.so.1`__rpc_getconfip+0xd8(ff296ea8, 0, 0, 0, 4144c, 0)
libnsl.so.1`getipnodebyname+0x1c(ffbfef50, 1a, 3, ffbfef3c, 1010101,
57f74)
libsocket.so.1`get_addr+0x158(0, fe8920a0, ffbff0c0, 17700000, 0, 0)
libsocket.so.1`_getaddrinfo+0x710(fe8920a0, 1770, ffbff168, 15950c, 0, 2)
libX11.so.4`_X11TransSocketINETConnect+0x178(1594d0, fe8920a0, ffbff188,
ffbff32c, fed13100, 0)
libX11.so.4`_X11TransConnect+0x58(1594d0, ffbff3e8, 7ffffc00, fe892090,
fed13104, fe982078)
libX11.so.4`_X11TransConnectDisplay+0x6e0(e, 1594d0, 1, ffbff3e8, 0, 0)
libX11.so.4`XOpenDisplay+0xe8(0, fed20bc4, 158f88, ffbffdec, 9ebc4, 0)
libXt.so.4`XtOpenDisplay+0xe4(158190, 0, ffbffdcc, fe982010, 0, 0)
libXt.so.4`_XtAppInit+0xfc(ffbff71c, fe982010, 0, 0, ffbffdcc, ffbff778)
libXt.so.4`XtOpenApplication+0x48(12bc78, fe982010, 0, 0, ffbffdcc,
ffbffdec)
libXt.so.4`XtAppInitialize+0x54(1346f4, fede7638, fede4000, 120008,
54da8, 14e634)
__0fLApplicationKinitializePiPPc+0x8c(12bc68, ffbffdcc, ffbffdec, 0,
1346f4, 134400)
__0fHRoamAppKinitializePiPPc+0x310(12bc68, ffbffdcc, ffbffdec, 14d400, 0,
136000)
main+0x28(136000, 3f338, 12bc68, 12d0cc, 0, 136000)
_start+0x108(0, 0, 0, 0, 0, 0)
>
A breakpoint was set on the C library open function and the dtmail
utility was continued in the debugger to hit the breakpoint. The $c mdb
command was used to display the stack trace symbolically after the
breakpoint hit.
Examining Another Failed open Example
The next example shows the failedopen2.d D script run on the cat(1)
command while it opens a non-existent file. This script assumes that
dtrace(1M) will start the command.
# cat failedopen2.d
syscall::open*:entry
/pid == $target/
{
    self->path = copyinstr(arg0);
    self->entry = 1;
}

syscall::open*:return
/self->entry && arg0 == -1/
{
    printf("open for '%s' failed, errno=%d", self->path, errno);
    ustack();
    self->entry = 0;
}
# dtrace -s failedopen2.d -c "cat /nothing"
dtrace: script 'failedopen2.d' matched 4 probes
cat: cannot open /nothing
0 397 open64:return open for '/nothing' failed, errno=2
libc.so.1`__open64+0x4
libc.so.1`_endopen+0x88
libc.so.1`fopen64+0x1c
cat`main+0x318
cat`_start+0x108
Accessing Structure Members in the sigaction(2) System Call
The sigaction(2) system call passes the kernel the address of a
sigaction structure. To access its members, you must first copy in the
structure using the copyin() DTrace function. The following script shows
how to do this. It uses a clause-local variable to point to the copied-in
sigaction structure.
# cat sigaction.d
#!/usr/sbin/dtrace -qs

/* Skip calls with a null new action; there is no structure to copy in. */
syscall::sigaction:entry
/arg1 != 0/
{
    this->sa_struct = (struct sigaction *)copyin(arg1,
        sizeof (struct sigaction));
    printf("%s called sigaction on signal %d with flags: %x\n",
        execname, arg0, this->sa_struct->sa_flags);
}
# ./sigaction.d
...
tcsh called sigaction on signal 2 with flags: 0
...
vi called sigaction on signal 8 with flags: 2
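For comparison, the following C sketch shows the user-level side of the calls that sigaction.d observes; install_and_query() and on_signal() are hypothetical names. The first sigaction(2) call passes a filled-in struct sigaction. The second passes a null pointer for the new action, which only queries the current disposition; in that case the kernel typically receives a null pointer and there is no structure for copyin() to read. The sa_flags value is a bit mask (for example, SA_RESTART), and the numeric flag values differ between operating systems, so the test checks the bit rather than a number.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical handler; it does nothing. */
static void on_signal(int sig)
{
    (void)sig;
}

/* Installs on_signal for sig with the given sa_flags, then passes a
 * null new-action pointer to query the current disposition.
 * Returns the sa_flags read back from the kernel, or -1 on error. */
int install_and_query(int sig, int flags)
{
    struct sigaction sa, cur;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    if (sigaction(sig, &sa, NULL) == -1)
        return -1;
    if (sigaction(sig, NULL, &cur) == -1)
        return -1;
    return cur.sa_flags;
}
```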

Module 4
Finding System Problems With DTrace
Objectives
Use DTrace to access kernel variables
Use DTrace to obtain information about read calls
Use DTrace to perform anonymous tracing
Use DTrace to perform speculative tracing
Explain the privileges necessary to run DTrace operations
Relevance
The following questions are relevant to understanding how to use DTrace
for finding system problems:
Would the ability to access any kernel variable when a probe fires be
beneficial?
Would it be useful to know who is issuing which type of read calls?
Would it be advantageous to trace device driver code during system
boot?
Would it be beneficial to give regular user accounts access to the
DTrace facility that is limited to user-owned processes?



Accessing Kernel Variables

The DTrace instrumentation executes inside the Solaris Operating
System (Solaris OS) kernel. This means that, in addition to accessing
DTrace variables and probe arguments such as pid and arg1, you can
also access kernel data structures, symbols, and types. These capabilities
allow advanced DTrace users, experienced system administrators, support
service personnel, and driver developers to examine low-level behavior of
the operating system kernel and the device drivers.
Using the D Language to Access Kernel Symbols

The D language uses the backquote character (`) as a scoping operator for
accessing symbols that are defined in the operating system and not in
your D programs. For example, the Solaris kernel contains a C language
declaration of a system tunable named kmem_flags for enabling memory
allocator debugging features. This tunable is declared in C in the kernel
source code as follows:
int kmem_flags;
To display the value of this variable, you can write the D statement:
printf("%x\n", `kmem_flags);
Examining Naming Conflicts
DTrace associates each kernel symbol with the type used for it in the
operating system C code, providing source-based access to the native
operating system data structures. Because kernel symbol names are kept
in a separate namespace from D variables and function identifiers, naming
conflicts are not an issue.
When you prefix a variable with a backquote, the D compiler searches the
known kernel symbols in order, using the list of loaded modules to find a
matching variable definition. Because the Solaris OS kernel supports
dynamically loaded modules with separate symbol namespaces, the same
variable name or function name can be used more than once in the kernel.
You resolve this conflict by preceding the variable or function name with
the kernel module name and the backquote character as a separator. For
example, you refer to the _init(9E) function in the sd module as follows:
sd`_init

You can apply any of the D operators to external kernel variables, except
those that modify values. When you launch DTrace, the D compiler loads
the set of variable names corresponding to active kernel modules, so
declarations of these variables are not required.
Monitoring Kernel Variables

The following D script displays, every five seconds, the value of three
global kernel variables:
The nproc variable Holds the current number of Solaris OS
processes
The nthread variable Holds the current number of Solaris OS
threads
The freemem variable Holds the current amount of system free
memory not owned by the memory allocator, measured in pages
You must precede each reference to these kernel variables with a
backquote character (`), as shown in the following example:
# cat monitor.d
#!/usr/sbin/dtrace -qs

BEGIN
{
    printf("%-14s %-10s %10s\n", "Processes",
        "Threads", "Free Memory");
}

tick-5sec
{
    /* `freemem is a page count; this assumes 8-Kbyte pages (SPARC) */
    printf("%-14d %-10d %9dmb\n", `nproc,
        `nthread, (`freemem*8)/1024);
}
# ./monitor.d
Processes Threads Free Memory
41 232 322mb
42 232 306mb
41 232 322mb
53 242 320mb
47 249 251mb
41 232 252mb
41 232 252mb
41 232 232mb
41 232 111mb

47 235 110mb
47 241 110mb
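The freemem expression in monitor.d is page arithmetic: (`freemem*8)/1024 yields megabytes only if each page is 8 Kbytes, the base page size on SPARC systems. The 322mb in the first line of output therefore corresponds to roughly 41,216 free pages (integer division discards any remainder). The following C sketch makes the page size an explicit parameter; pages_to_mb() and local_pagesize() are hypothetical helper names. On x86 systems, where the base page size is 4 Kbytes, the same 322 Mbytes is 82,432 pages, and the multiplier in monitor.d would need to change accordingly.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Converts a page count, such as the kernel's freemem, to megabytes.
 * monitor.d computes (freemem * 8) / 1024, which is the same formula
 * with the page size fixed at 8 Kbytes. */
long pages_to_mb(long pages, long pagesize)
{
    return (long)((long long)pages * pagesize / (1024 * 1024));
}

/* Page size of the running system, in bytes (-1 on error). */
long local_pagesize(void)
{
    return sysconf(_SC_PAGESIZE);
}
```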
Accessing Kernel Data Structures

When a probe fires, DTrace sets many useful built-in variables. Three of
these variables and their associated data structures are:
The curpsinfo variable Points to a process information structure
The curlwpsinfo variable Points to a lightweight process (LWP)
information structure
The curcpu variable Points to a central processing unit (CPU)
information structure
The first two structures are part of the proc(4) interface and are used by
commands like ps(1) and prstat(1M). These variables provide access to
kernel state information at the time any probe fires. The following
examples define the data structures.
The psinfo Data Structure
The following shows the psinfo data structure:

typedef struct psinfo {
int pr_nlwp; /* number of active lwps in the process */
pid_t pr_pid; /* unique process id */
pid_t pr_ppid; /* process id of parent */
pid_t pr_pgid; /* pid of process group leader */
pid_t pr_sid; /* session id */
uid_t pr_uid; /* real user id */
uid_t pr_euid; /* effective user id */
gid_t pr_gid; /* real group id */
gid_t pr_egid; /* effective group id */
uintptr_t pr_addr; /* address of process */
dev_t pr_ttydev; /* controlling tty device (or PRNODEV) */
timestruc_t pr_start; /* process start time, from the epoch */
char pr_fname[PRFNSZ]; /* name of execed file */
char pr_psargs[PRARGSZ]; /* initial characters of arg list */
int pr_argc; /* initial argument count */
uintptr_t pr_argv; /* address of initial argument vector */
uintptr_t pr_envp; /* address of initial environment vector */
char pr_dmodel; /* data model of the process */
taskid_t pr_taskid; /* task id */
projid_t pr_projid; /* project id */
poolid_t pr_poolid; /* pool id */
zoneid_t pr_zoneid; /* zone id */
} psinfo_t;

The lwpsinfo Data Structure
The following shows the lwpsinfo data structure:

typedef struct lwpsinfo {
int pr_flag; /* lwp flags */
id_t pr_lwpid; /* lwp id */
uintptr_t pr_addr; /* internal address of lwp */
uintptr_t pr_wchan; /* wait addr for sleeping lwp */
char pr_stype; /* synchronization event type */
char pr_state; /* numeric lwp state */
char pr_sname; /* printable character for pr_state */
char pr_nice; /* nice for cpu usage */
short pr_syscall; /* system call number (if in syscall) */
int pr_pri; /* priority, high value is high priority */
char pr_clname[PRCLSZ]; /* scheduling class name */
processorid_t pr_onpro; /* processor which last ran this lwp */
processorid_t pr_bindpro; /* processor to which lwp is bound */
psetid_t pr_bindpset; /* processor set to which lwp is bound */
} lwpsinfo_t;

The cpuinfo Data Structure
The following shows the cpuinfo data structure:

typedef struct cpuinfo {
processorid_t cpu_id; /* CPU identifier */
psetid_t cpu_pset; /* processor set identifier */
chipid_t cpu_chip; /* chip identifier */
lgrp_id_t cpu_lgrp; /* locality group identifier */
processor_info_t cpu_info; /* CPU information */
} cpuinfo_t;
The curthread Variable
Another built-in D variable that is set when a probe fires is the curthread
variable, which you used in the ancestry.d D script in Module 2. The
curthread variable points to the kthread_t kernel structure of the
currently running thread. Using the curthread pointer to access
information in the kthread_t structure (or most other kernel data
structures) provides a less stable interface than using the lwpsinfo_t and
psinfo_t structures. The reason for this is that the psinfo_t and
lwpsinfo_t structures are abstractions of process and thread information
as advertised by the proc(4) interface. In contrast, curthread gets at the
actual kernel implementation of this information which may change. For
more details on the stability of DTrace interfaces, see the Solaris Dynamic
Tracing Guide, part number 817-6223-10. The dtrace(1M) command has a
-v option that will tell you the stability of a D program.
Example D Script Using Data Structures
The following D script uses the psinfo_t and lwpsinfo_t structures to
display thread and process information for any thread that calls a specific
kernel function:
# cat ps.d
#!/usr/sbin/dtrace -qs

BEGIN
{
    printf("TID\tPID\tPPID\tUID\tPRI\tCOMMAND\n");
}

fbt::$1:entry
/pid != $pid && pid != 0/
{
    ++nlines;
    printf("%d\t%d\t%d\t%d\t%d\t%s\n", curlwpsinfo->pr_lwpid,
        curpsinfo->pr_pid, curpsinfo->pr_ppid, curpsinfo->pr_uid,
        curlwpsinfo->pr_pri, curpsinfo->pr_psargs);
}

fbt::$1:entry
/nlines > 20/
{
    printf("TID\tPID\tPPID\tUID\tPRI\tCOMMAND\n");
    nlines = 0;
}
# ./ps.d bdev_strategy
TID PID PPID UID PRI COMMAND
1 4640 4639 0 55 find / -type f
1 4640 4639 0 55 find / -type f
1 4698 4641 0 51 file
/var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-
root/install/
1 4640 4639 0 55 find / -type f
1 4698 4641 0 51 file
/var/sadm/pkg/SUNWfontconfig-root/save/pspool/SUNWfontconfig-
root/install/
^C
# ps.d nanosleep
11 279 1 0 59 /usr/sbin/nscd
12 279 1 0 59 /usr/sbin/nscd
21 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
17 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd
13 279 1 0 59 /usr/sbin/nscd
1 2120 2119 0 59 sleep 5
12 279 1 0 59 /usr/sbin/nscd
11 279 1 0 59 /usr/sbin/nscd
13 279 1 0 59 /usr/sbin/nscd
14 279 1 0 59 /usr/sbin/nscd
15 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd
17 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
21 279 1 0 59 /usr/sbin/nscd
18 279 1 0 59 /usr/sbin/nscd
17 279 1 0 59 /usr/sbin/nscd
16 279 1 0 59 /usr/sbin/nscd

^C
The sched Provider
The sched DTrace provider makes available probes related to thread
scheduling. For example, the on-cpu probe fires when a CPU begins to
execute a thread, and the off-cpu probe fires when a thread is about to be
taken off a CPU.
Note: Refer to the Solaris Dynamic Tracing Guide for details on the probes
provided by the sched provider.
To list the sched probes, use the following command:

# dtrace -l -P sched | awk '{print $NF}' | sort -u
NAME
change-pri
dequeue
enqueue
off-cpu
on-cpu
preempt
remain-cpu
schedctl-nopreempt
schedctl-preempt
schedctl-yield
sleep
surrender
tick
wakeup
The following D script uses the on-cpu sched probe to display the name
of the executable process starting to run on a CPU and the priority of its
thread:
# cat start2run.d
#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid != $pid && pid != 0/
{
    printf("Thread %d from: %s starting on CPU %d at priority %d\n",
        curlwpsinfo->pr_lwpid, curpsinfo->pr_psargs, curcpu->cpu_id,
        curlwpsinfo->pr_pri);
}

# ./start2run.d
Thread 1 from: fsflush starting on CPU 0 at priority 60
Thread 1 from: bash starting on CPU 0 at priority 59
Thread 1 from: pgm starting on CPU 1 at priority 49
Thread 6 from: /lib/svc/bin/svc.startd starting on CPU 0 at priority 59
Thread 1 from: /usr/sfw/sbin/snmpd starting on CPU 1 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 4 from: /usr/lib/picl/picld starting on CPU 2 at priority 59
Thread 2 from: /usr/lib/autofs/automountd starting on CPU 2 at priority 59
Thread 18 from: /usr/sbin/nscd starting on CPU 0 at priority 59
Thread 1 from: /usr/lib/sendmail -bd -q15m starting on CPU 0 at priority 59
The following D script uses the on-cpu sched probe with an aggregation
to display a summary of who has recently been running on what CPU:
# cat whorun.d
#!/usr/sbin/dtrace -qs

sched:::on-cpu
/pid != $pid && pid != 0/
{
    @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
}

END
{
    printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
    printa("%-30s %4d %@6d\n", @);
}
# ./whorun.d
^C
Command CPU Count

/usr/lib/fm/fmd/fmd 1 1
uptime 2 1
find / -name fubar 3 2
/usr/lib/autofs/automountd 2 3
-sh 2 3
-sh 1 3
/usr/lib/picl/picld 1 4
/usr/sbin/nscd 3 8
/usr/sbin/nscd 2 14
/usr/sbin/nscd 0 15
/usr/lib/sendmail -bd -q15m 0 16
ls -lR / 1 18
/usr/sfw/sbin/snmpd 0 18
-sh 3 20
/usr/lib/utmpd 0 20
/usr/lib/sendmail -bd -q15m 2 20
/usr/lib/picl/picld 0 32
fsflush 0 72
/usr/sbin/nscd 1 77
find / -name fubar 1 152
/usr/sbin/vold 2 152
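If aggregations are new to you, the following C sketch models in ordinary single-threaded code what whorun.d computes: a count per (command, CPU) key, printed in ascending order of count as printa() does in the output above. It is only a model; agg_count() and agg_print() are hypothetical names, and real DTrace aggregations are kept in per-CPU buffers in the kernel and combined by the dtrace(1M) consumer, which this sketch does not attempt to reproduce.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 256

/* One aggregation entry: a (command, CPU) key and its count. */
struct agg_entry {
    const char *cmd;   /* stands in for curpsinfo->pr_psargs */
    int cpu;           /* stands in for curcpu->cpu_id */
    long count;
};

static struct agg_entry agg[MAX_KEYS];
static size_t nagg;

/* Like @[cmd, cpu] = count(): increments the entry for this key,
 * creating it on first use. cmd must remain valid (it is not copied).
 * Returns the updated count, or -1 if the table is full. */
long agg_count(const char *cmd, int cpu)
{
    for (size_t i = 0; i < nagg; i++)
        if (agg[i].cpu == cpu && strcmp(agg[i].cmd, cmd) == 0)
            return ++agg[i].count;
    if (nagg == MAX_KEYS)
        return -1;
    agg[nagg++] = (struct agg_entry){ cmd, cpu, 1 };
    return 1;
}

/* Orders entries by ascending count for printing. */
static int by_count(const void *a, const void *b)
{
    const struct agg_entry *x = a, *y = b;

    return (x->count > y->count) - (x->count < y->count);
}

/* Like the END clause of whorun.d: a header, then one line per key. */
void agg_print(void)
{
    qsort(agg, nagg, sizeof agg[0], by_count);
    printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
    for (size_t i = 0; i < nagg; i++)
        printf("%-30s %4d %6ld\n", agg[i].cmd, agg[i].cpu, agg[i].count);
}
```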
Accessing Lock Contention Information

The lockstat provider makes available probes that give information
regarding locking behavior on the system. For example, when the
adaptive-block probe fires, you know that a kernel thread had to wait
for an adaptive mutex, and the arg1 argument tells you how long, in
nanoseconds, the thread slept waiting for the lock's release. This gives
you a sense of how much contention there is for the data (or code) that
the mutex is protecting.
Note: See the Solaris Dynamic Tracing Guide for details on other lockstat
provider probes.

The lockstat Provider Probes
To list the lockstat provider probes, use the following command:

# dtrace -l -P lockstat
467 lockstat genunix mutex_enter adaptive-acquire
468 lockstat genunix mutex_enter adaptive-block
469 lockstat genunix mutex_enter adaptive-spin
470 lockstat genunix mutex_exit adaptive-release
471 lockstat genunix mutex_destroy adaptive-release
472 lockstat genunix mutex_tryenter adaptive-acquire
473 lockstat genunix lock_set spin-acquire
474 lockstat genunix lock_set spin-spin
475 lockstat genunix lock_set_spl spin-acquire
476 lockstat genunix lock_set_spl spin-spin
477 lockstat genunix lock_try spin-acquire
478 lockstat genunix lock_clear spin-release
479 lockstat genunix lock_clear_splx spin-release
480 lockstat genunix CLOCK_UNLOCK spin-release
481 lockstat genunix rw_enter rw-acquire
482 lockstat genunix rw_enter rw-block
483 lockstat genunix rw_exit rw-release
484 lockstat genunix rw_tryenter rw-acquire
485 lockstat genunix rw_tryupgrade rw-upgrade
486 lockstat genunix rw_downgrade rw-downgrade
487 lockstat genunix thread_lock thread-spin
488 lockstat genunix thread_lock_high thread-spin
The following D script displays CPU, thread, process, wait time, and stack
trace information related to a thread blocking on an adaptive mutex:
# cat mutex.d
#!/usr/sbin/dtrace -qs

lockstat:::adaptive-block
{
    printf("\nCPU\tTID\tPID\tUID\tWAIT TIME\tCOMMAND\n");
    printf("%d\t%d\t%d\t%d\t%d\t\t%s\n", curcpu->cpu_id,
        curlwpsinfo->pr_lwpid, curpsinfo->pr_pid,
        curpsinfo->pr_uid, arg1, curpsinfo->pr_psargs);
    stack();
}
Test the mutex.d D script by starting four instances of the readchar user
application, which reads every file in the current directory one byte at a
time using the read(2) system call:
# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&
[1] 2323
[2] 2325
# (cd /usr/lib; /var/dtrace/readchar)& (cd /usr/lib; /var/dtrace/readchar)&

[3] 2327
[4] 2329
# ./mutex.d
^C
# mpstat 2
0 2 0 0 409 307 45 8 0 0 0 65534 13 14 0 73
0 3 0 0 401 301 54 31 0 0 0 103605 21 79 0 0
0 0 0 0 406 305 50 30 0 0 0 100905 20 80 0 0
0 1 0 0 402 302 55 32 0 0 0 104497 21 79 0 0
^C
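The readchar program is not listed in this guide. If you want to reproduce a similar load, the following C sketch is one hypothetical way to get the behavior described above: it reads every regular file in the current directory one byte at a time, issuing one read(2) system call per byte. The function names read_one_byte_at_a_time() and read_current_directory() are assumptions for illustration, not the actual readchar source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: read one file one byte at a time, issuing one
 * read(2) system call per byte. Returns the number of bytes read, or -1
 * if the file cannot be opened. */
long read_one_byte_at_a_time(const char *path)
{
    char c;
    long total = 0;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    while (read(fd, &c, 1) == 1)
        total++;
    close(fd);
    return total;
}

/* Hypothetical sketch: apply read_one_byte_at_a_time() to every regular
 * file in the current directory. Returns the total number of bytes read,
 * or -1 if the directory cannot be opened. */
long read_current_directory(void)
{
    DIR *dir = opendir(".");
    struct dirent *de;
    struct stat st;
    long total = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        /* Skip directories, devices, and anything else that is not a file */
        if (stat(de->d_name, &st) == 0 && S_ISREG(st.st_mode)) {
            long n = read_one_byte_at_a_time(de->d_name);
            if (n > 0)
                total += n;
        }
    }
    closedir(dir);
    return total;
}
```

A main() that calls read_current_directory() repeatedly should keep the process busy in system mode, because every byte costs a system call.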
Lock Contention on a Single Processor System
The mpstat(1M) command output indicates that you are on a single-processor system that is CPU-bound, primarily in system mode (the sys column). The system call counts (the syscl column) are high, which correlates with the high percentage of system time. You expect such numbers from running four instances of the readchar process. On this single-processor system there is no mutex contention (the smtx column stays at 0 and mutex.d prints nothing) until you add more file system-intensive commands, as shown in the following example:
# find / -name fubar & ls -lR / >/ll&
[1] 2357
[2] 2358
# find / -name fubar & ls -lR / >/ll&
[3] 2359
[4] 2360
# ./mutex.d
CPU TID PID UID WAIT TIME COMMAND
0 0 0 0 56917 sched
genunix`clock+0x3f0
genunix`cyclic_softint+0xa4
unix`cbe_level10+0x8
unix`intr_thread+0x144

0 0 0 0 41076 sched
genunix`clock+0x3f0

0 0 0 0 50424 sched
sd`sdintr+0x14
glm`glm_doneq_empty+0x144
glm`glm_intr+0xf4
pcipsy`pci_intr_wrapper+0x9c

0 0 0 0 45321 sched
genunix`clock+0x3f0

0 0 0 0 41184 sched
genunix`clock+0x3f0
^C

0 0 0 0 43214 sched
genunix`kmem_cache_free+0x4c
uata`atapi_tran_destroy_pkt+0x58
scsi`scsi_destroy_pkt+0x14
sd`sd_return_command+0x16c
sd`sdintr+0x224
uata`ghd_doneq_process+0x64
Lock Contention on a Multiprocessor Server
The following output results from running four instances of the readchar process on a four-processor server. In this case you do not run the extra find and ls -lR commands that you ran on the uniprocessor system. There is significantly more mutex contention, as indicated by the smtx column. (Always ignore the first set of numbers that the mpstat(1M) command prints, because they summarize activity since the system booted rather than the current interval.) The mutex.d D script also produces output much more frequently:
# mpstat 2
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 1 0 3 4 1 65 0 1 8 0 27 0 1 0 99
1 1 0 3 7 4 30 0 1 8 0 29 0 0 0 100
2 1 0 3 4 1 28 0 1 8 0 28 0 0 0 100
3 1 0 3 214 111 15 0 0 9 0 28 0 0 0 100
0 2 0 5 21 1 56 17 8 74478 0 225870 14 81 0 4
1 0 0 5 29 4 67 22 8 76857 0 228291 11 83 0 6
2 0 0 1 53 1 150 49 5 83973 0 224372 16 74 0 9
3 3 0 90 216 113 12 12 1 86392 0 227446 13 87 0 0
0 0 0 4 24 1 64 19 8 108269 0 189929 12 86 0 2
1 0 0 2 39 2 99 34 9 107282 0 189818 13 82 0 4
2 0 0 1 43 1 104 39 5 120189 0 173753 11 79 0 10
3 0 0 95 216 112 7 10 0 96010 0 229465 17 83 0 0
^C
# ./mutex.d

0 1 12523 0 23500 /var/dtrace/readchar
ufs`rdip+0x150
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c

ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c

ufs`rdip+0x150
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c

ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c

ufs`ufs_lockfs_end+0x70
ufs`ufs_read+0x25c
genunix`read+0x274
genunix`read32+0x1c

ufs`rdip+0x488
ufs`ufs_read+0x208
genunix`read+0x274
genunix`read32+0x1c
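The previous output shows that the mutex contention is in the UNIX File System (UFS) read path (the rdip() and ufs_read() functions). The sleep times, reported in the WAIT TIME column in nanoseconds (arg1 of the adaptive-block probe), are only between 21 and 29 microseconds.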

The proc Provider and the system() Function

The proc provider makes available probes related to process creation and termination, as well as to signal delivery. The signal-send probe fires when a signal is sent to a process or thread. The args[2] argument is set to the signal number, which you can compare with symbolic names such as SIGINT that are listed in the signal(3HEAD) manual page. The args[1] argument points to the psinfo_t structure of the receiving process.
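To generate a signal-send event with args[2] equal to SIGKILL in a controlled way, you could run a small program such as the following hypothetical C sketch. It forks a child that waits in pause(2), sends the child SIGKILL with kill(2), and reports which signal terminated it. The function name kill_child_with_sigkill() is an assumption for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: fork a child that waits for a signal, send it
 * SIGKILL, and return the number of the signal that terminated the child
 * (or -1 on any error). The kill(2) call in the parent is the event that
 * proc:::signal-send reports, with args[2] == SIGKILL. */
int kill_child_with_sigkill(void)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0) {
        /* Child: sleep until a signal arrives; SIGKILL cannot be caught */
        for (;;)
            pause();
    }
    if (kill(child, SIGKILL) == -1)
        return -1;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```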
The system() built-in function allows you to run a shell command whenever a probe fires. This general capability is powerful, because any probe event can trigger the execution of any command. You can use format specifications similar to those of the printf() built-in function to parameterize the shell command you wish to invoke. The command is not run in the context of the firing probe; the dtrace(1M) consumer runs it when it processes the traced data. Because system() is a destructive action, you must enable destructive actions either with the -w option to the dtrace(1M) command or with a #pragma D option destructive statement inside the script.
The following script uses the signal-send probe as well as the built-in
system() function to display what user account is sending the SIGKILL
signal and to which process:
# cat whosend.d
#!/usr/sbin/dtrace -qs

#pragma D option destructive

proc:::signal-send
/args[2] == SIGKILL/
{
    printf("SIGKILL was sent to %s by ", args[1]->pr_fname);
    /* uid is the user ID of the thread sending the signal */
    system("getent passwd %d | cut -d: -f5", uid);
}
# ./whosend.d
SIGKILL was sent to vi by Super-User
SIGKILL was sent to bash by Mary Smith
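The shell pipeline in whosend.d prints the fifth colon-separated field of the passwd entry, the comment (GECOS) field, which usually holds the user's full name. For comparison, the following C sketch performs an equivalent lookup with getpwuid(3C); the function name gecos_for_uid() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical sketch: return the comment (GECOS) field of the passwd
 * entry for uid, which is the field that
 * "getent passwd UID | cut -d: -f5" prints, or NULL if there is no entry.
 * The string lives in static storage owned by getpwuid() and is
 * overwritten by the next call. */
const char *gecos_for_uid(uid_t uid)
{
    struct passwd *pw = getpwuid(uid);

    return pw != NULL ? pw->pw_gecos : NULL;
}
```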

Displaying Read Call Information

DTrace provides several ways to display read information:
You can trace system-wide activity or application-specific activity.
You can show information about each individual read call or
summarize the data with an aggregation function.
You can monitor read activity at the driver level with the io provider
or at the application level with the pid provider, the syscall
provider, or the sysinfo provider.
This section demonstrates some of these methods.
Tracing Read Calls System-Wide

The first example traces each individual read(2) and pread(2) system call, system-wide. The read size requested in a read(2) or pread(2) call can differ from the number of bytes actually read, which is the return value of the system call. A return value of 0 indicates an end-of-file condition; a return value of -1 indicates that the call failed.
# cat reads.d
#!/usr/sbin/dtrace -qs

BEGIN
{
    printf("FD\tREQUEST\tACTUAL\tCOMMAND\n");
}

syscall::read:entry, syscall::pread*:entry
/execname != "dtrace"/
{
    self->started = 1;
    self->arg0 = arg0;    /* file descriptor */
    self->arg2 = arg2;    /* requested size */
}

syscall::read:return, syscall::pread*:return
/self->started/
{
    /* arg0 is the return value: bytes actually read, 0 (EOF), or -1 */
    printf("%d\t%d\t%d\t%s\n", self->arg0, self->arg2, arg0,
        execname);
    ++nlines;
    self->started = 0;
}

syscall::read:return, syscall::pread*:return
/nlines > 20/
{
    /* Reprint the column headings after more than 20 lines of output */
    printf("FD\tREQUEST\tACTUAL\tCOMMAND\n");
    nlines = 0;
}
# ./reads.d
FD REQUEST ACTUAL COMMAND
0 1 1 bash
0 1 1 bash
0 1 1 bash
0 1 1 bash
3 877 877 date
0 1 1 bash
0 1 1 bash
...
0 1 1 bash
3 152 152 uptime
4 8192 4092 uptime
4 8192 0 uptime
3 877 877 uptime
0 1 1 bash
0 1 1 bash
...
0 1 1 bash
0 1 1 bash
0 1 1 bash
3 8192 8192 grep
3 8192 200 grep
3 8192 0 grep
1 8192 1006 init
1 8192 0 init
1 8192 1006 init
1 8192 0 init
...
5 1024 61 nscd
5 8192 4464 utmpd
6 336 336 utmpd
6 336 336 utmpd
6 336 336 utmpd
5 8192 0 utmpd
1 24 -1 sac
2 8 8 ttymon
2 8 -1 ttymon
1 24 24 sac
...
0 1 1 bash
0 1 1 bash
0 128 4 sh
0 128 3 sh
0 128 4 sh
...
4 416 416 ps
4 416 416 ps
11 336 336 svc.startd
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
4 416 416 ps
^C
Using the previous output (and help from the truss(1) command), you can determine the following:
The date(1) command reads an 877-byte time zone (US/Mountain) configuration file when it starts.
The ps(1) command reads a 416-byte psinfo_t structure many times, once for each process it examines.
The init(1M) command periodically re-reads the /etc/inittab file.
The grep(1) command reads its file one page (8192 bytes) at a time; the final short read returns the remaining 200 bytes, and the next read returns 0 at end-of-file.
The sh(1) command reads a whole line from standard input into a 128-byte buffer.
The bash(1) command reads standard input one byte at a time (probably to implement command-line editing).
The uptime(1) command reads the same time zone configuration file as the date(1) command.
The sac(1M) and ttymon(1M) commands issued reads that failed (an ACTUAL value of -1).
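The REQUEST and ACTUAL columns can be reproduced outside DTrace. The following C sketch, with the hypothetical function name read_file_in_chunks(), reads a file with a fixed request size, the way grep(1) appears to in the output above, and records each read(2) return value. For an 8392-byte file and an 8192-byte request, the calls return 8192, 200, and then 0 at end-of-file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: read path with a fixed request size (the REQUEST
 * column) and store each read(2) return value (the ACTUAL column) in
 * results[]. Stops after a call returns 0 (end-of-file) or -1 (failure),
 * or when results[] is full. Returns the number of read(2) calls made. */
size_t read_file_in_chunks(const char *path, size_t request,
                           ssize_t *results, size_t max_calls)
{
    char buf[8192];
    size_t calls = 0;
    int fd;

    assert(request > 0 && request <= sizeof buf);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return 0;
    while (calls < max_calls) {
        ssize_t n = read(fd, buf, request);
        results[calls++] = n;
        if (n <= 0)          /* 0: end-of-file, -1: error */
            break;
    }
    close(fd);
    return calls;
}
```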

Tracing Read Calls Using the iosnoop.d D Script

The following output results from running the iosnoop.d D script at the
same time as the previous reads.d D script. It shows that only the
grep(1) command performed actual disk reads. The other reads found the
data cached in memory.
# ./iosnoop.d
sched 0 <none> sd2 W 3.733
grep 2691 /usr/include/sys/zone.h sd2 R 4.817
fsflush 3 <none> sd2 W 13.120
^C
Aggregating Read Data

The following D script uses the avg() aggregation function to display the average number of bytes returned per successful read(2) or pread(2) call, keyed by file descriptor and process name:
# cat readsummary.d
#!/usr/sbin/dtrace -qs

syscall::read:entry, syscall::pread*:entry
{
    self->started = 1;
    self->fd = arg0;
}

syscall::read:return, syscall::pread*:return
/self->started && execname != "dtrace" && arg0 != -1/
{
    /* arg0 is the number of bytes actually read */
    @[self->fd, execname] = avg(arg0);
    self->started = 0;
}

END
{
    printa("%d\t%-24s\t%@d\n", @);
}
# ./readsummary.d
^C
4 instant 0
2 more 1
0 vi 1
4 readchar 1
0 bash 1
2 ttymon 8
4 rup 23
1 sac 24
5 nscd 59
19 sgml2roff 119
3 rup 413
3 rpc.rstatd 413
4 ps 416
3 uptime 514
1 init 540
3 man 550
3 ps 687
5 rup 787
4 vi 803
3 grep 845
3 date 877
3 vi 1492
4 nroff 2221
4 uptime 2232
0 nroff 3479
0 tbl 3861
3 cat 3861
6 nsgmls 3894
0 eqn 3914
0 col 3979
0 instant 4072
3 instant 4442
3 nsgmls 4459
6 rpc.rstatd 4464
3 more 4842
5 man 5802
4 nsgmls 6606
0 grep 6815

If you copy readsummary.d to a new script, totalread.d, and change the aggregation function from avg() to sum(), you obtain the total number of bytes read by file descriptor and process name:
# ./totalread.d
^C
4 instant 0
0 vi 6
2 more 8
5 nscd 61
0 bash 121
11 svc.startd 336
10 svc.startd 336
3 man 550
6 readchar 671
3 date 877
4 ls 877
3 vi 2984
19 sgml2roff 3214
1 init 4324
23 readchar 4572
19 readchar 10276
6 nsgmls 11684
5 man 17408
4 nroff 17771
3 more 17876
7 readchar 18064
20 readchar 18116
3 cat 18435
0 tbl 18435
4 vi 18880
10 readchar 19500
14 readchar 19500
0 eqn 20356
0 nroff 20356
0 col 22496
8 readchar 28252
0 grep 30095
3 nsgmls 33799
3 instant 53314
11 readchar 56616
0 instant 160360
4 nsgmls 171763
21 readchar 192636
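Both readsummary.d and totalread.d key an aggregation by [fd, execname]. As a rough model of what avg() and sum() compute (not of how DTrace implements aggregations), the following C sketch keeps a count and a running sum for each key: sum() is the running sum and avg() is the sum divided by the count, using integer division. All names and the fixed table size are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 64   /* hypothetical fixed table size */
#define NAME_LEN 16   /* execname storage, including the '\0' */

struct agg_entry {
    int     fd;
    char    execname[NAME_LEN];
    int64_t count;
    int64_t sum;
};

struct agg {
    struct agg_entry entries[MAX_KEYS];
    int nentries;
};

/* Find or create the entry for [fd, execname]; NULL if the table is full. */
static struct agg_entry *agg_lookup(struct agg *a, int fd, const char *execname)
{
    struct agg_entry *e;

    for (int i = 0; i < a->nentries; i++) {
        e = &a->entries[i];
        if (e->fd == fd && strncmp(e->execname, execname, NAME_LEN - 1) == 0)
            return e;
    }
    if (a->nentries == MAX_KEYS)
        return NULL;
    e = &a->entries[a->nentries++];
    e->fd = fd;
    strncpy(e->execname, execname, NAME_LEN - 1);
    e->execname[NAME_LEN - 1] = '\0';
    e->count = 0;
    e->sum = 0;
    return e;
}

/* Record one value, as "@[fd, execname] = avg(value)" or sum(value) would. */
int agg_record(struct agg *a, int fd, const char *execname, int64_t value)
{
    struct agg_entry *e = agg_lookup(a, fd, execname);

    if (e == NULL)
        return -1;           /* table full: the value is dropped */
    e->count++;
    e->sum += value;
    return 0;
}

int64_t agg_sum(const struct agg_entry *e) { return e->sum; }

int64_t agg_avg(const struct agg_entry *e)
{
    return e->count != 0 ? e->sum / e->count : 0;
}
```

With the three grep reads from the reads.d output (8192, 200, and 0 bytes on descriptor 3), the sum is 8392 and the integer average is 2797.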

Using the Anonymous Tracing Facility

Probes are usually enabled through a DTrace consumer process such as dtrace(1M). A DTrace consumer process cannot run, however, until the system has booted. Anonymous tracing allows you to enable tracing during boot.
Anonymous tracing is not associated with any DTrace consumer. Any tracing that you can do interactively with the dtrace(1M) command you can also do anonymously. Only the super-user can create an anonymous enabling, and there can be only one anonymous enabling at any time.
Most DTrace users do not need this feature, but because boot problems are particularly difficult to debug, anonymous tracing can prove valuable for kernel and device driver developers.
Creating an Anonymous Enabling

To create an anonymous enabling, use the -A option with a dtrace(1M) invocation that specifies the desired probes, predicates, actions, and options. The dtrace(1M) command adds forceload directives to your /etc/system file to force the loading of the kernel modules that implement the needed DTrace providers. It then adds a series of driver properties representing your request to the dtrace(7D) driver's configuration file, /kernel/drv/dtrace.conf. The dtrace(7D) driver reads these properties when it is loaded, enables the specified probes with the specified actions, and creates an anonymous state to associate with the new enabling.
Reboot your system. While the system is booting, messages describing the anonymous enabling appear on the console.
After the machine boots, claim the anonymous state by specifying the -a option to the dtrace(1M) command. By default, the -a option claims the anonymous state, processes the existing data, and continues to run. To process the anonymous state data and then exit, add the -e option to the dtrace(1M) command.
Performing Anonymous Tracing

The following dtrace(1M) command performs anonymous tracing on the
conskbd module, the console keyboard multiplexer driver:

# dtrace -A -m conskbd
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# tail /etc/system
* chapter of the Solaris Dynamic Tracing Guide for details.
*
forceload: drv/systrace
forceload: drv/sdt
forceload: drv/profile
forceload: drv/lockstat
forceload: drv/fbt
forceload: drv/fasttrap
forceload: drv/dtrace
* ^^^^ Added by DTrace
# reboot
...
# grep enabling /var/adm/messages
Feb 27 07:34:22 sys63 dtrace: [ID 566105 kern.notice] NOTICE: enabling
probe 0 (:kmdb::)
probe 1 (dtrace:::ERROR)
probe 0 (:conskbd::)
# dtrace -ae
0 25339 conskbd_attach:entry
0 25340 conskbd_attach:return
0 25327 conskbdopen:entry
0 25328 conskbdopen:return
0 25331 conskbduwput:entry
0 25332 conskbduwput:return
0 25345 conskbdioctl:entry
0 25346 conskbdioctl:return
0 25331 conskbduwput:entry
0 25332 conskbduwput:return
0 25345 conskbdioctl:entry
0 25346 conskbdioctl:return
0 25329 conskbdclose:entry
0 25330 conskbdclose:return
0 25329 conskbdclose:entry
0 25330 conskbdclose:return
The forceload entries in the /etc/system file are not removed automatically after the reboot. Run the dtrace(1M) command with only the -A option to clean up these forceload entries:
# tail -18 /etc/system
* vvvv Added by DTrace
*
* The following forceload directives were added by dtrace(1M) to allow for
* tracing during boot. If these directives are removed, the system will
* continue to function, but tracing will not occur during boot as desired.
* To remove these directives (and this block comment) automatically, run
* "dtrace -A" without additional arguments. See the "Anonymous Tracing"
* chapter of the Solaris Dynamic Tracing Guide for details.
*
forceload: drv/systrace
forceload: drv/sdt
forceload: drv/profile
forceload: drv/lockstat
forceload: drv/fbt
forceload: drv/fasttrap
forceload: drv/dtrace
* ^^^^ Added by DTrace
# dtrace -A
dtrace: cleaned up old anonymous enabling in /kernel/drv/dtrace.conf
dtrace: cleaned up forceload directives in /etc/system
# tail /etc/system
*
* To set variables in 'unix':
*
* set nautopush=32
* set maxusers=40
*
* To set a variable named 'debug' in the module named 'test_module'
*
* set test_module:debug = 0x13

The next example focuses only on those functions called from the
conskbd_attach() function in the conskbd module:
# cat cons.d
fbt::conskbd_attach:entry
{
    self->trace = 1;
}

fbt:::
/self->trace/
{
    /* No actions: the default action records each entry and return */
}

fbt::conskbd_attach:return
{
    self->trace = 0;
}
# dtrace -AFs cons.d
dtrace: saved anonymous enabling in /kernel/drv/dtrace.conf
dtrace: added forceload directives to /etc/system
dtrace: run update_drv(1M) or reboot to enable changes
# reboot
...
# grep enabling /var/adm/messages
probe 0 (:conskbd::)
probe 0 (fbt::conskbd_attach:entry)
probe 1 (fbt:::)
probe 2 (fbt::conskbd_attach:return)
# dtrace -ae
CPU FUNCTION
0 -> conskbd_attach
0 -> ddi_create_minor_node
0 -> ddi_create_minor_common
0 -> ddi_driver_major
0 <- ddi_driver_major
0 -> strcmp
0 <- strcmp
0 -> derive_devi_class
0 -> i_ddi_devi_class
0 <- i_ddi_devi_class
0 -> strncmp
0 <- strncmp
...
0 <- kstat_compare_bykid
0 -> kstat_zone_compare
0 <- kstat_zone_compare
0 <- avl_find
0 <- kstat_hold
0 <- kstat_hold_bykid
0 <- kstat_install
0 -> kstat_rele
0 -> cv_broadcast
0 <- cv_broadcast
0 <- kstat_rele
0 <- conskbd_attach

Using the Speculative Tracing Facility

Because of the comprehensive tracing coverage that DTrace provides, the challenge for the user can be deciding what not to trace. The primary mechanism for filtering out uninteresting events is the predicate. Predicates are useful when you know at the time a probe fires whether the probe event is interesting. For example, you might want to know when the read(2) system call is entered only if a particular process issued the call. There are situations, however, in which you can determine that a given probe event is interesting only some time after the probe has fired.
For example, if a read(2) system call is failing sporadically with an errno value of EIO, you might want to see the complete code path leading to the error (not just the current stack trace). Tracing every code path is possible with the fbt provider, but doing so while waiting for the failure to reappear records far too much data. This causes one of two problems:
Unwanted data that must be filtered afterwards
Data loss caused by running out of buffer space in DTrace
To address this problem, DTrace provides a facility called speculative tracing. Speculative tracing allows you to trace data tentatively. Later, you can decide that the traced data is interesting and commit it, or decide that it is uninteresting and discard it.

Speculative Tracing Functions

The D functions shown in Table 4-1 compose the DTrace speculative
tracing facility:
Table 4-1 DTrace Speculative Tracing Functions
Function Name   Args   Description
speculation     None   Returns an identifier (ID) for a new speculative buffer
speculate       ID     Denotes that the remainder of the probe clause should
                       be traced to the speculative buffer specified by the ID
commit          ID     Commits the speculative buffer associated with the ID
discard         ID     Discards the speculative buffer associated with the ID
The speculation() function allocates a speculative buffer and returns a speculation identifier (ID). You use this ID in subsequent calls to the speculate() function. You must place the speculate() call before any data-recording action statements in the same clause; all such data-recording actions are then traced speculatively. A probe clause can contain speculative tracing or regular tracing, but not both. Aggregating actions, destructive actions, and the exit() action can never be speculative.
By default (without tuning), there is only one speculative buffer. Therefore you must be careful not to start a new speculation before committing or discarding an existing one. If no speculative buffer is free, speculation() returns an ID of 0, and data speculatively traced with that ID is lost. You use the commit() function to commit a speculation. When you commit a speculative buffer, its data is copied into the DTrace principal buffer (there is one principal buffer per CPU). A clause that contains a commit() call cannot contain any data-recording actions. You use the discard() function to discard a speculation. When a speculative buffer is discarded, its contents are thrown away.
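The following C sketch models the life cycle of speculative buffers described above: speculation() hands out an ID (or 0 when no buffer is free), speculate() followed by a recording action appends to that buffer, commit() copies the records into the principal buffer, and discard() throws them away. It is a simplified, single-CPU model, not the DTrace implementation. The sim_ names, the fixed buffer sizes, and freeing a buffer immediately on commit or discard are assumptions; DTrace itself may take a short time to make a committed or discarded buffer available again.

```c
#include <assert.h>
#include <string.h>

#define NSPEC    1    /* DTrace's default number of speculative buffers */
#define MAX_RECS 16   /* records per buffer in this model */
#define REC_LEN  64   /* bytes per record in this model */

struct buffer {
    char recs[MAX_RECS][REC_LEN];
    int  nrecs;
};

struct tracer {
    struct buffer principal;      /* data that will be displayed */
    struct buffer spec[NSPEC];    /* tentatively traced data */
    int           busy[NSPEC];    /* 1 while a speculation owns the buffer */
};

static void buf_append(struct buffer *b, const char *rec)
{
    if (b->nrecs < MAX_RECS) {    /* a full buffer drops the record */
        strncpy(b->recs[b->nrecs], rec, REC_LEN - 1);
        b->recs[b->nrecs][REC_LEN - 1] = '\0';
        b->nrecs++;
    }
}

static int valid_id(const struct tracer *t, int id)
{
    return id >= 1 && id <= NSPEC && t->busy[id - 1];
}

/* Like speculation(): return an ID in 1..NSPEC, or 0 if no buffer is free. */
int sim_speculation(struct tracer *t)
{
    for (int i = 0; i < NSPEC; i++) {
        if (!t->busy[i]) {
            t->busy[i] = 1;
            t->spec[i].nrecs = 0;
            return i + 1;
        }
    }
    return 0;
}

/* Like speculate(id) followed by one data-recording action.
 * Data speculated to ID 0 (a failed speculation) is dropped. */
void sim_speculate(struct tracer *t, int id, const char *rec)
{
    if (valid_id(t, id))
        buf_append(&t->spec[id - 1], rec);
}

/* Like commit(id): copy the tentative records to the principal buffer. */
void sim_commit(struct tracer *t, int id)
{
    if (!valid_id(t, id))
        return;
    for (int i = 0; i < t->spec[id - 1].nrecs; i++)
        buf_append(&t->principal, t->spec[id - 1].recs[i]);
    t->busy[id - 1] = 0;
}

/* Like discard(id): throw the tentative records away. */
void sim_discard(struct tracer *t, int id)
{
    if (valid_id(t, id))
        t->busy[id - 1] = 0;
}
```

With NSPEC set to 1, a second sim_speculation() call made before the first speculation is committed or discarded returns 0, which mirrors the single-buffer caution in the text.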

Speculative Tracing Example

You can use speculations to highlight a particular code path. The
following example displays the entire code path for the open(2) system
call only when it fails. The script explicitly ignores failed opens of the
/var/ld/ld.config file, which are common on this system:
# cat spec.d
#!/usr/sbin/dtrace -s

#pragma D option flowindent

syscall::open*:entry
/stringof(copyinstr(arg0)) != "/var/ld/ld.config"/
{
    self->spec = speculation();
    speculate(self->spec);

    /* The following will only appear if later committed */
    printf("%s was opening: %s\n", execname, copyinstr(arg0));
}

fbt:::
/self->spec/
{
    speculate(self->spec); /* default action */
}

syscall::open*:return
/self->spec && arg0 == -1/
{
    /*
     * Not speculated: recorded directly in the principal buffer,
     * so it is displayed before the committed code path
     */
    printf("Open failed with errno: %d\n", errno);
}

syscall::open*:return
/self->spec && arg0 == -1/
{
    /*
     * Move data recorded in speculative buffer
     * to principal buffer, freeing speculative buffer
     * for a new speculation()
     */
    commit(self->spec);
    self->spec = 0;
}

syscall::open*:return
/self->spec && arg0 != -1/
{
    /* Throw away data recorded in speculative buffer */
    discard(self->spec);
    self->spec = 0;
}
# ./spec.d
dtrace: script './spec.d' matched 40768 probes
CPU FUNCTION
0 <= open64 Open failed with errno: 2
0 => open64 grep was opening: /etc/sytem
0 -> open64
0 <- open64
0 -> copen
0 -> falloc
0 -> ufalloc
0 <- ufalloc
0 -> ufalloc_file
0 -> fd_find
...
0 <- cv_broadcast
0 <- setf
0 -> unfalloc
0 -> crfree
0 <- crfree
0 <- unfalloc
0 -> kmem_cache_free
0 <- kmem_cache_free
0 -> set_errno
0 <- set_errno
0 <- copen
^C
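It might appear that the spec.d D script never starts a new open speculation until the current open returns and the current speculation is either committed or discarded. This is not the case, however. Because self->spec is a thread-local variable, another thread can start an open while the first thread's open is blocked; with only one speculative buffer, that second speculation() call finds no free buffer, and the second thread's data is lost. You learn in a lab exercise how to tune the number of speculative buffers (the nspec option).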

Application Debugging With Speculative Tracing

The next example shows how to use speculative tracing for application debugging. Infrequent errors can be difficult to debug because they are hard to reproduce. Often you can identify a problem after a failure occurs, but at that point it is too late to reconstruct the code path that led to the failure condition. You can use the pid provider with speculative tracing to solve this common problem. The following script shows how to trace every instruction in a function, keeping the trace only when the function fails.
# cat appspec.d
pid$target::malloc:entry
{
    self->spec = speculation();
    speculate(self->spec);
    printf("( %d )", arg0);    /* requested size */
}

pid$target::malloc: /* trace all instructions */
/self->spec/
{
    speculate(self->spec);
}

pid$target::malloc:return
/self->spec && arg1 == 0/
{
    /* malloc() returned NULL: keep the recorded instruction trace */
    commit(self->spec);
    self->spec = 0;
}

pid$target::malloc:return
/self->spec && arg1 != 0/
{
    /* malloc() succeeded: throw the trace away */
    discard(self->spec);
    self->spec = 0;
}
# dtrace -s appspec.d -c myapp
dtrace: script 'appspec.d' matched 106 probes
...
0 42239 malloc:entry ( 1000000000 )
0 42239 malloc:entry
0 42311 malloc:4
0 42312 malloc:8
0 42313 malloc:c
0 42314 malloc:10
0 42315 malloc:14
0 42316 malloc:18
0 42317 malloc:1c
0 42318 malloc:20
0 42319 malloc:24
0 42320 malloc:28
0 42321 malloc:2c
0 42327 malloc:44
0 42328 malloc:48
0 42329 malloc:4c
0 42330 malloc:50
0 42331 malloc:54
0 42332 malloc:58
0 42333 malloc:5c
0 42334 malloc:60
0 42335 malloc:64
0 42336 malloc:68
0 42337 malloc:6c
0 42309 malloc:return
...
# mdb myapp
> _start:b
> :r
mdb: stop at _start
_start: clr %fp
> malloc::nm
Value Size Type Bind Other Shndx Name
0xff2d1cf0|0x00000070|FUNC |GLOB |0x0 |9 |libc.so.1`malloc
> 70%4=x
1c
> malloc,1c/ai
libc.so.1`malloc:
libc.so.1`malloc: save %sp, -0x60, %sp
libc.so.1`malloc+4: mov %o7, %i3
libc.so.1`malloc+8: call +8 <libc.so.1`malloc+0x10>
libc.so.1`malloc+0xc: sethi %hi(0x92400), %i2
libc.so.1`malloc+0x10: add %i2, 0x180, %i2
libc.so.1`malloc+0x14: add %i2, %o7, %i4
libc.so.1`malloc+0x18: mov %i3, %o7
libc.so.1`malloc+0x1c: ld [%i4 + 0xec8], %i5
libc.so.1`malloc+0x20: ld [%i5], %i1
libc.so.1`malloc+0x24: cmp %i1, 0
libc.so.1`malloc+0x28: bne +0x1c <libc.so.1`malloc+0x44>
libc.so.1`malloc+0x2c: nop
libc.so.1`malloc+0x30: call +0x93624 <PLT:___errno>
libc.so.1`malloc+0x34: mov 0x30, %l7
libc.so.1`malloc+0x38: st %l7, [%o0]
libc.so.1`malloc+0x3c: ret
libc.so.1`malloc+0x40: restore %g0, 0, %o0
libc.so.1`malloc+0x44: call +0x657d4 <libc.so.1`assert_no_libc_locks_held>
libc.so.1`malloc+0x48: nop
libc.so.1`malloc+0x4c: call +0x6437c <libc.so.1`lmutex_lock>
libc.so.1`malloc+0x50: ld [%i4 + 0xec0], %o0
libc.so.1`malloc+0x54: call +0x1c <libc.so.1`_malloc_unlocked>
libc.so.1`malloc+0x58: mov %i0, %o0
libc.so.1`malloc+0x5c: mov %o0, %i0
libc.so.1`malloc+0x60: call +0x6446c <libc.so.1`lmutex_unlock>
libc.so.1`malloc+0x64: ld [%i4 + 0xec0], %o0
libc.so.1`malloc+0x68: ret
libc.so.1`malloc+0x6c: restore
>
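The myapp program is not shown in this guide. As a hypothetical illustration of the kind of code appspec.d is aimed at, the following C sketch wraps malloc(3C) and reports a failure. A NULL return is exactly the condition (arg1 == 0 at pid$target::malloc:return) that makes appspec.d commit its speculation. The function name checked_malloc() is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: allocate size bytes and report a failure on
 * standard error. The caller owns the returned memory and must free() it.
 * A NULL return here corresponds to arg1 == 0 in appspec.d. */
void *checked_malloc(size_t size)
{
    void *p;

    errno = 0;
    p = malloc(size);
    if (p == NULL)
        fprintf(stderr, "malloc(%zu) failed: %s\n", size,
                errno != 0 ? strerror(errno) : "no error code");
    return p;
}
```

A call such as checked_malloc(1000000000), the request size visible in the trace above, may or may not fail, depending on the memory and swap space available.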

DTrace Privileges
By default, only the super-user can use DTrace. This is because DTrace
enables visibility into all aspects of the system, including:
User-level functions
System calls
Kernel functions
Kernel data
In addition, some DTrace actions can modify a program's state by stopping a process or even inducing a breakpoint in the kernel. Just as it is inappropriate to allow one user to stop another user's process or access another user's files, it is inappropriate to grant every user full access to all of the DTrace facilities. The traditional UNIX all-or-nothing approach to user privileges is not suitable for managing the use of the DTrace capabilities.
Using the Least Privilege Facility

The Least Privilege facility in the Solaris operating system enables a
Solaris system administrator to grant particular users or processes specific
privileges that permit access to individual DTrace capabilities.
Three specific privileges control access by a user or process to the DTrace features:
The dtrace_proc privilege: Permits use of only the pid and plockstat providers for process-level tracing of processes owned by the user.
The dtrace_user privilege: Permits use of only the profile and syscall providers on processes owned by the user.
The dtrace_kernel privilege: Permits the use of every provider except the pid and plockstat providers, unless the dtrace_proc privilege is also granted. Does not allow kernel-destructive actions.
In addition to the above DTrace-specific privileges, a user who has both the dtrace_proc and proc_owner privileges is allowed to trace other users' processes.

Kernel-Destructive Actions
Only the super-user can perform kernel-destructive actions. You enable
such actions by running the dtrace(1M) command with the -w option.
Three built-in DTrace functions cause kernel-destructive actions:
The breakpoint() function: Induces a kernel breakpoint, causing the system to stop and pass control to the OpenBoot PROM or to kmdb(1), depending on how the system was booted.
The panic() function: Induces a kernel panic; crash dump files are normally created for postmortem analysis.
The chill() function: Causes DTrace to spin for the specified number of nanoseconds. It is intended for investigating timing-sensitive problems such as race conditions.
Setting DTrace User Privileges

The Solaris Least Privilege facility enables system administrators to grant
specific privileges to specific Solaris users. To give a user a privilege at
login, insert a line into the /etc/user_attr file, as follows:
username::::defaultpriv=basic,privilege,...
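To check such an entry programmatically, the following C sketch parses a single /etc/user_attr line in the user:qualifier:res1:res2:attr layout and reports whether its defaultpriv attribute lists a given privilege. The function name user_attr_has_priv() is hypothetical, and the sketch handles only the simple key=value;key=value form shown in this module; see user_attr(4) for the full syntax, including escaping rules.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: return 1 if a user_attr entry such as
 *   "user2::::defaultpriv=basic,dtrace_proc"
 * lists priv in its defaultpriv attribute, else 0. The attr field is a
 * ';'-separated list of key=value pairs; defaultpriv's value is a
 * ','-separated list of privilege names. */
int user_attr_has_priv(const char *entry, const char *priv)
{
    const char *key = "defaultpriv=";
    size_t keylen = strlen(key);
    size_t privlen = strlen(priv);
    const char *p = entry;

    /* Skip the first four ':'-separated fields to reach attr */
    for (int colons = 0; colons < 4; colons++) {
        p = strchr(p, ':');
        if (p == NULL)
            return 0;
        p++;
    }

    while (*p != '\0') {
        size_t pairlen = strcspn(p, ";");   /* length of one key=value pair */

        if (pairlen > keylen && strncmp(p, key, keylen) == 0) {
            /* Walk the ','-separated privilege names in the value */
            const char *v = p + keylen;
            const char *end = p + pairlen;

            while (v < end) {
                size_t n = strcspn(v, ",;");
                if (n == privlen && strncmp(v, priv, n) == 0)
                    return 1;
                v += n;
                if (v < end)
                    v++;                   /* skip the ',' */
            }
        }
        p += pairlen;
        if (*p == ';')
            p++;
    }
    return 0;
}
```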
The following examples show the effect of setting each of the three DTrace-specific privileges.
No Specified DTrace Privileges
The following example shows a user with no DTrace privileges specified:

$ cat /etc/user_attr
#
# Copyright (c) 2003 by Sun Microsystems, Inc. All rights reserved.
#
# /etc/user_attr
#
# user attributes. see user_attr(4)
#
#pragma ident "@(#)user_attr 1.1 03/07/09 SMI"
#
adm::::profiles=Log Management
lp::::profiles=Printer Management
root::::auths=solaris.*,solaris.grant;profiles=Web Console Management,All;lock_after_retries=no
user2::::defaultpriv=basic,dtrace_proc
user3::::defaultpriv=basic,dtrace_user
user4::::defaultpriv=basic,dtrace_kernel
user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ id
uid=1001(user1) gid=101(users)
$ /usr/sbin/dtrace -l
dtrace: failed to initialize dtrace: DTrace requires additional privileges
$ echo $$
919
$ /usr/sbin/dtrace -n pid919:::
$
The dtrace_proc Privilege
This example shows the DTrace features available to a user with the
dtrace_proc privilege:
$ id
$ dtrace -l
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
$ echo $$
9447
$ dtrace -n pid9447:::entry
^C
$ dtrace -qn 'pid$target:libc:memcpy:entry {printf("size: %d\n",arg2)}' -c date
Sun Feb 27 10:02:01 MST 2005
size: 16
size: 15
size: 1
size: 15
size: 5
size: 521
size: 44
size: 28
size: 28
size: 48
size: 48
size: 308
size: 56
size: 36
size: 29

$ ps -ef | grep vi
user2 1534 1528 0 09:48:20 pts/1 0:00 grep vi
user5 1531 1452 0 09:47:55 pts/2 0:00 vi resume
$ dtrace -n pid1531:::
dtrace: invalid probe specifier pid1531:::: failed to grab pid 1531: permission denied
$ dtrace -n syscall::read:
dtrace: invalid probe specifier syscall::read:: probe description syscall::read: does
not match any probes
$
The dtrace_proc and proc_owner Privileges
$ id
$ grep user6 /etc/user_attr
user6::::defaultpriv=basic,dtrace_proc,proc_owner
$ ps -ef | grep vi
user6 650 637 0 09:41:30 pts/1 0:00 grep vi
user5 649 630 0 09:41:16 pts/2 0:00 vi resume
$ /usr/sbin/dtrace -n pid649:::entry
0 42548 peekkey:entry
0 42544 getkey:entry
0 42546 getbr:entry
0 42548 peekkey:entry
0 42544 getkey:entry
0 42546 getbr:entry
...
The dtrace_user Privilege
This example shows the DTrace features available to a user with the dtrace_user privilege:
$ id
user3::::defaultpriv=basic,dtrace_user
$ echo $$
1171
$ dtrace -n pid1171:::entry
dtrace: invalid probe specifier pid1171::: probe description pid1171::: does not match
any probes
$ pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
f: 2041630813 p: 749994 q: -42775360 m: -23
f: -1255556994 p: 1065403 q: 309691762 m: 14
^C
$ dtrace -qn 'syscall::write:entry /arg0 == 1/ {printf("T: %d\n",timestamp)}' -c pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
f: 2041630813 p: 749994 q: -42775360 m: -23
f: -1255556994 p: 1065403 q: 309691762 m: 14
f: -1207459745 p: 1769677 q: -8640714 m: -35
T: 150116053418082
T: 150116222152140
T: 150116388881669
T: 150116558431666
T: 150116728255203
...
$ dtrace -n 'pid$target:::entry' -c pgm
dtrace: invalid probe specifier pid$target:::entry: probe description pid1208:::entry
does not match any probes
$ dtrace -qn 'profile-109 {@[arg1] = count()}' -c pgm
f: 13 p: 0 q: -1952257862 m: -10
f: 640001883 p: -2056615 q: -929109794 m: -7
f: -1660723204 p: -1529159 q: 94444073 m: 25
...
^C
133476 49
4280947012 226
4280947008 1094
$ mdb pgm
> _start:b
> :r
mdb: stop at pgm`_start
pgm`_start: clr %fp
> 0t4280947008/ai
libc.so.1`.umul:
libc.so.1`.umul:umul %o0, %o1, %o0
> $q
$ (sleep 33; pwd)&
1680
$ dtrace -n 'syscall:::entry /pid != $pid/ {}'
/export/home/user3
0 18832 rexit:entry
0 18922 ioctl:entry
0 18908 setpgrp:entry
0 18922 ioctl:entry
0 19004 waitsys:entry
0 19214 getcwd:entry
0 18838 write:entry
0 18832 rexit:entry
^C
$

The dtrace_user privilege allows the use of only the syscall and profile providers, and only on processes owned by the user. Even though many system calls are occurring on the system, the preceding output shows only the system calls made by the sh, sleep, and pwd commands.
The dtrace_kernel Privilege
This example shows the DTrace features available to a user with the dtrace_kernel privilege:
$ id
user4::::defaultpriv=basic,dtrace_kernel
$ dtrace -qn 'sched:::on-cpu {printf("Starting to run: %s\n", execname)}'
Starting to run: sched
Starting to run: fsflush
Starting to run: svc.configd
Starting to run: inetd
Starting to run: svc.startd
Starting to run: fmd
Starting to run: dtrace
^C
$ dtrace -qn 'io:::start {printf("Starting an I/O: %s\n", execname)}'
Starting an I/O: bash
Starting an I/O: fsflush
Starting an I/O: find
^C
$ echo $$
6711
$ dtrace -n pid6711:a.out::entry
dtrace: invalid probe specifier pid6711:a.out::entry: probe description
pid6711:bash::entry does not match any probes
The preceding example demonstrates that you must have the dtrace_proc privilege to trace your own processes. The dtrace_kernel privilege by itself is not sufficient.
$ id
user5::::defaultpriv=basic,dtrace_kernel,dtrace_proc
$ echo $$
6736
$ dtrace -n 'pid6736:a.out::entry'
dtrace: description 'pid6736:a.out::entry' matched 211 probes
^C
$ dtrace -l | awk '{print $2}' | sort -u
PROVIDER
dtrace
fasttrap
fbt
fpuinfo
io
lockstat
mib
pid6736
proc
profile
sched
sdt
syscall
sysinfo
vminfo
$

Privilege Needed for Kernel-Destructive Actions
Only the super-user can invoke kernel-destructive actions:

$ dtrace -wn 'syscall::fork1:entry {chill(2000); printf("OK, lets start: %s\n",
execname);}'
dtrace: description 'syscall::fork1:entry ' matched 1 probe
dtrace: allowing destructive actions
dtrace: error on enabled probe ID 2 (ID 18246: syscall::fork1:entry): invalid kernel
access in action #1
dtrace: error on enabled probe ID 2 (ID 18246: syscall::fork1:entry): invalid kernel
access in action #1
^C
$ su
Password:
# dtrace -wn 'syscall::fork1:entry {chill(2000); printf("OK, lets start: %s\n",
execname);}'
dtrace: description 'syscall::fork1:entry ' matched 1 probe
0 18246 fork1:entry OK, lets start: bash
0 18246 fork1:entry OK, lets start: bash
^C
Setting DTrace Process Privileges

The Least Privilege facility also enables a Solaris system administrator to
grant privileges to specific processes. To give a running process an
additional privilege, use the ppriv(1) command:
# ppriv -s A+privilege process-ID
The following interactive session shows the use of the ppriv(1) command to give a shell specific DTrace privileges. See privileges(5) for details:
$ id
$ echo $$
1774
$ ppriv -s A+dtrace_proc 1774
1774: ppriv: Not owner
$ su
Password:
# ppriv -s A+dtrace_proc 1774
# exit
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83
133
0 28355 _start:entry
0 28362 _init:entry
0 28361 main:entry
0 28360 f1:entry
0 28359 f2:entry
0 28358 f3:entry
0 28357 f4:entry
0 28356 f5:entry
0 28360 f1:entry
0 28359 f2:entry
0 28358 f3:entry
0 28357 f4:entry
0 28356 f5:entry
0 28363 _fini:entry
$ ppriv $$
1774: -sh
flags = <none>
E: basic,dtrace_proc
I: basic,dtrace_proc
P: basic,dtrace_proc
L: all
$ bash
bash-2.05b$ ppriv $$
1789: bash
flags = <none>
E: basic,dtrace_proc
I: basic,dtrace_proc
P: basic,dtrace_proc
L: all
bash-2.05b$ /usr/sbin/dtrace -n 'pid$target:calls::entry' -c calls
dtrace: description 'pid$target:calls::entry' matched 7 probes
83
133
0 28355 _start:entry
0 28362 _init:entry
0 28361 main:entry
0 28360 f1:entry
...
bash-2.05b$ echo $$
1789
bash-2.05b$ su
Password:
# ppriv -s A+dtrace_kernel 1789
# ppriv $$
1854: sh
flags = <none>
E: all
I: basic
P: all
L: all
# exit
bash-2.05b$ ppriv $$
1789: bash
flags = <none>
E: basic,dtrace_kernel,dtrace_proc
I: basic,dtrace_kernel,dtrace_proc
P: basic,dtrace_kernel,dtrace_proc
L: all
bash-2.05b$ /usr/sbin/dtrace -qn 'fbt::cv_wait_sig:entry
> {trace(execname);ustack();stack();exit(0);}'
more
ff2bcb58
15684
149a4
13ad8
12780
1201c
115cc
genunix`str_cv_wait+0x28
genunix`strwaitq+0x238
genunix`strread+0x174
genunix`read+0x274
unix`syscall_trap32+0xcc
Summarizing the DTrace Privilege Levels

Table 4-2 describes the DTrace privilege levels.
Table 4-2 DTrace Privilege Levels

Privilege Level | Providers | Actions | Variables | Address Spaces
Any DTrace Privilege | dtrace | exit, printf, tracemem, discard, speculate, printa, trace | args, probemod, this, epid, probename, timestamp, id, probeprov, vtimestamp, probefunc, self | None
dtrace_proc Privilege | pid, plockstat | copyin, copyout, copyinstr, stop, raise, ustack | execname, pid, uregs | User
dtrace_user Privilege | profile, syscall | copyin, copyout, copyinstr, stop, raise, ustack | execname, pid, uregs | User
dtrace_kernel Privilege | All except the pid and plockstat providers | All but destructive actions | All | User, Kernel

Module 5
Troubleshooting DTrace Problems
Objectives
Describe how to lessen the performance impact of DTrace
Describe how to use and tune DTrace buffers
Debug DTrace scripts
Relevance

The following questions are relevant to understanding how to troubleshoot
DTrace problems:
Would the ability to write your D scripts with minimal performance
impact be beneficial?
Would it be useful to have control over buffer management policies
when DTrace buffer space is exhausted?
Would it be useful to detect common mistakes made in D scripts?


817-6223-10.

Minimizing DTrace Performance Impact

Enabling DTrace in any manner affects system performance in some way.
Often, this effect is negligible, but it can be substantial if many probes are
enabled with costly enablings. You can minimize the performance impact
of DTrace by:
Limiting enabled probes
Using aggregations
Using cacheable predicates
Limiting Enabled Probes

DTrace provides comprehensive tracing coverage of both the kernel and user
processes. Because of this breadth, enabling tens of thousands of probes
can produce a large probe effect. In general, you should enable only as
many probes as you need to solve your problem. Do not, for example,
enable all fbt probes if a more concise enabling can answer your
question. When possible, limit enabled probes to a specific module or
function of interest. The more concisely you can formulate the problem
statement, the better you will be at limiting your probe effect.
You should also be careful when using the pid provider, because it can
instrument every instruction of an application. This can result in millions
of probes being enabled in the application, slowing the target process to a
crawl.
Nevertheless, there are many conditions in which you must enable a large
number of probes to answer a question. DTrace has been designed with
this in mind. Enabling a large number of probes can slow down the
system substantially, but it can never induce fatal failure of the machine.
You should therefore not hesitate to enable many probes if necessary.

Using Aggregations
DTrace aggregations provide a scalable method of aggregating data.
Although associative arrays appear to offer similar functionality, they are
global, general-purpose variables that cannot provide the linear scalability
of aggregations. Aggregating functions allow for intermediate results to
be kept per-CPU instead of in a shared global data structure. When a
system-wide result is required, the aggregating function may then be
applied to the set consisting of the per-CPU intermediate results. You
should therefore use aggregations rather than associative arrays whenever
possible. For example, you should avoid performing the action shown in
the following script:
syscall:::entry
{
++totals[execname];
}
syscall::rexit:entry
{
printf("%40s %d\n", execname, totals[execname]);
totals[execname] = 0;
}
You should instead perform the following:

syscall:::entry
{
@totals[execname] = count();
}
END
{
printa("%40s %@d\n", @totals);
}
Using Cacheable Predicates

A tracing framework that offers comprehensive coverage must also provide
a way to avoid tracing events you are not interested in; otherwise you are
flooded with unwanted data. DTrace does this with predicates, which enable
you to trace data only when a specified condition is true.

When enabling many probes, you tend to use predicates of a form that
identifies a specific thread or threads of interest, such as
/self->traceme/ or /pid == 12345/. Many of these predicates evaluate to the
same (false) value for most threads in most probes, but the evaluation
itself can become costly when done for every function entry and return
point in the kernel.
To reduce this cost, DTrace caches the evaluation of a predicate if it
includes only thread-local variables (as in the first example), only
immutable variables (as in the second), or both. The cost of evaluating a
cached predicate is much smaller than the cost of evaluating a non-cached
predicate, especially if the predicate involves thread-local variables, string
comparisons, or other relatively costly operations.
Examining Cacheable and Uncacheable Predicates
Predicate caching is transparent to the user (cache coherency is
maintained by DTrace). It does, however, require you to follow some
guidelines to construct optimal predicates. Table 5-1 shows some
examples of cacheable as opposed to uncacheable predicate expressions.
Table 5-1 Cacheable and Uncacheable Predicates
Cacheable | Uncacheable
self->mumble | mumble[curthread] or mumble[pid, tid]
execname == "pgm" | curpsinfo->pr_fname or curthread->t_procp->p_user.u_comm
pid == 1234 | curpsinfo->pr_pid or curthread->t_procp->p_pidp->pid_id
tid == 17 | curlwpsinfo->pr_lwpid or curthread->t_tid
Constructing Optimal Predicates
You should avoid constructing uncacheable predicates, such as those
shown in the following example:
syscall::read:entry
{
follow[pid, tid] = 1;
}
fbt:::
/follow[pid, tid]/
{}
syscall::read:return
/follow[pid, tid]/
{
follow[pid, tid] = 0;
}
You should instead use thread-local variables, as in the following
example:
syscall::read:entry
{
self->follow = 1;
}
fbt:::
/self->follow/
{}
syscall::read:return
/self->follow/
{
self->follow = 0;
}
To be cacheable, a predicate must consist exclusively of cacheable
expressions. The following predicates are all cacheable:
/execname == "myprogram"/
/execname == $$1/
/pid == 12345/
/pid == $1/
/self->traceme == 1/
Because they use global variables, none of the following predicates is
cacheable:
/execname == one_to_watch/
/traceme[execname]/
/pid == pid_i_care_about/
/self->traceme == my_global/

Using and Tuning DTrace Buffers

Data buffering and management is an essential service provided by the
DTrace framework for its clients. In previous modules you used DTrace
without examining how traced data is transported from the DTrace
framework to clients such as the dtrace(1M) utility. In this section, you
explore data buffering in detail and learn about options you can tune to
change the DTrace buffer management policies.
Principal Buffers
The buffer most fundamental to DTrace operation is the principal buffer.
The principal buffer is present in every DTrace invocation, and is the
buffer to which tracing actions record their data by default. These actions
include:
exit()
printf()
trace()
ustack()
printa()
stack()
The principal buffers are always allocated on a per-CPU basis, although
tracing (and thus buffer allocation) can be restricted to a single CPU by
using the cpu option.
Principal Buffer Policies

DTrace enables tracing in highly constrained contexts in the kernel. In
particular, DTrace enables tracing in contexts in which you cannot reliably
allocate memory. The consequence of this flexibility of context is that there
always exists a possibility that you want to trace data when there is no
space available. DTrace must have policies to deal with such situations
when they arise. Which policy you choose is dictated by the specifics of
how you are using DTrace: sometimes it is best to discard the new data,
while at other times it is desirable to reuse the space containing the oldest
recorded data to trace the new data. Usually, however, the best policy is
the one that minimizes the likelihood of running out of available space in
the first place.

To accommodate these varying demands, DTrace supports the following
buffer policies:
The switch policy
The fill policy
The ring policy
This support is implemented with the bufpolicy option, and can be set
on a per-consumer basis.
DTrace Option Settings

You can set options in a D script by using the #pragma D option
statement and the option name. If the option takes a value, the option
name should be followed by an equals sign (=) and the option value. For
example, all of the following are valid option settings:
#pragma D option nspec=4
#pragma D option grabanon
#pragma D option bufsize=2g
#pragma D option switchrate=64
#pragma D option aggrate=100
#pragma D option bufresize=manual
The dtrace(1M) command also accepts option settings on the command
line as an argument to the -x option. For example:
# dtrace -x nspec=4 -x bufsize=2g -x switchrate=60 \
-x aggrate=10ms -x bufpolicy=switch -n zfod
You can also specify the bufsize option with the -b flag to the
dtrace(1M) command:
# dtrace -b 2g -n zfod
Note: This section describes only those options relevant to buffer
management. For details on the other DTrace options, see the Solaris
Dynamic Tracing Guide.

The switch Buffer Policy

By default, the principal buffer has a switch buffer policy. Under this
policy, per-CPU buffers are allocated in pairs: one buffer is active, the other
is inactive. When a DTrace consumer asks to read its buffer out of the
kernel, the kernel first switches the inactive and active buffers. Buffer
switching is done in such a manner that there is no window in which
tracing data can be lost. When the buffers are switched, the newly inactive
buffer is copied out to the DTrace consumer. This policy ensures that the
consumer always sees a self-consistent buffer (that is, a buffer is never
simultaneously traced to and copied out), and that no window is
introduced in which tracing is paused or otherwise prevented.
The consumer controls the rate at which the buffer is read out (and thus
switched) by using the switchrate option. As with any rate option,
switchrate can be specified with any time suffix, but defaults to rate-per-
second.
Dropped Data
Under the switch policy, if a given enabled probe would trace more data
than there is space available in the active principal buffer, the data is
dropped and a per-CPU drop count is incremented. In the event of one or
more drops, the dtrace(1M) command displays this message or a similar
one:
dtrace: 11 drops on CPU 0
You can reduce or eliminate drops by:
increasing the size of the principal buffer with the bufsize option,
or
increasing the switching rate with the switchrate option
The switch policy allocates scratch space for the copyin(), copyinstr(),
and alloca() subroutines out of the active buffer.
Example of Tuning Buffers to Alleviate Drops
The following D script causes significant drops:

# cat stress.d
#!/usr/sbin/dtrace -s

fbt:::
{
trace(timestamp);
}

tick-5sec
{
exit(0);
}
# ./stress.d >/var/tmp/stress.d.out
dtrace: script './stress.d' matched 38665 probes
# ls -l /var/tmp/stress.d.out
-rw-r--r-- 1 root root 86004878 Mar 13 14:58 /var/tmp/stress.d.out
The drops result from the limited buffer space, the low switchrate value,
or both. The default size of the principal buffer is 4 Mbytes and the
default switchrate is one switch per second. In the next invocation of
the script you increase the buffer size significantly:
# dtrace -x bufsize=300m -s stress.d >/var/tmp/stress.d.out
dtrace: script 'stress.d' matched 38665 probes
dtrace: buffer size lowered to 150m
Note that DTrace lowers the setting for buffer size because there is not
enough memory. By increasing the buffer size, you eliminated all drops
and created 18 Mbytes of trace data. In the next example you use a
smaller buffer size, but with an increased switchrate value:
# dtrace -x bufsize=64m -x switchrate=16 -s stress.d >/var/tmp/stress.d.out
^C

The fill Buffer Policy

For some problems it is useful to have a single, large in-kernel buffer per
CPU and to continue tracing until one or more of the per-CPU buffers has
filled. You can implement this solution by using the fill buffer policy.
The fill buffer policy helps to avoid drops that result in the loss of
trace data. Kernel buffer space is also saved, because there is only one
buffer per CPU instead of an active/inactive pair.
Under the fill buffer policy, tracing continues until an enabled probe is
about to trace more data than there is space in the principal buffer. At this
time, the buffer is marked as filled and the consumer is notified that at
least one of its per-CPU buffers has filled. When the dtrace(1M) utility
detects a single filled buffer, tracing is stopped, all buffers are processed,
and dtrace exits. Note that no further data is traced to a filled buffer,
even if the data would fit in the buffer.
To use the fill policy, set the bufpolicy option to fill. For example,
the following invocation of DTrace traces every system call entry into a
per-CPU 2-Kbyte buffer with the buffer policy set to fill:
# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill
To allow for END tracing in fill buffers, DTrace calculates beforehand the
amount of space potentially consumed by END probes and subtracts this
from the size of the principal buffer. If the net size is negative, DTrace
refuses to start, and the dtrace(1M) utility outputs a corresponding error
message:
dtrace: END enablings exceed size of principal buffer
Reserving space beforehand ensures that a full buffer always has
sufficient space for any and all END probes.

The ring Buffer Policy

When using DTrace to help diagnose failure (as opposed to
understanding non-failing behavior), you often want to track the events
leading to failure. Moreover, in cases where reproducing failure can take
hours or days, you might want to keep only the most recent data. To
support such situations, DTrace provides the ring buffer policy. Under
this policy, when a principal buffer has filled, tracing wraps around to the
first entry, thereby overwriting older tracing data. You establish a ring
buffer by setting the bufpolicy option to ring:
# dtrace -s stress.d -x bufpolicy=ring -b 16k
0 9808 disp_lock_enter_high:entry 810424080584641
0 9809 disp_lock_enter_high:return 810424080586093
0 2288 setfrontdq:return 810424080588595
0 668 generic_enq_thread:entry 810424080590727
0 669 generic_enq_thread:return 810424080592504
0 14298 ts_preempt:return 810424080594241
...
With the ring buffer policy, the dtrace(1M) utility does not display any
output until the process terminates; at that time the ring buffer is
consumed and processed.
Note that if a given record cannot fit in the buffer (that is, if the record is
larger than the buffer size), the record is dropped regardless of buffer
policy. By adding the following two lines to a D script, you can enable
ring buffering with a specific buffer size:
#pragma D option bufpolicy=ring
#pragma D option bufsize=16k

Other Buffers
Principal buffers exist in every DTrace enabling. In addition to principal
buffers, some DTrace consumers have additional in-kernel data buffers: an
aggregation buffer, a number of speculative buffers, or both. You tune the
aggregation buffer size with the aggsize option, and you tune the
speculative buffer size with the specsize option. You can tune the size of
each buffer on a per-consumer basis. Note that each size setting applies
to the buffer on each CPU. Moreover, for the switch buffer policy,
bufsize denotes the size of each of the active and inactive buffers on
each CPU.
Buffer Resizing Policy

In some cases there is not adequate free kernel memory to allocate a
buffer of the desired size. There might be insufficient memory available,
or the DTrace consumer might have exceeded a tunable limit. DTrace
provides a configurable policy when a buffer cannot be allocated.
The policy is set with the bufresize option, and defaults to auto. Under
the auto buffer resize policy, the size of a buffer is halved until a
successful allocation occurs. The dtrace(1M) utility emits a message if a
buffer as allocated is smaller than the requested size:
# dtrace -P syscall -b 4g
dtrace: buffer size lowered to 128m
# dtrace -n 'fbt:::entry {@a[probefunc] = count()}' -x aggsize=1g
dtrace: description 'fbt:::entry ' matched 16250 probes
dtrace: aggregation size lowered to 128m
Alternatively, you can set the buffer resize policy to be manual by setting
bufresize to manual. Under this policy, a failure to allocate causes
DTrace to fail to start:
# dtrace -P syscall -x bufsize=500m -x bufresize=manual
dtrace: could not enable tracing: Not enough space
The bufresize option dictates the buffer resizing policy of all buffers:
principal, speculative and aggregation.

Debugging DTrace Scripts

As with any programming language, you can experience a multitude of
errors in the D language. As you write more D scripts, you find it easier to
diagnose errors, whether they be syntax errors or run-time errors. This
section provides requirements and recommendations for writing correct D
scripts.
Avoiding Syntax Errors in D Scripts

This section describes requirements that help you to avoid common D
script syntax errors.
Start your scripts with the following first line: #!/usr/sbin/dtrace -s
Without this line, the shell tries to run the D statements as shell
commands:
# ./badstart.d
./badstart.d: line 1: BEGIN: command not found
./badstart.d: line 8: tick-1sec: command not found
./badstart.d: line 10: syntax error near unexpected token
`0'
./badstart.d: line 10: ` exit(0);'
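A script that begins with a comment instead of this line fails in a
similar way; here the shell expands /* as a file-name pattern and tries to
run /bin:
# cat comments.d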
/* This D script counts the number of read system calls */
syscall::read:entry
{
@["Number of reads:"] = count();
}
# ./comments.d
./comments.d: line 1: /bin: is a directory
./comments.d: line 3: syscall::read:entry: command not
found
./comments.d: line 5: syntax error near unexpected token
`('
./comments.d: line 5: ` @["Number of reads:"] = count();'
You must match up /* with an ending */ for comments in D scripts:
# cat comments2.d
/* This D script counts the number of read system calls
syscall::read:entry
{

@["Number of reads:"] = count();

}
# ./comments2.d
dtrace: failed to compile script ./comments2.d: line 7:
end-of-file encountered before matching */
If you have more than one statement in a probe clause, make sure you end
each one with a semicolon:
...
BEGIN
{
a=$1
b=$2
c=$3
}
...
# ./badstart2.d 1 2 3
dtrace: failed to compile script ./badstart2.d: line 6:
syntax error near "b"
When comparing values, make sure that you use the == relational
operator and not =:
# cat test5.d
fbt::sema_init:entry
/arg1 = 1/
{
trace(timestamp);
}
# ./test5.d
dtrace: failed to compile script ./test5.d: line 4:
operator = can only be applied to a writable variable
The first assignment to a variable determines its type. As in the C
language, you cannot assign a value of an incompatible type to it in the
D language:
# cat test8.d
BEGIN
{
vp = `rootdir;
i = 5;
}
tick-1sec
{
i = *vp;
}
# ./test8.d
operands have incompatible types: "int" = "vnode_t"
Remember that even with the -w dtrace(1M) option, which enables
destructive actions, you cannot modify kernel variables:
# cat test6.d
#!/usr/sbin/dtrace -ws
tick-5sec
/`freemem < `lotsfree/
{
`lotsfree = `lotsfree*2;
}
# ./test6.d
operator = can only be applied to a writable variable

Avoiding Run-Time Errors in D Scripts

This section describes requirements that help you to avoid common D
script run-time errors.
Make sure your D script file has execute permission:
# ./badstart.d
./badstart.d: : Permission denied
# chmod +x badstart.d
If you specify other options on the first line of a D script, be sure the s
option is last. With -sq, for example, dtrace(1M) takes q as the name of
the script file to open:
# head badstart3.d
#!/usr/sbin/dtrace -sq
BEGIN
{
a=$1
b=$2
c=$3
}
tick-1sec
# ./badstart3.d
dtrace: failed to open q: No such file or directory
Make sure that you pass the correct number of arguments expected by the
script (unless you explicitly set the defaultargs option). For example,
the badstart4.d script expects three command-line arguments:
# ./badstart4.d
macro argument $1 is not defined
# dtrace -x defaultargs -s badstart4.d
dtrace: script 'badstart4.d' matched 2 probes
0 36401 :tick-1sec
If an argument is a string, make sure that you either reference the
argument in the script with $$3 (if it is the third argument) or type it on
the command line as a quoted string:
# head badstart5.d

BEGIN
{
a=$1;
b=$2;
}
tick-1sec
/execname == $3/
# ./badstart5.d 1 2 init
failed to resolve init: Unknown variable name
# ./badstart5.d 1 2 '"init"'
^C
Avoid misspelled probe names and other identifiers, which are a common
problem in writing D scripts:
# ./test1.d
dtrace: failed to compile script ./test1.d: line 3: probe
description syscall::opn:entry does not match any probes
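The following script uses an improper probe description. A description
with only one field is interpreted as a probe name, so syscall is read as
:::syscall: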
# cat test2.d
syscall
{
trace(timestamp);
}
# ./test2.d
dtrace: failed to compile script ./test2.d: line 3: probe
description :::syscall does not match any probes
When using the printf() and printa() built-in functions, make sure that
the arguments match the format specifiers in type and number:
# cat -n test3.d
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
7 }
8
9 END
10 {
11 printf("%-30s %4s %6s\n", "Command", "CPU");
12 printa("%-30s %4d %@6d\n", @);
13 }
# ./test3.d
printf( ) prototype mismatch: conversion #3 (%s) is missing
a corresponding value argument
# cat -n test3a.d
2
3 sched:::on-cpu
4 /pid != $pid && pid != 0/
5 {
6 @[curpsinfo->pr_psargs, curcpu->cpu_id] = count();
7 }
8
9 END
10 {
11 printf("%-30s %4s %6s\n", "Command", "CPU", "Count");
12 printa("%-30s %4s %@6d\n", @);
13 }
# ./test3a.d
dtrace: failed to compile script ./test3a.d: line 12:
printa( ) argument #3 is incompatible with conversion #2
prototype:
conversion: %s
prototype: char [] or string (or use stringof)
argument: processorid_t
# cat test4.d
syscall::open:entry
{
printf("%s was opening: %s\n", execname, arg0);
}
# ./test4.d
dtrace: failed to compile script ./test4.d: line 5: printf( )
argument #3 is incompatible with conversion #2 prototype:
conversion: %s
prototype: char [] or string (or use stringof)
argument: int64_t
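Remember that pointer arguments to system calls are user addresses, not
kernel addresses. Applying stringof() to such an address fails at run time,
because DTrace reads it as a kernel address. You must use the copyinstr()
subroutine to copy the string in from the user address space: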
# cat test4a.d
syscall::open:entry
{
printf("%s was opening: %s\n", execname, stringof(arg0));
}
# ./test4a.d
dtrace: script './test4a.d' matched 1 probe
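dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3d79d3) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3ed570) in action #2
dtrace: error on enabled probe ID 1 (ID 37: syscall::open:entry): invalid
address (0xff3ef6d0) in action #2
^C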
# cat test4b.d
syscall::open:entry
{
printf("%s was opening: %s\n", execname, copyinstr(arg0));
}
# ./test4b.d
dtrace: script './test4b.d' matched 1 probe
0 37 open:entry ls was opening: /var/ld/ld.config
0 37 open:entry ls was opening: /lib/libc.so.1
0 37 open:entry ls was opening: /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
0 37 open:entry cat was opening: /var/ld/ld.config
0 37 open:entry cat was opening: /lib/libc.so.1
0 37 open:entry cat was opening: /usr/lib/locale/en_US.ISO8859-1/en_US.ISO8859-1.so.3
^C

Numbering of Action Statements
The run-time error shown for test4a.d references action #2 although
there is only one action statement. Action statements are numbered as
follows: there is one action for each non-printf() expression, and one for
each data argument to printf(). In test4a.d, execname is therefore action
#1 and the stringof(arg0) data argument is action #2.
Avoid enabling probes that generate too much data, causing drops:
# cat drop.d
entry
{
printf("%s %s %s\n", probeprov,
probemod, probefunc);
}
# ./drop.d > /tmp/drop.out
dtrace: script './drop.d' matched 19579 probes
^Cdtrace: 448991 drops on CPU 0
If you cause any run-time exceptions in your D scripts, such as
divide-by-zero, DTrace gives you run-time errors, but continues to run:
# cat test9.d
#!/usr/sbin/dtrace -s

BEGIN
{
x = 5*1024*1024;
}

tick-3sec
{
x = x/(`pagesize-8192);
}
# ./test9.d
dtrace: script './test9.d' matched 2 probes
dtrace: error on enabled probe ID 2 (ID 36402:
profile:::tick-3sec): divide-by-zero in action #1 at DIF
offset 20
dtrace: error on enabled probe ID 2 (ID 36402:
profile:::tick-3sec): divide-by-zero in action #1 at DIF
offset 20
^C

Appendix A
Actions and Subroutines
You have seen function calls used in D program examples. D function
calls allow you to invoke two kinds of services provided by DTrace:
Actions that trace data or modify state external to DTrace
Subroutines that only affect internal DTrace state
This appendix formally defines the set of actions and subroutines
available in DTrace, along with their syntax and semantics. This appendix
enables you to:
Describe the default action
Describe and use data recording actions
Describe and use destructive actions
Describe and use special actions
Describe and use subroutines
Default Action
A clause need not contain an action; it may instead consist simply of
manipulation of variable state, or of any combination of actions and
manipulations of variable state. If a clause contains no actions and no D
manipulation (that is, if a clause is empty), the default action is taken. The
default action is to trace the enabled probe identifier (EPID) to the
principal buffer.
The EPID identifies a particular enabling of a particular probe with a
particular predicate and actions. From the EPID, DTrace consumers can
determine which probe induced the action. Indeed, whenever data is
traced, it must be accompanied by the EPID to allow the consumer to
make sense of the data; hence the default action is to trace the EPID and
nothing else.
Using the default action allows for simple use of the dtrace(1M)
command. For example, you can enable all probes in the TS module with
the default action by using:
# dtrace -m TS
(The TS module implements the timesharing scheduling class; see
dispadmin(1M) for more information.) The above command results in
output similar to the following:
# dtrace -m TS
dtrace: description 'TS' matched 93 probes
0 14297 ts_preempt:entry
0 14298 ts_preempt:return
0 14301 ts_sleep:entry
0 14302 ts_sleep:return
0 14329 ts_update:entry
0 14331 ts_update_list:entry
0 14327 ts_change_priority:entry
0 14328 ts_change_priority:return
0 14332 ts_update_list:return
...

Data Recording Actions

Data recording actions are the core DTrace actions. Each of these
actions records data to the principal buffer by default, but each can also
record data to speculative buffers. The descriptions below refer to the
buffer where actions are being recorded as the directed buffer.
The void trace(expression) Action

The most basic action is the trace() action, which takes a D expression as
its argument and traces the result to the directed buffer. All of the
following are valid trace() actions:
trace(execname);
trace(curlwpsinfo->pr_pri);
trace(timestamp / 1000);
trace(`lbolt);
trace("somehow managed to get here");
The void tracemem(address, size_t nbytes) Action

A cousin to trace() is the tracemem() action, which takes a D expression
as its first argument, address, and a constant as its second argument,
nbytes. The tracemem() action copies the memory from the address
specified by address into the directed buffer for the length specified by
nbytes.
The void printf(string format, ...) Action

Like trace(), the printf() action traces D expressions, but printf()
allows for elaborate printf(3C)-style formatting. Like printf(3C), the
parameters consist of a format string followed by a variable number of
arguments. The following action traces a string and an integer argument
with appropriate labels:
printf("execname is %s; priority is %d", execname,
curlwpsinfo->pr_pri);
The printf() action tells DTrace to trace the data associated with each
argument after the first argument, and then to format the results using the
rules described by the first printf() argument, known as a format string.
The format string is a regular string that contains any number of format
conversions, each beginning with the % character, which describe how to
format the corresponding argument. The first conversion in the format
string corresponds to the second printf() argument, the second
conversion to the third argument, and so on. All of the text between
conversions is printed verbatim. The conversion specifier that ends each
conversion describes the format to use for the corresponding argument.
Unlike the printf(3C) library function, DTrace printf() is implemented as
a built-in function that is recognized by the D compiler. The D compiler
provides several useful services for the DTrace printf() action that are
not found in the C library printf():
The D compiler compares the arguments to the conversions in the
format string. If an argument's type is incompatible with the format
conversion, the D compiler produces an error message explaining
the problem.
The D compiler does not require the use of size prefixes with
printf() format conversions. The C printf() routine requires that
you indicate the size of arguments by adding prefixes, such as %ld
for long or %lld for long long. The D compiler knows the size and
type of your arguments, so these prefixes are not required in your D
printf() statements.
DTrace provides additional format characters that are useful for
debugging and observability; for example, the %a format conversion
can be used to print a pointer as a symbol name and offset.
In order to implement these features, the format string in the DTrace
printf() function must be specified as a string constant in your D
program; format strings cannot be dynamic variables of type string.
Conversion Specifications
Each conversion specification in the format string is introduced by the %
character, after which the following appear in sequence:
Zero or more flags (in any order), which modify the meaning of the
conversion specification as described in the following subsection.
An optional minimum field width. If the converted value has fewer
bytes than the field width, it is padded with spaces on the left by
default, or on the right if the left-adjustment flag (-) is specified. The
field width can also be specified as an asterisk (*), in which case the
field width is set dynamically based on the value of an additional
argument of type int.
An optional precision that provides one of the following:
The minimum number of digits to appear for the d, i, o, u, x,
and X conversions (the field is padded with leading zeroes)
The number of digits to appear after the radix character for the
e, E, and f conversions
The maximum number of significant digits for the g and G
conversions
The maximum number of bytes to be printed from a string by
the s conversion
The precision takes the form of a period (.) followed by either an
asterisk (*), as described in the Width and Precision Specifiers
subsection, or by a decimal digit string.
An optional sequence of size prefixes that indicate the size of the
corresponding argument (described in the Size Prefixes
subsection). The size prefixes are not necessary in D and are
provided solely for compatibility with the C printf() function.
A conversion specifier (described in the following subsection) that
indicates the type of conversion to be applied to the argument.
The printf(3C) function also supports conversion specifications of the
form %n$ where n is a decimal integer; DTrace printf() does not support
this type of conversion specification.
Flag Specifiers
You enable the printf() conversion flags by specifying one or more of the
following characters, which can appear in any order:
(') The integer portion of the result of a decimal conversion (%i,
%d, %u, %f, %g, or %G) is formatted with thousands grouping
characters using the non-monetary grouping character. Not all
locales, including the POSIX C locale, provide non-monetary
grouping characters for use with this flag.
(-) The result of the conversion is left-justified within the field. The
conversion will be right-justified if this flag is not specified.
(+) The result of signed conversion always begins with a sign (+ or
-). If this flag is not specified, the conversion begins with a sign only
when a negative value is converted.
(space) If the first character of a signed conversion is not a sign or
if a signed conversion results in no characters, a space is placed
before the result. If the space and + flags both appear, the space flag
is ignored.
(#) The value is converted to an alternate form if one is defined for
the selected conversion. The alternate formats for conversions are
described below in the text corresponding to each conversion.
(0) For d, i, o, u, x, X, e, E, f, g, and G conversions, leading zeroes
(following any indication of sign or base) are used to pad to the field
width; no space padding is performed. If the 0 and - flags both
appear, the 0 flag is ignored. For d, i, o, u, x, and X conversions, if a
precision is specified, the 0 flag is ignored. If the 0 and ' flags both
appear, the grouping characters are inserted before the zero padding.
Width and Precision Specifiers
You can specify the minimum field width as a decimal digit string
following any flag specifier, as described previously, in which case the
field width is set to the specified number of columns. You can also specify
the field width as an asterisk (*), in which case an additional argument of
type int is accessed to determine the field width. For example, to print an
integer x in a field width determined by the value of the int variable w,
you write this D statement:
printf("%*d", w, x);
Additionally, you can specify the field width using a ? character to
indicate that the field width should be set based on the number of
characters required to format an address in hexadecimal in the data model
of the operating system kernel. The width is set to 8 if the kernel is using
the 32-bit data model, or to 16 if the kernel is using the 64-bit data model.
The precision for the conversion can be specified as a decimal digit string
following a period (.) or by an asterisk (*) following a period. If an
asterisk is used to specify the precision, an additional argument of type
int prior to the conversion argument is accessed to determine the
precision. If both width and precision are specified as asterisks, the
arguments to printf() for that conversion must appear in the order:
width, precision, value.

Size Prefixes
Size prefixes are required in ANSI-C programs that use printf(3C) in
order to indicate the size and type of the conversion argument. The D
compiler performs this processing for your printf() calls automatically,
so size prefixes are not required.
Although size prefixes are provided for C compatibility, their use is
explicitly discouraged in D programs because they also tend to bind your
code to a particular data model when using derived types. For example, if
a typedef is redefined to different integer base types depending on the
data model, it is not possible to use a single C conversion that works in
both data models without explicitly knowing the two underlying types
and including a cast expression, or defining multiple format strings. The
D compiler solves this problem by allowing you to omit size prefixes and
automatically determining the argument size.
The size prefixes can be placed just before the format conversion name
and after any flags, widths, and precision specifiers. The size prefixes are:
Optional h specifies that a following d, i, o, u, x, or X conversion
applies to a short or unsigned short
Optional l specifies that a following d, i, o, u, x, or X conversion
applies to a long or unsigned long
Optional ll specifies that a following d, i, o, u, x, or X conversion
applies to a long long or unsigned long long
Optional L specifies that a following e, E, f, g, or G conversion
applies to a long double
Optional l specifies that a following c conversion applies to a
wint_t argument; an optional l specifies that a following s
conversion character applies to a pointer to a wchar_t argument
Conversion Formats
Each conversion character sequence results in fetching zero or more
arguments. If you do not provide sufficient arguments for the format
string, or if the format string is exhausted and arguments remain, the D
compiler issues an error message. If you specify an undefined conversion
format, the D compiler issues an error message. The conversion character
sequences and their meanings are:

a The pointer or uintptr_t argument is printed as a kernel
symbol name in the form module`symbol-name plus an optional
hexadecimal byte offset. If the value does not fall within the range
defined by a known kernel symbol, the value is printed as a
hexadecimal integer.
c The char, short, or int argument is printed as an ASCII
character.
d The char, short, int, long, or long long argument is printed
as a decimal (base 10) integer. If the argument is signed, it is printed
as a signed value. If the argument is unsigned, it is printed as an
unsigned value. This conversion has the same meaning as i.
e, E The float, double, or long double argument is converted to
the style [-]d.dddedd, where there is one digit before the radix
character (which is non-zero if the argument is non-zero) and the
number of digits after it is equal to the precision. If you do not
specify the precision, the default precision value is 6. If the precision
is 0 and the # flag is not specified, no radix character appears. The E
conversion format produces a number with E instead of e
introducing the exponent. The exponent always contains at least two
digits. The value is rounded up to the appropriate number of digits.
f The float, double, or long double argument is converted to
the style [-]ddd.ddd, where the number of digits after the radix
character is equal to the precision specification. If you do not specify
the precision, the default precision value is 6. If the precision is 0 and
the # flag is not specified, no radix character appears. If a radix
character appears, at least one digit appears before it. The value is
rounded up to the appropriate number of digits.
g, G The float, double, or long double argument is printed in the
style f or e (or in style E in the case of a G conversion character), with
the precision specifying the number of significant digits. If an explicit
precision is 0, it is taken as 1. The style used depends on the value
converted: style e (or E) is used only if the exponent resulting from
the conversion is less than -4 or greater than or equal to the
precision. Trailing zeroes are removed from the fractional part of the
result. A radix character appears only if it is followed by a digit. If
the # flag is specified, trailing zeroes are not removed from the result
as they normally are.
i The char, short, int, long, or long long argument is printed
as a decimal (base 10) integer. If the argument is signed, it is printed
as a signed value. If the argument is unsigned, it is printed as an
unsigned value. This conversion has the same meaning as d.

o The char, short, int, long, or long long argument is printed
as an unsigned octal (base 8) integer. Arguments that are signed or
unsigned can be used with this conversion. If the # flag is specified,
the precision of the result is increased if necessary to force the first
digit of the result to be a zero.
p The pointer or uintptr_t argument is printed as a hexadecimal
(base 16) integer. D accepts pointer arguments of any type. If the #
flag is specified, a non-zero result has 0x prepended to it.
s The argument must be an array of char or a string. Bytes from
the array or string are read up to a terminating null character or to
the end of the data and are interpreted and printed as ASCII
characters. If the precision is not specified, it is taken to be infinite, so
all characters up to the first null character are printed. If the
precision is specified, only that portion of the character array that
displays in the corresponding number of screen columns is printed.
If an argument of type char * is to be formatted, it should be cast to
string or prefixed with the D stringof operator to indicate that
DTrace should trace the bytes of the string and format them.
u The char, short, int, long, or long long argument is printed
as an unsigned decimal (base 10) integer. Arguments that are signed
or unsigned can be used with this conversion, and the result is
always formatted as unsigned.
wc The int argument is converted to a wide character (wchar_t)
and the resulting wide character is printed.
ws The argument must be an array of wchar_t. Bytes from the
array are read up to a terminating null character or to the end of the
data and are interpreted and printed as wide characters. If the
precision is not specified, it is taken to be infinite, so all wide
characters up to the first null character are printed. If the precision is
specified, only that portion of the wide character array that displays
in the corresponding number of screen columns is printed.
x, X The char, short, int, long, or long long argument is printed
as an unsigned hexadecimal (base 16) integer. Arguments that are
signed or unsigned can be used with this conversion. If the x form
of the conversion is used, the letter digits abcdef are used. If the X
form of the conversion is used, the letter digits ABCDEF are used. If
the # flag is specified, a non-zero result has 0x (for %x) or 0X (for %X)
prepended to it.
% Print a literal % character; no argument is converted. The entire
conversion specification must be %%.

The printa Action

There are two forms of the printa action:
void printa(aggregation)
void printa(string format, aggregation)
The printa() action is used to format the results of aggregations in a D
program. If the first form of the action is used, the dtrace(1M) command
takes a consistent snapshot of the aggregation data and produces output
equivalent to the default output format used for aggregations. If the
second form of the function is used, the dtrace(1M) command takes a
consistent snapshot of the aggregation data and produces output based on
the conversions specified in the format string, according to the rules
described in the following subsection.
Rules for Specifying Conversions in the format String
The rules for specifying conversions in the format string are as follows:
The format conversions must match the tuple signature used to
create the aggregation. Each tuple element can only appear once. For
example, suppose you aggregate a count using the following D
statements:
@a["hello", 123] = count();
@a["goodbye", 456] = count();
If you then add the D statement printa(format-string, @a) to a
probe clause, the dtrace utility snapshots the aggregation data and
produces output as if you had entered the statements for each tuple
defined in the aggregation, such as:
printf(format-string, "hello", 123);
printf(format-string, "goodbye", 456);
Unlike printf(), the format string you use for printa() need not
include all elements of the tuple (that is, you can have a tuple of
length 3 and only one format conversion). Therefore you can omit
any tuple keys from your printa() output by changing your
aggregation declaration to move the ones you want to omit to the
end of the tuple and then omitting corresponding conversion
specifiers for them from the printa() format string.
The aggregation result itself can be included in the output by using
the additional @ format flag character, which is only valid when used
with printa(). The @ flag can be combined with any appropriate
format conversion specifier, and can appear more than once in a
format string. This means that the aggregation result can appear
anywhere in the output and can appear more than once. The set of
conversion specifiers that can be used with each aggregating
function is implied by the aggregating function's result type, listed
below:
uint64_t avg()
uint64_t count()
int64_t lquantize()
uint64_t max()
uint64_t min()
int64_t quantize()
uint64_t sum()
For example, to format the results of avg(), you can apply the %d, %i,
%o, %u, or %x format conversions. The quantize() and lquantize()
functions format their results as an ASCII table rather than as a
single value.
Example of the printa() Action
The following D program shows a complete example of the printa()
action, using the profile provider to sample the value of caller and
then formatting the results as a simple table:
profile:::profile-997
{
@a[caller] = count();
}
END
{
printa("%@8u %a\n", @a);
}
If you use the dtrace command to execute this program, then wait a few
seconds and type Control-C, you see output similar to the following:
# dtrace -s printa.d
^C
CPU ID FUNCTION:NAME
1 2 :END 1 0x1
1 ohci`ohci_handle_root_hub_status_change+0x148
1 specfs`spec_write+0xe0
1 0xff14f950
1 genunix`cyclic_softint+0x588
1 0xfef2280c
1 genunix`getf+0xdc
1 ufs`ufs_icheck+0x50
1 genunix`infpollinfo+0x80
1 genunix`kmem_log_enter+0x1e8
...
The stack() Action

There are two forms of the stack() action:
void stack(int nframes)
void stack(void)
The stack() action records a kernel stack trace to the directed buffer. The
kernel stack is nframes in depth. If you do not provide nframes, the
number of stack frames recorded is the number specified by the
stackframes option. For example:
# dtrace -n 'uiomove:entry{stack()}'
0 12200 uiomove:entry
ufs`rdip+0x338
ufs`ufs_read+0x208
genunix`vn_rdwr+0x1c0
elfexec`getelfphdr+0xa4
elfexec`elf32exec+0x7a0
genunix`gexec+0x324
genunix`exec_common+0x278
genunix`exece+0xc
0 12200 uiomove:entry
ufs`ufs_readlink+0x11c
genunix`pn_getsymlink+0x40
genunix`lookuppnvp+0x414
genunix`lookuppnat+0x120
genunix`resolvepath+0x50
...
The stack() action differs from other actions in that it can also be used as
a key to an aggregation:
# dtrace -n 'kmem_alloc:entry {@[stack()] = count()}'
dtrace: description 'kmem_alloc:entry ' matched 1 probe
^C
genunix`installctx+0xc
genunix`schedctl+0x5c
unix`syscall_trap+0xac
1
genunix`schedctl_shared_alloc+0xc0
genunix`schedctl+0x18
1
unix`lgrp_shm_policy_set+0x168
genunix`segvn_create+0x82c
genunix`as_map+0xf0
genunix`schedctl_map+0x98
genunix`schedctl_shared_alloc+0x8c
genunix`schedctl+0x18
1
...
sd`xbuf_iostart+0x7c
ufs`log_roll_write_bufs+0x100
ufs`log_roll_write+0xe4
ufs`trans_roll+0x2f8
16
The ustack() Action

There are two forms of the ustack() action:
void ustack(int nframes)
void ustack(void)

The ustack() action records a user stack trace to the directed buffer. The
user stack is nframes in depth. If you do not specify nframes, the number
of stack frames recorded is the number specified by the ustackframes
option. Although ustack() can determine the address of the calling
frames when the probe fires, the stack frames are not translated into
symbols until the ustack() action is processed at user-level by the DTrace
consumer. Note that some functions are static and therefore do not have
entries in the symbol table; call sites in these functions are displayed with
their hexadecimal address. Also, because ustack() symbol translation
does not occur until after the data is recorded, there exists a possibility
that the process in question has exited, making stack frame translation
impossible. In this case, the dtrace utility emits a warning, followed by
the hexadecimal stack frames. For example:
dtrace: failed to grab process 100941: no such process
c7b834d4
c7bca95d
c7bca1a4
c7bd4374
c7bc2528
8047efc
Finally, because the postmortem DTrace debugger commands cannot
perform the frame translation, using ustack() with a ring buffer policy
always results in raw ustack() data.
Example of the ustack() Action
The following D program shows an example of the ustack() action:

syscall::brk:entry
/execname == $1/
{
@a[ustack(40)] = count();
}
# dtrace -s brk.d '"vi"'
dtrace: script 'brk.d' matched 1 probe
^C
libc.so.1`_brk_unlocked+0x4
libc.so.1`sbrk+0x24
vi`morelines+0x4
vi`append+0xc4
vi`vdoappend+0x2c
vi`fixzero+0x28
vi`ovbeg+0x30
vi`vop+0x158
vi`commands+0x13d0
vi`main+0xf24
vi`_start+0x108
1
...
libc.so.1`_brk_unlocked+0x4
libc.so.1`sbrk+0x24
vi`morelines+0x4
vi`append+0xc4
vi`put+0xe4
vi`vremote+0x64
vi`vmain+0x1670
vi`vop+0x25c
vi`commands+0x13d0
vi`main+0xf24
vi`_start+0x108
35

Destructive Actions
Some actions are destructive in that they change the state of the system.
Although they change the system in a well-defined way, they change it
nonetheless. You cannot use destructive actions unless you have explicitly
enabled them. In the dtrace(1M) command, you enable destructive
actions with the -w option. If you attempt to use destructive actions in the
dtrace(1M) command without explicitly enabling them, dtrace fails,
returning an error message similar to:
dtrace: could not enable tracing: Destructive actions
not allowed
Process Destructive Actions

Some destructive actions are destructive only to a process; the system
itself remains intact. These actions are available to those with the
dtrace_proc or dtrace_user privileges.
The void stop(void) Action
The stop() action forces the process that hit the enabled probe to stop
when it next leaves the kernel, as if stopped by a proc(4) action. You can
use the prun(1) utility to resume a process that has been stopped by the
stop() action. You can use the stop() action to stop a process at any
DTrace probe point; this allows you to capture a program in a very
particular state (which is difficult to achieve with a simple breakpoint).
You can then attach a traditional debugger (such as mdb(1)) to examine the
program's state, or use the gcore(1) utility to capture that state in a core
file for later analysis.
The void raise(int signal) Action
The raise() action sends the specified signal to the currently running
process. This is similar to using the kill(1) command to send a process a
signal; however, you can use the raise() action to send a signal at a
precise point in a process's execution.
The void copyout(void *buf, uintptr_t addr, size_t nbytes) Action
The copyout() action copies nbytes from the buffer specified by buf to
the address specified by addr in the address space of the process
associated with the current thread. If the user-space address does not
correspond to a valid, faulted-in page in the current address space, an
error is generated.
The void copyoutstr(string str, uintptr_t addr, size_t maxlen) Action
The copyoutstr() action copies the string specified by str to the address
specified by addr in the address space of the process associated with the
current thread. If the user-space address does not correspond to a valid,
faulted-in page in the current address space, an error is generated. The
string length is limited to the value set by the strsize option.
The void system(string program ...) Action
The system() action causes the program to be executed as if it were given
to a shell as input. The program string can contain any of the printf()
format conversions with corresponding arguments that follow.
Example of the system() Action
#pragma D option destructive

#pragma D option quiet
proc:::signal-send
/args[2] == SIGINT/
{
printf("SIGINT sent to %s by ", args[1]->pr_fname);
system("getent passwd %d | cut -d: -f5", uid);
}
# ./whosend.d
SIGINT sent to run-mozilla.sh by Mary Smith
^C

Kernel Destructive Actions

Some destructive actions are destructive to the entire system. These must
be used with extreme care, as they can affect any process on the system
(and any other systems dependent upon your network services).
The void breakpoint(void) Action
The breakpoint() action induces a kernel breakpoint, causing the system
to stop and control to transfer to the kernel debugger. The kernel
debugger then emits a string denoting the DTrace probe that triggered the
action. For example, suppose you performed the following action:
# dtrace -w -n 'clock:entry {breakpoint()}'
dtrace: description 'clock:entry' matched 1 probe
On the Solaris Operating System running on SPARC, you might see
the following on the console:
dtrace: breakpoint action at probe
fbt:genunix:clock:entry (ecb 30002765700)
Type go to resume
ok
On Solaris running on x86, you might see the following on the console:
dtrace: breakpoint action at probe
fbt:genunix:clock:entry (ecb d2b97060)
stopped at int20+0xb: ret
kadb [0]:
The address following the probe description is the address of the enabling
control block (ECB) within DTrace. You can use it to learn more details
about the probe enabling that induced the breakpoint action.
Note that a mistake with the breakpoint() action can cause it to be called
far more often than intended. This can in turn prevent you from even
terminating the DTrace consumer that is inducing the breakpoint actions.
If you find yourself in this situation, set the kernel integer variable
dtrace_destructive_disallow to 1. This disallows all destructive
actions on the machine. This setting should be used only if you find
yourself in this particular situation.

The exact method for setting dtrace_destructive_disallow depends
on the kernel debugger that you are using. If you are using OpenBoot
PROM on SPARC, follow these steps:
1. Use w! as follows:
ok 1 dtrace_destructive_disallow w!
ok
2. Confirm that this has been set using w?:
ok dtrace_destructive_disallow w?
1
ok
3. Continue by using go:
ok go
If you are using the kadb(1M) debugger on x86, follow these steps:
1. Use the 4-byte write modifier (W) with the / formatting dcmd:
kadb[0]: dtrace_destructive_disallow/W 1
dtrace_destructive_disallow: 0x0 = 0x1
kadb[0]:
2. Continue by entering :c:
kadb[0]: :c
If you wish to re-enable destructive actions after continuing, you must
explicitly reset dtrace_destructive_disallow back to 0. You do this
using the mdb(1) debugger:
# echo dtrace_destructive_disallow/W 0 | mdb -kw
dtrace_destructive_disallow: 0x1 = 0x0
#
The void panic(void) Action
The panic() action induces a kernel panic when triggered. Use this action
to force a system crash dump at a time of interest. The panic() action can
be used together with ring buffering and postmortem analysis to
understand a problem. When you use the panic() action, you see a panic
message that denotes the probe inducing the panic. For example:
panic[cpu0]/thread=30001830b80: dtrace: panic action at probe
syscall::mmap:entry (ecb 300000acfc8)
000002a10050b840 dtrace:dtrace_probe+518 (fffe, 0, 1830f88,
1830f88, 30002fb8040, 300000acfc8)
%l0-3: 0000000000000000 00000300030e4d80 0000030003418000 00000300018c0800
%l4-7: 000002a10050b980 0000000000000500 0000000000000000 0000000000000502
000002a10050ba30 genunix:dtrace_systrace_syscall32+44 (0,
2000, 5, 80000002, 3, 1898400)
%l0-3: 00000300030de730 0000000002200008 00000000000000e0 000000000184d928
%l4-7: 00000300030de000 0000000000000730 0000000000000073 0000000000000010
syncing file systems... 2 done
dumping to /dev/dsk/c0t0d0s1, offset 214827008, content: kernel
100% done: 11837 pages dumped, compression ratio 4.66, dump succeeded
rebooting...
In addition, syslogd(1M) emits a message upon reboot:
Jun 10 16:56:31 machine1 savecore: [ID 570001 auth.error]
reboot after panic:
dtrace: panic action at probe syscall::mmap:entry (ecb
300000acfc8)
The message buffer of the crash dump will also contain the probe and
ECB responsible for the panic() action.
The void chill(int nanoseconds) Action
The chill() action causes DTrace to spin for the specified number of
nanoseconds. This action is primarily useful for exploring problems that
might be timing related. For example, you can use it to open race
condition windows, or to bring periodic events into or out of phase with
one another.

Because interrupts are disabled while in DTrace probe context, any use of
the chill() action induces interrupt latency, scheduling latency, dispatch
latency, and so on. The chill() action can, therefore, cause strange
systemic effects, and should not be used indiscriminately. Moreover,
because the liveness of the system relies on being able to periodically
handle interrupts, DTrace refuses to implement the chill() action for
longer than 500 milliseconds within any given one-second interval, and
instead reports an illegal operation error:
# dtrace -w -n 'syscall::open:entry {chill(500000001)}'
dtrace: description 'syscall::open:entry ' matched 1 probe
syscall::open:entry): illegal operation in action #1
The cap is enforced even if the time is spread across multiple calls to
chill(), or if the time is spread across multiple DTrace consumers for a
single probe.
Special Actions
Some actions do not fall into either the data recording action or the
destructive action category. These other special actions fall into one of two
sets. The first set contains those actions associated with speculative tracing.
The second set contains the exit() action.
Actions Associated With Speculative Tracing

Three actions are associated with speculative tracing:
speculate(int id)
The speculate() action denotes that the remainder of the probe
clause should be traced to the speculative buffer specified by id.
commit(int id)
The commit() action commits the speculative buffer associated with
id.
discard(int id)
The discard() action discards the speculative buffer associated with
id.

The void exit(int status) Action

You use the exit() action to immediately stop tracing, and to inform the
DTrace consumer that it should cease tracing, perform any final
processing, and call exit(3C) with the status specified. Because exit()
does return a status to user-level, it is a data-storing action. Unlike other
data-storing actions, however, it cannot be speculatively traced. The
exit() action causes the DTrace consumer to exit regardless of buffer
policy. Note that the data-storing nature of the exit() action means that it
can be dropped.
When the exit() action is called, only DTrace actions already underway
on other CPUs are taken; no subsequent actions are taken on any CPU.
The only exception to this is the END probe, which is called after the
DTrace consumer has processed the exit() action and has indicated that
tracing should stop.
Subroutines
Subroutines differ from actions in that they generally only affect internal
DTrace state. There is therefore no such thing as a destructive subroutine,
and subroutines never trace data into buffers. Many subroutines have
analogs in Section 9F or Section 3C of the manual pages; see Intro(9F)
and Intro(3), respectively.
The void *alloca(size_t size) Subroutine

The alloca() subroutine allocates size bytes out of scratch space, and
returns a pointer to the allocated memory. The returned pointer is
guaranteed to have 8-byte alignment. Scratch space is only valid for the
duration of a clause; memory allocated with alloca() is deallocated when
the clause completes. If insufficient scratch space is available, no memory
is allocated and an error is generated.

The string basename(char *str) Subroutine

The basename() subroutine is a D analogue for basename(1); it creates a
string that consists of a copy of the specified string, but without any prefix
that ends in /. The returned string is allocated out of scratch memory, and
is therefore valid only for the duration of the clause. If insufficient scratch
space is available, basename aborts and an error is generated.
The void bcopy(void *src, void *dest, size_t size) Subroutine
The bcopy() subroutine copies the bytes specified by the size variable
from the memory pointed to by the src variable to the memory pointed
to by the dest variable. All of the source memory must lie outside of
scratch memory and all of the destination memory must lie within it; if
this is not the case, no copying takes place and an error is generated.
The string cleanpath(char *str) Subroutine

The cleanpath() subroutine creates a string that consists of a copy of the
path indicated by the str variable, but with certain redundant elements
eliminated. In particular, /./ elements in the path are removed, and
/../ elements are collapsed.
Note that the collapsing of /../ elements is naive in that the parent
component is collapsed without regard to symbolic links. As a result, the
cleanpath() subroutine might take a valid path and return a shorter,
invalid one. For example, if the path specified by str were
/foo/../bar, and /foo were a symbolic link to /net/foo/export,
then cleanpath() would return the string /bar even though bar might
only be in /net/foo, not in /. This limitation is due to the fact that
cleanpath() is called in the context of a firing probe, where full symbolic
link resolution or arbitrary names are not possible. The returned string is
allocated out of scratch memory, and is therefore valid only for the
duration of the clause. If insufficient scratch space is available, cleanpath
aborts and an error is generated.

The void *copyin(uintptr_t addr, size_t size) Subroutine
The copyin() subroutine copies the specified size in bytes from the
specified user address into a DTrace scratch buffer, and returns the
address of this buffer. The user address is interpreted as an address in the
space of the process associated with the current thread. The resulting
buffer pointer is guaranteed to have 8-byte alignment. The address in
question must correspond to a faulted-in page in the current process. If
the address does not correspond to a faulted-in page, or if insufficient
scratch space is available, NULL is returned, and an error is generated.
The string copyinstr(uintptr_t addr) Subroutine

The copyinstr() subroutine copies a null-terminated C string from the
specified user address into a DTrace scratch buffer, and returns the
address of this buffer. The user address is interpreted as an address in the
space of the process associated with the current thread. The string length
is limited to the value set by the strsize option. As with the copyin
subroutine, the specified address must correspond to a faulted-in page in
the current process. If the address does not correspond to a faulted-in
page, or if insufficient scratch space is available, NULL is returned, and an
error is generated.
The void copyinto(uintptr_t addr, size_t size, void *dest) Subroutine
The copyinto() subroutine copies the specified size in bytes from the
specified user address into the DTrace scratch buffer specified by the dest
variable. The user address is interpreted as an address in the space of the
process associated with the current thread. The address in question must
correspond to a faulted-in page in the current process. If the address does
not correspond to a faulted-in page, or if any of the destination memory
lies outside scratch space, no copying takes place, and an error is
generated.

The string dirname(char *str) Subroutine

The dirname() subroutine is a D analogue for dirname(1); it creates a
string that consists of all but the last level of the path name specified by
str. The returned string is allocated out of scratch memory, and is
therefore valid only for the duration of the clause. If insufficient scratch
space is available, dirname aborts and an error is generated.
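The path rules that dirname() borrows from dirname(1) can be checked with the POSIX dirname() function declared in <libgen.h>. In the C sketch below, parent_dir() and its fixed 1024-byte work buffer are hypothetical; the buffer-size checks stand in for D's scratch-space limit, and the private copy is needed because POSIX dirname() may modify its argument.

```c
#include <libgen.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: store all but the last level of path in out.
 * Returns 0 on success, or -1 if a buffer is too small (a stand-in for
 * D's "insufficient scratch space" error).
 */
int parent_dir(const char *path, char *out, size_t outlen)
{
    char work[1024];               /* private, writable copy for dirname() */
    size_t n = strlen(path);

    if (n >= sizeof(work))
        return -1;
    memcpy(work, path, n + 1);

    const char *dir = dirname(work);   /* e.g. "/usr/lib/libc.so.1" -> "/usr/lib" */
    size_t m = strlen(dir);

    if (m >= outlen)
        return -1;
    memcpy(out, dir, m + 1);
    return 0;
}
```

Under these rules "/usr/" yields "/", and a bare file name such as "notes.txt" yields ".".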
The size_t msgdsize(mblk_t *mp) Subroutine

The msgdsize() subroutine returns the number of bytes in the data
message pointed to by the mp variable. See msgdsize(9F) for details. Note
that msgdsize() only includes data blocks of type M_DATA in the count.
The size_t msgsize(mblk_t *mp) Subroutine

The msgsize() subroutine returns the number of bytes in the message
pointed to by the mp variable. Unlike the msgdsize() subroutine, which
returns only the number of data bytes, msgsize() returns the total number
of bytes in the message.
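The difference between msgdsize() and msgsize() is easiest to see on a message that mixes block types. The C sketch below uses a hypothetical, heavily simplified struct toy_mblk rather than the real mblk_t (which reaches the block type through b_datap->db_type); toy_msgdsize() and toy_msgsize() are illustrative names, and TOY_M_PROTO stands for any block type other than M_DATA.

```c
#include <stddef.h>

/* Hypothetical stand-ins for STREAMS block types; values are arbitrary. */
enum toy_type { TOY_M_DATA, TOY_M_PROTO };

/* Simplified message block: a type, a window of valid bytes, a link. */
struct toy_mblk {
    enum toy_type type;          /* block type */
    const unsigned char *rptr;   /* first valid byte */
    const unsigned char *wptr;   /* one past the last valid byte */
    struct toy_mblk *cont;       /* next block of the same message */
};

/* Like msgdsize(): count bytes in M_DATA blocks only. */
size_t toy_msgdsize(const struct toy_mblk *mp)
{
    size_t n = 0;

    for (; mp != NULL; mp = mp->cont)
        if (mp->type == TOY_M_DATA)
            n += (size_t)(mp->wptr - mp->rptr);
    return n;
}

/* Like msgsize(): count bytes in every block of the message. */
size_t toy_msgsize(const struct toy_mblk *mp)
{
    size_t n = 0;

    for (; mp != NULL; mp = mp->cont)
        n += (size_t)(mp->wptr - mp->rptr);
    return n;
}
```

For a message made of an 8-byte TOY_M_PROTO block followed by 5-byte and 3-byte TOY_M_DATA blocks, toy_msgdsize() counts 8 bytes and toy_msgsize() counts 16.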
The int mutex_owned(kmutex_t *mutex) Subroutine

The mutex_owned() subroutine is an implementation of the
mutex_owned(9F) function. The mutex_owned() subroutine returns non-zero
if the calling thread currently holds the specified kernel mutex, or
zero if the specified adaptive mutex is currently unowned.
The kthread_t *mutex_owner(kmutex_t *mutex) Subroutine
The mutex_owner() subroutine returns the thread pointer of the current
owner of the specified adaptive kernel mutex. The mutex_owner()
subroutine returns NULL if the specified adaptive mutex is currently
unowned, or if the specified mutex is a spin mutex. See mutex_owned(9F).

The int mutex_type_adaptive(kmutex_t *mutex) Subroutine
The mutex_type_adaptive() subroutine returns non-zero if the specified
kernel mutex is of type MUTEX_ADAPTIVE, or zero if it is not. Mutexes are
adaptive if they are:
Declared statically
Created with an interrupt block cookie of NULL, or
Created with an interrupt block cookie that does not correspond to a
high-level interrupt.
See mutex_init(9F) for more details on mutexes. The great majority of
mutexes in the Solaris kernel are adaptive.
The int progenyof(pid_t pid) Subroutine

The progenyof() subroutine returns non-zero if the calling process (the
process associated with the thread that is currently triggering the matched
probe) is among the progeny of the specified process ID.
The int rand(void) Subroutine

The rand() subroutine returns a pseudo-random integer. The number
returned is a weak pseudo-random number, and should not be used for
any cryptographic application.
The int rw_iswriter(krwlock_t *rwlock) Subroutine
The rw_iswriter() subroutine returns non-zero if the specified reader-
writer lock is either held or desired by a writer. If the lock is neither held
nor desired by any writers (that is, it is held only by readers and no writer
is blocked, or it is not held at all), rw_iswriter() returns zero. Refer to
rw_init(9F).

The int rw_write_held(krwlock_t *rwlock) Subroutine
The rw_write_held() subroutine returns non-zero if the specified reader-
writer lock is currently held by a writer. If the lock is held only by readers
or not held at all, rw_write_held() returns zero. See rw_init(9F).
The int speculation(void) Subroutine

The speculation() subroutine reserves a speculative trace buffer for use
with the speculate() action, and returns an identifier for this buffer.
The string strjoin(char *str1, char *str2) Subroutine
The strjoin() subroutine creates a string that consists of the str1
variable concatenated with the str2 variable. The returned string is
allocated out of scratch memory, and is therefore valid only for the
duration of the clause. If insufficient scratch space is available, strjoin
aborts and an error is generated.
The size_t strlen(string str) Subroutine

The strlen() subroutine returns the length of the specified string in bytes,
excluding the terminating null byte.
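strjoin() and strlen() can be mirrored in C to make the scratch-space failure case concrete. toy_strjoin() below is a hypothetical sketch: a caller-supplied buffer of cap bytes stands in for D scratch memory, and returning NULL stands in for the error D reports when the joined string does not fit.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical toy_strjoin(): write str1 followed by str2 into buf.
 * Returns buf, or NULL when the joined string plus its terminating
 * null byte would not fit in cap bytes.
 */
char *toy_strjoin(char *buf, size_t cap, const char *str1, const char *str2)
{
    size_t len1 = strlen(str1);   /* strlen() excludes the null byte */
    size_t len2 = strlen(str2);

    if (len1 + len2 + 1 > cap)
        return NULL;              /* "insufficient scratch space" */
    memcpy(buf, str1, len1);
    memcpy(buf + len1, str2, len2 + 1);   /* also copies str2's null byte */
    return buf;
}
```

Joining "/usr" and "/lib" needs 9 bytes (8 characters plus the null byte), so it succeeds with a 16-byte buffer and fails with an 8-byte one.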

Appendix B
D Built-in and Macro Variables
This appendix describes and lists:

Built-in variables provided by the D language
Macro variables provided by the D language
Built-in Variables
You have seen a number of special built-in D variables in the example
programs, including timestamp, pid, and others. All of these variables
are scalar global variables; D currently defines no built-in thread-local
variables, clause-local variables, or associative arrays. Table B-1
shows the complete list of D built-in variables.
Table B-1 DTrace Built-in Variables
Type and Name Description
int64_t arg0, ..., arg9 The first ten input arguments to a probe represented as raw
64-bit integers. If fewer than ten arguments are passed to
the current probe, the remaining variables return zero.
args[] The typed arguments to the current probe, if any. The
args[] array is accessed using an integer index, but each
element is defined to be the type corresponding to the
given probe argument. For example, if args[] is
referenced by a read(2) system call probe, args[0] is of
type int, args[1] is of type void *, and args[2] is of
type size_t.
uintptr_t caller The program counter location of the current thread just
before entering the current probe.
lwpsinfo_t *curlwpsinfo The lightweight process (LWP) state of the LWP associated
with the current thread. This structure is described in
further detail in proc(4).
psinfo_t *curpsinfo The process state of the process associated with the current
thread. This structure is described in further detail in
proc(4).
kthread_t *curthread The address of the operating system kernel's internal data
structure for the current thread, the kthread_t structure.
The kthread_t is defined in <sys/thread.h>.
string cwd The name of the current working directory of the process
associated with the current thread.
uint_t epid The enabled probe ID (EPID) for the current probe. This
integer uniquely identifies a particular probe that is
enabled with a specific predicate and set of actions.
int errno The error value returned by the last system call executed
by this thread.
string execname The name that was passed to exec(2) to execute the current
process.
uint_t id The probe ID for the current probe. This is the system-wide
unique identifier for the probe as published by DTrace and
listed in the output of dtrace -l.
uint_t ipl The interrupt priority level (IPL) on the current CPU at
probe firing time.
pid_t pid The process ID of the current process.
string probefunc The function name portion of the current probe's
description.
string probemod The module name portion of the current probe's
description.
string probename The name portion of the current probe's description.
string probeprov The provider name portion of the current probe's
description.
string root The name of the root directory of the process associated
with the current thread.
uint_t stackdepth The current thread's stack frame depth at probe firing time.
id_t tid The thread ID of the current thread. For threads associated
with user processes, this value is equal to the result of a call
to pthread_self(3C).
uint64_t timestamp The current value of a nanosecond timestamp counter. This
counter increments from an arbitrary point in the past and
should only be used for relative time computations.
uint64_t uregs[] The current thread's saved user-mode register values at
probe firing time.
uint64_t vtimestamp The current value of a nanosecond timestamp counter that
is virtualized to the amount of time that the current thread
has been running on a CPU, minus the time spent in
DTrace predicates and actions. This counter increments
from an arbitrary point in the past and should only be used
for relative time computations.

Macro Variables
The D compiler defines a set of built-in macro variables that you can use
when writing D programs or interpreter files. Macro variables are
identifiers that are prefixed with a dollar sign ($) and are expanded once
by the D compiler when processing your input file. Table B-2 shows the
complete list of D macro variables.
Table B-2 D Macro Variables
Name      Description          Reference
$[0-9]+   Macro arguments      See Module 2, Built-in Macro Variables
$egid     Effective group ID   getegid(2)
$euid     Effective user ID    geteuid(2)
$gid      Real group ID        getgid(2)
$pid      Process ID           getpid(2)
$pgid     Process group ID     getpgid(2)
$ppid     Parent process ID    getppid(2)
$projid   Project ID           getprojid(2)
$sid      Session ID           getsid(2)
$taskid   Task ID              gettaskid(2)
$uid      Real user ID         getuid(2)

Appendix C
D Operators
This appendix defines and describes the following D operators:

Arithmetic operators
Relational operators
Logical operators
Bitwise operators
Assignment operators
Increment and decrement operators
This appendix also describes conditional expressions.
Arithmetic Operators
D provides the standard arithmetic operators for use in your programs.
These operators all have the same meaning as they do in ANSI-C for
integer operands. Table C-1 shows the D binary arithmetic operators.
Table C-1 D Binary Arithmetic Operators
Operator Meaning
+ Integer addition
- Integer subtraction
* Integer multiplication
/ Integer division
% Integer modulus
Arithmetic in D can only be performed on integer operands or on
pointers. Arithmetic cannot be performed on floating-point operands in D
programs. The DTrace execution environment does not take any action on
integer overflow or underflow; you must check for these conditions
yourself in situations where they are applicable.
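Because DTrace ignores integer overflow, a calculation that might overflow has to be guarded explicitly. The C sketch below shows the usual pre-check for a 64-bit signed addition; add_checked() is a hypothetical helper, and in a D program the same comparisons would typically appear in a predicate rather than in an if statement.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a + b in *sum and return true, or return
 * false (leaving *sum untouched) if the sum would not fit in int64_t.
 * The test is done before adding, so no overflow ever occurs.
 */
bool add_checked(int64_t a, int64_t b, int64_t *sum)
{
    if ((b > 0 && a > INT64_MAX - b) ||   /* would exceed the maximum */
        (b < 0 && a < INT64_MIN - b))     /* would go below the minimum */
        return false;
    *sum = a + b;
    return true;
}
```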
The DTrace execution environment does automatically check for and
report division-by-zero errors resulting from improper use of the / and %
operators. If a D program executes an invalid division operation, DTrace
automatically disables the affected instrumentation and reports the error
to you. Errors detected by DTrace have no effect on other DTrace users or
on the operating system kernel, so you do not need to worry about
causing any damage if your D program inadvertently contains one of
these errors.
In addition to these binary operators, the + and - operators can also be
used as unary operators; these have higher precedence than any of the
binary arithmetic operators. The order of precedence and associativity
properties for all the D operators is summarized at the end of this
appendix. You can control precedence by grouping expressions in
parentheses ( ).

Relational Operators
D provides binary relational operators for use in your programs. These
operators all have the same meaning as they do in ANSI-C. Table C-2
shows the D relational operators.
Table C-2 D Relational Operators
Operator Meaning
< Left-hand operand is less than right-hand operand
<= Left-hand operand is less than or equal to right-hand
operand
> Left-hand operand is greater than right-hand operand
>= Left-hand operand is greater than or equal to right-hand
operand
== Left-hand operand is equal to right-hand operand
!= Left-hand operand is not equal to right-hand operand
Relational operators are most frequently used to write D predicates. Each
operator evaluates to a value of type int, which is equal to 1 if the
condition is true, and 0 if it is false.
Relational operators can be applied to pairs of integers, pointers, or
strings. If pointers are compared, the result is equivalent to an integer
comparison of the two pointers interpreted as unsigned integers. If strings
are compared, the result is determined as if by performing a strcmp(3C)
on the two operands. Here are some example D string comparisons and
their results:
"coffee" < "espresso" ... returns 1 (true)
"coffee" == "coffee" ... returns 1 (true)
"coffee" >= "mocha" ... returns 0 (false)
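Since D compares strings as if by strcmp(3C), the three examples above can be reproduced in C. The helpers d_str_lt(), d_str_eq(), and d_str_ge() are hypothetical names; each returns the int 1 or 0 that the corresponding D operator would produce.

```c
#include <string.h>

/* String "<": true when a sorts before b, as if by strcmp(3C). */
int d_str_lt(const char *a, const char *b) { return strcmp(a, b) < 0; }

/* String "==": true when both strings hold the same characters. */
int d_str_eq(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* String ">=": true when a sorts after b or equals it. */
int d_str_ge(const char *a, const char *b) { return strcmp(a, b) >= 0; }
```

In C the relational operators already yield 1 or 0, so each helper returns exactly the value listed in the examples.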
Relational operators can also be used to compare a data object associated
with an enumeration type with any of the enumerator tags defined by the
enumeration. Enumerations are a facility for creating named integer
constants.
Logical Operators
D provides binary logical operators for use in your programs. Table C-3
shows the D logical operators. The first two are equivalent to the
corresponding ANSI-C operators.
Table C-3 D Logical Operators
Operator Meaning
&& Logical AND: true if both operands are true
|| Logical OR: true if one or both operands are true
^^ Logical XOR: true if exactly one operand is true
Logical operators are most frequently used in writing D predicates. The
logical AND operator performs short-circuit evaluation: if the left-hand
operand is false, the right-hand expression is not evaluated. The logical
OR operator also performs short-circuit evaluation: if the left-hand
operand is true, the right-hand expression is not evaluated. The logical
XOR operator does not short-circuit: both expression operands are always
evaluated.
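Short-circuit evaluation is observable when the right-hand operand has a side effect. The C sketch below counts calls to a hypothetical rhs() function. Because C has no ^^ operator, logical_xor() is an assumed stand-in that, like D's ^^, always evaluates both operands (C evaluates every function argument before the call).

```c
/* Counts how many times rhs() has been evaluated. */
int rhs_calls = 0;

/* Hypothetical right-hand operand with a visible side effect. */
int rhs(int value)
{
    rhs_calls++;
    return value;
}

/*
 * Stand-in for D's ^^: both operands are already evaluated when the
 * function runs; the result is 1 when exactly one of them is non-zero.
 */
int logical_xor(int a, int b)
{
    return !a != !b;
}
```

With these definitions, 0 && rhs(1) and 1 || rhs(0) leave rhs_calls unchanged, while logical_xor(1, rhs(0)) increments it.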
In addition to the binary logical operators, the unary ! operator can be
used to perform a logical negation of a single operand: it converts a zero
operand into a 1 and a non-zero operand into a 0. By convention, D
programmers use ! when working with integers that are meant to
represent Boolean values and == 0 when working with non-Boolean
integers, although both expressions are equivalent in meaning.
The logical operators can be applied to operands of integer type or
pointer type. The logical operators interpret pointer operands as unsigned
integer values. As with all logical and relational operators in D, operands
are true if they have a non-zero integer value and false if they have a zero
integer value.

Bitwise Operators
D provides binary operators for manipulating individual bits inside of
integer operands. These operators all have the same meaning as they do
in ANSI-C. Table C-4 shows the D bitwise operators.
Table C-4 D Bitwise Operators
Operator Meaning
& Bitwise AND
| Bitwise OR
^ Bitwise XOR
<< Shift the left-hand operand left by the number of bits
specified by the right-hand operand
>> Shift the left-hand operand right by the number of bits
specified by the right-hand operand
You use the binary & operator to clear bits from an integer operand. You
use the binary | operator to set bits in an integer operand. The binary ^
operator returns 1 in each bit position where exactly one of the
corresponding operand bits is set.
You use the shift operators to move bits left or right in a given integer
operand. Shifting left fills empty bit positions on the right-hand side of
the result with zeroes. Shifting right using an unsigned integer operand
fills empty bit positions on the left-hand side of the result with zeroes.
Shifting right using a signed integer operand (an action known as an
arithmetic shift operation) fills empty bit positions on the left-hand side
with the value of the sign bit.
Shifting an integer value by a negative number of bits or by a number of
bits larger than the number of bits in the left-hand operand itself produces
an undefined result. The D compiler produces an error message if it
detects this condition when you compile your D program.
In addition to the binary bitwise operators, you can use the unary ~
operator to perform a bitwise negation of a single operand: it converts
each 0 bit in the operand into a 1 bit, and each 1 bit in the operand into a
0 bit.
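The set, clear, and shift idioms described above look the same in C as in D. The flag names and helper functions in this sketch are hypothetical; the shifts use an unsigned type so that both directions fill vacated bits with zeroes. One difference to keep in mind: C leaves a right shift of a negative signed value implementation-defined, whereas D specifies an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical flag bits used only for this illustration. */
#define FLAG_READ  (UINT32_C(1) << 0)
#define FLAG_WRITE (UINT32_C(1) << 1)
#define FLAG_EXEC  (UINT32_C(1) << 2)

uint32_t set_bits(uint32_t v, uint32_t mask)    { return v | mask; }   /* | sets bits          */
uint32_t clear_bits(uint32_t v, uint32_t mask)  { return v & ~mask; }  /* & with ~mask clears  */
uint32_t toggle_bits(uint32_t v, uint32_t mask) { return v ^ mask; }   /* ^ flips bits         */

/* Unsigned shifts fill vacated positions with zeroes; callers keep n < 32. */
uint32_t shift_left(uint32_t v, unsigned n)  { return v << n; }
uint32_t shift_right(uint32_t v, unsigned n) { return v >> n; }
```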
Assignment Operators
D provides the following binary assignment operators for modifying D
variables. Remember that you can only modify D variables and arrays:
kernel data objects and constants cannot be modified using the D
assignment operators. The assignment operators have the same meaning
as they do in ANSI-C. Table C-5 shows the D assignment operators.
Table C-5 D Assignment Operators
Operator Meaning
= Set the left-hand operand equal to the right-hand expression
value
+= Increment the left-hand operand by the right-hand
expression value
-= Decrement the left-hand operand by the right-hand
expression value
*= Multiply the left-hand operand by the right-hand expression
value
/= Divide the left-hand operand by the right-hand expression
value
%= Modulo the left-hand operand by the right-hand expression
value
|= Bitwise OR the left-hand operand with the right-hand
expression value
&= Bitwise AND the left-hand operand with the right-hand
expression value
^= Bitwise XOR the left-hand operand with the right-hand
expression value
<<= Shift the left-hand operand left by the number of bits
specified by the right-hand expression value
>>= Shift the left-hand operand right by the number of bits
specified by the right-hand expression value

With the exception of the assignment operator =, the assignment operators
are provided as short-hand for using the operator with one of the other
operators described previously. For example, the expression x = x + 1 is
equivalent to the expression x += 1, except that the expression x is
evaluated once. These assignment operators obey the same rules for
operand types as the binary forms described previously.
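The "evaluated once" rule matters when the left-hand expression has a side effect. In this C sketch, the hypothetical next_index() counts its own calls, so the compound assignment in add_to_next() (also hypothetical) can be seen to evaluate its subscript only once and to yield the new value.

```c
/* Counts how many times next_index() has been called. */
int next_index_calls = 0;

/* Hypothetical helper: returns 0, 1, 2, ... on successive calls. */
int next_index(void)
{
    return next_index_calls++;
}

/*
 * a[next_index()] += amount evaluates the subscript once. Writing
 * a[next_index()] = a[next_index()] + amount instead would call
 * next_index() twice and touch two different elements.
 */
int add_to_next(int a[], int amount)
{
    return a[next_index()] += amount;   /* value of the expression is the new element value */
}
```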
The result of any assignment operator is an expression equal to the new
value of the left-hand expression. You can use the assignment operators,
or any of the operators described so far, in combination to form
expressions of arbitrary complexity. You can use parentheses ( ) to group
terms in complex expressions.
Increment and Decrement Operators

D provides the special unary ++ and -- operators for incrementing and
decrementing pointers and integers. These operators have the same
meaning as they do in ANSI-C. They can only be applied to variables, and
can be applied either before or after the variable name. If the operator
appears before the variable name, the variable is first modified and the
resulting expression is equal to the new value of the variable. For
example, the following two expressions produce identical results:
x += 1;
y = x;
and
y = ++x;
If the operator appears after the variable name, the variable is modified
after its current value is returned for use in the expression. For example,
the following two expressions produce identical results:
y = x;
x -= 1;
and
y = x--;
You can use the increment and decrement operators to create new
variables without declaring them. If you omit a variable declaration and
apply the increment or decrement operator to a variable, the variable is
implicitly declared to be of type int64_t.
You can apply the increment and decrement operators to integer or
pointer variables. When applied to integer variables, the operators
increment or decrement the corresponding value by one. When applied to
pointer variables, the operators increment or decrement the pointer
address by the size of the data type referenced by the pointer.
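The prefix and postfix equivalences above, and the pointer case, can be checked in C, whose behavior D follows here. pre_increment(), post_decrement(), and pointer_step_bytes() are hypothetical helpers; note that C, unlike D, requires a variable to be declared before ++ or -- is applied to it.

```c
#include <stddef.h>

/* Prefix form: modify first, then yield the new value (y = ++x). */
long pre_increment(long *x)
{
    return ++*x;
}

/* Postfix form: yield the old value, then modify (y = x--). */
long post_decrement(long *x)
{
    return (*x)--;
}

/* Pointer ++ advances by sizeof(*p) bytes, not by a single byte. */
ptrdiff_t pointer_step_bytes(void)
{
    int arr[4] = {0};
    int *p = arr;

    p++;
    return (char *)p - (char *)arr;   /* equals sizeof(int) */
}
```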

Conditional Expressions
Although D does not provide support for if-then-else constructs, it does
provide support for simple conditional expressions using the ? and :
operators. These operators permit a triplet of expressions to be associated
where the first expression is used to conditionally evaluate one of the
other two. For example, the following D statement can be used to set a
variable x to one of two strings, depending on the value of i:
x = i == 0 ? "zero" : "non-zero";
In this example, the expression i == 0 is first evaluated to determine if it
is true or false. If the first expression is true, the second expression is
evaluated and the ?: expression returns its value. If the first expression is
false, the third expression is evaluated and the ?: expression returns its
value.
As with any D operator, you can use multiple ?: operators in a single
expression to create more complex expressions. For example, the
following expression takes a char variable c containing one of the
characters 0-9, a-f, or A-F and returns the value of this character when
interpreted as a digit in a hexadecimal (base 16) integer:
hexval = (c >= '0' && c <= '9') ? c - '0' :
(c >= 'a' && c <= 'f') ? c + 10 - 'a' : c + 10 - 'A';
The first expression used with ?: must be a pointer or integer in order to
be evaluated for its truth value. The second and third expressions can be
of any compatible types. You cannot construct a conditional expression in
which, for example, one path returns a string and another an integer. The
second and third expressions also cannot invoke a tracing function, such
as trace() or printf(). If you want to trace data conditionally, you should
use a predicate instead.
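The hexval expression carries over unchanged to C. The sketch below wraps it, and the earlier zero/non-zero example, in hypothetical functions hexval() and zero_or_not(). Like the D expression, hexval() assumes its argument is a valid hexadecimal digit and does not reject other characters.

```c
/* Value of a hexadecimal digit, using the same chained ?: as the D example. */
int hexval(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' :
           (c >= 'a' && c <= 'f') ? c + 10 - 'a' :
           c + 10 - 'A';   /* assumes c is 'A'..'F' at this point */
}

/* Both branches have the same type (a string), as ?: requires. */
const char *zero_or_not(int i)
{
    return i == 0 ? "zero" : "non-zero";
}
```

For example, hexval('7') is 7, hexval('a') is 10, and hexval('F') is 15.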