CLR Inside Out


Measure Early and Often for Performance, Part 1
Vance Morrison

Code download available at: CLRInsideOut2008_04.exe (205KB)

Contents
Have a Plan
Profile Data Processing Example
Measure Early and Measure Often
The Logistics of Measuring
Validating Your Performance Results
Validating Micro-Benchmark Data with a Debugger
Wrapping Up

As a Performance Architect on the Microsoft .NET Framework Common Language Runtime team, it is my job to help people best utilize the runtime to write high-performance applications. The truth of the matter is that there is no mystery to this, .NET or otherwise: you just have to design applications for performance from the start. Too many applications are written with almost no thought given to performance at all. Often that's not a problem because most programs do relatively little computation, and they are much faster than the humans with which they interact. Unfortunately, when the need for high performance does present itself, we simply don't have the knowledge, skills, and tools to do a good job.

Here I'll discuss what you need to write high-performance applications. While the concepts are
universal, I'll focus here on programs written for .NET. Because .NET abstracts the underlying machine
more than a typical C++ compiler, and because .NET provides powerful but expensive features
including reflection, custom attributes, regular expressions, and so forth, it is much easier to unwittingly
inject expensive operations into a performance-critical code path. To help you avoid that expense, I'll
show how to quantify the expense of various .NET features so you know when it's appropriate to use
them.

Have a Plan

As I mentioned, most programs are written without much thought given to performance, but every
project should have a performance plan. You must consider various user scenarios and articulate what
excellent, good, and bad performance actually mean. Then, based on data volume, algorithmic
complexity, and any previous experience building similar applications, you must decide if you can easily
meet whatever performance goals you've defined. For many GUI applications, the performance goals are
modest, so it's easy to achieve at least good performance without any special design. If this is the case,
your performance plan is done.

If you don't know if you can easily meet your performance goals, you'll need to begin writing a plan by
listing the areas likely to be bottlenecks. Typical problem areas include startup time, bulk data
operations, and graphics animations.

Profile Data Processing Example

An example will make this more concrete. I'm currently designing the .NET infrastructure for processing
profile data. I need to present a list of events (page faults, disk I/O, context switches, and so on)
generated by the OS in a meaningful way. The data files involved tend to be large; small profiles are in
the neighborhood of 10MB and file sizes well over 1GB are not unusual.

While working on forming my performance plan, I concluded that the display of the data would not be
problematic if I computed only the parts of the dataset that were needed to paint the display; in other
words, if the display was "lazy." Unfortunately, it takes extra work to make GUI objects like tree controls,
list controls, and textboxes lazy. This is why most text editors have unacceptable performance when file
sizes get too large (say, 100MB). If I had designed the GUI without thinking about
performance, the result would have almost certainly been unacceptable.

Laziness, however, does not help for operations that need to use all the data in the file (when computing
summaries, for example). Given the dataset size, the data dispatch and processing methods are "hot"
code paths that must be designed carefully. Most of the rest of the program is unlikely to be
performance-critical and needs no special attention.

This experience is typical. Even in high-performance scenarios, 95 percent of the application does not
need any performance planning, but you need to carefully identify the last 5 percent that does. Also, as
in my case, it is usually pretty easy to determine which parts of the program are likely to be that 5
percent that matters.

Measure Early and Measure Often

The next step in high-performance design is to measure. Before writing a line of code, you need to
know whether your performance goals are even possible, and if so, what constraints they place on the
design. In my case, I need to know the costs of basic operations being considered in my design, such as
raw file I/O and database access. To proceed, I need some numbers. This is the most critical time in the
design of the project.

Sadly, most performance is lost very early in the development process. By the time you have chosen the
data structures for the heart of your program, the application's performance profile has been set in stone.
Choosing your algorithms further limits performance. Selecting interface contracts between various sub-
components constrains performance still further. It is critical that you understand the costs of each of
these early design decisions and make wise ones.

Design is an iterative process. It is best to start with the cleanest, simplest, most obvious choice, work up a sketch of the design (I actually recommend a prototype of the hot code), and evaluate its performance. You should also think about what the design would look like if you were to make
performance the only factor, and then estimate how fast that application would be. Now the fun
engineering begins! You start tinkering with the design and thinking about alternatives between these
two extremes, looking for designs that give you the best result.

Again, my experience with the profile data processor is instructive. Like most projects, my choice of
data representation was critical. Should the data be in memory? Should it be streamed from a file?
Should it be in a database? The standard solution is that any large dataset should be stored in a database;
however, databases are optimized for relatively slow change, not for having large volumes of data
changing frequently. My application would be dumping many gigabytes of data into the database
routinely. Could the database handle this? With just a bit of measurement and analysis of database
operations, it was easy to confirm that databases did not have the performance profile I needed.

After some more measurements on how much memory an application can use before inducing excessive
page faulting, I ruled out the in-memory solution as well. That left streaming data from a file for my
basic data representation.
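To make the shape of that decision concrete, here is a minimal sketch of what a streaming representation might look like: events are read lazily from the file one at a time, so only a single event is ever in memory. The TraceEvent struct and the file layout here are hypothetical placeholders invented for illustration, not the actual profile format.

using System.Collections.Generic;
using System.IO;

// Hypothetical event record; real profile events carry much more data.
struct TraceEvent
{
    public long TimeStamp;
    public int EventId;
}

static class EventStream
{
    // Lazily stream events from a file so that gigabyte-sized traces
    // never have to fit in memory at once.
    public static IEnumerable<TraceEvent> Read(string path)
    {
        using (BinaryReader reader = new BinaryReader(File.OpenRead(path)))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                yield return new TraceEvent
                {
                    TimeStamp = reader.ReadInt64(),
                    EventId = reader.ReadInt32(),
                };
            }
        }
    }
}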

There were still many other design decisions to be made, however. The basic form of the profile data is
a list of heterogeneous events. But what should events look like? Are they strings (which are nicely
uniform)? Are they C# structs or objects?

If they are objects, the obvious solution is to make one allocation per event, which is a lot of allocations.
Is that acceptable? How exactly does dispatch work as I iterate over the events? Is it a callback model or
an iteration model? Does dispatch work through interfaces, delegates, or reflection? There were dozens
of design decisions to be made, and they all would have impact on the ultimate performance of the
program, so I needed to take measurements to understand the tradeoffs.
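To illustrate just two of those alternatives, here is a rough sketch (reusing the hypothetical TraceEvent struct from the earlier example) of a callback-style dispatch through a C# event versus a pull-style iteration through an interface. These types are invented for illustration; they are not the design that was ultimately chosen.

using System;
using System.Collections.Generic;

// Alternative 1: callback model -- the source pushes each event to subscribers
// through a C# event (delegate dispatch).
class CallbackEventSource
{
    public event Action<TraceEvent> EventArrived;

    public void Dispatch(IEnumerable<TraceEvent> events)
    {
        foreach (TraceEvent e in events)
            if (EventArrived != null)
                EventArrived(e);
    }
}

// Alternative 2: iteration model -- the consumer pulls events through an
// interface and processes them itself (for example, in a foreach loop).
interface IEventSource
{
    IEnumerable<TraceEvent> GetEvents();
}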

The Logistics of Measuring

Clearly you will be doing a lot of measuring during design. So exactly how do you do that? There are
many profiling tools that can help, but one general-purpose technique that's also the simplest and most
available is micro-benchmarking. The technique is simple: when you want to know how much a
particular operation costs, you simply set up an example of its use and directly measure how much time
the operation takes.

The .NET Framework has a high-resolution timer called System.Diagnostics.Stopwatch that was
designed specifically for this purpose. The resolution varies with your hardware, but it is typically finer than 1 microsecond, which is more than adequate. Since this timer comes with the .NET Framework, you already have the functionality you need.
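In its simplest form, a measurement is just a few lines. The DoOperation method below is a placeholder for whatever code you want to time:

using System;
using System.Diagnostics;

class TimeIt
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();    // start the high-resolution timer
        DoOperation();                          // the code under test (placeholder)
        sw.Stop();
        Console.WriteLine("Elapsed: {0:f3} usec", sw.Elapsed.TotalMilliseconds * 1000.0);
    }

    static void DoOperation() { /* operation being measured */ }
}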

While Stopwatch is a great start, a good benchmark harness should do more. Small operations should be
placed in loops to make the interval long enough to measure accurately. The benchmark should be run
once before taking a measurement to ensure that any just-in-time (JIT) compilation and other one-time
initialization has completed (unless, of course, the goal is to measure that initialization). Since
measurements are noisy, the benchmark should be run several times and statistics should be gathered to
determine the stability of the measurement. It should also be easy to run many benchmarks (design
variations) in bulk and get a report that displays all the results for comparison.
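The sketch below shows one way those ideas might fit together: a warm-up call, a loop to lengthen the measured interval, several repeated samples, and simple statistics. It is only an illustration of the technique; because it passes the operation as a delegate, delegate-invocation overhead is included in every sample, something a real harness takes care to factor out.

using System;
using System.Diagnostics;
using System.Linq;

static class MicroBenchmark
{
    // Run 'operation' in a loop 'iterations' times per sample, take 'samples'
    // samples, and report min, median, and max microseconds per call.
    public static void Measure(string name, Action operation, int iterations, int samples)
    {
        operation();    // warm up: force JIT compilation and one-time initialization

        double[] usecPerCall = new double[samples];
        for (int s = 0; s < samples; s++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                operation();
            sw.Stop();
            usecPerCall[s] = sw.Elapsed.TotalMilliseconds * 1000.0 / iterations;
        }

        double[] sorted = usecPerCall.OrderBy(t => t).ToArray();
        Console.WriteLine("{0}: min={1:f4} median={2:f4} max={3:f4} usec/call",
            name, sorted[0], sorted[samples / 2], sorted[samples - 1]);
    }
}

A call such as MicroBenchmark.Measure("empty delegate", delegate { }, 10000, 10) then gives a quick, if rough, number for a candidate operation.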

I have written a benchmark harness called MeasureIt.exe which builds upon the Stopwatch class and
addresses these goals. It is available with the code download for this column on the MSDN Magazine
Web site. After unpacking, simply type the following to run it:

MeasureIt

Within seconds it will run a set of more than 50 standard benchmarks and display the results as a Web
page. An excerpt of the data is shown in Figure 1. In these results, each measurement performs an
operation 10,000 times (the operation is cloned 10 times in a loop executed 1000 times). Each
measurement is then performed 10 times and standard statistics (min, max, median, mean, standard
deviation) are computed.

To make the time measurements more meaningful, they are normalized so that the median time for
calling (and returning) from an empty static function is one unit. It is not uncommon for benchmarks to
have widely varying times, which is why all the statistical information is important. Note the large
difference between the minimum (71.299) and maximum (953.864) times for the FinalizableClass
benchmark. This variation needs to be explained before you can trust the data from that benchmark. In
that particular case, it is the result of the runtime periodically executing slower code paths to allocate
bookkeeping data structures in bulk. Already, having these statistics available is proving useful in
validating the data.

This table is a gold mine of useful performance data, detailing the costs of most of the primitive
operations used by .NET-targeted code. I will go into detail in my next installment of this column, but
here I want to explain an important feature of MeasureIt: it comes with its own source code. To unpack
MeasureIt's source code and launch Visual Studio to browse it (if Visual Studio is available), type this:

MeasureIt /edit

Having the source means that you can quickly understand exactly what the benchmark is measuring. It
also means that you can easily add a new benchmark to the suite.

Again, my experience with the profile data processor is instructive. At one point in the design, I could
do a certain common operation with either C# events, delegates, virtual methods, or interfaces. To make
a decision, I needed to understand the performance tradeoff among these choices. Within minutes I had
written the micro-benchmark to measure the performance of each of the alternatives. Figure 2 displays
the relevant rows, and you can see that there is no substantial difference between the alternatives. This knowledge allowed me to choose the most natural alternative, knowing I was not sacrificing performance to do so.
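As a rough illustration of what such a micro-benchmark can look like, the sketch below wires each mechanism to the same empty method and times it, reusing the hypothetical MicroBenchmark helper shown earlier (the types here are invented, not MeasureIt's own benchmarks):

using System;

interface IWork { void DoWork(); }

class Worker : IWork
{
    public virtual void DoWork() { }    // serves as both the virtual and interface target
}

class CallComparison
{
    static event Action SomethingHappened;    // C# event (delegate-backed)

    static void Main()
    {
        Worker w = new Worker();
        IWork iface = w;
        Action del = w.DoWork;
        SomethingHappened += w.DoWork;

        MicroBenchmark.Measure("virtual call",   delegate { w.DoWork(); }, 10000, 10);
        MicroBenchmark.Measure("interface call", delegate { iface.DoWork(); }, 10000, 10);
        MicroBenchmark.Measure("delegate call",  delegate { del(); }, 10000, 10);
        MicroBenchmark.Measure("C# event raise", delegate { SomethingHappened(); }, 10000, 10);
    }
}

Because each anonymous delegate adds a call of its own, the absolute numbers are inflated, but the relative comparison is what matters here.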

Validating Your Performance Results

The MeasureIt application makes collecting data for a broad variety of benchmarks very easy.
Unfortunately, MeasureIt does not address an important aspect of using benchmark data: validation. It is
extremely easy to measure something other than what you thought you were measuring. The result is
data that is simply wrong, and worse than useless. The old adage "if it sounds too good (or bad) to be
true, it probably is" definitely applies to performance data. It is imperative that you validate data that
you use in any important design decision.

Validating Micro-Benchmark Data with a Debugger

What does it mean to validate performance results? It means collecting other information that also will
predict the performance result and seeing if the two methodologies agree. For very small micro-
benchmarks, inspecting machine instructions and making an estimate based on the number of
instructions executed is an excellent check. In a debugger like Visual Studio, it should be as easy as
setting a breakpoint in your benchmark code and switching to the disassembly window (Debug -> Windows -> Disassembly). Unfortunately, the default debugger options in Visual Studio are designed to simplify debugging, not to support performance investigations, so you need to change two options to make this work.

First, go to Tools | Options... | Debugging | General and clear the Suppress JIT Optimization checkbox.
This box is checked by default, which means that even when debugging code that should be optimized,
the debugger tells the runtime not to do so. The debugger does this so that optimizations don't interfere
with the inspection of local variables, but it also means that you are not looking at the code that is
actually run. I always uncheck this option because I strongly believe that debuggers should strive to only
inspect, and not to change the program being debugged. Note that unsetting this option has no effect on
code that was compiled for debugging since the runtime would not have optimized that code anyway.

Next, clear the Enable Just My Code checkbox in the Tools | Options | Debugging | General dialog. The Just My Code feature instructs the debugger not to show you code that you did not write. Generally, this feature removes the clutter of call frames that are not of interest to the application developer. However, it assumes that any optimized code can't be yours (that is, it assumes your code is compiled with the debug configuration or that Suppress JIT Optimization is turned on). If you allow JIT optimizations but don't turn off Just My Code, you will find that you never hit any breakpoints because the debugger does not believe the optimized code is yours.

Once you have unchecked these options, they remain unchecked for ALL projects. Generally this works
out well, but it does mean that you don't get the Just My Code feature. You may find yourself switching
Just My Code on and off as you go from debugging to performance evaluation and back.

As an example of using a debugger to validate performance results, you can investigate an anomaly in
the data shown in the excerpt in Figure 3. This data shows that a call to an interface method of a C# structure is many times faster than a call to a static method. This certainly seems odd, given that you
would expect a static method call to be the most efficient type of call. To investigate this, you set a
breakpoint in this benchmark and run the application. Switch to the disassembly window (Debug ->
Windows -> Disassembly) and see that the whole benchmark consists of just the following code:

aStructWithInterface.InterfaceMethod();
00000000 ret

What this shows is that the benchmark (which is 10 calls to an interface method) has been inlined away
to be nothing. The ret instruction is actually the end of the delegate body that defines the whole
benchmark. Well, it is not surprising that doing nothing is faster than doing method calls, so this shows
the reason for the anomaly.

The only mystery is why static methods don't get inlined, too. This is because for static methods, I
specifically went out of my way to suppress inlining with the MethodImplOptions.NoInlining attribute. I
intentionally "forgot" to put this on the interface call benchmark to demonstrate that the JIT compiler
can make certain interface calls as efficient as non-virtual calls (there is a comment mentioning this
above the benchmark).
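The attribute in question is the standard way to keep the JIT from inlining (and then eliminating) a trivial benchmark body; a minimal sketch:

using System.Runtime.CompilerServices;

static class CallTargets
{
    // Without this attribute the JIT would inline the empty call away,
    // and the benchmark would be measuring nothing at all.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void EmptyStaticMethod() { }
}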

Wrapping Up

To reiterate, it is very easy to measure something other than what you intended, especially when
measuring small things that are subject to JIT compiler optimizations. It is also very easy to accidentally
measure non-optimized code, or measure the cost of JIT compilation of a method rather than the method
itself. The MeasureIt /usersGuide command will bring up a user's guide that discusses many of the
pitfalls you might encounter when creating benchmarks. I strongly recommend that you read these
details when you are ready to write your own benchmarks.

The point that I want to stress is this concept of validation. If you can't explain your data, you should not
use it for making design decisions. If you have unusual data, ideally you should collect more data, debug
the benchmarks, or collaborate with others who have more expertise until you can explain your data.

You should be highly suspicious of unexplainable data, and should not use it in making any important
decisions.

This discussion is about the basics of writing high-performance applications. Like any other attribute of
software, good performance needs to be designed into the product from the beginning. To do this, you
need measurements that quantify the tradeoffs of making various design decisions. This means doing
performance experiments. MeasureIt makes it easy to generate good-quality micro-benchmarks quickly,
and as such it should become an indispensable part of your design process. MeasureIt is also useful out
of the box because it comes with a set of benchmarks that cover most of the primitive operations in
the .NET Framework.

You can also easily add your own benchmarks for the part of the .NET Framework that most interests
you. With this data you can form a model of application costs and thus make reasonable (rough) guesses
about the performance of design alternatives even before you have written application code.

There is a lot more to say about the performance of applications in .NET. There are potential pitfalls
associated with building micro-benchmarks, so please do read the MeasureIt users guide before writing
any. I have also deferred discussion about situations where disk I/O, memory, or lock contention is the
important bottleneck. I have not even discussed how to use various profiling tools to validate and
monitor the performance health of your application after it has been designed.

There is a lot to know, and the sheer volume of information often discourages developers from doing
any such testing at all. However, since most performance is lost in the design of an application, if you do
nothing else, you should think about performance at this initial stage. I hope this column will encourage
you to make performance an explicit part of the design on your next .NET software project.

Send your questions and comments to clrinout@microsoft.com.

Vance Morrison is the Compiler Architect for the CLR team at Microsoft, where he has been involved
in the design of .NET since its inception. He drove the design for the .NET Intermediate Language (IL)
and was lead for the just-in-time (JIT) compiler team.

© 2007 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.

