
Summary

This whitepaper attempts to answer the question: Should using SQL Server 2008
data compression be a standard practice?
The data compression feature introduced in SQL Server 2008 1 is a very valuable
feature, capable of significantly reducing storage space while also improving
performance in most cases; however, if used inappropriately it can degrade
performance when CPU resources are inadequate to handle the workload, because
compression/decompression operations require CPU time. If the application is IO
bound and very rarely CPU constrained, then the feature is almost certainly worth
implementing. While the memory available to production systems continues to
grow rapidly, the great majority of systems will remain IO bound for the foreseeable
future. Since processor power is also increasing rapidly, it is reasonable to expect
that CPU binding will diminish over time.
Performance gains are realized because a) each read operation brings in more rows,
since compressed pages hold more data, and b) more data fits in cache.
The obvious caution is that, should the application become processor bound at
some point, compression will make things worse. While compression can be
turned off at any time, doing so will trigger intensive IO and degrade
performance still further until the process completes. 2
A new application would have to be very badly designed, or running on inadequate
hardware, to suffer in this way in the future. Given that compression is highly
controllable and customizable, even retrofitting compression is not high risk if
approached thoughtfully and could in many cases help alleviate problems. The use
of data partitioning further reduces risk, as it allows both compression and
decompression (i.e. rollback) to occur on a controlled, gradual basis.
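As a sketch of this gradual, partition-by-partition approach, the statements below
assume a hypothetical partitioned table dbo.FactSales; the table name and partition
numbers are purely illustrative.

    -- Compress one partition at a time to limit the IO and CPU impact
    -- of the initial compression.
    ALTER TABLE dbo.FactSales
        REBUILD PARTITION = 1
        WITH (DATA_COMPRESSION = PAGE);

    -- Repeat for the remaining partitions in later maintenance windows.
    ALTER TABLE dbo.FactSales
        REBUILD PARTITION = 2
        WITH (DATA_COMPRESSION = PAGE);

    -- Rolling back is the same operation with DATA_COMPRESSION = NONE,
    -- again one partition at a time.
    ALTER TABLE dbo.FactSales
        REBUILD PARTITION = 1
        WITH (DATA_COMPRESSION = NONE);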
Best Performance Gain Scenarios
Columns where most values don't use the full storage length; this applies to
numeric as well as character data.
Data with a large number of null values
Large amounts of repeating data values or prefix strings.
Frequent physical IO table or index scans (typical of reporting/warehouse
applications); see the query sketched below for identifying scan-heavy objects.
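One way to spot such scan-heavy objects is to query the index usage DMVs. The
sketch below is a generic starting point rather than a definitive method; it assumes
the current database and reports counters that are cumulative since the last SQL
Server restart.

    -- Rough helper: list indexes ordered by scan count to find scan-heavy
    -- tables that may benefit most from compression.
    SELECT OBJECT_NAME(s.object_id) AS table_name,
           i.name                   AS index_name,
           s.user_scans,
           s.user_seeks,
           s.user_lookups
    FROM   sys.dm_db_index_usage_stats AS s
    JOIN   sys.indexes AS i
           ON i.object_id = s.object_id
          AND i.index_id  = s.index_id
    WHERE  s.database_id = DB_ID()
    ORDER BY s.user_scans DESC;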
Worst Performance Gain Scenarios
Columns where most values use all of the allocated byte lengths.
Few repeating values or prefix strings.
Lots of out-of-row storage.
Predominantly single-row lookups.
Complex multi-join or aggregation queries, e.g. data warehouse applications. 3

1 Technically the feature was introduced in SQL Server 2005 SP2 as the vardecimal
storage format, but the 2008 functionality is vastly expanded.
2 If data partitioning is being used, this can be controlled to some extent by working
on a single file at a time.

Notes

Compression can be implemented at the row or page level.


Row level compression simply stores fixed length fields in variable length
format; it is doubtful that many applications still use such fields.
Page level compression is far more sophisticated; it stores any given value
only once on a page using Prefix and Dictionary compression techniques 4.
Pages are compressed on disk and remain compressed when read into
memory.
Data is decompressed when updated or read for filtering, sorting or joining in
a query response.
The maximum number of bytes that can be stored on a row (8,060) does not change,
and only in-row data can be compressed.
Large data stored outside of row pages (e.g. varchar(max), nvarchar(max),
text, ntext and other blob data) is not compressible.
Decompression CPU cost is less than 10% and dropping steadily; that
number is probably high if deploying to a machine with a high compute unit
(CU) rating.
Index rebuild times are increased by 1.5 to 5x.
Logical IO (data read from cache) requires CPU time, so reduced IO could
theoretically offset some of the compression cost, but this would be difficult
to measure.
Index pages can be compressed, but are not compressed automatically, and only
leaf-level pages are compressed.
Important: Initial compression requires free space in the target database, tempdb
and the transaction log. The space required will vary based on the data and how it
is being compressed, but in essence the table being compressed will exist on disk
in both uncompressed and compressed states until the operation completes. Make
sure you have sufficient free space for the largest table. Use
sp_estimate_data_compression_savings to estimate the current and compressed
sizes of your table(s); a sample call is sketched below.
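For reference, a minimal call looks like the following; dbo.FactSales is a placeholder
name, and 'PAGE' can be swapped for 'ROW' or 'NONE' to compare settings.

    -- Estimate current vs. compressed size for all indexes and partitions
    -- of a hypothetical table, evaluated for PAGE compression.
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'FactSales',
        @index_id         = NULL,      -- NULL = all indexes
        @partition_number = NULL,      -- NULL = all partitions
        @data_compression = 'PAGE';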

There are excellent tools built into SQL Server Management Studio to estimate
compression savings, report space usage, and generate scripts. While most
applications can simply compress everything, the best reference for detailed
analysis of selective use of compression on a table-by-table basis is Data
Compression: Strategy, Capacity Planning and Best Practices (see References below).
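The scripts SSMS generates for enabling compression boil down to ALTER statements
like the ones sketched here; the object names (dbo.FactSales, IX_FactSales_Date) are
hypothetical.

    -- Enable page compression on a table (rebuilds the heap or clustered index).
    ALTER TABLE dbo.FactSales
        REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Nonclustered indexes must be compressed separately.
    ALTER INDEX IX_FactSales_Date ON dbo.FactSales
        REBUILD WITH (DATA_COMPRESSION = ROW);

    -- Check what is currently compressed, and how, via sys.partitions.
    SELECT OBJECT_NAME(p.object_id) AS object_name,
           p.index_id,
           p.partition_number,
           p.data_compression_desc
    FROM   sys.partitions AS p
    WHERE  p.object_id = OBJECT_ID('dbo.FactSales');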
3 Note, however, that such queries will not be adversely impacted by compression,
just unlikely to gain any benefit.

4 E.g. if rows on a page contain the values Microsoft, Micro-processor, Microbe and
Microscope, the string Micro will be stored only once.

Sam Atwater
Application Service Management Business Intelligence (ASM-BI)
Microsoft IT
9/28/2010
Edited for clarity 3/11/2011
Additional Notes added 1/27/2012

References
SQL Server 2008 R2 Help
Inside SQL Server 2008, Microsoft Press
Data Compression: Strategy, Capacity Planning and Best Practices, Sanjay Mishra, MSDN, May 2009
Data Compression in SQL Server 2008, Ashish Kumar Mehta, SqlServerPerformance.com, Oct 2008
