This whitepaper attempts to answer the question: Should using SQL Server 2008
data compression be a standard practice?
The data compression feature introduced in SQL Server 2008 [1] is a very valuable
feature capable of significantly reducing storage space while also improving
performance in most cases. Used inappropriately, however, it can degrade
performance: compression and decompression operations require CPU time, so a
system whose CPU resources are already inadequate for the workload will slow
down further. If the application is IO bound and very rarely CPU constrained,
then the feature is almost certainly worth implementing. While the memory
available to production systems continues to grow rapidly, the great majority of
systems will remain IO bound for the foreseeable future. Since processor power is
also increasing rapidly, it is reasonable to expect that CPU binding will diminish
over time.
Performance gains are realized because a compressed page holds more rows, so
a) more data is read during each read operation and b) more data is stored in
cache.
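The arithmetic behind both gains can be sketched in a few lines of Python. This is only an illustration: the 8 KB page size is SQL Server's, but the row sizes, the 2:1 compression ratio, and the function names are assumptions chosen for the example, and page and row overhead is ignored.

```python
import math

PAGE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB


def pages_needed(row_count, avg_row_bytes):
    """Pages required to hold row_count rows (page/row overhead ignored)."""
    rows_per_page = PAGE_BYTES // avg_row_bytes
    return math.ceil(row_count / rows_per_page)


def rows_cached(cache_pages, avg_row_bytes):
    """Rows that fit in a cache of cache_pages pages."""
    return cache_pages * (PAGE_BYTES // avg_row_bytes)


# Hypothetical table: 1,000,000 rows of 200 bytes, compressed to 100 bytes.
scan_uncompressed = pages_needed(1_000_000, 200)  # pages read by a full scan
scan_compressed = pages_needed(1_000_000, 100)
cache_uncompressed = rows_cached(1_000, 200)      # rows held in 1,000 pages
cache_compressed = rows_cached(1_000, 100)
```

Under these assumptions a full scan reads roughly half as many pages, and the same amount of cache holds roughly twice as many rows.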
The obvious caution is that, should the application become processor bound at
some point, compression will make things worse. While compression can be
turned off at any time, doing so triggers intensive IO and degrades performance
still further until the process completes. [2]
A new application would have to be very badly designed or running on inadequate
hardware to suffer from this in the future. Given the fact that compression is
highly controllable and customizable, even retrofit compression is not high risk
if approached thoughtfully, and could in many cases help alleviate problems. The
use of data partitioning reduces risk further, as it allows both compression and
decompression (i.e. rollback) to occur on a controlled, gradual basis.
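A gradual, partition-by-partition rollout could be scripted along the following lines. This is a minimal sketch, not a prescribed procedure: the table name dbo.FactSales, the partition numbers, and the helper function are hypothetical, and the generated T-SQL uses the ALTER TABLE ... REBUILD PARTITION form with the DATA_COMPRESSION option.

```python
ALLOWED = {"NONE", "ROW", "PAGE"}  # compression settings in SQL Server 2008


def partition_rebuild_statements(table, partitions, compression="PAGE"):
    """Return one ALTER TABLE statement per partition, in the given order.

    Passing compression="NONE" produces the rollback (decompression) script.
    The table name is inserted as-is, so it must come from a trusted source.
    """
    if compression not in ALLOWED:
        raise ValueError(f"unsupported compression setting: {compression}")
    return [
        f"ALTER TABLE {table} REBUILD PARTITION = {p} "
        f"WITH (DATA_COMPRESSION = {compression});"
        for p in partitions
    ]


# Hypothetical table with three partitions: compress them one at a time.
rollout = partition_rebuild_statements("dbo.FactSales", [1, 2, 3])
# The same helper yields the rollback for a single partition.
rollback = partition_rebuild_statements("dbo.FactSales", [3], "NONE")
```

Running the statements one at a time, for example in successive maintenance windows, keeps the IO cost of each step bounded and lets the effect on CPU be observed before continuing.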
Best Performance Gain Scenarios
Columns where most values don't use the full storage length; this applies to
numeric as well as character data.
Data with a large number of null values.
Large amounts of repeating data values or prefix strings.
Frequent table or index scans that incur physical IO (typical of
reporting/warehouse applications).
Worst Performance Gain Scenarios
Columns where most values use all of the allocated byte lengths.
Few repeating values or prefix strings.
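The difference between the best and worst scenarios can be illustrated with a toy model in Python. It is not SQL Server's actual storage format: it only assumes that fixed-length values can be stored in the fewest whole bytes that hold them, and that a prefix shared by a set of strings can be stored once. The function names and sample values are invented for the example.

```python
import os


def min_int_bytes(value):
    """Fewest whole bytes that can hold a signed integer."""
    n = 1
    while not -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
        n += 1
    return n


def fixed_vs_variable(values, fixed_bytes):
    """Return (bytes at fixed width, bytes at minimal width) for a column."""
    fixed = fixed_bytes * len(values)
    variable = sum(min_int_bytes(v) for v in values)
    return fixed, variable


def prefix_savings(strings):
    """Characters saved if the common prefix is stored once, not per value."""
    prefix = os.path.commonprefix(strings)
    return len(prefix) * (len(strings) - 1)


# Best case: small values in a 4-byte column, strings with a long shared prefix.
best_ints = fixed_vs_variable([1, 2, 300], 4)
best_prefix = prefix_savings(
    ["SQL Server 2005", "SQL Server 2008", "SQL Server 2008 R2"]
)

# Worst case: every value needs the full 4 bytes, strings share no prefix.
worst_ints = fixed_vs_variable([2**31 - 1] * 3, 4)
worst_prefix = prefix_savings(["abc", "xyz"])
```

In the best case the toy column shrinks from 12 bytes to 4 and the strings share 14 leading characters; in the worst case nothing is saved, which matches the scenarios listed above.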
[1] Technically, the feature was introduced in SQL Server 2005 SP2 as the
vardecimal storage format, but the 2008 functionality is vastly expanded.
[2] If data partitioning is being used, this can be controlled to some extent by
working on a single file at a time.
Notes
There are excellent tools built into SQL Server Management Studio to estimate
compression savings, get space usage, and generate scripts. While most
applications can simply compress everything, the best reference for detailed
analysis of selective, table-by-table use of compression is Data
Compression: Strategy, Capacity Planning and Best Practices (see below).
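Outside Management Studio, savings can also be estimated with the system stored procedure sp_estimate_data_compression_savings. The Python helper below is a hypothetical sketch that only builds the call as text; the schema and table names are placeholders, and the values are inserted as-is, so they must come from a trusted source.

```python
def estimate_savings_call(schema, table, compression="PAGE"):
    """Build an EXEC statement for sp_estimate_data_compression_savings.

    NULL for @index_id and @partition_number asks for all indexes and
    all partitions of the object.
    """
    if compression not in {"NONE", "ROW", "PAGE"}:
        raise ValueError(f"unsupported compression setting: {compression}")
    return (
        "EXEC sp_estimate_data_compression_savings "
        f"@schema_name = '{schema}', @object_name = '{table}', "
        "@index_id = NULL, @partition_number = NULL, "
        f"@data_compression = '{compression}';"
    )


# Hypothetical table: compare ROW and PAGE estimates before deciding.
row_call = estimate_savings_call("dbo", "FactSales", "ROW")
page_call = estimate_savings_call("dbo", "FactSales", "PAGE")
```

Comparing the ROW and PAGE estimates for each large table is one way to approach the selective, table-by-table analysis mentioned above.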
[3] Note, however, that such queries will not be adversely impacted by
compression; they are just unlikely to gain any benefit.
Sam Atwater
Application Service Management Business Intelligence (ASM-BI)
Microsoft IT
9/28/2010
Edited for clarity 3/11/2011
Additional Notes added 1/27/2012
References
SQL Server 2008 R2 Help
Inside SQL Server 2008, Microsoft Press
Data Compression: Strategy, Capacity Planning and Best Practices