Contents
June 2010, Vol. 12, No. 6
www.sqlmag.com

COVER STORY
16 SQL Server 2008 R2 New Features
—Michael Otey
FEATURES
21 Descending Indexes
—Itzik Ben-Gan
Learn about special cases of SQL Server index B-trees and their use cases related to backward index ordering, as well as when to use descending indexes.

27 Troubleshooting Transactional Replication
—Kendal Van Dyke
Find out how to use Replication Monitor, tracer tokens, and alerts to stay ahead of replication problems, as well as how to solve three specific transactional replication problems.

33 Maximizing Report Performance with Parameter-Driven Expressions
—William Vaughn
Want your users to have a better experience and decrease the load on your server? Learn to use parameter-driven expressions.

39 Getting Started with Parallel Data Warehouse
—Rich Johnson
The SQL Server 2008 R2 Parallel Data Warehouse (PDW) Edition is Microsoft's first offering in the Massively Parallel Processor (MPP) data warehouse space. Here's a peek at what PDW is and what it can do.

Editor's Tip
We're resurfacing our most popular articles in the SQL Server classics column in the July issue. Which SQL Mag articles are your favorites? Let me know at mkeller@sqlmag.com.
—Megan Keller, associate editor
IN EVERY ISSUE
5 Editorial: Readers Weigh In on Microsoft's Support for Small Businesses
—Michael Otey

7 Reader to Reader

13 Kimberly & Paul: SQL Server Questions Answered
Your questions are answered regarding dropping clustered indexes and changing the definition of the clustering key.

48 The Back Page: SQL Server 2008 LOB Data Types
—Michael Otey

PRODUCTS
43 Product Review: Panorama NovaView Suite
—Derek Comingore
Panorama's NovaView Suite offers all the OLAP functions you could ask for, most of which are part of components installed on the server.

45 Industry News: Bytes from the Blog
Derek Comingore compares two business intelligence suites: Tableau Software's Tableau 5.1 and Microsoft's PowerPivot.

47 New Products
Check out the latest products from Lyzasoft, Attunity, Aivosto, Embarcadero Technologies, and HiT Software.

The Smart Guide to Building World-Class Applications

Technology Group
Senior Vice President, Technology Media Group: Kim Paulsen, kpaulsen@windowsitpro.com

Editorial
Editorial and Custom Strategy Director: Michele Crockett, crockett@sqlmag.com
Technical Director: Michael Otey, motey@sqlmag.com
Executive Editor, IT Group: Amy Eisenberg
Executive Editor, SQL Server and Developer: Sheila Molnar
Group Editorial Director: Dave Bernard, dbernard@windowsitpro.com
DBA and BI Editor: Megan Bearly Keller
Editors: Karen Bemowski, Jason Bovberg, Anne Grubb, Linda Harty, Caroline Marwitz, Chris Maxcer, Lavon Peters, Rita-Lyn Sanders, Zac Wiggy, Brian Keith Winstead
Production Editor: Brian Reinholz

Contributing Editors
Itzik Ben-Gan, IBen-Gan@SolidQ.com
Michael K. Campbell, mike@sqlservervideos.com
Kalen Delaney, kalen@sqlserverinternals.com
Brian Lawton, brian.k.lawton@redtailcreek.com
Douglas McDowell, DMcDowell@SolidQ.com
Brian Moran, BMoran@SolidQ.com
Michelle A. Poolet, mapoolet@mountvernondatasystems.com
Paul Randal, paul@sqlskills.com
Kimberly L. Tripp, kimberly@sqlskills.com
William Vaughn, vaughnwilliamr@gmail.com
Richard Waymire, rwaymi@hotmail.com

Art & Production
Senior Graphic Designer: Matt Wiebe
Production Director: Linda Kirchgesler

Advertising Sales
Publisher: Peg Miller
Director of IT Strategy and Partner Alliances: Birdie Ghiglione, 619-442-4064, birdie.ghiglione@penton.com
Online Sales and Marketing Manager: Dina Baird, dina.baird@penton.com, 970-203-4995
kim.eck@penton.com

Reprints
Reprint Sales: Diane Madzelonka, 888-858-8851, 216-931-9268, diane.madzelonka@penton.com

Circulation & Marketing
IT Group Audience Development Director: Marie Evans
Customer Service: service@sqlmag.com

Chief Executive Officer: Sharon Rowlands, Sharon.Rowlands@penton.com
Chief Financial Officer/Executive Vice President: Jean Clifton, Jean.Clifton@penton.com

Copyright
Unless otherwise noted, all programming code and articles in this issue are copyright 2010, Penton Media, Inc., all rights reserved. Programs and articles may not be reproduced or distributed in any form without permission in writing from the publisher. Redistribution of these programs and articles, or the distribution of derivative works, is expressly prohibited. Every effort has been made to ensure examples in this publication are accurate. It is the reader's responsibility to ensure procedures and techniques used from this publication are accurate and appropriate for the user's installation. No warranty is implied or expressed. Please back up your files before you run a new procedure or program or make significant changes to disk files, and be sure to test all procedures and programs before putting them into production.

List Rentals
Contact MeritDirect, 333 Westchester Avenue, White Plains, NY or www.meritdirect.com/penton.
Readers Weigh In on Microsoft's Support for Small Businesses
—Michael Otey
LISTING 2: Query That Includes All Days

CREATE TABLE dbo.MySalesTable2
( Store_ID INT, TransactionDate SMALLDATETIME )
GO

-- Populate the table.
INSERT INTO dbo.MySalesTable2
SELECT 100, '2009-10-05' UNION
SELECT 200, '2009-10-05' UNION
SELECT 200, '2009-10-06' UNION
SELECT 300, '2009-10-01' UNION
SELECT 300, '2009-10-07' UNION
SELECT 400, '2009-10-04' UNION
SELECT 400, '2009-10-06' UNION
SELECT 500, '2009-10-01' UNION
SELECT 500, '2009-10-02' UNION
-- Transaction for October 03, 2009, not inserted.
-- SELECT 500, '2009-10-03' UNION
SELECT 500, '2009-10-04' UNION
SELECT 500, '2009-10-05' UNION
SELECT 500, '2009-10-06' UNION
SELECT 500, '2009-10-07'
GO

DECLARE @BofWeek datetime = '2009-10-01 00:00:00'
SELECT st2.Store_ID, st2.Day_of_Week
FROM
  (SELECT st.Store_ID, DATES.Day_of_Week
   FROM (
     VALUES
     (CONVERT(varchar(35),@BofWeek,101)),
     (CONVERT(varchar(35),dateadd(DD,1,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,2,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,3,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,4,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,5,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,6,@BofWeek),101))
   ) DATES(Day_of_Week)
   CROSS JOIN
   (SELECT DISTINCT Store_ID FROM dbo.MySalesTable2) st
  ) AS st2
LEFT JOIN dbo.MySalesTable2 st3
  ON st3.Store_ID = st2.Store_ID AND
     st3.TransactionDate = st2.Day_of_Week
WHERE st3.TransactionDate IS NULL
ORDER BY st2.Store_ID, st2.Day_of_Week
GO

LISTING 3: Query That Includes All Stores

CREATE TABLE [dbo].[MyStoresTable](
  [Store_ID] [int] NOT NULL,
  CONSTRAINT [PK_MyStores] PRIMARY KEY CLUSTERED ([Store_ID] ASC) )
GO

INSERT INTO dbo.MyStoresTable (Store_ID)
VALUES (100),(200),(300),(400),(500),(600),(700)

DECLARE @BofWeek datetime = '2009-10-01 00:00:00'
SELECT st2.Store_ID, st2.Day_of_Week
FROM
  (SELECT st.Store_ID, DATES.Day_of_Week
   FROM (
     VALUES
     (CONVERT(varchar(35),@BofWeek,101)),
     (CONVERT(varchar(35),dateadd(DD,1,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,2,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,3,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,4,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,5,@BofWeek),101)),
     (CONVERT(varchar(35),dateadd(DD,6,@BofWeek),101))
   ) DATES(Day_of_Week)
   CROSS JOIN
   (SELECT Store_ID FROM dbo.MyStoresTable) st
  ) AS st2
LEFT JOIN dbo.MySalesTable2 st3
  ON st3.Store_ID = st2.Store_ID AND
     st3.TransactionDate = st2.Day_of_Week
WHERE st3.TransactionDate IS NULL
ORDER BY st2.Store_ID, st2.Day_of_Week
GO

…the revamped query (QueryUsingIndexedTemporaryTables.sql) that uses an indexed temporary table. This code uses Radhakrisnan's original data (i.e., data that includes the October 03, 2009, transaction), which was created with MySalesTable.Table.sql. Like the queries in Listings 2 and 3, QueryUsingIndexedTemporaryTables.sql creates the @BofWeek variable, which defines the first day of the reporting period. Next, it uses the CREATE TABLE command to create the #StoreDate local temporary table, which has two columns: Store_ID and Transaction_Date. Using the INSERT INTO…SELECT clause, the code populates the #StoreDate temporary table with all possible Store_ID and Transaction_Date combinations. Finally, the code uses a CREATE INDEX statement to create an index for the #StoreDate temporary table.

You need to be cautious with solutions that use indexed temporary tables. A solution might work well in one environment but time out in another, killing your application. For example, I initially tested QueryUsingIndexedTemporaryTables.sql using a temporary table with 15,000 rows. When I changed the number of rows for the temporary table to 16,000, the query's response time increased more than four times—from 120ms to 510ms. So, you need to know your production system workload types, SQL instance configuration, and hardware limitations if you plan to use indexed temporary tables.

Another way to optimize the performance of queries is to use the EXCEPT and INTERSECT operators, which were introduced in SQL Server 2005. These set-based operators can increase efficiency when you need to work with large data sets. I created a version of the revamped query (QueryUsingEXCEPTOperator.sql) that uses the EXCEPT operator. Once again, this code uses Radhakrisnan's original data. QueryUsingEXCEPTOperator.sql provides the fastest and most stable performance. It ran five times faster than Radhakrisnan's original query. (A table with a million rows was used for the tests.)

You can download the solutions I discussed (as well as MySalesTable.Table.sql) from the SQL Server Magazine website. I've provided two versions of the code. The first set of listings is compatible with SQL Server 2008. (These are the listings you see here.) The second set can be executed in a SQL Server 2005 environment.
—Gennadiy Chornenkyy, data architect, ADP Canada
InstantDoc ID 125130
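A rough sketch of the EXCEPT approach described above, using the tables from Listings 2 and 3. This is my own minimal illustration, not the downloadable QueryUsingEXCEPTOperator.sql:

```sql
-- Sketch only: EXCEPT returns the store/day combinations that have
-- no matching sales row. Tables are those created in Listings 2 and 3.
DECLARE @BofWeek datetime = '2009-10-01 00:00:00';

-- All possible store/day combinations for the week...
SELECT s.Store_ID,
       CONVERT(varchar(35), dateadd(DD, n.d, @BofWeek), 101) AS Day_of_Week
FROM dbo.MyStoresTable AS s
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6)) AS n(d)

EXCEPT

-- ...minus the combinations that actually had transactions.
SELECT Store_ID, CONVERT(varchar(35), TransactionDate, 101)
FROM dbo.MySalesTable2;
```

Because EXCEPT is set based, the optimizer is free to pick an efficient anti-join instead of the LEFT JOIN ... IS NULL pattern, which is consistent with the performance the author reports.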
SP_WhoIsActive
Get detailed information about the sessions running on your SQL Server system

To say I like SP_WhoIsActive is an understatement. This is probably the most useful and effective stored procedure I've ever encountered for activity monitoring. The purpose of the SP_WhoIsActive stored procedure is to give DBAs and developers as much performance and workload data about SQL Server's internal workings as possible, while retaining both flexibility and security. It was written by Boston-area consultant and writer Adam Machanic, who is also a long-time SQL Server MVP, a founder of SQLBlog.com, and one of the elite individuals qualified to teach the Microsoft Certified Master classes.

Adam, who has exhaustive knowledge of SQL Server internals, knew that he could get more detailed information about SQL Server performance than what was offered natively through default stored procedures, such as SP_WHO2 and SP_LOCK, and SQL Server Management Studio (SSMS). Therefore, he wrote the SP_WhoIsActive stored procedure to quickly retrieve information about users' sessions and activities. Let's look at SP_WhoIsActive's most important features.

Key Parameters
SP_WhoIsActive does almost everything you'd expect from an activity-monitoring stored procedure, such as displaying active SPIDs, transactions, locking, and blocking, but it also does a variety of things that you aren't typically able to do unless you buy a commercial activity-monitoring solution. One key feature of the script is flexibility, so you can enable or disable (or even specify different levels of information for) any of the following parameters:
• Online help is available by setting the parameter @help = 1, which enables the procedure to return commentary and details regarding all of the input parameters and output column names.
• Aggregated wait stats, showing the number of each kind of wait and the minimum, maximum, and average wait times, are controlled using the @get_task_info parameter with input values of 0 (don't collect), the default of 1 (lightweight collection mode), and 2 (collect all current waits, with the minimum, maximum, and average wait times).
• Query text is available that includes the statements that are currently running, and you can optionally include the outer batch by setting @get_outer_command = 1. In addition, SP_WhoIsActive can pull the execution plan for the active session statement using the @get_plans parameter.
• Deltas of numeric values between the last run and the current run of the script can be assigned using the @delta_interval = N (where N is seconds) parameter.
• Filtered results are available on session, login, database, host, and other columns using simple wildcards similar to the LIKE clause. You can filter to include or exclude values, as well as exclude sleeping SPIDs and system SPIDs so that you can focus on user sessions.
• Transaction details, such as how many transaction log entries have been written for each database, are governed by the @get_transaction_info parameter.
• Blocks and locks are easily revealed using parameters such as @find_block_leaders, which, when combined with sorting by the [blocked_session_count] column, puts the lead blocking sessions at top. Locks are similarly revealed by setting the @get_locks parameter.
• Long-term data collection is facilitated via a set of features designed for data collection, such as defining a schema for output or a destination table to hold the collected data.

SP_WhoIsActive is the epitome of good T-SQL coding practices. I encourage you to spend a little time perusing the code. You'll note, from beginning to end, the strong internal documentation, intuitive and readable naming of variables, and help-style comments describing all parameters and output columns. The procedure is completely safe against SQL injection attacks as well, since it parses input parameter values to a list of allowable and validated values.

System Requirements
Adam releases new versions of the stored procedure at regular intervals at http://tinyurl.com/WhoIsActive. SP_WhoIsActive requires SQL Server 2005 SP1 or later. Users of the stored procedure need VIEW SERVER STATE permissions, which can be granted via a certificate to minimize security issues.
InstantDoc ID 125107

Kevin Kline (kevin.kline@quest.com) is the director of technology for SQL Server Solutions at Quest Software and a founding board member of the international PASS. He is the author of SQL in a Nutshell, 3rd edition (O'Reilly).

Editor's Note: We want to hear your feedback on the Tool Time discussion forum at sqlforums.windowsitpro.com/web/forum/categories.aspx?catid=169&entercat=y.

SP_WhoIsActive
BENEFITS: It provides detailed information about all of the sessions running on your SQL Server system, including what they're doing and how they're impacting server behavior.
SYSTEM REQUIREMENTS: SQL Server 2005 SP1 and later; users need VIEW SERVER STATE permissions.
HOW TO GET IT: You can download SP_WhoIsActive from sqlblog.com/tags/Who+is+Active/default.aspx.
Kimberly & Paul's new blog, Kimberly & Paul: SQL Server Questions Answered, is on the SQL Mag website at www.sqlmag.com.

I've learned that my clustering key (i.e., the columns on which I defined my clustered index) should be unique, narrow, static, and ever-increasing. However, my clustering key is on a GUID. Although a GUID is unique, static, and relatively narrow, I'd like to change my clustering key, and therefore change my clustered index definition. How can I change the definition of a clustered index?

This question is much more complex than it seems, and the process you follow is going to depend on whether the clustered index is enforcing a primary key constraint. In SQL Server 2000, the DROP_EXISTING clause was added to let you change the definition of the clustered index without causing all the nonclustered indexes to be rebuilt twice. The first rebuild is because when you drop a clustered index, the table reverts to being a heap, so all the lookup references in the nonclustered indexes must be changed from the clustering key to the row identifier (RID), as I described in the answer to the previous question. The second nonclustered index rebuild is because when
June 2010 SQL Server Magazine • www.sqlmag.com
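The DROP_EXISTING clause mentioned above can be sketched like this; the table, index, and column names are my placeholders, not from the question:

```sql
-- Hedged sketch: recreate a clustered index with a new definition in a
-- single operation, avoiding the double nonclustered-index rebuild that
-- a separate DROP INDEX + CREATE INDEX would cause. Names are placeholders.
CREATE UNIQUE CLUSTERED INDEX CIX_MyTable
ON dbo.MyTable (NewKeyColumn)
WITH (DROP_EXISTING = ON);
```

Note that this path applies when the clustered index isn't enforcing a primary key constraint; the constraint case is exactly what the rest of the answer works through.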
SQL SERVER QUESTIONS ANSWERED

2. Disable any foreign key constraints. This is where you want to be careful if there are users using the database. In addition, this is also where you might want to use the following query to change the database to be restricted to only DBO use:

ALTER DATABASE DatabaseName
SET RESTRICTED_USER
WITH ROLLBACK AFTER 5

The ROLLBACK AFTER n clause at the end of the ALTER DATABASE statement lets you terminate user connections and put the database into a restricted state for modifications. As for automating the disabling of foreign key constraints, I leveraged some of the code from sp_fkeys and significantly altered it to generate the DISABLE command (similarly to how we did this in step 1 for disabling nonclustered indexes), which Listing 2 shows. Use the DISABLE_STATEMENT column to disable the foreign key constraints, and keep the remaining information handy because you'll need it to reenable and recheck the data, as well as verify the foreign key constraints after you've recreated the primary key as a unique nonclustered index.

LISTING 2: Code to Generate the DISABLE Command

SELECT
  DISABLE_STATEMENT =
      N'ALTER TABLE '
    + QUOTENAME(convert(sysname, schema_name(o2.schema_id)), N']')
    + N'.'
    + QUOTENAME(convert(sysname, o2.name), N']')
    + N' NOCHECK CONSTRAINT '
    + QUOTENAME(convert(sysname, object_name(f.object_id)), N']')
, ENABLE_STATEMENT =
      N'ALTER TABLE '
    + QUOTENAME(convert(sysname, schema_name(o2.schema_id)), N']')
    + N'.'
    + QUOTENAME(convert(sysname, o2.name), N']')
    + N' WITH CHECK CHECK CONSTRAINT '
    + QUOTENAME(convert(sysname, object_name(f.object_id)), N']')
, RECHECK_CONSTRAINT =
      N'SELECT OBJECTPROPERTY (OBJECT_ID('
    + QUOTENAME(convert(sysname, object_name(f.object_id)), N'''')
    + N'), ''CnstIsNotTrusted'')'
FROM
  sys.objects AS o1,
  sys.objects AS o2,
  sys.columns AS c1,
  sys.columns AS c2,
  sys.foreign_keys AS f
  INNER JOIN sys.foreign_key_columns AS k
    ON (k.constraint_object_id = f.object_id)
  INNER JOIN sys.indexes AS i
    ON (f.referenced_object_id = i.object_id
      AND f.key_index_id = i.index_id)
WHERE
  o1.[object_id] = object_id('tablename')
  AND i.name = 'Primary key Name'
  AND o1.[object_id] = f.referenced_object_id
  AND o2.[object_id] = f.parent_object_id
  AND c1.[object_id] = f.referenced_object_id
  AND c2.[object_id] = f.parent_object_id
  AND c1.column_id = k.referenced_column_id
  AND c2.column_id = k.parent_column_id
ORDER BY 1, 2, 3

3. Drop the constraint-based clustered index using the following query:
SQL Server 2008 R2 New Features
What you need to know about the new BI and relational database functionality
Michael Otey (motey@sqlmag.com) is technical director for Windows IT Pro and SQL Server Magazine and author of Microsoft SQL Server 2008 New Features (Osborne/McGraw-Hill).

SQL Server 2008 R2 is Microsoft's latest release of its enterprise relational database and business intelligence (BI) platform, and it builds on the base of functionality established by SQL Server 2008. However, despite the R2 moniker, Microsoft has added an extensive set of new features to SQL Server 2008 R2. Although the new support for self-service BI and PowerPivot has gotten the lion's share of attention, SQL Server 2008 R2 includes several other important enhancements. In this article, we'll look at the most important new features in SQL Server 2008 R2.

New Editions
Some of the biggest changes with the R2 release of SQL Server 2008 are the new editions that Microsoft has added to the SQL Server lineup. SQL Server 2008 R2 Datacenter Edition has been added to the top of the relational database product lineup and brings the SQL Server product editions in-line with the Windows Server product editions, including its Datacenter Edition. SQL Server 2008 R2 Datacenter Edition provides support for systems with up to 256 processor cores. In addition, it offers multiserver management and a new event-processing technology called StreamInsight. (I'll cover multiserver management and StreamInsight in more detail later in this article.)

The other new edition of SQL Server 2008 R2 is the Parallel Data Warehouse Edition. The Parallel Data Warehouse Edition, formerly code-named Madison, is a different animal than the other editions of SQL Server 2008 R2. It's designed as a Plug and Play solution for large data warehouses. It's a combination hardware and software solution that's available only through select OEMs such as HP, Dell, and IBM. OEMs supply and preconfigure all the hardware, including the storage to support the data warehouse functionality. The Parallel Data Warehouse Edition uses a shared-nothing Massively Parallel Processing (MPP) architecture to support data warehouses from 10TB to hundreds of terabytes in size. As more scalability is required, additional compute and storage nodes can be added. As you would expect, the Parallel Data Warehouse Edition is integrated with SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), and SQL Server Reporting Services (SSRS). For more in-depth information about the SQL Server 2008 R2 Parallel Data Warehouse Edition, see "Getting Started with Parallel Data Warehouse," page 39, InstantDoc ID 125098.

The SQL Server 2008 R2 lineup includes
• SQL Server 2008 R2 Parallel Data Warehouse Edition
• SQL Server 2008 R2 Datacenter Edition
• SQL Server 2008 R2 Enterprise Edition
• SQL Server 2008 R2 Developer Edition
• SQL Server 2008 R2 Standard Edition
• SQL Server 2008 R2 Web Edition
• SQL Server 2008 R2 Workgroup Edition
• SQL Server 2008 R2 Express Edition (Free)
• SQL Server 2008 Compact Edition (Free)

More detailed information about the SQL Server 2008 R2 editions, their pricing, and the features that they support can be found in Table 1. SQL Server 2008 R2 supports upgrading from SQL Server 2000 and later.
Multiserver Management
Some of the most important additions to SQL
Server 2008 R2 on the relational database side
are the new multiserver management capabili-
ties. Prior to SQL Server 2008 R2, the multi-
server management capabilities in SQL Server
were limited. Sure, you could add multiple serv-
ers to SQL Server Management Studio (SSMS),
but there was no good way to perform similar
tasks on multiple servers or to manage multiple
servers as a group. SQL Server 2008 R2 includes
a new Utility Explorer, which is part of SSMS,
to meet this need. The Utility Explorer lets you
create a SQL Server Utility Control Point where
you can enlist multiple SQL Server instances to
be managed, as shown in Figure 2. The Utility
Explorer can manage as many as 25 SQL Server
instances.
The Utility Explorer displays consolidated performance, capacity, and asset information for all the registered servers. However, only SQL Server 2008 R2 instances can be managed with the initial release; support for earlier SQL Server versions is expected to be added with the first service pack. Note that multiserver management is available only in SQL Server 2008 R2 Enterprise Edition and Datacenter Edition. You can find out more about multiserver management at www.microsoft.com/sqlserver/2008/en/us/R2-multi-server.aspx.

Figure 1: Creating a PowerPivot chart and PowerPivot table for data analysis
StreamInsight
StreamInsight is a near real-
time event monitoring and pro-
cessing framework. It’s designed
to process thousands of events
per second, selecting and writ-
ing out pertinent data to a SQL
Server database. This type of
high-volume event processing
is designed to process manufac-
turing data, medical data, stock
exchange data, or other process-
control types of data streams
where your organization wants to capture real-time data for data mining or reporting.

Figure 3: The Report Designer 3.0 design surface

StreamInsight is a programming framework and doesn't have a graphical interface. It's available only in SQL Server 2008 R2 Datacenter Edition. You can read more about SQL Server 2008 R2's StreamInsight technology at www.microsoft.com/sqlserver/2008/en/us/R2-complex-event.aspx.

Report Builder 3.0
Not all businesses are diving into the analytical side of BI, but almost everyone has jumped onto the SSRS train. With SQL Server 2008 R2, Microsoft has released a new update to the Report Builder portion of SSRS. Report Builder 3.0 (shown in Figure 3) offers several improvements. Like Report Builder 2.0, it sports the Office Ribbon interface. You can integrate geospatial data into your reports using the new Map Wizard, and Report Builder 3.0 includes support for adding sparklines and data bars to your reports. In addition, you can create Shared Datasets and Report Parts, which are reusable report items stored on the server, so that queries can be reused in multiple reports. You can then incorporate these Shared Datasets and Report Parts in the other reports that you create.

Other Important Enhancements
Although SQL Server 2008 R2 had a short two-year development cycle, it includes too many new features to list in a single article. The following are some other notable enhancements included in SQL Server 2008 R2:
• The installation of slipstream media containing current hotfixes and updates
• The ability to create hot standby servers with database mirroring
• The ability to connect to and manage SQL Azure instances
• The addition of SSRS support for SharePoint zones
• The ability to create Report Parts that can be shared between multiple reports
• The addition of backup compression to the Standard Edition

You can learn more about the new features in SQL Server 2008 R2 at msdn.microsoft.com/en-us/library/bb500435(SQL.105).aspx.

To R2 or Not to R2?
SQL Server 2008 R2 includes a tremendous amount of new functionality for an R2 release. Although the bulk of the new features, such as PowerPivot and the Parallel Data Warehouse, are BI oriented, there are also several significant new relational database enhancements, including multiserver management and Master Data Services. However, it remains to be seen how quickly businesses will adopt SQL Server 2008 R2. All current Software Assurance (SA) customers are eligible for the new release at no additional cost, but other customers will need to evaluate if the new features make the upgrade price worthwhile. Perhaps more important than price are the resource demands needed to roll out new releases of core infrastructure servers such as SQL Server.

That said, PowerPivot and self-service BI are potentially game changers, especially for organizations that have existing BI infrastructures. The value these features bring to organizations heavily invested in BI makes SQL Server 2008 R2 a must-have upgrade.
InstantDoc ID 125003
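One of the enhancements listed above, backup compression, is exposed through a simple WITH option on the BACKUP statement. The database name and file path below are my placeholders:

```sql
-- Illustrative backup using the compression option noted above.
-- Database name and destination path are placeholders.
BACKUP DATABASE AdventureWorks
TO DISK = N'C:\Backups\AdventureWorks.bak'
WITH COMPRESSION, STATS = 10;  -- STATS prints progress every 10 percent
```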
Descending Indexes
Index ordering, parallelism, and ranking calculations
C
ertain aspects of SQL Server index B-trees of SQL
QL SServer 2008 SP1 C
Cumulative
l i U Update
d 6)6)—not
and their use cases are common knowledge, because there’s a technical problem or engineering
but some aspects are less widely known difficulty with supporting the option, but simply
because they fall into special cases. In this article because it hasn’t yet floated as a customer request.
I focus on special cases related to backward index My guess is that most DBAs just aren’t aware of
ordering, and I provide guidelines and recommenda- this behavior and therefore haven’t thought to ask
tions regarding when to use descending indexes. All for it. Although performing a backward scan gives
my examples use a table called Orders that resides you the benefit of relying on index ordering and
in a database called Performance. Run the code in therefore avoiding expensive sorting or hashing, the
Itzik Ben-Gan
Listing 1 to create the sample database and table and query plan can’t benefit from parallelism. If you (Itzik@SolidQ.com) is a mentor with Solid
Quality Mentors. He teaches, lectures, and
populate it with sample data. Note that the code in find a case in which parallelism is important, you
consults internationally. He’s a SQL Server MVP
Listing 1 is a subset of the source code I prepared for need to arrange an index that allows an ordered and is the author of several books about
my book Inside Microsoft SQL Server 2008: T-SQL forward scan. T-SQL, including Microsoft SQL Server 2008:
Queryingg (Microsoft Press, 2009), Chapter 4, Query Consider the following query as an example: T-SQL Fundamentalss (Microsoft Press).
Tuning. If you have the book and already created the
Performance database in your system, you don’t need USE Performance;
to run the code in Listing 1.
One of the widely understood aspects of SQL
Server indexes is that the leaf level of an index enforces
SELECT *
FROM dbo.Orders
MORE on the WEB
M
Download the listing at
bidirectional ordering through a doubly-linked list. WHERE orderid <= 100000 InstantDoc ID 125090.
This means that in operations that can potentially rely ORDER BY orderdate;
on index ordering—for example, filtering (seek plus
partial ordered scan), grouping (stream aggregate), There’s a clustered index defined on the table with
presentation ordering (ORDER BY)—the index can orderdate ascending as the key. The table has
be scanned either in an ordered forward or ordered 1,000,000 rows, and the number of qualifying rows
backward fashion. So, for example, if you have a in the query is 100,000. My laptop has eight logical
query with ORDER BY col1 DESC, col2 DESC, SQL Server can rely on index ordering both when you create the index on a key list with ascending ordering (col1, col2) and with the exact reverse ordering (col1 DESC, col2 DESC).

So when do you need to use the DESC index key option? Ask SQL Server practitioners this question, and most of them will tell you that the use case is when there are at least two columns with opposite ordering requirements. For example, to support ORDER BY col1, col2 DESC, there's no escape from defining one of the keys in descending order—either (col1, col2 DESC) or the exact reverse order (col1 DESC, col2). Although this is true, there's more to the use of descending indexes than what's commonly known.

Index Ordering and Parallelism
As it turns out, SQL Server's storage engine isn't coded to handle parallel backward index scans (as

CPUs. Figure 1 shows the graphical query plan for this query. Here's the textual plan:

|--Parallelism(Gather Streams, ORDER BY:([orderdate] ASC))
     |--Clustered Index Scan(OBJECT:([idx_cl_od]),
          WHERE:([orderid]<=(100000)) ORDERED FORWARD)

Figure 1: Parallel query plan

As you can see, a parallel query plan was used. Now try the same query with descending ordering:

SELECT *
FROM dbo.Orders
WHERE orderid <= 100000
ORDER BY orderdate DESC;

Figure 2 shows the graphical query plan for this query. Here's the textual plan:

|--Clustered Index Scan(OBJECT:([idx_cl_od]),
     WHERE:([orderid]<=(100000)) ORDERED BACKWARD)

Figure 2: Serial query plan

Note that although an ordered scan of the index was used, the plan is serial because the scan ordering is backward. If you want to allow parallelism, the index must be scanned in an ordered forward fashion. So in this case, the orderdate column must be defined with DESC ordering in the index key list.

This reminds me that when descending indexes were introduced in SQL Server 2000 RTM, my friend Wayne Snyder discovered an interesting bug. Suppose you had a descending clustered index on the Orders table and issued the following query:

Wayne discovered this bug and reported it, and it was fixed in SQL Server 2000 SP1.

Index Ordering and Ranking Calculations
Back to cases in which descending indexes are relevant, it appears that ranking calculations—particularly ones that have a PARTITION BY clause—need to perform an ordered forward scan of the index in order to avoid the need to sort the data. Again, this is the case only when the calculation is partitioned. When the calculation isn't partitioned, both a forward and backward scan can be utilized. Consider the following example:
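The example itself falls outside this excerpt. As a hedged sketch only, a nonpartitioned ranking calculation in the spirit of the surrounding text (reusing the article's dbo.Orders columns) might look like this:

```sql
-- Hypothetical sketch of a nonpartitioned ranking calculation.
-- With no PARTITION BY clause, the ordering requirement can be
-- satisfied by scanning an index on (orderdate, orderid) either
-- forward or backward, so no DESC keys are strictly needed.
SELECT
  ROW_NUMBER() OVER(ORDER BY orderdate DESC,
                             orderid DESC) AS RowNum,
  orderid, orderdate, custid, filler
FROM dbo.Orders;
```

The query shape is an assumption based on the index definitions discussed later; the article's actual example may differ.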
June 2010 • SQL Server Magazine • www.sqlmag.com
DESCENDING INDEXES FEATURE

Figure 3: Query plan with sort for nonpartitioned ranking calculation

|--Sequence Project(DEFINE:([Expr1004]=row_number))
     |--Segment
          |--Parallelism(Gather Streams, ORDER BY:([orderdate] DESC, [orderid] DESC))
               |--Sort(ORDER BY:([orderdate] DESC, [orderid] DESC))
                    |--Clustered Index Scan(OBJECT:([idx_cl_od]))

Indexing guidelines for queries with nonpartitioned ranking calculations are to have the ranking ordering columns in the index key list, either in specified order or exactly reversed, plus include the rest of the columns from the query in the INCLUDE clause for coverage purposes. With this in mind, to support the previous query you can define the index with all the keys in ascending order, like so:

CREATE UNIQUE INDEX idx_od_oid_i_cid_filler
  ON dbo.Orders(orderdate, orderid)
  INCLUDE(custid, filler);

Rerun the query, and observe in the query execution plan in Figure 4 that the index was scanned in an ordered backward fashion. Here's the textual form of the plan:

|--Sequence Project(DEFINE:([Expr1004]=row_number))
     |--Segment
          |--Index Scan(OBJECT:([idx_od_oid_i_cid_filler]), ORDERED BACKWARD)

However, when partitioning is involved in the ranking calculation, it appears that SQL Server is strict about the ordering requirement—it must match the ordering in the expression. For example, consider the following query:

SELECT
  ROW_NUMBER() OVER(PARTITION BY custid
                    ORDER BY orderdate DESC,
                             orderid DESC) AS RowNum,
  orderid, orderdate, custid, filler
FROM dbo.Orders;

When partitioning is involved, the indexing guidelines are to put the partitioning columns first in the key list, and the rest is the same as the guidelines for nonpartitioned calculations. Now try to create an index following these guidelines, but have the ordering columns appear in ascending order in the key list:

CREATE UNIQUE INDEX idx_cid_od_oid_i_filler
  ON dbo.Orders(custid, orderdate, orderid)
  INCLUDE(filler);

Observe in the query execution plan in Figure 5 that the optimizer didn't rely on index ordering but instead sorted the data. Here's the textual form of the plan:

Figure 5: Query plan with sort for partitioned ranking calculation

|--Parallelism(Gather Streams)
     |--Index Insert(OBJECT:([idx_cid_od_oid_i_filler]))
          |--Sort(ORDER BY:([custid] ASC, [orderdate] ASC, [orderid] ASC) PARTITION ID:([custid]))
               |--Index Scan(OBJECT:([idx_od_oid_i_cid_filler]))

CREATE UNIQUE INDEX idx_cid_odD_oidD_i_filler
  ON dbo.Orders(custid, orderdate DESC, orderid DESC)
  INCLUDE(filler);

Examine the query execution plan in Figure 6, and observe that the index was scanned in an ordered forward fashion and a sort was avoided. Here's the textual plan:

|--Sequence Project(DEFINE:([Expr1004]=row_number))
     |--Segment
          |--Index Scan(OBJECT:([idx_cid_odD_oidD_i_filler]), ORDERED FORWARD)

When you're done, run the following code for cleanup:

DROP INDEX dbo.Orders.idx_od_oid_i_cid_filler;
DROP INDEX dbo.Orders.idx_cid_od_oid_i_filler;
DROP INDEX dbo.Orders.idx_cid_odD_oidD_i_filler;

One More Time
In this article I covered the usefulness of descending indexes. I described the cases in which index ordering can be relied on in both forward and backward linked-list order, as opposed to cases that support only forward direction. I explained that partitioned ranking calculations can benefit from index ordering only when an ordered forward scan is used, and therefore to benefit from index ordering you need to create an index in which the key column ordering matches that of the ORDER BY elements in the ranking calculation. I also explained that even when backward scans in an index are supported, using them prevents parallelism; so even in those cases there might be benefit in arranging an index that matches the ordering requirements exactly rather than in reverse.

InstantDoc ID 125090
FEATURE

Troubleshooting Transactional Replication
3 common transactional replication problems solved

Kendal Van Dyke (kendal.vandyke@gmail.com) is a senior DBA in Celebration, FL. He has worked with SQL Server for more than 10 years and managed high-volume replication topologies for nearly seven years. He blogs at kendalvandyke.blogspot.com.

Transactional replication is a useful way to keep schema and data for specific objects synchronized across multiple SQL Server databases. Replication can be used in simple scenarios involving a few servers or can be scaled up to complex, multi-datacenter distributed environments. However, no matter the size or complexity of your topology, the number of moving parts involved with replication means that occasionally problems will occur that require a DBA's intervention to correct.

In this article, I'll show you how to use SQL Server's native tools to monitor replication performance, receive notification when problems occur, and diagnose the cause of those problems. Additionally, I'll look at three common transactional replication problems and explain how to fix them.

shows the name, current status, and number of Subscribers for each publication on the Publisher; Subscription Watch List, which shows the status and estimated latency (i.e., time to deliver pending commands) of all Subscriptions to the Publisher; and Agents, which shows the last start time and current status of the Snapshot, Log Reader, and Queue Reader agents, as well as various automated maintenance jobs created by SQL Server to keep replication healthy.

Expanding a Publisher node in the treeview shows its publications. Selecting a publication displays four tabbed views in the right pane: All Subscriptions, which shows the current status and estimated latency of the Distribution Agent for each Subscription; Tracer Tokens, which shows the status of recent tracer tokens for the publication (I'll discuss tracer tokens in more
will display a context menu with options that include stopping and starting the agent, viewing the agent's profile, and viewing the agent's job properties. Double-clicking an agent will open a new window that shows specific details about the agent's status.

Distribution Agent windows have three tabs: Publisher to Distributor History, which shows the status and recent history of the Log Reader agent for the publication; Distributor to Subscriber History, which shows the status and recent history of the Distribution Agent; and Undistributed Commands, which shows the number of commands at the distribution database waiting to be applied to the Subscriber and an estimate of how long it will take to apply them. Log Reader and Snapshot Reader agent windows show only an Agent History tab, which displays the status and recent history of that agent.

When a problem occurs with replication, such as when a Distribution Agent fails, the icons for the Publisher, Publication, and agent will change depending on the type of problem. Icons overlaid by a red circle with an X indicate an agent has failed, a white circle with a circular arrow indicates an agent is retrying a command, and a yellow caution symbol indicates a warning. Identifying the problematic agent is simply a matter of expanding in the treeview the Publishers and Publications that are alerting to a condition, selecting the tabs in the right pane for the agent(s) with a problem, and double-clicking the agent to view its status and information about the error.

Measuring the Flow of Data
Understanding how long it takes for data to move through each step is especially useful when troubleshooting latency issues and will let you focus your attention on the specific segment that's problematic. Tracer tokens were added in SQL Server 2005 to measure the flow of data and actual latency from a Publisher all the way through to Subscribers (the latency values shown for agents in Replication Monitor are estimated). Creating a tracer token writes a special marker to the transaction log of the Publication database that's read by the Log Reader agent, written to the distribution database, and sent through to all Subscribers. The time it takes for the token to move through each step is saved in the Distribution database.

Tracer tokens can be used only if both the Publisher and Distributor are on SQL Server 2005 or later. Subscriber statistics will be collected for push subscriptions if the Subscriber is running SQL Server 7.0 or later and for pull subscriptions if the Subscriber is running SQL Server 2005 or higher. For Subscribers that don't meet these criteria (non–SQL Server Subscribers, for example), statistics for tracer tokens will still be gathered from the Publisher and Distributor. To add a tracer token you must be a member of the sysadmin fixed server role or db_owner fixed database role on the Publisher.

To add a new tracer token or view the status of existing tracer tokens, navigate to the Tracer Tokens tab in Replication Monitor. Figure 2 shows an example of the Tracer Tokens tab showing latency details for a previously inserted token. To add a new token, click Insert Tracer. Details for existing tokens can be viewed by selecting from the drop-down list on the right.

Know When There Are Problems
Although Replication Monitor is useful for viewing replication health, it's not likely (or even reasonable) that you'll keep it open all the time waiting for an error to occur. After all, as a busy DBA you have more to do than watch a screen all day, and at some point you have to leave your desk.

However, SQL Server can be configured to raise alerts when specific replication problems occur. When a Distributor is initially set up, a default group of alerts for replication-related events is created. To view the list of alerts, open SSMS and make a connection to the Distributor in Object Explorer, then expand the SQL Server Agent and Alerts nodes in the treeview. To view or configure an alert, open the Alert properties window by double-clicking the alert, or right-click the alert and choose the Properties option from the context menu. Alternatively, alerts can be configured in Replication Monitor by selecting a Publication in the left pane, viewing the Warnings tab in the right pane, and clicking the Configure Alerts button. The options the alert properties window offers for response actions, notification, etc., are the same as for a SQL Server Agent job alert. Figure 3 shows the Warnings tab in Replication Monitor.

There are three alerts that are of specific interest for transactional replication: Replication: Agent failure, Replication: Agent retry, and Replication Warning:

Figure 2: Tracer Tokens tab showing latency details for a token
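The tracer-token workflow described above can also be driven from T-SQL rather than Replication Monitor. Here's a hedged sketch using the replication stored procedures; the publication name is a placeholder, and the first statement runs in the publication database on the Publisher:

```sql
-- Post a tracer token to the publication (placeholder name) and
-- capture its ID; latency history accumulates in the distribution
-- database as the token flows through the Log Reader and
-- Distribution agents.
DECLARE @tokenID int;

EXEC sys.sp_posttracertoken
  @publication = N'MyPublication',   -- placeholder
  @tracer_token_id = @tokenID OUTPUT;

-- After allowing time for the token to propagate, report
-- Publisher-to-Distributor and Distributor-to-Subscriber latency.
EXEC sys.sp_helptracertokenhistory
  @publication = N'MyPublication',   -- placeholder
  @tracer_id = @tokenID;
```

This is convenient for scheduled latency checks, since the same calls can run from a SQL Server Agent job.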
part of a batch wrapped by a transaction) and uses @@rowcount to verify that only one row was affected. The primary key is used to check for which row needs to be inserted, updated, or deleted; for inserts, if a row with the primary key already exists at the Subscriber, the command will fail because of a primary key constraint violation. For updates or deletes, if no matching primary key exists, @@rowcount returns 0 and an error will be raised that causes the Distribution Agent to fail.

Solution: If you don't care which command is failing, you can simply change the Distribution Agent's profile to ignore the errors. To change the profile, navigate to the Publication in Replication Monitor, right-click the problematic Subscriber in the All Subscriptions tab, and choose the Agent Profile menu option. A new window will open that lets you change the selected agent profile; select the check box for the Continue on data consistency errors profile, and then click OK. Figure 4 shows an example of the Agent Profile window with this profile selected. The Distribution Agent needs to be restarted for the new profile to take effect; to do so, right-click the Subscriber and choose the Stop Synchronizing menu option. When the Subscriber's status changes from Running to Not Running, right-click the Subscriber again and select the Start Synchronizing menu option.

This profile is a system-created profile that will skip three specific errors: inserting a row with a duplicate key, constraint violations, and rows missing from the Subscriber. If any of these errors occur while using this profile, the Distribution Agent will move on to the next command rather than failing. When choosing this profile, be aware that the data on the Subscriber is likely to become out of sync with the Publisher.

If you want to know the specific command that's failing, the sp_browsereplcmds stored procedure can be executed at the Distributor. Three parameters are required: an ID for the Publisher database, a transaction sequence number, and a command ID. To get the Publisher database ID, execute the code in Listing 1 on your Distributor (filling in the appropriate values for Publisher, Subscriber, and Publication). To get the transaction sequence number and command ID, navigate to the failing agent in Replication Monitor, open its status window, select the Distributor to Subscriber History tab, and select the most recent session with an Error status. The transaction sequence number and command ID are contained in the error details message. Figure 5 shows an example of an error message containing these two values.

Finally, execute the code in Listing 2 using the values you just retrieved to show the command that's failing at the Subscriber. Once you know the command that's failing, you can make changes at the Subscriber for the command to apply successfully.

Figure 4: Continue on data consistency errors profile selected in the Distribution Agent's profile
Figure 5: An error message containing the transaction sequence number and command ID

LISTING 2: Code to Show the Command That's Failing at the Subscriber

EXECUTE distribution.dbo.sp_browsereplcmds
  @xact_seqno_start = '0x0000001900001926000800000000',
  @xact_seqno_end = '0x0000001900001926000800000000',
  @publisher_database_id = 29,
  @command_id = 1

Distribution Agent fails with the error message Could not find stored procedure 'sp_MSins_.
Cause: The Publication is configured to deliver INSERT, UPDATE, and DELETE commands using stored procedures, and the procedures have been dropped from the Subscriber. Replication stored procedures aren't considered to be system stored procedures and can be included when using schema comparison tools. If the tools are used to move changes from a non-replicated version of a Subscriber database to a replicated version (e.g., migrating schema changes from a local development environment to a test environment), the procedures could be dropped because they don't exist in the non-replicated version.
Solution: This is an easy problem to fix. In the published database on the Publisher, execute the sp_scriptpublicationcustomprocs stored procedure to generate the INSERT, UPDATE, and DELETE stored procedures for the Publication. This procedure takes only one parameter—the name of the Publication—and returns a single nvarchar(4000) column as the result set. When executing it in SSMS, make sure to output results to text (press Ctrl+T, or choose Query, Results To, Results to Text) and that the maximum number of characters for results to text is set to at least 8,000. You can set this value by selecting Tools, Options, Query Results, Results to Text, Maximum number of characters displayed in each column. After executing the stored procedure, copy the scripts that were generated into a new query window and execute them in the subscribed database on the Subscriber.

Distribution Agents won't start or don't appear to do anything.
Cause: This typically happens when a large number of Distribution Agents are running on the same server at the same time; for example, on a Distributor that handles more than 50 Publications or Subscriptions. Distribution Agents are independent executables that run outside of the SQL Server process in a non-interactive fashion (i.e., no GUI). Windows Server uses a special area of memory called the non-interactive desktop heap to run these kinds of processes. If Windows runs out of available memory in this heap, Distribution Agents won't be able to start.

Solution: Fixing the problem involves making a registry change to increase the size of the non-interactive desktop heap on the server experiencing the problem (usually the Distributor) and rebooting. However, it's important to note that modifying the registry can result in serious problems if it isn't done correctly. Be sure to perform the following steps carefully and back up the registry before you modify it:
1. Start the Registry Editor by typing regedt32.exe in a run dialog box or command prompt.
2. Navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems key in the left pane.
3. In the right pane, double-click the Windows value to open the Edit String dialog box.
4. Locate the SharedSection parameter in the Value data input box. It has three values separated by commas, and should look like the following:

SharedSection=1024,3072,512

Increasing the third value (i.e., making it a value of 768 or 1,024) should be sufficient to resolve the issue. Click OK after modifying the value. Rebooting will ensure that the new value is used by Windows. For more information about the non-interactive desktop heap, see "Unexpected behavior occurs when you run many processes on a computer that is running SQL Server" (support.microsoft.com/kb/824422).

Monitoring Your Replication Environment
When used together, Replication Monitor, tracer tokens, and alerts are a solid way for you to monitor your replication topology and understand the source of problems when they occur. Although the techniques outlined here offer guidance about how to resolve some of the more common issues that occur with transactional replication, there simply isn't enough room to cover all the known problems in one article. For more tips about troubleshooting replication problems, visit the Microsoft SQL Server Replication Support Team's REPLTalk blog at blogs.msdn.com/repltalk.

InstantDoc ID 104703
Maximizing Report Performance with Parameter-Driven Expressions
Speed up your reports

William Vaughn (billva@betav.com) is an expert on Visual Studio, SQL Server, Reporting Services, and data access interfaces. He's coauthor of the Hitchhiker's Guide series, including Hitchhiker's Guide to Visual Studio and SQL Server, 7th ed. (Addison-Wesley).

MORE on the WEB: Download the example and read about more optimization techniques at InstantDoc ID 125092.

Only static reports saved as pre-rendered images—snapshot reports—can be loaded and displayed (almost) instantly, so users are accustomed to some delay when they ask for reports that reflect current data. Some reports, however, can take much longer to generate than others. Complex or in-depth reports can take many hours to produce, even on powerful systems, while others can be built and rendered in a few seconds. Parameter-driven expressions, a technique that I expect is new to many of you, can help you greatly in speeding up your reports.

Visit the online version of this article at www.sqlmag.com, InstantDoc ID 125092, to download the example I created against the AdventureWorks2008 database. If you're interested in a more general look at improving your report performance, see the web-exclusive sidebar "Optimizing SSRS Operations," which offers more strategies for creating reports that perform well. In that sidebar, I walk you through the steps your system goes through when it creates a report. I also share strategies, such as using a stored procedure to return the rowsets used in a report so that the SQL Server query optimizer can reuse a cached query plan, eliminating the need to recompile the procedure.

The concepts I'll discuss here aren't dependent on any particular version of SQL Server Reporting Services (SSRS), but I'll be using the 2008 version for the examples. Once you've installed the AdventureWorks2008 database, you'll start Visual Studio (VS) 2008 and load the ClientSide Filtering.sln project. (This technique will work with VS 2005 business intelligence—BI—projects, but I built the example report using VS 2008 and you can't load it in VS 2005 because the Report Definition Language—RDL—format is different.) Open the Shared Data Source and the Project Properties to make sure the connection string points to your SSRS instance.

Figure 1: The report's Design pane

The example report captures parameters from the user to focus the view on a specific class of bicycles, such as mountain bikes. Once the user chooses a specific bike from within the class, a subreport is generated to show details, including a photograph and other computed information. By separating the photo and computed information from the base query, you can help the report processor generate the base report much more quickly.

In this example, my primary goal is to help the user focus on a specific subset of the data—in other words, to help users view only information in which they're interested. You can do this several ways, but typically you either add a parameter-driven WHERE clause to the initial query or parameter-driven filters to the report data regions. I'll do the latter in this example.

Because the initial SELECT query executed by the report processor in this example doesn't include a WHERE clause, it makes sense to capture several parameters that the
report processor can use to narrow the report's focus. (There's nothing to stop you from further refining the initial SELECT to include parameter-driven WHERE clause filtering.) I'll set up some report parameters—as opposed to query parameters—to accomplish this goal.

1. In VS's Business Intelligence Development Studio (BIDS) Report Designer, navigate to the report's Design pane, which Figure 1 shows. Note that report-centric dialog boxes such as the Report Data window only appear when focus is set to the Design pane.

2. Use the View menu to open the Report Data dialog box, which is new in the BIDS 2008 Report Designer. This box names each of the columns returned by the dataset that's referenced by (the case-sensitive) name. If you add columns to the dataset for some reason, make sure these changes are reflected in the Report Data dialog box as well as on your report. Don't expect to be able to alter the RDL (such as renaming the dataset) based on changes in the Report Data dialog box. When you rename the query or change the columns being fetched, the designer doesn't keep in sync with the RDL very well. Be prepared to open the RDL to make changes from time to time.

3. Right-click the Parameters folder in the Report Data dialog and choose Add Parameter to open the Report Parameter Properties dialog box, which Figure 2 shows. Here's where each of the parameters used in the filter expressions or the query parameters are defined. The query parameters should be generated automatically if your query references a parameter in the WHERE clause once the dataset is bound to a data region on the report (such as a Tablix report element).

4. Use the Report Parameter Properties dialog box to name and define the prompt and other attributes for each of the report parameters you'll need to filter the report. Figure 2 shows how I set the values, which are used to provide the user with a drop-down list of available Product Lines from which to choose. Note that the Value setting is un-typed—you can't specify a length or data type for these values. This can become a concern when you try to compare the supplied value with the data column values in the Filters expression I'm about to build, especially when the supplied parameter length doesn't match the length of the data value being tested.

5. I visited the Default Value tab of the Report Parameter Properties dialog box to set the parameter default to M (for Mountain Bikes). If all of your parameters have default values set, the report processor doesn't wait before rendering the report when first invoked. This can be a problem if you can't determine a viable default parameter configuration that makes sense.

6. The next step is to get the report processor to filter the report data based on the parameter value. On the Report Designer Design pane, click anywhere on the Table1 Tablix control and right-click the upper left corner of the column-chooser frame that appears. The trick is to make sure you've selected the correct element before you start searching for the properties, and you should be sure to choose the correct region when selecting a report element property page. It's easy to get them confused.

7. Navigate to the Tablix Properties menu item, as Figure 3 shows. Here, you should add one or more parameter-driven filters to focus the report's data. Start by clicking Add to create a new Filter.

8. Ignore the Dataset column drop-down list because it will only lead to frustration. Just click the fx expression button. This opens an expression editor where we're going to build a Boolean-returning expression that tests for the filter values. It's far easier to write your own expressions that

Figure 2: The Report Parameter Properties dialog box
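For contrast with the report-side filter approach, the other option mentioned earlier, a parameter-driven WHERE clause, would push the filtering into the dataset query itself. Here's a hedged sketch against AdventureWorks2008; the query and parameter name are illustrative, not the article's actual dataset:

```sql
-- Illustrative parameter-driven WHERE clause: @ProductLine would be
-- bound to the report parameter, so only the chosen product line's
-- rows ever reach the report processor.
SELECT p.ProductID, p.Name, p.ProductLine, p.ListPrice
FROM Production.Product AS p
WHERE p.ProductLine = @ProductLine;
```

The trade-off: query-side filtering re-runs the query whenever the parameter changes, whereas report-side filters can be reapplied to data that's already been retrieved.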
PARAMETER-DRIVEN EXPRESSIONS FEATURE
SSRS instance and save the report to the SSRS catalog. Deploying the report could take 30 seconds or longer the first time, as I've already discussed, so be patient.

Creating a Snapshot Report
Now that the report is deployed, you need to navigate to it with Report Manager so that you can generate a snapshot. Use Report Manager to open the deployed report's properties pages.

1. Using Internet Explorer (I haven't had much luck with Firefox), browse to your SSRS Folder.aspx page, as Figure 7 shows. Your report should appear in the report folder where you deployed it.

2. Find your report and click it. The report should render (correctly) in the browser window, and this is how your end users should see the report. Report users shouldn't be permitted to see or alter the report parameters, however—be sure to configure the report user rights before going into production. The following operations assume that you have the appropriate rights.

3. Using the Parameters tab of the report property dialog box, you can modify the default values assigned to the report, hide them, and change the prompt strings. More importantly, you'll need to set up the report to use stored login credentials before you can create a snapshot.

4. Navigate to the Data Sources tab of the report properties page. I configured the report to use a custom data source that has credentials that are securely stored in the report server. This permits the report to be run unattended if necessary (such as when you set up a schedule to regenerate the snapshot). In this case, I've created a special SQL Server login that has been granted very limited rights to execute the specific stored procedure being used by the report and query against the appropriate tables, but nothing else.

5. Next, navigate to the Execution tab of the report property dialog box. Choose Render this report from a report execution snapshot. Here's where you can define a schedule to rebuild the snapshot. This makes sense for reports that take a long time to execute, because when the report is requested, the last saved version is returned, not a new execution.

6. Check the Create a report snapshot when you click the Apply button on this page box and click Apply. This starts the report processor on the designated SSRS instance and creates a snapshot of the report. The next time the report is invoked from a browser, the data retrieved when the snapshot was built is reused—no additional queries are required to render the report. It also means that as the user changes the filter parameters, it's still not necessary to re-run the query. This can save considerable time—especially in cases where the queries require a long time to run.

Figure 6: The project's properties page
Figure 7: Browsing to the SSRS Folder.aspx page
T
his summerr Microsoft will release the SQL Q orchestration of a single
g server called the control node.
Server 2008 R2 Parallel Data Warehouse The control node accepts client query requests, then
(PDW) Edition, its first product in the creates an MPP execution plan that can call upon one
Massively Parallel Processor (MPP) data ware- or more compute nodes to execute different parts of
house space. PDW uniquely combines MPP software the query, often in parallel. The retrieved results are
acquired from DATAllegro, parallel compute nodes, sent back to the client as a single result set.
commodity servers, and disk storage. PDW lets you
scale out enterprise data warehouse solutions into the hundreds of terabytes and even petabytes of data for the most demanding customer scenarios. In addition, because the parallel compute nodes work concurrently, it often takes only seconds to get the results of queries run against tables containing trillions of rows. For many customers, the large data sets and the fast query response times against those data sets are game-changing opportunities for competitive advantage.

The simplest way to think of PDW is a layer of integrated software that logically forms an umbrella over the parallel compute nodes. Each compute node is a single physical server that runs its own instance of the SQL Server 2008 relational engine in a shared-nothing architecture. In other words, compute node 1 doesn't share CPU, memory, or storage with compute node 2.

Figure 1 shows the architecture for a PDW data rack. The smallest PDW will take up two full racks of space in a data center, and you can add storage and compute capacity to PDW one data rack at a time. A data rack contains 8 to 10 compute servers from vendors such as Bull, Dell, HP, and IBM, and Fibre Channel storage arrays from vendors such as EMC2, HP, and IBM. The sale of PDW includes preconfigured and pretested software and hardware that's tightly configured to achieve balanced throughput and I/O for very large databases. Microsoft and these hardware vendors provide full planning, implementation, and configuration support for PDW.

The collection of physical servers and disk storage arrays that make up the MPP data warehouse is often referred to as an appliance. Although the appliance is often thought of as a single box or single database server, a typical PDW appliance is comprised of dozens of physical servers and disk storage arrays all working together, often in parallel and under the

MORE on the WEB: Download the code at InstantDoc ID 125098.

Rich Johnson (richjohn@microsoft.com) is a business intelligence architect with Microsoft working in the US Public Sector Services organization. He has worked for Microsoft since 1996 as a development manager and architect of SQL Server database solutions for OLTP and data-warehousing implementations.

Taking a Closer Look
Let's dive deeper into PDW's architecture in Figure 1. As I mentioned previously, PDW has a control node that clients connect to in order to query a PDW database. The control node has an instance of the SQL Server 2008 relational engine for storing PDW metadata. It also uses this engine for storing intermediate query results in TempDB for some query types. The control node can receive intermediate query results from multiple compute nodes for a single query, store those results in SQL Server temporary tables, then merge those results into a single result set for final delivery to the client.

The control node is an active/passive cluster server. Plus, there's a spare compute node for redundancy and failover capability.

A PDW data rack contains 8 to 10 compute nodes and related storage nodes, depending on the hardware vendor. Each compute node is a physical server that runs a standalone SQL Server 2008 relational engine instance. The storage nodes are Fibre Channel-connected storage arrays containing 10 to 12 disk drives.

You can add more capacity by adding data racks. Depending on disk sizes, a data rack can contain in the neighborhood of 30TB to 40TB of useable disk space. (These numbers can grow considerably if 750GB or larger disk drives are used by the hardware vendor.) The useable disk space is all RAID 1 at the hardware level and uses SQL Server 2008 page compression. So, if your PDW appliance has 10 full data racks, you have 300TB to 400TB of useable disk space and 80 to 100 parallel compute nodes. As of this writing, each compute node is a two-socket server with each CPU having at least four cores. In our example, that's 640 to 800 CPU cores and lots of Fibre Channel
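As a sanity check on these capacity figures, the arithmetic can be sketched directly. This is a back-of-envelope calculation only; the per-rack node count, useable space, and core counts are the ranges quoted above, not fixed PDW specifications:

```python
# Back-of-envelope PDW appliance capacity math using the ranges quoted
# in the text: 8 to 10 compute nodes and roughly 30TB to 40TB of useable
# disk per data rack, and two-socket nodes with at least 4 cores per CPU.

def appliance_capacity(racks, nodes_per_rack, tb_per_rack, cores_per_node=8):
    """Scale per-rack figures up to a whole appliance."""
    nodes = racks * nodes_per_rack
    return {
        "compute_nodes": nodes,
        "useable_tb": racks * tb_per_rack,
        "cpu_cores": nodes * cores_per_node,
    }

# A 10-rack appliance, low end and high end of the quoted ranges:
low = appliance_capacity(racks=10, nodes_per_rack=8, tb_per_rack=30)
high = appliance_capacity(racks=10, nodes_per_rack=10, tb_per_rack=40)

print(low["compute_nodes"], low["useable_tb"], low["cpu_cores"])     # 80 300 640
print(high["compute_nodes"], high["useable_tb"], high["cpu_cores"])  # 100 400 800
```

The low and high ends reproduce the article's figures: 80 to 100 compute nodes, 300TB to 400TB of useable space, and 640 to 800 CPU cores.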
my_DB that has 16TB of distributed data space, 1TB of replicated table space, and 800GB of log file space on a PDW appliance with eight compute nodes:

CREATE DATABASE my_DB
WITH ( AUTOGROW = ON
  ,REPLICATED_SIZE = 1024 GB
  ,DISTRIBUTED_SIZE = 16384 GB
  ,LOG_SIZE = 800 GB )

A total of 8TB of usable disk space (1,024GB × 8 compute nodes) will be consumed by replicated tables because each compute node needs enough disk space to contain a copy of each replicated table. Two terabytes of usable disk space will be consumed by each of the 8 compute nodes (16,384GB / 8 compute nodes) for distributed tables. Each compute node will also consume 100GB of usable disk space (800GB / 8 compute nodes) for log files. As a general rule of thumb, the overall log-file space for a user database should be estimated at two times the size of the largest data file being loaded.

LISTING 1: Code that Creates a Replicated Table

CREATE TABLE DimAccount
(
  AccountKey int NOT NULL,
  ParentAccountKey int NULL,
  AccountCodeAlternateKey int NULL,
  ParentAccountCodeAlternateKey int NULL,
  AccountDescription nvarchar(50),
  AccountType nvarchar(50),
  Operator nvarchar(50),
  CustomMembers nvarchar(300),
  ValueType nvarchar(50),
  CustomMemberOptions nvarchar(200)
)
WITH (CLUSTERED INDEX(AccountKey),
  DISTRIBUTION = REPLICATE);

LISTING 2: Code that Creates a Distributed Table

CREATE TABLE FactSales
(
  StoreIDKey int NOT NULL,
  ProductKey int NOT NULL,
  DateKey int NOT NULL,
  SalesQty int NOT NULL,
  SalesAmount decimal(18,2) NOT NULL
)
WITH (CLUSTERED INDEX(DateKey),
  DISTRIBUTION = HASH(StoreIDKey));
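The space accounting above follows directly from how the three sizes are apportioned: REPLICATED_SIZE is consumed in full on every compute node, while DISTRIBUTED_SIZE and LOG_SIZE are divided across the nodes. A small sketch of that arithmetic (illustrative only, not PDW code):

```python
# Apportioning the CREATE DATABASE sizes for my_DB across 8 compute
# nodes, per the rules described in the text: each node holds a full
# copy of the replicated space; distributed and log space are divided.

def per_node_footprint(replicated_gb, distributed_gb, log_gb, nodes):
    return {
        "replicated_gb": replicated_gb,            # full copy on every node
        "distributed_gb": distributed_gb / nodes,  # evenly divided
        "log_gb": log_gb / nodes,                  # evenly divided
    }

node = per_node_footprint(replicated_gb=1024, distributed_gb=16384,
                          log_gb=800, nodes=8)

print(node["replicated_gb"] * 8)  # 8192 GB (8TB) of replicated space appliance-wide
print(node["distributed_gb"])     # 2048.0 GB (2TB) of distributed space per node
print(node["log_gb"])             # 100.0 GB of log space per node
```

The printed values match the article's accounting: 8TB total for replicated tables, 2TB of distributed space per node, and 100GB of log space per node.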
When creating a new user database, you won't be able to create file groups. PDW does this automatically during database creation because file group design is tightly configured with the storage to achieve overall performance and I/O balance across all compute nodes.

After the database is created, you use the CREATE TABLE command to create both replicated and distributed tables. PDW's CREATE TABLE command is very similar to a typical SQL Server CREATE TABLE command and even includes the ability to partition distributed tables as well as replicated tables. The most visible difference in this command on PDW is the ability to create a table as replicated or as distributed.

As a general rule of thumb, replicated tables should be 1GB or smaller in size. Listing 1 contains a sample CREATE TABLE statement that creates a replicated table named DimAccount. As you can see, the DISTRIBUTION argument is set to REPLICATE.

Generally speaking, distributed tables are used for transaction or fact tables that are often much larger than 1GB in size. In some cases, a large dimension table (for example, a 500-million-row customer account table) is a better candidate for a distributed table. Listing 2 contains code that creates a distributed table named FactSales. (You can download the code in Listing 1 and Listing 2 by going to www.sqlmag.com, entering 125098 in the InstantDoc ID text box, clicking Go, then clicking the Download the Code Here button.) As I mentioned previously, a single-attribute column must be chosen as the distribution key so that data loading can be hash distributed as evenly as possible across all the compute nodes and their distributions.

For a retailer with a large point of sale (POS) fact table and a large store-inventory fact table, a good candidate for the distribution key might be the column that contains the store ID. By hash distributing both fact tables on the store ID, you might create a fairly even distribution of the rows across all compute nodes. Also, PDW will co-locate related rows on the same distribution (i.e., rows from the POS fact table and rows from the store-inventory fact table for the same store ID). Co-located data is related, so queries that access POS data and related store inventory data should perform very well.

To take full advantage of PDW, designing databases and tables for the highest-priority queries is crucial. PDW excels at scanning and joining large distributed tables, and often queries against these large tables are mission critical. A good database design on PDW often takes a lot of trial and error. What you learned in the single-server database world isn't always the same in the MPP data warehouse world. For instance, clustered indexes can work well for large distributed tables, but nonclustered indexes can degrade query performance in some cases because of the random I/O patterns they create on the disk storage. PDW is tuned and configured to achieve high rates of sequential I/O against large tables. For many queries, sequential I/O against a distributed table can be faster than using nonclustered indexes, especially under concurrent workloads. In the MPP data warehouse world, this is known as an index-light design.
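The co-location behavior described above can be illustrated with a toy hash-distribution sketch. This is conceptual only and does not reproduce PDW's actual hash function or distribution layout; the point is that because both fact tables hash the same store ID through the same function, their rows for a given store land on the same compute node, so a join on store ID needs no cross-node data movement:

```python
# Toy illustration of hash distribution and co-location (not PDW's
# actual algorithm): rows hashed on the same distribution key value
# always land on the same compute node.
import hashlib

COMPUTE_NODES = 8

def node_for(distribution_key):
    # Stable hash of the key -> compute node number in [0, COMPUTE_NODES).
    digest = hashlib.md5(str(distribution_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % COMPUTE_NODES

# 100 stores' worth of rows from the two fact tables in Listing 2's schema.
pos_rows = [{"StoreIDKey": s, "SalesAmount": 9.99} for s in range(1, 101)]
inventory_rows = [{"StoreIDKey": s, "OnHandQty": 12} for s in range(1, 101)]

# Rows from both fact tables with the same StoreIDKey are co-located.
for pos, inv in zip(pos_rows, inventory_rows):
    assert node_for(pos["StoreIDKey"]) == node_for(inv["StoreIDKey"])

# Count how the 100 stores spread across the 8 nodes.
counts = [0] * COMPUTE_NODES
for row in pos_rows:
    counts[node_for(row["StoreIDKey"])] += 1
print(counts)  # per-node row counts; a good key spreads these fairly evenly
```

A skewed key (say, a region ID with a few huge regions) would pile rows onto a few nodes, which is why the article stresses choosing a distribution key that spreads rows evenly.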
PRODUCT REVIEW

recommended minimum of 4 CPUs and roughly two cores per 100 users) and that enough physical RAM be present to run all the pnSessionHost.exe processes (with a recommended minimum of 4GB).

On the software side, NovaView requires Windows Server 2008 or Windows Server 2003, IIS 6 or higher, .NET Framework 2.0 or higher, and Microsoft Visual J# 2.0. If you're going to source data from SQL Server Analysis Services (SSAS) cubes, you'll need a separate server installation with SSAS. NovaView can also work with many other mainstream enterprise data sources.

NovaView Server provides infrastructure services for the entire NovaView suite of client tools. It supports a wide variety of data sources, including the SQL Server relational engine, SSAS, Oracle, SAP, flat files, and web services. NovaView Server is a highly scalable piece of software that can support up to thousands of users and terabytes of data, according to reports from Panorama Software's established customers.

The NovaView Dashboards, NovaView Visuals, and NovaView GIS Framework components provide the next layer of business intelligence (BI) delivery, including basic analytics and other visualizations. NovaView Dashboards provides a mature, modern-day dashboarding product that lets you create complex views from NovaView Server. Both Key Performance Indicators (KPIs) and charts are easily created and combined to form enterprise dashboards.

NovaView Visuals provides advanced information visualization options for NovaView Dashboards. For example, capabilities equivalent to ProClarity's Decomposition Tree are included as part of NovaView Visuals. With one click, you can map out analytics that come from NovaView Analytics.

NovaView SharedViews represents a joint venture between Panorama Software and Google. This
source from the perspective of the NovaView suite.

You might wonder why anyone would want to use another BI tool with PowerPivot. PowerPivot is an outstanding self-service BI product, but it's also a version-one product. There are a few areas of PowerPivot that Microsoft has left to improve upon that NovaView will provide, including complex hierarchies, data security, and additional data visualization options.

NovaView offers end-to-end BI delivery, and it does it quite well. Panorama has clearly used its deep knowledge of OLAP and MDX to produce some of the very best delivery options on the market today. Businesses that are looking to extend their existing Microsoft data warehouse and BI solutions or make PowerPivot enterprise-ready should strongly consider NovaView. Given the sheer breadth and depth of the suite, it's obvious that not all customers will need all of its components. Small-to-midsized businesses might find NovaView's relatively high cost prohibitive.

PANORAMA NOVAVIEW SUITE
Pros: All user-facing components are browser based; supports both OLAP and non-OLAP data sources; components are tightly bound; supports core needs of both business and IT users
Cons: High price; additional server components required; neither edition is as graphically rich or fluid as alternatives such as Tableau Software's client
Rating:
Price: Server licenses range from $12,000 to $24,000, depending on configuration; client licenses range from $299 to $14,000
Recommendation: If you're in the market for a third-party toolset to add functionality to Microsoft's BI tools, your search is over. But if you only need a few of the suite's functions, its cost could be prohibitive.
Contact: info@panorama.com • 1-416-545-0990 • www.panorama.com
InstantDoc ID 104648
PowerPivot vs. Tableau
This article is a summarized version of Derek Comingore's original blog. To read the full article, go to sqlmag.com/go/SQLServerBI.

Derek Comingore (dcomingore@bivoyage.com) is a principal architect with B.I. Voyage, a Microsoft Partner that specializes in business intelligence services and solutions. He's a SQL Server MVP and holds several Microsoft certifications.

Microsoft PowerPivot 2010
PowerPivot is composed of Desktop (PowerPivot for Excel) and Server (PowerPivot for SharePoint) components. The client experience is embedded directly within Microsoft Excel, so its authoring experience is Excel. Users can create custom measures, calculated fields, subtotals, grand totals, and percentages. It uses a language called Data Analysis eXpressions (DAX) to create custom measures and calculated fields. PowerPivot ships in x64 and leverages a column-oriented in-memory data store, so you can work with massive volumes of data efficiently. DAX provides an extensive expression language to build custom measures and fields with. PowerPivot for Excel supports practically any data source available. PowerPivot for SharePoint offers a wealth of features from SharePoint. On the downside, PowerPivot for Excel can be confusing and DAX is very complex. PowerPivot for Excel does not support parent/child hierarchies either.

Tableau 5.1
Tableau Desktop provides an easy-to-use, drag-and-drop interface letting anyone create pivot tables and data visualizations. Dashboard capabilities are also available in Tableau Desktop by combining multiple worksheets into a single display.

Tableau's strength lies in the product's visualization and easy-to-use authoring capabilities. The formula language is easy enough to use to build custom measures and calculated fields with. Both Tableau Server and Desktop installations are extremely easy to perform. However, working with massive volumes of data can be painful. Tableau's formula language is impressive in its simplicity but isn't as extensive as PowerPivot's DAX. Tableau Server is a good server product but cannot offer the wealth of features found in SharePoint Server.

If your company is looking for an IT-oriented product that is geared for integration with corporate BI solutions, PowerPivot is a no-brainer. If your company is looking for a business-user-centric platform with little IT usage or corporate BI integration, Tableau should be your choice.
BUSINESS INTELLIGENCE
Lyzasoft Enhances Data Collaboration Tool
Lyzasoft has announced Lyza 2.0. This version lets groups within an enterprise synthesize data from many sources, visually analyze the data, and compare their findings among the workgroup. New features include micro-tiling, improved user format controls, ad hoc visual data drilling, n-dimensional charting, advanced sorting controls, and a range of functions for adding custom fields to charts. Lyza 2.0 also introduces new collaboration features, letting users interact with content in the form of blogs, charts, tables, dashboards, and collections. Lyza costs $400 for a one-year subscription and $2,000 per user for a perpetual license. To learn more, visit www.lyzasoft.com. —Brian Reinholz, production editor

Editor's Tip: Got a great new product? Send announcements to products@sqlmag.com.
DATABASE ADMINISTRATION
Attunity Updates Change Data Capture Suite
Attunity announced a new release of its change data capture and operational data replication software,
Attunity, with support for SQL Server 2008 R2. Attunity now tracks log-based changes across all versions of
SQL Server, supports data replication into heterogeneous target databases, and fully integrates with SQL Server
Integration Services (SSIS) and Business Intelligence Development Studio (BIDS). To learn more or download
a free trial, visit www.attunity.com.
DATABASE ADMINISTRATION
Easily Design PDF Flow Charts
Aivosto has released Visustin 6.0, a flow chart generator that converts
T-SQL code to flow charts. The latest version can create PDF flow
charts from 36 programming languages; the new version adds support for JCL, Matlab, PL/I, Rexx, and SAS code. Visustin reads source code and visualizes each function as a flow chart, letting you see how the functions operate. With the software, you can easily view two charts side
by side, making for easy comparisons. The standard edition costs $249
and the pro edition costs $499. To learn more, visit www.aivosto.com.
DATABASE ADMINISTRATION
Embarcadero Launches Multi-platform DBArtisan XE
Embarcadero Technologies introduced DBArtisan XE, a solution that lets DBAs maximize the performance
of their databases regardless of type. DBArtisan XE helps database administrators manage and optimize the
schema, security, performance, and availability of all their databases to diagnose and resolve database issues.
DBArtisan XE is also the first Embarcadero product to include Embarcadero ToolCloud as a standard feature.
ToolCloud provides centralized licensing and provisioning, plus on-demand tool deployment, to improve tool
manageability for IT organizations with multiple DBArtisan users. DBArtisan starts at $1,100 for five server
connections. To learn more or download a free version, visit www.embarcadero.com.
DATABASE ADMINISTRATION
DBMoto 7 Adds Multi-server Synchronization
HiT Software has released version 7 of DBMoto, the
company’s change data capture software. DBMoto 7
offers multi-server synchronization, letting organiza-
tions keep multiple operational databases synchronized
across their environment, including special algorithms for
multi-server conflict resolution. Other features include
enhanced support for remote administration, new gran-
ular security options, support for XML data types, and
new transaction log support for Netezza data warehouse
appliances. To learn more or download a free trial, visit
www.hitsw.com.
SQL Server Magazine, June 2010. Vol. 12, No. 6 (ISSN 1522-2187). SQL Server Magazine is published monthly by Penton Media, Inc., copyright 2010, all rights reserved. SQL Server is a registered trademark of Microsoft Corporation, and SQL Server Magazine is used by Penton Media, Inc., under license from owner. SQL Server Magazine is an independent publication not affiliated with Microsoft Corporation. Microsoft Corporation is not responsible in any way for the editorial policy or other contents of the publication. SQL Server Magazine, 221 E. 29th St., Loveland, CO 80538, 800-621-1544 or 970-663-4700. Sales and marketing offices: 221 E. 29th St., Loveland, CO 80538. Advertising rates furnished upon request. Periodicals Class postage paid at Loveland, Colorado, and additional mailing offices. Postmaster: Send address changes to SQL Server Magazine, 221 E. 29th St., Loveland, CO 80538. Subscribers: Send all inquiries, payments, and address changes to SQL Server Magazine, Circulation Department, 221 E. 29th St., Loveland, CO 80538. Printed in the U.S.A.