20467A
Designing Business Intelligence Solutions
with Microsoft SQL Server 2012
Information in this document, including URL and other Internet Web site references, is subject to change
without notice. Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with
any real company, organization, product, domain name, e-mail address, logo, person, place or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the
user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in
or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical,
photocopying, recording, or otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property
rights covering subject matter in this document. Except as expressly provided in any written license
agreement from Microsoft, the furnishing of this document does not give you any license to these
patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and
Microsoft makes no representations and warranties, either expressed, implied, or statutory, regarding
these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a
manufacturer or product does not imply endorsement by Microsoft of the manufacturer or product. Links
may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is not
responsible for the contents of any linked site or any link contained in a linked site, or any changes or
updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission
received from any linked site. Microsoft is providing these links to you only as a convenience, and the
inclusion of any link does not imply endorsement by Microsoft of the site or the products contained
therein.
© 2012 Microsoft Corporation. All rights reserved.
Released: xx/20xx
MICROSOFT LICENSE TERMS
OFFICIAL MICROSOFT LEARNING PRODUCTS
MICROSOFT OFFICIAL COURSE Pre-Release and Final Release Versions
These license terms are an agreement between Microsoft Corporation and you. Please read them. They apply to
the Licensed Content named above, which includes the media on which you received it, if any. These license
terms also apply to any updates, supplements, internet based services and support services for the Licensed
Content, unless other terms accompany those items. If so, those terms apply.
BY DOWNLOADING OR USING THE LICENSED CONTENT, YOU ACCEPT THESE TERMS. IF YOU DO NOT ACCEPT
THEM, DO NOT DOWNLOAD OR USE THE LICENSED CONTENT.
If you comply with these license terms, you have the rights below.
1. DEFINITIONS.
a. “Authorized Learning Center” means a Microsoft Learning Competency Member, Microsoft IT Academy
Program Member, or such other entity as Microsoft may designate from time to time.
b. “Authorized Training Session” means the Microsoft-authorized instructor-led training class using only
MOC Courses that are conducted by a MCT at or through an Authorized Learning Center.
c. “Classroom Device” means one (1) dedicated, secure computer that you own or control that meets or
exceeds the hardware level specified for the particular MOC Course located at your training facilities or
primary business location.
d. “End User” means an individual who is (i) duly enrolled for an Authorized Training Session or Private
Training Session, (ii) an employee of a MPN Member, or (iii) a Microsoft full-time employee.
e. “Licensed Content” means the MOC Course and any other content accompanying this agreement.
Licensed Content may include (i) Trainer Content, (ii) software, and (iii) associated media.
f. “Microsoft Certified Trainer” or “MCT” means an individual who is (i) engaged to teach a training session
to End Users on behalf of an Authorized Learning Center or MPN Member, (ii) currently certified as a
Microsoft Certified Trainer under the Microsoft Certification Program, and (iii) holds a Microsoft
Certification in the technology that is the subject of the training session.
g. “Microsoft IT Academy Member” means a current, active member of the Microsoft IT Academy
Program.
h. “Microsoft Learning Competency Member” means a Microsoft Partner Network Program Member in
good standing that currently holds the Learning Competency status.
i. “Microsoft Official Course” or “MOC Course” means the Official Microsoft Learning Product instructor-
led courseware that educates IT professionals or developers on Microsoft technologies.
j. “Microsoft Partner Network Member” or “MPN Member” means a silver or gold-level Microsoft Partner
Network program member in good standing.
k. “Personal Device” means one (1) device, workstation or other digital electronic device that you
personally own or control that meets or exceeds the hardware level specified for the particular MOC
Course.
l. “Private Training Session” means the instructor-led training classes provided by MPN Members for
corporate customers to teach a predefined learning objective. These classes are not advertised or
promoted to the general public and class attendance is restricted to individuals employed by or
contracted by the corporate customer.
m. “Trainer Content” means the trainer version of the MOC Course and additional content designated
solely for trainers to use to teach a training session using a MOC Course. Trainer Content may include
Microsoft PowerPoint presentations, instructor notes, lab setup guide, demonstration guides, beta
feedback form and trainer preparation guide for the MOC Course. To clarify, Trainer Content does not
include virtual hard disks or virtual machines.
2. INSTALLATION AND USE RIGHTS. The Licensed Content is licensed not sold. The Licensed Content is
licensed on a one copy per user basis, such that you must acquire a license for each individual that
accesses or uses the Licensed Content.
2.1 Below are four separate sets of installation and use rights. Only one set of rights applies to you.
ii. Use of Instructional Components in Trainer Content. You may customize, in accordance with the
most recent version of the MCT Agreement, those portions of the Trainer Content that are logically
associated with instruction of a training session. If you elect to exercise the foregoing rights, you
agree: (a) that any of these customizations will only be used for providing a training session, (b) any
customizations will comply with the terms and conditions for Modified Training Sessions and
Supplemental Materials in the most recent version of the MCT agreement and with this agreement.
For clarity, any use of “customize” refers only to changing the order of slides and content, and/or
not using all the slides or content, it does not mean changing or modifying any slide or content.
2.2 Separation of Components. The Licensed Content components are licensed as a single unit and you
may not separate the components and install them on different devices.
2.4 Third Party Programs. The Licensed Content may contain third party programs or services. These
license terms will apply to your use of those third party programs or services, unless other terms accompany
those programs and services.
2.5 Additional Terms. Some Licensed Content may contain components with additional terms,
conditions, and licenses regarding its use. Any non-conflicting terms in those conditions and licenses also
apply to that respective component and supplements the terms described in this Agreement.
3. PRE-RELEASE VERSIONS. If the Licensed Content is a pre-release (“beta”) version, in addition to the other
provisions in this agreement, then these terms also apply:
a. Pre-Release Licensed Content. This Licensed Content is a pre-release version. It may not contain the
same information and/or work the way a final version of the Licensed Content will. We may change it
for the final version. We also may not release a final version. Microsoft is under no obligation to
provide you with any further content, including the final release version of the Licensed Content.
b. Feedback. If you agree to give feedback about the Licensed Content to Microsoft, either directly or
through its third party designee, you give to Microsoft without charge, the right to use, share and
commercialize your feedback in any way and for any purpose. You also give to third parties, without
charge, any patent rights needed for their products, technologies and services to use or interface with
any specific parts of a Microsoft software, Microsoft product, or service that includes the feedback. You
will not give feedback that is subject to a license that requires Microsoft to license its software,
technologies, or products to third parties because we include your feedback in them. These rights
survive this agreement.
c. Term. If you are an Authorized Training Center, MCT or MPN, you agree to cease using all copies of the
beta version of the Licensed Content upon (i) the date which Microsoft informs you is the end date for
using the beta version, or (ii) sixty (60) days after the commercial release of the Licensed Content,
whichever is earliest (“beta term”). Upon expiration or termination of the beta term, you will
irretrievably delete and destroy all copies of same in the possession or under your control.
4. INTERNET-BASED SERVICES. Microsoft may provide Internet-based services with the Licensed Content,
which may change or be canceled at any time.
a. Consent for Internet-Based Services. The Licensed Content may connect to computer systems over an
Internet-based wireless network. In some cases, you will not receive a separate notice when they
connect. Using the Licensed Content operates as your consent to the transmission of standard device
information (including but not limited to technical information about your device, system and
application software, and peripherals) for internet-based services.
b. Misuse of Internet-based Services. You may not use any Internet-based service in any way that could
harm it or impair anyone else’s use of it. You may not use the service to try to gain unauthorized access
to any service, data, account or network by any means.
5. SCOPE OF LICENSE. The Licensed Content is licensed, not sold. This agreement only gives you some rights
to use the Licensed Content. Microsoft reserves all other rights. Unless applicable law gives you more
rights despite this limitation, you may use the Licensed Content only as expressly permitted in this
agreement. In doing so, you must comply with any technical limitations in the Licensed Content that only
allows you to use it in certain ways. Except as expressly permitted in this agreement, you may not:
• install more copies of the Licensed Content on devices than the number of licenses you acquired;
• allow more individuals to access the Licensed Content than the number of licenses you acquired;
• publicly display, or make the Licensed Content available for others to access or use;
• install, sell, publish, transmit, encumber, pledge, lend, copy, adapt, link to, post, rent, lease or lend,
make available or distribute the Licensed Content to any third party, except as expressly permitted
by this Agreement.
• reverse engineer, decompile, remove or otherwise thwart any protections or disassemble the
Licensed Content except and only to the extent that applicable law expressly permits, despite this
limitation;
• access or use any Licensed Content for which you are not providing a training session to End Users
using the Licensed Content;
• access or use any Licensed Content that you have not been authorized by Microsoft to access and
use; or
• transfer the Licensed Content, in whole or in part, or assign this agreement to any third party.
6. RESERVATION OF RIGHTS AND OWNERSHIP. Microsoft reserves all rights not expressly granted to you in
this agreement. The Licensed Content is protected by copyright and other intellectual property laws and
treaties. Microsoft or its suppliers own the title, copyright, and other intellectual property rights in the
Licensed Content. You may not remove or obscure any copyright, trademark or patent notices that
appear on the Licensed Content or any components thereof, as delivered to you.
7. EXPORT RESTRICTIONS. The Licensed Content is subject to United States export laws and regulations. You
must comply with all domestic and international export laws and regulations that apply to the Licensed
Content. These laws include restrictions on destinations, End Users and end use. For additional
information, see www.microsoft.com/exporting.
8. LIMITATIONS ON SALE, RENTAL, ETC. AND CERTAIN ASSIGNMENTS. You may not sell, rent, lease, lend or
sublicense the Licensed Content or any portion thereof, or transfer or assign this agreement.
9. SUPPORT SERVICES. Because the Licensed Content is “as is”, we may not provide support services for it.
10. TERMINATION. Without prejudice to any other rights, Microsoft may terminate this agreement if you fail
to comply with the terms and conditions of this agreement. Upon any termination of this agreement, you
agree to immediately stop all use of and to irretrievably delete and destroy all copies of the Licensed
Content in your possession or under your control.
11. LINKS TO THIRD PARTY SITES. You may link to third party sites through the use of the Licensed Content.
The third party sites are not under the control of Microsoft, and Microsoft is not responsible for the
contents of any third party sites, any links contained in third party sites, or any changes or updates to third
party sites. Microsoft is not responsible for webcasting or any other form of transmission received from
any third party sites. Microsoft is providing these links to third party sites to you only as a convenience,
and the inclusion of any link does not imply an endorsement by Microsoft of the third party site.
12. ENTIRE AGREEMENT. This agreement, and the terms for supplements, updates and support services are
the entire agreement for the Licensed Content.
b. Outside the United States. If you acquired the Licensed Content in any other country, the laws of that
country apply.
14. LEGAL EFFECT. This agreement describes certain legal rights. You may have other rights under the laws of
your country. You may also have rights with respect to the party from whom you acquired the Licensed
Content. This agreement does not change your rights under the laws of your country if the laws of your
country do not permit it to do so.
15. DISCLAIMER OF WARRANTY. THE LICENSED CONTENT IS LICENSED "AS-IS," "WITH ALL FAULTS," AND "AS
AVAILABLE." YOU BEAR THE RISK OF USING IT. MICROSOFT CORPORATION AND ITS RESPECTIVE
AFFILIATES GIVE NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS UNDER OR IN RELATION TO
THE LICENSED CONTENT. YOU MAY HAVE ADDITIONAL CONSUMER RIGHTS UNDER YOUR LOCAL LAWS
WHICH THIS AGREEMENT CANNOT CHANGE. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAWS,
MICROSOFT CORPORATION AND ITS RESPECTIVE AFFILIATES EXCLUDE ANY IMPLIED WARRANTIES OR
CONDITIONS, INCLUDING THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NON-INFRINGEMENT.
16. LIMITATION ON AND EXCLUSION OF REMEDIES AND DAMAGES. TO THE EXTENT NOT PROHIBITED BY
LAW, YOU CAN RECOVER FROM MICROSOFT CORPORATION AND ITS SUPPLIERS ONLY DIRECT
DAMAGES UP TO USD$5.00. YOU AGREE NOT TO SEEK TO RECOVER ANY OTHER DAMAGES, INCLUDING
CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES FROM MICROSOFT
CORPORATION AND ITS RESPECTIVE SUPPLIERS.
It also applies even if Microsoft knew or should have known about the possibility of the damages. The
above limitation or exclusion may not apply to you because your country may not allow the exclusion or
limitation of incidental, consequential or other damages.
Please note: As this Licensed Content is distributed in Quebec, Canada, some of the clauses in this agreement
are provided below in French.
Remarque : Ce contenu sous licence étant distribué au Québec, Canada, certaines des clauses dans ce
contrat sont fournies ci-dessous en français.
EXONÉRATION DE GARANTIE. Le contenu sous licence visé par une licence est offert « tel quel ». Toute
utilisation de ce contenu sous licence est à votre seule risque et péril. Microsoft n’accorde aucune autre garantie
expresse. Vous pouvez bénéficier de droits additionnels en vertu du droit local sur la protection dues
consommateurs, que ce contrat ne peut modifier. La ou elles sont permises par le droit locale, les garanties
implicites de qualité marchande, d’adéquation à un usage particulier et d’absence de contrefaçon sont exclues.
Elle s’applique également, même si Microsoft connaissait ou devrait connaître l’éventualité d’un tel dommage.
Si votre pays n’autorise pas l’exclusion ou la limitation de responsabilité pour les dommages indirects,
accessoires ou de quelque nature que ce soit, il se peut que la limitation ou l’exclusion ci-dessus ne s’appliquera
pas à votre égard.
EFFET JURIDIQUE. Le présent contrat décrit certains droits juridiques. Vous pourriez avoir d’autres droits prévus
par les lois de votre pays. Le présent contrat ne modifie pas les droits que vous confèrent les lois de votre pays
si celles-ci ne le permettent pas.
Acknowledgments
Microsoft Learning wants to acknowledge and thank the following for their contribution toward
developing this title. Their effort at various stages in the development has ensured that you have a good
classroom experience.
Contents
Module 1: Planning a BI Solution
Lesson 1: Elements of a BI Solution
Lesson 2: The Microsoft BI Platform
Lesson 3: Planning a BI Project
Lab: Planning a BI Solution
Course Description
Note: This first release (‘A’) MOC version of course 20467A has been developed on pre-release
software. Microsoft Learning will release a ‘B’ version of this course with enhanced PowerPoint
slides, copy-edited content, and Course Companion content on Microsoft Learning site.
This training course teaches database and business intelligence (BI) professionals how to plan and design
a BI solution that is based on Microsoft SQL Server 2012 and other Microsoft BI technologies.
Audience
This course is not designed for students who are new to SQL Server 2012 BI technologies; it is targeted at
BI professionals with experience of implementing solutions with the SQL Server Database Engine, SQL
Server Integration Services, SQL Server Analysis Services, and SQL Server Reporting Services.
Student Prerequisites
In addition to their professional experience, students who attend this training should already have the
following technical knowledge:
The ability to create Integration Services packages that include control flows and data flows
The ability to create a basic tabular model with PowerPivot and Analysis Services
The ability to create Reporting Services reports with Report Designer
The ability to implement authentication and permissions in the SQL Server database engine,
Analysis Services, and Reporting Services
Familiarity with SharePoint Server and Microsoft Office applications – particularly Excel
Course Objectives
After completing this course, students will be able to:
Plan a BI solution.
Operate a BI solution.
Course Outline
The course outline is as follows:
Course Materials
The following materials are included with your kit:
Course Handbook: a succinct classroom learning guide that provides the critical technical
information in a crisp, tightly-focused format, which is essential for an effective in-class learning
experience.
Lessons: guide you through the learning objectives and provide the key points that are critical to
the success of the in-class learning experience.
Labs: provide a real-world, hands-on platform for you to apply the knowledge and skills learned
in the module.
Module Reviews and Takeaways: provide on-the-job reference material to boost knowledge
and skills retention.
Modules: include companion content, such as questions and answers, detailed demo steps and
additional reading links, for each lesson. Additionally, they include Lab Review questions and
answers and Module Reviews and Takeaways sections, which contain the review questions and
answers, best practices, common issues and troubleshooting tips with answers, and real-world
issues and scenarios with answers.
Resources: include well-categorized additional resources that give you immediate access to the
most current premium content on TechNet, MSDN®, or Microsoft® Press®.
Note: For this version of the Courseware on Prerelease Software, Companion Content is not
available. However, the Companion Content will be published when the next (B) version of this
course is released, and students who have taken this course will be able to download the
Companion Content at that time from the
http://www.microsoft.com/learning/companionmoc site. Please check with your instructor
when the ‘B’ version of this course is scheduled to release to learn when you can access
Companion Content for this course.
Student Course files: includes the Allfiles.exe, a self-extracting executable file that contains all
required files for the labs and demonstrations.
Note: For this version of the Courseware on Prerelease Software, Allfiles.exe file is not available.
However, this file will be published when the next (B) version of this course is released, and students
who have taken this course will be able to download the Allfiles.exe at that time from the
http://www.microsoft.com/learning/companionmoc site.
Course evaluation: at the end of the course, you will have the opportunity to complete an online
evaluation to provide feedback on the course, training facility, and instructor.
Important: At the end of each lab, you must close the virtual machine and must not save
any changes. To close a virtual machine (VM) without saving the changes, perform the
following steps:
1. On the virtual machine, on the Action menu, click Close.
2. In the Close dialog box, in the What do you want the virtual machine to do? list, click
Turn off and delete changes, and then click OK.
The following table shows the role of each virtual machine that is used in this course:
Software Configuration
The following software is installed:
Microsoft Windows Server 2012
Course Files
The files associated with the labs in this course are located in the D:\Labfiles folder on the 20467A-MIA-
SQLBI virtual machine.
Classroom Setup
Each classroom computer will have the same virtual machines configured in the same way.
Hardware Level 6+
Module 1
Planning a BI Solution
Contents:
Module Overview
Module Overview
Business Intelligence (BI) is an increasingly important IT service in many businesses. In the past, BI
solutions were primarily the preserve of large corporations; but as data storage, analytical, and reporting
technologies have become more affordable, many small and medium-sized organizations are now able to
take advantage of BI solutions.
As a SQL Server database professional, you may be required to participate in, or perhaps even lead, a
project with the aim of implementing an effective BI solution. Therefore, it is important that you have a
good understanding of the various elements that comprise a BI solution, the business and IT personnel
typically involved in a BI project, and the Microsoft products that you can use to implement the solution.
Objectives
After completing this module, you will be able to:
Lesson 1
Elements of a BI Solution
Although there’s no single definitive template for a BI solution, there are some common elements that are
typical across most BI implementations. Being familiar with these common elements will help you identify
the key components required for your specific BI solution.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the role played by an extract, transform, and load (ETL) process in a BI solution.
Describe the role played by analytical models in a BI solution.
Overview of a BI Solution
Fundamentally, all BI solutions are designed to
take data generated by business operations,
structure it into an appropriate format for
consistent analysis and reporting, and use the
information gained by examining the data to
improve business performance. No two BI
solutions are identical, but most include the
following elements:
Business data sources. The data that will
ultimately provide the basis for business
decision making through the BI solution
usually resides in existing business
applications or external data sources (which may be commercially available data sets or data exposed
by business partner organizations).
A data warehouse. To make it easier to analyze and report on the business as a whole, the business
data is typically consolidated into a data warehouse. Depending on the size of the organization, and
the specific BI methodology adopted, this may be a single, central database that is optimized for
analytical queries; or a distributed collection of data marts, each pertaining to a specific area of the
business.
Extract, transform, and load (ETL) processes. To get the business data from the data sources into
the data warehouse, an ETL process periodically extracts data from the source systems, transforms the
structure and content of the data to conform to the data warehouse schema, and loads it into the
data warehouse. ETL processes are often implemented within a wider enterprise integration
management (EIM) framework that ensures the integrity of data across multiple systems through
master data management (MDM) and data cleansing.
Analytical data models. The data warehouse schema is usually optimized for analytical querying and
in some cases you may decide to perform all analysis and reporting directly from the data warehouse
itself. However, it is common to build analytical data models on top of the data warehouse to abstract
the underlying data tables, add custom analytical values such as key performance indicators, and
aggregate the data for faster analytical processing.
Reporting. Most BI solutions include a reporting element that enables business users to view reports
containing business information. Most reporting solutions provide a set of standard business reports
that are generated on a regular basis, and some also empower users to perform self-service reporting
in order to generate their own custom reports. Reports can be created directly from the data
warehouse or from the analytical data models built on it, depending on your specific business
requirements and constraints.
Analytical Information Worker Tools. In addition to reports, most BI solutions deliver analytical
information to business users through information worker tools. These tools might be locally installed
applications, such as Microsoft Excel; or interactive dashboards in web-based applications, such as
Microsoft SharePoint Server.
Business data sources for a BI solution typically include some or all of the following:
Application databases, often implemented as relational databases in systems such as SQL Server,
Oracle, or Microsoft Access.
Proprietary data stores, such as those used by many commercial financial accounting applications.
Sensor readings emitted by plant machinery, which may be captured as a data stream using
technologies such as Microsoft SQL Server StreamInsight.
Master data hubs that contain definitive data values for core business entities.
One of the first tasks in any BI project is to audit the available data sources and try to identify the
following (a simple profiling query sketch follows this list):
The volume of data currently stored and being generated by ongoing operations.
The data types and range of values for important business data fields.
Business-specific values used to indicate key information (for example, a POS system may use numeric
codes to indicate payment types, such as 0 for cash, 1 for credit, and so on).
Common errors, reliability issues, and missing or null values in the data.
Technologies that can be used to extract the source data to a staging database.
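The following Transact-SQL sketch illustrates the kind of simple profiling queries that can support this
audit. The table and column names (Sales.SalesOrderHeader, PaymentTypeCode, OrderDate) are
hypothetical placeholders; substitute the tables and fields from your own source systems.

-- Profile a hypothetical source table: row volume, value ranges, and missing values.
SELECT
    COUNT(*) AS TotalRows,
    MIN(OrderDate) AS EarliestOrder,
    MAX(OrderDate) AS LatestOrder,
    COUNT(DISTINCT PaymentTypeCode) AS DistinctPaymentCodes,
    SUM(CASE WHEN PaymentTypeCode IS NULL THEN 1 ELSE 0 END) AS MissingPaymentCodes
FROM Sales.SalesOrderHeader;

-- List the business-specific codes in use so that their meanings can be documented.
SELECT PaymentTypeCode, COUNT(*) AS Occurrences
FROM Sales.SalesOrderHeader
GROUP BY PaymentTypeCode
ORDER BY Occurrences DESC;

Queries such as these also help to quantify common errors and null values before you design the ETL
transformations that must handle them.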
An alternative data warehouse design, popularized by Bill Inmon, is the corporate information factory
(CIF) model. In the CIF model, the enterprise data warehouse stores the business data in a normalized
relational schema. This is then used to feed departmental data marts, in which specific subsets of the data
are exposed in a star schema. The dependency of the data marts on a central EDW leads many to refer to
the Inmon methodology as a top-down approach.
Common Implementations
Although the Kimball and Inmon methodologies in their pure form are designed for BI solutions that
distribute the data across multiple departmental data marts, it is common for organizations to begin with
a Kimball-style data mart for a subset of the business that expands over time into a single, central data
warehouse database for the entire enterprise. The availability of inexpensive storage and the increasing
power of server hardware mean that a single data warehouse can support a huge volume of data and
heavy user workloads.
In very large enterprises, a federated approach is often used in which a hub-and-spoke architecture
synchronizes departmental data marts with a central enterprise data warehouse.
Note: SQL Server can be used to support both Kimball and Inmon style data warehouse
solutions. In response to the more common use of the Kimball methodology, the SQL Server
database engine has been designed to optimize star-join queries and most documentation about
data warehouse implementation in SQL Server assumes a dimensional model rather than a
normalized EDW. In deference to these facts, this course focuses on a Kimball style data
warehouse. However, you should investigate the details of both approaches and consider which
best suits your specific business requirements and constraints.
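As an illustration of the dimensional approach that this course focuses on, the following Transact-SQL
sketch creates a minimal dimension table and fact table and runs a simple star-join query against them.
The table and column names (DimProduct, FactSales, and so on) are illustrative assumptions rather than
part of any specific solution design.

-- A minimal dimension table with a surrogate key.
CREATE TABLE dbo.DimProduct
(
    ProductKey INT IDENTITY(1,1) PRIMARY KEY,
    ProductName NVARCHAR(100) NOT NULL,
    Category NVARCHAR(50) NOT NULL
);

-- A minimal fact table that references the dimension.
CREATE TABLE dbo.FactSales
(
    ProductKey INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    OrderDateKey INT NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- A typical star-join query: aggregate a measure by a dimension attribute.
SELECT p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY p.Category;

A production design would include additional dimensions (such as a date dimension), appropriate
indexes, and attributes to support slowly changing dimensions; the sketch shows only the basic star-join
shape that the SQL Server database engine is optimized for.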
Another consideration for ETL is the logging strategy that you will use to record ETL activity and provide
troubleshooting information in the event of a failure somewhere in the ETL process.
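One straightforward approach, sketched below, is to maintain a custom log table that each ETL run
writes to. The table and column names are illustrative assumptions rather than a prescribed design, and
the SSIS catalog introduced in SQL Server 2012 also provides built-in operational logging that a custom
table like this can supplement.

-- A simple ETL log table that each package run can write to.
CREATE TABLE dbo.EtlLog
(
    LogID INT IDENTITY(1,1) PRIMARY KEY,
    PackageName NVARCHAR(200) NOT NULL,
    StartTime DATETIME2 NOT NULL,
    EndTime DATETIME2 NULL,
    RowsLoaded INT NULL,
    Outcome NVARCHAR(20) NULL, -- for example, 'Succeeded' or 'Failed'
    ErrorMessage NVARCHAR(MAX) NULL
);

-- Record the start of a load; the package updates the row when it completes or fails.
INSERT INTO dbo.EtlLog (PackageName, StartTime)
VALUES (N'LoadFactSales', SYSDATETIME());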
Data cleansing and matching capabilities provided by Data Quality Services (DQS).
Master data management (MDM) capabilities provided by Master Data Services (MDS).
system from a centrally managed catalog. To accomplish this, you can use SSIS or other synchronization
technologies such as SQL Server replication. When planning a BI solution in environments where data is
transferred between source systems, it is important to understand the lineage of the data and to be aware
of the schedule on which these data transfers occur.
The data model abstracts the underlying data warehouse tables, which enables you to create models
that reflect how business users perceive the business entities and measures regardless of the data
warehouse table schema. If necessary, you can modify or expand the underlying data warehouse
without affecting the data model used by business users to perform analysis.
Because the data model reflects the users’ view of the business, data analysis is easier for information
workers with little or no understanding of database schema design. You can use meaningful names
for tables and fields and define hierarchies based on attributes in dimension tables that make the
data more intuitive for business users.
You can add custom logic to a data model that adds business value when analyzing the data. For
example, you can define key performance indicators (KPIs) that make it easier to compare actual
business measures with targets (a Transact-SQL illustration of this kind of comparison follows this list).
Although the SQL Server database engine can provide extremely high query performance, a data
warehouse typically contains a massive volume of data. Because most analysis involves aggregating
measures across multiple dimensions, the processing overhead for complex queries can result in
unacceptable response times―especially when many users access the data concurrently. A data
model typically pre-aggregates the data, which provides vastly superior performance for analytical
queries.
Data models are a common feature in BI solutions, and a number of standards have been established.
By creating a data model, you can expose your analytical data through a standard interface to be
consumed by client applications, such as Microsoft Excel or third-party analytical tools.
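To illustrate the aggregation and target-comparison logic described in this list, the following
Transact-SQL sketch summarizes a measure by year and compares it with a target value. The table names
(FactSales, DimDate, SalesTargets) and the 90 percent threshold are illustrative assumptions; in an
analytical data model, equivalent logic would typically be defined as a KPI in MDX or DAX.

-- Aggregate sales by year and compare the result with a target, similar to a simple KPI.
SELECT
    d.CalendarYear,
    SUM(f.SalesAmount) AS ActualSales,
    t.TargetAmount,
    CASE
        WHEN SUM(f.SalesAmount) >= t.TargetAmount THEN 'On target'
        WHEN SUM(f.SalesAmount) >= t.TargetAmount * 0.9 THEN 'At risk'
        ELSE 'Below target'
    END AS KpiStatus
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
JOIN dbo.SalesTargets AS t ON d.CalendarYear = t.CalendarYear
GROUP BY d.CalendarYear, t.TargetAmount;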
Multidimensional data models. Multidimensional data models have been supported in every
version of SQL Server Analysis Services since the release of SQL Server 7.0. You can use a
multidimensional data model to create an Analysis Services database that contains one or more
cubes, each of which provides aggregations of measures in measure groups across multiple
dimensions.
Tabular data models. Tabular data models were first introduced with PowerPivot in SQL Server 2008
R2, and they are enhanced in SQL Server 2012. From the point of view of a user performing analysis,
tabular models provide similar functionality to a multidimensional model (in fact, in many cases, the
two models are indistinguishable from one another). For BI developers, tabular models do not require
as much online analytical processing (OLAP) modeling knowledge as multidimensional models,
because they are based on relationships between multiple tables of data.
Note: Multidimensional and tabular models are compared in more detail in Module 5:
Designing Analytical Data Models.
Reporting
Reporting is the communication of information
gained from BI. Most organizations rely on reports
to summarize business performance and activities.
Consequently, most BI solutions include a
reporting element that generates these reports.
Typical reports include financial and management
reports (such as cash flow, profit and loss, balance
sheet, open orders, and other accounts-based
reports), and other reports, depending on the nature of the business (for example, a retail business might
require stock inventory reports, whereas a technical support call center might require a report that shows
call log data).
In some scenarios, users might need to view reports interactively in a web browser or custom application;
whereas in others, the reports might be required to be sent as email attachments in specific formats (such
as Excel workbooks or Word documents). In many cases, the reports might need to be printed (for
example, to send a physical report to customers or shareholders). When planning a reporting solution,
you must take into consideration the reports that are required, the audiences for those reports, and how
they will be delivered.
Regardless of the specific reports that are required, or how they will be distributed and consumed, there
are two common approaches to report generation in most BI solutions:
IT-provided reports. Traditionally, standard business reports are created by a specialist report
developer and automatically generated with current data as requested or on a regular basis. Although
the reports may be developed by a business user with report development skills, they are generally
supported by IT and delivered through the organization’s reporting infrastructure.
Self-service reporting. As business users have become more technically proficient and report
authoring tools have become easier to use, many organizations supplement standard reports with the
ability for users to create their own reports with no intervention from IT. For self-service reporting to
be effective, some initial work needs to be done to design and implement a suitable reporting
infrastructure; but after that is in place, the users can benefit from the ability to customize the reports
they use without placing an additional burden on the IT department.
Analysis
Analysis is the interpretation of business data delivered by the BI solution. For some business users,
notably business analysts, performing analysis is a discrete activity that involves using specialist analytical
tools to examine data in analytical models. For others, analysis is simply a part of everyday work and takes
the form of using reports or dashboards as a basis for business decision making.
In general, when planning a BI solution, you should consider the following kinds of analytical
requirements:
Interactive analysis. Some BI solutions must support interactive “slice and dice” analysis in business
tools such as Microsoft Excel or specialist data analysis tools. The resulting information can then be
published as a report.
Data mining. Most analysis and reporting concerns historical data, but a BI solution can also support
predictive analysis by using that historical data to determine trends and patterns.
Data Sources
You can access data for analysis and generate reports from virtually any data source, but in a BI
solution, reports are commonly based on one of the following data sources:
Analytical data models. If you have created analytical data models in your BI solution, you can use
them as a source for analysis and reports. This approach enables you to take advantage of the
benefits of data models in your reporting solution as described in the previous topic.
The data warehouse. You can create analytical reports directly from the data warehouse (or a
departmental data mart). This enables you to express queries in Transact-SQL, which may be more
familiar to a report developer than a data modeling query language (such as MDX or DAX), as shown in
the sketch below.
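The following query uses hypothetical table and column names (FactSales, DimDate, DimProduct) that
would be replaced with those in your own data warehouse schema; it illustrates a typical report dataset
query issued directly against the data warehouse.

-- A simple report dataset query that aggregates a measure from the data warehouse.
SELECT
    d.CalendarYear,
    p.Category,
    SUM(f.SalesAmount) AS SalesRevenue
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
GROUP BY d.CalendarYear, p.Category
ORDER BY d.CalendarYear, p.Category;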
Note: Considerations for designing a reporting solution are discussed in more depth later
in this course.
Lesson 2
The Microsoft BI Platform
Microsoft products are used to provide the IT infrastructure for many organizations around the world.
Therefore, it makes sense for many of these organizations to consider using the Microsoft platform for BI
and benefiting from the close integration and common infrastructure capabilities of the various products
that can be used to deliver a BI solution.
As a Microsoft BI professional, you need to know which products can be used to implement the various
elements of a BI solution, and how those products can be integrated to work together.
Lesson Objectives
After completing this lesson, you will be able to:
Windows Server
Microsoft Windows Server 2012 is the foundation
for a Microsoft-based enterprise solution and
provides a number of core infrastructure services,
including:
Failover Clustering.
Virtualization.
Windows Server 2012 Datacenter. This edition provides all features of Windows Server and is
optimized for highly virtualized environments.
Windows Server 2012 Standard. This edition provides all features of Windows Server and is
designed for physical or minimally virtualized environments.
Windows Server 2012 Essentials. This edition is designed for small businesses with up to 25 users and
50 client devices.
Windows Server 2012 Foundation. This edition is designed for environments with up to 15 users.
SQL Server
Microsoft SQL Server 2012 provides the core data
services for a BI solution. These services include:
Master Data Services (MDS), which provides master data management capabilities.
SQL Server Analysis Services (SSAS), which provides a storage and query processing engine for
multidimensional and tabular data models.
SQL Server Reporting Services (SSRS), which provides a platform for publishing and delivering reports
that users can consume through a native web-based interface or have delivered by way of
subscriptions.
SQL Server 2012 Enterprise. You should use this edition for data warehouses and BI solutions that
require advanced SSIS features, such as fuzzy lookup and fuzzy grouping transformations and change
data capture (CDC) components.
SQL Server 2012 Business Intelligence. You should use this edition for servers hosting SSIS, DQS,
and MDS. You should also use this edition for SSRS and SSAS solutions that require more than 16
processor cores or if you need to support tabular data models, PowerPivot for SharePoint, Power
View for SharePoint, or advanced data mining.
SQL Server 2012 Standard. You can use this edition for solutions that require basic SSRS reporting,
SSAS multidimensional models, and basic data mining.
Note: SQL Server 2012 is also available in Web and Express editions, but these are generally
not appropriate for BI solutions. A special edition of SQL Server named Parallel Data Warehouse
provides support for massively parallel processing (MPP) data warehouse solutions, but this
edition is only available pre-installed on an enterprise data warehouse appliance from selected
Microsoft hardware partners.
SharePoint Server
Microsoft SharePoint Server 2013 provides
enterprise information sharing services through
collaborative websites. SharePoint Server provides
the following BI capabilities:
Integration with SSRS. You can deliver and manage reports and data alerts through SharePoint
document libraries instead of the native Report Manager interface provided with SSRS.
Power View. Power View is an interactive data visualization technology through which users can
graphically explore a tabular data model in a web browser.
PerformancePoint Services. PerformancePoint Services enables BI developers to create dashboards
and scorecards that deliver KPIs and reports through a SharePoint site.
Office Applications
Microsoft Office 2013 provides productivity
applications that business users can use to
consume and interact with BI data. These
applications include:
Microsoft Excel. In a BI scenario, business users can use Excel to:
o Create PowerPivot workbooks that contain tabular data models without requiring SSAS.
o Create Power View visualizations from tabular models in the workbook or external tabular
models.
Microsoft Word. Word is a document authoring tool. In a BI scenario, users can export SSRS reports
in Word format and use Word’s editing and reviewing tools to enhance them.
Microsoft PowerPoint. PowerPoint is a widely used presentation tool. Users can save Power View
visualizations as PowerPoint presentations, and present business data in a dynamic, interactive format.
Microsoft Visio. Visio is a diagramming tool that can be used to visualize data mining analyses.
Verify that the hardware and edition of Windows Server you plan to use are adequate for SQL Server
2012. SQL Server product documentation (Books Online) includes details of minimum hardware and
software requirements for each SQL Server component.
Verify that upgrading is possible from your current installation, or plan to upgrade by performing a new
installation of SQL Server and migrating databases, SSIS packages, reports, and other objects. You can
upgrade 32-bit installations of previous versions of SQL Server to SQL Server 2012 on the 32-bit
subsystem (WOW64) of a 64-bit server, and 64-bit installations of previous versions must be
upgraded to SQL Server 2012 64-bit. You can upgrade from the following previous versions of SQL
Server (Books Online includes a table showing specific edition upgrade paths):
Use Upgrade Advisor to prepare for upgrades. Upgrade Advisor is a tool provided on the SQL Server
2012 installation media that you can use to analyze an existing SQL Server installation and identify
any issues that could potentially prevent a successful upgrade.
Upgrading SSIS does not replace the previous instance of the SSIS service or tools. However, after
upgrading, you cannot use the old version of the tools to create, manage, or run SSIS packages. If you
have upgraded SSIS and want to use a command line utility such as DTExec.exe, you should enter the
full path to the required version of the utility.
Upgrading SSIS does not upgrade existing packages to the new format used in SQL Server 2012. You
should upgrade these packages by using the SSIS Package Upgrade Wizard.
Support for Data Transformation Services (DTS) packages created in SQL Server 2000 has been
discontinued in SQL Server 2012. If you have existing DTS packages, you must migrate them to SQL
Server 2005, 2008, or 2008 R2 Integration Services packages before they can be migrated to the SQL
Server 2012 Integration Services format. If this is not possible, you must recreate your packages in
SQL Server 2012 Integration Services after upgrading.
Support for ActiveX script components in SSIS packages has been discontinued in SQL Server 2012.
ActiveX scripts in existing packages can be upgraded to Visual Studio Tools for Applications (VSTA) by
using the SSIS Package Upgrade Wizard.
1. Upgrade the existing SQL Server 2008 R2 database engine instance to SQL Server 2012, or install a
new instance of SQL Server 2012. If you install a new instance, you can move the existing Master Data
Services database to the new instance or continue to host it in the SQL Server 2008 R2 instance.
2. Add the Master Data Services feature to the SQL Server 2012 instance.
3. Use SQL Server 2012 Master Data Services Configuration Manager to upgrade the existing Master
Data Services database to the new version of the schema.
4. Use SQL Server 2012 Master Data Services Configuration Manager to create a new Master Data
Services web application and associate it with the upgraded database.
Similar to other components of SQL Server, you can perform an in-place upgrade, or you can install a
new instance and migrate existing reports, data sets, report parts, and data sources by attaching the
existing report server database to the new server.
Back up the report server encryption key before upgrading, and restore it to the new instance if you
are upgrading by installing a new instance.
You cannot perform an in-place upgrade that changes the installation mode. For example, you
cannot use SQL Server Setup to upgrade SQL Server 2008 Reporting Services in native mode to SQL
Server 2012 Reporting Services in SharePoint Integrated mode.
The format used for reports was updated in SQL Server 2008 R2. The compiled versions of reports are
automatically updated the first time they are run on an upgraded report server. The source report
definition language (RDL) files are not upgraded.
You can perform an in-place upgrade from SQL Server 2008 R2 Reporting Services integrated with a
SharePoint Server 2010 farm to SQL Server 2012 or SQL Server 2012 SP1 Reporting Services with no
downtime.
You can perform an in-place upgrade from SQL Server 2005 SP4 or 2008 SP2 Reporting Services
integrated with a SharePoint Server 2007 farm to SQL Server 2012 or SQL Server 2012 SP1 Reporting
Services with a SharePoint Server 2010 farm, but downtime is required because both SQL Server and
SharePoint must be upgraded. You should consider performing a new installation of both products
and migrating content and reports.
SQL Server 2012 SP1 is required for integration with a SharePoint Server 2013 farm. In-place upgrade
from previous versions of SQL Server and SharePoint is not supported, but you can install a new
SharePoint Server 2013 farm, migrate the SharePoint content and configuration databases to the new
farm, install SQL Server 2012 Reporting Services in SharePoint integrated mode, and migrate the
existing reporting services objects.
Upgrading PowerPivot
If you have an existing SQL Server 2008 R2 version of PowerPivot installed in a SharePoint Server 2010
farm, consider the following guidelines for upgrading it:
You must apply SQL Server 2008 R2 SP1 to the PowerPivot instance of SQL Server Analysis Services
before upgrading.
You must apply SharePoint Server 2010 SP1 and the SharePoint Server 2010 August 2010 cumulative
update or later to all SharePoint servers in the farm before upgrading.
Use SQL Server 2012 Setup to upgrade the PowerPivot instance of Analysis Services.
Use the PowerPivot Configuration Tool or PowerShell cmdlets to upgrade the solutions and websites
in the farm.
Remove the SQL Server 2008 R2 version of the PowerPivot add-in from all installations of Microsoft
Excel 2007 or Excel 2010 and replace it with the SQL Server 2012 version of the add-in. By default,
Excel 2013 includes the SQL Server 2012 PowerPivot add-in.
When planning a SQL Server-based BI solution, you should consider the following potential benefits of
using appliances:
Massive scalability of enterprise data warehouses that goes beyond what is possible to achieve with a
symmetric multi-processing (SMP) architecture.
Rapid time to solution compared to designing and creating a custom server build.
Pre-tested and optimized hardware and software configurations that are specifically designed for BI
workloads.
Lesson 3
Planning a BI Project
Statistics show that a surprisingly high number of BI projects in organizations throughout the world fail.
Often projects are abandoned before completion, fail to deliver all of the originally specified deliverables,
or simply do not deliver a solution that adds value to the business. In many cases, the fundamental cause
of failure is that the project was insufficiently envisioned or that key stakeholders were not included in the
planning.
Careful planning can help to ensure that a BI project runs smoothly with a successful outcome. By
applying some common best practices, you can increase the likelihood that your BI project will not be
added to the long list of BI project failures.
Lesson Objectives
After completing this lesson, you will be able to:
BI Project Overview
There are numerous frameworks for planning and
managing IT projects, and many organizations
have a policy to use a specific approach when
implementing a new solution. Whichever
approach you use, a BI project must start with the
business requirements and use these to inform the
design of the overall technical architecture, the
data warehouse and ETL, and the reporting and
analysis that the solution will provide.
Business Requirements
The most important thing to consider when
planning a BI project is that the core purpose of
the project is to improve the business. More than any other type of IT project, BI projects are closely and
inseparably bound to business processes and goals. Most IT projects require a deep understanding of
technology, but in a BI project you must also have detailed knowledge of how various business processes
work and interact with one another, and what the commercial aims of the business are.
Understanding the overall structure, processes, and goals of the business makes it easier to gather,
interpret, and prioritize the business requirements for the BI solution. Typically, BI requirements are
fundamentally about being able to quantify core business metrics across various aspects of the business in
order to measure business performance and inform business decisions. For example, a requirement might
be that the solution enables sales managers to see monthly sales revenue by salesperson in order to
reward success and identify employees that need additional support or motivation. Or another
requirement might be to view quarterly order amounts by product line in order to plan more efficient
manufacturing based on demand trends. Only after you have identified the specific business requirements
for your BI solution can you start considering the design of the infrastructure, data warehouse and ETL
solution, and analytical reports.
After selecting the technologies you intend to use, you can start to design the infrastructure for the BI
solution, including server hardware and configuration, security, and high availability considerations.
Note: Data warehouse design is discussed in Module 3: Designing a Data Warehouse. ETL
design is discussed in Module 4: Designing an ETL Solution.
Note: Considerations for designing analytical data models are discussed in Module 5:
Designing Analytical Data Models. Solutions for delivering analysis and reports are discussed in
Module 6: Planning a BI Delivery Solution, Module 7: Designing a Reporting Services Solution,
Module 8: Designing an Excel-Based Reporting Solution, and Module 9: Planning a SharePoint
Server BI Solution.
Note: Performance monitoring and optimization is discussed in Module 10: Monitoring and
Optimizing a BI Solution.
Note: Considerations for operations and maintenance are discussed in Module 11: Planning
BI Operations.
Project Infrastructure
It is easy to focus on the infrastructure
requirements of the solution you intend to build
and overlook the infrastructure required to
actually build it. In the same way that a
construction project to build an office building
requires a site office, parking facilities for the
construction crew, and so on, a BI project requires
hardware and software resources for the project
team to use during the development of the
solution.
Microsoft Project.
Team Foundation Server (TFS) to provide source control and issue tracking capabilities.
For some enterprise-scale BI solutions, you may choose to simplify the test environment―for example, by
provisioning a single-server installation of SharePoint Server instead of a multi-server farm, combining
SQL Server components on a single server instead of provisioning dedicated servers, and using standalone
servers instead of failover clusters.
Project Personnel
A BI project involves several roles. These roles
typically include:
An infrastructure specialist. Implements the server and network infrastructure for the data
warehousing solution.
An ETL developer. Builds the ETL workflow for the data warehousing solution.
Business users. Provide requirements and help to prioritize the business questions that the data
warehousing solution will answer. Often, the team includes a business analyst as a full-time member
to help interpret the business questions and ensure that the solution design meets the needs of the
users.
Testers. Verify the business and operational functionality of the solution as it is developed.
Note: The list in this topic is not exhaustive and represents roles that must be performed,
not necessarily individual people. In some cases, multiple roles may be performed by a single
person―though in general, you should avoid having testers validate their own development
work.
In addition to the technical project personnel listed here, the project team should include business
stakeholders from the very beginning of the planning phase. The roles performed by business
stakeholders are discussed in the next topic.
Business Stakeholders
The previous topic described the technical roles
required in a BI project. However, the project
team should also include representatives from key
areas of the business to help ensure that the
solution meets the business requirements and to
help promote user acceptance.
Executive Sponsor
The culture of each organization is unique, but in
almost all businesses, a BI project will face
personality clashes and political obstacles that
must be navigated to create a solution that is in
the best interests of the business as a whole.
Employees tend to focus on their own specific areas of the business, and they can often be resistant to
changes that affect their day-to-day activities or to what they see as external interference in their
responsibilities.
The challenge of obtaining “buy-in” from business users is easier to overcome if the project has an
executive sponsor who has aligned the project goals with the strategic aims of the business and can
champion the project at the highest level of the organization. When the BI project team meets resistance
or contradictory views from business users, the executive sponsor can use his or her influence to resolve
the issue.
Business Users
Although executive sponsorship is essential to drive the project forward, it is important to take into
account the input from business users. A solution that is enforced on users without consultation is unlikely
to gain acceptance, and in most cases it is unlikely that the primarily technical members of the project
team have sufficient knowledge of the business to create a useful solution even if users could be
persuaded to accept it.
Businesses are complex ecosystems in which many processes interact to achieve multiple objectives. In
some organizations, the business processes are formally defined and documented, but even when this is
the case, it is likely that day-to-day activities vary, often significantly, from “official” practices. Generally,
business users have better insight into how the business processes actually work, what the various data
elements used in those processes actually mean, and how important those elements are, than a technical
architect can gain by examining existing systems and their documentation.
For example, suppose an existing system for processing sales includes a data field named SZ_Code with
values such as STD-140 and SPC-190. The usage of this field is not listed in the application
documentation, yet you see that it is used in approximately 75 percent of sales orders. Only a business
user who is familiar with the sales order process could tell you that the field represents a size code for
products that are available in multiple sizes, and that the value STD-140 represents a standard size of 140
centimeters, whereas SPC-190 means that the product was ordered in a special-order size of 190
centimeters that had to be custom made.
Data Stewards
Some information workers have particularly detailed knowledge of the business processes and data in a
specific area of the business. By formally including these people in the BI project team, you can have them
adopt the role of data steward (sometimes referred to as data governor or data curator) for the data
elements used in their area of the business. A data steward can provide valuable services to the project,
including:
Representing the interests of a specific business area while the BI solution is planned. For example,
ensuring that all of the data elements that are important to that business area are included in the
data warehouse design, or that the reports required by that business area are considered.
Validating and interpreting data values in the source systems that will be used to populate the data
warehouse, and helping to identify the appropriate transformations and exceptions that will need to
be implemented.
Taking ongoing responsibility for maintaining a Data Quality Services knowledge base for the
business area, so that data values can be cleansed and matched effectively.
Taking ongoing responsibility for maintaining relevant business entities in a Master Data Services
model to ensure consistency of data definitions across the organization.
Project Scope
From the very beginning of a project, it is
important to prioritize the business requirements
in terms of their value to the business, and the
feasibility of meeting them with specific
constraints, such as available data, budget, and
project deadlines. This enables you to scope the
project in a way that maximizes the chances of it
successfully delivering value to the business.
Initial Scoping
After the initial requirements gathering is
accomplished, the project team and business
stakeholders must negotiate the importance or
value of the requirements. At this stage, you may be able to judge the feasibility of meeting some
objectives, but others will require further investigation to identify suitable source data or to estimate the
effort required.
You can use a matrix to record the relative value and feasibility of each requirement as they are agreed by
the team members. It is likely that there will be some disagreements about the importance of some
objectives, and feasibility may not be easy to assess. In these cases, you should make a note of the issues
and move on. At this stage, it is important to get a comprehensive view of the potential project
scope―further iterations of the design process will gradually resolve prioritization conflicts and help
clarify feasibility.
This further investigation typically includes:
Using the techniques for auditing data sources discussed in the first lesson of this module to
determine whether sufficient data is available and accessible to meet the requirements.
Estimating the development effort and skills required for each of the requirements.
As the investigations reveal more information, the team should meet to refine the matrix created during
the initial scoping exercise.
Using a pilot project enables you to reduce the time it takes for the BI project to add value to the
business. By prioritizing the requirements based on their value and feasibility, you can quickly
demonstrate the effectiveness of the BI initiative without losing the momentum the project has built up
during the initial scoping phase. In most cases, the pilot focuses on a related set of requirements, often in
a specific, high-profile area of the business. However, because you have used the scoping phase to
consider all requirements, you can design the pilot with extensibility in mind, ensuring that the design of
the pilot will support the addition of the other highly important business requirements at a later stage.
After scoping the pilot, you can start designing the solution. However, you must make sure that the
project team carefully considers the following questions:
How will the pilot incorporate user acceptance testing (UAT)? Instead of delivering the solution to all
users in the affected area of the business, you may want to enroll a subset of users in the pilot
program with a particular focus on providing feedback on the usability and usefulness of the solution.
Often, these users can provide valuable feedback that results in improvements to the design of
reports, data models, dashboards, SharePoint document library structures, and other user-visible
aspects of the solution.
How will you measure the success of the pilot? Other than qualitative measures based on feedback
from users, you should consider quantitative goals for the pilot. The criteria for success should
ultimately be aligned with the business goals, so you need to be able to measure the effects of the
solution in terms of revenue growth, increased profitability, reduced costs, increased customer
satisfaction survey scores, or whatever quantifiable goal the BI solution is intended to help the
business achieve. Therefore, you should determine a realistic time interval over which the success of
the project should be assessed.
The company is financially sound and has a strong order book; however, sales volumes have remained
relatively static for the past few years. Senior management is under pressure from shareholders to develop
a strategy for growth that will drive increased revenue and profit. Management believes that a key factor
in their growth strategy is investment in technology that improves collaboration between the various
divisions of the company, and enables them to track and share key business performance metrics.
Objectives
After completing this lab, you will be able to:
Identify and prioritize business requirements.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Results: At the end of this exercise, you should have created a matrix that shows the relative value and
feasibility of the business requirements for the BI solution.
Document your software suggestions and the rationale for your choices in Software
Requirements.docx in the D:\Labfiles\Lab01\Starter folder.
Results: At the end of this exercise, you should have a list of suggested software components for the BI
solution.
Question: If you had been able to conduct discussions with real business stakeholders instead
of reading paper-based interview transcripts, how would you have elicited clearer, more specific
business requirements?
Question: In a real project, how would you have determined the feasibility of the requirements
you captured in the requirements matrix?
Question: What challenges have you experienced when gathering, interpreting, and prioritizing
business requirements in previous projects? How did you overcome them? What tips can you
share with your fellow students?
Module 2
Planning SQL Server Business Intelligence Infrastructure
Contents:
Module Overview
Module Overview
The server and hardware infrastructure for a BI solution is a key consideration in any BI project. You must
balance the performance and scalability gains you can achieve by maximizing hardware specifications and
distributing the elements of your BI solution across multiple servers against hardware and software
licensing costs, and implementation complexity.
This module discusses considerations for selecting server hardware and distributing SQL Server services
across servers.
Note: This module focuses on SQL Server components. Considerations for including
SharePoint Server in a BI solution are discussed in Module 9: Planning a SharePoint Server BI
Solution.
Objectives
After completing this module, you will be able to:
Lesson 1
Considerations for BI Infrastructure
Planning server infrastructure for a SQL Server-based BI solution requires an understanding of how the
various SQL Server components work together, and how typical workloads for SQL Server components use
hardware resources.
Lesson Objectives
After completing this lesson, you will be able to:
Describe key features of the workloads associated with SQL Server components.
Analysis and Reporting Complexity. This includes the number, complexity, and predictability of the
queries that will be used to analyze the data or produce reports. Typically, BI solutions must support a
mix of the following types of query:
Number of Users. This is the total number of information workers who will access the system, and
how many of them will do so concurrently.
Availability Requirements. These include when the system will need to be used, and what planned
or unplanned downtime the business can tolerate.
Although it is difficult to be precise when categorizing a solution, the following table suggests some
typical examples of the characteristics of small, medium, and large BI systems.
BI Workloads
In addition to the size categorization of the BI
solution your infrastructure needs to support, it is
useful to understand the types of workload that
typically occur in a BI solution. Each workload uses
hardware resources, and it is important to assess
the total impact of all workloads on resource
utilization and identify potential for contention
between workloads with similar resource
requirements.
Control flow tasks. SQL Server Integration Services (SSIS) packages often include control flow tasks
that require CPU processing time, memory, disk I/O, and network I/O. The specific resource
requirements for control flow tasks can vary significantly, so if your ETL solution includes a substantial
number of control flow tasks, you should monitor resource utilization on a test system to better
understand the workload profile.
Data query and insert. Fundamentally, ETL involves querying data sources and inserting and
updating data into staging and data warehouse tables. This incurs I/O and query processing on data
sources, the staging databases, and the data warehouse―especially if data loads require rebuilding
indexes or partition management.
Network data transfer. Typically, ETL processes transfer large volumes of data from one server to
another. This incurs network I/O and can require significant network bandwidth.
In-memory data pipeline. Data flow tasks in an SSIS package use an in-memory pipeline
architecture to process transformations. This places a demand on system memory resources. On
systems where there is contention between the SQL Server database engine and SSIS for memory
resources, data flow buffers might need to spill to disk, which reduces data flow performance and
incurs disk I/O.
SSIS catalog or MSDB database activity. SSIS packages deployed in project mode are stored in an
SSIS Catalog database. Alternatively, packages deployed in package mode are stored either in the
MSDB database or on the file system. Whichever deployment model is used, the ETL process must
access the package storage to load packages and their configuration. If the SSIS catalog or MSDB
database is used, and it is located on a SQL Server instance that hosts other databases used in the BI
solution (such as the data warehouse, staging database, or Report Server catalog), there may be
contention for database server resources.
Processing. Data models contain aggregated values for the measures in the data warehouse, and
they must be processed to load the required data from the data warehouse tables into the model and
perform the necessary aggregation calculations. When data in the data warehouse is refreshed, the
data models must be partially or fully processed again to reflect the new and updated data in the
data warehouse.
Aggregation storage. Data models must store the aggregated data in a structure that is optimized
for analytical queries. Typically, multidimensional models are stored on disk with some data cached in
memory for performance reasons. Tabular models are usually stored completely in memory;
therefore, they require sufficient memory resources.
Query execution. When users perform analytical queries against a data model, the model must
process the query and generate results. This requires CPU processing time, memory, and potentially
disk I/O.
Reporting Workloads
Although some reporting can be performed by client applications directly against the data warehouse or
data models, many BI solutions include SQL Server Reporting Services (SSRS). An SSRS reporting solution
typically involves the following workloads:
Handling client requests. Clients submit requests to Reporting Services over HTTP, so the report
server must listen for and process HTTP requests.
Data source queries. Reports are based on datasets that must be retrieved by querying data sources.
Typically, the data sources for the reports in a BI solution are SSAS data models or the data
warehouse, so report execution incurs query processing overheads in those sources.
Report rendering. After the data for a report has been retrieved, SSRS must render the report data
into the required format using the report definition language (RDL) for the report and the specific
rendering extension. Depending on the size, format, and complexity of the report, report rendering
can incur substantial CPU and memory resources.
Caching. To reduce query processing and rendering overheads, SSRS can cache datasets and reports
in the report server temporary database. Datasets and reports can be cached on first use or you can
use scheduled jobs to pre-cache objects at a regular interval.
Snapshot execution. In addition to caching reports, you can create persisted report snapshots at a
scheduled interval and store them in the report server database.
Subscription processing. You can configure SSRS to execute and deliver reports to file shares or
email addresses on a scheduled basis.
Report Server catalog I/O. Report definitions and resources, such as images, are stored in the report
server catalog; and database I/O tasks are required to retrieve these when needed. Additionally,
database I/O is required to retrieve cached reports and datasets from the temporary database, and to
retrieve report snapshots from the catalog database. This database access may compete for resources
with other databases hosted in the same SQL Server instance.
Logging activity. Report execution activity is recorded in the report server execution log, which incurs additional database write activity.
SQL Server Integration Services. Used to execute packages that encapsulate ETL tasks and data
flows to extract data from source systems into the staging database, and then load it into the data
warehouse.
SQL Server Analysis Services. Used to provide analytical data models and data mining functionality.
Depending on business requirements, two instances of Analysis Services may be required: one for
multidimensional models and data mining, the other for tabular models.
SQL Server Reporting Services. Used to provide report publishing and execution services. Reporting
Services in native mode consists of a web service application, a web-based management user
interface, and two SQL Server databases.
Depending on business requirements, you may also choose to install SQL Server Data Quality Services on
the server to support data cleansing and matching capabilities for staged data before it is loaded into the
data warehouse.
Note: If SharePoint Server is required, you can deploy the SharePoint farm and SQL Server
integration components for Reporting Services and PowerPivot on the BI server. This architecture
is not recommended for BI solutions that require significant scalability or performance.
SharePoint Server topologies for BI are discussed in Module 9: Planning a SharePoint Server BI
Solution.
A single server architecture is suitable for test and development environments, and can be used in
production environments that have minimal data volumes and scalability requirements.
Distributed BI Architecture
If your BI solution requires even moderate levels of scalability, it will benefit from expanding beyond a
single server architecture and distributing the BI workload across multiple servers. Typical approaches to
distributing SQL Server components in a BI solution include:
Creating a dedicated report server. In many BI solutions, Analysis Services provides a data model
that contains most (or even all) of the data in the data warehouse, and all reporting is performed
against the data model. In scenarios like this, there is little database activity in the data warehouse
other than during ETL loading and data model processing (loading data from the data warehouse
tables into the model and aggregating it). The workloads on the server that compete for resources
most of the time are Analysis Services and Reporting Services; therefore, you can increase scalability
and performance by moving the reporting workload to a separate server. You can install the
Reporting Services database on either server. Leaving it on the data warehouse server keeps all of the
data engine elements on a single server but can result in I/O workloads competing for disk resources.
Installing the Reporting Services database on the report server necessitates the database engine being
installed on both servers but results in a cleaner separation of workloads. In extremely large enterprise
solutions with intensive reporting requirements, you can add a third server as a dedicated host for the
Reporting Services database.
Separating data warehouse and ETL workloads from analytical and reporting workloads. If the
data warehouse will be refreshed with new data frequently, or if it must support direct query access in
addition to processing data models, the database I/O activity might compete with Analysis Services
for disk, CPU, and memory resources. To prevent this, you can deploy Analysis Services on a separate
server. Depending on analytical and reporting workloads, you might choose to co-locate Analysis
Services and Reporting Services on the same server, or use a dedicated server for each service. If you
need to support tabular models and multidimensional or data mining models, and a single Analysis
Services server is not adequate to support both workloads, you could consider using separate servers
for the different types of Analysis Services model.
Using a dedicated ETL server. If your ETL process requires frequent data extractions and loads, or
involves particularly resource-intensive transformations, overall performance might benefit from
moving Integration Services and the SSIS Catalog database to a dedicated server. Depending on the
specific transformation and load operations that your ETL process requires, you can choose to locate
the staging database on the ETL server or the data warehouse server, or use a two-phase staging
approach whereby extracted data is staged on the ETL server for transformation and cleansing, and
then loaded into staging tables on the data warehouse server before being inserted into the data
warehouse tables.
When designing a distributed architecture, the key goal is to eliminate contention for hardware resources.
Therefore, you gain the greatest benefits by identifying workloads with similar hardware utilization
profiles and separating them.
SQL Server Analysis Services. Create a read-only copy of a multidimensional database and connect
to it from multiple Analysis Services query servers. To accomplish this, an SSAS server processes the
cubes in the database. The database is then detached, copied to a standby location, and reattached
so that it can be used by applications that need write-back capabilities. The copied database is then
attached in read-only mode to multiple SSAS instances, which will provide query services to clients.
Client requests can be distributed across the query servers by using a load-balancing middle-tier, or
you can assign specific subsets of the client population to specific query servers.
The data warehouse. You can scale out an extremely large data warehouse in several ways; typically,
this is done by partitioning the data across multiple database servers and using middle-tier logic to
direct queries to the appropriate server instance. SQL Server Parallel Data Warehouse edition, which is
provided in pre-configured enterprise data warehouse appliances, uses a massively parallel processing
(MPP) shared nothing architecture to scale out data warehouse queries across multiple independent
compute and storage nodes.
SQL Server Integration Services. Although it is not a pure scale-out technique, you can use multiple
SSIS servers to each perform a subset of the ETL processes in parallel. This approach requires
extensive custom logic to ensure that all tasks are completed, and it should be considered only if your
ETL requirements cannot be met through a scale-up approach in which you add hardware resources
to a single SSIS server.
Additional Reading: For more information about designing a scale-out solution for
Reporting Services, review the content and links in the SQLCAT Reporting Services Scale-Out
Architecture technical notes at
http://sqlcat.com/sqlcat/b/technicalnotes/archive/2008/06/05/reporting-services-scale-out-
architecture.aspx. For more information about using read-only databases to implement a scale-
out solution for Analysis Services, see Scale-Out Querying for Analysis Services with Read-Only
Databases at http://msdn.microsoft.com/en-us/library/ff795582(v=SQL.100).aspx.
Database Mirroring. Database-level protection that synchronizes a database across two SQL Server
instances. Note that Database Mirroring is deprecated in this release of SQL Server and new solutions
should use AlwaysOn Availability Groups instead.
Log Shipping. Database-level protection that copies transaction logs from a primary server to a
secondary server, where they can be applied to a secondary copy of the database.
Additionally, considering the importance of the data warehouse data, the database files should be stored
on a redundant array of independent disks (RAID) that provides protection in the case of a disk failure.
On failover, you must re-encrypt the master database key in the new primary server before any
packages that include encrypted sensitive data can be executed.
Your SSIS packages must be able to recover the ETL process as a whole to a consistent state if
unplanned failover occurs. This means that you must include cleanup and failover logic in your
packages.
If you must apply a patch or update that modifies the SSIS catalog schema, you must remove the SSIS
Catalog database from the availability group, patch it, and then re-establish the availability group.
Additional Reading: For more information about using AlwaysOn Availability Groups with
the SSIS catalog, see SSIS with AlwaysOn at
http://blogs.msdn.com/b/mattm/archive/2012/09/19/ssis-with-alwayson.aspx.
Lesson 2
Planning Data Warehouse Hardware
The data warehouse is the foundation for a BI solution, and there are a number of recommendations from
Microsoft and its hardware partners that you should consider when planning a data warehouse system.
Data warehousing is substantially different from other database workloads, and the conventional database
design approach for optimizing hardware to support the highest possible number of I/O operations per
second (IOPS) is not always appropriate for a data warehouse.
This lesson introduces Microsoft SQL Server Fast Track Data Warehouse reference architectures, and it
goes on to explain how some of the Fast Track design principles can be applied when planning data
warehouse hardware.
Lesson Objectives
After completing this lesson, you will be able to:
Describe the key features of SQL Server Fast Track Data Warehouse reference architectures.
Multi-Vendor Support
Microsoft has partnered with multiple hardware vendors to create pre-tested system designs that use
commodity hardware. If your organization has an existing supplier relationship with one of the Fast Track
hardware vendors, you can easily specify a system based on components from that supplier and use the
support and consulting services offered by the hardware vendor to create a data warehouse system that is
based on a proven design.
Additional Reading: For more information about Fast Track Data Warehouse reference
architectures, see An Introduction to Fast Track Data Warehouse Architectures at
http://msdn.microsoft.com/en-us/library/dd459146(v=SQL.100).aspx and Fast Track Data
Warehouse Reference Guide for SQL Server 2012 at http://msdn.microsoft.com/en-
us/library/hh918452.aspx.
The diagram on the slide shows a balanced system in which the I/O rates of each component in the
system are reasonably similar. This diagram represents a large-scale data warehousing system in which
data is stored in a storage area network with fiber channel connectivity and multiple storage enclosures,
each containing multiple disk arrays. However, the same principles apply to a smaller architecture.
The I/O rate of hardware components, such as hard disks, array storage processors, and fiber channel host
bus adapters (HBAs) can be established through careful testing and monitoring by using tools like SQLIO,
and many manufacturers (particularly those who participate in the Fast Track program) publish the
maximum rates. However, the initial figure that you need to start designing a data warehouse system is
the maximum consumption rate (MCR) of a single processor core combined with the SQL Server database
engine. After the MCR for the CPU core architecture you intend to use has been established, you can
determine the number of processors required to support your workload and the storage architecture
required to balance the system.
Note that MCR is specific to a combination of a CPU and motherboard, and SQL Server; it is not a
measure of pure processing speed or an indication of the performance you can expect for all queries in
your solution; instead, it is a system-specific benchmark measure of maximum throughput per-core for a
data warehouse query workload. Calculating MCR requires executing a query that can be satisfied from
cache while limiting execution to a single core, and reviewing the execution statistics to determine the
number of megabytes of data processed per second.
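The exact scripts used in the following demonstration are provided in the demonstration files; the sketch below shows the general pattern of such a benchmark query, assuming a hypothetical TestDW database and dbo.FactSales table.

USE TestDW;
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- Execute the query twice and use the statistics from the second run,
-- when the data is already in the buffer cache and no physical reads occur.
-- OPTION (MAXDOP 1) restricts execution to a single core.
SELECT SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
OPTION (MAXDOP 1);
GO 2

Because logical reads are counted in 8-KB pages, one common formulation estimates MCR in MB per second as (logical reads × 8 ÷ 1024) ÷ CPU time in seconds.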
Demonstration Steps
Create tables for benchmark queries
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
4. On the File menu, point to Open, and then click File. Browse to the D:\Demofiles\Mod02\Starter
folder, select Create Benchmark DB.sql, and then click Open.
5. Click Execute, and wait for query execution to complete. This query creates a database containing
two tables, one with a clustered index and one without. Both tables contain a substantial number of
rows.
1. On the File menu, point to Open, and then click File. In the D:\Demofiles\Mod02\Starter folder,
select Measure MCR.sql, and then click Open.
2. Click Execute, and wait for query execution to complete. The queries retrieve an aggregated value
from each table, and are performed twice. This ensures that on the second execution (for which
statistics are shown), the data is in cache so the I/O statistics do not include disk reads. Note that the
MAXDOP=1 clause ensures that only a single core is used to process the query.
1. In the results pane, click the Messages tab. This shows the statistics for the queries.
2. Add the logical reads value for the two queries together, and then divide the result by two to find
the average.
3. Add the CPU time value for the two queries together, and then divide the result by two to find the
average. Divide the result by 1,000 to convert it from milliseconds to seconds.
For example, suppose the MCR of the CPU core you intend to use is 200 MB/s. If an average query is
expected to return 18,000 MB, the anticipated number of concurrent users is 10, and each query must
respond within 60 seconds, the calculation to find the number of cores required is:
(18,000 MB ÷ 60 seconds) × 10 concurrent users ÷ 200 MB/s per core = 15 cores
This results in a requirement for 15 cores (which should be rounded up to 16 because no CPU architecture
includes exactly 15 cores).
Now that you know the number of cores required, you can make an initial determination of the number
of processors. For example, to provide 16 cores using quad-core processors, you would need 4 processors.
Alternatively, if dual-core processors are used, 8 CPUs would be required. However, keep in mind that you
need to balance the number of CPUs to closely match the number of storage arrays that will be used,
which in turn may depend on the volume of data your data warehouse must support.
Another way to estimate memory requirements is to consider that in an average data warehouse
workload, users regularly need to access approximately 20 percent of the data stored in the data
warehouse (for example, in a data warehouse that stores 5 years of sales records, users mostly query the
most recent year). Having enough memory to maintain approximately 20 percent of the data in cache will
significantly enhance performance.
4. Factor In Compression
Finally, you should plan to compress the data in your data warehouse. Typically, SQL Server provides a
compression factor of approximately 3:1, so the 46 GB of data should compress to approximately 15.5 GB
on disk.
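Page compression is configured per table or index, and you can estimate the savings before applying it. The following statements are a minimal sketch; the fact table name is hypothetical, and data compression requires an edition that supports it, such as Enterprise.

-- Estimate the space saving that page compression would achieve.
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactInternetSales',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild the table (and its clustered index) with page compression.
ALTER TABLE dbo.FactInternetSales
REBUILD WITH (DATA_COMPRESSION = PAGE);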
Configuration databases. If databases used by other BI services, including the SSIS Catalog and
Reporting Services databases, are to be installed on the data warehouse server, you should include
them in your storage estimate. Additionally, the SQL Server instance includes system databases,
though in practice, these are usually stored on separate storage from the data warehouse data files.
Transaction log files. Each database requires a transaction log. Typically, data warehouses are
configured to use the simple recovery model and few transactions are actually logged.
TempDB. Many data warehouse queries require temporary storage space, and it is generally
recommended to locate TempDB on a suitable storage volume and assign it a suitable initial size so
that the system does not have to grow it automatically during query execution (see the example that
follows this list).
Staging tables. Whether data is staged in a dedicated staging database, in tables within the data
warehouse database itself, or in a combination of both locations, you must allocate enough space to
allow for data staging during ETL processes.
Backups. If you intend to back up the data warehouse and other databases to disk, you must ensure
that the storage design provides space for backup files.
Analysis Services models. If you intend to host multidimensional Analysis Services data models on
the data warehouse server, you must allocate sufficient disk space for them.
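As noted for TempDB above, pre-sizing the files avoids automatic growth during query execution. The following statements are a minimal sketch; the sizes shown are purely illustrative and should be based on your own workload testing.

-- Set an initial size and a fixed growth increment for the TempDB data and log files.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 20GB, FILEGROWTH = 1GB);

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 5GB, FILEGROWTH = 512MB);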
Use RAID 10, or minimally RAID 5. RAID 10 (in which data is both mirrored and striped) provides
the best balance of read performance and protection from disk failure, and this should usually be the
first choice for a data warehouse. However, the requirement for a complete set of redundant disks per
array can make this an expensive option. As an alternative, you can use RAID 5, which provides
striping for high read performance and parity-based data redundancy to protect against disk failure.
Consider a dedicated storage area network. Although you can build a data warehouse that uses
direct attached storage (DAS), using a storage area network (SAN) generally makes it easier to
manage disk array configuration and to add storage in the future as the data warehouse grows. If you
do decide to use a SAN, it is best to have one that is dedicated to the BI solution and not shared with
other business applications. Additionally, try to balance the number of enclosures, storage processors
per enclosure, and disk groups to achieve a consistent I/O rate that takes advantage of parallel core
processing and matches the MCR of the system.
At this time, you have been informed that you should not consider using SharePoint Server in your
planned solution.
Objectives
After completing this lab, you will be able to:
Plan server topology for a SQL Server–based BI solution.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Use Visio to document the servers required for the BI solution and the services on each server. Include
notes to justify your decisions.
For the purposes of this exercise, you can ignore any requirements for:
o SharePoint Server.
Results: At the end of this exercise, you should have a Visio diagram that documents your server
infrastructure design.
Use the calculated MCR figure to estimate the number of cores required to support the following
workload:
o Recommend the number and type (dual core or quad core) of processors to include in the data
warehouse server.
o Calculate the storage requirements for the data warehouse server, assuming a compression ratio
of 3:1.
Use Microsoft Word to open the Storage options.docx document in the D:\Labfiles\Lab02\Starter
folder and review the available storage options.
Based on the storage requirements you have identified, select a suitable storage option and record
your selection in the DW Hardware Spec.xlsx workbook.
Results: After this exercise, you should have a completed worksheet that specifies the required hardware
for your data warehouse server.
Module 3
Designing a Data Warehouse
Contents:
Module Overview
Module Overview
The data warehouse is at the heart of most business intelligence (BI) solutions, and designing the logical
and physical implementation of the data warehouse is crucial to the success of the BI project. Although a
data warehouse is fundamentally a database, there are some significant differences between the design
process and best practices for an online transaction processing (OLTP) database and a data warehouse
that will support online analytical processing (OLAP) and reporting workloads.
This module describes key considerations for the logical design of a data warehouse, and then it discusses
best practices for the physical implementation of the data warehouse.
Objectives
After completing this module, you will be able to:
Design and implement effective physical data structures for a data warehouse.
Lesson 1
Data Warehouse Design Overview
Before designing individual database tables and relationships, it is important to understand the key
concepts and design principles for a data warehouse. This lesson describes the dimensional model used in
most data warehouse designs and the process used to translate business requirements into a data
warehouse schema.
Lesson Objectives
After completing this lesson, you will be able to:
Creating a matrix of business processes and conformed dimensions by which the data must be
aggregated for analysis and reporting.
Designing a dimensional model for each business process, including numeric facts and dimension
attributes and hierarchies.
Translating the dimensional model designs into a database schema consisting of fact and dimension
tables.
Designing appropriate physical data storage and structures to optimize data warehouse performance
and manageability.
Ideally, a dimensional model can be implemented in a database as a star schema, in which each fact table
is directly related to its relevant dimension tables. However, in some cases, one or more dimensions may
be normalized into a collection of related tables to form a snowflake schema. Generally, you should avoid
creating snowflake dimensions, if possible, because in a typical data warehouse workload, the
performance benefits of a single join between fact and dimension tables outweigh the data redundancy
reduction benefits of normalizing the dimension data.
The query optimizer in the Enterprise edition of SQL Server 2012 includes logic that detects star schema
joins in queries and optimizes the way these joins are processed accordingly. Based on the selectivity of
the query (that is, the proportion of rows from the fact table that the query is likely to return), the query
optimizer uses bitmap filters to quickly eliminate non-qualifying rows from the fact table (which generally
accounts for the largest cost in a data warehouse query).
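For example, a query of the following shape, which joins a hypothetical FactSales table to DimDate and DimProduct dimension tables on their surrogate keys and filters on dimension attributes, is the kind of star join that this optimization targets.

SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
WHERE d.CalendarYear = 2012
  AND p.Category = N'Bikes'
GROUP BY d.CalendarYear, p.Category;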
Additional Reading: For more detailed information about star join query optimization, see
Introduction to New Data Warehouse Scalability Features in SQL Server 2008 at
http://msdn.microsoft.com/en-us/library/cc278097(v=SQL.100).aspx and Data Warehouse Query
Performance at http://technet.microsoft.com/en-us/magazine/2008.04.dwperformance.aspx.
Additional Reading: For more information about applying the Kimball Group's dimensional
modeling methodology to a SQL Server-based data warehouse design, read The
Microsoft Data Warehouse Toolkit (Wiley, 2011).
Typically, asking questions such as “how will you be able to tell if the business requirement is being met?”
leads the discussion toward the analytical and reporting requirements. For example, to determine whether
sales performance is improving, the sales executive might say that they need to be able to see “order
volume by territory” or “sales revenue by salesperson.” Requirements expressed like this make it easier to
determine the measures and dimensions the solution must include, because the requirement often takes
the form “measure by dimension”.
Additionally, most analytical and reporting requirements include a time-based aggregation. For example,
the sales executive might want to compare sales revenue by month or quarter.
Order processing.
Stock management.
Order fulfillment.
Manufacturing.
Financial accounting.
Each of these processes generates data. This data includes numeric values and events that can be counted
(which can be sources for measures in a dimensional model) and information about key business entities
(which can be sources for dimension attributes).
A significant part of the effort in designing a data warehouse solution involves exploring the data in these
source systems and interviewing the users, system administrators, and application developers who
understand it best. Initial exploration might simply take the form of running Transact-SQL queries to
determine distinct value counts, average numerical values, and row counts. You can then use the basic
information gathered from these initial queries and discussions with data specialists as a foundation for
deeper data profiling using tools such as the Data Profiling task in SQL Server Integration Services to
determine minimum and maximum field lengths, data sparseness and null counts, and the reliability of
relational dependencies.
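The following queries are a minimal sketch of this kind of initial exploration, assuming a hypothetical dbo.SalesOrderDetails source table.

-- Row count and distinct value count.
SELECT COUNT(*) AS TotalRows,
       COUNT(DISTINCT ProductCode) AS DistinctProducts
FROM dbo.SalesOrderDetails;

-- Range and average of a numeric column.
SELECT MIN(UnitPrice) AS MinUnitPrice,
       MAX(UnitPrice) AS MaxUnitPrice,
       AVG(UnitPrice) AS AvgUnitPrice
FROM dbo.SalesOrderDetails;

-- Null count for a column whose usage is unclear.
SELECT COUNT(*) AS MissingSizeCodes
FROM dbo.SalesOrderDetails
WHERE SZ_Code IS NULL;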
At this stage, you do not need to perform a full audit of the data or start planning the extract, transform,
and load (ETL) solution, but you do need to identify whether and where the measures and dimension
attributes required to meet the reporting requirements are stored, what range of values exists for each
required data field, what data is missing or unknown, and at what granularity the data is available.
Note: For information about how to use the Data Profiling task in SSIS, attend course
10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012.
6. Document and refine the models to determine the database logical schema
After you create initial dimensional models for each required business process, you can document the
models to show:
You can then iteratively refine the model to design the fact and dimension tables that will be required in
the data warehouse database. Considerations for fact and dimension tables are discussed later in this
module.
Dimensional Modeling
After you identify the business processes and
conformed dimensions, you can document them
in a matrix, as shown on the slide. This approach is
based on the bus matrix design technique
promoted by the Kimball Group.
You can then use the matrix to select each business process based on priority, and design a dimensional
model for the business process by performing the following steps:
1. Identify the grain. The grain of a dimensional model is the lowest level of detail at which you can
aggregate the measures. It is important to choose a grain that will support the most granular
reporting and analytical requirements, so typically the lowest level of grain available in
the source data is the best option. For example, an order processing system might record order data
at two levels: order-level data such as the order date, salesperson, customer, and shipping cost; and
line item–level data such as the products included in the order and their individual quantities, unit
costs, and selling prices. To support the most granular analysis and reporting, the grain should be
declared at the line item level (so the fact table will contain one row per line item).
2. Select the required dimensions. Next, determine which of the dimensions that are related to the
business process should be included in the model. The selection of dimensions depends on the
reporting and analytical requirements―specifically on the business entities by which the business
users need to aggregate the measures. Almost all dimensional models include a time-based
dimension, and the other dimensions generally become obvious as you review the requirements.
Additionally, at this stage, you might begin to identify specific attributes of the dimensions that will
be needed (such as the country, state, and city of a customer or the color and size of a product).
In the example on the slide, the Time, Customer, Product, and Salesperson dimensions are selected.
Note: The Time dimension in this example is used for both order date and ship date.
Although it would be possible to define an individual dimension for each type of date, it is more
common to create a single time dimension and use it for multiple roles. In an analytical model,
these multiple usages of the same dimension table are known as role-playing dimensions. This
technique is most commonly used for time tables, but it can be applied to any dimension that is
used in multiple ways―for example, a dimensional model for an airline flight scheduling business
process might use a single Airport dimension to support Origin and Destination role-playing
dimensions.
3. Identify the facts. Finally, identify the facts that you want to include as measures. These are numeric
values that can be expressed at the level of the grain chosen earlier and aggregated across the
selected dimensions. Some facts will be taken directly from source systems, and others might be
derived from the base facts. For example, you might choose Quantity and Unit Price facts from an
order processing source system and then calculate a total Sales Amount. Additionally, depending on
the grain you choose for the dimensional model and the grain of the source data, you might need to
allocate measures from a higher level of grain across multiple fact rows. For example, if the source
system for the order processing business process includes a Tax measure at the order level, but the
facts are to be stored at the line item level, you will need to decide how to allocate the tax amount
across the line items. Typically, tax is calculated as a percentage of selling price, so it should be
straightforward to apply the appropriate tax rate to each line item based on the sales amount.
In the example on the slide, the Item Quantity, Unit Cost, and Unit Price measures are taken from the
source system at the line item level. From these, the Total Cost and Sales Amount measures for each line
item can be calculated. Additionally, the Shipping Cost measure is defined at the order level in the source
system, so it must be allocated across the line items. You can do this by dividing it equally across each
row, or by applying a calculation that distributes the shared cost based on the quantity of each item
ordered, total line item weight, or some other appropriate formula, as shown in the example that follows.
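One way to implement such an allocation during the ETL load is to use a windowed aggregate, as in the following sketch. The staging table and column names are hypothetical, and the query assumes that the order-level shipping cost is repeated on every staged line item row.

SELECT
    s.OrderID,
    s.LineItemID,
    s.Quantity,
    s.UnitPrice,
    -- Distribute the order-level shipping cost in proportion to line item quantity.
    s.ShippingCost * s.Quantity
        / SUM(s.Quantity) OVER (PARTITION BY s.OrderID) AS AllocatedShippingCost
FROM stg.OrderLineItems AS s;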
Eventually, the simple diagram will be refined to the point where it can be easily translated into a schema
design for database tables. At this stage, you can use a diagramming tool such as Microsoft Visio or a
specialist database modeling tool to start designing the logical schema of your data warehouse.
Lesson 2
Designing Dimension Tables
After you design the dimensional models for the data warehouse, you can translate the design into a
logical schema for the database. However, before you design dimension tables, it is important to consider
some common design patterns and apply them to your table specifications.
This lesson discusses some of the key considerations for designing dimension tables.
Lesson Objectives
After completing this lesson, you will be able to:
Dimension tables typically use a numeric surrogate key as the primary key, rather than relying on the business key from the source system. Reasons for this include:
The data warehouse might use dimension data from multiple source systems, so there is the
possibility that business keys are not unique.
Some source systems use non-numeric keys, such as a globally unique identifier (GUID), or natural
keys, such as an email address to uniquely identify data entities. Integer keys are smaller and more
efficient to use in joins from fact tables.
Each row in a dimension table represents a specific version of an instance of a business entity. If the
dimension table supports type 2 slowly changing dimensions, the table might need to contain
multiple rows that represent different versions of the same entity. These rows will have the same
business key, and without a surrogate key, they won’t be uniquely identifiable.
Typically, the business key is retained in the dimension table as an alternate key. Business keys that are
based on natural keys can be familiar to users analyzing the data―for example, a ProductCode business
key that users will recognize might be used as an alternate key in the Product dimension table. However,
the main reason for retaining a business key is to make it easier to manage slowly changing dimensions
when loading new data into the dimension table. The ETL process can use the alternate key as a lookup
column to determine whether an instance of a business entity already exists in the dimension table.
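The following table definition is a minimal sketch of these conventions (all names are hypothetical). It includes an integer surrogate key, a ProductCode-style business key retained as an alternate key, and metadata columns that support the type 2 slowly changing dimension approach discussed later in this lesson.

CREATE TABLE dbo.DimProduct
(
    ProductKey    int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    ProductAltKey nvarchar(25) NOT NULL,                   -- business key retained as an alternate key
    ProductName   nvarchar(100) NOT NULL,
    Category      nvarchar(50) NULL,
    Size          nvarchar(20) NULL,
    -- Metadata columns used to manage type 2 slowly changing dimensions.
    StartDate     date NOT NULL,
    EndDate       date NULL,
    CurrentFlag   bit NOT NULL DEFAULT 1
);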
Slicers. Attributes do not need to form hierarchies to be useful in analysis and reporting. Business
users can group or filter data based on single-level hierarchies to create analytical sub-groupings of
data. For example, the Gender attribute in the Customer table can be used to compare sales revenue
for male and female customers.
Drill-through detail. Some attributes have little value as slicers or members of a hierarchy. For
example, it may be unlikely that a business user will need to analyze sales revenue by customer phone
number. However, it can be useful to include entity-specific attributes to facilitate drill-through
functionality in reports or analytical applications. For example, in a sales order report that enables
users to drill-down to the individual order level, users might want to double-click an order and drill-
through to see the name and phone number of the customer who placed the order.
Note: The terminology for interacting with data in reports can be confusing, and is
sometimes used inconsistently. For clarity, in this course, the term “drill-down” means expanding
a hierarchy to see the next level of aggregation, and “drill-through” means viewing details
outside of the current hierarchy for a selected row. For example, while viewing sales revenue by
customer geography, you might view total revenue for a specific country in the hierarchy. You
might then “drill-down” to see a subtotal for each state within that country (the next level in the
hierarchy), or “drill-through” to see demographic details for that country.
In the example on the slide, note that the Name column contains the full name of the customer. In a data
warehouse table schema, it is not usually necessary to normalize the data to its most atomic level as is
common in OLTP systems. In this example, it is unlikely that users will want to group or filter data by
customer first name or last name, and the data only has drill-through value at the full name level of detail.
Therefore, the FirstName, MiddleName, and LastName columns in the source system have been
combined into a single Name field in the data warehouse.
To support these two cases, a row for each case is added to the dimension table with appropriate
surrogate keys (such as -1 for "Unknown" and 0 for "None"). If the source systems had been more
ambiguous, you could add a single row to the dimension table to represent “None or unknown”.
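Because the surrogate key column in the earlier DimProduct sketch uses the IDENTITY property, the special rows must be inserted with identity inserts temporarily enabled, for example:

SET IDENTITY_INSERT dbo.DimProduct ON;

-- The start date used for these rows is arbitrary.
INSERT INTO dbo.DimProduct (ProductKey, ProductAltKey, ProductName, StartDate)
VALUES (-1, N'UNKNOWN', N'Unknown', '19000101'),
       (0,  N'NONE',    N'None',    '19000101');

SET IDENTITY_INSERT dbo.DimProduct OFF;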
NULL equality
Depending on the settings in a SQL Server database, you might not be able to compare NULL values for
equality. In its strictest definition, NULL means unknown, so a “NULL = NULL” comparison is actually
asking if one unknown value is the same as another unknown value; and because both values are
unknown, the answer is also unknown (and therefore NULL). You should not use NULL as the alternate key
for the “Unknown” dimension row, because lookup queries during the ETL load process must compare
this key to the data being loaded to determine whether a dimension row already exists. Instead, use an
appropriate key value that is unlikely to be the same as an existing business key, and use the Transact-SQL
ISNULL function to compare source rows with dimension rows, as shown in the following code sample.
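The following sketch illustrates the pattern, reusing the hypothetical DimProduct dimension table from the earlier example together with a hypothetical stg.Products staging table. The ISNULL function maps a NULL business key in the staged data to the alternate key of the “Unknown” row, so the join never compares NULL with NULL.

-- Identify staged products that do not yet have a row in the dimension table.
SELECT s.ProductCode, s.ProductName
FROM stg.Products AS s
LEFT JOIN dbo.DimProduct AS d
    ON ISNULL(s.ProductCode, N'UNKNOWN') = d.ProductAltKey
WHERE d.ProductKey IS NULL;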
Type 3. Type 3 changes are rarely used. In a type 3 change, the previous value (or sometimes a
complete history of previous values) is maintained in the dimension table row. This requires
modifying the dimension table schema to accommodate new values for each tracked attribute, and
can result in a complex and difficult-to-manage dimension table.
After you define the dimensional model for the data warehouse and are evolving your design from a sun
diagram to a database schema, it can be useful to annotate dimension attributes to indicate what kind of
SCD changes they must support. This will help you plan the metadata columns required for each
dimension table.
Surrogate and alternate keys. A common approach for date dimension surrogate keys is to
concatenate the integer values for each date part in descending order of scope. For example, use
the pattern YYYYMMDD to represent dates; so for January 31st 2013, use the value 20130131. This
ensures that the value used for the next sequential date (February 1st 2013) is a higher value of
20130201. The reason ascending values are recommended is that data warehouse queries typically
filter on a range of date or time values, and using an ascending numeric key enables you to use
indexes and partitions that store the fact data in chronological order, thereby enabling the query
optimizer to use sequential scans to read the data. Additionally, the actual datetime value for the row
is generally used as the alternate key to support datetime functions or client applications that can
apply datetime-specific logic. (A sample table definition illustrating these and the following
considerations is shown after this list.)
Granularity. The level of granularity used for a time dimension table depends on the business
requirements. For many reporting and analysis scenarios, such as viewing details about sales orders,
the lowest level of granularity that is likely to be required is a day. However, in some scenarios, users
might need to aggregate facts by hours, minutes, or seconds, or even smaller increments. The lower
the level of granularity used, the more rows will exist in the dimension table, and storing a row for
increments of less than a day can result in extremely large tables. An alternative approach is to create
a date dimension table that contains a row for each day, and a second time dimension table that
stores a row for each required time increment in a 24-hour period. Fact tables that are used for
analysis of measures at the day level or higher can be related to the date dimension table only; and
facts that are measured at smaller time increments can be related to both the date and time
dimension tables.
Range. Typically, a time dimension table stores a row for each increment between a start point and
an end point with no gaps. So, for example, a time dimension in a data warehouse used to analyze
sales orders might have a row for each day between the first ever order and the most recent
order―even if no orders were placed on some of the intervening days. In reality, the start and end
dates are typically based on key calendar dates. For example, the start date might be January 1st of
the year the company started trading, or the start date of the company’s first fiscal year. The end date
is usually some future point, such as the end of the current year; and more rows are added
automatically as the end date gets closer to maintain a buffer of future dates. If the data warehouse
will be used to create and store projections or budget figures for future operations, you will need to
choose an end date that is far enough into the future to accommodate these values.
Attributes and hierarchies. You need to include attributes for each time period by which data will
be aggregated, for example year, quarter, month, week, and day. These attributes tend to form
natural hierarchies. Additionally, you can add attributes to be used as slicers, such as weekday (which
for example, would enable users to compare typical sales volumes for each day of the week). In
addition to numeric values, you might want to include attributes for date element names, such as
month names and day names. This enables more user-friendly reports (for example, enabling users to
compare sales in March and April instead of month 3 and month 4), but you should also include the
numeric equivalents so that client applications can use them to sort the data into the correct
chronological order (for example, sorting months by month number instead of month name).
Multiple calendars. Many organizations need to support multiple calendars, for example a normal
calendar year that runs from January to December, and a fiscal calendar, which might run from April
to March. If this is the case in your data warehouse, you can either create a separate time dimension
table for each calendar, or more preferably, include attributes for all alternative calendar values in a
single time dimension table. For example, a time dimension table might include a Calendar Year
attribute and a Fiscal Year attribute.
Unknown values. In common with other dimension tables, you might need to support facts for
which a date or time value is unknown. Instead of requiring a NULL value in the fact table, consider
creating a row in the time dimension table for unknown values. You can use an obvious surrogate key
value, such as 00000000, for this row, but because the alternate key must be a valid date, you should
use a date outside of the normal range of business operations, such as January 1st 1753 or December
31st 9999 (these are the minimum and maximum values supported by the datetime data type).
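The following Transact-SQL sketch brings several of these considerations together. It is illustrative only (the table and column names are not those used in the course labs): the table stores an integer YYYYMMDD surrogate key, a date alternate key, numeric and name attributes, calendar and fiscal year columns, and a row for unknown dates.

-- A date-grain time dimension table with calendar and fiscal attributes.
CREATE TABLE dbo.DimDate
(
    DateKey int NOT NULL PRIMARY KEY,   -- surrogate key in YYYYMMDD format
    FullDate date NOT NULL,             -- alternate key
    CalendarYear int NOT NULL,
    CalendarQuarter int NOT NULL,
    MonthNumber int NOT NULL,           -- used to sort month names chronologically
    MonthName nvarchar(20) NOT NULL,
    DayNumberOfMonth int NOT NULL,
    DayNumberOfWeek int NOT NULL,
    DayName nvarchar(20) NOT NULL,
    FiscalYear int NOT NULL,            -- fiscal calendar, for example April to March
    FiscalQuarter int NOT NULL
);

-- Insert a row to represent unknown dates. The alternate key must be a valid date,
-- so a value outside the normal range of business operations is used.
INSERT INTO dbo.DimDate
    (DateKey, FullDate, CalendarYear, CalendarQuarter, MonthNumber, MonthName,
     DayNumberOfMonth, DayNumberOfWeek, DayName, FiscalYear, FiscalQuarter)
VALUES
    (0, '9999-12-31', 9999, 4, 12, N'Unknown', 31, 7, N'Unknown', 9999, 3);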
When you implement a self-referencing dimension table in a data warehouse, you should think about the
following considerations:
Like all dimension load operations, when records are to be loaded into the dimension table, the ETL
process must look up the alternate key to determine whether a record already exists for the entity.
However, the alternate key of the parent record must also be looked up to determine the correct
surrogate key to use in the foreign key column.
You may have to deal with a situation where you need to load a record for which the parent record
has not yet been loaded.
Supporting type 2 SCDs in a self-referencing dimension table can be complex. In a worst case
scenario, you might perform a type 2 change that results in a new row (and therefore a new surrogate
key), and then need to cascade that type 2 change to create new rows for all descendants of the
entity, even if the change has not altered the parent-child relationships.
Junk Dimensions
In some reporting and analytical requirements,
there are attributes that are useful for grouping or
filtering facts but which do not belong in any of
the dimensions defined in the dimensional model.
When these attributes have low cardinality (that is,
there are only a few discrete values), you can
group them into a single dimension table that
contains miscellaneous analytical values. This kind
of dimension table is generally referred to as a
junk dimension and is used to avoid creating
multiple, very small dimension tables.
For example, a sales order record might include a column that indicates whether a promotional code was
provided, or there may be a column that stores “credit” or “debit” to indicate the payment method.
Instead of creating a dimension table for each of these attributes, you could combine every possible
combination of their values in a single junk dimension table.
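A minimal sketch of this approach, assuming two hypothetical low-cardinality attributes, is shown below. The junk dimension is populated by cross joining the possible values of each attribute:

-- Create a junk dimension and populate it with every combination of two
-- miscellaneous order attributes.
CREATE TABLE dbo.DimOrderAttributes
(
    OrderAttributesKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PromotionCodeProvided nvarchar(3) NOT NULL,  -- 'Yes' or 'No'
    PaymentMethod nvarchar(6) NOT NULL           -- 'Credit' or 'Debit'
);

INSERT INTO dbo.DimOrderAttributes (PromotionCodeProvided, PaymentMethod)
SELECT p.Value, m.Value
FROM (VALUES (N'Yes'), (N'No')) AS p(Value)
CROSS JOIN (VALUES (N'Credit'), (N'Debit')) AS m(Value);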
Lesson 3
Designing Fact Tables
Fact tables contain the numeric measures that can be aggregated across the dimensions in your
dimensional model. Fact tables can grow to be extremely large, and it is important to design them
carefully with reporting and analytical requirements, performance, and manageability in mind.
Lesson Objectives
After completing this lesson, you will be able to:
Measures. In most cases, a fact table is primarily used to store numeric measures that can be
aggregated by the related dimensions. For example, a row in a fact table that records sales orders
might include a column for the sales amount, which can then be aggregated by the dimensions to
show sales amount by date, by product, or by customer. In some cases, a fact table contains no
measures and is simply used to indicate that an intersection between the related dimensions
occurred. For example, a fact table in a manufacturing dimensional model might record a single row
each time a product assembly is completed, indicating the product and date dimension keys. The fact
table can then be used to calculate the number of times an assembly of each product was completed
per time period by simply counting the distinct rows. A fact table with no numeric measure columns
is sometimes referred to as a factless fact table.
Degenerate dimensions. Sometimes, a fact has associated attributes that are neither keys nor
measures, but which can be useful to group or filter facts in a report or analysis. You can include this
column in the fact table where client applications can use it as a degenerate dimension by which the
fact data can be aggregated. In effect, including degenerate dimension columns in a fact table
enables it to also be used as a dimension table. Using degenerate dimensions can be a good
alternative to using a junk dimension if the analytical attributes are specific to a single fact table.
Note: Unlike most database tables, a fact table does not necessarily require a primary
key―in fact, unless you have a business requirement to uniquely identify each row in the fact
table, you should avoid creating a unique key column for the fact table and avoid defining a
primary key constraint. Facts are generally aggregated, and queries rarely need to individually
identify a fact row. In some cases, the combination of dimension keys can uniquely identify a fact
row, but this is not guaranteed―for example, the same customer could purchase the same
product twice in one day.
Types of Measure
Fact tables can contain the following three kinds
of measure:
Additive measures. Additive measures can be summed across all of the dimensions in the fact table. For
example, a SalesAmount measure can be totaled by date, by product, by customer, or by any
combination of these dimensions.
Semi-additive measures. Semi-additive measures can be summed across some dimensions but not
others. For example, a fact table that records stock levels might include a StockCount measure that can
be summed across products, but summing stock counts across dates produces a meaningless result.
Non-additive measures. Non-additive measures cannot be summed by any dimension. For example,
a fact table for sales orders might include a ProfitMargin measure that records the profit margin for
each order. However, you cannot calculate the overall margin for any dimension by summing the
individual profit margins.
Generally, semi-additive and non-additive measures can be aggregated by using other functions, for
example, you could find the minimum stock count for a month or the average profit margin for a product.
Understanding the ways in which the measures can be meaningfully aggregated is useful when testing
and troubleshooting data warehouse queries and reports.
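For example, the following sketch (with illustrative table and column names) aggregates a semi-additive stock count measure by using MIN rather than SUM across the date dimension:

-- Stock counts can be summed across products but not across dates, so use MIN
-- (or AVG, or the closing value) when aggregating over a month.
SELECT d.CalendarYear, d.MonthNumber, MIN(f.StockCount) AS MinimumStockCount
FROM dbo.FactInventory AS f
JOIN dbo.DimDate AS d ON f.DateKey = d.DateKey
GROUP BY d.CalendarYear, d.MonthNumber;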
Objectives
After completing this lab, you will be able to:
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Discuss the interviews, and identify the business processes in Adventure Works that generate the data
required to meet the analytical and reporting requirements.
Prioritize the business processes by their importance to the business in terms of analytical and
reporting requirements.
Record the business processes you identify in the Matrix.xlsx Excel workbook in the
D:\Labfiles\Lab03A\Starter folder. List them in descending order of priority.
You can use SQL Server Management Studio to connect to the SQL Server database engine instances
described in the interviews, and you can use Excel to view the sample accounts data.
You can create a database diagram for each source database in SQL Server Management Studio―this
can be a useful way to familiarize yourself with the source database schemas.
Query the tables in the source databases to view the data they contain. If desired, you can export data
from source databases to comma-delimited text files and explore them further in Excel.
If you want to profile any of the data, you can use the Data Profiling task in SQL Server Integration
Services to do so.
As you examine the data sources, note the potential measures, dimensions, and dimension attributes
you discover, as well as any potential quality or consistency issues in the data.
Results: At the end of this exercise, you will have created a matrix of business processes and dimensions.
Identify an appropriate grain to use in the dimensional model for this business process.
Create a sun diagram with the measures in appropriate fact tables with associated dimensions. You
can use any diagramming tool you want to, for example Visio, PowerPoint, Paint, or pen and paper.
Add dimension attributes and hierarchies to the sun diagram based on the data attributes you
identified when examining the data and the analytical and reporting requirements gathered from the
interviews.
As time permits, create sun diagrams for the remaining business processes in descending order of
priority.
You can use Visio, PowerPoint, Paint, the SQL Server Management Studio table designer, or pen and
paper to create your design.
Results: At the end of this exercise, you will have a sun diagram showing the facts, measures,
dimensions, attributes, and hierarchies you have identified, and a database schema diagram showing
your design for dimension and fact tables.
Lesson 4
Designing a Data Warehouse Physical Implementation
After designing the logical schema for the data warehouse, you need to implement it as a physical
database. This requires careful planning for file placement, data structures such as partitions and indexes,
and compression. This lesson discusses considerations for all of these aspects of the physical database
design.
Lesson Objectives
After completing this lesson, you will be able to:
ETL
ETL processes affect the data warehouse when
they load new or updated data into the data warehouse tables. In most cases, the inserts are performed as
bulk load operations to minimize logging and constraint checking. The load process may involve some
lookup operations to find alternate keys for slowly changing dimensions, and some update operations for
type 1 dimension changes or data modifications in fact tables where appropriate. Depending on the
design of the data structures, ETL load operations might also involve dropping and rebuilding indexes and
splitting partitions.
Data models
After each new load, any data models based on the data warehouse must be processed. This involves
reading data from the data warehouse tables into the data model and pre-aggregating measures to
optimize analysis queries. Depending on the size of the data warehouse and the time window for the
processing operation, the entire data model may be completely processed after each load, or an
incremental processing approach may be used in which only new or modified data is processed.
Because of the volume of data being loaded into the model, the I/O activity typically involves sequential
table scans to read entire tables―particularly when performing a full process of the data model.
Reports
In some scenarios, all reporting is performed against the data models, so reporting does not affect the
data warehouse tables. However, it is common for some reports to query the data warehouse directly. In
scenarios where IT-provided reports are supported, the queries are generally predictable and retrieve
many rows with range-based query filters―often on a date field.
User queries
If self-service reporting is supported, users may be able to execute queries in the data warehouse (or use
tools that generate queries on their behalf). Depending on the query expertise of the users, this can result
in complex, unpredictable queries.
Disable autogrowth. If you begin to run out of space in a data file, it is more efficient to explicitly
increase the file size by a large amount than to rely on incremental autogrowth.
Because the logical disks for the database files are typically already configured as RAID 10 or RAID 5
arrays, you generally do not need to use filegroups to distribute tables across physical disk platters in
order to improve I/O performance. However, you should consider the following guidance for using
filegroups in a data warehouse:
Create at least one filegroup in addition to the primary filegroup, and then set it as the default
filegroup so you can separate data tables from system tables.
Consider creating dedicated filegroups for extremely large fact tables and using them to place those
fact tables on their own logical disks.
If some tables in the data warehouse are loaded on a different schedule from others, consider using
filegroups to separate the tables into groups that can be backed up independently.
If you intend to partition a large fact table, create a filegroup for each partition so that older, stable
rows can be backed up and then set as read-only.
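For example, you might apply this guidance with statements similar to the following sketch, in which the database, filegroup, and file names and paths are all illustrative:

-- Add a filegroup for data tables and make it the default, separating user tables
-- from system tables on the PRIMARY filegroup.
ALTER DATABASE DemoDW ADD FILEGROUP DataFG;
ALTER DATABASE DemoDW ADD FILE
    (NAME = 'DemoDW_Data', FILENAME = 'D:\Data\DemoDW_Data.ndf', SIZE = 10GB)
    TO FILEGROUP DataFG;
ALTER DATABASE DemoDW MODIFY FILEGROUP DataFG DEFAULT;

-- Add a dedicated filegroup for an extremely large fact table on its own logical disk.
ALTER DATABASE DemoDW ADD FILEGROUP FactFG;
ALTER DATABASE DemoDW ADD FILE
    (NAME = 'DemoDW_Fact', FILENAME = 'E:\Data\DemoDW_Fact.ndf', SIZE = 50GB)
    TO FILEGROUP FactFG;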
Staging tables
Most data warehouses require staging tables to support incremental data loads from the ETL process. In
some cases, you might use a separate staging database as well as staging tables in the data warehouse
itself. Consider the following recommendations for staging tables.
If a separate staging database is to be used, create it on a separate logical disk from the data
warehouse files.
If the data warehouse will include staging tables, create a file and filegroup for the staging tables and
place them on a separate logical disk from the fact and dimension tables.
An exception to the previous guideline is made for staging tables that will be switched with partitions
to perform fast loads. These must be created on the same filegroup as the partition with which they
will be switched.
TempDB
TempDB is used for temporary objects required for query processing. To avoid fragmentation of data files,
place it on a dedicated logical disk and set its initial size based on how much it is likely to be used. You
can leave autogrowth enabled, but set the growth increment to be quite large to ensure that performance
is not interrupted by frequent growth of TempDB. Additionally, consider creating multiple files for
TempDB to help minimize contention during page free space (PFS) scans as temporary objects are created
and dropped.
Transaction logs
Generally, the recovery model of the data warehouse, staging database, and TempDB should be set to
Simple so that the transaction log is truncated automatically. Additionally, most of the inserts in a data warehouse
are typically performed as minimally logged bulk load operations. To avoid disk resource conflicts
between data warehouse I/O and logging, place the transaction log files for all databases on a dedicated
logical disk.
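For example, assuming illustrative database and logical file names, you might configure the recovery model and pre-size the log as follows:

-- Use the Simple recovery model for the data warehouse and staging database.
ALTER DATABASE DemoDW SET RECOVERY SIMPLE;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Pre-size the transaction log on its dedicated logical disk to avoid frequent autogrowth.
ALTER DATABASE DemoDW MODIFY FILE (NAME = 'DemoDW_log', SIZE = 5GB);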
Backup files
You will need to implement a backup routine for the data warehouse, and potentially for a staging
database. In most cases, you will back up these databases to disk, so allocate a logical disk for this
purpose. You could allocate multiple logical disks and perform a mirrored backup, but because the disks
are already configured as RAID 5 or RAID 10 arrays, this would be of little benefit from a performance
perspective. Note that the backup files should be copied to offsite storage to provide protection in the
case of a complete storage hardware failure or natural disaster.
Table Partitioning
Partitioning a table distributes data across
partitions based on a partition function that
defines a range of values for each partition. A
partition scheme maps the partitions to filegroups,
and the table is partitioned by applying the
partition scheme to the values in a specified
column.
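As a minimal sketch (the object names and boundary values are illustrative, not the demonstration script used later in this lesson), partitioning a fact table on an integer date key might look like this:

-- Define yearly ranges on an integer date key. RANGE RIGHT places each boundary
-- value in the partition to its right.
CREATE PARTITION FUNCTION pfOrderDate (int)
AS RANGE RIGHT FOR VALUES (20000101, 20010101, 20020101);

-- Map the four resulting partitions to filegroups.
CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (FG0000, FG2000, FG2001, FG2002);

-- Create the fact table on the partition scheme, partitioned by the date key column.
CREATE TABLE dbo.FactOrder
(
    OrderDateKey int NOT NULL,
    ProductKey int NOT NULL,
    CustomerKey int NOT NULL,
    SalesAmount money NOT NULL
) ON psOrderDate (OrderDateKey);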
More granular manageability. When you partition a large table, you can perform some
maintenance operations at the partition level instead of on the whole table. For example, indexes can
be created and rebuilt on a per-partition basis, compression can be applied to individual partitions,
and by mapping partitions to filegroups, you can back up and restore partitions independently. This
enables you to back up older data once and then configure the backed up partitions as read-only.
Future backups can be limited to the partitions that contain new or updated data.
Improved data load performance. The biggest benefit of using partitioning in a data warehouse is
that it enables you to load many rows very quickly by switching a staging table with a partition. This
technique dramatically reduces the time taken by ETL data loads, and with the right planning, it can
be achieved with minimal requirements to drop or rebuild indexes.
Partition large fact tables. Fact tables of around 50 GB or more should generally be partitioned for
the reasons described earlier. In general, fact tables benefit from partitioning more than dimension
tables.
Partition on an incrementing date key. When defining a partition scheme for a fact table, use a
date key that reflects the age of the data as it is incrementally loaded by the ETL process. For
example, if a fact table contains sales order data, partitioning on the order date ensures that the most
recent orders are in the last partition and the earliest orders are in the first partition.
Design the partition scheme for ETL and manageability. In a data warehouse, the query
performance gains realized by partitioning are small compared to the manageability and data load
performance benefits. Ideally, your partitions should reflect the ETL load frequency (monthly, weekly,
daily, and so on) because this simplifies the load process. However, you may want to merge partitions
periodically to reduce the overall number of partitions (for example, at the start of each year, you
could merge the monthly partitions for the previous year into a single partition for the whole year).
Maintain an empty partition at the start and end of the table. You can use an empty partition at
the end of the table to simplify the loading of new rows. When a new set of fact table rows must be
loaded, you can place them in a staging table, split the empty partition (to create two empty
partitions), and then switch the staged data with the first empty partition (which loads the data into
the table and leaves the second empty partition you created at the end of the table, ready for the
next load). You can use a similar technique to archive or delete obsolete data at the beginning of the
table.
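A minimal sketch of this load pattern, based on the illustrative partition function and scheme shown earlier in this topic and a hypothetical staging table, is:

-- 1. Add a filegroup and file for the new period and mark the filegroup as next used.
ALTER DATABASE DemoDW ADD FILEGROUP FG2003;
ALTER DATABASE DemoDW ADD FILE
    (NAME = 'DemoDW_2003', FILENAME = 'E:\Data\DemoDW_2003.ndf', SIZE = 10GB)
    TO FILEGROUP FG2003;
ALTER PARTITION SCHEME psOrderDate NEXT USED FG2003;

-- 2. Split the empty partition at the end of the table, leaving a new empty
--    partition for the next load cycle.
ALTER PARTITION FUNCTION pfOrderDate() SPLIT RANGE (20030101);

-- 3. Switch the staged rows into the previously empty target partition. The staging
--    table must be on the same filegroup as the target partition and have a matching
--    schema and a check constraint on the partitioning key.
ALTER TABLE dbo.StageFactOrder SWITCH TO dbo.FactOrder PARTITION 4;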
Note: Partitioning is only available in SQL Server Enterprise edition. In previous releases of
SQL Server Enterprise edition, the number of partitions per table was limited to 1,000. In SQL
Server 2012, this limit has been extended to 15,000. On 32-bit systems, you can create a table or
index with over 1,000 partitions, but this is not supported.
Demonstration Steps
Create a Partitioned Table
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Start SQL Server Management Studio and connect to the localhost instance of the database engine
by using Windows authentication.
4. Select the code under the comment Create a database, and then click Execute. This creates a
database for the demonstration.
5. Select the code under the comment Create filegroups, and then click Execute. This creates four
filegroups in the demo database.
6. Select the code under the comment Create partition function and scheme, and then click Execute.
This creates a partition function that defines four ranges of values (less than 20000101, 20000101 to
20010100, 20010101 to 20020100, and 20020101 and higher), and a partition scheme that maps
these ranges to the FG0000, FG2000, FG2001, and FG2002 filegroups.
7. Select the code under the comment Insert data into the partitioned table, and then click Execute.
This inserts four records into the table.
1. Select the code under the comment Query the table, and then click Execute. This retrieves rows
from the table and uses the $PARTITION function to show which partition the datekey value in each
row is assigned to. This function is useful for determining which partition of a partition function a
specific value belongs in.
2. Select the code under the comment View filegroups, partitions, and rows, and then click Execute.
This code uses system tables to show the partitioned storage and the number of rows in each
partition. Note that there are two empty partitions; one at the beginning of the table, and one at the
end.
Split a Partition
1. Select the code under the comment Add a new filegroup and make it the next used, and then
click Execute. This creates a new filegroup named FG2003 and adds it to the partition scheme as the
next used partition.
2. Select the code under the comment Split the empty partition at the end, and then click Execute.
This creates a new partition for values of 20030101 and higher and assigns it to the next used
filegroup (FG2003), leaving an empty partition for values between 20020101 and 20030100.
3. Select the code under the comment Insert new data, and then click Execute. This inserts two new
rows into the partitioned table.
4. Select the code under the comment View partition metadata, and then click Execute. This shows
that the two rows inserted in the previous step are in partition 4, and that partition 5 (on FG2003) is
empty.
Merge Partitions
1. Select the code under the comment Merge the 2000 and 2001 partitions, and then click Execute.
This merges the partition that contains the value 20010101 into the previous partition.
2. Select the code under the comment View partition metadata, and then click Execute. This shows
that partition 2 (on FG2000) now contains four rows, and that the partition that was previously on
FG2001 has been removed.
Indexes improve the performance of data warehouse queries, but they also add overhead to data loads
in the form of slower inserts and updates, and index maintenance. However, in most data warehouse scenarios, you should
consider the guidelines in this topic as a starting point for index design.
Create a clustered index on the surrogate key column. This column is used to join the dimension table
to fact tables, and a clustered index will help the query optimizer minimize the number of reads
required to filter fact rows.
Create a non-clustered index on the alternate key column and include the SCD current flag, start
date, and end date columns. This index will improve the performance of lookup operations during
ETL data loads that need to handle slowly changing dimensions.
Create non-clustered indexes on frequently searched attributes, and consider including all members
of a hierarchy in a single index.
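For example, for a customer dimension with illustrative column names, these guidelines might produce indexes similar to the following:

-- Clustered index on the surrogate key used to join to fact tables.
CREATE CLUSTERED INDEX CIX_DimCustomer_CustomerKey
ON dbo.DimCustomer (CustomerKey);

-- Non-clustered index on the alternate key, including the SCD metadata columns,
-- to support lookups during ETL data loads.
CREATE NONCLUSTERED INDEX IX_DimCustomer_AltKey
ON dbo.DimCustomer (CustomerAltKey)
INCLUDE (CurrentFlag, StartDate, EndDate);

-- Non-clustered index covering the members of a commonly used geography hierarchy.
CREATE NONCLUSTERED INDEX IX_DimCustomer_Geography
ON dbo.DimCustomer (Country, StateProvince, City);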
Columnstore indexes
SQL Server 2012 introduces columnstore indexes, an in-memory indexing solution that uses xVelocity
compression technology to organize index data in a column-based format instead of the row-based
format used by traditional indexes. Columnstore indexes are specifically designed to improve the
performance of queries against large fact tables joined to smaller dimension tables in a star schema, and
can dramatically improve the performance of most data warehouse queries. In many cases, you can
achieve the same performance improvements or better by replacing the recommended fact table indexes
described previously with a single columnstore index that includes all of the columns in the fact table.
There are some queries that do not benefit from columnstore indexes (for example, queries that return an
individual row from a dimension table will generally perform better by using a conventional clustered or
non-clustered index), but for most typical data warehouse queries that aggregate many fact rows by one
or more dimension attributes, columnstore indexes can be very effective.
Note: For more information about columnstore indexes, see “Columnstore Indexes” in SQL
Server Books Online.
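For example, a single columnstore index that includes all of the columns in a fact table (illustrative names) can be created like this:

-- In SQL Server 2012, columnstore indexes are non-clustered, and creating one
-- makes the underlying table read-only.
CREATE NONCLUSTERED COLUMNSTORE INDEX CSIX_FactOrder
ON dbo.FactOrder (OrderDateKey, ProductKey, CustomerKey, SalesAmount);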
A consideration for using columnstore indexes is that a table with a columnstore index defined on any of
its columns is read-only. For most fact tables, this restriction does not affect user query workloads,
because data warehouses are designed to support reporting and analytical queries, not transaction
processing. However, when the ETL process needs to load new data or update existing fact rows, a
columnstore index must be dropped and recreated. For an unpartitioned fact table, the overhead of
recreating the columnstore index after each data load can be significant. However, if the table is
partitioned, you can use the ability to switch partitions and staging tables to load or update data without
dropping the columnstore index.
Note: Techniques for loading partitioned fact tables with columnstore indexes are
discussed in Module 4: Designing an ETL Solution.
Demonstration Steps
Create Indexes on Dimension Tables
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Start SQL Server Management Studio and connect to the localhost instance of the database engine
by using Windows authentication.
4. Select the code under the comment Create the data warehouse, and then click Execute. This
creates a database for the demonstration.
5. Select the code under the comment Create the DimDate dimension table, and then click Execute.
This creates a time dimension table named DimDate.
6. Select the code under the comment Populate DimDate with values from 2 years ago until the end of
this month, and then click Execute. This adds rows to the DimDate table.
7. Select the code under the comment Create indexes on the DimDate table, and then click Execute.
This creates a clustered index on the surrogate key column, and non-clustered indexes on commonly
queried attribute columns.
8. Select the code under the comment Create the DimCustomer table, and then click Execute. This
creates a dimension table named DimCustomer and inserts some customer data.
9. Select the code under the comment Create indexes on the DimCustomer table, and then click
Execute. This creates a clustered index on the surrogate key column, and non-clustered indexes on
commonly queried attribute columns.
10. Select the code under the comment Create the DimProduct table, and then click Execute. This
creates a dimension table named DimProduct and inserts some product data.
11. Select the code under the comment Create indexes on the DimProduct table, and then click
Execute. This creates a clustered index on the surrogate key column, and non-clustered indexes on a
commonly queried attribute column.
1. Select the code under the comment Create a fact table, and then click Execute. This creates a fact
table named FactOrder that contains more than 7.5 million rows from the existing data in the
dimension tables.
3. Select the code under the comment View index usage and execution statistics, and then click
Execute. This enables statistics messages and queries the tables in the data warehouse to view orders
for the previous six months.
4. After query execution completes, in the results pane, click the Messages tab. Note the logical reads
from each table―the number from the FactOrder table should be considerably higher than the
dimension tables; and note the CPU time and elapsed time for the query.
5. Click the Execution Plan tab, which shows a visualization of the steps the query optimizer used to
execute the query. Scroll to the right and to the bottom, and note that a table scan was used to read
data from the FactOrder table. Then hold the mouse pointer over each of the Index Scan icons for
the dimension tables to see which indexes were used.
6. Execute the selected code again and compare the results when the data is cached.
1. Select the code under the comment Create traditional indexes on the fact table, and then click
Execute. This creates a clustered index on the date dimension key, and non-clustered indexes on the
other dimension keys (the operation can take a long time).
2. Select the code under the comment Empty the cache, and then click Execute. This clears any cached
data.
3. Select the code under the comment Test the traditional indexes, and then click Execute. This
executes the same query as earlier.
4. Click the Messages tab and compare the number of logical reads for the FactOrder table and the
CPU and elapsed time values with the previous execution. They should all be lower.
5. Click the Execution Plan tab and note that the clustered index on the date key in the fact table was
used.
6. Execute the selected code again and compare the results when the data is cached.
2. Select the code under the comment Create a columnstore index on the copied table, and then
click Execute. This creates a columnstore index on all columns in the FactOrderCS table.
3. Select the code under the comment Empty the cache again, and then click Execute. This clears any
cached data.
4. Select the code under the comment Test the columnstore index, and then click Execute. This
executes the same query as earlier.
5. Click the Messages tab and compare the number of logical reads for the FactOrderCS table and the
CPU and elapsed time values with the previous execution. They should all be lower.
6. Click the Execution Plan tab and note that the columnstore index on the fact table was used.
7. Execute the selected code again and compare the results when the data is cached.
Data Compression
SQL Server 2012 Enterprise edition supports data
compression at both page and row level. Row
compression stores all fields in a variable width
format and reduces the number of bytes used to
store each field if possible. Page compression
applies the same compression technique to rows
on a page and also identifies redundant values
and stores them only once per page. You can
apply compression to a table, an index, or a
partition.
Reduced storage requirements. Although results vary, on average, most data warehouses can be
compressed at a ratio of 3.5 : 1, reducing the amount of disk space required to host the data files by
more than two thirds.
Improved query performance. Compression can improve query performance in two ways. First,
fewer pages must be read from disk, so I/O is reduced; and second, more data can be stored on a
page, and therefore cached.
When page or row compression is used, data must be compressed and decompressed by the CPU, so the
performance gains resulting from compression must be balanced against the increase in CPU workload.
However, in most adequately specified data warehouse servers, the additional workload on CPU is
minimal compared to the benefits gained by compressing the data.
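For example, you can estimate the savings and then apply page compression to a partition of a fact table by using statements similar to the following (the object names are illustrative):

-- Estimate how much space page compression would save for the fact table.
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'FactOrder',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild a single partition of the table with page compression.
ALTER TABLE dbo.FactOrder
REBUILD PARTITION = 2 WITH (DATA_COMPRESSION = PAGE);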
Demonstration Steps
Create Uncompressed Tables and Indexes
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Use Windows Explorer to view the contents of the D:\Demofiles\Mod03 folder, and set the folder
window to Details view and resize it if necessary so that you can see the Size column.
3. Start SQL Server Management Studio and connect to the localhost instance of the database engine
by using Windows authentication.
5. Select the code under the comment Create the data warehouse (from line 2 to line 113 in the script),
and then click Execute. This creates a database with uncompressed tables.
6. While the script is still executing, view the contents of the D:\Demofiles\Mod03 folder and note the
increasing size of DemoDW.mdf. This is the data file for the database.
Note: The log file (DemoDW.ldf) will also be growing, but you can ignore this.
7. When execution is complete (after approximately 3 minutes), view the final size of DemoDW.mdf and
return to SQL Server Management Studio.
1. Select the code under the comment Estimate size saving (line 119 in the script), and then click
Execute. This uses the sp_estimate_data_compression_savings system stored procedure to
compress a sample of the FactOrder table (which consists of a clustered index and two non-clustered
indexes).
2. View the results returned by the stored procedure, noting the current size and estimated compressed
size of each index.
1. Select the code under the comment Create a compressed version of the data warehouse (from
line 125 to line 250 in the script), and then click Execute. This creates a database with compressed
tables and indexes.
2. While the script is still executing, view the contents of the D:\Demofiles\Mod03 folder and note the
increasing size of CompressedDemoDW.mdf. This is the data file for the database.
Note: The log file (CompressedDemoDW.ldf) will also be growing, but you can ignore this.
3. When execution is complete (after approximately 3 minutes), compare the final size of
CompressedDemoDW.mdf with DemoDW.mdf (the file for the compressed database should be
smaller) and return to SQL Server Management Studio.
1. Select the code under the comment Compare query performance (from line 255 to line 277 in the
script), and then click Execute. This executes an identical query in the compressed and uncompressed
databases and displays execution statistics.
2. When execution is complete, click the Messages tab and compare the statistics for the two queries.
The execution time statistics (the second and third set of figures labeled “SQL Server Execution Time”)
should be similar, and the second query (in the compressed database) should have used considerably
fewer logical reads for each table than the first.
Do not include metadata columns in views. Some columns are used for ETL operations or other
administrative tasks, and can be omitted from views that will be consumed by business users. For
example, SCD current flag, start date, and end date columns may not be required for end user
reporting or data models, so you can create views that do not include them.
Create views to combine snowflake dimension tables. If you have included snowflake dimensions in
your dimensional model, create a view for each set of related dimension tables to create a single
logical dimension table.
Partition-align indexed views. SQL Server supports indexed views, which can be partitioned using the
same partition scheme as the underlying table. If you use indexed views, you should partition-align
them to support partition switching that does not require the indexes on the views to be dropped
and recreated.
Use the SCHEMABINDING option. This ensures that the underlying tables cannot be dropped or
modified in such a way as to invalidate the view unless the view itself is dropped first. The
SCHEMABINDING option is a requirement for indexed views.
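For example, a view that combines snowflaked product tables into a single logical dimension, omits ETL metadata columns, and uses the SCHEMABINDING option might look like the following sketch (the table and column names are illustrative):

-- Combine snowflake dimension tables into a single logical product dimension.
-- SCD metadata columns (current flag, start date, end date) are deliberately omitted.
CREATE VIEW dbo.Product
WITH SCHEMABINDING
AS
SELECT p.ProductKey, p.ProductName, s.SubcategoryName, c.CategoryName
FROM dbo.DimProduct AS p
JOIN dbo.DimProductSubcategory AS s ON p.SubcategoryKey = s.SubcategoryKey
JOIN dbo.DimProductCategory AS c ON s.CategoryKey = c.CategoryKey;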
MCT USE ONLY. STUDENT USE PROHIBITED
Designing Business Intelligence Solutions with Microsoft SQL Server 2012 3-33
Objectives
After completing this lab, you will be able to:
Review your database schema design from the previous lab. If you did not complete the previous lab,
review DW Schema.vsdx in the D:\Labfiles\Lab03B\Starter folder. Note any fact tables that are likely to
become very large.
Discuss and agree on a storage plan that uses some or all of the available drives and includes
considerations for all aspects of the data warehouse.
You can refer to the information in the “Considerations for Database Files” topic in the “Designing a
Data Warehouse Physical Implementation” lesson.
Document your planned usage of the logical drives in the AWDataWarehouse.docx document in the
D:\Labfiles\Lab03B\Starter folder.
Results: At the end of this exercise, you should have a document that contains a table describing your
planned usage for each logical volume of the data warehouse server.
1. Plan Partitioning.
2. Plan Indexes.
3. Plan Compression.
4. Plan Views.
If you plan to partition any tables, decide which column you will partition the table on, and the range
of data values to be allocated to each partition.
If you want to experiment with your partitioned table design, create a test database in the localhost
instance of SQL Server.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about partitioned tables and indexes.
Document your partitioning plan in the AWDataWarehouse.docx document you used in the previous
exercise, including justification for your decisions.
If you plan to include any indexes, decide the columns to be indexed in each table, and the types of
index to be used.
If you want to experiment with indexes, create a test database in the localhost instance of SQL
Server.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about indexes.
Document your index plan in the AWDataWarehouse.docx document you used in the previous
exercise, including justification for your decisions.
If you plan to use compression, decide which tables and indexes you will compress, and the type of
compression to be used.
If you want to experiment with compression, create a test database in the localhost instance of SQL
Server.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about compression.
Document your compression plan in the AWDataWarehouse.docx document you used in the previous
exercise, including justification for your decisions.
If you want to experiment with views, create a test database in the localhost instance of SQL Server.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about views.
Document your view plan in the AWDataWarehouse.docx document you used in the previous
exercise, including justification for your decisions.
Results: At the end of this exercise, you will have a document that contains information about your
plans for partitions, indexes, compression, and views in the data warehouse.
Question: As in the previous lab, there is no definitive correct solution, but a sample
solution has been provided. To view the sample data warehouse implementation, run Setup
Solution.cmd in the D:\Labfiles\Lab03B\Solution folder as Administrator. Then, after the
script finishes running, use SQL Server Management Studio to connect to the MIA-SQLDW
instance of the database engine and examine the AWDataWarehouse database. You can use
the AWDataWarehouse.docx document in the D:\Labfiles\Lab03B\Solution folder as a guide
to the key features of the solution.
After spending some time reviewing the solution, what are the key aspects of the
implementation that differ from your design in the lab, and how else might you have
designed the solution?
Module 4
Designing an ETL Solution
Contents:
Module Overview 4-1
Module Overview
The extract, transform, and load (ETL) element of the business intelligence (BI) solution is what makes it
possible to provide up-to-date analytical and reporting data in the data warehouse. Although some ETL
processes can be simple, many BI professionals find the design and implementation of effective ETL
solutions to be the most challenging aspect of creating a BI solution.
This module describes some general considerations for designing an ETL solution, and then it discusses
specific considerations for planning data extraction, transformation, and load processes.
Objectives
After completing this module, you will be able to:
Lesson 1
ETL Overview
When planning an ETL solution, your first step is to gain a high level understanding of the data flows that
must be performed to copy data from source systems to the tables in the data warehouse. This lesson
introduces commonly used ETL architectures that you should consider, and some useful techniques for
planning and documenting data flows that will help you design and maintain your ETL solution.
Lesson Objectives
After completing this lesson, you will be able to:
ETL in a BI Project
ETL design is closely related to data warehouse
design. Indeed, many data warehouse design
decisions, such as metadata columns for slowly
changing dimensions, indexes, and partitioning,
are made with ETL processes in mind. As with all
aspects of a BI solution, the ETL design is driven by
business requirements. These requirements dictate
constraints for the ETL design, including:
Whether the ETL solution must support updates to existing fact records.
How long data must be stored in the data warehouse for analysis and reporting before being deleted
or archived.
In addition to the business requirements, when planning an ETL solution, you need to consider the
following:
What should be done with extracted rows that fail data validation requirements?
How will exceptions in the data flow be handled, logged, and communicated?
How frequently data is generated in source systems, and how long it is retained.
Suitable times to extract source data while minimizing the impact on performance for users.
Single-stage ETL
In a very small BI solution with few data sources and simple data requirements, it may be possible to copy
data from data sources to the data warehouse in a single data flow. Basic data validations (such as
checking for NULL fields or specific value ranges) and transformations (such as concatenating multiple
fields into a single field, or looking up a value from a key) can either be performed during extraction (for
example, in the Transact-SQL statement used to retrieve data from a source database) or in-flight (for
example, by using transformation components in an SSIS data flow task).
Two-stage ETL
In many cases, a single-stage ETL solution is not suitable because of the complexity or volume of data
being transferred. Additionally, if multiple data sources are used, it is common to synchronize loads of the
data into the data warehouse to ensure consistency and integrity across fact and dimension data from
different sources, and to minimize the performance impact of the load operations on data warehouse
activity. If the data is not ready to extract from all systems at the same time, or if some sources are only
available at specific times when others are not available, a common approach is to stage the data in an
interim location before loading it into the data warehouse.
Typically, the structure of the data in the staging area is similar to the source tables, which minimizes the
extraction query complexity and duration in the source systems. When all source data is staged, it can
then be conformed to the data warehouse schema during the load operation―either as it is extracted
from the staging tables or during the data flow to the data warehouse.
Staging the data also provides a recovery point for data load failures and enables you to retain extracted
data for audit and verification purposes.
Three-stage ETL
A two-stage data flow architecture can reduce the extraction overhead on source systems and enable a
coordinated load of data from multiple sources. However, performing validation and transformations
during the data flow into the data warehouse can affect load performance, and cause the load to
negatively affect data warehouse activity. When large volumes of data must be loaded into the data
warehouse, it is important to minimize load times by preparing the data as much as possible before
performing the load operation.
For BI solutions that involve loading large volumes of data, a three-stage ETL process is recommended. In
this data flow architecture, the data is initially extracted to tables that closely match the source system
schemas―often referred to as a “landing zone.” From here, the data is validated and transformed as it is
loaded into staging tables that more closely resemble the target data warehouse tables. Finally, the
conformed and validated data can be loaded into the data warehouse tables.
As with high-level data flow diagrams, many BI professionals have adopted different variations of source-to-
target mapping. If the organization in which you are working does not have a standard format for this
kind of documentation, the important thing is to use a consistent format that is helpful during ETL design
and easy to understand for anyone who needs to troubleshoot or maintain the ETL system in the future.
Lesson 2
Planning Data Extraction
The first stage in any ETL process is to extract data from source systems. This lesson describes some
important considerations for planning and designing data extraction processes.
Lesson Objectives
After completing this lesson, you will be able to:
What data types and formats are used in each source system?
Next, examine the data sources to determine data type compatibility with the target tables in the data
warehouse. In many cases, you will need to use transformations to cast or convert source data types into
compatible target data types. Common data type issues include:
o Numeric data that is stored in text format in source systems―for example, a text file exported
from an accounts system is likely to contain numeric values as text fields.
o Numeric data that is the wrong numeric type―for example, a decimal column in a source table
that must be mapped to an integer column in the target table.
o Numeric data that is in the right data type but the wrong size―for example, an integer in a
source system that must be mapped to a tinyint column in the target table.
o Variations in date and time formats―for example, a datetime column in the source that must be
mapped to a date column in the target table.
o Text data that may need to be truncated―for example, a source nvarchar(100) column mapped
to a target nvarchar(50) column.
o Variations in data encoding―a simple example is an ASCII source column mapped to a Unicode
target column (such as a varchar field mapped to an nvarchar field). More complex issues can
arise if, for example, mainframe data in EBCDIC or packed decimal encoding must be extracted.
What data integrity and validation issues exist in the source data?
After documenting the source and target data types, examine the source data to identify any data
integrity issues. Common issues include:
o Columns with a high proportion of null values. You need to decide how to handle null values
in the data flow, by ignoring them, using ISNULL to assign an alternative value (such as
“unknown” or -1), or by redirecting them to an interim table for assessment and cleansing.
o Lookup columns with missing values. In most relational databases, referential integrity is
enforced between foreign keys and their corresponding primary key columns. However, you
cannot always rely on this, and you should try to identify cases where a lookup value in one table
does not have a matching row in the lookup table. If such rows exist, you may need to use an
OUTER join combined with an ISNULL expression to extract all rows.
o Columns with a specific range of valid values. For example, a Gender column might be
defined with a char(1) data type. All non-null values in this column should be either M or F. You
should examine the data source to determine whether this validation rule is enforced; and if it
isn’t, you should decide how you will find and handle invalid rows in the data flow.
o Data quality issues. Spend time with a business user who understands the data in context, and
identify potential data quality issues such as mistyped or misspelled values in free-form text entry
fields, or multiple values that mean the same thing.
Use SQL Server Management Studio to examine table metadata and data in SQL Server data sources.
Extract a sample of data to a text file by using the Import and Export Wizard or an SSIS package, and
examine the data in Excel.
Use the Data Profiling task in an SSIS package to gather statistics about the source data.
If the source data has potential data quality issues, consider using Data Quality Services to identify and
resolve them.
Another factor in planning data extraction timings is the requirement for the data warehouse to be kept
up to date with changes in the source systems. If real-time (or near real-time) reporting must be
supported, data must be extracted and loaded into the data warehouse as soon as possible after each
change. Alternatively, if all reporting and analysis is historical, you may be able to leave a significant
period of time (for example, a month) between data warehouse loads. However, note that you do not
need to match data extractions one-to-one with data loads. If less overhead is created in the data source
by a nightly extraction of the day’s changes than a monthly extraction, you might choose to stage the
data nightly, and then load it into the data warehouse in one load operation at the end of the month.
Perform a test extraction and note the time taken to extract a specific number of rows. Then, based on
how many new and modified rows are created in a particular time period, estimate the time an extraction
would take if performed hourly, daily, weekly, or at any other interval that makes sense based on your
answers to the first two questions.
During what time periods are source systems least heavily used?
Some data sources may be available only during specific periods, and others might be too heavily used
during business hours to support the additional overhead of an extraction process. You must work closely
with the administrators and users of the data sources to identify the ideal data extraction time periods for
each source.
After you consider these questions for all source systems, you can start to plan extraction windows for the
data. Note that it is common to have multiple sources with different extraction windows, so that the
elapsed time to stage all of the data might be several hours or even days.
Lesson 3
Planning Data Transformation
Most ETL processes require data transformations to conform source data to target table schemas. This
lesson describes some important considerations for planning and designing data transformations.
Lesson Objectives
After completing this lesson, you will be able to:
Describe considerations for choosing where in the data flow to perform transformations.
Performing transformations on
extraction
If the data sources support it, you can perform
transformations in the queries used to extract
data. For example, in a SQL Server data source,
you can use joins, ISNULL expressions, CAST and
CONVERT expressions, and concatenation
expressions in the SELECT query used to extract the data. In an enterprise BI solution, this technique can
be used during the following extractions:
Extraction from the source system.
Extraction from staging tables.
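For example, a Transact-SQL extraction query against a source system might perform several simple transformations in-line, as in the following sketch (the source table and column names are illustrative):

-- Perform basic transformations during extraction from the source database.
SELECT c.CustomerID,
       LTRIM(RTRIM(c.FirstName)) + N' ' + LTRIM(RTRIM(c.LastName)) AS CustomerName, -- concatenate fields
       ISNULL(c.Phone, N'Unknown') AS Phone,                -- replace NULL values
       CAST(o.OrderTotal AS decimal(18,2)) AS OrderTotal,   -- convert the data type
       ISNULL(t.TerritoryName, N'Unassigned') AS Territory  -- lookup where referential integrity is not enforced
FROM dbo.Customer AS c
JOIN dbo.OrderHeader AS o ON o.CustomerID = c.CustomerID
LEFT OUTER JOIN dbo.SalesTerritory AS t ON c.TerritoryID = t.TerritoryID;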
Minimize the extraction workload on source systems. This enables you to extract the data in the
shortest time possible with minimal adverse effect on business processes and applications that use the
data source.
Perform validations and transformations in the data flow as soon as possible. This enables you to
remove or redirect invalid rows and unnecessary columns early in the extraction process and reduce
the amount of data being transferred across the network.
Minimize the time it takes to load the data warehouse tables. This enables you to get the new data
into production as soon as possible and perform the load with minimal adverse effect on data
warehouse users.
The following examples show equivalent techniques for common transformations, performed either in the
Transact-SQL extraction query or in an SSIS data flow:
Data type conversion. On extraction, use the CAST or CONVERT function. In the data flow, use the Data
Conversion transformation.
Replacing NULL values. On extraction, use the ISNULL function. In the data flow, use the Derived Column
transformation with an expression containing the REPLACENULL function.
Looking up related values where referential integrity is not enforced. On extraction, use an OUTER JOIN,
and optionally use ISNULL to replace null values where no matching rows exist. In the data flow, use the
Lookup transformation with the Ignore failure option, and then add a transformation later in the data
flow to handle null values (either by replacing them with a Derived Column or redirecting them with a
Conditional Split). Alternatively, use the Redirect rows to no match output option and handle the nulls
before using a Merge transformation to return the fixed rows to the main data flow.
Note: Some people who are not familiar with SSIS make the erroneous assumption that the
data flow processes rows sequentially, and that transformations in a data flow are inherently
slower than set-based transformations performed with Transact-SQL. However, the SSIS pipeline
performs set-based operations on buffered batches of rows, and it is designed to provide high
performance transformation in data flows.
Using the Conditional Split transformation that uses expressions to validate column values, and
redirects rows to multiple outputs based on the results of the expression evaluations.
Using the No Match Output of a Lookup transformation to redirect rows for which there is no
matching value in a related table.
Using the Error Output of a source, transformation, or destination to redirect rows that cause data
flow errors or that will be truncated.
Handling errors
In addition to planning to redirect invalid rows, you should plan a consistent error handling and logging
solution to make it easier to troubleshoot problems with the ETL system. If you plan to deploy the SSIS
packages in project mode, the SSIS Catalog provides detailed event logging that can be used to
troubleshoot errors. However, you may want to consider creating a custom solution over which you have
greater control. A common approach is to create a table for generic error event information, and use
event handlers to log details of the package being executed, the date and time of the error, the
component that caused the exception, and any other useful diagnostic information. Additionally, you
should consider creating an error table for each target table in the ETL process, and log table-specific data
there.
Lesson 4
Planning Data Loads
A key challenge in loading a data warehouse is minimizing the time it takes to load a large volume of
data. Data warehouse loads often involve tens or even hundreds of thousands of rows, and although
many organizations can support loading during periods of low use or inactivity, the load operation must
still be optimized to complete within the available load window.
This lesson describes considerations and techniques to help you plan an efficient data load process.
Lesson Objectives
After completing this lesson, you will be able to:
Minimizing Logging
One way in which load times can be reduced is to
minimize database transaction logging during the
load operations. SQL Server uses a write-ahead
transaction log to log transactional activity in the
database, and this logging overhead can affect
load performance. However, you can use
minimally logged operations, in which only extent
allocations and metadata changes are logged, to
reduce the adverse impact of logging. In most
cases, using the TABLOCK query hint causes the
database engine to use minimal logging if it is
supported by the operation being performed and
the destination table.
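For example, when loading a heap in the data warehouse from a staging table with Transact-SQL, the TABLOCK hint requests a minimally logged, bulk-load style insert. The following is a sketch that assumes the Simple or Bulk-Logged recovery model and illustrative table names:

-- Minimally logged load of staged rows into a heap in the data warehouse.
INSERT INTO dbo.FactOrder WITH (TABLOCK)
    (OrderDateKey, ProductKey, CustomerKey, SalesAmount)
SELECT OrderDateKey, ProductKey, CustomerKey, SalesAmount
FROM Staging.dbo.StageFactOrder;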
Use a SQL Server Destination component when both the source data and the SSIS service are hosted
on the same server as the destination tables. This destination provides the fastest bulk load
performance for SQL Server and supports bulk load options to fine-tune load behavior. You can use
this destination only if the SSIS service is running on the same server as the data warehouse into
which the data is being loaded.
If SSIS is hosted on a different computer from the data warehouse, use an OLE DB Destination
component. This destination supports bulk load, though some additional configuration may be
required to support bulk loading of ordered data into a clustered index. Ordered loads into clustered
indexes are discussed in the next topic.
Additionally, when you combine a MERGE statement with the OUTPUT clause, you can use the $action metadata column to detect updates and implement type 2 changes, as shown in the following code example.
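A minimal sketch of this pattern, assuming a hypothetical dbo.DimCustomer dimension with a CustomerAltKey business key, a City attribute tracked as a type 2 change, and StartDate and EndDate columns, loaded from a hypothetical stg.Customer staging table. Note that this composable form has restrictions (for example, the outer INSERT target cannot have triggers or foreign key relationships):
INSERT INTO dbo.DimCustomer (CustomerAltKey, CustomerName, City, StartDate, EndDate)
SELECT CustomerAltKey, CustomerName, City, GETDATE(), NULL
FROM
(
    MERGE dbo.DimCustomer AS tgt
    USING stg.Customer AS src
    ON tgt.CustomerAltKey = src.CustomerAltKey AND tgt.EndDate IS NULL
    WHEN NOT MATCHED BY TARGET THEN
        -- New customer: insert a current row.
        INSERT (CustomerAltKey, CustomerName, City, StartDate, EndDate)
        VALUES (src.CustomerAltKey, src.CustomerName, src.City, GETDATE(), NULL)
    WHEN MATCHED AND tgt.City <> src.City THEN
        -- Changed customer: expire the current row.
        UPDATE SET tgt.EndDate = GETDATE()
    OUTPUT $action, src.CustomerAltKey, src.CustomerName, src.City
) AS changes (MergeAction, CustomerAltKey, CustomerName, City)
WHERE MergeAction = 'UPDATE'; -- insert a new current row for each expired row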
Additional Reading: For more information about non-logged and minimally logged
operations, see “The Data Loading Performance Guide” at http://msdn.microsoft.com/en-
us/library/dd425070(v=sql.100).
Additional Reading: For more information about when to drop an index to optimize a
data load, see “Guidelines for Optimizing Bulk Import” at http://msdn.microsoft.com/en-
us/library/ms177445.aspx. This article is the source of the information in the preceding table.
Sort data by the clustering key and specify the ORDER hint
When using the BULK INSERT statement, if the table you are loading has a clustered index that you do not
intend to drop, and the data to be loaded is already sorted by the clustering column, specify the ORDER
hint, as shown in the following example. This eliminates the internal sort operation that usually occurs
when inserting data into a clustered index.
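A minimal sketch, assuming a hypothetical dbo.FactInternetSales table with a clustered index on OrderDateKey and a staged data file at a hypothetical path:
BULK INSERT dbo.FactInternetSales
FROM 'D:\Staging\FactInternetSales.dat'
WITH
(
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',
    ORDER (OrderDateKey ASC), -- the file is already sorted by the clustering key
    TABLOCK
);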
If you are using the INSERT … SELECT statement, you cannot specify the ORDER hint, but the database
engine detects that the data is ordered by the clustering key and optimizes the insert operation
accordingly. If you are using an SSIS SQL Server destination, you can specify bulk load options, including
order columns on the Advanced tab of the SQL Destination Editor dialog box. For OLE DB destinations,
you must view the Component Properties tab of the Advanced Editor for the destination, and add the
ORDER(ColumnName) hint to the FastLoadOptions property.
Alternatively, consider partitioning the table and using the partition switch technique described in the
next topic to load new data.
The load table must have a check constraint that uses the same criteria as the partition function.
To use this technique to load new data into a partition, maintain an empty partition at the end of the
table. The lower bound of the partition range for the empty partition should be the date key value for the
next set of data to be loaded. The basic technique to load new data into a partition uses the following
procedure:
1. If each partition is stored on its own filegroup, add a filegroup to the database and set it as the next
used filegroup for the partition scheme.
2. Split the empty partition at the end of the table, specifying the key for the upper bound of the data to
be loaded. This should create an empty partition for the new data, and a second empty partition to
be maintained at the end of the table for the next load cycle.
3. Create a table on the same filegroup as the second-to-last (empty) partition, with the same columns
and data types as the partitioned table. For fastest load performance, create this table as a heap (a
table with no indexes).
4. Bulk insert the staged data into the load table you created in the previous step.
5. Add a constraint to the load table that checks that the partitioning key column values are within the range of the target partition.
6. Add indexes to the load table that match the indexes on the partitioned table.
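A minimal Transact-SQL sketch of this procedure, followed by the final partition switch. All object names (AWDataWarehouse, PF_FactSales, PS_FactSales, dbo.FactSales, stg.FactSales, and the FG_2002 and FG_2003 filegroups) are hypothetical, and index creation is omitted:
-- 1. Add a filegroup and a file, and mark the filegroup as next used.
ALTER DATABASE AWDataWarehouse ADD FILEGROUP FG_2003;
ALTER DATABASE AWDataWarehouse ADD FILE
    (NAME = 'FG_2003_Data', FILENAME = 'D:\Data\FG_2003.ndf') TO FILEGROUP FG_2003;
ALTER PARTITION SCHEME PS_FactSales NEXT USED FG_2003;

-- 2. Split the empty partition at the end of the table.
ALTER PARTITION FUNCTION PF_FactSales() SPLIT RANGE (20030101);

-- 3 and 4. Create a heap load table on the filegroup of the partition to be loaded, and load it.
CREATE TABLE dbo.LoadFactSales (ShipDateKey int NOT NULL, SalesAmount money NOT NULL) ON FG_2002;
INSERT INTO dbo.LoadFactSales (ShipDateKey, SalesAmount)
SELECT ShipDateKey, SalesAmount FROM stg.FactSales;

-- 5 and 6. Add a check constraint that matches the target partition range (and matching indexes),
-- then switch the load table into the partition.
ALTER TABLE dbo.LoadFactSales WITH CHECK
    ADD CONSTRAINT CK_LoadFactSales_ShipDateKey CHECK (ShipDateKey >= 20020101 AND ShipDateKey < 20030101);
ALTER TABLE dbo.LoadFactSales SWITCH TO dbo.FactSales PARTITION $PARTITION.PF_FactSales(20020101);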
This technique works best when the table is partitioned on a date key that reflects the data warehouse
load cycle, so each new load is performed into a new partition. However, it can also be used when
partitions do not match load intervals:
When partitions are based on more frequent intervals than load cycles (for example, each partition
holds a week’s worth of data, but the data is loaded monthly), you can switch multiple load tables
into multiple partitions.
When partitions are based on less frequent intervals than load cycles (for example, each partition
holds a month’s worth of data, but the data is loaded daily), you can:
o Create a new partition for the load and then merge it with the previous partition.
o Switch out a partially loaded partition, drop the indexes on the partially populated load table,
insert the new rows, recreate the indexes, and switch the partition back in. This technique can
also be used for late arriving facts (rows that belong in partitions that have previously been
loaded) and updates.
Additional Reading: For more information about loading partitioned fact tables, see
“Loading Bulk Data into a Partitioned Fact Table” at http://technet.microsoft.com/en-
us/library/cc966380.aspx.
Demonstration Steps
Split a Partition
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. Start SQL Server Management Studio and connect to the localhost instance of the database engine
by using Windows authentication.
3. Open Partitions.sql from the D:\Demofiles\Mod04 folder.
4. Select the code under the comment Create a database (from line 1 to line 45), and then click
Execute. This creates a database with a partitioned fact table, on which a columnstore index has been
created.
5. Select the code under the comment View partition metadata, and then click Execute. This shows
the partitions in the table with their starting and ending range values, and the number of rows they
contain. Note that the partitions are shown once for each index (or for the heap if no clustered index
exists). Note that the final partition (4) is for key values of 20020101 or higher and currently contains
no rows.
6. Select the code under the comment Add a new filegroup and make it the next used, and then
click Execute. This creates a filegroup, and configures the partition scheme to use it for the next
partition to be created.
7. Select the code under the comment Split the empty partition at the end, and then click Execute.
This splits the partition function to create a new partition for keys with the value 20030101 or higher.
8. Select the code under the comment View partition metadata again, and then click Execute. This
time the query is filtered to avoid including the same partition multiple times. Note that the table
now has two empty partitions (4 and 5).
2. Select the code under the comment Bulk load new data, and then click Execute. This inserts the
data to be loaded into the load table (in a real solution, this would typically be bulk loaded from
staging tables).
3. Select the code under the comment Add constraints and indexes, and then click Execute. This adds
a check constraint to the table that matches the partition function criteria, and a columnstore index
that matches the index on the partitioned table.
Switch a Partition
1. Select the code under the comment Switch the partition, and then click Execute. This switches the
load table with the partition on which the value 20020101 belongs. Note that the required partition
number is returned by the $PARTITION function.
2. Select the code under the comment Clean up and view partition metadata, and then click Execute.
This drops the load table and returns the metadata for the partitions. Note that partition 4 now
contains two rows―these are the rows that were inserted into the load table.
Objectives
After completing this lab, you will be able to:
Prepare for ETL design.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Use Microsoft Visio to open the DW Schema.vsxd diagram in the D:\Labfiles\Lab04\Starter folder,
and then examine the Reseller Sales and Internet Sales dimensional models.
Note that these diagrams indicate the columns in the dimension and fact tables, and the slowly
changing dimension (SCD) type for historical dimension attributes.
Use SQL Server Management Studio to examine the columns and data types in the following tables in
the AWDataWarehouse database in the MIA-SQLDW instance of the database engine:
o dbo.DimCustomer
o dbo.DimDate
o dbo.DimProduct
o dbo.DimPromotion
o dbo.DimReseller
o dbo.DimSalesperson
o dbo.DimSalesTerritory
o dbo.FactInternetSales
o dbo.FactResellerSales
o dbo.SalesOrderDetail
o dbo.Customer
o dbo.StateOrProvince
o dbo.Country
These tables provide the source data for the following tables in the data warehouse:
o dbo.FactInternetSales
o dbo.DimCustomer
Note: Total product cost for a sales order is calculated by multiplying the unit cost for each order line
item by the ordered quantity. Similarly, a sales amount is calculated by multiplying the unit price by the
quantity.
o dbo.SalesOrderHeader
o dbo.SalesOrderDetail
o dbo.Reseller
o dbo.BusinessType
o dbo.SalesEmployee
o dbo.SalesTerritory
o dbo.SalesRegion
o dbo.StateOrProvince
o dbo.Country
These tables provide the source data for the following tables in the data warehouse:
o dbo.FactResellerSales
o dbo.DimReseller
o dbo.DimSalesperson
o dbo.DimSalesTerritory
Note: Total cost and sales amount for reseller orders are calculated the same way as for Internet orders.
The sales territory for a sales order is determined by the sales territory where the reseller placing the order
is located, not by the sales territory assigned to the salesperson. Sales territories are often reassigned
between salespeople, but resellers stay within a single sales territory.
Explore the dbo.Promotions table in the Marketing database. This table provides the source data
for the DimPromotion table in the data warehouse.
Note: The MarketingPromotion column in the SalesOrderHeader table in the InternetSales database
contains the PromotionID value from this table when an order is placed in response to a promotion.
When no promotion is associated with the order, the MarketingPromotion column contains a NULL
value.
Explore the following views in the ProductsMDS database:
o mdm.Product
o mdm.ProductSubcategory
o mdm.ProductCategory
These views provide the source data for the DimProduct table in the data warehouse.
Note: This database represents a master data hub for the product data. This data is replicated to the
InternetSales and ProductSales databases, but the ProductsMDS database contains the master version
of the data.
Results: At the end of this exercise, you will have examined the data sources for the ETL process.
On the DimCustomer page, view the data flow for the DimCustomer table, noting the following
details:
o The data flow is shown from the Customer table in the InternetSales database to the
DimCustomer table (which is in the AWDataWarehouse database).
o The steps that need to be performed during the data flow are documented next to the data flow.
o Data from the StateOrProvince and Country tables is added to the data flow during lookup
steps.
o The details of the SCD columns are shown next to the relevant steps.
Review the other pages in the Visio document, noting the details documented for each data flow.
o SCD attributes.
Add another page to document the high-level data flow for the FactResellerSales table.
On the DimCustomer worksheet, scroll to the right to view the Data Warehouse section of the map,
and note that it contains the columns in the DimCustomer table. Each row documents a data flow
from a source column to a column in the DimCustomer table.
Scroll back to the left, and note that the Source section of the worksheet contains details of the
source fields that are extracted from tables in the InternetSales database.
Examine the Landing Zone section of the worksheet, and note that it contains details of the tables
that the source data is initially extracted to, together with any validation rules or transformations that
are applied during the extraction.
Examine the Staging section of the worksheet, and note that it contains details of the staging tables
that are created from the extracted data in the landing zone, together with any validation rules or
transformations that must be applied to the data.
Examine the FactInternetSales worksheet, and note that it documents the data flow for each column
in the FactInternetSales table.
Results: At the end of this exercise, you will have a Visio document that contains high-level data flow
diagrams and an Excel workbook that contains detailed source-to-target documentation.
The table is partitioned on the ShipDateKey column, and each data load contains only records for orders
that were shipped after the previous data load. The data for the current year is partitioned by month. Data
loads are performed on the last night of the month.
o The partitions in the table and the filegroups on which they are stored.
Make a note of the details for the last partition in the table (which should currently contain no rows).
o lz.InternetSalesOrderDetails. A landing zone table that contains data extracted from the
SalesOrderDetails table in the InternetSales database.
o lz.InternetSalesOrderHeader. A landing zone table that contains data extracted from the
SalesOrderHeader table in the InternetSales database.
o stg.FactInternetSales. A staging table that contains transformed data from the landing zone
tables that is ready to be loaded into the FactInternetSales table.
View the contents of the stg.FactInternetSales table, noting the number of records it contains. Also
note that it contains alternate (business) keys for the dimension members that have already been
loaded into the dimension tables in the data warehouse.
Open the LoadFactInternetSales.dtsx package and view the Control Flow tab.
Note: The control flow for this package is deliberately simplified to focus on the task required to load
data into a partitioned table. A production package would also include tasks to log audit information,
validate the data, and manage exceptions.
View the variables in this package. They include the following user variables:
o Filegroup. Used to store the name of the filegroup containing the last (currently empty) partition
of the table (into which the staged data will be loaded).
o LastBoundary. Used to store the starting key value for the last partition currently in the table.
o NextBoundary. Used to store the key value to be used for the start of the new partition that will
be created to maintain an empty partition at the end of the table.
o PartitionNumber. Used to store the partition number of the last (currently empty) partition in
the table (into which the staged data will be loaded).
o SQL_AddPartition. A Transact-SQL script that adds a filegroup and a file to the database,
configures the PS_FactInternetSales partition scheme to use the new filegroup for the next
partition, and then splits the PF_FactInternetSales partition function to create a new partition
that starts at the NextBoundary value.
o SQL_CreateLoadTable. A Transact-SQL script that creates a table with the same structure and
compression as FactInternetSales on the filegroup that contains the last (empty) partition
(indicated by a file_group placeholder value, which will be replaced at run time).
o SQL_SwitchPartition. A Transact-SQL script that switches the loaded table with the empty
partition indicated by a partitionnumber placeholder value, which will be replaced at run time.
Double-click the Get Partition Info task to view its editor, and note the following details:
o On the General tab, the result set is a single row returned through the MIA-
SQLDW.AWDataWarehouse connection manager from a direct input query that retrieves the
partition number, filegroup, and boundary values for the last partition in FactInternetSales.
o On the Result Set tab, the PartitionNumber, FileGroup, and LastBoundary values returned in
the result set are mapped to the User::PartitionNumber, User::Filegroup, and
User::LastBoundary variables, respectively.
View the editor for the Get Next Boundary task, and note that it retrieves the LastExtract value for
the InternetSales data source from the dbo.ExtractLog table in the Staging database (converted to
a varchar with style 112 to format it as YYYYMMDD, and assigned the alias NextBoundary). The
NextBoundary value in the result set is then mapped to the User::NextBoundary variable.
View the editor for the Transact-SQL to Add Filegroup task, and note that it contains an expression
to replace the placeholder text in the User::SQL_AddPartition variable with the contents of the
User::NextBoundary variable. You can click Evaluate Expression to see the resulting Transact-SQL
code when the default variable values are used.
View the editor for the Add next filegroup task, and note that it uses the MIA-
SQLDW.AWDataWarehouse connection manager to execute the query in the
User::SQL_AddPartition variable (which was set in the previous task).
View the editor for the Transact-SQL to Create Load Table task, and note that it contains an
expression to replace the placeholder text in the User::SQL_CreateLoadTable variable with the
contents of the User::Filegroup variable. You can click Evaluate Expression to see the resulting
Transact-SQL code when the default variable values are used.
View the editor for the Create Load Table task, and note that it uses the MIA-SQLDW.AWDataWarehouse connection manager to execute the query in the User::SQL_CreateLoadTable variable (which was set in the previous task).
Double-click the Load Staged Data task to view its data flow. Note that this data flow extracts data
from the stg.FactInternetSales table in the staging database, and then it uses a series of lookup
tasks to find the surrogate dimension keys for each alternate key in the source data, before loading
the data into the load table created by the Create Load Table task examined previously.
Note: To simplify the lab, the data flow does not redirect rows with non-matching alternate keys. In a
production system, you should design the data flow to include non-match outputs from the lookup tasks
or a conditional split to redirect any rows that have no matching dimension records to a holding table for
reconciliation.
Click the Data Flow design surface and press F4 to view the Properties pane. Note that the Delay
Validation property for the Load Staged Data task is set to True. This is necessary because the table
referenced in the Load Table destination does not exist when package execution starts―it is created
dynamically at run time.
Note: To create the Load Table destination, the developer used a Transact-SQL script to create a table
named LoadInternetSales in the AWDataWarehouse database and used it as the destination table at
design time. After the data flow implementation was complete and the DelayValidation property set to
True, the table was dropped from the AWDataWarehouse database.
Return to the Control Flow design surface, and view the editor for the Transact-SQL to Add
Constraint and Index task. Note that it contains an expression to replace the placeholder text in the
User::SQL_AddConstraintAndIndex variable with the contents of the User::LastBoundary and
User::NextBoundary variables. You can click Evaluate Expression to see the resulting Transact-SQL
code when the default variable values are used.
View the editor for the Add Constraint and Index task, and note that it uses the MIA-SQLDW.AWDataWarehouse connection manager to execute the query in the User::SQL_AddConstraintAndIndex variable (which was set in the previous task).
View the editor for the Transact-SQL to Switch Partitions task, and note that it contains an expression to replace the placeholder text in the User::SQL_SwitchPartition variable with the contents of the User::PartitionNumber variable. You can click Evaluate Expression to see the resulting Transact-SQL code when the default variable values are used.
View the editor for the Switch Partitions task, and note that it uses the MIA-SQLDW.AWDataWarehouse connection manager to execute the query in the User::SQL_SwitchPartition variable (which was set in the previous task).
View the editor for the Drop Load Table task and note that it executes a DROP TABLE statement to
drop the dbo.LoadInternetSales table in the AWDataWarehouse database.
Run the package and observe the control flow as it executes. After execution completes, stop
debugging.
Use SQL Server Management Studio to re-execute the View FactInternetSales Partitions.sql script
you ran earlier, and note that the staged rows have been loaded into what was the last partition, and
that a new empty partition has been added to the end of the table.
In the LoadPartition SSIS project, add an SSIS package named Load FactResellerSales.dtsx and
implement the load process for the partitioned FactResellerSales table.
o The package should be very similar in design to the Load FactInternetSales.dtsx package.
o The D:\Labfiles\Lab04\Starter\Code Snippets folder contains text files from which you can copy
and paste suitable code for the variables and tasks you will need to create.
o To create a load table to use during development, use SQL Server Management Studio to
generate a CREATE TABLE script from the FactResellerSales table, and then change the table
name to LoadResellerSales before executing it. Remember to drop the table after you complete
the data flow implementation.
o If you test your package and it fails, you can re-run the Setup.cmd batch file in
D:\Labfiles\Lab04\Starter to reset the databases to the starting point.
Results: At the end of this exercise, you will have an SSIS package that loads data into the
FactResellerSales table by using the partition switching technique.
Question: How might your design of the SSIS package that loads the FactResellerSales
table have differed if the table was partitioned on OrderDateKey instead of ShipDateKey?
Question: In what scenarios would you consider using Transact-SQL for transformations, and
in what scenarios are SSIS data flow transformations appropriate?
Module 5
Designing Analytical Data Models
Contents:
Module Overview 5-1
Module Overview
SQL Server Analysis Services (SSAS) enables you to store and analyze data. Microsoft SQL Server 2012
provides two different Analysis Services data models that can be used within an organization:
multidimensional and tabular. Each technology is encapsulated within the single framework of the
Business Intelligence Semantic Model (BISM).
Multidimensional solutions use Online Analytical Processing (OLAP) modeling constructs such as cubes,
measures, and dimensions; and they can also be used to implement data mining solutions. Tabular data
models enable self-service BI solutions by using familiar relational modeling constructs such as tables and
relationships for modeling data. Additionally, the xVelocity in-memory analytics engine is used for storing
and calculating large volumes of data. Understanding the capabilities and differences between each
analytical data model will help you design an appropriate analytical model for your organization.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to Analytical Data Models
Analytical data models add value to the underlying data in a data warehouse. There are many similarities
between the two different kinds of analytical data model supported by SQL Server Analysis Services, and
the user experience when using them to analyze data can be almost indistinguishable. However, there are
some fundamental differences in their capabilities and in the way that they are designed and
implemented. Understanding the capabilities of each analytical data model will help you choose which
data model to use, and help you understand the considerations that will be required when implementing
a specific data model.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how analytical model design fits into a BI project.
Analytical modeling in a BI project is based on the business requirements for the BI solution. Before you
start building an analytical data model, it is important to ensure that you understand the business
requirements in order to:
Establish the measures and dimensions required within the data model.
Identify additional objects, such as KPIs, that are required to add value to the data model.
Determine which types of model support the analytical capabilities that the business requires.
Establish how many instances of SQL Server Analysis Services to install and in what mode.
By considering these factors, you can determine the appropriate data model to design. Initially, this may involve creating both a multidimensional cube and a tabular data model. From this, you can establish which data model best meets the business requirements defined within the BI project scope.
Note: In the early adoption of SQL Server 2012, it has been common for organizations to
adopt both a multidimensional cube and a tabular data model for the same solution. This has
been observed with organizations that originally had a multidimensional cube in an earlier
version of SQL Server and want to perform a parallel migration of the solution to a tabular data
model.
Tabular data models. These are new in SQL Server 2012 Analysis Services. Tabular data models
expose data in a relational format. They use the Data Analysis Expressions (DAX) language to
implement business logic, and can provide access to data by using the in-memory xVelocity engine
(previously known as VertiPaq) or, subject to limitations in the design of the model, DirectQuery mode, which provides access to data in the underlying data source.
The BI Semantic Model provides a single logical model for all end-user experiences by exposing the same
set of application programming interfaces (APIs) to client tools, regardless of the underlying data model
(tabular or multidimensional). Business users can connect to a tabular or multidimensional BI Semantic
Model by using Power View, Excel, PowerPivot or any other client tool. This approach benefits
organizations in various ways, including:
It ensures the maximum flexibility for businesses, enabling them to choose client tools without having
to consider compatibility with a specific data model.
It eliminates the risk for organizations that have already invested in a particular client technology of
becoming locked into that technology because it depends on a particular data model.
Tabular data models are generally simpler in design than multidimensional models, so companies can achieve a faster time-to-deployment for BI applications.
Business users can use Power View to create interactive visualizations from tabular data models.
Business users can use PowerPivot in Excel to create and share their own tabular data models. If a user-created PowerPivot workbook becomes heavily used and needs to be brought under centralized management, it can be easily imported into a tabular Analysis Services database.
Creators of tabular models can use DAX to create measures and calculated columns and to implement
security. DAX is similar to the formulae used in Excel workbooks, so information workers who already use
Excel should find it relatively easy to learn and use. Tabular models are suitable for a wide range of BI
scenarios, from a personal desktop BI application developed in Excel to departmental or even larger
solutions, depending on the complexity of the application.
Feature                        Multidimensional   Tabular
Actions                        Yes                No
Linked objects                 Yes                No
Many-to-many relationships     Yes                No
Translations                   Yes                No
Custom assemblies              Yes                No
Custom rollups                 Yes                No
Writeback                      Yes                No
The preceding table provides a clear indication of the current capabilities of each analytical data model.
Multidimensional data models are a mature technology available since SQL Server 7.0. If you require data
mining, writeback, or translations within a data model, this can only be facilitated by multidimensional
data models. Conversely, if Power View is the main driver for using a data model, only tabular data
models provide this support.
Additional Reading: For more information about the different features supported by
multidimensional and tabular models, see “Choosing a Tabular or Multidimensional Modeling
Experience in SQL Server 2012 Analysis Services” at http://msdn.microsoft.com/en-
us/library/hh994774.
Hierarchies. Multidimensional data models provide support for balanced, ragged, and parent-child
hierarchies. Tabular data models simplify the support for balanced hierarchies because attribute
relationships do not need to be defined within a tabular data model. Ragged hierarchies are not
supported by tabular data models, and parent-child hierarchies can be implemented only by using
DAX expressions.
Self-service BI. Multidimensional data models are presented through client applications, including SQL Server Reporting Services and Microsoft Excel, though typically the model itself is created by a data modeling specialist. Business users can create tabular data models by using PowerPivot in Excel. The table and relationship paradigm used by tabular models makes it easier for business users to create a data model. If additional help is required from the IT department, BI developers can import the tabular data model from PowerPivot into SQL Server Data Tools to refine it further. This improves collaboration between the business and IT in delivering BI to the organization.
Lesson 2
Designing an Analytical Data Model
The initial steps to create an analytical data model vary, depending on which kind of model you want to
create. However, many of the key design considerations are the same for both models.
Lesson Objectives
After completing this lesson, you will be able to:
Enumerate the data sources available for multidimensional and tabular models.
Data source views provide a logical data model of related tables, views, and queries from one or more
data sources. You can define a subset of the source data in a data source view, typically when you
want to expose a data mart from a data warehouse that will be used as a basis for creating a cube.
The data source view contains metadata that is cached so that you can develop SQL Server Analysis
Services solutions without being permanently connected to the source data. Additionally, because the
data source view is a logical model, changes can be made to the data source view without changing
the underlying schema in the data source.
Typically, you use the Data Source View Wizard to create a data source view. In this wizard, you define
the data source on which the data source view is based, and select the tables or views from the data
source that you want to include in the data source view. You can use the Data Source View Designer
to make modifications to the data source view, such as adding and renaming columns, or adding
logical relationships between tables. You can also create multiple diagrams that show different
subsets of the data source view.
3. Create a cube.
You can use the Cube Wizard to create a cube. The cube can be based on tables in an existing data source view, can generate tables in the data source, or can be created as an empty cube. The Cube Wizard provides a useful
starting point, although additional work is required in the Cube Designer to refine the cube
properties. When you create a cube from an existing data source view, you specify which tables
contain measures (usually the fact tables in the data warehouse), and which measures should be
included in the cube. The selected measures are collectively grouped together within a measure
group for each fact table, and aggregations for the measures are automatically included in the cube.
Tabular data models provide support for a much wider range of data sources than multidimensional
data models. You can include data from databases, files, cloud services, and even paste tables from
the clipboard. You use the Table Import Wizard to import the tables you want to include in your
model from one or more data sources, and a worksheet is created for each table in the project. After
you import the tables, you can view a graphical representation of the model and define relationships
between tables if necessary.
2. Create measures.
Unlike in a multidimensional model, the wizard used to import the tables does not identify fact tables
and create measures. Each table worksheet contains a data grid that contains detailed information
from the underlying table and a measures grid that enables you to define DAX calculations to
calculate and summarize data. You must use the measure grid in the table worksheets to create
measures for each column that you want to aggregate in the cube. Typically, you do this by entering
a DAX formula to aggregate the data under the corresponding numeric column. The tabular model
development environment in SQL Server Data Tools includes automatic formulae that you can apply
(such as a SUM calculation).
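For example, a minimal sketch of two measures entered in the measures grid, assuming a hypothetical Internet Sales table with SalesAmount and OrderQuantity columns:
Internet Revenue:=SUM('Internet Sales'[SalesAmount])
Internet Order Quantity:=SUM('Internet Sales'[OrderQuantity])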
[Table comparing the data sources supported by multidimensional and tabular data models. Recoverable entries include reports and the Microsoft Azure Marketplace DataMarket, Informix relational databases, Sybase relational databases, text files, and the clipboard.]
Note: Setup does not install the providers that are listed for each data source. Some
providers might already be installed with other applications on your computer; in other cases,
you will need to download and install the provider.
In multidimensional solutions, it is possible to refer to multiple data sources through a data source view.
For example, this can be achieved by retrieving data from a SQL Server data source and an Oracle data
source within a data source view whose primary data source is SQL Server, and in which a named query
uses an OPENROWSET function to retrieve data from the Oracle source. This will have a performance
impact that may prolong cube processing. As a best practice, use an ETL solution to centralize the data
from all of your data sources in a SQL Server data warehouse, and base the data model on the data
warehouse.
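A minimal sketch of such a named query, assuming a hypothetical Oracle OLE DB provider connection and hypothetical schema and table names (ad hoc distributed queries must be enabled on the SQL Server instance for OPENROWSET to work):
SELECT *
FROM OPENROWSET('OraOLEDB.Oracle',
     'Data Source=ORCL;User Id=dw_reader;Password=<password>;',
     'SELECT ProductKey, ProductName FROM Sales.Products') AS ora;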
When tabular data models process data from a data source, the following data types are supported:
Whole Number.
Decimal Number.
Boolean.
Text.
Date.
Currency.
When importing data or using a value in a formula, the data is converted to one of the preceding data types, even if the original data source contains a different data type. Consider the source data types before importing data into the data model, and consider using explicit conversions in views to control the data type that is used within the tabular data model.
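For example, a minimal sketch of a data warehouse view that applies explicit conversions before the data is imported into a tabular model (the view and table names are hypothetical):
CREATE VIEW dw_views.InternetSales
AS
SELECT
    OrderDateKey,
    CAST(OrderQuantity AS int) AS OrderQuantity,
    CAST(SalesAmount AS decimal(18, 2)) AS SalesAmount -- imported as Decimal Number
FROM dbo.FactInternetSales;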
Consider consuming data through views in the data warehouse instead of directly importing base
tables. This means that you will probably need to manually define relationships in the data model, but
it enables you to use query hints in the underlying views to optimize concurrency and abstract the
underlying table structures.
Select only the columns required to produce an analytical data model to reduce the time it takes to
process the data.
When using multidimensional data models, define data source views against a single data source.
Multiple heterogeneous data sources slow down the processing of data.
When using tabular data models, consider placing filters on the source data to reduce the data used
during development. You can remove the filters before the tabular data model is deployed to a
production server.
When using multidimensional data models, the default aggregation for measures is a Sum function.
There are many attribute properties that you can configure in a multidimensional data model. Important properties are shown in the following table.
Category Property
Basic Basic attribute properties include Name, Description, and ID. Additional
properties include:
Type. The Type property defines any additional special attribute associated with
the selected dimension type to the dimension. The Type property contains a
drop-down list of preset dimension types. Each dimension type will add
attributes. For example, if you select the Type as Account, it can optionally add
attributes such as AccountName and AccountNumber.
Usage. When you define dimension attributes, you can set how the attribute will be used within the dimension. A setting of Key means that the attribute is a key column to which other attributes relate. A setting of Parent means that the
attribute is part of a parent-child relationship that exists within the dimension.
Advanced There are additional properties that are available to configure attributes. Important properties include:
AttributeHierarchyEnabled. When set to true, it allows the attribute to be
displayed within the default attribute hierarchy. This means that each attribute
acts as a flat hierarchy with one level that shows all member values within the
attribute. If you use an attribute within a user-defined hierarchy, consider setting
this to false.
AttributeHierarchyVisible. Determines if the attribute is visible. If set to True,
the attribute is visible. If set to false, the attribute is not visible, but it can still be
used within MDX statements or user-defined hierarchies.
IsAggregatable. Specifies whether the values of the attribute members can be
aggregated.
OrderBy. Controls how attributes are sorted in an attribute hierarchy. If the
NameColumn setting is defined, attributes are ordered by the values specified
in the NameColumn property.
MembersWithData. For parent attributes, determines whether data associated with non-leaf members is displayed. The setting of NonLeafDataVisible displays the data that is both directly and indirectly associated with the member.
NamingTemplate. The naming template allows you to define custom names for
each level of the parent-child hierarchy. In an Employee dimension, the first
level may be Executives, the second level Senior Management, and so on.
In a tabular data model, you can configure basic column properties, including:
Column Name.
Data Format.
Data Type.
Description.
Hidden.
Sort By Column.
Additionally, you can use reporting properties to define the behavior of the tabular data model within
client reporting tools including:
Default Image.
Default Label.
Image URL.
Row Identifier.
Summarize By.
Objectives
After completing this lab, you will be able to:
5. Create Relationships.
6. Create a Cube.
7. Configure Measures.
8. Configure Attributes.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
o Password: Pa$$w0rd
The data source should include the following views from the AW Data Warehouse data source:
o dw_views.Customer
o dw_views.Date
o dw_views.InternetSales
o dw_views.Product
o dw_views.Reseller
o dw_views.ResellerSales
o dw_views.SalesPerson
o dw_views.SalesTerritory
4. After all the relationships are created, save the data source view.
Use the InternetSales and ResellerSales tables as the measure group tables.
o Internet Sales;
Order Quantity.
Unit Price.
Product Unit Cost.
Total Product Cost.
Sales Amount.
o Reseller Sales;
Order Quantity - Reseller Sales.
Unit Price - Reseller Sales.
Product Unit Cost - Reseller Sales.
Total Product Cost - Reseller Sales.
Sales Amount - Reseller Sales.
Include all dimension tables.
Create attributes in the Date dimension from the following columns in the Date table (note that
spaces are automatically added to the attribute names to make them more readable):
o DateAltKey
o MonthName
o CalendarYear
o FiscalQuarter
o FiscalYear
Modify the following properties of the Month Name attribute so that month names are sorted into
month number order:
o OrderBy: Key.
Configure the dimensions described in the following table, setting the AttributeHierarchyVisible
property of the key attribute to False in each dimension.
Use Excel to view the Internet Revenue and Reseller Revenue measures by the Category attribute
of the Product dimension.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this exercise, you will have a multidimensional data model named AWSalesMD.
3. Create Relationships.
4. Create Measures.
5. Configure Attributes.
Use the localhost\SQL2 instance of Analysis Services as the workspace server, and set the
compatibility level of the project to SQL Server 2012 SP1 (1103).
o Customer (Customer).
o Date (Date).
o Product (Product).
o Reseller (Reseller).
In the measures grid, under the appropriate columns, add the following measures.
To hide multiple columns in a table, click the columns you want to hide while holding the Ctrl key,
and then right-click any selected column and click Hide from Client Tools.
On the Model menu, click Analyze in Excel and open the default perspective in Excel by using the
credentials of the current Windows user.
Use Excel to view the Internet Revenue and Reseller Revenue measures by the Category attribute
of the Product dimension.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this exercise, you will have a tabular data model named AWSalesTab.
Lesson 3
Designing Dimensions
Dimensions provide the analytical business factors by which the measures in the data model can be
aggregated.
Lesson Objectives
After completing this lesson, you will be able to:
A balanced hierarchy is the most common type of hierarchy implemented in SQL Server Analysis Services. A hierarchy is balanced when all branches of the hierarchy descend to the same level, and each member's logical parent is the level immediately above the member. Balanced hierarchies can be implemented in both multidimensional and tabular data models. A popular example is a calendar hierarchy that contains levels such as Calendar Year, Calendar Quarter, and Calendar Month.
Creating natural or balanced hierarchies in tabular data models is a simple process of creating a hierarchy,
naming the hierarchy, and then dragging an attribute onto the hierarchy, with the top level dragged first,
followed by each lower level.
When a dimension is created, by default a one-to-many relationship is created between the key attribute
and the remaining attributes within the dimension. Although they do not need to be defined, creating
attribute relationships can improve the performance of browsing data by providing information about
additional relationships between attributes that result in the selection of more effective aggregates when
the cube is processed. As a result, the cube processing time can be reduced. Defining relationships
between levels in a hierarchy enables Analysis Services to define more useful aggregations to increase
query performance, and can also save memory during processing, which can be important with large or complex cubes.
When you create relationships between attributes, you must also set the KeyColumns and
NameColumns properties of the attributes to ensure that each member in the hierarchy can be uniquely
identified by a combination of columns. For example, a hierarchy might include attributes and levels for
Calendar Month, Calendar Quarter, and Calendar Year. In this case, the Calendar Month attribute
should have a KeyColumns property that includes Calendar Month, Calendar Quarter, and Calendar
Year (because a specific month is unique within a quarter and year). Similarly, the Calendar Quarter
attribute should have a KeyColumns property that includes Calendar Quarter and Calendar Year,
because a quarter is unique within a specific year. If the members within the hierarchies you have created
cannot be uniquely identified, an error will occur when you try to process the dimension.
Role-Playing Dimensions
Some dimensions can be reused for multiple
relationships with the same measure group or fact
table. For example, an Order table might be
related to a date dimension on both OrderDate
and ShipDate, so users can choose to aggregate
orders by the date on which they were placed, the
date on which they were shipped, or both.
Dimensions that can be used for multiple
relationships in this way are known as role-playing
dimensions.
An important point to note is that, although there are two dimensions in the cube, in reality, only one
physical dimension exists, and any changes you make to the underlying dimension (such as defining
hierarchies) will be applied to all of the role-playing dimensions that are based on it.
To implement multiple role-playing dimensions in a tabular data model, import the dimension table
multiple times and apply a friendly name to each copy that is reflective of the role-playing dimension that
the table will be used to support.
Parent-Child Hierarchies
Parent-child user-defined hierarchies create a
single-level hierarchy that is derived from a single
attribute that is defined as a parent attribute. A
parent attribute describes a self-referencing
relationship, or self-join, within a dimension table.
As a result, there is the appearance that there are
multiple ragged levels within the hierarchy.
However, the multiple levels occur as a result of
the relationship that exists between the parent
attribute and the key attribute.
PATH(<id_columnName>, <parent_columnName>)
<id_columnName> refers to the key column in a table. For example in an Employees table, the key
column might be EmployeeID.
<parent_columnName> refers to the parent column for a key column in a table. In the example of an
Employees table, this might be ManagerID.
Note: There are additional DAX functions that are similar to PATHITEM, including
PATHITEMREVERSE, PATHLENGTH, and PATHCONTAINS. For more information, see SQL
Server Books Online.
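For reference, the general form of the LOOKUPVALUE function is shown below; its parameters are described in the following list:
LOOKUPVALUE(<result_columnName>, <search_columnName>, <search_value>[, <search_columnName>, <search_value>]...)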
<result_columnName> is the name of an existing column that contains the value you want to return.
<search_value> is a value that can be used to provide a filter for the LOOKUPVALUE function. This
can include a string literal value or another function, such as PATHITEM, to filter the data.
Optionally, additional <search_columnName> and <search_value> parameters can be defined.
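Similarly, the general form of the PATHITEM function is shown below; the <position> parameter (not described in this list) indicates which item in the path to return, counting from the start of the path:
PATHITEM(<path>, <position>[, <type>])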
<path> refers to a column that contains the results of the PATH function. In the example of an
Employees table, this could be a column named EmployeeLevel.
<type> is an optional parameter that can be used to determine the data type that the result should
be returned in. A value of 0 is text―which is the default―and a value of 1 is an integer data type.
You can use the LOOKUPVALUE and PATHITEM functions together to create a parent-child hierarchy. An additional calculated column named Level1 can be created in the table, containing the following code to populate the FirstName value of the employee at the top level of the reporting path:
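A minimal sketch, continuing the Employees example and assuming a FirstName column and a calculated column named EmployeeLevel that holds the result of the PATH function:
=LOOKUPVALUE('Employees'[FirstName], 'Employees'[EmployeeID], PATHITEM('Employees'[EmployeeLevel], 1, 1))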
The calculated column that creates the second level of the employee hierarchy, in a column named Level2, is shown in the following code example:
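Again as a sketch under the same assumptions (the second item in the path identifies the level-two manager, and the column returns BLANK for employees whose path has fewer than two levels):
=LOOKUPVALUE('Employees'[FirstName], 'Employees'[EmployeeID], PATHITEM('Employees'[EmployeeLevel], 2, 1))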
When the new calculated columns are defined to represent the different levels of an employee hierarchy,
you can use the Create Hierarchy button in SQL Server Data Tools to create a hierarchy, and then click
and drag each level into the new hierarchy.
Design considerations
When you design parent-child hierarchies in analytical data models, consider the following points:
Ensure that the parent key and the child key are of compatible data types.
Ensure that a self-join relationship exists between the parent key and the child key for best query and processing performance.
View reseller sales performance by reseller geography; drilling down from countries, to states, to
cities, and to individual resellers.
View reseller sales performance by business type, drilling down to individual resellers.
View reseller sales performance by sales territory, drilling down from region, to country, and to
individual sales territory.
View Internet sales by customer geography; drilling down from countries, to states, to cities, and to
individual customers.
View both Internet and reseller sales by product; drilling down from category, to subcategory, to
individual product.
View both Internet and reseller sales by calendar date; drilling down from year, to month, to date
based on both order dates and ship dates.
View both Internet and reseller sales by fiscal date; drilling down from fiscal year, to fiscal quarter, to
month, to date based on both order dates and ship dates.
View reseller sales performance by salesperson, drilling down through the sales management
structure from senior sales managers to individual sales representatives.
Objectives
After completing this lab, you will be able to:
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Examine the Customer.dim dimension, and note that a hierarchy named Customers By Geography
has been created.
Examine the attribute relationships defined for this dimension. These relationships are optional, but
they can significantly improve aggregation performance when processing the dimension.
Examine the properties of the attributes in the dimension, and note the following:
o Each attribute that is included in the Customers By Geography hierarchy is uniquely identified
by a combination of multiple columns in the KeyColumns property. For example, the City
attribute has a KeyColumns property value that includes the City, StateOrProvince, and
Country columns. This ensures that a city in the hierarchy is uniquely identified based on the city
name, the state or province, and the country―for example, two instances of Paris in the Seine
region of France are known to be the same city, while Paris in Texas, USA is a different city.
o Attributes with multiple KeyColumn values have the NameColumn and ValueColumn property
set to reflect the column that should be used for the attribute’s name and value.
o All attributes have the AttributeHierarchyVisible property set to False, so the only way to
browse the dimension is through the Customers By Geography hierarchy.
Process the dimension, deploying the database if necessary. Then browse the Customers By
Geography hierarchy in the Browser tab of the Dimension Designer.
Examine the Reseller.dim dimension (which has two hierarchies) and the Sales Territory.dim dimension, and note that the attributes in these dimensions have been similarly configured.
o Category
o Subcategory
o Product
Create the following attribute relationships:
Edit the properties of the attributes in the hierarchy so that the following statements are true:
o None of the attributes is visible other than in the Products By Category hierarchy.
o The product attribute is uniquely identified by the ProductKey and ProductName columns (in
that order).
o The name and value of the Product attribute are based on the ProductName column.
o The name and value of the Subcategory attribute are based on the ProductSubcategoryName
column.
o The name and value of the Category attribute are based on the ProductCategoryName
column.
After you create the hierarchy and configure the attributes, save all files in the project.
Process the dimension, deploying the database if necessary. Then browse the Products By Category
hierarchy in the Browser tab of the Dimension Designer.
o Calendar Date;
Calendar Year.
Month Name.
Date.
o Fiscal Date;
Fiscal Year.
Fiscal Quarter.
Month Name.
Date.
Create the following attribute relationships, and change the existing relationship type between
DateKey and Date to Rigid.
Set the OrderBy property of the Month Name attribute to Key. Note that because you used the
MonthNumber column in the attribute’s KeyColumns property instead of MonthName, the month
names will be displayed, but they will be sorted by the month number.
Set the Type property of the dimension to Time by clicking the Date dimension icon above the
attributes in the Attributes pane, and changing the property in the Properties pane. Specifying a
type of Time enables time intelligence for the dimension (so Analysis Services can calculate time
intervals between values).
Process the dimension, deploying the database if necessary. Then browse the Calendar Year and
Fiscal Year hierarchies in the Browser tab of the Dimension Designer.
Rename the Parent Employee Key attribute to Salesperson, and set its Usage property to Parent.
Process the dimension, deploying the database if necessary, and then browse the Salesperson
hierarchy in the Browser tab of the Dimension Designer.
In the Cube Designer for Sales.cube, on the Browser tab, click the Analyze in Excel button (or click
Analyze in Excel on the Cube menu) to open the cube in Excel. Enable data connections if
prompted.
Use Excel to view the Reseller Revenue measure by the Products By Category hierarchy, the Order
Date.Calendar Year hierarchy, and the Salesperson hierarchy, verifying that the hierarchies behave
as expected.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this exercise, you will have a multidimensional model that includes balanced
hierarchies, a role-playing dimension, and a parent-child dimension.
Process all of the tables in the model, using the user name ADVENTUREWORKS\ServiceAcct and
the password Pa$$w0rd to connect to the data source.
In Diagram View, examine the Customers table, and note the following details:
o The Customers By Geography hierarchy contains the Country, State Or Province, City, and
Customer attributes.
o The Country, State Or Province, City, and Customer attributes in the table (not in the
hierarchy) have been hidden from client tools.
Note that hierarchies have also been created in the Reseller and Sales Territory tables.
Use the Create Hierarchy button in the title bar of the maximized product table to create a hierarchy
named Products By Category.
Add the Category, Subcategory, and Product attributes to the Products By Category hierarchy.
Hide the Category, Subcategory, and Product attributes that are not in the hierarchy from client
tools.
Restore the Product table to its previous size.
Rename the columns in the Ship Date table to match the equivalent columns in the Order Date
table.
Create relationships between the Reseller Sales and Internet Sales tables and the Date table by
linking the ShipDateKey column in the Internet Sales and Reseller Sales tables to the DateKey
column in the Ship Date table.
Create the following hierarchies in both the Order Date and Ship Date tables:
o Calendar Date;
Calendar Year.
Month Name.
Date.
o Fiscal Date;
Fiscal Year.
Fiscal Quarter.
Month Name.
Date.
Hide all columns in the Order Date and Ship Date tables (other than the hierarchies) from client
tools.
Mark the Order Date and Ship Date tables as date tables:
o On the Table menu, point to Date, and click Mark as Date Table.
o Use the Date column when marking the tables as date tables.
Sort the Month Name column in the Order Date and Ship Date tables by the MonthNumber
column:
o In Data View, select the Month Name column header, and then on the Column menu, point to
Sort, and click Sort by Column.
=PATH([EmployeeKey], [ParentEmployeeKey])
Add a column named Level1 to the Salesperson table. Use the following DAX formula to calculate its
value:
Add a column named Level2 to the Salesperson table. Use the following DAX formula to calculate
its value:
Add a column named Level3 to the Salesperson table. Use the following DAX formula to calculate its
value:
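The following is a minimal sketch of the kind of formula each level column can use. It assumes that the
Salesperson table also contains a FullName column that holds the salesperson name (adjust the column
names to match the model); the third argument of PATHITEM (1) returns the key as an integer so that it
matches the EmployeeKey data type:
// FullName is an assumed column name; substitute the actual salesperson name column.
Level1: =LOOKUPVALUE(Salesperson[FullName], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 1, 1))
Level2: =LOOKUPVALUE(Salesperson[FullName], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 2, 1))
Level3: =LOOKUPVALUE(Salesperson[FullName], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 3, 1))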
In Diagram View, create a hierarchy named Salesperson in the Salesperson table, and add the
Level1, Level2, and Level3 attributes to the hierarchy.
Hide all columns in the Salesperson table (other than the hierarchy) from client tools.
On the Model menu, click Analyze in Excel and open the default perspective in Excel by using the
credentials of the current Windows user.
Use Excel to view the Reseller Revenue measure by the Products By Category hierarchy, the Order
Date.Calendar Year hierarchy, and the Salesperson hierarchy, verifying that the hierarchies behave
as expected.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this exercise, you will have a tabular model that includes balanced hierarchies, a
role-playing dimension, and a parent-child dimension.
Question: How do the two models compare when designing dimensions and hierarchies?
Lesson 4
Enhancing Data Models
After you create a data model and define its dimensions and hierarchies, business users can use it to
perform analytics. However, you can enhance a data model to add more value to the aggregated
measures and improve the business analytical experience.
Lesson Objectives
After completing this lesson, you will be able to:
Custom Calculations
You can extend data models by adding custom
calculations to create measures that are not
available in the data source. For example, a data
model might include measures for sales revenue
and cost, and you could create a calculated
member to calculate gross profit by subtracting
cost from sales revenue. When designing custom
calculations for a data model, there are typically
two kinds of custom calculation that you can
create:
Calculated columns and measures. You can
use an MDX expression in a multidimensional
model or a DAX expression in a tabular model to create a calculated column in a table. Typically, the
column uses values from other columns in the same table to create a row-level value. For example, if
your data model included a table that contains a Unit Price column that contains the cost of a single
product, and an Order Quantity column that contains the number of units ordered, you could create
a calculated Sales Amount column by multiplying the unit price by the order quantity. You could
then create an aggregated measure based on the calculated column (by adding it to the measure
group for the table in a multidimensional model, or by creating a DAX calculation that uses an
aggregate function in the measure grid in a tabular model).
Global calculated members. In addition to using calculated columns to create measures, you can
use MDX or DAX expressions to create global calculated members that exist independently of a
measure group or table. A calculated member can perform a calculation that spans multiple fact
tables―for example, by adding a Costs measure in a marketing fact table to a Costs measure in a
manufacturing fact table to generate a total cost across both areas of the business. In a
multidimensional model, you create calculated members in the Cube Designer and define a named
folder in which client tools can browse them. In tabular data models, you can create calculated
measures in the measure grid of any table, but when browsed by a client tool, the calculated
measures will appear as members of the table in which they were created. To provide the custom
folder browsing experience of a multidimensional calculated member in a tabular model, you can
paste an empty table into the model from the clipboard and use that as a folder for global calculated
measures.
Additionally, you can use the Calculations tab in the Cube Designer to create calculated members. A
calculated member is a member, in any hierarchy in the cube, that is defined by a custom MDX expression.
No data is stored for a calculated member; the MDX expression is evaluated at run time. Calculations are
solved in the order listed in the Script Organizer pane. For best performance with cell calculations,
specify only a single member when possible. The Non-empty Behavior option stores the names of the
measures used to resolve NON EMPTY queries in MDX. The script view enables you to apply settings to
calculated members that are not available through the form-based calculated member user interface.
Calculated measures and calculated members can involve simple calculations of existing measures, or the
use of a wide range of MDX functions to perform advanced analytics. The following example uses tuples
to subtract a discount amount measure from a sales amount measure (the measure names shown are
illustrative):
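// Illustrative measure names
([Measures].[Sales Amount]) - ([Measures].[Discount Amount])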
The following example uses the ParallelPeriod function to return the order quantity for the equivalent
period in the previous year (again, the hierarchy and member names shown are illustrative):
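// Illustrative hierarchy and member names
([Measures].[Order Quantity],
 ParallelPeriod([Order Date].[Calendar Year].[Calendar Year], 1,
 [Order Date].[Calendar Year].CurrentMember))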
The following example of a column named Profit uses a DAX expression to subtract the cost from the
sales amount and will be evaluated against every row in a table:
=[SalesAmount] - [Cost]
As with any other numeric column, you can create calculated measures from calculated columns. Calculated
measures are aggregated based on a filter or slicer that the user applies in the reporting client such as a
PivotTable. Measures can be based on standard aggregation functions, such as DISTINCT COUNT, COUNT
or SUM, or you can define your own formula by using DAX. The following example calculates the sum of
the Profit column described earlier for a given filter or slicer:
Total Profit:=SUM([Profit])
The formula for a calculated column can be more resource-intensive than the formula used for a measure
because the result for a calculated column is always calculated for each row in a table, whereas a measure
is only calculated for the cells defined by the filter.
You can create a calculated measure that uses a DAX expression that references values outside of the
current table― this enables you to create global calculated measures that do not naturally belong in any
one table. When you need to create this kind of global calculated member, you should consider adding a
global calculations table to the data model to centralize the storage of global calculations and make
browsing more intuitive for business users. An easy way to accomplish this is to copy an Excel worksheet
that contains only a single column header to the clipboard, and paste it into the tabular data model as an
empty table. You can then use DAX expressions to define measures in the measure grid of the new table.
When users browse the data model, the measures will be listed under the table you pasted instead of
under one of the fact tables or dimension tables that contain data rows.
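For example, assuming a model in which the Internet Sales and Reseller Sales tables each contain a Profit
calculated column (as in the lab for this module), a global measure defined in the pasted table might look
similar to the following sketch:
// Assumes Profit calculated columns exist in both tables
Total Profit:=SUM('Internet Sales'[Profit]) + SUM('Reseller Sales'[Profit])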
Note: MDX and DAX are rich expression languages that contain a wide range of functions
and expressions. For more information about these expression languages, see SQL Server Books
Online.
KPIGOAL. This function returns an MDX expression that calculates the target of the KPI, against which
the value returned by the KPIVALUE function is compared.
KPITREND. This function returns an MDX expression that calculates a trend value, which is applied to a
trend indicator.
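For example, a KPI status expression can compare a KPI's value to its goal by using these functions. The
following sketch assumes a KPI named Gross Margin, similar to the one created in the lab for this module:
// Gross Margin is an assumed KPI name; the 0.75 threshold is illustrative
CASE
  WHEN KPIVALUE("Gross Margin") >= KPIGOAL("Gross Margin") THEN 1
  WHEN KPIVALUE("Gross Margin") >= KPIGOAL("Gross Margin") * 0.75 THEN 0
  ELSE -1
END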
Perspectives
Perspectives enable you to organize data model objects into defined subsets, making it easier for users
to browse a data model. Perspectives are particularly useful with very large cubes that contain many
dimensions, measure groups, KPIs, and other objects, which can make it difficult and confusing for users
to find the objects that they need.
After the data model is deployed, when a user creates a connection to the data model from within
Microsoft Excel, a list of perspectives is presented, together with the cubes that are deployed.
Note: Perspectives are not used to apply security to analytical data models.
In multidimensional data models, measures are grouped together in measure groups, which
define the granularity of the data and contain information about how measures relate to dimensions. You
can use the properties of a measure group to control how it manages errors, processing mode options,
and the storage mode for the cube.
The storage modes in the following table can be defined for measure groups in multidimensional data
models. The choice of storage mode can dramatically affect the performance and cube size.
Storage Mode   Description
MOLAP          Setting the storage mode to MOLAP (Multidimensional Online Analytical Processing)
               causes measures to be stored in the OLAP database. This speeds up query
               performance for data within the OLAP database, but it also increases the size of the
               cube and the time it takes to process the data.
ROLAP          Setting the storage mode to ROLAP (Relational Online Analytical Processing) causes
               measures to be stored in the source system, typically the fact table in the data
               warehouse. SQL Server Analysis Services creates additional tables in the data source
               for any pre-aggregated data.
HOLAP          Setting the storage mode to HOLAP (Hybrid Online Analytical Processing) can be
               seen as a compromise between the MOLAP and ROLAP storage modes. With the
               HOLAP storage mode setting, pre-aggregated data is stored within the OLAP
               database, and detailed data is stored in the relational data source. As a result, queries
               for detailed data in the cube are slower than queries for pre-aggregated data.
When using SQL Server 2012 Enterprise edition, you can partition a measure group. Partitioning enables
users to query the cube more efficiently and reduces the time it takes to process the cube, because
individual partitions can be processed instead of the entire cube. For example, if you have a cube that
stores 10 years of historical data, you could create two partitions. The first partition could store the data
for the two most recent years, which is the most frequently queried time period, in MOLAP storage mode.
The second partition could store the remaining eight years of data in ROLAP storage mode.
Partitioning is also supported in tabular data models, providing the ability to spread the data across
multiple partitions. You can also copy, merge, and delete partitions.
The CFO would like to be able to view profit for Internet and reseller sales, and also grand totals for
cost, revenue, and profit across both Internet and reseller sales.
The CEO also wants an easy way to quickly see how profit margin is performing against a target of 40
percent.
Sales analysts have requested simplified cubes, specifically for Internet sales and reseller sales.
Objectives
After completing this lab, you will be able to:
4. Create a KPI.
5. Create Perspectives.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
In the AW Data Warehouse.dsv data source view, add a named calculation named Internet Profit to the
InternetSales table, based on the following expression:
[SalesAmount] - [TotalProductCost]
Add a similar calculated column named Reseller Profit to the ResellerSales table using the same
expression.
In the Sales cube, add the Internet Profit calculated column to the Internet Sales measure group,
and the Reseller Profit calculated column to the Reseller Sales measure group.
Add a calculated member named Total Revenue that sums the Internet and reseller revenue measures by
using the following expression:
([Measures].[Internet Revenue] + [Measures].[Reseller Revenue])
Create similar calculated members named Total Cost, Total Profit, and Gross Margin to provide the grand
totals and the profit margin measure required by the CFO and CEO.
After you add the calculated members, save all files in the project.
Process the cube, redeploying the AWSalesMD database if necessary. If you are prompted for
credentials, use the ADVENTUREWORKS\ServiceAcct user account with the password Pa$$w0rd.
Create a new KPI named Gross Margin with the following settings:
o Value Expression:
[Measures].[Gross Margin]
o Goal Expression:
0.4
o Status Expression:
CASE
WHEN([Measures].[Gross Margin]) < 0.3 THEN -1
WHEN ([Measures].[Gross Margin]) > 0.4 THEN 1
ELSE 0
END
After you create the KPI, save all files in the project.
Process the cube, redeploying the AWSalesMD database if necessary. If you are prompted for
credentials, use the ADVENTUREWORKS\ServiceAcct user account with the password Pa$$w0rd.
Create a perspective named Reseller Sales that includes only the following objects:
o The Reseller Quantity, Reseller Cost, Reseller Revenue, and Reseller Profit measures.
o The Order Date, Product, Ship Date, Reseller, Sales Territory, and Salesperson dimensions.
Create a second perspective named Internet Sales that includes only the measures in the Internet Sales
measure group and the Customer, Order Date, Product, and Ship Date dimensions.
After you create the perspectives, save all files in the project.
In the Cube Designer for Sales.cube, on the Browser tab, click the Analyze in Excel button (or click
Analyze in Excel on the Cube menu), and then select the Sales perspective (which is the complete
cube). Enable data connections if prompted.
Use Excel to view the Total Revenue, Total Cost, Total Profit, and Gross Margin measures and the
Status of the Gross Margin KPI by the Products By Category hierarchy.
Close Excel without saving the workbook, and then on the Browser tab, click the Analyze in Excel
button (or click Analyze in Excel on the Cube menu), and then select the Internet Sales perspective.
Enable data connections if prompted, and verify that only four measures from the Internet Sales
measure group, and the Customer, Order Date, Product, and Ship Date dimensions are available.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this lab, you will have a multidimensional model that contains custom
calculations, a KPI, and perspectives.
3. Create a KPI.
4. Create Perspectives.
Note: If you reverted the virtual machine after the previous lab, you may be prompted to select a
workspace server. If so, use the localhost\SQL2 instance of Analysis Services as the workspace server, and
set the compatibility level of the project to SQL Server 2012 SP1 (1103).
Process all of the tables in the model, using the user name ADVENTUREWORKS\ServiceAcct and
the password Pa$$w0rd to connect to the data source.
In Data View, add a column named Profit to the Internet Sales table. Use the following DAX formula
to calculate its value:
=[SalesAmount] - [TotalProductCost]
In the measures grid, under the Profit column, add the following measure:
Internet Profit:=SUM([Profit])
Perform the same steps in the Reseller Sales table to create a hidden column named Profit, and an
aggregated measure named Reseller Profit.
Copy a single column header (for example, from an Excel worksheet) to the clipboard, click any table tab,
and then on the Edit menu, click Paste to paste the clipboard contents as a new table that can act as a
container for global calculated measures.
Create a KPI based on the Gross Margin measure. The KPI should use an absolute value of 0.4 as the
target, and have status thresholds of 0.3 and 0.4.
Create perspectives for Internet sales and reseller sales that include only the following objects:
o The Internet Quantity, Internet Cost, Internet Revenue, and Internet Profit measures.
o The Reseller Quantity, Reseller Cost, Reseller Revenue, and Reseller Profit measures.
o The Order Date, Product, Ship Date, Reseller, Sales Territory, and Salesperson dimensions.
After you create the perspectives, save the model.
On the Model menu, click Analyze in Excel and open the default perspective in Excel by using the
credentials of the current Windows user.
Use Excel to view the Total Revenue, Total Cost, Total Profit, and Gross Margin measures and the
Status of the Gross Margin KPI by the Products By Category hierarchy.
Close Excel without saving the workbook, and then on the Model menu, click Analyze in Excel, and
then open the Internet Sales perspective in Excel by using the credentials of the current Windows
user to verify that only four measures from the Internet Sales measure group, and the Customer,
Order Date, Product, and Ship Date dimensions are available.
After you finish browsing the cube, close Excel without saving the workbook, and then close Visual
Studio.
Results: At the end of this exercise, you will have a tabular model that contains calculated measures, a
KPI, and perspectives.
Question: How did the experience of creating cube enhancements vary between the two
models?
Question: Now that you are familiar with both models, how would you decide which to use
in a particular business scenario?
Module 6
Planning a BI Delivery Solution
Contents:
Module Overview 6-1
Module Overview
The primary goal of any business intelligence (BI) solution is to deliver meaningful data to business users,
and empower them to make informed business decisions. A well-designed data warehouse and
comprehensive analytical data model are of no use without a way for users to consume the information
they contain. When planning your BI solution, you must consider how it will deliver data to users, and
choose the most appropriate reporting tools for the business requirements.
This module describes common reporting scenarios and the Microsoft tools that can be used to support
them.
Objectives
After completing this module, you will be able to:
Lesson 1
Considerations for Delivering BI
The reports and analytical interfaces that business users consume are the visible “face” of the BI solution.
For most users, the reporting tools and the information they make available represent the entire BI
solution, and its success or failure will largely be judged on their effectiveness.
Therefore, in addition to the detailed planning that is required for the server infrastructure, the data
warehouse, the extract, transform, and load (ETL) processes, and the analytical data models on which the
BI solution is based, you must give careful consideration to how the BI data will be delivered to business
users in the form of reports and analytical interfaces.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how reporting and analysis design fits into the overall BI project.
Describe types of data source commonly used for reporting and analysis.
Describe the Microsoft tools that are commonly used for reporting and analysis.
Describe how Microsoft SharePoint Server can be used as a platform for delivering BI.
Before you start planning the detailed design of reports, you should consider the required data sources
and determine how the reporting tools will access the data they provide. In some cases, tools can
consume the data directly from the data source, whereas in others you might need to consider
developing an ETL process that retrieves the required data into an intermediate format for reporting.
Reporting Tools
The Microsoft BI platform includes a wide range of
tools and technologies for reporting and analysis,
each with its own set of capabilities and
restrictions.
Report Builder. A self-service report authoring environment that can be used by business users to
create reports.
Power View. An interactive data visualization environment in a SharePoint site that can be used to
create graphical data elements from tabular data models.
Microsoft Excel
Excel is a commonly used application in most organizations, and provides a rich, mature environment for
all kinds of data analysis and reporting. Excel is a comprehensive spreadsheet tool that you can use to
create tables of data, including complex calculations and lookups, and apply rich formatting, including
conditional formats that highlight meaningful data values. The key to the value of Excel as a BI reporting
tool is its built-in support for importing data from a wide range of data sources, including relational
databases and analytical data models in SQL Server Analysis Services. Additionally, you can install the
Excel Add-in for the Windows Azure data market, and import data from a wide range of data services. You
can use the data in your spreadsheet as a source for a wide range of charts and visualizations, including
common charts such as bar, line, and pie charts, as well as inline sparklines, data bars, and indicators.
PivotTables and PivotCharts. Excel is commonly used as an interactive interface for exploring
analytical cubes provided by multidimensional and tabular data sources. The built-in PivotTable and
PivotChart capabilities make it easy to aggregate and filter measures by the dimensions in a cube,
and create related tables and charts. Additionally, data analysis tools such as slicers and timelines
make it easy to filter analytical data.
PowerPivot for Excel. In addition to consuming data models from SQL Server Analysis Services, users
can use PowerPivot for Excel to create their own tabular data models. PowerPivot is an add-in that is
provided with Excel 2013, and is available as a download for earlier versions of Excel. Users can use
PowerPivot to create a tabular model that is saved with the Excel workbook, and which can be
published to SharePoint for use by other business stakeholders.
Power View. In Excel 2013, users can create Power View visualizations from tabular models in an
Excel workbook, using the same intuitive graphical Power View interface as Reporting Services
provides in SharePoint.
Data Mining add-ins. With the SQL Server Data Mining add-ins for Excel, users can create and use
data mining models in an Analysis Services instance to analyze tables of data in an Excel spreadsheet.
PerformancePoint Services
PerformancePoint Services is a component of SharePoint Server that provides business analysis
functionality. With PerformancePoint Services, you can use the built-in Dashboard Designer to create:
Data connections to a wide variety of corporate data sources, including relational databases and data
models in SQL Server Analysis Services.
Key performance indicators (KPIs) that compare business performance metrics to targets.
Reporting Services. Installing SQL Server Reporting Services in SharePoint Integrated mode enables
some features of Reporting Services that are not available in native mode, including Power View and
data alerts.
Excel Services. SharePoint Server includes Excel Services, which provide an interactive Excel interface
in a web browser. By publishing Excel workbooks and PowerPivot data models in a SharePoint site,
you can make them available to users across the enterprise, even when Excel is not installed on their
computers. When combined with Excel Services in SharePoint Server, Excel provides a comprehensive
BI reporting tool for a variety of scenarios.
PowerPivot for SharePoint. PowerPivot for SharePoint is built on Excel Services, and enables
business users to publish Excel workbooks that contain PowerPivot tabular data models to a
SharePoint site, where other users can view them and use them as data sources for their own
interactive analysis in Excel and Power View.
Power View. When SQL Server Reporting Services is installed in SharePoint Integrated mode, users
can use Power View to create interactive visualizations of tabular models in PowerPivot workbooks or
Analysis Services databases in a browser, and save their Power View reports in SharePoint.
PerformancePoint Services Content. PerformancePoint Services content such as KPIs, scorecards,
and dashboards can be delivered only in a SharePoint site.
Business Intelligence Center. You can enable individual BI functionality in any SharePoint site, but it
can be more effective to use the built-in SharePoint Server Business Intelligence Center template to
create a BI portal. You can create a site or subsite based on this template, which includes all of the
necessary components for PerformancePoint Services content. Then, you can add a document library
for Reporting Services reports and PowerPivot Gallery for PowerPivot and Power View reports to the
site for a complete BI delivery solution.
Note: Considerations for using SharePoint as a BI delivery platform are discussed in detail
in Module 9: Planning a SharePoint Server BI Solution.
Lesson 2
Common Reporting Scenarios
Although the business requirements in every organization are different, there are some reporting
scenarios that are commonly required. Not every BI solution requires all of these scenarios, but it is useful
to be able to map business requirements to these scenarios and plan their implementation accordingly.
Lesson Objectives
After completing this lesson, you will be able to:
Formal Reports
Most corporate environments include a
requirement for formal reporting. Formal reports
are a traditional way for business users to view key
information about the business, and to
communicate business results to external
stakeholders such as shareholders and partners.
Typically, a formal reporting solution has the
following characteristics:
Reports are based on managed corporate data sources, such as a data warehouse or a financial
accounts system.
Reports are formally structured with limited interactivity, such as expanding summarized values to
reveal details.
Reports are often distributed in document formats, such as Microsoft Word, PDF, or as a Microsoft
Excel workbook.
Data Exploration and Analysis
Typically, data exploration and analysis has the following characteristics:
The analysis is performed by users who are familiar with the business measures and dimensions, and
can interpret the findings in context.
The data for the analysis is provided by a data warehouse (or a departmental data mart), often
exposed through an analytical data model.
The analyst might have a specific goal or hypothesis in mind, but the analytical activity is unstructured
and as unconstrained as possible within the limitations of the available data, the data model, and the
analytical tool used.
Users often want to be able to present or communicate their findings to colleagues and managers.
Analytical Data Mashups
Typically, an analytical data mashup has the following characteristics:
The analysis is based on a combination of managed corporate data and data from other, often
external, sources.
The users performing the analysis are familiar with advanced data access techniques and can acquire,
cleanse, and restructure the data they need with minimal IT support.
The conclusions of the analysis might be shared with colleagues and managers, but the activity is not
a regular part of business operations and periodic distribution of the results is not required.
Scorecards
In every business, there are key metrics that are
considered to be indicators of the overall health
and performance of the company. Many of these
indicators are financial measures, such as revenue
and profit, but others might include figures
related to employee turnover, manufacturing
output, the number of customers, or any other
measurable aspect of the company that is
considered important to its success. It is common
practice to track key performance indicators (KPIs)
for these metrics, and measure them against
targets or performance in previous periods.
The KPIs for a particular area of the business, or the business as a whole, can be combined to create a
scorecard that enables business stakeholders to measure overall performance. A scorecard shows an
indicator for each important metric, and enables users to drill down into the individual areas of the
business that contribute to the overall score. Additionally, the individual KPIs can be weighted to reflect
their relative importance in terms of measuring business performance.
Typically, a scorecard:
Shows a score for the KPIs that are considered to reflect the performance of the business, or a
business area.
Enables users to see overall performance at a glance, and drill down into a particular KPI to see the
contributory scores for that indicator.
Dashboards
A dashboard provides a high-level overview of
performance for a business or a business area by
showing a combination of related indicators,
charts, and other data elements. By bringing
together top-level business performance
information into a single view, a dashboard
provides a way for business managers to get an
overall view of how the business is performing at a
glance, and quickly identify any areas of concern
that require further exploration.
Typically, a dashboard:
Includes a mixture of different data visualizations that convey related high-level business information
in a way that can be quickly understood.
Provides interactive filtering, so that selecting a data value in one element of the dashboard filters the
data shown in the other dashboard elements.
Is used as a primary means of business performance monitoring that makes it easy to identify areas of
concern that require further exploration.
Is created by a BI specialist developer based on specific business requirements and priorities provided
by stakeholders.
Lesson 3
Choosing a Reporting Tool
The common reporting scenarios can all be addressed by multiple reporting tools. However, some tools
are more appropriate for some scenarios than others. This lesson explores the reporting tools available
from Microsoft, and their suitability for the reporting scenarios described in the previous lesson.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how SQL Server Reporting Services supports common reporting scenarios.
SQL Server Reporting Services
You can use Reporting Services in the following reporting scenarios:
Data exploration and analysis. When deployed in SharePoint Integrated mode, Reporting Services
includes Power View, which provides a simple to use, intuitive interface for graphical data exploration.
Users can create comprehensive visualizations of data from tabular data models and save them in a
SharePoint site for others to view, or export them to Microsoft PowerPoint for presentation to
colleagues, managers, or other business stakeholders.
Analytical data mashups. Report Designer and Report Builder support a wide range of data sources,
but they offer limited interactive analytical capabilities. Power View provides a high level of
interactivity, but it is limited to the data in the underlying tabular data model. To use Power View in a
SharePoint site as a tool for data mashups, business users would need to use PowerPivot for Excel to
create their own tabular data models with all of the data they need.
Scorecards. Report Designer and Report Builder both enable report authors to visualize comparative
data measures by using indicators. By creating and publishing a report that shows indicators for key
business metrics, you could use Reporting Services to create a scorecard. However, for complex
scorecards, other KPI visualization tools are generally easier to use than Reporting Services.
Dashboards. You can create a report that shows summarized data in tables and matrices, charts,
indicators, data bars, sparklines, and gauges. You can then publish this report as a dashboard.
Reporting Services supports drill-down aggregation, and actions such as links to reports or other
online resources, so you could use Reporting Services to create reasonably interactive dashboards.
However, it is difficult to incorporate inter-element filtering in a report, and other tools might provide
a more flexible solution for your particular dashboard requirements.
Note: Considerations for using Reporting Services in a BI solution are discussed in detail in
Module 7: Designing a Reporting Services Solution.
Microsoft Excel
You can use Excel in the following reporting
scenarios:
Data exploration and analysis. Excel provides a comprehensive solution for user-driven analysis and
data exploration. The ability to connect to a data model and create PivotTables and PivotCharts
makes data exploration easy, and the addition of PowerPivot and Power View means that
sophisticated business users can create their own data models and visualizations in a familiar tool.
Analytical data mashups. The ability to import data from multiple data sources, including data
services in the Windows Azure data market, makes Excel a powerful tool for creating data mashups.
Even when a connection to a particular data source can’t be created, Excel makes it easy to import
data from text files and XML documents, and in many cases, users can simply paste data from the
clipboard. Users can use the built-in functionality of Excel to filter and cleanse the data before
incorporating it into a PowerPivot tabular model for inclusion in analysis.
Scorecards. Excel provides built-in support for graphical indicators based on KPIs defined in
multidimensional or tabular data models, and it can be used to create a spreadsheet-based scorecard.
You can then publish the workbook that contains the KPIs to SharePoint, where users can view them
in Excel Services. In some organizations, particularly those where Excel is commonly used for
reporting, this approach can be effective. However, comprehensive scorecards that include
hierarchical rollups and weightings can be complex to create in Excel.
Dashboards. You can use all of the data elements in Excel, including PivotTables, PivotCharts, slicers,
timelines, and indicators to create a spreadsheet-based dashboard, which you can publish to
SharePoint for interactive viewing in Excel Services. However, you must be careful to ensure that the
data in the Excel workbook on which the dashboard is based is frequently refreshed. Additionally, the
Excel environment provides a rich data analysis tool, but it is not designed for online viewing at a
glance. You can design the default layout of the spreadsheet to be easily viewed in a browser, but as
users interact with it, the elements can be resized and repositioned.
Note: Considerations for using Excel in a BI solution are discussed in Module 8: Designing
an Excel-Based Reporting Solution.
PerformancePoint Services
You can use PerformancePoint Services in the
following reporting scenarios:
Scorecards. PerformancePoint Services provides comprehensive support for scorecard authoring and
publishing, and makes it easy to define hierarchical KPIs with weighted scorecard values.
Dashboards. PerformancePoint Services makes it easy to create dashboards that contain related data
visualizations with inter-element filtering, and because PerformancePoint Services is a component of
SharePoint Server, it is easy to create dashboards that are seamlessly embedded into the SharePoint
site.
Data exploration and analysis. Excel is the most comprehensive and flexible platform for self-service
data analysis and exploration. Most business users are familiar with Excel, and can easily connect to
data sources and create PivotTables and PivotCharts or create Power View visualizations. More
advanced Excel users can use PowerPivot to create their own tabular data models or data mining
tools to apply predictive analytical models to data.
Analytical data mashups. Similarly, Excel is the most suitable tool for analytical data mashups. Its
support for importing data from multiple corporate and external data sources and its range of data
editing and filtering capabilities make it a flexible tool for user-driven analysis across data sources.
Scorecards. Generally, PerformancePoint Services is the best choice for scorecards in environments
where SharePoint Server is used to deliver BI services. You can create complex scorecards that span
business areas and apply custom weightings and threshold values to the KPIs in the scorecards more
easily than in Excel or Reporting Services, and you can integrate PerformancePoint scorecards with
other elements of a PerformancePoint dashboard. In most cases, scorecards are developed by BI
specialists.
Dashboards. PerformancePoint Services provides the best solution for dashboards in a SharePoint-
based environment. The ability to create interactive dashboards that are closely integrated into the
SharePoint Server user experience helps you embed BI into everyday business operations. Like
scorecards, dashboards are usually created by BI specialists.
Regardless of which tools are used to implement specific reports, all reporting and analysis solutions can
be delivered through SharePoint Server.
Additional Reading: For more information about reporting scenarios and the Microsoft
tools that can be used to support them, see “How to Choose the Right Reporting and Analysis
Tools to Suit Your Style” at http://msdn.microsoft.com/en-us/library/jj129615.aspx.
Objectives
After completing this lab, you will be able to:
Discuss the reporting requirements in the interviews and agree on appropriate tools to support them.
Results: At the end of this exercise, you will have a reporting requirements document that lists the
reporting scenarios that the BI solution must support, and the tools that you plan to use.
Question: How does the inclusion of a requirement for self-service BI influence the choice of
data tools?
Module 7
Designing a Reporting Services Solution
Contents:
Module Overview 7-1
Module Overview
Microsoft SQL Server Reporting Services provides a scalable, versatile platform for reporting solutions. If
your Microsoft-based business intelligence (BI) solution requires formal reporting, it is likely to include
Reporting Services. When planning your BI solution, you must consider how Reporting Services will affect
the overall infrastructure requirements and design, and how you will use it to deliver the reports required
to support the business requirements.
Objectives
After completing this module, you will be able to:
Lesson 1
Planning a Reporting Services Solution
Before you start designing reports, you must consider the business requirements and plan the Reporting
Services implementation. This lesson describes some key considerations and guidelines that will help you
plan a successful reporting solution.
Lesson Objectives
After completing this lesson, you will be able to:
The team members who are responsible for designing Reporting Services infrastructure should include
someone with knowledge of Reporting Services architecture and experience with Reporting Services
component installation and configuration. Additionally, the team should include IT personnel with
knowledge of the network and domain environment into which Reporting Services will be deployed. If
Reporting Services is to be deployed in SharePoint Integrated mode, a SharePoint Server specialist should
also be involved.
elements, such as datasets and charts, and the rendering formats and delivery channels that will be used
to deliver the reports to business users.
Typically, team members involved in designing reports and report items include a database administrator
or developer who is familiar with the data warehouse and other data sources; a specialist report designer
who can translate user requirements into report designs; and business users who can articulate and clarify
report data and structure requirements, determine report format and interactivity requirements, and
provide feedback on report mockups, prototypes, and drafts.
Be aware that in most business environments, requirements can change over time. It is not uncommon to
start a BI project with a list of required reports, and after the data warehouse and data models required to
support them are in place and users have started using them, learn that new reporting requirements
emerge because of the insights gained from the BI solution. Therefore, you need to consider extensibility
when planning data sources, datasets, and reporting folder structures.
After identifying the data sources, you can start to consider the specific queries that will be required to
retrieve the reporting data. In some cases, you might find that similar datasets can be used to support
multiple reports. For example, a monthly sales report might include the same columns as a quarterly sales
report, thereby enabling you to use parameters to filter the rows returned by the dataset, depending on
the report being rendered.
Even in scenarios where reports will be viewed on-demand, you should determine when the reports
should be refreshed with new data. In some cases, users want to see the very latest data from the sources
when they view the report, but in other cases the reports might only need to reflect data for a specific
time period. Understanding this requirement enables you to plan the use of caching or snapshots to
optimize report rendering performance and reduce the workload on data sources.
Native mode
In native mode, Reporting Services provides a
report server that runs as a service on a Windows
server. This service uses SQL Server databases to store the report catalog and temporary objects, and it
provides user and administrative access through a web-based tool named Report Manager.
Additionally, the report server configuration is managed through the dedicated Reporting Services
Configuration Manager tool.
Reports are organized into folders and secured based on permissions applied to Reporting Services roles.
Administrators can create and manage roles and permissions in Report Manager.
In native mode, users can use a special My Reports folder to manage their own set of reports. They can
also subscribe to reports to have them delivered by email or to a file share. Additionally, administrators
can create data-driven subscriptions that deliver reports to a collection of users based on a predefined set
of delivery and format preferences.
The reports in document libraries can be organized into folders, and permissions on these items are
applied to SharePoint users just like any other document in a SharePoint library. Like in native mode, users
and administrators can create subscriptions for scheduled delivery of reports, and additionally in
SharePoint Integrated mode, users can subscribe to data alerts, which automatically send notifications if
the source data for a report has changed. However, there is no support for the My Reports folder in
SharePoint Integrated mode.
Configuration settings for the email server. You must determine the simple mail transfer protocol
(SMTP) server that will be used to relay reports and configure Reporting Services to use it in
Reporting Services Configuration Manager or the SharePoint Central Administration site.
The types of subscription to be supported. Reporting Services supports standard subscriptions,
which can be defined by individual users; and data-driven subscriptions, which must be defined by an
administrator. Empowering users to create their own subscriptions enables them to manage their own
report delivery to suit their own requirements, but it reduces the level of control the administrator has
over the subscriptions being created. In an environment where a large number of users are creating
subscriptions that run on multiple schedules, the workload on the report server can increase
dramatically. If data-driven subscriptions are to be created, you must determine which reports need
to be delivered to which users, when, in what format, and with what parameter values. Additionally,
data-driven subscriptions are only available in SQL Server 2012 Enterprise and Business Intelligence
editions.
Data source credentials. Subscriptions require that the credentials used to retrieve report data from
data sources are stored on the report server.
Planning Security
Reports can contain sensitive data, and security is
a primary concern in many reporting
environments. When planning report security, you
need to consider how to secure access to the
reports themselves, and also how to secure access
to the data sources that provide the data for the
reports.
Report security
The specific steps for configuring report security
depend on whether Reporting Services is
deployed in native mode or SharePoint Integrated
mode. However, the planning considerations for
securing reports are the same, regardless of the deployment mode. The basic principle to apply when
planning security for reports is the same as for any resource:
1. For each resource or group of related resources, determine the different levels of access required (for
example, read, view definition, or manage). This enables you to identify the required roles or groups.
2. Identify the users who need to access the resources and the level of access required. This enables you
to determine the membership of the roles or groups.
You should use the matrix of reports and user audiences that you compiled during the requirements
gathering phase as a starting point for determining the required permissions. For each report, group the
users who need access, and then identify reports with common audiences. From this information, you can
determine the roles that are required and the users who need to be members of those roles.
If your reporting solution supports subscriptions, caching, or snapshots, you cannot use Windows
Integrated authentication or an interactive prompt for credentials, and you must store a single set of
credentials on the report server to be used when accessing the data source.
Security. By arranging reports and items into folders based on security requirements, you can easily
apply the same security settings to a group of related reports by setting permissions at the folder
level. You can take advantage of permission inheritance through a hierarchy of folders.
Item types. In addition to reports, a reporting solution includes items such as data sources, shared
datasets, and report parts. Often, it makes sense to keep items of the same type together, because
this makes it easier for report developers, administrators, and self-service report authors to find and
manage them.
Audience. In some BI solutions, the overriding organizational principle for reports is the audience
that will consume them. For example, a BI solution might include reports for multiple groups of users,
such as executives, sales managers, and production engineers. If each audience requires a discrete set
of reports, it makes sense to organize the reports into folders that reflect the audiences.
Business areas. Another way to organize reports is by the area of the business that the reports relate
to. For example, a BI solution might include reports that show information about sales, financial
accounts, and production. In this case, it might be logical to organize the reports so that all of the
sales-related reports are stored in one folder, the financial reports in a second folder, and the
production reports in a third folder.
Lesson 2
Designing Reports
After you plan the reporting solution as a whole, you can focus on the specific reports that must be
created. This lesson describes some key considerations and guidelines for designing reports.
Lesson Objectives
After completing this lesson, you will be able to:
Design datasets.
Include only the columns that are required for display, sorting, or grouping in the report. For
example, avoid using a SELECT * statement, and instead list only the columns that are actually
required.
Include only the rows that are required for the detail level of the report. For example, if you are
creating a report to show sales performance for each product, grouped by category, there is no need
to include the individual order rows. Instead, use a query with a GROUP BY clause that aggregates the
orders by product and category so that the product-level aggregation in the query is the detail level
of the report.
Use parameters to filter data in the query. You can apply filters at the report level, or in the query
used to retrieve the data. Generally, it is inefficient to retrieve all of the data from the data source and
then filter in the report. Most business reports are scoped by a specific time period, such as month,
quarter, or year, so most reports should have time-based parameters that restrict the number of rows
retrieved.
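For example, a dataset query for the product sales report described above could select only the required
columns, aggregate at the product level, and filter on time-period parameters. The following is a sketch
that assumes AdventureWorks-style data warehouse tables; adjust the table and column names to match
your schema:
-- Sketch only: table and column names are assumptions based on an AdventureWorks-style schema.
SELECT
    c.EnglishProductCategoryName AS Category,
    p.EnglishProductName AS Product,
    SUM(f.SalesAmount) AS SalesAmount
FROM dbo.FactResellerSales AS f
JOIN dbo.DimProduct AS p ON f.ProductKey = p.ProductKey
JOIN dbo.DimProductSubcategory AS s ON p.ProductSubcategoryKey = s.ProductSubcategoryKey
JOIN dbo.DimProductCategory AS c ON s.ProductCategoryKey = c.ProductCategoryKey
JOIN dbo.DimDate AS d ON f.OrderDateKey = d.DateKey
WHERE d.FullDateAlternateKey BETWEEN @StartDate AND @EndDate
GROUP BY c.EnglishProductCategoryName, p.EnglishProductName;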
Note: In some scenarios, such as when using snapshots or cached datasets, you might
choose to retrieve a larger volume of data than is needed by a single user of the report or
dataset, and then use report-level filters to generate reports at the required scope. In this
scenario, the performance overhead of initial data retrieval is compensated for by a reusable
dataset or report, which minimizes the need for subsequent requests to the data source.
Define restrictive default parameter values. Define default parameters to return a minimal number
of rows based on common usage of the report. For example, a sales report might be used to view
sales by month, quarter, or year based on parameters for the start and end of a time period range. By
setting the default values of these parameters to the shortest time period that is commonly used, you
can reduce unnecessary data retrieval. A common technique for applying minimal time period
parameters across multiple reports is to create a shared dataset that retrieves a range of commonly
used date values, and to use that dataset as the source for default and available parameter values for
all time-filtered reports. For example, a query along the lines of the following sketch could be used to
return candidate date values for these parameters:
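-- A sketch only: assumes an AdventureWorks-style DimDate table with a FullDateAlternateKey column.
-- Returns the first and last dates of the current month, for use as default StartDate and EndDate values.
SELECT
    MIN(FullDateAlternateKey) AS CurrentMonthStart,
    MAX(FullDateAlternateKey) AS CurrentMonthEnd
FROM dbo.DimDate
WHERE CalendarYear = YEAR(GETDATE())
  AND MonthNumberOfYear = MONTH(GETDATE());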
Define pagination. Reporting Services provides fine-grained control over pagination behavior for
reports, including the ability to force page breaks before or after data regions and groupings, and to
repeat column headers on each new page. Use these settings to ensure that the report is easy to
consume when spread across multiple physically printed pages.
Include page numbers in the report header or footer. Multipage reports often contain many
similar-looking pages. If the report is printed on a printer that does not collate the pages, or the
printed report pages are dropped, it can be difficult to sort the pages into the correct order. Including
page numbers makes it easier to do this.
Include the report execution time. When a printed report is distributed on a regular schedule, it
can be difficult to identify the current version. By including the date and time of the report execution
in the report, users can be sure that they are looking at the most recent data.
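For example, a text box in the page footer can combine the page number and execution time by using
the built-in Globals collection (a sketch; apply whatever formatting is required):
="Page " & Globals!PageNumber & " of " & Globals!TotalPages & " | Executed: " & Globals!ExecutionTime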
Include parameter values. When viewing a printed report, the user has no way to determine the
parameters that were used to filter the data, unless the parameter values are included in the report.
Optimize for monochrome documents. Although color printers have become increasingly
prevalent in business environments, it is common for reports to be printed on monochrome printers
or to be photocopied on a black and white photocopier. Therefore, you should avoid relying on color
to indicate specific meaning in the report, and try to use colors that are easily distinguishable when
viewed as greyscale. If the report is intended primarily for consumption as a printed document,
consider using only black and grey fonts on a white background.
o Avoid including subreports in groups where a report includes a large number of parent groups.
o Use the Visibility property to hide child data and provide drill-down functionality where there
are a moderate number of detail rows in the report.
o Use link actions to provide drill-through functionality to parameterized child data reports where
there are a large number of detail rows.
Avoid complex expressions in headers and footers. If an expression includes anything other than a
simple field value, it is assumed that the expression may include a reference to the TotalPages global
variable. This means that the first page of the report cannot be rendered until after the entire report
is processed and paginated.
o Gauges. Used to indicate a level of performance within a range of possible values. For example, a
gauge might show the revenue to date against a sales target.
o Data bars and sparklines. Used to provide graphical comparisons of values in multiple rows. For
example, a data bar could show comparative sales volumes across multiple salespeople, or a
sparkline could compare monthly sales levels across regions.
o Indicators. Used to show how a specific data value compares to a target. For example, you could
use indicators to create a scorecard report that shows how monthly sales revenue compares to
target or to the same period in the previous year.
o Maps. Used to show business data that is geographical in nature. For example, you could use a
map to show countries color-coded to indicate the level of sales in each country.
Consider creating charts and other graphical elements as parameterized subreports or report
parts. Often, the same chart is useful in multiple reports, and you can use subreports or report parts
to create a library of reusable data visualizations that can be incorporated into any report with a
compatible dataset and appropriate parameters. For example, the dashboard shown on the slide
consists of a report that contains the title and execution time, and four subreports that each use
parameter values from the parent report to filter the graphical data.
Avoid overcomplicating charts. Charts are most useful when they show a clear visualization of a key
data element. You can create charts that include multiple categories and series, and show many data
points; but the more detail you add to a chart, the more difficult it can be to interpret the information
that the chart is designed to convey. Additionally, be careful when selecting color palettes for charts,
because some color combinations can be difficult to distinguish.
Page size and pagination settings can significantly affect the way a report is rendered.
Interactive functionality, such as drill-down expansion of hidden groups, is not supported in all
formats.
Background images are not supported in all renderers, and may be displayed above or below the
data region to which they are applied instead of behind it.
The CanGrow property of text boxes and the AutoSize property of images can cause problems in
some formats. Some renderers are more efficient if the size of text boxes and images is fixed. You can
use rectangles in reports to fix the position of text boxes and images.
To overcome these problems, you can create a version of the same report for each target format; for
example, an interactive report that includes a hyperlink to a static version for printing. However, this can
entail a lot of additional work to develop and manage multiple versions of each report. An alternative is to
design adaptive reports that modify their behavior, depending on the rendering extension being used
when the report is rendered. To help you accomplish this, Reporting Services supports the following
global variables:
RenderFormat.IsInteractive. You can use this variable to determine if the render format supports
interactivity, such as drill-down functionality to show hidden groups.
RenderFormat.Name. You can use this variable to determine the specific rendering extension being
used and apply format-specific settings.
For example, the following expression could be used to set the Hidden property of a group in a Tablix
data region. The group would then be hidden when rendered to an interactive format, but visible in
formats that do not support interactivity.
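One way to write this expression is to reference the built-in Globals collection in the group's Hidden
property:
=Globals!RenderFormat.IsInteractive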
Edit MDX queries in a suitable query tool. Report Builder and Report Designer include a graphical
query editor that you can use to create simple MDX queries, but in many cases you will want to
modify the MDX that is generated to include additional metadata or optimize the query syntax. You
should use the query designer in Report Builder and Report Designer to create the initial query and
configure parameters, and then copy the code to a more fully featured editing tool to refine it before
importing it back into the dataset.
Remove the NON EMPTY clause. Often, a report should include empty rows; for example, to
indicate a lack of sales of a particular product or on a particular day. To ensure that empty rows are
included in the dataset, remove the NON EMPTY clause from the MDX query (see the example query
after this list).
Let the cube perform aggregation. Instead of using a specific function such as Sum or Count in
report field expressions, use Aggregate. This ensures that the specific aggregation defined in the
cube is applied. This is particularly important when the cube includes semi-additive measures that can
be aggregated across some dimensions but not others.
Let the cube perform sorting. Dimension attributes in a cube can be configured with a sort order
based on their name, key value, or a completely different attribute in the same dimension. Avoid
specifying sort expressions for groups in a report that is based on a data model; instead, rely on the
sort order defined for the dimension attribute in the data model.
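For example, to apply the guideline about removing the NON EMPTY clause, the MDX generated by the query designer could be edited as shown in the following sketch. The cube, measure, and dimension names are hypothetical; the second query returns a row for every month, including months with no sales.

// Query generated by the graphical designer (empty months are omitted)
SELECT NON EMPTY { [Measures].[Sales Amount] } ON COLUMNS,
NON EMPTY { [Date].[Calendar].[Month].Members } ON ROWS
FROM [Sales]

// Edited query (empty months are included in the dataset)
SELECT { [Measures].[Sales Amount] } ON COLUMNS,
{ [Date].[Calendar].[Month].Members } ON ROWS
FROM [Sales]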
Lesson 3
Planning Report Consistency
Organizations commonly generate large numbers of discrete reports, often hundreds or even thousands.
Making sure that the reports generated by a company are consistently formatted and structured can
result in many benefits, including:
A consistent, professional image through the use of branding and approved formatting.
This lesson describes some strategies for enforcing report consistency across an organization or
business area.
Lesson Objectives
After completing this lesson, you will be able to:
Create and use report templates.
Use shared data sources and datasets to enforce consistent report data.
Use linked reports to create consistent reports containing subsets of relevant data.
Create a self-service reporting environment that encourages report consistency.
Report Templates
To enforce consistency, you can create a report
that includes the elements that you want all
corporate reports to include, and save it as a
template for report developers. For example, you
can create a report template that includes
expressions to display the report title, execution
time, and page numbers in the report header. You
can also apply corporate branding, such as font
formatting and images.
Standardize parameters. Another use of shared datasets is to provide a library of standard fields for
default and available parameter values. This simplifies the development of parameterized reports, and
helps provide a consistent user experience when browsing reports.
Create templates. You can save shared data sources and datasets in the same folder as report
templates to make them available to report developers using Report Designer. Report Designer
requires that the Visual Studio project used to create the reports includes local data sources and
datasets that can be used during report development. You can use the project properties to ensure
that when the project is deployed, existing data sources and datasets on the server are not
overwritten.
Linked Reports
Sometimes you need to provide a consistent
report to multiple audiences, but with different
parameterized data in the report. For example,
you might need to create a corporate sales report
for executives showing sales across all regions, and
provide the same report to each regional sales
manager showing only the sales for the relevant
region.
Make shared data sources and datasets easy to find. Report Builder does not require local design-
time data sources and datasets, so business users should be able to easily find and select published
data sources and datasets to use in their reports. This not only simplifies report development for
business users, but it also helps ensure consistency and manageability of the queries being used to
populate reports. If some business users are sufficiently skilled in writing their own queries, they can
be empowered to create new shared datasets where required, using existing shared data sources and
datasets for default and available parameter values.
Publish parameterized report parts. Create charts and other reusable data elements, such as
scorecards, as parameterized report parts and publish them as a library that business users can
incorporate into their reports. The parameters enable users to apply the report parts to their own
subsets of reporting data while helping ensure a consistent style for data visualizations across all
corporate reports.
Objectives
After completing this lab, you will be able to:
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Discuss the reporting requirements in the interviews and agree on an appropriate folder structure to
support them.
Create the folders you think are needed in Report Manager at http://mia-sqlbi/reports_sql2.
Results: At the end of this exercise, you should have created folders in the report server at http://mia-
sqlbi/reports_sql2.
In the project, create shared data sources for each data source the reporting solution must support.
o Use Windows authentication for the data source credentials―you will use this form of
authentication when working in the project.
Configure the project properties so that the data sources, datasets, and report parts in the project will
be deployed to the appropriate folders you created earlier on the http://mia-sqlbi/reportserver_sql2
report server. Reports in this project should be deployed to an appropriate folder for report templates.
Modify the report so that it matches the design specification provided by the CEO in the interviews
document:
o The Adventure Works Cycles corporate logo is provided in the D:\Labfiles\Lab07\Starter folder.
Note: Transact-SQL scripts for the datasets you need to create in this task are provided in the
D:\Labfiles\Lab07\Starter folder.
In the AWReports project, add the following shared datasets, which should all use the data source
for the data warehouse:
o Internet Sales By Country. This dataset should return the following fields filtered by starting and
ending ship dates that are provided as Date/Time parameters:
Fiscal Year.
Month Number.
Month Name.
Country.
State or Province.
City.
Total Sales Amount.
Total Product Cost.
The results should be grouped and ordered by the following fields:
Fiscal Year.
Month Number.
Month Name.
Country.
State or Province.
City.
o Reseller Sales By Region. This dataset should return the following fields filtered by starting and
ending ship dates that are provided as Date/Time parameters, and by region, which is provided
as a Text parameter that supports multiple values and has a default value that includes the Europe,
North America, and Pacific regions:
Fiscal Year.
Month Number.
Month Name.
Sales Region.
Sales Country.
Sales Territory.
Total Sales Amount.
Total Product Cost.
The results should be grouped and ordered by the following fields:
Fiscal Year.
Month Number.
Month Name.
Sales Region.
Sales Country.
Sales Territory.
o Last Month. This dataset should retrieve the date of the first and last days in the month prior to
the current month. It will be used to provide default values for parameters in monthly sales
reports.
Copy the datasets and report template that you have created to the C:\Program Files\Microsoft Visual
Studio 10.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject folder.
Launch Report Builder, downloading it if necessary, and use it to create a new chart report based on
the Internet Sales By Country dataset you published to the report server in the previous task.
Create a chart that shows sales revenue by country, and format it as you like.
If you want to preview the chart, enter any two dates within the last year as parameters.
When you are happy with the chart, use Report Builder to publish it as a report part to the folder you
created in Report Manager. Do not publish any other objects in the project (such as parameters or
datasets).
After the chart is published as a report part, close Report Builder, discarding the report.
Results: At the end of this exercise, you will have published shared data sources, a report template, shared
datasets, and a report part.
Tip: You have previously deployed shared data sources and datasets to the report server. However, you
need to create local substitutes for these with the same names in the project to use during development.
When the project is deployed, the local versions do not overwrite the existing server versions, and the
reports in the project transparently use the versions that already exist on the server.
In the project, create a shared data source with the same name as the one you created earlier for the
data warehouse SQL Server database:
o Use Windows authentication for the data source credentials―you will use this form of
authentication when working in the project. When you deploy the project, the data source on the
server will be used instead of the one in the project.
Add a new item to the Shared Datasets folder in the project, based on the Reseller Sales By Region
dataset template you created earlier, and named Reseller Sales By Region.rsd:
o When the dataset is added, it is opened so you can see its source XML definition. After you close
it, re-opening it will display its properties.
Add a new item to the Shared Datasets folder in the project, based on the Last Month dataset
template you created earlier, and named Last Month.rsd.
Tip: Now that you have created local copies of the data source and datasets on the server, you can create
a report and add references to the local data source and datasets in the project. These references will be
switched to the existing server versions when the report is deployed.
Add a new item to the Reports folder in the project, based on the AWReport template you created
earlier, and named Reseller Sales.rdl.
View the report data for the report, and add a reference to the shared data source you created earlier.
Add a reference to the Reseller Sales By Region shared dataset to the report, and name it
ResellerSalesByRegion.
Add a reference to the Last Month shared dataset to the report, and name it LastMonth.
Tip: The ResellerSalesByRegion dataset includes parameters for the start and end dates by which the
data in the report will be filtered. You will use the fields returned by the LastMonth dataset to set the
default values for these parameters to the first and last dates of the previous month, respectively.
The ResellerSalesByRegion dataset also includes a multi-valued parameter for the regions that should be
included in the report, and a default value that includes all of the regions has already been defined for
this.
Configure the StartDate and EndDate parameters to use the FirstDate and LastDate fields from the
LastMonth dataset as default values.
Add a table to the report, and use fields from the ResellerSalesByRegion dataset to create a report
that shows sales revenue for each territory, grouped by country and region:
o To create groups in the report, drag fields to the Row Groups pane under the report.
Preview the report and apply formatting until you are satisfied with it.
Verify the project deployment properties before deploying:
o Data sources and datasets in the project will not overwrite existing objects with the same name
on the server.
o Reports in this project will be deployed to an appropriate folder for executive reports based on
the folder structure you defined at the start of this lab.
Deploy the project.
Use Internet Explorer to verify that the Reseller Sales report has been deployed, and that it shows
reseller sales by region with the expected default parameters.
Create a linked report based on the Reseller Sales report for the regional sales manager for Europe:
o Name the linked report Reseller Sales - Europe and save it in an appropriate folder for access by
the regional sales manager for Europe.
o After you save the linked report, edit its properties and override the default Regions parameter
value so that the report shows only sales for Europe, and hide the Regions parameter so that
users cannot change it.
Create a second linked report for the regional sales manager of North America, and a third linked
report for the regional sales manager of the Pacific region:
o Name the linked reports appropriately and save them in appropriate folders.
o Override the Regions parameter so that each report shows only sales for the appropriate region
and cannot be changed.
In this task, you will use Report Builder to create a report based on the template you saved earlier. In a
self-service reporting scenario, you can use this technique to ensure report consistency. However, Report
Builder does not include functionality for template-based authoring, so self-service authors must open an
existing report that serves as the template, and then save the modified report under a different name.
Start Report Builder and open the AWReport template from the folder in the report server where you
deployed it in the previous exercise.
After you open the report, save it as Internet Sales in an appropriate folder for the financial reports
that will be created by self-service reporting authors. Be careful not to overwrite the AWReport
template.
Tip: In a production environment, you could use permissions to ensure that self-service reporting authors
have read-only access to the report template, removing the risk of accidental modification.
Insert the InternetSalesChart report part you published previously into the report:
o Search the report part gallery for “InternetSales” to find the report part, and then double-click it
to insert it.
o You may need to resize the report and chart appropriately after you have added it.
View the report data and note that adding the report part also added the dataset on which the chart
is based, and the parameters defined in that dataset.
Tip: Unlike Report Designer in SQL Server Data Tools, Report Builder can reference data sources and
datasets on the server directly, without requiring you to create a local copy in a project.
Add a dataset named LastMonth that is a reference to the Last Month shared dataset you deployed
to the report server in the previous exercise.
Configure the StartDate and EndDate parameters to use the FirstDate and LastDate fields from the
LastMonth dataset as default values.
Insert a table into the report, under the chart, and use it to show revenue, cost, and profit for each
city, grouped by state or province and country:
o To calculate profit, use the following expression:
=Fields!Revenue.Value - Fields!Cost.Value
Run the report to preview it, and then make any formatting changes you want.
When you are satisfied with the report, save it and close Report Builder.
Use Internet Explorer to view your report in the folder where you deployed it.
Results: At the end of this exercise, you should have created a report from a template, created a linked
report, and used Report Builder to create a report that includes a previously published report part.
Question: What were the key organizational principles you applied when designing the
report server folder structure, and what revisions did you consider when you started to create
and publish report items?
Question: What are likely to be the key challenges in providing a self-service reporting
solution that includes an Analysis Services data model as a data source, and how might you
overcome them?
Module 8
Designing a Microsoft Excel-Based Reporting Solution
Contents:
Module Overview 8-1
Module Overview
Microsoft Excel is used in many organizations around the world, and is familiar to most business users. By
leveraging the data analysis capabilities of Excel, organizations can empower users to explore data models
and other data sources, identify patterns in business data, and improve business decision making. As a
business intelligence (BI) specialist, you must be familiar with the data analysis functionality in Excel, and
be able to determine suitable Excel features for specific business requirements.
Objectives
After completing this module, you will be able to:
Lesson 1
Using Excel for Data Analysis and Reporting
Excel includes comprehensive functionality for displaying and analyzing data. This lesson describes some
of the ways in which users can connect to data sources from Excel and perform data analysis.
Lesson Objectives
After completing this lesson, you will be able to:
Access to external data sources and cloud services. Excel includes support for importing data from
Web pages and cloud services, such as Windows Azure DataMarket and Bing Maps. For these
resources, users will require access to the Internet and, in some cases, a registered account with the
data service provider.
How users will share Excel workbooks. In some scenarios, specialist data analysts will use Excel in
isolation. However, increasingly, organizations rely on users sharing information. In an enterprise
organization, users can share Excel workbooks by publishing them on a Microsoft SharePoint Server
site and using Excel Services to view and interact with the workbooks in a web browser. Additionally,
Microsoft Office 365 enables organizations to publish Excel workbooks in the cloud and interact with
them by using Office web applications.
Note: This module focuses on the data analysis functionality of Excel. Considerations for
sharing workbooks in a SharePoint Server site are discussed in Module 9: Planning a SharePoint
Server BI Solution.
SQL Server databases. Users can also retrieve data from SQL Server relational databases. Most
commonly in a BI solution, this type of access is performed against the data warehouse or a
departmental data mart. You must give consideration to the credentials that users will use to connect
to SQL Server and the permissions they will need. Typically, users access the data warehouse by using
Windows Integrated authentication and require only read access to the data. A common approach is
to grant read access to views instead of to base tables, which enables you to remove metadata
columns that are not required for analysis and include query hints to optimize performance and
concurrency.
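For example, a view similar to the following Transact-SQL sketch exposes only the columns that are useful for analysis and applies a query hint. The table, column, and view names are hypothetical, and whether a hint such as NOLOCK is acceptable depends on the organization's tolerance for reading uncommitted data.

CREATE VIEW dbo.InternetSalesForAnalysis
AS
SELECT OrderDateKey, ProductKey, CustomerKey, SalesAmount, TotalProductCost
FROM dbo.FactInternetSales WITH (NOLOCK); -- omits lineage and audit metadata columns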
Other data sources that can be used to access corporate data include:
OLE DB and ODBC databases. If the organization stores data in databases for which OLE DB or
ODBC drivers are available, users can connect to these data sources from Excel.
Microsoft Access databases. If the organization stores data in a Microsoft Access database, it can be
imported into an Excel worksheet for analysis.
Text files. You can import data from text files, such as comma-separated values (CSV) files. This can be
useful when a data provider is not available for corporate applications that can export data as text.
XML files. You can import data from XML files, which is a common export format for many
applications.
Increasingly, businesses can extend the value of their data analysis by incorporating data from sources
outside of the organization. With Excel, you can import data from the following external sources:
Windows Azure SQL Database. The growth in popularity of cloud services has led many
organizations to use cloud-based database services such as Windows Azure SQL Database. From
Excel, Windows Azure SQL Database is accessed like any other SQL Server data source, except that
only SQL Server authentication is supported.
Windows Azure DataMarket. Users can subscribe to third-party datasets in Windows Azure
DataMarket and use them to obtain useful data that augments corporate data models. For example, a
DataMarket feed containing historical weather statistics for specific geographical areas could be used
to analyze the effect of weather on sales in those regions.
OData Feeds. OData has become a popular format for data feeds from applications and services
across the web. In Excel, you can connect to an OData feed and use it as a source of data for a table
or PivotTable.
Calculated values
One of the key strengths of Excel is its ability to calculate cell values based on formulae. You can use Excel
formulae to do the following (example formulas appear after this list):
Calculate column values based on data in other columns. For example, subtract cost from revenue
to calculate profit.
Calculate aggregate totals and subtotals. For example, use the SUM function to calculate total
revenue.
Look up related data values. For example, use the VLOOKUP function to find a value in another
table based on a key value in the current table.
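For example, the following formulas illustrate the three uses listed above. This is a sketch only; the cell references, the range, and the Products sheet used for the lookup are assumptions about the workbook layout.

Profit for a row: =B2-C2
Total revenue: =SUM(D2:D100)
Product name looked up by key: =VLOOKUP(A2,Products!A:C,3,FALSE)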
Formatting
You can apply formatting to a worksheet to improve its visual appeal, format numbers and dates
appropriately, and emphasize important values. Examples of formatting you can use in an Excel-based
report include:
Font formatting. You can apply font formatting to alter the size, color, and font of the text in an
Excel worksheet.
Number formatting. You can apply number formatting to display values in specific date, currency,
percentage, and other number formats.
Cell and border formatting. You can apply colors to lines and backgrounds to make it easier to
see distinct areas of the report; for example, column headers and subtotal rows.
Conditional formatting. You can use conditional formatting to change the appearance of cells
based on the values they contain. A wide range of conditional formatting options are provided in
Excel, including the ability to color-code cells based on value ranges, display data bars that are
relative to cell values, and add indicator icons to cells based on the values they contain.
Bar charts.
Column charts.
Pie charts.
Scatter charts.
Sparklines.
You can easily create a chart by selecting the data you want to show graphically and inserting the desired
chart type. Additionally, Excel 2013 can suggest appropriate charts based on an analysis of the selected
data.
PivotTables
A PivotTable shows measures aggregated by
dimension attributes and hierarchies. With a
PivotTable, you can interactively:
PivotCharts
PivotTables show dimension attribute member names and measure values in a tabular matrix format,
which can be an effective way to explore the data in a data model. However, many users prefer to
consume data visually, and often a graphical summary can make it easier to see key trends or insights at a
glance. You can use PivotCharts to show aggregated data graphically, and interact with the chart to
explore the data. You can use PivotCharts in isolation, or you can link them to PivotTables so any
interactive exploration of the data in the PivotTable is automatically reflected in all related PivotCharts.
Slicers
Although you can apply filters to PivotTables and PivotCharts, when an attribute supports a manageable
range of possible values, it can be more intuitive to create a slicer for this attribute and filter the data
interactively by selecting the values you want to include in the slicer. Slicers can be linked to one or more
PivotTables or PivotCharts, enabling you to easily view alternative data scenarios in all elements on the
worksheet by selecting or de-selecting a slicer value.
Timelines
Most data analysis involves a time dimension, and often you want to view data for a specific period of
time. Timelines make it easy to select a time range based on a date field in the fact table.
Note: To support timelines, the table containing the measures to be aggregated (typically,
a fact table in a data warehouse) must contain a datetime value.
Lesson 2
PowerPivot for Excel
PowerPivot for Excel enables Excel users to create a tabular data model in the Excel workbook. In
organizations where business users have sufficient experience and skills to build tabular data models
based on data from multiple sources, this capability can significantly enhance the ability of these users to
analyze business data.
Lesson Objectives
After completing this lesson, you will be able to:
4. In the Manage drop-down menu, select COM Add-ins, and then click Go.
Note: Be careful to download the SQL Server 2012 PowerPivot Add-In for Excel 2010, not
the earlier SQL Server 2008 R2 version.
Authoring environment. PowerPivot data models can only be created in Excel. Typically, tabular
data models for Analysis Services are created in SQL Server Data Tools. Additionally, PowerPivot
workbooks can be imported into a tabular data model project in SQL Server Data Tools.
Implicit measures. PowerPivot data models automatically generate implicit measures for numerical
values that can be aggregated. In a tabular data model for Analysis Services, all measures must be
explicitly defined as Data Analysis Expressions (DAX) expressions (see the example after this list).
Linked tables. PowerPivot data models can use worksheets in the Excel workbook as linked tables.
This makes it easy to supplement source tables in a data model with data in the Excel workbook itself.
Analysis Services tabular data models do not support linked tables, because they are not hosted
within an Excel workbook.
Storage format. PowerPivot workbooks use the xVelocity in-memory storage format for data models
exclusively. A tabular data model in Analysis Services can use xVelocity storage or DirectQuery storage
(in which data is retrieved from the underlying source at query time).
Model size. The maximum size for a PowerPivot data model in a 64-bit workbook is 4 GB, with a
much lower limit for 32-bit workbooks. The size of an Analysis Services tabular model is limited only
by the physical resources in the server.
Partitions. The tables in a PowerPivot model cannot be partitioned. Analysis Services tabular data
models support partitioning to help optimize processing of extremely large tables.
Security. To secure a PowerPivot data model, you must restrict access to the workbook in which it is
contained. Analysis Services supports role-based security with granular permissions.
Shared access. A PowerPivot workbook in Excel is designed for personal BI by the Excel user.
Conversely, an Analysis Services data model is designed for multiple concurrent users.
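For example, whereas PowerPivot can implicitly sum a numeric column when it is added to the Values area of a PivotTable, an Analysis Services tabular project requires an explicit DAX measure similar to the following sketch (the table and column names are hypothetical):

Total Revenue:=SUM(InternetSales[SalesAmount])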
Both models can be shared through Microsoft SharePoint Server. To share a PowerPivot data model, the
user must publish the workbook to a SharePoint Server site, where PowerPivot for SharePoint makes the
workbook available to other users through Excel Services. Users can use the workbook interactively in a
browser through Excel Services, create a Power View report from the model defined in the workbook, or
use it as an Analysis Services data source for an Excel PivotTable by specifying the URL for the workbook
in the Data Source Wizard in Excel. You can also create BI Semantic Model (BISM) connections in a
SharePoint document library that reference either a PowerPivot workbook on the SharePoint site or a
tabular database in Analysis Services. BISM connections for either model type can be used as a source for
Power View reports or Excel PivotTables.
User training.
User training
PowerPivot is a relatively new technology, and even users with many years of experience in using Excel
may require some training to help them learn how to create and use data models. User training should
include:
Organization-specific training on how to access data sources and policies for consuming and
distributing data.
Lesson 3
Power View for Excel
Power View provides an intuitive data visualization environment that enables users to graphically explore
data. Power View is available as a component of Reporting Services in SharePoint Integrated mode, or
within an Excel 2013 workbook.
Lesson Objectives
After completing this lesson, you will be able to:
PivotTable that is based on the PowerPivot data model and insert a Power View report that automatically
includes the tables defined in the data model.
To specify how the data should be visualized, the user can select a chart type on the Design tab of the
ribbon. Options include bar charts, column charts, scatter charts, line charts, pie charts, and maps.
Additionally, users can select options on the Power View and Layout tabs of the ribbon to add images
and text boxes, set background images, and specify legend and title options for the charts in the Power
View report. Charts can be displayed in tiles, each containing the same chart for each value in a selected
field, or displayed as vertical and horizontal multiples to enable side-by-side comparison of the same
data for different categories.
To interact with a Power View report, users can click series areas in charts or legends to highlight selected
values. Additionally, charts with related data in the same report are linked, so highlighting a value in one
chart will also highlight it in all other charts in the report. Another type of interaction can be used to
observe changes in data values over time. Users can create a chart that includes a play axis based on a
date or time field, and watch the data change as the play axis progresses through the time values.
User training
Although Power View is an intuitive tool, some
users may require training on how to select
appropriate data in Excel and create a Power View
report from it.
Note: Considerations for using Power View in SharePoint Server are discussed in Module 9:
Planning a SharePoint Server BI Solution.
Objectives
After completing this lab, you will be able to:
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Discuss the reporting requirements in the interviews and agree on the Excel features required to
support them.
Results: At the end of this exercise, you should have a document that contains a list of the required Excel
features.
3. Add Slicers.
4. Add a PivotChart.
Create a data connection to Analysis Services on MIA-SQLBI, and then import the Sales cube from
the AWSalesMD database.
Create a PivotTable on the existing worksheet, leaving blank rows above the table.
o ShipDate.Fiscal Date.
Test the slicers by selecting categories and business types and verifying that the data in the
PivotTable is filtered accordingly. Then clear all filters.
Verify that expanding hierarchies in the PivotTable updates the data in the PivotChart.
Verify that clicking slicer items filters the data in the PivotChart.
Results: At the end of this exercise, you will have an Excel workbook that contains a PivotTable and a
PivotChart based on an Analysis Services cube.
4. Configure Attributes.
5. Create Hierarchies.
6. Test the PowerPivot Data Model.
o Customer
o Date
o InternetSales
o Product
o Promotion
In the Customer table, add a calculated column that returns each customer's birth year (it is used later
as the Birth Year attribute) by using the following DAX expression:
=YEAR([BirthDate])
Mark the Date table as a date table, and configure the MonthName column to be sorted by
MonthNumber.
o The option to mark a table as a date table is on the Design tab of the ribbon.
o The option to specify a sorting column is on the Home tab of the ribbon.
Switch back to diagram view, and then hide and rename columns in each of the tables so that only
the columns listed below are visible.
o To hide multiple columns in a table in diagram view, maximize the table, click the columns you
want to hide while holding the Ctrl key, and then right-click any selected column and click Hide
from Client Tools.
o To rename a column, right-click it and click Rename.
o Subcategory
o Product
Date: Calendar Date
o Year
o Month
o Day
o Promotion Type
o Promotion
o Country
o State Or Province
o City
o Postal Code
Note: If an error message is displayed when saving the workbook in this pre-release version of Excel 2013,
click OK to close it.
In the PowerPivot for Excel – Marketing Analysis.xlsx window, in the ribbon, on the Home tab, in
the PivotTable drop-down list, click PivotTable. Insert a PivotTable into the existing worksheet,
leaving blank rows above the table.
Use the PivotTable to view the Revenue measure in the Internet Sales table by the Products By
Category hierarchy and the Sales Promotion hierarchy.
Format the Revenue measure using an accounting format that shows the values as currency with two
decimal places.
Add the Cars, Children, and Birth Year customer attributes to the filters of the PivotTable.
Insert slicers for the Marital Status and Gender customer attributes, and then hide slicer items with
no data.
Filter the data to show revenue in the past two years from single female customers born after 1970
with no cars or children.
After you finish, save the workbook, ignoring any errors. Close the PowerPivot window, but keep the
workbook open.
Results: At the end of this exercise, you will have an Excel workbook that contains a PowerPivot data
model based on data from the data warehouse.
Note: If a Power View report does not open on the POWER VIEW tab of the ribbon, view Excel options
and remove the Power View COM add-in, and then add it again.
Set the report title to Sales promotion Analysis, and then hide the filters area to maximize your
working area.
o Revenue
o Promotion Type
o Commute Distance
Display the fields as a clustered bar chart that fills the left half of the report. Tile the chart by Year so
that you can click the year headers above the chart to view revenue by promotion type broken down
by commute distance for each year.
In the blank area to the right of the bar chart, add the Revenue and Country fields, and then display
them as a pie chart that fills the top of the right half of the report.
In the blank area under the pie chart, add the Revenue and Cars fields. Display them as a clustered
column chart that fills the bottom of the right half of the report.
Click the Commute Distance legend values to shade all of the charts in the report based on the
selected commute distance.
After you finish exploring the data, save the workbook and close Excel.
Results: At the end of this exercise, you will have an Excel workbook that contains a Power View report
based on a PowerPivot data model.
Question: How might you support the Sales VP’s requirement to visually analyze the
marketing data models in Excel 2010?
Question: What challenges do you think organizations will face when empowering users to
analyze data in Excel?
Module 9
Planning a SharePoint Server BI Solution
Contents:
Module Overview 9-1
Module Overview
SharePoint Server is an increasingly important part of the end-to-end solution for the centralized delivery
of business intelligence (BI) solutions. SharePoint Server provides a platform that makes it easier for
business users to share and collaborate on a wide range of information. Understanding how to enable
SharePoint capabilities within a BI project is increasingly seen by organizations as a valuable skill for a
SQL Server professional to possess.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to SharePoint Server as a BI Platform
SharePoint Server can play an integral part in a SQL Server 2012 business intelligence solution. After you
establish that the business reporting requirements require a centralized platform, there are
considerations that should be addressed before setting up SharePoint Server.
Lesson Objectives
After completing this lesson, you will be able to:
As a result, SharePoint Server can address previous concerns expressed by BI professionals by providing a
single location for the storage of reports. The added benefit is that SharePoint Server provides versioning
capabilities that enable business users to browse older versions of the same documents. This capability
enables easier and greater collaboration between business users in a consistent environment while
providing a platform that can be centrally managed by IT professionals.
In the context of a BI project plan, SharePoint Server has to be considered in two areas:
Note: For more information about reporting and data analysis requirements, see Module 6:
Planning a BI Delivery Solution.
Appliance. The Microsoft Business Decision Appliance contains a pre-built SharePoint 2010 farm with
support for Reporting Services. This only requires you to plug in the appliance and start it. Note that
at the time of writing, PowerPivot is not available on the Business Decision appliance.
Self-build. You can manually install and configure SharePoint Server on a dedicated Windows
system. This enables you to customize the solution for the business. This module will focus on
SharePoint from a self-build perspective. The web front end layer and the application layer can also
be virtualized using Windows Hyper-V.
Cloud. Office 365 provides SharePoint capabilities and is offered as a service to which your
organization can subscribe. This solution is useful to organizations that do not have the expertise
to implement a full SharePoint environment.
When planning a SharePoint Server BI solution, there are three tiers of the architecture to consider:
Web front-end tier. One or more servers that are used to accept requests for a SharePoint
application/service and direct the request to the appropriate application server.
Application tier. One or more servers that host the service applications in the SharePoint Server
infrastructure.
Data tier. A SQL Server instance, which can be clustered, that hosts SharePoint databases.
Each tier can contain multiple servers to meet the business requirements for performance, scalability,
and/or availability. The key point is that all servers must belong to the same SharePoint farm: a logical
grouping of servers that provides the infrastructure for a SharePoint Server solution.
SharePoint terminology
Before you start to plan a BI solution that uses SharePoint Server, it is important to understand the core
components and terminology of a SharePoint Server solution. The following list defines some important
SharePoint Server concepts:
SharePoint Server farm. A farm is a collection of servers that work together to provide SharePoint
services. Each server in the farm hosts one or more SharePoint Server components, and the entire
farm constitutes a logical container for all of the SharePoint services provided by those servers and
the core unit of administration.
SharePoint databases. SharePoint Server is primarily a platform for publishing and collaborating on
content. The content in a SharePoint site, together with farm configuration data and application
settings, is stored in one or more SQL Server databases.
Service applications. SharePoint Server provides an extensible platform that can deliver a broad
range of services. Each service is encapsulated in an application, which can be hosted on one or more
application servers in the SharePoint Server farm.
Web applications. SharePoint web applications are Internet Information Services (IIS) applications
where users can consume SharePoint Server services. The services available in a specific web
application are determined by associating application services in the farm with the web application.
Site collection. A site collection, as its name suggests, is a collection of SharePoint sites hosted in a
web application. You can use a site collection as a central unit of management and configuration for
multiple sites. SharePoint Server supports site collection features, which can be enabled or disabled at
the site collection level.
Site. A site is a container for related content, and provides a specific endpoint to which users can
browse. Sites inherit the features of their parent site collection, and each site has site features that can
be enabled or disabled on a site by site basis.
Apps. The content on a site is delivered through visual elements, which are known as apps.
SharePoint Server includes apps, such as document libraries and lists, which you can use to create the
user interface for the site. Additionally, service applications and third-party software developers can
provide additional apps.
Subsites. In many cases, you can deliver all of the content you need to in a site. However, you can
also group related content into subsites under a parent site. Subsites inherit the features of their
parent site.
Single Server
In this farm topology, all of the SharePoint Server
architecture layers are hosted on a single Windows
Server. The main benefit of this model is that the
licensing cost for the solution is minimized.
Typically, this type of configuration is found in
development environments or training environments, and provides the easiest setup of a SharePoint farm.
However, because all three layers run on the same Windows server and share the same hardware, there
can be increased contention of resources. If this affects the performance of business reports, consider
implementing a scale out solution.
Scale Out
In this farm topology, each of the SharePoint Server architecture layers is separated onto different
Windows servers. Scaling out a SharePoint farm distributes the workload across multiple servers―this
reduces contention on a single server, which improves throughput. This does come with an additional
licensing cost. Furthermore, there is more infrastructure preparation required to manage security
seamlessly across the SharePoint farm. There is also no resilience if one of the servers shuts down. Scale
out should be considered as a valid topology if performance is important without the need for resilience.
High Availability
In this farm topology, the SharePoint architecture layers are separated across Windows servers, and then
each layer is duplicated onto another server. This is the most expensive topology to implement, but it
provides load balancing and high availability across the entire SharePoint farm. The infrastructure
preparation is similar to that of creating a scale out architecture. However, a Network Load Balancer is
also required to distribute incoming requests to the first available web front-end server.
Excel Services
After the site is defined, configure Excel Services. This should be done for two reasons: it enables your
business users to share and collaborate on Excel files, and it is a prerequisite service for PowerPivot
for SharePoint. Excel Services is configured on the application layer servers within a SharePoint farm.
The Claims to Windows Token Service supports the conversion between claims-based tokens and
Windows authentication tokens in the SharePoint farm: incoming Windows authentication tokens are
converted to claims-based tokens, and outgoing traffic from the SharePoint farm is converted from a
claims-based token back to a Windows authentication token so that back-end data sources can be
accessed. Because this service deals with the sensitive task of handling authentication tickets, the account
that runs this service should be added to the Local Administrators group on the server on which it is hosted.
Additionally, Local Security Policy configuration is required in Windows to enable the following rights:
Log on as a service.
For a BI implementation of a SharePoint farm, the Claims to Windows Token Service must be configured
on the same server on which Excel Services is installed.
PerformancePoint Services
PerformancePoint Services is installed as part of SharePoint Setup on the application servers on the
SharePoint farm. In the context of a BI solution, selecting the Business Intelligence Center template when
creating a site means that only minimal configuration is required within Central Administration to start
this service.
Additionally, the following services are recommended because they add value to the mandatory BI
services:
Search service
As more documents are added to the SharePoint farm, it can become cumbersome to manually search for
documents. Enabling the SharePoint Search service will catalog the content of a SharePoint site so that
the Search feature can be used to quickly retrieve documents.
SharePoint logging
Logging is an extremely useful component of SharePoint to enable, particularly during the installation and
configuration of the SharePoint farm. Any errors within the configuration are reported to error log
files known as Unified Logging Service (ULS) log files. These are located in the C:\Program Files\Common
Files\Microsoft Shared\Web Server Extensions\15\LOGS folder. If there is a problem in the SharePoint
farm, open the latest file and perform a search for the word “error”.
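For example, the following Windows PowerShell sketch (which assumes the default log location shown above) searches the most recent ULS log file for the word "error":

Get-ChildItem "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\LOGS\*.log" |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1 |
    Select-String -Pattern "error"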
After a site or a subsite is created, depending on the template that is selected during the site creation, a
site structure will be created with specific apps and features enabled. When you use the Business
Intelligence Center template, default folders are created for PerformancePoint data sources and
dashboards, but there is no default folder structure for PowerPivot files or reports. However, you can add
document libraries and PowerPivot Gallery pages to support them.
When creating a site or subsite, use the following guidelines:
Keep the site structure simple and easy-to-use for business users.
Before you can create a subsite in a SharePoint Server site, you must activate:
Lesson 2
Planning Security for a SharePoint Server BI Solution
An important aspect of planning a SharePoint farm is security. This requires choosing an appropriate
authentication and user identity model that can ensure that the site is secure, and if necessary that a user
can be audited. This can involve additional considerations when implementing a Scale Out or High
Availability topology to ensure that credentials are correctly identified when connecting to back end data
sources through a SharePoint farm.
Lesson Objectives
After completing this lesson, you will be able to:
Windows authentication.
Forms-based authentication.
Claims Based Authentication provides support for Windows and third-party authentication protocols and
directory providers in multi-vendor environments. Claims Based Authentication in Windows is built on the
Windows Identity Foundation (WIF).
NTLM
Kerberos
Anonymous
Basic
Digest
NTLM, Kerberos, and Anonymous are configured through Active Directory; Basic and Digest are
configured using Internet Information Services (IIS). Kerberos is a common protocol that is used within
organizations because it has the ability to delegate client credentials to access back-end data sources. This
is an important capability when you need to audit individual user access to a back-end data source.
<Port> is the port on which the web application will be created in IIS.
Introduction to Kerberos
Kerberos is an authentication protocol that is
designed to provide a single sign-on environment.
A client session authenticates against a domain
controller in a domain, which issues a session
ticket if the correct user name and password are
supplied. The session presents this ticket to
network resources, such as SQL Server or a file
server, to get access.
Delegation
Kerberos delegation is the process of giving an Active Directory account permissions to perform a task. An
example could be the ability to pass a set of credentials to another user account.
Impersonation
Kerberos impersonation is the process of one account impersonating the credential of another account.
This permission must be delegated for impersonation to work.
If Kerberos is not configured and a user connects to an application such as PowerPivot, Report Builder, or
Power View that accesses data in a back-end database, by default a connection will be made using the
service account of the application. If there is a need to audit access against the back-end database, the
audit will record that the service account accessed the back-end database, and not the user who
requested the report.
If your business requirements include the auditing of individual users’ access to data, Kerberos delegation
and impersonation are required to retain the identity of the user who originally made the request for the
report. The process of maintaining the user’s credentials over two or more connections is referred to as a
“double hop,” and creates a requirement to delegate the rights to authenticate as the original user’s
identity. Kerberos supports this scenario, enabling a user to authenticate using his or her password only
once when logging on to the domain. After that, it is the session ticket that is used to authenticate. As a
result, Kerberos has the ability to delegate control of a user’s session ticket, or even a workstation’s session
ticket.
To retain the identity of the user who originally made the request for the report, the following tasks need
to be performed:
1. You must first represent SQL Server 2012 Business Intelligence applications in SharePoint Server
as objects within Active Directory so that they can be secured.
2. You must use delegation to enable the service accounts used by the BI applications to impersonate a
user against a back-end server.
Before running SetSPN, use the following guidelines to determine the required information.
Other services can have a custom name of your choice. The following table contains some
suggested names to use to make it easier to identify the service in Active Directory:
PowerPivot: SP/PPivot
PerformancePoint: SP/PPS
<service class> denotes the name of the service or application. If it is SQL Server, the <service class>
is MSSQLSvc. Analysis Services is MSOLAPSvc.3, and Reporting Services would be HTTP.
<host> is the fully qualified domain name or NetBIOS name of the computer on which the application is
running. The recommended practice is that each application should have two entries: one for the fully
qualified domain name and one for the NetBIOS name.
<port> is optional and is used to define the port on which the service is running. This should be used
when multiple instances of an application are running.
<service account name> is the service account that is defined for the application.
For example, if there is a default SQL Server instance running on a computer named AWSQL.AW.Local
under the service account AW\SQLService, two SPNs would be registered as follows:
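A sketch of those commands, using the setspn -S option (which checks that the SPN is not already registered before adding it), would look similar to the following. The default instance is assumed to be listening on port 1433.

setspn -S MSSQLSvc/AWSQL.AW.Local:1433 AW\SQLService
setspn -S MSSQLSvc/AWSQL:1433 AW\SQLService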
If an Excel Services service is running on the same computer using the service account AW\ExcelService, its
SPN could be set with the following code.
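Excel Services does not have a fixed service class, so a custom name of your choice is used. Following the same convention as the suggested names in the table above, and assuming a hypothetical name of SP/ExcelServices, the command might be:

setspn -S SP/ExcelServices AW\ExcelService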
These code samples register service principal names in Active Directory that can then be delegated. Now
that you’ve registered the SPNs, you should create a list of fully qualified domain names and associated
service accounts for any applications that will be subjected to the double hop issue and plan the
delegation configuration required to pass a user’s credential from one service to another.
Lesson 3
You will also have to enable the Reporting Services content type for the SharePoint document libraries
where you want to publish reports. The content type required will be based on the reporting requirements
that have been gathered in the business requirements phase.
Lesson Objectives
After completing this lesson, you will be able to:
2. Run the SharePoint setup.exe on every server in the farm. Note that on the first server you will define
the location of the back-end database server and create the farm with a dedicated user account and
passphrase. This passphrase is then used to join remaining servers to the same farm.
Install the Reporting Services add-in for SharePoint Server on web front-end servers
After installation is complete, on the web front-end servers, use the SQL Server 2012 installation media to
install the Reporting Services add-in for SharePoint Server. In a single server deployment, the Reporting
Services add-in for SharePoint Server is installed with Reporting Services. In a Scale Out or High
Availability farm topology, you must install the Reporting Services add-in for SharePoint Server separately.
2. Start the Reporting Services service. Reporting Services must then be started in the SharePoint
farm to complete the configuration.
3. Create the Reporting Services application pool. Ideally, each application should run within its own
application pool with its own managed account to ease security and maintainability.
4. Configure service settings. Reporting Services has configuration settings that you can use to control
its behavior, such as the SMTP server to be used for sending subscriptions by email.
5. Enable Reporting Services and Power View in SharePoint sites. The final step is to enable
Reporting Services as a feature within sites where you want to be able to publish and view reports.
Additionally, you can add the BISM Connection content type if you want to enable the creation of
connections to tabular data models for Power View and Excel.
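These steps can also be scripted. The following Windows PowerShell sketch shows one possible sequence; the service account, application pool name, database server, and database names are assumptions, and the exact sequence should be validated against the SQL Server documentation for your environment.

# Register and start the Reporting Services shared service on the application server
Install-SPRSService
Install-SPRSServiceProxy
Get-SPServiceInstance | Where-Object { $_.TypeName -like "*Reporting Services*" } | Start-SPServiceInstance

# Create an application pool, and then create the Reporting Services service application and proxy
$appPool = New-SPServiceApplicationPool -Name "Reporting Services App Pool" -Account (Get-SPManagedAccount "ADVENTUREWORKS\ServiceAcct")
$rs = New-SPRSServiceApplication -Name "Reporting Services" -ApplicationPool $appPool -DatabaseServer "MIA-SQLBI" -DatabaseName "ReportingService"
New-SPRSServiceApplicationProxy -Name "Reporting Services Proxy" -ServiceApplication $rs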
Enable only the content types that the business requires. For example, many organizations will not enable
the Report Builder Model content type because users will connect directly to data sources using the
Report Data source content type. It is important that you are led by the reporting requirements that were
determined during the business analysis phase of the project.
Before content types can be added to a site, the ability to customize a library by adding content types
must be enabled. This is done by a SharePoint Server farm administrator, after which content types can
be added to a SharePoint Server site.
Lesson 4
Planning PowerPivot Configuration
The installation and configuration of PowerPivot is similar to that of Reporting Services, with the location
of the PowerPivot for SharePoint installation determined by the farm topology that is selected. However,
after it is installed, additional consideration is required for managing data refresh of PowerPivot files and
monitoring PowerPivot activity.
Lesson Objectives
After completing this lesson, you will be able to:
1. Run the SharePoint Server Prerequisite installer on every server in the farm.
2. Run the SharePoint Server setup.exe on every server in the farm. Note that on the first server you will
define the location of the SharePoint Server farm database server and create the farm with a
dedicated user account and passphrase. This passphrase is then used to join remaining servers to the
same farm.
Allow users to specify Windows credentials. This approach enables users to specify a Windows
user name and password when configuring data refresh options for a workbook. These credentials are
then used for scheduled data refreshes. This approach has the advantage of requiring minimal
administrative configuration, but it can result in a difficult-to-manage environment where credentials
for data access are defined in multiple places, and may need to be changed if a user leaves the
organization or their data access privileges change. You can disable this option in the settings for the
PowerPivot service application in SharePoint Central Administration.
Use the unattended PowerPivot data refresh account. When configuring PowerPivot, you can
specify credentials to be used by an unattended PowerPivot data refresh account. This enables you to
create a single account that can be used for data refresh connections for all workbooks. The benefit
of this approach is that a single service account for data access authentication can be centrally
managed and granted the required permissions in all data sources. However, this approach cannot be
used for data sources that do not support Windows authentication.
Use custom credentials that are saved in the secure storage service. The secure storage service in
SharePoint Server 2013 provides a mechanism for credentials to be stored securely and associated
with an application name that is used to look up the credentials when required. This enables users to
use the credentials without having to know the user name or password. The benefit of using this
approach is that it can be used to store both Windows and non-Windows credentials. Additionally, it
enables you to create multiple credentials for data refresh to facilitate finer-grained auditing than a
single service account can accommodate.
BISM Connections
The Business Intelligence Semantic Model (BISM)
has its own content type within SharePoint Server.
This enables you to define connection information
to Analysis Services directly from SharePoint
Server. After the connection information is defined, you can create PowerPivot and Power View reports directly from a BISM connection that is defined within a SharePoint library.
First, the BISM connection should be enabled as a
content type within the SharePoint Library. After
the content type is enabled, perform the following
steps to define a BISM connection:
4. In the File text box, type in a name for the BISM connection file, and optionally, add a description
under Description.
5. In the Workbook URL or Server Name text box, type in the URL of the Excel file or the name or IP
address of a tabular instance of Analysis Services.
6. In the Database text box, type in a tabular database that is currently available on the server.
7. Click OK.
After the BISM connection file is defined, click the down arrow on the BISM file, and then select Create
Power View Report to create a Power View report or Launch Excel to create a PowerPivot report.
Monitoring PowerPivot
SharePoint Server provides the capability to
monitor the PowerPivot activity that is occurring
on a SharePoint farm. Using the PowerPivot
Management Dashboard, you can establish the
following metrics about the PowerPivot instance and
its workbooks:
Quality of Service. Provides metrics for query response time when retrieving PowerPivot reports.
The PowerPivot Management Dashboard provides you with information that will enable you to:
Manage popular PowerPivot reports so they are optimized for query response times.
Take appropriate action on hardware should the CPU or memory be under pressure.
Understand reporting patterns that enable the conversion of self-service reports to standard reports.
Lesson 5
Planning for PerformancePoint Services
PerformancePoint Services is a service that is available within SharePoint Server Enterprise edition, which
enables the creation of highly visual reports without the need to install SQL Server. However, adding this
capability side-by-side with SQL Server BI components will provide a complete business intelligence
solution within the organization.
Lesson Objectives
After completing this lesson, you will be able to:
PerformancePoint Services is available in SharePoint Enterprise edition only, and is automatically enabled
when the Business Intelligence template is selected when creating a site. It provides a graphical
environment designed to make it easy for users to create dashboards and reduces the development time
required to create a report compared to tools such as Reporting Services. However, Reporting Services
provides more flexibility in defining the layout of a report or dashboard. PerformancePoint Services
dashboards have fixed layout, and although a choice of layouts is presented in the Dashboard Designer, it
is not as flexible as Reporting Services.
Combining PerformancePoint Services with technologies such as Reporting Services and PowerPivot gives
the business more versatility in creating business intelligence reports. This increases the capability that the
business has to create a wide range of reports for multiple audiences.
Dashboard Designer
The Dashboard Designer is the primary tool used
by end users to create a PerformancePoint report,
scorecard, KPI, or dashboard. For the first time in
SharePoint 2013, it is available within the
SharePoint site on the SharePoint ribbon, and is
found on the PerformancePoint tab.
Ensure that the account used to connect to the data source has permission to read the data.
If Kerberos delegation and impersonation are used, ensure that users’ credentials are successfully
presented to the data source.
Use the business requirements to determine the report type of PerformancePoint reports.
Ensure that users are educated on how to use Dashboard Designer to reduce the number of support
calls.
Editor. This tab is where the server name, an authentication mechanism, and a database name are defined. You can use one of three authentication mechanisms:
Unattended service account. A predefined account that connects using a dedicated connection.
Stored account. A target application defined in the Secure Store Service that has a dedicated
connection configured.
Per-user identity. The credential of the user using a data source is used.
Properties. This tab allows you to provide a name and a description for the data source. A responsible person can be specified by providing the email address of an individual or a group. Finally, a display folder setting enables you to specify a folder name for storing the data source.
Time. This tab enables you to define time mappings. Time mapping involves selecting a table in a
data source that contains a time-based hierarchy, from which an attribute in a dimension is mapped
to a time property in the PerformancePoint data source. For example, the Year property could be
mapped to a column within the data source named CalendarYear, and Month could be mapped to
MonthName. By providing this information, PerformancePoint is informed how the time-based data
is defined, and can aggregate data over time periods.
KPIs
KPIs are visual objects that measure numeric metrics against a target. In PerformancePoint
Services, you can use the Dashboard Designer to
create KPIs that compare an actual value against a
target value. Both the actual and target values are
associated with data sources and formulae, and
can be formatted with an appropriate number
format. You can also define threshold percentages
that determine the icons to use when comparing
the actual value to the target value.
For example, a business requirement might be to track sales revenue performance with a year-on-year
growth target of 10 percent. To support this requirement, you might create a KPI with the following
characteristics:
The actual value uses a data source that applies the YearToDate time-intelligence function to the
sales revenue measure. This results in a figure that shows the sales revenue for the current year so far.
The target value uses the same data source to create a variable named LastYearSales that is based on
the formula YearToDate-1, which returns the sales revenue figure for the year-to-date period in the
previous year. The target value is then defined as LastYearSales * 1.1 (in other words, last year’s
revenue to date plus 10 percent).
Both values are formatted as currency.
Threshold percentages determine the indicator icon that is displayed:
o A red indicator if the actual value is less than 75 percent of the target value.
o A yellow indicator if the actual value is between 75 and 90 percent of the target value.
o A green indicator if the actual value is above 90 percent of the target value.
Reports
Reports provide an interactive, graphical representation of data that can be displayed on a dashboard
page. You can create many kinds of report with PerformancePoint Services, including:
Strategy Map.
Decomposition Tree.
Additionally, you can show SQL Server Reporting Services reports and Excel Services reports in a PerformancePoint dashboard.
Dashboard Designer provides a graphical report design environment in which you can create reports by
dragging measures and dimension hierarchies from an Analysis Services data source. One of the key
benefits of using PerformancePoint Services to create reports is that reports automatically provide drill-
down interactivity. For example, you might create a report containing a pie chart that shows sales revenue
by product category based on a dimension hierarchy that includes category, subcategory, and product
levels. When the report is displayed on a SharePoint Server site, users can click the pie segment for a
particular category, and the chart will be redrawn to show sales for subcategories in that category.
Clicking a subcategory segment redraws the chart to show sales for products in that subcategory.
Scorecards
A scorecard is a collection of KPIs that enables users to drill down into hierarchies to identify specific areas
of the business that are over-performing or under-performing against the target. For example, a
scorecard could show the sales revenue KPI discussed earlier in this topic aggregated by sales region. At
the top level, the scorecard shows sales revenue performance against the target for the company as a
whole, but users can expand the scorecard to view performance for individual sales regions.
A scorecard can contain multiple KPIs that measure different aspects of business performance to provide
an overall view of how the organization is meeting its targets. For example, a scorecard might include KPIs
for sales revenue, profitability, and productivity levels based on hours of continuous operation for plant
machinery. The KPIs in the scorecard can each be weighted to reflect their relative importance to the
overall goals of the business, and a total score can then be calculated. This approach is often referred to as
a balanced scorecard, because it balances multiple factors to provide a high-level view of how the business
is performing.
Dashboards
Dashboards are PerformancePoint components that enable users to bring together multiple PerformancePoint objects in one place. A dashboard page provides preset layouts that allow you to
choose which PerformancePoint object should populate a particular area. There are different page
layouts, including the following:
1 Zone.
2 Columns.
2 Rows.
3 Columns.
3 Rows.
A key benefit of creating a dashboard is that, in addition to a single view of high-level business
performance data, the various data elements on the dashboard can be linked. For example, clicking a
column for product category sales revenue in a column chart might filter a different chart to show
profitability and productivity data for the selected product category.
Objectives
After completing this lab, you will be able to:
Lab Setup
Estimated Time: 90 Minutes
Start 20467A-MIA-DC and 20467A-MIA-SQLBI, and then log on to 20467A-MIA-SQLBI as
ADVENTUREWORKS\Student with the password Pa$$w0rd.
If you are unfamiliar with SharePoint Server, it is highly recommended that you perform the lab using the
lab answer key instead of the high-level steps.
3. Create a Subsite.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
Run Setup.cmd in the D:\Labfiles\Lab09\Starter folder as Administrator.
Tip: To view the site settings, use the Settings menu, which can be accessed from the Settings icon at the
upper-right of the home page next to the name of the currently logged on user.
The subsite should be based on the Business Intelligence Center enterprise template and be
accessible at the following URL:
http://mia-sqlbi/sites/adventureworks/bi
On the Adventure Works Portal home page, in the Quick Launch area on the left, add a link to the
new subsite.
Results: At the end of this exercise, you should have created a subsite based on the Business Intelligence
Center template at http://mia-sqlbi/sites/adventureworks/bi.
This feature is inherited by all subsites of the top-level site where it is activated.
Modify the advanced settings for the AWReports document library to enable management of
content types, and then add the existing Report Builder Report and Report Data Source content
types from the SQL Server Reporting Services Content Types group.
Change the new button order and content type so that the Document content type is no longer
visible, and the default content type is Report Builder Report.
Modify the list name, description, and navigation settings of the document library to display a link to
it in the Quick Launch area.
o TargetServerURL: http://mia-sqlbi/sites/adventureworks/bi
o TargetDatasetFolder: http://mia-sqlbi/sites/adventureworks/bi/awreports/Datasets
o TargetReportFolder: http://mia-sqlbi/sites/adventureworks/bi/awreports/Templates
Verify that the Data Sources, Datasets, and Templates folders are created in the AWReports
document library in the Adventure Works BI Portal site you created earlier.
Edit the data sources that were deployed and configure them to use the following stored Windows
credentials:
o Password: Pa$$w0rd
After you open the report, save it as Internet Sales in the Self-Service Reports folder. Be careful not
to overwrite the AWReport template.
Configure the StartDate and EndDate report parameters to use the FirstDate and LastDate fields
from the LastMonth dataset as default values.
Insert a table into the report, below the chart, and use it to show revenue, cost, and profit for each
city, grouped by state or province and country.
o To calculate profit, use the following expression:
=Fields!Revenue.Value - Fields!Cost.Value
Run the report to preview it, and then make any formatting changes you want.
After you are satisfied with the report, save it and close Report Builder.
Use Internet Explorer to view your report in the Self-Service Reports folder.
Results: At the end of this exercise, you will have published Reporting Services reports to the BI subsite
and verified that self-service reporting is supported.
Modify the list name, description, and navigation settings of the document library to display a link to
it in the Quick Launch area.
Verify that you can view the PowerPivot workbook in Excel Services within the SharePoint site.
Edit the service application associations for the SharePoint-80 web application to add the Secure
Store Service service.
o This enables the default Web site and its subsites to use credentials in the secure store service.
o If you do not associate the secure store service with a site where PowerPivot workbooks are
hosted, users will not be able to configure data refresh for the workbooks.
View the configuration settings for the PowerPivot service application, and note the default data
refresh settings.
o Review the default settings for data refresh and note that the unattended data refresh credentials
in the secure store service are used.
o The dashboard may not contain any data because the job to process the dashboard data may not
have been run.
Review the timer job descriptions, and then run the job named PowerPivot Management
Dashboard Processing Timer Job.
After the job runs successfully, view the PowerPivot Management Dashboard again and explore
the charts it contains.
Results: At the end of this exercise, you will have a PowerPivot Gallery that contains a published
PowerPivot workbook.
3. Create a KPI.
4. Create a Report.
5. Create a Scorecard.
6. Create a Dashboard.
o Password: Pa$$w0rd
Create a data source named AWSalesMD that connects to the Sales cube in the AWSalesMD
Analysis Services database on MIA-SQLBI.
Configure the AWSalesMD data source to use a time dimension based on the Fiscal Date hierarchy
in the Ship Date dimension.
Specify July 1 of the most recent fiscal year in the cube as the start date for the time dimension, and specify that this reference member is at the Day level of the hierarchy.
Map the reference member above to the same date in the data source.
Map the following dimension attributes to the time hierarchy levels in the data sources:
o Date: Day.
o An actual value named YTD that is based on the Reseller Revenue measure filtered by the
YearToDate time intelligence function.
o A target value named Target that is based on a calculated metric in which the Reseller Revenue
measure filtered by the YearToDate-1 function is multiplied by 1.25.
Both the target and actual values should be formatted as currency, and the following threshold values
should be used to determine the KPI value:
o Best: 120%
o Threshold 2: 90%
o Threshold 1: 50%
o Worst: 0%
Deploy the dashboard to the Dashboards folder in the Adventure Works BI Portal site.
Make the dashboard the home page for the Adventure Works BI Portal site.
Explore the dashboard and verify that the chart and scorecard provide interactive functionality.
Results: At the end of this exercise, you will have created four PerformancePoint reports on the
SharePoint site.
Question: What is the benefit of creating a subsite in SharePoint Server for storing business
intelligence content? Is it a mandatory process?
The module then concluded with an exploration of the capabilities of Reporting Services, PowerPivot, and
PerformancePoint Services in a SharePoint farm and how this could be centralized in a single subsite to
provide a one-stop shop for an organization’s BI platform.
Question: Now that you are familiar with the capabilities that SharePoint Server brings to a
BI project, what considerations would there be for implementing SharePoint as part of a BI
project in your organization?
Module 10
Monitoring and Optimizing a BI Solution
Contents:
Module Overview 10-1
Module Overview
After an organization implements a business intelligence (BI) solution, the key components of the solution
need to be monitored to ensure the ongoing health of the solution and to troubleshoot and resolve
performance problems. This module discusses monitoring tools and techniques for the main SQL Server
services in a BI solution, and provides guidance on how to troubleshoot problems and optimize
performance.
Objectives
After completing this module, you will be able to:
Describe key considerations for monitoring a BI solution.
Lesson 1
Overview of BI Monitoring
Performance monitoring and optimization is a critical consideration in a BI solution, and should be
considered from the start of the project. This lesson describes considerations for health and performance
monitoring, and emphasizes the importance of creating a performance baseline against which future
measurements can be compared.
Lesson Objectives
After completing this lesson, you will be able to:
Note: The remainder of this module focuses on performance monitoring and optimization
of the data warehouse, Analysis Services, and Reporting Services. Information about monitoring
SQL Server Integration Services package execution is provided in Module 11: Planning BI
Operations. Some details of health monitoring for BI services in a SharePoint Server farm were
discussed in Module 9: Planning a SharePoint Server BI Solution. For courses that provide broader
coverage of performance monitoring and optimization for SharePoint Server in general, refer to
the Microsoft Learning catalog at www.microsoft.com/learning.
What to Monitor
Like any complex IT system, a BI solution requires
monitoring and maintenance to ensure it
continues to perform efficiently and effectively.
Broadly, there are two kinds of monitoring
required for a BI solution: health monitoring and
performance troubleshooting.
Health monitoring
Health monitoring is the ongoing review of
hardware utilization for the key components of
the BI solution. Usage of CPU, memory, disk, and
network resources during typical workloads can be
gathered and monitored for any changes that
might indicate a problem. This is similar to a physical check-up with a doctor, where blood pressure, heart
rate, and other vital signs are periodically measured to evaluate the health of an individual and detect any
potential problems early.
Performance troubleshooting
Performance troubleshooting is the diagnosis of a perceived problem with the performance of the BI
solution. Typically, performance troubleshooting occurs in response to a symptom that has been detected
by a user or through health monitoring. A performance problem usually relates to a degradation in
response time (amount of time the system takes to perform a specific task) or throughput (the number of
concurrent activities the system can support).
The goal of baseline monitoring is to gain an understanding of the average utilization during each
workload period of the following hardware resources:
CPU
Memory
Disk
Network
A common technique for establishing baseline resource utilization metrics is to use Performance Monitor
in the Computer Management console to create a data collector set that includes counters for the
resource utilization you want to include in the baseline. Each SQL Server service provides counters that
provide detailed information about how the resources are used by the service. In addition to these
service-specific counters, most baseline data collector sets include general system metrics such as the
following counters:
Processor: % Processor Time. This counter indicates the percentage of time that the CPU is utilized.
Memory: Available Mbytes. This counter indicates the amount of memory in megabytes available in
the system.
Paging File: % Usage. This counter indicates the percentage of the paging file on disk that is currently in use.
System: Processor Queue Length. This counter indicates the number of requests waiting for an
available execution thread on the processor.
After defining the data collector sets, you must run them for sufficient periods of time to gather
meaningful data. You can then view the counter values that you have recorded as a graphical or text-
based report in Performance Monitor, or you can export them to a comma-separated values (CSV) file for
further viewing and analysis in Microsoft Excel. These reports should then be retained as a baseline set of
metrics with which future measurements can be compared.
Lesson 2
Monitoring and Optimizing the Data Warehouse
The data warehouse is at the heart of the BI solution, and provides the basis for analytical and reporting
activity. Understanding how to monitor, troubleshoot, and optimize performance in a data warehouse is
an important part of planning a BI solution, and this lesson discusses key considerations and techniques
that you should employ.
Lesson Objectives
After completing this lesson, you will be able to:
Data model processing. This usually occurs after each ETL data load to refresh multidimensional and
tabular data models that are based on the data warehouse tables.
Report queries. These are performed when data is retrieved from data warehouse tables to create a
report or a user-defined data model in Excel.
Operational activities. These are scheduled operations such as index maintenance or database
backups.
Note: Generally, SQL Server configuration settings are best left at their default values unless
you have specific, validated reasons to change them. For information about configuring SQL
Server settings, see “How to determine proper SQL Server configuration settings” at
http://support.microsoft.com/kb/319942.
Database Engine Tuning Advisor. The database engine tuning advisor can use a SQL Server Profiler
trace or query plan cache to evaluate the physical data structures in a database against a specific
query workload, and recommend schema modifications for table partitioning, indexes, and statistics.
Data Collector. The data collector provides a performance management framework for SQL Server
instances. A central management data warehouse is created, and data collection is enabled on each
instance of SQL Server to be monitored. SQL Server Agent jobs on each server then record
performance-related data at regular intervals and upload it to the management data warehouse.
Database administrators can then use a centralized set of reports to view database server
performance and health data across the data center.
Dynamic Management Views. SQL Server provides dynamic management views (DMVs) that you
can query to obtain system performance data. By using DMVs, you can create a custom monitoring
solution that gathers the statistics that are most relevant to your specific data warehouse workloads
and performance priorities.
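For example, a simple baseline query (a sketch only; the DMVs and columns you monitor should reflect your own workloads) might capture the wait types that queries in the data warehouse spend the most time waiting on:
-- Sketch: list the top waits on the data warehouse instance.
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;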
Resource Governor provides a way to define resource pools with scoped constraints that define the
system resources available to them. You can then define workload groups in each resource pool with
specific priorities in terms of how they can use the resources in the pool. Finally, you can define a classifier
function that is used to determine the workload group in which a specific session should be executed.
For example, you could use Transact-SQL code similar to the following to create resource pools for low-priority and high-priority activities.
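A minimal sketch of such resource pool definitions might look like this; the pool names and the CPU and memory limits shown are assumptions based on the values used in the lab later in this module.
-- Sketch only: pool names and limits are assumptions taken from the lab in this module.
CREATE RESOURCE POOL [Low Priority]
WITH (MIN_CPU_PERCENT = 0, MAX_CPU_PERCENT = 50,
      MIN_MEMORY_PERCENT = 0, MAX_MEMORY_PERCENT = 50);

CREATE RESOURCE POOL [High Priority]
WITH (MIN_CPU_PERCENT = 20, MAX_CPU_PERCENT = 90,
      MIN_MEMORY_PERCENT = 20, MAX_MEMORY_PERCENT = 90);
GO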
Next, you can create workload groups for ETL operations and user queries, and assign them to the appropriate resource pools, with specific restrictions on the resources in each pool that they can use.
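A sketch of such workload groups, again assuming the group and pool names used in the lab later in this module, might look like this:
-- Sketch only: group names, pool names, and limits are assumptions.
CREATE WORKLOAD GROUP [User Queries]
WITH (IMPORTANCE = LOW,
      GROUP_MAX_REQUESTS = 10,
      REQUEST_MAX_CPU_TIME_SEC = 50,
      REQUEST_MAX_MEMORY_GRANT_PERCENT = 50,
      REQUEST_MEMORY_GRANT_TIMEOUT_SEC = 20,
      MAX_DOP = 1)
USING [Low Priority];

CREATE WORKLOAD GROUP [ETL]
WITH (IMPORTANCE = HIGH)
USING [High Priority];
GO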
Then you could create a classifier function that determines which workload group each session belongs in.
In this example, the ETL process is identified as being run from an application named “SQL Server” (which
is the name used by SQL Server Integration Services) using the user account
ADVENTUREWORKS\ServiceAcct between 1:00 A.M. and 2:00 A.M. on the first day of the month.
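A sketch of a classifier function that reflects this description might look like the following; the function name is an assumption, and the lab later in this module uses a simpler version that tests only the application name.
USE master;
GO
CREATE FUNCTION dbo.fn_classify_workloads() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname = 'User Queries';
    -- Treat the session as ETL only when it matches the application name,
    -- login, and time window described above.
    IF (APP_NAME() LIKE '%SQL Server%'
        AND SUSER_SNAME() = 'ADVENTUREWORKS\ServiceAcct'
        AND DATEPART(day, GETDATE()) = 1
        AND DATEPART(hour, GETDATE()) = 1)
        SET @grp = 'ETL';
    RETURN @grp;
END
GO
-- Register the classifier function and apply the configuration.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_classify_workloads);
ALTER RESOURCE GOVERNOR RECONFIGURE;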
by SQL Server or another application that could be moved to a different server. If the CPU is being
heavily used by SQL Server, consider adding processors to the server.
2. Review the Paging File: % Usage and Memory: Available Mbytes counters to determine whether
the system is running short of physical memory and having to page memory values to disk. If a
reasonable amount of memory is available, but paging is still occurring, review counters in the
SQLServer:Buffer Manager and SQLServer:Memory Manager objects to check for configuration
issues with SQL Server’s cache settings.
3. If no obvious memory issues are detected, review the counters in the Physical Disk object. A high
amount of I/O might indicate caching problems or high page file activity, or low numbers of reads
per second might indicate a requirement for faster storage or more effective placement of data files.
Additional Reading: For additional troubleshooting tips for data warehouse workloads,
see “Top 10 SQL Server 2005 Performance Issues for Data Warehouse and Reporting
Applications” at http://sqlcat.com/sqlcat/b/top10lists/archive/2007/11/21/top-10-sql-server-
2005-performance-issues-for-data-warehouse-and-reporting-applications.aspx.
Lesson 3
Monitoring and Optimizing Analysis Services
Analysis Services provides a platform for centrally hosted and managed data models. In a BI solution
where business users and applications make frequent use of these data models for analysis, performance
of Analysis Services can be a significant factor in the overall success of the BI solution.
Lesson Objectives
After completing this lesson, you will be able to:
Queries. Queries are submitted by user applications to retrieve data from Analysis Services data
models. The specific details of query execution vary between multidimensional data models and
tabular data models, but in both cases, queries can be broken down into two subprocesses:
o Data retrieval. The data necessary to satisfy the query is extracted from the data model by the
storage engine.
o Calculations. The data is aggregated, sorted, and otherwise manipulated to satisfy the query
requirements by the formula engine.
Note: In addition to the core workloads in the preceding list, Analysis Services also supports
operational tasks, such as backup operations. However, the effect of these operations on
performance is generally not as significant as processing or query execution.
Memory\LowMemoryLimit. This is the minimum amount of memory that Analysis Services requires.
Analysis Services will not release memory below this limit.
Memory\VertiPaqPagingPolicy. With the default value 1, data in the data model can be paged to
disk if the system runs low on physical memory resources. When this value is set to 0, all data in a
tabular model must remain in memory.
Memory\VertiPaqMemoryLimit. This setting specifies the maximum amount of physical memory
that can be used to store an in-memory data model. When Memory\VertiPaqPagingPolicy is set to
0, this setting specifies the maximum size of the data model. When Memory\VertiPaqPagingPolicy
is set to 1, this setting determines the maximum amount of memory for the data model beyond
which it will be paged to disk.
Flight Recorder
By default, Analysis Services uses a feature named Flight Recorder to log server activity into a short-term
log for troubleshooting purposes. This logging can incur a performance overhead, and should be disabled
in production servers unless you are using it to troubleshoot a specific problem. You can disable Flight
Recorder by setting the Log\FlightRecorder\Enabled server property to False.
Additional Reading: For more information about optimizing Analysis Services, see
“Microsoft SQL Server Analysis Services Multidimensional Performance and Operations Guide” at
http://social.technet.microsoft.com/wiki/contents/articles/11608.e-book-gallery-for-microsoft-
technologies.aspx#MicrosoftSQLServerAnalysisServicesMultidimensionalPerformanceandOperatio
nsGuide.
Dynamic Management Views. SQL Server provides dynamic management views (DMVs) that you
can query to obtain information about activity in the server. Analysis Services includes the following
DMVs:
o $system.discover_commands. This view lists all currently running commands on the server.
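For example, you can run a DMV query from an MDX query window in SQL Server Management Studio; the following sketch simply returns the commands that are currently executing on the instance:
-- Sketch: list all commands currently running on the Analysis Services instance.
SELECT * FROM $SYSTEM.DISCOVER_COMMANDS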
Object: MSAS11:Proc Aggregations. Counter: Temp file bytes written. The amount of temporary data written during processing operations. Ideally this should be near to zero.
Object: Memory. Counters: Available Kbytes; Page Faults/sec. The amount of memory available in the server and the number of times memory is paged in from disk.
Object: Network Interface. Counters: Bytes Received/sec; Bytes Sent/sec. Data sent and received through network connections.
Additionally, you can include the Memory Limit Hard KB, Memory Limit High KB, and Memory Limit
Low KB counters from the MSAS11:Memory object to compare memory usage to the limits set in the
server configuration. For an Analysis Services instance in tabular mode, the Memory Limit VertiPaq KB
counter shows the maximum data model memory allocation. To troubleshoot performance problems, you
can also include counters from the MSAS11:Thread object; and the MSAS11:MDX and MSAS11:Storage
Engine Query objects include counters that can be useful when troubleshooting query performance. If
you are troubleshooting I/O problems, you can also include additional counters from the Memory object
to estimate the size of the file system cache as well as the Physical Disk object.
Note: An important tip to follow when viewing graphical data in Performance Monitor is to
ensure that the counters being monitored are viewed at the same scale. You can use the built-in
Scale Selected Counters functionality in the list of counters below the chart, or you can set the
scale individually in the properties of each counter.
2. A query is started.
5. If the data model includes aggregations that satisfy the subcube query, the aggregations are
retrieved. Otherwise, more granular data must be retrieved and aggregated.
6. If the required data is cached, the subcube queries get the data from the cache. Otherwise, the data is
retrieved from the stored dimension or measure group.
7. At the end of each subcube query, the results are passed to the query processor. The query processor
then begins to serialize the results, applying any additional aggregations, sort operations, or other
calculations as required.
9. When the query ends, the results are passed to the user session.
If the process spends more time in the query processor than in the storage engine, consider
optimizing the MDX or DAX query to reduce the number of calculations being performed.
If the query process spends more time in the storage engine than the query processor, consider
creating partitions in the data model, and defining attribute relationships in multidimensional
hierarchies.
If the query process spends more time in the storage engine than the query processor, and data is
seldom retrieved from aggregations, consider optimizing the aggregations in the cube based on
usage.
If the query process spends more time in the storage engine than the query processor, but data is
rarely retrieved from cache, investigate the memory resources, utilization, and configuration.
Lesson 4
Monitoring and Optimizing Reporting Services
Reporting Services provides a platform for delivering reports based on data from data models and the
data warehouse. Business users can use Reporting Services interactively, or receive reports automatically
through subscriptions. This lesson discusses considerations for monitoring and optimizing Reporting
Services performance.
Lesson Objectives
After completing this lesson, you will be able to:
Report rendering. When a user views or exports a report, or the report must be formatted for
delivery as a subscription, Reporting Services uses the appropriate rendering extension to render the
report into the required format.
WorkingSetMinimum. You can add this setting to the RSReportServer.config file to set the minimum
amount of system memory that Reporting Services must have allocated before it will start to release
memory resources. By default, this setting is 60 percent of the memory available on the server. If
Reporting Services is within this value, the level of memory pressure is considered low.
MemorySafetyMargin. You can use this value to specify a percentage of WorkingSetMaximum. If Reporting Services exceeds this amount of memory, the level of memory pressure is considered medium, and Reporting Services begins to reduce memory allocations for lower-priority requests.
MemoryThreshold. You can use this value to specify a percentage of WorkingSetMaximum that is
higher than MemorySafetyMargin. If Reporting Services exceeds this amount of memory, the level
of memory pressure is considered high, and Reporting Services begins to manage requests for
memory aggressively.
WorkingSetMaximum. You can add this setting to the RSReportServer.config file to set the
maximum amount of system memory that Reporting Services can use. By default, this setting is not
included in the RSReportServer.config file, and Reporting Services can access all memory available on
the server.
The following table describes how Reporting Services adapts the priority of reporting tasks based on the
memory thresholds defined by the settings in RSReportServer.config.
viewed in Report Manager. When Reporting Services is deployed in SharePoint Integrated mode, the
MSRS 2011 SharePoint Mode Web Service object provides the same counters.
2: Low
3: Medium
4: High
5: Maximum exceeded
When planning to use caching as a performance optimization technique, consider the following features of cached objects in Reporting Services:
Cached reports are stored in an intermediate format that includes data and layout information, but
must still be rendered to the requested format.
Cached datasets and reports are based on specific parameter value combinations. A cached copy of
the report or dataset is created for each combination of parameter values requested.
You can configure a cached object to expire after a specified interval (in minutes) or at a time
specified in a schedule. Schedules can be specific to an individual cached object, or shared across
multiple objects. When a cached object expires, it is removed from the cache and the next request
results in a new execution with live data.
You can preload a cached object by creating a cache refresh schedule, or by scheduling a subscription
for a cached report with a NULL delivery extension.
Creating snapshots
Cached reports can improve performance, but the cache is volatile and can change when reports, data
sources, or datasets are modified. If you require a more predictable way to generate reports that are
based on the data at a specific point in time, consider using snapshots.
A snapshot is a copy of a report in intermediate format that is created for a specific parameter
combination at a specific time. Snapshots are stored in the Report Server database (by default, named
ReportServer) and can be created on a regularly scheduled basis. By default, each snapshot is replaced
with the new version when the snapshot is created, but you can also create a report history that includes a
set of previous snapshots.
Objectives
After completing this lab, you will be able to:
Ensure that the physical data structures in the database, such as indexes and statistics, are optimal for
the queries being executed.
Manage workload priorities and prevent user queries from adversely affecting data load resource
availability.
Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
o Set up data collection in MIA-SQLDW to log performance data to the management data
warehouse you created.
You can access Performance Monitor in the Computer Management tool. To open this tool in
Windows Server 2012, move the mouse pointer to the lower-left of the taskbar until the Start screen
image appears. Right-click the Start screen image, and then click Computer Management.
For counters that provide multiple instances, select the _Total instance.
Start the data collection set, and then run the LoadDW.dtsx package in the LoadPartition.sln SQL
Server Integration Services project to load the data warehouse.
After the load completes, stop the data collection set and view its latest report.
o SQL:BatchCompleted
o SQL:StmtCompleted
o ApplicationName
o DatabaseName
o Duration
o EndTime
o LoginName
o Reads
o RowCounts
o SPID
o StartTime
o TextData
Apply a column filter so that events are recorded only when the DatabaseName column value is like
%AWDataWarehouse%.
Run the trace, and while it is running, run the RunDWQueries.cmd command file in the
D:\Labfiles\Lab10\Starter folder. This executes a script that runs queries in the data warehouse for
more than a minute.
After the command file finishes, stop the SQL Server Profiler trace and view the events it recorded.
Use the Database Engine Tuning Advisor tool to analyze the trace file you recorded in the
AWDataWarehouse database, and generate recommendations for indexes and views on the
assumption that aligned partitioning should be used and existing aligned partitioning structures
should be retained.
Create the following resource pools:
o Name: Low Priority. Minimum CPU %: 0. Maximum CPU %: 50. Minimum Memory %: 0. Maximum Memory %: 50.
o Name: High Priority. Minimum CPU %: 20. Maximum CPU %: 90. Minimum Memory %: 20. Maximum Memory %: 90.
Create the following workload group for the Low Priority resource pool.
o Name: User Queries. Importance: Low. Maximum Requests: 10. CPU Time (sec): 50. Memory Grant %: 50. Memory Grant Time-out (sec): 20. Degree of Parallelism: 1.
Create the following workload group for the High Priority resource pool.
o Columns: Name, Importance, Maximum Requests, CPU Time (sec), Memory Grant %, Memory Grant Time-out (sec), Degree of Parallelism.
Use the following Transact-SQL code to create a classifier function named dbo.fn_classify_apps that
returns the string “User Queries” if the application name in the current session is “SQLCMD”; or “ETL”
if the current application is named “SQL Server”. Alternatively, you can execute the Classifier
Function.sql script file in the D:\Labfiles\Lab10\Starter folder.
USE master;
GO
-- Classifier function that assigns sessions to workload groups based on the application name.
CREATE FUNCTION dbo.fn_classify_apps() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @retval sysname
    -- SQLCMD sessions are treated as user queries; SSIS sessions ("SQL Server") as ETL.
    IF (APP_NAME() LIKE '%SQLCMD%')
        SET @retval = 'User Queries';
    IF (APP_NAME() LIKE '%SQL Server%')
        SET @retval = 'ETL';
    RETURN @retval
END
GO
Run the RunDWQueries.cmd command file in the D:\Labfiles\Lab10\Starter folder and observe the
counters in performance monitor.
With the RunDWQueries.cmd command file still running, run the RunETL.cmd command file to
simulate ETL activity, and observe the counters in performance monitor.
Note that the CPU control effect % for both workloads increases as Resource Governor prioritizes
CPU resources for the ETL workload.
o collection_set_1_noncached_collect_and_upload
o collection_set_2_upload
o collection_set_3_upload
View the Server Activity History data collection report, which is generated from the management data
warehouse.
Use the interactive zoom-in icon to filter the report for shorter time periods, and note the activity that
has been recorded.
After you finish reviewing the server activity, disable data collection for the MIA-SQLDW server
instance.
Results: At the end of this exercise, you will have a Performance Monitor report showing activity during
an ETL data load and recommendations from the Database Tuning Advisor based on a SQL Server
Profiler trace. You will also have created resource pools and workload groups for Resource Governor,
and generated server health data with the Data Collector.
o Query Begin.
o Query End.
o Query Subcube.
o EventSubclass
o TextData
o ApplicationName
o Duration
o DatabaseName
o ObjectName
o SPID
o CPUTime
Apply a column filter so that events are recorded only when the DatabaseName column value is like
AWSalesMD.
Change the view in Performance Monitor to show the counter values as a report. The initial value for
the Total cells calculated counter should be zero.
Use SQL Server Management Studio to open and execute the MDX Query.mdx script in the
D:\Labfiles\Lab10\Starter folder.
Stop the SQL Server Profiler trace and freeze the display in Performance Monitor.
View the events in the SQL Server Profiler trace, and compare the amount of time spent in the
storage engine (querying subcubes) with the amount of time spent serializing the results.
Note: The results indicate that the query spent significantly more time manipulating the data than
retrieving it from the storage engine, and a very large number of cells were calculated during the
execution of the query. The most appropriate way to improve the query performance is to optimize the
MDX and reduce the number of calculations being performed.
Use SQL Server Management Studio to open and execute the Revised MDX Query.mdx script in the
D:\Labfiles\Lab10\Starter folder.
Stop the SQL Server Profiler trace and freeze the display in Performance Monitor.
View the events in the SQL Server Profiler trace, and compare the amount of time spent in the
storage engine (querying subcubes) with the amount of time spent in the formula engine (serializing
the results).
Note: The revised version of the query uses a WITH SET statement to sort the resellers by revenue
before applying the RANK function. This enables the query processor to use a linear hash scan to find
each reseller’s position in the ordered list, dramatically reducing the number of calculations required
to produce the results.
Results: At the end of this exercise, you will have created a SQL Server Profiler trace and used
Performance Monitor to view Analysis Services performance data while executing an MDX query.
Verify that the items have been deployed to the Reports document library in the SharePoint Server
site at http://mia-sqlbi/sites/adventureworks.
Modify the deployed AWDataWarehouse data source so that it uses the following stored Windows
credentials:
o Password: Pa$$w0rd
View the Reseller Sales report and verify that it displays sales data for the previous month.
View the counter values as a report, and note their initial values.
o RPC:Completed
o SQL:BatchCompleted
o TextData
o ApplicationName
o CPU
o Duration
o SPID
o StartTime
o BinaryData
o DatabaseName
Apply a column filter so that events are only recorded when the DatabaseName column value is like
%AWDataWarehouse%.
In Performance Monitor, verify that the total number of report executions has increased by one.
In SQL Server Profiler, note the Transact-SQL queries that were executed in the data warehouse. These
should include:
o A query to retrieve the default StartDate and EndDate parameter values.
Change the StartDate and EndDate parameters and view the report again.
View the Performance Monitor counter values and SQL Server Profiler events, and then stop the SQL
Server Profiler trace, but keep it open so you can restart it later.
o Refresh the cache on a custom schedule at 12:00 AM on the first day of every month.
View the Reseller Sales report with the default parameter values and note the counter values in
Performance Monitor and the events in the SQL Server Profiler trace. These should be the same as
before.
Change the StartDate and EndDate parameters and view the report again.
View the Performance Monitor counter values and SQL Server Profiler events. This time, the number
of cache hits should have increased, and the query to retrieve the default parameter values was not
executed.
Stop the SQL Server Profiler trace, but keep it open so you can restart it later.
View the Reseller Sales report with the default parameter values and note the counter values in
Performance Monitor and the events in the SQL Server Profiler trace.
o There should be a cache miss (because the report is configured to be cached, but this is the first
time it has been executed since caching was configured) and the query to retrieve the report
data should have been executed in the data warehouse.
Change the StartDate and EndDate parameters and view the report again.
View the Performance Monitor counter values and SQL Server Profiler events.
o There is another cache miss and the query is executed in the data warehouse, because the report
that was cached during the previous execution used different parameter values.
View the Reseller Sales report with the default parameter values and note the counter values in
Performance Monitor and the events in the SQL Server Profiler trace.
o This time, the report had previously been cached with the requested parameter values, so the
cached copy was rendered and no query was executed in the data warehouse.
Change the StartDate and EndDate parameters to the same values you used previously and view the
report again.
View the Performance Monitor counter values and SQL Server Profiler events.
o The cached copy of the report that was generated the last time you used these parameter values
is rendered, so no query was executed in the data warehouse.
Stop the SQL Server Profiler trace and close all windows when you have finished.
Results: At the end of this exercise, you will have deployed Reporting Services items to a SharePoint
Server document library, and configured caching for a dataset and a report.
Question: How might the classifier function you would create to prioritize ETL workloads in
a real solution differ from the one used in the lab?
Question: In this module, you have considered the components of SQL Server that must be
monitored and optimized in a BI solution. What other elements of the solution should you
monitor and troubleshoot in the event of performance problems?
Module 11
Operating a BI Solution
Contents:
Module Overview 11-1
Module Overview
Much of the emphasis when designing a business intelligence (BI) solution is on meeting the functional
and performance-related business requirements. However, you must also consider the ongoing
operational requirements for the solution, and plan suitable strategies to maintain the various elements of
the BI infrastructure. This module describes some of the main considerations for operating a BI solution.
Objectives
After completing this module, you will be able to:
Lesson 1
Overview of BI Operations
BI operations should be considered at the very beginning of a BI project so that you design a solution that
can be maintained within the constraints of the business and IT environment in which it will be deployed.
This lesson discusses some core considerations for planning BI operations.
Lesson Objectives
After completing this lesson, you will be able to:
Descriptions and procedures for maintenance tasks that need to be performed in each subsystem of
the BI solution.
Scheduling and dependency information about the order and frequency with which tasks must be
performed.
Descriptions of operators and notifications used to track the success or failure of automated tasks.
You should compile the operations manual as the BI solution is designed and implemented, and ensure
that it is kept up to date when the solution is deployed into production.
SQL Server Agent jobs. The SQL Server Agent is a task and notification automation engine for
managing one or more SQL Server instances in a datacenter. With the SQL Server Agent, you can
schedule database maintenance jobs that consist of multiple steps, and alert operators of their
success or failure.
SQL Server Integration Services (SSIS) packages. Although SSIS packages are generally regarded as
data flow solutions for extract, transform, and load (ETL) processes, they can also include a wide range
of control flow tasks to perform maintenance and configuration operations.
SharePoint Server timer jobs. If your BI solution includes SharePoint Server, operational tasks
required to maintain the SharePoint Server environment and refresh PowerPivot data can be
implemented as timer jobs that are defined in SharePoint Central Administration.
The Windows task scheduler. If your BI solution includes a custom task such as a PowerShell script
or command line call to the bulk copy program (BCP), you can use the Windows task scheduler to
automate this task if none of the previously listed solutions is suitable.
You should note that some tasks can be performed by using more than one of the listed options. For
example, you can use a SQL Server Agent job to process an Analysis Services cube by including an
Analysis Services Command step that runs an XMLA command. Alternatively, you could run the XMLA
command by scheduling a PowerShell script, or by creating an SSIS package that includes the Analysis
Services Processing task in a control flow. It is also common to combine the technologies, for example
by creating a SQL Server Agent job that runs an SSIS package on a scheduled basis.
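For example, the following sketch creates a SQL Server Agent job with an Analysis Services Command step that runs an XMLA process command; the job, server, and database names are assumptions used only for illustration.
USE msdb;
GO
-- Sketch only: job, server, and database names are assumptions.
EXEC dbo.sp_add_job @job_name = N'Process BI Data Models';

EXEC dbo.sp_add_jobstep
    @job_name = N'Process BI Data Models',
    @step_name = N'Full process of the sales database',
    @subsystem = N'ANALYSISCOMMAND',
    @server = N'MIA-SQLBI',  -- Analysis Services instance
    @command = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
                   <Object><DatabaseID>AWSalesMD</DatabaseID></Object>
                   <Type>ProcessFull</Type>
                 </Process>';

-- Target the job at the local server so that SQL Server Agent will run it.
EXEC dbo.sp_add_jobserver @job_name = N'Process BI Data Models';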
Additional Reading: For more information about automating database maintenance tasks
with the SQL Server Agent, attend Course 10775A: Administering Microsoft SQL Server 2012
Databases.
Data flow dependencies. As the previous bullet point indicates, a data flow-driven BI solution
creates operational dependencies between the steps in the data flow. This operational flow
determines the order in which tasks should be performed. For example, an ETL process might load a
large volume of data into a data warehouse, and in doing so invalidate the data distribution statistics
for tables and cause fragmentation of indexes, creating a requirement for index and statistics maintenance tasks (a Transact-SQL sketch of such maintenance follows this list). Additionally, after a load the data warehouse contains a significant volume of new
data that is not reflected in data models and cached reports, so data models need to be processed
and cached reports refreshed. Finally, the new data in the data warehouse should be backed up to
ensure recoverability in the event of a disaster. There would be little point in scheduling a data model process to occur before the data warehouse load operation, or in refreshing a cached report that
uses data from a data model before the model is processed.
Operational windows. Another constraint that you must consider when planning operational tasks is
the period of time in which the operations must be completed. For example, a business requirement
might be that all reporting data is updated each month, with the previous month’s data being
reflected in reports by 9.00 am on the first of the month. The ETL process to transfer the previous
month’s data might start at midnight on the first of each month and take several hours, and there
may not be sufficient time to update all indexes and statistics, perform a full process of all data
models, refresh all cached reports, and back up the data warehouse before the 9.00 am deadline. If
the volume of data in your ETL solution results in a requirement for a larger operational window than
the business can support, you must consider alternative designs for the operational schedule. For
example, you could consider the following options:
o Transfer new data into the data warehouse incrementally each day instead of in a single load on
the first of the month, but do not reprocess data models and refresh cached reports until the
start of the new month.
o Partition the data model and perform an incremental process of modified dimensions and
measure group partitions instead of processing the entire data model.
o Delay index and statistics maintenance until a later date in the month, and accept that query
performance might be degraded until it has been performed.
o Delay backing up the database until later in the month, and retain data that was staged by the ETL
process until the backup has been performed to support recovery in the event of a database
failure.
Lesson 2
ETL Operations
The ETL process drives the ongoing operations of the entire BI solution, and is primarily concerned with
the data flow from source systems to the data warehouse. The tasks that perform this data flow are
typically implemented as SSIS packages, which can be deployed individually to the file system or to the
msdb database on a SQL Server instance, or as a project to an SSIS catalog.
Lesson Objectives
After completing this lesson, you will be able to:
The following table lists the key differences between the package deployment model and the project
deployment model.
Storage
o Package deployment model: Packages and all associated files can be copied to the file system of a
local or remote computer, or packages can be deployed to the MSDB database of an instance of SQL
Server.
o Project deployment model: A project is deployed to the SSIS catalog, or a folder within the catalog,
of an instance of SQL Server.
Compiled format
o Package deployment model: Packages and associated resources are each stored as single files in the
file system. The entire project might comprise many files.
o Project deployment model: The entire project is compiled as a single file (with an .ispac extension).
Troubleshooting
o Package deployment model: To log events, you must add a log provider to the package and
configure logging for each package individually.
o Project deployment model: Events are automatically logged and saved to the catalog. These events
can then be displayed with views such as catalog.executions and catalog.event_messages.
When you execute the project packages and specify the Test environment, the data is loaded from
\\TestSrv1\TestData into a database on the TestDBSrv server. Executing the same packages in the
Production environment results in the data being loaded from \\AccountsSrv1\Data to a database on the
DWSrv server.
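Environments and their variables can be created interactively in SQL Server Management Studio or by using the stored procedures in the SSIS catalog. The following is a minimal sketch of how this might be scripted; the folder, environment, and variable names are illustrative values based on the lab scenario, not a definitive implementation.
USE SSISDB;
GO
-- Create a Test environment in an existing catalog folder.
EXEC catalog.create_environment
    @folder_name = N'DW ETL',
    @environment_name = N'Test';
-- Add a variable that packages can use through an environment reference.
EXEC catalog.create_environment_variable
    @folder_name = N'DW ETL',
    @environment_name = N'Test',
    @variable_name = N'DWServer',
    @data_type = N'String',
    @sensitive = 0,
    @value = N'TestDBSrv',
    @description = N'Data warehouse server for the Test environment';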
The user account context in which the package should run. When a package will run on a
scheduled basis, it must use an appropriate user context that has the necessary permissions to
perform the tasks in the package and any child packages that it executes. When scheduling SSIS
package execution as a SQL Server Agent job step, you can use the SQL Server Agent service account
or a proxy account that has the required permissions. When planning package execution context, you
should apply the principle of “least privilege” and use an account that has the required permissions
but no more.
The environment or configuration that should be applied. As described in the previous topic, you
can use environments to apply dynamic configuration values to the packages in a deployed project. If
a package is deployed in the package deployment model, you can use configurations to achieve a
similar abstraction of values for settings. Whichever approach is used, you must know which
environment or configuration to apply when executing the package automatically.
The operators who should be notified of the outcome of the execution. When a package is run
on a scheduled basis, it is typically executed in an unattended environment with nobody to observe
its success or failure. You should therefore plan to notify operators by email of the package execution
outcome. In some cases, the package itself might include control flow logic that uses the Send Mail
task to notify operators of specific task outcomes, and these can be supplemented with operator
notifications from the SQL Server Agent.
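When these considerations have been addressed, an execution of a catalog-deployed package can be scripted with the SSIS catalog stored procedures and wrapped in a SQL Server Agent job step. The following is a minimal sketch only; the folder, project, package, and environment names are taken from the lab scenario and are not a definitive implementation.
USE SSISDB;
GO
DECLARE @execution_id BIGINT;
-- Find the reference to the Test environment for the deployed project.
DECLARE @reference_id BIGINT =
    (SELECT er.reference_id
     FROM catalog.environment_references AS er
     JOIN catalog.projects AS p ON er.project_id = p.project_id
     WHERE p.name = N'LoadPartition' AND er.environment_name = N'Test');
-- Create and start an execution of the package in that environment.
EXEC catalog.create_execution
    @folder_name = N'DW ETL',
    @project_name = N'LoadPartition',
    @package_name = N'LoadDW.dtsx',
    @reference_id = @reference_id,
    @execution_id = @execution_id OUTPUT;
EXEC catalog.start_execution @execution_id;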
The SSIS catalog provides standard reports in SQL Server Management Studio that show details of
package executions, including the events that occurred, the duration of each event, the parameter values
that were used, and a performance comparison with previous executions of the same package.
Integration Services Dashboard. This report provides a central summary that shows details of
package executions. For each package execution listed in this report, you can drill into three
subreports: Overview, All Messages, and Execution Performance.
All Executions. This report provides details of all package executions on the server, and can be
filtered to show executions within a specified date range.
All Connections. This report shows details of all connections that have been used in package
executions, including connection strings and whether the connection failed or succeeded.
All Operations. This report shows details of all operations that have been performed on the server,
including package deployments, executions, and other administrative operations.
All Validations. This report shows details of all validations that SSIS has performed for packages.
Note: In addition to the standard reports listed here, you can create custom reports that
retrieve information from dynamic management views, and publish the .rdl file for your report to
the SSISDB node under the Integration Services Catalog node in SQL Server Management
Studio. For information about creating and publishing custom SQL Server Management Studio
reports, see SQL Server Books Online.
USE SSISDB
GO
BACKUP MASTER KEY TO FILE = 'C:\Keys\SSISDBKey'
ENCRYPTION BY PASSWORD = 'Pa$$w0rd'
As well as backing up the master key, you should generate Transact-SQL scripts to recreate the following
objects that are used by the SSIS catalog in case you need to restore the catalog to a new server.
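Should the catalog need to be restored on a new server, the backed-up master key can be restored with a statement like the following sketch, which assumes the file path and password used in the backup example above.
USE SSISDB
GO
-- Restore the SSISDB master key from the backup file.
RESTORE MASTER KEY FROM FILE = 'C:\Keys\SSISDBKey'
DECRYPTION BY PASSWORD = 'Pa$$w0rd'  -- password used when the key was backed up
ENCRYPTION BY PASSWORD = 'Pa$$w0rd'  -- password used to encrypt the key in the restored database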
Additional Reading: For step by step instructions to restore the SSIS catalog on a new
server, see “Backup, Restore, and Move the SSIS Catalog” in SQL Server Books Online.
Lesson 3
Data Warehouse Operations
The data warehouse in a BI solution is a database, and requires the same operational maintenance as any
database. Unlike most business application databases, the data warehouse is generally used for query
operations with no user-driven data modifications. However, the ETL process periodically inserts and
updates large numbers of rows.
Updating data distribution statistics to ensure that the query optimizer can choose the best query
plans.
Managing partitioned tables as new data is inserted, and consolidating and archiving old data.
Additionally, you must monitor database server health and disk usage, extending file groups as
necessary to avoid running out of disk space.
Additional Reading: For more information about monitoring database server health,
attend Course 10775A: Administering Microsoft SQL Server 2012 Databases.
Lesson Objectives
After completing this lesson, you will be able to:
Managing Indexes
As data is added to the data warehouse, the
indexes you have created on tables can become
fragmented. This fragmentation can be
detrimental to query performance, and must be
reduced through periodic reorganizing or
rebuilding of indexes. Depending on the
technique used to load the tables in the data
warehouse, some indexes may be dropped and
recreated by the ETL process, thereby reducing the
likelihood of fragmentation. For example, a
columnstore index on a partitioned table that is
loaded using the SWITCH statement is created for
each new partition. Other indexes however can suffer from fragmentation and should be maintained
periodically.
avg_fragmentation_in_percent. The percent of pages in the index that are out of sequential order.
fragment_count. The number of fragments containing physically consecutive leaf pages in the index.
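The following query is a sketch of how these values might be retrieved for the indexes in the data warehouse; the database name matches the examples used elsewhere in this module, and the LIMITED scan mode is an assumption that you can change to SAMPLED or DETAILED if more detail is required.
USE AWDataWarehouse;
GO
-- Report fragmentation for all indexes in the current database.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.fragment_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id AND ips.index_id = i.index_id;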
Managing fragmentation
Depending on the level of fragmentation detected by the sys.dm_db_index_physical_stats system
function, you can reorganize or rebuild the index. Reorganizing an index physically reorders the leaf-level
pages of clustered and nonclustered indexes, and is usually faster and less resource-intensive than
performing a full rebuild of the index. An index is always kept online while being reorganized. If the index
is highly fragmented, you can rebuild it. Depending on the specific index, you may be able to keep it
online during the rebuild operation, but in some cases indexes must be taken offline to be rebuilt.
You must assess the resource and availability impact of reorganizing or rebuilding indexes against the
performance degradation caused by fragmentation and decide what action to take. The specific
thresholds for action will vary between solutions, but the following guidelines provide a good starting
point for evaluating your own requirements.
If the avg_fragmentation_in_percent value is between 5 percent and 30 percent, use the ALTER
INDEX REORGANIZE statement to reorganize the index.
If the avg_fragmentation_in_percent value is greater than 30 percent, use the ALTER INDEX
REBUILD statement. Where possible, use the WITH (ONLINE = ON) clause to perform the rebuild
without taking the index offline.
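The statements below sketch both actions for a single index; the index and table names are illustrative only, and the ONLINE option is available only for index types and editions that support online rebuilds.
-- Reorganize an index with moderate fragmentation (5 to 30 percent).
ALTER INDEX IX_FactInternetSales_OrderDateKey
ON dbo.FactInternetSales
REORGANIZE;
-- Rebuild a heavily fragmented index (more than 30 percent), online where supported.
ALTER INDEX IX_FactInternetSales_OrderDateKey
ON dbo.FactInternetSales
REBUILD WITH (ONLINE = ON);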
Maintaining Statistics
The SQL Server query optimizer uses statistics
about the distribution of data in a table, view, or
index to select appropriate query execution plans.
In most cases, you should rely on the
AUTO_CREATE_STATISTICS and
AUTO_UPDATE_STATISTICS settings to ensure that
SQL Server automatically creates and maintains
statistics. However, in some cases, particularly after
a large data load, you might improve query
performance by proactively updating statistics on
specific objects or across the entire data
warehouse.
The following code sample updates the statistics on all columns and indexes in the dbo.DimProduct table
by sampling 25 percent of the rows in the table.
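A minimal statement of this kind, using the table and sample size described above, would be similar to the following sketch.
UPDATE STATISTICS dbo.DimProduct WITH SAMPLE 25 PERCENT;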
If no SAMPLE option is specified, SQL Server calculates an appropriate sample size based on the size of
the object. You can specify a RESAMPLE option to update the statistics with the same sample settings that
were used during the previous update. Alternatively, to force a full scan of the object, you can specify the
FULLSCAN option.
The following code sample updates the statistics for all objects as necessary in the AWDataWarehouse
database.
USE AWDataWarehouse;
GO
EXEC sp_updatestats;
Consolidating partitions. As well as creating new partitions for data loads, a data warehouse might
use a different partitioning interval for older data than for current data. For example, you might
design a partitioned table to include a partition for each week of the current month, and a single
partition for each previous month. At the start of each new month, you must merge the weekly
partitions for the previous month into a single monthly partition.
Using a sliding window to archive old data. Over time, a data warehouse can grow extremely large
with a lot of historical data. In cases where extremely old data is of little reporting or analytical value,
you may decide to archive or simply delete it. When the data is stored in a partitioned table, you can
use a sliding window archive process that uses switch, merge, and split operations to move partitions
containing old data to an archive table.
Additional Reading: For information about how to automate a sliding window archival
process, see “How to Implement an Automatic Sliding Window in a Partitioned Table on SQL
Server 2005” at http://msdn.microsoft.com/en-us/library/aa964122(SQL.90).aspx.
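Both patterns rely on the ALTER PARTITION FUNCTION statement, together with ALTER TABLE ... SWITCH for the sliding window. As a sketch, merging the boundary between two weekly partitions at the start of a new month might look like the following; the partition function name and boundary value are illustrative assumptions.
-- Remove the boundary that separated two weekly partitions,
-- combining them into a single partition.
ALTER PARTITION FUNCTION pf_OrderDate()
MERGE RANGE ('20120408');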
One of the advantages of including read-only filegroups in a partial backup strategy is that it enables you
to perform a piecemeal restore. In a piecemeal restore, you can recover read/write filegroups and make
them available to users for querying before the recovery of read-only filegroups is complete.
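For example, a partial backup strategy might include statements like the following sketch; the database name matches earlier examples, while the filegroup name and backup paths are illustrative assumptions.
-- Partial backup of the primary and read/write filegroups only.
BACKUP DATABASE AWDataWarehouse
READ_WRITE_FILEGROUPS
TO DISK = 'D:\Backups\AWDataWarehouse_ReadWrite.bak';
-- Separate backup of a read-only filegroup, taken once after it is marked read-only.
BACKUP DATABASE AWDataWarehouse
FILEGROUP = 'FG2011'
TO DISK = 'D:\Backups\AWDataWarehouse_FG2011.bak';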
Demonstration Steps
Implement a partial backup strategy
1. Ensure that the 20467A-MIA-DC and 20467A-MIA-SQLBI virtual machines are both running, and then
log on to 20467A-MIA-SQLBI as ADVENTUREWORKS\Student with the password Pa$$w0rd.
2. In the D:\Demofiles\Mod11 folder, right-click Setup.cmd and click Run as administrator. When
prompted to allow the program to make changes, click Yes.
3. Start SQL Server Management Studio and connect to the localhost instance of the database engine
by using Windows authentication.
5. Select the code under the comment View filegroups and partitions, and then click Execute. This
retrieves information about the filegroups and partitions in the DemoDW database.
6. Select the code under the comment Make inactive filegroups read-only, and then click Execute.
This marks the filegroups containing partitions with inactive data as read-only.
7. Select the code under the comment Backup inactive filegroups, and then click Execute. This backs
up the read-only filegroups.
8. Select the code under the comment Backup read/write filegroups, and then click Execute. This
backs up the read/write filegroups.
9. Select the code under the comment Perform a data load (from line 45 to line 74) and then click
Execute. This performs ETL tasks to load a new row into a dimension table and a new partition in a
fact table.
10. Select the code under the comment Make loaded filegroup read-only and back it up and then
click Execute. This marks the newly loaded filegroup as read-only and backs it up.
11. Select the code under the comment Perform a differential backup of read/write filegroups and
then click Execute. This creates a differential backup of the read/write filegroups.
1. Select the code under the comment Simulate a disaster and then click Execute. This drops the
DemoDW database.
2. Select the code under the comment Restore the initial full backup with the partial option and
then click Execute. This restores the original full backup and specifies that the database will be
recovered from partial backups.
3. Select the code under the comment Restore the read/write filegroups and recover and then click
Execute. This restores the differential backup of the read/write filegroups.
4. Select the code under the comment Access read/write data and then click Execute. This queries the
dimension table, which is on the PRIMARY filegroup and has been recovered.
5. Select the code under the comment Restore the read-only filegroups (from line 118 to line 141)
and then click Execute. This restores the read-only filegroups, which were backed up individually.
6. Select the code under the comment Access read-only data and then click Execute. This queries the
fact table to verify that it has been restored.
Lesson 4
Analysis Services Operations
Analysis Services provides the data models on which business analysis and reporting can be based.
Analysis Services can be installed in multidimensional or tabular mode depending on the kind of data
model required. However, regardless of the installation mode, there are some Analysis Services
maintenance and management tasks that must be included in your operations plan. Common Analysis
Services operations include managing partitions, processing data models, and backing up databases.
This lesson describes the key operational tasks that you must perform to maintain an Analysis Services
server in a BI solution.
Lesson Objectives
After completing this lesson, you will be able to:
Creating new partitions as new data is loaded from the data warehouse. For example, if a data
model contains a partition for each month and the data warehouse on which it is based is loaded
monthly, a new partition must be created each month before processing the data model to add the
new data.
Merging partitions as data becomes less active. For example, if a data model includes a partition
for each month of the current year and a single partition for each previous year, at the beginning of
each new year, the monthly partitions for the previous year must be merged.
Create an Analysis Services Command job step in a SQL Server Agent job.
Include an Analysis Services Execute DDL Task component in an SSIS control flow.
Use the Invoke-ASCmd cmdlet to execute the script from a PowerShell script (you must import the
sqlps and sqlascmdlets modules to use this cmdlet).
To generate the XMLA script, use SQL Server Management Studio to configure the partition and script the
action to a file. You can then modify the script as required before executing it, for example to change the
partitioning criteria when creating a new partition.
Processing options in a
multidimensional data model
To refresh the data in a multidimensional model,
you can process the entire database, a cube, a dimension, a measure group, or a partition. When
processing objects in a multidimensional model, you can specify the following processing modes.
Process Default. This option detects the current processing state of the objects, and processes them
only if required.
Process Full. This option drops the data in all objects and processes them by loading all data from
the source tables.
Process Clear. This option removes all data from objects and leaves them empty.
Process Data. You can use this option to process a dimension, cube, measure group, or partition by
refreshing the data in the object without creating aggregations or indexes.
Process Add. You can use this option to process dimensions, measure groups, and partitions. For
dimensions, this option adds new members and updates dimension attribute captions and
descriptions. For measure groups and partitions, this option adds new fact data and processes only the
relevant partitions.
Process Update. You can use this option to process a dimension by forcing a re-read of data and an
update of dimension attributes.
Process Index. You can use this option to process cubes, dimensions, measure groups, and partitions
by rebuilding indexes and aggregations for all processed partitions. For previously unprocessed
objects, this option generates an error.
Note: If you have created structures for data mining, you can also process them. For more
information about data mining, attend Course 10778: Implementing Data Models and Reports
with SQL Server 2012.
When processing objects in a tabular model, you can specify the following processing modes.
Process Default. This option detects the current processing state of the objects, and processes them
only if required.
Process Full. This option drops the data in all objects and processes them by loading all data from
the source tables.
Process Clear. This option removes all data from objects and leaves them empty.
Process Recalc. This option can only be used to process a database and updates and recalculates
hierarchies, relationships, and calculated columns.
Process Data. This option can be used when processing a table or a partition, and loads data into the
object without rebuilding hierarchies or relationships or recalculating calculated columns and
measures.
Process Defrag. This option can be used to process a table and defragment its indexes.
Process Add. This option can be used to process a partition by incrementally loading new data into
it.
You can reduce the time taken to refresh a data model by:
Processing only the dimensions and measure groups in a multidimensional model, or tables in a
tabular model for which new data has been loaded.
Performing an incremental process that only adds new data to the model.
When evaluating either of these options, you must determine whether a partial or incremental processing
mode will maintain the full integrity of all data and aggregations in the data model.
Additional Reading: For more information about choosing a processing mode for
multidimensional data models, see “Analysis Services Processing Best Practices” at
http://msdn.microsoft.com/en-US/library/cc966525.
Automating processing
You can automate processing by creating an XMLA script to process the appropriate object and executing
it by using one of the following methods:
Create an Analysis Services Command job step in a SQL Server Agent job.
Include an Analysis Services Execute DDL Task component in an SSIS control flow.
Use the Invoke-ASCmd cmdlet to execute the script from a PowerShell script (you must import the
sqlps and sqlascmdlets modules to use this cmdlet).
Alternatively, you can use the Analysis Services Processing Task component in an SSIS control flow.
However, this component is designed for multidimensional data models, and may not support all
processing options for tabular data models.
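For example, a SQL Server Agent job with an Analysis Services Command step can be scripted as shown in the following sketch; the job name, Analysis Services instance, and database ID are illustrative assumptions, and the XMLA command would normally be generated from SQL Server Management Studio as described earlier.
USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Process Sales Data Model';
-- Add an Analysis Services Command step that runs an XMLA Process command.
EXEC dbo.sp_add_jobstep
    @job_name = N'Process Sales Data Model',
    @step_name = N'Process database',
    @subsystem = N'ANALYSISCOMMAND',
    @server = N'MIA-SQLBI',
    @command = N'<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>AWSalesMD</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>';
EXEC dbo.sp_add_jobserver @job_name = N'Process Sales Data Model';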
Backing up an Analysis Services database does not back up the source data on which it is based
(usually a data warehouse). You should implement a separate backup process for source data.
As well as performing a backup of the Analysis Services database, you should maintain the source
projects for your data models in a source control system so that should a restore from a backup fail,
you can recover the data model by redeploying the project and processing it from the source data.
Performing a backup
The Analysis Services backup process can be invoked interactively in SQL Server Management Studio or
automated by executing an XMLA script that includes the Backup element. When you back up an
Analysis Services database, you can optionally choose to compress the backup file and encrypt it with a
password that must be supplied in order to restore the backup.
Lesson 5
Reporting Services Operations
Reporting Services provides a platform for publishing and delivering reports. This lesson describes
considerations for planning the operational tasks required to support Reporting Services in a BI solution.
Lesson Objectives
After completing this lesson, you will be able to:
Shared schedules make it easier to manage multiple tasks that should be performed at the same regular
interval. For example, in a reporting solution that is based on a data warehouse where new data is loaded
each month, you could use a shared schedule to perform the following tasks at the beginning of the
month:
Refresh specific cached datasets and reports to include the new data.
Force expiration of other cached datasets and reports so that future requests will retrieve new data.
Schedules depend on the SQL Server Agent. If you stop the SQL Server Agent, no scheduled tasks will
be performed.
Using shared schedules enables you to centrally pause, resume, and modify multiple scheduled tasks
in a single location. Using object-specific schedules entails managing each schedule individually.
A report server uses the time zone of the computer on which it is installed, regardless of the time
zone configuration of client computers that access the report server. All schedules use the local time
of the server on which they are defined.
If you change the time zone of a server on which Reporting Services is installed in native mode, you
must restart the Report Server service for the time zone change to take effect. When you change the
time zone of a report server, existing schedules retain the same times in the new time zone. For
example, a task that was scheduled to run at 2:00 in the old time zone will be scheduled to run at
2:00 in the new time zone.
Time zone settings for a report server installed in SharePoint integrated mode are determined by the
SharePoint Server regional settings.
Note: The database names for a specific instance may be different from the default names,
but the names will always follow the same naming convention – the database for temporary
objects will always be the name of the primary database with “TempDB” appended.
Note: In the event of a hardware failure, after restoring the ReportServer database and
recreating the ReportServerTempDB database, you must restore the encryption key on all
report servers that use the database.
Objectives
After completing this lab, you will be able to:
In the catalog, create a folder named DW ETL with the description Folder for the Adventure Works
ETL SSIS Project.
o Set the ServerName parameter for the AWDataWarehouse connection manager to the
DWServer environment variable.
o Set the ServerName parameter for the Staging connection manager to the StagingServer
environment variable.
Results: At the end of this exercise, you will have an SSIS catalog that contains environments named
Test and Production, and you will have deployed the LoadPartition SSIS project to the SSIS catalog.
o Tip: You can generate a script to perform offline rebuilds of a table’s indexes by right-clicking the
Indexes folder in Object Explorer, clicking Rebuild, and scripting the action to a new query
window. You can then edit this script to perform the rebuilds online.
Add a statement to the end of the script that executes the sp_updatestats system stored procedure
to update all statistics.
You can generate an XMLA script to process a cube from the Process Cube dialog box, which you
open by connecting to Analysis Services in Object Explorer, right-clicking the cube, and clicking
Process.
o A step that runs the LoadDW.dtsx Integration Services package, which is in the LoadPartition
SSIS project you deployed to the SSIS catalog on MIA-SQLDW previously. The package should be
executed in the Test environment.
o A step that executes the Transact-SQL script you created previously to rebuild indexes and
update statistics.
o A step that runs the Analysis Services command XMLA script to process the Sales cube you
created earlier in the MIA-SQLBI Analysis Services server.
Schedule the job to run at 12:00 on the first day of every month.
Results: At the end of this exercise, you will have a SQL Server Agent job named Data Warehouse
Load.
o The Integration Services Dashboard report is listed among the standard reports for the SSISDB
catalog.
Verify that the most recent execution succeeded, and view the overview report for the execution.
In the overview report for the most recent execution, verify that the parameters used for the
AWDataWarehouse.ServerName and Staging.ServerName parameters were the values you
specified in the Test environment.
View the performance statistics for the package execution and note its duration.
Results: At the end of this exercise, you will have executed a job, reviewed job history, and reviewed SSIS
catalog reports.
Question: How might the operations solution you created in the lab have differed if the
measure groups in the cube were partitioned on the same basis as the fact tables in the
relational database?
Question: If the volume of data to be loaded and processed was significantly larger, or the
time period available for performing the ETL load was shorter, how might you change the
solution you created in the lab?
Question: As a BI specialist, your involvement in a BI solution may end when the solution is
deployed into production. How can you ensure that the IT personnel who will support the
solution are able to manage and troubleshoot the necessary operational tasks?
Course Evaluation
Your evaluation of this course will help Microsoft understand the quality of your learning experience.
Please work with your training provider to access the course evaluation form.
Microsoft will keep your answers to this survey private and confidential and will use your responses to
improve your future learning experience. Your open and honest feedback is valuable and appreciated.
2. Discuss the interviews and identify as many business requirements as you can.
3. Open Requirements Matrix.docx in the D:\labfiles\Lab01\Starter folder.
4. Based on the available information, assess the business value and feasibility of the requirements.
Record these in Requirements Matrix.docx.
Results: At the end of this exercise, you should have created a matrix that shows the relative value and
feasibility of the business requirements for the BI solution.
3. Document your software suggestions and the rationale for your choices.
Results: At the end of this exercise, you should have a list of suggested software components for the BI
solution.
5. Click Yes when prompted to confirm that you want to run the command file, and then wait for the
script to finish.
o SharePoint Server
3. On the taskbar, click the Visio 2013 icon to open Microsoft Visio.
4. Use Visio to document your server infrastructure design. Save the file as BI Topology.vsdx in the
D:\Labfiles\Lab02\Starter folder.
5. Close Visio.
Results: At the end of this exercise, you should have a Visio diagram that documents your server
infrastructure design.
2. When prompted, connect to the database engine on the localhost instance by using Windows
authentication.
3. On the File menu, point to Open, and then click File. Browse to the D:\Labfiles\Lab02\Starter folder,
select Create Benchmark DB.sql, and then click Open.
8. Add the logical reads value for the two queries together, and then divide the result by two to find
the average.
9. Add the CPU time value for the two queries together, and then divide the result by two to find the
average. Divide the result by 1,000 to convert it from milliseconds to seconds.
11. Calculate the number of cores required to support a workload with an average query size of 500 MB,
10 concurrent users, and a target response time of 20 seconds:
((500 / MCR) * 10) / 20
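For example, assuming the benchmark yields an MCR of 200 MB per second per core (a purely illustrative
figure), the calculation is ((500 / 200) * 10) / 20 = 1.25, which you would round up to 2 cores.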
12. Close SQL Server Management Studio without saving any files.
2. In any blank cell, use the following formula to calculate the number of cores required for the given
workload figures:
=((B6/C3)*B7)/B8
3. Based on the results of the preceding formula, recommend the number and type of processors to
include in the data warehouse server.
4. Calculate the volume of fact data in gigabytes (estimated fact rows x bytes per row, divided by
1,000,000,000 to convert bytes to gigabytes), and add 50 GB for indexes and dimensions. Then divide the result by 3 to allow for a 3:1
compression ratio. The resulting figure is the required data storage.
5. Add 50 GB each for log space, TempDB storage, and staging data to calculate the total data volume.
6. Assuming an annual data growth of 150 GB, calculate the required storage capacity in three years.
7. Based on the data volume and CPU requirements, suggest a suitable amount of memory for the
server.
8. In the D:\Labfiles\Lab02\Starter folder, double-click Storage Options.docx and review the available
options for storage hardware. Then, based on the storage requirements you have calculated, select a
suitable storage option for the data warehouse.
9. Record your recommendations in DW Hardware Spec.xlsx, and then close Excel and Word.
Results: After this exercise, you should have a completed worksheet that specifies the required hardware
for your data warehouse server.
2. Discuss the interviews, and identify the business processes in Adventure Works that generate the data
required to meet the analytical and reporting requirements.
3. Prioritize the business processes by their importance to the business in terms of analytical and
reporting requirements.
4. On the taskbar, click the Excel 2013 icon to start Excel, and then open Matrix.xlsx in the
D:\Labfiles\Lab03A\Starter folder.
5. In Excel, under the heading Business Processes, enter the business processes you have identified in
descending order of priority.
2. On Object Explorer, expand Databases, expand ResellerSales, and expand Tables. This folder
contains the tables that are defined in the ResellerSales database.
3. Right-click Database Diagrams, and then click New Database Diagram. If you are prompted to
create the required support objects, click Yes.
4. Click the first table in the list, hold Shift and click the last table in the list, and then click Add.
5. Click Close, and then view the diagram to familiarize yourself with the database schema.
6. If you want to view the data in a table, right-click the table in Object Explorer, and then click Select
Top 1000 Rows. You can then modify the Transact-SQL code (for example, by deleting the TOP 1000
clause) and re-executing the query to view specific data values.
o ProductsMDS.
o InternetSales.
o Marketing.
9. Use File Explorer to view the contents of the D:\Accounts folder, and then double-click each file in this
folder to open them in Microsoft Excel.
2. Under the heading Dimensions, next to the existing Time dimension, enter the dimensions you
believe can be supported by the data and meet the analytical and reporting requirements.
3. Indicate which dimensions relate to which business processes by entering “x” in the intersecting cell.
Results: At the end of this exercise, you will have created a matrix of business processes and dimensions.
3. Add a circle shape and place it at the center of the drawing. Double-click the shape and type an
appropriate fact table name for the measures generated by the highest priority business process you
documented in Matrix.xlsx. Then, hold the Ctrl key and press Enter to create a new line.
4. Type the names of each measure to be included in the fact table on a new line. After you finish, press
Enter.
5. Add a rectangle shape for each dimension that is related to the business process based on the
information you entered in Matrix.xlsx; and arrange the rectangles around the circle.
7. Use the Connector tool to draw a line between each dimension rectangle and the fact table circle.
8. Use the Text tool to list the attributes and hierarchies supported by each dimension.
9. For each remaining business process in matrix.xlsx, click the Insert Page icon at the bottom of the
drawing area, and repeat steps 2 to 8 to create a dimensional model of the business process.
10. Save the Visio document as Initial Sun Diagram.vsdx in the D:\Labfiles\Lab03A\Starter folder.
2. Add an Entity shape for each table you want to use to implement the highest priority dimension
model.
3. In each table, add an Attribute for each column you want to define in the table.
4. Add a Relationship connector for each relationship you want to define between the tables.
5. Repeat steps 2 to 4 to add the tables you want to define for the remaining business processes.
7. Close Visio.
Results: At the end of this exercise, you will have a sun diagram showing the facts, measures,
dimensions, attributes, and hierarchies you have identified, and a database schema diagram showing
your design for dimension and fact tables.
5. Click Yes when prompted to confirm that you want to run the command file, and then wait for the
script to finish.
2. Use Windows Explorer to view the available logical drives (drives E, F, G, H, I, J, K, L, M, and N on MIA-
SQLBI).
3. Review your database schema design from the previous lab. If you did not complete the previous lab,
use Visio to review DW Schema.vsdx in the D:\Labfiles\Lab03B\Starter folder.
4. In the D:\Labfiles\Lab03B\Starter folder, double-click AWDataWarehouse.docx to open it in Microsoft
Word.
5. In the table under the heading Storage, document your planned usage for each logical drive. Your
plan should include:
o Data warehouse filegroups for system tables, dimension tables, and fact tables.
o Staging tables.
o Log files.
o TempDB.
o Backup files.
For more information, see the “Considerations for Database Files” topic in the “Designing a Data
Warehouse Physical Implementation” lesson.
Results: At the end of this exercise, you should have a document that contains a table describing your
planned usage for each logical volume of the data warehouse server.
2. If you want, you can create a test database in the localhost instance of SQL Server and use it to
experiment with partitioned table designs.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about partitioned tables and indexes.
3. In the AWDataWarehouse.docx document you edited in the previous exercise, under the heading
Partitioning, type a description of your proposed use of partitioning in the data warehouse. You
should include:
2. If you want, you can create a test database in the localhost instance of SQL Server and use it to
experiment with indexes.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about indexes.
3. In the AWDataWarehouse.docx document you edited in the previous exercise, under the heading
Indexes, type a description of your proposed use of indexing in the data warehouse. You should
include:
o The tables (if any) that will be indexed.
2. If you want, you can create a test database in the localhost instance of SQL Server and use it to
experiment with compression.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about compression.
3. In the AWDataWarehouse.docx document you edited in the previous exercise, under the heading
Compression, type a description of your proposed use of compression in the data warehouse. You
should include:
2. If you want, you can create a test database in the localhost instance of SQL Server and use it to
experiment with views.
Note: SQL Server Books Online is installed locally on the virtual machine. You can use this to review
documentation about views.
3. In the AWDataWarehouse.docx document you edited in the previous exercise, under the heading
Views, type a description of your proposed use of views in the data warehouse. You should
include:
Results: At the end of this exercise, you will have a document that contains information about your
plans for partitions, indexes, compression, and views in the data warehouse.
3. Click the Reseller Sales page and view the tables in the diagram. Note that the diagram indicates the
columns in the dimension and fact tables, and the slowly changing dimension (SCD) type for historical
dimension attributes.
4. Click the Internet Sales page and view the tables it contains.
6. Start SQL Server Management Studio, and when prompted, connect to the MIA-SQLDW instance of
the database engine by using Windows authentication.
7. In Object Explorer, expand Databases, expand AWDataWarehouse, and then expand Tables.
8. Expand dbo.DimCustomer, and then expand Columns. Note the columns in this table, and their
data types.
o dbo.DimDate
o dbo.DimProduct
o dbo.DimPromotion
o dbo.DimReseller
o dbo.DimSalesperson
o dbo.DimSalesTerritory
o dbo.FactInternetSales
o dbo.FactResellerSales
10. Keep SQL Server Management Studio open for the next task.
o dbo.SalesOrderHeader
o dbo.SalesOrderDetail
o dbo.Customer
o dbo.StateOrProvince
o dbo.Country
These tables provide the source data for the following tables in the data warehouse:
o dbo.FactInternetSales
o dbo.DimCustomer
Note: Total product cost for a sales order is calculated by multiplying the unit cost for each order line
item by the ordered quantity. Similarly, a sales amount is calculated by multiplying the unit price by the
quantity.
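As a sketch only, these calculations could be expressed with a query such as the following; the column names are assumptions for illustration and may not match the actual schema of the lab databases.
SELECT d.SalesOrderID,
       SUM(d.UnitCost * d.OrderQuantity) AS TotalProductCost,
       SUM(d.UnitPrice * d.OrderQuantity) AS SalesAmount
FROM dbo.SalesOrderDetail AS d
GROUP BY d.SalesOrderID;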
2. In Object Explorer, expand the ResellerSales database, expand Tables, and then expand the
following tables and their Columns folders:
o dbo.SalesOrderHeader
o dbo.SalesOrderDetail
o dbo.Reseller
o dbo.BusinessType
o dbo.SalesEmployee
o dbo.SalesTerritory
o dbo.SalesRegion
o dbo.StateOrProvince
o dbo.Country
These tables provide the source data for the following tables in the data warehouse:
o dbo.FactResellerSales
o dbo.DimReseller
o dbo.DimSalesperson
o dbo.DimSalesTerritory
Note: Total cost and sales amount for reseller orders are calculated the same way as for Internet orders.
The sales territory for a sales order is determined by the sales territory where the reseller placing the order
is located, not by the sales territory assigned to the salesperson. Sales territories are often reassigned
between salespeople, but resellers stay within a single sales territory.
3. In Object Explorer, expand the Marketing database, expand Tables, and then expand the
dbo.Promotions table and its Columns folder. This table provides the source data for the
DimPromotion table in the data warehouse.
Note: The MarketingPromotion column in the SalesOrderHeader table in the InternetSales database
contains the PromotionID value from this table when an order is placed in response to a promotion.
When no promotion is associated with the order, the MarketingPromotion column contains a NULL
value.
4. In Object Explorer, expand the ProductsMDS database, expand Views, and then expand the
following views and their Columns folders.
o mdm.Product
o mdm.ProductSubcategory
o mdm.ProductCategory
These views provide the source data for the DimProduct table in the data warehouse.
Note: This database represents a master data hub for the product data. This data is replicated to the
InternetSales and ProductSales databases, but the ProductsMDS database contains the master version
of the data.
Results: At the end of this exercise, you will have examined the data sources for the ETL process.
3. Click the DimCustomer page, and examine the data flow for the DimCustomer table, noting the
following details:
o The data flow is shown from the Customer table in the InternetSales database to the
DimCustomer table (which is in the AWDataWarehouse database).
o The steps that need to be performed during the data flow are documented next to the data flow.
o Data from the StateOrProvince and Country tables is added to the data flow during lookup
steps.
o The details of the SCD columns are shown next to the relevant steps.
4. Click the DimProduct, DimPromotion, and FactInternetSales pages, and then review the diagrams
they contain.
2. Right-click the page tab at the bottom of the drawing area, and then click Rename. Rename the page
to DimReseller.
3. On the ribbon, on the HOME tab, use the Rectangle, Text, and Connector tools to create a high-
level data flow diagram for the DimReseller table. The diagram should include:
o Source tables.
o Steps that need to be performed during the data flow.
o SCD attributes.
4. Repeat the previous steps to create a new page named FactResellerSales that contains a diagram for
the FactResellerSales data flow.
5. Close Visio.
2. On the DimCustomer worksheet, scroll to the right to view the Data Warehouse section of the map,
and note that it contains the columns in the DimCustomer table. Each row documents a data flow
from a source column to a column in the DimCustomer table.
3. Scroll back to the left, and note that the Source section of the worksheet contains details of the
source fields that are extracted from tables in the InternetSales database.
4. Examine the Landing Zone section of the worksheet, and note that it contains details of the tables
that the source data is initially extracted to, together with any validation rules or transformations that
are applied during the extraction.
5. Examine the Staging section of the worksheet, and note that it contains details of the staging tables
that are created from the extracted data in the landing zone, together with any validation rules or
transformations that must be applied to the data.
6. Click the FactInternetSales worksheet, and note that it documents the data flow for each column in
the FactInternetSales table.
7. Keep Excel open.
2. Complete the source to target map for the FactResellerSales table. You should design a data flow in
which the source data is initially extracted into appropriately named landing zone tables and then
transformed and loaded into staging tables before being loaded into the data warehouse.
Note: A completed map is provided in Source to Target Mappings.xlsx in the D:\Labfiles\Lab04\Solution
folder.
Results: At the end of this exercise, you will have a Visio document that contains high-level data flow
diagrams and an Excel workbook that contains detailed source-to-target documentation.
2. Click Execute.
o The partition scheme and partition function used to partition the FactInternetSales table.
o The partitions in the table and the filegroups on which they are stored.
o The start and end key values for each partition.
4. Make a note of the details for the last partition in the table (which should currently contain no rows).
5. Keep SQL Server Management Studio open.
2. Note that the Staging database includes the following tables (and some others):
o lz.InternetSalesOrderDetails. A landing zone table that contains data extracted from the
SalesOrderDetails table in the InternetSales database.
o lz.InternetSalesOrderHeader. A landing zone table that contains data extracted from the
SalesOrderHeader table in the InternetSales database.
o stg.FactInternetSales. A staging table that contains transformed data from the landing zone
tables that is ready to be loaded into the FactInternetSales table.
3. Right-click stg.FactInternetSales, and then click Select Top 1000 Rows. In the script that is
generated, delete the TOP 1000 clause, click Execute, and then note the number of rows returned by
the query (displayed at the bottom right).
2. In Solution Explorer, double-click the LoadFactInternetSales.dtsx package, and then click the
Control Flow tab if it is not already selected.
3. If the Variables pane is not visible, right-click the control flow design surface, and then click
Variables. Note that the package contains the following variables:
o Filegroup
o LastBoundary
o NextBoundary
o PartitionNumber
o SQL_AddConstraintAndIndex
o SQL_AddPartition
o SQL_CreateLoadTable
o SQL_SwitchPartition
4. Double-click Get Partition Info to view its editor, review the settings for the task, and then click
Cancel.
5. Repeat the previous step for the following tasks:
8. Double-click Transact-SQL to Add Constraint and Index to view its editor, review the settings for
the task, and then click Cancel.
9. Repeat the previous step for the following tasks:
10. On the Debug menu, click Start Debugging. Then, after execution completes, on the Debug menu,
click Stop Debugging and minimize Visual Studio.
11. Maximize SQL Server Management Studio and click the query editor for the View FactInternetSales
Partitions.sql script you ran earlier. Click Execute and review the results, noting that the staged rows
have been loaded into what was the last partition, and that a new empty partition has been added to
the end of the table.
Note: In this task, you will create an SSIS package to load some staged data. The process is
complex and contains many steps, and the package you create does not include error handling.
If you test your package and it fails, you can re-run the Setup.cmd batch file in
D:\Labfiles\Lab04\Starter to reset the databases to the starting point before trying to resolve
the problem.
1. In SQL Server Management Studio, open View FactResellerSales Partitions.sql script from the
D:\Labfiles\Lab04\Starter folder. Click Execute, and then review the information returned about the
partitions in the FactResellerSales table.
2. In Object Explorer, right-click the stg.FactResellerSales table in the Staging database, and then click
Select Top 1000 Rows. In the script that is generated, delete the TOP 1000 clause, click Execute,
and then note the number of rows returned by the query (displayed at the bottom right).
3. In Object Explorer, expand the AWDataWarehouse database and its Tables folder. Right-click the
dbo.FactResellerSales table, point to Script Table as, point to CREATE To, and then click New
Query Editor Window.
4. In the resulting Transact-SQL code, change the table name in the first line of the CREATE TABLE
statement to [dbo].[LoadResellerSales], and then click Execute. After execution completes, right-
click the Tables folder for the AWDataWarehouse database, and then click Refresh to verify that
the dbo.LoadResellerSales table has been created.
5. Minimize SQL Server Management Studio, and maximize Visual Studio. If the SSIS Toolbox is not
visible, on the SSIS menu, click SSIS Toolbox.
6. In Solution Explorer, right-click SSIS Packages, and then click New SSIS Package. Right-click
Package1.dtsx, click Rename, and then type Load FactResellerSales.dtsx.
7. Right-click the control flow design surface, click Variables, and then click the Add Variable button in
the Variables pane to add the following variables.
Name              Data Type   Value
PartitionNumber   String      0
8. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface. Double-click the new task,
and in the Execute SQL Task Editor dialog box, set the following properties and click OK.
9. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Get Partition Info, and
then connect the green precedence constraint from Get Partition Info to the new task. Double-click
the new task, and in the Execute SQL Task Editor dialog box, set the following properties, and then
click OK.
10. In the SSIS Toolbox, drag Expression Task to the control flow surface, click Get Next Boundary, and
then connect the green precedence constraint from Get Next Boundary to the new task.
11. Right-click the new task, click Rename, and then rename it to Transact-SQL to Add Filegroup.
Double-click the new task, and in the Expression Builder dialog box, enter the following expression,
and then click OK.
12. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Transact-SQL to Add
Filegroup, and then connect the green precedence constraint from Transact-SQL to Add Filegroup
to the new task. Double-click the new task, and in the Execute SQL Task Editor dialog box, set the
following properties, and then click OK.
13. In the SSIS Toolbox, drag Expression Task to the control flow surface, click Add next filegroup, and
then connect the green precedence constraint from Add next filegroup to the new task.
14. Right-click the new task, click Rename, and then rename it to Transact-SQL to Create Load Table.
Double-click the new task, and in the Expression Builder dialog box, enter the following expression,
and then click OK.
15. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Transact-SQL to Create
Load Table, and then connect the green precedence constraint from Transact-SQL to Create Load
Table to the new task. Double-click the new task, and in the Execute SQL Task Editor dialog box, set
the following properties, and then click OK.
16. In the SSIS Toolbox, drag Data Flow Task to the control flow surface, click Create Load Table, and
then connect the green precedence constraint from Create Load Table to the new task.
17. Right-click the new task, click Rename, and then rename it to Load Staged Data. Double-click the
new task to view its data flow.
18. In the SSIS Toolbox, drag Source Assistant to the data flow surface, and in the Source Assistant -
Add New Source dialog box, select the SQL Server source type and the MIA-SQLDW.Staging
connection manager, and then click OK.
19. Right-click the new task, click Rename, and then rename it to Staged Reseller Sales. Double-click
the new task, and in the OLE DB Source Editor dialog box:
c. Click Browse, browse to the D:\Labfiles\Lab04\Starter\Code Snippets folder, change the file type
to All Files (*.*), select Staged Reseller Sales.txt, and then click Open.
d. Click OK.
20. In the SSIS Toolbox, drag Lookup to the data flow surface, click Staged Reseller Sales, and then
connect the blue data flow output from Staged Reseller Sales to the new lookup transformation.
21. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
OrderDateKey. Double-click the lookup transformation, and in the Lookup Transformation Editor
dialog box, set the following properties, and then click OK.
22. In the SSIS Toolbox, drag Lookup to the data flow surface, click Lookup OrderDateKey, and then
connect the blue data flow output from Lookup OrderDateKey to the new lookup transformation. In
the Input Output Selection dialog box, in the Output drop-down list, select Lookup Match
Output, and then click OK.
23. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
ProductKey. Double-click the lookup transformation, and in the Lookup Transformation Editor
dialog box, set the following properties, and then click OK.
24. In the SSIS Toolbox, drag Lookup to the data flow surface, click Lookup ProductKey, and then
connect the blue data flow output from Lookup ProductKey to the new lookup transformation. In
the Input Output Selection dialog box, in the Output drop-down list, select Lookup Match
Output, and then click OK.
25. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
ShipDateKey. Double-click the lookup transformation, and in the Lookup Transformation Editor
dialog box, set the following properties, and then click OK.
manager.
26. In the SSIS Toolbox, drag Lookup to the data flow surface, click Lookup ShipDateKey, and then
connect the blue data flow output from Lookup ShipDateKey to the new lookup transformation. In
the Input Output Selection dialog box, in the Output drop-down list, select Lookup Match
Output, and then click OK.
27. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
ResellerKey. Double-click the lookup transformation, and in the Lookup Transformation Editor
dialog box, set the following properties, and then click OK.
28. In the SSIS Toolbox, drag Lookup to the data flow surface, click Lookup ResellerKey, and then
connect the blue data flow output from Lookup ResellerKey to the new lookup transformation. In
the Input Output Selection dialog box, in the Output drop-down list, select Lookup Match
Output, and then click OK.
29. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
SalespersonKey. Double-click the lookup transformation, and in the Lookup Transformation Editor
dialog box, set the following properties, and then click OK.
30. In the SSIS Toolbox, drag Lookup to the data flow surface, click Lookup SalespersonKey, and then
connect the blue data flow output from Lookup SalespersonKey to the new lookup transformation.
In the Input Output Selection dialog box, in the Output drop-down list, select Lookup Match
Output, and then click OK.
31. Right-click the new lookup transformation, click Rename, and then rename it to Lookup
SalesTerritoryKey. Double-click the lookup transformation, and in the Lookup Transformation
Editor dialog box, set the following properties, and then click OK.
32. In the SSIS Toolbox, drag Destination Assistant to the data flow surface, and in the Destination
Assistant - Add New Destination dialog box, select the SQL Server source type and the MIA-
SQLDW.AWDataWarehouse connection manager, and then click OK.
33. Click Lookup SalesTerritoryKey, and connect the blue data flow output from Lookup
SalesTerritoryKey to the new OLE DB destination. In the Input Output Selection dialog box, in the
Output drop-down list, select Lookup Match Output, and then click OK.
34. Right-click the new OLE DB destination, click Rename, and then rename it to Load Table. Double-
click Load Table, and in the OLE DB Destination Editor dialog box:
a. Ensure that the MIA-SQLDW.AWDataWarehouse connection manager is selected.
b. Ensure that the data access mode is Table or view – fast load.
c. In the Name of the table or the view drop-down list, select [dbo].[LoadResellerSales].
d. Select Keep nulls.
e. Click the Mappings tab, and ensure that all destination columns are mapped to identically-
named input columns.
f. Click OK.
35. Click the Control Flow tab, and then click the Load Staged Data task to select it. Press F4, and then
set the DelayValidation property of the Load Staged Data task to True.
36. Maximize SQL Server Management Studio, and in Object Explorer, right-click the
dbo.LoadResellerSales table you created earlier, and then click Delete. In the Delete Object dialog
box, ensure that only the LoadResellerSales table is listed, and then click OK. Return to Visual Studio.
37. In the SSIS Toolbox, drag Expression Task to the control flow surface, click Load Staged Data, and
then connect the green precedence constraint from Load Staged Data to the new task.
38. Right-click the new task, click Rename, and then rename it to Transact-SQL to Add Constraint and
Index. Double-click the new task, and in the Expression Builder dialog box, enter the following
expression, and then click OK.
@[User::SQL_AddConstraintAndIndex]=
REPLACE(REPLACE(@[User::SQL_AddConstraintAndIndex], "LastBoundary",
@[User::LastBoundary]), "NextBoundary", @[User::NextBoundary])
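Note: The SQL_AddConstraintAndIndex variable holds a Transact-SQL script that is defined elsewhere in the package. As an illustration only, a script of this kind typically adds a check constraint and an index so that the staging table matches the target partition, along the following lines; the constraint and index names, and the exact boundary comparisons, are assumptions.
-- Illustrative sketch; LastBoundary and NextBoundary are placeholders that the
-- expression above replaces with the partition boundary values.
ALTER TABLE dbo.LoadResellerSales
ADD CONSTRAINT CK_LoadResellerSales_OrderDateKey
CHECK (OrderDateKey >= LastBoundary AND OrderDateKey < NextBoundary);
CREATE CLUSTERED INDEX IX_LoadResellerSales_OrderDateKey
ON dbo.LoadResellerSales (OrderDateKey);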
39. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Transact-SQL to Add
Constraint and Index, and then connect the green precedence constraint from Transact-SQL to
Add Constraint and Index to the new task. Double-click the new task, and in the Execute SQL Task
Editor dialog box, set the following properties, and then click OK.
40. In the SSIS Toolbox, drag Expression Task to the control flow surface, click Add Constraint and
Index, and then connect the green precedence constraint from Add Constraint and Index to the
new task.
41. Right-click the new task, click Rename, and then rename it to Transact-SQL to Switch Partition.
Double-click the new task, and in the Expression Builder dialog box, enter the following expression,
and then click OK.
@[User::SQL_SwitchPartition]= REPLACE(@[User::SQL_SwitchPartition],
"partitionnumber", @[User::PartitionNumber])
42. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Transact-SQL to
Switch Partition, and then connect the green precedence constraint from Transact-SQL to Switch
Partition to the new task. Double-click the new task, and in the Execute SQL Task Editor dialog box,
set the following properties, and then click OK.
43. In the SSIS Toolbox, drag Execute SQL Task to the control flow surface, click Switch Partition, and
then connect the green precedence constraint from Switch Partition to the new task. Double-click
the new task, and in the Execute SQL Task Editor dialog box, set the following properties, and then
click OK.
44. On the Debug menu, click Start Debugging. Then, after execution completes, on the Debug menu,
click Stop Debugging and close Visual Studio. Save your work if prompted.
45. Maximize SQL Server Management Studio, and then click the query editor for the View
FactResellerSales Partitions.sql script you ran earlier. Click Execute, and then review the results,
noting that the staged rows have been loaded into what was the last partition, and that a new empty
partition has been added to the end of the table.
46. Close SQL Server Management Studio without saving any changes.
Results: At the end of this exercise, you will have an SSIS package that loads data into the
FactResellerSales table by using the partition switching technique.
3. In the New Project dialog box, click Analysis Services Multidimensional and Data Mining Project,
in the Name text box, type AWSalesMD, in the Location box browse to D:\Labfiles\Lab05A\Starter,
and then click OK.
4. In the Connection Manager dialog box, in the Server name drop-down list, type MIA-SQLDW; in
the Log on to the server area, ensure that Use Windows Authentication is selected; in the Select
or enter a database name list, click AWDataWarehouse, and then click OK.
5. On the Select how to define the connection page, ensure that the MIA-
SQLDW.AWDataWarehouse data connection is selected, and then click Next.
6. On the Impersonation information page, select Use a specific Windows user name and
password, in the User name text box, type ADVENTUREWORKS\ServiceAcct, in the password text
box type Pa$$w0rd, and then click Next.
7. On the Completing the Wizard page, set the data source name to AW Data Warehouse, and then
click Finish.
2. On the Welcome to the Data Source View Wizard page, click Next.
3. On the Select a Data Source page, verify that the AW Data Warehouse data source is selected, and
then click Next.
5. On the Select Tables and Views page, in the Available objects list, click Customer (dw_views), and
then hold down the Ctrl key and click the following objects:
o Date (dw_views).
o InternetSales (dw_views).
o Product (dw_views).
o Reseller (dw_views).
o ResellerSales (dw_views).
o Salesperson (dw_views).
o SalesTerritory (dw_views).
6. On the Select Tables and Views page, click the > button to add the selected tables to the Included
objects list, and then click Next.
7. On the Completing the Wizard page, set the name of the data source view to AW Data
Warehouse, and then click Finish.
Fact Table | Foreign Key | Dimension Table | Primary Key
3. On the Select Creation Method page, verify that Use existing tables is selected, and then click
Next.
4. On the Select Measure Group Tables page, verify that the AW Data Warehouse data source view is
selected, click the check boxes next to InternetSales and ResellerSales, and then click Next.
5. On the Select Measures page, clear all of the check boxes other than the ones for the following
measures, and then click Next:
o Internet Sales:
• Order Quantity.
• Unit Price.
• Product Unit Cost.
• Total Product Cost.
• Sales Amount.
o Reseller Sales:
• Order Quantity - Reseller Sales.
• Unit Price - Reseller Sales.
• Product Unit Cost - Reseller Sales.
• Total Product Cost - Reseller Sales.
• Sales Amount - Reseller Sales.
6. On the Select New Dimensions page, ensure that all check boxes are selected, and then click Next.
7. On the Completing the Wizard page, in the Cube name box, type Sales, and then click Finish.
2. Right-click Order Quantity, click Rename, and then rename the measure to Internet Quantity.
3. Repeat the previous step to rename the following measures in the Internet Sales measure group:
4. Expand the Reseller Sales measure group and rename the following measures:
2. In the Data Source View pane, in the Date table, right-click the DateAltKey column and click New
Attribute from Column.
3. Repeat the previous step for the following columns (note that spaces are automatically added to the
attribute names to make them more readable):
o MonthName
o CalendarYear
o FiscalQuarter
o FiscalYear
4. In the Attributes pane, right-click Date Alt Key and click Rename. Then rename the attribute to
Date.
5. In the Attributes pane, click DateKey and press F4. Then in the Properties pane, set the
AttributeHierarchyVisible property to False.
7. Repeat steps 1 to 5 to create attributes in the following dimensions and set the
AttributeHierarchyVisible property of the key attribute in each dimension to False:
2. Click Deployment, and verify that the Target properties have the following values, and then click
OK:
o Server: localhost
o Database AWSalesMD
5. On the Cube menu, click Analyze in Excel. If a Microsoft Excel Security Notice dialog box is
displayed, click Enable.
6. In Excel, in the PivotTable Fields pane, under Internet Sales, select Internet Revenue, and under
Reseller Sales, select Reseller Revenue.
8. Verify that the PivotTable in Excel shows the Internet and reseller sales revenue for four product
categories, and then close Excel without saving the workbook.
Results: At the end of this exercise, you will have a multidimensional data model named AWSalesMD.
3. In the New Project dialog box, click Analysis Services Tabular Project, in the Name text box, type
AWSalesTab, in the Location box browse to D:\Labfiles\Lab05A\Starter, and then click OK.
4. If the Tabular model designer dialog box is displayed, in the Workspace server list, select
localhost\SQL2, and in the Compatibility level box, select SQL Server 2012 SP1 (1103), and then
click OK.
4. On the Connect to a Microsoft SQL Server Database page, in the Server name box, type MIA-SQLDW,
ensure that Use Windows Authentication is selected, and in the Database name list, select
AWDataWarehouse, and then click Next.
6. On the Choose How to Import the Data page, ensure that Select from a list of tables and views to
choose the data to import is selected, and then click Next.
7. On the Select Tables and Views page, select the following source tables, specifying the friendly
name indicated in parentheses. Make sure you select the views in the dw_views schema, and not the
similarly named dimension and fact tables in the dbo schema.
o Customer (Customer).
o Date (Date).
o Product (Product).
o Reseller (Reseller).
o Salesperson (Salesperson).
o InternetSales (Internet Sales).
o ResellerSales (Reseller Sales).
o SalesTerritory (Sales Territory).
After you select all of the views and correct the friendly names where necessary, click Finish.
8. After the data is imported, click Close.
Fact Table | Foreign Key | Dimension Table | Primary Key
2. If the measure grid is not visible under the table data, on the Table menu, click Show Measure Grid.
3. In the measure grid, click the cell under the OrderQuantity column. On the Column menu, point to
AutoSum, and then click Sum.
4. In the formula bar, edit the DAX formula to change the measure name to Internet Quantity, as
shown in the following code.
Internet Quantity:=SUM([OrderQuantity])
5. Repeat the previous two steps to create Sum aggregations for the following measures.
6. Click the Reseller Sales tab, and create the following measures using the Sum aggregation.
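Note: As a rough sketch of the Sum measures created in the previous two steps, assuming the view column names used elsewhere in this lab (the measure names and source columns shown here are assumptions), the definitions would look similar to the following.
Internet Revenue:=SUM([SalesAmount])
Internet Cost:=SUM([TotalProductCost])
Reseller Quantity:=SUM([OrderQuantity])
Reseller Revenue:=SUM([SalesAmount])
Reseller Cost:=SUM([TotalProductCost])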
2. While holding the Ctrl key, click all columns in the Customer table except for the following ones:
o Name.
o City.
o StateOrProvince.
o Country.
3. Right-click any of the selected columns, and then click Hide from Client Tools. The preceding
columns, which you did not select, will remain visible as dimension attributes.
4. Right-click the Name column, click Rename, and then rename the column to Customer.
5. Right-click the StateOrProvince column, click Rename, and then rename the column to State Or
Province.
6. Repeat steps 2 to 5 as necessary to configure the columns in the following table.
2. On the Build menu, click Deploy AWSalesTab. After deployment completes, click Close.
3. On the Model menu, click Analyze in Excel. When prompted, ensure that Current Windows User
and the (Default) perspective are selected, and then click OK.
4. In Excel, in the PivotTable Fields pane, under Internet Sales, select Internet Revenue, and under
Reseller Sales, select Reseller Revenue.
6. Verify that the PivotTable in Excel shows the Internet and reseller sales revenue for four product
categories, and then close Excel without saving the workbook.
Results: At the end of this exercise, you will have a tabular data model named AWSalesTab.
5. Click Yes when prompted to confirm that you want to run the command file, and then wait for the
script to finish.
2. In Solution Explorer, double-click Customer.dim, and note that a hierarchy named Customers By
Geography has been created in this dimension.
3. Click the Attribute Relationships tab, and note that relationships have been defined between the
attributes in the hierarchy.
4. Return to the Dimension Structure tab, and then click the City attribute. Press F4 to view the
following properties of the selected attribute:
o AttributeHierarchyVisible. This has been set to False, so the City attribute can be browsed only
through the Customers By Geography hierarchy.
o KeyColumns. The City member of the hierarchy is uniquely defined by a combination of the
City, StateOrProvince, and Country columns.
o NameColumn. Because the attribute has multiple key columns, the NameColumn property must
be set to specify which column gives the attribute its name.
o ValueColumn. Similar to the NameColumn property, the ValueColumn property specifies which
column contains the attribute value.
5. View the properties of the other attributes in the Customer dimension. Note that each attribute that
is included in the Customers By Geography hierarchy is uniquely identified by a combination of
multiple key columns, and that none of the attributes are visible; the only way to browse the
dimension is through the Customers By Geography hierarchy.
If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd. If you are prompted to replace an existing database with the same name, click
Yes.
7. In the Process Dimension – Customer dialog box, click Run, and in the Process Progress dialog
box, after processing completes, click Close. Then, in the Process Dimension – Customer dialog box,
click Close.
8. In the Dimension Designer, click the Browser tab and if necessary, click the Reconnect button. In the
Hierarchy list, ensure that Customers By Geography is selected, and expand the All member to
browse the hierarchy.
9. Examine the Reseller.dim dimension (which has two hierarchies) and the Sales Territory.dim
dimension, and note that the attributes in these dimensions have been similarly configured.
3. Drag the Subcategory attribute and drop it under the Category attribute in the hierarchy that has
been created.
4. Drag the Product attribute, and drop it under the Subcategory attribute in the hierarchy.
5. Right-click Hierarchy, click Rename, and then rename the hierarchy to Products By Category.
6. Click the Attribute Relationships tab, and in the pane at the top, right-click Product, and then click
New Attribute Relationship.
7. In the Create Attribute Relationship dialog box, under Source Attribute, ensure that Product is
selected, under Related Attribute, select Subcategory, and in the Relationship type list, select Flexible
(may change over time), and then click OK.
11. On the Dimension Structure tab, click the Product attribute, and then press F4.
13. In the Properties pane, click the ellipsis button for the KeyColumns property, and in the list of
Available Columns, select ProductKey, click the > button, use the up-arrow button to reorder the
columns so that ProductKey is listed at the top, and then click OK.
14. In the Properties pane, click the ellipsis button for the NameColumn property, select ProductName,
and then click OK.
15. In the Properties pane, click the ellipsis button for the ValueColumn property, select ProductName,
and then click OK.
16. Repeat the previous five steps to set the properties of the following attributes.
If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd.
If you are prompted to replace an existing database with the same name, click Yes.
19. In the Process Dimension – Product dialog box, click Run, and in the Process Progress dialog box,
after processing completes, click Close. Then, in the Process Dimension – Product dialog box, click
Close.
20. In the Dimension Designer, click the Browser tab and if necessary, click the Reconnect button. Then
in the Hierarchy list, ensure that Products By Category is selected, and then expand the All member
to browse the hierarchy.
2. On the Dimension Usage tab, note that the Date dimension is listed twice – once to represent the
Order Date, and once to represent the Ship Date. Both versions of the dimension are related to the
Internet Sales and Reseller Sales measure groups by the Date Key column.
6. Drag the Date attribute, and drop it under the Month Name attribute in the hierarchy.
7. Right-click Hierarchy, click Rename, and then rename the hierarchy to Calendar Date.
9. Drag the Fiscal Quarter attribute and drop it under the Fiscal Year attribute in the hierarchy that
was created.
10. Drag the Month Name attribute and drop it under the Fiscal Quarter attribute in the hierarchy that
was created.
11. Drag the Date attribute, and drop it under the Month Name attribute in the hierarchy.
12. Right-click Hierarchy, click Rename, and then rename the hierarchy to Fiscal Date.
13. On the Attribute Relationships tab, use the same technique you used in the previous exercise to
create the following attribute relationships.
14. Change the relationship type between DateKey and Date to Rigid.
15. On the Dimension Structure tab, set the following attribute properties.
16. Set the OrderBy property of the Month Name attribute to Key.
17. In the Attributes pane, click the Date dimension (at the root of the attributes). In the Properties
pane, set the Type property to Time.
Note: If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account
with the password Pa$$w0rd. If you are prompted to replace an existing database with the same
name, click Yes.
21. In the Process Dimension – Date dialog box, click Run, and in the Process Progress dialog box,
after processing completes, click Close. Then, in the Process Dimension – Date dialog box, click
Close.
22. In the Dimension Designer, click the Browser tab and if necessary, click the Reconnect button. In the
Hierarchy list, ensure that Calendar Date is selected, and expand the All member to browse the
hierarchy.
23. In the Hierarchy list, ensure that Fiscal Date is selected, and then expand the All member to browse
the hierarchy.
2. In the Data Source View pane, right-click the ParentEmployeeKey column, and then click New
Attribute from Column.
3. In the Attributes pane, click the Parent Employee Key attribute. In the Properties pane, set the
following properties:
o Name: Salesperson.
o Usage: Parent.
4. In the Attributes pane, click the Employee Key attribute. In the Properties pane, set the
AttributeHierarchyVisible property to False.
If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd.
If you are prompted to replace an existing database with the same name, click Yes.
6. In the Process Dimension – Salesperson dialog box, click Run, and in the Process Progress dialog
box, after processing completes, click Close. In the Process Dimension – Salesperson dialog box,
click Close.
7. In the Dimension Designer, click the Browser tab and if necessary, click the Reconnect button. In the
Hierarchy list, ensure that Salesperson is selected, and then expand the All member to browse the
hierarchy.
If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd. If you are prompted to replace an existing database with the same name, click
Yes.
2. In Solution Explorer, double-click Sales.cube and then in the Cube Designer, click the Browser tab.
3. On the Cube menu, click Analyze in Excel. If a Microsoft Excel Security Notice dialog box is
displayed, click Enable.
4. In Excel, in the PivotTable Fields pane, under Reseller Sales, select Reseller Revenue.
6. In the PivotTable Fields pane, note that both Order Date and Ship Date dimensions are listed, and
then under Order Date, select Order Date.Calendar Date.
8. Browse the PivotTable in Excel and verify that the hierarchies behave as expected. Close Excel without
saving the workbook.
Results: At the end of this exercise, you will have a multidimensional model that includes balanced
hierarchies, a role-playing dimension, and a parent-child dimension.
3. On the Model menu, point to Process and click Process All. If the Impersonate Credentials dialog
box is displayed, enter the user name ADVENTUREWORKS\ServiceAcct and the password
Pa$$w0rd, and click OK. Then, in the Data Processing dialog box, when processing is complete, click
Close.
4. On the Model menu, point to Model View, and then click Diagram View.
5. In the Customer table, note that a hierarchy named Customers By Geography has been created.
6. Note that the Customers By Geography hierarchy contains the Country, State Or Province, City,
and Customer attributes.
7. Note that the Country, State Or Province, City, and Customer attributes in the table (not in the
hierarchy) have been hidden from client tools.
8. Note that hierarchies have also been created in the Reseller and Sales Territory tables.
2. Click the Create Hierarchy button in the title bar of the maximized Product table.
3. After the new hierarchy is created, change its name to Products By Category.
4. Drag the Category, Subcategory, and Product attributes (in that order) to the Products By
Category hierarchy.
5. Click the Product attribute that is not in the hierarchy, and hold the Shift key and click the Category
attribute. Right-click the selected attributes, and then click Hide from Client Tools.
6. Click the Restore button in the title bar of the maximized Product table.
7. On the File menu, click Save All.
2. Repeat the previous step to delete the dotted line between the Internet Sales and Date tables.
3. Right-click the title bar of the Date table, click Rename, and then rename the table to Order Date.
5. In the Existing Connections dialog box, ensure that SqlServer MIA-SQLDW AWDataWarehouse is
selected, and then click Open. If you are prompted for impersonation credentials, enter the password
Pa$$w0rd, and then click OK.
6. In the Table Import Wizard dialog box, on the Choose how to Import the Data page, ensure that
Select from a list of tables and views to choose the data to import is selected, and then click Next.
7. On the Select Tables and Views page, select the Date source table, which is in the dw_views
schema, and change the friendly name to Ship Date. Click Finish. After the data from the table is
imported, click Close.
8. In the new Ship Date table that has been added to the model, right-click each of the following
attributes, click Rename, and rename them as specified in the following list:
o DateAltKey: Date.
9. Drag the ShipDateKey attribute from the Reseller Sales table and drop it on the DateKey attribute
in the Ship Date table. Drag the ShipDateKey attribute from the Internet Sales table and drop it on
the DateKey attribute in the Ship Date table.
10. Click the Order Date table, and then click the Maximize button in its title bar.
11. Click the Create Hierarchy button in the title bar of the maximized Order Date table. When the new
hierarchy is created, change its name to Calendar Date.
12. Drag the Calendar Year, Month Name, and Date attributes (in that order) to the Calendar Date
hierarchy.
13. Repeat the previous two steps to create a second hierarchy named Fiscal Date that contains the
Fiscal Year, Fiscal Quarter, Month Name, and Date attributes.
14. Right-click each column in the table (not the attributes in the hierarchy) that is not currently hidden
from client tools, and then click Hide from Client Tools.
15. Click the Restore button in the title bar of the maximized Order Date table.
16. Click the Maximize button in the title bar of the Ship Date table, and then repeat the previous 5
steps to create Calendar Date and Fiscal Date hierarchies in the Ship Date table.
17. On the Model menu, point to Model View, click Data View, and then click the Order Date tab.
18. On the Table menu, point to Date, and click Mark as Date Table. In the Mark as Date Table dialog
box, ensure that the Date column is selected in the list, and then click OK.
19. Click the Month Name column header, and on the Column menu, point to Sort, and then click Sort
by Column.
20. In the Sort by Column dialog box, in the Sort section, in the Column list, ensure that Month Name
is selected. Then, in the By section, in the Column list, select MonthNumber, and then click OK.
21. Repeat the previous 3 steps on the Ship Date tab to mark Ship Date as a date table and sort the
Month Name column by MonthNumber.
2. Scroll to the right, double-click Add Column, type Path, and press Enter.
3. With the new Path column selected, in the formula bar, enter the following DAX expression:
=PATH([EmployeeKey], [ParentEmployeeKey])
4. Scroll to the right, double-click Add Column, type Level1, and then press Enter.
5. With the new Level1 column selected, in the formula bar, enter the following DAX expression:
6. Scroll to the right, double-click Add Column, type Level2, and then press Enter.
7. With the new Level2 column selected, in the formula bar, enter the following DAX expression:
8. Scroll to the right, double-click Add Column, type Level3, and then press Enter.
9. With the new Level3 column selected, in the formula bar, enter the following DAX expression:
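Note: A common pattern for the Level1, Level2, and Level3 column expressions when flattening a parent-child hierarchy, assuming the table is named Salesperson and contains a Name column (both assumptions), is shown below; each expression returns the name at the corresponding level of the Path column created earlier.
=LOOKUPVALUE(Salesperson[Name], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 1, INTEGER))
=LOOKUPVALUE(Salesperson[Name], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 2, INTEGER))
=LOOKUPVALUE(Salesperson[Name], Salesperson[EmployeeKey], PATHITEM(Salesperson[Path], 3, INTEGER))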
10. On the Model menu, point to Model View, and then click Diagram View. Click the Salesperson
table, and then click the Maximize button in its title bar.
11. Click the Create Hierarchy button in the title bar of the maximized Salesperson table. After the new
hierarchy is created, change its name to Salesperson.
12. Drag the Level1, Level2, and Level3 attributes (in that order) to the Salesperson hierarchy.
13. Click the EmployeeKey attribute, hold the Shift key and click the Level3 attribute immediately above
the Salesperson hierarchy. Then right-click the selected attributes and click Hide from Client Tools.
14. Click the Restore button in the title bar of the maximized Salesperson table.
15. On the File menu, click Save All.
2. On the Model menu, click Analyze in Excel. When prompted, ensure that Current Windows User
and the (Default) perspective are selected, and click OK.
3. In Excel, in the PivotTable Fields pane, under Reseller Sales, select Reseller Revenue.
5. In the PivotTable Fields pane, note that both Order Date and Ship Date dimensions are listed, and
then under Order Date, select Calendar Date.
Results: At the end of this exercise, you will have a tabular model that includes balanced hierarchies, a
role-playing dimension, and a parent-child dimension.
5. Click Yes when prompted to confirm that you want to run the command file, and then wait for the
script to finish.
2. In Solution Explorer, in the Data Source Views folder, double-click AW Data Warehouse.dsv.
3. Right-click the ResellerSales table, and then click New Named Calculation.
4. In the Create Named Calculation dialog box, in the Column name box, type Reseller Profit. In the
Expression box, type the following expression, and then click OK:
[SalesAmount] - [TotalProductCost]
5. Repeat the previous two steps to add a named calculation to the InternetSales table. Name the
calculation Internet Profit and use the same expression as for the Reseller Profit calculation.
7. In the Measures pane, right-click the Internet Sales measure group, and then click New Measure.
8. In the New Measure dialog box, ensure that the Usage is set to Sum, the Source table is set to
InternetSales, and select the Internet Profit source column. Click OK.
9. Repeat the previous two steps to add a measure for the Reseller Profit column in the ResellerSales
table to the Reseller Sales measure group.
• If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd.
• If you are prompted to replace an existing database with the same name, click Yes.
23. In the Process Cube – Sales dialog box, click Run, and in the Process Progress dialog box, after
processing completes, click Close. Then, in the Process Cube – Sales dialog box, click Close.
CASE
WHEN([Measures].[Gross Margin]) < 0.3 THEN -1
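// The rest of this status expression is not shown above. A plausible completion,
// assuming the 0.3 and 0.4 gross margin thresholds used for this KPI, is:
WHEN([Measures].[Gross Margin]) < 0.4 THEN 0
ELSE 1
END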
• If you are prompted to replace an existing database with the same name, click Yes.
10. In the Process Cube – Sales dialog box, click Run, and in the Process Progress dialog box, when
processing has succeeded, click Close. Then, in the Process Cube – Sales dialog box click Close.
4. If the Internet Sales measure group is not already expanded, expand it and clear the following check
boxes:
5. Clear the check box for the Reseller Sales measure group (this clears the check boxes for all of the
measures in this measure group).
o Reseller.
o Sales Territory.
o Salesperson.
7. Clear the check box for the Gross Margin KPI.
o Total Revenue.
o Total Cost.
o Total Profit.
o Gross Margin.
9. Create another new perspective named Reseller Sales, and clear the check boxes for all objects other
than those in the following list:
o The Reseller Quantity, Reseller Cost, Reseller Revenue, and Reseller Profit measures.
o The Order Date, Product, Ship Date, Reseller, Sales Territory, and Salesperson dimensions.
• If you are prompted for credentials, use the ADVENTUREWORKS\ServiceAcct user account with the
password Pa$$w0rd.
• If you are prompted to replace an existing database with the same name, click Yes.
2. In the Cube Designer, click the Browser tab, and then click Analyze in Excel on the Cube menu.
3. In the Analyze in Excel dialog box, ensure that Sales is selected, and then click OK. If a Microsoft
Excel Security Notice dialog box is displayed, click Enable.
4. In Excel, in the PivotTable Fields pane, under Values, expand Total Sales, and then select Total
Revenue, Total Cost, Total Profit, and Gross Margin.
5. In the PivotTable Fields pane, under Product, select Products By Category.
6. In the PivotTable Fields pane, under KPIs, expand Gross Margin, and then select Status.
7. In the PivotTable, expand Clothing and view the Gross Margin Status indicator for each product
subcategory. Close Excel without saving the workbook.
8. In the Cube Designer, on the Browser tab, on the Cube menu, click Analyze in Excel.
9. In the Analyze in Excel dialog box, select Internet Sales, and then click OK. If a Microsoft Excel
Security Notice dialog box is displayed, click Enable.
10. In Excel, in the PivotTable Fields pane, verify that only four measures from the Internet Sales
measure group, and the Customer, Order Date, Product, and Ship Date dimensions are available.
Results: At the end of this lab, you will have a multidimensional model that contains custom
calculations, a KPI, and perspectives.
3. On the Model menu, point to Process and click Process All. If the Impersonate Credentials dialog
box is displayed, enter the user name ADVENTUREWORKS\ServiceAcct and the password
Pa$$w0rd, and click OK. Then, in the Data Processing dialog box, when processing is complete, click
Close.
4. In the model designer, click the Internet Sales tab.
5. Scroll to the right, double-click Add Column, type Profit, and then press Enter.
6. With the new Profit column selected, in the formula bar, enter the following DAX expression:
=[SalesAmount] - [TotalProductCost]
7. Right-click the Profit column header, and then click Hide from Client Tools.
8. In the measure grid, click the cell directly under the new Profit column. On the Column menu, point
to AutoSum, and then click Sum.
9. In the formula bar, edit the DAX formula to change the measure name to Internet Profit, as shown
in the following code:
Internet Profit:=SUM([Profit])
10. Click the Reseller Sales tab, and repeat steps 4 to 9 to create a hidden calculated Profit column and
an aggregated measure named Reseller Profit.
4. In the Visual Studio instance for the AWSalesTab project, click any table tab, and then click Paste on
the Edit menu.
5. In the Paste Preview dialog box, type Total Sales in the Table Name box, and then click OK.
7. In the Visual Studio instance for the AWSalesTab project, in the Total Sales table, in the first empty
cell in the measure grid, enter the following DAX expression:
8. With the Total Revenue measure selected, press F4. In the Properties pane, set the Format property
to Currency.
9. In the empty cell under the one in which you just created the Total Revenue measure, enter the
following DAX expression:
10. With the Total Cost measure selected, press F4. In the Properties pane, set the Format property to
Currency.
11. In the empty cell under the one in which you just created the Total Cost measure, enter the following
DAX expression:
12. With the Total Profit measure selected, press F4. In the Properties pane, set the Format property to
Currency.
13. In the empty cell under the one in which you just created the Total Profit measure, enter the
following DAX expression:
14. With the Gross Margin measure selected, press F4. In the Properties pane, set the Format property
to Percentage.
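Note: The DAX expressions for the Total Revenue, Total Cost, Total Profit, and Gross Margin measures entered in the preceding steps are not listed above. Plausible definitions, assuming they simply combine the Internet and Reseller measures created earlier (the exact formulas are assumptions), would be:
Total Revenue:=[Internet Revenue] + [Reseller Revenue]
Total Cost:=[Internet Cost] + [Reseller Cost]
Total Profit:=[Internet Profit] + [Reseller Profit]
Gross Margin:=[Total Profit] / [Total Revenue]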
2. In the Key Performance Indicator (KPI) dialog box, select Absolute value and type in 0.4. Then,
drag the first slider to the value 0.3 and drag the second slider to the value 0.4.
3. Click OK.
3. In the Perspective Name column, replace New Perspective with Internet Sales.
4. If the Internet Sales measure group is not already expanded, expand it and select the following
check boxes:
• Internet Cost.
• Internet Profit.
• Internet Quantity.
• Internet Revenue.
5. Select the check boxes for the following dimensions:
• Customer.
• Order Date.
• Product.
• Ship Date.
6. Create another new perspective named Reseller Sales, and clear the check boxes for all objects other
than those in the following list:
• The Reseller Quantity, Reseller Cost, Reseller Revenue, and Reseller Profit measures.
• The Order Date, Product, Ship Date, Reseller, Sales Territory, and Salesperson dimensions.
7. In the Perspectives dialog box, click OK.
2. On the Model menu, click Analyze in Excel. When prompted, ensure that Current Windows User
and the (Default) perspective are selected, and then click OK.
3. In Excel, in the PivotTable Fields pane, under Total Sales, select Total Revenue, Total Cost,
Total Profit, and Gross Margin.
5. In the PivotTable Fields pane, under KPIs, expand Gross Margin, and then select Status.
6. In the PivotTable, expand Clothing and view the Gross Margin Status indicator for each product
subcategory. Close Excel without saving the workbook.
7. On the Model menu, click Analyze in Excel. When prompted, ensure that Current Windows User
and the Internet Sales perspective are selected, and then click OK.
8. In Excel, in the PivotTable Fields pane, verify that only four measures from the Internet Sales
measure group, and the Customer, Order Date, Product, and Ship Date dimensions are available.
9. Close Excel without saving the workbook.
Results: At the end of this exercise, you will have a tabular model that contains calculated measures, a
KPI, and perspectives.
2. Discuss the reporting requirements in the interviews and agree on appropriate tools to support them.
3. In the D:\Labfiles\Lab06\Starter folder, double-click Reporting Requirements.docx to open it in
Microsoft Word.
Results: At the end of this exercise, you will have a reporting requirements document that lists the
reporting scenarios that the BI solution must support, and the tools that you plan to use.
o Report Parts
o Reports
o Templates
o Executives
o Sales
o Finance
o North America
o Europe
o Pacific
Results: At the end of this exercise, you should have created folders in the report server at http://mia-
sqlbi/reports_sql2.
3. In the New Project dialog box, click Report Server Project, in the Name text box, type AWReports,
in the Location box, browse to D:\Labfiles\Lab07\Starter, and then click OK.
4. In Solution Explorer, right-click Shared Data Sources, and then click Add New Data Source. In the
Shared Data Source Properties dialog box:
a. In the Name box, type AWDataWarehouse.
SERVER=MIA-SQLDW;DATABASE=AWDataWarehouse;
f. Click OK.
5. Repeat the previous step to create a second data source with the following properties:
o Name: AWSalesMD
o Connection String: (see the note below)
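Note: Assuming that the AWSalesMD database is hosted on the MIA-SQLBI Analysis Services instance used elsewhere in this course (an assumption), the connection string for this data source would take a form similar to the following.
Data Source=MIA-SQLBI;Initial Catalog=AWSalesMD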
6. In Solution Explorer, right-click AWReports, and then click Properties. In the AWReports Property
Pages dialog box, set the following properties, and then click OK:
o TargetDataSetFolder: Datasets
o TargetReportFolder: Templates
o TargetServerURL: http://mia-sqlbi/reportserver_sql2
8. After the deployment succeeds, maximize Internet Explorer, browse to the Home page in Report
Manager at http://mia-sqlbi/reports_sql2, and then click Data Sources.
9. Hold the mouse over AWDataWarehouse, click the drop-down arrow that appears, and then click
Manage.
11. Select Credentials stored securely in the report server, and then enter the following credentials:
14. Select Use as Windows credentials when connecting to the data source.
15. Click Test Connection, and after the connection is created successfully, click Apply. If the connection
fails, correct any errors in the settings, and then try again.
16. Click the Data Sources link at the top of the page, and then repeat steps 9 and 10 for the
AWSalesMD data source.
2. Click the design surface for the new report, and then on the Report menu, click Add Page Header.
3. If the Toolbox pane is not visible, on the View menu, click Toolbox. Drag a Text Box to the upper-
right corner of the report in the page header area.
4. Right-click the text box, and then click Text Box Properties. In the Text Box Properties dialog box,
change the Name to PageNumber, and then click the expression (fx) button for the Value property.
5. In the Expression dialog box, in the Category list, click Built-in Fields, and in the Item list double-
click OverallPageNumber. Ensure that the expression is set to =Globals!OverallPageNumber, and
then click OK.
6. In the Text Box Properties dialog box, click the Alignment tab, and in the Horizontal list, click
Right. Then click OK.
7. Resize the page header so that it is just big enough to contain the text box.
8. Drag a Text Box to the upper-left of the page, under the page header area.
9. Right-click the text box, and then click Text Box Properties. In the Text Box Properties dialog box,
change the Name to ReportName, and then click the expression (fx) button for the Value property.
10. In the Expression dialog box, in the Category list, click Built-in Fields, and in the Item list double-
click ReportName. Ensure that the expression is set to =Globals!ReportName, and then click OK.
11. In the Text Box Properties dialog box, click the Alignment tab, and in the Horizontal list, click
Right.
12. Click the Font tab, select Bold, and in the Size list, click 16pt. Click OK, and resize the text box so that
it is the same width as the page and you can read the text ([&ReportName]).
13. Drag a Text Box to the page, under the ReportName text box.
14. Right-click the text box, and then click Text Box Properties. In the Text Box Properties dialog box,
change the Name to ExecutionTime, and then click the expression (fx) button for the Value
property.
15. In the Expression dialog box, in the Category list, click Built-in Fields, and in the Item list double-
click ExecutionTime. Ensure that the expression is set to =Globals!ExecutionTime, and then click
OK.
16. In the Text Box Properties dialog box, click the Number tab. In the Category list, click Date, and in
the Type list select *Monday, January 31, 2000 1:30 PM.
17. Click the Alignment tab and in the Horizontal list, click Right. Click OK, and resize the text box so
that it is the same width as the page and you can read the text ([&ExecutionTime]).
19. Drag an Image to the footer and place it at the left edge of the page.
20. In the Image Properties dialog box, change the Name property to AWLogo, and then click Import.
Browse to the D:\Labfiles\Lab07\Starter folder, select Adventure Works Logo.jpg, and then click
Open. Click OK to close the Image Properties dialog box.
21. Resize the image so that it is approximately the same width as the text “[&ReportName]” in the
ReportName text box.
22. Drag a Text Box to the page, next to the image in the page footer.
23. Right-click the text box, and then click Text Box Properties. In the Text Box Properties dialog box,
change the Name to Warning and the Value to Property of Adventure Works Cycles – Do not
distribute without permission, and then click OK.
24. Widen the text box so that its text can be read.
25. Click the PageNumber text box, and then hold the Ctrl key and click the ReportName,
ExecutionTime, and Warning text boxes and press F4.
26. In the properties pane, set the CanGrow property to False.
3. In the Shared Dataset Properties dialog box, click the Parameters tab, and in the Data Type list for
both the @StartDate and @EndDate parameters that have been created, select Date/Time. Then
click OK.
4. In Solution Explorer, right-click Shared Datasets, and then click Add New Dataset.
5. In the Shared Dataset Properties dialog box, change the Name property to Reseller Sales By
Region, in the data source list, select AWDataWarehouse, and under the Query box, click Import.
Browse to the D:\Labfiles\Lab07\Starter folder, select Reseller Sales By Region.sql, and then click
Open.
6. In the Shared Dataset Properties dialog box, click the Parameters tab.
7. In the Data Type list for both the @StartDate and @EndDate parameters that have been created,
select Date/Time.
8. In the Data Type list for the @Regions parameter, select Text, check the Allow multiple values
check box, check the check box to the right of the @Regions text box, and in the empty box, type
Europe,North America,Pacific. Click OK.
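Note: As a rough sketch of the kind of query that Reseller Sales By Region.sql contains, given its @StartDate, @EndDate, and multi-valued @Regions parameters and the Region, Country, Territory, and Revenue fields used later in this lab, it might resemble the following; the view, column, and join names are assumptions.
-- Illustrative sketch only; SSRS expands the multi-valued @Regions parameter into the IN list.
SELECT st.Region, st.Country, st.Territory, rs.SalesAmount AS Revenue
FROM dw_views.ResellerSales AS rs
JOIN dw_views.SalesTerritory AS st ON rs.SalesTerritoryKey = st.SalesTerritoryKey
JOIN dw_views.[Date] AS d ON rs.OrderDateKey = d.DateKey
WHERE d.DateAltKey BETWEEN @StartDate AND @EndDate
AND st.Region IN (@Regions);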
9. In Solution Explorer, right-click Shared Datasets, and then click Add New Dataset.
10. In the Shared Dataset Properties dialog box, change the Name property to Last Month, in the data
source list, select AWDataWarehouse, and under the Query box, click Import.
11. Browse to the D:\Labfiles\Lab07\Starter folder, select Last Month.sql, and then click Open.
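Note: The Last Month.sql script returns the FirstDay and LastDay values that are used later as default parameter values. A minimal sketch of such a query, which may differ from the actual script, is shown below.
-- Returns the first and last day of the previous calendar month.
SELECT
DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()) - 1, 0) AS FirstDay,
DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0)) AS LastDay;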
o OverwriteDatasets: False
o OverwriteDataSources: False
o TargetDataSetFolder: Datasets
o TargetDataSourceFolder: Data Sources
o TargetReportFolder: Templates
o TargetServerURL: http://mia-sqlbi/reportserver_sql2
2. Click the Report Parts folder, and then click Report Builder. If you are prompted to run the
program, click Run.
3. When Report Builder starts, in the Getting Started dialog box, make sure that New Report is
selected, and then click Chart Wizard.
Note: If the Getting Started dialog box is not displayed, click the round button at the upper left of the
Report Builder window, and then click New.
4. On the Choose a dataset page, select Choose an existing dataset in this report or a shared dataset,
and click Browse. In the Select Dataset dialog box, browse to the Datasets folder, select Internet Sales
By Country, and then click Open.
5. On the Choose a dataset page, make sure that Internet Sales By Country is selected, and then click
Next.
6. On the Choose a chart type page, select Bar, and then click Next.
7. On the Arrange chart fields page, drag Country to the Series area, drag Revenue to the Values
area, and then click Next.
8. On the Choose a style page, select Corporate, and then click Finish.
9. In Report Builder, click the chart, and then on the View tab of the ribbon, click Properties.
11. Resize the chart, and change its title to Internet Sales Revenue. Apply any other formatting changes
that you want.
12. Click the round button at the upper left, and then click Publish Report Parts. In the Publish Report
Parts dialog box, click Review and modify report parts before publishing.
13. In the Publish Report Parts dialog box, expand InternetSalesChart and verify that the /Report
Parts folder is selected. Clear all check boxes other than InternetSalesChart, enter an appropriate
description, and then click Publish. After the report part is published successfully, click Close.
15. In Internet Explorer, refresh the view of the Report Parts folder and verify that the report part has
been published. Close Internet Explorer.
Results: At the end of this exercise, you will have published shared data sources, a report template,
shared datasets, and a report part.
3. In the New Project dialog box, click Report Server Project, in the Name text box, type Executive
Reports, in the Location box, browse to D:\Labfiles\Lab07\Starter, and then click OK.
4. In Solution Explorer, right-click Shared Data Sources, and then click Add New Data Source. In the
Shared Data Source Properties dialog box:
a. In the Name box, type AWDataWarehouse
SERVER=MIA-SQLDW;DATABASE=AWDataWarehouse;
f. Click OK.
5. In Solution Explorer, right-click Shared Datasets, point to Add, and then click New Item.
6. In the Add New Item – Executive Reports dialog box, select Reseller Sales By Region, change the
Name property to Reseller Sales By Region.rsd, and then click Add. The dataset is added and then
opened so you can see its source XML definition. Close the Reseller Sales by Region.rsd window.
7. Repeat the previous step to add a dataset named Last Month.rsd based on the Last Month item
template.
8. In Solution Explorer, right-click Reports, point to Add, and then click New Item.
9. In the Add New Item – Executive Reports dialog box, select AWReport, change the Name property to
Reseller Sales.rdl, and then click Add.
10. Click the report design surface, and then on the View menu, click Report Data.
11. In the Report Data pane, right-click Data Sources, and then click Add Data Source. In the Data
Source Properties dialog box, change the Name property to AWDataWarehouse, select Use
shared data source reference, in the drop-down list select the AWDataWarehouse shared data
source, and then click OK.
12. In the Report Data pane, right-click Datasets, and then click Add Dataset. In the Dataset Properties
dialog box, change the Name property to ResellerSalesByRegion, in the list of shared datasets,
select Reseller Sales By Region, and then click OK.
13. Repeat the previous step to add a data set named LastMonth, based on the Last Month shared
dataset, to the report.
14. In the Report Data pane, expand Parameters and double-click StartDate.
15. In the Report Parameter Properties dialog box, on the Default Values tab, select Get values from
a query. In the Dataset list, select LastMonth, in the Value field list, select FirstDay, and then click
OK.
16. Repeat the previous two steps to set the default value of the EndDate parameter to the LastDay field
in the LastMonth dataset.
17. View the Toolbox pane, and drag a Table to the report.
18. In the Report Data pane, expand the ResellerSalesByRegion dataset, and drag the Revenue field to
the first column in the table.
19. Drag the Territory field to the Row Groups pane below the report, and drop it above the Details
group.
20. Drag the Country field to the Row Groups pane under the report, and drop it above the Territory
group.
21. Drag the Region field to the Row Groups pane under the report, and drop it above the Country
group.
22. Right-click the column headers for each empty column in the table, and then click Delete Columns.
23. Click the Preview tab to view the report, and note that the default values for all parameters are used.
Click the Design tab and apply any formatting you want to improve the report.
24. When you are satisfied with the report, in Solution Explorer, right-click Executive Reports and then
click Properties. In the Executive Reports Property Pages dialog box, set the following properties,
and then click OK.
a. OverwriteDatasets: False
b. OverwriteDataSources: False
c. TargetDataSetFolder: Datasets
d. TargetDataSourceFolder: Data Sources
e. TargetReportFolder: Reports/Executives
f. TargetReportPartFolder: Report Parts
g. TargetServerURL: http://mia-sqlbi/reportserver_sql2
25. In Solution Explorer, right-click Executive Reports, and then click Deploy. After deployment
succeeds, close Visual Studio.
26. Start Internet Explorer and browse to http://mia-sqlbi/reports_sql2. Click the Reports folder and
the Executives folder.
27. Click the Reseller Sales report and verify that it displays the reseller sales data by sales region. Keep
Internet Explorer open for the next task.
2. Hold the mouse over Reseller Sales, click the drop-down arrow that appears, and then click Create
Linked Report.
3. In the Name box, type Reseller Sales – Europe, and then click Change Location.
4. In the folder tree, expand Sales, click Europe, and then click OK. On the Reseller Sales page, click
OK. A linked version of the report is created in the Europe folder and displayed.
5. At the top of the page, click Reseller Sales – Europe to view the properties of the linked report.
6. Click the Parameters tab, and in the row for the Regions parameter, click Override Default. Click
the empty Default Value box, and in the top line of the drop-down list, type Europe.
7. Select the Hide check box for the Regions parameter, and then click Apply.
8. At the top of the page, click the large Reseller Sales – Europe page title to view the report, and then
verify that it is filtered to show only sales in Europe.
9. At the top of the page, click Reports, and then click Executives. Repeat steps 2 to 8 to create a linked
report named Reseller Sales – North America in the Sales/North America folder with a hidden
Regions parameter that has a default value of North America.
10. Repeat the previous step to create a linked report named Reseller Sales – Pacific in the Sales/Pacific
folder with a hidden Regions parameter that has a default value of Pacific.
2. On the Finance page, click Report Builder. If you are prompted to run the program, click Run.
3. When Report Builder starts, in the Getting Started dialog box, click Open.
Note: If the Getting Started dialog box is not displayed, click the round button at the upper left of the
Report Builder window, and then click Open.
4. In the Open Report dialog box, browse to the Templates folder, select AWReport, and then click
Open.
5. Click the round button at the top left of the Report Builder window, and then click Save As. In the
Save As Report dialog box, click the Up One Level button, double-click Reports, double-click
Finance, change the Name property to Internet Sales, and then click Save.
6. On the ribbon, on the Insert tab, click Report Parts. Then, in the Report Part Gallery pane, in the
search box, type InternetSales and then click the search button. The chart you created earlier should
be displayed in the Report Part Gallery pane.
7. Double-click the InternetSalesChart icon to add the chart to the report, and then resize the chart
and report so that the chart is at the top of the main page area, under the ExecutionTime text box.
Leave a space under the chart so you can add a table to the report later.
Tip: You can close the Properties and Report Part Gallery panes to create more working space.
8. In the Report Data pane, expand Datasets and note that the InternetSalesByCountry dataset used
by the chart has been added to the report. Expand Parameters and note that the StartDate and
EndDate parameters used by the dataset have also been added to the report.
9. Right-click Datasets, and then click Add Dataset. In the Dataset Properties dialog box, set the
Name property to LastMonth, and then click Browse.
10. In the Select Dataset dialog box, browse to the Datasets folder, select Last Month, and then click
Open. In the Dataset Properties dialog box, click OK.
12. In the Report Parameter Properties dialog box, on the Default Values tab, select Get values from
a query. In the Dataset list, select LastMonth, in the Value field list, select FirstDay, and then click
OK.
13. Repeat the previous two steps to set the default value of the EndDate parameter to the LastDay field
in the LastMonth dataset.
14. On the ribbon, on the Insert tab, click Table, and then click Insert Table. Click and drag under the
chart to create the table.
15. In the Report Data pane, expand the InternetSalesByCountry dataset, and drag the Revenue field
to the first column in the table.
16. Drag the Cost field to the second column in the table.
17. Right-click the empty cell in the details row of the third column, and then click Expression. In the
Expression dialog box, enter the following expression, and then click OK.
=Fields!Revenue.Value - Fields!Cost.Value
18. Click the empty header row cell above the expression you just entered, and type Profit.
19. Drag the City field to the Row Groups pane under the report, and then drop it above the Details
group.
20. Drag the StateOrProvince field to the Row Groups pane under the report, and then drop it above
the City group.
21. Drag the Country field to the Row Groups pane under the report, and then drop it above the
StateOrProvince group.
22. Drag the Region field to the Row Groups pane under the report, and then drop it above the
Country group.
23. Click the Run button to view the report, and note that the default values for all parameters are used.
Click the Design tab, and apply any formatting you want to improve the report.
24. When you are satisfied with the report, click the Save button and close Report Builder.
25. In Internet Explorer, refresh the Finance page and verify that the report has been saved there. Click
Internet Sales to view the report.
Results: At the end of this exercise, you should have created a report from a template, created a linked
report, and used Report Builder to create a report that includes a previously published report part.
5. Click Yes when prompted to confirm that you want to run the command file, and then wait for the
script to finish.
Results: At the end of this exercise, you should have a document that contains a list of the required
Excel features.
3. On the File tab, click Save As, and on the Save As page, click Browse. Browse to the
D:\Labfiles\Lab08\Starter folder and save the workbook as Sales Analysis.xlsx.
4. On the Data tab, click From Other Sources, and in the drop-down list, click From Analysis Services.
5. In the Data Connection Wizard dialog box, in the 1. Server name box, type MIA-SQLBI. Ensure that
Use Windows Authentication is selected, and then click Next.
6. On the Select Database and Table page, select the AWSalesMD database, and in the list of cubes,
select Sales, and then click Next.
7. On the Save Data Connection File and Finish page, click Finish.
8. In the Import Data dialog box, ensure that PivotTable Report is selected; ensure that Existing
worksheet is selected; and on the worksheet behind the dialog box, click cell B16. Click OK.
2. In the Insert Slicers dialog box, under Product, select Category, and under Reseller, select Business
Type. Click OK.
3. Click the caption of the Category slicer and on the ribbon, in the SLICER TOOLS section, on the
OPTIONS tab, click Slicer Settings.
4. In the Slicer Settings dialog box, select the Hide items with no data check box, and then click OK.
5. Repeat the previous two steps to hide items with no data in the Business Type slicer.
6. Resize and move the slicers so that they are above the PivotTable.
7. In the Category slicer, click Bikes. In the Business Type slicer, click Specialty Bike Shop. The data in
the PivotTable is filtered to include only sales of bikes to specialty bike shops.
8. Click the Clear Filter button at the upper right of each slicer to remove the filters.
2. In the Insert Chart dialog box, ensure that Clustered Column is selected, and then click OK.
3. Move the chart to the right of the slicers above the PivotTable.
4. In the chart, expand North America, and then verify that the chart is updated to reflect the
expanded hierarchy.
5. In the Business Type slicer, select Warehouse, and then verify that the chart is filtered by the slicer.
Results: At the end of this exercise, you will have an Excel workbook that contains a PivotTable and a
PivotChart based on an Analysis Services cube.
3. On the ribbon, on the FILE tab, click Save As, and on the Save As page, click Browse. Browse to the
D:\Labfiles\Lab08\Starter folder and save the workbook as Marketing Analysis.xlsx.
5. In the Options dialog box, click Add-Ins. In the Manage drop-down list, select COM Add-ins, and
then click Go.
6. In the COM Add-Ins dialog box, ensure that the Microsoft Office PowerPivot for Excel 2013 add-
in is selected, and then click Cancel.
2. In the PowerPivot for Excel – Marketing Analysis.xlsx window, on the ribbon, on the Home tab, in
the Get External Data area, in the From Database drop-down list, click From SQL Server.
3. In the Table Import Wizard dialog box, in the Server name box, type MIA-SQLDW. Ensure that
Use Windows Authentication is selected, and in the Database name drop-down list, select
AWDataWarehouse. Click Next.
4. On the Choose How to Import the Data page, ensure that Select from a list of tables and views to
choose the data to import is selected, and then click Next.
5. On the Select Tables and Views page, select the following views, change the friendly name of
InternetSales to Internet Sales, and then click Finish:
o Customer
o Date
o InternetSales
o Product
o Promotion
6. On the Importing page, after the data is successfully imported, click Close.
2. Arrange the tables so that you can see both the InternetSales and Date tables. Drag the
OrderDateKey field from the InternetSales fact table to the DateKey field in the Date dimension
table to create a relationship in which OrderDateKey is the foreign key, and DateKey is the primary
key.
Fact Table, Foreign Key, Dimension Table, Primary Key
2. On the Customer table, scroll to the right and double-click the Add Column column header.
3. Type Birth Year, and then press Enter. Then, with the new Birth Year column selected, in the
formula bar, enter the following DAX expression:
=YEAR([BirthDate])
4. Click the tab for the Date table, and then on the ribbon, on the Design tab, in the Mark as Date Table
drop-down list, click Mark as Date Table. In the Mark as Date Table dialog box, ensure that the
DateAltKey column is selected in the list, and then click OK.
5. Click the MonthName column header, and on the ribbon, on the Home tab, click Sort by Column.
6. In the Sort by Column dialog box, in the Sort section, in the Column list, ensure that MonthName
is selected. In the By section, in the Column list, select MonthNumber, and then click OK.
9. While holding the Ctrl key, click the following columns in the Customer table:
o CustomerKey
o CustomerAltKey
o Title
o Name
o BirthDate
o StreetAddress
o EmailAddress
o Phone
o CurrentFlag
o StartDate
o EndDate
10. Right-click any of the selected columns, and then click Hide from Client Tools. The columns that
you did not select will remain visible as dimension attributes.
11. Right-click the StateOrProvince column, click Rename, and then rename the column to State Or
Province. Then rename the following columns:
o NumberChildren: Children.
o NumberCars: Cars.
o CommuteDistance: Commute Distance.
12. Click the Restore button to return the Customer table window to its original size.
13. Repeat steps 8 to 12 as necessary to configure the columns in the following tables.
2. Click the Create Hierarchy button in the title bar of the maximized Product table.
3. When the new hierarchy is created, change its name to Products By Category.
4. Drag the Category, Subcategory, and Product attributes (in that order) to the Products By
Category hierarchy.
5. Click the Product attribute that is not in the hierarchy, hold the Shift key and click the Category
attribute, right-click the selected attributes, and then click Hide from Client Tools.
6. Click the Restore button in the title bar of the maximized Product table.
7. Repeat steps 1 to 6 to create a hierarchy named Calendar Date in the Date table with the following
members:
o Year
o Month
o Day
8. Repeat steps 1 to 6 to create a hierarchy named Sales Promotion in the Promotion table with the
following members:
o Promotion Type
o Promotion
9. Repeat steps 1 to 6 to create a hierarchy named Customers By Geography in the Customer table
with the following members:
o Country
o State Or Province
o City
o Postal Code
Note: If an error message is displayed when saving the workbook in this pre-release version of Excel 2013,
click OK to close it.
2. In the ribbon, on the Home tab, in the PivotTable drop-down list, click PivotTable. In the Insert
Pivot dialog box, select Existing Worksheet, select cell A15, and then click OK.
3. In the PivotTable Fields pane, expand Internet Sales, and then select Revenue.
4. In the PivotTable Fields pane, expand Product, and then drag Products By Category to the ROWS
area.
5. In the PivotTable Fields pane, expand Promotion, and then drag Sales Promotion to the
COLUMNS area.
6. In the VALUES area at the bottom of the PivotTable Fields pane, click the drop-down arrow for the
Sum of Revenue field, and then click Value Field Settings.
7. In the Value Field Settings dialog box, click Number Format. In the Format Cells dialog box, click
the Accounting category, and then click OK. In the Value Field Settings dialog box, click OK to
close it.
8. In the PivotTable Fields pane, under the Customer table, expand More Fields, and then drag Cars,
Children, and Birth Year to the FILTERS area.
9. On the ribbon, in the PIVOTTABLE TOOLS section, on the ANALYZE tab, click Insert Slicer. In the
Insert Slicers dialog box, under the Customer table, expand More Fields, select the following fields,
and then click OK:
o Marital Status
o Gender
10. Move the slicers above the PivotTable, and then click the caption of the Marital Status slicer and on
the ribbon, in the SLICER TOOLS section, on the OPTIONS tab, click Slicer Settings.
11. In the Slicer Settings dialog box, select the Hide items with no data check box, and then click OK.
12. Repeat the previous two steps to hide items with no data in the Gender slicer.
13. On the ribbon, in the PIVOTTABLE TOOLS section, on the ANALYZE tab, click Insert Timeline. In
the Insert Timelines dialog box, select Order Date, and then click OK.
14. Move the Order Date timeline above the PivotTable, and then change the time period from
MONTHS to YEARS.
o In the Birth Year filter, select Select Multiple Items, and then select all years later than 1970.
Note: If an error message is displayed when saving the workbook in this pre-release version of Excel 2013,
click OK to close it.
17. Keep the Excel workbook open for the next exercise, but close the PowerPivot window.
Results: At the end of this exercise, you will have an Excel workbook that contains a PowerPivot data
model based on data from the data warehouse.
Note: If a Power View report does not open in the POWER VIEW tab of the ribbon, view Excel options and
remove the Power View COM add-in, and then add it again.
2. In the Power View report, click Click here to add a title, and then type Sales Promotion Analysis.
3. On the ribbon, on the POWER VIEW tab, click Filters Area to hide the filters area.
4. In the Power View Fields pane, expand Internet Sales, and then select Revenue. Expand Promotion,
expand Sales_Promotion, select Promotion_Type. Expand Customer, and then click Commute
Distance.
5. On the Bar Chart drop-down list, click Clustered Bar. Resize the chart so it fills the left half of the
report.
6. In the Power View Fields pane, expand Date, expand Calendar_Date, and then drag the Year field
to the TILE BY area. Click each of the year headers in the report to view revenue by promotion type
broken down by commute distance for each year.
7. Click the blank area of the report on the right, and in the Power View Fields pane, under Internet
Sales, click Revenue. Then under Customer, expand Customers_By_Geography, select Country.
8. On the Other Chart drop-down list, click Pie. Resize the chart so it fills the top of the right half of the
report.
9. Click the blank area of the report on the right under the pie chart, and in the Power View Fields
pane, under Internet Sales, click Revenue. Under Customer, select Cars.
10. On the Column Chart drop-down list, click Stacked Column. Resize the chart so it fills the bottom of
the right half of the report.
11. In the Commute Distance legend, click the colored square for the 0-1 Miles category, and note that
all of the charts are shaded to reflect the selected value.
12. Click each of the other Commute Distance legend values and note the shading in all charts.
13. Click the currently selected Commute Distance legend value again to remove the shading.
Note: If an error message is displayed when saving the workbook in this pre-release version of Excel 2013,
click OK to close it.
Results: At the end of this exercise, you will have an Excel workbook that contains a Power View report
based on a PowerPivot data model.
2. In the title bar for the home page, next to Student, click the Settings icon and in the menu, click Site
Settings.
3. On the Site Settings page, under Site Collection Administration, click Site Collection Features.
4. On the Site Collection Features page, in the SharePoint Server Publishing Infrastructure row, click
Activate, and then wait for the Active indicator to appear.
Note: The feature can take a few minutes to activate.
5. At the top of the Site Collection Features page, click Site Settings to return to the Site Settings
page.
6. Under Site Actions, click Manage Site Features.
7. On the Site Features page, in the SharePoint Server Publishing row, click Activate, and then wait
for the Active indicator to appear.
8. At the top of the Site Features page, click Adventure Works Portal to return to the home page.
4. Under Description, in the text box, type A subsite for Adventure Works BI reports.
6. In the Template Selection area, under Select a template, on the Enterprise tab, click Business
Intelligence Center.
7. At the bottom of the page, click the Create button. After a short time, the Adventure Works BI Portal
site is displayed.
8. Select the URL in the Internet Explorer navigation bar, right-click it, and then click Copy.
10. On the home page, under the Quick Launch area, click Edit Links, and then click LINK.
11. In the Add a link dialog box, in the Text to display box, type BI Portal, right-click the Address box,
click Paste, and then click OK.
12. Under LINK, click Save.
13. In the Quick Launch area, click the new BI Portal link and verify that the Adventure Works BI Portal
site is displayed.
14. In the Adventure Works BI Portal, in the title bar for the home page, next to Student, click the
Settings icon, and then click Site Settings.
15. On the Adventure Works BI Portal Site Settings page, under Look and Feel, click Navigation.
16. On the Navigation Settings page, in the Current Navigation section, select Structural Navigation:
Display only the navigation items below the current site. At the top of the page, click OK. Note
that the Quick Launch area now only shows links for the items in the BI Portal subsite, and not for
items in the parent site.
17. Click the image above the Quick Launch area; this image provides a navigation link to the home
page of the subsite.
18. Keep Internet Explorer open at the Adventure Works BI Portal page for the next exercise.
Results: At the end of this exercise, you should have created a subsite based on the Business Intelligence
Center template at http://mia-sqlbi/sites/adventureworks/bi.
3. On the Adventure Works Portal Site Settings page, under Site Collection Administration, click Site
collection features.
4. On the Site Collection Features page, verify that Report Server Integration Feature is activated.
5. Click the BI portal link in the Quick Launch area to return to the BI portal site.
3. On the Your Apps page, under Noteworthy, click Document Library. Then, in the Adding
Document Library dialog box, in the Pick a name box, type AWReports, and then click Create.
4. In the Quick Launch area, under Recent, click AWReports.
5. On the AWReports page, on the ribbon, on the Library tab, click Library Settings.
6. On the Settings page, click Advanced settings. On the Advanced Settings page, under Allow
management of content types, select Yes, and at the bottom of the page, click OK.
7. On the Settings page, in the Content Types list, note that only the Document content type is
enabled for the document library. Click Add from existing site content types.
8. On the Add Content Types page, in the Select site content types from drop-down list, select SQL
Server Reporting Services Content Types.
9. Click Report Builder Report, hold the Ctrl key while clicking Report Data Source, click Add, and
then click OK.
10. Under the Content Types list, click Change new button order and default content types.
11. On the Change Button Order page, clear the Visible check box for the Document content type, and
then click OK.
12. On the Settings page, under General Settings, click List name, description, and navigation. On the
General Settings page, in the Navigation section, under Display this document library on the Quick
Launch, select Yes, and then click Save.
13. Click the image above the Quick Launch area to return to the home page, and note that AWReports
is now listed in the Quick Launch area.
15. On the AWReports page, on the ribbon, on the Files tab, click the New Document drop-down
button, and note that the Report Builder Report and Report Data Source content types are listed.
Click the New Document drop-down button again to hide the list.
5. In the AWReports Property Pages dialog box, in the Deployment section, set the following
properties, and then click OK:
o TargetServerURL: http://mia-sqlbi/sites/adventureworks/bi
o TargetDatasetFolder: http://mia-sqlbi/sites/adventureworks/bi/awreports/Datasets
o TargetReportFolder: http://mia-sqlbi/sites/adventureworks/bi/awreports/Templates
8. In Internet Explorer, in the Quick Launch area, click AWReports and verify that the following folders
have been created:
o Data Sources
o Datasets
o Templates
9. Click Data Sources and verify that the AWDataWarehouse and AWSalesMD data sources have
been deployed.
10. Click the ellipsis to the right of AWDataWarehouse, and in the pop-up window that appears, click
EDIT.
Password: Pa$$w0rd
15. Click the Data Sources link at the top of the page, and then repeat steps 10 and 11 for the
AWSalesMD data source.
2. On the ribbon, on the Files tab, click New Folder. Then in the Create a new folder dialog box, in the
Name box, type Self-Service Reports and click Save.
3. Click the Self-Service Reports folder, and then, on the New Document drop-down list, click Report
Builder Report. If you are prompted to run the application, click Run.
4. When Report Builder starts, in the Getting Started dialog box, click Open.
Note: If the Getting Started dialog box is not displayed, click the round button at the upper left of the
Report Builder window, and then click Open.
5. In the Open Report dialog box, double-click the AWReports folder and the Templates folder, select
AWReport.rdl, and then click Open.
6. Click the round button at the upper-left of the Report Builder window, and then click Save As. In the
Save As Report dialog box, click the Up One Level button, double-click Self-Service Reports,
change the Name property to Internet Sales, and then click Save.
7. In the Report Data pane, right-click Datasets, and then click Add Dataset. In the Dataset Properties
dialog box, set the Name property to InternetSalesByCountry, and then click Browse.
8. In the Select Dataset dialog box, double-click the AWReports folder and the Datasets folder, select
Internet Sales By Country.rsd, and then click Open. On the Dataset Properties dialog box, click
OK.
9. In the Report Data pane, expand Parameters, and then note that the StartDate and EndDate
parameters used by the dataset have also been added to the report.
10. Right-click Datasets, and then click Add Dataset. In the Dataset Properties dialog box, set the
Name property to LastMonth, and then click Browse.
11. In the Select Dataset dialog box, double-click the AWReports folder and the Datasets folder, select
Last Month.rsd, and then click Open. In the Dataset Properties dialog box, click OK.
12. In the Report Data pane, in the Parameters folder, double-click the StartDate parameter.
13. In the Report Parameter Properties dialog box, on the Default Values tab, select Get values from
a query. In the Dataset list, select LastMonth, in the Value field list, select FirstDay, and then click
OK.
14. Repeat the previous two steps to set the default value of the EndDate parameter to the LastDay field
in the LastMonth dataset.
15. On the ribbon, on the Insert tab, click Table, and then click Insert Table. Click and drag into the
blank area of the report to create the table.
16. In the Report Data pane, expand the InternetSalesByCountry dataset, and then drag the Revenue
field to the first column in the table.
17. Drag the Cost field to the second column in the table.
18. Right-click the empty cell in the details row of the third column, and then click Expression. In the
Expression dialog box, enter the following expression, and then click OK.
=Fields!Revenue.Value - Fields!Cost.Value
19. Click the empty header row cell above the expression you just entered, and then type Profit.
20. Drag the City field to the Row Groups pane below the report, and then drop it above the Details
group.
21. Drag the StateOrProvince field to the Row Groups pane below the report, and then drop it above
the City group.
22. Drag the Country field to the Row Groups pane below the report, and then drop it above the
StateOrProvince group.
23. Click the Run button to view the report, and note that the default values for all parameters are used.
Click the Design tab and apply any formatting you want to improve the report.
24. After you are satisfied with the report, click the Save button, and then close Report Builder.
25. In Internet Explorer, in the Quick Launch area, click AWReports, click Self-Service Reports, and
verify that the report has been saved there. Click Internet Sales to view the report.
26. At the top of the page, click Adventure Works BI Portal to return to the home page for the BI portal
subsite, and keep Internet Explorer open for the next exercise.
Results: At the end of this exercise, you will have published Reporting Services reports to the BI subsite
and verified that self-service reporting is supported.
3. On the Your Apps page, under Apps you can add, click PowerPivot Gallery. In the Adding
PowerPivot Gallery dialog box, in the name box, type AWPowerPivot, and then click Create.
6. On the Settings page, click List name, description, and navigation. On the General Settings page, in
the Navigation section, under Display this document library on the Quick Launch, select Yes, and then
click Save.
7. Click the image above the Quick Launch area to return to the home page, and note that
AWPowerPivot is now listed in the Quick Launch area.
3. In the Add a document dialog box, click Browse. Browse to the D:\Labfiles\Lab09\Starter folder, and
then double-click Marketing Analysis.xlsx. In the Add a document dialog box, click OK.
4. Click the large image for the Marketing Analysis PowerPivot workbook to view it in Excel Services.
5. Click the Clear Filter icon at the upper-right of the Marital Status and Gender slicers, and verify that
the data in the PivotTable updates.
Tip: Make sure you click the link for the application, and not the link for its proxy.
5. In the Secure Store Service configuration page, note that a target application has already been
created for the PowerPivot unattended account for data refresh.
6. In the Quick Launch area, click Application Management, and then on the Application Management
page, under Service Applications, click Configure service application associations.
7. On the Service Application Associations page, click the SharePoint – 80 web application. In the list
of service applications, select Secure Store Service, scroll to the bottom, and then click OK. This
enables the Adventure Works Portal site and its subsites to use credentials in the secure store service.
8. In the Quick Launch area, click General Application Settings, and then on the General Application
Settings page, under PowerPivot, click Configure service application settings.
9. On the PowerPivot Settings page, in the Data Refresh section, review the settings and note that:
o The unattended data refresh account has been configured to use the target application ID you
saw in the secure store service earlier.
11. Start a new instance of Internet Explorer and if the Adventure Works Portal site does not open,
browse to http://mia-sqlbi/sites/adventureworks.
12. In the Quick Launch area, click BI Portal, and then click AWPowerPivot.
13. Click the Manage Data Refresh button to the right of the Marketing Analysis workbook.
14. In the Manage Data Refresh page, select Enable, and then review the default settings. When you
have finished reviewing the settings, click OK.
5. On the Job Definitions page, click PowerPivot Management Dashboard Processing Timer Job. On the
Edit Timer Job page, click Run Now.
6. Under Timer Links on the left, click Running Jobs, and note that the job is running. Wait a few
minutes, and then click the Running Jobs link again until the job is no longer listed.
7. Under Timer Links on the left, click Job History, and verify that the PowerPivot Management
Dashboard Processing Timer Job completed successfully.
8. In the Quick Launch area, click Central Administration, and then under General Application Settings,
click PowerPivot Management Dashboard.
9. In the View drop-down list, click Activity, and review the server activity recorded for PowerPivot
workbooks.
10. Under Workbook Activity – Chart, click the Play button to view a timeline of workbook activity.
Results: At the end of this exercise, you will have a PowerPivot Gallery that contains a published
PowerPivot workbook.
Tip: Make sure you click the link for the application, and not the link for its proxy.
3. On the Manage PerformancePoint Services page, click PerformancePoint Service Application Settings.
4. On the PerformancePoint Service Application Settings page, ensure that Unattended Service
Account is selected, enter the following credentials, and then click OK:
o Password: Pa$$w0rd
4. In the Internet Explorer prompt to open designer.application from mia-sqlbi, click Open. If the
Application Run – Security Warning dialog box is displayed, click Run.
5. In the Dashboard Designer, in the Workspace Browser pane, right-click Data Connections, and then
click New Data Source.
6. In the Select a Data Source Template dialog box, under Template, click Analysis Services, and then
click OK.
7. When the new data source is created, rename it to AWSalesMD.
8. Under Connection Settings, in the Server text box, type MIA-SQLBI; in the Database drop-down
list, select AWSalesMD, and in the Cube drop-down list, select Sales.
10. In the Time Dimension drop-down list, click Ship Date.Ship Date.Fiscal Date.
11. In the Choose a date to begin the year box for the selected time dimension, click Browse.
12. In the Select Member dialog box, select the 1st of July for the most recent fiscal year, and then click
OK.
14. In the Enter a date that is equal to the period specified by the reference member above list, select the
same date that you chose in step 12.
15. In the Time Member Association pane, create the following mappings:
• Fiscal Year: Year.
• Date: Day.
16. In the Workspace Browser pane, right-click AWSalesMD, and then click Save.
2. In the Select a KPI Template dialog box, select Blank KPI, and then click OK.
4. In the Editor, in the Actual and Targets section, in the Name column, click Actual, and then rename
it to YTD.
5. In the Data Mappings column, in the YTD row, click 1 (fixed values). In the Fixed Values Data Source
Mapping dialog box, click Change Source.
6. In the Select a Data Source dialog box, on the Workspace tab, click AWSalesMD, and then click
OK.
7. In the Dimensional Data Source Mapping dialog box, in the Select a measure drop-down list, select
Reseller Revenue.
8. In the Select a dimension section, click New Time Intelligence Filter. In the Time Formula Editor dialog
box, type YearToDate, and then click OK.
9. On the Dimensional Data Source Mapping dialog box, click OK.
10. In the Number column, in the YTD row, click (Default). In the Format Numbers dialog box, in the
Format drop-down list, select Currency, and then click OK.
11. In the Data Mappings column, in the Target row, click 1 (fixed values). In the Fixed Values Data Source
Mapping dialog box, click Change Source.
12. In the Select a Data Source dialog box, on the Calculated Metric tab, under Templates, click Blank
Calculation, and then click OK.
13. In the Calculated Metrics Data Source Mapping dialog box, click the second row labeled Value2,
and then click Delete.
14. Click Value1 and rename it to PreviousYear.
15. Click 1 (fixed values), and then in the Fixed Values Data Source Mapping dialog box, click Change
Source.
16. In the Select a Data Source dialog box, on the Workspace tab, click AWSalesMD, and then click
OK.
17. In the Dimensional Data Source Mapping dialog box, in the Select a measure drop-down list, select
Reseller Revenue.
18. In the Select a dimension section, click New Time Intelligence Filter. In the Time Formula Editor dialog
box, type YearToDate-1, and then click OK.
19. In the Dimensional Data Source Mapping dialog box, click OK.
20. In the Calculated Metric Data Source Mapping dialog box, in the Formula box, type
PreviousYear*1.25, and then click OK.
21. In the Number column, in the Target row, click (Default), and then, in the Format Numbers dialog
box, in the Format drop-down list, select Currency, and then click OK.
o Best: 120%
o Threshold 2: 90%
o Threshold 1: 50%
o Worst: 0%
23. In the Workspace Browser pane, right-click Reseller Revenue, and then click Save.
2. In the Create an Analytic Chart Report dialog box, on the Workspace tab, select AWSalesMD, and
then click Finish.
8. On the ribbon, on the Edit tab, in the Report Type drop-down list, click Stacked Bar Chart.
9. In the Workspace Browser pane, right-click Reseller Profit, and then click Save.
5. Drag Reseller Revenue to the left side of the editor in the area labeled Drop Items Here.
6. On the ribbon, on the Edit tab, click Update. The scorecard is updated with the information from
the cube.
7. In the Details pane, expand Dimensions, and then drag Sales Territory to the right edge of the
Reseller Revenue cell.
8. In the Select Members dialog box, click All, and then click OK.
9. On the ribbon, on the Edit tab, click Update. Expand the All dimension member to see the results for
each sales territory.
2. In the Select a Dashboard Page Template dialog box, select 2 Rows, and then click OK.
5. In the Details pane, expand Reports, and then expand PerformancePoint Content.
6. Drag Reseller Profit to the top row in the Dashboard Content pane.
7. In the Details pane, expand Scorecards, and then expand PerformancePoint Content.
8. Drag Reseller Revenue Scores to the bottom row in the Dashboard Content pane.
9. In the Workspace Browser pane, right-click Untitled Workspace, and then click Save.
11. In the Workspace Browser pane, right-click Sales Dashboard, and then click Deploy to SharePoint.
12. In the Deploy To dialog box, expand Adventure Works BI Portal, select Dashboards, and then click
OK. The dashboard is uploaded to SharePoint Server and opened in a new tab in Internet Explorer.
13. In Internet Explorer, on the ribbon, on the Page tab, click Make Homepage. When prompted to
confirm the action, click OK.
14. Close the Internet Explorer tab that contains the dashboard, and in the remaining tab (which should
be displaying the Data Connections library), click the BROWSE tab, and then click the image above
the Quick Launch area to go to the site’s home page (which is now the dashboard you created).
15. In the dashboard page, click the Reseller Profit chart, and then hold the mouse over each colored
band in the chart to see the profit for each sales territory.
16. Click the band for the North America sales territory, and then view the profit for the countries in that
territory.
17. Move the mouse to the upper-right of the chart, click the drop-down arrow that appears, and then
click Reset View to return to the default chart view for all sales territories.
18. In the Reseller Revenue Scores area, expand the sales territory hierarchy to view the sales
performance in each territory.
Results: At the end of this exercise, you will have created four PerformancePoint reports on the
SharePoint site.
2. In Object Explorer, expand Management, right-click Data Collection, and then click Configure
Management Data Warehouse.
3. In the Configure Management Data Warehouse Wizard dialog box, click Next.
4. On the Select configuration task page, ensure that Create or upgrade a management data warehouse
is selected, and then click Next.
5. On the Configure Management Data Warehouse Storage page, next to the Database name list, click
New. In the New database dialog box, in the Database name box, type ManagementDW, and then
click OK.
6. On the Configure Management Data Warehouse Storage page, in the Database name list, ensure that
ManagementDW is selected, and then click Next.
7. On the Map Logins and Users page, in the Users mapped to this login list, select
ADVENTUREWORKS\ServiceAcct; in the Database role membership for ManagementDW list, select
mdw_admin, and then click Next.
8. On the Complete the Wizard page, click Finish. When configuration is complete, click Close.
9. Right-click Data Collection, and then click Configure Management Data Warehouse.
10. In the Configure Management Data Warehouse Wizard dialog box, click Next.
11. On the Select configuration task page, select Set up data collection, and then click Next.
12. On the Configure Management Data Warehouse Storage page, click the ellipsis (…) next to the
Server name box, and then connect to the MIA-SQLDW instance of the database engine by using
Windows authentication. After the connection is made, in the Database name list, select
ManagementDW, and then click Next.
13. On the Complete the Wizard page, click Finish. After configuration completes, click Close.
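The wizard performs this setup for you, but you can verify the result from Transact-SQL. The following query is a minimal sketch, assuming the standard data collector catalog views in msdb for SQL Server 2012, that lists the collection sets and shows whether each one is currently running:
-- Minimal sketch: list the data collector collection sets and their status.
-- Assumes the standard msdb catalog views for the data collector.
USE msdb;
GO
SELECT collection_set_id,
       name,
       is_running
FROM dbo.syscollector_collection_sets
ORDER BY collection_set_id;
GO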
2. In Computer Management, in the pane on the left, expand Performance, expand Monitoring Tools,
and then click Performance Monitor.
4. In the Add Counters dialog box, in the list of objects, expand Processor and select % Processor
Time. In the Instances of selected object list, select _Total, and then click Add.
Note: If the list of instances is empty, click the counter again to refresh the view.
5. Repeat the previous step to add the following counters. Where multiple instances are available, add
the _Total instance:
6. In the Add Counters dialog box, click OK. Note that Performance Monitor displays the counter
values.
7. In the pane on the left, right-click Performance Monitor, point to New, and then click Data
Collector Set.
8. In the Create new Data Collector Set dialog box, change the name to Data Warehouse
Performance Counters, and then click Next. Note the default value for the root directory, and then
click Finish.
9. In the pane on the left, expand Data Collector Sets, expand User Defined, right-click Data
Warehouse Performance Counters, and then click Start. Minimize Computer Management.
11. In Solution Explorer, double-click LoadDW.dtsx. Note that this package executes two other packages
that load the fact tables in the data warehouse, and then it updates the extraction log and processes
the Analysis Services cube.
13. After execution of the LoadDW.dtsx package completes, click Stop Debugging on the Debug
menu. Close Visual Studio without saving any changes.
14. Maximize Computer Management, and in the pane on the left, right-click Data Warehouse
Performance Counters, and then click Stop.
15. Right-click Data Warehouse Performance Counters, and then click Latest Report. View the report,
which shows the performance counter values during the data warehouse load.
3. In SQL Server Profiler, on the File menu, click New Trace. When prompted, use Windows
authentication to connect to the MIA-SQLDW instance of the database engine.
4. In the Trace Properties dialog box, in the Trace name box, type Data Warehouse Query Workload.
7. On the Events Selection tab, in the list of events, expand TSQL, select SQL:BatchCompleted and
SQL:StmtCompleted, and then clear the Show all events check box.
8. Clear the check boxes in all columns other than the following:
o ApplicationName
o DatabaseName
o Duration
o EndTime
o LoginName
o Reads
o RowCounts
o SPID
o StartTime
o TextData
10. Click Column Filters, and in the Edit Filter dialog box, select DatabaseName, expand Like, type
%AWDataWarehouse%, and then click OK.
11. Click Run, and then minimize SQL Server Profiler.
12. In the D:\Labfiles\Lab10\Starter folder, double-click RunDWQueries.cmd. This executes a script that
runs queries in the data warehouse for over a minute.
13. After the script completes, maximize SQL Server Profiler, and on the File menu, click Stop Trace.
14. Review the trace, noting the TextData, Duration, and Reads values for the SQL:StmtCompleted
events where the value in the RowCounts column is over 100.
15. On the Tools menu, click Database Engine Tuning Advisor, and when prompted, use Windows
authentication to connect to the MIA-SQLDW instance of the database engine.
16. In the Database Engine Tuning Advisor, change the session name to Tune DW and ensure that under
Workload, File is selected. Click the Browse for a workload file button (this is a binoculars icon)
and open the Data Warehouse Query Workload.trc file in the D:\Labfiles\Lab10\Starter folder.
17. In the Database for workload analysis drop-down list, select AWDataWarehouse, and in the Select
databases and tables to tune list, select AWDataWarehouse.
18. On the Tuning Options tab, clear the Limit tuning time check box.
19. In the Physical Design Structures (PDS) to use in database section, select Indexes and indexed views;
in the Partitioning strategy to employ section, select Aligned partitioning; and in the Physical Design
Structures (PDS) to keep in database section, select Keep aligned partitioning.
20. Click Start Analysis and wait for the analysis to complete.
21. Review the recommendations, which list suggested indexes and statistics to drop or create.
22. Close Database Engine Tuning Advisor and minimize SQL Server Profiler.
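Database Engine Tuning Advisor recommendations for a data warehouse workload typically take the form of CREATE INDEX and CREATE STATISTICS statements. The example below is purely illustrative (the table, column, and index names are hypothetical and are not taken from the recommendation output in this lab), but it shows the shape of a recommendation you might apply after reviewing it:
-- Hypothetical example of the kind of recommendation the Database Engine
-- Tuning Advisor generates; object, column, and index names are illustrative only.
USE AWDataWarehouse;
GO
CREATE NONCLUSTERED INDEX IX_FactInternetSales_OrderDateKey
ON dbo.FactInternetSales (OrderDateKey)
INCLUDE (SalesAmount, TotalProductCost);
GO
CREATE STATISTICS ST_FactInternetSales_CustomerKey
ON dbo.FactInternetSales (CustomerKey);
GO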
2. Expand Resource Governor, right-click Resource Pools, and then click New Resource Pool.
3. In the Resource Governor Properties dialog box, in the Resource pools section, add the following
resource pools:
o Low Priority: Minimum CPU % 0, Maximum CPU % 50, Minimum Memory % 0, Maximum Memory % 50
o High Priority: Minimum CPU % 20, Maximum CPU % 90, Minimum Memory % 20, Maximum Memory % 90
4. In the Resource Governor Properties dialog box, in the Resource pools section, select the Low
Priority resource pool you created in the previous step, and in the Workload groups for resource
pool: Low Priority section, add the following workload group.
o Name: User Queries
o Importance: Low
o Maximum Requests: 10
o CPU Time (sec): 50
o Memory Grant %: 50
o Memory Grant Time-out (sec): 20
o Degree of Parallelism: 1
5. In the Resource Governor Properties dialog box, in the Resource pools section, select the High
Priority resource pool you created in the previous step, and in the Workload groups for resource
pool: High Priority section, add the following workload group.
Name, Importance, Maximum Requests, CPU Time (sec), Memory Grant %, Memory Grant Time-out (sec), Degree of Parallelism
7. On the File menu, point to Open, and then click File. Browse to the D:\Labfiles\Lab10\Starter folder,
and then open Classifier Function.sql.
8. Click Execute to run the script, which creates a function named fn_classify_apps that returns the
string “User Queries” if the application name in the current session is “SQLCMD”; it returns “ETL” if the
current application is named “SQL Server”.
9. In Object Explorer, right-click Resource Governor, and then click Properties. In the Resource
Governor Properties dialog box, in the Classifier function name list, select
[dbo].[fn_classify_apps], and then click OK.
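If you prefer to script this configuration rather than use the Resource Governor Properties dialog box, the Transact-SQL below is a rough equivalent. The pool and workload group settings mirror the values listed in the preceding steps, and the classifier function body is inferred from the description in step 8, so it may differ in detail from the provided Classifier Function.sql script:
-- Sketch of the Resource Governor configuration described above.
-- The classifier function body is inferred from the lab description and
-- may differ from the provided Classifier Function.sql script.
USE master;
GO
CREATE RESOURCE POOL [Low Priority]
WITH (MIN_CPU_PERCENT = 0, MAX_CPU_PERCENT = 50,
      MIN_MEMORY_PERCENT = 0, MAX_MEMORY_PERCENT = 50);
CREATE RESOURCE POOL [High Priority]
WITH (MIN_CPU_PERCENT = 20, MAX_CPU_PERCENT = 90,
      MIN_MEMORY_PERCENT = 20, MAX_MEMORY_PERCENT = 90);
GO
CREATE WORKLOAD GROUP [User Queries]
WITH (IMPORTANCE = LOW,
      GROUP_MAX_REQUESTS = 10,
      REQUEST_MAX_CPU_TIME_SEC = 50,
      REQUEST_MAX_MEMORY_GRANT_PERCENT = 50,
      REQUEST_MEMORY_GRANT_TIMEOUT_SEC = 20,
      MAX_DOP = 1)
USING [Low Priority];
-- The ETL workload group is created the same way under the High Priority
-- pool, using the settings specified in step 5.
GO
CREATE FUNCTION dbo.fn_classify_apps()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname;
    IF APP_NAME() = N'SQLCMD'
        SET @group = N'User Queries';
    ELSE IF APP_NAME() = N'SQL Server'
        SET @group = N'ETL';
    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_classify_apps);
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO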
10. Minimize SQL Server Management Studio, and maximize Computer Management.
11. In Computer Management, in the pane on the left, if necessary, expand Performance and
Monitoring Tools, and then click Performance Monitor.
12. In the pane at the bottom, select each counter in turn, and on the toolbar, click Delete (the red X
icon) until there are no counters displayed.
14. In the Add Counters dialog box, in the list of objects, expand the SQLServer:Resource Pool Stats
object, and then click CPU control effect %. If the Instances of selected object list is empty, click
CPU control effect % again. Then click High Priority, hold the Ctrl key and click Low Priority,
and then click Add.
15. Repeat the previous step to add the following counters from the SQLServer:Workload Group Stats
object for the ETL and User Queries instances:
o CPU usage %
18. With the RunDWQueries.cmd command still running, in the D:\Labfiles\Lab10\Starter folder, double-
click RunETL.cmd to start an ETL workload. Observe the values of the counters in Performance
Monitor. Note that the CPU control effect % for both workloads increases as Resource Governor
prioritizes CPU resources for the ETL workload.
• collection_set_2_upload.
• collection_set_3_upload.
4. In Object Explorer, under Management, right-click Data Collection, point to Reports, point to
Management Data Warehouse, and then click Server Activity History.
5. Under the timeline, click the zoom in icon (a magnifying glass that contains a “+” symbol). Keep
zooming in to see activity for a shorter period of time.
6. In Object Explorer, under Management, right-click Data Collection, and then click Disable Data
Collection. After the action completes, click Close.
7. Keep SQL Server Management Studio open for the next exercise.
Results: At the end of this exercise, you will have a Performance Monitor report showing activity during
an ETL data load and recommendations from the Database Tuning Advisor based on a SQL Server
Profiler trace. You will also have created resource pools and workload groups for Resource Governor,
and generated server health data with the Data Collector.
2. In Object Explorer, right-click the MIA-SQLBI Analysis Services instance, and then click Restart. When
prompted to allow the program to make changes, click Yes, and when prompted to confirm the
restart action, click Yes. Wait for Analysis Services to restart.
2. When prompted, use Windows authentication to connect to the MIA-SQLBI instance of Analysis
Services.
3. In the Trace Properties dialog box, in the Trace name box, type Analysis Services Query Trace.
4. On the Events Selection tab, select Show all events, and then clear the Events check box in all rows
other than the following:
o Progress Report Begin.
o Query Begin.
o Query End.
5. Clear the Show all events check box, and then select Show all columns. Clear the selected check
boxes in all columns other than the following:
o EventSubclass
o TextData
o ApplicationName
o Duration
o DatabaseName
o ObjectName
o SPID
o CPUTime
7. Click Column Filters, and in the Edit Filter dialog box, select DatabaseName, expand Like, type
AWSalesMD, and then click OK.
2. In the pane at the bottom, select each counter in turn, and on the toolbar, click Delete (the red X
icon) until there are no counters displayed.
4. In the Add Counters dialog box, in the list of objects, expand the MSAS11: MDX object, click Total
cells calculated, click Add, and then click OK.
5. On the toolbar, in the Change graph type drop-down list, click Report. Total cells calculated
should currently have the value 0.000.
4. Maximize SQL Server Profiler, and on the File menu, click Stop Trace. Maximize Computer
Management, and on the toolbar, click Freeze Display.
5. In SQL Server Profiler, view the trace, and note the Duration value for the last Query Subcube event
(which represents the time spent retrieving the cube from the storage engine), and the Duration
value for the last Serialize Results End event (which represents the time spent manipulating the data
after it was retrieved from storage).
Note: These results indicate that the query spent significantly more time manipulating the data
than retrieving it from the storage engine, and a very large number of cells were calculated
during the execution of the query. The most appropriate way to improve the query performance
is to optimize the MDX and reduce the number of calculations being performed.
3. In SQL Server Profiler, on the File menu, click Run Trace. Minimize SQL Server Profiler.
4. In Computer Management, on the toolbar, click Unfreeze Display. If the Total cells calculated value
does not revert to 0.000, right-click the report, and then click Clear. Minimize Computer
Management.
5. Maximize SQL Server Management Studio, and on the File menu, point to Open, and then click File.
Browse to the D:\Labfiles\Lab10\Starter folder and open Revised MDX Query.mdx. If you are
prompted, use Windows authentication to connect to the MIA-SQLBI instance of Analysis Services.
8. Maximize SQL Server Profiler, and on the File menu, click Stop Trace. Maximize Computer
Management, and on the toolbar, click Freeze Display.
9. In SQL Server Profiler, view the trace and note the Duration value for the last Query Subcube event
(which represents the time spent retrieving the cube from the storage engine) and the Duration
value for the last Serialize Results End event (which represents the time spent in the formula
engine).
11. Close SQL Server Management Studio, and minimize SQL Server Profiler and Computer Management.
Note: The revised version of the query uses a WITH SET statement to sort the resellers by
revenue before applying the RANK function. This enables the query processor to use a linear hash
scan to find each reseller’s position in the ordered list, dramatically reducing the number of
calculations required to produce the results.
Results: At the end of this exercise, you will have created a SQL Server Profiler trace and used
Performance Monitor to view Analysis Services performance data while executing an MDX query.
4. Start Internet Explorer, and browse to the Adventure Works Portal at http://mia-
sqlbi/sites/adventureworks.
5. In the Quick Launch area, click Reports and verify that the report items have been deployed.
6. Click Data Sources, and then click the ellipsis to the right of AWDataWarehouse, and in the pop-up
window that appears, click Edit.
Password: Pa$$w0rd
10. Click Test Connection, and after the connection is created successfully, click OK. If the connection
fails, correct any errors in the settings and try again.
11. In the Quick Launch area, click Reports, and then click Reseller Sales to view the report.
12. At the top of the page, click Reports to return to the Reports folder.
2. In the pane at the bottom, select each counter in turn, and on the toolbar, click Delete (the red x
icon) until there are no counters displayed.
4. In the Add Counters dialog box, in the list of objects, expand the MSRS 2011 Web Service
SharePoint Mode object.
5. Click Total Cache Hits, hold the Ctrl key and click Total Cache Misses and Total Reports Executed,
click Add, and then click OK.
6. On the toolbar, in the Change graph type drop-down list, click Report.
Note the current values of the three counters you added, and then minimize Computer Management.
2. When prompted, use Windows authentication to connect to the MIA-SQLDW instance of the
database engine.
3. In the Trace Properties dialog box, in the Trace name box, type Reporting Services Query Trace.
4. On the Events Selection tab, select Show all events, and then clear the Events check box in all rows
other than the following:
o RPC:Completed
o SQL:BatchCompleted
5. Clear the Show all events check box, and select Show all columns. Clear the selected check boxes in
all columns other than the following:
o TextData
o ApplicationName
o CPU
o Duration
o SPID
o StartTime
o BinaryData
o DatabaseName
7. Click Column Filters, and in the Edit Filter dialog box, select DatabaseName, expand Like, type
%AWDataWarehouse%, and then click OK.
2. Maximize Computer Management and note that the number of report executions has increased by
one, but that the total cache hits and total cache misses are unchanged.
3. Maximize SQL Server Profiler and note that the trace includes:
o Two SQL:BatchCompleted events, which record the execution of a Transact-SQL query used to
retrieve the default values for the StartDate and EndDate parameters. The query was run once
when the report was first executed, and again to populate the available values lists after the
report rendered.
o An RPC:Completed event that records the use of the sp_executesql stored procedure to retrieve
the data for the report.
4. In Internet Explorer, in the Parameters pane, change the Start Date and End Date parameters so
that the report will show data for the past six months, and then click Apply.
5. Maximize Computer Management and note that the number of report executions has increased by
one, but that the total cache hits and total cache misses are unchanged.
6. Maximize SQL Server Profiler and note that the trace includes a new RPC:Completed event to
retrieve the data for the report with the modified parameter values. Then, on the File menu, click
Stop Trace.
2. Click the ellipsis to the right of Last Month, and in the pop-up window that appears, click the ellipsis
and click Manage Caching Options.
3. In the Manage Caching Options page, select Cache shared dataset. Ensure that On a custom schedule
is selected, and then click Configure.
4. In the Frequency section, select Month, and in the Schedule section, change the On calendar
day(s) value to 1, and the Start time value to 12:00, and then click OK.
6. Maximize SQL Server Profiler, and on the File menu, click Run Trace.
7. In Internet Explorer, in the Quick Launch area, click Reports, and then click Reseller Sales to view the
report with the default parameter values.
8. Maximize Computer Management and note the number of report executions has increased by one,
but that the total cache hits and total cache misses are unchanged.
9. Maximize SQL Server Profiler and note that the trace includes:
o One SQL:BatchCompleted event, which records the execution of a Transact-SQL query used to
retrieve the default values for the StartDate and EndDate parameters when the report was first
viewed. When the report was rendered, the cached dataset was used to populate the available
values lists.
o An RPC:Completed event that records the use of the sp_executesql stored procedure to retrieve
the data for the report.
10. In Internet Explorer, in the Parameters pane, change the Start Date and End Date parameters so
that the report will show data for the past six months, and then click Apply.
11. Maximize Computer Management and note that the number of report executions has increased by
one, but that the total cache hits and total cache misses are unchanged.
12. Maximize SQL Server Profiler and note that the trace includes a new RPC:Completed event to
retrieve the data for the report with the modified parameter values. Then, on the File menu, click
Stop Trace.
2. Click the ellipsis to the right of Reseller Sales, and in the pop-up window that appears, click the
ellipsis and click Manage Processing Options.
3. In the Manage Processing Options page, in the Data Refresh Options section, select Use cached
data. Ensure that Elapsed time in minutes is selected with the value 30, and then click OK.
4. Maximize SQL Server Profiler, and on the File menu, click Run Trace.
5. In Internet Explorer, click Reseller Sales to view the report with the default parameter values.
6. Maximize Computer Management and note that the number of report executions has increased by
one. The number of cache misses has also increased because there was no cached copy of the report
with the default parameter values.
7. Maximize SQL Server Profiler and note that the trace includes an RPC:Completed event that records
the use of the sp_executesql stored procedure to retrieve the data for the report.
8. In Internet Explorer, in the Parameters pane, change the Start Date and End Date parameters so
that the report will show data for the past six months, and then click Apply.
9. Maximize Computer Management and note that the number of report executions has increased by
one. The number of cache misses has also increased because there was no cached copy of the report
with the custom parameter values.
10. Maximize SQL Server Profiler and note that the trace includes a new RPC:Completed event to
retrieve the data for the report with the modified parameter values.
11. In Internet Explorer, click the Reports link at the top of the page. Then click Reseller Sales to view
the report with the default parameter values again.
12. Maximize Computer Management and note that the number of report executions has increased by
one. The number of cache hits has also increased because the report was executed with the default
parameter values within the last 30 minutes, and could be retrieved from the cache.
13. Maximize SQL Server Profiler and note that the trace does not show any queries in the data
warehouse. All datasets were cached.
14. In Internet Explorer, in the Parameters pane, change the Start Date and End Date parameters so
that the report will show data for the past six months, and then click Apply.
15. Maximize Computer Management and note that the number of report executions has increased by
one. The number of cache hits has also increased because the report was executed with the same
custom parameter values within the last 30 minutes, and could be retrieved from the cache.
16. Maximize SQL Server Profiler and note that the trace does not show any queries in the data
warehouse. All datasets were cached. Then, on the File menu, click Stop Trace.
17. Close SQL Server Profiler, Computer Management and Internet Explorer.
Results: At the end of this exercise, you will have deployed Reporting Services items to a SharePoint
Server document library, and configured caching for a dataset and a report.
2. In Object Explorer, right-click Integration Services Catalogs, and then click Create Catalog.
3. In the Password box, type Pa$$w0rd, in the Retype Password box, type Pa$$w0rd, and then click
OK.
7. In the Folder description box, type Folder for the Adventure Works ETL SSIS Project, and click OK.
8. Expand the DW ETL folder, and then right-click the Projects folder and click Deploy Project.
9. In the Integration Services Deployment Wizard, on the Introduction page, click Next.
10. In the Integration Services Deployment Wizard, on the Select Source page, ensure that Project
deployment file is selected, and click Browse. Browse to the D:\Labfiles\Lab11\Starter folder and
double-click LoadPartition.ispac. Then click Next.
11. In the Integration Services Deployment Wizard, on the Select Destination page, ensure that the
Server name box contains the value MIA-SQLDW, and the Path box contains the value
/SSISDB/DW ETL/LoadPartition. Then click Next.
12. In the Integration Services Deployment Wizard, on the Review page, click Deploy.
5. In the Variables page, create the following variables and then click OK.
6. Repeat steps 1 to 5 to create a second environment named Production with the following variables:
10. In the Browse Environments dialog box, under Local Folder (DW ETL), click Test, and click OK.
11. Repeat the previous two steps to add a reference to the Production environment.
12. In the Configure – LoadPartition dialog box, in the Select a page pane, click Parameters.
13. On the Connection Managers tab, select the AWDataWarehouse connection manager, and click
the ellipsis (…) button for the Value of the ServerName property.
14. In the Set Parameter Value dialog box, select Use environment variable and in the drop-down list,
select DWServer. Then click OK.
15. Repeat the previous two steps to use the StagingServer environment variable for the ServerName
property of the Staging connection manager. Then, click OK.
16. Keep SQL Server Management Studio open for the next exercise.
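The folder, environment, and variable setup in this exercise can also be scripted against the SSIS catalog. The sketch below assumes the standard SSISDB catalog stored procedures in SQL Server 2012; the variable value shown is illustrative, because the actual values come from the variable list in step 5:
-- Sketch: create the DW ETL folder, the Test environment, and one environment
-- variable by using the SSISDB catalog stored procedures. The variable value
-- shown here is illustrative.
USE SSISDB;
GO
EXEC catalog.create_folder
     @folder_name = N'DW ETL';
EXEC catalog.create_environment
     @folder_name = N'DW ETL',
     @environment_name = N'Test';
EXEC catalog.create_environment_variable
     @folder_name = N'DW ETL',
     @environment_name = N'Test',
     @variable_name = N'DWServer',
     @data_type = N'String',
     @sensitive = 0,
     @value = N'MIA-SQLDW',
     @description = N'Data warehouse server for the Test environment';
GO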
Results: At the end of this exercise, you will have an SSIS catalog that contains environments named
Test and Production, and you will have deployed the LoadPartition SSIS project to the SSIS catalog.
2. Expand the dbo.DimSalesTerritory table, right-click Indexes, and click Rebuild All.
3. In the Rebuild Indexes dialog box, in the Script drop-down list, click Script Action to New Query
Window. Then, click Cancel to close the dialog box without rebuilding the indexes.
4. In the SQLQuery1.sql window, modify all three of the ALTER INDEX statements to change the
ONLINE = OFF clause to ONLINE = ON.
5. At the end of the script, after the last GO statement, add the following Transact-SQL code:
EXEC sp_updatestats
GO
6. On the File menu, click Save SQLQuery1.sql. Save the script as Rebuild Sales Territory Indexes.sql
in the D:\Labfiles\Lab11\Starter folder.
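After the edits in steps 4 and 5, the saved script contains statements along the lines of the following sketch. The index name shown is a placeholder; the actual names are generated by the Script action in step 3:
-- Sketch of the kind of statement the Script action produces after the
-- ONLINE = OFF clause is changed to ONLINE = ON. The index name is a
-- placeholder for the scripted names on dbo.DimSalesTerritory.
USE AWDataWarehouse;
GO
ALTER INDEX [PK_DimSalesTerritory] ON dbo.DimSalesTerritory
REBUILD WITH (ONLINE = ON);
GO
-- Added at the end of the script (step 5) to refresh statistics on all
-- tables in the database after the rebuild.
EXEC sp_updatestats;
GO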
2. When prompted, use Windows authentication to connect to the MIA-SQLBI instance of Analysis
Services.
3. In Object Explorer, under the MIA-SQLBI Analysis Services server, expand Databases, expand
AWSalesMD, and expand Cubes.
4. Right-click Sales, and click Process.
5. In the Process Cube – Sales dialog box, in the Script drop-down list, click Script Action to File. Save
the script as Process Sales Cube.xmla in the D:\Labfiles\Lab11\Starter folder. Then, in the Process
Cube – Sales dialog box, click Cancel to close it without processing the cube.
3. In the New Job dialog box, in the Name box, type Data Warehouse Load.
4. In the New Job dialog box, on the Steps page, click New.
5. In the New Job Step dialog box, in the Step Name box, type Run ETL Package, and in the Type drop-
down list, click SQL Server Integration Services Package.
6. On the Package tab, in the Server drop-down list, type MIA-SQLDW. Then click the ellipsis (…)
button next to the Package box.
7. In the Select an SSIS Package dialog box, expand SSISDB, expand DW ETL, expand LoadPartition,
select LoadDW.dtsx, and in the Select an SSIS Package dialog box, click OK.
8. In the New Job Step dialog box, on the Configuration tab, select the Environment check box, and
in the drop-down list, select .\Test. Then click OK.
9. In the New Job dialog box, on the Steps page, click New.
10. In the New Job Step dialog box, in the Step Name box, type Update Indexes. Ensure that Type is
set to Transact-SQL script (T-SQL), and in the Database drop-down list, select AWDataWarehouse.
11. Under Command, click Open, and then browse to the D:\Labfiles\Lab11\Starter folder, select
Rebuild Sales Territory Indexes.sql, and click Open.
12. In the New Job Step dialog box, verify that the Transact-SQL script you edited earlier is displayed,
and then click OK.
13. In the New Job dialog box, on the Steps page, click New.
14. In the New Job Step dialog box, in the Step Name box, type Process Cube, and in the Type drop-
down list, click SQL Server Analysis Services Command.
17. In the New Job Step dialog box, verify that an XMLA script to process the Sales cube is displayed,
and then click OK.
18. In the New Job dialog box, on the Schedules page, click New.
19. In the New Job Schedule dialog box, in the Name box, type Monthly Data Load. In the Frequency
section, in the Occurs drop-down list, select Monthly, and review the default settings to verify that
the schedule will run at 12:00 on the first day of every month. Then click OK.
21. Keep SQL Server Management Studio open for the next exercise.
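A comparable job can be created with the SQL Server Agent stored procedures in msdb. The sketch below creates the job and only its Transact-SQL step, with an inline command that stands in for the Rebuild Sales Territory Indexes.sql script; the SSIS and Analysis Services steps would use the SSIS and ANALYSISCOMMAND subsystems with the package path and XMLA script created earlier:
-- Sketch: create the Data Warehouse Load job and its Update Indexes step by
-- using the SQL Server Agent stored procedures. The inline command stands in
-- for the Rebuild Sales Territory Indexes.sql script used in the exercise.
USE msdb;
GO
EXEC dbo.sp_add_job
     @job_name = N'Data Warehouse Load';
EXEC dbo.sp_add_jobstep
     @job_name = N'Data Warehouse Load',
     @step_name = N'Update Indexes',
     @subsystem = N'TSQL',
     @database_name = N'AWDataWarehouse',
     @command = N'ALTER INDEX ALL ON dbo.DimSalesTerritory REBUILD WITH (ONLINE = ON);
EXEC sp_updatestats;';
EXEC dbo.sp_add_jobserver
     @job_name = N'Data Warehouse Load';
GO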
Results: At the end of this exercise, you will have a SQL Server Agent job named Data Warehouse
Load.
2. In the Log File Viewer – MIA-SQLDW dialog box, expand the first entry and verify that all three
steps were completed successfully. Then click Close.
2. Notice that the most recent package execution succeeded, and then click Overview.
3. In the Overview report, in the Parameters Used table, verify that the values used for the
AWDataWarehouse.ServerName and Staging.ServerName parameters were the values you
specified in the Test environment.
5. In the Messages report, view the messages that were logged during package execution. Then, click
View Overview to return to the overview report.
7. In the Execution Performance report, note the duration of the package execution.
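The same execution information shown in these reports can be queried directly from the SSIS catalog. The following is a minimal sketch, assuming the standard SSISDB catalog views, that lists recent executions of the LoadDW.dtsx package:
-- Sketch: query the SSIS catalog for recent executions of LoadDW.dtsx,
-- including their status and timing, instead of using the built-in reports.
USE SSISDB;
GO
SELECT execution_id,
       folder_name,
       project_name,
       package_name,
       status,      -- 7 = succeeded, 4 = failed
       start_time,
       end_time
FROM catalog.executions
WHERE package_name = N'LoadDW.dtsx'
ORDER BY start_time DESC;
GO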
Results: At the end of this exercise, you will have executed a job, reviewed job history, and reviewed SSIS
catalog reports.