Lab 09

Cleansing Data with SQL Server 2016 Data Quality Services
Overview
The estimated time to complete this lab is 90 minutes.
In this lab you will cleanse data with SQL Server 2016 Data Quality Services (DQS) and Integration
Services (SSIS).
This will involve creating a knowledge base and creating and configuring domains. You will then
perform knowledge discovery to add trusted knowledge to the knowledge base domains. Then, you
will develop an SSIS package to cleanse data before loading results into a SQL Server table.
You will learn how to:

• Create a DQS knowledge base

• Create and configure domains

• Monitor DQS activity

• Perform knowledge discovery

• Cleanse data in an SSIS package design

Connecting to the Virtual Machine
In this exercise, you will use the lab hosting portal to connect to the virtual machine, and optimize the
virtual machine environment for your language and location.

Signing In
In this task, you will sign in to the virtual machine.
1. To sign in to the virtual machine, using the portal menu, click Commands, and then select
Ctrl + Alt + Delete.

2. In the password box, enter Pass@word1 (do not enter the period), and then click Submit.

If you are not using a US English keyboard, the password you enter may not be correctly received
by the virtual machine. You must complete the following task to sign in, and then update the virtual
machine language.

Note: For lab users with English keyboards, if the @ symbol is above the 2, then your keyboard is
a US English keyboard, and you should not complete the following task.

Updating the Virtual Machine Language
In this task, you will sign in to the virtual machine by using the on-screen keyboard, and then update
the virtual machine language. This is important to ensure that your keyboard characters are correctly
received by the virtual machine.
1. Located at the bottom-left of the virtual machine screen, click Ease of Access, and then select
On-Screen Keyboard.

2. Use the on-screen keyboard to enter the password Pass@word1. (Do not enter the period.)
Tip: To reveal the input password before submitting, click the following.

3. Submit the password.


4. Once signed in, to add a new language, on the taskbar, click the Windows button.

5. In the Start screen, click the Control Panel tile.

6. In the Control Panel window, from inside the Clock, Language, and Region group, click
Change Input Methods.

7. In the Language window, click Add a Language.

8. In the Add Languages window, locate and select your language, and then click Add (or Open).
If the selected language has regional variants, you will be directed to the Regional Variants
window, in which case, select a variant, and then click Add.

9. Close the Language window.


10. In the taskbar, click ENG, and then select your language.

Changing the Screen Resolution (Optional)
It is recommended that you change the virtual machine screen resolution to take full advantage of your
screen size.
1. Using the portal menu, click Display, and then select Full Screen.

2. First, notice that the portal left pane can be resized.

3. Resize the width of the pane, with the aim of arriving at a minimum width that allows the font size
of the content (lab manual) to remain easily readable.
Tip: Less width occupied by the pane allows for more room for the virtual machine screen.

4. In the virtual machine screen, right-click the desktop, and then select Screen Resolution.
5. In the Screen Resolution window, in the Resolution dropdown list, select a higher resolution.
1024 x 768 is the recommended minimum, but use a higher resolution if this fits your screen size.

6. Click Apply.

7. If the entire virtual machine screen is fully visible within the portal, click Keep Changes, otherwise,
click Revert, and try a different resolution.

Changing the Clock (Optional)


Changing the clock is for your convenience only, and therefore this task is optional.
1. In the taskbar, click the clock, and then select Change Date and Time Settings.
2. In the Date and Time window, click Change Time Zone.
3. In the Time Zone Settings window, in the Time Zone dropdown list, select your time zone.
4. Click OK.
5. In the Date and Time window, click OK.

Ending the Lab Session


In this task, you will explore how to end the lab session—when you are ready to do so.
1. At the top right-corner of the portal, click Exit, and then select End Lab.

2. When prompted to end the lab, click Yes, End My Lab.

3. Once redirected to the evaluation form, please take a few moments to complete and submit your
evaluation of this lab—your feedback assists us to deliver a great lab experience.

You are now ready to commence the lab.

Creating a Knowledge Base
In this exercise, you will create a knowledge base to address many of the data quality requirements for
the office dataset.

Setting Up
In this task, you will set up the lab database required to complete this lab.
1. To open File Explorer, on the taskbar, click the File Explorer shortcut.

2. In File Explorer, navigate to the D:\SQLServer2016BI\Lab09\Assets folder.


3. Right-click the Setup.cmd file, and then select Run as Administrator.

4. In the Command window, when prompted to press any key to continue, press any key.

Creating the Knowledge Base


In this task, you will create the knowledge base by using the domain management activity.
1. To open Data Quality Client, on the taskbar, click the Data Quality Client shortcut.

2. In the Connect to Server window, click Connect.

3. To create a new knowledge base, in the Knowledge Base Management panel, click
New Knowledge Base.

4. In the Name box, enter Office.

Important: When naming objects in this lab, be sure to enter the names exactly as the lab
describes. Incorrect name values may result in errors later in the lab.

5. In the lower pane, notice that the Domain Management activity is selected.

6. Click Next.

Creating Domains
In this task, you will create the knowledge base domains.
1. To create a domain, click Create a Domain.
Tip: In Data Quality Client, commands are available either as icons, or right-click context menus.
To determine what an icon does, hover the cursor over it to reveal a tooltip.

2. In the Create Domain window, in the Domain Name box, enter Office.

There are many domain properties that can also be set when creating the domain, and these can
be modified at any time during domain management.

3. Click OK.

4. Create the following additional 12 domains.
Reminder: Take care to name the domains exactly as the lab describes.

• District

• Address1

• Address2

• City

• StateOrProvince

• PostalCode

• Country

• Phone

• ManagerFirstName

• ManagerLastName

• ManagerTitle

• ManagerEmail
5. Verify that you have 13 domains.
6. Select the Address1 domain.
Important: It is a common mistake to configure the wrong domain, which later involves determining
which domain to undo (there is no Ctrl-Z to undo). Always take care to select the correct domain
before configuring it.

7. In the Domain Properties tab, uncheck Enable Speller.

8. Configure the following additional domain properties.

Domain             Action
Address2           Enable Speller: Uncheck
StateOrProvince    Format Output to: Upper Case; Enable Speller: Uncheck
ManagerEmail       Format Output to: Lower Case; Enable Speller: Uncheck

Configuring Domain Values
In this task, you will configure domain values and define a synonym.
1. Select the Office domain.
2. Select the Domain Values tab.

3. Notice that the domain values already include the DQS_NULL value.
All domains include this value, and it cannot be deleted.

4. Set the DQS_NULL value to Invalid.

This configuration ensures that missing Office values will result in an invalid record.

5. Repeat the last step to set the DQS_NULL value to Invalid for the following additional domains:

• District

• Address1

• City

• PostalCode

• StateOrProvince

• Country

• Phone

6. Select the Country domain.
7. To add a domain value, click Add New Domain Value.

8. In the new row added to the domain value grid, enter Canada.

9. Press Enter.
10. Notice that the domain value is set to type Correct (green check mark).
11. Add two additional domain values:

• United States

• US
12. In the grid, notice that domain values sort alphabetically, and that new domain values added
during the activity are adorned with a yellow star.

13. To define synonyms, first select the United States domain value, and then while pressing the
Control key, select the US domain value.
14. Right-click the selection, and then select Set as Synonyms.

15. Notice the arrangement of domain values, with the United States domain value as the leading
value.
While US is regarded as a correct domain value, it will be corrected to the leading value.
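
The effect of a synonym group can be pictured as a simple lookup. The following is a minimal Python sketch for illustration only, not DQS code: the dictionary LEADING_VALUE and the function to_leading_value are hypothetical names, and only the three Country values come from this lab.

```python
# Hypothetical illustration of a synonym group; this is not how DQS stores knowledge.
# Each known Country value maps to the leading value of its synonym group.
LEADING_VALUE = {
    "Canada": "Canada",
    "United States": "United States",
    "US": "United States",  # synonym, corrected to the leading value
}


def to_leading_value(country):
    """Return the leading value for a known country, or None if it is unknown."""
    return LEADING_VALUE.get(country)


print(to_leading_value("US"))      # United States
print(to_leading_value("Canada"))  # Canada
```

A value that is not in the lookup returns None here; how DQS treats unknown values depends on the knowledge base and is outside this sketch.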

Configuring Domain Rules


In this task, you will configure domain rules.
1. Select the ManagerFirstName domain.
2. Select the Domain Rules tab.

3. To add a domain rule, click Add a New Domain Rule.

4. In the domain rule grid, in the Name box, enter Not an initial.

It is important to be clear, yet concise, when defining a rule name, as it will be output as the
reason when data cannot conform to the rule.

5. In the Build a Rule section, modify the operator to Length is Greater Than or Equal to, and then
in the corresponding box, enter 2.

6. To test the domain rule, click Run the Selected Domain Rule on Test Data.

7. In the Test Domain Rule window, click Adds a New Testing Term for the Domain Rule.

8. In the ManagerFirstName box, enter R.

9. Add a second testing term with the value Robert.

10. Click Test the Domain Rule On All the Terms.

11. Verify that the value R is invalid, while the value Robert is correct.

12. Click Close.

13. Select the ManagerLastName domain.


14. Repeat the steps in this task to create the Not an initial domain rule.
15. Select the Phone domain.
16. Create a domain rule named Valid phone format, and configure the following rule logic.
For your convenience and accuracy, you can copy the pattern from the
D:\SQLServer2016BI\Lab09\Assets\Snippets.txt file.

17. Test the domain rule with the following terms:

• 800 123 4567

• 800 123-4567

• (800) 123-4567

18. Verify that only the final term is correct.

19. Select the ManagerEmail domain.


20. Create a domain rule named Valid email address, and configure the following rule logic.
For your convenience and accuracy, you can copy the regular expression from the
D:\SQLServer2016BI\Lab09\Assets\Snippets.txt file.

21. Note that this domain rule only tests valid email addresses, and not the additional requirement that
the email address must belong to a particular domain.
22. To add a new condition, click Add a New Condition to the Selected Clause.

23. Complete the configuration of the rule logic based on the following.
For your convenience and accuracy, you can copy the string from
D:\SQLServer2016BI\Lab09\Assets\Snippets.txt.

24. Test the domain rule with the following terms:

• rob@hotmail.com

• rob@@lab.microsoft.com

• rob@lab.microsoft.com
25. Verify that only the final term is correct.

26. Select the PostalCode domain.
27. Create a domain rule named Valid postal code format, and configure the following rule logic.
For your convenience and accuracy, you can copy the two regular expressions from the
D:\SQLServer2016BI\Lab09\Assets\Snippets.txt file.

The first regular expression validates a US postal code (ZIP Code), also allowing for the ZIP+4
Code format. The second regular expression validates a Canadian postal code, requiring a space
at the fourth character.

28. To modify the operator, to the right of the AND operator, click the down-arrow, and then select
OR.

29. Verify that the domain rule looks like the following.

30. Test the rule with the following terms:

• 1234

• 12345

• 12345-123

• 12345-1234

• A1A1A1

• A1A 1A1 (the fourth character is a space)


31. Verify that only the second, fourth and last terms are correct.
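
The domain rules in this task can be approximated outside DQS to see why each test term passes or fails. The following is a minimal Python sketch under stated assumptions: the regular expressions and the email suffix are guesses written to match the test terms in this task, and the expressions in Snippets.txt may differ. All function and constant names are hypothetical.

```python
import re

# Assumed patterns; the lab's actual expressions are in Snippets.txt and may differ.
PHONE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
EMAIL_SUFFIX = "@lab.microsoft.com"  # assumed from the test terms in this task
US_ZIP = re.compile(r"\d{5}(-\d{4})?")
CA_POSTAL = re.compile(r"[A-Za-z]\d[A-Za-z] \d[A-Za-z]\d")


def not_an_initial(name):
    """Not an initial: length is greater than or equal to 2."""
    return len(name) >= 2


def valid_phone(value):
    """Valid phone format: the whole value must match the pattern."""
    return PHONE.fullmatch(value) is not None


def valid_email(value):
    """Valid email address: well formed AND in the required mail domain."""
    return EMAIL.fullmatch(value) is not None and value.endswith(EMAIL_SUFFIX)


def valid_postal_code(value):
    """Valid postal code format: US pattern OR Canadian pattern."""
    return (US_ZIP.fullmatch(value) is not None
            or CA_POSTAL.fullmatch(value) is not None)


for term in ["1234", "12345", "12345-123", "12345-1234", "A1A1A1", "A1A 1A1"]:
    print(term, valid_postal_code(term))
```

The email rule combines two conditions with AND, while the postal code rule combines two conditions with OR, mirroring the operators selected in the Build a Rule section.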

Configuring a Term-Based Relation


In this task, you will configure a term-based relation.
1. Select the District domain.
2. Select the Term-Based Relations tab.

3. To add a term-based relation, click Add New Relation.

4. In the term-based relation grid, in the Value box, enter Distr. (include the period).

5. In the Correct To box, enter District (do not include a period).

This relation will ensure all abbreviated instances will be corrected to the full name.
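
The idea behind a term-based relation can be sketched as a term-by-term replacement. This is a minimal Python illustration, not the DQS implementation: the names TERM_RELATIONS and apply_term_relations are hypothetical, and the sketch assumes that terms are separated by whitespace.

```python
# Hypothetical illustration of a term-based relation; DQS term matching may differ.
TERM_RELATIONS = {"Distr.": "District"}


def apply_term_relations(value):
    """Replace each whole term that has a relation; leave other terms as-is.

    Splitting on whitespace also collapses repeated spaces in the value.
    """
    return " ".join(TERM_RELATIONS.get(term, term) for term in value.split())


print(apply_term_relations("Midwest Distr."))  # Midwest District
```

Because whole terms are compared, a value that already contains District is left unchanged.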

Configuring a Composite Domain


In this task, you will configure a composite domain to be composed of the address-related domains.
1. In the left pane, click Create a Composite Domain.

2. In the Create a Composite Domain window, in the Composite Domain Name box, enter
Address.

3. In the Domains List, select Address1.

4. To add the domain to the composite domain, click the right-arrow.

5. Add the following domains also to the composite domain, ensuring that they are added in the
order listed.
Tip: You can double-click each domain to add it to the list, and you can also multi-select items in
order by pressing the Control key, and then add them by clicking the right-arrow.

• Address2

• City

• StateOrProvince

• PostalCode

• Country
6. Verify that the Domains in Composite Domain list includes the following six domains, in the
order presented.

7. Click OK.

8. In the left pane, notice the addition of the composite domain, and that it is adorned with a different
icon.

Configuring a Composite Domain Rule


In this task, you will configure a composite domain rule to correct StateOrProvince values for the city
of Vancouver.
1. Ensure that the Address composite domain is selected.
2. Select the CD Rules tab.

3. To add a composite domain rule, click Add a New Domain Rule.

4. In the cross-domain rules grid, in the Name box, enter Vancouver CA.

5. In the Build a Rule section, configure the following rule logic.

6. To add a clause, right-click inside a blank area of the Build a Rule section, and then select
Add Clause.

7. Complete the configuration of the rule logic based on the following.

8. In the Then section, configure the following.

This configuration is referred to as a definitive cross-domain rule. A definitive cross-domain rule is
one that uses the Value is Equal to operator in the Then logic, and it is able to correct values,
rather than just validate values.

The value BC will be added to the StateOrProvince domain values as a result of configuring this
rule.

9. Test the cross-domain rule with the following terms.

10. Verify that the first term would be corrected to BC.

11. Create a second cross-domain rule named Vancouver US, and configure the following rule logic.

The value WA will be added to the StateOrProvince domain values as a result of configuring this
rule.

12. Test the cross-domain rule with the following terms.

13. Verify that the second term would be corrected to WA.

14. To complete the domain management activity, click Finish.

15. When prompted to publish the knowledge base, click No.

The knowledge base is not yet ready to be applied to a cleansing activity. You will continue to
enhance the knowledge base with knowledge discovery activities in the next exercise.
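
The two definitive cross-domain rules configured in this task can be sketched as follows. This is a minimal Python illustration under a loud assumption: the exact rule conditions are not reproduced in this text, so the sketch assumes each rule tests the City and Country domains and then sets StateOrProvince. The function name apply_vancouver_rules and the sample record are hypothetical.

```python
def apply_vancouver_rules(record):
    """Return a copy of the record with StateOrProvince corrected for Vancouver.

    Assumed logic (not confirmed by the lab text):
      Vancouver CA: City = Vancouver AND Country = Canada        -> BC
      Vancouver US: City = Vancouver AND Country = United States -> WA
    """
    corrected = dict(record)
    if record.get("City") == "Vancouver":
        if record.get("Country") == "Canada":
            corrected["StateOrProvince"] = "BC"
        elif record.get("Country") == "United States":
            corrected["StateOrProvince"] = "WA"
    return corrected


sample = {"City": "Vancouver", "StateOrProvince": "WA", "Country": "Canada"}
print(apply_vancouver_rules(sample)["StateOrProvince"])  # BC
```

The sketch corrects the value rather than flagging the record, which is the behavior that distinguishes a definitive rule from a rule that only validates.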

Reviewing Knowledge Base Status


In this task, you will review the status of the knowledge base.
1. To open a knowledge base, in the Knowledge Base Management panel, click
Open Knowledge Base.

2. In the grid, notice that the Office knowledge base is locked, and has the state In Work.
The knowledge base cannot be used until it is unlocked. You will unlock the knowledge base
when you publish it in the next exercise.

3. Click Cancel.

Monitoring DQS Activity
In this task, you will monitor the DQS activity.
1. To monitor activity, in the Administration panel, click Activity Monitoring.

2. Notice the first listed activity is the one you just completed.
Every DQS activity undertaken is logged and remains available for review and audit.

3. Click Close.

Performing Knowledge Discovery
In this exercise, you will perform knowledge discovery to add domain values to the knowledge base.

Adding Trusted Knowledge


In this task, you will add trusted state and province codes to the StateOrProvince domain values.
This trusted knowledge was acquired from the US and Canadian postal authorities.
1. To perform knowledge discovery, in the Knowledge Base Management panel, click the Office
knowledge base, and then select the Knowledge Discovery activity.

2. Notice that step 1 of the activity is to map to external data containing knowledge.

3. In the Database dropdown list, select Lab-DQS.


4. In the Table/View dropdown list, select Reference_CA_ProvinceOrTerritoryCode.

5. At the top-right corner of the Mappings grid, click Preview Data Source.

6. Review the source data, and then click Close.

7. In the Mappings grid, in the first row, in the Source Column column, select the
ProvinceOrTerritoryCode column.
8. In the corresponding Domain column, select the StateOrProvince domain.

9. To proceed to the next step, click Next.

10. Notice that step 2 of the activity is to discover knowledge from the source.

11. Click Start.

12. When the discovery analysis has completed, review the source statistics in the Profiler pane.

13. Note that 13 unique values were detected, of which 12 are new values for the domain.
In the previous exercise, when you added the cross-domain rules, both BC and WA were added
to the domain values. BC (British Columbia) was included in the source data, but not added to the
domain values as it already exists.

14. To proceed to the next step, click Next.


15. Notice that step 3 of the activity is to manage domain values.

16. Review the list of domain values, and notice that this is a list of what has been added in this
activity.
17. To reveal all domain values, uncheck the Show Only New checkbox.

18. To complete the knowledge discovery process, click Finish.

19. Do not publish the knowledge base.

20. Repeat the steps in this task to perform another knowledge discovery activity, this time sourcing
data from the Reference_US_StateCode table.

21. Do not publish the knowledge base.
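
The profiler counts reported for the Canadian discovery (13 unique values, 12 new) follow from a set difference. The following minimal Python sketch assumes the source table holds the 13 standard Canadian province and territory codes; the table contents are not shown in this lab, and the function name discover is hypothetical.

```python
def discover(source_values, existing_values):
    """Return (unique source values, values not yet in the domain)."""
    unique = set(source_values)
    return unique, unique - set(existing_values)


# Domain values added earlier by the two cross-domain rules.
existing = {"BC", "WA"}

# Assumed contents of Reference_CA_ProvinceOrTerritoryCode.
ca_codes = ["AB", "BC", "MB", "NB", "NL", "NS", "NT",
            "NU", "ON", "PE", "QC", "SK", "YT"]

unique, new = discover(ca_codes, existing)
print(len(unique), len(new))  # 13 12
```

BC is the only source value already present in the domain, so it is counted as unique but not as new.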

Adding Additional Knowledge


In this task, you will add knowledge sourced from the Office dataset. There are known issues with this
data, and so judgement will need to be applied to ensure domain values are appropriately added.
1. Perform knowledge discovery a third time on the Office knowledge base.
2. Source data from the MSFTOffice_NorthAmerica table in the Lab-DQS database.

3. Map only the following four source columns to their respective domains.

The rationale for performing knowledge discovery for the StateOrProvince domain is to detect
and appropriately configure anomalies.

4. Notice that domains that can be cleansed by domain rules (i.e. Phone and ManagerEmail) are
not included in this knowledge discovery activity. Some domains do not need to have possible
values stored as domain values.
5. Proceed to the discovery step, and start the discovery process.

6. Review the profiler statistics.
7. In the profiler grid, review the statistics at domain level, and also hover the cursor over the
Completeness bar.
8. Notice that the District and Country domains have a small proportion of missing values.

9. For the Country domain, notice the notification icon in the New column.

10. Hover the cursor over the notification icon to reveal a tooltip describing a possible issue.
You can ignore the issue in this lab.

11. Proceed to the domain management step.


12. In the left pane, select the Office domain.
13. Notice the domain value Ausstin, TX has a red squiggly.
As this domain has spelling enabled, DQS used its dictionary to check spelling.

14. Right-click the Ausstin, TX text, and then select the correct spelling suggestion: Austin, TX.

15. Notice that the correction has assigned the domain value as an error, and corrected it to a new
domain value.
The knowledge base now understands how to correct any instance of this misspelled office.

16. Scroll down the list to locate the Lehi, UT office domain value (which is, in fact, correctly spelled).

17. Right-click the Lehi, UT domain value, and then select Add to Dictionary.
18. Notice that the red squiggly has been removed.
19. Show all domain values.
It is useful to show all values when managing synonyms that may involve existing members.

20. Locate the two adjacent domain values for New York.

21. Multi-select the two domain values, right-click the selection, and then select Set as Synonyms.
22. Ensure that New York, NY is the leading value.

23. Select the District domain.
There is no need to be concerned about the Distr. abbreviation used in these domain values, as
in the previous exercise you configured a term-based relation to replace any instances of the
abbreviation.

24. Use the dictionary to correct the Midwesst Distr. domain value to Midwest Distr.
25. Show all domain values, and notice how the misspelled Midwest domain value corrects to an
existing domain value.

26. Set the Greater South East District member to Error.

27. In the adjacent Correct to box, enter the correct domain value, Greater Southeast District, and
then press Enter.
28. Notice how the error domain value relates to the correct domain value.

29. Correct also the Mid-Atlantic Dist. value to the Mid Atlantic District value.

30. Select the StateOrProvince domain.

31. Correct each of the five new domain values, based on the following.

32. Show all domain values, and notice how the corrections relate to existing domain values.
33. Select the Country domain.

34. Correct the CA domain value to the Canada domain value.


35. Notice that DQS used its dictionary to correct the misspelled United States domain value.
36. Show all domain values, and notice how the corrected domain values relate to the existing domain
values.
37. Finish the knowledge discovery activity, and publish the knowledge base.

38. When notified that the knowledge base has been published, click OK.

You will use the knowledge base in the next exercise to cleanse the Office dataset.

39. Review the knowledge base status, and notice that it is no longer locked, and has no state (i.e. it
is open).
40. Review the activity monitoring, and notice the three knowledge discovery activities you have just
completed.
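
The way error values relate to correct values in the published knowledge base can be pictured with a small lookup that returns both an output value and a status. This is a minimal Python sketch, not the DQS storage format: OFFICE_VALUES and cleanse_office are hypothetical names, only three Office values from this task are included, and reporting unknown values as New is an assumption.

```python
# Hypothetical snapshot of a few Office domain values after this task.
# Each entry is (value type, correct-to value).
OFFICE_VALUES = {
    "Austin, TX": ("Correct", None),
    "Ausstin, TX": ("Error", "Austin, TX"),
    "Lehi, UT": ("Correct", None),
}


def cleanse_office(value):
    """Return (output value, status) for one source value."""
    value_type, correct_to = OFFICE_VALUES.get(value, (None, None))
    if value_type == "Correct":
        return value, "Correct"
    if value_type == "Error":
        return correct_to, "Corrected"
    # Assumption: values unknown to the knowledge base are reported as New.
    return value, "New"


print(cleanse_office("Ausstin, TX"))  # ('Austin, TX', 'Corrected')
```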

Cleansing Data with Integration Services
In this exercise, you will cleanse the Office dataset with Integration Services by using the knowledge
base created in the previous exercises.

Creating the DQS Connection Manager


In this task, you will open the SSIS project, and then create a DQS connection manager.
1. To open Visual Studio, on the desktop taskbar, click the Visual Studio 2015 shortcut.

2. To open an existing project, on the File menu, select Open | Project/Solution.


3. In the Open Project window, navigate to the D:\SQLServer2016BI\Lab09\Assets\Project folder.
4. Select Lab-DQS.sln, and then click Open.
5. In the Project Password window, in the Password box, enter Pass@word1. (Do not enter the
period.)
6. Click OK.

7. Notice that the project consists of a single connection manager, which is used to connect to the
Lab-DQS database.

8. To create an additional connection manager, in Solution Explorer, right-click the
Connection Managers folder, and then select New Connection Manager.

9. In the Add SSIS Connection Manager window, select the DQS connection manager type.

10. Click Add.

11. In the Add DQS Cleansing Connection Manager window, in the Server Name dropdown list—
do not click the dropdown arrow—enter localhost.

12. Click OK.

13. In Solution Explorer, notice the addition of the connection manager.

Creating the Package


In this task, you will create a package designed to load cleansed office records into a table which
represents a data warehouse dimension table. Records that cannot be cleansed will be loaded to an
alternate table that can be analyzed by a data steward.
1. In Solution Explorer, right-click the SSIS Packages folder, and then select New SSIS Package.
2. Notice that the package designer opens automatically.
3. To rename the package, in Solution Explorer, right-click the Package1.dtsx file, and then select
Rename.
4. Rename the package to Load DimOffice.dtsx, and then press Enter.

Developing the Data Flow
In this task, you will develop a data flow to extract, transform and load (ETL) the office dataset. The
transform process will cleanse the data by using the Office knowledge base.
The output of the cleansing will be split into correct and invalid outputs. Correct data will be loaded into
the DimOffice table, and invalid data will be loaded into the DimOffice_Error table.
1. Select the Data Flow tab.

2. To add a data flow task, click the link located at the center of the designer.

3. To open the toolbox, on the SSIS menu, select SSIS Toolbox.


4. To design the data flow, from the SSIS Toolbox (located at the left), expand Other Sources, and
then drag the ADO NET Source to the data flow designer.

5. In the Properties pane, set the Name property to Office Dataset.

6. Verify that the data flow component looks like the following.

Do not be concerned about the error icon, which will disappear when you complete the next steps.

7. To edit the source component, right-click the component, and then select Edit.
8. In the ADO.NET Source Editor window, in the ADO.NET Connection Manager dropdown list,
notice that the localhost.Lab-DQS connection manager is selected.
9. In the Name of the Table or the View dropdown list, select
"dbo"."MSFTOffice_NorthAmerica".

10. Click OK.

11. From the SSIS Toolbox, expand Other Transforms, and then drag the DQS Cleansing to the
data flow designer, and drop it directly beneath the source component.

12. Verify that the data flow design looks like the following.

13. To connect the components, first select the Office Dataset source component, and then drag the
standard output (the left, blue arrow) on top of the cleansing component.

14. Verify that the data flow design looks like the following.

15. To edit the cleansing component, right-click the component, and then select Edit.
16. In the DQS Cleansing Transform Editor window, in the Data Quality Connection Manager
dropdown list, select the DQS Cleansing Connection Manager.localhost connection manager.
17. In the Data Quality Knowledge Base dropdown list, select Office.

18. In the Available Domains list, review the knowledge base domains, noticing that the first listed is
the composite domain.
You will not use the composite domain to cleanse the data in this package design.

19. Select the Mapping tab.

20. Notice the Available Input Columns grid.


This grid lists the input columns received from the source component.

21. To select all input columns, check the top-right checkbox.

22. Notice the second grid that defines the mapping between input columns and the knowledge base
domains.
It also defines alias output columns for the source, output and status columns.

23. Set the Office input column to map to the Office domain.

24. Map each input column to its respective domain—do not map the Address composite domain.

25. Select the Advanced tab.

26. Review the available options.


27. Notice that the Standardize Output checkbox is selected.

For your knowledge base, this will mean that StateOrProvince values will be set to upper case,
and ManagerEmail values will be set to lower case.

28. Check the Reason checkbox.

The reason needs to be output to help explain why values are invalid.

29. To complete the component configuration, click OK.

30. From the SSIS Toolbox, from inside the Common group, drag the Conditional Split to the data
flow designer, and drop it directly beneath the cleansing component.

31. Configure the standard output of the cleansing component to connect to the new component, as
follows.

32. Edit the conditional split component.
33. In the grid, in the Output Name box, enter Invalid.

34. In the top-right pane, expand the Columns folder.

35. Scroll to the bottom of the columns list, and then drag the Record Status column into the
Condition box.

36. In the Condition box, complete the expression as follows (note that the operator is two equals (=)
signs, which tests for equality).

[Record Status] == "Invalid"

37. Verify that the expression looks like the following.

Any record with an invalid record status will be output to the Invalid output.

38. In the Default Output Name box, replace the text with Correct.

All remaining records will be output to the Correct output.

39. To complete the component configuration, click OK.

40. From the SSIS Toolbox, expand Other Destinations (the last group), and then drag the
ADO NET Destination to the data flow designer, and drop it beneath, and to the left of, the
conditional split component.
41. In the Properties pane, set the Name property to DimOffice.

42. Configure the standard output of the conditional split component to connect to the new
component.
43. In the Input Output Selection window, in the Output dropdown list, select Correct.

44. Click OK.
45. Verify that the data flow design looks like the following.

46. Edit the DimOffice destination component.


47. In the ADO.NET Destination Editor window, in the Connection Manager dropdown list, notice
that the localhost.Lab-DQS connection manager is selected.
48. In the Use a Table or View dropdown list, select "dbo"."DimOffice".

49. In the left pane, select the Mappings page.

This page of the editor is used to configure the mappings between the input columns, and the
columns of the DimOffice table.



50. To widen the list, drag the right edge of the Available Input Columns list, and drag open the
Name column to reveal the full column names.

51. From the Available Input Columns list, drag the Office_Output column to the Office column of
the Available Destination Columns list.

There is no need to map to the OfficeKey column, as this is an identity column that will
automatically populate a sequence of values when rows are inserted into the table.

The source columns will contain the original values, while the output columns will contain
standardized values (e.g. lower case email addresses), so you will map only the output columns.

There is no need to store the other column types, because only correct records are passed to this
destination. Their status columns will only ever be Correct or Corrected.



52. Verify that the mapping was created.

53. Map all "_Output" columns to the destination columns—except OfficeKey.


Tip: You can also configure the mappings by selecting the input columns in the lower grid.

54. Verify that all "_Output" columns are correctly mapped.

55. To complete the component configuration, click OK.



56. Add a second ADO.NET destination component, and then rename it DimOffice_Error.
57. Connect the conditional split component to the new destination component.
58. Verify that the data flow design looks like the following.

59. Edit the DimOffice_Error destination component.


60. In the ADO.NET Destination Editor window, in the Connection Manager dropdown list, notice
that the localhost.Lab-DQS connection manager is selected.
61. In the Use a Table or View dropdown list, select "dbo"."DimOffice_Error".

62. Select the Mappings page.


63. Notice that the mappings to this table are automatically created.
Mappings are created automatically when the input columns and the destination table columns
have matching names and data types.

As this table will be used to analyze data quality issues, all output columns will be stored.



64. To complete the component configuration, click OK.
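
The routing and mapping configured in this task can be summarized in a short Python sketch. It is an illustration only and not part of the lab: the sample rows and their values are made up, and the Office_Source and ManagerEmail_* column names are assumed to follow the same naming pattern as the Office_Output column. The sketch sends rows whose Record Status is Invalid to one list and every other row to the default list, and then keeps only the "_Output" columns for the rows bound for the DimOffice table.

```python
# Hypothetical sample rows. Values are invented for illustration, and the
# column names other than Office_Output and Record Status are assumptions.
rows = [
    {"Office_Source": "Seattle", "Office_Output": "Seattle",
     "ManagerEmail_Source": "a.b@example.com",
     "ManagerEmail_Output": "a.b@example.com",
     "Record Status": "Correct"},
    {"Office_Source": "Portland", "Office_Output": "Portland",
     "ManagerEmail_Source": "C.D@Example.com",
     "ManagerEmail_Output": "c.d@example.com",
     "Record Status": "Corrected"},
    {"Office_Source": "Denver", "Office_Output": "Denver",
     "ManagerEmail_Source": "not-an-email",
     "ManagerEmail_Output": "not-an-email",
     "Record Status": "Invalid"},
]


def conditional_split(rows):
    """Mimic the split: [Record Status] == "Invalid" goes to the Invalid
    output, and all remaining rows go to the default (Correct) output."""
    correct, invalid = [], []
    for row in rows:
        if row["Record Status"] == "Invalid":
            invalid.append(row)
        else:
            correct.append(row)
    return correct, invalid


def map_output_columns(row, suffix="_Output"):
    """Keep only the *_Output columns, renamed to the destination column
    names by dropping the suffix (Office_Output -> Office)."""
    return {name[:-len(suffix)]: value
            for name, value in row.items()
            if name.endswith(suffix)}


correct, invalid = conditional_split(rows)

# Rows for DimOffice carry only the standardized output values.
dim_office_rows = [map_output_columns(row) for row in correct]

# Rows for DimOffice_Error keep every column for later analysis.
dim_office_error_rows = invalid
```

Because the Correct output is the default, a row with any status other than Invalid ends up in the DimOffice list, which matches the behavior described for the conditional split.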

Executing the Package


In this task, you will execute the package and observe the data flow execution.
1. To execute the package, in Solution Explorer, right-click the Load DimOffice.dtsx package, and
then select Execute Package.

2. Review the row count statistics for each component output.


3. Note the following:

• 60 correct rows were loaded into the DimOffice table

• 10 invalid rows were loaded into the DimOffice_Error table


4. To stop the package debugging, on the Debug menu, select Stop Debugging.
5. To close Visual Studio, on the File menu, select Exit.



Reviewing Activity Monitoring
In this task, you will review the activity monitoring.
1. Switch to Data Quality Client.
2. To monitor activity, in the Administration panel, click Activity Monitoring.

3. To sort the activities in descending order, in the activity grid, click the ID column header twice.
4. Notice that the first listed activity is an SSIS Cleansing type.

Every activity undertaken with the Data Quality Server—even when invoked by SSIS—is logged
and remains available for review and audit.

5. Click Close.



Reviewing the Cleansing Results
In this task, you will open the Data Quality Project and review the SSIS cleansing results.
1. To open a Data Quality Project, in the Data Quality Projects panel, click
Open Data Quality Project.

2. In the project grid, right-click the SSIS cleansing project, and then select Open.
The SSIS cleansing project is highlighted in red, and is locked.

3. Notice that the project opens at the Manage and View Results step.
From this step, it is possible to complete a manual cleansing process.

4. Click Close.

5. Close Data Quality Client.



Analyzing the Cleansing Results
In this task, you will execute various queries to analyze the cleansing results output by the SSIS
package execution.
1. To open SQL Server Management Studio (SSMS), click the SQL Server Management Studio
taskbar shortcut.

2. In the Connect to Server window, ensure that the Server Type is set to Database Engine, and
that the Server Name is set to SQLSERVER2016BI.
3. Click Connect.

4. To open a script file, on the File menu, select Open | File.


5. In the Open File window, navigate to the D:\SQLServer2016BI\Lab09\Assets folder.
6. Select the Script-01-ReviewSsisOutputs.sql file, and then click Open.
7. In the script file, take note of the first line.

It is very important that you execute the script in the manner intended. Many script files include
multiple batches of statements (each terminated by the GO keyword), and so you should select the
statements together with the GO keyword, and then execute only that selection.

8. To execute a subset of a script, select the text you intend to execute, and then click Execute (or
press F5).
9. Read the comments in the first batch (line 3).
10. Select and execute the only query in the batch (lines 4-5).
11. Read the commented text, and then execute the query for each of the remaining batches in the
script.
12. To exit SSMS, on the File menu, select Exit.
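
The note about batches can be illustrated with a minimal Python sketch of how a script is divided at GO lines. GO is a batch separator recognized by tools such as SSMS, not a T-SQL statement. The two queries in the sample script are hypothetical and are not the contents of Script-01-ReviewSsisOutputs.sql; only the table names come from this lab. The sketch is simplified and does not handle the optional repeat count that GO accepts.

```python
import re


def split_batches(script):
    """Split script text into batches at lines that contain only GO."""
    batches, current = [], []
    for line in script.splitlines():
        if re.fullmatch(r"\s*GO\s*", line, flags=re.IGNORECASE):
            batch = "\n".join(current).strip()
            if batch:
                batches.append(batch)
            current = []
        else:
            current.append(line)
    # Any statements after the last GO form a final batch.
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


# Hypothetical script content, for illustration only.
script = """-- Batch 1: count the rows that were loaded as correct
SELECT COUNT(*) FROM dbo.DimOffice;
GO
-- Batch 2: count the rows that were rejected as invalid
SELECT COUNT(*) FROM dbo.DimOffice_Error;
GO
"""

batches = split_batches(script)
```

Selecting one comment and its query together with the following GO line, as the steps above describe, corresponds to running a single element of `batches`.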



Finishing Up
In this exercise, you will finish up by undoing the configurations made in this lab, and by closing
opened applications.
There is no need to complete this exercise if you do not plan to do any more labs in this Virtual
Machine session.

Finishing Up
In this task, you will finish up by undoing the configurations made in this lab, and by closing opened
applications.
1. Close Data Quality Client.
2. In a File Explorer window, navigate to the D:\SQLServer2016BI\Lab09\Assets folder.
3. Right-click the Cleanup.cmd file, and then select Run as Administrator.
4. In the Command window, when prompted to press any key to continue, press any key.
5. Close the File Explorer window.



Summary
In this lab you cleansed data with SQL Server 2016 Data Quality Services and Integration Services.
This involved creating a knowledge base and creating and configuring domains. You then performed
knowledge discovery to add trusted knowledge to the knowledge base domains. Then, you developed
an SSIS package to cleanse data before loading results into a SQL Server table.



Terms of Use
© 2017 Microsoft Corporation. All rights reserved.
By using this hands-on lab, you agree to the following terms:
The technology/functionality described in this hands-on lab is provided by Microsoft Corporation in a
“sandbox” testing environment for purposes of obtaining your feedback and to provide you with a
learning experience. You may only use the hands-on lab to evaluate such technology features and
functionality and provide feedback to Microsoft. You may not use it for any other purpose. Without
written permission, you may not modify, copy, distribute, transmit, display, perform, reproduce,
publish, license, create derivative works from, transfer, or sell this hands-on lab or any portion thereof.
COPYING OR REPRODUCTION OF THE HANDS-ON LAB (OR ANY PORTION OF IT) TO ANY
OTHER SERVER OR LOCATION FOR FURTHER REPRODUCTION OR REDISTRIBUTION
WITHOUT WRITTEN PERMISSION IS EXPRESSLY PROHIBITED.
THIS HANDS-ON LAB PROVIDES CERTAIN SOFTWARE TECHNOLOGY/PRODUCT FEATURES
AND FUNCTIONALITY, INCLUDING POTENTIAL NEW FEATURES AND CONCEPTS, IN A
SIMULATED ENVIRONMENT WITHOUT COMPLEX SET-UP OR INSTALLATION FOR THE
PURPOSE DESCRIBED ABOVE. THE TECHNOLOGY/CONCEPTS REPRESENTED IN THIS
HANDS-ON LAB MAY NOT REPRESENT FULL FEATURE FUNCTIONALITY AND MAY NOT WORK
THE WAY A FINAL VERSION MAY WORK. WE ALSO MAY NOT RELEASE A FINAL VERSION OF
SUCH FEATURES OR CONCEPTS. YOUR EXPERIENCE WITH USING SUCH FEATURES AND
FUNCTIONALITY IN A PHYSICAL ENVIRONMENT MAY ALSO BE DIFFERENT.
FEEDBACK If you give feedback about the technology features, functionality and/or concepts
described in this hands-on lab to Microsoft, you give to Microsoft, without charge, the right to use,
share and commercialize your feedback in any way and for any purpose. You also give to third parties,
without charge, any patent rights needed for their products, technologies and services to use or
interface with any specific parts of a Microsoft software or service that includes the feedback. You will
not give feedback that is subject to a license that requires Microsoft to license its software or
documentation to third parties because we include your feedback in them. These rights survive this
agreement.
MICROSOFT CORPORATION HEREBY DISCLAIMS ALL WARRANTIES AND CONDITIONS WITH
REGARD TO THE HANDS-ON LAB, INCLUDING ALL WARRANTIES AND CONDITIONS OF
MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A
PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. MICROSOFT DOES NOT MAKE ANY
ASSURANCES OR REPRESENTATIONS WITH REGARD TO THE ACCURACY OF THE RESULTS,
OUTPUT THAT DERIVES FROM USE OF THE VIRTUAL LAB, OR SUITABILITY OF THE
INFORMATION CONTAINED IN THE VIRTUAL LAB FOR ANY PURPOSE.
DISCLAIMER This lab contains only a portion of new features and enhancements in Microsoft Power
BI. Some of the features might change in future releases of the product.



Document Version
#   Date          Author        Comments
1   30-JUN-2017   Peter Myers   SQL Server 2016 SP1 CU3
