
SSIS Implementation Problems

SQL SERVER 2005

SSIS Problems

Page |2

Problem

One of the junior SQL Server developers in my company approached me yesterday with a dilemma. He was developing an SSIS package which imports data from a comma-separated text file, and he wanted to know the different ways in which one can execute an SSIS package in SQL Server 2005 and higher versions. At first I started to tell him, but figured it would be smarter to document the options and share the information.

Solution

In SQL Server 2005 and higher versions there are several ways in which one can execute an SSIS package. Let us go through each option one by one.

Execute SSIS Package Using SQL Server Business Intelligence Development Studio (BIDS)

During the development phase of the project, developers can test the SSIS package execution by running the package from Business Intelligence Development Studio, a.k.a. BIDS.

1. In Solution Explorer, right click the SSIS project folder that contains the package which you want to run and then click Properties as shown in the snippet below.

2. In the SSIS Property Pages dialog box, select the Build option under the Configuration Properties node and, in the right side panel, provide the folder location where you want the SSIS package to be deployed in the OutputPath property. Click OK to save the changes in the property page.


3. In Solution Explorer, right click the SSIS Package and then click Set as Startup Object option as shown in the snippet below.


4. Finally, to execute the SSIS package, right click the package within Solution Explorer and select the Execute Package option from the context menu as shown in the snippet below.

Execute SSIS Package Using the DTEXEC.EXE Command Line Utility

Using the DTEXEC.EXE command line utility one can execute an SSIS package that is stored in the file system, SQL Server or the SSIS Package Store. The syntax to execute an SSIS package which is stored in the file system is shown below.

DTEXEC.EXE /F "C:\BulkInsert\BulkInsertTask.dtsx"
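As a rough illustration, the command above can also be launched from a small helper script. This is a minimal sketch, assuming DTEXEC.EXE is on the PATH; the package path is just the example from the text.

```python
import subprocess

def build_dtexec_args(package_path):
    """Build the argument list for running a file-system SSIS package with DTEXEC."""
    # /F tells DTEXEC to load the package from the file system.
    return ["DTEXEC.EXE", "/F", package_path]

def run_package(package_path):
    # DTEXEC signals success with exit code 0; non-zero codes indicate
    # failures or warnings, so the caller can check the return value.
    return subprocess.run(build_dtexec_args(package_path)).returncode

if __name__ == "__main__":
    print(build_dtexec_args(r"C:\BulkInsert\BulkInsertTask.dtsx"))
```

Wrapping the call this way makes it easy to schedule or script package runs outside of SQL Server Agent.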

Execute SSIS Package Using the DTEXECUI.EXE Utility

Using the Execute Package Utility (DTEXECUI.EXE) graphical interface one can execute an SSIS package that is stored in the file system, SQL Server or the SSIS Package Store.


1. From the command line, run DTEXECUI.EXE, which will open the Execute Package Utility as shown in the snippet below. Within the Execute Package Utility, click on the General tab and choose File System as the Package source, then provide the path of the SSIS package under the Package option and finally click the Execute button to execute the SSIS package.

The Execute Package Utility is also used when you execute the SSIS package from the Integration Services node in SQL Server Management Studio.


Execute SSIS Package Using a SQL Server Agent Job

Using a SQL Server Agent Job one can execute an SSIS package that is stored in the file system, SQL Server or the SSIS Package Store. This can be done by creating a new SQL Server Agent Job and then adding a new step with the details mentioned in the snippet below.

1. In the New Job Step dialog box provide an appropriate Step name, then choose the SQL Server Integration Services Package option as the Type from the drop down list, and choose SQL Server Agent Service Account as the Run as value.

2. In the General tab choose File System as the Package Source and provide the location of the SSIS package under the Package option.


3. Click OK to save the job step and click OK once again to save the SQL Server Agent Job.

4. That's it; now you can execute the SQL Server Agent Job, which will internally execute the SSIS package.

Problem

I have some sales forecast data that I get from the business users in an Excel spreadsheet. I need to load this data into a SQL Server database table. The forecast contains product categories on the rows and the sales forecast for each month is on the columns. I'd like to use SQL Server Integration Services (SSIS) to perform this recurring task. How can I transform this data into a table that has Category, Month and Sales Forecast columns?

Solution

SSIS has a Data Flow Transformation called Unpivot which can do exactly what you need. Let's assume that your spreadsheet looks like this:


Although not shown above, there are 12 columns for forecast data, January through December. Assume we want to load the data from the Excel spreadsheet into the following table:

CREATE TABLE [dbo].[SalesForecast](
    [ForecastDate] [datetime] NULL,
    [SalesForecast] [int] NULL,
    [CATEGORY] [nvarchar](255) NULL
)

We'll create a simple SSIS package to process the Excel spreadsheet. The package will have the following Data Flow:
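To make the transformation concrete before walking through the SSIS components, here is a minimal sketch of the same unpivot logic in plain Python. The column names (CATEGORY, JAN, FEB) and the sample row are assumptions for illustration only.

```python
def unpivot(row, category_key="CATEGORY"):
    """Turn one wide forecast row into (category, month, value) rows,
    mirroring what the SSIS Unpivot transformation does."""
    category = row[category_key]
    return [
        (category, month, value)
        for month, value in row.items()
        if month != category_key  # CATEGORY is a pass-through column
    ]

# A hypothetical spreadsheet row with two of the twelve month columns shown.
wide_row = {"CATEGORY": "Bikes", "JAN": 100, "FEB": 120}
print(unpivot(wide_row))  # → [('Bikes', 'JAN', 100), ('Bikes', 'FEB', 120)]
```

Each month column becomes its own output row, which is exactly the reshaping the Unpivot transformation performs inside the data flow.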

The following are the main points about the above data flow:

- The Excel Source reads in the Excel spreadsheet.
- The Unpivot transform takes the forecast value columns and transforms them into rows.
- The Script Component takes the column names (i.e. JAN, FEB, etc., which are also transformed from columns to rows) and prepends the ForecastYear package variable to create a string value with the format YYYY-MM-DD. This can be inserted into a column of type DATETIME.
- Insert Into SalesForecast is an OLE DB Destination that inserts rows into the SalesForecast table.

Before we start the explanation of the Unpivot, let's add a data viewer after the Unpivot and run the package. The data viewer allows us to see the contents of the data flow. To add a data viewer, right click the line connecting the Unpivot and Script Component, select Data Viewers, Add, then Grid. The data viewer output is shown below:

As you can see the Unpivot component has performed the transformation that we need, taking the Sales Forecast values in the columns and turning them into rows. The Unpivot Transformation Editor is shown below:


The Available Input Columns grid contains the list of columns in the data flow as read from the spreadsheet. The CATEGORY column has Pass Through checked, which means the column value simply passes through this component unchanged. You can see the CATEGORY column in the data viewer output above.

The columns that have the checkbox to their left checked are unpivoted; i.e. these columns become rows. All of the checked input columns are being transformed to the SalesForecast column, one per row, as shown in the Destination Column above. Referring back to the data viewer output, you can see the SalesForecast column.

The Pivot key value column name is a new column that is added to the data flow; the value of this column is specified in the Pivot Key Value column. The Pivot Key Value allows you to specify the value of your choice for each column in the original spreadsheet. The Pivot Key Value shown is the first day of the month specified in the Input Column. You can see the MonthDay column in the data viewer output above.


The Script Component has the following single line of code that prepends the package variable ForecastYear to the MonthDay column in the data flow, assigning the output column ForecastDate a string value that can be implicitly converted to a DATETIME:

Row.ForecastDate = _
    Me.ReadOnlyVariables("ForecastYear").Value.ToString() + _
    "-" + Row.MonthDay

To test executing the package, use the following DTEXEC command line, specifying a value for the ForecastYear package variable (add the appropriate path before the package name):

DTEXEC /f unpivot_sample.dtsx /set \package.variables[ForecastYear];2010

Note that there is a semi-colon separating the package variable and the value. Also, the variable name (ForecastYear) is case-sensitive.
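The string concatenation that the Script Component performs can be sketched as follows; the MonthDay value "01-01" is an assumed Pivot Key Value of the kind described above.

```python
def build_forecast_date(forecast_year, month_day):
    """Prepend the ForecastYear variable to the MonthDay column,
    yielding a YYYY-MM-DD string that converts cleanly to DATETIME."""
    return str(forecast_year) + "-" + month_day

print(build_forecast_date(2010, "01-01"))  # → 2010-01-01
```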

Problem

I have a requirement to extract data from a SharePoint list. Ideally I would like to be able to do this from an SSIS package. In an earlier tip you showed how to do this by implementing a CLR function that invokes the SharePoint Lists web service. Isn't there a built-in component that we could use to do this?

Solution

There is a component available on the CodePlex site which should meet your requirements. The SharePoint List Source and Destination Sample provides a Source adapter to extract data from a SharePoint list and a Destination adapter to update data in a SharePoint list. In this tip we will walk through installing the CodePlex sample and creating an SSIS package to extract data from a SharePoint list using SSIS 2005.

Installation

Download the code for the SharePoint List Source and Destination Sample and launch the .msi file to install the components. Note that there are separate downloads for SQL Server 2005 and SQL Server 2008. The components require the .NET Framework version 3.5; the installation will prompt you to download and install the .NET Framework if you do not have the required version. If you choose to download and install the .NET Framework, your browser will open and navigate to the download page. As of the date of this writing you should be downloading and installing the .NET Framework version 3.5 Service Pack 1. There is a link on the page to download the full package; this will allow you to download everything you need. The default link will download a bootstrapper which will launch and download additional code as it runs. By downloading everything you have what you need to install the .NET Framework on another machine if necessary.

After installing the SharePoint List Source and Destination Sample code, open the Business Intelligence Development Studio (BIDS) and add the new components to the Toolbox under Data Flow Sources and Data Flow Destinations. Create a new SSIS package, then click Tools on the top-level menu, then Choose Toolbox Items. Click the SSIS Data Flow Items tab and check the checkboxes for SharePoint List Destination and SharePoint List Source. After clicking OK on the dialog, you should now see SharePoint List Source under the Data Flow Sources and SharePoint List


Destination under the Data Flow Destinations in the Toolbox. We are now ready to create a sample SSIS package that extracts data from a SharePoint list.

Create a Sample SSIS Package

Add a Data Flow component to the Control Flow of a new or existing SSIS package, then add the components below to the Data Flow:

The SharePoint List Source can be found in the Toolbox under the Data Flow Sources. To configure the component, right click on it and select Edit from the popup menu. You will see the dialog below:

For this example you need to select a list in a SharePoint site. In my case I have a site called AdventureWorks that has a standard Contacts list that includes all of the employees from the DimEmployee table of the AdventureWorksDW database that comes with SQL Server 2005. After you pick a SharePoint list, fill in the SiteUrl and SiteListName as appropriate. The URL of my sample list is: http://bi-wss/adventureworks/Lists/Contacts/AllItems.aspx - use this as an example of how to extract the SiteListName and SiteUrl based on the URL of your list. Note that there is a SiteListViewName property where you can specify the name of a view that you have created for your list. The view allows you to specify the field list, sorting, etc. For our example we'll just leave it blank and go with the default view for the list.

As you can see in the above edit dialog for the SharePoint List Source, there are two additional tabs - Column Mappings and Input and Output Properties. You can accept the default values.

For our example we'll write out the SharePoint list contents to a flat file. Add a Flat File Destination and configure its Flat File Connection Manager (named FlatFileOutput) to point to the file of your choice on your local hard drive; e.g. "c:\mssqltips\ssis_sharepointlists\contacts.txt". You may want to check "Overwrite data in the file" on the Flat File Destination's Connection Manager dialog so that the file will be overwritten if it already exists. In my case I set the column delimiter to Tab {t} on the Flat File Connection Manager's Columns page since my data has a FullName field with a comma separating the last name and first name.

We are now ready to run the sample SSIS package. Right click on it in the BIDS Solution Explorer and select Execute Package from the popup menu. You should see the package execute and write out the contents of your SharePoint list to a text file as specified in the FlatFileOutput connection manager.

One final note - you may have noticed the CamlQuery property on the SharePoint List Source Component Properties dialog shown above. CAML stands for Collaborative Application Markup Language. It is XML, and in this case you could use it to specify sorting and filtering on your list. You can get more details on the CAML Query element here.

Problem

In part 1 of this tip series, I discussed using the built-in Send Mail Task, which is quite simple to use and can be used in a scenario where you need to send plain text email with less development effort. In this second tip, I am going to discuss the use of the Script Task to overcome the limitations imposed by the Send Mail Task. I will show how you can send HTML formatted emails from SSIS using the Script Task, or rather sending emails using the .NET capabilities from your SSIS package.

Solution

I will start my discussion on using the Script Task to send email (both non-HTML and HTML formatted) with an example. First I will create a database table (MailsToBeSent) which will hold the information about the emails which are to be sent by the Script Task and then insert a few records in this table. Next I will create a stored procedure to retrieve the records from the above created table. So here is the script for creating these database objects.

--Create a table to store details of mails to be sent
CREATE TABLE MailsToBeSent
(
    [MailID] INT PRIMARY KEY,
    [From] VARCHAR(200),
    [TO] VARCHAR(200),
    [CC] VARCHAR(200),
    [BCC] VARCHAR(200),
    [Subject] VARCHAR(200),
    [Body] VARCHAR(MAX),
    [IsHTMLFormat] BIT,
    [Priority] CHAR(1)
)
GO


--Insert details of a non-HTML mail to be sent
INSERT INTO MailsToBeSent([MailID], [From], [TO], [CC], [BCC], [Subject], [Body], [IsHTMLFormat], [Priority])
VALUES(1, 'arshad@gmail.com', 'arshad@gmail.com', 'arshad@gmail.com;ali@gmail.com', '',
    'Sending Non-HTML Mail Using Script Task',
    'This Non-HTML mail has been sent using SSIS Script task.', 0, 'L')
GO

--Insert details of an HTML formatted mail to be sent
INSERT INTO MailsToBeSent([MailID], [From], [TO], [CC], [BCC], [Subject], [Body], [IsHTMLFormat], [Priority])
VALUES(2, 'arshad@gmail.com', 'arshad@gmail.com', 'arshad@gmail.com;ali@gmail.com', '',
    'Sending HTML formatted Mail Using Script Task',
    'This <strong><span style="font-size:130%;color:#006600;">HTML formatted</span></strong> mail has been sent using <em><span style="color:#ff6600;">SSIS Script task</span></em>.', 1, 'H')
GO

--Create a procedure to retrieve all the records
--from the MailsToBeSent table to send mails
CREATE PROCEDURE GetMailsToBeSent
AS
BEGIN
    SELECT [MailID], [From], [TO], [CC], [BCC], [Subject], [Body], [IsHTMLFormat], [Priority]
    FROM MailsToBeSent
END
GO

EXEC GetMailsToBeSent
GO

Once you are done with the database object creation, let's move on to create a package with a Script Task to send emails. Create a new project of the Integration Services type in the Business Intelligence Development Studio. Drag a Script Task from the toolbox and drop it onto the Control Flow area of the Designer interface, right click on the Script Task component and then select Edit; a screen similar to the one given below will come up. This is a screen shot from SSIS 2008. If you are still using SSIS 2005 it will say "Design Script..." instead of "Edit Script...".


On the Script page of the Script Task Editor click on the Edit Script button; it will bring up the VSTA environment (on SSIS 2005 it is the VSA environment) for writing the .NET code for sending emails. Copy the code from the tables below and paste it into the VSA/VSTA code editor. There are two sets of code, one for 2005 and one for 2008, so make sure you use the right version based on your version of Business Intelligence Development Studio, not the version of SQL Server you are connecting to. Because of changes in the scripting environment between SQL Server 2005 and 2008, there is a slight change in the code; that's why I am providing separate code below to be used on SQL Server 2005 and on SQL Server 2008.

If you are running the below code in your Script Task in an SSIS 2005 environment, you may need to reference System.XML.dll. By default it is an included and referenced dll in SSIS 2008, so you would not have to worry if you are using it on SSIS 2008. Here are the objects that are referenced, with System.XML highlighted below.


Please note you need to change the connection string in the code to point to the server and database where you have created the above database objects, and also change the SMTP server name which will be used to send emails. The two lines of code are as follows:

ConnString = "Data Source=ARALI-LAPTOP;Initial Catalog=Learning;Integrated Security=True;"
mySmtpClient = New SmtpClient("smtpserver")

Also, I have commented out the section that allows you to send emails using authentication to your mail server. So if you want to use a user and password this can be supplied as well.

Script : VB .Net Code for Script Task for SQL Server 2005

Script : VB .Net Code for Script Task for SQL Server 2008

You should make sure you properly dispose of the instance of the MailMessage class, especially if you are sending attachments with the email; otherwise you will end up having your files locked by the Windows OS and you will not be able to delete them. The easiest way to avoid the overhead of disposing of unused objects is to use the USING statement and write your code inside its block, similar to the way it has been done in the above code.

Now, on execution of the above created package, two emails are sent to the intended audience as shown below. As expected, the first email (MailMessage.IsBodyHtml = False) is a non-HTML email whereas the second email (MailMessage.IsBodyHtml = True) is HTML formatted; note the color in the message body.
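The VB.NET listings themselves are not reproduced here, but the same flow the text describes — build a message, flag its body as HTML or plain text per the IsHTMLFormat column, then hand it to an SMTP client — can be sketched with Python's standard library. The addresses come from the sample rows above; the SMTP host name is a placeholder, just as "smtpserver" is in the VB.NET code.

```python
from email.mime.text import MIMEText

def build_mail(from_addr, to_addr, subject, body, is_html):
    """Build a message the way the Script Task builds a MailMessage:
    the body is flagged as HTML or plain text per the IsHTMLFormat column."""
    msg = MIMEText(body, "html" if is_html else "plain")
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    return msg

msg = build_mail("arshad@gmail.com", "arshad@gmail.com",
                 "Sending HTML formatted Mail Using Script Task",
                 "This <strong>HTML formatted</strong> mail...", True)
print(msg.get_content_type())  # → text/html

# Actually sending would mirror SmtpClient("smtpserver").Send(mailMessage):
# import smtplib
# with smtplib.SMTP("smtpserver") as smtp:  # placeholder SMTP host
#     smtp.send_message(msg)
```

As in the VB.NET version, the send step is where credentials or a non-default port would be supplied if the SMTP server requires them.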


Note

If you are running the above code in your Script Task in an SSIS 2005 environment, you may need to add a reference to System.Xml.dll. By default it is an included and referenced dll in SSIS 2008, so you would not have to worry if you are using it on SSIS 2008. You need to reference the System.Net.Mail namespace in your code, which contains the MailMessage and SmtpClient classes that are required for sending emails.

There are three differences to note here between SSIS 2005 and SSIS 2008 in terms of the Script Task:

- In SSIS 2008 you have two language options to write your code, Visual Basic .Net and C# .Net, whereas you had only one language option in SSIS 2005.
- The scripting environment in SSIS 2008 is VSTA whereas it is VSA in SSIS 2005. This means almost full .Net capabilities in SSIS.
- In SSIS 2008, in the Script Task Editor dialog box, the Script page takes preference over the other pages and is the first page in the list on the left side; this saves one more click to reach your development environment to write your code.

More details about the script environment enhancement in SSIS 2008 can be found in the article at http://www.sql-serverperformance.com/articles/biz/SSIS_New_Features_in_SQL_Server_2008_Part5_p1.aspx

Problem

Sending an email is a frequent requirement to notify a user on the occurrence of certain events, especially if an unexpected event happens (for example, sending a notification on a failure that could be either logical or physical). SSIS provides a built-in "Send Mail Task" to send email in these circumstances. The Send Mail Task is quite simple and straightforward in its configuration and use, but it has some inherent limitations: first, it supports only sending plain text email (it doesn't support HTML formatted mail), and second, it doesn't support passing a username and password while connecting to the SMTP server (it only supports Windows authentication, i.e. non-Windows authentication is not allowed), nor does it support specifying an SMTP port number to send emails if your SMTP server does not use the default value. In part 1 of this tip series, I will first start my discussion on using the built-in Send Mail Task, and then in part 2 of this tip series I will discuss using the "Script Task" to overcome the limitations imposed by the Send Mail Task. I will show how you can send HTML formatted mails from SSIS using the Script Task, or rather the .Net capabilities from your SSIS package.


Solution

When you drag a Send Mail Task from the toolbox to the control flow, you will notice there are three pages when you right click on the task and select Edit. On each page you will find a few settings which you might need to configure for sending emails. These pages are:

General Page

Here you specify the name and a small description for your Send Mail Task. Though these are not mandatory, it is good practice to give a meaningful name and description.

Expressions Page

You use the Expressions page to edit property expressions and to access the Property Expressions Editor and Property Expression Builder dialog boxes. Property expressions update the values of properties when the package or task is run. The expressions are evaluated and their results are used at runtime instead of the values to which you set the properties when you configured the task. The expressions can include variables and the functions and operators that the expression language provides. For example, you can generate the subject line for the Send Mail Task by concatenating the value of a variable that contains the string "Weather forecast for " and the return results of the GETDATE() function to make the string "Weather forecast for 4/5/2009". You can refer to this KB article to learn more about how to use expressions in the Send Mail Task: http://support.microsoft.com/kb/906547.

Mail Page

This is the place where you specify most of the configuration for your Send Mail Task as shown in the image below:


Let me summarize the Mail Page configuration and give you a brief description of the settings which you would normally set on this page:

SMTPConnection - Select an SMTP connection manager in the list, or click <New connection> to create a new connection manager. As discussed below, you have an option to attempt an anonymous or Windows authenticated connection as well as enable Secure Socket Layer (SSL) to encrypt the communication. An SMTP connection manager enables a package to connect to a Simple Mail Transfer Protocol (SMTP) server.

From - Specify the e-mail address of the sender, which may be used by the recipient of the mail to reply back.

To - Provide the e-mail addresses of the recipients; multiple recipients' emails are separated with semicolons.

Cc - Specify the e-mail addresses, separated with semicolons, of individuals who also receive copies of the message.

Bcc - Specify the e-mail addresses, separated with semicolons, of individuals who receive blind carbon copies (Bcc) of the message.

Subject - Provide a subject line for your e-mail message. You may configure it to dynamically change its value using an expression as discussed below.

MessageSourceType - Select the source type of the message, which can be either Direct Input, which allows you to directly type your message in the box provided, or File Connection, which points to a file containing your message, or Variable, which allows your message content to come from an SSIS variable.

Priority - Set the priority of the message; it can be either Low, Normal or High.

Attachments - Provide the file names of attachments to the e-mail message; multiple attachments are delimited by the pipe (|) character.

When you create an SMTP connection manager, a dialog box similar to the one shown below will come up. Here you specify a meaningful name and a small description for this SMTP connection manager and then you specify the SMTP Server. The SMTP connection manager supports only anonymous authentication and Windows authentication; it does not support basic authentication. Check the Enable Secure Socket Layer (SSL) option if you want to encrypt communication using Secure Sockets Layer (SSL) while sending e-mail messages.

One thing to note here is that if you execute your package interactively from BIDS it uses the security context of the currently logged on user, whereas if you schedule it by executing a SQL Server Agent Job then it uses the account under which SQL Server Agent is running to connect to the SMTP host.

So far we have learned about all of the settings of the Send Mail Task; now let me execute the package and see the result. So here goes the mail.

Note

You can configure and send emails using the "Send Mail Task" programmatically as well; more details about how this can be done can be found here on the MSDN site.

Caution

The maximum allowed length of an expression is 4,000 characters. While using an expression take this limitation into consideration or else you will end up with an error as shown below:


If you are using an expression for the MessageSource property and you expect your email message source to grow to more than 4,000 characters, then instead of using an expression set MessageSourceType = Variable and assign the value directly using a variable, or think about using the Script Task to send emails (this will be discussed in part 2 of this tip series).

Conclusion

In part 1 of this tip series I discussed how you can easily configure and use the built-in Send Mail Task of SSIS to send plain text emails; we also learned about some of its limitations. In part 2 of this tip series, I will cover sending emails using the Script Task, which overcomes the limitations imposed by the Send Mail Task.

Problem

One task that most people are faced with at some point in time is the need to import data into SQL Server from an Excel spreadsheet. We have talked about different approaches to doing this in previous tips using OPENROWSET, OPENQUERY, Linked Servers, etc. These options are great, but they may not necessarily give you as much control as you may need during the import process. Another approach is using SQL Server Integration Services (SSIS). With SSIS you can import different types of data as well as apply other logic during the importing process. One problem that I have faced with importing data from Excel into a SQL Server table is the issue of having to convert data types from Unicode to non-Unicode. SSIS treats data in an Excel file as Unicode, but my database tables are defined as non-Unicode, because I don't have the need to store other code sets and therefore I don't want to waste additional storage space. Is there any simple way to do this in SSIS?

Solution

If you have used SSIS to import Excel data into SQL Server you may have run into the issue of having to convert data from Unicode to non-Unicode. By default Excel data is treated as Unicode, and also by default when you create new tables SQL Server will make your character type columns Unicode as well (nchar, nvarchar, etc.). If you don't have the need to store Unicode data, you probably always use non-Unicode data types such as char and varchar when creating your tables, so what is the easiest way to import Excel data into non-Unicode columns?


The following shows two different examples of importing data from Excel into SQL Server. The first example uses Unicode datatypes and the second does not. Here is what the data in Excel looks like.

Example 1 - Unicode data types in SQL Server

Our table 'unicode' is defined as follows:

CREATE TABLE [dbo].[unicode](
    [firstName] [nvarchar](50) NULL,
    [lastName] [nvarchar](50) NULL
) ON [PRIMARY]

If we create a simple Data Flow Task with an Excel Source and an OLE DB Destination, mapping firstname to firstname and lastname to lastname, the import works great as shown below.

Example 2 - non-Unicode data types in SQL Server

Our table 'non_unicode' is defined as follows:

CREATE TABLE [dbo].[non_unicode](
    [firstName] [varchar](50) NULL,
    [lastName] [varchar](50) NULL
) ON [PRIMARY]

If we map the columns firstname to firstname and lastname to lastname we automatically get the following error in the OLE DB Destination:

Columns "firstname" and "firstname" cannot convert between unicode and non-unicode data types...


If we execute the task we get the following error dialog box which gives us additional information.

Solving the Problem

So based on the error we need to convert the data types so they are the same. If you right click on the OLE DB Destination and select "Show Advanced Editor" you have the option of changing the DataType from string [DT_STR] to Unicode string [DT_WSTR]. But once you click OK it looks like the change was saved; however, if you open the editor again the change is gone and it is back to the original value. This makes sense, since you cannot change the data type in the actual table.


If you right click on the Excel Source and select "Show Advanced Editor" you have the option of changing the DataType from Unicode string [DT_WSTR] to string [DT_STR] and the change is saved.

If you click OK the change is saved, but now you get an error in the Excel Source that you cannot convert between unicode and non-unicode as shown below. So this did not solve the problem either.


Using the Data Conversion Task

So to get around this problem we have to also use a Data Conversion task. This will allow us to convert data types so we can get the import completed. The following picture shows the "Data Conversion" task in between the Excel Source and the OLE DB Destination.

If you right click on "Data Conversion" and select properties you will get a dialog box such as the following. In here we created an Output Alias for each column. Our firstname column becomes firstname_nu (this could be any name you want) and we are making the output be a non-unicode string. In addition we do the same thing for the lastname column.
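Conceptually, what the Data Conversion transformation does for each column is an encode from a Unicode string (DT_WSTR) to a single-byte string (DT_STR) in a target code page. A rough Python illustration follows; the target code page (cp1252) and the sample values are assumptions for the sketch.

```python
def to_non_unicode(value, codepage="cp1252"):
    """Convert a Unicode string (DT_WSTR) to a non-Unicode byte string
    (DT_STR); characters outside the code page cannot be represented."""
    return value.encode(codepage)

print(to_non_unicode("John"))  # → b'John'

# A character outside the code page fails, much as the SSIS conversion
# errors (or truncates, depending on the component's error disposition):
try:
    to_non_unicode("Ωmega")
except UnicodeEncodeError:
    print("cannot convert between unicode and non-unicode")
```

This is why the conversion has to be an explicit step in the data flow: it can lose information, so SSIS will not perform it implicitly.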


If we save this and change the mapping as shown to use our new output columns and then execute the task we can see that the import was successful.


As you can see this is pretty simple to do once you know that you need to use the Data Conversion task to convert the data types.

Problem

We use SSIS to periodically load data into our data warehouse. While much of the data we process is in relational data stores, we do have some Excel spreadsheets that we need to process. In one particular case we load an Excel spreadsheet that is produced by an external application and every month the number of sheets varies and the sheet names are also different. How can we determine the sheet names in our SSIS package and process this variable number of sheets?

Solution

The first step is to retrieve the schema information from the Excel spreadsheet. The sheet name in the Excel spreadsheet becomes the table name in a SQL statement; the sheet columns are the columns in the SQL statement. Let's start out by looking at a sample Excel spreadsheet that we might need to process with an SSIS package:

The main points about the above Excel spreadsheet are:

- We have an external system that generates an Excel spreadsheet with a list of invoices by week.
- The spreadsheet can be generated for any number of weeks; each week is in its own sheet.
- The sheet names represent the year and the week number in the year.

Based on our knowledge of the Excel Source that we use in a Data Flow task, we need to know the sheet name in order to import the data. In our example, however, the sheet name varies and the


number of sheets also varies. What we need then is a way to query the Excel spreadsheet and get the list of sheets. You can get the schema information from an Excel spreadsheet by using the OleDbConnection .NET Framework class. The OleDbConnection class has a method called GetOleDbSchemaTable that will return the list of sheets (i.e. tables) in a spreadsheet and the list of columns in a particular sheet. Let's create a simple SSIS package to demonstrate how to query this information and process the sheets. For additional details on the GetOleDbSchemaTable method, refer to this article on the Microsoft web site.

Sample SSIS Package

We'll create an SSIS package that will process all of the sheets in a single Excel file. The Excel file to process will be specified on the command line and stored in the package variable ExcelFilePath. Our SSIS sample package will have the following control flow:


Truncate Staging Tables is an Execute SQL task that truncates two staging tables used during processing.

Get Schema from Excel File is a Data Flow task that retrieves the schema information from each sheet in the Excel spreadsheet and stores it in the stg_ExcelMetadata staging table.

Get List of Excel Tables to Process is an Execute SQL task that gets the distinct list of tables from the stg_ExcelMetadata table and stores them in the package variable ExcelTableList.

Process Each Excel Table is a Foreach Loop Container task that iterates through the list of Excel tables in the ExcelTableList package variable. Each time through the loop the Excel table to be processed is stored in the ExcelTable package variable.

Process Excel Table is a Data Flow task that reads the data from the single Excel sheet per the ExcelTable package variable and inserts the data into the staging table stg_Invoice.

The Get Schema from Excel File task is the most interesting part of our sample SSIS package and it looks like this:

Get Excel Metadata is a Script Source. It contains the VB.NET code that retrieves the schema information from the Excel file. Write to Staging Table is an OLE DB Destination that inserts the schema information into the stg_ExcelMetadata table which is defined as follows:

CREATE TABLE [dbo].[stg_ExcelMetadata](
    [EXCEL_FILE_PATH] [nvarchar](256) NULL,
    [DATA_TYPE] [nvarchar](50) NULL,
    [COLUMN_NAME] [nvarchar](50) NULL,
    [TABLE_NAME] [nvarchar](50) NULL
)

A Script Source allows you to write VB.NET code to retrieve data from just about any source and insert it into the data flow. The key point is that you have the entire .NET Framework at your disposal. There are two steps in the configuration of the Script Source component:

Step 1: Define the output columns; these are the columns that you want to insert into the data flow. In our case they are all strings:


Step 2: Write the VB.NET code:

Public Overrides Sub CreateNewOutputRows()
    Dim excelFilePath As String = Me.Variables.ExcelFilePath.ToString()
    Dim strCn As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=" + excelFilePath + ";Extended Properties=Excel 8.0"
    Dim dtTables As DataTable
    Dim dtColumns As DataTable
    Dim tableName As String
    Dim cn As OleDbConnection = New OleDbConnection(strCn)
    cn.Open()
    dtTables = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
    For Each row As DataRow In dtTables.Rows
        tableName = row("TABLE_NAME").ToString()
        dtColumns = cn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns, _
            New Object() {Nothing, Nothing, tableName, Nothing})
        For Each columnRow As DataRow In dtColumns.Rows
            OutputBuffer.AddRow()
            OutputBuffer.EXCELFILEPATH = excelFilePath
            OutputBuffer.TABLENAME = tableName
            OutputBuffer.COLUMNNAME = columnRow("COLUMN_NAME").ToString()
            OutputBuffer.DATATYPE = columnRow("DATA_TYPE").ToString()
        Next
    Next
    cn.Close()
    OutputBuffer.SetEndOfRowset()
End Sub

The main points about the above code snippet are:

In a Script Source component you add VB.NET code to the CreateNewOutputRows subroutine to retrieve data and insert it into the data flow.

The ExcelFilePath package variable is passed in to the Script Source component and used in the connection string.

Create an OleDbConnection object, open the connection and call the GetOleDbSchemaTable method to retrieve the list of sheets and the columns in each sheet. The GetOleDbSchemaTable method returns a DataTable; this is a standard ADO.NET class that has rows and columns.


The OutputBuffer class is used to add rows to the data flow and also to assign values to the output columns. The OutputBuffer class gets its name based on the name you specify on the Inputs and Outputs page in Step 1 above. You should call the SetEndOfRowset method on the OutputBuffer to indicate that you are done adding rows.

The Get List of Excel Tables to Process task executes a query to get the list of sheets in the Excel spreadsheet (from the staging table) then stores the list in the package variable ExcelTableList. The Process Each Excel Table task iterates over the list of tables in the Excel spreadsheet and executes the Process Each Excel Table task on each table. This technique was also used in our earlier tip How To Implement Batch Processing in SQL Server Integration Services (SSIS). There are two steps required to configure the Get List of Excel Tables to Process task:

The General page (shown above) is where you specify the query; it's just selecting the list of tables from the staging table that we populated in the Get Schema from Excel File task. Setting the ResultSet to Full result set allows us to capture the query results into a package variable which we need to specify on the Result Set page:


Note that the data type of the ExcelTableList variable must be Object (i.e. the .NET Framework System.Object class) in order for the variable to hold the list of tables from our query. There are two steps required to configure the Process Each Excel Table task:

The Collection page (shown above) is where you specify the type of enumerator; in our case it must be Foreach ADO Enumerator. The ADO object source variable is ExcelTableList which is the variable we specified for the Result Set in the Get List of Excel Tables task. For Enumeration Mode we pick Rows in the first table (there is only one table in our result set). The Variable Mappings page is used to assign value(s) from the result set to package variable(s) during each iteration. We'll use a package variable named ExcelTable:

Finally the last step in our SSIS package is the Process Excel Table task which looks like this:


The main points about the above Data Flow task are:

The Excel Source reads a single sheet from our Excel file. The Derived Column task adds the ExcelFilePath and ExcelTable package variables to the data flow so that we can save these in the staging table. Write to Staging is an OLE DB Destination that inserts rows into the staging table stg_Invoice.

The staging table is defined as follows:

CREATE TABLE [dbo].[stg_Invoice] (
    [ExcelFilePath] [nvarchar](255) NULL,
    [ExcelTable] [nvarchar](255) NULL,
    [InvoiceDate] [float] NULL,
    [InvoiceNumber] [nvarchar](255) NULL,
    [CustomerNumber] [nvarchar](255) NULL,
    [InvoiceAmount] [float] NULL
)

While the above Data Flow task is relatively straightforward, there are two subtle points that we need to take into consideration. First we need to configure the Excel Connection Manager that is used by the Excel Source. The ExcelFilePath property needs to be set to the ExcelFilePath package variable that is passed in on the command line. Click the button in the Expressions property of the Excel Connection Manager and assign the ExcelFilePath package variable to the ExcelFilePath property as shown below:
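Notice that InvoiceDate lands in the staging table as a float: the Excel OLE DB provider exposes dates as double-precision serial numbers. As a hedged illustration (this conversion is not part of the original package, and the function name is mine), here is how such a serial could be converted downstream, using the 1899-12-30 epoch conventionally applied for Excel's 1900 date system:

```python
from datetime import datetime, timedelta

# Excel's 1900 date system counts days from an epoch; using 1899-12-30
# as the base compensates for Excel's phantom 1900-02-29 leap day.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (e.g. the float InvoiceDate column)
    to a datetime. Valid for serials >= 61, i.e. dates after 1900-02-28."""
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_datetime(39448.0).date())  # 2008-01-01
```

In T-SQL the equivalent would be along the lines of DATEADD(day, InvoiceDate, '18991230'), though you would want to verify the epoch against your own data before relying on it.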


Second remember that the Data Flow task is being executed once for each sheet in our Excel spreadsheet. The way to make this work is to configure the Excel Source Connection Manager page to specify the Data access mode as Table name or view name variable and select the ExcelTable package variable as the Variable name as shown below:

As the Process Each Excel Table task iterates through the list of sheets in the Excel file, it assigns each sheet to the ExcelTable package variable then executes the Process Excel Table task which operates on the sheet specified by the ExcelTable package variable. Running the SSIS Package Run the sample SSIS package using the DTEXEC command line utility and set the ExcelFilePath package variable to the full path of the Excel file to process; e.g.: DTEXEC /FILE ExcelMetadata.dtsx /SET "\Package.Variables[User::ExcelFilePath].Value";"c:\drop\sample.xls" After running the package you can query the stg_ExcelMetadata table to see the schema information:


Note that the TABLE_NAME column has a '$' character at the end of it and is enclosed in single quotes. The sheet name does not show the '$' character or the single quotes in Excel. The DATA_TYPE column values are: 5=double precision, 130=text. You can find the details on additional types here. Excel supports a very small set of column types. You can also query the stg_Invoice table to see the data that was loaded from the Excel sheets:
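Because of the trailing '$' and the quoting, the TABLE_NAME values usually need normalizing before you can match them back to the sheet names users see in Excel. A small, hypothetical Python sketch of that cleanup (the function and dictionary names are mine, not part of the package):

```python
# Map the OLE DB DATA_TYPE codes mentioned above to friendly names.
# (5 = double precision, 130 = text; Excel uses only a handful of types.)
OLEDB_TYPE_NAMES = {5: "double", 130: "text"}

def clean_sheet_name(table_name):
    """Turn an OLE DB table name such as "'2008 01$'" back into the
    sheet name shown in Excel ("2008 01")."""
    name = table_name.strip("'")   # names containing spaces come back quoted
    if name.endswith("$"):
        name = name[:-1]           # the provider appends '$' to sheet names
    return name

print(clean_sheet_name("'2008 01$'"))  # 2008 01
print(OLEDB_TYPE_NAMES[130])           # text
```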

The results above only show the first ten rows; if you scroll through all of the results you will see rows from each of the three Excel sheets in our sample Excel spreadsheet.

Problem

We have a named SQL instance and I am able to connect to the instance, but when I try to view the SSIS packages stored in the MSDB database I get an error. This does not happen with our default instances. Is there an additional setting that must be changed to get this to work?

Solution

When a SQL Server instance is installed, one of the configuration files is the MsDtsSrvr.ini.xml file.


This file is located in the "<Program Files Installation>\Microsoft SQL Server\90\DTS\Binn" folder. It is responsible for various configuration options. Here are the default contents of MsDtsSrvr.ini.xml:

<?xml version="1.0" encoding="utf-8"?>
<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>
  <TopLevelFolders>
    <Folder xsi:type="SqlServerFolder">
      <Name>MSDB</Name>
      <ServerName>.</ServerName>
    </Folder>
    <Folder xsi:type="FileSystemFolder">
      <Name>File System</Name>
      <StorePath>..\Packages</StorePath>
    </Folder>
  </TopLevelFolders>
</DtsServiceConfiguration>

Of the configuration settings listed above, the one of interest is the <ServerName> tag. By default, the configuration is set to look at the default instance (the server name). When you attempt to view the SQL Server Integration Services packages on a named instance, you get the following error:

In order to view packages on a named instance, perform the following steps:

1. Replace the period (.) in the <ServerName> section with the name of your SQL Server instance. In my case this was changed to: <ServerName>CULLENSVR01\SQL2K5</ServerName>
2. Save the file


3. Restart the SQL Server and SQL Server Integration Services services
4. Re-connect to the SSIS instance and you should be able to see all packages stored in the SQL Server instance.
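If you manage many instances, the edit in step 1 can be scripted. A minimal sketch with Python's xml.etree, assuming the default file layout shown earlier (the XML declaration and xsi:type attributes are omitted here for brevity, and the function name is mine):

```python
import xml.etree.ElementTree as ET

# Default MsDtsSrvr.ini.xml contents, trimmed to the elements the edit touches.
CONFIG = """<DtsServiceConfiguration>
  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>
  <TopLevelFolders>
    <Folder>
      <Name>MSDB</Name>
      <ServerName>.</ServerName>
    </Folder>
  </TopLevelFolders>
</DtsServiceConfiguration>"""

def point_config_at_instance(config_xml, instance):
    """Replace the '.' in <ServerName> with a named instance."""
    root = ET.fromstring(config_xml)
    root.find(".//ServerName").text = instance
    return ET.tostring(root, encoding="unicode")

updated = point_config_at_instance(CONFIG, r"CULLENSVR01\SQL2K5")
print(r"<ServerName>CULLENSVR01\SQL2K5</ServerName>" in updated)  # True
```

Remember that the services still need to be restarted (step 3) for the change to take effect.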

Problem

We have a requirement to implement security in our data warehouse to limit what data a user can see on a report. As an initial step we have created tables for users and roles; we also have a user role table where we specify the role(s) that a user is assigned. We would like to update the user, role, and user role tables automatically from Active Directory. Ideally we'd like an SSIS package that we could schedule and/or run on demand to take care of the update. Can you provide an example of how to do this?

Solution

An SSIS package is a convenient choice to synchronize your security tables with Active Directory. From a database standpoint, let's assume your security tables are as shown in the following schema diagram:


The DimUser table has a row for each user that is allowed to see any data in the warehouse. The DimRole table contains our list of roles that a user may be assigned; there is a one-to-one relationship between a role and an Active Directory group. The DimUserRole table contains the users and their roles via the foreign key relationships to the DimUser and DimRole tables. You query the DimUserRole table to determine what role(s) the user belongs to then use that to filter the data that the user can see. If you're familiar with .NET programming you might ask why not just call the method WindowsPrincipal.IsInRole() to check whether a user is in a particular role? While you certainly could do that, there are a number of reasons why having the users and roles in tables is beneficial:

You can create table-valued functions (TVF) that you can join with other tables to filter based on role and/or other rules; I'll give an example in this tip.

Sometimes the "current user" is different from the "calling user"; e.g. in SQL Server Reporting Services you may configure a report to run as a particular user but filter the data based on the global variable User!UserID (the calling user).

You often come up with additional user or role data that isn't in Active Directory, and it's not a simple task to extend the Active Directory schema; you can easily incorporate this data into your own SQL Server tables. For instance users in certain roles can see all Key Performance Indicators (KPIs) for all offices; users in other roles can only see the companywide KPI (no office-level detail).

As shown above the schema is very simple. Let's continue on with the details on how to retrieve users and their groups from Active Directory then implement an SSIS package to perform the update. .NET Code to Query Active Directory In order to retrieve the list of users and their groups from Active Directory, we will need to write some .NET code. The following method shows the steps to write out every user and their group memberships to a tab delimited file:


The main points about the above code are:

Step 1 sets up the parameters for the Active Directory search. DirectoryEntry is a class in the System.DirectoryServices namespace that you use to specify where in Active Directory to begin the search. In my case I used "LDAP://dc=vs,dc=local" as the path variable value to get all users in the domain since my domain is vs.local. DirectorySearcher is used to perform the actual search; it is also in the System.DirectoryServices namespace. The SearchScope property is set to search objects and their hierarchies. You specify the


attributes you want to retrieve by calling the PropertiesToLoad Add method. The Filter property is set to return any object that represents a person. The PageSize property sets the chunk size for retrieving items from Active Directory. Without specifying a PageSize you will only get the first 1,000 items.

Step 2 executes the search by calling the FindAll method on the DirectorySearcher object, which returns a collection of objects.

Step 3 creates the flat file to output the results.

Step 4 iterates through the result collection. Each item is a collection itself.

Step 5 iterates through each item collection pulling out either a single value (e.g. samaccountname) or multiple values (e.g. memberof). The samaccountname attribute is the user's login name. The memberof attribute is a multi-valued collection which contains each Active Directory group that the user is a member of.

The above code is part of a class called User contained in a class library called ADHelper.DLL. By packaging the code in a class library rather than embedding it in the SSIS package, we can call it from the SSIS package as well as any .NET code.

Note that there are two requirements for deploying the above code so that it can be called from an SSIS package:

ADHelper.DLL must be deployed to the Global Assembly Cache (GAC). You can use the GACUTIL utility to do this or simply drag and drop the DLL to the \Windows\Assembly folder.

ADHelper.DLL must be copied to the folder \Program Files\Microsoft SQL Server\90\SDK\Assemblies

SSIS Package Control Flow We will use the following SSIS package control flow to synchronize our security tables with Active Directory:

Extract Users and Group Memberships from AD

Extract Users and Group Memberships from AD is a Script task that retrieves users and their group memberships from Active Directory and writes out a tab delimited file. The Script task contains the following code:


Dim user As New ADHelper.User
Dim filename As String
filename = Dts.Variables("ADUserGroupsFileName").Value.ToString()
user.GetUserGroups("VS", "LDAP://dc=vs,dc=local", filename)
Dts.TaskResult = Dts.Results.Success

ADUserGroupFileName is a package variable that holds the full path to the tab delimited file where the results are stored. The code in the Script task is minimal since the GetUserGroups method contains the majority of the code. You need to add a reference to the ADHelper DLL in the Script task.

Truncate stg_UserGroupList

Truncate stg_UserGroupList is an Execute SQL task that truncates the stg_UserGroupList table used during processing. The stg_UserGroupList table is created with the following script:

CREATE TABLE [dbo].[stg_UserGroupList](
    [Domain] [nvarchar](50) NOT NULL,
    [AccountName] [nvarchar](50) NOT NULL,
    [Group] [nvarchar](50) NOT NULL,
    [DomainUser] AS (([Domain]+'\')+[AccountName]),
    [FK_DimUser] [int] NULL,
    [FK_DimRole] [int] NULL
)

The Domain column is populated by the value of the domain parameter passed in to the GetUserGroups method described above. The AccountName and Group columns are set from the Active Directory attributes samaccountname and memberof. The DomainUser computed column concatenates the Domain and AccountName columns in the DOMAIN\ACCOUNT format. The FK_DimUser and FK_DimRole columns will be set to the primary key values of the respective tables.

Import Users and Groups into stg_UserGroupList

Import Users and Groups into stg_UserGroupList is a Data Flow task that loads the stg_UserGroupList table from the tab delimited file created in the initial Script task.

Synchronize DimUser and DimUserRole

Synchronize DimUser and DimUserRole is an Execute SQL task that calls the stored procedure that updates our security tables from the stg_UserGroupList table.
The update is essentially a merge operation that inserts rows for any new users and their group memberships, and deletes users and their group memberships that are no longer in Active Directory. The stored procedure is shown below:

-- Step 1
UPDATE dbo.stg_UserGroupList
SET FK_DimUser = u.DimUserID
FROM dbo.stg_UserGroupList f
JOIN dbo.DimUser u ON u.UserName = f.DomainUser

UPDATE dbo.stg_UserGroupList
SET FK_DimRole = r.DimRoleID, FK_DimPlan = r.FK_DimPlan
FROM dbo.stg_UserGroupList f
JOIN dbo.DimRole r ON r.Role = f.[Group]

-- Step 2
INSERT INTO dbo.DimUser (UserName)
SELECT DISTINCT DomainUser
FROM dbo.stg_UserGroupList
WHERE FK_DimRole IS NOT NULL
AND FK_DimUser IS NULL

-- Step 3
UPDATE dbo.stg_UserGroupList
SET FK_DimUser = u.DimUserID
FROM dbo.stg_UserGroupList f
JOIN dbo.DimUser u ON u.UserName = f.DomainUser

-- Step 4
INSERT INTO dbo.DimUserRole (FK_DimUser, FK_DimRole)
SELECT f.FK_DimUser, f.FK_DimRole
FROM dbo.stg_UserGroupList f
LEFT JOIN dbo.DimUserRole r ON r.FK_DimUser = f.FK_DimUser
    AND r.FK_DimRole = f.FK_DimRole
WHERE f.FK_DimUser IS NOT NULL
AND f.FK_DimRole IS NOT NULL
AND r.DimUserRoleID IS NULL

-- Step 5
DELETE FROM dbo.DimUser
WHERE DomainUser NOT IN (
    SELECT DISTINCT DomainUser FROM dbo.stg_UserGroupList
)

-- Step 6
DELETE FROM dbo.DimUserRole
FROM dbo.DimUserRole r
LEFT JOIN dbo.stg_UserGroupList f ON f.FK_DimUser = r.FK_DimUser
    AND f.FK_DimRole = r.FK_DimRole
WHERE f.AccountName IS NULL

The main points about the above stored procedure are:

Step 1 looks up the primary key values for the DimUser and DimRole tables and saves them in the FK_DimUser and FK_DimRole columns. When the FK_DimUser column is NULL we have a new user; when the column isn't NULL we have an existing user.

Step 2 inserts any new users into the DimUser table. Note that we only insert new users if they are in a role in the DimRole table.

Step 3 looks up the primary key value for the DimUser table and saves it in the FK_DimUser column. This is done to get the primary key of any users that were added in Step 2.

Step 4 inserts any new user role assignments into the DimUserRole table. Note that a LEFT JOIN is used because we only want to insert rows that are not already in the table.


Step 5 deletes any rows from the DimUser table that have been removed from Active Directory; i.e. any user not in the staging table.

Step 6 deletes any rows from the DimUserRole table where the user is no longer in the Active Directory group.
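The insert-new/delete-removed pattern in Steps 2 through 6 is essentially set arithmetic. A minimal Python sketch of the same synchronization logic, with illustrative data that is not from the tip:

```python
def sync_memberships(current, staged):
    """Mirror the stored procedure's merge on (user, role) pairs:
    add staged pairs not yet present (Step 4) and drop pairs that are
    no longer staged (Step 6)."""
    to_insert = staged - current   # LEFT JOIN ... WHERE r.DimUserRoleID IS NULL
    to_delete = current - staged   # LEFT JOIN ... WHERE f.AccountName IS NULL
    return (current | to_insert) - to_delete

current = {("VS\\jones", "Senior Leadership"), ("VS\\doe", "Analyst")}
staged = {("VS\\jones", "Senior Leadership"), ("VS\\smith", "Analyst")}
print(sorted(sync_memberships(current, staged)))
```

The result is exactly the staged set, which is the invariant the stored procedure maintains for DimUserRole after each run.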

Implementing KPI Security

Let's finish up by implementing a common security requirement. Assume we want to only allow users in a certain role to see KPI values in detail. All roles can see the KPI values for the company, but only users in a certain role can see KPIs at the office level of detail. We can implement this security by creating a table-valued function that will take the user as a parameter, look up his role, then return the list of KPI values that the user is allowed to see. We'll use the following schema to implement the KPI values:

The FK_DimOffice column in the FactKPIValue table will have a value of 1 when the row is the KPI value for the entire company. We will use the following table-valued function to implement the KPI security:

CREATE FUNCTION [dbo].[udf_get_filtered_kpi] (
    @username NVARCHAR(256)
)
RETURNS @kpi_t TABLE (
    FactKPIID INT,
    FK_DimKPI INT,
    FK_DimOffice INT,
    KPIValue DECIMAL(18,2)
)
AS
BEGIN
    DECLARE @UserID INT
    DECLARE @ShowDetails BIT

    SELECT @UserID = DimUserID
    FROM dbo.DimUser
    WHERE DomainUser = @username

    IF @UserID IS NULL
        RETURN

    IF EXISTS (
        SELECT *
        FROM dbo.DimUserRole u
        JOIN dbo.DimRole r ON r.DimRoleID = u.FK_DimRole
        WHERE u.FK_DimUser = @UserID
        AND r.[Role] = 'Senior Leadership'
    )
        SET @ShowDetails = 1
    ELSE
        SET @ShowDetails = 0

    -- everyone gets to see the kpi values for the company
    INSERT INTO @kpi_t (FactKPIID, FK_DimKPI, FK_DimOffice, KPIValue)
    SELECT FactKPIID, FK_DimKPI, FK_DimOffice, KPIValue
    FROM dbo.FactKPIValue
    WHERE FK_DimOffice = 1

    -- 'Senior Leadership' role can see the detailed kpis
    IF @ShowDetails = 1
        INSERT INTO @kpi_t (FactKPIID, FK_DimKPI, FK_DimOffice, KPIValue)
        SELECT FactKPIID, FK_DimKPI, FK_DimOffice, KPIValue
        FROM dbo.FactKPIValue
        WHERE FK_DimOffice > 1

    RETURN
END

Main points about the above function:

The @username parameter must be formatted as DOMAIN\USERNAME. This is the format of the User!UserID global variable in a SQL Server Reporting Services report.

Every user can see the company KPI values. If the user is in the Senior Leadership role then he can see the KPI details at the office level.
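The branching in the function reduces to a single predicate per row. A hedged Python sketch of the same filter rule, with role names and sample rows that are purely illustrative:

```python
def filter_kpis(user_roles, kpi_rows, detail_role="Senior Leadership"):
    """Company-wide rows (office id 1) are visible to everyone;
    office-level rows (office id > 1) only when the user holds the
    detail role, mirroring the @ShowDetails logic above."""
    show_details = detail_role in user_roles
    return [row for row in kpi_rows
            if row["office"] == 1 or (show_details and row["office"] > 1)]

rows = [{"kpi": "Revenue", "office": 1, "value": 100.0},
        {"kpi": "Revenue", "office": 2, "value": 40.0}]
print(len(filter_kpis({"Senior Leadership"}, rows)))  # 2
print(len(filter_kpis({"Analyst"}, rows)))            # 1
```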

Here is the sample output from executing the function with a user that is in the Senior Leadership role and another user that isn't in the role:

select * from dbo.udf_get_filtered_kpi('VS\jones')
select * from dbo.udf_get_filtered_kpi('VS\smith')


Problem

We use the FTP task in SSIS to process a number of files from an FTP server. We would like to implement a step in our SSIS packages that would retrieve the list of files that are available on the FTP server before we try to process them. The FTP task doesn't have an operation that will retrieve the list of files. Can you provide an example of how to do this?

Solution

All of the code for this tip can be downloaded here. The FTP task provides the following operations as shown below:

As you have noted, there is no operation to retrieve a list of files. You can use the Script task to do this. Here is a sample SSIS package that we will review to get the list of files from an FTP server:


The main points about the above SSIS package are:


Get Ftp File Listing is a Script task that will retrieve the list of files available on the FTP server. It will return the list of available files as an XML document.

Stage Ftp File Listing is an Execute SQL task which will call a stored procedure to insert the list of files from the XML document into a table.

SSIS Package Setup

Before digging in to the package tasks shown above, let's discuss the package setup. The following variables are defined in the SSIS package:

The variables are used as follows:


FtpFileListXML will be populated with the list of files available on the FTP server by the Get Ftp File Listing Script task

FtpFileType is used to specify the type of file

FtpServer is the URL of the FTP server; e.g. ftp.yourservername.com

FtpWorkingDirectory is the directory or folder on the FTP server to get the list of files from

The FtpFileType, FtpServer, and FtpWorkingDirectory variables will be set on the command line when we run the package. This will allow our package to retrieve the list of files from any of our FTP servers.

The package uses the following Connection Managers:



OLE DB Connection called Staging which is the SQL Server database where the list of files from the FTP server will be written

FTP Connection which defines the FTP server; the ServerName property is set to the FtpServer package variable

Get Ftp File Listing

Let's take a look at the Script task that is used to get the list of files from the FTP server. The Script page of the editor is shown below:

The FtpWorkingDirectory variable allows the package to specify the folder on the FTP server to query for the list of files. The script code will populate the FtpFileListXML variable with the list of files available on the FTP server as an XML document. Now let's review the actual code in the Script task:


The main points about the above code are as follows:


Step 1: declare a StringBuilder variable to build up the XML document containing the list of available files on the FTP server.

Step 2: set up the FtpClientConnection object using an FTP Connection Manager defined in the package; the FtpClientConnection class is provided in the Microsoft.SqlServer.Dts.Runtime namespace; the ftpFileNames array will be populated with the list of files; the ftpFolderNames array will be populated with the list of subfolders.

Step 3: connect to the FTP server and get the list of files and subfolders from the folder specified by the FtpWorkingDirectory variable.

Step 4: iterate through the ftpFileNames array and build an XML document with the list of file names.

Step 5: set the FtpFileListXML package variable to the XML document containing the list of file names.
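Steps 4 and 5 amount to wrapping each file name in an element, and the Stage Ftp File Listing step later shreds those elements back out. A Python sketch of both halves follows; the file/name element and attribute names are inferred from the //file and @name XPath in the usp_PutFtpFileList procedure, while the <files> root element is my own assumption:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def build_file_list_xml(file_names):
    """Steps 4/5: one <file name="..."/> element per FTP file,
    producing the document stored in the FtpFileListXML variable."""
    parts = ["<files>"]
    for name in file_names:
        parts.append("<file name=%s/>" % quoteattr(name))
    parts.append("</files>")
    return "".join(parts)

def shred_file_list(xml_text):
    """Mirror the stored procedure's shredding: every //file's @name."""
    return [el.get("name") for el in ET.fromstring(xml_text).iter("file")]

xml_doc = build_file_list_xml(["a.txt", "b.txt"])
print(shred_file_list(xml_doc))  # ['a.txt', 'b.txt']
```

Building and shredding round-trip cleanly, which is the property the package relies on when handing the XML from the Script task to the Execute SQL task.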

Stage Ftp File Listing


Let's take a look at the Execute SQL task that inserts the list of files available on the FTP server into a table. The create table script is as follows:

CREATE TABLE dbo.stg_FtpFileList (
    FileType nvarchar(50) NOT NULL,
    FileName nvarchar(50) NOT NULL
)

The following stored procedure is used to shred the XML document created in the Get Ftp File Listing task and insert the available files into the table above:

CREATE PROCEDURE [dbo].[usp_PutFtpFileList]
    @fileType nvarchar(50),
    @xml xml
AS
BEGIN
    SET NOCOUNT ON;

    DELETE FROM [dbo].[stg_FtpFileList]
    WHERE [FileType] = @fileType

    INSERT INTO [dbo].[stg_FtpFileList] (
        [FileType],
        [FileName]
    )
    SELECT @fileType,
        doc.col.value('@name', 'nvarchar(50)') filename
    FROM @xml.nodes('//file') doc(col)
END

The above stored procedure takes advantage of new XML capabilities provided in SQL Server 2005; take a look at our previous tip Replacing OPENXML with the XML nodes() Function in SQL Server 2005 for the details.

Running the Package

To run the package use DTEXEC and specify values for the FtpFileType, FtpServer, and FtpWorkingDirectory variables on the command line; e.g. here is a batch file that you could use:

SET FTPSERVER=ftp.yourserver.com
SET FTPFILETYPE=XXX
SET FTPWORKINGDIRECTORY=/yourftpfolder
DTEXEC /FILE GetFtpFileListing.dtsx [variable assignments go here]

The entire DTEXEC command must be specified on a single line; replace [variable assignments go here] with the following:


/SET \Package.Variables[FtpFileType].Properties[Value];%FTPFILETYPE%
/SET \Package.Variables[FtpServer].Properties[Value];%FTPSERVER%
/SET \Package.Variables[FtpWorkingDirectory].Properties[Value];%FTPWORKINGDIRECTORY%

After running the package you can query the stg_FtpFileList table to get the list of files available in a particular folder on an FTP server; e.g.:

Next Steps

While SSIS provides quite a few useful tasks, occasionally you need something more. The Script task is likely a good alternative as it allows you to execute any VB.NET code inside of your SSIS package. Take a look at the sample code here to experiment on your own.

The FtpClientConnection class used in the above example has other properties and methods that you may find useful; you can get the details here.

The FTP task also has other properties that you may need to set such as user name, password, etc.

Problem

I'm trying to build an SSIS package where the entire package is encapsulated in a transaction. In addition there is a table that needs to remain locked for the duration of the SSIS package execution. Can you provide an example of how to do this?

Solution

The transaction handling that is built in to SSIS can easily support your requirements. Before we get into the specifics of implementing this in SSIS, let's discuss the transaction isolation level and transactions in SSIS at a high level, then walk through an example of using transactions in an SSIS package to solve your problem.

Transaction Isolation Levels

The transaction isolation level determines the duration that locks are held. We'll use SQL Server as an example. The following transaction isolation levels are available in SQL Server:


READ UNCOMMITTED - reads do not acquire shared locks and they don't wait on locks. This is often referred to as a dirty read because you can read modified data that hasn't been committed yet and it could get rolled back after you read it.

READ COMMITTED - reads acquire shared locks and wait on any data modified by a transaction in process. This is the SQL Server default.

REPEATABLE READ - same as READ COMMITTED but in addition shared locks are retained on rows read for the duration of the transaction. In other words any row that is read cannot be modified by another connection until the transaction commits or rolls back.

SERIALIZABLE - same as REPEATABLE READ but in addition no other connection can insert rows if the new rows would appear in a SELECT statement already issued. In other words if you issue a select statement in a transaction using the SERIALIZABLE isolation level you will get the exact same result set if you issue the select statement again within the same transaction.

SQL Server 2005 added two new options:

A variation on READ COMMITTED where you set READ_COMMITTED_SNAPSHOT ON at the database level; any transaction that uses the READ COMMITTED isolation level will not acquire shared locks and will not wait on any locks. Rather, you will get the committed version of all rows at the time the SELECT statement begins.

A new isolation level called SNAPSHOT where you set ALLOW_SNAPSHOT_ISOLATION ON at the database level; any transaction that explicitly sets the transaction isolation level to SNAPSHOT will not acquire shared locks and will not wait on any locks. Rather, you will get the committed version of all rows at the time the transaction begins.

Both of the above SQL Server 2005 enhancements are made possible by maintaining committed versions of rows in tempdb (referred to as the version store). When a read encounters a row that has been modified and not yet committed, it retrieves the latest appropriate committed row from the version store. The maintenance and traversal of the version store is performed by SQL Server automatically; there are no code changes required. Transactions in SSIS Transaction support is built into SSIS. The TransactionOption property exists at the package level, container level (e.g. For Loop, Foreach Loop, Sequence, etc.), as well as on just about any Control Flow task (e.g. Execute SQL task, Data Flow task, etc.). TransactionOption can be set to one of the following:

Required - if a transaction exists join it, else start a new one
Supported - if a transaction exists join it (this is the default)
NotSupported - do not join an existing transaction

The built-in transaction support in SSIS makes use of the Distributed Transaction Coordinator (MSDTC) service which must be running. MSDTC also allows you to perform distributed transactions; e.g. updating a SQL Server database and an Oracle database in the same transaction. If you execute an SSIS package that utilizes the built-in transaction support and MSDTC is not running, you will get an error message like the following: Error: 0xC001401A at Transaction: The SSIS Runtime has failed to start the distributed transaction due to error 0x8004D01B "The Transaction Manager is not available.". The DTC transaction failed to start. This could occur because the MSDTC Service is not running.


Note that SSIS package elements also have an IsolationLevel property, with a default of Serializable. As discussed above in the section on Transaction Isolation Levels, this setting impacts the duration of locks as well as whether shared locks are acquired. SSIS Package Example Let's take a look at a sample SSIS package that we will use to demonstrate how to implement transactions at the package level and lock a table for the duration of the package's execution:

The Test Initialization sequence container is used to create a test environment. Two tables are created (TranQueue and TranQueueHistory) and a row is inserted into TranQueue. This will allow us to simulate a process where the SSIS package processes a group of rows inside of a transaction. The TransactionOption setting for the Test Initialization sequence container is NotSupported since it only exists to create the test environment; i.e. we don't need any transaction support here which would roll back any successful steps in the event of a failure.

The Process sequence container has its TransactionOption set to Supported; since the package setting for TransactionOption is set to Required, a transaction is created at the package level and the container will join that transaction.

Process TranQueue is an Execute SQL task that executes the following SQL command to simulate processing a group of rows in the TranQueue table:

DELETE TOP(10) dbo.TranQueue
OUTPUT DELETED.* INTO dbo.TranQueueHistory
FROM dbo.TranQueue WITH (TABLOCKX)

The main points about this SQL command are:

It deletes the first ten rows from the TranQueue table to simulate pulling them out for processing


It uses the OUTPUT clause to insert the message column of each deleted row into the TranQueueHistory table to simulate processing has completed and history is being updated It uses the TABLOCKX table hint to lock the TranQueue table
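The downloadable project creates the test objects for you; purely as an illustration, the Test Initialization steps might amount to something like the following (these column definitions are an assumption based on the message column and timestamp mentioned below, not the actual script):

```sql
-- Hypothetical schema for the two test tables used by the package;
-- TranQueueHistory mirrors TranQueue so OUTPUT DELETED.* lines up
CREATE TABLE dbo.TranQueue (message varchar(50), createdate datetime DEFAULT GETDATE());
CREATE TABLE dbo.TranQueueHistory (message varchar(50), createdate datetime);
-- Seed one row to process
INSERT INTO dbo.TranQueue (message) VALUES ('Test Message');
```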

The Placeholder for Breakpoint Execute SQL task does not execute a command; it's there so we can set a breakpoint and run some queries while the package is running and the transaction is open (discussed below). The Simulate Failure Execute SQL task is executed if the package variable v_SimulateFailure = 1; it does a SELECT 1/0 to generate an error (i.e. a divide by zero) which will cause a rollback of the package transaction.

The above example is intentionally short just for demonstration purposes. You can certainly have multiple tasks in the Process sequence container, all of which would participate in the transaction, and either all succeed or none succeed (i.e. rollback on failure). You can download the project containing the sample SSIS package here. The package is hardcoded to use a local database named mssqltips; create it if it doesn't exist. Open the project using SQL Server Business Intelligence Development Studio (BIDS) and double click on the package Transaction.dtsx. Follow these steps to see the transaction handling in an SSIS package:

Make sure the value of the variable v_SimulateFailure = 1; this will demonstrate the rollback.
Make sure there is a breakpoint on the Placeholder for Breakpoint Execute SQL task.
Execute the package; your screen should look like this (stopping at the breakpoint):

Open a new query window in SQL Server Management Studio, connect to the mssqltips database and execute the command below. You should see a single row result set; e.g. Test Message 2008-09-08 14:22:31.043 (your date and time will be different of course). The NOLOCK hint ignores locks; the row you see is not committed yet.

SELECT * FROM dbo.TranQueueHistory WITH (NOLOCK)

Open another new query window in SQL Server Management Studio, connect to the mssqltips database and execute the command below. You will be blocked waiting for the transaction executing in the SSIS package to either rollback or commit since we added the
TABLOCKX hint which will keep the TranQueue table locked for the duration of the transaction. Alternatively you could issue an INSERT INTO the dbo.TranQueue table and you will see that it also is blocked until the transaction either commits or does a rollback. SELECT * FROM dbo.TranQueue
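While stopped at the breakpoint you can also see the blocking directly; one way, using the SQL Server 2005 dynamic management views, is:

```sql
-- Shows requests that are currently blocked and the session blocking them
SELECT session_id, blocking_session_id, wait_type
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
```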

Click Continue in BIDS (or click Debug on the top-level menu then Continue) and you will see the package fail. Execute the SELECT statement above on the TranQueueHistory table again and you will see no rows; the SELECT statement above on the TranQueue table will complete, showing a single row. Thus the error caused the transaction to roll back. After the rollback the deleted row(s) in the TranQueue table are restored and the inserted row(s) in the TranQueueHistory table are not committed (i.e. they will disappear).

You can change the value of the v_SimulateFailure variable to 0 and run the package and queries above again to validate that the transaction commit works as we expect.

Problem We are looking to automate the processing of our SQL Server Analysis Services dimensions and cubes. We'd like to add this processing to our existing SQL Server Integration Services (SSIS) packages which periodically update our data warehouse from our OLTP systems. Can you give us the details on how the Analysis Services Processing Task can be used in an SSIS package? Solution The Analysis Services Processing Task allows you to process dimensions, measure group partitions, and mining models in an SSIS package. While you can process all of these objects at one time, you can also select a subset of these objects to be processed. For example you may update certain dimension and fact tables in your data warehouse on a periodic basis by running an SSIS package. As a final step in the SSIS package, you would like to process just the dimensions and measure group partitions that use those data warehouse tables as their data source. The Analysis Services Processing Task allows you to do that. In this tip we will walk through the steps to use the Analysis Services Processing Task in an SSIS package. We'll create a sample package that will process a dimension and a measure group partition in the Adventure Works DW Analysis Services database that comes with SQL Server 2005. Our hypothetical scenario is that we run an SSIS package to update the Product and Currency Rate tables in our data warehouse on a daily basis. We would like to add a step to the SSIS package to process the Product dimension and the Currency Rate fact table, thereby updating the information available in our SQL Server Analysis Services cube.
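Under the covers the task sends an XMLA Process command to the Analysis Services server. A fragment along these lines illustrates the shape of such a command (the DimensionID value here is an assumed ID for illustration, not taken from this tip):

```xml
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
    <DimensionID>Dim Product</DimensionID>
  </Object>
  <Type>ProcessUpdate</Type>
</Process>
```

You never have to hand-author this when using the task; the editor builds the command from the objects and Process Options you select.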
Create the Sample SSIS Package To begin, launch Business Intelligence Development Studio (BIDS) from the Microsoft SQL Server 2005 program group and create a new Integration Services project. An SSIS package named Package.dtsx will be created automatically and added to the project. Rename the package to SSASProcessingTask_Demo.dtsx then perform the following steps on the SSIS package: Step 1: Add a Connection Manager for the SSAS server. Right click in the Connection Managers area and select New Analysis Services Connection from the context menu. Accept the defaults in the dialog to connect to the local SSAS Server (or edit as appropriate if you want to connect to an SSAS Server on another machine):


Step 2: Drag and drop the Analysis Services Processing Task from the Toolbox onto the Control Flow of the SSIS package. Edit the Analysis Services Processing Task; select the connection manager defined in step 1 above and click the Add button to select the objects to be processed:

The Process Options selected work as follows:


Process Incremental on a measure group partition is used to load just new rows from the fact table. It requires additional settings which we will complete in the next step.
Process Update for a dimension will update the dimension with any inserts, updates or deletes from the data warehouse.

Step 3: Click the Configure hyperlink in the Currency_Rates row shown in step 2 above. Since we have selected Process Incremental as the Process Option we need to either specify a table or view to load the new fact rows from or specify a query; we'll specify a query and assume that the stg_FactCurrencyRate table is populated with just the new fact rows to be added to the measure group partition.


Next Steps

If you don't already have the AdventureWorks SSAS sample projects and databases available, you can download them here to get the starting point for this tip. Click the AdventureWorksBICI.msi link. Also click on the Release Notes link for the details on attaching the relational database. The default install location for the project is C:\Program Files\Microsoft SQL Server\90\Tools\Samples\AdventureWorks Analysis Services Project; you will see Enterprise and Standard folders. We used the project in the Enterprise folder.
Take a look at the technical article Analysis Services 2005 Processing Architecture for an in-depth discussion of the processing options available for cubes, dimensions, and mining models.
You can download the sample SSIS project created in this tip here.

Problem We are looking to automate some tasks to be performed on our SQL Server Analysis Services Servers. Can you give us the details on how the Analysis Services Execute DDL Task can be used in a SQL Server Integration Services (SSIS) package? Solution The Analysis Services Execute DDL Task is a very useful one, allowing you to do just about anything with a SQL Server Analysis Services instance. For example you could backup a database, process a cube, create a partition, merge partitions, etc. You specify commands to be executed using XML for Analysis (XMLA) which is the native XML protocol for all interaction between a client
application and a Microsoft SQL Server Analysis Services instance. You can find all of the details about XMLA in Books Online; just search on XMLA. A key point to keep in mind is that you can use SQL Server Management Studio (SSMS) to generate a script for just about anything you need to do. For instance you can connect to a SQL Server Analysis Services server, right click on a database, then select Back Up from the context menu. You can then click the Script button on the Backup Database dialog to generate the XMLA script to perform the backup. You can run this XMLA script from an SSIS package by using the Analysis Services Execute DDL Task. The benefit of creating the SSIS package is that you now have a repeatable process that you can run on demand or schedule via SQL Server Agent. In this tip we will walk through the steps to use the Analysis Services Execute DDL Task in an SSIS package. We'll create a sample package that will perform a backup of the Adventure Works DW Analysis Services database that comes with SQL Server 2005. Create the Sample SSIS Package To begin, launch Business Intelligence Development Studio (BIDS) from the Microsoft SQL Server 2005 program group and create a new Integration Services project. An SSIS package named Package.dtsx will be created automatically and added to the project. Rename the package to SSASExecuteDDLTask_Demo.dtsx then perform the following steps on the SSIS package: Step 1: Add a Connection Manager for the SSAS server. Right click in the Connection Managers area and select New Analysis Services Connection from the context menu. Accept the defaults in the dialog to connect to the local SSAS Server (or edit as appropriate if you want to connect to an SSAS Server on another machine):

Step 2: Add a string variable to the package; we will use this variable to contain the XMLA script to perform the backup. Right click on the Control Flow, select Variables from the context menu, then enter the variable as follows:


Step 3: Drag and drop the Script Task from the Toolbox onto the Control Flow of the SSIS package. Edit the Script Task and add the package variable created in Step 2 above to the ReadWriteVariables property. We will assign the XMLA script to this variable in the next step.

Step 4: Click the Design Script button in the Script Task Editor and enter the following script (remember you can generate the XMLA using SSMS):

Public Sub Main()
    Dim backupfilename As String = "AdventureWorksDW_" + Now().ToString("MMddyyyy") + ".abf"
    Dim xml As String = _
        "<Backup xmlns=""http://schemas.microsoft.com/analysisservices/2003/engine"">" + _
        "<Object>" + _
        " <DatabaseID>Adventure Works DW</DatabaseID>" + _
        "</Object>" + _
        "<File>${BACKUPFILENAME}</File>" + _
        "</Backup>"
    Dts.Variables("User::v_XMLA").Value = xml.Replace("${BACKUPFILENAME}", backupfilename)
    Dts.TaskResult = Dts.Results.Success
End Sub

This is just an example of how you might fine-tune the XMLA that you generate with SSMS. The backup file name is modified to include the current date. The resulting XMLA is stored in the package variable named v_XMLA. The use of ${BACKUPFILENAME} for the text to replace is purely arbitrary, but hopefully intuitive. Step 5: Drag and drop the Analysis Services Execute DDL Task from the Toolbox onto the Control Flow of the SSIS package and connect it to the Script Task configured earlier. Open the Analysis Services Execute DDL Task editor, click on DDL in the list box on the left, and set the properties as follows:


The XMLA to execute is defined in the package variable that we setup in the previous step. At this point the SSIS package will look like this:

At this point you can execute the SSIS package and you will see the backup file created; the default location is specified in the BackupDir property for the Analysis Server; e.g. C:\Program Files\Microsoft SQL Server\MSSQL.2\OLAP\Backup.

Problem We routinely load data warehouses with multiple years' worth of fact rows at a time. We'd like to perform this process in batches and be able to restart at the point of failure when an error occurs. Can you give us an example of how we might implement this batching capability in an SSIS package? Solution SSIS supports batch processing very nicely with the existing components in the Toolbox. A simple approach to implementing batch processing in SSIS is to come up with a way to group the rows to be processed into batches, process each batch, then update each group as processed. Let's begin by describing a scenario then implement an SSIS package to perform the work. A common requirement in developing reporting applications is to aggregate data, allowing report queries to run very quickly. Let's assume that we want to aggregate data by month. We also want
to have the capability to make adjustments to the aggregated data and only recalculate the month(s) that have adjustments. We can envision an SSIS package with the following steps:

Get Batch List is an Execute SQL task that groups the source data to be processed into batches, creating a result set that contains a single row per batch
Process Batch Loop is a Foreach Loop container that iterates over the result set rows; i.e. executes once for each row in the result set
Transaction Container is a Sequence container that contains the tasks to be executed for each iteration of the loop; it controls the transaction used to commit if successful or rollback on error
Append Batch to Sales History is an Execute SQL task that extracts a batch of rows and inserts them to a history table
Compute Aggregation is an Execute SQL task that performs the aggregations on the batch and updates an aggregation table
Mark Batch as Processed is an Execute SQL task that updates rows in the source table to indicate that they have been processed

In the following sections we will discuss each step in the SSIS package in detail. Let's begin with the setup then proceed through the steps. Setup


For simplicity's sake we'll get our source data from the AdventureWorks sample database that comes with SQL Server 2005. Use the following script to copy the SalesOrderHeader and SalesOrderDetail tables from AdventureWorks into a database called mssqltips (create this database if it doesn't exist):

USE mssqltips
GO
SELECT * INTO dbo.imp_SalesOrderHeader FROM AdventureWorks.Sales.SalesOrderHeader
SELECT * INTO dbo.imp_SalesOrderDetail FROM AdventureWorks.Sales.SalesOrderDetail
ALTER TABLE dbo.imp_SalesOrderHeader ADD Processed bit not null default 0
GO

The Processed column will be updated to 1 as the rows are processed. In the SSIS package the following variables will be used:

We'll describe the variable usage in the sections below. Get Batch List Get Batch List executes a stored procedure that groups the source data into batches. While there are many ways to accomplish this task, in this case we simply group on year and month in the stored procedure stp_CreateOrderBatchList:

SELECT DATEPART(YYYY,OrderDate) OrderYear
      ,DATEPART(MONTH,OrderDate) OrderMonth
FROM dbo.imp_SalesOrderHeader
WHERE Processed = 0
GROUP BY DATEPART(YYYY,OrderDate)
        ,DATEPART(MONTH,OrderDate)
ORDER BY DATEPART(YYYY,OrderDate)
        ,DATEPART(MONTH,OrderDate)
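You can run the procedure standalone in a query window to preview the batch list it produces:

```sql
-- Returns one row per unprocessed year/month combination
EXEC dbo.stp_CreateOrderBatchList;
```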


Note that the stored procedure only gets rows where the Processed column is equal to zero. The Execute SQL task in our SSIS package executes the above stored procedure. The General properties page is shown below:

Note the ResultSet property is set to Full result set. The stored procedure just does a SELECT and the Execute SQL task stores the result set in a package variable. The Result Set properties page maps the result set to the package variable User::v_BatchList. The variable type must be System.Object. The Result Name of 0 (zero) is required.

Process Batch Loop Process Batch Loop is a Foreach Loop container that iterates over the result set created in Get Batch List, one time for each row in the result set. There are two property pages to be configured: Collection and Variable Mappings. The Collection property page has the following settings:


In order to iterate through the result set created by Get Batch List, the Enumerator is set to Foreach ADO Enumerator and the ADO object source variable is set to User::v_BatchList. Get Batch List mapped the User::v_BatchList variable to the result set. The Enumeration mode is set to Rows in the first table (there is only one table in the result set). The Variable Mappings property page has the following settings:

The stored procedure executed in Get Batch List returns a result set that has two columns: OrderYear and OrderMonth. The Variable Mappings property page maps the columns in each row of the result set to the package variables based on the ordinal position of the column (the first column is 0). Transaction Container The Transaction Container is a Sequence container. The tasks inside of the container are all executed in a transaction. They either all succeed and are committed or they are rolled back on error. Set the TransactionOption property of the Sequence container to Required; this setting executes all tasks inside the container in the context of a transaction. A new transaction is created each time through the loop. Append Batch to Sales History Append Batch to Sales History is an Execute SQL task that calls a stored procedure to extract a single batch of data from the source table and append it to the sales history table. If
transformations were required we would use a Data Flow task. The sales history table and stored procedure are as follows:

CREATE TABLE dbo.SalesHistory (
    OrderYear int not null,
    OrderMonth int not null,
    ProductID int not null,
    OrderQty smallint not null,
    LineTotal money not null
)

CREATE PROCEDURE dbo.stp_AppendSalesHistory
    @OrderYear int
    ,@OrderMonth int
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.SalesHistory (
        OrderYear
        ,OrderMonth
        ,ProductID
        ,OrderQty
        ,LineTotal
    )
    SELECT DATEPART(YYYY,m.OrderDate)
          ,DATEPART(MONTH,m.OrderDate)
          ,d.ProductID
          ,d.OrderQty
          ,d.LineTotal
    FROM dbo.imp_SalesOrderHeader m
    JOIN dbo.imp_SalesOrderDetail d ON d.SalesOrderID = m.SalesOrderID
    WHERE Processed = 0
      AND DATEPART(YYYY,m.OrderDate) = @OrderYear
      AND DATEPART(MONTH,m.OrderDate) = @OrderMonth
END
GO

Note that the stored procedure only gets rows where the Processed column is equal to zero. The General property settings for the Execute SQL task are as follows:


The Parameter Mapping property settings for the Execute SQL task are as follows:

In the above settings the SQLStatement is set to execute the stored procedure, with placeholders for the required parameters. SSIS package variables are mapped to the parameters based on the ordinal number of the parameters in the stored procedure. Compute Aggregation Compute Aggregation is an Execute SQL task that recalculates the summary data in the sales history summary table for the order year and order month batch being processed. The sales history summary table and stored procedure are as follows:

CREATE TABLE dbo.SalesHistorySummary (
    OrderYear int not null,
    OrderMonth int not null,
    ProductID int not null,
    OrderQty smallint not null,
    LineTotal money not null
)

CREATE PROCEDURE dbo.stp_CalcSalesHistorySummary
    @OrderYear int
    ,@OrderMonth int
AS
BEGIN
    SET NOCOUNT ON;
    DELETE FROM dbo.SalesHistorySummary
    WHERE OrderYear = @OrderYear
      AND OrderMonth = @OrderMonth;
    INSERT INTO dbo.SalesHistorySummary (
        OrderYear
        ,OrderMonth
        ,ProductID
        ,OrderQty
        ,LineTotal
    )
    SELECT OrderYear
          ,OrderMonth
          ,ProductID
          ,SUM(OrderQty)
          ,SUM(LineTotal)
    FROM dbo.SalesHistory
    WHERE OrderYear = @OrderYear
      AND OrderMonth = @OrderMonth
    GROUP BY OrderYear
            ,OrderMonth
            ,ProductID
END
GO

The above stored procedure first deletes any rows in the summary table for the order year and month being processed, then performs the aggregation and insert. The Execute SQL task property settings are the same as in Append Batch to Sales History except for the name of the stored procedure to execute; we'll skip showing the screen shots. Mark Batch as Processed Mark Batch as Processed is an Execute SQL task that updates the Processed column in the source table for the rows that have been processed in the current batch. It invokes the following stored procedure:

CREATE PROCEDURE dbo.stp_MarkOrdersProcessed
    @OrderYear int
    ,@OrderMonth int
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.imp_SalesOrderHeader
    SET Processed = 1
    WHERE DATEPART(YYYY,OrderDate) = @OrderYear
      AND DATEPART(MONTH,OrderDate) = @OrderMonth;
END
GO
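Before wiring the three procedures into the package, they can be smoke-tested in sequence for a single batch from a query window (the year and month values are just examples):

```sql
EXEC dbo.stp_AppendSalesHistory @OrderYear = 2003, @OrderMonth = 7;
EXEC dbo.stp_CalcSalesHistorySummary @OrderYear = 2003, @OrderMonth = 7;
EXEC dbo.stp_MarkOrdersProcessed @OrderYear = 2003, @OrderMonth = 7;
```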


The Execute SQL task property settings are also the same as before except for the name of the stored procedure; we will skip the screen shots. Summary Let's highlight the key points in our sample SSIS package that implements batch processing:

Group the source data into batches; use the Execute SQL task and create a Full result set which saves the result set in a package variable.
Iterate over the result set using a Foreach Loop Container.
Use a Sequence Container to define a transaction and add the appropriate tasks inside the Sequence Container.
The package design supports restarting the package at the beginning in the event of any error; the Sequence Container commits all work if successful or does a rollback if there are any errors. The source data is flagged when processed so it will not be processed again if the package is restarted.

Problem When developing SQL Server Integration Services (SSIS) packages there is sometimes the need to only run certain steps or paths in the package execution, either based on time period or maybe a parameter value that is passed to the package or queried from the database. How do you set up an SSIS package to have different execution paths based on a parameter value? Solution This is a pretty easy process to set up, but maybe not as intuitive as you might think. There is often the need to have one package do several things, but only have certain steps run at certain times. By setting a variable, one of many paths can be taken based on its value. Here is an example of a simple package that just has four Execute SQL Tasks. Each task is just doing a SELECT to illustrate this example; there is nothing unique about this code or this task. The examples below should be able to be used for any of the tasks that are available in SSIS. For this example there are four Execute SQL Tasks with two different paths that can be taken and then control is passed back to Task 3 to complete the package.

If we execute the package we can see that all four tasks execute and complete. This is great if this is what you want to do, but for our example we want to only run Task 2a or Task 2b based on a parameter value.


Here we set up a package variable called "test", but this could be called anything. We are making this a Boolean data type with a value of "False".

If we double click on the workflow arrow from Task 1 to Task 2a we get the dialog box below. Here we are setting the workflow to go down this path based on an Expression value (first dropdown list) and the Expression equals (@test == True). The @test is the name of the variable we set up above. This expression is saying that if our variable "test" equals "True" then go from Task 1 to Task 2a. We do the same thing for the workflow arrow from Task 1 to Task 2b, but this is set to (@test == False).


After changing both of these and clicking OK the package should look like the following. You will see the colors of the lines changed from green to blue.

If we execute the package now, we can see that Task 1 and Task 2b run (@test = False), but the package never reaches Task 3.


To fix this problem, we need to double click on the workflow arrow from Task 2a to Task 3 and change the option from "Logical AND" to "Logical OR". With the Logical OR we are saying that either Task 2a or Task 2b needs to complete and then control passes on to Task 3.

Once you click OK the package will look like the following. Even though you only changed one of the workflow values, both changed from "Logical AND" to "Logical OR".


At this point if we execute the package again, we see that the flow goes from Task 1 to Task 2b to Task 3 as planned.

To test our package when our variable value equals "True" we just change our variable from "False" to "True"

If we run it now we can see that since our variable is set to "True" control goes from Task 1 to Task 2a to Task3.


To take this a step further, we will store the value for "test" in a table and query the database to get the value. First we create a table and add one row to the table with a value of 0 (False).

CREATE TABLE [dbo].[packageControls](
    [test] [bit] NULL
) ON [PRIMARY]
GO

INSERT INTO dbo.packageControls (test) VALUES (0) -- FALSE

For Task 1 we change the query to get the data from table packageControls and change the ResultSet to "Single row" as highlighted below.


On the Result Set window we "Add" a new "Result Name" equal to "test" that maps to our variable "User::test".


At this point our package should execute Task 1 to Task 2b to Task 3.

If we update our table and set "test" equal to "True" we should go from Task 1 to Task 2a to Task 3.

UPDATE [dbo].[packageControls] SET test = -1 -- TRUE
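A side note on the -1 used above: when any nonzero integer is assigned or converted to the bit data type, SQL Server stores it as 1, which is why -1 behaves as "true" here. You can confirm the conversion directly:

```sql
SELECT CAST(-1 AS bit) AS converted_value; -- returns 1
```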


Problem We have experimented with the Slowly Changing Dimension (SCD) Data Flow Transformation that is available in the SSIS designer and have found a few issues with it. Our major concern is the use of the OLE DB Command Data Flow Transformation for all updates (inferred, type 1 and type 2) to rows in the dimension table. Do you have any suggestions? Solution You have hit upon the one issue with the Slowly Changing Dimension Data Flow Transformation that likely requires an alternative approach. The issue with the OLE DB Command is that it executes a single SQL statement for each row passing through the Data Flow. When the volume of rows is substantial, this creates an unnecessary and probably unacceptable performance hit. Let's take a step back and analyze the task at hand then discuss an alternative solution. The term slowly changing dimensions encompasses the following three different methods for handling changes to columns in a data warehouse dimension table:

Type 1 - update the columns in the dimension row without preserving any change history.
Type 2 - preserve the change history in the dimension table and create a new row when there are changes.
Type 3 - some combination of Type 1 and Type 2, usually maintaining multiple instances of a column in the dimension row; e.g. a current value and one or more previous values.

A dimension that implements Type 2 changes would typically have the following housekeeping columns to identify the current row and the effective date range for each row:

Natural Key - the unique source system key that identifies the entity; e.g. CustomerID in the source system would be called nk_CustomerID in the dimension.
Surrogate Key (or warehouse key) - typically an identity value used to uniquely identify the row in the dimension. For a given natural key there will be an instance of a row for each Type 2 change so the natural key will not be unique in the dimension.
CurrentMember - a bit column to indicate if the row is the current row.
EffectiveDate - a datetime (or smalldatetime) column to indicate when the row became the current row.
ExpirationDate - a datetime (or smalldatetime) column to indicate when the row ceased being the current row.


The effective date range columns retain the history of a natural key in the dimension, allowing us to see the column values at any point in time. Fact table rows can be joined to the dimension row where the fact row transaction date is between the effective date range of the dimension row. When you add the SCD Data Flow Transformation to the Data Flow designer, you step through a wizard to configure the task, and you will wind up with the Slowly Changing Dimension task and everything that follows below being added to the Data Flow designer (the task names generated by the SCD wizard have been updated to add clarification):

Main points about the above screen shot:


The Excel Source is a sample data source representing data extracted from a source system that is used to update a dimension table in a data warehouse.
The Type 1 OLE DB Command task updates dimension rows one at a time by executing an UPDATE statement on the dimension table.
The Type 2 OLE DB Command task "expires" the current dimension rows one at a time (sets the ExpirationDate or CurrentMember flag) by executing an UPDATE statement.


The Insert New OLE DB Destination task inserts a new row into the dimension table when there is a new row in the source system or a Type 2 change.
The Inferred OLE DB Command task performs a Type 1 update to a dimension row that was created with default values as a result of an early arriving fact. An early arriving fact is one where the fact row has a source system key value that does not exist in the dimension; we will discuss Inferred processing in part two of this tip.

Now that we have described how the SCD transformation implements slowly changing dimension processing, we can discuss an alternative solution. As an example we will use a Customer dimension that is updated with source system data in an Excel spreadsheet. The SSIS package Control Flow looks like this:

Main points about the above solution:


Truncate Customer Staging Table is an Execute SQL task that clears out the Customer dimension staging table.
Stage Customer Data from Source System is a Data Flow task that extracts the rows from the Excel spreadsheet, cleanses and transforms the data, and writes the data out to the staging table.
Update Customer Dimension is an Execute SQL task that invokes a stored procedure that implements the Type 1 and Type 2 handling on the Customer dimension.

An additional detail about Type 1 and Type 2 processing is that a dimension may implement both. In other words, some column changes may be handled as Type 1 and other column changes may be handled as Type 2. An elegant way to implement this is to take advantage of the SQL Server CHECKSUM function. CHECKSUM calculates an integer hash value based on the values of every column in a row or a subset of columns; note that the hash is not guaranteed to be unique, so different column values can occasionally produce the same checksum. We can use a hash value comparison to determine whether anything has changed in our list of columns in the staging table versus the dimension table. Let's take a look at our Customer dimension table:


The checksum columns are defined as follows:


[Type1Checksum] AS CHECKSUM([ContactName],[ContactTitle],[Phone],[Fax])
[Type2Checksum] AS CHECKSUM([Address],[City],[Region],[PostalCode],[Country])

There is a separate CHECKSUM value calculated for the list of Type 1 columns and the list of Type 2 columns. In our staging table we have the same two CHECKSUM computed columns; the column lists must match exactly in order for this to work. As a general rule the staging table schema mirrors the dimension table schema but includes a couple of other housekeeping columns as shown below:


The housekeeping columns in staging are as follows:


IsNew is set to 1 if this is a new dimension row.
IsType1 is set to 1 if there is a change to any column handled as Type 1.
IsType2 is set to 1 if there is a change to any column handled as Type 2.
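Since the staging table schema itself appears only in a screenshot, here is a hedged sketch of what its DDL might look like; the column names come from the tip, but the data types are illustrative guesses modeled on the Northwind Customers table:

```sql
-- Illustrative staging table sketch; the actual schema is shown only
-- in the screenshot above, so treat the data types as assumptions.
CREATE TABLE dbo.stg_dim_Customer (
    wk_Customer    int            NULL,      -- surrogate key, filled in during the flag update
    nk_CustomerID  nchar(5)       NOT NULL,  -- natural key from the source system
    CompanyName    nvarchar(40)   NULL,
    ContactName    nvarchar(30)   NULL,
    ContactTitle   nvarchar(30)   NULL,
    Address        nvarchar(60)   NULL,
    City           nvarchar(15)   NULL,
    Region         nvarchar(15)   NULL,
    PostalCode     nvarchar(10)   NULL,
    Country        nvarchar(15)   NULL,
    Phone          nvarchar(24)   NULL,
    Fax            nvarchar(24)   NULL,
    ModifiedDate   smalldatetime  NOT NULL,
    IsNew          bit            NOT NULL DEFAULT 0,
    IsType1        bit            NOT NULL DEFAULT 0,
    IsType2        bit            NOT NULL DEFAULT 0,
    -- computed checksum columns; the column lists must match the dimension exactly
    [Type1Checksum] AS CHECKSUM([ContactName],[ContactTitle],[Phone],[Fax]),
    [Type2Checksum] AS CHECKSUM([Address],[City],[Region],[PostalCode],[Country])
);
```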

Finally let's review the single stored procedure that implements the Type 1 and Type 2 processing and is invoked in the Update Customer Dimension Execute SQL task as noted above. The first step is to update the housekeeping columns in the staging table to specify whether the row is new, has a Type 1 change, or a Type 2 change. Remember that Type 1 and Type 2 changes are not mutually exclusive; you can have one, both, or neither. We simply join the staging table to the dimension on the natural key and CurrentMember = 1 to set the housekeeping flags.

UPDATE stg
SET wk_Customer = dim.wk_Customer
   ,IsNew = CASE WHEN dim.wk_Customer IS NULL THEN 1 ELSE 0 END
   ,IsType1 = CASE WHEN dim.wk_Customer IS NOT NULL
                    AND stg.Type1Checksum <> dim.Type1Checksum THEN 1 ELSE 0 END
   ,IsType2 = CASE WHEN dim.wk_Customer IS NOT NULL
                    AND stg.Type2Checksum <> dim.Type2Checksum THEN 1 ELSE 0 END
FROM dbo.stg_dim_Customer stg
LEFT OUTER JOIN dbo.dim_Customer dim
  ON dim.nk_CustomerID = stg.nk_CustomerID
 AND dim.CurrentMember = 1

The Type 1 changes are handled by updating the dimension table from staging where the IsType1 column = 1. Note that if there are multiple rows for the natural key in the dimension, all rows will be updated. This is typically how Type 1 changes are handled, but you can easily restrict the update to the current row if desired.

UPDATE dim
SET [ContactName] = stg.[ContactName]
   ,[ContactTitle] = stg.[ContactTitle]
   ,[Phone] = stg.[Phone]
   ,[Fax] = stg.[Fax]
FROM dbo.stg_dim_Customer stg
JOIN dbo.dim_Customer dim
  ON dim.nk_CustomerID = stg.nk_CustomerID
WHERE IsType1 = 1

The Type 2 changes are handled by expiring the current dimension row. The ExpirationDate is set to the ModifiedDate per the staging table less 1 minute.

UPDATE dim
SET CurrentMember = 0
   ,ExpirationDate = DATEADD(minute, -1, stg.ModifiedDate)
FROM dbo.stg_dim_Customer stg
JOIN dbo.dim_Customer dim
  ON dim.wk_Customer = stg.wk_Customer
WHERE IsType2 = 1

A row is inserted into the dimension table for new rows as well as Type 2 changes. Typically the EffectiveDate in new rows is set to the minimum value of the datetime column as a convenience instead of the actual ModifiedDate (i.e. created date), so that if a fact row had a transaction date before the dimension row's EffectiveDate it would still be in the range of the earliest dimension row. The ExpirationDate is set to the maximum value of the datetime column; some folks prefer NULL, which also works.

INSERT INTO dbo.dim_Customer (
    nk_CustomerID
   ,CurrentMember
   ,EffectiveDate
   ,ExpirationDate
   ,CompanyName
   ,ContactName
   ,ContactTitle
   ,Address
   ,City
   ,Region
   ,PostalCode
   ,Country
   ,Phone
   ,Fax
)
SELECT
    nk_CustomerID
   ,1
   ,CASE WHEN IsNew = 1 THEN '1900-01-01' -- MIN of smalldatetime
         ELSE ModifiedDate END
   ,'2079-06-06' -- MAX of smalldatetime
   ,CompanyName
   ,ContactName
   ,ContactTitle
   ,Address
   ,City
   ,Region
   ,PostalCode
   ,Country
   ,Phone
   ,Fax
FROM dbo.stg_dim_Customer stg
WHERE IsType2 = 1 OR IsNew = 1

Let's take a look at an example of Type 2 processing in the dim_Customer table. The following query results show a customer after the region has been updated. Region is one of the columns that is handled as a Type 2 change. As you can see, a new row has been inserted with CurrentMember = 1, an EffectiveDate equal to the ModifiedDate when the change was processed, and an ExpirationDate which is the maximum value for a smalldatetime. The original row was expired: its CurrentMember = 0 and its ExpirationDate is set to the ModifiedDate from the source system less 1 minute. The 1 minute subtraction eliminates any overlap in the effective date range.
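To inspect the change history for a single customer after the update runs, a query along these lines (the natural key value 'ALFKI' is purely illustrative) returns both the expired row and the current row:

```sql
-- Illustrative history query; substitute a natural key from your own data.
SELECT wk_Customer, nk_CustomerID, Region,
       CurrentMember, EffectiveDate, ExpirationDate
FROM dbo.dim_Customer
WHERE nk_CustomerID = 'ALFKI'   -- illustrative natural key
ORDER BY EffectiveDate;
```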

Problem
We have adopted XML configuration files as a standard development practice within our organization. We use them in ASP.NET web applications as well as Windows Forms applications. How can we use XML configuration files with our SSIS packages?

Solution
SSIS has built-in support for using XML files for package configuration. Just about any property setting in a package or task can be retrieved at runtime from an XML configuration file. A single XML configuration file can store as many configuration settings as you need, allowing you to use the same configuration file for multiple SSIS packages. However, every property that is specified in the XML configuration file must exist in the SSIS package, or else an error is raised when the package is opened in BIDS (or Visual Studio). A simple workaround for this behavior is to use multiple XML configuration files, each containing a group of properties that exist in every package that uses the XML configuration file.

Let's walk through an example to demonstrate the steps required to set up an XML package configuration. Assume that we want to populate an Excel file with sales information from the AdventureWorks database. Ultimately we would like to execute this SSIS package from a SQL Agent job which will run on a scheduled basis. In addition we would like the ability to run the package for any date range on demand.


For our example we will specify the following configuration settings in our XML package configuration:

Connection string for the AdventureWorks database
Connection string (i.e. the file folder and file name) for the Excel file
Begin date
End date

Now we are ready to setup an SSIS package to use XML package configuration. To begin, open Business Intelligence Developer Studio (BIDS) or Visual Studio and create a new Integration Services project. Add a new SSIS package to the project then perform the following steps: 1. Add the Begin Date and End Date variables to the package; we will set these from the XML package configuration and use them as parameters in our SQL command (right click on the Control Flow and select Variables):

2. Add an OLE DB Connection Manager for AdventureWorks (right click in the Connection Managers area and add a new OLE DB connection):


3. Add an Excel Connection Manager for our output file (right click in the Connection Managers area, select New Connection then Excel):

4. Add a Data Flow Task to the Control Flow (drag/drop Data Flow Task from the Toolbox onto the Control Flow):


5. Add an OLE DB Source and an Excel Destination to the Data Flow Task (right click the Data Flow Task then select edit; drag/drop source and destination from the Toolbox onto the Data Flow Task):

6. Configure the OLE DB Source (right click then select Edit):


7. Set the parameters for the SQL command text (click the Parameters button in the OLE DB Source Editor):

8. Configure the Excel Destination (right click and select Edit):

9. Specify the Excel create table statement (click New on the Excel Destination Editor; override the LineTotal column type as shown below):

10. Enable Package Configuration (right click on the Control Flow, select Package Configurations, then check Enable package configurations):


11. Add the XML Package Configuration (click Add on the Package Configurations Organizer) :

12. Select the properties to include in the XML Package Configuration file (drill in to the Objects and click the check boxes for the AdventureWorks ConnectionString, Excel Connection Manager ConnectionString, v_BeginDate and v_EndDate variable's Value):

13. Complete the XML Package Configuration:


14. Review the XML Package Configuration file created (navigate to the file name as noted in Step 13 above and open it with Internet Explorer):


Note the following in the XML package configuration file (as shown in step 14 above) :

Open the file using your XML editor of choice to make changes as necessary.
The Path element refers to the object in the SSIS package.
The ConfiguredValue element holds the value to be used in the package for the object specified in the Path element; this is the one that you need to edit.
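For reference, the generated file follows the DTSConfiguration format; the fragment below is a hand-written illustration (the server name, Path strings and values are examples, not the ones generated for this particular package):

```xml
<?xml version="1.0"?>
<DTSConfiguration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Variables[User::v_BeginDate].Properties[Value]"
      ValueType="DateTime">
    <ConfiguredValue>2004-01-01</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Connections[AdventureWorks].Properties[ConnectionString]"
      ValueType="String">
    <ConfiguredValue>Data Source=MYSERVER;Initial Catalog=AdventureWorks;Provider=SQLNCLI.1;Integrated Security=SSPI;</ConfiguredValue>
  </Configuration>
</DTSConfiguration>
```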

At this point the XML package configuration example is complete; you can edit the XML file as appropriate and run the SSIS package to see the results. However, the XML file name and path is hard-coded into the SSIS package. It would be nice to make this a variable that we can specify at runtime. Two options come to mind:

Use an environment variable to specify the XML file name and path
Set the XML file name and path as a command line parameter at runtime

Step 11 above is where the XML file name and path was specified. We selected the radio button "Specify configuration settings directly" then entered the full path to the file. Instead we could specify the full path to the file in an environment variable, select the radio button "Configuration location is stored in an environment variable", and specify the environment variable. For example:


To specify the XML file name and path on the command line when executing the SSIS package, navigate to the SSIS package file using Windows Explorer and double click the file to launch the Execute Package Utility. Select Configurations and add the configuration file as shown below then click the Execute button to run the package:

Clicking on Command Line will show the actual DTEXEC command line:
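The generated command will look something like the following; the file paths are placeholders for wherever your package and configuration file actually live:

```
dtexec /FILE "C:\SSISPackages\GetSalesData.dtsx" /CONFIGFILE "C:\SSISPackages\GetSalesData.dtsConfig"
```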


Caveats:

Remember that any environment variable that you add while BIDS (or Visual Studio) is open will not show up in the environment variable list (e.g. the Select Configuration Type dialog above) until you close the application and reopen it.
When you are working with an SSIS package in BIDS (or Visual Studio), the package configuration is read when you open the package. Any changes made to the configuration will not be reflected until you close and reopen the SSIS package.
When using an Excel destination the Excel file has to already exist. In addition the default behavior is to append rows to the worksheet. It's usually a good idea to make a backup copy of the Excel file that gets created when configuring the Excel destination (see steps 8 and 9 above). You can then copy/rename the backup to the original Excel file name to start out with an empty worksheet before rerunning your package.

Problem
I have seen many of the changes with SQL Server 2005 Integration Services (SSIS) versus SQL Server 2000 Data Transformation Services (DTS). Integration Services certainly has much more functionality out of the box than DTS and I am learning SSIS as my projects move forward. One item that has seemed to be a thorn in my side is deploying an SSIS package. So, I have read your tip (SQL Server Crosswalk - Deploying a SQL 2000 DTS vs. a SQL 2005 SSIS package) related to deploying a package and wanted to find out if any other options are available. Can you shed some light on the situation?

Solution
You are right about SSIS vs. DTS. SSIS certainly has a great deal of functionality out of the box as compared to DTS, but in some respects some of the simpler aspects of DTS packages have been overshadowed by a standardized development platform (Business Intelligence Development Studio) with SSIS. The net result is that, for the sake of new technology, new processes are needed. One of those items could be considered the deployment of an SSIS package with the deployment manifest wizard. In the SQL Server Crosswalk - Deploying a SQL 2000 DTS vs. a SQL 2005 SSIS package tip we talked about creating a deployment manifest file and using that file for deployment purposes. Let's also outline another approach in this tip, which is using Management Studio to import or export an SSIS package.

Deploying an SSIS Package
Although the detailed steps for deploying an SSIS Package are outlined in this tip (SQL Server Crosswalk - Deploying a SQL 2000 DTS vs. a SQL 2005 SSIS package), let's outline the general steps:

Build the package in Business Intelligence Development Studio (BIDS)
Change the package configurations to build the deployment utility
Build the deployment directory with all of the needed files
Execute the deployment manifest file which launches the wizard
In the course of the wizard, deploy the SSIS Package to either the file system or the MSDB database
Execute the SSIS Package via the DTExec utility, a SQL Server Agent Job or a script

As you can tell, this is a much different process as compared to the SQL Server 2000 DTS Packages where all of the development and deployment was directly in Enterprise Manager.

Importing an SSIS Package in Management Studio
As compared to the process above, importing an SSIS package via Management Studio may be considered a much simpler approach. Let's walk through the process of importing the SSIS Package via Management Studio once the package is saved in BIDS.

Import a SQL Server Integration Services Package in Management Studio

Import Process - In order to start the SSIS import process follow these steps:

Open Management Studio
Login to the SQL Server Integration Services instance where you want to import the SSIS Package
Expand the 'Stored Packages' folder
To access the 'Import Packages...' option, right click on either:
 o The 'File System' folder
 o The 'MSDB' folder
 o An individual SSIS Package
Once the SSIS Import Package interface opens, complete the options

Import Package Options - Below outlines the interface options:

Package location
 o SQL Server - MSDB database
 o File System - Directory with the SSIS Package (*.dtsx file)
 o SSIS Package Store - Directories related to the SSIS installation i.e. C:\Program Files\Microsoft SQL Server\90\DTS\Packages\
Server
 o SQL Server instance with SSIS installed
Authentication
 o Windows Authentication
 o SQL Server
Package path
 o Current directory with the SSIS Package
Package name
 o Rename the SSIS Package name when imported
Protection level
 o Level of security assigned to the SSIS package

Additional information - Setting the Protection Level of Packages

Exporting an SSIS Package in Management Studio
Since we covered importing an SSIS Package with Management Studio, let's also cover exporting an SSIS package with Management Studio.

Export a SQL Server Integration Services Package in Management Studio

Export Process - In order to start the SSIS export process follow these steps:

Open Management Studio
Login to the SQL Server Integration Services instance where you want to export the SSIS Package
Expand the 'Stored Packages' folder
To access the 'Export Packages...' option, right click on an individual SSIS Package
Once the SSIS Export Package interface opens, complete the options


Export Package Options - Below outlines the interface options:

Package location
 o SQL Server - MSDB database
 o File System - Directory with the SSIS Package (*.dtsx file)
 o SSIS Package Store - Directories related to the SSIS installation i.e. C:\Program Files\Microsoft SQL Server\90\DTS\Packages\
Server
 o SQL Server instance with SSIS installed
Authentication
 o Windows Authentication
 o SQL Server
Package path
 o Current directory with the SSIS Package
Protection level
 o Level of security assigned to the SSIS package
Additional information - Setting the Protection Level of Packages

Delete an SSIS Package in Management Studio
The deletion process in Management Studio is very straightforward. Just right click on the package and select the 'Delete' option.

Save Copy of Package in Business Intelligence Development Studio (BIDS)


Save Copy of Package Process - In order to start the process follow these steps:

Finish the SSIS Package in BIDS
Navigate to File | Save Copy of Package
Once the Save Copy of Package interface opens, complete the options

Save Copy of Package Options - Below outlines the interface options:

Package location
 o SQL Server - MSDB database
 o File System - Directory with the SSIS Package (*.dtsx file)
 o SSIS Package Store - Directories related to the SSIS installation i.e. C:\Program Files\Microsoft SQL Server\90\DTS\Packages\
Server
 o SQL Server instance with SSIS installed
Authentication
 o Windows Authentication
 o SQL Server
Package path
 o Current directory with the SSIS Package
Protection level
 o Level of security assigned to the SSIS package

Additional information - Setting the Protection Level of Packages

Command line management with dtutil


Although Management Studio and Business Intelligence Development Studio offer rich interfaces to manage SSIS Packages, Microsoft also offers the option to copy, move, delete, or verify the existence of an SSIS Package with the dtutil command. This alternative may prove priceless if you are faced with a situation where you need to manage large numbers of SSIS Packages in an automated manner. Here are some simple coding examples with the dtutil command:

Copy and rename the SampleSSISPackage to Export_FlatFile_Daily_CustomerData.dtsx
dtutil /FILE c:\DevSSISPackages\SampleSSISPackage.dtsx /COPY FILE;c:\TestSSISPackages\Export_FlatFile_Daily_CustomerData.dtsx

Move the 'SampleSSISPackage' from the Package Store to the MSDB database
dtutil /DTS SampleSSISPackage.dtsx /MOVE SQL;SampleSSISPackage

Delete the SSIS Package named 'SampleSSISPackage' in the MSDB database
dtutil /SQL SampleSSISPackage /DELETE

Verify the existence of the SSIS Package named 'SampleSSISPackage' in the MSDB database
dtutil /SQL SampleSSISPackage /EXISTS

The sample code in this tip for the dtutil command is only the tip of the iceberg. For additional information about the dtutil command visit dtutil Utility.

Problem
We have a number of SSIS packages that routinely fail for various reasons such as a particular file is not found, an external FTP server is unavailable, etc. In most cases these error conditions are just a temporary situation and we can simply rerun the package at a later time and it will be successful. The issue, however, is that we do not want to rerun the tasks in the package that have already completed successfully. Is there a way that we can restart an SSIS package at the point of failure and skip any tasks that were successfully completed in the previous execution of the package?

Solution
SSIS provides a Checkpoint capability which allows a package to restart at the point of failure. The Checkpoint implementation writes pertinent information to an XML file (i.e. the Checkpoint file) while the package is executing to record tasks that are completed successfully and the values of package variables, so that the package's "state" can be restored to what it was when the package failed. When the package completes successfully, the Checkpoint file is removed; the next time the package runs it starts executing from the beginning since there will be no Checkpoint file found. When a package fails, the Checkpoint file remains on disk and can be used the next time the package is executed to restore the values of package variables and restart at the point of failure.


The starting point for implementing Checkpoints in a package is with the SSIS package properties. You will find these properties in the Properties window under the Checkpoints heading:

CheckpointFileName - Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hardcoded path as shown above, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name.
CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.
SaveCheckpoints - Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.
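For example, assuming a package variable named User::CheckpointPath that holds the folder, the CheckpointFileName property expression could be written along these lines (note the doubled backslashes required in SSIS expression string literals):

```
@[User::CheckpointPath] + "\\" + @[System::PackageName] + ".xml"
```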

After setting the Checkpoint SSIS package properties, you need to set these properties under the Execution heading at the individual task level:

FailPackageOnFailure - Choose from these options: True or False (default). True indicates that the SSIS package fails if this task fails; this implements the restart at the point of failure behavior when the SSIS package property SaveCheckpoints is True and CheckpointUsage is IfExists.
FailParentOnFailure - Choose from these options: True or False (default). Select True when the task is inside of a container task such as the Sequence container; set FailPackageOnFailure for the task to False; set FailPackageOnFailure for the container to True.


Keep in mind that both the SSIS package Checkpoint properties and the individual task properties need to be set appropriately (as described above) in order to implement the restart at the point of failure behavior.

Before wrapping up the discussion on Checkpoints, let's differentiate the restart at the point of failure behavior from that of a database transaction. The typical behavior in a database transaction where we have multiple T-SQL commands is that either they all succeed or none of them succeed (i.e. on failure any previous commands are rolled back). The Checkpoint behavior, essentially, is that each command (i.e. task in the SSIS package) is committed upon completion. If a failure occurs the previous commands are not rolled back since they have already been committed.

Let's wrap up this discussion with a simple example to demonstrate the restart at the point of failure behavior of Checkpoints. We have an SSIS package with Checkpoint processing set up to restart at the point of failure as described above. The package has two Execute SQL tasks where the first will succeed and the second will fail. We will see the following output when running the package in BIDS:

Task 1 is green; it executed successfully. Task 2 is red; it failed. If we run the package a second time we will see the following output:

Notice that Task 1 is neither green nor red; in fact it was not executed. The package began execution with Task 2; Task 1 was skipped because it ran successfully the last time the package was run. The first run ended when Task 2 failed. The second run demonstrates the restart at the point of failure behavior.

Caveats:

SSIS does not persist the value of Object variables in the Checkpoint file.
When you are running an SSIS package that uses Checkpoints, remember that when you rerun the package after a failure, the values of package variables will be restored to what they were when the package failed. If you make any changes to package configuration values, the package will not pick up these changes in a restart after failure. Where the failure is caused by an erroneous package configuration value, correct the value and remove the Checkpoint file before you rerun the package.
For a Data Flow task you set the FailPackageOnFailure or FailParentOnFailure properties to True as discussed above. However, there is no restart capability for the tasks inside of the Data Flow; in other words you can restart the package at the Data Flow task but you cannot restart within the Data Flow task.

Problem
In SQL Server 2000's Data Transformation Services (DTS) the tool had the ability to issue each portion of a package one step at a time. With some custom coding, it was possible to determine variables and the package status by creating T-SQL tasks to write the items to a table. Unfortunately, this process could become very tedious for large DTS Packages. Another alternative was to move portions of the DTS package that were working to another DTS Package, then have the problematic package call the first package and step through the remainder of the tasks in the problematic DTS Package. This just cut down on the number of clicks, but was also tedious. With the many changes in SQL Server 2005, does Integration Services (SSIS) have a better way to review the variables and overall SSIS Package status at particular points in time?

Solution
Yes - SQL Server 2005 Integration Services (SSIS) has the ability to apply breakpoints to specific objects in the package. When the SSIS Package is executed interactively in SQL Server 2005 Business Intelligence Development Studio (BIDS), the breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.

How do I set up breakpoints in SSIS?
In BIDS, navigate to the control flow interface. Right click on the object where you want to set the breakpoint and select the 'Edit Breakpoints...' option. This will display the screen shot listed in the next section.

What are the conditions that I can set for a breakpoint in SSIS?
Breakpoints have 10 unique conditions on which they can be invoked. The breakpoints can also be used in combination. One typical example is using both the OnPreExecute and OnPostExecute events to determine the status of the variables as the process begins and ends.


What other breakpoint parameters can be configured?
For each of the breakpoints, the Hit Count Type and Hit Count can be configured. The Hit Count Type values are:

Always
Hit count greater than or equal to
Hit count multiple
Hit count equals

The Hit Count value is an integer greater than 1. The image below shows an example of these options, but is only an example not a probable configuration.


Next Steps

The next time you have to troubleshoot an SSIS Package interactively, consider the SSIS breakpoints as a simple means to determine the status of your variables and your overall package.
Rather than relying on the error messages that SSIS generates in production, consider setting up SSIS breakpoints in your development or test environment and then step through the package interactively to determine the root cause of the issue.

Problem
In SQL Server 2000 DTS, creating a connection to an object is relatively straightforward, but limited. Making a connection to a file, particularly if you need a dynamic connection string, likely requires a global variable, a Dynamic Properties task, and ActiveX scripting. Using ActiveX scripts in DTS packages tends to slow the package down because the code needs to be compiled at runtime. In SQL Server 2005 SSIS a connection to a flat file is much easier and makes use of new programming techniques, making the package run more efficiently and smoothly.

Solution
The Connection Manager is a way of communicating with a variety of interfaces. It is located in the bottom portion of the Designer window after opening a new or existing package. You create flat file connections by right-clicking the Connection Manager area and choosing New Flat File Connection:


A new screen opens where you enter the information about the flat file you want to connect to (most of the options are self-explanatory):


When you click on the Columns section you should see the actual data from the file (as well as in the Preview section):


The Advanced section is the area where you can rename the incoming column, change the data type and length of string:


In addition to making a static connection, you can also create a dynamic connection using Expressions. In SQL Server 2000 DTS you had to create a global variable, use the Dynamic Properties task to get the value, and ActiveX scripting to assign the value to the connection. Here is an example of creating a dynamic flat file connection in SSIS.

Let's say that every morning we load a text file from the Receiving Department's network share into a database (for this we will use C:\backups\). The file is always processed the day after the receiving process and is named "DataLoad" + "month" + "day" + "year.txt" (i.e., DataLoad10112006.txt). We are setting up an SSIS package that retrieves the data from the file and moves it to the database.

There are a couple of ways of doing this, but we decide to create a package variable called "DataLoadDir" to hold the folder location. We open the Variables window and click on Add Variable. The Variables window may have to be expanded by dragging its right side out. We change the Data Type to String, then type in the string value "C:\backups\":


Right-click the new Flat File connection and choose Properties. The Properties window on the right side will open. There you will see an area called Expressions. Click the ellipse on the side and it will open the Property Expressions Editor:

Select ConnectionString in the Property area and click the ellipse at the end of the row and the Expression Builder opens. You can drag expressions from the right side to the Expression textbox. The expression can be previewed once built by clicking Evaluate Expression.

With the Expression Builder open again, we will assign the variable for the ConnectionString property. We first add the package variable DataLoadDir by expanding the Variables tree on the left and then do a drag and drop into the Expression textbox. Then we add a + sign to concatenate. We add the string "DataLoad" in double quotes and another + sign. The next three phrases capture yesterday's date:

(DT_STR,4,1252)MONTH( DATEADD( "dd", -1, getdate() )) gets the month
(DT_STR,4,1252)DAY( DATEADD( "dd", -1, getdate() )) gets the day
(DT_STR,4,1252)YEAR( DATEADD( "dd", -1, getdate() )) gets the year

The above statements can either be typed in or dragged down from their respective locations on the right side. In the case of DATEADD statements, when you drag and drop the statement into the textbox it appears in the following way: "DATEADD( datepart, number, date )". We merely replace the various unknowns with the information we want. The last part is to add the extension ".txt" to the end of the string. Once we have everything in place we can click Evaluate Expression to see the results:
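Assembled in full, the expression should look something like this (wrapped here for readability; in the Expression Builder it is one continuous expression):

```
@[User::DataLoadDir] + "DataLoad"
+ (DT_STR,4,1252)MONTH( DATEADD( "dd", -1, GETDATE() ))
+ (DT_STR,4,1252)DAY( DATEADD( "dd", -1, GETDATE() ))
+ (DT_STR,4,1252)YEAR( DATEADD( "dd", -1, GETDATE() ))
+ ".txt"
```

One caution: MONTH and DAY are not zero-padded by these casts, so January 5 would produce "DataLoad152007.txt" rather than "DataLoad01052007.txt"; if your file names are always zero-padded you will need to pad the month and day pieces in the expression.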


To save the Expression, click OK; the expression will now be saved with your ConnectionString property and be resolved automatically each time the package runs.
