
Design and implement an ETL data flow by using an SSIS package
TEAM MENTORS: NILESH K, PRACHI S, SHELDON P, JENIL V

April 11, 2019


Presented by: Mony Toppo, Lakhan Bhagnani
Contents
• SSIS terminologies
• Implement an SSIS Control Flow
• Implement an SSIS Data Flow
• Implement transformations
• Create variables and parameters
• Implement Fuzzy grouping and Fuzzy lookup
• Demo

SSIS terminologies
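• SSIS (SQL Server Integration Services) - A platform for building enterprise-level data integration and
data transformation solutions.
• SSIS Package - The unit of work in SSIS: a collection of connections, control flow elements, data flow
elements, variables, and parameters that is saved and executed together.
• Connection Manager - A logical representation of a connection; it holds the connection string and
credentials needed to connect to a data source or data destination.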
• Data Source- The source location where the data is extracted from, which could be a file, database
table, spreadsheet, etc.
• Data Destination (Target) - The location to which the data is loaded: the destination or target
database, file, etc.
• SSIS Expression (Language) - An SSIS-specific expression language with many prebuilt functions for
manipulating strings and for setting the values of variables.

SSIS terminologies
• Control Flow (Tab) - Defines the workflow of tasks to be executed, usually in a particular order. It
contains the control flow tasks and the constraints that manage the order in which those tasks execute.
• Data Flow (Tab) - It contains data flow tasks that manage the work of copying, moving, loading, and
transforming data.
• Task - A single unit of work in the control flow, such as running a SQL statement, copying a file, or
running a data flow.
• Variable - Stores a value (number, string, table object, etc.) that can be read and updated while the
package runs.
• Parameter - A value supplied to a package from outside when it is executed, for example from a parent
SSIS package to a child SSIS package. It is read-only while the package runs.

Implement an SSIS Control Flow
1. Control Flow Tasks
• The top-level structure of the SSIS package framework.
• This is where the steps of the package are laid out and where we define
how execution flows from one step to the next.

2. Connection Manager
• Before a package can access any database object, a connection manager must
be configured to authenticate with the source
database server.
• It establishes the connection to the source and destination
(database or file) with the necessary credentials.

Implement an SSIS Control Flow
3. Precedence Constraints
Value:
1. On Success (Green arrow)
2. On Completion (Black arrow)
3. On Failure (Red arrow)

Evaluation operations:
1. Constraint
2. Expression
3. Expression AND Constraint
4. Expression OR Constraint

Figure: Example of the three types of Precedence Constraints
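The way a constraint value combines with the four evaluation operations can be sketched in Python. This is only an illustrative model, not SSIS code: the names `Value`, `EvalOp`, `constraint_met`, and `should_run` are hypothetical, and `expression_result` stands in for the outcome of an SSIS expression such as a row-count check.

```python
from enum import Enum


class Value(Enum):
    SUCCESS = "Success"
    FAILURE = "Failure"
    COMPLETION = "Completion"


class EvalOp(Enum):
    CONSTRAINT = 1
    EXPRESSION = 2
    EXPRESSION_AND_CONSTRAINT = 3
    EXPRESSION_OR_CONSTRAINT = 4


def constraint_met(value, upstream_result):
    """upstream_result is 'Success' or 'Failure' for the preceding task."""
    if value is Value.COMPLETION:
        return True  # runs whether the preceding task succeeded or failed
    return value.value == upstream_result


def should_run(op, value, upstream_result, expression_result):
    """Decide whether the downstream task runs."""
    met = constraint_met(value, upstream_result)
    if op is EvalOp.CONSTRAINT:
        return met
    if op is EvalOp.EXPRESSION:
        return expression_result
    if op is EvalOp.EXPRESSION_AND_CONSTRAINT:
        return expression_result and met
    return expression_result or met


# Upstream task succeeded, but the expression (say, "row count > 0") is false.
and_case = should_run(EvalOp.EXPRESSION_AND_CONSTRAINT, Value.SUCCESS, "Success", False)
or_case = should_run(EvalOp.EXPRESSION_OR_CONSTRAINT, Value.SUCCESS, "Success", False)
```

Here `and_case` is `False` and `or_case` is `True`: with "Expression AND Constraint" both conditions must hold, while with "Expression OR Constraint" either one is enough.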


Implement an SSIS Control Flow
4. Container
• Containers are objects in SQL Server Integration Services that provide structure to packages and services to tasks.
• They repeat a set of tasks until a criterion has been met, or group a set of tasks logically.
• The containers are:
1. For Loop Container - Repeats everything inside the container while a loop condition is true, for example a fixed number of times.
2. Foreach Loop Container - Runs everything inside the container once for each item in a collection, such as the files in a folder.
3. Sequence Container - Groups tasks that should run together, once, as a unit.

Figure : Containers
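As a rough analogy only, the three containers behave like the Python helpers below. The function names are hypothetical, and the `init`, `eval_expr`, and `assign` arguments loosely mirror the For Loop Container's initialization, evaluation, and assignment expressions. Tasks are modelled as plain callables.

```python
def run_sequence(tasks):
    """Sequence Container analogue: run a group of tasks once, in order."""
    return [task() for task in tasks]


def run_for_loop(init, eval_expr, assign, tasks):
    """For Loop analogue: initialise once, repeat while eval_expr is true."""
    state = init()
    log = []
    while eval_expr(state):
        log.extend(task(state) for task in tasks)
        state = assign(state)  # e.g. increment the loop counter
    return log


def run_foreach_loop(items, tasks):
    """Foreach Loop analogue: run the tasks once for each item."""
    log = []
    for item in items:
        log.extend(task(item) for task in tasks)
    return log


for_log = run_for_loop(lambda: 0, lambda i: i < 3, lambda i: i + 1,
                       [lambda i: f"attempt {i}"])
foreach_log = run_foreach_loop(["jan.csv", "feb.csv"],
                               [lambda name: f"load {name}"])
seq_log = run_sequence([lambda: "truncate staging", lambda: "load staging"])
```

The file names and task labels are made up for illustration. In a real package the loop body would hold SSIS tasks rather than lambdas.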
Implement an SSIS Data Flow
1. Data Flow Tasks
• It is a special task of the control flow that reads data from various sources into the
memory of the machine that is executing the SSIS package, transforms it, and writes it to destinations.
• It needs a canvas of its own, so there is an extra tab for the data flow, right next to the control flow.

Figure : Data Flow Task
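The source, transformation, destination shape of a data flow can be sketched in Python with in-memory stand-ins. This is an analogy under stated assumptions, not how SSIS is implemented: the CSV text, the `customers` table, and the `extract`, `transform`, and `load` functions are all hypothetical, and SQLite stands in for the destination database.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with untidy name values.
SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,20\n3, carol,7.25\n"


def extract(text):
    """Source: read rows from a flat file (here an in-memory string)."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transformation: trim and title-case names, convert column types."""
    for row in rows:
        yield (int(row["id"]), row["name"].strip().title(), float(row["amount"]))


def load(rows, conn):
    """Destination: insert the rows and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(SOURCE_CSV)), conn)
names = [r[0] for r in conn.execute("SELECT name FROM customers ORDER BY id")]
```

Because the stages are generators, rows stream through memory from source to destination, which is roughly the role the data flow task plays inside a package.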

Implement an SSIS Data Flow
2. Precedence Constraints
Value:
• Success
• Data elements

Figure: Success/Failure of Data Flow Task operation


Create variables and parameters
Variables
• Store values or objects
• Updated within the package
• Scope

Parameters
• Values passed into the package from outside when it is executed.
• Cannot be updated within the
package.
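The difference between the two can be illustrated with a small Python model. It is a sketch only: the `Package` class and the names `TargetTable`, `RowCount`, and `FileName` are hypothetical. A read-only mapping plays the role of parameters, a plain dictionary plays the role of variables, and `ChainMap` gives a rough picture of scope.

```python
from collections import ChainMap
from types import MappingProxyType


class Package:
    def __init__(self, parameters):
        # Parameters: supplied from outside, read-only while the package runs.
        self.parameters = MappingProxyType(dict(parameters))
        # Variables: package-scoped values that tasks may update.
        self.variables = {"RowCount": 0}

    def run(self, rows):
        for _ in rows:
            self.variables["RowCount"] += 1
        return f'{self.parameters["TargetTable"]}: {self.variables["RowCount"]} rows'


pkg = Package({"TargetTable": "dbo.Customers"})
summary = pkg.run(["row1", "row2"])

# Scope: a task-level variable is visible only through the task's view,
# which can still read package-level variables.
task_scope = ChainMap({"FileName": "jan.csv"}, pkg.variables)
```

Assigning to `pkg.parameters["TargetTable"]` raises `TypeError`, which mirrors the rule that parameters cannot be updated within the package.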

Implement Fuzzy grouping and Fuzzy lookup

Source data often contains misspelled or inconsistent values. SSIS provides two fuzzy transformations to help with such scenarios:
1. Fuzzy grouping
2. Fuzzy lookup

1. Fuzzy grouping - It is used primarily for deduplicating and standardizing values in column data.
• Input parameters-
1. Token delimiters
2. Similarity thresholds
• Output columns-
1. _key_in
2. _key_out
3. _score
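To make the output columns concrete, here is a minimal Python sketch of grouping by similarity. It assumes `difflib.SequenceMatcher` as a stand-in similarity measure; SSIS uses its own token-based matching, so the scores will differ, and token delimiters are not modelled. The function names and sample values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not the measure SSIS uses."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_group(values, threshold=0.8):
    canonical = []  # (key, value) of the first row seen in each group
    output = []
    for key_in, value in enumerate(values, start=1):
        best = max(((similarity(value, v), k) for k, v in canonical), default=None)
        if best is not None and best[0] >= threshold:
            score, key_out = best          # joins an existing group
        else:
            score, key_out = 1.0, key_in   # becomes a new canonical row
            canonical.append((key_in, value))
        output.append(
            {"_key_in": key_in, "_key_out": key_out, "_score": score, "value": value}
        )
    return output


rows = fuzzy_group(["Microsoft", "Microsft", "Oracle", "Micro soft", "Oracel"])
group_keys = [row["_key_out"] for row in rows]
```

In this sketch `_key_in` identifies each input row, `_key_out` points at the canonical row of its group, and `_score` is the similarity between the row and that canonical row. Rows that share a `_key_out` are treated as duplicates.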

Implement Fuzzy grouping and Fuzzy lookup
2. Fuzzy lookup - It is used to replace mistyped values with correct ones. It uses
fuzzy matching to find one or more close matches in a reference table and replaces the source data
with the reference data.
• Input parameters-
1. Maximum Number Of Matches To Output Per Lookup
2. Token Delimiter
3. Similarity Thresholds
• Output columns
1. _Similarity
2. _Confidence
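The lookup can be sketched in the same spirit. This sketch again assumes `difflib.SequenceMatcher` as a stand-in for the matching SSIS performs, so it only models the `_Similarity` column and leaves `_Confidence` out. The `fuzzy_lookup` function, its parameters, and the reference values are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_lookup(value, reference, max_matches=1, threshold=0.7):
    """Return up to max_matches reference values that are close to value."""
    scored = [
        (SequenceMatcher(None, value.lower(), ref.lower()).ratio(), ref)
        for ref in reference
    ]
    # Keep only candidates at or above the similarity threshold, best first.
    scored = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return [
        {"match": ref, "_Similarity": score} for score, ref in scored[:max_matches]
    ]


reference_table = ["Chicago", "Boston", "Seattle"]
hit = fuzzy_lookup("Chcago", reference_table)
miss = fuzzy_lookup("Xyz", reference_table)
```

`hit` contains the single reference value "Chicago", which would replace the mistyped input, while `miss` is empty because nothing reaches the threshold. `max_matches` plays the role of "Maximum Number Of Matches To Output Per Lookup".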

DEMO

Discussion
