IS Stream
Jan-18
Table of Contents
1. Introduction
2. DW Basics
3. Business Performance Management
4. Data Mining
5. Text Mining fundamentals
6. Big Data and Analytics
7. Future Trends in Analytics
Confidential |
Why DW
Information usage
Problems in Information usage
Fundamental Characteristics of a DW
Subject oriented
– Data organized around “Subjects”: sales, products, customers
– Information collected about subjects from all relevant systems
– Difference with operational database
Integrated
– Closely related to subject orientation
– Data from different sources, consistent format
– Get rid of conflicts and discrepancies
Time variant
– Maintenance of historical data
– Current status reporting is optional
– Time is the single most important dimension
– Support for multiple time points
Non-Volatile
– Data is not updated once written into the DW; changes are recorded as new data
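The time-variant and non-volatile characteristics can be illustrated with a minimal Python sketch. It assumes a hypothetical `customer_history` table in an in-memory SQLite database; the table, columns, and sample values are illustrative and not part of the original slides. A change in the source is stored as a new dated row instead of overwriting the old one, so the warehouse can answer questions for multiple time points.

```python
import sqlite3

# Hypothetical history table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history ("
    " customer_id INTEGER, city TEXT, valid_from TEXT)"
)

def record_state(customer_id, city, valid_from):
    """Append a new dated row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )

def city_as_of(customer_id, as_of):
    """Return the city that was valid on the given ISO date, or None."""
    row = conn.execute(
        "SELECT city FROM customer_history"
        " WHERE customer_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

record_state(1, "Pune", "2016-01-01")
record_state(1, "Mumbai", "2017-06-15")  # a move: new row, no UPDATE

print(city_as_of(1, "2016-12-31"))  # state before the move
print(city_as_of(1, "2018-01-01"))  # state after the move
```

Because ISO dates sort lexicographically, plain string comparison is enough for this sketch; a production warehouse would typically use a proper date dimension.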
DW Types
Data Marts
– Same as EDW, but smaller in scale
– Focuses on a particular subject or department
– Can be Dependent (gets its feeds from the DW) or Independent (gets its feeds directly from the source systems, not from the DW)
Operational Data Stores
– Integrate corporate data from different heterogeneous data sources
– Facilitate operational reporting in real-time or near real-time
– Structured similarly to the source systems; not optimized for historical and trend analysis
– During integration the data can be cleansed and denormalized, and business rules applied to ensure data integrity
– Data at the lowest granular level
– Integration with source systems on a regular basis
– Frequently used as a data source for the data warehouse
Enterprise Data Warehouse
– Store data from multiple data sources
– To be used for historical and trend analysis reporting
– Acts as a central repository for many subject areas
– Contains the “single version of truth”
DW Goals
Information accessibility
Information credibility
Flexible to change
Support for more fact-based decision making
Support for data security
Information consistency
DW Benefits
DW Components
DW Architecture
Data Integration
Data Integration Technologies (EAI)
Data Integration Technologies (EII)
Data Integration
Data Integration:
– Integration of data present in different sources for providing a unified view of the data
– Ability to consolidate data from several different sources while maintaining the integrity
and reliability of the data
Approaches:
– Schema Integration
• Developing a unified representation of semantically similar information, structured and stored
differently in the individual databases
• Done using various mapping rules to handle structural differences
– Instance Integration
• Information is retrieved directly from the data
• Identify and integrate all instances of the data items that represent the same real-world entity
Mechanisms
– Federated (Virtual) Database
• Fully integrated, logical composite result of all of its constituent databases
• Query submitted to constituent DBs in parts, results consolidated
– Data Warehousing
• Analyse and qualify source data
• Data profiling
• Source-to-target mapping
• Data cleansing and transformation
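The two approaches above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the source systems ("CRM" and "billing"), their column names, the mapping rules, and the choice of a normalized email address as the matching key are all hypothetical. Schema integration is shown as renaming source fields to a unified schema; instance integration is shown as merging records that refer to the same real-world customer.

```python
# Hypothetical mapping rules: source column name -> unified column name.
CRM_MAPPING = {"cust_name": "name", "e_mail": "email", "town": "city"}
BILLING_MAPPING = {"fullName": "name", "emailAddress": "email", "city": "city"}

def to_unified(record, mapping):
    """Schema integration: rename source fields to the unified schema."""
    return {target: record.get(source) for source, target in mapping.items()}

def entity_key(record):
    """Instance integration key: a normalized email address (an assumption)."""
    return (record.get("email") or "").strip().lower()

def integrate(records):
    """Merge records sharing a key, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        entity = merged.setdefault(entity_key(record), {})
        for field, value in record.items():
            if value and not entity.get(field):
                entity[field] = value
    return list(merged.values())

# Illustrative rows from two differently structured sources.
crm_rows = [{"cust_name": "Ann Lee", "e_mail": "Ann.Lee@example.com", "town": ""}]
billing_rows = [
    {"fullName": "Ann Lee", "emailAddress": "ann.lee@example.com ", "city": "Pune"}
]

unified = [to_unified(r, CRM_MAPPING) for r in crm_rows]
unified += [to_unified(r, BILLING_MAPPING) for r in billing_rows]
customers = integrate(unified)
print(customers)
```

Matching on a single normalized field is deliberately simplistic: records without an email would all collapse into one entity here, so real instance integration usually combines several attributes and fuzzy matching.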
ETL
Extract: reading data from sources into staging area or ODS
Transformation: Change and clean - make it fit for upload
– Rules, Lookup Tables, combining data: integrate, cleanse
– Cleanse
• Missing records or attributes
• Redundant records
• Missing keys or other required data
• Erroneous relationships
• Inaccurate data
Samples
– Selecting only certain columns to load
– Translating a few coded values
– Encoding some free-form values
– Deriving a new calculated value
– Joining together data derived from multiple sources
– Summarizing multiple rows of data
– Splitting a column into multiple columns
Load: upload the data into the data warehouse
– Transport data between sources and targets
– Document how data elements change as they move
– Exchange metadata with other applications as needed
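The extract, transform, and load steps can be sketched as a small Python pipeline. This is an illustration only: the source rows, the `GENDER_LOOKUP` table, and the `sales` target table are hypothetical, and an in-memory SQLite database stands in for the warehouse. The transform step shows several of the sample operations from the slide: dropping rows with missing keys and redundant records, selecting only certain columns, translating a coded value, splitting a column, and deriving a calculated value.

```python
import sqlite3

# Extract: hypothetical source rows, standing in for a staging area or ODS.
source_rows = [
    {"id": 1, "name": "Ann Lee", "gender": "F", "qty": 2, "unit_price": 10.0, "note": "x"},
    {"id": 2, "name": "Raj Rao", "gender": "M", "qty": 1, "unit_price": 25.5, "note": "y"},
    {"id": 2, "name": "Raj Rao", "gender": "M", "qty": 1, "unit_price": 25.5, "note": "y"},
    {"id": None, "name": "No Key", "gender": "M", "qty": 3, "unit_price": 5.0, "note": "z"},
]

GENDER_LOOKUP = {"F": "Female", "M": "Male"}  # lookup table for coded values

def transform(rows):
    """Cleanse and reshape rows so they fit the target table."""
    seen, out = set(), []
    for row in rows:
        # Cleanse: skip rows with a missing key and redundant records.
        if row["id"] is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        first, _, last = row["name"].partition(" ")  # split one column into two
        out.append((
            row["id"], first, last,
            GENDER_LOOKUP.get(row["gender"], "Unknown"),  # translate coded value
            row["qty"] * row["unit_price"],  # derive a new calculated value
        ))  # the "note" column is not selected for loading
    return out

def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY,"
        " first_name TEXT, last_name TEXT, gender TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(source_rows))
loaded = conn.execute("SELECT * FROM sales ORDER BY id").fetchall()
print(loaded)
```

Of the four source rows, only two reach the target: the duplicate and the row without a key are filtered out during cleansing.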
ETL
Data Quality
Metadata
Types of metadata:
– (Usage):
• Technical or business metadata
– (Pattern):
• Syntactic: describing the syntax of data
• Structural: describing the structure of data
• Semantic: describing the meaning of the data in a specific domain
A successful metadata-driven enterprise satisfies the following requirements:
– Extensibility
– Interoperability
– Effectiveness
– Reusability
– Evolution
– Efficiency and performance
– Versatility
– Low maintenance cost
– Flexibility
– Versioning
– Segregation
– Entitlement
Data Profiling
DW Development Approaches
Sources: only some operational and external systems (data mart approach); many operational and external systems (EDW approach)
Data Representation in DW
Fact Table:
– Central fact table: holds the attributes used for decision analysis
– Performance measures, operational metrics, aggregated measures
– Contains the descriptive attributes needed to perform decision
analysis and query reporting
– Connected to several dimension tables through foreign keys
Dimension tables:
– Contain classification and aggregation information about the central
fact rows
– Address how data will be analysed and summarized
– Used to slice and dice the numerical values in the fact table
Star schema: fully denormalized dimension tables around a central fact table
Snowflake schema: some level of normalization of dimensions
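A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), columns, and sample rows below are hypothetical and chosen only for illustration. The fact table holds the numeric measures and foreign keys; the query joins it to the dimension tables to slice the revenue by product category and quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used for grouping and filtering.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (10, 2017, 'Q3'), (11, 2017, 'Q4');
INSERT INTO fact_sales VALUES
    (1, 10, 2, 1600.0), (2, 10, 5, 100.0), (3, 11, 4, 200.0), (1, 11, 1, 800.0);
""")

# Slice and dice: total revenue by product category and quarter.
totals = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
print(totals)
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table referenced by a key, trading one extra join for less redundancy in the dimension.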
Star and Snowflake
Storage of OLAP Cubes
DW Implementation: major tasks
DW Implementation Risks
Executive sponsorship
Unmet expectations
Effective engagement of stakeholders
Choice of information to be loaded onto the DW
Specialized design of the DW
Involvement of business users
Quality of data
Performance, capacity and scalability
Continuous governance, maintenance
Thank you