
ARCHITECTURAL COMPONENTS OF A DATA WAREHOUSE

UNDERSTANDING DATA WAREHOUSE ARCHITECTURE

Data Warehouse Architecture:
- the structure that brings all the components of a data warehouse together.

In your data warehouse, architecture includes a number of factors:
- the integrated data that is the centerpiece
- the rules, procedures, and functions that enable the data warehouse to work
- the technology that empowers your data warehouse

What is the general purpose of the data warehouse architecture?
- It provides the overall framework for developing and deploying your data warehouse.
- It is a comprehensive blueprint.
- It defines the standards, measurements, general design, and support techniques.

Architecture in Three Major Areas

- Data acquisition
- Data storage
- Information delivery

DISTINGUISHING CHARACTERISTICS

- Different Objectives and Scope
- Data Content
- Complex Analysis and Quick Response
- Flexible and Dynamic
- Metadata-driven

ARCHITECTURAL FRAMEWORK

Architecture Supporting Flow of Data
- The architectural components govern the flow of data from beginning to end. The management and control module is one such component. This module touches every step along the data movement.

Architectural Framework Supporting the Flow of Data

What are the architectural components, and how do these components enable the data flow?

1. At the Data Source - The internal and external data sources form the source data architectural component. Source data governs the extraction of data for preparation and storage in the data warehouse. The data staging architectural component governs the transformation, cleansing, and integration of data.

2. In the Data Warehouse Repository - The data storage architectural component includes the loading of data from the staging area and the storing of the data in suitable formats for information delivery. The metadata architectural component is also a storage mechanism; it holds data about the data at every point of the flow from beginning to end.

3. At the User End - The information delivery architectural component includes dependent data marts, special multidimensional databases, and a full range of query and reporting facilities.

The Management and Control Module

- This architectural component is an overall module managing and controlling the entire data warehouse environment.
- It is an umbrella component working at various levels and covering all the operations.
- It has two major functions: first, to constantly monitor all the ongoing operations, and second, to step in and recover from problems when things go wrong.

The Management & Control Component

Other Functions of the Management & Control Module
- manages backing up significant parts of the data warehouse and recovering from failures
- governs data security and provides authorized access to the data warehouse
- interfaces with the end-user information delivery component to ensure that information delivery is carried out properly

TECHNICAL ARCHITECTURE

- The technical architecture of a data warehouse is the complete set of functions and services provided within its components.
- It includes the procedures and rules that are required to perform the functions and provide the services.
- It also encompasses the data stores needed for each component to provide the services.

Technical architecture in each of the three major areas of the data warehouse:

1. Data Acquisition - This area covers the entire process of extracting data from the data sources, moving all the extracted data to the staging area, and preparing the data for loading into the data warehouse repository.

Data Flow
Flow. In the data acquisition area, the data flow begins at the data sources and pauses at the staging area. After transformation and integration, the data is ready for loading into the data warehouse repository.
Data Sources. For the majority of data warehouses, the primary data source consists of the enterprise's operational systems. At many enterprises, a number of the operational systems are still legacy systems, whose data resides on hierarchical or network databases.

On Data Sources

- A fairly large number of companies have adopted ERP (enterprise resource planning) systems. ERP data sources provide an advantage in that the data from these sources is already consolidated and integrated.
- To include data from outside sources, you will have to create temporary files to hold the data received from those sources. After reformatting and rearranging the data elements, you will have to move the data to the staging area.

Intermediary Data Stores. As data is extracted from the data sources, it moves through temporary files. Sometimes, extracts of homogeneous data from several source applications are pulled into separate temporary files and then merged into another temporary file before the data moves to the staging area.
Staging Area. This is the place where all the extracted data is put together and prepared for loading into the data warehouse. The staging area is like an assembly plant or a construction area.
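
As a rough illustration of intermediary data stores, the sketch below writes extracts from two source applications into separate temporary files and then merges them into one temporary file that would be handed to the staging area. The file names, the column names, and the helpers `write_extract` and `merge_extracts` are hypothetical, chosen only for the example.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["customer_id", "name"]  # hypothetical homogeneous layout


def write_extract(directory, file_name, rows):
    """Write one source application's extract to its own temporary file."""
    path = Path(directory) / file_name
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


def merge_extracts(paths, merged_path):
    """Merge several extract files into one file; return the row count."""
    count = 0
    with Path(merged_path).open("w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in paths:
            with Path(path).open(newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
                    count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    billing = write_extract(tmp, "billing.csv", [{"customer_id": "1", "name": "Ann"}])
    orders = write_extract(
        tmp,
        "orders.csv",
        [{"customer_id": "2", "name": "Bob"}, {"customer_id": "3", "name": "Cy"}],
    )
    merged = Path(tmp) / "merged.csv"
    merged_count = merge_extracts([billing, orders], merged)
    # Read the merged file back, as the staging area would.
    with merged.open(newline="") as f:
        staged_rows = list(csv.DictReader(f))
```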

List of Functions and Services

Data Extraction
- Select data sources and determine the types of filters to be applied to individual sources
- Generate automatic extract files from operational systems using replication and other techniques
- Create intermediary files to store selected data to be merged later

Data Transformation
- Map input data to data for the data warehouse repository
- Clean data, deduplicate, and merge/purge
- Denormalize extracted data structures as required by the dimensional model of the data warehouse
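
The transformation functions listed above (mapping, cleaning, deduplication, and denormalization) might look roughly like the following sketch. The source field names (`CUST_NO`, `CUST_NM`, `REGION`), the target column names, and the region lookup are assumptions made up for illustration; deduplication here simply keeps the first record per key, which is only one of several possible merge/purge policies.

```python
def map_record(raw):
    """Map source fields to warehouse columns and clean the values."""
    return {
        "customer_id": int(raw["CUST_NO"]),
        "customer_name": raw["CUST_NM"].strip().title(),
        "region_code": raw["REGION"].strip().upper(),
    }


def deduplicate(records, key):
    """Keep the first record seen for each key value."""
    seen = {}
    for record in records:
        seen.setdefault(record[key], record)
    return list(seen.values())


def denormalize(records, region_names):
    """Fold the region lookup into each row, as a dimension table would hold it."""
    return [
        {**record, "region_name": region_names.get(record["region_code"], "Unknown")}
        for record in records
    ]


raw_rows = [
    {"CUST_NO": "101", "CUST_NM": "  alice smith ", "REGION": "ne"},
    {"CUST_NO": "101", "CUST_NM": "ALICE SMITH", "REGION": "NE"},
    {"CUST_NO": "102", "CUST_NM": "bob jones", "REGION": "sw"},
]
region_names = {"NE": "Northeast", "SW": "Southwest"}

mapped = [map_record(row) for row in raw_rows]
dimension_rows = denormalize(deduplicate(mapped, "customer_id"), region_names)
```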

Data Staging
- Provide backup and recovery for staging area repositories
- Sort and merge files
- Create files as input to make changes to dimension tables
- If data staging storage is a relational database, create and populate the database

2. Data Storage - This area covers the process of loading the data from the staging area into the data warehouse repository.

Data Flow
Flow. For data storage, the data flow begins at the data staging area. The transformed and integrated data is moved from the staging area to the data warehouse repository.
Data Groups. Prepared data waiting in the data staging area falls into two groups. The first group is the set of files or tables containing data for a full refresh. The second group is the set of files or tables containing ongoing incremental loads.
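
The difference between the two data groups can be sketched with Python's built-in `sqlite3` module standing in for the warehouse RDBMS. The table `sales_fact`, its columns, and the two helper functions are hypothetical; the incremental load is written here as an insert-or-replace keyed on `sale_id`, which is one simple way to apply changes and not necessarily how a production loader would do it.

```python
import sqlite3


def full_refresh(conn, rows):
    """Replace the entire table contents with the prepared rows."""
    with conn:  # one transaction: delete and reload together
        conn.execute("DELETE FROM sales_fact")
        conn.executemany(
            "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)", rows
        )


def incremental_load(conn, rows):
    """Apply only new or changed rows on top of the existing data."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO sales_fact (sale_id, amount) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)")

full_refresh(conn, [(1, 10.0), (2, 20.0)])        # initial complete load
incremental_load(conn, [(2, 25.0), (3, 30.0)])    # one changed row, one new row

loaded = conn.execute(
    "SELECT sale_id, amount FROM sales_fact ORDER BY sale_id"
).fetchall()
```

After both loads, row 1 is untouched, row 2 carries the changed amount, and row 3 is new.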

The Data Repository. Almost all of today's data warehouse databases are relational databases. All the power, flexibility, and ease-of-use capabilities of the RDBMS become available for the processing of data.

List of Functions and Services

- Load data for full refreshes of data warehouse tables
- Perform incremental loads at regular prescribed intervals
- Support loading into multiple tables at the detailed and summarized levels
- Optimize the loading process
- Provide automated job control services for loading the data warehouse
- Provide backup and recovery for the data warehouse database
- Provide security
- Monitor and fine-tune the database
- Periodically archive data from the database according to preset conditions
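
Archiving according to a preset condition could be sketched as follows, again with `sqlite3` as a stand-in. The tables `sales_fact` and `sales_archive`, the ISO-formatted `sale_date` column, and the cutoff date are assumptions for the example; the preset condition here is simply "older than a given date".

```python
import sqlite3


def archive_older_than(conn, cutoff_date):
    """Move rows dated before cutoff_date into the archive table.

    Dates are ISO strings (YYYY-MM-DD), so string comparison orders them correctly.
    Returns the number of rows removed from the live table.
    """
    with conn:  # copy and delete in a single transaction
        conn.execute(
            "INSERT INTO sales_archive "
            "SELECT sale_id, sale_date, amount FROM sales_fact WHERE sale_date < ?",
            (cutoff_date,),
        )
        deleted = conn.execute(
            "DELETE FROM sales_fact WHERE sale_date < ?", (cutoff_date,)
        ).rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.execute(
    "CREATE TABLE sales_archive (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, "2019-03-01", 10.0), (2, "2023-06-15", 20.0), (3, "2024-01-10", 30.0)],
)

archived_count = archive_older_than(conn, "2020-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
in_archive = conn.execute("SELECT sale_id FROM sales_archive").fetchall()
```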

3. Information Delivery - This area spans a broad spectrum of many different methods of making information available to users. The information delivery component makes it easy for the users to access the information either directly from the enterprise-wide data warehouse, from the dependent data marts, or from the set of conformed data marts. Most of the information access in a data warehouse is through online queries and interactive analysis sessions.

Information Delivery

Data Flow
Flow. For information delivery, when the design is based on the top-down technique, the data flow begins at the enterprise-wide data warehouse and the dependent data marts. When the design follows the bottom-up method, the data flow starts at the set of conformed data marts.
Service Locations. In your information delivery component, you may provide query services from the user desktop, from an application server, or from the database itself. Where to locate these services will be one of the critical decisions for your architecture design.

Data Stores. For information delivery, you may consider the following intermediary data stores:
- Proprietary temporary stores to hold results of individual queries and reports for repeated use
- Data stores for standard reporting
- Proprietary multidimensional databases
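
The first kind of store, one that holds query results for repeated use, can be approximated with a small in-memory cache. The class `ResultStore` and the `sales_fact` table are hypothetical, and the sketch deliberately ignores invalidation: a real store would have to discard held results whenever the warehouse is reloaded.

```python
import sqlite3


class ResultStore:
    """Hold result sets keyed by query text and parameters for repeated use."""

    def __init__(self, conn):
        self.conn = conn
        self._results = {}
        self.hits = 0  # how many requests were served from the store

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        rows = self.conn.execute(sql, params).fetchall()
        self._results[key] = rows
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [("NE", 10.0), ("NE", 15.0), ("SW", 7.0)],
)

store = ResultStore(conn)
report_sql = "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
first_run = store.query(report_sql)    # executed against the database
second_run = store.query(report_sql)   # served from the store
```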

List of Functions and Services

- Provide security to control information access
- Monitor user access to improve service and for future enhancements
- Allow users to browse data warehouse content
- Simplify access by hiding internal complexities of data storage from users
- Automatically reformat queries for optimal execution
- Enable queries to be aware of aggregate tables for faster results
- Govern queries and control runaway queries
- Provide self-service report generation for users, consisting of a variety of flexible options to create, schedule, and run reports
- Store result sets of queries and reports for future use
- Provide multiple levels of data granularity
- Provide event triggers to monitor data loading
- Make provision for the users to perform complex analysis through online analytical processing (OLAP)
- Enable data feeds to downstream, specialized decision support systems such as EIS and data mining
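
One way to read "enable queries to be aware of aggregate tables" is a small routing step that picks the smallest table able to answer a query. The sketch below assumes a hypothetical catalog (`AGGREGATE_CATALOG`) listing, for each table, the grouping columns it can answer and an approximate row count; the table names and numbers are invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical catalog: table name -> (grouping columns it can answer, approx. rows).
AGGREGATE_CATALOG: Dict[str, Tuple[Set[str], int]] = {
    "sales_detail": ({"day", "month", "store", "product"}, 1_000_000),
    "sales_by_month_store": ({"month", "store"}, 5_000),
    "sales_by_month": ({"month"}, 60),
}


def choose_table(
    requested_columns: Set[str],
    catalog: Dict[str, Tuple[Set[str], int]] = AGGREGATE_CATALOG,
) -> str:
    """Return the smallest table whose columns cover the requested grouping."""
    candidates = [
        (row_count, name)
        for name, (columns, row_count) in catalog.items()
        if requested_columns <= columns  # table covers every requested column
    ]
    if not candidates:
        raise ValueError(f"no table can answer grouping by {sorted(requested_columns)}")
    return min(candidates)[1]  # fewest rows wins


monthly_table = choose_table({"month"})
store_table = choose_table({"month", "store"})
product_table = choose_table({"product"})
```

A query grouped only by month is routed to the 60-row summary, while a query by product falls back to the detail table because no aggregate covers it.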
