Data sources often store only current data, not historical data
Corporate decision making requires a unified view of all organizational
data, including historical data
A data warehouse is a repository (archive) of information gathered
from multiple sources, stored under a unified schema, at a single site
Greatly simplifies querying, permits study of historical trends
Shifts decision support query load away from transaction
processing systems
Database System Concepts - 5th Edition, Aug 26, 2005 18.1 Silberschatz, Korth and Sudarshan
Data Warehousing
OLTP Vs OLAP
OLTP System: Online Transaction Processing (Operational System)
OLAP System: Online Analytical Processing (Data Warehouse)
Source of data
  OLTP: Operational data; OLTPs are the original source of the data.
  OLAP: Consolidation data; OLAP data comes from the various OLTP databases.
Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.
What the data reveals
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.
Inserts and Updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.
Queries
  OLTP: Relatively standardized and simple queries, returning relatively few records.
  OLAP: Often complex queries involving aggregations.
Processing Speed
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.
Space Requirements
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.
Database Design
  OLTP: Highly normalized with many tables.
  OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Backup and Recovery
  OLTP: Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Design Issues
When and how to gather data
Source driven architecture: data sources transmit new information
to warehouse, either continuously or periodically (e.g. at night)
Destination driven architecture: warehouse periodically requests
new information from data sources
Keeping warehouse exactly synchronized with data sources (e.g.
using two-phase commit) is too expensive
Usually OK to have slightly out-of-date data at warehouse
Data/updates are periodically downloaded from online
transaction processing (OLTP) systems.
What schema to use
Schema integration
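The destination-driven architecture described above can be sketched as follows. This is only an illustration under stated assumptions: the `orders` table, its `updated_at` column, and the `pull_new_rows` helper are hypothetical names, and two in-memory SQLite databases stand in for an OLTP source and the warehouse. The slides do not say how new information is identified; a last-modified value is one common choice.

```python
import sqlite3

# Hypothetical OLTP source with a last-modified column (an assumption;
# the slides do not say how "new information" is recognized).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 10.0, 100), (2, 25.0, 105)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)")

def pull_new_rows(source, warehouse, last_seen):
    """Destination-driven refresh: the warehouse asks the source for rows
    changed since its previous pull and returns the new high-water mark."""
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?",
        (last_seen,)).fetchall()
    warehouse.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return max((r[2] for r in rows), default=last_seen)

mark = pull_new_rows(source, warehouse, last_seen=0)        # first periodic pull
source.execute("INSERT INTO orders VALUES (3, 7.5, 110)")   # new OLTP activity
stale_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
mark = pull_new_rows(source, warehouse, mark)               # next pull catches up
fresh_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Between the two pulls the warehouse is slightly out of date (`stale_count` lags the source by one row), which is the trade-off the slide accepts in exchange for avoiding two-phase commit.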
More Warehouse Design Issues
Data cleansing: the task of correcting and preprocessing data
E.g. correct mistakes in addresses (misspellings, zip code errors)
Merge address lists from different sources and purge duplicates
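A minimal merge-and-purge sketch for the address example above. The record layout, the abbreviation table, and the function names are assumptions made for illustration. The sketch only normalizes formatting differences before matching; correcting real misspellings would need fuzzy matching, which is not shown.

```python
import re

def normalize(record):
    """Canonical form used only for matching: lowercase, collapse spaces,
    expand a few abbreviations (the abbreviation table is illustrative)."""
    abbreviations = {"st": "street", "st.": "street",
                     "ave": "avenue", "ave.": "avenue"}
    name, street, zip_code = record
    words = [abbreviations.get(w, w) for w in street.lower().split()]
    digits = re.sub(r"\D", "", zip_code)[:5]   # keep the 5-digit ZIP prefix
    return (" ".join(name.lower().split()), " ".join(words), digits)

def merge_and_purge(*address_lists):
    """Merge lists from several sources, keeping the first record per key."""
    seen = {}
    for addresses in address_lists:
        for record in addresses:
            seen.setdefault(normalize(record), record)
    return list(seen.values())

# Two hypothetical sources that describe the same customer differently.
crm = [("Ann Lee", "12 Main St.", "02139"), ("Bob Roy", "4 Oak Ave", "10001")]
billing = [("ann  lee", "12 main street", "02139-4307"),
           ("Cy Dunn", "9 Elm St", "60601")]

merged = merge_and_purge(crm, billing)
```

Here the two "Ann Lee" records normalize to the same key, so only the first one survives in `merged`.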
How to propagate updates
Warehouse schema may be a (materialized) view of schema from
data sources
What data to summarize
Raw data may be too large to store on-line
Aggregate values (totals/subtotals) often suffice
Queries on raw data can often be transformed by query optimizer
to use aggregate values
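The point about answering raw-data queries from aggregate values can be illustrated with a small sketch. The `sales` and `sales_by_item_month` tables are hypothetical, and SQLite is used only for convenience: it does not perform this rewrite automatically, so the rewritten query is written out by hand to show what an optimizer with such support would produce.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (item TEXT, month TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("pen", "2005-01", 2.0), ("pen", "2005-01", 3.0),
    ("pen", "2005-02", 4.0), ("ink", "2005-01", 10.0),
])

# Precomputed subtotals per (item, month); the raw rows could then be archived.
db.execute("""CREATE TABLE sales_by_item_month AS
              SELECT item, month, SUM(amount) AS total, COUNT(*) AS n
              FROM sales GROUP BY item, month""")

# A query on the raw data ...
raw = db.execute(
    "SELECT item, SUM(amount) FROM sales GROUP BY item ORDER BY item").fetchall()

# ... and the equivalent query rewritten over the smaller summary table.
rewritten = db.execute(
    "SELECT item, SUM(total) FROM sales_by_item_month "
    "GROUP BY item ORDER BY item").fetchall()
```

Both queries return the same per-item totals, because a sum of subtotals equals the sum over the underlying rows. Averages need more care: the summary keeps `n` so that an average can be recovered as `SUM(total) / SUM(n)`.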
Warehouse Schemas
Dimension values are usually encoded using small integers and
mapped to full values via dimension tables
Resultant schema is called a star schema
More complicated schema structures
Snowflake schema: multiple levels of dimension tables
Constellation: multiple fact tables
Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Commercial Systems for Data Warehouse
Multidimensional Data Model
Conceptual Modeling of Data Warehouses
Example of Star Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_street, country
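As a rough sketch, the star schema in this example could be declared and queried as below. Table and column names follow the figure; the column types, the fact table name `sales`, the sample rows, and the choice of SQLite are assumptions. `avg_sales` is left out of the table because it can be derived from the other measures at query time.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day INTEGER,
                       day_of_the_week TEXT, month INTEGER, quarter TEXT,
                       year INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT,
                       branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province_or_street TEXT, country TEXT);
-- Fact table: small integer keys into each dimension, plus the measures.
CREATE TABLE sales    (time_key INTEGER REFERENCES time,
                       item_key INTEGER REFERENCES item,
                       branch_key INTEGER REFERENCES branch,
                       location_key INTEGER REFERENCES location,
                       units_sold INTEGER, dollars_sold REAL);

INSERT INTO time VALUES (1, 3, 'Mon', 1, 'Q1', 2005), (2, 4, 'Mon', 4, 'Q2', 2005);
INSERT INTO item VALUES (1, 'pen', 'Acme', 'stationery', 'wholesale'),
                        (2, 'ink', 'Bolt', 'stationery', 'retail');
INSERT INTO branch VALUES (1, 'B1', 'store');
INSERT INTO location VALUES (1, '1 Main St', 'Pune', 'MH', 'India');
INSERT INTO sales VALUES (1, 1, 1, 1, 10, 20.0),
                         (1, 2, 1, 1, 1, 15.0),
                         (2, 1, 1, 1, 5, 10.0);
""")

# Typical star query: join the fact table to the dimensions it needs,
# then aggregate a measure by dimension attributes.
report = db.execute("""
    SELECT t.quarter, i.brand, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON s.time_key = t.time_key
    JOIN item AS i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand""").fetchall()
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.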
Example of Snowflake Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city: city_key, city, province_or_street, country
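The practical difference from the star schema shows up in queries: reaching an attribute such as `country` now takes one more join, through `city`. The sketch below covers only the location branch of the snowflake; the fact table name `sales`, the column types, and the sample rows are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Second-level dimension table split out of location.
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_street TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city);
CREATE TABLE sales    (location_key INTEGER REFERENCES location,
                       units_sold INTEGER, dollars_sold REAL);

INSERT INTO city VALUES (1, 'Pune', 'MH', 'India'), (2, 'Lyon', 'ARA', 'France');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '2 Hill Rd', 1),
                            (12, '3 Rue A', 2);
INSERT INTO sales VALUES (10, 4, 8.0), (11, 1, 2.0), (12, 3, 9.0);
""")

# Two joins are needed to reach country: fact -> location -> city.
by_country = db.execute("""
    SELECT c.country, SUM(s.units_sold)
    FROM sales AS s
    JOIN location AS l ON s.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country""").fetchall()
```

The normalized `city` table stores each city's province and country once instead of repeating them per location, at the cost of the extra join.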
Data Warehouse Usage
Data Mining
Data mining is the process of semi-automatically analyzing large
databases to find useful patterns
Prediction based on past history
Predict whether a credit card applicant poses a good credit risk, based on
some attributes (income, job type, age, ...) and past history
Predict if a pattern of phone calling card usage is likely to be
fraudulent
Some examples of prediction mechanisms:
Classification
Given a new item whose class is unknown, predict to which class
it belongs
Regression formulae
Given a set of mappings for an unknown function, predict the
function result for a new parameter value
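As one concrete instance of a regression formula, the sketch below fits a straight line to known (x, y) mappings by ordinary least squares and uses it to predict the result for a new parameter value. The linear form, the `fit_line` helper, and the sample points are assumptions; the slides do not commit to a particular kind of regression.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Known mappings of the unknown function (here they lie on y = 2x + 1).
known = [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]
a, b = fit_line(known)

# Predict the function result for a parameter value not in the sample.
predicted = a * 5 + b
```

With real data the points would not lie exactly on a line, and the fitted line would be the one minimizing the squared prediction error.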
Data Mining (Cont.)
Descriptive Patterns
Associations
Find books that are often bought by similar customers. If a
new such customer buys one such book, suggest the others
too.
Associations may be used as a first step in detecting causation
E.g. association between exposure to chemical X and cancer
Clusters
E.g. typhoid cases were clustered in an area surrounding a
contaminated well
Detection of clusters remains important in detecting epidemics
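The book-buying association above can be made concrete by counting how often pairs of items appear together. Support and confidence are the measures commonly used for this, but they are not named on the slide, so treat them, the basket data, and the function names here as assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: one set of book topics per customer.
baskets = [{"db", "os"}, {"db", "os", "net"}, {"db", "net"}, {"os"}]

# How many baskets contain each item, and each unordered pair of items.
item_count = Counter(item for basket in baskets for item in basket)
pair_count = Counter(pair for basket in baskets
                     for pair in combinations(sorted(basket), 2))

def support(a, b):
    """Fraction of all baskets that contain both a and b."""
    return pair_count[tuple(sorted((a, b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Among baskets containing the antecedent, the fraction that also
    contain the consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_count[pair] / item_count[antecedent]

net_implies_db = confidence("net", "db")
db_and_os = support("db", "os")
```

In this sample every customer who bought "net" also bought "db", so a new "net" buyer would be a natural candidate for a "db" suggestion. As the slide notes, such an association is at most a first step toward establishing causation.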
Classification Rules
Decision Tree
Construction of Decision Trees
Training set: a data sample in which the classification is already
known.
Greedy top down generation of decision trees.
Each internal node of the tree partitions the data into groups
based on a partitioning attribute, and a partitioning condition
for the node
Leaf node:
all (or most) of the items at the node belong to the same class,
or
all attributes have been considered, and no further partitioning
is possible.
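The greedy top-down procedure above can be sketched for categorical attributes as follows. The slide does not say how the partitioning attribute is chosen; this sketch assumes Gini impurity as the measure, and the training set, attribute indexes, and function names are hypothetical.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of the class labels (last field of each row)."""
    counts = Counter(row[-1] for row in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def partition(rows, attr):
    """Group rows by their value for the partitioning attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups

def build(rows, attrs):
    """Greedy top-down construction. Returns a class label for a leaf,
    or (attribute index, {value: subtree}) for an internal node."""
    majority = Counter(row[-1] for row in rows).most_common(1)[0][0]
    if gini(rows) == 0.0 or not attrs:   # pure node, or no attributes left
        return majority

    def weighted_impurity(attr):
        groups = partition(rows, attr).values()
        return sum(len(g) / len(rows) * gini(g) for g in groups)

    best = min(attrs, key=weighted_impurity)   # locally best split, no backtracking
    rest = [a for a in attrs if a != best]
    return (best, {value: build(group, rest)
                   for value, group in partition(rows, best).items()})

def classify(tree, item, default=None):
    """Follow the partitioning conditions from the root down to a leaf."""
    while isinstance(tree, tuple):
        attr, children = tree
        if item[attr] not in children:
            return default
        tree = children[item[attr]]
    return tree

# Hypothetical training set: (degree, income, known credit-risk class).
training = [
    ("masters", "high", "excellent"), ("masters", "low", "good"),
    ("bachelors", "high", "good"), ("bachelors", "low", "average"),
    ("none", "high", "bad"), ("none", "low", "bad"),
]
tree = build(training, attrs=[0, 1])
prediction = classify(tree, ("masters", "high"))
```

The root splits on degree (attribute 0) because that partition leaves the purest groups; the "none" branch becomes a leaf immediately, while the other branches split again on income. A leaf reached with no attributes left takes the majority class, matching the "all (or most)" wording of the leaf condition.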
Other Types of Mining
Text mining: application of data mining to textual documents
cluster Web pages to find related pages
cluster pages a user has visited to organize their visit history
classify Web pages automatically into a Web directory
Data visualization systems help users examine large volumes of data
and detect patterns visually
Can visually encode large amounts of information on a single
screen
Humans are very good at detecting visual patterns