William Inmon coined the phrase “data warehouse” in 1990. He defined it as a managed
database in which the data is:
Subject-oriented: The database is subject-oriented because of a shift from
application-oriented data (i.e., data designed to support application processing) to
decision-support data (i.e., data designed to aid in decision making).
Integrated: The database is integrated because of the consolidation of application-
oriented data from different legacy systems.
Time-variant: The database is time-variant because of the distinction between
operational data and informational data: operational data is valid only for a limited
period (a “slice of time”), whereas informational data is retained as a historical record.
Non-volatile: New data is always added to the database rather than replacing existing
data. The database continually absorbs new data, integrating it with previous data.
Data warehouses enable queries that cut across different segments of a company's
operation. For example, production data could be compared against inventory data even
if the two were originally stored in different databases with different structures.
Queries that would be complex in highly normalized databases can be easier to build and
maintain in a data warehouse, and running them there reduces the workload on
transaction systems.
Data warehousing is an efficient way to manage and report on data that comes from a
variety of sources, is non-uniform, and is scattered throughout a company. It is also an
efficient way to manage demand for large amounts of information from many users. Data
warehousing provides the capability to analyze large amounts of historical data for
nuggets of wisdom that can give an organization a competitive advantage.
What is OLAP?
OLAP stands for Online Analytical Processing. It uses database tables (fact and
dimension tables) to enable multidimensional viewing, analysis and querying of large
amounts of data. E.g. OLAP technology could provide management with fast answers to
complex queries on their operational data or enable them to analyze their company's
historical data for trends and patterns.
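The fact-and-dimension layout described above can be sketched with Python's built-in
sqlite3 module. This is only a minimal illustration: the table names (fact_sales,
dim_product, dim_date), the columns, and the sample rows are all hypothetical, and a
real OLAP engine would add pre-computed aggregates on top of a structure like this.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, units_sold INTEGER);

    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_date    VALUES (10, 2022), (11, 2023);
    INSERT INTO fact_sales  VALUES (1, 10, 5), (1, 11, 7), (2, 10, 3),
                                   (2, 11, 4), (2, 11, 6);
""")


def units_by_category_and_year(connection):
    """Sum the units_sold measure along two dimensions: category and year."""
    query = """
        SELECT p.category, d.year, SUM(f.units_sold)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return connection.execute(query).fetchall()


rows = units_by_category_and_year(conn)
for category, year, units in rows:
    print(category, year, units)
```

Each row of the result is one cell of a two-dimensional view (category by year);
grouping by a different combination of dimension columns gives a different view of the
same facts.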
What is OLTP?
OLTP stands for Online Transaction Processing. OLTP uses normalized tables to quickly
record large numbers of transactions while making sure that these updates of data
occur in as few places as possible. Consequently, OLTP databases are designed for
recording the daily operations and transactions of a business. E.g. a timecard system
that supports a large production environment must successfully record a large number
of updates during critical periods like lunch hour, breaks, startup and close of work.
To estimate the size of the fact table in bytes, multiply the size of a row by the number
of rows in the fact table. A more exact estimate would also account for the data types,
indexes, page sizes, etc. An estimate of the number of rows in the fact table is obtained
by multiplying the number of transactions per hour by the number of hours in a typical
work day, then multiplying the result by the number of days in a year, and finally
multiplying that result by the number of years of transactions involved. Divide the size
in bytes by 1024 to convert to kilobytes and by 1024 again to convert to megabytes.
E.g. A data warehouse will store facts about the help provided by a company’s product
support representatives. The fact table is made up of a composite key of 7 indexes
(int data type) including the primary key. The fact table also contains 1 measure of time
(datetime data type) and another measure of duration (int data type). 2000 product
incidents are recorded each hour in a relational database. A typical work day is 8 hours
and support is provided for every day in the year. What will be the approximate size of
this data warehouse in 5 years?
First calculate the approximate size of a row in bytes (int data type = 4 bytes, datetime
data type = 8 bytes):
Size of a row = size of all composite indexes (add the size of all indexes) + size of all
measures (add the size of all measures).
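The arithmetic for this example can be sketched in Python as follows. The function and
variable names are illustrative only, and the sketch assumes a 365-day year (leap days
ignored) and counts only the key and measure columns, with no allowance for indexes or
page overhead.

```python
INT_BYTES = 4       # int data type, as stated in the example
DATETIME_BYTES = 8  # datetime data type, as stated in the example


def row_size_bytes(key_columns, int_measures, datetime_measures):
    """Size of one fact row: all key columns plus all measures."""
    return (key_columns * INT_BYTES
            + int_measures * INT_BYTES
            + datetime_measures * DATETIME_BYTES)


def row_count(per_hour, hours_per_day, days_per_year, years):
    """Estimated number of fact rows accumulated over the period."""
    return per_hour * hours_per_day * days_per_year * years


# 7 int key columns, 1 int measure (duration), 1 datetime measure (time).
row_bytes = row_size_bytes(key_columns=7, int_measures=1, datetime_measures=1)

# 2000 incidents/hour, 8 hours/day, 365 days/year (assumed), 5 years.
rows = row_count(per_hour=2000, hours_per_day=8, days_per_year=365, years=5)

total_bytes = row_bytes * rows
total_kb = total_bytes / 1024
total_mb = total_kb / 1024

print(f"row size: {row_bytes} bytes")
print(f"rows: {rows:,}")
print(f"fact table: {total_bytes:,} bytes = {total_mb:,.2f} MB")
```

Under these assumptions a row is 40 bytes, the table holds 29,200,000 rows, and the
fact table comes to 1,168,000,000 bytes, or roughly 1,114 MB (a little over 1 GB).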
Cube: A set of related factual measures, aggregates, and dimensions for a specific
dimensional analysis problem. Example: regional product sales.
Dimension: A set of level properties that describe a specific aspect of a business, used
for analyzing the factual measures of one or more cubes which use that dimension.
Examples: geography, time, customer and product.
Drilling: Drilling is the term used for navigating through a cube. This navigation is
usually performed to access a summary level of information or to provide more detailed
properties of a dimension in a hierarchy.
Fact: A fact is a time-variant measurement of quantitative data in a cube; for example,
units sold, sales dollars, or total profit.
Hierarchy: The hierarchy concept refers to the levels of granularity represented by the
data in a particular dimension of a cube. For example, state, county, district, and city
represent different levels of granularity in the hierarchy of the geography dimension.
Measure: The means for representing quantitative data in facts or aggregates. Example
measures are total sales or units sold per year.
Redundancy: The term used for referring to duplication of data among related tables
for the sake of improving the speed of query processing.
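To make the terms above concrete, the following minimal sketch shows drilling along a
two-level geography hierarchy (state, then city) in plain Python. The sample facts, the
level names, and the aggregate helper are hypothetical and stand in for what a cube
would normally compute.

```python
from collections import defaultdict

# Hypothetical facts: each row carries its geography levels and one measure.
FACTS = [
    {"state": "Ohio", "city": "Columbus", "units_sold": 12},
    {"state": "Ohio", "city": "Dayton", "units_sold": 8},
    {"state": "Texas", "city": "Austin", "units_sold": 15},
    {"state": "Texas", "city": "Austin", "units_sold": 5},
]


def aggregate(facts, levels, measure="units_sold"):
    """Sum a measure, grouped by the given hierarchy levels."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[level] for level in levels)
        totals[key] += fact[measure]
    return dict(totals)


# Summary level: one total per state.
by_state = aggregate(FACTS, ["state"])

# Drilling down one level: one total per (state, city).
by_city = aggregate(FACTS, ["state", "city"])

print(by_state)
print(by_city)
```

Here units_sold is the measure, geography is the dimension, and moving from by_state to
by_city is a drill-down from a coarser to a finer level of the hierarchy.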