Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management's decision-making process.
Subject-oriented: A data warehouse is used to analyze a particular subject area. For example, "sales" can be a subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may identify a product in different ways, but the data warehouse identifies a product in only one way.
Time-variant: A data warehouse keeps historical data, so values can be analyzed as of different points in time.
Non-volatile: Once data is in the warehouse, routine operations do not change or delete it.
ETL
ETL stands for Extract, Transform, and Load. It is the method or technology used to implement a data warehouse.
Extract: Extract data from the source systems.
Transform: Transform the data according to the business requirements.
Load: Load the transformed data into the data warehouse.
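The three ETL steps can be sketched in a few lines of Python. This is only an illustration: the source rows, the business rules (uppercase product codes, numeric amounts), and the `sales` table are assumptions made for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows: (product_code, amount_text, sale_date)
source_rows = [
    ("p-001", " 19.99", "2024-01-05"),
    ("P-002", "5.00 ", "2024-01-05"),
    ("p-001", "n/a", "2024-01-06"),  # non-numeric amount, rejected in transform
]

def extract():
    """Extract: read raw rows from the source (here an in-memory list)."""
    return list(source_rows)

def transform(rows):
    """Transform: apply the assumed business rules to each row."""
    clean = []
    for code, amount, day in rows:
        try:
            value = float(amount.strip())
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((code.upper(), value, day))
    return clean

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(product_code TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT product_code, amount FROM sales ORDER BY product_code"
).fetchall()
print(loaded)
```

Keeping each step in its own function makes it easy to swap the source or the target without touching the transformation rules.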
OLAP
OLAP stands for online analytical processing. These systems are used where fast data retrieval matters more than fast data entry. An OLAP system is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used with data mining techniques. An OLAP database holds aggregated, historical data stored in multi-dimensional schemas (usually a star schema).
OLTP vs. OLAP

Source of data
OLAP: Consolidated data; OLAP data comes from the various OLTP databases.

Purpose of data
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving, and decision support.

What the data reveals
OLTP: A snapshot of ongoing business processes.
OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing Speed
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements
OLTP: Can be relatively small if historical data is archived.
OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.

Database design
OLTP: Highly normalized with many tables.
OLAP: Typically de-normalized with fewer tables; uses star and/or snowflake schemas.

Backup and recovery
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Dimensional Modelling
DM is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance.
Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.
Fact table: A fact table is the table that contains the measures of interest, i.e. the business facts.
A fact table mostly contains additive values that can be aggregated to produce figures that support business decisions.
Dimension Table
A dimension is a category of information, for example the Product dimension, Store dimension, or Time dimension.
Attribute: A unique level within a dimension, for example product category in the Product dimension or month in the Time dimension.
Dimensional Modelling
Snowflake schema
Fact constellation schema
Star Schema
In Star schema a single Fact table is surrounded by multiple dimensional tables.
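As a rough illustration of this layout, the sketch below builds one fact table surrounded by three dimension tables in an in-memory SQLite database and then aggregates a measure by a dimension attribute. All table names, column names, and rows are hypothetical and chosen only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (context)
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table (measures) with one foreign key per dimension
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');
INSERT INTO dim_store VALUES (1, 'Downtown');
INSERT INTO dim_time VALUES (1, 'January', 2024), (2, 'February', 2024);
INSERT INTO fact_sales VALUES (1, 1, 1, 10.0), (1, 1, 2, 15.0), (2, 1, 1, 8.0);
""")

# Aggregate the additive measure by an attribute of the Product dimension
totals = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)
```

Every query follows the same pattern: join the fact table to the dimensions it needs, then group by dimension attributes.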
A snowflake schema is harder for business users to work with than a star schema, because they have to deal with more tables in the database.
Creating aggregate tables and joining them to the required dimension tables improves performance by reducing query execution time.
A fact constellation schema can be implemented between aggregated fact tables, or when a complex fact table is decomposed into independent, simpler fact tables.
A conformed dimension is a dimension with a common structure that is shared across multiple fact tables in the data warehouse. Conformed dimensions are used to avoid redundant data in the data warehouse.
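A minimal sketch of the aggregate-table idea, assuming a day-level fact table that is pre-summarized to month level. The tables `dim_month`, `dim_day`, `fact_sales`, and `agg_sales_month` are hypothetical names used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_month (month_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE dim_day   (day_key INTEGER PRIMARY KEY, month_key INTEGER, day TEXT);
CREATE TABLE fact_sales (day_key INTEGER, sales_amount REAL);

INSERT INTO dim_month VALUES (1, 'January'), (2, 'February');
INSERT INTO dim_day VALUES (1, 1, '2024-01-05'), (2, 1, '2024-01-20'), (3, 2, '2024-02-02');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5), (3, 4.0);

-- Aggregate table: one pre-summed row per month instead of one row per sale
CREATE TABLE agg_sales_month AS
SELECT d.month_key AS month_key, SUM(f.sales_amount) AS sales_amount
FROM fact_sales f
JOIN dim_day d ON d.day_key = f.day_key
GROUP BY d.month_key;
""")

# Monthly reports read the small aggregate table joined to its dimension
monthly = conn.execute("""
    SELECT m.month_name, a.sales_amount
    FROM agg_sales_month a
    JOIN dim_month m ON m.month_key = a.month_key
    ORDER BY m.month_key
""").fetchall()
print(monthly)
```

The monthly query never touches the detailed fact table, which is where the saving in execution time comes from on large data volumes.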
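To make the sharing concrete, here is a small sketch in which two fact tables reference the same product dimension, so both can be reported side by side with identical product descriptions. The table names and rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One product dimension, shared by both fact tables
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales     (product_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_inventory (product_key INTEGER, units_on_hand INTEGER);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Mug');
INSERT INTO fact_sales VALUES (1, 30), (1, 20), (2, 5);
INSERT INTO fact_inventory VALUES (1, 100), (2, 40);
""")

# Each fact table is summarized separately and lined up on the shared dimension
report = conn.execute("""
    SELECT p.product_name,
           (SELECT SUM(s.units_sold) FROM fact_sales s
             WHERE s.product_key = p.product_key),
           (SELECT SUM(i.units_on_hand) FROM fact_inventory i
             WHERE i.product_key = p.product_key)
    FROM dim_product p
    ORDER BY p.product_key
""").fetchall()
print(report)
```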
[Table: example rows showing Price values (20, 55; then 15, 55; then 15, 20, 55) alongside Version numbers (1, 1; then 1, 2, 1)]
Junk Dimension
In data warehouse design, we frequently run into a situation where there are yes/no indicator fields in the source system. If we keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance issues. A junk dimension avoids this by combining the indicator fields into a single dimension.
Junk_ind   prepay_ind   coupon_ind
1          Y            Y
2          Y            N
3          N            N
4          N            Y
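One way to populate such a junk dimension is to enumerate every combination of the indicator values and assign each combination a key. The sketch below assumes the two indicators named above with values Y and N; the key numbering follows generation order and is arbitrary, so it need not match the numbering in the table.

```python
from itertools import product

# Assumed indicator fields and their possible values
indicator_values = {
    "prepay_ind": ["Y", "N"],
    "coupon_ind": ["Y", "N"],
}

# One junk-dimension row per combination of indicator values
junk_dimension = [
    {"junk_ind": key, **dict(zip(indicator_values, combo))}
    for key, combo in enumerate(product(*indicator_values.values()), start=1)
]

# Lookup used while loading facts: indicator combination -> junk_ind
lookup = {
    (row["prepay_ind"], row["coupon_ind"]): row["junk_ind"]
    for row in junk_dimension
}

for row in junk_dimension:
    print(row)
```

The fact table then stores a single `junk_ind` column in place of the separate indicator columns.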
Surrogate key
A surrogate key is a primary key generated by the DWH and used to uniquely identify a record in the DWH.
Why a surrogate key is implemented:
1) When data is loaded from multiple sources.
2) When history needs to be maintained and the primary key from the source would violate the primary key constraint in the DWH.
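Both reasons can be illustrated with a small key generator. This is a sketch under assumptions: the class `SurrogateKeyGenerator`, the source names, and the `version` argument used to represent history are all hypothetical.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Assigns a warehouse-generated integer key to each distinct source record."""

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}  # (source_system, natural_key, version) -> surrogate key

    def key_for(self, source_system, natural_key, version=1):
        business_key = (source_system, natural_key, version)
        if business_key not in self._keys:
            self._keys[business_key] = next(self._next_key)
        return self._keys[business_key]

gen = SurrogateKeyGenerator()

# Reason 1: the same natural key 100 arrives from two different sources
key_a = gen.key_for("source_A", 100)
key_b = gen.key_for("source_B", 100)

# Reason 2: a new historical version of the same source record gets its own key
key_a_v2 = gen.key_for("source_A", 100, version=2)

print(key_a, key_b, key_a_v2)
```

Asking again for a record that was already seen returns the key assigned the first time, so repeated loads stay consistent.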
Data Mart
A data warehouse incorporates information about many subject areas, often the entire enterprise, while a data mart focuses on one or more subject areas. The data mart represents only a portion of an enterprise's data, perhaps data related to a business unit or work group.
Typically, a data mart's data is targeted to a smaller audience of end users or used to present information on a smaller scope.
Thank You