1. What is the flow of loading data into fact & dimensional tables?
A)
Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables. It also consists of fields with numeric values (the measures).
Dimension table - a table with a unique primary key.
Load - data should first be loaded into the dimension tables. The fact table is then loaded, based on the primary key values in the dimension tables.
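The load order above can be sketched in Python, with sqlite3 standing in for the warehouse database. The table names, the `product_code` natural key and the sample rows are hypothetical. The sketch loads the dimension first, then uses it as a lookup to put the dimension's primary key into each fact row.

```python
import sqlite3

# Hypothetical schema: one dimension (dim_product) and one fact table (fact_sales).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- unique primary key of the dimension
    product_code TEXT UNIQUE NOT NULL,  -- natural key from the source system
    product_name TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),  -- foreign key
    quantity    INTEGER,                                               -- numeric measure
    amount      REAL                                                   -- numeric measure
);
""")

# Hypothetical source extracts.
source_products = [("P1", "Pen"), ("P2", "Pencil")]
source_sales = [("P1", 3, 4.50), ("P2", 10, 5.00), ("P1", 1, 1.50)]

# Step 1: load the dimension table first, so every product gets its primary key.
conn.executemany(
    "INSERT INTO dim_product (product_code, product_name) VALUES (?, ?)",
    source_products,
)

# Step 2: load the fact table, replacing each natural key with the dimension's
# primary key. A KeyError here would mean a fact row has no dimension row yet.
key_lookup = dict(conn.execute("SELECT product_code, product_key FROM dim_product"))
conn.executemany(
    "INSERT INTO fact_sales (product_key, quantity, amount) VALUES (?, ?, ?)",
    [(key_lookup[code], qty, amt) for code, qty, amt in source_sales],
)
conn.commit()

totals = conn.execute("""
    SELECT d.product_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d ON f.product_key = d.product_key
    GROUP BY d.product_name
    ORDER BY d.product_name
""").fetchall()
print(totals)  # [('Pen', 6.0), ('Pencil', 5.0)]
```

Loading in the reverse order would leave the fact rows with no key to point to. This is why the dimension load comes first.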
5.
A)
A primary key is a combination of the UNIQUE and NOT NULL constraints. It can be a collection of key columns, in which case it is called a composite primary key. A partition key is just a part of the primary key. There are several partitioning methods, such as Hash, DB2 and Random. When using Hash partitioning, we specify the partition key.
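The idea behind hash partitioning can be sketched in Python. This is not the hash function of any particular ETL tool. The CRC32 hash, the partition count and the column names are assumptions. The rows have a composite primary key (`order_id`, `line_no`), and only `order_id`, the partition key, decides where a row goes.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical number of parallel partitions

# Rows with a composite primary key (order_id, line_no); order_id is the partition key.
rows = [
    {"order_id": 101, "line_no": 1, "amount": 10.0},
    {"order_id": 101, "line_no": 2, "amount": 5.0},
    {"order_id": 102, "line_no": 1, "amount": 7.5},
    {"order_id": 103, "line_no": 1, "amount": 2.0},
]

def partition_for(partition_key_value, num_partitions=NUM_PARTITIONS):
    """Map a partition key value to a partition number using a deterministic hash."""
    digest = zlib.crc32(str(partition_key_value).encode("utf-8"))
    return digest % num_partitions

partitions = defaultdict(list)
for row in rows:
    # Only the partition key (part of the primary key) is hashed, so all
    # line items of the same order end up in the same partition.
    partitions[partition_for(row["order_id"])].append(row)

for pid in sorted(partitions):
    print(pid, [(r["order_id"], r["line_no"]) for r in partitions[pid]])
```

A stable hash such as CRC32 is used instead of Python's built-in `hash()`. The built-in is randomized per process for strings, so it would not send the same key to the same partition across runs.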
8.
A)
Factless Fact
Additive Fact
Semi-Additive
Non-Additive
Conformed Fact
9.
A)
Conformed dimension:
A dimension table that connects to more than one fact table. We present this same dimension table in both schemas, and we refer to it as a conformed dimension.
Or
If the primary key of one dimension is defined in two fact tables, the dimension is called a conformed dimension.
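The practical benefit of a conformed dimension is "drill-across": two fact tables can be summarized separately and then lined up on the same dimension attribute. The following is a minimal sqlite3 sketch. The `dim_date`, `fact_sales` and `fact_returns` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One date dimension, conformed: both fact tables use its primary key.
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales   (date_key INTEGER REFERENCES dim_date(date_key), amount REAL);
CREATE TABLE fact_returns (date_key INTEGER REFERENCES dim_date(date_key), refund REAL);

INSERT INTO dim_date     VALUES (20240105, '2024-01'), (20240210, '2024-02');
INSERT INTO fact_sales   VALUES (20240105, 100.0), (20240210, 40.0), (20240210, 60.0);
INSERT INTO fact_returns VALUES (20240105, 10.0), (20240210, 5.0);
""")

# Drill-across: summarize each fact table separately by the shared dimension
# attribute, then line the two results up on that attribute.
rows = conn.execute("""
WITH s AS (
    SELECT d.month, SUM(f.amount) AS sales
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month
), r AS (
    SELECT d.month, SUM(f.refund) AS refunds
    FROM fact_returns f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.month
)
SELECT s.month, s.sales, r.refunds
FROM s JOIN r ON s.month = r.month
ORDER BY s.month
""").fetchall()
print(rows)  # [('2024-01', 100.0, 10.0), ('2024-02', 100.0, 5.0)]
```

If each fact table had its own, differently defined date table, the `month` values would not be guaranteed to match, and the final join would be unreliable.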
Conformed fact:
When the definitions of measurements (facts) are highly consistent across fact tables, we call them conformed facts.
Junk dimension:
It is a convenient grouping of random flags and indicators, which gets them out of a fact table and into a useful dimensional framework.
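A junk dimension can be sketched as the combinations of a few low-cardinality flags, each combination given one surrogate key. The fact row then carries a single `junk_key` instead of several flag columns. The flag names and values below are hypothetical. Enumerating every combination is one possible approach; a real junk dimension might hold only the combinations that actually occur.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
flag_values = {
    "payment_type": ["cash", "card"],
    "is_gift_wrapped": ["Y", "N"],
    "order_channel": ["web", "store"],
}

# Build the junk dimension: one row per combination of flag values,
# each with its own surrogate key (starting at 1).
names = list(flag_values)
junk_dim = {
    combo: key
    for key, combo in enumerate(product(*flag_values.values()), start=1)
}

def junk_key(row):
    """Replace a row's individual flags with a single junk-dimension key."""
    return junk_dim[tuple(row[name] for name in names)]

source_row = {"payment_type": "card", "is_gift_wrapped": "N",
              "order_channel": "web", "amount": 25.0}
fact_row = {"junk_key": junk_key(source_row), "amount": source_row["amount"]}
print(len(junk_dim), fact_row)  # 8 {'junk_key': 7, 'amount': 25.0}
```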
Degenerate dimension:
Degenerate dimensions usually occur in line-item-oriented fact table designs. They are normal, expected and useful.
The degenerate dimension key should be the actual production order number, and it should sit in the fact table without a join to anything.
Or
A degenerate dimension is a dimension that has only a single attribute. This dimension is typically represented as a single field in a fact table. Degenerate dimensions are the fastest way to group similar transactions, and they are used when fact tables represent transactional data. They can be used as the primary key (or part of it) for the fact table, but they cannot act as foreign keys.
Or
Degenerate dimension: a column in the key section of the fact table that has no associated dimension table but is used for reporting and analysis is called a degenerate dimension or line item dimension. For example, consider a fact table with customer_id, product_id, branch_id, employee_id, bill_no and date in the key section, and price, quantity and amount in the measure section. In this fact table, bill_no from the key section is a single value with no associated dimension table. Instead of creating a separate dimension table for that single value, we can include it in the fact table to improve performance. So here the column bill_no is a degenerate dimension or line item dimension.
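The bill_no example can be sketched with sqlite3. The column names follow the example above, and the rows are hypothetical. Because bill_no sits directly in the fact table, line items can be grouped into bills without joining to any dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Fact table from the example: bill_no is in the key section but has no dimension table.
conn.execute("""
CREATE TABLE fact_sales (
    customer_id INTEGER, product_id INTEGER, branch_id INTEGER,
    employee_id INTEGER, bill_no TEXT, date TEXT,   -- key section
    price REAL, quantity INTEGER, amount REAL       -- measure section
)""")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 1, 7, "B-001", "2024-03-01", 2.0, 3, 6.0),
        (1, 11, 1, 7, "B-001", "2024-03-01", 5.0, 1, 5.0),
        (2, 10, 1, 8, "B-002", "2024-03-01", 2.0, 2, 4.0),
    ],
)

# Group line items into bills directly on the degenerate dimension: no join needed.
bills = conn.execute("""
    SELECT bill_no, COUNT(*) AS line_items, SUM(amount) AS bill_total
    FROM fact_sales
    GROUP BY bill_no
    ORDER BY bill_no
""").fetchall()
print(bills)  # [('B-001', 2, 11.0), ('B-002', 1, 4.0)]
```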
Time dimension:
It contains a number of useful attributes for describing calendars and navigating them.
A dedicated time dimension is required because SQL date semantics and functions cannot generate several important attributes required for analytical purposes.
Attributes like weekdays, weekends, holidays and fiscal periods cannot be reliably generated by SQL statements alone.
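One common approach is to generate the date dimension once, outside of ad hoc SQL, and load it as an ordinary table. The sketch below makes several assumptions. The holiday list is hypothetical, the fiscal year is assumed to start in April, and the fiscal year is assumed to be labelled by the calendar year in which it ends.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; a real one would come from the business.
HOLIDAYS = {date(2024, 1, 1): "New Year's Day"}
FISCAL_YEAR_START_MONTH = 4  # assumption: fiscal year starts in April

def fiscal_period(d):
    """Return (fiscal_year, fiscal_month); April is fiscal month 1 under this assumption."""
    fiscal_month = (d.month - FISCAL_YEAR_START_MONTH) % 12 + 1
    # Assumption: a fiscal year is labelled by the calendar year in which it ends.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    return fiscal_year, fiscal_month

def build_date_dimension(start, end):
    """Create one dimension row per calendar day from start to end (inclusive)."""
    rows = []
    d = start
    while d <= end:
        fy, fm = fiscal_period(d)
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # e.g. 20240101
            "date": d,
            "day_name": d.strftime("%A"),
            "is_weekend": d.weekday() >= 5,         # Saturday=5, Sunday=6
            "holiday_name": HOLIDAYS.get(d),        # None if not a holiday
            "fiscal_year": fy,
            "fiscal_month": fm,
        })
        d += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2023, 12, 30), date(2024, 1, 2))
for row in dim_date:
    print(row["date_key"], row["day_name"], row["is_weekend"],
          row["holiday_name"], row["fiscal_year"], row["fiscal_month"])
```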
Factless fact table:
Fact tables that do not have any facts are called factless fact tables.
They may consist of keys only. There are two kinds of factless fact tables, and neither has any facts at all.
The first type of factless fact table records an event.
Many event-tracking tables in dimensional data warehouses turn out to be factless.
Ex: a student tracking system that details each student attendance event each day.
The second type of factless fact table is coverage. Coverage tables are frequently needed when a primary fact table in a dimensional DWH is sparse.
Ex: the sales fact table records the sales of products in stores on particular days under each promotion condition, but it has no rows for promoted products that did not sell. A coverage table records which products were on promotion in which stores on which days.
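Both kinds of factless fact table can be sketched with plain Python sets of key tuples. The students, stores, products and days below are hypothetical. The event table is queried by counting rows. The coverage table answers "which promoted products did not sell?", which the sparse sales fact alone cannot answer.

```python
from collections import Counter

# Event-type factless fact: one row per attendance event (student, class, day).
# There is no numeric measure; the row itself records that the event happened.
attendance = {
    ("alice", "math", "2024-09-02"),
    ("bob", "math", "2024-09-02"),
    ("alice", "math", "2024-09-03"),
}

# Questions are answered by counting rows, e.g. days attended per student.
days_attended = Counter(student for student, _subject, _day in attendance)

# Sparse sales fact: only products that actually sold have a row.
sales_units = {("store1", "soap", "2024-09-02"): 4}  # (store, product, day) -> units sold

# Coverage-type factless fact: which products were on promotion, where and when.
promotion_coverage = {
    ("store1", "soap", "2024-09-02"),
    ("store1", "shampoo", "2024-09-02"),
}

# Promoted products that did not sell: coverage keys absent from the sales fact.
promoted_not_sold = promotion_coverage - set(sales_units)

print(days_attended["alice"], days_attended["bob"])  # 2 1
print(promoted_not_sold)  # {('store1', 'shampoo', '2024-09-02')}
```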
Types of facts:
Additive: facts that can be summed across all dimensions when deriving summarized data.
Semi-additive: facts that can be summed across some dimensions but not others, typically not across time (e.g. an account balance is meaningful only at a particular point in time).
Non-additive: facts that cannot be summed across any dimension (e.g. ratios and percentages).
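The three types of fact can be contrasted in a few lines of Python. The accounts, days, regions and amounts are hypothetical. Deposits are additive, balances are semi-additive (they can be summed across accounts but not across days), and margin ratios are non-additive (they must be recomputed from their additive components).

```python
# Hypothetical daily rows, in day order: (day, account, deposits, balance).
daily = [
    ("d1", "A", 100.0, 100.0),
    ("d1", "B",  50.0,  50.0),
    ("d2", "A",  20.0, 120.0),
    ("d2", "B",   0.0,  50.0),
]

# Additive: deposits can be summed across every dimension (accounts and days).
total_deposits = sum(r[2] for r in daily)

# Semi-additive: balances can be summed across accounts for a single day...
balance_d2 = sum(r[3] for r in daily if r[0] == "d2")
# ...but not across days (A's 100.0 + 120.0 means nothing), so over time we
# take the closing (last) value instead; this relies on the rows being in day order.
closing_balance_A = [r[3] for r in daily if r[1] == "A"][-1]

# Non-additive: a ratio cannot be summed; recompute it from additive parts.
region_sales = {"east": (200.0, 50.0), "west": (100.0, 10.0)}  # region -> (revenue, profit)
margins = {k: profit / revenue for k, (revenue, profit) in region_sales.items()}
wrong_total_margin = sum(margins.values())  # about 0.35: meaningless
right_total_margin = (sum(p for _, p in region_sales.values())
                      / sum(r for r, _ in region_sales.values()))

print(total_deposits, balance_d2, closing_balance_A, right_total_margin)  # 170.0 170.0 120.0 0.2
```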
What is Data Warehousing?
A subject-oriented, integrated, time-variant, non-volatile collection of data in support of management / business user decisions. It encompasses not just the data in the warehouse, but also the architecture and tools to collect, query and analyze the information.
Subject Oriented: all relevant data about a subject area is gathered and stored as a single set in a useful format.
Integrated: data is stored in a globally accepted fashion, with consistent naming conventions, measurements, physical attributes etc., even when the underlying source systems store the data differently.
Non Volatile: implies the data warehouse is read-only; once loaded, the data is not updated or deleted.
Time Variant: implies data is added as time goes by, time being the most important dimension.
Operational Data: data used to run your business, used by OLTP systems.
Informational Data: created from the wealth of operational data and some external data; useful for analyzing your business.
Operational Data Store (ODS): a staging area where you store and integrate operational data before loading it into the warehouse. It is subject-oriented and volatile, and it holds CURRENT data (not historical). ODS data is used for analysis, covers a span of a few days or months, and is updated every time the underlying detail data changes.
Data Mart: a subset of a warehouse that enables certain targeted business user groups to access functionally departmentalized data (a business-area warehouse). It contains a significantly smaller amount of data, reduces demand on the EDW, is localized, gives faster access and reduces network traffic.
Dependent data mart: built from a warehouse.
Independent data mart: built from independent operational sources.
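A dependent data mart can be sketched as a smaller, department-specific table derived from the warehouse, here using `CREATE TABLE ... AS SELECT` in sqlite3. The `dw_sales` warehouse table, the "toys" department and the rows are hypothetical. An independent data mart would instead be fed directly from operational sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse fact table covering several business areas.
CREATE TABLE dw_sales (region TEXT, department TEXT, sale_date TEXT, amount REAL);
INSERT INTO dw_sales VALUES
    ('east', 'toys',  '2024-01-05', 30.0),
    ('east', 'books', '2024-01-05', 20.0),
    ('west', 'toys',  '2024-01-06', 25.0);

-- Dependent data mart: a smaller, department-specific subset built from the warehouse.
CREATE TABLE mart_toys_sales AS
    SELECT region, sale_date, SUM(amount) AS amount
    FROM dw_sales
    WHERE department = 'toys'
    GROUP BY region, sale_date;
""")

mart_rows = conn.execute("SELECT * FROM mart_toys_sales ORDER BY region").fetchall()
print(mart_rows)  # [('east', '2024-01-05', 30.0), ('west', '2024-01-06', 25.0)]
```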
OLTP vs OLAP
Data content: OLTP - transactional information necessary to run business operations.
Organization of data: OLTP - application specific; OLAP - enterprise wide.
Refresh frequency: OLTP - dynamic.
Data model: OLAP - denormalized or partially denormalized (star schema) to optimize query performance.
Probability of access: OLTP - high; OLAP - moderate to low.
Response time: OLAP - seconds to minutes.
Usage: OLAP - unstructured analytical processing.
Dimensional Modeling
1. Data modeling means structuring and organizing data and these data structures are
typically implemented in a database. In addition to defining and organizing the data,
data modeling will impose (implicitly or explicitly) constraints or limitations on the data
placed within the structure.
2. Dimensional modeling (DM) is the name of a logical design technique often used for data warehouses.
3. It is different from, and contrasts with, entity-relationship (ER) modeling.
4. It is intended to support end-user queries in a data warehouse with good performance: it is easy to understand, needs fewer joins and is denormalized.
5. It is oriented around business-user understandability, as opposed to ER modeling and 3NF, where the goal is to remove redundancy.
6. E.g.: joining hundreds of thousands of rows across several tables will seriously compromise performance.
7. The usage / access path is fixed in OLTP, while in DSS it is ad hoc.
Performance and storage (aggregate explosion) for loading and retrieval of data.
Risks in DW Projects
Definitions of data are inconsistent across user types and upstream data systems during requirements analysis.
There is very little data ownership in the data warehouse, and hence little sanctity of the data.
High dependency on upstream systems: delays in upstream interfaces impact design and development.
Data quality (DQ) issues originating in source systems result in end-user dissatisfaction.
A dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.
Data is modeled as a hypercube, and the schema is a so-called star schema, with a centralized fact table surrounded by smaller dimension tables representing key scientific objects.
Dimensional database systems allow multidimensional data to be modeled natively. Alternatively, it can be modeled using the star schema or the snowflake schema.
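A star schema and its typical query can be sketched in sqlite3. The tables (`fact_sales`, `dim_product`, `dim_store`) and rows are hypothetical. The query needs one join per dimension, followed by an aggregation of the measure by dimension attributes, which is the "fewer joins" shape that dimensional modeling aims for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Centralized fact table surrounded by dimension tables (a star schema).
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales  (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    amount      REAL
);
INSERT INTO dim_product VALUES (1, 'stationery'), (2, 'toys');
INSERT INTO dim_store   VALUES (1, 'Rome'), (2, 'Milan');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 8.0), (1, 1, 2.0);
""")

# A star join: one join from the fact table to each dimension,
# then aggregate the measure by the dimension attributes.
result = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON f.product_key = p.product_key
    JOIN dim_store   s ON f.store_key   = s.store_key
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
print(result)  # [('stationery', 'Milan', 15.0), ('stationery', 'Rome', 12.0), ('toys', 'Rome', 8.0)]
```

In a snowflake schema, `category` would typically move into its own table referenced from `dim_product`, which adds a join to the same query.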
What is the main difference between schemas in an RDBMS and schemas in a data warehouse?
What is the main functional difference between ROLAP, MOLAP and HOLAP?
The FUNCTIONAL difference between these is how the information is stored. In all cases, the users see the data as a cube of dimensions and facts.
ROLAP - detailed data is stored in a relational database in 3NF, star or snowflake form. Queries must summarize data on the fly.
MOLAP - data is stored in multidimensional form, with dimensions and facts stored together. You can think of this as a persistent cube. The level of detail is determined by the intersection of the dimension hierarchies.
HOLAP - data is stored using a combination of relational and multidimensional storage. Summary data might persist as a cube, while detail data is stored relationally, but transitioning between the two is invisible to the end user.
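The storage difference can be sketched in Python. This is a toy model, not how any OLAP server is implemented. `rolap_query` scans detail rows on every request. `molap_query` looks up cells of a cube precomputed once, including "ALL" totals. `holap_query` is a simplified hypothetical hybrid that serves yearly summaries from a stored cube slice and falls back to the relational detail otherwise. All data and names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Detail rows as they might sit in a relational table: (year, region, amount).
detail = [
    (2023, "east", 10.0),
    (2023, "west", 5.0),
    (2024, "east", 7.0),
    (2024, "east", 3.0),
]
ALL = "ALL"  # marker for "all members of this dimension"

def rolap_query(year=None, region=None):
    """ROLAP style: summarize the detail rows on the fly for every query."""
    return sum(a for y, r, a in detail
               if (year is None or y == year) and (region is None or r == region))

# MOLAP style: precompute every cube cell once, including ALL-level totals.
cube = defaultdict(float)
for y, r, a in detail:
    for cell in product((y, ALL), (r, ALL)):
        cube[cell] += a

def molap_query(year=ALL, region=ALL):
    """MOLAP style: a query is a lookup into the persistent cube."""
    return cube.get((year, region), 0.0)

# HOLAP style (simplified): persist only the summary level as a cube.
summary = {cell: v for cell, v in cube.items() if cell[1] == ALL}

def holap_query(year=ALL, region=ALL):
    if region == ALL:
        return summary.get((year, ALL), 0.0)                  # from the summary cube
    return rolap_query(None if year == ALL else year, region)  # from relational detail

print(rolap_query(2024, "east"), molap_query(2024, "east"), molap_query())  # 10.0 10.0 25.0
```

All three functions return the same answers. What differs is where the work happens: at query time (ROLAP), at load time (MOLAP), or split between the two (HOLAP).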