ETL Toolkit
by Ralph Kimball
VSV Training
Chapter 2: ETL Data Structures
Prepared by: Hien Bui
Date: 09/02/2008
area.
Update Strategy. This field indicates how the table is
maintained.
Load Frequency. Reveals how often the table is loaded
or changed by the ETL process.
ETL Job(s). Staging tables are populated or updated via
ETL jobs.
Initial Row Count. The ETL team must estimate how
many rows each table in the staging area initially
contains.
Average Row Length. You must supply the DBA with the
average row length in each staging table.
Grows With. Even though tables are updated on a
scheduled interval, they don't necessarily grow each time
they are touched.
Expected Monthly Rows. This estimate is based on
history and business rules.
Expected Monthly Bytes. Calculated as Average Row
Length times Expected Monthly Rows.
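The fields above can be sketched as one record per staging table. This is only an illustration: the class name, attribute names, and sample values are hypothetical and not taken from the book. The one derived value is Expected Monthly Bytes, computed as Average Row Length times Expected Monthly Rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StagingTableEstimate:
    """One staging-table entry; all names here are illustrative assumptions."""
    table_name: str
    update_strategy: str         # how the table is maintained
    load_frequency: str          # how often the ETL process loads or changes it
    etl_jobs: List[str]          # jobs that populate or update the table
    initial_row_count: int       # estimated rows at first load
    average_row_length: int      # bytes per row, supplied to the DBA
    grows_with: str              # what drives growth of the table
    expected_monthly_rows: int   # estimate from history and business rules

    @property
    def expected_monthly_bytes(self) -> int:
        # Average Row Length times Expected Monthly Rows
        return self.average_row_length * self.expected_monthly_rows


# Hypothetical sample values, for illustration only.
estimate = StagingTableEstimate(
    table_name="stg_customer",
    update_strategy="truncate and reload",
    load_frequency="daily",
    etl_jobs=["load_stg_customer"],
    initial_row_count=500_000,
    average_row_length=250,
    grows_with="new customers",
    expected_monthly_rows=40_000,
)
print(estimate.expected_monthly_bytes)
```

With these sample values, 250 bytes per row times 40,000 rows gives 10,000,000 bytes per month.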
recovery.
Sorting data. Sorting is a prerequisite to virtually
every data integration task.
Filtering. Suppose you need to filter on an attribute
that is not indexed on the source database.
Replacing/substituting text strings.
Aggregation.
Referencing source data.
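A minimal sketch of the first four operations in the list above, assuming the extracted rows are held as delimited text in the staging area. The sample data, column names, and helper functions are hypothetical. Referencing source data is not covered here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract; in practice this would be a file written by the extract step.
RAW = """customer_id,region,amount
3,north,10.50
1,south,7.25
2,north,4.00
4,N/A,1.00
"""


def read_rows(text):
    """Parse delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def sort_rows(rows):
    """Sorting: order rows by numeric customer_id."""
    return sorted(rows, key=lambda r: int(r["customer_id"]))


def substitute(rows, old, new):
    """Replacing text strings: swap one region value for another."""
    return [{**r, "region": new if r["region"] == old else r["region"]}
            for r in rows]


def filter_rows(rows, region):
    """Filtering: no source-side index on the attribute is needed here."""
    return [r for r in rows if r["region"] == region]


def total_by_region(rows):
    """Aggregation: sum the amount column per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += float(r["amount"])
    return dict(totals)


rows = substitute(sort_rows(read_rows(RAW)), "N/A", "unknown")
north_ids = [r["customer_id"] for r in filter_rows(rows, "north")]
totals = total_by_region(rows)
```

Each step works on the staged copy only, so none of it puts load on the source database.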
front room.
Dimensional data models are by far the most
popular data structures for end user querying
and analysis.
This section is a brief introduction to the main
table types in a dimension model.
environment.
People, especially developers, are very
creative when it comes to reusing existing
resources.
2.4.3 Naming Conventions
The data-staging area may contain tables or
elements that are not in the data warehouse
presentation layer and do not have
established naming standards.
Work with the data warehouse team and DBA
group to extend the existing naming
standards so that they cover the special
data-staging tables.
dimensions
Standardizing calculations, creating
conformed key performance indicators (KPIs)
Correcting and coercing data in the data
cleaning routines
2.5 Summary
We have reviewed the primary data structures you