
Star schemas

A star schema consists of fact tables and dimension tables. Fact tables contain the
quantitative or factual data about a business--the information being queried. This information
often consists of numerical, additive measurements, and a fact table can have many columns and
millions or billions of rows. Dimension tables are usually smaller and hold descriptive data that reflects
the dimensions, or attributes, of a business. SQL queries then use joins between fact and
dimension tables and constraints on the data to return selected information.

Fact and dimension tables differ from each other only in their use within a schema. Their
physical structure and the SQL syntax used to create the tables are the same. In a complex
schema, a given table can act as a fact table under some conditions and as a dimension table
under others. The way in which a table is referred to in a query determines whether a table
behaves as a fact table or a dimension table.

Even though they are physically the same type of table, it is important to understand the
difference between fact and dimension tables from a logical point of view. To demonstrate
the difference between fact and dimension tables, consider how an analyst looks at business
performance:

• A salesperson analyzes revenue by customer, product, market, and time period.
• A financial analyst tracks actuals and budgets by line item, product, and time period.
• A marketing person reviews shipments by product, market, and time period.

The facts--what is being analyzed in each case--are revenue, actuals and budgets, and
shipments. These items belong in fact tables. The business dimensions--the "by" items--are
customer, product, market, time period, and line item. These items belong in dimension tables.

For example, a fact table in a sales database, implemented with a star schema, might contain
the sales revenue for the products of the company from each customer in each geographic
market over a period of time. The dimension tables in this database define the customers,
products, markets, and time periods used in the fact table.

A well-designed schema provides dimension tables that allow a user to browse a database to
become familiar with the information in it and then to write queries with constraints so that
only the information that satisfies those constraints is returned from the database.
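
As a minimal sketch of this browse-then-constrain pattern, the following Python example uses the standard sqlite3 module rather than IBM Red Brick Warehouse. The table names, column names, and rows are hypothetical, chosen only to show a fact table joined to two dimension tables.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (not Red Brick).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small and descriptive (hypothetical names).
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE market  (market_id  INTEGER PRIMARY KEY, region TEXT);
    -- Fact table: numeric, additive measurements keyed by the dimensions.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product (product_id),
        market_id  INTEGER REFERENCES market (market_id),
        revenue    REAL
    );
    INSERT INTO product VALUES (1, 'Coffee'), (2, 'Tea');
    INSERT INTO market  VALUES (10, 'West'), (20, 'East');
    INSERT INTO sales   VALUES (1, 10, 100.0), (1, 20, 50.0),
                               (2, 10, 30.0),  (1, 10, 25.0);
""")

# Step 1: browse a dimension table to see which values are available.
regions = [row[0] for row in conn.execute(
    "SELECT DISTINCT region FROM market ORDER BY region")]

# Step 2: join fact and dimension tables, constrained to one region.
revenue_by_product = conn.execute("""
    SELECT p.product_name, SUM(s.revenue)
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    JOIN market  m ON m.market_id  = s.market_id
    WHERE m.region = 'West'
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(regions)
print(revenue_by_product)
```

The constraint is placed on a dimension attribute (region), while the aggregated value comes from the fact table, which is the typical shape of a star-schema query.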

Performance of star schemas

Performance is an important consideration for any schema, particularly with a decision-support
system in which you routinely query large amounts of data. IBM Red Brick
Warehouse supports all schema designs. However, star schemas tend to perform the best in
decision-support applications.

For more information on the performance implications of star schemas, see the Query
Performance Guide.

Terminology

The terms fact table and dimension table represent the roles these objects play in the logical
schema. In terms of the physical database, a fact table is a referencing table. That is, it has
foreign key references to other tables. A dimension table is a referenced table. That is, its
primary key is the target of foreign key references from one or more other tables.

Simple star schemas

Any table that references or is referenced by another table must have a primary key, which is
a column or group of columns whose contents uniquely identify each row. In a simple star
schema, the primary key for the fact table consists of one or more foreign keys. A foreign key
is a column or group of columns in one table whose values are defined by the primary key in
another table. In IBM Red Brick Warehouse, you can use these foreign keys and the primary
keys in the tables that they reference to build STAR indexes, which improve data retrieval
performance.

When a database is created, the SQL statements used to create the tables must designate the
columns that are to form the primary and foreign keys.
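
The following sketch shows one way such key declarations can look, again using Python's sqlite3 module as a stand-in rather than Red Brick syntax. The tables dim1, dim2, and fact, and all of their columns, are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys is turned on; other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dim1 (key1 INTEGER PRIMARY KEY, attr1 TEXT);
    CREATE TABLE dim2 (key2 INTEGER PRIMARY KEY, attr2 TEXT);
    -- The fact table's primary key is made up of its foreign keys.
    CREATE TABLE fact (
        key1  INTEGER NOT NULL REFERENCES dim1 (key1),
        key2  INTEGER NOT NULL REFERENCES dim2 (key2),
        data1 REAL,
        PRIMARY KEY (key1, key2)
    );
    INSERT INTO dim1 VALUES (1, 'a');
    INSERT INTO dim2 VALUES (1, 'x');
    INSERT INTO fact VALUES (1, 1, 9.5);
""")

def try_insert(row):
    """Return True if the fact row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO fact VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# 99 is not defined in dim1, so the foreign key rejects the row.
undefined_key_accepted = try_insert((99, 1, 1.0))
# (1, 1) already exists, so the primary key rejects the row.
duplicate_key_accepted = try_insert((1, 1, 2.0))

print(undefined_key_accepted, duplicate_key_accepted)
```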

The following figure illustrates the relationship of the fact and dimension tables within a
simple star schema with a single fact table and three dimension tables. The fact table has a
primary key composed of three foreign keys, Key1, Key2, and Key3, each of which is the
primary key in a dimension table. Nonkey columns in a fact table are referred to as data
columns. In a dimension table, they are referred to as attributes.

Figure 11. Simple Star Schema

In the figures used to illustrate schemas:

• The items listed within the box under each table name indicate columns in the table.
• Primary key columns are labeled in bold type.
• Foreign key columns are labeled in italic type.
• Columns that are part of the primary key and are also foreign keys are labeled in bold
italic type.
• Foreign key relationships are indicated by lines connecting tables.

Although the primary key value must be unique in each row of a dimension table, that
value can occur multiple times in the foreign key in the fact table--a many-to-one
relationship.

The following figure illustrates a sales database designed as a simple star schema. In the fact
table Sales, the primary key is composed of three foreign keys, Product_id, Period_id, and
Market_id, each of which references a primary key in a dimension table.

Figure 12. Sales Database

Many-to-one relationships exist between the foreign keys in the fact table and the primary
keys they reference in the dimension tables. For example, the Product table defines the
products. Each row in the table represents a distinct product and has a unique product
identifier. That product identifier can occur multiple times in the Sales table representing
sales of that product during each period and in each market.
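
To make the many-to-one relationship concrete, here is a small sketch in Python with sqlite3. The table and key names follow the Sales database described above; the Dollars column, the attribute columns, and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Product_name TEXT);
    CREATE TABLE Period  (Period_id  INTEGER PRIMARY KEY, Month TEXT);
    CREATE TABLE Market  (Market_id  INTEGER PRIMARY KEY, City TEXT);
    CREATE TABLE Sales (
        Product_id INTEGER NOT NULL REFERENCES Product (Product_id),
        Period_id  INTEGER NOT NULL REFERENCES Period (Period_id),
        Market_id  INTEGER NOT NULL REFERENCES Market (Market_id),
        Dollars    REAL,  -- hypothetical data column
        PRIMARY KEY (Product_id, Period_id, Market_id)
    );
    INSERT INTO Product VALUES (1, 'Coffee');
    INSERT INTO Period  VALUES (1, 'Jan'), (2, 'Feb');
    INSERT INTO Market  VALUES (1, 'Boston'), (2, 'Denver');
    -- Product 1 appears once per period and market combination.
    INSERT INTO Sales VALUES (1, 1, 1, 10.0), (1, 1, 2, 20.0),
                             (1, 2, 1, 30.0), (1, 2, 2, 40.0);
""")

# One row defines the product; many fact rows reference it.
product_rows = conn.execute(
    "SELECT COUNT(*) FROM Product WHERE Product_id = 1").fetchone()[0]
sales_rows = conn.execute(
    "SELECT COUNT(*) FROM Sales WHERE Product_id = 1").fetchone()[0]

# The full three-column key must still be unique in the fact table.
try:
    conn.execute("INSERT INTO Sales VALUES (1, 1, 1, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(product_rows, sales_rows, duplicate_rejected)
```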

Multiple fact tables

A star schema can contain multiple fact tables. In some cases, multiple fact tables exist
because they contain unrelated facts; for example, invoices and sales. In other cases, they
exist because they improve performance. For example, multiple fact tables are often used to
hold various levels of aggregated (summary) data, particularly when the amount of
aggregation is large; for example, daily sales, monthly sales, and yearly sales.
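
The aggregation case can be sketched as follows, assuming hypothetical daily_sales and monthly_sales tables in SQLite. The summary fact table is computed once from the detailed one, so that month-level queries read far fewer rows; how and when such a table is refreshed is outside the scope of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Detailed fact table (hypothetical); dates are stored as ISO text.
    CREATE TABLE daily_sales (sale_date TEXT, product_id INTEGER, dollars REAL);
    INSERT INTO daily_sales VALUES
        ('2024-01-05', 1, 10.0), ('2024-01-20', 1, 15.0),
        ('2024-02-03', 1, 7.0),  ('2024-01-09', 2, 4.0);

    -- Summary fact table, precomputed so queries need not re-aggregate.
    CREATE TABLE monthly_sales AS
        SELECT substr(sale_date, 1, 7) AS sale_month,
               product_id,
               SUM(dollars) AS dollars
        FROM daily_sales
        GROUP BY substr(sale_date, 1, 7), product_id;
""")

monthly = conn.execute(
    "SELECT sale_month, product_id, dollars FROM monthly_sales "
    "ORDER BY sale_month, product_id").fetchall()

print(monthly)
```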

The following figure illustrates the Sales database with an additional fact table for sales from
the previous year.

Figure 13. Sales database with additional fact table


Another use of a referencing table is to define a many-to-many relationship between some
dimensions of the business. This type of table is often known as a cross-reference or
associative table. For example, in the Sales database, each product belongs to one or more
groups, and each group contains multiple products, a many-to-many relationship that is
modeled by establishing a referencing table that defines the possible combinations of
products and groups.

Figure 14. Sales database with cross-reference table
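
A cross-reference table of this kind might be sketched as below, using Python's sqlite3 module. The names Grp and Product_Group, the attribute columns, and the sample rows are hypothetical (Grp is used because GROUP is an SQL keyword).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Product (Product_id INTEGER PRIMARY KEY, Product_name TEXT);
    CREATE TABLE Grp     (Group_id   INTEGER PRIMARY KEY, Group_name TEXT);
    -- Cross-reference table: one row per valid product/group combination.
    CREATE TABLE Product_Group (
        Product_id INTEGER NOT NULL REFERENCES Product (Product_id),
        Group_id   INTEGER NOT NULL REFERENCES Grp (Group_id),
        PRIMARY KEY (Product_id, Group_id)
    );
    INSERT INTO Product VALUES (1, 'Espresso'), (2, 'Green tea');
    INSERT INTO Grp VALUES (10, 'Hot drinks'), (20, 'Caffeinated');
    INSERT INTO Product_Group VALUES (1, 10), (1, 20), (2, 10);
""")

def groups_of(product_id):
    """Names of all groups that contain the given product."""
    rows = conn.execute("""
        SELECT g.Group_name FROM Product_Group pg
        JOIN Grp g ON g.Group_id = pg.Group_id
        WHERE pg.Product_id = ? ORDER BY g.Group_name""", (product_id,))
    return [r[0] for r in rows]

def products_in(group_id):
    """Names of all products that belong to the given group."""
    rows = conn.execute("""
        SELECT p.Product_name FROM Product_Group pg
        JOIN Product p ON p.Product_id = pg.Product_id
        WHERE pg.Group_id = ? ORDER BY p.Product_name""", (group_id,))
    return [r[0] for r in rows]

print(groups_of(1))    # one product, several groups
print(products_in(10)) # one group, several products
```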

Multicolumn foreign key

Another way to define a many-to-many relationship is to have a dimension table with a
multicolumn primary key that is a foreign key reference from a fact table. For example, in the
Sales database, each product belongs to one or more groups, and each group contains
multiple products, a many-to-many relationship. This relationship is modeled by defining a
multicolumn foreign key in the Sales_Current table that references the Product table, as in the
following example.

Figure 15. Sales database with multicolumn foreign key

In the preceding figure, the Product_id and Group_id columns are the two-column primary
key of the Product table and are a two-column foreign key reference from the Sales_Current
table.
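
The two-column key described above could be declared roughly as follows. This is a sketch in SQLite through Python, not Red Brick syntax; the Period_id, Dollars, and Product_name columns and the sample rows are assumptions, and the primary key of Sales_Current is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Each row of Product is one product/group combination.
    CREATE TABLE Product (
        Product_id   INTEGER NOT NULL,
        Group_id     INTEGER NOT NULL,
        Product_name TEXT,
        PRIMARY KEY (Product_id, Group_id)
    );
    CREATE TABLE Sales_Current (
        Product_id INTEGER NOT NULL,
        Group_id   INTEGER NOT NULL,
        Period_id  INTEGER NOT NULL,
        Dollars    REAL,
        -- Both columns together reference the two-column primary key.
        FOREIGN KEY (Product_id, Group_id)
            REFERENCES Product (Product_id, Group_id)
    );
    INSERT INTO Product VALUES (1, 10, 'Espresso'), (1, 20, 'Espresso');
    INSERT INTO Sales_Current VALUES (1, 10, 1, 5.0), (1, 20, 1, 7.0);
""")

try:
    # Product 1 exists, but the pair (1, 30) is not defined in Product.
    conn.execute("INSERT INTO Sales_Current VALUES (1, 30, 1, 9.0)")
    undefined_pair_rejected = False
except sqlite3.IntegrityError:
    undefined_pair_rejected = True

print(undefined_pair_rejected)
```

The foreign key is checked against the pair of values as a unit, so a known product combined with an undefined group is still rejected.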

Outboard tables

Dimension tables can also contain one or more foreign keys that reference the primary key in
another dimension table. The referenced dimension tables are sometimes referred to as
outboard, outrigger, or secondary dimension tables. The following figure includes two
outboard tables, District and Region, which define the ID codes used in the Market table.

Figure 16. Sales database with outboard tables

In the preceding figure, the Market table, because it is both a referencing and referenced
table, can behave as a fact (referencing) or dimension (referenced) table, depending on how it
is used in a query.
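
The dual role of the Market table can be illustrated with two queries over a reduced, hypothetical version of this schema (SQLite through Python; the attribute columns, the Dollars column, and the rows are assumptions). In the first query Market is the referencing table; in the second it is referenced by Sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Outboard tables define the ID codes used in Market.
    CREATE TABLE Region   (Region_id   INTEGER PRIMARY KEY, Region_name TEXT);
    CREATE TABLE District (District_id INTEGER PRIMARY KEY, District_name TEXT);
    CREATE TABLE Market (
        Market_id   INTEGER PRIMARY KEY,
        District_id INTEGER REFERENCES District (District_id),
        Region_id   INTEGER REFERENCES Region (Region_id)
    );
    CREATE TABLE Sales (
        Market_id INTEGER REFERENCES Market (Market_id),
        Dollars   REAL
    );
    INSERT INTO Region   VALUES (1, 'North'), (2, 'South');
    INSERT INTO District VALUES (1, 'Boston'), (2, 'Atlanta'), (3, 'Miami');
    INSERT INTO Market   VALUES (1, 1, 1), (2, 2, 2), (3, 3, 2);
    INSERT INTO Sales    VALUES (1, 10.0), (2, 20.0), (3, 30.0), (1, 5.0);
""")

# Market as the referencing table: count markets per region.
markets_per_region = conn.execute("""
    SELECT r.Region_name, COUNT(*)
    FROM Market m JOIN Region r ON r.Region_id = m.Region_id
    GROUP BY r.Region_name ORDER BY r.Region_name
""").fetchall()

# Market as a referenced (dimension) table: sales rolled up through it.
dollars_by_region = conn.execute("""
    SELECT r.Region_name, SUM(s.Dollars)
    FROM Sales s
    JOIN Market m ON m.Market_id = s.Market_id
    JOIN Region r ON r.Region_id = m.Region_id
    GROUP BY r.Region_name ORDER BY r.Region_name
""").fetchall()

print(markets_per_region)
print(dollars_by_region)
```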

Multistar schemas

In a simple star schema, the primary key in the fact table is formed by concatenating the
foreign key columns. In some applications, however, the concatenated foreign keys might not
provide a unique identifier for each row in the fact table. These applications require a
multistar schema.

In a multistar schema, the fact table has both a set of foreign keys, which reference dimension
tables, and a primary key, which consists of one or more columns that provide a unique
identifier for each row. The primary key and the foreign keys are not identical in a multistar
schema. This fact distinguishes a multistar schema from a simple star schema.

The following figure illustrates the relationship of the fact and dimension tables within a
multistar schema. In the fact table, the foreign keys are Fkey1, Fkey2, and Fkey3, each of
which is the primary key in a dimension table. Unlike the simple star schema, these columns
do not form the primary key in the fact table. Instead, the two columns Key1 and Key2,
which do not reference any dimension tables, and Fkey1, which does reference a dimension
table, are concatenated to form the primary key. The primary key can consist of any
combination of foreign key and other columns in a multistar schema.

Figure 17. Relationship of fact and dimension tables in multistar schema


The following figure illustrates a retail sales database designed as a multistar schema with
two outboard tables. The fact table Transact records daily sales in a rolling seven-day
database. The primary key for the fact table consists of three columns: Date, Receipt, and
Line_item. These keys together provide the unique identifier for each row. The foreign keys
are the columns for Store_id and SKU_id, which reference the Store and SKU (stock-keeping
unit) dimension tables. Two outboard tables, Class and Subclass, are referenced by the SKU
dimension table.

Figure 18. Multistar schema with two outboard tables

In this database schema, analysts can query the transaction table to obtain information on
sales of each item, sales by store or region, sales by date, or other interesting information.

In a multistar schema, unlike a simple star schema, the same value for the concatenated
foreign key in the fact table can occur in multiple rows, so the concatenated foreign key no
longer uniquely identifies each row. For example, in this case the same store (Store_id) might
have multiple sales of the same item (SKU_id) on the same day (Date). Instead, row
identification is based on the primary key or keys. Each row is uniquely identified by Date,
Receipt, and Line_item.
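
A reduced version of the Transact table can show why the foreign keys alone cannot identify a row here. The sketch below uses SQLite through Python; the key column names come from the description above, while the Amount, City, and Item columns, the sample rows, and the omission of the outboard tables are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Store (Store_id INTEGER PRIMARY KEY, City TEXT);
    CREATE TABLE SKU   (SKU_id   INTEGER PRIMARY KEY, Item TEXT);
    CREATE TABLE Transact (
        Date      TEXT    NOT NULL,
        Receipt   INTEGER NOT NULL,
        Line_item INTEGER NOT NULL,
        Store_id  INTEGER NOT NULL REFERENCES Store (Store_id),
        SKU_id    INTEGER NOT NULL REFERENCES SKU (SKU_id),
        Amount    REAL,  -- hypothetical data column
        -- The primary key is separate from the foreign keys.
        PRIMARY KEY (Date, Receipt, Line_item)
    );
    INSERT INTO Store VALUES (1, 'Boston');
    INSERT INTO SKU   VALUES (500, 'Mug'), (501, 'Kettle');
    -- Two receipts on the same day sell the same item in the same store.
    INSERT INTO Transact VALUES
        ('2024-03-01', 1001, 1, 1, 500, 2.5),
        ('2024-03-01', 1002, 1, 1, 500, 2.5),
        ('2024-03-01', 1002, 2, 1, 501, 4.0);
""")

# The same store, item, and date occur in more than one row.
same_fk_rows = conn.execute("""
    SELECT COUNT(*) FROM Transact
    WHERE Store_id = 1 AND SKU_id = 500 AND Date = '2024-03-01'
""").fetchone()[0]

# Reusing an existing (Date, Receipt, Line_item) value is rejected.
try:
    conn.execute(
        "INSERT INTO Transact VALUES ('2024-03-01', 1001, 1, 1, 501, 1.0)")
    duplicate_pk_rejected = False
except sqlite3.IntegrityError:
    duplicate_pk_rejected = True

print(same_fk_rows, duplicate_pk_rejected)
```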

Views

In some databases, schema design can be simplified by the use of views, which effectively
create a virtual table by selecting a combination of rows and columns from an existing table
or combination of tables. For example, a view that selects employee names and telephone
extensions from an employee database produces a company phone list but does not include
confidential information such as addresses and salaries. A view that selects transactions that
occur within a given time period avoids the need to constrain queries to that time period.
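
Both kinds of view can be sketched in a few lines, here with SQLite through Python. The employee and transact tables, their columns, the view names, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, extension TEXT, address TEXT, salary REAL);
    INSERT INTO employee VALUES ('Ada', '4101', '1 Main St', 90000.0),
                                ('Bob', '4102', '2 Oak Ave', 80000.0);
    -- Column subset: the phone list exposes no confidential columns.
    CREATE VIEW phone_list AS SELECT name, extension FROM employee;

    CREATE TABLE transact (sale_date TEXT, amount REAL);
    INSERT INTO transact VALUES ('2023-12-30', 5.0), ('2024-01-02', 7.0),
                                ('2024-01-15', 3.0);
    -- Row subset: the time constraint is built into the view.
    CREATE VIEW transact_jan_2024 AS
        SELECT sale_date, amount FROM transact
        WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31';
""")

cursor = conn.execute("SELECT * FROM phone_list ORDER BY name")
phone_columns = [d[0] for d in cursor.description]
phone_rows = cursor.fetchall()

# No date constraint is needed in the query itself.
january_total = conn.execute(
    "SELECT SUM(amount) FROM transact_jan_2024").fetchone()[0]

print(phone_columns, phone_rows, january_total)
```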

Views are useful for a wide variety of purposes, including the following:

• Increasing security
• Simplifying complex tables to give users a view of only what they need
• Simplifying query constraints
• Simplifying administrative tasks, such as granting table authorizations
• Hiding administrative changes from users: the database schema can change design while
the view presented to the user remains the same.
