Tuesday, September 16, 2008

How many facts?

You can have as many fact tables, and as many facts in the fact tables, as you need in your star schema, as long as all facts in a fact table have the same grain.

Sunday, September 14, 2008

sk, the link of fact to dimension

A query on a star schema links its surrogate keys, as seen in the query's WHERE clause.

SELECT order_number, sales_amount, order_quantity
FROM sales_fact sf, order_dim od, customer_dim cd, product_dim pd, date_dim dd
WHERE
sf.order_sk = od.order_sk
AND sf.customer_sk = cd.customer_sk
AND sf.product_sk = pd.product_sk
AND sf.date_sk = dd.date_sk

Friday, September 12, 2008

sk and the source keys

we add a surrogate key in every dimension table, not replacing the keys of the dimension's data source--we need both.

Wednesday, September 10, 2008

dimensions connect to fact and nothing else

In a star schema, the database schema of a dimensional data warehouse, you "connect" dimension tables to one or more fact table. You don't connect dimension table to dimension table.

Tuesday, September 9, 2008

sk for surrogate key

You might have noticed that every dimension table has a colum with _sk suffix on its name. These columns are surrogate keys. These columns (their data) do not come from the source; we generate them in our dimensional data warehouse. Surrogate key values are usually just meaningless sequential positive integers.

Every fact table that uses a dimension have the dimension's surrogate key; the surrogate keys link the fact to the dimension.

Thursday, September 4, 2008

Same grain of facts

In the sales_fact table we have two facts: sales_amount and order_quantity. They're both numeric. The first one is in dollars; the second, a count.

For your reference I'm showing the DDL of the table here:

CREATE TABLE sales_fact (order_sk INT, customer_sk INT, product_sk INT, date_sk INT, sales_amount DEC(10,2), order_quantity INT);

The sales_amount fact is the total dollar amount of daily sales of each order, each customer, and each product, which is the grain of the fact. The other fact, the order_quantity, in the table must have the same grain.

Wednesday, September 3, 2008

Examples of Star Schema

Here's an example of a star schema (in terms of its tables' DDL) that has an sales fact table and its four dimension tables: order, customer, product, and date.

CREATE TABLE sales_fact (order_sk INT, customer_sk INT, product_sk INT, date_sk INT, sales_amount DEC(10,2), order_quantity INT);
CREATE TABLE order_dim (order_sk INT, order_num INT);
CREATE TABLE customer_dim (customer_sk INT, customer_name CHAR(50));
CREATE TABLE product_dim (product_sk INT, product_code INT, product_name CHAR(50));
CREATE TABLE date_dim (date_sk INT, date DATE);

We can add another fact table (production_fact), which makes our schema a two-star schema:

CREATE TABLE production_fact (product_sk INT, factory_sk INT, date_sk INT, production_quantity INT);
CREATE TABLE factory_dim (factory_sk INT, factory_code INT, factory_name CHAR(50));

Note that this 2nd star (its production_fact table) shares the product_dim and date_dim tables from the 1st schema.

Tuesday, September 2, 2008

Star Schema

If you draw a diagram of your dimensional tables, how a fact table is related (surrounded) to its dimension tables, you'll see a star schema. The fact table is the star's centre; the dimension tables, the corners.
Your schema can have multiple stars, one for each of your fact tables.
Note that a dimensional table might be related (shared) to more than one fact table.

Wednesday, August 27, 2008

Two types of tables

Dimensional data warehouse has two types of tables (I’m talking about relational database tables here). The so-called fact table contains fact data, which is WHAT you want to analyze (fact is a.k.a. measure). Fact is numeric. Any measurement can be a fact, e.g. sales amount, count (of anything), percentage (of change), duration (elapsed time), score (soccer match result), grade (as in school), and rank (as in competition).

On the other hand, dimension table contains data that provides you with the HOW you want to analyze the fact(s) in the fact table. For example, if you want to analyze the dollar sales amount by four dimensions: month, product, sales region, and customer segment, you must have Month, Product, Sales Region, Customer Segment dimension tables. Dimension data is typically textual. The Month dimension table, for example, contains “JAN 2008”, “FEB 2008”... rows of data.