DATA WAREHOUSING – DIMENSIONAL MODELLING AND SCHEMAS With MIKE –AARONE ATUHE Handout 5.


1 DATA WAREHOUSING – DIMENSIONAL MODELLING AND SCHEMAS With MIKE –AARONE ATUHE Handout 5

2 CHAPTER OBJECTIVES Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast it with entity-relationship modeling Review the basics of the STAR schema Find out what is inside the fact table and inside the dimension tables Determine the advantages of the STAR schema for data warehouses

3 BOOKS TO CONSIDER Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, by Paulraj Ponniah. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, 2nd edition, by Ralph Kimball and Margy Ross.

4 DIMENSIONAL MODELING VOCABULARY 1. Dimensional modeling (DM) is the name of a set of techniques and concepts used in data warehouse design. Dimensional modeling does not necessarily involve a relational database. It is widely accepted as the preferred technique for presenting analytic data because it addresses two simultaneous requirements: ■ Deliver data that’s understandable to the business users. ■ Deliver fast query performance.

5 Dimensional modeling always uses the concepts of facts (measures) and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that give the facts their context. For example, sales amount is a fact; timestamp, product, register#, store#, etc. are elements of dimensions. Dimensional models are built by business process area, e.g. store sales, inventory, claims, etc.

6 2. FACT TABLE A fact table is the primary table in a dimensional model, where the numerical performance measurements of the business are stored. We use the term fact to represent a business measure. We can imagine standing in the marketplace watching products being sold and writing down the quantity sold and dollar sales amount each day for each product in each store.

7

8 So the above table gives sales activity on a given day in a given store for a given product. All fact tables have two or more foreign keys, as designated by the FK notation in the Figure above, that connect to the dimension tables’ primary keys. For example, the product key in the fact table always will match a specific product key in the product dimension table. When all the keys in the fact table match their respective primary keys correctly in the corresponding dimension tables, we say that the tables satisfy referential integrity. We access the fact table via the dimension tables joined to it.
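
The referential integrity condition described above can be checked mechanically. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (product_dim, sales_fact), the column names and the sample rows are assumptions made for illustration and are not taken from the handout's figure.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact (
    date_key INTEGER,
    product_key INTEGER,
    store_key INTEGER,
    quantity_sold INTEGER,
    dollar_sales_amount REAL
);
INSERT INTO product_dim VALUES (1, 'Milk'), (2, 'Bread');
INSERT INTO sales_fact VALUES (20240101, 1, 10, 5, 7.50);
INSERT INTO sales_fact VALUES (20240101, 2, 10, 3, 6.00);
INSERT INTO sales_fact VALUES (20240101, 9, 10, 1, 2.00);  -- key 9 has no dimension row
""")

# A fact row whose product_key has no match in the dimension table
# violates referential integrity; a LEFT JOIN exposes such rows.
orphans = conn.execute("""
    SELECT f.product_key
    FROM sales_fact AS f
    LEFT JOIN product_dim AS p ON p.product_key = f.product_key
    WHERE p.product_key IS NULL
""").fetchall()

print(orphans)  # [(9,)]
```

An empty result would mean every product key in the fact table matches a primary key in the product dimension.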

9 DIMENSION TABLES The dimension tables contain the textual descriptors of the business, as illustrated in the Figure below. In a well-designed dimensional model, dimension tables have many columns or attributes. These attributes describe the rows in the dimension table. It is not uncommon for a dimension table to have 50 to 100 attributes.

10

11 BRINGING TOGETHER FACTS AND DIMENSIONS Now that we understand fact and dimension tables, let’s bring the two building blocks together in a dimensional model. As illustrated in the Figure below, the fact table consisting of numeric measurements is joined to a set of dimension tables filled with descriptive attributes. This characteristic starlike structure is often called a star join schema.

12

13

14 DW SCHEMAS A schema is a logical description of the entire database. It includes the name and description of all record types, including all associated data items and aggregates. An operational database typically uses the relational model, whereas a data warehouse uses the star, snowflake and fact constellation schemas. In this chapter we discuss the schemas used in a data warehouse.

15 STAR SCHEMA In a star schema, each dimension is represented by only one dimension table, which contains the set of attributes for that dimension. The following diagram shows the sales data of a company with respect to four dimensions: time, item, branch and location.

16 EXAMPLES OF A STAR SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country

17 MORE EXPLANATION… There is a fact table at the centre. This fact table contains the keys to each of the four dimensions. The fact table also contains the measures, namely dollars sold and units sold. Note: Each dimension has only one dimension table and each table holds a set of attributes. For example, the location dimension table contains the attribute set {location_key, street, city, province_or_state, country}. This constraint may cause data redundancy. For example, the cities "Vancouver" and "Victoria" are both in the Canadian province of British Columbia. The entries for such cities repeat the same values in the attributes province_or_state and country.
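
As a rough illustration of the star schema just described, the sketch below builds the sales fact table and only the location dimension in an in-memory SQLite database, then joins them. The table names (location_dim, sales_fact), the street values and the sales figures are invented for the example; the column names follow the attribute sets given above.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the handout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT
);
CREATE TABLE sales_fact (
    time_key INTEGER, item_key INTEGER, branch_key INTEGER,
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER, dollars_sold REAL
);
-- province_or_state and country repeat for every location in the same
-- province: this is the redundancy the star schema accepts.
INSERT INTO location_dim VALUES
    (1, '100 Main St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, '200 Oak Ave', 'Victoria',  'British Columbia', 'Canada');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 10, 100.0),
    (1, 2, 1, 1,  5,  50.0),
    (2, 1, 1, 2,  4,  40.0);
""")

# One join from the fact table to the dimension answers "sales by city".
rows = conn.execute("""
    SELECT l.province_or_state, l.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.province_or_state, l.city
    ORDER BY l.city
""").fetchall()

print(rows)
# [('British Columbia', 'Vancouver', 150.0), ('British Columbia', 'Victoria', 40.0)]
```

Because every location attribute sits in a single table, the query needs only one join per dimension.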

18 SNOWFLAKE SCHEMA In a snowflake schema, some dimension tables are normalized. The normalization splits the data into additional tables. For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item and supplier tables.

19 EXAMPLE OF SNOWFLAKE SCHEMA
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier (normalized out of item): supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city (normalized out of location): city_key, city, province_or_state, country

20 The item dimension table therefore now contains the attributes item_key, item_name, type, brand and supplier_key. The supplier key links to the supplier dimension table, which contains the attributes supplier_key and supplier_type. Note: Normalization in the snowflake schema reduces redundancy, which makes the dimensions easier to maintain and saves storage space.
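
The same idea can be sketched for the location dimension, which the snowflake example splits into location and city. This is an illustrative sketch only: the _dim table names and all sample rows are assumptions, and just the columns needed for the query are included.

```python
import sqlite3

# Hypothetical sample data for a snowflaked location dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim (
    city_key INTEGER PRIMARY KEY,
    city TEXT, province_or_state TEXT, country TEXT
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street TEXT,
    city_key INTEGER REFERENCES city_dim(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);
-- The city, province and country are stored once, however many
-- locations (streets) point at them.
INSERT INTO city_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO location_dim VALUES (1, '100 Main St', 1), (2, '300 Pine Rd', 1);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 25.0);
""")

# Reaching the city attributes now takes two joins instead of one.
rows = conn.execute("""
    SELECT c.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN city_dim AS c ON c.city_key = l.city_key
    GROUP BY c.city
""").fetchall()

print(rows)  # [('Vancouver', 125.0)]
```

The trade-off is visible in the query: less repeated data in the dimension, at the cost of an extra join.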

21 FACT CONSTELLATION SCHEMA A fact constellation schema has multiple fact tables. It is also known as a galaxy schema. The following diagram shows two fact tables, namely sales and shipping.

22 EXAMPLE OF FACT CONSTELLATION
Sales Fact Table: time_key, item_key, branch_key, location_key (foreign keys); units_sold, dollars_sold, avg_sales (measures)
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location (foreign keys); dollars_cost, units_shipped (measures)
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type

23 The sales fact table is the same as that in the star schema. The shipping fact table has five dimension keys, namely item_key, time_key, shipper_key, from_location and to_location. The shipping fact table also contains two measures, namely dollars_cost and units_shipped. Dimension tables can also be shared between fact tables. For example, the time, item and location dimension tables are shared between the sales and shipping fact tables.
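
A shared dimension lets measures from both fact tables be reported side by side. The sketch below, with assumed table names and invented rows, compares units sold and units shipped per item. Aggregating each fact table before joining is a design choice made here so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

# Hypothetical sample data; only the measure names come from the handout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER, units_sold INTEGER);
CREATE TABLE shipping_fact (item_key INTEGER, units_shipped INTEGER);
INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales_fact VALUES (1, 3), (1, 2), (2, 7);
INSERT INTO shipping_fact VALUES (1, 4), (2, 5), (2, 1);
""")

# Each fact table is summarised to one row per item first, then both
# summaries are joined to the shared item dimension.
rows = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item_dim AS i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) AS sh
      ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()

print(rows)  # [('Laptop', 5, 4), ('Phone', 7, 6)]
```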

24 DIMENSIONAL MODELING PROCESS The dimensional model is built on a star-like schema, with dimensions surrounding the fact table. To build the schema, the following four-step design process is used: 1. Choose the business process 2. Declare the grain 3. Identify the dimensions 4. Identify the facts

25 CHOOSE THE BUSINESS PROCESS The design builds on the actual business process that the data warehouse should cover. Therefore the first step is to describe the business process on which the model is built. This could, for instance, be a sales situation in a retail store. The business process can be described in plain text, in basic Business Process Modeling Notation (BPMN), or with other design guides such as the Unified Modeling Language (UML). Example business processes include raw materials purchasing, orders, shipments, invoicing, inventory, and general ledger. It is important to remember that we’re not referring to an organizational business department or function when we talk about business processes.

26 DECLARE THE GRAIN Declaring the grain means specifying exactly what an individual fact table row represents. The grain conveys the level of detail associated with the fact table measurements. It provides the answer to the question, “How do you describe a single row in the fact table?”

27 EXAMPLE GRAIN DECLARATIONS INCLUDE: An individual line item on a customer’s retail sales ticket as measured by a scanner device. A line item on a bill received from a doctor. An individual boarding pass to get on a flight. The daily inventory level for each product in a warehouse. A monthly snapshot for each bank account.
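
One practical consequence of declaring the grain is that no two fact rows should describe the same thing at that level of detail. The sketch below checks this for the daily inventory example; the column names, the helper grain_violations and the sample rows are all hypothetical.

```python
from collections import Counter

# Assumed grain: one row per day, per product, per warehouse.
GRAIN = ("date_key", "product_key", "warehouse_key")

# Hypothetical fact rows; the third row repeats the grain of the first.
rows = [
    {"date_key": 20240101, "product_key": 1, "warehouse_key": 7, "quantity_on_hand": 40},
    {"date_key": 20240101, "product_key": 2, "warehouse_key": 7, "quantity_on_hand": 15},
    {"date_key": 20240101, "product_key": 1, "warehouse_key": 7, "quantity_on_hand": 38},
]


def grain_violations(fact_rows, grain):
    """Return the grain-key combinations that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in fact_rows)
    return [key for key, n in counts.items() if n > 1]


violations = grain_violations(rows, GRAIN)
print(violations)  # [(20240101, 1, 7)]
```

A non-empty result suggests either duplicate loads or a grain that was declared more coarsely than the data actually is.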

28 IDENTIFY THE DIMENSIONS The third step in the design process is to define the dimensions of the model. Dimensions are the foundation of the fact table: they supply the context in which the fact table's data is collected. Typically dimensions are nouns like date, store, inventory etc. These dimensions are where all the descriptive data is stored. For example, the date dimension could contain data such as year, month and weekday. Examples of common dimensions include date, product, customer, transaction type, and status.

29 IDENTIFY THE FACTS Identify the numeric facts that will populate each fact table row. Facts are determined by answering the question, “What are we measuring?” Business users are keenly interested in analyzing these business process performance measures. This step is closely related to the business users of the system, since this is where they get access to data stored in the data warehouse. Therefore most of the facts in a fact table are numerical, additive figures such as quantity or cost per unit, etc.

30 TYPES OF FACTS Additive : Additive facts are facts that can be summed up through all of the dimensions in the fact table. Semi-Additive : Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Non-Additive : Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

31 ADDITIVE FACTS Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns: (Date Key; Store Key; Product Key; Sales_Amount) The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table -- date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.
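
A small sketch makes the point concrete: summing Sales_Amount along any one dimension gives a meaningful total, and every breakdown adds up to the same grand total. The rows, the COLUMNS mapping and the helper total_by are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (date_key, store_key, product_key, sales_amount)
sales = [
    ("2024-01-01", "S1", "P1", 10.0),
    ("2024-01-01", "S2", "P1", 20.0),
    ("2024-01-02", "S1", "P2", 5.0),
    ("2024-01-02", "S2", "P2", 15.0),
]
COLUMNS = {"date": 0, "store": 1, "product": 2}


def total_by(dimension):
    """Sum sales_amount grouped by the chosen dimension."""
    totals = defaultdict(float)
    for row in sales:
        totals[row[COLUMNS[dimension]]] += row[3]
    return dict(totals)


print(total_by("date"))     # {'2024-01-01': 30.0, '2024-01-02': 20.0}
print(total_by("store"))    # {'S1': 15.0, 'S2': 35.0}
print(total_by("product"))  # {'P1': 30.0, 'P2': 20.0}
```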

32 SEMI ADDITIVE FACTS Say we are a bank with the following fact table: (Date Key; Account Key; Current_Balance; Profit_Margin) The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts.

33 SEMI ADDITIVE FACTS Current_Balance is a semi-additive fact: it makes sense to add balances up across all accounts (what's the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add margins up at either the account level or the day level.
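
The behaviour of a semi-additive fact can be sketched as follows. The balances are invented, and using the closing balance or the average balance as the roll-up over time is a common convention assumed here, not something the handout prescribes.

```python
# Hypothetical fact rows: (date_key, account_key, current_balance)
balances = [
    ("2024-01-01", "A", 100.0),
    ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0),
    ("2024-01-02", "B", 50.0),
]

# Across accounts on a single day, summing is meaningful:
# it is the total money held on that day.
total_on_jan_2 = sum(bal for day, _, bal in balances if day == "2024-01-02")

# Across time for a single account, a plain sum has no business meaning.
# Typical alternatives are the closing balance or the average balance.
account_a = [bal for _, acct, bal in sorted(balances) if acct == "A"]
meaningless_sum = sum(account_a)
closing_balance = account_a[-1]
average_balance = sum(account_a) / len(account_a)

print(total_on_jan_2)   # 170.0
print(meaningless_sum)  # 220.0 (account A never held this amount)
print(closing_balance)  # 120.0
print(average_balance)  # 110.0
```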

34 FACTLESS FACT TABLES Some fact tables don't have any facts at all! They may consist of nothing but keys. An example is a fact table that records an event.

35 FACTLESS FACT TABLES Imagine that you have a modern student tracking system that detects each student attendance event each day. The dimensions surrounding the student attendance event include: Date : one record in this dimension for each day on the calendar Student: one record in this dimension for each student Course: one record in this dimension for each course taught each semester Teacher : one record in this dimension for each teacher Facility : one record in this dimension for each room, laboratory, or athletic field

36 FACTLESS FACT TABLE A factless fact table for recording student attendance on a daily basis at a college. The five dimension tables contain rich descriptions of dates, students, courses, teachers, and facilities. There are no additive, numeric facts.

37 The grain of the fact table above is the individual student attendance event. When the student walks through the door into the lecture, a record is generated. The fact table record, consisting of just the five keys, is a good representation of the student attendance event

38 There is no obvious fact to record each time a student attends a lecture or suits up for physical education. Tangible facts such as the grade for the course don't belong in this fact table. This fact table represents the student attendance process, not the semester grading process or even the midterm exam process.

39 A lot of interesting questions can be asked of this dimensional schema, including: Which classes were the most heavily attended? Which classes were the most consistently attended? Which teachers taught the most students? Which teachers taught classes in facilities belonging to other departments? Which facilities were the most lightly used? What was the average total walking distance of a student in a given day?
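
Because a factless fact table has no measures, questions like these are answered by counting rows. The sketch below uses an assumed attendance_fact table with the five keys described earlier and a handful of invented rows; it answers the first and third questions in the list.

```python
import sqlite3

# Hypothetical attendance events: one row per student attendance event.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance_fact (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    teacher_key INTEGER, facility_key INTEGER
);
INSERT INTO attendance_fact VALUES
    (20240101, 1, 100, 10, 5),
    (20240101, 2, 100, 10, 5),
    (20240101, 1, 200, 11, 6),
    (20240102, 2, 100, 10, 5);
""")

# Which classes were the most heavily attended? Count the event rows.
by_course = conn.execute("""
    SELECT course_key, COUNT(*) AS attendance_events
    FROM attendance_fact
    GROUP BY course_key
    ORDER BY attendance_events DESC
""").fetchall()

# Which teachers taught the most students? Count distinct students.
by_teacher = conn.execute("""
    SELECT teacher_key, COUNT(DISTINCT student_key) AS students
    FROM attendance_fact
    GROUP BY teacher_key
    ORDER BY students DESC
""").fetchall()

print(by_course)   # [(100, 3), (200, 1)]
print(by_teacher)  # [(10, 2), (11, 1)]
```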

40 GROUP WORK

