Presentation is loading. Please wait.

Presentation is loading. Please wait.

Competing on Analytics II

Similar presentations


Presentation on theme: "Competing on Analytics II"— Presentation transcript:

1 Competing on Analytics II
BI Tools and Techniques Robert Monroe April 1, 2008

2 Goals Introduce dimensional models (aka star schemas) and why they are useful in data warehousing Explore some common business processes, both inward-facing and outward-facing, that companies choose as a basis for analytic competition Understand the type of data management and warehousing infrastructure needed to support those analytic capabilities

3 ETL Extended Dance Remix: Dimensional Modeling

4 Quick Review: The Three-Layer Data Architecture
Data goes through three common stages during ETL Operational Data transactional data stored in individual systems of record throughout the organization Reconciled Data detailed, current data intended to be the single, authoritative source for all decision support applications Derived Data data that have been selected, formatted, and aggregated for end-user decision support applications Operational Data Reconciled Data Derived Data

5 Quick Review: The ETL Process
Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

6 Quick Review: Typical Data Warehouse Structure
Reconcile Data Derive Data Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

7 Derived Data Operational Data Reconciled Derived Although reconciled data provides a consistent, hiqh-quality collection of enterprise data it is not necessarily in an efficient form for use by BI tools Derived data objectives: Ease of use for decision support applications Fast response to predefined user queries Customized data for particular target audiences Ad-hoc query support Data mining capabilities Characteristics Detailed (mostly periodic) data Aggregated (for summary) Processed Distributed (to data marts)

8 Dimensional Modeling: Facts and Dimensions
a simple database design in which dimensional data are separated from fact or event data. Dimensional models are also sometimes called star schemas. Dimensional models are a common way to represent derived data for informational data stores Well suited to ad-hoc queries and OLAP Poorly suited for transaction processing Commonly used for data warehouse/mart storage model

9 Dimensional Modeling Dimensional modeling is a simple database design pattern in which dimensional data are separated from fact or event data. Dimensional models are also sometimes called star schemas. Dimensional models are a common way to represent derived data for analytic data stores Well suited to ad-hoc queries and OLAP Poorly suited for transaction processing

10 Star Schema Structure Fact tables contain factual or quantitative data
Dimension tables contain descriptions about the subjects of the business Fact tables contain factual or quantitative data Dimension tables are denormalized to maximize performance 1:N relationship between dimension tables and fact tables Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

11 Star Schema Example Dimension tables provides details on stores, products, and time periods Fact table provides statistics for sales broken down by product, period and store dimensions Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

12 Star Schema Example With Data
Product Period Store Sales Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

13 Dimensional Model Benefits
Simple and predictable framework Well suited to ad-hoc analytical queries Relatively straightforward mapping from most transactional systems Dimensional independence Query performance is somewhat independent of dimensions used in the query Straightforward model extensions support evolution

14 Dimension Hierarchies
Dimension tables can capture hierarchies Dimensions use levels to represent hierarchies Each sub-level subdivides the parent level with finer granularity Examples Dimension: Time Period Levels – Year :: Quarter :: Month :: Week :: Day Dimension: Organization Levels – Company :: Division :: Department :: Employee Exercise: Define the levels for a Geography dimension for customer locations

15 Facts and Measures Measures represent the interesting data at the intersection of different dimensions There is a space for a measure at every intersection of every level of every dimension Base facts are stored in the intersections of lowest-level dimensions (either simple or calculated measures) Aggregate or computed values are stored at the intersections of where all of the dimensions are not at the lowest level (aggregate values must be calculated measures)

16 Modeling Hierarchies Dimension tables frequently model hierarchies
Example: Customers dimension stores data about your customers You may sell to several divisions of a single company You want to be able to analyze sales to the individual divisions and also capture “rolled-up” values for the parent company Divisions of ABC Automotive Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

17 Modeling Hierarchies With Denormalized Tables
Hierarchical dimensions are frequently represented with denormalized tables This approach simplifies and speeds queries … at the expense of introducing anomalies Customer_Dimension Parent_Company Customer_Key Name Address Type <null> C000001 ABC Automotive 100 1st St. Dealer C000002 ABC Auto Sales 110 1st St. Sales C000003 ABC Repair 130 1st St. Service C000004 ABC Auto New Sales C000005 ABC Auto Used Sales C000006 Bubba’s House O’ Cars 5432 Maple Ln

18 In-Class Exercise: Star Schema
Form teams of 2-3 people Complete exercise 2, question #1 on handout Build a star schema to store grades at Millenium College

19 Modeling Dates and Time
Dates, times, and time periods are almost always included in BI systems, but they can be tricky to model Decide on the granularity of dates carefully (big effect on size) A common approach to modeling dates as a dimension: Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

20 Multiple Fact Tables It is frequently useful to store more than one type of fact in a single multidimensional database (star schema) This can be handled by using multiple fact tables that share dimensions Example: modeling products sold and products purchased Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

21 Conformed Dimensions When dimensions are shared across multiple fact tables they must be conformed dimensions Conformed dimensions One or more dimension tables associated with two or more fact tables for which the dimension tables have the same business meaning and primary key with each fact table Conformed dimensions allow users to: Query across multiple fact tables Improve consistency of meaning and structure for derived and retrieved information

22 Factless Fact Tables – Tracking Events
“Factless” fact tables store only foreign keys, no facts Factless fact tables allow the tracking of what types of events happened, and under what circumstances they happened Diagram Source: Hoffer, Prescott, McFadden, Modern Database Management, 7th ed.

23 In-Class Exercise 2: Extending The Star Schema
Form 2-3 person teams Add the ability to store new facts for FCE data: Specific facts to store: Aggregate average course eval rating (1-5) for each course section offered Aggregate average instructor eval rating (1-5) for each course section offered

24 ETL Tools – Microsoft SSIS Demo
Image Source: Conchango Consulting


Download ppt "Competing on Analytics II"

Similar presentations


Ads by Google