Data warehouse and OLAP


1 Data warehouse and OLAP
Lecture 1

2 Goals of a Data warehouse
A DW provides access to corporate or organizational data.
The data in a DW is consistent.
The data in a DW can be separated and combined by means of every possible measure in the business.
A DW is not just data, but also a set of tools to query, analyze, and present information.
The DW is the place where we publish used data.
The quality of the data in the DW is a driver of business reengineering.

3 Data Warehousing
[Figure: diagram. Labels: Operational data, Products, Brands, Cost, Suppliers, Mktg, Customer Care, Sales, Clients, Information, Knowledge workers]

4 What is a data warehouse
DW: a database that is maintained separately from an organization’s operational database.
Provides a solid platform of consolidated historical data for analysis.
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.

5 Enterprise DW Architecture
[Figure: architecture diagram. Labels: Operational Environment, OLTP, Legacy, External, Extract, Integrate, Transform, Maintain, Metadata, Data Warehouse, Analysis Environment, OLAP, Reporting, Data Mining]

6 Two types of systems
OLTP (online transaction processing): covers most of the day-to-day operations of an organization (e.g. purchasing, inventory, manufacturing, banking, payroll, registration, accounting).
OLAP (online analytical processing): data analysis and decision making on historical data.

7 OLTP vs OLAP

| Characteristic | Operational processing (OLTP) | Informational processing (OLAP) |
|---|---|---|
| Orientation | Transaction | Analysis |
| User | Clerk, DBA, client | Knowledge worker (manager, analyst) |
| Function | Day-to-day operations | Historical information requirements, decision support |
| DB design | ER-based, application-oriented | Star/snowflake, subject-oriented |
| Data | Current, up-to-date | Historical |
| Summarization | Primitive, highly detailed | Summarized, consolidated |
| View | Detailed, flat relational | Summarized, multidimensional |
| Unit of work | Short, simple transaction | Complex query |
| Access | Read/write | Mostly read |

8 OLTP vs OLAP

| Characteristic | OLTP | OLAP |
|---|---|---|
| Focus | Data in | Information out |
| Operations | Index/hash on primary key | Lots of scans |
| Number of records accessed | Tens | Millions |
| Number of users | Thousands | Hundreds |
| DB size | 1 GB | 100 GB - 1 TB |
| Priority | High performance | High flexibility, end-user autonomy |
| Metric | Transaction throughput | Query throughput, response time |
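The difference in access patterns can be sketched with Python's standard-library sqlite3 module. This is only an illustration: the `orders` table, its columns, and its rows are hypothetical and do not come from the lecture.

```python
import sqlite3

# Hypothetical operational table, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ann", 20.0), (2, "bob", 35.0), (3, "ann", 15.0)],
)

# OLTP-style work: a short read/write transaction that touches one record,
# located through the primary key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (25.0, 1))
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP-style work: a read-only query that scans many records and summarizes them.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```

In a real deployment the two workloads would normally run against separate systems, which is the point of keeping the warehouse apart from the operational database.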

9 Data Warehouse - Subject-Oriented
Organized around major subjects, such as customer, product, sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

10 Data Warehouse - Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources.
When data is moved to the warehouse, it is converted.
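As a minimal sketch of such a conversion, assume two hypothetical source systems that name and encode the same attributes differently. The source names, field names, and units below are invented for illustration.

```python
# Two hypothetical sources describe the same attributes differently:
# the "CRM" source uses cust_no and an amount in cents,
# the "billing" source uses CustomerID and an amount in dollars stored as text.

def from_crm(record):
    """Convert a CRM-style record to the warehouse's naming and units."""
    return {
        "customer_id": record["cust_no"],
        "amount_usd": record["amt_cents"] / 100,
    }

def from_billing(record):
    """Convert a billing-style record to the warehouse's naming and units."""
    return {
        "customer_id": int(record["CustomerID"]),
        "amount_usd": float(record["amount"]),
    }

# After conversion, records from both sources share one schema.
warehouse_rows = [
    from_crm({"cust_no": 7, "amt_cents": 1250}),
    from_billing({"CustomerID": "8", "amount": "19.99"}),
]
```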

11 Data Warehouse – Time Variant
Data warehouse data provides information from a historical perspective (e.g., the past 5-10 years).
Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
The key of operational data, by contrast, may or may not contain a time element.

12 Data Warehouse - Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, or concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and access of data.

13 Relational Database
A collection of tables.
Each table consists of a set of attributes (fields), each of which is assigned a unique name.
Each table stores a large set of tuples (records).

14 Multidimensional Data Model
OLAP tools are based on a multidimensional data model.
Data is viewed in the form of a data cube.
A cube is defined by dimensions and facts.
Dimensions: the entities with respect to which an organization wants to keep records (time, item, branch, location, etc.).

15

16 Multidimensional Data Model
Typically organized around a central theme, such as sales.
The theme is represented by facts: numerical measures.
Examples: dollars_sold (sales amount in dollars), units_sold (number of units sold).
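One way to picture a cube is as a mapping from a combination of dimension values to the facts recorded for that cell. The Python sketch below assumes three dimensions (time, item, location) and uses made-up sales figures. It illustrates the idea only and is not how OLAP engines store cubes.

```python
from collections import defaultdict

# Made-up records: (time, item, location, dollars_sold, units_sold).
records = [
    ("Q1", "TV", "Vancouver", 605.0, 3),
    ("Q1", "phone", "Vancouver", 825.0, 14),
    ("Q2", "TV", "Toronto", 680.0, 4),
    ("Q2", "TV", "Vancouver", 400.0, 2),
]

# Each cell of the cube is addressed by one value per dimension
# and holds the numerical measures (facts).
cube = defaultdict(lambda: {"dollars_sold": 0.0, "units_sold": 0})
for time, item, location, dollars, units in records:
    cell = cube[(time, item, location)]
    cell["dollars_sold"] += dollars
    cell["units_sold"] += units

# Fixing one dimension (location = "Vancouver") leaves a 2-D table
# over the remaining dimensions (time, item).
vancouver_table = {
    (time, item): cell
    for (time, item, location), cell in cube.items()
    if location == "Vancouver"
}
```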

17 Tables View (2D)

18 3-D view by adding Location dimension

19

20 In a data warehouse the data cube is n-dimensional
Suppose we would like to view the data with an additional fourth dimension, such as supplier.

21

22 Degrees of summarization
We may display any n-D data as a series of (n-1)-D cubes.
We can show the data at different degrees of summarization.
In SQL, the GROUP BY clause is used for this purpose.
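The role of GROUP BY can be sketched with Python's standard-library sqlite3 module. The `sales` table and its rows are hypothetical. The same detailed data is summarized at three levels: by quarter and item, by quarter only, and as a grand total.

```python
import sqlite3

# Hypothetical detailed sales data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (quarter TEXT, item TEXT, location TEXT, dollars_sold REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Q1", "TV", "Vancouver", 600.0),
        ("Q1", "TV", "Toronto", 400.0),
        ("Q1", "phone", "Vancouver", 800.0),
        ("Q2", "TV", "Vancouver", 700.0),
    ],
)

# 2-D summary: location is summed out, quarter and item remain.
by_quarter_item = conn.execute(
    "SELECT quarter, item, SUM(dollars_sold) FROM sales "
    "GROUP BY quarter, item ORDER BY quarter, item"
).fetchall()

# 1-D summary: only quarter remains.
by_quarter = conn.execute(
    "SELECT quarter, SUM(dollars_sold) FROM sales GROUP BY quarter ORDER BY quarter"
).fetchall()

# 0-D summary: a single total over all dimensions.
total = conn.execute("SELECT SUM(dollars_sold) FROM sales").fetchone()[0]
```

Each query drops one more dimension from the GROUP BY list, so each result is a coarser summary of the same facts.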

23 Lattice of cuboids

24 Levels of lattice
The cuboid that holds the lowest level of summarization is called the base cuboid.
The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. In this case it is the total sales summarized over all four dimensions (all).
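The lattice can be enumerated in a few lines of Python. The sketch assumes the four dimensions used in the lecture's example (time, item, location, supplier) and ignores concept hierarchies within a dimension, so each cuboid is simply a subset of the dimensions.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# Every subset of the dimensions names one cuboid (one possible GROUP BY).
# Listing them from 4-D down to 0-D walks the lattice from least to most summarized.
cuboids = [
    combo
    for k in range(len(dimensions), -1, -1)
    for combo in combinations(dimensions, k)
]

base_cuboid = cuboids[0]   # all four dimensions: lowest level of summarization
apex_cuboid = cuboids[-1]  # no dimensions: the grand total ("all")
```

With n dimensions and no hierarchies there are 2^n cuboids, so 16 in this case.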

25 Modeling paradigms
The most popular data models:
Star schema
Snowflake schema
Fact constellation schema

26 Star Schema

27 Star Schema
Fact table: a large central table containing the bulk of the data, with no redundancy
  Big
  Constantly growing
  Stores measures (often aggregated in queries)
Dimension tables: a set of smaller attendant tables, one for each dimension
  Small
  Infrequently updated

28 Star schema
Each dimension is represented by only one table.
Each table contains a set of attributes.
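A star schema can be sketched with Python's standard-library sqlite3 module. All table names, column names, and rows below are hypothetical. The `dim_`/`fact_` prefixes are a common naming habit, not something the lecture prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per dimension, each with its own attributes.
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

    -- Central fact table: one key per dimension plus the measures.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        dollars_sold REAL,
        units_sold INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
    INSERT INTO dim_item VALUES (1, 'TV', 'Acme'), (2, 'phone', 'Acme');
    INSERT INTO dim_location VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 600.0, 3), (1, 2, 1, 800.0, 10), (2, 1, 2, 700.0, 4);
    """
)

# A typical query joins the fact table to a dimension and aggregates a measure.
sales_by_city = conn.execute(
    """
    SELECT l.city, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_key = f.location_key
    GROUP BY l.city
    ORDER BY l.city
    """
).fetchall()
```

Because every dimension is a single table, any dimension attribute is reachable from the fact table with exactly one join.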

29 Snowflake schema
Variant of the star schema with normalized dimension tables.
Saves space.
But involves a lot of joins.
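The trade-off can be sketched with Python's standard-library sqlite3 module. Here a hypothetical location dimension is normalized so that city and country are stored once in a separate `dim_city` table. All table names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized dimension: city attributes are stored once, not per location.
    CREATE TABLE dim_city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE dim_location (
        location_key INTEGER PRIMARY KEY,
        street TEXT,
        city_key INTEGER REFERENCES dim_city(city_key)
    );
    CREATE TABLE fact_sales (
        location_key INTEGER REFERENCES dim_location(location_key),
        dollars_sold REAL
    );

    INSERT INTO dim_city VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada');
    INSERT INTO dim_location VALUES (1, '1 Main St', 1), (2, '9 Oak Ave', 1), (3, '5 King St', 2);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 250.0), (3, 400.0);
    """
)

# Reaching the city name now takes two joins instead of one.
sales_by_city = conn.execute(
    """
    SELECT c.city, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_location AS l ON l.location_key = f.location_key
    JOIN dim_city AS c ON c.city_key = l.city_key
    GROUP BY c.city
    ORDER BY c.city
    """
).fetchall()
```

The space saving comes from not repeating 'Vancouver', 'Canada' on every location row. The cost is the extra join in every query that groups by city.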

30 Snowflake schema

31 Fact Constellation
Multiple fact tables share dimension tables.
A collection of stars, also known as a galaxy schema.
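A minimal sketch with Python's standard-library sqlite3 module, assuming two hypothetical fact tables, `fact_sales` and `fact_shipping`, that share the same time and item dimension tables. All names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension tables.
    CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT);
    CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);

    -- Two fact tables (two "stars") that reference the same dimensions.
    CREATE TABLE fact_sales (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        units_sold INTEGER
    );
    CREATE TABLE fact_shipping (
        time_key INTEGER REFERENCES dim_time(time_key),
        item_key INTEGER REFERENCES dim_item(item_key),
        units_shipped INTEGER
    );

    INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
    INSERT INTO dim_item VALUES (1, 'TV'), (2, 'phone');
    INSERT INTO fact_sales VALUES (1, 1, 3), (2, 1, 4), (1, 2, 10);
    INSERT INTO fact_shipping VALUES (1, 1, 5), (2, 2, 8);
    """
)

# The shared item dimension lets measures from both fact tables line up.
# Each fact table is aggregated in its own subquery so rows are not double counted.
sold_vs_shipped = conn.execute(
    """
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM fact_sales AS s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_shipped) FROM fact_shipping AS sh
             WHERE sh.item_key = i.item_key)
    FROM dim_item AS i
    ORDER BY i.item_name
    """
).fetchall()
```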

32 Fact Constellation

