1 On-Line Analytic Processing Warehousing Data Cubes.



2 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, complex analytic queries. New architectures have been developed to handle analytic queries efficiently.

3 OLTP
Most database operations involve On-Line Transaction Processing (OLTP).
– Short, simple, frequent queries and/or modifications, each involving a small number of tuples.
– Examples: answering queries from a Web interface, recording sales at cash registers, selling airline tickets.

4 OLAP
On-Line Analytic Processing (or A for “application”) queries:
– Few, but complex, queries.
– May query a large amount of data and run for hours.
– Do not depend on having an absolutely up-to-date database.

5 Example: OLAP Application
Analysts at Wal-Mart look for items whose sales have recently been increasing in some region.
Sales(saledate, item, store, qty)
Items(item, size, color)
Stores(store, city, province)
SELECT item, city, SUM(qty)
FROM Sales NATURAL JOIN Stores
WHERE saledate >= ' '
GROUP BY item, city;
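The query above can be tried on a toy database. The following is a minimal sketch using Python's built-in sqlite3 module; the rows are hypothetical, and the cutoff date is an assumed value, since the slide leaves it blank.

```python
import sqlite3

# Hypothetical toy data; table and column names follow the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales(saledate TEXT, item TEXT, store TEXT, qty INTEGER);
    CREATE TABLE Stores(store TEXT PRIMARY KEY, city TEXT, province TEXT);
    INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu'),
                              ('Shop-2', 'Nanjing', 'Jiangsu'),
                              ('Shop-3', 'Suzhou',  'Jiangsu');
    INSERT INTO Sales VALUES ('2009-01-05', 'TV', 'Shop-1', 2),
                             ('2009-02-10', 'TV', 'Shop-2', 3),
                             ('2009-02-11', 'PC', 'Shop-3', 1),
                             ('2008-12-30', 'TV', 'Shop-3', 4);
""")

cutoff = "2009-01-01"  # assumption: the slide does not give the date

# NATURAL JOIN matches on the only shared column, store.
rows = conn.execute(
    """
    SELECT item, city, SUM(qty)
    FROM Sales NATURAL JOIN Stores
    WHERE saledate >= ?
    GROUP BY item, city
    ORDER BY item, city
    """,
    (cutoff,),
).fetchall()

for row in rows:
    print(row)
```

The ORDER BY clause is added only to make the output order deterministic; it is not part of the slide's query.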

6 Data Warehouse
OLAP applications are better run on a separate copy of the master database. Analysis may involve data from various sources across the enterprise. A data warehouse is the most common form of data integration.
– Copy sources into a single DB (the warehouse) and try to keep it up-to-date.
– Usual method: periodic reconstruction of the warehouse, perhaps overnight.

7 Common Architecture Databases at store branches handle OLTP. Local store databases copied to a central warehouse overnight. Analysts use the warehouse for OLAP.

8 Star Schemas
A star schema is a common organization for data at a warehouse. It consists of:
– Fact table: a very large accumulation of facts such as sales. Often “insert-only.”
– Dimension tables: smaller, generally static information about the entities involved in the facts.

9 Example
Suppose we want to record in a warehouse information about sales of products:
– the store where the item is sold
– the item sold
– the customer who bought the item
– the time when the item is sold
– the price
The fact table is a relation: Sales(store, item, customer, timeID, price)

10 Example (cont.)
The dimension tables include information about the store, item, customer and time “dimensions”:
Stores(store, city, province)
Items(item, size, color, manf)
Customers(customer, addr, phone)
Time(timeID, day, week, month, year)
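One way to realize this star schema in a relational DBMS is sketched below with Python's built-in sqlite3 module. The column types, the sample rows and the foreign-key declarations are assumptions added for illustration; the slides give only the relation names and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Four dimension tables and one fact table; each dimension attribute of
# Sales is a foreign key into the matching dimension table.
conn.executescript("""
    CREATE TABLE Stores(store TEXT PRIMARY KEY, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT PRIMARY KEY, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE Customers(customer TEXT PRIMARY KEY, addr TEXT, phone TEXT);
    CREATE TABLE "Time"(timeID INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
                        month INTEGER, year INTEGER);
    CREATE TABLE Sales(
        store    TEXT    REFERENCES Stores(store),
        item     TEXT    REFERENCES Items(item),
        customer TEXT    REFERENCES Customers(customer),
        timeID   INTEGER REFERENCES "Time"(timeID),
        price    REAL    -- the dependent (measure) attribute
    );

    -- Hypothetical sample rows
    INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu');
    INSERT INTO Items VALUES ('TV', '42in', 'black', 'AcmeCo');
    INSERT INTO Customers VALUES ('c1', '1 Example Rd', '555-0100');
    INSERT INTO "Time" VALUES (1, 5, 1, 1, 2009);
    INSERT INTO Sales VALUES ('Shop-1', 'TV', 'c1', 1, 399.0);
""")

# A fact that points at an unknown store violates the foreign key.
try:
    conn.execute("INSERT INTO Sales VALUES ('Shop-9', 'TV', 'c1', 1, 399.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(rejected, fact_count)
```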

11 Visualization: Star Schema
[Figure: the fact table Sales at the center, linked to the dimension tables Items, Time, Customers and Stores. The fact table is labelled with its dimension attributes and its dependent attribute.]

12 Dimension/Dependent Attributes
Two classes of fact-table attributes:
– Dimension attributes: the key of a dimension table, which is a foreign key in the fact table.
– Dependent attributes: a value determined by the dimension attributes of the tuple. More often called “measure” attributes.

13 Example: Dependent Attribute
price is the dependent attribute of our example Sales relation.
– Other dependent attributes can also be present, e.g., quantity.
The price is determined by the combination of dimension attributes: store, item, customer and time.

14 Approaches to Implementation
ROLAP = relational OLAP: use a relational DBMS to support star schemas.
MOLAP = multidimensional OLAP: use a specialized multidimensional data structure.
– e.g., a data cube
HOLAP = hybrid OLAP: use both of the above.

15 Data Cube
OLAP data can be modelled as points in a multidimensional space. The keys of the dimension tables are the dimensions of a hypercube.
– Example: for the Sales data, the four dimensions are store, item, customer and time.
Dependent attributes (e.g., price) appear at the points of this multidimensional space.

16 Visualization: Data Cube
[Figure: a cube with axes store, item and customer; the points hold price values.]

17 Data Cube w/ Aggregations
Raw-data cube: the original data in the fact table.
Formal data cube: also includes points that represent an aggregation (typically SUM) of the raw data, grouped by every subset of the dimensions.
– The aggregations are precomputed.
– This is critical for a fast response to an analytic query.

18 Visualization: Formal Data Cube
[Figure: the cube with axes store, item and customer holding price values, with the aggregate “SUM over all customers” marked.]

19 Tuple w/ Aggregate Components
Think of each dimension as having an additional value *.
– Stands for “all”.
A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s.
Example: Sales(‘Shop-1’, ‘TV’, *, *) holds the sum of prices, over all customers and all time, of the TV sets sold at Shop-1.
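The “*” idea can be sketched in plain Python: every fact contributes its price to each point obtained by replacing any subset of its coordinates with “*”. The sample facts and the helper name build_cube are hypothetical, and only three dimensions (store, item, customer) are used to keep the example small.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (store, item, customer, price)
sales = [
    ("Shop-1", "TV", "c1", 400),
    ("Shop-1", "TV", "c2", 500),
    ("Shop-1", "PC", "c1", 800),
    ("Shop-2", "TV", "c1", 450),
]

ALL = "*"  # the extra "all" value of each dimension


def build_cube(rows, n_dims):
    """Return {coordinates: SUM(measure)} for raw points and all '*' points."""
    cube = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Each dimension either keeps its value or becomes '*', so one
        # fact contributes to 2**n_dims points of the formal cube.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(ALL if hide else d for d, hide in zip(dims, mask))
            cube[key] += measure
    return dict(cube)


cube = build_cube(sales, 3)
print(cube[("Shop-1", "TV", ALL)])  # TV sales at Shop-1 over all customers
print(cube[(ALL, ALL, ALL)])        # grand total
```

Precomputing the dictionary once is what makes a later lookup of any aggregate point a constant-time operation, at the cost of the extra storage.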

20 Building Data Cube in SQL
In SQL:1999:
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY CUBE(store, item, customer);
– Groups by all 2^3 = 8 subsets of the three dimensions.
– Uses NULL for the “*”.
– In SQL Server: GROUP BY … WITH CUBE
To store the cube:
CREATE MATERIALIZED VIEW myCube AS
… cube-generating statement here …
Lu Chaojun, SJTU

21 Variant of CUBE
ROLLUP operator:
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY ROLLUP(store, item, customer);
– Groups by 4 subsets of the three dimensions: {store, item, customer}, {store, item}, {store}, {}.
– In SQL Server: GROUP BY … WITH ROLLUP
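The difference between the two operators lies in which grouping sets they produce. The sketch below only enumerates those sets in Python; it does not run any SQL, and the function names are hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(dims):
    """All subsets of dims, largest first: the groupings of CUBE(dims)."""
    return [subset
            for r in range(len(dims), -1, -1)
            for subset in combinations(dims, r)]


def rollup_grouping_sets(dims):
    """Prefixes of dims, longest first: the groupings of ROLLUP(dims)."""
    return [tuple(dims[:k]) for k in range(len(dims), -1, -1)]


dims = ("store", "item", "customer")
cube_sets = cube_grouping_sets(dims)
rollup_sets = rollup_grouping_sets(dims)

print(len(cube_sets))  # 2**3 subsets
print(rollup_sets)     # 3 + 1 prefixes
```

ROLLUP depends on the order of its arguments, because only prefixes of the list are grouped; CUBE does not.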

22 Operations on Cube: Dicing and Slicing
– Each dimension is partitioned at some level of granularity.
  e.g., the “store” dimension may be partitioned by store, by city or by province; the “time” dimension may be partitioned by day, by week, by month or by year.
– A choice of partition for each dimension “dices” the cube.
– A choice of partition for one dimension generates “slices” of the cube.

23 Example: Slicing/Dicing
SELECT city, color, SUM(price)
FROM (((Sales NATURAL JOIN Stores) NATURAL JOIN Items) NATURAL JOIN Time)
WHERE year = 2009
GROUP BY city, color;
– Slices in the time dimension, and dices in the store and item dimensions.
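A minimal runnable sketch of this slice-and-dice query, using Python's built-in sqlite3 module, follows. The rows are hypothetical, and the time dimension table is named Time, as in the schema given earlier in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stores(store TEXT, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE "Time"(timeID INTEGER, day INTEGER, week INTEGER,
                        month INTEGER, year INTEGER);
    CREATE TABLE Sales(store TEXT, item TEXT, customer TEXT,
                       timeID INTEGER, price REAL);

    -- Hypothetical sample rows
    INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu'),
                              ('Shop-2', 'Suzhou',  'Jiangsu');
    INSERT INTO Items VALUES ('tv1', '42in', 'black',  'AcmeCo'),
                             ('pc1', '15in', 'silver', 'AcmeCo');
    INSERT INTO "Time" VALUES (1, 5, 1, 1, 2009),
                              (2, 30, 52, 12, 2008);
    INSERT INTO Sales VALUES ('Shop-1', 'tv1', 'c1', 1, 400),
                             ('Shop-1', 'pc1', 'c2', 1, 800),
                             ('Shop-1', 'tv1', 'c3', 1, 100),
                             ('Shop-2', 'tv1', 'c1', 1, 450),
                             ('Shop-2', 'tv1', 'c3', 2, 420);
""")

# WHERE year = 2009 picks one slice of the time dimension;
# GROUP BY city, color dices stores by city and items by color.
rows = conn.execute("""
    SELECT city, color, SUM(price)
    FROM (((Sales NATURAL JOIN Stores) NATURAL JOIN Items) NATURAL JOIN "Time")
    WHERE year = 2009
    GROUP BY city, color
    ORDER BY city, color
""").fetchall()

for row in rows:
    print(row)
```

The natural joins are safe here only because the tables share exactly the intended key columns (store, item, timeID); an accidental shared column name would silently change the join condition.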

24 Operations on Cube: Roll-up
Roll-up
– Aggregate along one or more dimensions.
– From finer granularity to coarser granularity: going up the dimension hierarchy, or reducing the number of dimensions.
Example
– Given sales data for each store, roll it up into aggregated data by city.
– Or simply omit the store dimension.

25 Operations on Cube: Drill-down
Drill-down
– “De-aggregate”: break an aggregate into its constituents.
– From coarser granularity to finer granularity: going down the dimension hierarchy, or adding dimensions.
Example: having found that Shop-1 does not sell TVs well, break down its TV sales by size, or by the Time dimension.

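26 Example: Roll-up/Drill-down
[Figure: three quantity tables, each with columns TV, PC and Refrige. “Qty by store/item” has one row per shop. “Roll up store by province” leads to “Qty by province/item”, with the row Jiangsu. “Drill down store by city” leads to “Qty by city/item”, with the rows Nanjing and Suzhou (Suzhou: 737432, cell boundaries lost).]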
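Roll-up and drill-down along the store hierarchy (store, city, province) can be sketched in plain Python. The quantities, the store-to-city mapping and the helper names roll_up and drill_down are all hypothetical; they are not the numbers from the slide.

```python
from collections import defaultdict

# Hypothetical hierarchy: store -> city -> province
store_city = {"Shop-1": "Nanjing", "Shop-2": "Nanjing", "Shop-3": "Suzhou"}
city_province = {"Nanjing": "Jiangsu", "Suzhou": "Jiangsu"}

# Hypothetical quantities by (store, item)
qty_by_store_item = {
    ("Shop-1", "TV"): 3, ("Shop-1", "PC"): 5,
    ("Shop-2", "TV"): 4, ("Shop-2", "PC"): 1,
    ("Shop-3", "TV"): 7, ("Shop-3", "PC"): 2,
}


def roll_up(table, parent_of):
    """Replace the first coordinate by its parent in the hierarchy and re-sum."""
    out = defaultdict(int)
    for (member, item), qty in table.items():
        out[(parent_of[member], item)] += qty
    return dict(out)


def drill_down(table, parent_of, parent):
    """Return the finer-grained rows whose first coordinate rolls up to parent."""
    return {key: qty for key, qty in table.items() if parent_of[key[0]] == parent}


qty_by_city_item = roll_up(qty_by_store_item, store_city)
qty_by_province_item = roll_up(qty_by_city_item, city_province)
nanjing_detail = drill_down(qty_by_store_item, store_city, "Nanjing")

print(qty_by_city_item)
print(qty_by_province_item)
print(nanjing_detail)
```

Drill-down needs the finer-grained table to be available; an aggregate alone cannot be broken back into its constituents.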

End