On-Line Analytic Processing


1 On-Line Analytic Processing
Warehousing Data Cubes

2 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, complex analytic queries. New architectures have been developed to handle analytic queries efficiently.

3 OLTP Most database operations involve On-Line Transaction Processing (OLTP): short, simple, frequent queries and/or modifications, each involving a small number of tuples. Examples: answering queries from a Web interface, recording sales at cash registers, selling airline tickets.

4 OLAP On-Line Analytic Processing (the “A” sometimes stands for “application”) queries:
Few, but complex, queries. They may touch a large amount of data and run for hours. They do not depend on having an absolutely up-to-date database.

5 Example: OLAP Application
Analysts at Wal-Mart look for items whose sales have recently been increasing in some region.
Sales(saledate, item, store, qty)
Items(item, size, color)
Stores(store, city, province)
SELECT item, city, SUM(qty)
FROM Sales NATURAL JOIN Stores
WHERE saledate >= ' '
GROUP BY item, city;
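To make the query concrete, here is a minimal sketch that runs it against an in-memory SQLite database. The sample rows and the cutoff date `2009-01-01` are made up for illustration (the slide leaves the date blank), and SQLite simply stands in for whatever DBMS the warehouse actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales(saledate TEXT, item TEXT, store TEXT, qty INTEGER);
    CREATE TABLE Stores(store TEXT, city TEXT, province TEXT);
    -- Hypothetical sample data, not from the slides.
    INSERT INTO Stores VALUES ('S1', 'Nanjing', 'Jiangsu'),
                              ('S2', 'Suzhou',  'Jiangsu');
    INSERT INTO Sales VALUES ('2009-01-05', 'TV', 'S1', 3),
                             ('2009-02-10', 'TV', 'S1', 2),
                             ('2009-02-11', 'TV', 'S2', 4),
                             ('2008-12-30', 'TV', 'S2', 9),
                             ('2009-03-01', 'PC', 'S2', 1);
""")

cutoff = "2009-01-01"  # assumed cutoff; ISO dates compare correctly as text

# Same shape as the slide's query; NATURAL JOIN matches on the shared
# column `store`. ORDER BY is added only to make the output deterministic.
rows = conn.execute("""
    SELECT item, city, SUM(qty)
    FROM Sales NATURAL JOIN Stores
    WHERE saledate >= ?
    GROUP BY item, city
    ORDER BY item, city
""", (cutoff,)).fetchall()

for item, city, total in rows:
    print(item, city, total)
```

The 2008 sale falls before the cutoff, so it does not contribute to the Suzhou TV total.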

6 Data Warehouse It’s better for OLAP applications to run against a separate copy of the master database. Analysis may involve data from various sources across the enterprise. The data warehouse is the most common form of data integration: copy the sources into a single DB (the warehouse) and try to keep it up-to-date. Usual method: periodic reconstruction of the warehouse, perhaps overnight.

7 Common Architecture Databases at store branches handle OLTP.
Local store databases are copied to a central warehouse overnight. Analysts use the warehouse for OLAP.

8 Star Schemas A star schema is a common organization for data at a warehouse. It consists of:
Fact table: a very large accumulation of facts such as sales. Often “insert-only.”
Dimension tables: smaller, generally static information about the entities involved in the facts.

9 Example Suppose we want to record in a warehouse information about sales of products: the store where the item is sold, the item sold, the customer who bought the item, the time when the item is sold, and the price. The fact table is a relation:
Sales(store, item, customer, timeID, price)

10 Example (cont.) The dimension tables hold information about the store, item, customer and time “dimensions”:
Stores(store, city, province)
Items(item, size, color, manf)
Customers(customer, addr, phone)
Time(timeID, day, week, month, year)
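The example schema can be sketched in SQLite as follows. The column types, the primary and foreign key declarations, and every sample value are assumptions added for illustration; the slides give only the relation names and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Dimension tables: one row per store, item, customer, point in time.
    CREATE TABLE Stores(store TEXT PRIMARY KEY, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT PRIMARY KEY, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE Customers(customer TEXT PRIMARY KEY, addr TEXT, phone TEXT);
    CREATE TABLE "Time"(timeID INTEGER PRIMARY KEY, day INTEGER, week INTEGER,
                        month INTEGER, year INTEGER);
    -- Fact table: each dimension attribute is a foreign key.
    CREATE TABLE Sales(
        store    TEXT    REFERENCES Stores(store),
        item     TEXT    REFERENCES Items(item),
        customer TEXT    REFERENCES Customers(customer),
        timeID   INTEGER REFERENCES "Time"(timeID),
        price    REAL
    );
    -- Hypothetical sample rows.
    INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu');
    INSERT INTO Items VALUES ('TV', '42in', 'black', 'Acme');
    INSERT INTO Customers VALUES ('C1', '1 Main St', '555-0100');
    INSERT INTO "Time" VALUES (1, 5, 1, 1, 2009);
    INSERT INTO Sales VALUES ('Shop-1', 'TV', 'C1', 1, 3000.0);
""")

# A fact that names an unknown store violates the foreign key.
try:
    conn.execute("INSERT INTO Sales VALUES ('Shop-9', 'TV', 'C1', 1, 2500.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Joining the fact table with its dimensions recovers descriptive attributes.
row = conn.execute("""
    SELECT city, color, year, price
    FROM Sales NATURAL JOIN Stores NATURAL JOIN Items NATURAL JOIN "Time"
""").fetchone()
print(fk_rejected, row)
```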

11 Visualization: Star Schema
[Figure: star schema. The fact table Sales sits at the center and holds the dimension attributes and the dependent attribute; the dimension tables Stores, Customers, Items and Time surround it.]

12 Dimension/Dependent Attributes
Two classes of fact-table attributes:
Dimension attributes: each is the key of a dimension table, and hence a foreign key in the fact table.
Dependent attributes: values determined by the dimension attributes of the tuple. More often called “measure” attributes.

13 Example: Dependent Attribute
price is the dependent attribute of our example Sales relation. It is determined by the combination of dimension attributes: store, item, customer and time. Other dependent attributes can also be present, e.g., quantity.

14 Approaches to Implementation
ROLAP = relational OLAP: use a relational DBMS to support star schemas.
MOLAP = multidimensional OLAP: use a specialized multidimensional data structure, e.g., a data cube.
HOLAP = hybrid OLAP: use both of the above.

15 Data Cube OLAP data can be modelled as points in a multidimensional space. The keys of the dimension tables are the dimensions of a hypercube. Example: for the Sales data, the four dimensions are store, item, customer and time. Dependent attributes (e.g., price) appear at points within the multidimensional space.
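One way to picture the cube is as a dense array indexed by dimension values, which is roughly the MOLAP view. The sketch below uses three dimensions (time is omitted to keep it small) and invented sample facts; it is an illustration, not how any particular OLAP engine stores its data.

```python
# Hypothetical dimension members; each gets an integer coordinate.
stores = ["Shop-1", "Shop-2"]
items = ["TV", "PC"]
customers = ["C1", "C2"]
index = [{value: pos for pos, value in enumerate(dim)}
         for dim in (stores, items, customers)]

# Raw-data cube: one cell per (store, item, customer), initially empty (0).
cube = [[[0 for _ in customers] for _ in items] for _ in stores]

# Hypothetical facts: (store, item, customer, price).
facts = [
    ("Shop-1", "TV", "C1", 300),
    ("Shop-1", "TV", "C2", 320),
    ("Shop-2", "PC", "C1", 500),
]

for store, item, customer, price in facts:
    s, i, c = index[0][store], index[1][item], index[2][customer]
    cube[s][i][c] += price  # the dependent attribute lives at the point

# Look up the price at the point (Shop-1, TV, C2).
point = cube[index[0]["Shop-1"]][index[1]["TV"]][index[2]["C2"]]
print(point)
```

Most cells stay empty in realistic data, which is one reason a dense array is only one of several possible layouts.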

16 Visualization: Data Cube
[Figure: data cube with axes item, store and customer; each point inside the cube holds a price.]

17 Data Cube w/ Aggregations
Raw-data cube: the original data in the fact table.
Formal data cube: also includes points that represent aggregations (typically SUM) of the raw data, grouped by every subset of the dimensions. These precomputed aggregations are critical for answering analytic queries quickly.

18 Visualization: Formal Data Cube
[Figure: formal data cube with axes item, store and customer; an extra face holds the SUM of price over all customers.]

19 Tuple w/ Aggregate Components
Think of each dimension as having an additional value *, which stands for “all”. A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s. Example: Sales(‘Shop-1’, ‘TV’, *, *) holds the sum of prices, over all customers and all time, of the TV sets sold at Shop-1.

20 Building Data Cube in SQL
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY CUBE(store, item, customer);
Groups by all 2^3 = 8 subsets of the three dimensions, using NULL for the “*”. In SQL Server: GROUP BY … WITH CUBE.
To store the cube:
CREATE MATERIALIZED VIEW myCube AS … cube-generating statement here …
Lu Chaojun, SJTU
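For readers without a DBMS that supports CUBE, the grouping it performs can be emulated in a few lines of Python. This is a sketch of the semantics only, with invented sample rows; `None` plays the role of NULL (the “*”), and the function name `cube` is ours.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (store, item, customer, price).
sales = [
    ("Shop-1", "TV", "C1", 300),
    ("Shop-1", "TV", "C2", 320),
    ("Shop-1", "PC", "C1", 500),
    ("Shop-2", "TV", "C2", 310),
]

def cube(rows, n_dims):
    """SUM the last column grouped by every subset of the first n_dims columns."""
    totals = defaultdict(int)
    for row in rows:
        *coords, measure = row
        # Each row contributes to 2**n_dims cells, one per subset of kept dims.
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(coords[d] if d in kept else None
                            for d in range(n_dims))
                totals[key] += measure
    return dict(totals)

my_cube = cube(sales, 3)

# Which dimensions are aggregated away ("*") in each cell.
grouping_sets = {tuple(v is None for v in key) for key in my_cube}

print(len(grouping_sets))                # 8 grouping sets
print(my_cube[("Shop-1", "TV", None)])   # Shop-1 TV sales, all customers
print(my_cube[(None, None, None)])       # grand total
```

With the sample rows above, the cell ('Shop-1', 'TV', *) is 620 and the grand total is 1430.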

21 Variant of CUBE ROLLUP operator:
SELECT store, item, customer, SUM(price)
FROM Sales
GROUP BY ROLLUP(store, item, customer);
Groups by 4 subsets of the three dimensions: {store, item, customer}, {store, item}, {store}, {}. In SQL Server: GROUP BY … WITH ROLLUP.
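ROLLUP differs from CUBE in that it keeps only prefixes of the dimension list. A minimal Python emulation, again with invented rows and `None` standing in for NULL, might look like this:

```python
from collections import defaultdict

# Hypothetical fact rows: (store, item, customer, price).
sales = [
    ("Shop-1", "TV", "C1", 300),
    ("Shop-1", "TV", "C2", 320),
    ("Shop-1", "PC", "C1", 500),
    ("Shop-2", "TV", "C2", 310),
]

def rollup(rows, n_dims):
    """SUM the last column grouped by each prefix of the first n_dims columns."""
    totals = defaultdict(int)
    for row in rows:
        *coords, measure = row
        # Keep the first k dimensions, replace the rest by None ("*").
        for k in range(n_dims, -1, -1):
            key = tuple(coords[:k]) + (None,) * (n_dims - k)
            totals[key] += measure
    return dict(totals)

rolled = rollup(sales, 3)
grouping_sets = {tuple(v is None for v in key) for key in rolled}

print(len(grouping_sets))               # 4 grouping sets
print(rolled[("Shop-1", None, None)])   # all Shop-1 sales
print((None, "TV", None) in rolled)     # False: {item} alone is not a prefix
```

The last line shows the practical difference from CUBE: a total per item across all stores is not produced, because {item} is not a prefix of (store, item, customer).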

22 Operations on Cube: Dicing
Dicing and slicing: each dimension is partitioned at some level of granularity. E.g., the “store” dimension may be partitioned by store, by city, or by province; the “time” dimension may be partitioned by day, by week, by month, or by year. A choice of partition for each dimension “dices” the cube. A choice of partition for one dimension generates “slices” of the cube.

23 Example: Slicing/Dicing
SELECT city, color, SUM(price)
FROM (((Sales NATURAL JOIN Stores) NATURAL JOIN Items) NATURAL JOIN Time)
WHERE year = 2009
GROUP BY city, color;
Slices in the time dimension, and dices in the store and item dimensions.
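The slicing/dicing query can be tried out on a toy star schema in SQLite. All rows below are invented, the column types are assumptions, and the time dimension table is quoted as "Time" only as a precaution against keyword clashes in other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Stores(store TEXT, city TEXT, province TEXT);
    CREATE TABLE Items(item TEXT, size TEXT, color TEXT, manf TEXT);
    CREATE TABLE "Time"(timeID INTEGER, day INTEGER, week INTEGER,
                        month INTEGER, year INTEGER);
    CREATE TABLE Sales(store TEXT, item TEXT, customer TEXT,
                       timeID INTEGER, price INTEGER);
    -- Hypothetical sample data.
    INSERT INTO Stores VALUES ('Shop-1', 'Nanjing', 'Jiangsu'),
                              ('Shop-2', 'Suzhou',  'Jiangsu');
    INSERT INTO Items VALUES ('TV', '42in',  'black', 'Acme'),
                             ('PC', 'tower', 'white', 'Acme');
    INSERT INTO "Time" VALUES (1, 30, 52, 12, 2008),
                              (2,  5,  1,  1, 2009),
                              (3,  9,  6,  2, 2009);
    INSERT INTO Sales VALUES ('Shop-1', 'TV', 'C1', 1, 300),
                             ('Shop-1', 'TV', 'C2', 2, 320),
                             ('Shop-1', 'PC', 'C1', 3, 500),
                             ('Shop-2', 'TV', 'C2', 3, 310),
                             ('Shop-2', 'TV', 'C1', 2, 290);
""")

# Slice: keep only year 2009. Dice: partition stores by city, items by color.
# ORDER BY is added only to make the output deterministic.
rows = conn.execute("""
    SELECT city, color, SUM(price)
    FROM Sales NATURAL JOIN Stores NATURAL JOIN Items NATURAL JOIN "Time"
    WHERE year = 2009
    GROUP BY city, color
    ORDER BY city, color
""").fetchall()

for city, color, total in rows:
    print(city, color, total)
```

The single 2008 sale is cut away by the slice; the remaining four sales fall into three (city, color) cells.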

24 Operations on Cube: Roll-up
Aggregate along one or more dimensions, moving from finer granularity to coarser granularity: going up the dimension hierarchy, or reducing the number of dimensions. Example: given sales data for each store, roll it up into aggregated data by city. Or simply omit the store dimension.

25 Operations on Cube: Drill-down
“De-aggregate”: break an aggregate into its constituents, moving from coarser granularity to finer granularity: going down the dimension hierarchy, or adding dimensions. Example: having found that Shop-1 doesn’t sell TVs well, break down its TV sales by size, or by the Time dimension.

26 Example: Roll-up/Drill-down
Qty by store/item:
          TV    PC    Refrige
Shop-1    45    33    30
Shop-2    50    36    42
Shop-3    38    31    40

Roll up store by province. Qty by province/item:
          TV    PC    Refrige
Jiangsu   133   100   112

Drill down store by city. Qty by city/item:
          TV    PC    Refrige
Nanjing   60    36    80
Suzhou    73    74    32
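The roll-up from stores to province can be reproduced with the store-level quantities from the slide. The mapping of all three shops to Jiangsu is an assumption (the slide does not state it, but it matches the province totals), and `roll_up` is a helper name chosen here.

```python
from collections import defaultdict

# Store-level quantities as given on the slide.
qty_by_store = {
    "Shop-1": {"TV": 45, "PC": 33, "Refrige": 30},
    "Shop-2": {"TV": 50, "PC": 36, "Refrige": 42},
    "Shop-3": {"TV": 38, "PC": 31, "Refrige": 40},
}

# Assumed dimension hierarchy: every shop belongs to Jiangsu.
province_of = {"Shop-1": "Jiangsu", "Shop-2": "Jiangsu", "Shop-3": "Jiangsu"}

def roll_up(qty, parent_of):
    """Aggregate quantities from one hierarchy level to its parent level."""
    out = defaultdict(lambda: defaultdict(int))
    for member, by_item in qty.items():
        for item, q in by_item.items():
            out[parent_of[member]][item] += q
    return {parent: dict(by_item) for parent, by_item in out.items()}

qty_by_province = roll_up(qty_by_store, province_of)
print(qty_by_province)

# Rolling up by omitting the store dimension altogether: totals per item.
qty_by_item = defaultdict(int)
for by_item in qty_by_store.values():
    for item, q in by_item.items():
        qty_by_item[item] += q
print(dict(qty_by_item))
```

Drill-down is the reverse direction: it needs the finer-grained data (here `qty_by_store`) to still be available, since the aggregate alone cannot be split back into its parts.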

27 End

