Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 On-Line Analytic Processing Warehousing Data Cubes.

Similar presentations


Presentation on theme: "1 On-Line Analytic Processing Warehousing Data Cubes."— Presentation transcript:

1 1 On-Line Analytic Processing Warehousing Data Cubes

2 2 Overview Traditional database systems are tuned to many, small, simple queries. Some new applications use fewer, more time-consuming, complex analytic queries. New architectures have been developed to handle analytic queries efficiently.

3 3 OLTP Most database operations involve On-Line Transaction Processing (OTLP). –Short, simple, frequent queries and/or modifications, each involving a small number of tuples. –Examples Answering queries from a Web interface sales at cash registers selling airline tickets

4 4 OLAP On-Line Analytic Processing (or A for “application”) queries: –Few, but complex queries –May query a large amount of data and run for hours –Do not depend on having an absolutely up-to- date database.

5 Example: OLAP Application Analysts at Wal-Mart look for items with increasing sales in some region recently. Sales(saledate,item,store,qty) Items(item,size,color) Stores(store,city,provice) SELECT item,city,SUM(qty) FROM Sales NATURAL JOIN Stores WHERE saledate >= ‘2009-1-1’ GROUP BY item,city;

6 6 Data Warehouse It’s better for OLAP applications to take place in a separate copy of the master database. Analysis may involve data from various sources across the enterprise. Data warehouse is the most common form of data integration. –Copy sources into a single DB (warehouse) and try to keep it up-to-date. –Usual method: periodic reconstruction of the warehouse, perhaps overnight.

7 7 Common Architecture Databases at store branches handle OLTP. Local store databases copied to a central warehouse overnight. Analysts use the warehouse for OLAP.

8 Star Schemas A star schema is a common organization for data at a warehouse. It consists of: –Fact table : a very large accumulation of facts such as sales. Often “insert-only.” –Dimension tables : smaller, generally static information about the entities involved in the facts.

9 9 Example Suppose we want to record in a warehouse information about sales of products: –the store where the item is sold –the item sold –the customer who bought the item –the time when the item is sold –the price The fact table is a relation: Sales(store, item, customer, timeID, price)

10 10 Example(cont.) The dimension tables include information about stores, items, customers and time “dimensions”: Stores(store,city,province) Items(item, size, color, manf) Customers(customer, addr, phone) Time(timeID, day, week, month, year)

11 Visualization: Star Schema 11 Dimension Table ItemsDimension Table Time Dimension Table CustomersDimension Table Stores Fact Table - Sales Dimension Attrs. Dependent Attr.

12 12 Dimension/Dependent Attributes Two classes of fact-table attributes: –Dimension attributes: the key of a dimension table. Foreign key for fact table. –Dependent attributes: a value determined by the dimension attributes of the tuple. More often called “measure” attributes.

13 13 Example: Dependent Attribute price is the dependent attribute of our example Sales relation. –Other dependent attributes can also be present, e.g., quantity. It is determined by the combination of dimension attributes: store, item, customer and time attributes.

14 14 Approaches to Implementation ROLAP = relational OLAP: Use relational DBMS to support star schemas. MOLAP = multidimensional OLAP: Use a specialized multidimensional data structure. –e.g., data cube HOLAP = hybrid OLAP: Use both the above.

15 15 Data Cube OLAP data can be modelled in a multidimensional space manner. Keys of dimension tables are the dimensions of a hypercube. –Example: for the Sales data, the four dimensions are store, item, customer and time. Dependent attributes (e.g., price) appear as points within the multidimensional space.

16 Visualization: Data Cube 16 price store item customer

17 17 Data Cube w/ Aggregations Raw-data cube: original data in the fact table. Formal data cube: also includes points that represent aggregation (typically SUM) of the raw-data grouped in all subsets of dimensions. –Precomputed aggregations –Critical for fast response upon an analytic query.

18 Visualization: Formal Data Cube 18 price store item customer SUM over all customers

19 Tuple w/ Aggregate Components Think of each dimension as having an additional value *. –Stands for “all”. A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s. Example: Sales(‘Shop-1’, ‘TV’, *, *) holds the sum of prices, over all customers and all time, of the TV sets sold at Shop-1. 19

20 Building Data Cube in SQL In SQL:1999 SELECT store,item,customer,SUM(price) FROM Sales GROUP BY CUBE(store,item,customer); –Group by 2 3 subsets of the three dimensions –Use NULL for the “*” –In SQL Server: GROUP BY … WITH CUBE To store the cube: CREATE MATERIALIZED VIEW myCube AS … cube-generating statement here … Lu Chaojun, SJTU

21 Variant of CUBE ROLLUP operator: SELECT store,item,customer,SUM(price) FROM Sales GROUP BY ROLLUP(store,item,customer); –Group by 4 subsets of the three dimensions: {store, item, customer}, {store, item}, {store}, {}. –In SQL Server: GROUP BY … WITH ROLLUP Lu Chaojun, SJTU

22 Operations on Cube: Dicing Dicing and Slicing –Each dimension is partitioned at some level of granularity. e.g., “store” dimension may be partitioned by store, by city, by province. “time” dimension may be partitioned by day, by week, by month, by year. –A choice of partition for each dimension “dices” the cube. –A choice of partition for one dimension generate a “slices” of the cube. Lu Chaojun, SJTU

23 Example: Slicing/Dicing SELECT city, color, SUM(price) FROM (((Sales NATURAL JOIN Stores) NATURAL JOIN Items) NATURAL JOIN Times) WHERE year = 2009 GROUP BY city, color; –Slice in time dimension, and dice in store and item dimension. Lu Chaojun, SJTU

24 24 Operations on Cube: Roll-up Roll-up –Aggregate along one or more dimensions. –From finer gruanularity to coarser granularity: Going up the dimension hierarchy. Reducing dimensions. Example –Given sales data of each store, roll it up into aggregated data by cities. –Or simply omit the store dimension.

25 25 Operations on Cube: Drill-down Drill-down –“de-aggregate”: break an aggregate into its constituents. –From coarser gruanularity to finer granularity: Going down the dimension hierarchy. Adding dimensions. Example: having found that Shop-1 doesn’t sell TV well, break down its TV sales by particular size, or by the Time dimension.

26 Example: Roll-up/Drill-down 26 TVPCRefrige Shop-1453330 Shop-2503642 Shop-3383140 Qty by store/item Qty by province/item Roll up store by province Qty by city/item Drill down store by city TVPCRefrige Jiangsu133100112 TVPCRefrige Nanjing603680 Suzhou737432

27 End


Download ppt "1 On-Line Analytic Processing Warehousing Data Cubes."

Similar presentations


Ads by Google