Presentation is loading. Please wait.

Presentation is loading. Please wait.

OLAP CS 543 – Data Warehousing. CS 543 - Data Warehousing (Sp 2007-2008) - Asim LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing.

Similar presentations


Presentation on theme: "OLAP CS 543 – Data Warehousing. CS 543 - Data Warehousing (Sp 2007-2008) - Asim LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing."— Presentation transcript:

1 OLAP CS 543 – Data Warehousing

2 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing. OLAP is a characterization of applications, not a database design technique. Idea is to provide very fast response time in order to facilitate iterative decision-making. Analytical processing requires access to complex aggregations (as opposed to record-level access).

3 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS3 Where Does OLAP Fit In? (2) Information is conceptually viewed as “cubes” for simplifying the way in which users access, view, and analyze data. Quantitative values are known as “facts” or “measures.”  e.g., sales $, units sold, etc. Descriptive categories are known as “dimensions.”  e.g., geography, time, product, scenario (budget or actual), etc.  Dimensions are often organized in hierarchies that represent levels of detail in the data (e.g., UPC, SKU, product subcategory, product category, etc.).

4 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS4 OLAP FASMI Test Fast: Delivers information to the user at a fairly constant rate. Most queries should be delivered to the user in five seconds or less. Analysis: Performs basic numerical and statistical analysis of the data, pre-defined by an application developer or defined ad hoc by the user. Shared: Implements the security requirements necessary for sharing potentially confidential data across a large user population. Multi-dimensional: The essential characteristic of OLAP. Information: Accesses all the data and information necessary and relevant for the application, wherever it may reside and not limited by volume....from the OLAP Report by Pendse and Creeth.

5 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS5 Need for Multidimensional Analysis A simple analysis  How many units of product A did we sell in the store in DHA, Lahore Typically, decision support requires more complex analyses How much revenue did the new product X generate during the last three months, broken down by individual months, in the Southern Region, by individual stores, broken down by the promotions, compared to estimates, and compared to the previous version of the the product?

6 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS6 Kinds of Analyses Roll-ups to provide summaries and aggregates along the hierarchies of the dimensions Drill-downs from the top level to the lowest along the hierarchies of the dimensions Calculations involving facts and metrics Algebraic equations involving key performance indicators Moving averages and growth percentages Trend analyses using statistical methods

7 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS7

8 8

9 9 OLAP? The name On-Line Analytical Processing was coined in a paper by E.F. Codd in 1993 (“Providing On-Line Analytical Processing for User Analysts”) A definition  OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access in a a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user

10 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS10 OLAP Features

11 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS11 Dimensional Analysis (1)

12 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS12 Dimensional Analysis (2)

13 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS13 Some Queries Display the total sales of all products for past five years in all stores Compare total sales for all stores, product by product, between years 2000 and 1999. Show comparison of sales by individual stores, product by product, between years 2000 and 1999 only for those products with reduced sales. Show the results of the previous queries, but rotating the columns with rows

14 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS14 Hypercubes Multi-dimension cubes  Hard to visualize and display beyond three dimensions Multi-dimensional domain structure (MDS)  Represents each dimension as a line showing the values

15 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS15 MDS

16 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS16 Display of Hypercubes

17 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS17

18 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS18

19 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS19 Drill-Down and Roll-Up

20 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS20 Slice-and-Dice or Rotation

21 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS21 OLAP Models/Implementations MOLAP: OLAP implemented with a multi-dimensional database. ROLAP: OLAP implemented with a relational database. HOLAP: OLAP implemented with a hybrid of multi- dimensional and relational database technologies. DOLAP: OLAP implemented for desktop decision support environments.

22 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS22 ROLAP and MOLAP

23 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS23 MOLAP Implementations OLAP has historically been implemented through use of multi- dimensional databases (MDDs). Dimensions are key business factors for analysis:  geographies (zip, state, region,...)  products (item, product category, product department,...)  dates (day, week, month, quarter, year,...) Very high performance via fast look-up into “cube” data structure to retrieve pre-calculated results. “Cube” data structures allow pre-calculation of aggregate results for each possible combination of dimensional values. Use of application programming interface (API) for access via front-end tools.

24 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS24

25 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS25 MOLAP Implementations Need to consider both maintenance and storage implications when designing strategy for when to build cubes. Maintenance Considerations: Every data item received into MDD must be aggregated into every cube (assuming “to-date” summaries are maintained). Storage Considerations: Although cubes get much smaller (e.g., more dense) as dimensions get less detailed (e.g., year vs. day), storage implications for building hundreds of cubes can be significant.

26 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS26 MOLAP Implementations Typically outperform relational database technology because all answers are pre-computed into cubes (and overhead for accessing cubes is very low). Difficult to scale because of combinatorial explosion in the number and size of cubes when dimensions of significant cardinality are required. Beyond tens (sometimes small hundreds) of thousands of entries in a single dimension will break the MOLAP model because the pre-computed cube model does not work well when the cubes are very sparse in the population of individual cells. See www.olapreport.com/DataExplosion.htm

27 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS27 Virtual Cubes Virtual cubes are used when there is a need to join information from two dissimilar cubes that share one or more common dimensions. Similar to a relational view; two (or more) cubes are linked along common dimension(s). Often used to save space by eliminating redundant storage of information.

28 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS28 Partitioned Cubes One logical cube of data can be spread across multiple physical cubes on separate (or same) servers. The divide-and-conquer approach of partitioned cubes helps to mitigate the scalability limitations of a MOLAP environment. Ideal cube partitioning is completely invisible to end users.

29 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS29 ROLAP Implementations Advances in database technologies and front-end tools have begun to allow deployment of OLAP using ANSI SQL RDBMS implementations. ROLAP facilitates deployment of much larger dimension tables than MOLAP implementations. Front-end tools to facilitate GUI access to multi- dimensional analysis capabilities. Aggregate awareness allows exploitation of pre-built summary tables for some front-end tools. Star schema designs are often used to facilitate OLAP against relational databases.

30 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS30

31 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS31 Simplified Third Normal Form (Retail)

32 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS32 Simplified Star Schema

33 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS33 Simplified Star Schema A vastly simplified physical data model! Collapse dimensional hierarchies into a single table for each dimension and create a single fact table from the header and detail records: Fewer tables. Fewer joins to get results.

34 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS34 Star Schema for High Performance Business question: How many $ in raincoats did I sell in the first week of January through stores in Boston? Assume: 4 Billion rows in fact table. 20 different kinds (size, color, style) of raincoats (product category) out of 50,000 UPCs in store. 8 stores out of 400 are in BOSTON SMSA. 2 years of POS history in DBMS.

35 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS35 Star Schema for High Performance Simple (poor performance) approach to query execution: 1. Join item table with filtering on raincoat product category (very selective) to fact table. 2. Join date table with filtering by week (next most selective) to result table. 3. Join store table with filtering on store to result table from step 2. 4. Aggregate.

36 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS36 Star Schema for High Performance Advanced (better performance) approach to query execution: 1. Cartesian product join between dimensional tables. * Result is 20 x 8 x 7 = 1,120 rows. 2. Use composite index on item:store:day into fact table for very selective access. * Access less than 0.00000008 percent of data in fact table! Sophisticated cost-based optimizers will figure this out.

37 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS37 Forcing a Cartesian Product Join Add an addition “join_value” column in each dimensional table. Set join_value to same value in all rows of the dimensional tables. Add additional where clause predicates joining on this column between dimensional tables. NOTE: This shouldn't be necessary with a “smart” optimizer.

38 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS38 Forcing a Cartesian Product Join Sample code: select sum(sales.sales_amt) from d_sales_detail,store,item,period where d_sales_detail.store_id = store.store_id and d_sales_detail.item_id = item.item_id and d_sales_detail.day_dt = period.day_dt and period.day_dt between '23-NOV-2000' and '24- DEC-2000' and item.trade_style_cd = 'BARBIE' and store.state_cd = 'CA' and store.join_value = period.join_value and store.join_value = item.join_value and period.join_value = item.join_value ;

39 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS39 Star Schema for High Performance Problem: What if I want to know raincoat sales in first week of January regardless of store? Answer: Performance advantage of composite index in traditional RDBMS is severely impaired! B-tree indexing techniques do not allow for flexibility in the use of dimensions for query purposes. Bit indexing (and variations thereof) often allows much more generality in achieving high performance from a star schema.

40 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS40 Star Schema for High Performance Bottom Line: It is not at all unusual to obtain an order of magnitude (or more) in performance advantage using a star schema with advanced indexing versus a more traditional relational database implementation. Despite what vendors may tell you, star schemas cannot be effectively implemented for all DSS business applications and/or data models.

41 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS41 ROLAP Relational OLAP often makes heavy use of summary tables to provide near instantaneous access for multi-dimensional queries. Foundation is usually star schema or snowflake database design. Allows OLAP with much larger data sets than multi-dimensional database (MDD) products using cube structures (MOLAP).

42 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS42 ROLAP Number of summary tables can get very large if discipline is not enforced... Assume a retail database with the following two dimensions on the fact table... Calendar: Day, Week, Period, Quarter, Year, All Days Geography: Store, Zone, District, Region, All Stores

43 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS43 ROLAP Summary tables in a naive implementation require all combinations of the dimensions at each aggregation level... 30 summary tables!... Add in item, SKU, subcategory, category, and all items...now we are up to 150 pre- aggregates!

44 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS44 ROLAP Summary tables are more of a maintenance issue than a storage issue in most production implementations. Notice that summary tables get much smaller as dimensions get less detailed (e.g., year vs. day). Should plan for double the size of the unsummarized data for ROLAP summaries in most environments. Every detail record that is received into warehouse must aggregate into EVERY summary table (assuming "to-date" summaries are maintained).

45 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS45 ROLAP Warning: Do not assume that dimensions are always simple hierarchies. Example: Items are not just category, subcategory, SKU, and atomic item.... what about trade styles or manufacturer? Now we need summary tables along these lines as well...another 120 summary tables! Calendar vs. accounting period vs. billing cycle can be even worse...

46 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS46 ROLAP Many ROLAP products have devised ways to reduce the number of summary tables: Ability to build summaries on-the-fly as demanded by end- user applications. Ability to aggregate efficiently from subset of the summary tables. Tools exist in some products to assist in DBAs in selecting the "best” aggregations to build. HOLAP (Hybrid OLAP) tools allow co-existence of pre- built cubes alongside relational OLAP structures.

47 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS47 Intelligent Aggregation Selection Maximum performance boost implies lots of disk for every pre-calculation. Minimum performance boost implies no disk with zero pre-calculation. Strategy is to use meta data to heuristically determine optimum set of aggregates from which all other aggregates can be derived.

48 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS48 Aggregate Wizards

49 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS49 Fact Table Aggregates Enhance performance on common queries at coarser granularities. Save space to permit storing more history than possible with finer granularities. Take advantage of need to store other facts (with similar samples) at a particular granularity.

50 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS50 Aggregate Advice Coarser granularity decreases potential cardinality, but usually increases density (e.g., daily summary table is typically twice the size of weekly summary table - not seven times). Strongly consider omitting candidate aggregates where expected cardinality is more than 10% that of next finer granularity stored. Keep the detail for drill down, even if you deploy aggregates for performance.

51 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS51 Bottom Line There are many implementation techniques for delivery of an OLAP environment. Must fully consider the performance, scalability, complexity, and flexibility characteristics when deciding between MOLAP, ROLAP, and HOLAP. Understand your tools and RDBMS!

52 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS52 MOLAP Vs. ROLAP

53 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS53 Implementation Issues Data design and preparation Administration and performance OLAP platforms OLAP tools and products

54 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS54 Data Design and Preparation Characteristics of data  Stores and uses much less data compared to a DW  Data is summarized. You will rarely find data at the lowest levels of detail as in the DW  Data is more flexible for processing and analysis partly because there is much less data to work with  Every instance of the OLAP system in your environment is customized for the purpose that instance serves OLAP data is generally customized Types and levels  Static and dynamic summary data  Permanent and transient detailed data

55 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS55 Administration Administering and managing OLAP systems should be handled with that of the DW environment Some considerations  Expectations on what data would be accessed and how  Selection of the right business dimensions  Selection of the right filters in loading data from the DW  Choosing the aggregation, summarization, and precalculation  Size of the multidimensional database  Access and security privileges  Backup and restore facilities  Drill-through to the data warehouse; drill-through to another OLAP instance

56 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS56 Performance OLAP takes most of the queries that normalling would run against the DW OLAP is designed for complex queries; so it should enhance overall query performance of the DW environment OLAP can precalculate and pre-aggregate data for quick response

57 CS 543 - Data Warehousing (Sp 2007-2008) - Asim Karim @ LUMS57 OLAP Platforms Usually, the data warehouse and OLAP systems reside on the same platform in the start. Later when the data warehouse becmes large and OLAP is a common task, OLAP system is moved to another platform A separate platform is needed when  The size and usage of the DW consumes all resources  Many departmental users desire OLAP capabilities  The stability and performance of OLAP degrades  OLAP tools require a different platform configuration than the DW


Download ppt "OLAP CS 543 – Data Warehousing. CS 543 - Data Warehousing (Sp 2007-2008) - Asim LUMS2 Where Does OLAP Fit In? (1) OLAP = On-line analytical processing."

Similar presentations


Ads by Google