Presentation on theme: "Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007."— Presentation transcript:
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007
What is OLAP Basic idea: converting data into information that decision makers need Concept to analyze data by multiple dimension in a structure called data cube
History In 1993, E. F. Codd came up with the term online analytical processing (OLAP) and proposed 12 criteria to define an OLAP database online analytical processing (OLAP)online analytical processing (OLAP) the term OLAP seems perfect to describe databases designed to facilitate decision making (analysis) in an organization
Purpose of OLAP To derive summarized information from large volume database To generate automated reports for human view
Why need OLAP over Relational Database I Consistently fast response OLAP obtains a consistently fast response is by prestoring calculated values
Why need OLAP over Relational Database II Metadata-based queries provide analysis functions that are difficult or impossible to express in SQL SQL was developed primarily for transaction systems, not for reporting applications
Why need OLAP over Relational Database III Spreadsheet-style formulas design the data structure with users in mind. Spreadsheets are key components of business management because they are intuitive to create
Step I 1.identify multidimensional data measure attribute (measure some value, can be aggregated upon) (measure some value, can be aggregated upon) dimension attribute (define the dimension and summary of measure attribute) (define the dimension and summary of measure attribute)
(Cont.) Each dimension is typically expressed as a “hierarchy” Hierarchy: Analyst is interested in different level of detail of a dimension
Step II 2.Analyze multidimensional data into cross-tabulation row header: value for one attribute row header: value for one attribute column header: value for another attr. column header: value for another attr. individual cell: value aggregation individual cell: value aggregation
Step III 3.Visualize n-dimensional cube - data cube the word CUBE describe what in the the word CUBE describe what in the relational world would be the integration relational world would be the integration of the fact table with dimension tables of the fact table with dimension tables
Step IV After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube
Step V Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool.
Star Schema Cubes are easily stored in relational databases, using a denormalized data structure called the star schema, developed by Ralph Kimball starts with a central fact table Each row in the central fact table contains some combination of keys that makes it unique. These keys are called dimensions.
Slicing & Dicing Additional Functionality that can be thought of as viewing a slice of the data cube, particularly when values for multiple dimensions are fixed. Slicing/Dicing simply consists of selecting specific values for these attributes, which are then displayed on top of the cross-tab
MOLAP The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis. + : very fast query response time because data is mostly pre-calculated -: practical limit on the size because the time taken to calculate the database and the space required to hold these pre-calculated values
ROLAP The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema. +: more scalable solution -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query
HOLAP a hybrid of ROLAP and MOLAP can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP
DOLAP The previous terms are used to refer to server based OLAP technologies DOLAP (Desktop OLAP) DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops
Conclusion OLAP is a significant improvement over query systems OLAP is an interactive system to show different summaries of multidimensional data by interactively selecting the attributes in a multidimensional data cube
References IBM Redbooks. DB2 Cube Views: A Primer. Durham, NC, USA: IBM, 2003. ebrary collections. San Jose State University. IBM Redbooks. DB2 Cube Views: A Primer. Durham, NC, USA: IBM, 2003. ebrary collections. San Jose State University. Jacobson, Reed, Microsoft® SQL Server™ 2005 Analysis Services Step by Step. Microsoft Press. Berry, Michael J. A. Data Mining Techniques : For Marketing, Sales, and Customer Relationship Management. Hoboken, NJ, USA: John Wiley & Sons, Incorporated, 2004. ebrary collections. San Jose State University..