Chapter 34 OLAP Transparencies.


Chapter 34 - Objectives The purpose of Online Analytical Processing (OLAP). The relationship between OLAP and data warehousing. The key features of OLAP applications.

Chapter 34 - Objectives How to represent multi-dimensional data. The rules for OLAP tools. The main categories of OLAP tools. OLAP extensions to the SQL standard. How Oracle supports OLAP.

Business Intelligence Technologies Accompanying the growth in data warehousing is an ever-increasing demand by users for more powerful access tools that provide advanced analytical capabilities. There are two main types of access tools available to meet this demand, namely Online Analytical Processing (OLAP) and data mining.

Business Intelligence Technologies OLAP and data mining differ in what they offer the user and because of this they are complementary technologies. An environment that includes a data warehouse (or more commonly one or more data marts) together with tools such as OLAP and/or data mining is collectively referred to as Business Intelligence (BI) technologies.

Online Analytical Processing (OLAP) Original definition - The dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data, Codd (1993). Describes a technology that is designed to optimize the storing and querying of large volumes of multi-dimensional data that is aggregated (summarized) to various levels of detail to support the analysis of this data.

Online Analytical Processing (OLAP) Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data. Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.

Online Analytical Processing (OLAP) Can easily answer ‘who?’ and ‘what?’ questions; however, the ability to answer ‘why?’ type questions distinguishes OLAP from general-purpose query tools. Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.

OLAP Benchmarks OLAP Council published an analytical processing benchmark referred to as the APB-1 (OLAP Council, 1998). Aim is to measure a server’s overall OLAP performance rather than the performance of individual tasks.

OLAP Benchmarks APB-1 assesses the most common business operations including: bulk loading of data from internal or external data sources; incremental loading of data from operational systems; aggregation of input level data along hierarchies;

OLAP Benchmarks APB-1 assesses the most common business operations including (continued): calculation of new data based on business models; time series analysis; queries with a high degree of complexity; drill-down through hierarchies; ad hoc queries; multiple online sessions.

OLAP Benchmarks OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement of supporting effective decision-making. This requirement is more than measuring processing performance but includes its abilities to model complex business relationships and to respond to changing business requirements.

OLAP Benchmarks APB-1 uses a standard benchmark metric called AQM (Analytical Queries per Minute). AQM represents the number of analytical queries processed per minute including data loading and computation time. Thus, the AQM incorporates data loading performance, calculation performance, and query performance into a single metric.

OLAP Benchmarks Publication of APB-1 benchmark results must include both the database schema and all code required for executing the benchmark. An essential requirement of all OLAP applications is the ability to provide users with JIT information, which is necessary to make effective decisions about an organization's strategic directions.

OLAP Applications JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Also as data relationships may not be known in advance, the data model must be flexible.

Examples of OLAP applications in various functional areas

OLAP Applications Although OLAP applications are found in widely divergent functional areas, they all have the following key features: multi-dimensional views of data; support for complex calculations; time intelligence.

OLAP Applications - multi-dimensional views of data Core requirement of building a ‘realistic’ business model. Provides basis for analytical processing through flexible access to corporate data. The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally.

OLAP Applications - support for complex calculations Must provide a range of powerful computational methods such as those required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth. Mechanisms for implementing computational methods should be clear and non-procedural.

OLAP Applications – time intelligence Key feature of almost any analytical application as performance is almost always judged over time. Time hierarchy is not always used in the same manner as other hierarchies. Concepts such as year-to-date and period-over-period comparisons should be easily defined.
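The two time-intelligence concepts named above can be sketched in a few lines of Python. This is only an illustration: the `monthly_sales` figures and the helper names `year_to_date` and `period_over_period` are invented here, and a real OLAP tool would define these calculations declaratively against its time dimension.

```python
# Hypothetical monthly sales keyed by (year, month); the values are made up.
monthly_sales = {
    (2007, 1): 100, (2007, 2): 120, (2007, 3): 90,
    (2008, 1): 110, (2008, 2): 150, (2008, 3): 99,
}


def year_to_date(sales, year, month):
    """Sum all months of `year` up to and including `month`."""
    return sum(v for (y, m), v in sales.items() if y == year and m <= month)


def period_over_period(sales, year, month):
    """Percentage change against the same month one year earlier."""
    current = sales[(year, month)]
    previous = sales[(year - 1, month)]
    return (current - previous) / previous * 100.0


ytd_2008_march = year_to_date(monthly_sales, 2008, 3)
change_feb = period_over_period(monthly_sales, 2008, 2)
print(ytd_2008_march, change_feb)
```

Both helpers depend on the ordering of time periods, which is one reason the time hierarchy tends to be treated differently from other hierarchies.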

Multi-dimensional Data and OLAP cubes Multi-dimensional data is facts (numeric measurements) such as property sales revenue data and the association of this data with dimensions such as location (of the property) and time (of the property sale). Which is the best representation of multi-dimensional data: relational table, matrix or data cube?

Multi-dimensional Data as 3-field Table versus 2-D Matrix

Multi-dimensional Data as 4-field Table versus 3-D Cube

Multi-dimensional Data as series of 3-D Cubes

Multi-dimensional data and OLAP cubes We consider cubes as solid 3-D structures with equal sides. However, the OLAP cube is an n-dimensional structure (with sides that need not be equal). Alternative representation for n-dimensional data is to consider a data cube as a lattice of cuboids. Each cuboid represents a subset of the given dimensions.

Multi-dimensional data and OLAP cubes (lattice of cuboids)
0-D cuboid (highest-level): all
1-D cuboids: time; location; type; office
2-D cuboids: time, type; time, office; time, location; type, office; location, office; location, type
3-D cuboids: location, type, office; time, type, office; time, location, office; time, location, type
4-D cuboid (lowest-level): time, location, type, office
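The lattice of cuboids for the four dimensions on this slide can be enumerated mechanically, since each cuboid is just a subset of the dimensions. The following Python sketch does so; the function name `cuboid_lattice` is chosen here for illustration only.

```python
from itertools import combinations

dimensions = ("time", "location", "type", "office")


def cuboid_lattice(dims):
    """Return {k: [cuboids with k dimensions]} for k = 0..len(dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    # The empty combination is the 0-D cuboid, i.e. the grand total "all".
    print(f"{k}-D:", [", ".join(c) or "all" for c in cuboids])

total_cuboids = sum(len(c) for c in lattice.values())
```

For n dimensions (ignoring hierarchies) the lattice holds 2^n cuboids, so four dimensions give 16.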

Dimensionality Hierarchy The lattice of cuboids does not show the hierarchies that are commonly associated with dimensions. A dimensional hierarchy defines mappings from a set of lower-level concepts to higher level concepts.

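Dimensionality Hierarchy [Figure: example dimensional hierarchies for 2-D data. Location levels shown: zipCode, area, city, region, country. Time levels shown: day, week, month, quarter, season, year.]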

Dimensional Operations The analytical operations that can be performed on data cubes include: roll-up; drill-down; slice and dice; pivot.

Dimensional Operations Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional reduction e.g. 4-D sales data to 3-D sales data. Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional introduction e.g. 3-D sales data to 4-D sales data.
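Both forms of roll-up can be sketched in Python over a handful of fact rows. The rows, the `city_to_region` mapping, and the `aggregate` helper are all invented for this illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, quarter, property_type, sales). Values are invented.
facts = [
    ("Glasgow", "Q1", "Flat", 100),
    ("Glasgow", "Q1", "House", 200),
    ("Edinburgh", "Q1", "Flat", 150),
    ("London", "Q1", "Flat", 300),
    ("London", "Q2", "House", 250),
]

# Assumed next level up in the location hierarchy (city -> region).
city_to_region = {"Glasgow": "Scotland", "Edinburgh": "Scotland", "London": "England"}


def aggregate(rows, key):
    """Sum the sales measure over the grouping key computed from each row."""
    totals = defaultdict(int)
    for city, quarter, ptype, sales in rows:
        totals[key(city, quarter, ptype)] += sales
    return dict(totals)


# Roll-up by dimensional reduction: (city, quarter, type) -> (city, quarter).
by_city_quarter = aggregate(facts, lambda c, q, t: (c, q))

# Roll-up by moving up the location hierarchy: city -> region.
by_region_quarter = aggregate(facts, lambda c, q, t: (city_to_region[c], q))

print(by_city_quarter)
print(by_region_quarter)
```

Drill-down runs the other way: starting from `by_region_quarter`, it goes back to the city-level or fact-level rows that make up each total.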

Dimensional Operations Slice and dice - ability to look at data from different viewpoints. The slice operation performs a selection on one dimension of the data whereas dice uses two or more dimensions. For example a slice of sales revenue (type = ‘Flat’) and a dice (type = ‘Flat’ and time = ‘Q1’).
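The slice and dice from this slide can be sketched as selections on a small cube held in a Python dictionary. The cell values, the `DIMS` tuple, and the `select` helper are assumptions made for the example.

```python
# Hypothetical cube cells keyed by (type, time, city); the values are invented.
cube = {
    ("Flat", "Q1", "Glasgow"): 100,
    ("Flat", "Q2", "Glasgow"): 120,
    ("House", "Q1", "Glasgow"): 200,
    ("Flat", "Q1", "Edinburgh"): 150,
}
DIMS = ("type", "time", "city")


def select(cells, **conditions):
    """Keep the cells whose coordinates match every dimension=value condition."""
    idx = {name: i for i, name in enumerate(DIMS)}
    return {
        coords: value
        for coords, value in cells.items()
        if all(coords[idx[d]] == v for d, v in conditions.items())
    }


slice_flat = select(cube, type="Flat")               # selection on one dimension
dice_flat_q1 = select(cube, type="Flat", time="Q1")  # selection on two dimensions

print(slice_flat)
print(dice_flat_q1)
```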

Dimensional Operations Pivot - ability to rotate the data to provide an alternative view of the same data e.g. sales revenue data displayed using the location (city) as the x-axis against time (quarter) as the y-axis can be rotated so that time (quarter) is the x-axis and location (city) is the y-axis.
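A pivot changes only the layout of the data, not the values. The Python sketch below lays out the same cells twice with the axes swapped; the figures and the `crosstab` helper are made up for illustration.

```python
# Hypothetical sales keyed by (city, quarter); the values are invented.
sales = {
    ("Glasgow", "Q1"): 100, ("Glasgow", "Q2"): 120,
    ("Edinburgh", "Q1"): 150, ("Edinburgh", "Q2"): 90,
}


def crosstab(cells, row_axis, col_axis):
    """Lay out 2-D cells as {row_label: {col_label: value}}; axes are 0 or 1."""
    table = {}
    for coords, value in cells.items():
        table.setdefault(coords[row_axis], {})[coords[col_axis]] = value
    return table


city_by_quarter = crosstab(sales, row_axis=0, col_axis=1)
quarter_by_city = crosstab(sales, row_axis=1, col_axis=0)  # the pivoted view

print(city_by_quarter)
print(quarter_by_city)
```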

OLAP Tools There are many varieties of OLAP tools available in the marketplace. This choice has resulted in some confusion with much debate regarding what OLAP actually means to a potential buyer and in particular what are the available architectures for OLAP tools.

Codd’s Rules for OLAP Systems In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.

Codd’s Rules for OLAP Systems Multi-dimensional conceptual view; transparency; accessibility; consistent reporting performance; client-server architecture; generic dimensionality;

Codd’s Rules for OLAP Systems (continued) Dynamic sparse matrix handling; multi-user support; unrestricted cross-dimensional operations; intuitive data manipulation; flexible reporting; unlimited dimensions and aggregation levels.

Codd’s Rules for OLAP Systems There are proposals to redefine or extend the rules, for example to also include: comprehensive database management tools; ability to drill down to detail (source record) level; incremental database refresh; SQL interface to the existing enterprise environment.

Categories of OLAP Tools OLAP tools are categorized according to the architecture used to store and process multi-dimensional data. There are three main categories: Multi-dimensional OLAP (MOLAP); Relational OLAP (ROLAP); Hybrid OLAP (HOLAP).

Multi-dimensional OLAP (MOLAP) Use specialized data structures and multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data. Data is typically aggregated and stored according to predicted usage to enhance query performance.

Multi-dimensional OLAP (MOLAP) Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management. Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application.
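The idea behind sparse data management can be sketched in Python by storing only the populated cells of a cube. This is a simplified illustration with invented names and values; actual MOLAP products use their own storage and compression schemes, which the slides do not describe.

```python
# Hypothetical dimension members.
cities = ["Aberdeen", "Edinburgh", "Glasgow"]
quarters = ["Q1", "Q2", "Q3", "Q4"]
types = ["Flat", "House"]

# Only populated cells are stored; empty combinations take no space.
sparse_cube = {
    ("Glasgow", "Q1", "Flat"): 100,
    ("Edinburgh", "Q3", "House"): 250,
}


def cell(cube, city, quarter, ptype):
    """Read a cell, treating a missing combination as empty (None)."""
    return cube.get((city, quarter, ptype))


# A fully materialized array would need one slot per combination of members.
dense_cells = len(cities) * len(quarters) * len(types)
density = len(sparse_cube) / dense_cells

print(dense_cells, density)
```

Here 2 of 24 possible cells hold data, so a dense array would be mostly empty slots.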

Multi-dimensional OLAP (MOLAP) Traditionally, require a tight coupling with the application layer and presentation layer. Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).

Typical Architecture for MOLAP Tools

MOLAP Tools - Development Issues Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data. Navigation and analysis of data is limited because the data is designed according to previously determined requirements.

MOLAP Tools - Development Issues MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.

Relational OLAP (ROLAP) Fastest-growing style of OLAP technology due to requirements to analyze ever-increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.

Relational OLAP (ROLAP) Supports RDBMS products using a metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.

Relational OLAP (ROLAP) To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
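A star schema places one fact table at the centre and joins it to denormalized dimension tables. The sketch below builds a tiny one with Python's `sqlite3` module; the table names, column names, and data are invented for illustration and do not come from any particular ROLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables (denormalized: city and region together).
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Hypothetical fact table: one row per sale, keyed by the dimensions.
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        location_id INTEGER REFERENCES dim_location(location_id),
        sale_amount INTEGER
    );
    INSERT INTO dim_time VALUES (1, 'Q1', 2008), (2, 'Q2', 2008);
    INSERT INTO dim_location VALUES (1, 'Glasgow', 'Scotland'), (2, 'Edinburgh', 'Scotland');
    INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 150), (2, 1, 120);
""")

# A multi-dimensional view (city by quarter) expressed as joins plus GROUP BY.
rows = conn.execute("""
    SELECT l.city, t.quarter, SUM(f.sale_amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_location l ON l.location_id = f.location_id
    GROUP BY l.city, t.quarter
    ORDER BY l.city, t.quarter
""").fetchall()
print(rows)
```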

Typical Architecture for ROLAP Tools

ROLAP Tools - Development Issues Performance problems associated with the processing of complex queries that require multiple passes through the relational data. Middleware to facilitate the development of multi-dimensional applications. (Software that converts the two-dimensional relation into a multi-dimensional structure).

ROLAP Tools - Development Issues Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures.

Hybrid OLAP (HOLAP) Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server. Deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a datacube, where it is stored, analyzed, and maintained locally.

Hybrid OLAP (HOLAP) Promoted as being relatively simple to install and administer with reduced cost and maintenance.

Typical Architecture for HOLAP Tools

HOLAP Tools - Development Issues Architecture results in significant data redundancy and may cause problems for networks that support many users. Ability of each user to build a custom datacube may cause a lack of data consistency among users. Only a limited amount of data can be efficiently maintained.

Desktop OLAP (DOLAP) Store the OLAP data in client-based files and support multi-dimensional processing using a client multi-dimensional engine. Requires that relatively small extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web).

OLAP Extensions to SQL Advantages of SQL include that it is easy to learn, non-procedural, free-format, DBMS-independent, and that it is a recognized international standard. However, a major limitation of SQL is the inability to answer routinely asked business queries such as computing the percentage change in values between this month and a year ago, or computing moving averages, cumulative sums, and other statistical functions.

OLAP Extensions to SQL In response, ANSI adopted a set of OLAP functions as an extension to SQL to enable these calculations, as well as many others that used to be impractical or even impossible within SQL. IBM and Oracle jointly proposed these extensions early in 1999 and they now form part of the current SQL standard, namely SQL:2008.

OLAP Extensions to SQL - RISQL The extensions are collectively referred to as the ‘OLAP package’ and are described as follows: Feature T431, ‘Extended Grouping capabilities’; Feature T611, ‘Extended OLAP operators’.

Extended Grouping Capabilities Aggregation is a fundamental part of OLAP. To improve aggregation capabilities the SQL standard provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions.

Extended Grouping Capabilities ROLLUP supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total. CUBE is similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross-tabulation reports with a single query.

Extended Grouping Capabilities The ROLLUP and CUBE extensions specify exactly the groupings of interest in the GROUP BY clause and produce a single result set that is equivalent to a UNION ALL of differently grouped rows.
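The UNION ALL equivalence can be made concrete by writing out by hand what ROLLUP(propertyType, city) would return. The sketch uses Python's `sqlite3` module, whose SQL dialect has no ROLLUP, so the expanded form is the only option there. The single `sales` table and its rows are hypothetical, a simplification of the three-table schema used in the later examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified table; the values are invented.
    CREATE TABLE sales (propertyType TEXT, city TEXT, saleAmount INTEGER);
    INSERT INTO sales VALUES
        ('Flat', 'Glasgow', 100), ('Flat', 'Edinburgh', 150),
        ('House', 'Glasgow', 200);
""")

# Hand-written equivalent of GROUP BY ROLLUP(propertyType, city):
# one SELECT per grouping level, with NULL standing in for "all values".
rollup_rows = conn.execute("""
    SELECT propertyType, city, SUM(saleAmount) FROM sales GROUP BY propertyType, city
    UNION ALL
    SELECT propertyType, NULL, SUM(saleAmount) FROM sales GROUP BY propertyType
    UNION ALL
    SELECT NULL, NULL, SUM(saleAmount) FROM sales
""").fetchall()

for row in rollup_rows:
    print(row)
```

A DBMS that supports ROLLUP produces the same rows from one GROUP BY clause, without repeating the query once per level.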

Extended Grouping Capabilities ROLLUP Extension to GROUP BY enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions. ROLLUP appears in the GROUP BY clause in a SELECT statement using the following format: SELECT ... GROUP BY ROLLUP(columnList)

Extended Grouping Capabilities ROLLUP creates subtotals that roll up from the most detailed level to a grand total, following a column list specified in the ROLLUP clause. ROLLUP first calculates the standard aggregate values specified in the GROUP BY clause and then creates progressively higher level subtotals, moving from right to left through the column list until finally completing with a grand total.

Extended Grouping Capabilities ROLLUP creates subtotals at n + 1 levels, where n is the number of grouping columns. For instance, if a query specifies ROLLUP on grouping columns of propertyType, yearMonth, and city (n = 3), the result set will include rows at 4 aggregation levels.
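The n + 1 levels are the prefixes of the column list, taken from right to left. A short Python sketch lists them for the three grouping columns named above; the function name `rollup_grouping_sets` is chosen here for illustration.

```python
def rollup_grouping_sets(columns):
    """Prefixes of the column list, from most detailed down to the grand total."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


levels = rollup_grouping_sets(["propertyType", "yearMonth", "city"])
for level in levels:
    # The empty tuple is the grand-total level.
    print(level if level else "(grand total)")
```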

Example - Using the ROLLUP Group Function Show the totals for sales of flats or houses by branch offices located in Aberdeen, Edinburgh, or Glasgow for the months of August and September of 2008.

Example - Using the ROLLUP Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyForSale, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND PropertyForSale.propertyNo = PropertySale.propertyNo
AND PropertySale.yearMonth IN ('2008-08', '2008-09')
AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow')
GROUP BY ROLLUP(propertyType, yearMonth, city);

Example - Using the ROLLUP Group Function

Extended Grouping Capabilities CUBE Extension to GROUP BY CUBE takes a specified set of grouping columns and creates subtotals for all of the possible combinations. CUBE appears in the GROUP BY clause in a SELECT statement using the following format: SELECT ... GROUP BY CUBE(columnList)

Extended Grouping Capabilities CUBE generates all the subtotals that could be calculated for a data cube with the specified dimensions. CUBE can be used in any situation requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT using CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables.

Extended Grouping Capabilities CUBE is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension.
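Where ROLLUP aggregates over prefixes of the column list, CUBE aggregates over every subset of it. The Python sketch below emulates that over a few invented rows; the helper names `grouping_sets` and `cube` are assumptions for the example, and `None` plays the role that NULL plays in the SQL result.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (propertyType, yearMonth, city, saleAmount). Values are invented.
rows = [
    ("Flat", "2008-08", "Glasgow", 100),
    ("Flat", "2008-09", "Glasgow", 120),
    ("House", "2008-08", "Aberdeen", 200),
]
COLUMNS = ("propertyType", "yearMonth", "city")


def grouping_sets(n_dims):
    """Every subset of the dimension positions: 2**n_dims sets in total."""
    return [set(c) for k in range(n_dims + 1) for c in combinations(range(n_dims), k)]


def cube(rows, n_dims):
    """Sum the last field of each row for every grouping set."""
    totals = defaultdict(int)
    for kept in grouping_sets(n_dims):
        for row in rows:
            # None marks a dimension that has been aggregated away.
            key = tuple(row[i] if i in kept else None for i in range(n_dims))
            totals[key] += row[-1]
    return dict(totals)


result = cube(rows, len(COLUMNS))
print(len(grouping_sets(len(COLUMNS))), "grouping sets,", len(result), "result rows")
```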

Example - Using the CUBE Group Function Show all possible subtotals for sales of properties by branch offices in Aberdeen, Edinburgh, and Glasgow for the months of August and September of 2008.

Example - Using the CUBE Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyForSale, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND PropertyForSale.propertyNo = PropertySale.propertyNo
AND PropertySale.yearMonth IN ('2008-08', '2008-09')
AND Branch.city IN ('Aberdeen', 'Edinburgh', 'Glasgow')
GROUP BY CUBE(propertyType, yearMonth, city);

Example - Using the CUBE Group Function

Elementary OLAP Operators Supports a variety of operations such as rankings and window calculations. Ranking functions include cumulative distributions, percent rank, and N-tiles. Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.

Elementary OLAP Operators Ranking Functions Computes the rank of a record compared to other records in the dataset based on the values of a set of measures. There are various types of ranking functions, including RANK and DENSE_RANK. The syntax for each ranking function is: RANK( ) OVER (ORDER BY columnList) DENSE_RANK( ) OVER (ORDER BY columnList)

Elementary OLAP Operators The difference between RANK and DENSE_RANK is that DENSE_RANK leaves no gaps in the sequential ranking sequence when there are ties for a ranking.
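The effect of a tie on the two functions can be seen in a small run against Python's `sqlite3` module, assuming the bundled SQLite is version 3.25 or later (the first with window functions). The `branch_sales` table and its figures are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical pre-aggregated totals; two branches tie on 92000.
    CREATE TABLE branch_sales (branchNo TEXT, sales INTEGER);
    INSERT INTO branch_sales VALUES
        ('B009', 120000), ('B018', 92000), ('B022', 92000), ('B033', 45000);
""")

ranked = conn.execute("""
    SELECT branchNo, sales,
           RANK()       OVER (ORDER BY sales DESC) AS ranking,
           DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_ranking
    FROM branch_sales
    ORDER BY sales DESC, branchNo
""").fetchall()

for row in ranked:
    print(row)
```

After the tie at rank 2, RANK moves on to 4 while DENSE_RANK continues with 3.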

Example - Using the RANK and DENSE_RANK Functions Rank the total sales of properties for branch offices in Edinburgh.
SELECT Branch.branchNo, SUM(saleAmount) AS sales,
RANK() OVER (ORDER BY SUM(saleAmount) DESC) AS ranking,
DENSE_RANK() OVER (ORDER BY SUM(saleAmount) DESC) AS dense_ranking
FROM Branch, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo
AND Branch.city = 'Edinburgh'
GROUP BY Branch.branchNo;

Example - Using the RANK and DENSE_RANK Functions

Elementary OLAP Operators Windowing Calculations Can be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window. These aggregate functions provide access to more than one row of a table without a self-join and can be used only in the SELECT and ORDER BY clauses of the query.
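A moving window of the current row plus the two before it can be tried with Python's `sqlite3` module, again assuming SQLite 3.25 or later. The `monthly` table, its already-aggregated figures, and the alias names are invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical monthly totals; the values are invented.
    CREATE TABLE monthly (yearMonth TEXT, monthlySales INTEGER);
    INSERT INTO monthly VALUES
        ('2008-01', 210), ('2008-02', 350), ('2008-03', 400), ('2008-04', 420);
""")

# ROWS 2 PRECEDING: the window is the current row and the two rows before it.
windowed = conn.execute("""
    SELECT yearMonth, monthlySales,
           AVG(monthlySales) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS movingAvg,
           SUM(monthlySales) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS movingSum
    FROM monthly
    ORDER BY yearMonth
""").fetchall()

for row in windowed:
    print(row)
```

Each output row reads its neighbouring rows through the window, which is what removes the need for a self-join. The first two months have fewer than three rows available, so their windows are shorter.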

Example - Using Windowing Calculations Show the monthly figures and three-month moving averages and sums for property sales at branch office B003 for the first six months of 2008.

Example - Using Windowing Calculations
SELECT yearMonth, SUM(saleAmount) AS monthlySales,
AVG(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving avg",
SUM(SUM(saleAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING) AS "3-month moving sum"
FROM PropertySale
WHERE branchNo = 'B003'
AND yearMonth BETWEEN '2008-01' AND '2008-06'
GROUP BY yearMonth
ORDER BY yearMonth;

Example - Using Windowing Calculations