Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.



Online Transactional Processing (OLTP)
Traditional database applications focus on Online Transactional Processing (OLTP):
- Short, simple queries and frequent updates, each involving a relatively small number of tuples, e.g., recording sales at cash registers or selling airline tickets.

Online Analytical Processing (OLAP)
Increasingly, organizations analyze current and historical data to identify useful patterns and support business strategies. The emphasis is on complex, interactive, exploratory analysis of very large datasets created by integrating data from across all parts of an enterprise; e.g., a company analyzes purchases by all its customers to come up with new products of likely interest to those customers. This is called Online Analytical Processing (OLAP). The data used for OLAP are usually stored in a data warehouse.

OLAP vs OLTP

OLTP                                   OLAP
Many updates                           Mostly reads, updates rare
Many small transactions                Queries long, complex
Quick response                         Slower response allowed
MB to GB of data                       GB to TB of data
Raw data                               Summarized, consolidated data
Up-to-date data                        Current and historical data
Clerical users                         Decision-makers, analysts as users
Consistency, recoverability critical   Minor inconsistency is allowed

The raw data
[Table: Car_sales]
For analysis, raw data often needs to be summarized.

OLAP: example
- Example: which kinds of cars are popular? sales(make, color, size, num_sold) (slightly summarized data), where make can be Toyota, Nissan, Holden, Ford, etc.; color can be white, red, or silver; and size can be small, medium, or large.
- Attributes such as num_sold are called measure attributes, since they measure some value and can be aggregated.
- Attributes like make, color, and size are called dimension attributes, since they define the dimensions on which measure attributes are viewed.
- Data that can be modeled as dimension attributes and measure attributes are called multi-dimensional data.
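The split between measure and dimension attributes can be illustrated with a small sketch. The rows below are invented for illustration (they are not the Car_sales data from the slides), and the helper name total_by is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of sales(make, color, size, num_sold); values are made up.
sales = [
    ("Toyota", "white", "small", 5),
    ("Toyota", "red", "medium", 3),
    ("Nissan", "white", "medium", 4),
    ("Ford", "silver", "large", 2),
    ("Toyota", "white", "medium", 6),
]

DIMENSIONS = ("make", "color", "size")  # dimension attributes
MEASURE_INDEX = 3                       # position of the measure num_sold


def total_by(dimension):
    """Sum the measure num_sold for each value of one dimension attribute."""
    idx = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for row in sales:
        totals[row[idx]] += row[MEASURE_INDEX]
    return dict(totals)


by_make = total_by("make")
by_color = total_by("color")
print(by_make)
print(by_color)
```

The same measure is viewed along different dimensions simply by changing which attribute is used as the grouping key.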

Dimension Hierarchies

Cross Tabs and Data Cubes
- OLAP systems allow analysts to view different summaries of the data.
- The following table can be derived from sales(make, color, size, num_sold).
[Figures: cross-tab (pivot table); relational representation]
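Deriving a cross-tab from the relational form can be sketched as follows. This is a minimal illustration with invented rows; the function name cross_tab and the use of "all" as the label for totals are assumptions, not something the slides prescribe.

```python
from collections import defaultdict

# Hypothetical rows of sales(make, color, size, num_sold); values are made up.
sales = [
    ("Toyota", "white", "small", 5),
    ("Toyota", "red", "medium", 3),
    ("Nissan", "white", "medium", 4),
    ("Ford", "silver", "large", 2),
    ("Toyota", "white", "medium", 6),
]


def cross_tab(rows):
    """Build a make x color cross-tab, with 'all' rows and columns for totals."""
    table = defaultdict(lambda: defaultdict(int))
    for make, color, _size, num_sold in rows:
        table[make][color] += num_sold      # individual cell
        table[make]["all"] += num_sold      # row total
        table["all"][color] += num_sold     # column total
        table["all"]["all"] += num_sold     # grand total
    return table


tab = cross_tab(sales)
for make in ("Toyota", "Nissan", "Ford", "all"):
    print(make, [tab[make][c] for c in ("white", "red", "silver", "all")])
```

Size is summed out here, so each cell holds the total over all sizes for that make and color.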

Data Cubes
The generalization of a cross-tab, which is 2-dimensional, to n dimensions can be visualized as an n-dimensional cube, called the data cube.
[Figure: data cube; color axis: white, red, silver, all; make axis: Toyota, Nissan, Holden, Ford, all]

MOLAP vs ROLAP
- OLAP systems can store data cubes in multi-dimensional arrays; such systems are called multidimensional OLAP (MOLAP) systems.
- Alternatively, they can store data as relations in relational databases; such systems are called relational OLAP (ROLAP) systems.
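The two storage styles can be contrasted with a toy sketch. The data and the names MAKES, COLORS, relation, and array are invented for illustration; real MOLAP engines use far more elaborate array layouts.

```python
# Dimension values, in a fixed order that defines array positions (assumed).
MAKES = ["Toyota", "Nissan"]
COLORS = ["white", "red"]

# ROLAP style: one tuple per non-empty cell, stored as a relation.
relation = [
    ("Toyota", "white", 11),
    ("Toyota", "red", 3),
    ("Nissan", "white", 4),
]

# MOLAP style: a dense 2-D array; array[i][j] holds num_sold for
# MAKES[i] and COLORS[j]. Empty cells are stored explicitly as 0.
array = [[0] * len(COLORS) for _ in MAKES]
for make, color, num_sold in relation:
    array[MAKES.index(make)][COLORS.index(color)] = num_sold

# A cell lookup in the array is positional; no search over tuples is needed.
toyota_red = array[MAKES.index("Toyota")][COLORS.index("red")]
print(array, toyota_red)
```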

ROLAP
- The main relation, which relates dimensions to measures, is called the fact table.
  - E.g., sales(prod_id, date, shop_id, num_sold)
  - Very large: an accumulation of facts such as sales
- Each dimension can have additional attributes and an associated dimension table.
  - E.g., product(prod_id, price, color), where prod_id in sales is a foreign key referencing product
  - shops(shop_id, location, manager)
  - Dimension data are smaller and generally static

The Star Schema
- In a ROLAP system, relations are often stored with star schemas.
- A star schema consists of the fact table and one or more dimension tables. Dimension tables are usually not normalized. Why?
- A typical query often involves a join of the fact table and the dimension tables.
[Figure: star schema with fact table sales(prod_id, date, shop_id, num_sold) and dimension tables (prod_id, price, color) and (shop_id, location, manager)]

The Star Schema
Dimension tables are not in 3NF.
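A typical star-schema query, a join of the fact table with its dimension tables followed by aggregation, can be sketched with Python's built-in sqlite3 module. The table and column names follow the slides; the inserted rows (prices, locations, manager names, dates) are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, price REAL, color TEXT);
CREATE TABLE shops (shop_id INTEGER PRIMARY KEY, location TEXT, manager TEXT);

-- Fact table: relates the dimensions to the measure num_sold
CREATE TABLE sales (
    prod_id  INTEGER REFERENCES product(prod_id),
    date     TEXT,
    shop_id  INTEGER REFERENCES shops(shop_id),
    num_sold INTEGER
);

-- Made-up sample data
INSERT INTO product VALUES (1, 10.0, 'red'), (2, 20.0, 'white');
INSERT INTO shops VALUES (1, 'Brisbane', 'Ann'), (2, 'Sydney', 'Bob');
INSERT INTO sales VALUES
    (1, '2014-09-01', 1, 3),
    (2, '2014-09-01', 1, 1),
    (1, '2014-09-02', 2, 4),
    (2, '2014-09-02', 2, 2);
""")

# Join the fact table with both dimension tables, then aggregate the measure.
rows = conn.execute("""
    SELECT sh.location, p.color, SUM(s.num_sold)
    FROM sales s
    JOIN product p ON p.prod_id = s.prod_id
    JOIN shops sh ON sh.shop_id = s.shop_id
    GROUP BY sh.location, p.color
    ORDER BY sh.location, p.color
""").fetchall()

for row in rows:
    print(row)
```

Because each dimension is a single table, every dimension attribute is one join away from the fact table.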

The snowflake schema
A variation of the star schema in which the dimension tables are normalized.

Fact constellation
A set of fact tables that share some dimension tables.

OLAP Queries
- A common operation is to aggregate a measure over one or more dimensions, e.g.,
  - find total/average sales for a product
  - find total sales in each city/state/month, etc.
  - find the top 2 products by total sales
- Roll-up: moving from finer granularity to coarser granularity by means of aggregation.
  - E.g., given total sales for each city, find total sales for each state.
- Drill-down: the inverse of roll-up.
- Pivoting: aggregate on selected dimensions.
- Slicing and dicing:
  - E.g., from the data cube, find the cross-tab on make and color for medium cars. The cross-tab can be viewed as a slice of the data cube.
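Roll-up and slicing can be sketched in a few lines. The city-to-state mapping, the sales figures, and the function names roll_up and slice_by_size are all invented for this illustration.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy (city -> state) and per-city totals.
CITY_TO_STATE = {"Brisbane": "QLD", "Gold Coast": "QLD", "Sydney": "NSW"}
city_sales = {"Brisbane": 10, "Gold Coast": 5, "Sydney": 8}


def roll_up(city_totals, hierarchy):
    """Roll up from city granularity to state granularity by summing."""
    state_totals = defaultdict(int)
    for city, total in city_totals.items():
        state_totals[hierarchy[city]] += total
    return dict(state_totals)


# Hypothetical rows of sales(make, color, size, num_sold).
sales = [
    ("Toyota", "white", "small", 5),
    ("Toyota", "red", "medium", 3),
    ("Nissan", "white", "medium", 4),
    ("Ford", "silver", "large", 2),
    ("Toyota", "white", "medium", 6),
]


def slice_by_size(rows, size):
    """Fix one value of the size dimension; keep a (make, color) cross-tab."""
    cells = defaultdict(int)
    for make, color, row_size, num_sold in rows:
        if row_size == size:
            cells[(make, color)] += num_sold
    return dict(cells)


state_sales = roll_up(city_sales, CITY_TO_STATE)
medium_slice = slice_by_size(sales, "medium")
print(state_sales)
print(medium_slice)
```

Drill-down goes the other way and needs the finer-grained data (here, city_sales); it cannot be computed from state_sales alone.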

Query Processing Issues
- Expensive aggregations are common.
- Pre-compute all aggregates? This may be infeasible!
- Materialized views can help.
  - Which views should be materialized?
  - Given a query and some materialized views, can we use the views to answer the query? How?
  - How frequently should we refresh the views to keep them consistent with the underlying tables?
- What indexes should one use?
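One simple case of answering a query from a materialized view: a SUM grouped by (make, color) can be re-aggregated to answer a SUM grouped by make without rescanning the base table. The sketch below uses invented rows and a hypothetical helper, aggregate; it illustrates the idea only and does not reflect how any particular DBMS rewrites queries.

```python
from collections import defaultdict

# Hypothetical rows of sales(make, color, size, num_sold).
sales = [
    ("Toyota", "white", "small", 5),
    ("Toyota", "red", "medium", 3),
    ("Nissan", "white", "medium", 4),
    ("Ford", "silver", "large", 2),
    ("Toyota", "white", "medium", 6),
]


def aggregate(rows, key_positions, measure_position):
    """Group rows by the given column positions and sum the measure column."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        totals[key] += row[measure_position]
    return dict(totals)


# "Materialized view": totals per (make, color), computed once and stored.
view_make_color = aggregate(sales, (0, 1), 3)
view_rows = [key + (total,) for key, total in view_make_color.items()]

# Query: totals per make. Answer it from the (smaller) view ...
by_make_from_view = aggregate(view_rows, (0,), 2)
# ... and, for comparison, directly from the base table.
by_make_from_base = aggregate(sales, (0,), 3)

print(by_make_from_view == by_make_from_base)
```

This works because SUM is distributive: sums of partial sums equal the overall sum. An average cannot be re-aggregated this way unless the view also stores the sum and the count.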

SQL:1999 Extended Aggregations*
Example 1
  select make, color, size, sum(num_sold)
  from sales
  group by cube(make, color, size)
Calculates 8 groupings: (make, color, size), (make, color), (make, size), …, ().
Example 2
  select make, color, size, sum(num_sold)
  from sales
  group by rollup(make, color, size)
Calculates 4 groupings: (make, color, size), (make, color), (make), ().
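The grouping counts quoted above can be checked by enumerating the grouping sets: cube produces every subset of the listed columns, while rollup produces every prefix. The function names below are hypothetical; the sketch only enumerates groupings and does not execute SQL.

```python
from itertools import combinations


def cube_groupings(columns):
    """All subsets of the columns, largest first: 2**n groupings."""
    cols = tuple(columns)
    return [
        combo
        for size in range(len(cols), -1, -1)
        for combo in combinations(cols, size)
    ]


def rollup_groupings(columns):
    """All prefixes of the column list, longest first: n + 1 groupings."""
    cols = tuple(columns)
    return [cols[:i] for i in range(len(cols), -1, -1)]


cube = cube_groupings(["make", "color", "size"])
rollup = rollup_groupings(["make", "color", "size"])
print(len(cube), cube)
print(len(rollup), rollup)
```

The order of columns matters for rollup but not for cube, since rollup only drops columns from the right.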

Examples in Oracle: Rollup

Oracle Rollup Example

Data Warehouse
- A repository of information gathered from multiple sources, stored under a unified schema, usually at a single site.
- Data may be augmented with additional attributes, such as timestamps, and with summary information.
- Data are stored for a long time, permitting access to historical data.
- Interactive response times are expected for complex queries; ad-hoc updates are uncommon.

Building Data Warehouse
Issues:
– Semantic integration: when getting data from multiple sources, mismatches must be eliminated, e.g., different currencies.
– Heterogeneous sources: data must be accessed from a variety of source formats.
– Load, refresh, purge: data must be loaded, periodically refreshed, and purged when too old or no longer useful.
– Metadata management: the source, loading time, etc. must be tracked.
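A toy sketch of three of these issues together: converting currencies into one unit (semantic integration), recording source and load time (metadata), and purging old rows. The exchange rates, source names, dates, and function names are all made up for illustration.

```python
from datetime import datetime, timezone

# Made-up conversion rates into a single warehouse currency (AUD).
RATES_TO_AUD = {"AUD": 1.0, "USD": 1.5}


def integrate(records, source, loaded_at):
    """Convert amounts to AUD and attach source / load-time metadata."""
    return [
        {
            "amount_aud": rec["amount"] * RATES_TO_AUD[rec["currency"]],
            "source": source,        # metadata: where the row came from
            "loaded_at": loaded_at,  # metadata: when it was loaded
        }
        for rec in records
    ]


def purge(rows, cutoff):
    """Keep only rows loaded at or after the cutoff time."""
    return [row for row in rows if row["loaded_at"] >= cutoff]


t_old = datetime(2013, 1, 1, tzinfo=timezone.utc)
t_new = datetime(2014, 9, 1, tzinfo=timezone.utc)

warehouse = (
    integrate([{"amount": 100, "currency": "USD"}], "us_store", t_old)
    + integrate([{"amount": 200, "currency": "AUD"}], "au_store", t_new)
)
kept = purge(warehouse, datetime(2014, 1, 1, tzinfo=timezone.utc))
print(warehouse)
print(kept)
```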

Elements of data warehouse
[Figure: operational data; data replication & cleansing; informational database; information directory (metadata); EIS/DSS apps; SQL query]

Elements of data warehouse
- Data Replication Manager
  - copying & distribution of data across databases
  - data that needs to be copied, source/destination, frequency, data transforms
  - refresh: copy entire source, propagate changes only
  - all external data is transformed & cleansed before adding to warehouse
- Informational Database
  - database that stores data copied from multiple sources by the data replication manager
- Information Directory
  - metadata manager: collects metadata from databases on the network
- EIS/DSS tools
  - SQL-based query tools
  - some vendors use extended SQL

Query/Reporting tools
- Formulate queries without (extended) SQL or other languages
- Result displayed as table, graph, or report
- Spreadsheet systems
- Web interfaces
- Vendor-specific tools
  - Oracle Discoverer

Column stores
A recently proposed data storage method that stores data as columns rather than as rows, which allows more efficient aggregation queries in data warehouses. See oriented_DBMS.
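The difference between the two layouts can be sketched with plain Python lists. The rows are invented, and this only illustrates the idea that an aggregate over one attribute touches a single column; it says nothing about how a real column-oriented DBMS is implemented.

```python
# Row-oriented layout: each record is stored together.
COLUMN_NAMES = ("make", "color", "size", "num_sold")
rows = [
    ("Toyota", "white", "small", 5),
    ("Toyota", "red", "medium", 3),
    ("Nissan", "white", "medium", 4),
    ("Ford", "silver", "large", 2),
]

# Column-oriented layout: all values of one attribute are stored together.
columns = {name: list(values) for name, values in zip(COLUMN_NAMES, zip(*rows))}

# Row store: the aggregate has to walk every full record.
total_from_rows = sum(row[3] for row in rows)

# Column store: the aggregate reads only the num_sold column.
total_from_columns = sum(columns["num_sold"])

print(columns["num_sold"], total_from_rows, total_from_columns)
```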