CS346: Advanced Databases


CS346: Advanced Databases
Data Warehousing and OLAP
Graham Cormode (G.Cormode@warwick.ac.uk)

Outline
- Chapter: “Overview of Data Warehousing and OLAP” in Elmasri and Navathe
- What is a data warehouse and what is it for?
- The multidimensional data model and common schema designs
- Special indexes: bitmap and join indexes
- Why?
  - Another model of data to study and contrast with RDBMS
  - A different perspective on using data for insight
  - A relatively recent development (1990s): still developing

Data Warehouses
- Data warehouses were introduced to handle large data stores
  - Typically, historical data for business analytic purposes
  - Separate from the organization’s “live” operational database
- “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data” (W. Inmon, “father of the data warehouse”)
  - Subject-oriented: focused on one topic (e.g. all sales records)
  - Integrated: data brought together from many sources, cleaned
  - Time-variant: covers a long history of data (e.g. last decade)
  - Non-volatile: only periodically updated, not “live” data
- Data warehouse products from Oracle, IBM, Microsoft, Teradata
CS910 Foundations of Data Analytics

OLAP, OLTP, DSS
- OLAP: Online Analytical Processing
  - Analysis of complex data stored in a data warehouse
  - Often using distributed storage and processing (Hive…)
- In contrast to Online Transaction Processing (OLTP)
  - Insertions, updates, deletions and queries
- Decision Support Systems (DSS) or Enterprise Information Systems
  - Allow an organization’s leaders to make complex strategic decisions
  - Support data mining / machine learning for knowledge discovery
  - “Business Intelligence”

Data Warehouse Characteristics
- Data warehouses adopt a different data model from an RDBMS
  - Typically a multidimensional data model
- Warehouses often store integrated data from many sources
  - In contrast to a DBMS, which encourages multiple disjoint DBs
- Warehouses typically support time-series and trend analysis
  - Need more historical data, not just the current values
- Warehouses are typically non-volatile
  - Data is only added periodically. No need for transactions!
- Warehouses typically handle very large amounts of data
  - Often two orders of magnitude (100x) larger than “live” databases
  - May be terabytes to petabytes in size

ETL: Extract, Transform, Load
- Putting data into a data warehouse is a complex process, denoted ETL: Extract, Transform, Load
- Extract: pull data out of whatever system it is stored in
  - Via an appropriate interchange format: XML, flat files, etc.
- Transform: put data into a usable format
  - Pick which attributes, harmonize formats, sort and join as needed
  - Format for consistency: names of entities should agree
  - Clean the data: identify errors and fill in missing values
  - Return cleaned data to update the original source: backflushing
  - Fit the data to the model of the warehouse: ensure it fits the schema

ETL: Extract, Transform, Load
- Load: store in an appropriate format
  - Many warehouses use simple structures, e.g. sorted flat files
- Refresh policy: how up to date is the data?
  - Can it be offline? How long does it take to load into the warehouse?
- Store metadata on the data as well: metadata repository
  - Technical metadata: how data was processed, stored, updated
  - Business metadata: relevant business rules and organization details
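The three ETL stages can be sketched in a few lines of Python. This is only an illustration, not how any particular warehouse product works: the CSV export, the field names, and the policy of filling a missing amount with 0.0 are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw export from an operational system (all values are made up).
RAW_CSV = """store,product,amount
Coventry ,tv,300
coventry,DVD,
LEEDS,pc,650
"""

def extract(text):
    """Extract: parse the flat-file export into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records, default_amount=0.0):
    """Transform: harmonize names, fill in missing values, fit the target schema."""
    cleaned = []
    for rec in records:
        amount = rec["amount"].strip()
        cleaned.append({
            "store": rec["store"].strip().title(),      # consistent entity names
            "product": rec["product"].strip().upper(),
            # Assumed cleaning policy: a missing amount becomes default_amount.
            "amount": float(amount) if amount else default_amount,
        })
    return cleaned

def load(warehouse, records):
    """Load: append the batch and keep the store sorted, like a sorted flat file."""
    warehouse.extend(records)
    warehouse.sort(key=lambda r: (r["store"], r["product"]))
    return warehouse

warehouse = load([], transform(extract(RAW_CSV)))
for row in warehouse:
    print(row)
```

After the run, the two spellings of Coventry agree and the rows are ordered by store and product. A real pipeline would also record metadata about each load, which is omitted here.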

Characteristics of Data Warehouses
- A few key properties of data warehouses (DW):
  - Multidimensional: allow many levels of aggregation
  - Support multiple users via client-server architecture
  - Should be intuitive and responsive to use
- Many variations of the central concept:
  - Enterprise-wide DW: corral everything about an organization
  - Virtual DWs: provide a materialized view of an operational DB
  - Data marts: DWs restricted to a subset of an organization
- Two common architectures for warehouses:
  - Distributed: must handle replication, partitioning, consistency
  - Federated: collection of autonomous warehouses (data marts)

OLAP and Data Cubes
- Warehouses often support Online Analytical Processing (OLAP)
  - A multidimensional view of data
  - Represents data as a data cube
  - Explored by aggregating or refining dimensions in the data

Aggregating Multidimensional Data
- E.g. sales volume as a function of product, month, and region
- Dimensions: Product, Location, Time
- Hierarchical summarization paths:
  - Product: Industry > Category > Product
  - Location: Region > Country > City > Office
  - Time: Year > Quarter > Month or Week > Day
[Figure: cube with axes Product, Month, Region]

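A Sample Data Cube
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, DVD, PC, sum) and Country (U.S.A., Canada, Mexico, sum). Cells marked “* (all)” aggregate over a whole dimension. Highlighted cell: total annual sales of TVs in U.S.A.]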
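One way to see what the cells of such a cube contain is to compute them directly. The sketch below assumes a tiny, made-up fact list and uses "*" for a dimension that has been aggregated away. It materializes every cell by brute force, which is fine for illustration but not how a real OLAP engine would do it.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (quarter, product, country, sales). Values are made up.
FACTS = [
    ("1Qtr", "TV", "U.S.A.", 10),
    ("2Qtr", "TV", "U.S.A.", 20),
    ("1Qtr", "TV", "Canada", 5),
    ("1Qtr", "PC", "U.S.A.", 7),
]

ALL = "*"  # marks a dimension that has been aggregated away

def build_cube(facts):
    """Sum sales for every combination of concrete value or ALL per dimension."""
    cube = defaultdict(int)
    for *dims, sales in facts:
        # Each fact contributes to 2^d cells: keep or generalize each dimension.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(ALL if hide else value for value, hide in zip(dims, mask))
            cube[cell] += sales
    return dict(cube)

cube = build_cube(FACTS)
print(cube[(ALL, "TV", "U.S.A.")])  # total annual sales of TVs in U.S.A.
print(cube[(ALL, ALL, ALL)])        # grand total over all three dimensions
```

The cell `(ALL, "TV", "U.S.A.")` corresponds to the "total annual sales of TVs in U.S.A." entry in the figure: the date dimension is summed out while product and country stay fixed.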

OLAP Operations
- Roll up (drill up): summarize data by climbing up a hierarchy or by dimension reduction
- Drill down (roll down): the inverse of roll up, going from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions
- Slice and dice: project and select
  - Zoom in on a particular value, or drop some attributes
- Apply aggregation on a given dimension
  - Count, Sum, Min, Max, Average, Variance, Median, Mode
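These operations can be illustrated on a small in-memory fact list. The rows, attribute names, and helper functions below are hypothetical. Each row carries both levels of its hierarchy (city and country, month and quarter), so rolling up is just grouping by the coarser attribute.

```python
from collections import defaultdict

# Hypothetical sales facts at the finest level (city, month); values are made up.
SALES = [
    {"city": "Coventry", "country": "UK", "month": "Jan", "quarter": "Q1", "product": "TV", "amount": 3},
    {"city": "Leeds", "country": "UK", "month": "Feb", "quarter": "Q1", "product": "TV", "amount": 4},
    {"city": "Lyon", "country": "France", "month": "Apr", "quarter": "Q2", "product": "PC", "amount": 5},
    {"city": "Leeds", "country": "UK", "month": "Apr", "quarter": "Q2", "product": "PC", "amount": 6},
]

def aggregate(rows, dims, measure="amount"):
    """Group rows by the chosen dimension attributes and sum the measure."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)

def slice_(rows, **fixed):
    """Slice/dice: select rows matching fixed values on one or more dimensions."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

detail = aggregate(SALES, ["city", "month"])          # drill-down level
rolled_up = aggregate(SALES, ["country", "quarter"])  # climb both hierarchies
by_country = aggregate(SALES, ["country"])            # roll up by dropping time
uk_q1 = aggregate(slice_(SALES, country="UK", quarter="Q1"), ["product"])
print(rolled_up, by_country, uk_q1)
```

Going from `rolled_up` back to `detail` is the drill-down direction. Only Sum is shown here; the other aggregates on the slide would replace the `+=` accumulation.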

Multidimensional Storage Model
- The DW multidimensional storage model has two table types: dimension tables and fact tables
- Fact table: many tuples, one per stored fact, pointing to dimensions
  - E.g. sale of an item: which product, which store, which customer
- Dimension table: tuples of attributes of the dimension
  - E.g. details of the product, of the store, of the customer

Data Warehouse Schemas
- Star schema: fact table with a single table for each dimension
- Snowflake schema: variation of a star schema
  - Dimension tables are arranged hierarchically after normalization
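A star schema can be tried out with Python's built-in `sqlite3` module. The table names, columns, and rows below are invented for illustration. The sketch shows one fact table with two dimension tables, plus a typical query that joins facts to a dimension and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row of descriptive attributes per member.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT, country TEXT);
    -- Fact table: one row per sale, pointing at the dimensions by foreign key.
    CREATE TABLE sales (
        product_id INTEGER REFERENCES product(product_id),
        store_id   INTEGER REFERENCES store(store_id),
        amount     REAL
    );
    INSERT INTO product VALUES (1, 'TV', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO store   VALUES (1, 'Coventry', 'UK'), (2, 'Lyon', 'France');
    INSERT INTO sales   VALUES (1, 1, 300), (1, 2, 320), (2, 1, 80);
""")

# A typical star join: aggregate the facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT p.category, SUM(s.amount)
    FROM sales AS s JOIN product AS p ON s.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(rows)
```

In a snowflake variant of this sketch, `category` would move out of `product` into its own table referenced by a key, at the cost of one more join per query.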

Fact constellations
- Fact constellation: a set of fact tables that share some dimension tables

Bitmap Indexes
- Bitmap indexes are used to support high-performance access
  - One of various indexing techniques used in the database
- Takes the form of a bit vector for each distinct value of an indexed column
  - Bit i is set to 1 if row i holds that value, 0 if it does not
- Can be quite compact if the domain size is small
  - E.g. 1M rows and domain size of 4: four vectors of 1M bits each, so bitmap index size 0.5MB
- Efficient to check conjunctive conditions: intersect (AND) bitmaps
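A minimal sketch of the idea, using Python integers as bit vectors (bit i stands for row i). The column contents are made up, and real systems typically compress the bitmaps, which is not shown here.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is 1 iff row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical columns of a six-row table (values are made up).
region = ["N", "S", "N", "E", "S", "N"]
product = ["TV", "TV", "PC", "TV", "PC", "TV"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# Conjunctive condition region = 'N' AND product = 'TV': intersect the bitmaps.
hits = matching_rows(region_idx["N"] & product_idx["TV"])
print(hits)

# Size check for the 1M-row example: one bit per row per distinct value.
rows, domain_size = 1_000_000, 4
size_bytes = rows * domain_size // 8
print(size_bytes)  # 500000 bytes, i.e. 0.5 MB
```

The conjunctive query never touches the table itself: a single bitwise AND over the two vectors yields the matching row ids.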

Join indexing
- A join index connects dimension data to tuples in a fact table
  - Assuming a star schema
- A join index is a traditional index linking primary and foreign keys
  - Lists all the keys that meet the (equi)join condition
- E.g. consider a sales fact table that has city as one dimension
  - Join index on city: list of sales tuple ids for each different city
- A join index can be implemented as a bitmap index
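The city example can be sketched as follows. The fact rows and helper names are hypothetical. The first function builds the list form of the join index, and the second shows the same information stored as one bit string per city.

```python
from collections import defaultdict

# Hypothetical sales fact table: tuple id -> (city, amount). Values are made up.
SALES = {
    0: ("Coventry", 300),
    1: ("Leeds", 120),
    2: ("Coventry", 80),
    3: ("Lyon", 210),
}

def build_join_index(facts):
    """For each city, list the ids of the sales tuples that join with it."""
    index = defaultdict(list)
    for tuple_id, (city, _amount) in facts.items():
        index[city].append(tuple_id)
    return dict(index)

def as_bitmaps(join_index, n_rows):
    """Same join index stored as one bit string per city (position i = sales tuple i)."""
    return {
        city: "".join("1" if i in ids else "0" for i in range(n_rows))
        for city, ids in join_index.items()
    }

join_index = build_join_index(SALES)
# Answer "total sales in Coventry" by following the index, without scanning SALES.
coventry_total = sum(SALES[i][1] for i in join_index["Coventry"])
bitmaps = as_bitmaps(join_index, len(SALES))
print(join_index, coventry_total, bitmaps)
```

The index precomputes the result of the equijoin between the city dimension and the fact table, so a query restricted to one city reads only the listed tuples.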

Data Warehouse versus Views
- Recall views: result of a (stored) query on a database
- Could achieve warehouse functionality via (materialized) views
- Data warehouses are more than just views:
  - Warehouses are stored, not materialized on demand
  - Different data model: multidimensional, not relational
  - Data warehouses can be indexed (views cannot be indexed independently of the underlying database)
  - Warehouses support various analysis tasks (mining, time series)
  - Warehouses typically contain more (historic) data than one DB

Data Warehouses: Pros and Cons
- Data warehouses have many strengths for data analysis:
  - Support fast exploration and aggregation of data
  - Designed to handle very large data sets (TBs / billions of records)
  - Software supports analytics (data mining / machine learning) on top: clustering, regression, classification, rule mining
- However, they have their limitations:
  - A big undertaking: bringing together all an organization’s data
  - Need a thorough understanding of the organizational structure
  - Can be costly to maintain (time-consuming to clean and load data)
  - As the underlying data organization changes, so must the warehouse

Summary
- What is a data warehouse and what is it for?
  - Storing and querying all the data of a large organization
- The multidimensional data model and common schema designs
  - Roll up, drill down, slice & dice; star and snowflake schemas
- Special indexes: bitmap and join indexes
- Chapter: “Overview of Data Warehousing and OLAP” in Elmasri and Navathe