Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25, Part B.

Slides:



Advertisements
Similar presentations
Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
Advertisements

Database Management Systems, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.
ICS 421 Spring 2010 Data Warehousing 3 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/1/20101Lipyeow.
Introduction to Structured Query Language (SQL)
CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Decision Support Chapter 25.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. SQL - part 2 - Database Management Systems I Alex Coman, Winter 2006.
Database Systems More SQL Database Design -- More SQL1.
Introduction to Structured Query Language (SQL)
CS 603 Data Replication in Oracle February 27, 2002.
Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 157 Database Systems I SQL Constraints and Triggers.
Chapter 4 Relational Databases Copyright © 2012 Pearson Education 4-1.
Data Cube Computation Model dependencies among the aggregates: most detailed “view” can be computed from view (product,store,quarter) by summing-up all.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
CPSC 404, Laks V.S. Lakshmanan 1 Data Warehousing & OLAP Chapter 25, Ramakrishnan & Gehrke (Sections )
Chapter 1 Database Systems. Good decisions require good information derived from raw facts Data is managed most efficiently when stored in a database.
The Relational Model. Review Why use a DBMS? OS provides RAM and disk.
SCUHolliday - coen 1789–1 Schedule Today: u Constraints, assertions, triggers u Read Sections , 7.4. Next u Triggers, PL/SQL, embedded SQL, JDBC.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
CODD’s 12 RULES OF RELATIONAL DATABASE
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Part Two: - The use of views. 1. Topics What is a View? Why Views are useful in Data Warehousing? Understand Materialised Views Understand View Maintenance.
Winter 2006Keller, Ullman, Cushing9–1 Constraints Commercial relational systems allow much more “fine-tuning” of constraints than do the modeling languages.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
Data Warehouse & OLAP Kuliah 1 Introduction Slide banyak mengambil dari acuan- acuan yang dipakai.
Chapter 15 Recovery. Topics in this Chapter Transactions Transaction Recovery System Recovery Media Recovery Two-Phase Commit SQL Facilities.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
Views In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database.) In some.
Chapter No 4 Query optimization and Data Integrity & Security.
6 1 Lecture 8: Introduction to Structured Query Language (SQL) J. S. Chou, P.E., Ph.D.
Triggers. Why Triggers ? Suppose a warehouse wishes to maintain a minimum inventory of each item. Number of items kept in items table Items(name, number,...)
View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Concepts of Database Management Eighth Edition Chapter 3 The Relational Model 2: SQL.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Methodology – Physical Database Design for Relational Databases.
Constraints and Triggers. What’s IC? Integrity Constraints define the valid states of SQL-data by constraining the values in the base tables. –Restrictions.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 (Part II) INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
View 1. Lu Chaojun, SJTU 2 View Three-level vision of DB users Virtual DB views DB Designer Logical DB relations DBA DBA Physical DB stored info.
Ing. Erick López Ch. M.R.I. Replicación Oracle. What is Replication  Replication is the process of copying and maintaining schema objects in multiple.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
OLAP & Data Warehousing. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
SCALING AND PERFORMANCE CS 260 Database Systems. Overview  Increasing capacity  Database performance  Database indexes B+ Tree Index Bitmap Index 
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
1 CS 430 Database Theory Winter 2005 Lecture 7: Designing a Database Logical Level.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
11th International Conference on Web-Age Information Management July 15-17, 2010 Jiuzhaigou, China V Locking Protocol for Materialized Aggregate Join Views.
1 Introduction to Database Systems, CS420 SQL Views and Indexes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
Chapter 3 The Relational Model. Why Study the Relational Model? Most widely used model. Vendors: IBM, Informix, Microsoft, Oracle, Sybase, etc. “Legacy.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 The Relational Model Chapter 3.
SQL Basics Review Reviewing what we’ve learned so far…….
Concepts of Database Management, Fifth Edition Chapter 3: The Relational Model 2: SQL.
Lecture 6- Query Optimization (continued)
Plan for Populating a DW
Pertemuan <<13>> Data Warehousing dan Decision Support
Views and Security views and security.
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
CS 440 Database Management Systems
Data Warehousing and Decision Support
One-Pass Algorithms for Database Operations (15.2)
CUBE MATERIALIZATION E0 261 Jayant Haritsa
Views 1.
Presentation transcript:

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25, Part B

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke2 Views and Decision Support  OLAP queries are typically aggregate queries.  Precomputation is essential for interactive response times.  The CUBE is in fact a collection of aggregate queries, and precomputation is especially important: lots of work on what is best to precompute given a limited amount of space to store precomputed results.  Warehouses can be thought of as a collection of asynchronously replicated tables and periodically maintained views.  Has renewed interest in view maintenance!

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke3 View Modification (Evaluate On Demand) CREATE VIEW RegionalSales(category,sales,state) AS SELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid SELECT R.category, R.state, SUM (R.sales) FROM RegionalSales AS R GROUP BY R.category, R.state SELECT R.category, R.state, SUM (R.sales) FROM ( SELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid) AS R GROUP BY R.category, R.state View Query Modified Query

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke4 View Materialization (Precomputation)  Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category,state,sales].  Then, previous query can be answered by an index- only scan. SELECT R.state, SUM (R.sales) FROM RegionalSales R WHERE R.category=“Laptop” GROUP BY R.state SELECT R.state, SUM (R.sales) FROM RegionalSales R WHERE R. state=“Wisconsin” GROUP BY R.category Index on precomputed view is great! Index is less useful (must scan entire leaf level).

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke5 Materialized Views  A view whose tuples are stored in the database is said to be materialized.  Provides fast access, like a (very high-level) cache.  Need to maintain the view as the underlying tables change.  Ideally, we want incremental view maintenance algorithms.  Close relationship to data warehousing, OLAP, (asynchronously) maintaining distributed databases, checking integrity constraints, and evaluating rules and triggers.

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke6 Issues in View Materialization  What views should we materialize, and what indexes should we build on the precomputed results?  Given a query and a set of materialized views, can we use the materialized views to answer the query?  How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke7 View Maintenance  Two steps:  Propagate: Compute changes to view when data changes.  Refresh: Apply changes to the materialized view table.  Maintenance policy: Controls when we do refresh.  Immediate: As part of the transaction that modifies the underlying data tables. ( + Materialized view is always consistent; - updates are slowed)  Deferred: Some time later, in a separate transaction. ( - View becomes inconsistent; + can scale to maintain many views without slowing updates)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke8 Deferred Maintenance  Three flavors:  Lazy: Delay refresh until next query on view; then refresh before answering the query.  Periodic (Snapshot): Refresh periodically. Queries possibly answered using outdated version of view tuples. Widely used, especially for asynchronous replication in distributed databases, and for warehouse applications.  Event-based: E.g., Refresh after a fixed number of updates to underlying data tables.

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke9 Snapshots in Oracle 7  A snapshot is a local materialization of a view on data stored at a master site.  Periodically refreshed by re-computing view entirely.  Incremental “fast refresh” for “simple snapshots” (each row in view based on single row in a single underlying data table; no DISTINCT, GROUP BY, or aggregate ops; no sub-queries, joins, or set ops) Changes to master recorded in a log by a trigger to support this.

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke10 Issues in View Maintenance (1)  What information is available? (Base relations, materialized view, ICs). Suppose parts(p5,5000) is inserted:  Only materialized view available: Add p5 if it isn’t there.  Parts table is available: If there isn’t already a parts tuple p5 with cost >1000, add p5 to view. May not be available if the view is in a data warehouse!  If we know pno is key for parts: Can infer that p5 is not already in view, must insert it. expensive_parts(pno) :- parts(pno, cost), cost > 1000

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke11 Issues in View Maintenance (2)  What changes are propagated? (Inserts, deletes, updates). Suppose parts(p1,3000) is deleted:  Only materialized view available: If p1 is in view, no way to tell whether to delete it. (Why?) If count(#derivations) is maintained for each view tuple, can tell whether to delete p1 (decrement count and delete if = 0).  Parts table is available: If there is no other tuple p1 with cost >1000 in parts, delete p1 from view.  If we know pno is key for parts: Can infer that p1 is currently in view, and must be deleted. expensive_parts(pno) :- parts(pno, cost), cost > 1000

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke12 Issues in View Maintenance (3)  View definition language? (Conjunctive queries, SQL subset, duplicates, aggregates, recursion)  Suppose parts(p5,5000) is inserted:  Can’t tell whether to insert p5 into view if we’re only given the materialized view. Supp_parts(pno) :- suppliers(sno, pno), parts(pno, cost)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke13 Incremental Maintenance Alg: One Rule, Inserts  Step 0: For each tuple in the materialized view, store a “derivation count”.  Step 1: Rewrite this rule using Seminaive rewriting, set “delta_old” relations for Rel1 and Rel2 to be the inserted tuples.  Step 2: Compute the “delta_new” relations for the view relation.  Important: Don’t remove duplicates! For each new tuple, maintain a “derivation count”.  Step 3: Refresh the stored view by doing “multiset union” of the new and old view tuples. (I.e., update the derivation counts of existing tuples, and add the new tuples that weren’t in the view earlier.) View(X,Y) :- Rel1(X,Z), Rel2(Z,Y)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke14 Incremental Maintenance Alg: One Rule, Deletes  Steps 0 - 2: As for inserts.  Step 3: Refresh the stored view by doing “multiset difference ” of the new and old view tuples.  To update the derivation counts of existing tuples, we must now subtract the derivation counts of the new tuples from the counts of existing tuples. View(X,Y) :- Rel1(X,Z), Rel2(Z,Y)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke15 Incremental Maintenance Alg: General  The “counting” algorithm can be generalized to views defined by multiple rules. In fact, it can be generalized to SQL queries with duplicate semantics, negation, and aggregation.  Try and do this! The extension is straightforward.

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke16 Maintaining Warehouse Views  Main twist: The views are in the data warehouse, and the source tables are somewhere else (operational DBMS, legacy sources, …). 1)Warehouse is notified whenever source tables are updated. (e.g., when a tuple is added to r2) 2)Warehouse may need additional information about source tables to process the update (e.g., what is in r1 currently?) 3)The source responds with the additional info, and the warehouse incrementally refreshes the view. view(sno) :- r1(sno, pno), r2(pno, cost) Problem: New source updates between Steps 1 and 3!

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke17 Example of Warehouse View Maint.  Initially, we have r1(1,2), r2 empty  insert r2(2,3) at source; notify warehouse  Warehouse asks ?r1(sno,2)  Checking to find sno’s to insert into view  insert r1(4,2) at source; notify warehouse  Warehouse asks ?r2(2,cost)  Checking to see if we need to increment count for view(4)  Source gets first warehouse query, and returns sno=1, sno=4; these values go into view (with derivation counts of 1 each)  Source gets second query, and says Yes, so count for 4 is incremented in the view  But this is wrong! Correct count for view(4) is 1. view(sno) :- r1(sno, pno), r2(pno, cost)

Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke18 Warehouse View Maintenance  Alternative 1: Evaluate view from scratch  On every source update, or periodically  Alternative 2: Maintain a copy of each source table at warehouse  Alternative 3: More fancy algorithms  Generate queries to the source that take into account the anomalies due to earlier conflicting updates.