
1 A Transactional Model for Data Warehouse Maintenance
Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner
Published in ER’2002, Finland
Database Systems Research Group, Worcester Polytechnic Institute

2 Data Warehousing
[Figure: the Data Warehouse is maintained by the DWMS, which reaches each remote base source through a wrapper]
- Data Integration from Remote Base Sources
  - Difficult and labor-intensive
  - Better to do it only ONCE and materialize the results
  - Share the materialized data among many applications

3 Data Warehouse Maintenance
- Motivation: Keep the Data Warehouse (DW) Up-to-Date
  - Base sources change over time
    - Source data updates: insert, delete, update
    - Source schema changes: add, drop, rename
- Basic Idea: Incremental Maintenance Instead of Re-computation
  - Re-computation may take weeks

4 General Maintenance Algorithms
- View Maintenance (VM)
  - Incrementally incorporate source data updates
  - [BLT86], [GMS93], [ZGH+95], [SBC+00]
- View Synchronization (VS)
  - Rewrite the data warehouse view definition after the schema of a source has changed
  - [NLR98], [LNR02]
- View Adaptation (VA)
  - Adapt the view extent after the view definition has changed
  - [NR99], [GMR+01]
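View synchronization can be illustrated with the simplest schema change, renaming a source relation. The sketch below is a hypothetical and much-simplified model: `ViewDef` and `synchronize_rename` are illustrative names, not taken from [NLR98] or [LNR02]. It handles only a relation RENAME in a select-project-join view. Because attributes are referenced through aliases, rewriting the FROM clause is sufficient.

```python
from dataclasses import dataclass


@dataclass
class ViewDef:
    # Hypothetical, simplified select-project-join view definition.
    name: str
    select: list      # projected attributes, e.g. "C.Name"
    relations: dict   # alias -> source relation name
    where: list       # conjunctive predicates as text

    def to_sql(self) -> str:
        froms = ", ".join(f"{rel} {alias}" for alias, rel in self.relations.items())
        return (f"CREATE VIEW {self.name} AS SELECT {', '.join(self.select)} "
                f"FROM {froms} WHERE {' AND '.join(self.where)};")


def synchronize_rename(view: ViewDef, old: str, new: str) -> ViewDef:
    """Return a new view definition in which source relation `old` is `new`.

    Only the alias -> relation mapping changes. The projection and the
    predicates use aliases, so they stay valid after the rename.
    """
    relations = {alias: (new if rel == old else rel)
                 for alias, rel in view.relations.items()}
    return ViewDef(view.name, list(view.select), relations, list(view.where))


view = ViewDef(
    name="Asia_Traveller",
    select=["C.Name", "C.Address", "F.FlightNo"],
    relations={"C": "Customer", "F": "FlightRes"},
    where=["C.Name = F.Name", "F.Dest = 'Asia'"],
)
synced = synchronize_rename(view, "FlightRes", "FlightReservation")
print(synced.to_sql())
```

Real view synchronization also has to handle dropped attributes and relations, and those changes may have no equivalent rewrite. This sketch covers only the rename case.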

5 DW Maintenance Example
  CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer C, FlightRes F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';

Customer:
  Name  | Address
  Dave  | WPI
  Ellen | MA

FlightRes:
  Name  | Age | FlightNo | Dest
  Dave  | 22  | AA8384   | Asia
  Steve |     | UA       | Europe

View Asia_Traveller:
  Name | Address | FlightNo
  Dave | WPI     | AA8384

Source update: Insert ('Steve', 'Boston') into Customer
Maintenance query: Select FlightNo from FlightRes where Name = 'Steve'
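The incremental idea from this example can be sketched in Python. The view is the join of the inserted Customer tuple with FlightRes, restricted by the view's `Dest = 'Asia'` predicate. It is not recomputed from scratch. The in-memory relations, the set (duplicate-free) semantics, and the `None` used for Steve's unknown age are all assumptions made for illustration. Bag semantics would require multiplicity counts.

```python
# Hypothetical in-memory copies of the slide's relations.
customer = [("Dave", "WPI"), ("Ellen", "MA")]            # (Name, Address)
flight_res = [("Dave", 22, "AA8384", "Asia"),            # (Name, Age, FlightNo, Dest)
              ("Steve", None, "UA", "Europe")]           # None: age not given


def asia_traveller(customers, flights):
    """Full recomputation of the view Asia_Traveller (set semantics)."""
    return {(c_name, addr, f_no)
            for (c_name, addr) in customers
            for (f_name, _age, f_no, dest) in flights
            if c_name == f_name and dest == "Asia"}


def delta_for_customer_insert(new_customer, flights):
    """Incremental maintenance: join only the inserted tuple with FlightRes."""
    name, addr = new_customer
    return {(name, addr, f_no)
            for (f_name, _age, f_no, dest) in flights
            if f_name == name and dest == "Asia"}


view = asia_traveller(customer, flight_res)
new_tuple = ("Steve", "Boston")
customer.append(new_tuple)                       # the source update
view |= delta_for_customer_insert(new_tuple, flight_res)

# The incrementally maintained view agrees with a full recomputation.
assert view == asia_traveller(customer, flight_res)
print(sorted(view))  # [('Dave', 'WPI', 'AA8384')]; Steve flies to Europe
```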

6 Maintenance Anomaly Problem
(Customer, FlightRes, and the view Asia_Traveller as in slide 5)
  1. Insert ('Steve', 'Boston') into Customer
  2. Rename (FlightRes, FlightReservation)
  3. Select FlightNo from FlightRes where Name = 'Steve': Broken Query! (FlightRes no longer exists after step 2)
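The anomaly can be reproduced with a toy model. In the sketch, the source is a dict that maps relation names to rows, and the maintenance query looks up a relation by name. That lookup is why a committed rename makes the query fail. The dict-based source and `maintenance_query` are hypothetical stand-ins for a real source and its wrapper.

```python
# Hypothetical source: relation name -> rows.
source = {
    "Customer": [("Dave", "WPI"), ("Ellen", "MA")],
    "FlightRes": [("Dave", 22, "AA8384", "Asia"), ("Steve", None, "UA", "Europe")],
}


def maintenance_query(db, name):
    """Select FlightNo from FlightRes where Name = name."""
    return [f_no for (f_name, _age, f_no, _dest) in db["FlightRes"] if f_name == name]


# 1. Base update transaction commits: Insert ('Steve', 'Boston').
source["Customer"].append(("Steve", "Boston"))
# 2. A concurrent schema change commits before the DW queries the source.
source["FlightReservation"] = source.pop("FlightRes")
# 3. The DW now issues the maintenance query for update 1.
try:
    maintenance_query(source, "Steve")
    outcome = "ok"
except KeyError as missing:
    outcome = f"broken query: relation {missing} no longer exists"
print(outcome)
```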

7 Inside Broken Query
- Two Transactions
  - Base update transaction: w(B_i) c(B_i)
  - DW maintenance transaction: r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
- Read-write conflicts between the two transactions
  - Two independent transactions: w(B_i) / r(B_i)
  - Data update w(B_i): incorrect query results [ZGH+95]
  - Schema change w(B_i): broken query
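The conflict can be made concrete by writing each operation as (transaction, kind, item) and scanning a history for pairs of operations that come from different transactions, touch the same item, and consist of one read and one write. `Op`, `rw_conflicts`, and the sample history are hypothetical. In the history, maintenance transaction M reads B1, an independent update transaction U writes and commits B2, and M then reads B2. Write-write conflicts are deliberately left out because the slide concerns read-write conflicts.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str   # transaction id
    kind: str  # "r" (read), "w" (write) or "c" (commit)
    item: str  # data item, e.g. "B1" or "DW"


def rw_conflicts(history):
    """Return (earlier, later) pairs from different transactions on the same
    item where one operation is a read and the other is a write."""
    pairs = []
    for i, a in enumerate(history):
        for b in history[i + 1:]:
            if a.txn != b.txn and a.item == b.item and {a.kind, b.kind} == {"r", "w"}:
                pairs.append((a, b))
    return pairs


history = [
    Op("M", "r", "B1"),
    Op("U", "w", "B2"), Op("U", "c", "B2"),   # independent base update commits
    Op("M", "r", "B2"),                        # maintenance query sees U's write
    Op("M", "w", "DW"), Op("M", "c", "DW"),
]
print(rw_conflicts(history))
```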

8 A Transactional Approach
- A Global Transaction Model: DWMS_Transaction
  - Integrates both a base update transaction and its corresponding DW maintenance transaction
  - w(B_i) c(B_i) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
- Maintenance Anomaly
  - Rephrased as read-write conflicts between DWMS_Transactions:
    w(B_i) c(B_i) r(B_1) r(B_2) … r(B_j) … r(B_n) w(DW) c(DW)
    w(B_j) c(B_j) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)

9 Serializability of DWMS_Transaction
- Theorem
  - A history S of DWMS_Transactions is serializable iff it is equivalent to some serial schedule S’ of the same DWMS_Transactions.
- Basis for Solving Anomaly Problems
  - To solve the anomaly problem, every history of DWMS_Transactions must be serializable.
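A standard way to test a history mechanically is the precedence-graph test for conflict serializability. Conflict serializability is a sufficient condition for the serializability in the theorem, but it is stricter. The approach draws an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj, and then checks the graph for cycles. This is a textbook technique, not necessarily the paper's method. The two DWMS_Transactions below instantiate the slide 8 pattern with n = 2, i = 1, j = 2, and commit operations are omitted.

```python
from collections import defaultdict


def precedence_graph(history):
    """Edge Ti -> Tj for each pair of conflicting operations (same item,
    different transactions, at least one write) where Ti's comes first."""
    edges = defaultdict(set)
    for i, (ti, kind_i, item_i) in enumerate(history):
        for tj, kind_j, item_j in history[i + 1:]:
            if ti != tj and item_i == item_j and "w" in (kind_i, kind_j):
                edges[ti].add(tj)
    return edges


def is_conflict_serializable(history):
    """Conflict-serializable iff the precedence graph is acyclic (DFS check)."""
    edges = precedence_graph(history)
    nodes = {t for t, _, _ in history}
    visiting, done = set(), set()

    def has_cycle(node):
        visiting.add(node)
        for nxt in edges[node]:
            if nxt in visiting or (nxt not in done and has_cycle(nxt)):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(n) for n in nodes if n not in done)


# DWMS_Transactions: base write, maintenance reads, DW write.
t1 = [("T1", "w", "B1"), ("T1", "r", "B1"), ("T1", "r", "B2"), ("T1", "w", "DW")]
t2 = [("T2", "w", "B2"), ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW")]
serial = t1 + t2
# Anomaly: T2's w(B2) slips in before T1's maintenance read r(B2).
anomalous = t1[:2] + [t2[0]] + t1[2:] + t2[1:]

print(is_conflict_serializable(serial), is_conflict_serializable(anomalous))  # True False
```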

10 Traditional Serializability Algorithms
- Lock-based
  - Reads/writes acquire locks for access to shared resources
  - Transactions block each other
- Multiversion-based
  - Writes create a new version; reads use another version
  - Transactions do not block each other

11 Traditional Serializability Algorithms
- Lock-based
  - Reads/writes would need to lock data in the sources
  - Not desirable in a DW environment
    - Data sources are autonomous
    - Not realistic to impose locking on them
- Multiversion-based
  - Transactions do not block each other
  - Desirable in a DW environment
    - The DW and the data sources do not block each other
  - Versions must be maintained somewhere
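The non-blocking property of the multiversion approach can be shown with a minimal store. Each committed transaction installs a new, numbered version of every item it writes. A reader pins a version number and always reads the latest version at or before it, so it never waits for writers and never observes their later commits. `MultiVersionStore` and its methods are hypothetical. The sketch illustrates generic multiversion reads and is not TxnWrap itself.

```python
import bisect


class MultiVersionStore:
    """Minimal multiversion key-value store (sketch)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version, value), version ascending
        self._clock = 0      # number of the last committed version

    def commit(self, writes):
        """Install all writes of one transaction as a new version."""
        self._clock += 1
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Version number a new reader should read at."""
        return self._clock

    def read(self, key, as_of):
        """Latest value of `key` whose version is <= as_of."""
        versions = self._versions.get(key, [])
        idx = bisect.bisect_right([v for v, _ in versions], as_of) - 1
        if idx < 0:
            raise KeyError(f"{key!r} has no version at or before {as_of}")
        return versions[idx][1]


store = MultiVersionStore()
store.commit({"B1": "v1"})
snap = store.snapshot()                   # a maintenance reader pins version 1
store.commit({"B1": "v2"})                # a concurrent source update: version 2
old = store.read("B1", snap)              # "v1": the reader is unaffected
new = store.read("B1", store.snapshot())  # "v2"
print(old, new)
```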

12 TxnWrap: A Multiversion Algorithm
Original view (base relations and view extent as in slide 5):
  CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer C, FlightRes F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';
View rewritten over the versioned wrapper relations:
  CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer' C, FlightRes' F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';
[Figure: each source has a wrapper that keeps a versioned relation (Customer', FlightRes') and a meta relation listing its (Rel, Attr) pairs, e.g. (Cust', Name) and (Cust', Address) for Customer']

13 Versioned Wrapper
- Semantics: the lifetime of a tuple is #born <= time < #dead
Wrapper for Customer, relation Customer':
  Name  | Address | #born | #dead
  Dave  | WPI     | 0     |
  Ellen | MA      | 0     |
Meta relation:
  Rel | Attr  | Rel' | Attr' | #born | #dead
  C'  | Name  | -    | -     | 0     |
  C'  | Addr. | -    | -     | 0     |
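The lifetime rule above translates directly into a visibility predicate. A tuple belongs to the snapshot at version v exactly when #born <= v < #dead, and an empty #dead means the tuple is still alive. `VTuple`, `visible`, and `snapshot` are hypothetical names. The rows follow the Customer' state after Transaction 1 from the next slide: Dave is deleted at 1 and Steve is inserted at 1.

```python
from typing import NamedTuple, Optional


class VTuple(NamedTuple):
    name: str
    address: str
    born: int
    dead: Optional[int] = None  # None: empty #dead cell, still alive


def visible(t: VTuple, version: int) -> bool:
    """Lifetime rule: #born <= version < #dead."""
    return t.born <= version and (t.dead is None or version < t.dead)


def snapshot(rel, version):
    """The relation as it was at `version`, without version columns."""
    return [(t.name, t.address) for t in rel if visible(t, version)]


customer_v = [VTuple("Dave", "WPI", 0, 1),
              VTuple("Ellen", "MA", 0),
              VTuple("Steve", "Boston", 1)]
print(snapshot(customer_v, 0))  # [('Dave', 'WPI'), ('Ellen', 'MA')]
print(snapshot(customer_v, 1))  # [('Ellen', 'MA'), ('Steve', 'Boston')]
```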

14 Source Updates on Versioned Wrapper
- Transaction 1:
    1. DELETE FROM Customer C WHERE C.Name = 'Dave';
    2. INSERT ('Steve', 'Boston');
- Transaction 2: Drop Customer.Address;
Relation Customer' (initial):
  Name  | Address | #born | #dead
  Dave  | WPI     | 0     |
  Ellen | MA      | 0     |
Relation Customer' (state 1):
  Name  | Address | #born | #dead
  Dave  | WPI     | 0     | 1
  Ellen | MA      | 0     |
  Steve | Boston  | 1     |
Relation Customer' (state 2): same tuples as state 1
Meta relation (state 2):
  Rel | Attr  | Rel' | Attr' | #born | #dead
  C'  | Name  | -    | -     | 0     |
  C'  | Addr. | -    | -     | 0     | 2
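The state transitions on this slide can be reproduced by a wrapper that never physically removes anything. A delete stamps #dead on the matching live tuples, an insert appends a tuple with #born set, and dropping an attribute stamps #dead on its meta-relation row. `VersionedWrapper` and its methods are hypothetical. As on the slide, transaction numbers double as version numbers.

```python
class VersionedWrapper:
    """Sketch of a versioned relation plus its meta relation."""

    def __init__(self, rel, attrs, rows):
        self.attrs = list(attrs)
        # Data tuples: [values..., #born, #dead]; None means "not dead yet".
        self.rows = [list(r) + [0, None] for r in rows]
        # Meta rows: [Rel, Attr, Rel', Attr', #born, #dead]; None stands for "-".
        self.meta = [[rel, a, None, None, 0, None] for a in attrs]

    def delete(self, txn, pred):
        """Logically delete live tuples matching `pred` at version `txn`."""
        for row in self.rows:
            # zip stops at len(attrs), so #born/#dead are not passed to pred.
            if row[-1] is None and pred(dict(zip(self.attrs, row))):
                row[-1] = txn

    def insert(self, txn, values):
        self.rows.append(list(values) + [txn, None])

    def drop_attribute(self, txn, attr):
        """Record the schema change in the meta relation only."""
        for m in self.meta:
            if m[1] == attr and m[5] is None:
                m[5] = txn


w = VersionedWrapper("C'", ["Name", "Addr."], [("Dave", "WPI"), ("Ellen", "MA")])
w.delete(1, lambda r: r["Name"] == "Dave")   # Transaction 1, statement 1
w.insert(1, ("Steve", "Boston"))             # Transaction 1, statement 2
w.drop_attribute(2, "Addr.")                 # Transaction 2
for row in w.rows:
    print(row)
```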

15 DW Maintenance Query Rewritten for Versioned Wrapper
- The maintenance query issued in Transaction 2:
    SELECT Name, Address FROM Customer WHERE condition;
- Rewritten versioned maintenance query:
    SELECT Name, Address FROM Customer' WHERE condition AND #born 2;
Relation Customer' (state 1):
  Name  | Address | #born | #dead
  Dave  | WPI     | 0     | 1
  Ellen | MA      | 0     |
  Steve | Boston  | 1     |
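A rewrite of this kind can be tried with Python's built-in `sqlite3`. The sketch derives the version filter from the lifetime rule on slide 13 (#born <= v < #dead, where NULL #dead means alive). That filter is an assumption about the rewrite's general shape and is not TxnWrap's exact query. `rewrite_for_version` is a hypothetical helper, and `1 = 1` stands in for the slide's `condition`. The versioned copy keeps the Address column, as the state 2 table on slide 14 shows, so the query still runs after Address is dropped at the source.

```python
import sqlite3


def rewrite_for_version(select_list, versioned_rel, condition):
    """Hypothetical rewrite of 'SELECT ... FROM R WHERE condition' against R':
    keep only tuples alive at version :v, i.e. #born <= :v < #dead."""
    return (f'SELECT {select_list} FROM "{versioned_rel}" '
            f'WHERE ({condition}) AND "#born" <= :v '
            f'AND ("#dead" IS NULL OR "#dead" > :v)')


db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE "Customer'" (Name TEXT, Address TEXT, "#born" INTEGER, "#dead" INTEGER)""")
db.executemany("""INSERT INTO "Customer'" VALUES (?, ?, ?, ?)""",
               [("Dave", "WPI", 0, 1), ("Ellen", "MA", 0, None), ("Steve", "Boston", 1, None)])

sql = rewrite_for_version("Name, Address", "Customer'", "1 = 1")
print(sql)
print(sorted(db.execute(sql, {"v": 1}).fetchall()))  # [('Ellen', 'MA'), ('Steve', 'Boston')]
```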

16 Performance Evaluation
- Implementation
  - In Java
  - Platform: Oracle, JDBC on Windows NT
  - Embedded in the DyDa [CCZ+01] system at WPI
- Testbed
  - 6 data sources with one relation each
  - Each relation has 4 attributes and 100,000 tuples
  - One materialized join view over these data sources
  - TxnWrap vs. compensation (SWEEP [AAS+97] and DyDa)

17 Data Update Processing

18 Schema Change Processing

19 Related Work
- View Maintenance
  - View maintenance / synchronization / adaptation
- Maintenance Anomaly
  - ECA [ZGH+95] and SWEEP [AAS+97] handle only concurrent data updates
  - Compensation-based
  - Performance degrades under high load
- Multiversion Algorithms
  - 2-version, n-version, and unlimited-version algorithms [MPL92]

20 Conclusions
- Identified the maintenance anomaly problem in a mixed environment of concurrent data updates and schema changes
- Designed a global transaction model, DWMS_Transaction, that integrates both the source update transaction and the maintenance transaction
- Rephrased the maintenance anomaly in terms of the serializability of DWMS_Transactions
- Proposed a multiversion algorithm (TxnWrap) to achieve serializability
- Implemented the maintenance solution in DyDa
- Achieved stable performance under various workloads

21 Other Activities and Future Work
- Batching of updates into more complex maintenance plans
- Parallelism of maintenance processes
- Support for more complex views, e.g., aggregation
- Generalization to more change types
- Alternate view synchronization algorithms
- Discovery of changes by non-cooperating sources
- Discovery of metadata in terms of source relationships of distributed sources
- Moving beyond the relational middle-layer model

22 Questions?