1 A Transactional Model for Data Warehouse Maintenance
Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner
Published in ER’2002, Finland
Database Systems Research Group, Worcester Polytechnic Institute

2 Data Warehousing
[Diagram labels: Data Warehouse, DWMS, Wrapper, Base]
Data Integration from Remote Base Sources
  - Difficult and Labor-Intensive
  - Better Do it only ONCE and Materialize the Results
  - Share Materialized Data by Many Applications

3 Data Warehouse Maintenance
Motivation: Keep Data Warehouse (DW) Up-to-Date
  - Base Changes over Time
  - Source Data Updates
      - insert, delete, update
  - Source Schema Changes
      - add, drop, rename
  - Basic Idea: Incremental instead of Re-computation
      - Re-computation may take weeks

4 General Maintenance Algorithms
View Maintenance (VM)
  - Incrementally incorporate source data updates
  - [BLT86], [GMS93], [ZGH+95], [SBC+00]
View Synchronization (VS)
  - Rewrite data warehouse view definition after the schema of the source changed
  - [NLR98], [LNR02]
View Adaptation (VA)
  - Adapt view extent after the view definition changed
  - [NR99], [GMR+01]

5 DW Maintenance Example
CREATE VIEW Asia_Traveller AS
  SELECT C.Name, C.Address, F.FlightNo
  FROM Customer C, FlightRes F
  WHERE C.Name = F.Name AND F.Dest = 'Asia';
Customer
  Name   Address
  Dave   WPI
  Ellen  MA
FlightRes
  Name   Age  FlightNo  Dest
  Dave   22   AA8384    Asia
  Steve       UA        Europe
View: Asia_Traveller
  Name   Address  FlightNo
  Dave   WPI      AA8384
Insert ('Steve', 'Boston')
Select FlightNo from FlightRes where Name = 'Steve'
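The incremental step on this slide can be sketched in Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions: both sources live in one in-memory database (real sources are remote), Steve's missing Age is stored as NULL and his FlightNo is kept as the truncated 'UA' from the transcript, and the maintenance query adds the view's Dest = 'Asia' filter, which the slide's query does not show. The helper names are hypothetical.

```python
import sqlite3

def build_sources():
    """Create the slide's relations and the materialized view in memory."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Customer (Name TEXT, Address TEXT);
        CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
        CREATE TABLE Asia_Traveller (Name TEXT, Address TEXT, FlightNo TEXT);
        INSERT INTO Customer VALUES ('Dave', 'WPI'), ('Ellen', 'MA');
        INSERT INTO FlightRes VALUES ('Dave', 22, 'AA8384', 'Asia'),
                                     ('Steve', NULL, 'UA', 'Europe');
        INSERT INTO Asia_Traveller VALUES ('Dave', 'WPI', 'AA8384');
    """)
    return db

def maintain_on_customer_insert(db, name, address):
    """Apply the source insert, then refresh the view incrementally."""
    db.execute("INSERT INTO Customer VALUES (?, ?)", (name, address))
    # Maintenance query: fetch only the new customer's matching reservations.
    # The Dest filter is an assumption taken from the view definition.
    delta = db.execute(
        "SELECT FlightNo FROM FlightRes WHERE Name = ? AND Dest = 'Asia'",
        (name,),
    ).fetchall()
    db.executemany(
        "INSERT INTO Asia_Traveller VALUES (?, ?, ?)",
        [(name, address, flight_no) for (flight_no,) in delta],
    )
    return len(delta)

def recompute(db):
    """Full re-computation of the view, used only to check the result."""
    return sorted(db.execute("""
        SELECT C.Name, C.Address, F.FlightNo
        FROM Customer C, FlightRes F
        WHERE C.Name = F.Name AND F.Dest = 'Asia'
    """).fetchall())

db = build_sources()
added = maintain_on_customer_insert(db, "Steve", "Boston")
materialized = sorted(db.execute("SELECT * FROM Asia_Traveller").fetchall())
```

With this data Steve's only reservation goes to Europe, so the delta is empty and the incrementally maintained view matches a full re-computation.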

6 Maintenance Anomaly Problem
Customer
  Name   Address
  Dave   WPI
  Ellen  MA
FlightRes
  Name   Age  FlightNo  Dest
  Dave   22   AA8384    Asia
  Steve       UA        Europe
View: Asia_Traveller
  Name   Address  FlightNo
  Dave   WPI      AA8384
1. Insert ('Steve', 'Boston')
2. Rename (FlightRes, FlightReservation)
3. Select FlightNo from FlightRes where Name = 'Steve'
Broken Query!
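The broken query can be reproduced in miniature with Python's sqlite3 module. This is a sketch only: it assumes a single in-memory database stands in for the autonomous remote sources, and the variable names are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer (Name TEXT, Address TEXT);
    CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
    INSERT INTO FlightRes VALUES ('Dave', 22, 'AA8384', 'Asia');
""")

# 1. Data update committed at the source.
db.execute("INSERT INTO Customer VALUES ('Steve', 'Boston')")
# 2. Schema change committed before the maintenance query runs.
db.execute("ALTER TABLE FlightRes RENAME TO FlightReservation")

# 3. Maintenance query still written against the old schema.
reason = ""
try:
    db.execute("SELECT FlightNo FROM FlightRes WHERE Name = 'Steve'").fetchall()
    broken = False
except sqlite3.OperationalError as err:
    broken = True
    reason = str(err)
```

The maintenance query for step 1 fails because it reads a source state that already includes step 2.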

7 Inside Broken Query
Two Transactions
  - Base Update Transaction
      - w(B_i) c(B_i)
  - DW Maintenance Transaction
      - r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
Read-write conflicts between two transactions
  - Two Independent Transactions
  - w(B_i) / r(B_i)
  - Data Update w(B_i): Incorrect Query Results [ZGH+95]
  - Schema Change w(B_i): Broken Query

8 A Transactional Approach
A Global Transaction Model
  - DWMS_Transaction
  - Integrates both base update transaction and its corresponding DW maintenance transaction
  - w(B_i) c(B_i) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)
Maintenance Anomaly
  - Rephrased to read-write conflicts of DWMS_Transactions
  - w(B_i) c(B_i) r(B_1) r(B_2) … r(B_j) … r(B_n) w(DW) c(DW)
  - w(B_j) c(B_j) r(B_1) r(B_2) … r(B_n) w(DW) c(DW)

9 Serializability of DWMS_Transaction
Theorem
  - A history of DWMS_Transactions S is serializable iff it is equivalent to some serial schedule S' of the same DWMS_Transactions.
Basis for Solving Anomaly Problems
  - To solve the anomaly problem, we need all DWMS_Transactions serializable.
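One way to make the serializability requirement concrete is the textbook precedence-graph test for conflict serializability. The sketch below is that standard test, not the algorithm from the talk. It assumes a history is a list of (transaction, operation, item) triples, omits commits, and uses hypothetical transaction names T1 and T2 for two DWMS_Transactions over sources B1 and B2.

```python
from itertools import combinations

def precedence_edges(history):
    """history: list of (txn, op, item), op in {'r', 'w'}, in execution order."""
    edges = set()
    # combinations() keeps list order, so the first op always precedes the second.
    for (t1, op1, x1), (t2, op2, x2) in combinations(history, 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges

def is_conflict_serializable(history):
    """True iff the precedence graph of the history has no cycle."""
    edges = precedence_edges(history)
    nodes = {txn for txn, _, _ in history}
    while nodes:
        # Nodes with no incoming edge from a remaining node can go first.
        free = {n for n in nodes
                if not any(dst == n and src in nodes for src, dst in edges)}
        if not free:
            return False  # every remaining node waits on another: a cycle
        nodes -= free
    return True

# T1 = w(B1) r(B1) r(B2) w(DW), T2 = w(B2) r(B1) r(B2) w(DW)
serial = [
    ("T1", "w", "B1"), ("T1", "r", "B1"), ("T1", "r", "B2"), ("T1", "w", "DW"),
    ("T2", "w", "B2"), ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
anomaly = [
    ("T1", "w", "B1"), ("T1", "r", "B1"),
    ("T2", "w", "B2"),                      # source update slips in here
    ("T1", "r", "B2"), ("T1", "w", "DW"),
    ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
```

In the second history T1 reads B2 after T2 has written it, while T2 reads B1 after T1 has written it, so the graph contains a cycle and no equivalent serial order exists.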

10 Traditional Serializability Algorithms
Lock-based
  - Reads / writes acquire locks for access to shared resources
  - Transactions block each other
Multiversion-based
  - Write on a version, read on another version
  - Transactions do not block each other

11 Traditional Serializability Algorithms
Lock-based
  - Read / write would need to lock data in sources?
  - Not desirable in DW environment
      - Data sources are autonomous
      - Not realistic to impose locking on them
Multiversion-based
  - Do not block each other
  - Desirable in DW environment
      - DW and data sources do not block each other
      - Need to maintain versions somewhere

12 TxnWrap: A Multiversion Algorithm
CREATE VIEW Asia_Traveller AS
  SELECT C.Name, C.Address, F.FlightNo
  FROM Customer C, FlightRes F
  WHERE C.Name = F.Name AND F.Dest = 'Asia';
Customer
  Name   Address
  Dave   WPI
  Ellen  MA
FlightRes
  Name   Age  FlightNo  Dest
  Dave   22   AA8384    Asia
  Steve       UA        Europe
View: Asia_Traveller
  Name   Address  FlightNo
  Dave   WPI      AA8384
View definition over the wrapper relations:
CREATE VIEW Asia_Traveller AS
  SELECT C.Name, C.Address, F.FlightNo
  FROM Customer' C, FlightRes' F
  WHERE C.Name = F.Name AND F.Dest = 'Asia';
[Diagram: Wrapper Customer' holds relation Customer' (Dave / WPI, Ellen / MA) and a Meta Relation with columns Rel, Attr listing (Cust', Name) and (Cust', Address); Wrapper FlightRes' holds relation FlightRes' and its own Meta Relation with columns Rel, Attr, whose contents are elided on the slide]

13 Versioned Wrapper
Semantics: life time of a tuple is #born <= time < #dead
Wrapper for Customer
Relation Customer'
  Name   Address  #born  #dead
  Dave   WPI      0
  Ellen  MA       0
Meta Relation
  Rel  Attr   Rel'  Attr'  #born  #dead
  C'   Name   -     -      0
  C'   Addr.  -     -      0

14 Source Updates on Versioned Wrapper
Transaction 1:
  1. DELETE FROM Customer C WHERE C.Name = 'Dave';
  2. INSERT ('Steve', 'Boston');
Transaction 2: Drop Customer.Address;
Relation Customer' (Init)
  Name   Address  #born  #dead
  Dave   WPI      0
  Ellen  MA       0
Relation Customer' (state 1)
  Name   Address  #born  #dead
  Dave   WPI      0      1
  Steve  Boston   1
  Ellen  MA       0
Relation Customer' (state 2)
  Name   Address  #born  #dead
  Dave   WPI      0      1
  Steve  Boston   1
  Ellen  MA       0
Meta Relation (state 2)
  Rel  Attr   Rel'  Attr'  #born  #dead
  C'   Name   -     -      0
  C'   Addr.  -     -      0      2
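The bookkeeping on this slide can be sketched in plain Python. The sketch assumes that a blank #dead cell means "still alive" (modeled as None), that the transaction id doubles as the version timestamp, and that a drop of an attribute is recorded as a logical delete in the meta relation. The class and function names are hypothetical, not taken from TxnWrap.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Versioned:
    values: tuple
    born: int
    dead: Optional[int] = None  # None: no #dead stamp yet

    def alive_at(self, t):
        # Slide 13 semantics: #born <= time < #dead
        return self.born <= t and (self.dead is None or t < self.dead)

def insert(relation, values, txn_id):
    """A source insert becomes a new row stamped with #born."""
    relation.append(Versioned(values, born=txn_id))

def delete(relation, predicate, txn_id):
    """Logical delete: stamp #dead instead of removing the row."""
    for row in relation:
        if row.dead is None and predicate(row.values):
            row.dead = txn_id

def snapshot(relation, t):
    """Rows visible at version t, sorted for stable comparison."""
    return sorted(row.values for row in relation if row.alive_at(t))

customer = [Versioned(("Dave", "WPI"), 0), Versioned(("Ellen", "MA"), 0)]
meta = [Versioned(("C'", "Name"), 0), Versioned(("C'", "Addr."), 0)]

# Transaction 1: delete Dave, insert Steve.
delete(customer, lambda v: v[0] == "Dave", txn_id=1)
insert(customer, ("Steve", "Boston"), txn_id=1)
# Transaction 2: drop Customer.Address, recorded in the meta relation.
delete(meta, lambda v: v == ("C'", "Addr."), txn_id=2)
```

Because nothing is physically removed, `snapshot(customer, 0)` still returns the initial relation after both transactions have been applied, which is what lets a late maintenance query read an earlier state.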

15 DW Maintenance Query Rewritten for Versioned Wrapper
The maintenance query issued in Transaction 2:
  SELECT Name, Address FROM Customer WHERE condition;
Rewritten versioned maintenance query:
  SELECT Name, Address FROM Customer' WHERE condition and #born 2;
Relation Customer' (State 1)
  Name   Address  #born  #dead
  Dave   WPI      0      1
  Steve  Boston   1
  Ellen  MA       0
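A versioned read of this kind can be sketched with Python's sqlite3 module. The comparison operators of the rewritten query did not survive in this transcript, so the predicate below is an assumption derived from the lifetime semantics on slide 13 (#born <= time < #dead), not the slide's exact condition. The table name Customer_v, the column names born and dead, and the use of NULL for a blank #dead are also illustrative choices.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Customer_v (Name TEXT, Address TEXT, born INTEGER, dead INTEGER);
    INSERT INTO Customer_v VALUES ('Dave', 'WPI', 0, 1),
                                  ('Ellen', 'MA', 0, NULL),
                                  ('Steve', 'Boston', 1, NULL);
""")

def versioned_select(db, t):
    """Read the versioned relation as of version t (assumed predicate)."""
    return db.execute(
        """SELECT Name, Address FROM Customer_v
           WHERE born <= :t AND (dead IS NULL OR :t < dead)
           ORDER BY Name""",
        {"t": t},
    ).fetchall()

state_0 = versioned_select(db, 0)
state_1 = versioned_select(db, 1)
```

The reader never blocks the source: it filters on version stamps held in the wrapper instead of locking the base relation.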

16 Performance Evaluation
Implementation
  - In Java
  - Platform: Oracle, JDBC on Windows NT
  - Embedded in DyDa [CCZ+01] System at WPI
Testbed
  - 6 data sources with one relation each
  - Each relation has 4 attributes and 100,000 tuples
  - One materialized joined view over these data sources
  - TxnWrap vs. compensation (SWEEP [AAS+97] & DyDa)

17 Data Update Processing

18 Schema Change Processing

19 Related Work
View Maintenance
  - View Maintenance / Synchronization / Adaptation
Maintenance Anomaly
  - ECA [ZGH+95], SWEEP [AAS+97] handle only concurrent data updates
  - Compensation-based
  - Performance degrades at a high load
Multi-version Algorithms
  - 2-version, n-version, unlimited-version algorithms [MPL92]

20 Conclusions
  - Identify the Maintenance Anomaly Problem in mixed model environment
  - Design a global transaction model, DWMS_Transaction, that integrates both source update transaction and maintenance transaction
  - Rephrase the maintenance anomaly in terms of serializability of DWMS_Transactions
  - Propose multiversion algorithm to achieve serializability
  - Implemented the maintenance solution in DyDa
  - Achieve stable performance under various workloads

21 Other Activities and Future Work
  - Batching of updates into more complex maintenance plans
  - Parallelism of maintenance processes
  - Support more complex views, e.g., aggregation
  - Generalize to more change types
  - Provide alternate view synchronization algorithms
  - Discovery of changes by non-cooperating sources
  - Discovery of meta data in terms of source relationships of distributed sources
  - Move beyond relational middle-layer model

22 Questions?
