Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561.

Similar presentations


Presentation on theme: "Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561."— Presentation transcript:

1 Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561

2 What is a Data Warehouse? Repository of integrated information As information becomes available from a source it is added to the repository

3 What is a View? A function from a set of base tables to a derived table Can be recreated every time the view is accessed

4 What’s a Materialized View A view where the tuples are stored in a database (or warehouse) Can create indexes on them Provides fast data access Similar to a cache

5 What’s View Maintenance View data becomes out of date when base tables are changed Updating the view to reflect these changes is called view maintenance

6 Sounds Easy Right!

7

8 Here’s Why... Data sources are typically legacy systems and do not understand views Sources can tell the warehouse there is new data, but they cannot determine if any additional data is needed

9 Examples Ideal World – new record is added to base relation and view is notified and updated The Real World –Maintenance Anomaly – trying to update a view while the underlying data is changing Update Anomaly Deletion Anomaly

10 The (Possible) Solutions Recompute the view Store all relations involved in the warehouse Eager Compensating Algorithm (ECA)

11 Recompute the View When? –Whenever an update occurs –At a periodic interval Time and Resource intensive especially in a distributed environment (transferring of data from one source to the other)

12 Storing Base Relations Keep up-to-date copies of all relations in the warehouse, queries can be evaluated locally and no anomalies occur Takes up extra space in the warehouse, storing duplicate data Copied relations still need to be updated

13 Eager Compensating Algorithm Most promising solution –No duplicating base relations or recomputing overheads All queries sent have compensating queries added to them to offset concurrent updates to the source data

14 ECA cont... Strongly Consistent –Upon competition of activity, view is consistent with base relations –Every View state has a corresponding state in the base relations and they are completed in order Not complete –Every source state may not be reflected in a view state (direct mapping)

15 How ECA Works - 4 basic events 1. Source executes an update (U) and notification is sent to the warehouse 2. Warehouse receives update (U) and creates query (Q) to be evaluated by the source 3. Source evaluates query (Q) against base relations and sends answer (A) to warehouse 4. Warehouse receives query result and updates view

16 Resolving Anomalies Two Updates: Query1 is assumed to be computed before Update2 but is actually computed after Update2 –ECA knows that is happens and takes Update2 into account when Updating the view by using a compensating query for each query it creates

17 Resolving Issues When using compensating queries, we should not apply the results until after all related queries have been received If updates occurred after each query the view could temporally be in an invalid state To avoid invalid states ECA collects the intermediate answers in a relation called Collect (initialized to empty set)

18 Example Three insertions in to three base relations and its affect on the view that references them

19 ECA-Key Used to streamline the algorithm when a key from the base relations are available in the View The Collect relation is initialized to current View and becomes a working copy of the View

20 ECA-Key Algorithms Delete received, no query sent, delete is directly applied to Collect Insert received, query sent, no compensating queries created, answers are added to Collect and duplicate values are ignored because of the keys Once completed the tuples in Collect replace the tuples for the View

21 ECA - Local Combines the compensating queries of ECA and the local updates of ECA-Key to create a more streamlined query Maintaining order of execution of local and non-local processes is complicated and will create a greater over head then other algorithms Future work needs to be done to see if this is a worthwhile approach

22 Performance Comparison Total Bytes Transferred vs. Cardinality of Relation Total Bytes Transferred vs. # of Source Updates

23 Review of ECA Incremental updating approach, it doesn’t start from scratch every time No additional burden placed on sources (timestamps or locks) Compensating queries are only used when more then update is occurring, keeping computation costs low


Download ppt "Data Warehouse View Maintenance Presented By: Katrina Salamon For CS561."

Similar presentations


Ads by Google