Tracing the Lineage of View Data in a Warehousing Environment Seminar : “Digital Information Curation“ Winterterm 2005/2006 Siniša Avramović

1 Tracing the Lineage of View Data in a Warehousing Environment Seminar : “Digital Information Curation“ Winterterm 2005/2006 Siniša Avramović avramovi@inf.uni-konstanz.de

2 Outline - The Lineage Problem : Introduction and Definition -Motivating Examples -Basic Assumptions and Definitions -Derivation Tracing Algorithms for SPJ and ASPJ Views

3 Outline II -Derivation Tracing in a DWH Environment * handling intermediate results (Materialization vs. Recomputing) * The Multisource Problem * Example :DWH with Derivation Tracing Support - Discussion

4 Introduction Data Warehouse : integration of data from multiple, heterogeneous, distributed, and possibly very large data sources. Queries posed to the DW : - for analytical purposes (OLAP,OLAM) - often of high complexity (join, aggregation) Materialized Views : intermediate results of query processing stored in DW for query efficiency improvement.

5 Introduction II In general : view definition - mapping from base data to view data

6 Introduction III In general : view definition - mapping from base data to view data View definition

7 Introduction II View Data Lineage : inverse mapping from a view data item to the base relation data that produced it * In this presentation, we deal with SPJ and ASPJ views

8 Introduction III Unfortunately : inverse mapping not as straightforward as view computing. Here not only view definition needed, but also some additional information required. Data Warehousing Environment introduces some additional challenges to the lineage problem. Questions to be Answered : * how to trace lineage in a distributed database environment * how to deal with inaccessible or inconsistent sources

9 Motivating Examples
Base relations:
item(item_id, item_name, category);
store(store_id, store_name, city, state);
sales(store_id, item_id, price, num_sold);
View California = π store_name, item_name, num_sold (σ state='CA' (store ⋈ item ⋈ sales)):
CREATE VIEW California AS
SELECT store.store_name, item.item_name, sales.num_sold
FROM store, item, sales
WHERE sales.store_id = store.store_id
AND sales.item_id = item.item_id
AND store.state = 'CA'

10 Motivating Examples
store table:
store_id  store_name  city  state
001       Target      PA    CA
002       Target      AL    NY
003       Macy's      SF    CA
004       Macy's            NY
item table:
item_id  item_name  category
0001     Binder     stationery
0002     Pencil     stationery
0003     Shirt      clothing
0004     Pants      clothing
0005     Pot        kitchenware
sales table:
store_id  item_id  price  num_sold
001       0001     4      1000
001       0002     1      3000
001       0004     30     600
002       0001     5      800
002       0002     2      2000
002       0004     35     800
California view:
store_name  item_name  num_sold
Target      Binder     1000
Target      Pencil     3000
Target      Pants      600
Macy's      Shirt      1500
Macy's      Pants      600
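A minimal Python sketch, using the standard sqlite3 module, that loads the example rows as they appear in this transcript and evaluates the California view. The helper name build_db is hypothetical. Store 004 is left out because its city and state cells cannot be told apart in the transcript, and the sales rows behind the two Macy's view tuples are not in the transcript, so only the Target tuples are reproduced.

```python
import sqlite3


def build_db():
    """Create an in-memory database with the example tables and the California view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store(store_id TEXT, store_name TEXT, city TEXT, state TEXT);
        CREATE TABLE item(item_id TEXT, item_name TEXT, category TEXT);
        CREATE TABLE sales(store_id TEXT, item_id TEXT, price INTEGER, num_sold INTEGER);
        CREATE VIEW California AS
            SELECT store.store_name, item.item_name, sales.num_sold
            FROM store, item, sales
            WHERE sales.store_id = store.store_id
              AND sales.item_id = item.item_id
              AND store.state = 'CA';
    """)
    conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)", [
        ("001", "Target", "PA", "CA"),
        ("002", "Target", "AL", "NY"),
        ("003", "Macy's", "SF", "CA"),
    ])
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
        ("0001", "Binder", "stationery"),
        ("0002", "Pencil", "stationery"),
        ("0003", "Shirt", "clothing"),
        ("0004", "Pants", "clothing"),
        ("0005", "Pot", "kitchenware"),
    ])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("001", "0001", 4, 1000),
        ("001", "0002", 1, 3000),
        ("001", "0004", 30, 600),
        ("002", "0001", 5, 800),
        ("002", "0002", 2, 2000),
        ("002", "0004", 35, 800),
    ])
    return conn


if __name__ == "__main__":
    for row in build_db().execute("SELECT * FROM California ORDER BY item_name"):
        print(row)
```

Only store 001 is both in California and present in the sales rows, so the view contains the three Target tuples.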

15 Basic Assumptions - Views are defined over base relations using the relational algebra operators σ, π, ⋈, ⍺, i.e. SPJ and ASPJ views. - Set semantics is assumed. Basic Definitions Derivation of a view tuple t*: the tuples of the base relations that produce the view tuple. Those tuples are said to contribute to t. * t : materialized view tuple

16 Basic Definitions: Tuple Derivation for an Operator (Figure: California view tree, π store_name, item_name, num_sold over σ state='CA', with inputs T1, T2, T3) - Op : any relational operator (σ, π, ⋈, ⍺) over T1, T2, T3 - T = Op(T1, ..., T3) - For t ∊ T, t's derivation in T1, ..., T3 according to Op is Op^{-1}_{⟨T1,...,T3⟩}(t) = ⟨T1*, ..., T3*⟩, where T1*, ..., T3* are the maximal subsets of T1, ..., T3 such that: a) the derivation tuple sets Ti* derive exactly t, i.e. Op(T1*, ..., T3*) = {t} b) each tuple in the derivation in fact contributes something to t

17 Basic Definitions: Tuple Derivation for Aggregation

18

19 Basic Definitions: Tuple Derivation for Views The derivation of a view tuple set T contains all tuples that contribute to any view tuple in T. Any view V with a complex definition can be broken into intermediate views; tuple derivations are then computed recursively by tracing through the hierarchy of intermediate views.

20 SPJ View Derivation Tracing Derivation tracing query : a query written for a specific view definition such that, when applied to database D, it returns t's derivation in D. Given a database D with base relations R1, ..., Rm, a view definition v over R1, ..., Rm, and a tuple t ∊ v(R1, ..., Rm) : TQ_{t,v} is a derivation tracing query for t and v if TQ_{t,v}(D) = v^{-1}_D(t) * v^{-1}_D(t) : t's derivation over D according to v

21 SPJ View Derivation Tracing SPJ canonical form : using a sequence of algebraic transformations, every SPJ view can be transformed into the form π_A(σ_C(R1 ⋈ ... ⋈ Rm)). Split operator : given a table T with schema T, Split breaks T into a list of tables, where each table in the list is the projection of T onto a set of attributes A ⊆ T.
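The Split operator can be sketched in a few lines of Python. This is an illustration under assumptions, not the presenter's implementation: tables are modelled as lists of dicts, the function name split and the sample joined rows are hypothetical, and duplicates are dropped to follow the set semantics assumed earlier.

```python
def split(table, attribute_lists):
    """Return one projection of `table` per attribute list, without duplicates."""
    result = []
    for attrs in attribute_lists:
        projection = []
        for row in table:
            projected = tuple(row[a] for a in attrs)
            if projected not in projection:  # set semantics: keep each tuple once
                projection.append(projected)
        result.append(projection)
    return result


# Hypothetical joined rows of store and item, restricted to a few attributes.
joined = [
    {"store_id": "001", "store_name": "Target", "item_id": "0001", "item_name": "Binder"},
    {"store_id": "001", "store_name": "Target", "item_id": "0002", "item_name": "Pencil"},
]

if __name__ == "__main__":
    stores, items = split(joined, [["store_id", "store_name"], ["item_id", "item_name"]])
    print(stores)
    print(items)
```

Splitting the joined rows by the schemas of the original relations recovers one table per base relation, which is the shape a derivation has.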

22 SPJ View Derivation Tracing
Store Table:
store_id  store_name  city  state
001       Target      PA    CA
002       Target      AL    NY
003       Macy's      SF    CA
004       Macy's            NY
California View:
store_name  item_name  num_sold
Target      Binder     1000
Target      Pencil     3000
Target      Pants      600
Sales Table:
store_id  item_id  price  num_sold
001       0001     4      1000
001       0002     1      3000
001       0004     30     600
002       0001     5      800
Item Table:
item_id  item_name  category
0001     Binder     stationery
0002     Pencil     stationery
0003     Shirt      clothing

23 SPJ View Derivation Tracing Given a database D with relations R1, ..., Rm, and : v(D) = π_A(σ_C(R1 ⋈ ... ⋈ Rm)). Derivation of t : For T ⊆ v(D) :
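A hedged Python sketch of a tracing query for the California view. It assumes the tracing query takes the form Split_{R1,...,Rm}(σ_{C ∧ A=t}(R1 ⋈ ... ⋈ Rm)): re-run the join with the view's condition C, additionally require the projected attributes A to equal the traced tuple t, and project the result back onto each base relation. The function names make_db and trace_spj are hypothetical, and the data is a subset of the example rows.

```python
import sqlite3

# Join and selection of the California view, plus the condition A = t.
TRACE_FROM_WHERE = """
    FROM store, item, sales
    WHERE sales.store_id = store.store_id
      AND sales.item_id = item.item_id
      AND store.state = 'CA'
      AND store.store_name = ? AND item.item_name = ? AND sales.num_sold = ?
"""


def make_db():
    """Load a subset of the example rows into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store(store_id TEXT, store_name TEXT, city TEXT, state TEXT);
        CREATE TABLE item(item_id TEXT, item_name TEXT, category TEXT);
        CREATE TABLE sales(store_id TEXT, item_id TEXT, price INTEGER, num_sold INTEGER);
    """)
    conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)", [
        ("001", "Target", "PA", "CA"), ("002", "Target", "AL", "NY"),
        ("003", "Macy's", "SF", "CA"),
    ])
    conn.executemany("INSERT INTO item VALUES (?, ?, ?)", [
        ("0001", "Binder", "stationery"), ("0002", "Pencil", "stationery"),
        ("0003", "Shirt", "clothing"),
    ])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("001", "0001", 4, 1000), ("001", "0002", 1, 3000),
        ("001", "0004", 30, 600), ("002", "0001", 5, 800),
    ])
    return conn


def trace_spj(conn, t):
    """Return the derivation of view tuple t as one row list per base relation."""
    return {
        rel: conn.execute(f"SELECT DISTINCT {rel}.*" + TRACE_FROM_WHERE, t).fetchall()
        for rel in ("store", "item", "sales")  # the Split step
    }


if __name__ == "__main__":
    print(trace_spj(make_db(), ("Target", "Pencil", 3000)))
```

For the view tuple (Target, Pencil, 3000) the sketch returns store 001, item 0002 and the single sales row that links them.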

24 SPJ View Derivation Tracing : Optimization -Optimization by pushing selection conditions below the join operator.

25 ASPJ View Derivation Tracing Clothing Additional

26 ASPJ View Derivation Tracing Problems with ASPJ Views : - Most ASPJ views are not traceable without storing additional information (intermediate results). - There is no simple canonical form, since selection operators cannot be pushed above or below aggregation operators. Basic Idea : ASPJ canonical form - Transform general ASPJ views into a sequence of ⍺, π, σ, ⋈ operator groups by commuting some SPJ operators; each group is an ASPJ segment. - Each segment includes an aggregation operator. - A view defined by one ASPJ segment is a one-level ASPJ view.

27 ASPJ View Derivation Tracing Given a one–level ASPJ view V : and t ∊ T ; t‘s derivation :
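A hedged sketch of tracing for a one-level ASPJ view. It assumes the derivation of t is obtained by selecting, from the join underneath the aggregation, the rows whose group-by attributes match t, and then splitting the result by base relation. The view StoreTotals (total num_sold per store_name) and the names make_db and trace_one_level are hypothetical and do not come from the slides.

```python
import sqlite3


def make_db():
    """Load example rows and define a hypothetical one-level ASPJ view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store(store_id TEXT, store_name TEXT, city TEXT, state TEXT);
        CREATE TABLE sales(store_id TEXT, item_id TEXT, price INTEGER, num_sold INTEGER);
        CREATE VIEW StoreTotals AS
            SELECT store.store_name, SUM(sales.num_sold) AS total
            FROM store, sales
            WHERE sales.store_id = store.store_id
            GROUP BY store.store_name;
    """)
    conn.executemany("INSERT INTO store VALUES (?, ?, ?, ?)", [
        ("001", "Target", "PA", "CA"), ("002", "Target", "AL", "NY"),
        ("003", "Macy's", "SF", "CA"),
    ])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("001", "0001", 4, 1000), ("001", "0002", 1, 3000), ("001", "0004", 30, 600),
        ("002", "0001", 5, 800), ("002", "0002", 2, 2000), ("002", "0004", 35, 800),
    ])
    return conn


def trace_one_level(conn, t):
    """Trace t = (store_name, total): match on the group-by value only."""
    group_value = t[0]  # the aggregate value itself is not used for the selection
    from_where = (" FROM store, sales"
                  " WHERE sales.store_id = store.store_id AND store.store_name = ?")
    return {
        rel: conn.execute(f"SELECT DISTINCT {rel}.*" + from_where, (group_value,)).fetchall()
        for rel in ("store", "sales")
    }


if __name__ == "__main__":
    conn = make_db()
    print(conn.execute("SELECT * FROM StoreTotals").fetchall())
    print(trace_one_level(conn, ("Target", 8200)))
```

Every row of the group contributes to the aggregate, so the derivation of (Target, 8200) is both Target stores together with all six of their sales rows.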

28 Recursive Derivation Tracing Algorithm for Multi-Level ASPJ Views Problem : multiple queries are required for tracing computations. Idea : - Transform the view into ASPJ canonical form - Divide it into a set of ASPJ segments - Define an intermediate view for each segment

29 Recursive Derivation Tracing Algorithm for Multi-Level ASPJ Views Problem : multiple queries are required for tracing computations. Idea : - Trace recursively through the hierarchy of intermediate views, top-down - At each level, use a one-level ASPJ tracing query to compute derivations - Concatenate the local results to form the derivation of the whole view tuple list
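The recursive idea can be sketched in plain Python as follows. This is an illustration under assumptions rather than the algorithm from the slides: views are modelled as dicts with hypothetical compute and trace functions, intermediate results are recomputed on demand, and the two-level hierarchy (per-store totals, then a selection on those totals) is made up for the example.

```python
from collections import defaultdict

SALES = {
    ("001", "0001", 4, 1000), ("001", "0002", 1, 3000), ("001", "0004", 30, 600),
    ("002", "0001", 5, 800), ("002", "0002", 2, 2000), ("002", "0004", 35, 800),
}


def compute_per_store(sales):
    """Lower segment: total num_sold grouped by store_id."""
    totals = defaultdict(int)
    for store_id, _item_id, _price, num_sold in sales:
        totals[store_id] += num_sold
    return set(totals.items())


def trace_per_store(tuples, sales):
    """One-level tracing: all sales rows that fall into the traced groups."""
    groups = {store_id for store_id, _total in tuples}
    return [{row for row in sales if row[0] in groups}]


def compute_big(per_store):
    """Upper segment: stores with a total above 4000."""
    return {t for t in per_store if t[1] > 4000}


def trace_big(tuples, per_store):
    """A selected tuple derives from the identical input tuple."""
    return [{t for t in per_store if t in tuples}]


sales_node = {"name": "sales", "inputs": None, "rows": SALES}
per_store_node = {"name": "per_store", "inputs": [sales_node],
                  "compute": compute_per_store, "trace": trace_per_store}
big_node = {"name": "big", "inputs": [per_store_node],
            "compute": compute_big, "trace": trace_big}


def evaluate(node):
    """Recompute the contents of a base relation or (intermediate) view."""
    if node["inputs"] is None:
        return node["rows"]
    return node["compute"](*[evaluate(child) for child in node["inputs"]])


def trace(node, tuples):
    """Trace top-down and collect the contributing rows per base relation."""
    if node["inputs"] is None:
        return {node["name"]: set(tuples)}
    inputs = [evaluate(child) for child in node["inputs"]]
    local = node["trace"](tuples, *inputs)  # one tuple set per input
    result = {}
    for child, contributing in zip(node["inputs"], local):
        for name, rows in trace(child, contributing).items():
            result.setdefault(name, set()).update(rows)
    return result


if __name__ == "__main__":
    print(trace(big_node, {("001", 4600)}))
```

Tracing ("001", 4600) first finds the matching tuple in the intermediate view, then the three sales rows of store 001. Replacing evaluate with a lookup in stored auxiliary views would correspond to the materialized alternative.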

30 Derivation Tracing in a Warehousing Environment Until now, we have assumed that all base relations and intermediate results are accessible for tracing computations. Viewing the DWH as an integration of heterogeneous and distributed data sources, the following problems may arise: - Efficiency problem : querying remote sources with join and aggregation operations and recomputing intermediate results for each query is very inefficient in a distributed environment. - Consistency problem : view refreshing and view maintenance problems - Legacy sources : views defined on inaccessible legacy sources are not traceable

31 Derivation Tracing in a Warehousing Environment Solutions to the intermediate aggregation results problem: The “Recomputation“ approach: - Recompute the intermediate results for each query. - No permanent extra storage, but the tracing process takes much longer, especially in distributed environments. The “Materialized View“ approach: - Maintain auxiliary materialized views with the intermediate results in the DWH. - Less computation required, but extra storage and maintenance costs.

32 Storing Derivation Views in a Multi-Source DWH Various strategies for storing auxiliary views : the simple extreme solutions are to store all base relations, or to store less information at a higher tracing computation cost. Goal : an intermediate scheme with low tracing cost and modest extra storage and maintenance costs. Idea : Derivation Views - Break the view definition down into ASPJ segments again. - Only views defined by the lowest segments, directly over base relations - Those views can be computed by a simple selection with the Split operator

33 Storing Derivation Views in a Multi-Source DWH All_clothing DV_All_clothing

34 Warehousing System Supporting Derivation Tracing I - Auxiliary view AllClothing maintained for tracing tuples from Clothing - AllClothing records the intermediate aggregation results - to trace tuples from AllClothing, derivation view DV_AllClothing is maintained.

35 Warehousing System Supporting Derivation Tracing I

36 Tracing the Lineage of View Data in a Warehousing Environment Questions ?

