1 What-if Analysis for Data Warehouse Evolution
George Papastefanatos (1), Panos Vassiliadis (2), Alkis Simitsis (3), Yannis Vassiliou (1)
(1) National Technical University of Athens, {gpapas,yv}@dbnet.ece.ntua.gr
(2) University of Ioannina, pvassil@cs.uoi.gr
(3) IBM Almaden Research Center, asimits@us.ibm.com

2 DaWaK'07, Regensburg, September 2007
Outline
– Motivation
– Graph-based modeling of ETL
– Adapting ETL workflows
– Evaluation
– Conclusions

4 Data Warehouse environment (figure; labels: WWW, Act1, Act2, Act3, Act4, Act5)

5 ETL (figure; labels: Sources, WWW, Extract, Transform & Clean, DSA, Load, DW, Act1, Act2, Act3, Act4, Act5)

6 Motivation (figure; labels: WWW, Act1, Act2, Act3, Act4, Act5)

7 Evolving ETL sources…
Schema changes occur on the sources of ETL processes. Design constructs are:
– added, removed, or modified
ETL processes are affected:
– syntactically, i.e., they become invalid
– semantically, i.e., queries must conform to the new source database semantics
Adapting activities, queries and views is a time-consuming task, in most cases handled manually by the administrators/developers

8 We would like to know…
– What part of the process is affected, and how, if e.g., an attribute is deleted?
– Can we predict the impact of changes?
– To what extent can readjustment be automated?
– Can we perform what-if analysis for potential changes of source configurations?

9 Contribution
A general mechanism for performing what-if analysis for potential changes of ETL source configurations:
– A graph model for relations, queries, views, ETL activities, and their significant properties
– A framework for annotating the graph with policies concerning the behavior of nodes in the presence of hypothetical changes
– A set of rules that dictate the proper actions when additions, deletions or updates are performed on relations, attributes, and conditions
– An experimental assessment of our proposal

11 Query representation
Q: SELECT EMP.Emp#, Sum(WORKS.Hours) as T_Hours FROM EMP, WORKS WHERE EMP.Emp# = WORKS.Emp# GROUP BY EMP.Emp#
(figure: graph representation of Q; label: Join)

14 Annotating the graph with adaptation rules
1. Set of evolving database constructs: relations, attributes, constraints
2. Set of potential evolution changes: addition, deletion, modification
3. Set of reaction policies: propagate, block, prompt
According to the prevailing policy, the proper action is taken → graph transformation

15 Annotating the graph with adaptation rules
Assuming that a graph construct is annotated with a policy for a particular event (e.g., an activity node is tuned to deny deletions of its provider attributes), the mechanism:
– (a) identifies the affected subgraph
– (b) if the policy is appropriate, automatically readjusts the graph to fit the new semantics imposed by the change
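One way to picture the annotation is as a lookup table from (graph construct, event) to policy. The sketch below is only an illustration under that assumption: the table `annotations`, the helper `policy_for`, the node names, and the choice of `PROMPT` as the fallback are all hypothetical and are not taken from the slides.

```python
from enum import Enum


class Event(Enum):
    ADDITION = "addition"
    DELETION = "deletion"
    MODIFICATION = "modification"


class Policy(Enum):
    PROPAGATE = "propagate"
    BLOCK = "block"
    PROMPT = "prompt"


# Hypothetical annotation table: (graph construct, event) -> policy.
annotations = {
    ("Act1", Event.DELETION): Policy.BLOCK,    # Act1 denies deletions
    ("Q", Event.ADDITION): Policy.PROPAGATE,   # Q accepts additions
}


def policy_for(node, event, default=Policy.PROMPT):
    """Return the policy annotated on `node` for `event`.

    Falling back to PROMPT for unannotated constructs is an assumption
    of this sketch, not something the slides specify.
    """
    return annotations.get((node, event), default)
```

Here `policy_for("Act1", Event.DELETION)` yields `Policy.BLOCK`, while an unannotated construct falls back to asking the administrator.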

16 Query Adaptation - Example
Annotated query graph: Q: SELECT EMP.Emp#, EMP.Name FROM EMP
Event: add attribute Phone to EMP relation
Transformed query graph: Q': SELECT EMP.Emp#, EMP.Name, EMP.Phone FROM EMP
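The example can be mimicked on a much simpler structure than the query graph. The sketch below assumes a single-relation projection query and a policy passed in as a plain string; the `Query` class and `on_attribute_added` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    relation: str
    attributes: list  # names of the projected attributes

    def to_sql(self):
        cols = ", ".join(f"{self.relation}.{a}" for a in self.attributes)
        return f"SELECT {cols} FROM {self.relation}"


def on_attribute_added(query, relation, attribute, policy):
    """Return the adapted query for an 'add attribute' event.

    Only the 'propagate' policy rewrites the query; any other policy
    leaves it unchanged in this simplified sketch.
    """
    if query.relation != relation or policy != "propagate":
        return query
    return Query(query.relation, query.attributes + [attribute])


q = Query("EMP", ["Emp#", "Name"])
q_new = on_attribute_added(q, "EMP", "Phone", "propagate")
print(q_new.to_sql())  # SELECT EMP.Emp#, EMP.Name, EMP.Phone FROM EMP
```

With a `block` policy the same call returns `q` untouched, which corresponds to the query keeping its original projection.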

17 Algorithm Propagate changeS (PS)
Input: an ETL summary S over a graph G_o = (V_o, E_o) and an event e
Output: a graph G_n = (V_n, E_n)
Variables: a set of events E, and an affected node A
Begin
  dps(S, G_o, G_n, {e}, A)
End
dps(S, G_n, G_o, E, A) {
  I = Ins_by_policy(affected(E))
  D = Del_by_policy(affected(E))
  G_n = G_o – D ∪ I
  E = E – {e} ∪ action(affected(E))
  if consumer(A) ≠ nil
    for each consumer(A)
      dps(S, G_n, G_o, E, consumer(A))
}
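The traversal pattern of the algorithm, visiting an affected node and then recursing into its consumers, can be sketched as follows. This is a deliberately reduced illustration and not the algorithm itself: it only records which policy each reached node applies, it does not build the insertion and deletion sets I and D or rewrite the graph, and the names `propagate_change`, `consumers`, and `policies`, as well as the rule that block and prompt both stop the traversal, are assumptions.

```python
def propagate_change(consumers, policies, start, default="prompt"):
    """Depth-first walk from `start` towards its consumers.

    consumers: dict mapping a node to the nodes that consume its output.
    policies:  dict mapping a node to 'propagate', 'block' or 'prompt'.
    Returns a dict with the policy applied at every node the event reached.
    """
    status = {}

    def visit(node):
        if node in status:           # shared consumer, already handled
            return
        policy = policies.get(node, default)
        status[node] = policy
        if policy != "propagate":    # assumption: block/prompt stop the event here
            return
        for consumer in consumers.get(node, []):
            visit(consumer)

    visit(start)
    return status


# Hypothetical workflow: EMP feeds Act1, which feeds Act2 and Act3, both loading DW.
consumers = {"EMP": ["Act1"], "Act1": ["Act2", "Act3"], "Act2": ["DW"], "Act3": ["DW"]}
policies = {"EMP": "propagate", "Act1": "propagate", "Act2": "block",
            "Act3": "propagate", "DW": "prompt"}
result = propagate_change(consumers, policies, "EMP")
```

In this made-up workflow the event stops at `Act2`, but still reaches `DW` through `Act3`.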

18 Conflict resolution
Graph constructs may have contradictory policies for the same event.
Rule: policies defined on query graph structures are stronger than policies defined on view graph structures, which in turn prevail over policies defined on relation graph structures.
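The precedence rule amounts to picking the policy from the strongest kind of structure. A minimal sketch, assuming each conflicting policy arrives tagged with the kind of structure it was defined on (the `PRECEDENCE` table and `resolve` function are hypothetical names):

```python
# Higher number wins: query > view > relation, as stated in the rule.
PRECEDENCE = {"query": 3, "view": 2, "relation": 1}


def resolve(candidates):
    """Pick the prevailing policy among conflicting ones for the same event.

    candidates: list of (structure_kind, policy) pairs.
    """
    _kind, policy = max(candidates, key=lambda c: PRECEDENCE[c[0]])
    return policy


print(resolve([("relation", "propagate"), ("query", "block"), ("view", "prompt")]))  # block
```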

20 Evaluation of our framework
– Reverse engineered real-world ETL processes, extracted from an application of the Greek public sector
– Monitored the changes that took place on the sources of the studied data warehouse
– Performed what-if analysis for several evolution scenarios

21 Configuration of our setting
– 7 ETL processes
– 53 ETL activities
– 3 lookup tables
– 7 source tables
– 9 target tables

22 Configuration of our setting
Evolution events:
– renaming source tables
– renaming attributes of source tables
– adding and deleting attributes from source tables
– modifying the domain of attributes
– changing the primary key of lookup tables

23 Measurements
For each event, we counted:
– (a) the number of activities affected, both semantically and syntactically
– (b) the number of activities automatically adapted by our framework (propagate or block policies), as opposed to
– (c) those that required the administrator's intervention (i.e., a prompt policy)
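The three counts can be derived from the policy each affected activity ended up applying. The sketch below assumes that per-event input is available as a list of policy names; the function `summarize` and the sample list are illustrative only and do not reproduce any figure from the evaluation.

```python
from collections import Counter


def summarize(reactions):
    """Compute counts (a), (b) and (c) for one event.

    reactions: the policy applied by each affected activity.
    """
    counts = Counter(reactions)
    return {
        "affected": len(reactions),                          # (a)
        "automatic": counts["propagate"] + counts["block"],  # (b)
        "manual": counts["prompt"],                          # (c)
    }


# Made-up input, not taken from the study.
sample = summarize(["propagate", "block", "prompt", "propagate"])
```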

24 Adapted activities w.r.t. the ETL scenario size (figure)

25 Adapted activities w.r.t. the complexity of activities (figure)

27 Summary
– A uniform representation for modeling relations, queries, views and ETL activities
– A framework for annotating the graph with policies concerning the behavior of nodes in the presence of hypothetical changes
– A set of rules that dictate the proper adaptation actions when evolution events are performed on ETL sources
– An experimental assessment of our proposal

28 On-going/Future Work
– Hecataeus: a tool for visualizing and performing what-if analysis for several evolution scenarios
– SQL extensions for annotating graph constructs with evolution semantics
– Patterns of evolution sequences

29 Thank you! Questions?
http://www.cs.uoi.gr/~pvassil/projects/architecture_graph/

31 Backup slides

32 Related work
– DB schema evolution
  – Schema versioning
– DW schema evolution
– Materialized view evolution
– Evolution w.r.t. model mappings

33 Scenario 1 (figure)

34 Evolution Metadata
Metadata repository maintaining:
– Graph constructs
– Annotations
– What-if analysis scenarios

35 Extending SQL With Evolution Semantics
Used for annotating graph constructs, with clauses of the form ON ... TO ... THEN ...
E.g., SELECT Emp#, NAME, AGE FROM V ON condition addition TO V THEN propagate, ON attribute deletion TO V.AGE THEN block
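To show how such clauses could be read back into (event, construct, policy) triples, here is a small regular-expression sketch. It is an assumption-laden illustration: the grammar is inferred only from the single example on the slide, and `CLAUSE` and `parse_annotations` are hypothetical names, not part of the proposed SQL extension or of Hecataeus.

```python
import re

# Matches "ON <words> TO <target> THEN <policy>", as inferred from the slide's example.
CLAUSE = re.compile(
    r"\bON\s+(?P<event>.+?)\s+TO\s+(?P<target>\S+)\s+THEN\s+(?P<policy>\w+)"
)


def parse_annotations(text):
    """Extract (event, target, policy) triples from an annotated statement."""
    return [(m["event"], m["target"], m["policy"]) for m in CLAUSE.finditer(text)]


stmt = ("SELECT Emp#, NAME, AGE FROM V "
        "ON condition addition TO V THEN propagate, "
        "ON attribute deletion TO V.AGE THEN block")
triples = parse_annotations(stmt)
# triples == [("condition addition", "V", "propagate"),
#             ("attribute deletion", "V.AGE", "block")]
```

The keywords are matched in upper case only, as written on the slide; a real parser would need a proper grammar rather than a pattern match.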

36 Hecataeus
A tool for visualizing and performing what-if analysis for several evolution scenarios

37 Adapted activities per event (figure)

38 Evolution changes that occurred on source and lookup tables (figure)

39 Annotation of graph constructs with policies for kinds of events (figure)

