1 Design Metrics for Data Warehouse Evolution
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece), {gpapas,yv}@dbnet.ece.ntua.gr
(2) University of Ioannina, Ioannina, Hellas (Greece), pvassil@cs.uoi.gr
(3) HP Labs, Palo Alto, California, USA, alkis@hp.com

2 Outline (ER'08, Barcelona, October 2008)
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions


4 Motivation
[Figure labels: WWW, Act1, Act2, Act3, Act4, Act5]
Data warehouses are evolving environments, e.g.:
– A dimension is removed or renamed
– The structure of a dimension table is updated
– A fact table is completely decoupled from a dimension
– The measures of a fact table change
– An ETL source is modified, etc.

5 Evolution Effects
Software and data artifacts around the warehouse (e.g., ETL activities, materialized views, reports) are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
Adaptation to the new semantics is
– a time-consuming task
– handled manually, in most cases, by the administrators/developers
Evolution-driven design is missing

6 We would like to know…
– Can we measure and quantify, in a principled way, the vulnerability of certain parts of a data warehouse environment and find those constructs that are most sensitive to evolution?
– Can we predict and quantify the impact of a change on the rest of the system?
– What are the “right” measures for evaluating the quality of the design of a data warehouse with respect to its evolution capabilities?

7 Outline
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions

8 Data Warehouse Schema Evolution: Our approach
– Mechanism for performing what-if analysis for potential changes of database configurations
– Graph-based representation of database constructs (i.e., relations, views, constraints, queries)
– Annotation of the graph with rules for adapting queries to database schema evolution
[Figure labels: Evolving databases; Queries; Database Schema; Graph-based modeling for uniform representation; Metrics for Evaluating Evolution Design; Evolving applications; Rules for Handling Evolution]

9 Graph-based representation

10 Graph annotation with rules

11 Graph Adaptation
Event: add attribute Phone to relation EMP
Rule annotating the query graph: ON attribute addition TO EMP THEN propagate
Annotated query graph: Q: SELECT EID, Name FROM EMP
Transformed query graph: Q: SELECT EID, Name, Phone FROM EMP
[Figure: graphs of Q and EMP before and after the event; attribute nodes EID, Name, Phone; edge labels S and map-select]
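
The adaptation on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Hecataeus implementation: the `Query` class, the `POLICIES` table, and `apply_attribute_addition` are hypothetical names, the query is reduced to a select list over a single relation, and treating a missing rule as "block" is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: (event type, relation) -> action.
# It mirrors the slide's rule "ON attribute addition TO EMP THEN propagate".
POLICIES = {("attribute addition", "EMP"): "propagate"}


@dataclass
class Query:
    relation: str
    select_list: list = field(default_factory=list)

    def to_sql(self) -> str:
        return f"SELECT {', '.join(self.select_list)} FROM {self.relation}"


def apply_attribute_addition(query: Query, relation: str, attribute: str) -> Query:
    """Return the query adapted to an attribute addition on `relation`."""
    # Assumption: an event with no matching rule is blocked.
    action = POLICIES.get(("attribute addition", relation), "block")
    if query.relation == relation and action == "propagate":
        return Query(query.relation, query.select_list + [attribute])
    return query  # blocked or unaffected: the query stays as it is


q = Query("EMP", ["EID", "Name"])
q_new = apply_attribute_addition(q, "EMP", "Phone")
print(q.to_sql())
print(q_new.to_sql())
```

With the "propagate" rule in place, the new attribute is appended to the select list, which is the transformation shown for Q on the slide.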

12 Outline
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions

13 Simple Metrics
Simple: in-degree, out-degree, degree
EMP.Emp# is more “important” than EMP.SAL with respect to how many nodes depend directly on it
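
A minimal sketch of the simple degree metrics, assuming each edge points from a dependent node to the node it depends on. The edge list is hypothetical and is not the graph drawn on the slide; it only reproduces the situation where more nodes depend directly on EMP.Emp# than on EMP.SAL.

```python
from collections import Counter

# Hypothetical dependency edges (dependent -> provider).
EDGES = [
    ("Q.Emp#", "EMP.Emp#"),
    ("V.Emp#", "EMP.Emp#"),
    ("WORKS.Emp#", "EMP.Emp#"),
    ("Q.SAL", "EMP.SAL"),
]


def degrees(edges):
    """Return in-degree, out-degree and total degree for every node."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    nodes = {node for edge in edges for node in edge}
    return {
        node: {
            "in": in_deg[node],
            "out": out_deg[node],
            "degree": in_deg[node] + out_deg[node],
        }
        for node in nodes
    }


d = degrees(EDGES)
print(d["EMP.Emp#"]["in"], d["EMP.SAL"]["in"])
```

Under this edge orientation, the in-degree of a node is the number of nodes that depend directly on it.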

14 Transitive Metrics
Transitive: in-degree, out-degree, degree
The variant with a view plus a query is more “complicated” with respect to how many nodes are involved in propagating EMP.Emp# towards the end
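
A small sketch of a transitive in-degree, taken here as the number of distinct nodes that depend on a node directly or through other nodes. That reading is an assumption; the authors' exact definition may count edges or paths instead. Both edge lists are hypothetical and use edges of the form dependent -> provider.

```python
def transitive_in_degree(edges, node):
    """Count distinct nodes that depend on `node` directly or transitively."""
    dependents = {}
    for src, dst in edges:
        dependents.setdefault(dst, set()).add(src)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for dep in dependents.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return len(seen)


# Variant 1: the query reads EMP.Emp# directly.
DIRECT = [("Q.Emp#", "EMP.Emp#")]
# Variant 2: the query reads a view that reads EMP.Emp#.
WITH_VIEW = [("V.Emp#", "EMP.Emp#"), ("Q.Emp#", "V.Emp#")]

print(transitive_in_degree(DIRECT, "EMP.Emp#"))
print(transitive_in_degree(WITH_VIEW, "EMP.Emp#"))
```

The variant with the view involves more nodes in the propagation, while the simple in-degree of EMP.Emp# is 1 in both variants.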

15 Zoomed-out degrees
– Only top-level nodes are retained
– Only one edge between modules is retained, weighted with the number of edges suppressed
– Simple degrees
– Transitive degrees
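
The zoom-out step can be sketched as follows. The naming scheme "MODULE.attribute", the `module_of` helper, and the edge list are assumptions made for illustration, and dropping edges internal to a module is also an assumption.

```python
from collections import Counter

# Hypothetical detailed edges (dependent -> provider), named "MODULE.attribute".
EDGES = [
    ("Q.Emp#", "V.Emp#"),
    ("Q.Name", "V.Name"),
    ("V.Emp#", "EMP.Emp#"),
    ("V.Name", "EMP.Name"),
    ("Q", "Q.Emp#"),  # edge inside module Q, dropped by the zoom-out
]


def module_of(node):
    """Map a detailed node to its top-level module (assumed naming scheme)."""
    return node.split(".")[0]


def zoom_out(edges):
    """Keep one edge per module pair, weighted by the number of edges it replaces."""
    weights = Counter()
    for src, dst in edges:
        m_src, m_dst = module_of(src), module_of(dst)
        if m_src != m_dst:
            weights[(m_src, m_dst)] += 1
    return dict(weights)


zoomed = zoom_out(EDGES)
print(zoomed)
```

Simple and transitive degrees can then be computed on the zoomed-out graph, using the weights where a weighted degree is wanted.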

16 Entropy-based metrics
Probability that a node v is affected by an event occurring on another node y_i: [formula not preserved in this transcript]
Examples: P(Q|V) = 1/3, P(Q|EMP) = 1/3, P(V|WORKS) = 1/2
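
The formula itself did not survive in the transcript. One reading that agrees with the three examples is that P(v|y) is the number of paths from v to y divided by the number of paths from v to all other nodes. The sketch below assumes that definition and a hypothetical graph in which query Q reads view V and V joins EMP and WORKS; both are assumptions, not taken from the slide.

```python
from fractions import Fraction
from functools import lru_cache

# Assumed module-level graph (dependent -> providers).
GRAPH = {"Q": ("V",), "V": ("EMP", "WORKS"), "EMP": (), "WORKS": ()}


@lru_cache(maxsize=None)
def count_paths(src, dst):
    """Number of distinct directed paths from src to dst (graph must be acyclic)."""
    if src == dst:
        return 1
    return sum(count_paths(nxt, dst) for nxt in GRAPH[src])


def probability(v, y):
    """Assumed definition: share of the paths leaving v that end at y (y != v)."""
    total = sum(count_paths(v, other) for other in GRAPH if other != v)
    return Fraction(count_paths(v, y), total) if total else Fraction(0)


print(probability("Q", "V"), probability("Q", "EMP"), probability("V", "WORKS"))
```

On this toy graph the assumed definition yields the values listed on the slide, which is why it was chosen for the illustration.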

17 Entropy-based metrics (continued)
Entropy of a node v: [formula not preserved in this transcript]
The “sensitivity” of a node v to being affected by a random event on the graph.
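
A minimal sketch of a node's entropy, assuming the usual Shannon form H(v) = -Σ P(v|y_i) · log2 P(v|y_i) over the probabilities that v is affected by events on the other nodes. The formula is not preserved in the transcript, so both the form and the base-2 logarithm are assumptions, and the input probabilities are illustrative.

```python
import math


def entropy(probabilities):
    """Shannon entropy (base 2) of a node's affection probabilities."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Illustrative inputs: a node affected with equal probability by three nodes,
# one affected by two nodes, and one that depends on nothing.
h_three = entropy([1 / 3, 1 / 3, 1 / 3])
h_two = entropy([0.5, 0.5])
h_none = entropy([])
print(h_three, h_two, h_none)
```

Under this reading, a node that can be reached by events from more places, with the probability spread evenly, gets a higher entropy and is therefore more sensitive to a random event.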

18 Outline
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions

19 Testbed Configuration
TPC-DS benchmark: Web Sales schema with 3 variants
– Original (1 fact, 13 dimensions)
– Surrounded with views
– Customer dimensions merged

20 Distribution of Evolution Events
Operation                    Distribution 1   Distribution 2
Rename Measure               29% (15)         0% (0)
Add Measure                  25% (13)         0% (0)
Rename Dimension Attribute   21% (11)         0% (0)
Add Dimension Attribute      15% (8)          37% (25)
Delete Measure               6% (3)           0% (0)
Delete Dimension Attribute   4% (2)           44% (30)
Delete FKs                   0%               13% (9)
Delete Dimension Table       0%               6% (4)
Distr. 1: recorded from the Greek public sector
Distr. 2: migration to a pure star schema
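
As a quick arithmetic check, the percentages on this slide can be recomputed from the event counts given in parentheses. The dictionary and function names are illustrative; the counts are the ones on the slide, with zero-count operations left out.

```python
# Event counts per operation, copied from the slide.
DISTR_1 = {
    "Rename Measure": 15,
    "Add Measure": 13,
    "Rename Dimension Attribute": 11,
    "Add Dimension Attribute": 8,
    "Delete Measure": 3,
    "Delete Dimension Attribute": 2,
}
DISTR_2 = {
    "Add Dimension Attribute": 25,
    "Delete Dimension Attribute": 30,
    "Delete FKs": 9,
    "Delete Dimension Table": 4,
}


def percentages(counts):
    """Share of each operation, rounded to whole percent."""
    total = sum(counts.values())
    return {op: round(100 * n / total) for op, n in counts.items()}


print(sum(DISTR_1.values()), percentages(DISTR_1))
print(sum(DISTR_2.values()), percentages(DISTR_2))
```

The counts add up to 52 events for Distr. 1 and 68 for Distr. 2, and the rounded shares agree with the percentages in the table.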

21 Evaluating effectiveness
Effectiveness
– how well our metrics can “forecast” the impact of events over the different constructs of the schema
Configuration
– we mainly used Distr. 1 of events (real data)
– we tested nine configurations based on
  – variations of the schema: Web Sales (WS), Web Sales extended with views (WS-views), star variant of Web Sales (WS-star)
  – variations of the policy: Block-All, Propagate-All, Mixture
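
Assuming the nine configurations are the cross product of the three schema variants and the three policies, which the slide suggests but does not state outright, they can be enumerated as:

```python
from itertools import product

SCHEMAS = ["WS", "WS-views", "WS-star"]
POLICIES = ["Block-All", "Propagate-All", "Mixture"]

# Every (schema, policy) pair is one experimental configuration.
CONFIGURATIONS = list(product(SCHEMAS, POLICIES))

for schema, policy in CONFIGURATIONS:
    print(f"{schema} / {policy}")
```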

22 Events affecting dimensions: (a) WS schema, (b) WS-star schema

23 Events affecting views: WS-views schema

24 Events affecting queries: (a) WS schema, (b) WS-star schema

25 Comparison of design configurations for Distr. 1: (a) only affected queries, (b) all affected nodes

26 Comparison of design configurations for Distr. 2: (a) only affected queries, (b) all affected nodes

27 Outline
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions

28 A framework for handling the impact of changes in a DW environment
A set of metrics for DW evolution
– simple
– transitive
– entropy-based
An extensive experimental evaluation based on both real and synthetic datasets
Platform: Hecataeus
– A tool for visualizing and performing what-if analysis for evolution scenarios

29 Gracias!
Hecataeus: A tool for visualizing and performing what-if analysis for evolution scenarios
http://www.cs.uoi.gr/~pvassil/projects/hecataeus/index.html

30 Questions?
http://www.cs.uoi.gr/~pvassil/projects/architecture_graph/

31 Gracias!
Sources:
http://en.wikipedia.org/wiki/Image:Barcelona_-_planol_ciutat_vella_1860.jpg
http://maps.google.com

