G. Papastefanatos 1, P. Vassiliadis 2, A. Simitsis 3, Y. Vassiliou 1 (1) National Technical University of Athens, Athens, Hellas (Greece)

Presentation transcript:

Design Metrics for Data Warehouse Evolution
G. Papastefanatos (1), P. Vassiliadis (2), A. Simitsis (3), Y. Vassiliou (1)
(1) National Technical University of Athens, Athens, Hellas (Greece)
(2) University of Ioannina, Ioannina, Hellas (Greece)
(3) HP Labs, Palo Alto, California, USA

ER'08, Barcelona, October
Outline
– Motivation
– Graph-based modeling & DW Evolution
– Metrics for data warehouse evolution
– Evaluation
– Conclusions

Motivation
[Figure: WWW source and ETL activities Act1 to Act5]
Data warehouses are evolving environments, e.g.:
– A dimension is removed or renamed
– The structure of a dimension table is updated
– A fact table is completely decoupled from a dimension
– The measures of a fact table change
– An ETL source is modified, etc.

Evolution Effects
Software and data artifacts around the warehouse (e.g., ETL activities, materialized views, reports) are affected:
– Syntactically, i.e., they become invalid
– Semantically, i.e., they must conform to the new source database semantics
Adaptation to the new semantics is
– a time-consuming task
– handled manually, in most cases, by the administrators/developers
Evolution-driven design is missing

We would like to know…
– Can we measure and quantify, in a principled way, the vulnerability of certain parts of a data warehouse environment and find the constructs that are most sensitive to evolution?
– Can we predict and quantify the impact of a change on the rest of the system?
– What are the “right” measures for evaluating the quality of a data warehouse design with respect to its evolution capabilities?

Data Warehouse Schema Evolution: Our approach
– Mechanism for performing what-if analysis for potential changes of database configurations
– Graph-based representation of database constructs (i.e., relations, views, constraints, queries)
– Annotation of the graph with rules for adapting queries to database schema evolution
[Figure labels: Evolving databases; Queries; Database Schema; Graph-based modeling for uniform representation; Metrics for Evaluating Evolution Design; Evolving applications; Rules for Handling Evolution]

Graph-based representation

Graph annotation with rules

Graph Adaptation
Event: add attribute Phone to relation EMP
Rule annotated on the graph: ON attribute addition TO EMP THEN propagate
Annotated query graph: Q: SELECT EID, Name FROM EMP
Transformed query graph: Q: SELECT EID, Name, Phone FROM EMP
[Figure: graph of Q and EMP with attribute nodes EID and Name, edges labeled S and map-select; the transformed graph adds a Phone node with its own S and map-select edges]
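
The adaptation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Query` class, the `add_attribute` function and the dictionary encoding of the rule are hypothetical names chosen here, not the Hecataeus implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical stand-in for a query node of the graph."""
    name: str
    relation: str
    attributes: list
    # Policy per event type, e.g. "propagate" or "block" (assumed encoding).
    policies: dict = field(default_factory=dict)

    def sql(self):
        return f"SELECT {', '.join(self.attributes)} FROM {self.relation}"


def add_attribute(relation, attribute, queries):
    """Apply an 'attribute addition' event; return the names of adapted queries."""
    adapted = []
    for query in queries:
        if query.relation != relation:
            continue
        # Only queries annotated with 'propagate' are rewritten.
        if query.policies.get("attribute addition") == "propagate":
            query.attributes.append(attribute)
            adapted.append(query.name)
    return adapted


q = Query("Q", "EMP", ["EID", "Name"], {"attribute addition": "propagate"})
add_attribute("EMP", "Phone", [q])
print(q.sql())  # SELECT EID, Name, Phone FROM EMP
```

A query annotated with a "block" policy for the same event would be left unchanged by this sketch.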

Simple Metrics
Simple: in-degree, out-degree, degree
EMP.Emp# is more “important” than EMP.SAL with respect to how many nodes depend directly on it
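
As a rough illustration of the simple metrics, the sketch below counts degrees over a small dependency graph. The edge list is invented for the example (only EMP.Emp# and EMP.SAL come from the slide), and it assumes that an edge (dependent, provider) means "dependent depends directly on provider".

```python
from collections import defaultdict

# Hypothetical edges: (dependent node, provider node).
edges = [
    ("Q1.Emp#", "EMP.Emp#"),
    ("Q2.Emp#", "EMP.Emp#"),
    ("V.Emp#", "EMP.Emp#"),
    ("Q1.SAL", "EMP.SAL"),
]


def simple_degrees(edges):
    """Return {node: (in_degree, out_degree, degree)} for every node."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for dependent, provider in edges:
        out_deg[dependent] += 1  # the dependent points at its provider
        in_deg[provider] += 1    # the provider gains one direct dependent
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


degrees = simple_degrees(edges)
print(degrees["EMP.Emp#"], degrees["EMP.SAL"])
```

With these assumed edges, EMP.Emp# has three direct dependents and EMP.SAL has one, which matches the slide's reading of "importance".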

Transitive Metrics
Transitive: in-degree, out-degree, degree
The variant with a view + query is more “complicated” with respect to how many nodes are involved in the propagation of EMP.Emp# towards the end
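
The difference between the two variants can be sketched as follows. This assumes that the transitive in-degree of a node counts the distinct nodes that depend on it directly or indirectly, and that the graph is acyclic; the slide does not spell out the exact definition, and the node names besides EMP.Emp# are hypothetical.

```python
def transitive_in_degree(node, edges):
    """Count the nodes that depend on `node` directly or transitively.

    `edges` holds (dependent, provider) pairs; the graph is assumed acyclic.
    """
    dependents = {}
    for dependent, provider in edges:
        dependents.setdefault(provider, set()).add(dependent)

    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    return len(seen)


# Variant 1: the query reads EMP directly.
direct = [("Q.Emp#", "EMP.Emp#")]
# Variant 2: the query reads a view defined over EMP.
with_view = [("V.Emp#", "EMP.Emp#"), ("Q.Emp#", "V.Emp#")]

print(transitive_in_degree("EMP.Emp#", direct))
print(transitive_in_degree("EMP.Emp#", with_view))
```

Under these assumptions the view variant involves one more node in the propagation of EMP.Emp# than the direct variant.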

Zoomed-out degrees
– Only top-level nodes are retained
– Only one edge between modules is retained, weighted with the number of edges suppressed
[Figure panels: Simple degrees; Transitive degrees]
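
A minimal sketch of the zoom-out step, assuming each detailed node is mapped to its top-level module by a lookup table. The `module_of` mapping and the example edges are hypothetical, and edges inside a single module are simply dropped here.

```python
from collections import Counter


def zoom_out(edges, module_of):
    """Collapse node-level edges into one weighted edge per module pair."""
    weights = Counter()
    for source, target in edges:
        m_source, m_target = module_of[source], module_of[target]
        if m_source != m_target:  # edges within one module disappear
            weights[(m_source, m_target)] += 1
    return dict(weights)


module_of = {
    "Q": "Q", "Q.EID": "Q", "Q.Name": "Q",
    "EMP": "EMP", "EMP.EID": "EMP", "EMP.Name": "EMP",
}
edges = [
    ("Q", "Q.EID"),            # internal to module Q
    ("Q.EID", "EMP.EID"),
    ("Q.Name", "EMP.Name"),
]

print(zoom_out(edges, module_of))  # {('Q', 'EMP'): 2}
```

Simple or transitive degrees can then be computed on the collapsed graph, using the edge weights instead of plain edge counts.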

Entropy-based metrics
Probability that a node v is affected by an event occurring on another node y_i: (formula not preserved in the transcript)
Examples: P(Q|V) = 1/3, P(Q|EMP) = 1/3, P(V|WORKS) = 1/2
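
The formula for this probability is missing from the transcript, so the following is a sketch under an explicit assumption: P(v|y) is taken to be the number of dependency paths from v to y, divided by the number of paths from v to all other nodes. The graph (Q defined over view V, V defined over EMP and WORKS) is also assumed; with it, this definition reproduces the three example values on the slide.

```python
from fractions import Fraction
from functools import lru_cache

# Assumed graph: each node maps to the nodes it depends on directly.
providers = {
    "Q": ("V",),
    "V": ("EMP", "WORKS"),
    "EMP": (),
    "WORKS": (),
}


@lru_cache(maxsize=None)
def count_paths(v, y):
    """Number of dependency paths leading from v to y (acyclic graph)."""
    if v == y:
        return 1
    return sum(count_paths(p, y) for p in providers[v])


def probability(v, y):
    """Assumed definition: paths(v, y) normalized over all other nodes (y != v)."""
    total = sum(count_paths(v, other) for other in providers if other != v)
    return Fraction(count_paths(v, y), total) if total else Fraction(0)


print(probability("Q", "V"), probability("Q", "EMP"), probability("V", "WORKS"))
# 1/3 1/3 1/2
```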

Entropy-based metrics (continued)
Entropy of a node v: the “sensitivity” of v to being affected by a random event on the graph.
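
The transcript does not preserve the entropy formula. A plausible reading, assumed here, is the Shannon entropy of the probabilities with which a node is affected by events on the other nodes. The probability lists below are assumptions in line with the earlier examples (1/3 for each of three nodes affecting Q, 1/2 for each of two nodes affecting V).

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Assumed inputs: Q is affected by V, EMP and WORKS; V by EMP and WORKS.
h_q = entropy([1 / 3, 1 / 3, 1 / 3])
h_v = entropy([1 / 2, 1 / 2])

print(round(h_q, 3), h_v)  # 1.585 1.0
```

Under this reading, a node that can be reached by events from more sources, with evenly spread probabilities, has higher entropy and is therefore more sensitive.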

Testbed Configuration
TPC-DS benchmark: Web Sales schema with 3 variants
– Original (1 fact, 13 dimensions)
– Surrounded with views
– Customer dimensions merged

Distribution of Evolution Events

Operation                     Distribution 1   Distribution 2
Rename Measure                29% (15)         0% (0)
Add Measure                   25% (13)         0% (0)
Rename Dimension Attribute    21% (11)         0% (0)
Add Dimension Attribute       15% (8)          37% (25)
Delete Measure                6% (3)           0% (0)
Delete Dimension Attribute    4% (2)           44% (30)
Delete FKs                    0%               13% (9)
Delete Dimension Table        0%               6% (4)

Distr. 1: recorded from the Greek public sector
Distr. 2: migration to a pure star schema
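
The percentages for Distribution 1 follow from the event counts in parentheses, as the sketch below checks. The `sample_events` helper is a hypothetical way to draw a synthetic event workload with the same proportions; it is not described in the slides.

```python
import random

# Event counts for Distribution 1, taken from the table.
distribution_1 = {
    "Rename Measure": 15,
    "Add Measure": 13,
    "Rename Dimension Attribute": 11,
    "Add Dimension Attribute": 8,
    "Delete Measure": 3,
    "Delete Dimension Attribute": 2,
}


def percentages(counts):
    """Share of each operation, rounded to whole percent."""
    total = sum(counts.values())
    return {op: round(100 * n / total) for op, n in counts.items()}


def sample_events(counts, k, seed=0):
    """Draw k operations at random, weighted by the recorded counts."""
    rng = random.Random(seed)
    operations = list(counts)
    weights = [counts[op] for op in operations]
    return rng.choices(operations, weights=weights, k=k)


print(percentages(distribution_1))
print(sample_events(distribution_1, 5))
```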

Evaluating effectiveness
Effectiveness
– how well our metrics can “forecast” the impact of events on the different constructs of the schema
Configuration
– we mainly used Distr. 1 of events (real data)
– we tested nine configurations, combining
  variations of the schema: Web Sales (WS), Web Sales extended with views (WS-views), star variant of Web Sales (WS-star)
  variations of the policy: Block-All, Propagate-All, Mixture

Events affecting dimensions: (a) WS schema, (b) WS-star schema

Events affecting views: WS-views schema

Events affecting queries: (a) WS schema, (b) WS-star schema

Comparison of design configurations for Distr. 1: (a) only affected queries, (b) all affected nodes

Comparison of design configurations for Distr. 2: (a) only affected queries, (b) all affected nodes

Conclusions
– A framework for handling the impact of changes in a DW environment
– A set of metrics for DW evolution: simple, transitive, entropy-based
– An extensive experimental evaluation based on both real and synthetic datasets
– Platform: Hecataeus, a tool for visualizing and performing what-if analysis for evolution scenarios

Thank you!
Hecataeus: a tool for visualizing and performing what-if analysis for evolution scenarios

Questions?
