On Wrapping Query Languages and Efficient XML Integration. V. Christophides, S. Cluet, J. Simeon. SIGMOD 2000.

Presentation transcript:

1 On Wrapping Query Languages and Efficient XML Integration
SIGMOD 2000, Christophides Vassilis
V. Christophides, S. Cluet, J. Simeon
- Computer Science Department, University of Crete; Institute for Computer Science - FORTH, Heraklion, Crete
- INRIA Rocquencourt, Domaine de Voluceau, Paris, France
- Bell Laboratories, Murray Hill, NJ, USA

2 An Integration Scenario
[Diagram: a Middleware Server on top of two sources]
- ODBC Server: SQL queries on tables with trading info about artifacts
- Z39.50 Server: full-text queries on well-formed XML docs with descriptive info about artifacts

3 XML-based Middleware is Cool!
[Diagram: the Middleware Server reaches the ODBC Server through an RDBMS-XML Wrapper and the Z39.50 Server through a Wais-XML Wrapper. The user query Q is "What are the artifacts created in Giverny?". The middleware holds views V1=(Q1,Q2), V2=... over the source queries Q1 and Q2 on sources S1 and S2.]
Relational data:
Title     Creator  Price
Nympheas  Monet    10M$
Waitress  Manet    38M$
XML document data: Monet, Nympheas, Impressionism, 21 x 61, Giverny
Integrated XML result: Monet, Nympheas, 10M$, Impressionism, 21 x 61, Giverny

4 But XML is not a Panacea!!!
[Diagram: the Middleware Server rewrites the query Q as Q=Q'(Q1',Q2') and sends Q1' and Q2' through XML wrappers to the sources S1 and S2 (ODBC Server and Z39.50 Server). The source queries have the forms "select ... from ... where ..." and "contains word1 or/and ...".]
- Wrapping queries is hard
- Optimization for XML queries is poor
- What about type information?

5 The YAT Approach to Efficient XML Integration
[Diagram: a YAT Mediator Server receives Q, rewrites it as Q', and sends Q1' and Q2' through an SQL-XML Generic Wrapper and a Full Text-XML Generic Wrapper to the sources S1 and S2 (ODBC Server and Z39.50 Server).]
- An algebra for XML
- Generic wrapping of query languages and data structures
- New optimization opportunities

6 Outline
- Brief Recall
  - YAT data model (wrappers' structural metadata)
  - YATL integration language (XML view definition)
- The YAT operational model
  - XML Algebra
- Generic wrapping of source query capabilities
  - Wrappers' operational metadata
- Optimization opportunities
- Summary and Related work

7 Generic vs Specific XML Data Representation
[Diagram: four levels of description, from generic to specific]
- YAT model: Yat: Any ...
- Relational model: Relation: table * tuple * Symbol, with atomic values Int v String v Float v Bool
- Artifacts Schema: Rel_artifacts: rel_artifacts table * tuple [title: String, creator: String, price: Float, owner: String]
- Artifacts Database: rel_artifacts (root) with the tuples
  - title "Nympheas", creator "Monet", price 10M$, owner X
  - title "Waitress", creator "Manet", price 38M$, owner Y

8 Mixing Valid & Well-formed XML Data
[Diagram: Integrated Artifact Schema, with these type definitions]
- artwork (root): collection * &Artifact
- Artifact: artifact with fields title, artist, price, style, dims, owner and misc (leaf types shown: String, Float, Field)
- works (root): docs * Work
- Work: work with fields artist, title, style, dims (String) followed by * Field
- Field: Symbol with String content or nested Field

9 Integrating Heterogeneous XML Data with YATL
Artifact
MAKE collection * Artifact($t,$a) := artifact [title:$t, artist:$a, price:$p, style:$s, dims:$d, owner:$o, misc:$f]
MATCH rel_artifacts WITH table * tuple * { title:$t, creator:$c, price:$p, owner:$o }
      works WITH works * work [ artist:$a, title:$t', style:$s, dims:$d, *($f) ]
WHERE $t = $t' and $c = $a
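The effect of this view can be illustrated outside YAT. The following is a minimal Python sketch, not YATL or the YAT implementation: the two sources are hypothetical in-memory lists, `integrate` and `NAMED_FIELDS` are made-up names, and the owner values "X" and "Y" are placeholders. It pairs tuples and works that satisfy `$t = $t'` and `$c = $a`, and collects the fields of a work that the pattern does not name into `misc`, which is the role of `*($f)`.

```python
# Hypothetical in-memory stand-ins for the two sources (not YAT code).
# Titles, creators, prices, styles and dims follow the slides' examples;
# the owner values are placeholders.
rel_artifacts = [
    {"title": "Nympheas", "creator": "Monet", "price": "10M$", "owner": "X"},
    {"title": "Waitress", "creator": "Manet", "price": "38M$", "owner": "Y"},
]
works = [
    {"artist": "Monet", "title": "Nympheas", "style": "Impressionism",
     "dims": "21 x 61", "crplace": "Giverny"},
    {"artist": "Manet", "title": "Waitress", "style": "Impressionism",
     "dims": "37.5x51", "theme": "Folies Bergere"},
]

# Fields of a work that the MATCH pattern names explicitly.
NAMED_FIELDS = {"artist", "title", "style", "dims"}


def integrate(tuples, docs):
    """Build one artifact per (tuple, work) pair with $t = $t' and $c = $a."""
    collection = []
    for tup in tuples:
        for work in docs:
            if tup["title"] == work["title"] and tup["creator"] == work["artist"]:
                # *($f): every field of the work not named in the pattern.
                misc = {k: v for k, v in work.items() if k not in NAMED_FIELDS}
                collection.append({
                    "title": tup["title"],
                    "artist": work["artist"],
                    "price": tup["price"],
                    "style": work["style"],
                    "dims": work["dims"],
                    "owner": tup["owner"],
                    "misc": misc,
                })
    return collection


artwork = integrate(rel_artifacts, works)
```

The nested loop is only the simplest way to state the join condition; it says nothing about how a mediator would actually evaluate it.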

10 The XML Algebra
- What do we need?
  - capture the query language
  - support optimization
  - wrap source query languages
- Our XML algebra
  - relational operators: Select, Project, Join, [three operator symbols lost in extraction]
  - core object operators: Map, Djoin, Group, Sort
    => Standard Relational & Object Rewritings
  - two XML operators: Bind and Tree
    => New XML Rewritings

11 Bind Operator & Tab Structure
[Diagram: Bind applies the filter docs * work [artist: $a, style: $s, title: $t, dims: $d, *($f)] to works and produces a Tab structure.]
Tab:
$a     $t        $s             $d       $f
Monet  Nympheas  Impressionism  21x61    crplace "Giverny"
Manet  Waitress  Impressionism  37.5x51  theme "Folies Bergere"
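As a rough illustration of what Bind computes, here is a minimal Python sketch under simplifying assumptions: works are flat dictionaries rather than trees, the filter is a dictionary from field name to variable name, and `bind` is a hypothetical helper, not the YAT operator. Each matching work yields one row of variable bindings, and the optional rest variable plays the role of `*($f)`.

```python
# Hypothetical flat representation of the two works shown on the slide.
works = [
    {"artist": "Monet", "style": "Impressionism", "title": "Nympheas",
     "dims": "21x61", "crplace": "Giverny"},
    {"artist": "Manet", "style": "Impressionism", "title": "Waitress",
     "dims": "37.5x51", "theme": "Folies Bergere"},
]


def bind(docs, pattern, rest_var=None):
    """Return a table (list of rows) of variable bindings, one row per match.

    pattern maps a field name to the variable that receives its value.
    rest_var, if given, receives the fields not named in the pattern.
    """
    tab = []
    for doc in docs:
        # A document matches only if it has every field named in the pattern.
        if not all(field in doc for field in pattern):
            continue
        row = {var: doc[field] for field, var in pattern.items()}
        if rest_var is not None:
            row[rest_var] = {k: v for k, v in doc.items() if k not in pattern}
        tab.append(row)
    return tab


tab = bind(
    works,
    {"artist": "$a", "style": "$s", "title": "$t", "dims": "$d"},
    rest_var="$f",
)
```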

12 Tree & Restructuring
[Diagram: Tree is applied to the output of Bind (works, ...) and regroups the artists by style, using the pattern * Style($s): $s * $a.]
- s1: "Cubism": Pablo Picasso, Georges Braque, ...
- s2: "Impressionism": Edouard Manet, Claude Monet, ...
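The regrouping performed by Tree can be sketched in a few lines of Python. This is an illustration under assumptions, not the YAT operator: the input is a hypothetical Tab given as a list of `{"$s": ..., "$a": ...}` rows, and `tree_by_style` is a made-up name. It builds one node per distinct style and nests the artists under it.

```python
# Hypothetical Tab produced by a Bind over works, restricted to $s and $a.
tab = [
    {"$s": "Cubism", "$a": "Pablo Picasso"},
    {"$s": "Impressionism", "$a": "Edouard Manet"},
    {"$s": "Cubism", "$a": "Georges Braque"},
    {"$s": "Impressionism", "$a": "Claude Monet"},
]


def tree_by_style(rows):
    """Group rows by $s and nest the distinct $a values under each style."""
    styles = {}  # dicts keep insertion order, so styles appear as first seen
    for row in rows:
        artists = styles.setdefault(row["$s"], [])
        if row["$a"] not in artists:
            artists.append(row["$a"])
    return [{"style": s, "artists": a} for s, a in styles.items()]


styles = tree_by_style(tab)
```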

13 Algebraization of Queries
[Diagram: algebraic plan of the view artwork :=, read bottom-up]
- Bind on rel_artifacts with filter table * tuple [title: $t, creator: $c, price: $p, owner: $o]
- Bind on works with filter docs * work [artist: $a, style: $s, title: $t', dims: $d, *($f)]
- Join on $t = $t' and $c = $a
- Tree building collection * Artifact($t,$a) := artifact [title: $t, artist: $a, price: $p, style: $s, dims: $d, owner: $o, misc: $f]

14 The Core YAT Operations
[Diagram: a hierarchy of functions. Node labels: Function, Basic, Predicate (<, ..., =), Algebra Operation, Bind, Group, Select, Tree, Join, ... Four nodes carry annotations, listed here in the order they appear on the slide.]
- Supported by: { YAT }; Sig: Yat x Yat → Bool
- Supported by: { YAT }; Sig: Yat x FYat → Tab
- Supported by: { YAT }; Sig: Tab x Pred → Tab
- Supported by: { YAT }; Sig: Yat x FYat → YAT

15 Generic Wrapping of Source Query Capabilities
[Diagram: the function hierarchy of the previous slide, extended for the wrapped sources. Node labels: Function, Basic, Predicate (<, ..., =), Algebra Operation, Bind, Group, Select, Tree, Join, ..., plus the new predicate contains. Edges are labelled Extension and Refinement. Annotations, in the order they appear on the slide:]
- Supported by: { YAT, Rel }; Sig: Yat x Yat → Bool
- Supported by: { YAT }; Sig: Yat x FYat → Tab
- Supported by: { YAT, Rel, Wais }; Sig: Tab x Pred → Tab
- Supported by: { YAT }; Sig: Yat x FYat → YAT
- Supported by: { Rel }; Sig: Rel x FRel → Tab
- Supported by: { Wais }; Sig: Works x FWork → Tab
- Supported by: { Wais }; Sig: String x Work → Bool
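One way to read the "Supported by" annotations is as a capability table the mediator consults before delegating work. The Python sketch below assumes that reading. The assignment of each annotation to an operation name is an assumption inferred from the signatures, since the flattened slide does not state it, and `SUPPORTED_BY`, `can_push` and `placement` are hypothetical names.

```python
# Assumed reading of the slide: which engines can evaluate which operation.
# The pairing of annotations with operation names is inferred, not given.
SUPPORTED_BY = {
    "=": {"YAT", "Rel"},
    "Select": {"YAT", "Rel", "Wais"},
    "Tree": {"YAT"},
    "contains": {"Wais"},
}


def can_push(operations, source):
    """True if the source supports every operation of the sub-plan."""
    return all(source in SUPPORTED_BY.get(op, set()) for op in operations)


def placement(operations, source):
    """Evaluate at the source when it can, otherwise at the YAT mediator."""
    return source if can_push(operations, source) else "YAT"
```

For example, `placement(["Select", "="], "Rel")` keeps the selection at the relational source, while `placement(["Select", "contains"], "Rel")` falls back to the mediator because only the full-text source offers `contains`.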

16 Query Processing in YAT
- Query: What are the artifacts created in Giverny and sold for less than 10M$?
  MAKE * answer [title: $t, artist: $a, price: $p]
  MATCH artwork WITH collection * artifact [title: $t, artist: $a, price: $p, misc.crplace: $cp]
  WHERE $cp = "Giverny" and $p < 10
- Three-phase query optimization:
  1. Simplification of algebraic expressions: Bind-Tree rewritings, pushing selections and projections, ...
  2. Pushing operations to external sources: filter simplification, source-supplied equivalences, ...
  3. Information passing between sources: reordering join arguments, ...

17 Query Preprocessing
[Diagram: the algebraic plan of the query placed on top of the plan of the artwork view.]
- Query: Tree (* answer [title: $t, artist: $a, price: $p]) over Select ($cp = "Giverny" and $p < 10) over Bind (collection * artifact [title: $t, artist: $a, price: $p, misc: crplace: $cp])
- View: Tree (collection * Artifact($t,$a)) over Join ($t = $t' and $c = $a) over one Bind on rel_artifacts and one Bind on works

18 Query Optimization: Phase 1
[Diagram: Bind-Tree rewriting. Operators shown:]
- Bind from the query: collection * artifact [title: $t, artist: $a, price: $p, misc: $m], with a further Bind of crplace: $cp on m
- Tree from the view: collection * Artifact($t,$a) over artifact [title: $t, artist: $a, price: $p, style: $s, dims: $d, owner: $o, misc: $f]
- Resulting Project: $t, $a, $p, $m:f

19 Query Optimization: Phase 1
[Diagram: plan after Phase 1. Operators shown:]
- Tree: * answer [title: $t, artist: $a, price: $p]
- Select: $p < 10
- Select: $cp = "Giverny"
- Join: $t = $t' and $c = $a
- Project: $t, $c, $p
- Project: $t, $a, $m:f
- Bind on rel_artifacts: table * tuple [title: $t, creator: $c, price: $p, owner: $o]
- Bind on works: docs * work [artist: $a, style: $s, title: $t, dims: $d, *($f)]
- Bind of crplace: $cp on m

20 Query Optimization: Phase 2
[Diagram: plan after pushing operations to the sources. Operators shown:]
- Tree: * answer [title: $t, artist: $a, price: $p]
- Join: $t = $t' and $c = $a
- Select $p < 10 over Bind rel_artists, with filter rel_artifacts table * tuple [title: $t, creator: $c, price: $p]
- Select $cp = "Giverny" over Bind of docs * work [title: $t, artist: $a, crplace: $cp]
- Select contains("Giverny", $w) over Bind works, with each work bound to $w
- Source-supplied equivalence: = (X, Work) => contains(X, Work)
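The equivalence = (X, Work) => contains(X, Work) says that a work with a field equal to X necessarily contains X somewhere in its text, so a full-text source can run `contains` as a coarse filter and the exact equality can be checked afterwards on the smaller result. A minimal Python sketch of that two-step evaluation follows. The function names are hypothetical, works are flat dictionaries, and the third work is invented for the example to show a false positive of the coarse filter.

```python
works = [
    {"artist": "Monet", "title": "Nympheas", "crplace": "Giverny"},
    {"artist": "Manet", "title": "Waitress", "theme": "Folies Bergere"},
    # Invented work: mentions Giverny in its title but was created elsewhere.
    {"artist": "Unknown", "title": "A Garden in Giverny", "crplace": "Paris"},
]


def contains(word, work):
    """Full-text predicate: the word occurs in at least one field value."""
    return any(word in str(value) for value in work.values())


def prefilter(docs, word):
    """Step pushed to the full-text source: coarse selection with contains."""
    return [w for w in docs if contains(word, w)]


def select_crplace(docs, place):
    """Coarse filter at the source, then the exact predicate $cp = place."""
    candidates = prefilter(docs, place)
    return [w for w in candidates if w.get("crplace") == place]


created_in_giverny = select_crplace(works, "Giverny")
```

The coarse filter is safe because it never discards a work that satisfies the equality; it may keep extra works, which is why the exact selection still runs.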

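21 Query Optimization: Phase 3
[Diagram: plan after information passing; the Join of Phase 2 is replaced by a DJoin. Operators shown:]
- Tree: * answer [title: $t, artist: $a, price: $p]
- DJoin
- Select contains("Giverny", $w) over Bind works, with each work bound to $w
- Bind with $cp = "Giverny" on docs * work [title: $t, artist: $a, crplace: $cp]
- Select $p < 10 over Bind rel_artists, with filter rel_artifacts table * tuple [title: t, creator: a, price: $p]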
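A dependent join evaluates its right input once per row of its left input, passing that row's bindings along, so the relational source is only asked about titles and artists that already passed the Giverny filter. The Python sketch below illustrates the idea with hypothetical helpers (`works_created_in`, `rel_lookup`, `djoin`, `answers`) and in-memory data; prices are plain numbers in M$, and the price bound is a parameter.

```python
works = [
    {"artist": "Monet", "title": "Nympheas", "crplace": "Giverny"},
    {"artist": "Manet", "title": "Waitress", "theme": "Folies Bergere"},
]
rel_artifacts = [
    {"title": "Nympheas", "creator": "Monet", "price": 10},
    {"title": "Waitress", "creator": "Manet", "price": 38},
]


def works_created_in(place):
    """Left input: bindings ($t, $a) for works whose crplace is the place."""
    return [{"$t": w["title"], "$a": w["artist"]}
            for w in works if w.get("crplace") == place]


def rel_lookup(title, creator, max_price):
    """Right input: a lookup parameterized by values passed from the left."""
    return [{"$p": r["price"]} for r in rel_artifacts
            if r["title"] == title and r["creator"] == creator
            and r["price"] < max_price]


def djoin(left_rows, right_for):
    """Dependent join: evaluate the right side once per left row."""
    out = []
    for row in left_rows:
        for match in right_for(row):
            out.append({**row, **match})
    return out


def answers(place, max_price):
    return djoin(works_created_in(place),
                 lambda row: rel_lookup(row["$t"], row["$a"], max_price))
```

With the prices shown on the earlier slide, the strict bound `$p < 10` returns nothing, because Nympheas is listed at exactly 10M$; a looser bound such as 20 returns the Monet row.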

22 Summary & Related Work
- Wrapping query languages requires understanding the semantics of the QLs
  - Ad hoc solution proposed by Garlic (IBM)
  - Untyped solution proposed by DISCO (INRIA)
  - Query-template-based solution proposed by TSIMMIS (Stanford)
  => Generic solution introduced for the YAT system (INRIA + Bell Labs)
- XML query optimization requires exploiting XML typing information
  => YAT relies on a general-purpose algebra that makes it possible
    1. to reuse optimization techniques proposed in the relational and object contexts (pushing selections, projections, join reordering, ...)
    2. to introduce new ones that use the type information to prune navigation in XML trees, push query evaluation to the sources, etc.

23 The YAT Architecture
[Diagram: a Client, a MEDIATOR, a WRAPPER and a Source, connected through Client and Server interfaces. Component labels:]
- MEDIATOR: YAT API, View Interface, Structural Information, Data, Information Module, Query Module, Optimizer, Evaluator
- WRAPPER: YAT API, Data Conversion, Structural Extraction, Data, Information Module, Query Module, Query Translation, Operational Extraction