XAL - An XML ALgebra for Query Optimization
Department of Mathematics and Computer Science, TU/e Eindhoven University of Technology. ADC 2002, January 29, 2002.


Flavius Frasincar, Geert-Jan Houben, Cristian Pau
Databases & Hypermedia Group, Division of Computer Science

Contents
1. Motivation
2. XML Query Algebra Goals
3. XML Query Algebras
4. XAL
5. XAL Optimization Laws
6. XAL Heuristic Optimization Algorithm
7. XAL Query Example
8. Conclusion and Future Work

Motivation
Hera project: automatic hypermedia presentation of data residing in the heterogeneous 'deep' web
Use XML technologies for querying, transforming, and integrating large amounts of Web data
Optimization of XML queries is important: there is a need for an XML algebra for query optimization

XML Query Algebra Goals
Based on W3C XML Query Data Model
Genericity – logical operators independent of the underlying storage representation
–Optimizability – support query optimizations
Expressivity – express a large class of queries
–Composability – operators are closed on the same data type
–Flexibility – support various data types

3. XML Query Algebras
Lore (Stanford): specific set of logical operators
Beech et al. (industry): logical model, no optimization strategies
YATL (INRIA): specific data model, focus on data integration
XOM (Zhang & Dong): complete and closed, no optimization support
SAL (Beeri & Tzaban): focus on semistructured data, limited optimization support
XQuery (W3C): weak support for optimization (unordered forests)
…

XAL
Based on W3C XML Query Data Model
Reduces the impedance mismatch between databases and XML (query languages) by allowing a mix of ordered/unordered operators
Support for optimization (reuse the query optimization heuristics from relational systems)
Fine grained algebra of vertices and edges (Genericity)
Composability, Flexibility, XQuery Compatibility

XAL Data Model
Rooted connected directed graph with a partial order relation on edges
–Acyclic (lexical view)
–Cyclic (semantic view)
Formally, …

Properties for Vertex

Basic Property | Element Vertex Result | Simple Vertex Result
value          | identifier            | value (e.g. "Dali")
type           | element               | type of value (e.g. string)

Derived Property | Result
name             | name of the incoming E edge
parent           | parent vertex (via E edge)
parentedge       | incoming E edge
childelements    | outgoing E edges
attributes       | outgoing A edges
references       | outgoing R edges

Properties for Edge

Basic Property | Result
name           | element name (E), attribute name (A), ID attribute name (R), "Data" (D)
type           | E, A, R, D
parent         | source vertex of the edge
child          | target vertex of the edge

Derived Property | Result
next             | following sibling edge
previous         | preceding sibling edge

Note: derived properties apply to E and D edges

XAL Operators
All operators have the following form: o[f](x1, x2, …, xn: expression)
Unary operators evaluate the input to a collection of vertices and use the implicit map operation to evaluate the result
Closedness: all operators are closed on collections (support composability)

Operator Semantics
o[f](x: expression)
Variable x is bound to each vertex in the input collection. For each such binding f(x) is evaluated
The semantics of the operator o defines how the partial result (resulting from one variable binding) is computed from f(x)
The operator result is built by concatenating all the partial results
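The evaluation scheme on this slide can be sketched in Python. This is only an illustration, assuming collections are plain lists; `apply_operator`, `select_partial` and `map_partial` are hypothetical names, not XAL syntax.

```python
from typing import Any, Callable, Iterable, List


def apply_operator(partial: Callable[[Any, Any], List[Any]],
                   f: Callable[[Any], Any],
                   collection: Iterable[Any]) -> List[Any]:
    """Evaluate o[f](x: collection).

    x is bound to each element, f(x) is evaluated, the operator-specific
    `partial` turns f(x) into a partial result, and all partial results
    are concatenated.
    """
    result: List[Any] = []
    for x in collection:
        result.extend(partial(x, f(x)))
    return result


def select_partial(x: Any, fx: Any) -> List[Any]:
    # Selection (assumed semantics): keep x when the condition f(x) holds.
    return [x] if fx else []


def map_partial(x: Any, fx: Any) -> List[Any]:
    # Map (assumed semantics): the partial result is f(x) itself,
    # with a single value treated as a singleton collection.
    return list(fx) if isinstance(fx, (list, tuple)) else [fx]


prices = [120, 80, 300]
expensive = apply_operator(select_partial, lambda p: p > 100, prices)
doubled = apply_operator(map_partial, lambda p: p * 2, prices)
```

Only the `partial` function differs between the two operators; the binding loop and the concatenation are shared, which is the point the slide makes.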

Collection
Generalization of list and set (collections have a boolean order property)
Similar to the mathematician's monad and the functional programmer's (list) comprehension
A monad, where M is a type, is a triplet of functions (map, unit, join)
XAL has map and join (called union) but no unit operator (the singleton collection is written as the singleton itself)
Collections have elements of arbitrary types
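A small Python sketch of a collection with a boolean order property may help here. It rests on assumptions the slide does not state: duplicates are kept in unordered collections (bag comparison), and a union is ordered only if both inputs are. The names `Collection`, `unorder`, `union` and `as_collection` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Collection:
    items: List[Any]
    ordered: bool = True  # the boolean order property

    def same_as(self, other: "Collection") -> bool:
        # Ordered collections compare like lists; if either side is
        # unordered, only the multiset of elements matters (assumption).
        if self.ordered and other.ordered:
            return self.items == other.items
        return Counter(self.items) == Counter(other.items)


def unorder(c: Collection) -> Collection:
    # Drop the order property without changing the elements.
    return Collection(list(c.items), ordered=False)


def as_collection(value: Any) -> Collection:
    # No unit operator: a single value stands for its singleton collection.
    return value if isinstance(value, Collection) else Collection([value])


def union(a: Any, b: Any) -> Collection:
    # The monad's join, called union in XAL.
    a, b = as_collection(a), as_collection(b)
    return Collection(a.items + b.items, ordered=a.ordered and b.ordered)


ab = Collection(["Dali", "Rembrandt"])
ba = Collection(["Rembrandt", "Dali"])
```

With this sketch, `ab.same_as(ba)` is false while `unorder(ab).same_as(unorder(ba))` is true, which is what later allows commutativity laws to apply to unordered collections.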

Operator Types
Extraction operators – retrieve the needed information from XML documents
Meta-operators – control the evaluation of expressions
Construction operators – build new XML documents from the extracted data
Note: two vertices are equal if they have the same value

Extraction Operators
Projection π[type, name](e: expr)
Selection σ[condition](e: expr)
Unorder …(e: expr)
Join (x: expr) ⋈[condition] (y: expr)
Cartesian Product (x: expr) × (y: expr)
Union (x: expr) ∪ (y: expr)
Difference (x: expr) − (y: expr)
Intersection (x: expr) ∩ (y: expr)
Note: Flexibility, x and y do not have to be "union compatible" like in relational algebra

Projection
π[type, name](e: expression)
type = E, A, R, D or disjunctions (|) of these
name = regular expression over strings
Example. π[E, (P|p)ainter[s]#](e) produces all the target vertices of element containment (E) edges that have names starting with Painter, painter, Painters, or painters, and that originate from the vertices in e
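A minimal Python sketch of projection over an edge list, under stated assumptions: edges are stored as (type, name, parent, child) records, vertices are identified by strings, and the XAL name pattern `(P|p)ainter[s]#` is translated by hand into the Python regular expression `(P|p)ainters?.*` (reading `[s]` as an optional `s` and `#` as "any string"). `Edge` and `project` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    type: str    # one of "E", "A", "R", "D"
    name: str
    parent: str  # source vertex identifier
    child: str   # target vertex identifier


def project(edges: Iterable[Edge], types: Set[str],
            name_pattern: str, vertices: Iterable[str]) -> List[str]:
    """Return the target vertices of edges whose type is in `types`,
    whose name fully matches `name_pattern`, and whose source is in
    `vertices`."""
    pattern = re.compile(name_pattern)
    sources = set(vertices)
    return [e.child for e in edges
            if e.type in types
            and e.parent in sources
            and pattern.fullmatch(e.name)]


edges = [
    Edge("E", "painters", "root", "v1"),
    Edge("E", "Painter", "root", "v2"),
    Edge("A", "painter", "root", "v3"),   # wrong edge type
    Edge("E", "title", "root", "v4"),     # name does not match
    Edge("E", "painter", "other", "v5"),  # source vertex not in e
]
result = project(edges, {"E"}, r"(P|p)ainters?.*", ["root"])
```

Here `result` contains only `v1` and `v2`; the other edges fail on type, name, or source vertex respectively.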

Meta-operators & Construction Operators
Map map[f](e: expression)
Kleene Star *[f](e: expression)
Note: e is included in the result
Create vertex vertex[type](value)
Note: for element vertices the value (identifier) is given by the system
Create edge edge[type, name, parent](child)
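One plausible reading of the Kleene star, sketched in Python: f is applied repeatedly, starting from e, until no new elements appear, and e itself is part of the result. The duplicate check is an assumption added so that the loop terminates on cyclic graphs; `kleene_star` is a hypothetical name.

```python
from typing import Callable, Hashable, Iterable, List


def kleene_star(f: Callable[[Hashable], Iterable[Hashable]],
                start: Iterable[Hashable]) -> List[Hashable]:
    """Apply f repeatedly from `start`, collecting every element reached.

    The starting elements are included in the result. Elements already
    seen are skipped (assumption), which guarantees termination.
    """
    result = list(start)
    seen = set(result)
    frontier = list(result)
    while frontier:
        next_frontier = []
        for x in frontier:
            for y in f(x):
                if y not in seen:
                    seen.add(y)
                    result.append(y)
                    next_frontier.append(y)
        frontier = next_frontier
    return result


# Hypothetical child relation of a small graph with a shared descendant.
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
reachable = kleene_star(lambda v: children[v], ["a"])
```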

An Example
Copy a complete graph starting from the vertex v:
map[edge[type(e), name(e),
         vertex[type(parent(e))](value(parent(e)))
    ](vertex[type(child(e))](value(child(e))))
](e)
where
e = *[parentedge(π[E|A|D, #](child(x)))](x: parentedge(π[E|A|D, #](v)))

XAL Optimization Laws
The main factor in the execution cost of algebra expressions is the iteration (explicit or implicit map operator) over collections
The proposed set of optimization laws aims at reducing iteration size for the data extraction expressions
The laws are inspired by monad laws and relational algebraic optimization rules

Law 1 (Left unit) If e1 is of unit type (singleton collection), then e2(e1) = e2(v := e1)
Law 2 (Right unit) If e2 is the identity function, i.e. e2(v) = v, then e2(e1) = e1
Law 3 (Associativity) (e1 ∘ e2) ∘ e3 = e1 ∘ (e2 ∘ e3)
Law 4 (Empty collection) If e2 is the empty function, i.e. e2(v) = (), then e2(e1) = ()
Law 5 (Decomposition of join) e1 ⋈[condition] e2 = σ[condition](e1 × e2)

Law 6 (Decomposition of projection) If name is a regular expression that can be decomposed in several regular expressions n1, n2, …, nn and e is an unordered collection, then π[name](e) = π[n1](e) ∪ π[n2](e) ∪ … ∪ π[nn](e)
Law 7 (Cascading of selection) σ[c1 ∧ c2 ∧ … ∧ cn](e) = σ[c1](σ[c2](…(σ[cn](e))…))
Law 8 (Commutativity of selection) σ[c1](σ[c2](e)) = σ[c2](σ[c1](e))
Law 9 (Commutativity of selection with projection) If the condition c involves solely vertices that have incoming edges named by the regular expression name, then π[name](σ[c(π[name])](e)) = σ[c](π[name](e))
Law 10 (Commutativity of selection with cartesian product) If the condition c involves solely vertices from e1, then σ[c](e1 × e2) = σ[c](e1) × e2

Law 11 (Commutativity of selection with binary operators) If θ is one of the set operators ∪, −, or ∩, then σ[c](e1 θ e2) = σ[c](e1) θ σ[c](e2)
Law 12 (Commutativity of binary operators) If θ is one of the set operators (…, …, or …) and e1 and e2 are unordered collections, then e1 θ e2 = e2 θ e1
Law 13 (Commutativity of projection with cartesian product) If name is a regular expression that can be decomposed in two regular expressions name1 and name2, name1 involves solely vertices in e1 and name2 involves solely vertices in e2, then π[name](e1 × e2) = π[name1](e1) × π[name2](e2)
Law 14 (Commutativity of projection with union) π[name](e1 ∪ e2) = π[name](e1) ∪ π[name](e2)
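Some of the selection and join laws can be checked on small inputs. The Python sketch below does this for Laws 5, 7, 8 and 10, assuming collections are ordered lists and conditions are ordinary predicates; `select`, `cartesian`, `join` and `check_laws` are hypothetical helpers, not XAL operators.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def select(cond: Callable[[Any], bool], coll: List[Any]) -> List[Any]:
    return [x for x in coll if cond(x)]


def cartesian(a: List[Any], b: List[Any]) -> List[Tuple[Any, Any]]:
    return list(product(a, b))


def join(cond: Callable[[Any, Any], bool],
         a: List[Any], b: List[Any]) -> List[Tuple[Any, Any]]:
    return [(x, y) for x in a for y in b if cond(x, y)]


def check_laws() -> Dict[str, bool]:
    e1 = [1, 2, 3, 4]
    e2 = ["a", "bb", "ccc"]
    c1 = lambda x: x > 1
    c2 = lambda x: x % 2 == 0
    return {
        # Law 5: a join is a selection over the cartesian product.
        "law5": join(lambda x, y: x == len(y), e1, e2)
                == select(lambda p: p[0] == len(p[1]), cartesian(e1, e2)),
        # Law 7: a conjunctive selection is a cascade of selections.
        "law7": select(lambda x: c1(x) and c2(x), e1)
                == select(c1, select(c2, e1)),
        # Law 8: selections commute.
        "law8": select(c1, select(c2, e1)) == select(c2, select(c1, e1)),
        # Law 10: a selection on e1 only can be pushed below the product.
        "law10": select(lambda p: c1(p[0]), cartesian(e1, e2))
                 == cartesian(select(c1, e1), e2),
    }


results = check_laws()
```

A check on one input is not a proof, but it makes the shape of each rewrite concrete: in Law 10 the right-hand side iterates over 3 × 3 pairs instead of 4 × 3.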

XAL Heuristic Optimization Algorithm
S1. Eliminate unnecessary iterations (use Laws 1, 2, and 4). After each following step, S1 is applied again.
S2. Unorder collections (use unorder operator). Collections for which order is not relevant are unordered.
S3. Decompose joins (use Law 5).
S4. Decompose selections (use Law 7). Break down selections into a cascade of selections. It enables moving select operations down in the query tree.
S5. Move selections down as far as possible (use Laws 8, 9, 10, and 11). Based on the commutativity of selection with other operators, move selections down in the query tree as far as is permitted by the selection condition.

S6. Apply the most restrictive selections first (use Laws 3 and 12). Based on the commutativity and associativity of binary operators, rearrange the leaf vertices so that the most restrictive selections apply first. Note: As a selectivity criterion one can use the size of the collection. The most restrictive selections are the selections that produce collections with the fewest elements.
S7. Decompose projections (use Law 6). Break down projections into a union of projections. It enables moving the project operations down in the query tree.
S8. Move projections down as far as possible (use Laws 13 and 14). Based on the commutativity of projection with other operators, move projections down in the query tree as far as possible.
S9. Identify combined operations (use composition laws). Identify subtrees that group operations that can be executed by a single program.

XAL Query Example
XML repository with three documents:
painters.xml: Rembrandt, Dutch painter, …
catalogue.xml: Painting_ID, …
paintings.xml: Painting_ID01, The Stone Bridge, Rembrandt, …

Query: Return in alphabetical order the name of the painters that have a painting over $… (the name of the painters will appear in the element as many times as the number of their paintings that fulfill the above condition)

XQuery 1.0:
{
  FOR $i IN document("painters.xml")/painters/painter,
      $j IN document("paintings.xml")/paintings/painting[author = $i/name],
      $k IN document("catalogue.xml")/items/item[paintingid = $j/id]
  WHERE $k/price/data() > …
  RETURN $i/name SORTBY ./data()
}

Input:
–painters.xml: 3 painters (1, 2, 3)
–paintings.xml: 100 paintings for painter 1, … paintings for painter 2, … paintings for painter 3
–catalogue.xml: only painter 1 has 20 paintings more expensive than $…; all the other paintings are below $…

Initial Query Tree
–Output is alphabetically ordered!
Cartesian Product: 3 x 350 x 350 = 367,500 elements
[Figure: initial query tree, with a table mapping the XQuery clauses FOR, WHERE and SORTBY to XAL operators]

I Optimization
–Step 2: Unorder collections (commutativity of XAL binary operators)
–Step 4: Decompose selections
–Step 5: Move selections down as far as possible
Cartesian Product: 3 x … x 20 = … elements
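The effect of moving a selection below the cartesian product can be illustrated with simple arithmetic in Python. The collection sizes 3, 350, 350 and 20 come from the slides; the assumption that only the price selection is pushed down, leaving all 350 paintings in the product, belongs to this sketch and is not taken from the slide, whose middle factor is not legible. `product_size` is a hypothetical helper.

```python
from math import prod


def product_size(*sizes: int) -> int:
    """Number of tuples in the cartesian product of collections
    with the given sizes."""
    return prod(sizes)


PAINTERS = 3           # painters.xml
PAINTINGS = 350        # paintings.xml (assumed to be one of the 350 factors)
CATALOGUE_ITEMS = 350  # catalogue.xml (assumed to be the other 350 factor)
EXPENSIVE_ITEMS = 20   # items that satisfy the price condition

# Initial query tree: the selection runs after the full product.
initial = product_size(PAINTERS, PAINTINGS, CATALOGUE_ITEMS)

# Assumption of this sketch: only the price selection is pushed down
# to the catalogue before the product is built.
price_pushed = product_size(PAINTERS, PAINTINGS, EXPENSIVE_ITEMS)
```

Under these assumptions the product shrinks from 367,500 to 21,000 tuples, a factor of 17.5, before any join condition is taken into account.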

II Optimization
–Step 6: Apply the most restrictive selections first (switch positions of painter and item)
Cartesian Product: 20 x … x 3 = … elements

Conclusion and Future Work
XAL provides an elegant way (by applying the 'unorder' operator) to reuse the heuristic optimization algorithm from relational queries
Investigate new optimization laws that take advantage of the XML specific features (e.g. tree structure, internal references)
Build a translation scheme from XQuery to XAL, exploring the power of expression of XAL