Time to Leave the Trees: From Syntactic to Conceptual Querying of XML. Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta. San Diego Supercomputer Center. XMLDM'02, Prague.

Presentation transcript:

Slide 1: Time to Leave the Trees: From Syntactic to Conceptual Querying of XML
Bertram Ludäscher, Ilkay Altintas, Amarnath Gupta
San Diego Supercomputer Center, U.C. San Diego
XMLDM'02, Prague

Slide 2: Overview
Motivating example:
  – querying XML w/o and w/ conceptual-level information
  – “syntactic” vs. “conceptual” querying of XML
Distilling conceptual-level information:
  – MXS (abstract Model for XML Schema)
XPathT:
  – incorporating conceptual-level information in XPath

Slide 3: Motivating Example
Example: “Books DB” (yes, more complex examples exist... ;)
  – elements: [...]
Sample queries:
  – Q1: Which <book>s have a <price> below $80?
  – Q2: What’s the count and average of [...]s?
(Nice) try:
  – Q1: myDB//book[price<80]
  – Q2: N := count(myDB//book); S := sum(myDB//book/price); Avg := S/N;
But what about...
  – ... [...]s with multiple [...]s?
  – ... <awe> (award-winning-exemplars) elements (= subtype of book having subelement [...]): we forgot those!
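
To make the pitfall concrete, here is a minimal Python sketch of the two "nice try" queries. The tag names `book`, `tbook` and `awe` appear in the slides; the sample instance, its titles and prices, and the helper `below` are invented for illustration. The standard-library `xml.etree.ElementTree` does not evaluate value predicates such as `[price<80]`, so the comparison is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: tag names come from the slides, values are invented.
SAMPLE = """
<myDB>
  <book><title>A</title><price>50</price></book>
  <book><title>B</title><price>120</price></book>
  <tbook><title>C</title><price>60</price></tbook>
  <awe><title>D</title><price>70</price></awe>
</myDB>
"""
root = ET.fromstring(SAMPLE)

def below(elements, limit=80.0):
    """Titles of elements with at least one <price> child below the limit."""
    return [e.findtext("title") for e in elements
            if any(float(p.text) < limit for p in e.findall("price"))]

# Q1, syntactic: only elements literally tagged <book> are inspected.
books = root.findall(".//book")
q1 = below(books)

# Q2, syntactic: count and average price over <book> elements only.
prices = [float(p.text) for b in books for p in b.findall("price")]
count, average = len(books), sum(prices) / len(books)

# A tag-agnostic scan shows what Q1 missed: the <tbook> and <awe> entries.
everything = below(list(root))
print(q1, count, average, everything)
```

In this toy instance the syntactic Q1 returns only title A, although C and D are also books priced below 80.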

Slide 4: Schema Information to the Rescue!
XML & Semistructured Data Model:
  – labeled ordered trees
  – “instance contains its own schema information”
  – XML instances and DTDs have very little “schema info”:
      tag names (aka element “types”) = attribute names
      element nesting = object (“slot”) structure
      => no data types, constraints, classes, class hierarchy, ...
Schemas are Good for You!
  – link to conceptual models/DB design, query formulation,
  – validation, storage layout (optimization),
  – query processing (optimization), ...
=> XML Schema

Slide 5: Motivating Example (Cont’d)
Q1 after studying [...] and/or its XML Schema:
  => there is a type hierarchy below type bookT
  => tag names are bound to those types
  => but XPath doesn’t know this => use syntactic queries:
       //*[book OR tbook OR cbook OR ... OR awe][price<80]
  => tedious and error-prone (do-it-yourself: Appendix A)
       – e.g. you overlooked [...]! (usually schema info is not contained in the XML instance)
  => small changes in the schema (adding a new subtype) require rewriting of your query...

Slide 6: From Syntactic to Conceptual XML Queries
1. Distill conceptual information from the XML Schema
   => Abstract Model of XML Schema (MXS)
2. Incorporate MXS information into the query language
   => XPathT (“XPath with types/classes”)
   => turn the syntactic XML query
        //*[book OR tbook OR cbook OR ... OR awe][price<80]
   => into a more adequate conceptual XML query:
        //*[ts(bookT)][price<80]   /* works for any subtype of bookT */
   => more robust w.r.t. schema changes
   => new opportunities for semantic query optimization
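
The step from a conceptual to a syntactic query can be sketched as a simple textual expansion. This is only an illustration, not the authors' implementation: the lookup table `TAGS_FOR` and the function `to_syntactic` are hypothetical, and the tag list is assumed to have been derived from the schema beforehand. The slides write the disjunction informally as `book OR tbook ...`; the sketch emits the standard XPath 1.0 form using the `self::` axis instead.

```python
import re

# Hypothetical lookup: type name -> tags bound to that type or any subtype.
TAGS_FOR = {"bookT": ["awe", "book", "cbook", "tbook"]}

def to_syntactic(xpatht, tags_for=TAGS_FOR):
    """Replace every [ts(T)] qualifier by an explicit disjunction of tags."""
    def expand(match):
        tags = sorted(tags_for[match.group(1)])
        return "[" + " or ".join(f"self::{t}" for t in tags) + "]"
    return re.sub(r"\[ts\((\w+)\)\]", expand, xpatht)

q = to_syntactic("//*[ts(bookT)][price<80]")
print(q)
```

When the schema gains a new subtype, only the lookup table changes; the conceptual query text stays the same.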

Slide 7: Abstract Model of XML Schema (MXS)
Basic ideas:
  – formal abstract model (never mind the XML Schema syntax!), inspired by Model Schema Language (MSL) [Brown-Fuchs-Robie-Wadler-WWW ]
  – “Types as Classes”
XML Schema names:
  – T: type names
  – E: element names
  – A: attribute names
XML instances...
  – ... usually contain only element names (tags) E and attributes A (exception: “xsi:type = ...”)

Slide 8: Abstract Model of XML Schema (MXS)
MXS names:
  – T: types, E: elements, A: attributes
Kinds of types:
  – simple vs. complex: T_s, T_c
  – abstract vs. concrete: T_a, T_na
Type hierarchy:
  – restrict ⊆ (T_s × T_s) ∪ (T_c × T_c)
      restricts possible instances, keeping structure
  – extend ⊆ (T_s ∪ T_c) × T_c
      adds “slots” (elements and attributes)
  – subtype = extend ∪ restrict
      extend and restrict are subtyping mechanisms
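
The type-hierarchy relations above can be written down directly as sets of pairs. The following sketch assumes each pair is ordered (base type, derived type); the class `TypeHierarchy`, its field names, and the type name `priceT` are hypothetical, while `bookT`, `textBookT` and `expPriceT` are taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class TypeHierarchy:
    simple: set = field(default_factory=set)     # T_s
    complex_: set = field(default_factory=set)   # T_c
    restrict: set = field(default_factory=set)   # pairs (base, derived)
    extend: set = field(default_factory=set)     # pairs (base, derived)

    def check(self):
        """Raise ValueError if a pair violates the domain constraints."""
        for base, derived in self.restrict:
            pair = {base, derived}
            # restrict stays within one kind: simple-simple or complex-complex
            if not (pair <= self.simple or pair <= self.complex_):
                raise ValueError(f"bad restrict pair: {base} -> {derived}")
        for base, derived in self.extend:
            # extend starts from any type and always yields a complex type
            if base not in (self.simple | self.complex_) or derived not in self.complex_:
                raise ValueError(f"bad extend pair: {base} -> {derived}")

    def subtype(self):
        """subtype = extend union restrict."""
        return self.extend | self.restrict

h = TypeHierarchy(
    simple={"priceT", "expPriceT"},
    complex_={"bookT", "textBookT"},
    restrict={("priceT", "expPriceT")},
    extend={("bookT", "textBookT")},
)
h.check()
print(sorted(h.subtype()))
```

A restriction from a simple type to a complex type, such as `("priceT", "bookT")`, would be rejected by `check`.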

Slide 9: Type (Class) Hierarchy in XML Schema
Convention: user-defined type names end with “T”
  – authorT, publicationT, bookT, ...

Slide 10: Inheritance in XML Schema (I)
expTextBookT ::= SUBTYPE (bookT) that RESTRICTs [...] to expPriceT and EXTENDs with [...]
(Figure labels: EXTEND, RESTRICT, SUBTYPE)

Slide 11: Inheritance in XML Schema (II)
19th centuryTextBookType ::= SUBTYPE {textBookT, c19bookT}
(Figure labels: multiple inheritance, single inheritance)
The XML Schema type system does not know that the two are equivalent!

Slide 12: Framework for Conceptual Queries in XML
Binding types to elements:
  – bind ⊆ (E × (T_s ∪ T_c)) ∪ (A × T_s)
      binds element names to simple or complex types
      binds attribute names to simple types
Syntactic XML instance: D
  – root(NodeId), child(NodeId,Integer,NodeId), tag(NodeId,Tagname), data(NodeId,Data)
Conceptual XML instance: D+
  – restrict(T, T), extend(T, T), subtype(T, T),
  – bind(E ∪ A, T)
  – ...
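
As an illustration of the syntactic instance D, the sketch below flattens an XML document into the four relations named on the slide. Several details are assumptions rather than anything the slide specifies: node ids are integers assigned in document order, `child(p, i, c)` is read as "c is the i-th child of p", and `data` holds non-empty element text only. The function name `to_relations` is hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count

def to_relations(xml_text):
    """Flatten an XML document into root/child/tag/data facts."""
    ids = count(1)
    facts = {"root": [], "child": [], "tag": [], "data": []}

    def visit(elem):
        node = next(ids)                       # ids follow document order
        facts["tag"].append((node, elem.tag))
        text = (elem.text or "").strip()
        if text:
            facts["data"].append((node, text))
        for pos, sub in enumerate(elem, start=1):
            facts["child"].append((node, pos, visit(sub)))
        return node

    facts["root"].append((visit(ET.fromstring(xml_text)),))
    return facts

facts = to_relations("<myDB><book><price>50</price></book></myDB>")
for name, rows in facts.items():
    print(name, rows)
```

The conceptual instance D+ would add the schema-level relations (`restrict`, `extend`, `subtype`, `bind`) to these facts.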

Slide 13: XPathT: Incorporating Type (Class) Information in XPath
XPath patterns p and qualifiers q: p[q] returns matches of p which qualify according to q
New XPathT patterns:
  – r(t), e(t), s(t): restrict, extend, subtype of type t
  – tr(t), te(t), ts(t): transitive versions

Slide 14: Semantics of XPathT
Example:
  “transitive subtype”: SEM( ts(t) ) := { t’ | subtype*(t,t’) }
  from types to element names: SEM( [T] ) := { e | bind(e,t), t ∈ T }
  SEM( [ts(bookT)] ) := {book, ebook, tbook, ...}
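
The two semantic definitions can be tried out on a toy hierarchy. In this sketch `subtype*` is read as the reflexive-transitive closure of `subtype`, with pairs ordered (base, derived), and `bind` pairs are ordered (element name, type), following the definition bind ⊆ E × T. The concrete hierarchy, the type names `c19TextBookT` and `priceT`, and the tag-to-type bindings are assumptions made for the example.

```python
# Hypothetical schema fragment; pairs are (base type, derived type).
SUBTYPE = {
    ("bookT", "textBookT"),
    ("bookT", "c19bookT"),
    ("bookT", "expTextBookT"),
    ("textBookT", "c19TextBookT"),   # multiple inheritance, as on slide 11
    ("c19bookT", "c19TextBookT"),
}
# Hypothetical bindings; pairs are (element name, type).
BIND = {("book", "bookT"), ("tbook", "textBookT"),
        ("cbook", "c19bookT"), ("price", "priceT")}

def transitive_subtypes(t, subtype=SUBTYPE):
    """SEM(ts(t)): every type reachable from t via subtype, including t."""
    seen, todo = {t}, [t]
    while todo:
        current = todo.pop()
        for base, derived in subtype:
            if base == current and derived not in seen:
                seen.add(derived)
                todo.append(derived)
    return seen

def tags_of(types, bind=BIND):
    """SEM([T]): element names bound to some type in the given set."""
    return {e for e, t in bind if t in types}

types = transitive_subtypes("bookT")
tags = tags_of(types)
print(sorted(types), sorted(tags))
```

Types with no bound element name, such as `expTextBookT` in this toy schema, contribute no tags.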

Slide 15: Conceptual(-level) XML Queries in XPathT
Which books have price below $80?
     //*[ts(bookT)][price<80]
Semantic-aware equivalent rewriting:
     //*[ts(bookT)][NOT ts(expTextBookT)][price<80]
Logic XPathT query plan (figure): tree structure information, conceptual information

Slide 16: Summary
Complex domains require conceptual-level modeling and querying capabilities beyond just tree structure
Status quo: XML Schema: simple “conceptual model” with many ad hoc “design decisions”/restrictions
  => Abstract Model of XML Schema (MXS)
  => XPathT: first step towards “conceptual” or “semantic” XML query language extensions
  => more concise, intuitive, flexible, and robust queries
  => the system maps conceptual to syntactic queries, not the programmer/query designer!

Slide 17: Next Steps & Outlook
extend MXS to include more conceptual information
develop formal semantics
  – XPathT, extensions: XPathC, XQueryC
research problems:
  – mapping: XPathC queries => equivalent XPath queries
  – formalize equivalence, always possible? Then, conventional XML query processors can be used!
  – “proxy XML Schema doc”: instead of rewriting into XPath over the original instance, can one materialize some conceptual info as a “proxy XML doc” such that conceptual queries become conventional queries against the proxy...
  – semantic query optimization: equivalent rewritings given the conceptual-level constraints
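
The "proxy XML doc" question is left open on the slide. One possible reading, sketched here purely as an illustration, is to copy the instance and annotate each element with its bound type and all supertypes, so that a subtype test becomes an ordinary attribute test. The attribute name `types`, the tables `BIND` and `SUPERTYPES`, and the function `make_proxy` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema knowledge: tag -> bound type, type -> proper supertypes.
BIND = {"book": "bookT", "tbook": "textBookT"}
SUPERTYPES = {"textBookT": ["bookT"]}

def make_proxy(xml_text, bind=BIND, supertypes=SUPERTYPES):
    """Parse the instance and materialize type info as a 'types' attribute."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        t = bind.get(elem.tag)
        if t is not None:
            elem.set("types", " ".join([t, *supertypes.get(t, [])]))
    return root

proxy = make_proxy(
    "<myDB><book><price>50</price></book>"
    "<tbook><price>60</price></tbook><note/></myDB>"
)
# The conceptual test ts(bookT) is now a plain attribute lookup.
hits = [e.tag for e in proxy.iter() if "bookT" in e.get("types", "").split()]
print(hits)
```

Whether such a materialization is always possible, and how it should be kept in sync with schema changes, is exactly the research question the slide raises.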