Managing XML and Semistructured Data

Presentation transcript:

Managing XML and Semistructured Data
Lecture 4: Path Expressions
Prof. Dan Suciu
Spring 2001

In this lecture
  Path expressions
  Regular path expressions
  Evaluation techniques
Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1

Path Expressions
Examples:
  Bib.paper
  Bib.book.publisher
  Bib.paper.author.lastname
Given an OEM instance, the answer of a path expression p is a set of objects.

Path Expressions
[Figure: OEM instance DB. Edge labels: Bib, paper, book, references, author, title, year, http, publisher, page, firstname, lastname, first, last. Leaf values: "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122, 133.]
Examples:
  Bib.paper = {&o12, &o29}
  Bib.book.publisher = {&o51}
  Bib.paper.author.lastname = {&o71, &o206}

Answer of a Path Expression
A simple evaluation algorithm for Answer(P, DB) runs in PTIME in size(P) and size(DB):
  Answer(P, DB) = f(P, root(DB))
where ε is the empty path and:
  f(ε, x) = {x}
  f(L.P, x) = ∪ { f(P, y) | (x, L, y) ∈ edges(DB) }
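The recursion f can be sketched in Python as a set-at-a-time loop, which computes the same answer one label at a time. This is only an illustration: the function names, the root identifier `&root`, and the small edge list are assumptions made for the example (the object identifiers are borrowed from the earlier slide, but the full instance is not reproduced here).

```python
from collections import defaultdict


def build_index(triples):
    """Index (source, label, target) edges by (source, label)."""
    index = defaultdict(set)
    for x, label, y in triples:
        index[(x, label)].add(y)
    return index


def answer(path, index, root):
    """Evaluate a dot-separated path expression starting at root.

    Equivalent to f(P, root): after processing k labels, `current`
    holds every object reachable by the first k labels of the path.
    """
    labels = path.split(".") if path else []  # empty path: f(ε, x) = {x}
    current = {root}
    for label in labels:
        nxt = set()
        for x in current:
            nxt |= index.get((x, label), set())
        current = nxt
    return current


# Hypothetical fragment of an OEM instance (edge list assumed for illustration).
EDGES = [
    ("&root", "Bib", "&o1"),
    ("&o1", "paper", "&o12"),
    ("&o1", "book", "&o24"),
    ("&o1", "paper", "&o29"),
]
INDEX = build_index(EDGES)
```

With this toy instance, `answer("Bib.paper", INDEX, "&root")` yields the two paper objects. Each label is processed once and each step touches at most every edge, which is where the polynomial bound comes from.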

Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R?
Examples:
  Bib.(paper|book).author
  Bib.book.author.lastname?
  Bib.book.(references)*.author
  Bib.(_)*.zip

Applications of Regular Path Expressions
Navigating uncertain structure:
  Bib.book.author.lastname?
Syntactic substitute for inheritance:
  Bib.(paper|book).author
  Better: Bib.publication.author, but we don't have inheritance

Applications of Regular Path Expressions
Computing transitive closure:
  Bib.(_)*.zip = everything accessible
  Bib.book.(references)*.author = everything accessible via references
Some regular expressions are of doubtful practical use:
  (references.references)* = a path with an even number of references
  (_._)* = paths of even length
  (_._._.(_)?)* = paths of length (3m + 4n) for some m, n
But they make great examples for illustration.
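The claim about (_._._.(_)?)* can be checked mechanically. Since `_` matches any single label, only the path length matters, so the sketch below models a path of length n as the string "x" repeated n times and uses Python's `re` module. The function names are made up for this illustration.

```python
import re

# One iteration of the starred group consumes 3 or 4 labels:
# (_._._.(_)?) becomes "xxx" followed by an optional "x".
PATTERN = re.compile(r"(?:xxxx?)*")


def accepted_lengths(limit):
    """Lengths n <= limit for which a path of n labels matches the pattern."""
    return [n for n in range(limit + 1) if PATTERN.fullmatch("x" * n)]


def sums_of_3_and_4(limit):
    """Lengths n <= limit that can be written as 3m + 4k with m, k >= 0."""
    return [
        n
        for n in range(limit + 1)
        if any((n - 3 * m) % 4 == 0 for m in range(n // 3 + 1))
    ]
```

Both functions return the same list for any limit; the lengths 1, 2, and 5 are the only ones that never appear.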

Answer of a Regular Path Expression
Recall: Lang(R) = the set of words P generated by R
Answer of a regular path expression:
  Answer(R, DB) = ∪ { Answer(P, DB) | P ∈ Lang(R) }
Lang(R) can be infinite, so we need an evaluation algorithm that does not enumerate it and that copes with cycles in DB.

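Regular Path Expressions
Recall: each regular expression can be translated into an NDFA (nondeterministic finite automaton)
Example: R = (a.a)*.a.b
  states(A) = {s1, s2, s3, s4}
  initial(A) = s1
  terminal(A) = {s4}
[Figure: automaton A over states s1, s2, s3, s4 with edges labeled a, a, a, b]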
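The automaton can be written down as a transition table and simulated directly. The figure's exact edges did not survive in this transcript, so the table below is an assumption: one standard NDFA for (a.a)*.a.b that uses the four state names, initial state, and terminal state given on the slide.

```python
# Assumed transitions for R = (a.a)*.a.b:
#   s1 -a-> s2 and s2 -a-> s1 form the (a.a)* loop,
#   s1 -a-> s3 -b-> s4 reads the trailing a.b.
TRANSITIONS = {
    ("s1", "a"): {"s2", "s3"},
    ("s2", "a"): {"s1"},
    ("s3", "b"): {"s4"},
}
INITIAL = "s1"
TERMINAL = {"s4"}


def accepts(word):
    """Simulate the NDFA on a sequence of labels by tracking all live states."""
    current = {INITIAL}
    for label in word:
        nxt = set()
        for state in current:
            nxt |= TRANSITIONS.get((state, label), set())
        current = nxt
    return bool(current & TERMINAL)
```

Tracking a set of live states avoids backtracking: after the first `a` the simulation is in both s2 and s3 at once, and the next label decides which branch survives.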

Regular Path Expressions
Canonical Evaluation Algorithm Answer(R, DB):
  construct A from R
  construct the product automaton G = A x DB:
    nodes(G) = states(A) x nodes(DB)
    edges(G) = { ((s,x), L, (s',x')) | (s,L,s') ∈ edges(A), (x,L,x') ∈ edges(DB) }
    root(G) = (initial(A), root(DB))
  compute Gacc = the set of nodes accessible from root(G)
  return { x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ Gacc }
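A minimal Python sketch of this algorithm follows. It builds only the accessible part of the product automaton, using breadth-first search over (state, object) pairs, rather than materializing all of G first; the result is the same. The function name, the wildcard handling for `_`, the example automaton edges, and the small cyclic database are all assumptions made for illustration.

```python
from collections import deque

WILDCARD = "_"  # automaton label that matches any database label


def answer_regular(a_edges, initial, terminal, db_edges, root):
    """Return the objects x such that (s, x) is accessible for a terminal s."""
    a_out, db_out = {}, {}
    for s, label, s2 in a_edges:
        a_out.setdefault(s, []).append((label, s2))
    for x, label, x2 in db_edges:
        db_out.setdefault(x, []).append((label, x2))

    start = (initial, root)
    seen = {start}           # accessible product nodes found so far (Gacc)
    queue = deque([start])
    while queue:
        s, x = queue.popleft()
        for a_label, s2 in a_out.get(s, []):
            for d_label, x2 in db_out.get(x, []):
                if a_label == WILDCARD or a_label == d_label:
                    pair = (s2, x2)
                    if pair not in seen:  # each pair is enqueued at most once
                        seen.add(pair)
                        queue.append(pair)
    return {x for s, x in seen if s in terminal}


# Assumed automaton for (a.a)*.a.b and a hypothetical database with a cycle.
A_EDGES = [("s1", "a", "s2"), ("s2", "a", "s1"), ("s1", "a", "s3"), ("s3", "b", "s4")]
DB_EDGES = [("r", "a", "n1"), ("n1", "a", "r"), ("n1", "b", "n2"), ("r", "b", "n3")]
```

Calling `answer_regular(A_EDGES, "s1", {"s4"}, DB_EDGES, "r")` returns only `n2`: the paths a.b and a.a.a.b both end there, while `n3` is reachable only by words outside Lang(R). The `seen` set is what makes the search terminate despite the cycle between `r` and `n1`.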

Regular Path Expressions
Example: R = _.(_._)*.a
[Figure: automaton A with states s1, s2, s3 and edge labels _ and a; database DB with objects &o1, &o2, &o3, &o4 and edge labels a and b]
Answer of R on DB = {&o2, &o3}

Compute Product Automaton G
[Figure: product automaton G with one node (s, x) for each state s in {s1, s2, s3} and each object x in {&o1, &o2, &o3, &o4}; edges labeled a and b]

Compute Accessible Part Gacc
[Figure: the part of the product automaton G accessible from its root (s1, &o1)]
Answer(R, DB) = {&o2, &o3}

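Complexity of Regular Path Expressions
The evaluation algorithm runs in PTIME in size(R) and size(DB), even when there are cycles in DB: the product automaton has at most |states(A)| x |nodes(DB)| nodes, and computing its accessible part visits each of them at most once.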