About XML/XQuery/RDF


HTML vs. XML

HTML:
    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

XML:
    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>

- Schema information is part of the data: XML is “self-describing”
- Good for data exchange (albeit baroque for storage)

Why are database folks so excited about XML?
- XML is just a syntax for (self-describing) data
- This is still exciting because there is no standard syntax for relational data
- With XML, we can:
  - Translate any legacy data to XML
  - Exchange data in XML format
  - Ship it over the web as input to any application

XML ≠ machine accessible meaning (Jim Hendler)
This is what a web page in natural language looks like to a machine.

XML ≠ machine accessible meaning (Jim Hendler)
XML allows “meaningful tags” to be added to parts of the text.
[Figure: a CV page with its parts tagged <CV>, <name>, <education>, <work>, <private>]

XML ≠ machine accessible meaning (Jim Hendler)
But to your machine, the tags look like this…
[Figure: the same CV page with the tags <CV>, <name>, <education>, <work>, <private>]

XML ≠ machine accessible meaning (Jim Hendler)
Schemas help… by relating common terms between documents.
[Figure: two <CV> documents linked through a shared schema]

But other people use other schemas (Jim Hendler)
Someone else has one like this…
[Figure: a second CV document that uses different tags, such as <name> and <educ>, next to the first one tagged CV, name, education, work, private]

But other people use other schemas (Jim Hendler)
…which don’t fit in.
Moral: there is still a need for ontology mapping.

11/18

The X-standards…
- XML: an on-the-wire representation for data
- XQuery: a query language for XML
- XML Schema: a schema description language for XML data
- RDF: a language for meta-data description
- WSDL/SOAP/UDDI: languages for describing services

XML Terminology
- tags: book, title, author, …
- start tag: <book>, end tag: </book>
- elements: <book>…</book>, <author>…</author>
- elements are nested
- empty element: <red></red>, abbreviated <red/>
- an XML document has a single root element
- a well-formed XML document has matching, properly nested tags
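The terminology above can be checked mechanically. The following is a minimal Python sketch, assuming the standard library's `xml.etree.ElementTree` parser as the well-formedness checker; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as an XML document with a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Matching, properly nested tags; <red/> is the abbreviated empty element.
good = "<book><title>Foundations of Databases</title><red/></book>"
# The end tags are crossed, so the tags do not match.
bad_nesting = "<book><title>Foundations of Databases</book></title>"
# Two top-level elements: there is no single root element.
two_roots = "<book/><book/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample), sample)
```

A parser only tells us whether the document is well formed; whether it follows a particular schema is a separate question.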

HTML describes presentation

    <h1> Bibliography </h1>
    <p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
    <p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

XML describes content

    <bibliography>
      <book>
        <title> Foundations… </title>
        <author> Abiteboul </author>
        <author> Hull </author>
        <author> Vianu </author>
        <publisher> Addison Wesley </publisher>
        <year> 1995 </year>
      </book>
      …
    </bibliography>


More XML: Attributes

    <book price="55" currency="USD">
      <title> Foundations of Databases </title>
      <author> Abiteboul </author>
      …
      <year> 1995 </year>
    </book>

- Attributes are single-valued
- There is no guidance on when to use attributes rather than elements
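To make "single-valued" concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The second author and the helper `parses` are additions for illustration only.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author>"
    "<year>1995</year>"
    "</book>"
)

# Attributes come back as a name -> string mapping: one value per name.
price = book.get("price")
currency = book.get("currency")

# Child elements, by contrast, may repeat.
authors = [a.text for a in book.findall("author")]


def parses(text: str) -> bool:
    """Hypothetical helper: True if the parser accepts `text`."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name on one element is not well-formed XML.
duplicate_ok = parses('<book price="55" price="60"/>')

print(price, currency, authors, duplicate_ok)
```

This is one practical rule of thumb for the "no guidance" point: anything that may repeat has to be an element.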

More XML: Oids and References (object identifiers)

    <person id="o555">
      <name> Jane </name>
    </person>
    <person id="o456">
      <name> Mary </name>
      <children idref="o123 o555"/>
    </person>
    <person id="o123" mother="o456">
      <name>John</name>
    </person>

- Oids and references in XML are just syntax
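"Just syntax" means the application has to resolve references itself. A minimal Python sketch of that resolution step follows, assuming a hypothetical `<people>` wrapper element so the three `person` fragments form one document; `by_id` and `name_of` are illustrative names.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<people>"  # hypothetical wrapper so the fragment has a single root
    '<person id="o555"><name>Jane</name></person>'
    '<person id="o456"><name>Mary</name><children idref="o123 o555"/></person>'
    '<person id="o123" mother="o456"><name>John</name></person>'
    "</people>"
)

# Build the id -> element index ourselves; this parser does not do it for us.
by_id = {p.get("id"): p for p in doc.iter("person")}


def name_of(oid: str) -> str:
    """Follow an object identifier to the person's name."""
    return by_id[oid].findtext("name")


mary = by_id["o456"]
# The idref attribute is a whitespace-separated list of identifiers.
child_ids = mary.find("children").get("idref").split()
children = [name_of(oid) for oid in child_ids]
mother_of_john = name_of(by_id["o123"].get("mother"))

print(children, mother_of_john)
```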

XML vs. Relational Data
- XML is meant as a language that supports both text and structured data; these are conflicting demands
- XML supports semi-structured data
  - In essence, the schema can be a union of multiple schemas
  - Easy to represent books with or without prices, books with any number of authors, etc.
- XML supports free mixing of text and data using the #PCDATA type
- XML is ordered (while relational data is unordered)

Spectrum: TEXT (less structure) ... XML ... Structured (relational) data (more structure)

DTDs

    <!DOCTYPE paper [
      <!ELEMENT paper (section*)>
      <!ELEMENT section ((title,section*) | text)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT text (#PCDATA)>
    ]>

Notice that the DTD itself is not in XML syntax.

A semi-structured document that follows this DTD:

    <paper>
      <section> <text> </text> </section>
      <section>
        <title> </title>
        <section> … </section>
        <section> … </section>
      </section>
    </paper>
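The content model above reads as a small recursive grammar. The sketch below hand-codes that grammar in Python as an illustration. It is not a DTD validator (the standard library has none), and it ignores stray character data between child elements. The function names and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def valid_section(sec: ET.Element) -> bool:
    """section ::= (title, section*) | text"""
    kids = list(sec)
    if not kids:
        return False
    tags = [k.tag for k in kids]
    if tags == ["text"]:
        return len(kids[0]) == 0            # text holds character data only
    if tags[0] != "title" or len(kids[0]) != 0:
        return False                        # must start with a leaf title
    return all(k.tag == "section" and valid_section(k) for k in kids[1:])


def valid_paper(root: ET.Element) -> bool:
    """paper ::= section*"""
    return root.tag == "paper" and all(
        k.tag == "section" and valid_section(k) for k in root
    )


ok = ET.fromstring(
    "<paper>"
    "<section><text>Intro</text></section>"
    "<section><title>Body</title>"
    "<section><text>a</text></section>"
    "<section><text>b</text></section>"
    "</section>"
    "</paper>"
)
# A section may not mix a title with text directly.
bad = ET.fromstring("<paper><section><title>T</title><text>x</text></section></paper>")

print(valid_paper(ok), valid_paper(bad))
```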

XML Schemas
- More recent proposal (with XML syntax)
- Unifies previous schema proposals
- Generalizes DTDs
- Two documents, structure and datatypes:
  - http://www.w3.org/TR/xmlschema-1
  - http://www.w3.org/TR/xmlschema-2

RDF: Meta-data Standard for Web

    <rdf:Description about="www.mypage.com">
      <about> birds, butterflies, snakes </about>
      <author>
        <rdf:Description>
          <firstname> John </firstname>
          <lastname> Smith </lastname>
        </rdf:Description>
      </author>
    </rdf:Description>

Good ol’ semantic networks..?

Querying XML

Requirements:
- Need to handle lack of schema: we may not know much about the data, so we need to navigate the XML
- Need to support both “information retrieval” and “SQL-style” queries
- Ordered vs. unordered XML
- “Human readable” like SQL?

Candidates (many, based on conflicting requirements):
- XSL: makes IR folks happy
- XML-QL: makes DB folks happy
- XQuery: W3C’s attempt to make everybody (un)happy

11/20
Agenda:
- XQuery examples
- Information Integration

XQuery Resources
- XQuery 1.0: An XML Query Language, W3C Working Draft, 20 December 2001
- XML Query Use Cases
- Microsoft .Net XQuery Language Demo, http://131.107.228.20/
  - Supports querying on the documents described in the W3C Use Cases
- XQuery Tutorial by Fankhauser & Wadler, www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

FLoWeR Expressions
- XQuery queries are made up of FLWR expressions that work on “paths”
  - For binds variables to nodes, one node at a time
  - Let binds a variable to a whole result (typically used to compute aggregates)
  - Where applies a formula to find matching elements
  - Return constructs the output elements
- Path expressions are of the form: element//element/element[attrib=value]
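For readers more used to imperative code, the four clauses map onto an ordinary loop. This is a rough Python analogy using the standard library's `xml.etree.ElementTree`, not XQuery itself; the two-book sample document is a cut-down assumption.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    '<book year="1994"><title>TCP/IP Illustrated</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="2000"><title>Data on the Web</title>'
    "<publisher>Morgan Kaufmann Publishers</publisher></book>"
    "</bib>"
)

result = ET.Element("bib")
for b in bib.findall("./book"):                 # For: bind b to each /bib/book
    publisher = b.findtext("publisher")         # Let: bind a value once per b
    year = int(b.get("year"))
    if publisher == "Addison-Wesley" and year > 1991:            # Where
        out = ET.SubElement(result, "book", year=b.get("year"))  # Return
        ET.SubElement(out, "title").text = b.findtext("title")

xml_out = ET.tostring(result, encoding="unicode")
print(xml_out)
```

The path string `./book` plays the role of the path expression; the supported path syntax in this library is a small subset of XPath.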

Comparison to SQL
- Look at the use case descriptions in the XQuery manual
- Supports all (?) SQL-style queries, with different syntax of course [default queries in the demo]
- Has support for “construction”: outputting the answers in arbitrary XML formats (use case XMP)
- “Path expressions”: navigating the XML tree (use case SEQ)
- Simple text queries (use case TEXT)
- Allows queries on “tag” elements
  - Removes the “data/meta-data” barrier in queries
- Example: for each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors [XMP use case 6]
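The "construction" use case quoted above (title, first two authors, empty `et-al`) can be sketched in Python to show what the query has to compute. This is an illustrative analogy with `xml.etree.ElementTree`, not the W3C solution; the abbreviated sample data is an assumption.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last></author>"
    "<author><last>Buneman</last></author>"
    "<author><last>Suciu</last></author></book>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last></author></book>"
    "<book><title>The Economics of Technology and Content for Digital TV</title>"
    "<editor><last>Gerbarg</last></editor></book>"
    "</bib>"
)

result = ET.Element("bib")
for b in bib.findall("./book[author]"):      # only books with at least one author
    authors = b.findall("author")
    out = ET.SubElement(result, "book")
    ET.SubElement(out, "title").text = b.findtext("title")
    for a in authors[:2]:                    # first two authors only
        out.append(copy.deepcopy(a))
    if len(authors) > 2:                     # mark that more authors exist
        ET.SubElement(out, "et-al")

summary = [
    (bk.findtext("title"), len(bk.findall("author")), bk.find("et-al") is not None)
    for bk in result
]
print(summary)
```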

DTD for http://www.bn.com/bib.xml

    <!ELEMENT bib (book* )>
    <!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
    <!ATTLIST book year CDATA #REQUIRED >
    <!ELEMENT author (last, first )>
    <!ELEMENT editor (last, first, affiliation )>
    <!ELEMENT title (#PCDATA )>
    <!ELEMENT last (#PCDATA )>
    <!ELEMENT first (#PCDATA )>
    <!ELEMENT affiliation (#PCDATA )>
    <!ELEMENT publisher (#PCDATA )>
    <!ELEMENT price (#PCDATA )>

Example Query

Query:
    <bib>
    {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year={ $b/@year }>
          { $b/title }
        </book>
    }
    </bib>

“For all Addison-Wesley books published after 1991, return the year (as an attribute) and the title”

Result:
    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
    </bib>

Example Query (2): Join
Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml")
    let $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn
      and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>

XML frenzy in the DB Community
- Now that XML is there, what can we do with it?
  - Convert all databases from relational to XML?
  - Or provide XML views of relational databases?
  - Develop a theory of native XML databases?
  - Or assume that XML data will be stored in relational databases?
- Issues:
  - What sort of storage mechanisms?
  - What sort of indices?

XML middleware for Databases
- XML adapters (middleware) received significant attention in the DB community
  - SilkRoute (AT&T)
  - Xperanto (IBM)
- Issues:
  - Need to convert relational data into XML
    - Tagging (easy)
  - Need to convert XQuery queries into equivalent SQL queries
    - Trickier, as XQuery supports schema querying

XQuery Tutorial
Craig Knoblock, University of Southern California



Data for www.bn.com/bib.xml

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price> 65.95</price>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
        <publisher>Addison-Wesley</publisher>
        <price>65.95</price>
      </book>

Data for www.bn.com/bib.xml (cont.)

      <book year="2000">
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
        <author><last>Buneman</last><first>Peter</first></author>
        <author><last>Suciu</last><first>Dan</first></author>
        <publisher>Morgan Kaufmann Publishers</publisher>
        <price> 39.95</price>
      </book>
      <book year="1999">
        <title>The Economics of Technology and Content for Digital TV</title>
        <editor>
          <last>Gerbarg</last><first>Darcy</first>
          <affiliation>CITI</affiliation>
        </editor>
        <publisher>Kluwer Academic Publishers</publisher>
        <price>129.95</price>
      </book>
    </bib>

Document References
- A document can either be referenced explicitly or in the default namespace
- In the Microsoft demo:
    /bib = document("http://www.bn.com/bib.xml")/bib
- We will use /bib throughout, but you must use the expansion to run the demo
- In Theseus, the document for the XQuery query is passed as input

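Projection
Return the names of all authors of books

    /bib/book/author
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>

The path returns one author node per book, in document order, so Stevens appears once for each of his two books.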

Projection (cont.)
The same query can also be written as a for loop

    /bib/book/author
    =
    for $bk in /bib/book return
      for $aut in $bk/author return $aut
    =
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Stevens</last><first>W.</first></author>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>

Selection
Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Selection (cont.)
Return the titles of all books published before 1997

    /bib/book[@year < "1997"]/title
    =
    for $bk in /bib/book
    where $bk/@year < "1997"
    return $bk/title
    =
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>

Selection (cont.)
Return the book with the title “Data on the Web”

    /bib/book[title = "Data on the Web"]
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>

Selection (cont.)
Return the price of the book “Data on the Web”

    /bib/book[title = "Data on the Web"]/price
    =
    <price> 39.95</price>

How would you return the book with a price of $39.95?

Selection (cont.)
Return the book with a price of $39.95

    for $bk in /bib/book
    where $bk/price = " 39.95"
    return $bk
    =
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
      <publisher>Morgan Kaufmann Publishers</publisher>
      <price> 39.95</price>
    </book>

Note the leading space in " 39.95": the comparison is on the string value, which must match the data exactly.
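The price query above hinges on a string comparison. The Python sketch below illustrates the same pitfall with the standard library's `xml.etree.ElementTree`, whose `[tag='text']` predicate also compares complete text; the two-book document is a cut-down assumption, and the numeric variant is offered as one possible workaround rather than what the demo does.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>Data on the Web</title><price> 39.95</price></book>"
    "<book><title>TCP/IP Illustrated</title><price> 65.95</price></book>"
    "</bib>"
)

# String match: the leading space in the data must appear in the pattern too.
exact = bib.findall("./book[price=' 39.95']")
no_space = bib.findall("./book[price='39.95']")

# Numeric match: parse the text first; float() tolerates surrounding whitespace.
numeric = [
    b for b in bib.findall("./book")
    if float(b.findtext("price")) == 39.95
]

print(len(exact), len(no_space), [b.findtext("title") for b in numeric])
```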

Construction
Return year and title of all books published before 1997

    for $bk in /bib/book
    where $bk/@year < "1997"
    return <book>{ $bk/@year, $bk/title }</book>
    =
    <book year="1994">
      <title>TCP/IP Illustrated</title>
    </book>
    <book year="1992">
      <title>Advanced Programming in the Unix environment</title>
    </book>

Grouping
Return titles for each author

    for $author in distinct(/bib/book/author/last)
    return
      <author name={ $author/text() }>
        { /bib/book[author/last = $author]/title }
      </author>
    =
    <author name="Stevens">
      <title>TCP/IP Illustrated</title>
      <title>Advanced Programming in the Unix environment</title>
    </author>
    <author name="Abiteboul">
      <title>Data on the Web</title>
    </author>
    …
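The grouping query is the XML counterpart of a group-by. As a rough Python analogy (standard library only, with abbreviated sample data assumed for brevity), the same grouping can be built with a dictionary keyed on the author's last name:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    "<book><title>TCP/IP Illustrated</title>"
    "<author><last>Stevens</last></author></book>"
    "<book><title>Advanced Programming in the Unix environment</title>"
    "<author><last>Stevens</last></author></book>"
    "<book><title>Data on the Web</title>"
    "<author><last>Abiteboul</last></author>"
    "<author><last>Buneman</last></author></book>"
    "</bib>"
)

# A dict keeps insertion order, so authors appear in first-seen order,
# which mirrors iterating over the distinct last names.
titles_by_author = {}
for book in bib.findall("./book"):
    for last in book.findall("./author/last"):
        titles_by_author.setdefault(last.text, []).append(book.findtext("title"))

for name, titles in titles_by_author.items():
    print(name, titles)
```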

Join
Return the books that cost more at amazon than fatbrain

    let $amazon := document("http://www.amazon.com/books.xml")
    let $fatbrain := document("http://www.fatbrain.com/books.xml")
    for $am in $amazon/books/book,
        $fat in $fatbrain/books/book
    where $am/isbn = $fat/isbn
      and $am/price > $fat/price
    return <book>{ $am/title, $am/price, $fat/price }</book>
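The join pairs books from two documents on `isbn` and keeps the pairs where the first price is higher. Here is a hedged Python sketch of the same logic; the two catalogs, ISBNs, titles and prices are all invented, and the dictionary index is just one way to evaluate the join.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two remote books.xml documents.
amazon = ET.fromstring(
    "<books>"
    "<book><isbn>111</isbn><title>Book A</title><price>50.00</price></book>"
    "<book><isbn>222</isbn><title>Book B</title><price>20.00</price></book>"
    "</books>"
)
fatbrain = ET.fromstring(
    "<books>"
    "<book><isbn>111</isbn><title>Book A</title><price>45.00</price></book>"
    "<book><isbn>222</isbn><title>Book B</title><price>25.00</price></book>"
    "</books>"
)

# Index one side by ISBN so the join is a lookup rather than a nested loop.
fat_price = {
    b.findtext("isbn"): float(b.findtext("price"))
    for b in fatbrain.findall("./book")
}

pricier_at_amazon = []
for am in amazon.findall("./book"):
    isbn = am.findtext("isbn")
    am_price = float(am.findtext("price"))
    # Join condition (same isbn) plus the price comparison.
    if isbn in fat_price and am_price > fat_price[isbn]:
        pricier_at_amazon.append((am.findtext("title"), am_price, fat_price[isbn]))

print(pricier_at_amazon)
```

Prices are compared as numbers here; comparing the raw strings would order "9.00" after "45.00".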

Example Query 1

    <bib>
    {
      for $b in /bib/book
      where $b/publisher = "Addison-Wesley" and $b/@year > 1991
      return
        <book year={ $b/@year }>
          { $b/title }
        </book>
    }
    </bib>

What does this do?

Result Query 1

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
    </bib>

Example Query 2

    <results>
    {
      for $b in document("http://www.bn.com/bib.xml")/bib/book,
          $t in $b/title,
          $a in $b/author
      return
        <result>
          { $t }
          { $a }
        </result>
    }
    </results>

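Result Query 2

    <results>
      <result>
        <title>TCP/IP Illustrated</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Advanced Programming in the Unix environment</title>
        <author><last>Stevens</last><first>W.</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Abiteboul</last><first>Serge</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Buneman</last><first>Peter</first></author>
      </result>
      <result>
        <title>Data on the Web</title>
        <author><last>Suciu</last><first>Dan</first></author>
      </result>
    </results>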

Example Query 3

    <books-with-prices>
    {
      for $b in document("http://www.bn.com/bib.xml")//book,
          $a in document("http://www.amazon.com/reviews.xml")//entry
      where $b/title = $a/title
      return
        <book-with-prices>
          { $b/title }
          <price-amazon>{ $a/price/text() }</price-amazon>
          <price-bn>{ $b/price/text() }</price-bn>
        </book-with-prices>
    }
    </books-with-prices>

Result Query 3

    <books-with-prices>
      <book-with-prices>
        <title>TCP/IP Illustrated</title>
        <price-amazon>65.95</price-amazon>
        <price-bn> 65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Advanced Programming in the Unix environment</title>
        <price-amazon>65.95</price-amazon>
        <price-bn>65.95</price-bn>
      </book-with-prices>
      <book-with-prices>
        <title>Data on the Web</title>
        <price-amazon>34.95</price-amazon>
        <price-bn> 39.95</price-bn>
      </book-with-prices>
    </books-with-prices>

Example Query 4

    <bib>
    {
      for $b in document("www.bn.com/bib.xml")//book
      where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
      return
        <book>
          { $b/@year }
          { $b/title }
        </book>
      sortby (title)
    }
    </bib>

Example Result 4

    <bib>
      <book year="1992">
        <title>Advanced Programming in the Unix environment</title>
      </book>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
      </book>
    </bib>
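The only difference from Query 1 is the ordering step. A small Python sketch of filter-then-sort follows, assuming a cut-down copy of the bib data and using the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib>"
    '<book year="1994"><title>TCP/IP Illustrated</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="1992"><title>Advanced Programming in the Unix environment</title>'
    "<publisher>Addison-Wesley</publisher></book>"
    '<book year="2000"><title>Data on the Web</title>'
    "<publisher>Morgan Kaufmann Publishers</publisher></book>"
    "</bib>"
)

# Where: publisher test in the path, year test in Python (numeric comparison).
matches = [
    b for b in bib.findall("./book[publisher='Addison-Wesley']")
    if int(b.get("year")) > 1991
]

# sortby (title): order the selected books by their title text.
matches.sort(key=lambda b: b.findtext("title"))

ordered = [(b.get("year"), b.findtext("title")) for b in matches]
print(ordered)
```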

Impact of XML on Integration
- If and when all sources accept XQuery queries and exchange data in XML format, then the mediator can:
  - Accept user queries in XQuery
  - Access sources using XQuery
  - Get data back in XML format
  - Merge results and send them to the user in XML format
- How about now?
  - Sources can use XML adapters (middleware)

Is XML standardization a magical solution for Integration?
- If all Web sources standardize on XML format:
  - Source access (wrapper generation) issues become easier to manage
  - BUT all other problems remain
    - Still need to relate source (XML) schemas to the mediator (XML) schema
    - Still need to reason about source overlap, source access limitations, etc.
    - Still need to manage execution in the presence of source/network uncertainties

“Semantic Web”
- The LAV/GAV approaches assume that some human expert will do the actual schema mapping
- The “semantic-web” initiative attempts to automate schema mapping
  - Idea: allow pages to write logical axioms relating their vocabulary (tags) to other external tags
  - Support automatic inference of relations between source and mediator schemas using these rules
  - DAML+OIL