Fall 2001 CSE330: Query Languages for XML


1 Fall 2001 CSE330 Query Languages for XML

2 Why a query language?
Extracting, Restructuring, Integration, Browsing…
XML-QL
– http://www.w3.org/TR/NOTE-xml-ql
– http://db.cis.upenn.edu/XML-QL/
XPath (part of a query language)
– http://www.w3.org/TR/xpath
XSLT
– http://www.w3.org/TR/xslt
– http://www.mulberrytech.com/quickref/XSLTquickref.pdf
Quilt
– http://www.almaden.ibm.com/cs/people/chamberlin/quilt.html
– http://db.cis.upenn.edu/Kweelt/

3 XML-QL (XML Query Language)
W3C proposal, August 1998
Authors:
– Mary Fernandez, AT&T
– Dana Florescu, INRIA
– Alon Levy, Univ. of Washington
– Dan Suciu, AT&T
– Alin Deutsch, Univ. of Pennsylvania

4 Address Book Revisited
Caesar
Caesar Imperator
The Capitol
Rome, OH 98765
(321) 786 2543
jc@forum.rome.org

5 XML-QL: Pattern Matching
Find Caesar's e-mail address:
where Caesar $e in "http://db.cis.upenn.edu/~peter/address.xml"
construct $e
jc@forum.rome.org
Data Extraction
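The pattern-matching query on this slide can be approximated in Python as a rough sketch. The element names (addrBook, person, name, greet, email) are borrowed from the edge labels on the data-model slide; the exact markup of address.xml is an assumption, and the inline document below is a hypothetical stand-in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for address.xml; the markup is assumed, not taken from the slides.
ADDRESS_XML = """
<addrBook>
  <person>
    <name>Caesar</name>
    <greet>Caesar Imperator</greet>
    <email>jc@forum.rome.org</email>
  </person>
  <person>
    <name>Brutus</name>
    <email>mb@philippi.com</email>
  </person>
</addrBook>
"""

def emails_of(root, wanted_name):
    """Bind $e to each email of every person whose name equals wanted_name."""
    results = []
    for person in root.iter("person"):
        if person.findtext("name") == wanted_name:
            results.extend(e.text for e in person.findall("email"))
    return results

root = ET.fromstring(ADDRESS_XML)
caesar_emails = emails_of(root, "Caesar")
print(caesar_emails)
```

The where clause corresponds to the loop and the test on name; the construct clause corresponds to collecting the bound e-mail values.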

6 XML-QL: Constructing New XML Data
Whom can we contact electronically?
where $g $e in "http://..."
construct $g $e
Caesar Imperator jc@forum.rome.org
Brutus mb@philippi.com
...
Data Restructuring

7 XML-QL: Joins
Which of our contacts was involved in a movie?
where $g $e in "http://…address.xml"
$t $g in "http://www.imdb.com"
construct $g $t $e
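The join works because the variable $g occurs in both patterns, so only combinations that agree on its value survive. A minimal Python sketch of that idea follows. Both inline documents are hypothetical stand-ins (the slide's sources are elided or remote), and the movie markup (movies, movie, title, character) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the address book; markup assumed.
ADDRESSES = ET.fromstring("""
<addrBook>
  <person><greet>Caesar Imperator</greet><email>jc@forum.rome.org</email></person>
  <person><greet>Brutus</greet><email>mb@philippi.com</email></person>
</addrBook>""")

# Hypothetical stand-in for the movie source; element names are invented.
MOVIES = ET.fromstring("""
<movies>
  <movie><title>Asterix and Cleopatra</title><character>Caesar Imperator</character></movie>
</movies>""")

def join_contacts_with_movies(addresses, movies):
    """Return (g, t, e) triples where the same value g appears in both sources."""
    # Index the movie side by the join key so each person is looked up directly.
    titles_by_character = {}
    for movie in movies.iter("movie"):
        for character in movie.findall("character"):
            titles_by_character.setdefault(character.text, []).append(movie.findtext("title"))
    rows = []
    for person in addresses.iter("person"):
        greet = person.findtext("greet")
        for email in person.findall("email"):
            for title in titles_by_character.get(greet, []):
                rows.append((greet, title, email.text))
    return rows

joined = join_contacts_with_movies(ADDRESSES, MOVIES)
print(joined)
```

Brutus has no matching movie entry in this toy data, so he does not appear in the result.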

8 XML-QL: Joins (cont'd)
Caesar Imperator jc@forum.rome.org Asterix and Cleopatra
Dr. Strangelove strangelov@love.the.bomb Dr. Strangelove or How I Stopped...
...
Data Integration

9 XML-QL Data Model
– Directed, labeled graph
– Tags represented as edge labels
– Sets of attribute name-value pairs as node labels
– Two models: ordered and unordered

10 XML-QL Data Model (cont'd)
[Figure: the Caesar address-book entry drawn as a graph. An addrBook edge leads to a person edge whose target node is labeled SSN="111-…"; from there, edges labeled name, tel, fax, tel, email, greet and addr lead to the values Caesar, Caesar Imperator, The Capitol, Rome, OH, (321) 786 2543 and jc@forum.rome.org.]

11 XML-QL Semantics: Variable Bindings
[Figure: the addrBook graph with two person subtrees, one for Caesar and one for Strangelove, each with name, tel, fax, tel, email, greet and addr edges.]
where $n $e
$n            $e
Caesar        jc@forum.rome.org
Strangelove   strangelov@love.the.bomb

12 XML-QL Semantics: XML Output
$n            $e
Caesar        jc@forum.rome.org
Strangelove   strangelov@love.the.bomb
construct $n $e
[Figure: the output XML drawn as a tree with e-contact, who and where edges, leading to Caesar / jc@forum.rome.org and Strangelove / strangelov@love.the.bomb.]
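The two-phase semantics (the where clause yields a table of bindings, the construct clause is instantiated once per row) might be sketched in Python like this. The names e-contact, who and where are read off the output figure; the root element name contacts is an assumption.

```python
import xml.etree.ElementTree as ET

# Binding table from the slide: one row per match of the where clause.
bindings = [
    {"n": "Caesar", "e": "jc@forum.rome.org"},
    {"n": "Strangelove", "e": "strangelov@love.the.bomb"},
]

def construct(rows):
    """Instantiate the construct template once for every binding row."""
    out = ET.Element("contacts")  # assumed root name, not given on the slide
    for row in rows:
        contact = ET.SubElement(out, "e-contact")
        ET.SubElement(contact, "who").text = row["n"]
        ET.SubElement(contact, "where").text = row["e"]
    return out

result = construct(bindings)
serialized = ET.tostring(result, encoding="unicode")
print(serialized)
```

Keeping the binding table separate from the template is what lets the same bindings feed different output shapes.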

13 Advanced XML-QL
Find tags of person subelements:
where in "http://db.cis.upenn.edu/~peter/address.xml"
construct $tag
Find all email addresses and fax numbers:
where $eORf in "http://db.cis.upenn.edu/~peter/address.xml"
construct $eORf
Schema browsing

14 More Advanced XML-QL
Find attributes of person elements:
where in "http://db.cis.upenn.edu/~peter/address.xml"
construct $attrName $attrVal
Schema browsing
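Schema browsing with tag and attribute variables amounts to iterating over names rather than values. A small Python sketch of both queries, under the assumption that the address book uses the element names from the data-model slide; the SSN value below is a placeholder, not the value on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document; the SSN value is a made-up placeholder.
DOC = ET.fromstring("""
<addrBook>
  <person SSN="000-00-0000">
    <name>Caesar</name><tel>(321) 786 2543</tel><email>jc@forum.rome.org</email>
  </person>
</addrBook>""")

def child_tags(root, parent_tag):
    """Distinct tags of the direct subelements of parent_tag elements, in first-seen order."""
    seen = []
    for parent in root.iter(parent_tag):
        for child in parent:
            if child.tag not in seen:
                seen.append(child.tag)
    return seen

def attributes(root, tag):
    """(name, value) pairs for every attribute on elements named tag."""
    return [(name, value) for el in root.iter(tag) for name, value in el.attrib.items()]

tags = child_tags(DOC, "person")
attrs = attributes(DOC, "person")
print(tags, attrs)
```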

15 XPath
– Reasonably widely adopted: used in XML Schema and in query languages.
– Neither more expressive nor less expressive than regular path expressions (can't do (ab)*)
– Primary goal: to give access to selected nodes of a given document
– Main construct: axis navigation
– An XPath path consists of one or more navigation steps, separated by /
– A navigation step is a triplet: axis + node-test + list of predicates
– Examples
  – /descendant::node()/child::author
  – /descendant::node()/child::author[parent/attribute::booktitle = "XML"][2]
– XPath also offers some shortcuts
  – no axis means child
  – // is short for /descendant-or-self::node()/

16 XPath: child axis navigation
author is shorthand for child::author. Examples:
– aaa -- all the child nodes labeled aaa (1,3)
– aaa/bbb -- all the bbb grandchildren of aaa children (4)
– */bbb -- all the bbb grandchildren of any child (4,6)
– . -- the context node
– / -- the root node
[Figure: a tree below the context node, with nodes numbered 1 to 7 and labeled aaa, bbb or ccc.]
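These child-axis examples can be tried with the limited XPath subset in Python's xml.etree.ElementTree. The tree below is an assumption: the figure did not survive, so this is one hypothetical layout of nodes 1 to 7 that is consistent with the node numbers quoted in the examples. The id attributes and the ctx element name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# One hypothetical tree matching the results quoted on the slide.
context = ET.fromstring("""
<ctx>
  <aaa id="1"><bbb id="4"/><ccc id="5"/></aaa>
  <ccc id="2"><bbb id="6"/></ccc>
  <aaa id="3"><ccc id="7"/></aaa>
</ctx>""")

def ids(path):
    """Evaluate path relative to the context node and return the matching node numbers."""
    return [int(node.get("id")) for node in context.findall(path)]

aaa_children = ids("aaa")       # child::aaa
bbb_under_aaa = ids("aaa/bbb")  # child::aaa/child::bbb
bbb_under_any = ids("*/bbb")    # child::*/child::bbb
print(aaa_children, bbb_under_aaa, bbb_under_any)
```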

17 XPath: child axis navigation (cont)
– /doc -- all the doc children of the root
– ./aaa -- all the aaa children of the context node (equivalent to aaa)
– text() -- all the text children of the context node
– node() -- all the children of the context node (includes text nodes, but not attribute nodes, which are reached through the attribute axis)
– .. -- parent of the context node
– .// -- the context node and all its descendants
– // -- the root node and all its descendants
– //para -- all the para nodes in the document
– //text() -- all the text nodes in the document
– @font -- the font attribute node of the context node

18 Predicates
– *[2] -- the second element child of the context node
– chapter[5] -- the fifth chapter child of the context node
– *[last()] -- the last element child of the context node
– chapter[title="introduction"] -- the chapter children of the context node that have one or more title children whose string-value is "introduction" (the string-value is the concatenation of the text of all descendant text nodes)
– person[.//firstname = "joe"] -- the person children of the context node that have among their descendants a firstname element with string-value "joe"
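A short Python sketch of these predicates, using only what xml.etree.ElementTree supports. Positional and child-text predicates are available directly; the descendant test in the last example is not part of ElementTree's XPath subset, so it is emulated by hand here. Both documents and their element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("""
<book>
  <chapter n="1"><title>introduction</title></chapter>
  <chapter n="2"><title>XPath</title></chapter>
  <chapter n="3"><title>Quilt</title></chapter>
</book>""")

second = book.findall("chapter[2]")                    # positional predicate
last = book.findall("chapter[last()]")                 # last chapter child
intro = book.findall("chapter[title='introduction']")  # child string-value test

people = ET.fromstring("""
<staff>
  <person id="p1"><name><firstname>joe</firstname></name></person>
  <person id="p2"><name><firstname>ann</firstname></name></person>
</staff>""")

def string_value(node):
    """Concatenate the text of all descendant text nodes."""
    return "".join(node.itertext())

# Hand-written equivalent of person[.//firstname = "joe"].
joes = [p for p in people.findall("person")
        if any(string_value(f) == "joe" for f in p.iter("firstname"))]
print([c.get("n") for c in second + last + intro], [p.get("id") for p in joes])
```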

19 Unions of Path Expressions
– employee | consultant -- the union of the employee and consultant nodes that are children of the context node
– For some reason person/(employee|consultant) -- as in regular path expressions -- is not allowed
– However person/node()[boolean(employee|consultant)] is allowed!!
From the XPath specification:
– The boolean function converts its argument to a boolean as follows:
  – a number is true if and only if it is neither positive or negative zero nor NaN
  – a node-set is true if and only if it is non-empty
  – a string is true if and only if its length is non-zero
  – an object of a type other than the four basic types is converted to a boolean in a way that is dependent on that type
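The conversion rules quoted from the specification, and the union of two child steps, can be sketched in Python as follows. This is an illustration only: a Python list stands in for a node-set, and the sample document with its id attributes is hypothetical.

```python
import math
import xml.etree.ElementTree as ET

def xpath_boolean(value):
    """Mimic boolean() for the four basic types (list/tuple stands in for a node-set)."""
    if isinstance(value, bool):  # checked first because bool is a subclass of int
        return value
    if isinstance(value, (int, float)):
        # Python treats -0.0 == 0 as true, so both zeros convert to false.
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        return len(value) > 0
    if isinstance(value, (list, tuple)):
        return len(value) > 0
    raise TypeError("conversion for other types depends on the type")

def union_children(context, tags):
    """employee | consultant: children with either tag, in document order."""
    return [child for child in context if child.tag in tags]

person = ET.fromstring(
    '<person><consultant id="c1"/><note/><employee id="e1"/></person>')
either = union_children(person, {"employee", "consultant"})
print([node.get("id") for node in either], xpath_boolean(either))
```

Note that the string "0" converts to true under these rules, since only its length matters.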

20 Axis navigation
So far, nearly all our expressions have moved us down the tree by moving to child nodes. Exceptions were
– . -- stay where you are
– / -- go to the root
– // -- all descendants of the root
– .// -- all descendants of the context node
All other expressions have been abbreviations for child::…, e.g. child::para. child is an example of an axis.
XPath has several axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self
– Some of these (self, parent) describe single nodes, others describe sequences of nodes.
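A few of these axes can be computed by hand in Python to make the definitions concrete. ElementTree elements do not store a parent pointer, so this sketch builds a parent map first; the document and its tag names are hypothetical. ElementTree also has no separate document root node, so the ancestor list stops at the root element.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# Map every element to its parent (the root element has no entry).
parent_of = {child: parent for parent in doc.iter() for child in parent}

def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the root element."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out

def descendants(node):
    """descendant axis: everything below node, in document order."""
    return [n for n in node.iter() if n is not node]

def following_siblings(node):
    """following-sibling axis: later children of the same parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

d = doc.find("b/d")
ancestor_tags = [n.tag for n in ancestors(d)]
descendant_tags = [n.tag for n in descendants(doc)]
sibling_tags = [n.tag for n in following_siblings(d)]
print(ancestor_tags, descendant_tags, sibling_tags)
```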

21 XPath Navigation Axes (merci, Arnaud Sahuguet)
[Figure: a tree diagram marking the regions selected by each axis: ancestor, descendant, following, preceding, following-sibling, preceding-sibling, child, attribute, namespace, self.]

22 XPath abbreviated syntax
Abbreviation    Expanded form
(nothing)       child::
@               attribute::
//              /descendant-or-self::node()/
.               self::node()
.//             descendant-or-self::node()
..              parent::node()
/               (document root)

23 Quilt
Proposed by Chamberlin, Robie and Florescu (from the authors' slides)
– Leverage the most effective features of several existing and proposed query languages
– Design a small, clean, implementable language
– Cover the functionality required by all the XML Query use cases in a single language
– Write queries that fit on a slide
– Design a quilt, not a camel

24 Quilt = XPath + "comprehension" syntax
XML-QL                      Quilt
where … in … in …           for x in … y in …      (bind variables)
                            where …
construct …                 return …               (use variables)

25 Examples of Quilt
(from http://db.cis.upenn.edu/Kweelt/useCases/R/Q1.qlt)
Relational data -- two DTDs:
<!DOCTYPE items [
  <!ELEMENT item_tuple (itemno, description, offered_by, start_date?, end_date?, reserve_price?)>
]>
<!DOCTYPE bids [
]>

26 The data
Items:
1001  Red Bicycle  U01  1999-01-05  1999-01-20  40
1002  Motorcycle   U02  1999-02-11  1999-03-15  500
…
Bids:
U02  1001  35  99-01-07
U04  1001  40  99-01-08
…

27 Query 1
FUNCTION date() { "1999-02-01" }
(
FOR $i IN document("items.xml")//item_tuple
WHERE $i/start_date LEQ date()
  AND $i/end_date GEQ date()
  AND contains($i/description, "Bicycle")
RETURN $i/itemno, $i/description
SORTBY (itemno)
)
– XPath expressions (in orange on the slide)
– simple function definitions
– dates are formatted so that lexicographic ordering gives the right result
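For comparison, the same selection can be written as a Python sketch over a small inline document. The first two items come from the data slide; item 1003's dates and the wrapping items element are assumptions added so that one row qualifies. The string comparison relies on the point made above: ISO-style dates sort correctly as plain strings.

```python
import xml.etree.ElementTree as ET

# Items 1001 and 1002 follow the data slide; 1003's dates are hypothetical.
ITEMS = ET.fromstring("""
<items>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description>
    <start_date>1999-01-05</start_date><end_date>1999-01-20</end_date></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description>
    <start_date>1999-02-11</start_date><end_date>1999-03-15</end_date></item_tuple>
  <item_tuple><itemno>1003</itemno><description>Old Bicycle</description>
    <start_date>1999-01-10</start_date><end_date>1999-02-20</end_date></item_tuple>
</items>""")

TODAY = "1999-02-01"  # plays the role of the query's date() function

def bicycles_on_offer(items, today):
    """FOR/WHERE/RETURN/SORTBY of Query 1, written as a plain loop."""
    hits = []
    for item in items.iter("item_tuple"):
        start, end = item.findtext("start_date"), item.findtext("end_date")
        if start is None or end is None:
            continue  # both dates are optional in the DTD; a missing one cannot match
        description = item.findtext("description", "")
        if start <= today <= end and "Bicycle" in description:
            hits.append((item.findtext("itemno"), description))
    return sorted(hits, key=lambda hit: hit[0])  # SORTBY (itemno)

on_offer = bicycles_on_offer(ITEMS, TODAY)
print(on_offer)
```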

28 Output from Q1
1003 Old Bicycle
1007 Racing Bicycle

29 Query Q2
For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
(
FOR $i IN document("items.xml")//item_tuple
LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
WHERE contains($i/description, "Bicycle")
RETURN $i/itemno, $i/description,
  IF ($b) THEN NumFormat("#####.##", max(-1, $b/bid)) ELSE ""
SORTBY (itemno)
)
– use of variable in XPath
– lots of coercion
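The LET clause binds $b to the whole set of matching bids for each item, which is what makes the highest-bid computation possible. A rough Python rendering is below; the data is a hypothetical excerpt (two bids from the data slide, plus an item with no bids), and the wrapping items and bids elements are assumptions. The explicit float() call plays the part of the coercion the slide remarks on.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; only itemno and bid are used, as in the query.
ITEMS = ET.fromstring("""
<items>
  <item_tuple><itemno>1008</itemno><description>Broken Bicycle</description></item_tuple>
  <item_tuple><itemno>1001</itemno><description>Red Bicycle</description></item_tuple>
  <item_tuple><itemno>1002</itemno><description>Motorcycle</description></item_tuple>
</items>""")

BIDS = ET.fromstring("""
<bids>
  <bid_tuple><itemno>1001</itemno><bid>35</bid></bid_tuple>
  <bid_tuple><itemno>1001</itemno><bid>40</bid></bid_tuple>
</bids>""")

def highest_bids(items, bids):
    """For each bicycle, return (itemno, description, highest bid or empty string)."""
    rows = []
    for item in items.iter("item_tuple"):
        description = item.findtext("description", "")
        if "Bicycle" not in description:
            continue
        itemno = item.findtext("itemno")
        # LET $b := all bid_tuple elements whose itemno equals this item's itemno
        amounts = [float(b.findtext("bid")) for b in bids.iter("bid_tuple")
                   if b.findtext("itemno") == itemno]
        best = f"{max(amounts):g}" if amounts else ""  # IF ($b) THEN ... ELSE ""
        rows.append((itemno, description, best))
    return sorted(rows, key=lambda row: row[0])  # SORTBY (itemno)

q2 = highest_bids(ITEMS, BIDS)
print(q2)
```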

30 Output from Q2
1001 Red Bicycle 55
1003 Old Bicycle 20
1007 Racing Bicycle 225
1008 Broken Bicycle

31 Query Q3
Find cases where a user with a rating worse than "C" (alphabetically greater) offers an item with a reserve price of more than 1000.
(
FOR $u IN document("users.xml")//user_tuple,
    $i IN document("items.xml")//item_tuple
WHERE $u/rating GT 'C'
  AND $i/reserve_price GT 1000
  AND $i/offered_by = $u/userid
RETURN $u/name/text(), $u/rating/text(), $i/description/text(), $i/reserve_price
)
– Comparing sets with singletons
– Same rules as in XPath?
– In this case the DTD gives uniqueness

32 Conclusions
XML is a data format for which there is an increasing number of useful tools for
– Constructing schemas
– Programming
– Querying
Although it is likely that a query language will soon emerge as a standard, there is less agreement or understanding on how to store XML data efficiently.
Many other database issues remain to be solved before XML is useful for manipulating large amounts of data.

