Query Languages for XML: XQuery Adrian Pop, Paul Pop Computer and Information Science Dept. Linköpings universitet.


1 Query Languages for XML: XQuery Adrian Pop, Paul Pop Computer and Information Science Dept. Linköpings universitet

2 Outline
• Motivation
  • XML applications, types of queries
• Approaches
  • Requirements on a query language
  • Path expressions, the basic building block
  • XML query languages: XML-QL, YATL, Lorel, XQL
• XQuery
  • Background, history
  • Concepts, examples
    • FLWR expressions
    • FOR and LET expressions
    • Collections and sorting
  • Available software, demo
• Examples
• XQuery vs. XSLT
• Summary

3 Motivation
• XML applications
  • Representing many types of information, many sources
    • Structured and semi-structured documents
    • Relational databases
    • Object repositories
• Information has to be
  • Accessed, filtered, grouped, transformed, etc.
• Query languages are needed!
  • Retrieve and interpret information
  • Diverse sources
  • Querying a database is different from transforming a document

4 Document World vs. Database World
• Two worlds, two querying approaches
  • XML-as-document
    • Roots in SGML
    • Queried using path expressions
  • XML-as-data
    • Middleware, interface to databases
    • Queried with SQL-like constructs
• An XML query language has to work in both worlds
  • A query language for XML should work across all types of XML data sources and applications
• Problem
  • Existing query languages are designed for specific types of data
  • Robust for those types, weak for others

5 Types of Queries
• W3C specification: important classes of queries
  • Filtering
    • Compute a table of contents for a document
  • Joins
    • Combine data from multiple sources in a single result
  • Grouping
    • Form data into groups and apply aggregate functions such as “average” or “count”
  • Queries on sequence
    • Queries where sequence and hierarchy (i.e. precedence relationships) are important

6 Requirements on a Query Language
• Output: a query language should output XML
  • Composition of queries!
  • Views can be defined via a single query
  • Transparent to applications
  • Server-side processing
• The following operations should all be possible in a single query:
  • Selection: choosing a document or element based on content, structure or attributes
  • Extraction: pulling out particular elements of a document
  • Reduction: removing selected sub-elements of an element
  • Restructuring: constructing a new set of element instances to hold queried data
  • Combination: merging two or more elements into one
• No schema required / exploit available schema
  • Queries should work on XML data when there is no schema or DTD
  • Use an existing schema to detect errors at compile time

7 Requirements on a Query Language, Cont.
• Preserve order and association
  • A query should preserve the order and grouping of elements
• Programmatic manipulation
  • Queries will be constructed via programs and interfaces; programs should be able to deal with the representation of queries in an easy fashion
• XML representation
• Mutually embedding with XML
• XLink and XPointer cognizant
• Namespace alias independence
  • A query should not be dependent on namespace aliases local to an XML document
• Support for new datatypes
• Suitable for metadata

8 Path Expressions
• Query language for XML, semi-structured data
  • Semi-structured data modeled as an edge-labeled directed graph
  • Ability to reach arbitrary depths in the data graph
  • Achieved using “path expressions”
• Path expressions: basic building block of a query language
  • A sequence of edge labels l1, l2, …, ln
  • A query, whose result for a given data graph is a set of nodes
  • Can be specified based on some properties
    • Property of the path: the path must traverse the book edge
    • Property of an individual edge label: the label contains the substring “Victor”
  • Regular expressions are used to describe path properties
• Limitations
  • Cannot create new nodes in the database
  • Cannot perform “Joins”
  • Cannot test values stored in a database

9 Path Expressions
• Data, modeled as an edge-labeled directed graph
  [Figure: bibliography graph rooted at Bib, with paper and book edges leading to object nodes (&o1, &o12, &o24, &o29, …) that carry author, title, year, publisher, references, http and page edges, and leaf values such as “Serge”, “Abiteboul”, 1997, “Victor”, “Vianu”]
• Bib.paper = {&o12, &o29}
• Bib.book.publisher = {&o51}
• Bib.paper.author.lastname = {&o71, &206}
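Evaluating a simple path expression amounts to following edge labels from the root, one label at a time. The Python sketch below is only an illustration: the dictionary `GRAPH`, the node ids, and the helper `eval_path` are assumptions made up for this example. The edges are chosen so that the three result sets listed on the slide come out, but the figure itself is not reproduced.

```python
# Hypothetical edge-labeled graph: node id -> list of (edge label, target node id).
# The ids loosely follow the slide's result sets; the edges are invented.
GRAPH = {
    "root": [("Bib", "o1")],
    "o1": [("paper", "o12"), ("book", "o24"), ("paper", "o29")],
    "o12": [("author", "o43")],
    "o24": [("author", "o44"), ("publisher", "o51")],
    "o29": [("author", "o70")],
    "o43": [("lastname", "o71")],
    "o44": [("lastname", "o45")],
    "o70": [("lastname", "206")],
}


def eval_path(graph, start, path):
    """Return the set of nodes reached from `start` by following the labels in `path`."""
    current = {start}
    for label in path.split("."):
        # Keep only the targets of edges that carry the next label.
        current = {
            target
            for node in current
            for (edge_label, target) in graph.get(node, [])
            if edge_label == label
        }
    return current


print(eval_path(GRAPH, "root", "Bib.paper"))
print(eval_path(GRAPH, "root", "Bib.book.publisher"))
print(eval_path(GRAPH, "root", "Bib.paper.author.lastname"))
```

The result is a set of nodes, which matches the slide's reading of a path expression as a query over the data graph.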

10 Regular Path Expressions
• R ::= label | _ | R.R | (R|R) | R* | R+ | R?
• Examples:
  • Bib.(paper|book).author
  • Bib.book.author.lastname?
  • Bib.book.(references)*.author
  • Bib.(_)*.zip
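One way to see what the grammar above means is to translate a regular path expression into an ordinary regular expression over label paths. The Python sketch below does this for tree-shaped data only, since a graph with cycles has infinitely many paths. The sample `TREE` and the helpers `to_regex`, `label_paths`, and `select` are hypothetical names introduced for this example, and labels are assumed to consist of letters, digits, and underscores.

```python
import re


def to_regex(path_expr):
    """Translate a regular path expression into a regex over 'l1.l2.…ln.' strings."""
    out = []
    for token in re.findall(r"\w+|[()|*+?.]", path_expr):
        if token == "_":
            out.append(r"(?:[^.]+\.)")      # wildcard: any single label
        elif token == ".":
            continue                         # concatenation is implicit in a regex
        elif token == "(":
            out.append("(?:")
        elif token in ")|*+?":
            out.append(token)                # same meaning in both notations
        else:
            out.append("(?:" + re.escape(token) + r"\.)")
    return re.compile("".join(out))


# Hypothetical sample data: (label, children) pairs forming a tree.
TREE = ("Bib", [
    ("paper", [("author", [("lastname", [])]), ("title", [])]),
    ("book", [
        ("author", [("lastname", [])]),
        ("references", [("author", [])]),
        ("publisher", [("address", [("zip", [])])]),
    ]),
])


def label_paths(node, prefix=()):
    """Yield the label path of every node in the tree, parents before children."""
    label, children = node
    path = prefix + (label,)
    yield path
    for child in children:
        yield from label_paths(child, path)


def select(tree, path_expr):
    """Return the dotted label paths that match the regular path expression."""
    pattern = to_regex(path_expr)
    return [
        ".".join(path)
        for path in label_paths(tree)
        if pattern.fullmatch("".join(label + "." for label in path))
    ]


for expr in ["Bib.(paper|book).author", "Bib.book.author.lastname?",
             "Bib.book.(references)*.author", "Bib.(_)*.zip"]:
    print(expr, "->", select(TREE, expr))
```

Each label is encoded with a trailing dot so that `*`, `+`, and `?` apply to whole labels rather than to single characters.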

11 XML Query Languages
• Semistructured databases
  • XML-QL: A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and D. Suciu. A query language for XML, http://www.research.att.com/~mff/files/final.html
  • YATL: S. Cluet, S. Jacqmin, and J. Siméon. The New YATL: Design and Specifications. Working draft.
  • Lorel: S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener. The Lorel query language for semistructured data, ftp://db.stanford.edu/pub/papers/lorel96.ps
• Structured text, search techniques
  • XQL: J. Robie. The design of XQL, 1999, http://www.texcel.no/whitepapers/xql-design.html

12 XML Query Examples
Example data: list of books
• TCP/IP Illustrated; Stevens W.; Addison-Wesley; 65.95
• Advanced Programming in the Unix environment; Stevens W.; Addison-Wesley; 65.95
• Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers; 39.95
• The Economics of Technology and Content for Digital TV; Gerbarg Darcy, CITI; Kluwer Academic Publishers; 129.95

13 XML Query Examples, Cont.
• Query
  • List books published by Addison-Wesley after 1991, including their year and title.
• Result:
  • TCP/IP Illustrated
  • Advanced Programming in the Unix environment

14 Features of Query Languages
• A query has three parts
  • pattern clause: matches nested elements in the input document and binds variables
  • filter clause: tests the bound variables
  • constructor clause: specifies the result in terms of the bound variables
• Join operator
  • Combine data from different portions of documents
• Path expressions
  • Querying without precise knowledge of the document structure
• Other useful features:
  • Checking for the absence of information, e.g., missing fields
  • Use of arbitrary external functions, such as aggregation functions, string comparison functions, etc.
  • Use of navigation operators, which simplify handling data with references

15 XML-QL
  CONSTRUCT <bib> {
    WHERE
      <bib>
        <book year=$y>
          <title>$t</title>
          <publisher><name>Addison-Wesley</name></publisher>
        </book>
      </bib> IN "www.bn.com/bib.xml",
      $y > 1991
    CONSTRUCT <book year=$y><title>$t</title></book>
  } </bib>
• Patterns and filters appear in the WHERE clause
• The constructor appears in the CONSTRUCT clause
• The result of the inner WHERE clause is a relation that maps variables to tuples of values satisfying the clause
  • All pairs of year and title values bound to ($y, $t) that satisfy the clause
• The result contains one element for each book that satisfies the WHERE clause of the inner query, i.e. one for each pair ($y, $t)

16 YATL
  make
    bib [ *book [ @year [ $y ], title [ $t ] ] ]
  match "www.bn.com/bib.xml" with
    bib [ *book [ @year [ $y ], title [ $t ] ], publisher [ name [ $n ] ] ]
  where
    $n = "Addison-Wesley" and $y > 1991
• The constructor appears in the make clause
• Patterns appear in the match clause
  • The pattern states that a bib element may have many book elements, but that each book element has one year attribute, one publisher element, and one title element
• Filters appear in the where clause

17 Lorel
  select xml(bib:{
    (select xml(book:{@year:y, title:t})
     from bib.book b, b.title t, b.year y
     where b.publisher = "Addison-Wesley" and y > 1991)})
• The constructor appears in the select clause
• Patterns appear in the from clause
• Both patterns and filters appear in the where clause
• bib is used as the entry point for the data in the XML document
• The from clause binds variables to the element ids of elements denoted by the given pattern, and the where clause selects those elements that satisfy the given filters
• The select clause constructs a new XML book element with a year attribute and a title element

18 XQL
  document("http://www.bn.com")/bib {
    book[publisher/name="Addison-Wesley" and @year>1991] {
      @year | title
    }
  }
• XQL: from the “document world”
• The pattern document("http://www.bn.com")/bib
  • selects all top-level bib elements
  • evaluates the nested expression for each such element
• The nested expression selects the book elements that are children of a bib element and that satisfy the filter clause in brackets
• XQL does not have a constructor clause; the pattern expressions determine the result of the query
  • The innermost expression returns the book's year attribute and title element

19 XQuery: An XML Query Language
• W3C standard
  • http://www.w3.org/TR/xquery
• Derived from Quilt (Jonathan Robie, Don Chamberlin, and Daniela Florescu)
  • Based on XML-QL
• Relevant W3C documents
  • XML Query Requirements
  • XML Query Use Cases
  • XQuery 1.0: An XML Query Language
  • XQuery 1.0 and XPath 2.0 Data Model
  • XQuery 1.0 Formal Semantics
  • XML Syntax for XQuery 1.0 (XQueryX)

20 XQuery
  <bib> {
    for $b in //bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return <book year={$b/@year}> { $b/title } </book>
  } </bib>
• Overview
  • Path expressions: XPath
  • FLWR (“flower”) expressions
  • FOR vs. LET expressions
  • Collections and sorting
  • Other constructs

21 XPath
• W3C Standard
  • http://www.w3.org/TR/xpath
• Building block for other W3C standards:
  • XSL Transformations (XSLT)
  • XML Link (XLink)
  • XML Pointer (XPointer)
  • XML Query
• Was originally part of XSL

22 XPath Overview
• bib                                    matches a bib element
• *                                      matches any element
• /                                      matches the root element
• /bib                                   matches a bib element under root
• bib/paper                              matches a paper in bib
• bib//paper                             matches a paper in bib, at any depth
• //paper                                matches a paper at any depth
• paper|book                             matches a paper or a book
• @price                                 matches a price attribute
• bib/book/@price                        matches price attribute in book, in bib
• bib/book[@price<"55"]/author/lastname  matches the last name of the authors of books in bib priced below 55
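Several of these patterns can be tried out with Python's standard `xml.etree.ElementTree` module, which implements a limited subset of XPath. The sketch below is an illustration under assumptions: the sample document is made up, and because ElementTree has no `<` comparison in predicates, the price filter is written as ordinary Python rather than as part of the path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<bib>
  <book price="39.95"><title>A</title><author><lastname>Suciu</lastname></author></book>
  <book price="65.95"><title>B</title><author><lastname>Stevens</lastname></author></book>
  <paper><title>C</title><section><paper><title>D</title></paper></section></paper>
</bib>
"""
bib = ET.fromstring(DOC)  # `bib` is the root element of the parsed document

direct_papers = bib.findall("paper")       # bib/paper: papers directly in bib
all_papers = bib.findall(".//paper")       # bib//paper: papers at any depth
prices = [book.get("price") for book in bib.findall("book")]  # bib/book/@price

# bib/book[@price < 55]/author/lastname, with the comparison done in Python
# because ElementTree predicates only support existence and equality tests.
cheap_authors = [
    lastname.text
    for book in bib.findall("book[@price]")
    if float(book.get("price")) < 55
    for lastname in book.findall("author/lastname")
]

print(len(direct_papers), len(all_papers), prices, cheap_authors)
```

A full XPath engine would accept the numeric predicate directly; the split between path and filter here is a limitation of the chosen library, not of XPath.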

23 FLWR (“Flower”) Expressions
• “Flower” expressions
  FOR ... LET ... FOR ... LET ...
  WHERE ...
  RETURN ...
• Example: find all book titles published after 1995
  FOR $x IN document("bib.xml")/bib/book
  WHERE $x/year > 1995
  RETURN $x/title
• Result:
  • TCP/IP Illustrated
  • Advanced Programming in the Unix environment
  • Data on the Web
  • The Economics of Technology and Content …
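The FOR / WHERE / RETURN structure maps closely onto a list comprehension. The Python sketch below mirrors the example query on a made-up document; the book titles and years are assumptions chosen for the illustration and are not the deck's sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, invented for this illustration.
BIB = ET.fromstring("""
<bib>
  <book><title>Old Book</title><year>1992</year></book>
  <book><title>New Book</title><year>1999</year></book>
  <book><title>Newer Book</title><year>2000</year></book>
</bib>
""")

# FOR $x IN .../bib/book  WHERE $x/year > 1995  RETURN $x/title
titles = [
    x.findtext("title")                  # RETURN clause
    for x in BIB.findall("book")         # FOR clause: one binding per book
    if int(x.findtext("year")) > 1995    # WHERE clause: filter the bindings
]
print(titles)
```

The comprehension keeps document order, just as the FOR clause iterates over the books in the order they appear.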

24 FLWR (“Flower”) Expressions, Cont.
• FOR $x IN expr
  • binds $x to each element in the list expr
• LET $x := expr
  • binds $x to the entire list expr
  • Useful for common subexpressions and for aggregations
[Diagram: the FOR/LET clauses produce a list of tuples, the WHERE clause filters that list, and the RETURN clause produces an instance of the XQuery data model]

25 FOR vs. LET
• FOR query
  FOR $x IN document("bib.xml")/bib/book
  RETURN $x
  • Returns ...
• LET query
  LET $x := document("bib.xml")/bib/book
  RETURN $x
  • Returns ...
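The difference between the two bindings shows up most clearly when the RETURN clause wraps its argument in an element. The Python sketch below assumes such a wrapper, modeled as a `("result", items)` tuple; the wrapper, the stand-in book list, and both function names are hypothetical and not taken from the slide.

```python
# Stand-in for the sequence document("bib.xml")/bib/book (hypothetical values).
BOOKS = ["book1", "book2", "book3"]


def for_query(sequence):
    """FOR $x IN sequence RETURN wrap($x): one binding, and one result, per item."""
    return [("result", [x]) for x in sequence]


def let_query(sequence):
    """LET $x := sequence RETURN wrap($x): a single binding to the whole sequence."""
    return [("result", list(sequence))]


print(for_query(BOOKS))  # three results, each holding one book
print(let_query(BOOKS))  # one result holding all three books
```

FOR iterates and evaluates RETURN once per item, while LET evaluates RETURN exactly once with the variable bound to the entire list.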

26 More Complex FLWR Expressions
• For each author of a book by Morgan Kaufmann, list all his/her books:
  FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
  RETURN $a,
    FOR $t IN /bib/book[author=$a]/title
    RETURN $t
  (distinct: eliminates duplicates)
• Find books whose price is larger than average:
  LET $a := avg(document("bib.xml")/bib/book/@price)
  FOR $b IN document("bib.xml")/bib/book
  WHERE $b/@price > $a
  RETURN $b
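Both queries can be mirrored in Python to make the nesting and the aggregation explicit. This is a sketch under assumptions: the three-book document, the author names, and the helper `author_names` are invented, and duplicates are removed in first-seen order, which is only one possible reading of `distinct`.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical data, invented for this illustration.
BIB = ET.fromstring("""
<bib>
  <book price="10"><title>A</title><author>Ann</author><publisher>Morgan Kaufmann</publisher></book>
  <book price="20"><title>B</title><author>Ann</author><author>Bob</author><publisher>Morgan Kaufmann</publisher></book>
  <book price="60"><title>C</title><author>Bob</author><publisher>Other Press</publisher></book>
</bib>
""")
books = BIB.findall("book")


def author_names(book):
    """Return the text of every author child of a book element."""
    return [author.text for author in book.findall("author")]


# distinct(.../book[publisher="Morgan Kaufmann"]/author)
authors = []
for book in books:
    if book.findtext("publisher") == "Morgan Kaufmann":
        for name in author_names(book):
            if name not in authors:
                authors.append(name)

# Outer FOR over authors, inner FOR over all books written by that author.
titles_by_author = {
    name: [b.findtext("title") for b in books if name in author_names(b)]
    for name in authors
}

# LET binds the average once; FOR and WHERE then filter the individual books.
average = mean(float(b.get("price")) for b in books)
above_average = [b.findtext("title") for b in books if float(b.get("price")) > average]

print(titles_by_author)
print(average, above_average)
```

Note that the inner loop ranges over all books, so an author selected through a Morgan Kaufmann book is also listed with titles from other publishers, as in the query on the slide.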

27 Collections in XQuery
• Ordered and unordered collections
  • Ordered collection: /bib/book/author
  • Unordered collection: distinct(/bib/book/author)
• LET $a := /bib/book         $a is a collection
• $b/author                   a collection (several authors...)
• $b/@price                   list of n prices
• $b/@price * 0.7             list of n numbers
• $b/@price * $b/@quantity    list of n x m numbers

28 Sorting in XQuery
• Sorting arguments
  • Refer to the name space of the RETURN clause, not the FOR clause
• To sort on an element you don’t want to display
  • Return it, then remove it with an additional query.
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN $p/text(),
    FOR $b IN document("bib.xml")//book[publisher = $p]
    RETURN $b/title, $b/@price SORTBY(price DESCENDING)
  SORTBY(name)

29 If-Then-Else
  FOR $h IN //holding
  RETURN $h/title,
    IF $h/@type = "Journal"
    THEN $h/editor
    ELSE $h/author
  SORTBY (title)

30 Existential Quantifiers
  FOR $b IN //book
  WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
  RETURN $b/title

  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
  RETURN $b/title
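SOME and EVERY correspond to Python's `any` and `all`. The sketch below uses a made-up mapping from book titles to paragraph texts; the titles and paragraphs are assumptions chosen so that the two quantifiers give different answers.

```python
# Hypothetical books: title -> list of paragraph texts.
BOOKS = {
    "Water Sports": ["sailing and windsurfing are popular", "sailing needs wind"],
    "Sailing Only": ["sailing basics", "advanced sailing"],
    "Mixed": ["sailing here", "windsurfing there"],
    "Empty": [],
}

# WHERE SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
some_match = [
    title
    for title, paras in BOOKS.items()
    if any("sailing" in p and "windsurfing" in p for p in paras)
]

# WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
every_match = [
    title
    for title, paras in BOOKS.items()
    if all("sailing" in p for p in paras)
]

print(some_match)
print(every_match)
```

"Mixed" mentions both words but never in the same paragraph, so it fails the SOME query. "Empty" passes the EVERY query because a universal condition over no paragraphs holds vacuously.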

31 Other Constructs
• BEFORE and AFTER
  • for dealing with order in the input
• FILTER
  • deletes some edges in the result tree
• Recursive functions
  • Currently: arbitrary recursion
  • Perhaps more restrictions in the future?

32 XQueryX
  LET $authors := /book/author
  RETURN { $authors }
  book author AUTHORS $authors

33 XQuery Software
• QuiP
  • http://www.softwareag.com/developer/downloads/default.htm
  • Software AG
  • Windows and Linux on x86
  • Features
    • Latest W3C syntax
    • Graphical user interface
• Kweelt
  • http://kweelt.sourceforge.net/
  • Open Source
  • Runs on all Java platforms
  • Problems
    • Older syntax, from previous W3C requirements
    • No graphical user interface

34 Example Application: Cruise Controller
• Vehicle cruise controller.
• Modelled with a process graph of 32 processes.
• Mapped on 5 nodes: CEM, ABS, ETM, ECM, TCM.

35 [Figure: schedule table showing processes P1 to P4 and messages m1 to m4 placed in slots S0 and S1 over rounds 1 to 5; 24 ms]

36 XML Model of the Cruise Controller
• architecture.xml: I 128
• behaviour.xml: 7 2 PR3 PR4 0
• mapping.xml: PR1 PR2 PR30
• schedule.xml: 0 0 P6 0 12 P1

37 Requirements on the Cruise Controller
• Requirements on the model
  • The model should be consistent
    • Every process should be mapped to one and only one node
    • Every sensor/actuator should be connected
  • The schedule should be correct
    • The schedule should respect the precedence constraints
    • No two slots in the schedule should overlap
• Cruise Controller
  • Timing requirements
    • The CC should execute within 100 ms
  • Resource requirements
    • The sum of processes’ memory on a node should not exceed that node's capacity
• Should be expressed in XQuery!

38 Resource Requirements: Query
The sum of processes’ memory on a node should not exceed that node's capacity
  for $map in document("data/sweb/mapping.xml")//MAP,
      $nod in document("data/sweb/architecture.xml")//:NODE[@Id = $map/@Resource]
  let $proc := document("data/sweb/behaviour.xml")//PROCESS[@Id = $map/Process]
  return
    <processor Name={$nod/@Name} Id={$nod/@Id}
               HasMemory={$nod/Memory/text(),$nod/Memory/@unit}
               MemoryUsedByScheduledProcesses={sum($proc/Memory),$nod/Memory/@unit}>
    {
      for $process in $proc
      return <process Name={$process/@Name} Id={$process/@Id}
                      Memory={$process/Memory/text(),$process/Memory/@unit} />
      sortby(int(substring-before(@Memory,"K")))
    }
    </processor>
  sortby(int(substring-after(@Id,"P")))
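The requirement itself reduces to a per-node sum followed by a comparison against the node's capacity. The Python sketch below illustrates that check only; the node names, memory sizes, mapping, and the function `memory_report` are all hypothetical and are not taken from the cruise controller model files.

```python
# Hypothetical model data (sizes in KB), invented for this illustration.
NODE_CAPACITY_KB = {"N1": 128, "N2": 64}
PROCESS_MEMORY_KB = {"P1": 40, "P2": 60, "P3": 50, "P4": 20}
MAPPING = {"P1": "N1", "P2": "N1", "P3": "N2", "P4": "N2"}  # process -> node


def memory_report(capacity, memory, mapping):
    """Return node -> (memory used, capacity, fits) for every node."""
    used = {node: 0 for node in capacity}
    for process, node in mapping.items():
        used[node] += memory[process]  # sum the memory of the mapped processes
    return {
        node: (used[node], capacity[node], used[node] <= capacity[node])
        for node in capacity
    }


for node, (used, cap, fits) in memory_report(
        NODE_CAPACITY_KB, PROCESS_MEMORY_KB, MAPPING).items():
    print(node, used, cap, "ok" if fits else "over capacity")
```

The query on the slide produces the same two quantities per processor (HasMemory and MemoryUsedByScheduledProcesses) and leaves the comparison to the reader of the result document.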

39 Resource Requirements: Result
query result: check_resource_consistency.xml
  … …

40 Use Case “XMP”: Experiences and Exemplars
xmp-data.xml
• TCP/IP Illustrated; Stevens W.; Addison-Wesley; 65.95
• Advanced Prog… the Unix environment; Stevens W.; Addison-Wesley; 65.95
• Data on the Web; Abiteboul Serge, Buneman Peter, Suciu Dan; Morgan Kaufmann Publishers; 39.95
• The Economics of Technology and Content for Digital TV; Gerbarg Darcy, CITI; Kluwer Academic Publishers; 129.95
XMPQ1.xquery
  <bib> {
    for $b in document("data/xmp-data.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and int($b/@year) > 1991
    return <book year={$b/@year}> { $b/title } </book>
  } </bib>

41 Use Case “XMP”: Experiences and Exemplars
Result of: List books published by Addison-Wesley after 1991, including their year and title.
• TCP/IP Illustrated
• Advanced Programming in the Unix environment

42 Use Case “TREE”: Qs that preserve hierarchy
tree-data.xml
  Data on the Web
  Serge Abiteboul, Peter Buneman, Dan Suciu
  Introduction: Text...
  Audience: Text...
  Web Data and the Two Cultures: Text...
  Traditional client/server architecture: Text...
  …. ….
TREEQ2.xquery
  { for $f in document("data/tree-data.xml")//figure
    return { ($f/@*,$f/title ) }

43 Use Case “TREE”: Qs that preserve hierarchy
Result of: Prepare a (flat) figure list for first book, listing all the figures and their titles. Preserve the original attributes of each element, if any.
• Traditional client/server architecture
• Graph representations of structures
• Examples of Relations

44 Use Case “TREE”: Qs that preserve hierarchy
TREEQ3.xquery
  ( { count(document("data/tree-data.xml")//section) },
    { count(document("data/tree-data.xml")//figure) } )
Result
  <quip:result xmlns:quip="http://namespaces.softwareag.com/tamino/quip/"> 7 3
TREEQ4.xquery
  { count(document("data/tree-data.xml")/book/section) }
Result
  <quip:result xmlns:quip="http://namespaces.softwareag.com/tamino/quip/"> 2

45 Use Case “SEQ”: Queries based on sequence
report1.xml
  Procedure
  The patient was taken to the operating room where she was placed in supine position and induced under general anesthesia. A Foley catheter was placed to decompress the bladder and the abdomen was then prepped and draped in sterile fashion. A curvilinear incision was made in the midline immediately infraumbilical and the subcutaneous tissue was divided using electrocautery. The fascia was identified and #2 0 Maxon stay sutures were placed on each side of the midline. The fascia was divided using electrocautery and the peritoneum was entered. …
SEQQ2.xquery
  for $s in document("data/report1.xml")//section[section.title = "Procedure"]
  let $instruments := $s//instrument
  for $i in 1 to 2
  return $instruments[$i]
Result of: In the Procedure section of Report1, what are the first two Instruments to be used?
  <quip:result xmlns:quip="http://namespaces.softwareag.com/tamino/quip/"> using electrocautery. electrocautery

46 Use Case “R”: Access to Relational Data
relational data
  USERS
  USERID  NAME            RATING
  U01     Tom Jones       B
  U02     Mary Doe        A
  U03     Dee Linquent    D
  U04     Roger Smith     C
  U05     Jack Sprat      B
  U06     Rip Van Winkle  B
  ITEMS
  ITEMNO  DESCR           O_BY  DATE               PRICE
  1001    Red Bicycle     U01   99-01-05 99-01-20  40
  1002    Motorcycle      U02   99-02-11 99-03-15  500
  1003    Old Bicycle     U02   99-01-10 99-02-20  25
  1004    Tricycle        U01   99-02-25 99-03-08  15
  1005    Tennis Racket   U03   99-03-19 99-04-30  20
  1006    Helicopter      U03   99-05-05 99-05-25  50000
  1007    Racing Bicycle  U04   99-01-20 99-02-20  200
  1008    Broken Bicycle  U01   99-02-05 99-03-06  25
  BIDS
  USERID  ITEMNO  BID  BID_DATE
  U02     1001    35   99-01-07
  U04     1001    40   99-01-08
  U02     1001    45   99-01-11
  U04     1001    50   99-01-13
  U02     1001    55   99-01-15
  U01     1002    400  99-02-14
  ….
RQ2.xquery
  { for $u in document("users.xml")//user_tuple
    for $i in document("items.xml")//item_tuple
    where $u/rating > "C" and $i/reserve_price > 1000 and $i/offered_by = $u/userid
    return { $u/name } { $u/rating } { $i/description } { $i/reserve_price } }

47 Use Case “R”: Access to Relational Data
Result of: Find cases where a user with a rating worse (alphabetically, greater) than "C" is offering an item with a reserve price of more than 1000.
  Dee Linquent  D  Helicopter  50000
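The query is a join between users and items with two filters. The Python sketch below spells out the same join on a subset of the rows shown in the tables; the dictionary field names follow the element names used in the query, and reading the PRICE column as the reserve price is an assumption.

```python
# Subset of the USERS and ITEMS rows from the slide; field names follow the query.
USERS = [
    {"userid": "U01", "name": "Tom Jones", "rating": "B"},
    {"userid": "U02", "name": "Mary Doe", "rating": "A"},
    {"userid": "U03", "name": "Dee Linquent", "rating": "D"},
    {"userid": "U04", "name": "Roger Smith", "rating": "C"},
]
ITEMS = [
    {"description": "Red Bicycle", "offered_by": "U01", "reserve_price": 40},
    {"description": "Motorcycle", "offered_by": "U02", "reserve_price": 500},
    {"description": "Helicopter", "offered_by": "U03", "reserve_price": 50000},
    {"description": "Racing Bicycle", "offered_by": "U04", "reserve_price": 200},
]

# Two FOR clauses form the cross product; the WHERE clause keeps matching pairs.
warnings = [
    (u["name"], u["rating"], i["description"], i["reserve_price"])
    for u in USERS
    for i in ITEMS
    if u["rating"] > "C"                  # string comparison: "D" > "C"
    and i["reserve_price"] > 1000
    and i["offered_by"] == u["userid"]    # join condition
]
print(warnings)
```

A real engine would typically use the join condition to avoid enumerating the full cross product; the nested loops here only mirror the declarative reading of the query.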

48 Use Case “R”: Access to Relational Data
  { for $u in document("data/R-users.xml")//user_tuple
    let $b := document("data/R-bids.xml")//bid_tuple[userid = $u/userid and int(string-value(bid)) >= 100]
    where count($b) > 1
    return { $u/name/text() }
Result of: List names of users who have placed multiple bids of at least $100 each.
Result:
• Mary Doe
• Dee Linquent
• Roger Smith

49 Use Case “PARTS”: Recursive Parts Explosion
parts-data.xml
  partlist
PARTSQ1a.xquery
  define function one_level(xs:AnyType $p, xs:AnyType $ps) returns xs:AnyType {
    { ($p/@partid,$p/@name,
       for $s in $ps[$p/@partid = @partof]
       return one_level($s,$ps) ) }
  let $ps := document("data/parts-data.xml")/partlist/part
  for $p in $ps[not(@partof)]
  return one_level($p,$ps)

50 Use Case “PARTS”: Recursive Parts Explosion
Result of: Convert the sample document from "partlist" format to "parttree" format (see DTD section for definitions). In the result document, part containment is represented by containment of one element inside another. Each part that is not part of any other part should appear as a separate top-level element in the output document.
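The conversion from a flat part list to a part tree is a short recursive function. The Python sketch below follows the shape of the one_level function in the query; the sample parts, the dictionary representation, and the "parts" key for nested children are assumptions made for this illustration.

```python
# Hypothetical flat part list; "partof" is absent for top-level parts.
PARTS = [
    {"partid": 0, "name": "car"},
    {"partid": 1, "partof": 0, "name": "engine"},
    {"partid": 2, "partof": 0, "name": "door"},
    {"partid": 3, "partof": 1, "name": "piston"},
    {"partid": 4, "partof": 2, "name": "window"},
    {"partid": 10, "name": "skateboard"},
    {"partid": 11, "partof": 10, "name": "board"},
]


def one_level(part, parts):
    """Return `part` as a nested node whose children are the parts it contains."""
    return {
        "partid": part["partid"],
        "name": part["name"],
        "parts": [
            one_level(sub, parts)            # recurse into each direct sub-part
            for sub in parts
            if sub.get("partof") == part["partid"]
        ],
    }


# Parts without a "partof" entry become the top-level elements of the result.
part_tree = [one_level(p, PARTS) for p in PARTS if "partof" not in p]

for top in part_tree:
    print(top["name"], [child["name"] for child in top["parts"]])
```

The recursion terminates as long as the part-of relation has no cycles, which is the same assumption the recursive query makes about its input.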

51 XSLT & XQuery: is there a difference?
xslt.xls
  <xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:for-each select="document('xmp-data.xml')/bib/book">
  <xsl:if test="publisher='Addison-Wesley' and @year>'1991'">
XMPQ1.xquery
  <bib> {
    for $b in document("data/xmp-data.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and int($b/@year) > 1991
    return <book year={$b/@year}> { $b/title } </book>
  } </bib>

52 XSLT & XQUERY
• XQuery result:
  • TCP/IP Illustrated
  • Advanced Programming in the Unix environment
• XSLT (Xalan engine) result:
  • TCP/IP Illustrated
  • Advanced Programming in the Unix environment

53 Summary
• Motivation
  • XML applications, types of queries
• Approaches
  • Requirements on a query language
  • Path expressions, the basic building block
  • XML query languages
• XQuery
  • Background, history
  • Concepts, examples
    • FLWR expressions
    • FOR and LET expressions
    • Collections and sorting
  • Available software, demo
• Examples
• XQuery vs. XSLT
• Summary

