Universal Database Systems


1 Universal Database Systems
Part 4: Databases and XML

2 Overview
Introduction to XML
DTDs and Schemas for XML Documents
Languages for XML, in particular XSL
Querying and Storing XML
Summary and Outlook
UDBS Part 4 - Winter 2001/2

3 In Detail
Query Usage Scenarios, Query Requirements, Query Data Model
XML Query Algebra as a foundation
XQuery: a query language for XML
Storing XML Documents in Databases
DBMS vendor developments

4 Why Query Languages for XML?
XML with SAX or DOM:
  SAX: very simple "event-based" queries
  DOM: simple navigational queries (getChildNodes, getNextSibling, getElementsByTagName, …)
These are "low-level" APIs, similar to an iterator/cursor API for RDBs, used to write XML applications
"High-level" querying, restructuring, and transformation is tedious with them
Thus, an analogue to the high-level relational query languages (SQL, QBE, Logic) is needed
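
To make the "low-level API" point concrete, here is a minimal Python sketch of a navigational DOM query using the standard `xml.dom.minidom` module. The catalog content, the element names, and the helper names `text_of` and `directors_of` are assumptions invented for illustration; they are not part of the original slides.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog, invented for this illustration.
CATALOG = """<moviecatalog>
  <movie><title>MI2</title><director>Director A</director></movie>
  <movie><title>Other</title><director>Director B</director></movie>
</moviecatalog>"""


def text_of(element):
    """Concatenate the text nodes directly below an element."""
    return "".join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE)


def directors_of(xml_text, wanted_title):
    """Walk the DOM by hand: visit each movie, test its title, collect directors."""
    doc = parseString(xml_text)
    result = []
    for movie in doc.getElementsByTagName("movie"):
        title = movie.getElementsByTagName("title")[0]
        if text_of(title) == wanted_title:
            director = movie.getElementsByTagName("director")[0]
            result.append(text_of(director))
    return result


print(directors_of(CATALOG, "MI2"))
```

Even this single selection needs an explicit loop, a test, and manual text extraction, which is the tedium a declarative query language is meant to remove.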

5 e-shopper's_heaven.com
How can e-shopper's heaven let Web users search or query its product database?
SELECT director FROM moviecatalog WHERE title = 'MI2'

6 History of XQuery
Dave Maier's requirements at QL'98
XML-QL shortly thereafter
Many other proposals between 1998 and 2000
Initial products, e.g., XQL in Tamino
XQuery Working Draft since Feb. 15, 2001 (revised June 7, 2001 and December 20, 2001)

7 XML Query Specifications (Working drafts, ongoing work!)
XML Query Requirements
XML Query Use Cases
XQuery 1.0 and XPath 2.0 Data Model
XQuery 1.0: An XML Query Language
XML Syntax for XQuery 1.0 (XQueryX)
XQuery 1.0 and XPath 2.0 Functions and Operators Version 1.0
XML Path Language (XPath) 2.0

8 Usage Scenarios for XML Queries
Queries on structured, human-readable documents or collections of documents to create other documents
Queries on XML data (from multiple sources) to create new XML data
Mixed document- and data-oriented queries
Queries on administrative data such as configuration files or user profiles
Filtering queries on streams of XML data

9 Usage Scenarios (cont'd)
Queries on DOM structures to return sets of nodes
Queries on collections of documents managed by native XML repositories
Queries to search catalogs that describe document servers, document types, XML schemas, or documents
Queries in multiple syntactic environments, e.g., a URL, an XML page, a JSP or ASP page

10 Corresponding Use Cases
XMP: Experiences and Exemplars
TREE: Queries that preserve hierarchy
SEQ: Queries based on Sequence
R: Access to Relational Data
SGML: Standard Generalized Markup Language
TEXT: Full-text Search
NS: Queries Using Namespaces
PARTS: Recursive Parts Explosion
REF: Queries based on Reference
FNPARM: Functions and Parameters

11 Use Case XMP: DTD and Data
DTD:
  <!ELEMENT bib (book*)>
  <!ELEMENT book (title, (author+ | editor+), publisher, price)>
  <!ATTLIST book year CDATA #REQUIRED>
  <!ELEMENT author (last, first)>
  <!ELEMENT editor (last, first, affiliation)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT affiliation (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
Sample Data:
  <bib>
    <book year="1994">
      <title>TCP/IP Illustrated</title>
      <author><last>Stevens</last><first>W.</first></author>
      <publisher>Addison-Wesley</publisher>
      <price>65.95</price>
    </book>
  </bib>

12 Use Case XMP: Sample Queries
Q1: List books published by Addison-Wesley after 1991, including their year and title.
Q4: For each author in the bibliography, list the author's name and the titles of all books by that author, grouped inside a "result" element.
Q11: For each book with an author, return the book with its title and authors. For each book with an editor, return a reference with the book title and the editor's affiliation.

13 Use Case R: Background
Relational data(base) represented as an XML document
Tables become elements with their tuples nested inside
Queries in XQuery can be run on this XML representation of the actual tables
Example: database used by an online auction
  USERS: information on registered users
  ITEMS: items currently/recently for sale
  BIDS: all bids on record, keyed by the id of the bidder and the number of the item to which the bid applies

14 Use Case R: Relational Data & DTD
Tables:
  USERS ( USERID, NAME, RATING )
  ITEMS ( ITEMNO, DESCRIPTION, OFFERED_BY, START_DATE, END_DATE, RESERVE_PRICE )
  BIDS ( USERID, ITEMNO, BID, BID_DATE )
DTDs:
  <!DOCTYPE users [
    <!ELEMENT users (user_tuple*)>
    <!ELEMENT user_tuple (userid, name, rating?)>
    <!ELEMENT userid (#PCDATA)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT rating (#PCDATA)>
  ]>
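
The table-to-element mapping can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the helper `table_to_xml` and the two sample rows are hypothetical, and a NULL rating is modelled by omitting the optional `rating` element, as the `rating?` in the DTD permits.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table_name, tuple_name, columns, rows):
    """Wrap each row in a tuple element and each column value in a child element."""
    root = ET.Element(table_name)
    for row in rows:
        tup = ET.SubElement(root, tuple_name)
        for column, value in zip(columns, row):
            if value is None:  # optional column: leave the element out
                continue
            ET.SubElement(tup, column).text = str(value)
    return root


# Hypothetical rows, invented for illustration.
users = table_to_xml(
    "users", "user_tuple", ["userid", "name", "rating"],
    [("U01", "Example User", "B"), ("U02", "Another User", None)],
)
print(ET.tostring(users, encoding="unicode"))
```

The resulting document is what a query over the USERS table would then navigate, one `user_tuple` per relational tuple.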

15 Use Case R: Sample Queries
Q2: For all bicycles, list the item number, description, and highest bid (if any), ordered by item number.
Q4: List item numbers and descriptions of items that have no bids.
Q14: List item numbers and average bids for items that have received three or more bids, in descending order by average bid.

16 Maier's Requirements for a Query Language
Closedness: language maps XML to XML
Precise semantics: formally defined
Optimizability: queries can be improved prior to execution to achieve better performance
Adequateness: language makes full use of the various XML features

17 More Requirements
Query operations: language supports selection, extraction, reduction, restructuring, and combination (join)
No schema needed: language can be used in the absence of a DTD
Schema exploitation: if a DTD is available, it can be used for syntax checking
Server-side processing: queries are independent of their creation context

18 W3C Query Requirements
Query language may have more than one syntax binding; one syntax must be expressed in XML
Declarativity
Protocol independence
Defined error conditions
Update capabilities in future versions
Defined for finite instances

19 XQuery Data Model
Also the data model of XSLT 2.0 (W3C Working Draft)
Relies on the XML Information Set (W3C Recommendation), plus:
  represents both XML 1.0 character data and the simple and complex types of XML Schema
  represents collections of documents and collections of simple and complex values
Schema availability
Namespace awareness

20 Data Model - Formally
Node-labeled, tree-constructor representation with node identity
7 kinds of tree nodes: document, element, attribute, namespace, processing_instruction, comment, text
Functions to construct tree nodes ("constructors")
Functions to access nodes' structure ("accessors")

21 XQuery 1.0 Formal Semantics (As of June 2001)
Is closed (XML fragment -> XML fragment)
Is a functional language, no side effects
Provides a semantics for the query language
Supports query optimization through commutativity, associativity, and other laws
Features include attributes, namespaces, scalar types, element identity, collation, key constraints
Is relationally complete

22 Sample Data
<bib>
  <book year="2000" isbn=" X">
    <title> Data on the Web </title>
    <author> Abiteboul </author>
    <author> Buneman </author>
    <author> Suciu </author>
  </book>
  <book year="2001" isbn="1-xxxxx-yyy-z">
    <title> XML Query </title>
    <author> Fernandez </author>
    <author> Suciu </author>
  </book>
</bib>

23 XSD Fragment
<xsd:group name="Bib">
  <xsd:element name="bib">
    <xsd:complexType>
      <xsd:group ref="Book" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:complexType>
  </xsd:element>
</xsd:group>
<xsd:group name="Book">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:attribute name="year" type="xsd:integer"/>
      <xsd:attribute name="isbn" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:complexType>
  </xsd:element>
</xsd:group>

24 Algebra Representation
TYPE Bib = ELEMENT bib (Book*)
TYPE Book = ELEMENT book (
  ATTRIBUTE year (xs:integer) & ATTRIBUTE isbn (xs:string)
  ELEMENT title (xs:string),
  (ELEMENT author (xs:string))+)
LET $bib0 :=
  <bib>
    <book year="2000" isbn=" X"><title>Data on the Web</title>
      <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
    </book>
    <book year="2001" isbn="1-XXXXX-YYY-Z"><title>XML Query</title>
      <author>Fernandez</author><author>Suciu</author>
    </book>
  </bib> : Bib
RETURN …
Annotations: the two TYPE declarations define 2 types; $bib0 is a global variable, bound to a literal XML value

25 Algebra Characteristics
Strongly typed, i.e., the value of variable $bib0 must be an instance of its declared type (or the expression is ill-typed)
Operations:
  Projection, Iteration, Selection
  Quantification
  Join, Restructuring, Aggregation
  Functions, Structural Recursion

26 Projection
Return all author elements contained in book elements contained in $bib0:
1. an expression:
  $bib0/book/author
2. the value of the expression:
  ==> (<author>Abiteboul</author>, <author>Buneman</author>, <author>Suciu</author>, <author>Fernandez</author>, <author>Suciu</author>)
3. the type of the expression:
  : (ELEMENT author (xs:string))*

27 Notes on Projection
Order of author elements is preserved
Duplicate elements are preserved
Although a book can have >= 1 (+) authors, the query result may contain >= 0 (*) authors
The type of an expression depends only on the type of its subexpressions
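
The behaviour described in these notes can be checked with a small Python analogue of `$bib0/book/author`, using the path support in `xml.etree.ElementTree`. This is a sketch, not the algebra itself; the function name `project_authors` is hypothetical and the `isbn` attributes of the sample data are left out for brevity.

```python
import xml.etree.ElementTree as ET

# The sample bibliography from the slides, without the isbn attributes.
BIB0 = """<bib>
  <book year="2000"><title>Data on the Web</title>
    <author>Abiteboul</author><author>Buneman</author><author>Suciu</author>
  </book>
  <book year="2001"><title>XML Query</title>
    <author>Fernandez</author><author>Suciu</author>
  </book>
</bib>"""


def project_authors(xml_text):
    """Return the text of every author below a book, in document order."""
    bib = ET.fromstring(xml_text)
    return [author.text for author in bib.findall("book/author")]


print(project_authors(BIB0))
print(project_authors("<bib/>"))
```

The first call keeps document order and keeps "Suciu" twice; the second returns an empty list for an empty bibliography, which is why the result type is `author*` even though each book requires `author+`.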

28 Goals of XQuery
Design a small, clean, easily implementable language
Cover the functionality required by all the XML Query use cases in a single language
Write queries that are concise and easily understood
Derived from Quilt, which in turn borrowed from XPath, XQL, XML-QL, SQL, and OQL

29 Antecedents: XPath and XQL
Closely-related languages for navigating in a hierarchy
A path expression is a series of steps
Each step moves along an axis (children, ancestors, attributes, etc.) and may apply a predicate
XPath has an abbreviated syntax, adapted from XQL:
  /book[title = "War and Peace"]
    /chapter[title = "War"]
      //figure[contains(caption, "Guns")]
XQL has some additional operators: BEFORE, AFTER, ...

30 Antecedent: XML-QL
Proposed by Deutsch, Fernandez, Florescu, Levy, Suciu
WHERE-clause binds variables according to a pattern, CONSTRUCT-clause generates output:
  WHERE <part pno = $pno> $pname </> in "parts.xml",
        <supp sno = $sno> $sname </> in "supp.xml",
        <sp pno = $pno sno = $sno> </> in "sp.xml"
  CONSTRUCT
    <purchase>
      <partname> $pname </>
      <suppname> $sname </>
    </purchase>

31 Antecedents: SQL and OQL
SQL and OQL are database query languages
SQL derives a new table from other tables by a series of clauses: SELECT - FROM - WHERE
OQL is a functional language
  A query is an expression
  Expressions can take several forms
  Expressions can be nested and combined
  SELECT-FROM-WHERE is one form of OQL expression

32 XQuery: XML Query Language
Functional language
  A query is an expression
  Expressions can be nested
  Strongly typed (operands must conform to designated types)
Some SQL design errors corrected
Semantics: based on "core syntax"
Proposed by Chamberlin, Clark, Florescu, Robie, Simeon, Stefanescu

33 Types of Expressions
Primary expressions (variable, literal, function call …)
Path expressions
Sequence expressions
Arithmetic expressions
Comparison expressions
Logical expressions
Constructors
FLWR expressions
Sorting expressions
Conditional expressions
Quantified expressions
Expressions on data types

34 Sample Expressions
A path expression (using abbreviated XPath syntax):
  document("bids.xml")//bid[itemno="47"]/bid_amount
An expression using operators and functions:
  ($x + $y) * foo($z)
An element constructor:
  <bid>
    <userid> {$u} </userid>
    <bid_amount> {$a} </bid_amount>
  </bid>

35 A Sample FLWR Expression
"Find the description and average price of each red part that has at least 10 orders"
  FOR $p IN document("parts.xml")//part[color = "Red"]
  LET $o := document("orders.xml")//order[partno = $p/partno]
  WHERE count($o) >= 10
  RETURN
    <important_red_part>
      {$p/description}
      <avg_price> {avg($o/price)} </avg_price>
    </important_red_part>

36 A FLWR Expression
A FLWR expression binds some variables, applies a predicate, and constructs a new result.
Syntax: ( FOR_clause | LET_clause )+ WHERE_clause? RETURN_clause
  ( FOR | LET )+ ... WHERE? ... RETURN ...
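
As a rough analogue, the bind / filter / construct steps of the "red parts" query can be written out by hand in Python. This is a sketch under stated assumptions: the function `important_red_parts`, its `min_orders` parameter, and the tiny sample documents are invented for illustration, and the description element is rebuilt from its text rather than copied as XQuery would do.

```python
import xml.etree.ElementTree as ET


def important_red_parts(parts_root, orders_root, min_orders=10):
    results = []
    for p in parts_root.iter("part"):                 # FOR $p IN ...//part
        if p.findtext("color") != "Red":              # predicate [color = "Red"]
            continue
        partno = p.findtext("partno")
        o = [x for x in orders_root.iter("order")     # LET $o := matching orders
             if x.findtext("partno") == partno]
        if len(o) >= min_orders:                      # WHERE count($o) >= N
            out = ET.Element("important_red_part")    # RETURN constructs a result
            ET.SubElement(out, "description").text = p.findtext("description")
            prices = [float(x.findtext("price")) for x in o]
            ET.SubElement(out, "avg_price").text = str(sum(prices) / len(prices))
            results.append(out)
    return results


# Hypothetical data, invented for illustration.
parts = ET.fromstring(
    "<parts>"
    "<part><partno>1</partno><color>Red</color><description>bolt</description></part>"
    "<part><partno>2</partno><color>Blue</color><description>nut</description></part>"
    "</parts>")
orders = ET.fromstring(
    "<orders>"
    "<order><partno>1</partno><price>2.0</price></order>"
    "<order><partno>1</partno><price>4.0</price></order>"
    "<order><partno>2</partno><price>9.0</price></order>"
    "</orders>")

found = important_red_parts(parts, orders, min_orders=2)
print([ET.tostring(r, encoding="unicode") for r in found])
```

The threshold is lowered to 2 in the usage line only so that the small sample produces a result; with the default of 10 the list is empty.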

37 Data Flow in a FLWR Expression

38 FOR Clause
Syntax: FOR variable IN expression , variable IN expression , ...
FOR is used for iterating over one or more collections
FOR introduces one or more variables, associating an expression with each
Tuples of variable bindings are drawn from the Cartesian product of the sequences of values to which the expressions evaluate
Variable bindings are generated as an ordered sequence
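
The Cartesian-product semantics of a FOR clause with several variables can be imitated in Python with `itertools.product`. The helper `binding_tuples` and the sample sequences are hypothetical; the point is only the number and order of the generated binding tuples.

```python
from itertools import product


def binding_tuples(**sequences):
    """One dict per binding tuple; the first variable is the outermost loop."""
    names = list(sequences)
    return [dict(zip(names, combo)) for combo in product(*sequences.values())]


# Two variables bound to sequences of 2 and 3 items give 2 * 3 = 6 tuples.
tuples = binding_tuples(b=["book1", "book2"], v=["video1", "video2", "video3"])
for t in tuples:
    print(t)
```

The last variable varies fastest, which matches reading the FOR clause as nested loops in the order the variables are written.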

39 LET Clause
Syntax: LET variable := expression , variable := expression , ...
LET is also used for binding variables (without iteration)
A LET clause produces a single binding for each variable (therefore it does not affect the number of binding tuples)
The variable is bound to the value of the expression, which may contain many nodes
Document order is preserved among the nodes in each bound collection, unless the expression contains a non-order-preserving function such as distinct( )

40 FOR vs. LET
FOR $x IN /library/book
  results in many bindings, each of which binds $x to one book in the library
LET $x := /library/book
  results in a single binding which binds $x to a list containing all books in the library
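
The difference in the number of bindings can be mimicked in a few lines of Python. The function names and the `library` list are hypothetical stand-ins for `/library/book`.

```python
def for_bindings(books):
    """FOR $x IN books: one binding tuple per item."""
    return [{"x": book} for book in books]


def let_bindings(books):
    """LET $x := books: a single binding tuple holding the whole sequence."""
    return [{"x": list(books)}]


library = ["book A", "book B", "book C"]  # stand-in for /library/book
print(len(for_bindings(library)))  # 3
print(len(let_bindings(library)))  # 1
```

A RETURN clause following the FOR would therefore run three times, while one following the LET would run once with all three books available at the same time.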

41 WHERE Clause
Syntax: WHERE boolean-expression
Applies predicate(s) to the tuples of bound variables
Retains only tuples that satisfy the predicate(s)
Preserves order of tuples, if any
May contain AND and OR
Applies scalar conditions to variables bound in a FOR clause (to individual nodes), e.g., $p/color = "Red"
Applies set conditions to variables bound by a LET clause (to sequences of nodes), e.g., avg($p/price) > 100

42 RETURN Clause
Syntax: RETURN expression
Constructs the result of the FLWR expression, which may be a value, a node, or an ordered forest of nodes
Executed once for each tuple of bound variables generated by FOR and LET and satisfying WHERE
Preserves order of tuples, if any, or can impose a new order using a SORTBY clause
Often contains references to bound variables, nested subexpressions, or an element constructor, e.g.,
  <item>
    {$item/itemno}
    <avg_bid> {avg($b/bid_amount)} </avg_bid>
  </item> SORTBY itemno

43 Example 1 (a)
In XQuery:
  FOR $b IN …
  RETURN <TechnicalBook> {$b/title} </TechnicalBook>
In XSL:
  <xsl:template match="/">
    <xsl:for-each select="…">
      <TechnicalBook> <xsl:value-of select="TITLE"/> </TechnicalBook>
    </xsl:for-each>
  </xsl:template>

44 Example 1 (b)
In XQuery:
  FOR $b IN document("eshoppers.xml")//BOOK,
      $v IN document("eshoppers.xml")//VIDEO
  WHERE $b/YEAR = $v/YEAR
  RETURN
    <BookAndVideoInYear>
      <Year> {$b/YEAR} </Year>
      <Book> {$b/TITLE} </Book>
      <Video> {$v/TITLE} </Video>
    </BookAndVideoInYear>
In XSL:
  <xsl:template match="/">
    <xsl:for-each select="//BOOK[YEAR]">
      <xsl:variable name="book" select="."/>
      <xsl:for-each select="//VIDEO[YEAR=$book/YEAR]">
        <xsl:variable name="video" select="."/>
        <BookAndVideoInYear>
          <Year><xsl:value-of select="$book/YEAR"/></Year>
          <Book><xsl:value-of select="$book/TITLE"/></Book>
          <Video><xsl:value-of select="$video/TITLE"/></Video>
        </BookAndVideoInYear>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

45 Sample Document: bib.xml
bib.xml has the following structure:
  <bib>
    <book>
      <title> </title>
      <author> </author>
      <publisher> ... </publisher>
      <year> </year>
      <price> </price>
    </book>
  </bib>

46 Simple FLWR Queries
Find all the books published in 2002 by Morgan Kaufmann:
  FOR $b IN document("bib.xml")//book
  WHERE $b/year = "2002"
    AND $b/publisher = "Morgan Kaufmann"
  RETURN $b SORTBY(author, title)
Find titles of books that have no authors:
  <orphan_books> {
    FOR $b IN document("bib.xml")//book
    WHERE empty($b/author)
    RETURN $b/title
  } </orphan_books>

47 More FLWRs
List each publisher and the average price of its books:
  FOR $p IN distinct(document("bib.xml")//publisher)
  LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
  RETURN
    <publisher>
      <name> {$p/text()} </name>
      <avgprice> {$a} </avgprice>
    </publisher>

48 More FLWRs
List the publishers who have published more than 100 books:
  <big_publishers> {
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $b := document("bib.xml")//book[publisher = $p]
    WHERE count($b) > 100
    RETURN $p
  } </big_publishers>

49 A Nested Query
Invert the hierarchy from publishers inside books to books inside publishers:
  FOR $p IN distinct(document("bib.xml")//publisher)
  RETURN
    <publisher>
      <name> {$p/text()} </name>
      { FOR $b IN document("bib.xml")//book[publisher = $p]
        RETURN <book> {$b/title} {$b/price} </book>
          SORTBY(price DESCENDING) }
    </publisher> SORTBY(name)
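
The same inversion can be sketched in Python to show what the nested query computes: an outer loop over distinct publishers and an inner loop over that publisher's books. The function `books_by_publisher` and the three-book sample are assumptions made for illustration; prices are compared numerically, which the slide's SORTBY does not spell out.

```python
import xml.etree.ElementTree as ET


def books_by_publisher(bib_root):
    """Group books under their publisher: publishers by name, books by price descending."""
    names = sorted({b.findtext("publisher") for b in bib_root.iter("book")})
    result = []
    for name in names:                                   # outer FOR over publishers
        pub = ET.Element("publisher")
        ET.SubElement(pub, "name").text = name
        books = [b for b in bib_root.iter("book")        # inner FOR over matching books
                 if b.findtext("publisher") == name]
        books.sort(key=lambda b: float(b.findtext("price")), reverse=True)
        for b in books:
            book = ET.SubElement(pub, "book")
            ET.SubElement(book, "title").text = b.findtext("title")
            ET.SubElement(book, "price").text = b.findtext("price")
        result.append(pub)
    return result


# Hypothetical bibliography, invented for illustration.
BIB = ET.fromstring(
    "<bib>"
    "<book><title>T1</title><publisher>B Press</publisher><price>10</price></book>"
    "<book><title>T2</title><publisher>A Press</publisher><price>20</price></book>"
    "<book><title>T3</title><publisher>A Press</publisher><price>30</price></book>"
    "</bib>")

inverted = books_by_publisher(BIB)
for pub in inverted:
    print(ET.tostring(pub, encoding="unicode"))
```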

50 More FLWRs
For each book whose price is greater than the average price, return the title of the book and the amount by which the book's price exceeds the average price:
  <result> {
    LET $a := avg(document("bib.xml")//book/price)
    FOR $b IN document("bib.xml")//book
    WHERE $b/price > $a
    RETURN
      <expensive_book>
        {$b/title}
        <price_difference> {$b/price - $a} </price_difference>
      </expensive_book>
  } </result>

51 Conditional Expressions
Syntax: IF expr1 THEN expr2 ELSE expr3
Make a list of holdings, ordered by title. For journals, include the editor; otherwise include the author:
  FOR $h IN //holding
  RETURN
    <holding> {
      $h/title,
      IF … = "Journal"
      THEN $h/editor
      ELSE $h/author
    } </holding> SORTBY(title)

52 Quantified Expressions: Some
Syntax: ( SOME | EVERY ) var IN expr SATISFIES predicate
Quantified expressions are a form of predicate (they return a Boolean)
Find titles of books in which both sailing and windsurfing are mentioned in the same paragraph:
  FOR $b IN document("bib.xml")//book
  WHERE SOME $p IN $b//para SATISFIES
    (contains($p, "Sailing") AND contains($p, "Windsurfing"))
  RETURN $b/title

53 Quantified Expressions: Every
Find titles of books in which sailing is mentioned in every paragraph:
  FOR $b IN //book
  WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
  RETURN $b/title

54 Nested Quantifications
Let employees have multiple skills and multiple duties. Find names of employees who have some duty that is not matched by a skill:
  FOR $e IN //emp
  WHERE SOME $d IN $e/duty SATISFIES
    not(SOME $s IN $e/skill SATISFIES $s = $d)
  RETURN $e/name
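
SOME and EVERY correspond closely to Python's built-in `any` and `all`, so the nested quantification can be sketched as follows. The helpers `some` and `every`, the dictionary representation of employees, and the sample data are hypothetical.

```python
def some(sequence, predicate):
    """SOME $x IN sequence SATISFIES predicate($x)."""
    return any(predicate(x) for x in sequence)


def every(sequence, predicate):
    """EVERY $x IN sequence SATISFIES predicate($x); true for an empty sequence."""
    return all(predicate(x) for x in sequence)


def names_with_unmatched_duty(employees):
    """Employees having some duty for which no skill is equal to it."""
    return [e["name"] for e in employees
            if some(e["duty"],
                    lambda d: not some(e["skill"], lambda s: s == d))]


# Hypothetical employees, invented for illustration.
employees = [
    {"name": "Ann", "duty": ["audit", "plan"], "skill": ["audit"]},
    {"name": "Bob", "duty": ["code"], "skill": ["code", "test"]},
    {"name": "Cy", "duty": [], "skill": []},
]
print(names_with_unmatched_duty(employees))
```

Only Ann is returned: her "plan" duty has no matching skill, Bob's single duty is covered, and Cy has no duty that could be unmatched.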

55 Functions
XQuery has a library of built-in functions, e.g., document, avg, sum, count, max, min, distinct, empty
A query can define its own local functions
Version 1 does not allow user-defined functions to be overloaded (multiple functions under the same name)
The filter function can select a set of nodes from a hierarchy while preserving the original relationships among these nodes

56 Functions
Functions can be recursive
Example: "Compute the maximum depth of nested parts in the document named partlist.xml"
  NAMESPACE xsd = "…"
  DEFINE FUNCTION depth($e) RETURNS xsd:integer
  {
    {-- empty element has depth 1 --}
    {-- otherwise, add 1 to max depth of children --}
    IF (empty($e/*)) THEN 1
    ELSE max(depth($e/*)) + 1
  }
  depth(document("partlist.xml"))
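
The recursion stated in the two comments (an empty element has depth 1; otherwise add 1 to the maximum depth of the children) can be sketched in Python over an `xml.etree.ElementTree` element. The sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def depth(element):
    """Depth of an element tree: 1 for a leaf, else 1 + the deepest child."""
    children = list(element)
    if not children:          # empty element has depth 1
        return 1
    return 1 + max(depth(child) for child in children)


print(depth(ET.fromstring("<part/>")))
print(depth(ET.fromstring("<part><part><part/></part><part/></part>")))
```

The first call prints 1 and the second prints 3, since the longest chain of nested `part` elements has three levels.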

57 Another Function
A function that returns all the elements that are "connected" to a given element by child or reference connections, and a recursive function that returns all the elements that are "reachable" from a given element by child or reference connections.
In "company.xml", find all the elements that are reachable from the employee with serial number 12345 by child or reference connections:
  DEFINE FUNCTION connected($e)
  { $e/* UNION … }
  DEFINE FUNCTION reachable($e)
  { $e UNION reachable(connected($e)) }
  reachable(document("company.xml")/emp[serial="12345"])

58 The FILTER Function
Syntax: FILTER ( expression )
Example: the result contains copies of all nodes of type A and B in the original hierarchy, with their original relationships preserved

59 Projection (Filtering a document)
"Generate a table of contents for cookbook.xml containing nested sections and their titles"
  <toc> {
    let $b := document("book1.xml")
    return filter($b//section | $b//section/title | $b//section/title/text())
  } </toc>

60 A Join Example
Generate a "descriptive catalog" derived from the catalog document, but containing part descriptions instead of part numbers and supplier names instead of supplier numbers. Order the new catalog alphabetically by part description and secondarily by supplier name:
  <descriptive-catalog> {
    FOR $i IN document("catalog.xml")//item,
        $p IN document("parts.xml")//part[partno = $i/partno],
        $s IN document("suppliers.xml")//supplier[suppno = $i/suppno]
    RETURN
      <item> { $p/description, $s/suppname, $i/price } </item>
        SORTBY(description, suppname)
  } </descriptive-catalog>
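
The three-way join can be sketched in Python by first indexing parts and suppliers by their keys and then looking each catalog item up in both indexes. This is an illustration under stated assumptions: `descriptive_catalog` is a hypothetical helper, part and supplier numbers are assumed to be unique, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET


def descriptive_catalog(catalog, parts, suppliers):
    # Index the two "lookup" documents by key (keys assumed unique).
    part_desc = {p.findtext("partno"): p.findtext("description")
                 for p in parts.iter("part")}
    supp_name = {s.findtext("suppno"): s.findtext("suppname")
                 for s in suppliers.iter("supplier")}
    rows = []
    for item in catalog.iter("item"):
        pno, sno = item.findtext("partno"), item.findtext("suppno")
        if pno in part_desc and sno in supp_name:      # inner join: both must match
            rows.append((part_desc[pno], supp_name[sno], item.findtext("price")))
    rows.sort(key=lambda r: (r[0], r[1]))              # SORTBY(description, suppname)
    root = ET.Element("descriptive-catalog")
    for description, suppname, price in rows:
        out = ET.SubElement(root, "item")
        ET.SubElement(out, "description").text = description
        ET.SubElement(out, "suppname").text = suppname
        ET.SubElement(out, "price").text = price
    return root


# Hypothetical documents, invented for illustration.
catalog = ET.fromstring(
    "<catalog>"
    "<item><partno>P2</partno><suppno>S1</suppno><price>5</price></item>"
    "<item><partno>P1</partno><suppno>S2</suppno><price>7</price></item>"
    "<item><partno>P1</partno><suppno>S1</suppno><price>6</price></item>"
    "</catalog>")
parts = ET.fromstring(
    "<parts>"
    "<part><partno>P1</partno><description>anchor</description></part>"
    "<part><partno>P2</partno><description>bolt</description></part>"
    "</parts>")
suppliers = ET.fromstring(
    "<suppliers>"
    "<supplier><suppno>S1</suppno><suppname>Acme</suppname></supplier>"
    "<supplier><suppno>S2</suppno><suppname>Zenith</suppname></supplier>"
    "</suppliers>")

result = descriptive_catalog(catalog, parts, suppliers)
print(ET.tostring(result, encoding="unicode"))
```

Building the two dictionaries first is one possible evaluation strategy (a hash join); the query itself only states which items belong together and leaves such choices to the processor.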

61 SQL vs. XQuery
SQL:
  SELECT var.title, var.isbn, var.year
  FROM bookcatalog AS var
  WHERE var.year > 1998
  ORDER BY var.year DESC
XQuery:
  <RESULT> {
    FOR $var IN document("catalog.xml")//book
    WHERE $var/year > 1998
    RETURN
      <BOOK> { $var/title, $var/isbn, $var/year } </BOOK>
        SORTBY(year DESCENDING)
  } </RESULT>

62 XQuery 1.0
Many additional features, e.g., sequence-related operators, operations on data types, notion of a "query module"
Many issues still under discussion (many changes from 02/15/01 to 06/07/01 and again to 12/20/01)
A formal grammar is presented in the W3C document
Moreover, XQueryX is an XML representation of XQuery

63 XQuery Processors
Kawa-XQuery
  Partial implementation
  Queries are compiled into Java bytecode
Microsoft's XQuery Language Demo

64 Summary on XQuery
Declarative query language for XML
Likely to become the standard for querying XML
Still a working draft
Implementations under development

