
1 XML. May 1st, 2002

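2 XML for Representing Data. The relation persons(name, phone) has the rows (“John”, 3634), (“Sue”, 6343), (“Dick”, 6363). In XML, the same data is a persons element containing one row element per tuple, each row holding a name and a phone: John 3634, Sue 6343, Dick 6363.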
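A minimal sketch of this mapping in Python, using the standard xml.etree.ElementTree module. The function name relation_to_xml is hypothetical; the element names persons, row, name and phone are taken from the slide.

```python
import xml.etree.ElementTree as ET

# Rows of the persons(name, phone) relation from the slide.
rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]

def relation_to_xml(rows):
    """Build a <persons> element with one <row> child per tuple."""
    persons = ET.Element("persons")
    for name, phone in rows:
        row = ET.SubElement(persons, "row")
        ET.SubElement(row, "name").text = name
        ET.SubElement(row, "phone").text = phone
    return persons

persons = relation_to_xml(rows)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

Note that the tags name and phone appear once per row in the output, whereas the relational schema states them only once.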

3 XML vs Other Data Models. XML is self-describing: schema elements become part of the data. –Relational schema: persons(name,phone) –In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times. Consequence: XML is much more flexible. XML = semistructured data.

4 Semistructured Data Explained. Missing attributes: John 1234; Joe ← no phone! Repeated attributes: Mary 2345 3456 ← two phones!
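The following sketch shows how a consumer might cope with missing and repeated attributes. The element names persons, person, name and phone are assumptions, since the tags on this slide were not preserved; only the values come from the slide.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the slide's three people.
doc = ET.fromstring("""
<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>
""")

# findall returns a list, so zero, one, or many phones are handled uniformly.
phones = {
    p.findtext("name"): [ph.text for ph in p.findall("phone")]
    for p in doc.findall("person")
}
print(phones)
```

Treating every attribute as a list is one simple way to absorb both irregularities; a fixed-width relational row could not hold Joe or Mary without nulls or a second table.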

5 Semistructured Data Explained. Attributes with different types in different objects. Nested collections (no 1NF). Heterogeneous collections: –one element contains children of two different element types. Example: John Smith 1234 ← structured name!

6 XML Data vs. E/R, ODL, Relational. Q: Is XML better or worse? A: It serves different purposes. –E/R, ODL, relational models: for centralized processing, when we control the data. –XML: data sharing between different systems, where we do not have control over the entire data, e.g. on the Web. Do NOT use XML to model your data! Use E/R, ODL, or relational instead.

7 Data Sharing with XML: Easy. [Diagram: data source (e.g. relational database), XML, Web, application]

8 Exporting Relational Data to XML. Product(pid, name, weight); Company(cid, name, address); Makes(pid, cid, price). [Diagram: product and company connected by makes]

9 Export Data Grouped by Companies. GizmoWorks, Tacoma: gizmo 19.99, …; Bang, Kirkland: gizmo 22.99, …; … Redundant representation of products.
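A rough sketch of the company-grouped export, assuming the three relations are held as plain Python structures. The function name export_by_company, the wrapper element db, the ids and the product weight are all hypothetical; the company names, addresses and prices are from the slide.

```python
import xml.etree.ElementTree as ET

products = {1: ("gizmo", 10)}                      # pid -> (name, weight); weight is made up
companies = {1: ("GizmoWorks", "Tacoma"), 2: ("Bang", "Kirkland")}  # cid -> (name, address)
makes = [(1, 1, "19.99"), (1, 2, "22.99")]         # (pid, cid, price)

def export_by_company(products, companies, makes):
    """Nest each company's products under that company."""
    db = ET.Element("db")
    for cid, (cname, address) in companies.items():
        company = ET.SubElement(db, "company")
        ET.SubElement(company, "name").text = cname
        ET.SubElement(company, "address").text = address
        for pid, made_by, price in makes:
            if made_by == cid:
                product = ET.SubElement(company, "product")
                ET.SubElement(product, "name").text = products[pid][0]
                ET.SubElement(product, "price").text = price
    return db

db = export_by_company(products, companies, makes)
gizmo_copies = db.findall("company/product/name")
print(len(gizmo_copies))  # the single gizmo product is written once per company
```

The one Product tuple ends up copied under both companies, which is the redundancy the slide points out.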

10 The DTD

11 Export Data by Products. Gizmo: GizmoWorks 19.99 Tacoma, Bang 22.99 Kirkland, …; OneClick: …; … Redundant representation of companies.

12 Which One Do We Choose? The structure of the XML data is determined by agreement with our partners, or dictated by committees. XML data is often nested, irregular, etc. There are no normal forms for XML.

13 Storing XML Data. We get lots of XML data from the Web; how do we store it? Ideally: convert to relational data and store it in an RDBMS. This is much harder than exporting relations to XML (why?). DB vendors are currently working on tools for loading XML data into an RDBMS.

14 XML Query Languages: XPath, XML-QL, XQuery

15 An Example of XML Data. Book 1: publisher Addison-Wesley; authors Serge Abiteboul, Rick Hull, Victor Vianu; title Foundations of Databases; year 1995. Book 2: publisher Freeman; author Jeffrey D. Ullman; title Principles of Database and Knowledge Base Systems; year 1998.

16 XPath Syntax for XML document navigation and node selection A recommendation of the W3C (i.e. a standard) Building block for other W3C standards: – XSL Transformations (XSLT) – XQuery – XML Link (XLink) – XML Pointer (XPointer) Was originally part of XSL – “XSL pattern language”

17 XPath: Simple Expressions /bib/book/year Result: 1995 1998 /bib/paper/year Result: empty (there were no papers)

18 XPath: Restricted Kleene Closure //author Result: Serge Abiteboul Rick Hull Victor Vianu Jeffrey D. Ullman /bib//first-name Result: Rick
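These path expressions can be tried with Python's xml.etree.ElementTree, which implements a small subset of XPath. The markup below is a reconstruction of the example data on slide 15 and is an assumption (the original tags were not preserved). ElementTree paths are evaluated relative to an element, so "book/year" from the bib element plays the role of /bib/book/year and ".//author" plays the role of //author.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the bib example; tag names follow the XPath slides.
bib = ET.fromstring("""
<bib>
  <book price="55">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
""")

years = [y.text for y in bib.findall("book/year")]             # /bib/book/year
paper_years = bib.findall("paper/year")                        # /bib/paper/year
author_count = len(bib.findall(".//author"))                   # //author
first_names = [f.text for f in bib.findall(".//first-name")]   # /bib//first-name

print(years, paper_years, author_count, first_names)
```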

19 XPath: Text Nodes. /bib/book/author/text() Result: Serge Abiteboul, Jeffrey D. Ullman. Rick Hull doesn’t appear because he has firstname, lastname.

20 XPath: Wildcard. //author/* Result: Rick Hull. * matches any element.

21 XPath: Attribute Nodes. /bib/book/@price Result: “55”. @price means that price has to be an attribute.

22 XPath: Qualifiers. /bib/book/author[firstname] Result: Rick Hull

23 XPath: More Qualifiers. /bib/book/author[firstname][address[//zip][city]]/lastname Result: … …

24 XPath: More Qualifiers. /bib/book[@price < "60"]; /bib/book[author/@age < "25"]; /bib/book[author/text()]
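A small sketch of qualifiers in Python's xml.etree.ElementTree. Its XPath subset supports existence tests such as author[firstname] but not "<" comparisons inside predicates, so the price test is applied in Python instead. The markup and the second book's price of 70 are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed markup; only the names and the price "55" come from the slides.
bib = ET.fromstring("""
<bib>
  <book price="55">
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname><lastname>Hull</lastname></author>
    <title>Foundations of Databases</title>
  </book>
  <book price="70">
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
  </book>
</bib>
""")

# /bib/book/author[firstname]: keep authors that have a firstname child.
structured = bib.findall("book/author[firstname]")
names = [a.findtext("lastname") for a in structured]

# /bib/book[@price < "60"]: select books with a price attribute, compare in Python.
cheap = [
    b.findtext("title")
    for b in bib.findall("book[@price]")
    if float(b.get("price")) < 60
]

print(names, cheap)
```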

25 XPath: Summary
bib                 matches a bib element
*                   matches any element
/                   matches the root element
/bib                matches a bib element under root
bib/paper           matches a paper in bib
bib//paper          matches a paper in bib, at any depth
//paper             matches a paper at any depth
paper|book          matches a paper or a book
@price              matches a price attribute
bib/book/@price     matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname     matches…

26 XPath: More Details. An XPath expression, p, establishes a relation between: –a context node, and –a node in the answer set. In other words, p denotes a function: –S[p] : Nodes -> {Nodes}. Examples: –author/firstname –. = self –.. = parent –part/*/*/subpart/../name = what does it mean?

27 The Root and the Root. bib is the “document element”. The “root” is above bib. /bib = returns the document element; / = returns the root. Why? Because we may have comments before and after <bib>; they become siblings of <bib>. This is advanced xmlogy.

28 XPath: More Details. We can navigate along 13 axes: ancestor, ancestor-or-self, attribute, child, descendant, descendant-or-self, following, following-sibling, namespace, parent, preceding, preceding-sibling, self

29 XPath: More Details. Examples: –child::author/child::lastname = author/lastname –child::author/descendant::zip = author//zip –child::author/parent::* = author/.. –child::author/attribute::age = author/@age

30 XQuery Based on Quilt (which is based on XML-QL) Check out the W3C web site for the latest. XML Query data model –Ordered !

31 FLWR (“Flower”) Expressions FOR... LET... FOR... LET... WHERE... RETURN...

32 XQuery. Find all book titles published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result: abc def ghi
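For readers more familiar with Python, the FOR / WHERE / RETURN clauses line up closely with the parts of a list comprehension. This is only an analogy, not an XQuery implementation; the sample document is an assumption built from the two books on slide 15.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><title>Foundations of Databases</title><year>1995</year></book>
  <book><title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>
""")

titles = [
    x.findtext("title")                  # RETURN $x/title
    for x in bib.findall("book")         # FOR $x IN document("bib.xml")/bib/book
    if int(x.findtext("year")) > 1995    # WHERE $x/year > 1995
]
print(titles)
```

The 1995 book is excluded because the comparison is strict.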

33 XQuery. For each author of a book by Morgan Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")/bib/book[publisher="Morgan Kaufmann"]/author)
RETURN $a,
  FOR $t IN /bib/book[author=$a]/title
  RETURN $t
distinct = a function that eliminates duplicates

34 XQuery Result: Jones abc def Smith ghi

35 XQuery FOR $x in expr -- binds $x to each value in the list expr LET $x = expr -- binds $x to the entire list expr –Useful for common subexpressions and for aggregations

36 XQuery. count = an (aggregate) function that returns the number of elements.
FOR $p IN distinct(document("bib.xml")//publisher)
LET $b := document("bib.xml")/bib/book[publisher = $p]
WHERE count($b) > 100
RETURN $p
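The same FOR / LET / WHERE pattern can be sketched in Python to show what each clause contributes. The function name big_publishers and the sample data are hypothetical, and the threshold is a parameter so that a three-book document gives a visible result.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for bib.xml.
bib = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><title>def</title></book>
  <book><publisher>Freeman</publisher><title>ghi</title></book>
</bib>
""")

def big_publishers(bib, threshold):
    """Return publishers with more than `threshold` books."""
    result = []
    # FOR $p IN distinct(//publisher): dict.fromkeys removes duplicates,
    # keeping first-seen order.
    for p in dict.fromkeys(e.text for e in bib.iter("publisher")):
        # LET $b := /bib/book[publisher = $p]  ($b is the whole list)
        b = [book for book in bib.findall("book")
             if book.findtext("publisher") == p]
        # WHERE count($b) > threshold
        if len(b) > threshold:
            result.append(p)      # RETURN $p
    return result

print(big_publishers(bib, 1))
```

The FOR variable takes one publisher at a time, while the LET variable holds the complete list of that publisher's books, which is what makes the aggregate possible.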

37 XQuery. Find books whose price is larger than average:
LET $a = avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
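A Python analogue of this query, as a sketch: the average is computed once (the LET), then each book is tested against it (the FOR and WHERE). The titles and prices are made-up sample data.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Assumed stand-in for bib.xml; prices are invented for illustration.
bib = ET.fromstring("""
<bib>
  <book><title>abc</title><price>30</price></book>
  <book><title>def</title><price>50</price></book>
  <book><title>ghi</title><price>100</price></book>
</bib>
""")

# LET $a = avg(/bib/book/price): one value computed over the whole collection.
a = mean(float(p.text) for p in bib.findall("book/price"))

# FOR $b IN /bib/book WHERE $b/price > $a RETURN $b
above = [
    b.findtext("title")
    for b in bib.findall("book")
    if float(b.findtext("price")) > a
]
print(a, above)
```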

38 XQuery Summary: FOR-LET-WHERE-RETURN = FLWR. FOR/LET clauses → list of tuples → WHERE clause → list of tuples → RETURN clause → instance of XQuery data model.

39 FOR vs. LET. FOR binds node variables → iteration. LET binds collection variables → one value.

40 FOR vs. LET.
FOR $x IN document("bib.xml")/bib/book
RETURN $x
Returns: ...
LET $x := document("bib.xml")/bib/book
RETURN $x
Returns: ...
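The difference can be made concrete with a small Python sketch. It assumes the RETURN clause wraps its argument in a result element; that wrapper and the helper name wrap are hypothetical, since the returned values on the slide were not preserved.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for bib.xml with three books.
bib = ET.fromstring("<bib><book>b1</book><book>b2</book><book>b3</book></bib>")
books = bib.findall("book")

def wrap(children):
    """Hypothetical RETURN body: put the bound value inside a <result> element."""
    result = ET.Element("result")
    result.extend(children)
    return result

# FOR $x IN /bib/book: RETURN is evaluated once per book.
for_results = [wrap([x]) for x in books]

# LET $x := /bib/book: $x is the whole collection, RETURN is evaluated once.
let_results = [wrap(books)]

print(len(for_results), len(let_results))
```

Under this assumption FOR yields three results holding one book each, and LET yields a single result holding all three books.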

41 Collections in XQuery. Ordered and unordered collections: –/bib/book/author = an ordered collection –distinct(/bib/book/author) = an unordered collection. LET $a = /bib/book → $a is a collection. $b/author → a collection (several authors...). RETURN $b/author Returns: ...

42 Collections in XQuery. What about collections in expressions? $b/price → list of n prices. $b/price * 0.7 → list of n numbers. $b/price * $b/quantity → list of n x m numbers ?? $b/price * ($b/quant1 + $b/quant2) ≠ $b/price * $b/quant1 + $b/price * $b/quant2 !!

43 Sorting in XQuery.
FOR $p IN distinct(document("bib.xml")//publisher)
RETURN $p/text(),
  FOR $b IN document("bib.xml")//book[publisher = $p]
  RETURN $b/title, $b/price SORTBY(price DESCENDING)
SORTBY(name)

44 Sorting in XQuery Sorting arguments: refer to the name space of the RETURN clause, not the FOR clause
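As an illustration of nested sorting, here is a Python sketch of a publisher report: publishers ordered by name, and within each publisher the books ordered by descending price. The function name books_by_publisher and the sample data are hypothetical, and Python's sorted/sort stand in for SORTBY.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for bib.xml; prices are invented.
bib = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><title>abc</title><price>40</price></book>
  <book><publisher>Freeman</publisher><title>ghi</title><price>30</price></book>
  <book><publisher>Morgan Kaufmann</publisher><title>def</title><price>60</price></book>
</bib>
""")

def books_by_publisher(bib):
    report = []
    # Outer loop: distinct publishers, sorted by name.
    for p in sorted({e.text for e in bib.iter("publisher")}):
        books = [b for b in bib.iter("book") if b.findtext("publisher") == p]
        # Inner sort: price descending, compared numerically.
        books.sort(key=lambda b: float(b.findtext("price")), reverse=True)
        report.append((p, [(b.findtext("title"), b.findtext("price")) for b in books]))
    return report

report = books_by_publisher(bib)
print(report)
```

The sort keys here are read from the elements being returned (title and price pairs per publisher), which mirrors the point that sorting arguments refer to the RETURN clause.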

45 If-Then-Else.
FOR $h IN //holding
RETURN $h/title,
  IF $h/@type = "Journal"
  THEN $h/editor
  ELSE $h/author
SORTBY (title)

46 Existential Quantifiers.
FOR $b IN //book
WHERE SOME $p IN $b//para SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
RETURN $b/title

47 Universal Quantifiers.
FOR $b IN //book
WHERE EVERY $p IN $b//para SATISFIES contains($p, "sailing")
RETURN $b/title
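SOME and EVERY correspond closely to Python's built-in any and all. The sketch below runs both quantified queries over a made-up document; the library wrapper element, the titles and the paragraph text are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data with three books.
doc = ET.fromstring("""
<library>
  <book><title>Wind and Water</title>
    <para>sailing and windsurfing for beginners</para>
    <para>sailing knots</para>
  </book>
  <book><title>Mixed Sports</title>
    <para>sailing basics</para>
    <para>windsurfing basics</para>
  </book>
  <book><title>Sailing Only</title>
    <para>sailing in summer</para>
    <para>sailing in winter</para>
  </book>
</library>
""")

def paras(book):
    """Text of every para at any depth under the book ($b//para)."""
    return ["".join(p.itertext()) for p in book.iter("para")]

# SOME $p SATISFIES contains($p, "sailing") AND contains($p, "windsurfing")
some = [b.findtext("title") for b in doc.iter("book")
        if any("sailing" in p and "windsurfing" in p for p in paras(b))]

# EVERY $p SATISFIES contains($p, "sailing")
every = [b.findtext("title") for b in doc.iter("book")
         if all("sailing" in p for p in paras(b))]

print(some, every)
```

"Mixed Sports" fails both tests: no single paragraph mentions both words, and one of its paragraphs does not mention sailing.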

48 Other Stuff in XQuery BEFORE and AFTER –for dealing with order in the input FILTER –deletes some edges in the result tree Recursive functions –Currently: arbitrary recursion –Perhaps more restrictions in the future ?

