Lecture #6 XML November 2 nd, 2000. Administration Thanks for the mid-term comments Comment on the book & readings Project #2 Project #1 Homework #4 Homework.


1 Lecture #6: XML. November 2nd, 2000

2 Administration Thanks for the mid-term comments Comment on the book & readings Project #2 Project #1 Homework #4 Homework #5

3 Outline
XML:
– Introduction
– Syntax, DTDs
– Politics
– Exporting relational data to XML
Querying XML:
– XPath, XML-QL, Quilt, XSLT
XML data management:
– Storing XML in relational databases

4 What is XML? From HTML to XML
HTML describes the presentation: easy for humans

5 HTML
Bibliography
Foundations of Databases. Abiteboul, Hull, Vianu. Addison Wesley, 1995
Data on the Web. Abiteboul, Buneman, Suciu. Morgan Kaufmann, 1999
HTML is hard for applications

6 XML Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML describes the content: easy for applications

7 XML: eXtensible Markup Language
Roots: comes from SGML (very nasty language).
After the roots: a format for sharing data.
Emerging format for data exchange on the Web and between applications.

8 XML Applications
Sharing data between different components of an application.
Format for storing all data in Office 2000.
EDI (electronic data interchange):
– Transactions between banks
– Producers and suppliers sharing product data (auctions)
– Extranets: building relationships between companies
– Scientists sharing data about experiments

9 XML Syntax
Very simple:
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>

10 XML Terminology
tags: book, title, author, …
start tag: <book>, end tag: </book>
start tags must correspond to end tags, and conversely

11 XML Terminology
an element: everything between matching tags, tags included
– example element: <title>Complete Guide to DB2</title>
– example element: <book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> </book>
elements may be nested
empty element: <tag></tag>, abbreviated <tag/>
an XML document has a unique root element
well-formed XML document: one whose tags all match
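A quick way to test well-formedness is to hand the text to a parser and see whether it complains. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are ours, not part of the lecture.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as an XML document with a single root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Start and end tags correspond: well formed.
matched = is_well_formed("<book><title>Complete Guide to DB2</title></book>")
# End tags close in the wrong order: not well formed.
mismatched = is_well_formed("<book><title>Complete Guide to DB2</book></title>")
# Two top-level elements violate the unique-root rule.
two_roots = is_well_formed("<book/><book/>")
# The abbreviated empty element is equivalent to the long form.
same_empty = ET.tostring(ET.fromstring("<a><b></b></a>")) == ET.tostring(ET.fromstring("<a><b/></a>"))

print(matched, mismatched, two_roots, same_empty)
```

Note that this only checks well-formedness; ElementTree does not validate a document against a DTD.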

12 The XML Tree
db
  book
    title: “Complete Guide to DB2”
    author: “Chamberlin”
  book
    title: “Transaction Processing”
    author: “Bernstein”
    author: “Newcomer”
  publisher
    name: “Morgan Kaufmann”
    state: “CA”
Tags on nodes; data values on leaves
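To see the "tags on nodes, data values on leaves" view concretely, one can parse a document and print it as an indented tree. This is a minimal sketch with Python's standard library; the `render` helper is a hypothetical name introduced here, and the document is a shortened version of the lecture's example.

```python
import xml.etree.ElementTree as ET

def render(node, depth=0):
    """Return indented lines: one tag per node, with quoted text on leaves."""
    line = "  " * depth + node.tag
    text = (node.text or "").strip()
    if len(node) == 0 and text:        # a leaf carries the data value
        line += f' "{text}"'
    lines = [line]
    for child in node:                 # internal nodes only carry tags
        lines.extend(render(child, depth + 1))
    return lines

root = ET.fromstring(
    "<db>"
    "<book><title>Complete Guide to DB2</title><author>Chamberlin</author></book>"
    "<publisher><name>Morgan Kaufmann</name><state>CA</state></publisher>"
    "</db>"
)
tree_lines = render(root)
print("\n".join(tree_lines))
```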

13 More XML Syntax: Attributes
<book price="55" currency="USD"> <title>Complete Guide to DB2</title> <author>Chamberlin</author> <year>1998</year> </book>
price, currency are called attributes

14 Replacing Attributes with Elements
<book> <title>Complete Guide to DB2</title> <author>Chamberlin</author> <year>1998</year> <price>55</price> <currency>USD</currency> </book>
attributes are an alternative (worse) way to represent data

15 XML References
IDs and IDREFs are used to reference objects.

16 “Types” (or “Schemas”) for XML
Document Type Definition (DTD)
Defines a grammar for the XML document, but we use it as a substitute for types/schemas
Will be replaced by XML-Schema (which will extend DTDs)

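17 An Example DTD
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (name,state)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
]>
PCDATA means Parsed Character Data (a mouthful for string)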

18 DTDs as Grammars
db ::= (book|publisher)*
book ::= (title,author*,year?)
title ::= string
author ::= string
year ::= string
publisher ::= (name, state)
name ::= string
state ::= string
A DTD is an EBNF (Extended BNF) grammar
An XML tree is precisely a derivation tree
XML documents that have a DTD and conform to it are called valid
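Because each rule of the grammar is a regular expression over child tags, validity can be checked by matching the sequence of children of every element against its rule. The sketch below hand-translates the grammar above into Python regexes; it is an illustration of the idea, not a DTD parser, and the names `CONTENT_MODEL`, `STRING_ELEMENTS` and `is_valid` are assumptions introduced here.

```python
import re
import xml.etree.ElementTree as ET

# Each content model is a regex over "tag " tokens (note the trailing space).
CONTENT_MODEL = {
    "db": r"((book|publisher) )*",
    "book": r"title (author )*(year )?",
    "publisher": r"name state ",
}
# Elements whose rule is "string": they must not contain child elements.
STRING_ELEMENTS = {"title", "author", "year", "name", "state"}

def is_valid(node):
    """Check node and all its descendants against the grammar."""
    if node.tag in STRING_ELEMENTS:
        return len(node) == 0
    model = CONTENT_MODEL.get(node.tag)
    if model is None:                  # tag not declared in the grammar
        return False
    child_seq = "".join(child.tag + " " for child in node)
    if re.fullmatch(model, child_seq) is None:
        return False
    return all(is_valid(child) for child in node)

good = ET.fromstring(
    "<db><book><title>T</title><author>A</author><author>B</author>"
    "<year>1995</year></book>"
    "<publisher><name>N</name><state>CA</state></publisher></db>"
)
# Here the author comes before the title, which the book rule forbids.
bad = ET.fromstring("<db><book><author>A</author><title>T</title></book></db>")
print(is_valid(good), is_valid(bad))
```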

19 More on DTDs as Grammars
<!DOCTYPE paper [ ]> …
XML documents can be nested arbitrarily deep

20 XML for Representing Data
Relational table persons:
name   phone
John   3634
Sue    6343
Dick   6363
XML: a persons root with one row element per tuple; each row has a name child and a phone child holding the values above.
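The table-to-tree mapping on this slide is mechanical: one element for the table, one `row` per tuple, one child per column. A minimal sketch, assuming the tag names `persons`, `row`, `name` and `phone` from the slide and a hypothetical helper `table_to_xml`:

```python
import xml.etree.ElementTree as ET

def table_to_xml(table, columns, rows):
    """Build <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table)
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = value
    return root

rows = [("John", "3634"), ("Sue", "6343"), ("Dick", "6363")]
persons = table_to_xml("persons", ["name", "phone"], rows)
xml_text = ET.tostring(persons, encoding="unicode")
print(xml_text)
```

The column names are repeated in every row of the output, which is the "schema becomes part of the data" effect.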

21 XML vs Data Models
XML is self-describing
Schema elements become part of the data
– Relational schema: persons(name,phone)
– In XML, <persons>, <name>, <phone> are part of the data, and are repeated many times
Consequence: XML is much more flexible
XML = semi-structured data

22 Semi-structured Data Explained
Missing attributes:
– John 1234
– Joe (no phone!)
Repeated attributes:
– Mary 2345 3456

23 S-S Data Further Explained
Attributes with different types in different objects
– John Smith 1234
Nested collections
Heterogeneous collections:
– <db> contains both <book>s and <publisher>s
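Code that consumes semi-structured data has to tolerate missing and repeated fields rather than assume one value per attribute. The sketch below collects phone numbers into a list per person, so "none" and "several" are handled uniformly. The tag names `persons`, `person`, `name` and `phone` are assumptions; the transcript shows only the values.

```python
import xml.etree.ElementTree as ET

DOC = """<persons>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
  <person><name>Mary</name><phone>2345</phone><phone>3456</phone></person>
</persons>"""

# Map each name to a (possibly empty) list of phone numbers.
phones = {
    person.findtext("name"): [ph.text for ph in person.findall("phone")]
    for person in ET.fromstring(DOC)
}
print(phones)
```

A relational `persons(name, phone)` table would need a NULL for Joe and either two rows or a second table for Mary.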

24 XML Data vs. E/R, ODL, Relational
Q: Is XML better or worse?
A: It serves different purposes
– E/R, ODL, relational models: for centralized processing, when we control the data
– XML: for data sharing between different systems, when we do not have control over the entire data (e.g. on the Web)
Do NOT use XML to model your data! Use E/R, ODL, or relational instead.

25 Why Do People Like XML so much?
It’s easy to learn.
It’s human readable.
No need for proprietary formats anymore.
Lots of tools out there for basic manipulation, editing, validation, simple querying.
It’s very flexible:
– Data is self-describing
– Can add attributes easily
– Data can be irregular
Note (common fallacy): without common DTDs, data sharing is not solved!

26 Why are we DB’ers interested? It’s data, stupid. That’s us. Proof by Altavista: –database+XML -- 40,000 pages. Database issues: –How are we going to model XML? (trees). –How are we going to query XML? (XML-QL,Quilt) –How are we going to store XML (in a relational database? object-oriented?) –How are we going to process XML efficiently? (uh… well..., um..., ah..., get some good grad students!)

27 Exporting Relational Data to XML

28 Data Sharing with XML: Easy
[Figure: data source (e.g. relational database), XML, Web, application]

29 Exporting Relational Data to XML
Product(pid, name, weight)
Company(cid, name, address)
Makes(pid, cid, price)
[E/R diagram: product, makes, company]

30 Export data grouped by companies GizmoWorks Tacoma gizmo 19.99 … Bang Kirkland gizmo 22.99 … … Redundant representation of products
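An export grouped by company can be sketched as an outer loop over Company with a nested query for the products each company makes. The schema is the one from the lecture; the `sqlite3` setup, the weight value, the output tag names and the helper `export_by_company` are illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product(pid INTEGER PRIMARY KEY, name TEXT, weight REAL);
CREATE TABLE Company(cid INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Makes(pid INTEGER, cid INTEGER, price REAL);
INSERT INTO Product VALUES (1, 'gizmo', 1.0);  -- weight is made up
INSERT INTO Company VALUES (1, 'GizmoWorks', 'Tacoma');
INSERT INTO Company VALUES (2, 'Bang', 'Kirkland');
INSERT INTO Makes VALUES (1, 1, 19.99);
INSERT INTO Makes VALUES (1, 2, 22.99);
""")

PRODUCTS_OF = """
SELECT p.name, m.price
FROM Product p JOIN Makes m ON p.pid = m.pid
WHERE m.cid = ? ORDER BY p.pid
"""

def export_by_company(conn):
    """Return a <db> element with one <company> per row of Company."""
    db = ET.Element("db")
    companies = conn.execute("SELECT cid, name, address FROM Company ORDER BY cid")
    for cid, cname, address in companies:
        company = ET.SubElement(db, "company")
        ET.SubElement(company, "name").text = cname
        ET.SubElement(company, "address").text = address
        # Products are nested under (and repeated for) each company.
        for pname, price in conn.execute(PRODUCTS_OF, (cid,)):
            product = ET.SubElement(company, "product")
            ET.SubElement(product, "name").text = pname
            ET.SubElement(product, "price").text = f"{price:.2f}"
    return db

exported = export_by_company(conn)
print(ET.tostring(exported, encoding="unicode"))
```

The gizmo product appears once under each company, which is the redundancy the slide points out.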

31 The DTD

32 Export Data by Products Gizmo GizmoWorks 19.99 Tacoma Bang 22.99 Kirkland … OneClick … Redundant representation of companies

33 Which One Do We Choose ? The structure of the XML data is determined by agreement, with our partners, or dictated by committees –Many XML dialects (called applications) XML Data is often nested, irregular, etc No normal forms for XML

34 Querying XML
At first, there was XQL (XPath): weak, impoverished.
Then DB’ers noticed that XML is a variant of semi-structured data and invented XML-QL.
A lot of people got excited and formed a committee.
The committee is still working:
– Inside scoop: Quilt, the best of XML-QL, SQL and XPath.

35 Running Example Addison-Wesley Serge Abiteboul Rick Hull Victor Vianu Foundations of Databases 1995 Freeman Jeffrey D. Ullman Principles of Database and Knowledge Base Systems 1998

36 XPath Introduction Syntax for XML document navigation and node selection A recommendation of the W3C (i.e., a standard) Building block for other W3C standards: – XSL Transformations (XSLT) – XML Link (XLink) – XML Pointer (XPointer)

37 XPath Traversal /bib/book/year Result: 1995 1998 /bib/paper/year Result: empty (there were no papers)

38 XPath Unbounded Traversal //author Result: Serge Abiteboul Rick Hull Victor Vianu Jeffrey D. Ullman /bib//first-name Result: Rick

39 XPath Text Selection /bib/book/author/text() Result: Serge Abiteboul Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname

40 XPath Wild Star //author/* Result: Rick Hull * Matches any element

41 XPath Attribute Navigation
/bib/book/@price
Result: “55”
@price means that price has to be an attribute

42 XPath Existential Constraint /bib/book/author[firstname] Result: Rick Hull

43 XPath Existentials
/bib/book[@price < "60"]
/bib/book[author/@age < "25"]
/bib/book[author/text()]

44 XPath Expressions Summary
bib                matches a bib element
*                  matches any element
/                  matches the root element
/bib               matches a bib element under root
bib/paper          matches a paper in bib
bib//paper         matches a paper in bib, at any depth
//paper            matches a paper at any depth
paper|book         matches a paper or a book
@price             matches a price attribute
bib/book/@price    matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname    matches…
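Several of these expressions can be tried directly with Python's `xml.etree.ElementTree`, which implements a subset of XPath. The document below is our reading of the running example; the exact tag names (`firstname`, `lastname`) and the placement of the `price` attribute on the first book are assumptions. ElementTree has no `text()` step or `<` comparison and does not return attribute nodes, so those cases are expressed in plain Python.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book price="55">
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname><lastname>Hull</lastname></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book>
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>""")

# Paths are relative to the bib root element, so /bib/book/year is "book/year".
years = [y.text for y in bib.findall("book/year")]             # /bib/book/year
papers = bib.findall("paper/year")                             # /bib/paper/year
author_count = len(bib.findall(".//author"))                   # //author
first_names = [f.text for f in bib.findall(".//firstname")]    # /bib//firstname
star = [e.tag for e in bib.findall(".//author/*")]             # //author/*
structured = bib.findall("book/author[firstname]")             # existential test
prices = [b.get("price") for b in bib.findall("book[@price]")] # /bib/book/@price
# /bib/book[@price < "60"]: the comparison is done in Python.
cheap = [b for b in bib.findall("book[@price]") if float(b.get("price")) < 60]

print(years, author_count, first_names, star, prices, len(cheap))
```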

45 XML-QL Hello World
Matching data using element patterns.
WHERE Addison-Wesley $t $a IN "www.a.b.c/bib.xml" CONSTRUCT $a

46 XML-QL Element-As
Matching data using element patterns.
WHERE Addison-Wesley $t $a ELEMENT_AS $e IN "www.a.b.c/bib.xml" CONSTRUCT $e

47 Constructing XML Data
WHERE Addison-Wesley $t $a IN "www.a.b.c/bib.xml" CONSTRUCT $a $t

48 Grouping with Nested Queries WHERE $t, Addison-Wesley CONTENT_AS $p IN “www.a.b.c/bib.xml” CONSTRUCT $t WHERE $a IN $p CONSTRUCT $a

49 Joining Elements by Value
WHERE $f $l ELEMENT_AS $e IN "www.a.b.c/bib.xml" $f $l IN "www.a.b.c/bib.xml", $y > 1995 CONSTRUCT $e
Find all articles whose writers also published a book after 1995.

50 Tag Variables
WHERE $f $l ELEMENT_AS $e IN "www.a.b.c/bib.xml" $f $l IN "www.a.b.c/bib.xml", $y > 1995 CONSTRUCT $e
Find all articles whose writers have done something after 1995.

51 Regular Path Expressions WHERE $r Ford IN "www.a.b.c/bib.xml" CONSTRUCT $r Find all parts whose brand is Ford, no matter what level they are in the hierarchy.

52 Regular Path Expressions WHERE $r IN "www.a.b.c/parts.xml" CONSTRUCT $r

53 XML Data Integration WHERE ELEMENT_AS $n $ssn IN “www.a.b.c/data.xml” $ssn ELEMENT_AS $I IN “www.irs.gov/taxpayers.xml” CONSTRUCT $n $I Query can access more than one XML document.

54 Quilt: Hello World
List all titles of books published by Morgan Kaufmann in 1998:
FOR $b IN document("bib.xml")/book
WHERE $b/publisher = "Morgan Kaufmann" AND $b/year = "1998"
RETURN $b/title
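The FOR / WHERE / RETURN shape maps closely onto a list comprehension: iterate over bindings, filter, project. The sketch below mirrors the query above in Python for comparison; the sample books and their titles are made up, and only the element names follow the query.

```python
import xml.etree.ElementTree as ET

# Made-up sample data standing in for bib.xml.
bib = ET.fromstring("""<bib>
  <book><publisher>Morgan Kaufmann</publisher><year>1998</year><title>Book A</title></book>
  <book><publisher>Morgan Kaufmann</publisher><year>1999</year><title>Book B</title></book>
  <book><publisher>Addison-Wesley</publisher><year>1998</year><title>Book C</title></book>
</bib>""")

titles = [
    b.findtext("title")                              # RETURN $b/title
    for b in bib.findall("book")                     # FOR $b IN document("bib.xml")/book
    if b.findtext("publisher") == "Morgan Kaufmann"  # WHERE $b/publisher = ...
    and b.findtext("year") == "1998"                 #   AND $b/year = "1998"
]
print(titles)
```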

55 Quilt: Creating XML Output Find all names with a firstname and lastname; group them in a FOR $a IN document(“bib.xml”)//author, $f IN $a/firstName, $l IN $a/lastName RETURN $f $l

56 Quilt: Joins
Retrieve the titles of the books written by Laing before 1967, together with their reviews.
FOR $b in document("bib.xml")//book[@year<1967], $r in document("reviews.xml")//review
WHERE $b/authors/lastname="Laing" and $b/@ISBN=$r/@ISBN
RETURN $b/title/text(), $r

57 Quilt: FLWR Expressions
Retrieve the titles of the books written by Laing before 1967 together with their reviews.
FOR $b in document("input.xml")//book[@year<1967]
LET $R = document("input.xml")//review[@isbn=$b/@isbn]
WHERE $b/authors/lastname="Laing"
RETURN $t $R
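The difference between FOR and LET is that FOR iterates over items one at a time while LET binds a whole collection to one variable. A rough Python analogue of the FLWR expression above, with made-up books and reviews (the names `flwr`, `books` and `reviews` are hypothetical):

```python
import xml.etree.ElementTree as ET

# Made-up sample data; attribute and tag names follow the query.
books = ET.fromstring("""<bib>
  <book year="1960" isbn="111"><title>Book One</title>
    <authors><lastname>Laing</lastname></authors></book>
  <book year="1970" isbn="222"><title>Book Two</title>
    <authors><lastname>Laing</lastname></authors></book>
  <book year="1950" isbn="333"><title>Book Three</title>
    <authors><lastname>Other</lastname></authors></book>
</bib>""")
reviews = ET.fromstring("""<reviews>
  <review isbn="111">Good</review>
  <review isbn="111">Bad</review>
  <review isbn="333">Fine</review>
</reviews>""")

def flwr(books, reviews):
    results = []
    for b in books.iter("book"):                       # FOR $b IN ...//book
        if int(b.get("year")) >= 1967:                 #   [@year<1967]
            continue
        R = [r for r in reviews.iter("review")         # LET $R = ...//review
             if r.get("isbn") == b.get("isbn")]        #   [@isbn=$b/@isbn]
        if b.findtext("authors/lastname") != "Laing":  # WHERE
            continue
        results.append((b.findtext("title"), [r.text for r in R]))  # RETURN
    return results

result = flwr(books, reviews)
print(result)
```

Each qualifying book yields one result carrying all of its reviews, whereas the join formulation with two FOR variables yields one result per (book, review) pair.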

58 Quilt: another example
List all authors that published both in 1998 and 1999
FOR $a IN distinct(document("bib.xml")/book/author)
WHERE contains(document("bib.xml")/book[year=1998]/author, $a) AND contains(document("bib.xml")/book[year=1999]/author, $a)
RETURN $a

59 XSL A.k.a XSLT A recommendation of the W3C (standard) Initial goal: translate XML to HTML Became: translate XML to XML –HTML is just a special case Interesting politics

60 Query Processing For XML Approach 1: store XML in a relational database. Translate an XML-QL/Quilt query into a set of SQL queries. –Leverage 20 years of research & development. Approach 2: store XML in an object-oriented database system. –OO model is closest to XML, but systems do not perform well and are not well accepted. Approach 3: build a native XML query processing engine. –Still in the research phase; see Zack next week.

61 Relational Approach Step 1: given a DTD, create a relational schema. Step 2: map the XML document into tuples in the relational database. Step 3: given a query Q in XML-QL/Quilt, translate it to a set of queries P over the relational database. Step 4: translate the tuples returned from the relational database into XML elements.

62 Which Relational Schema?
The key question! Affects performance. No magic solution.
Some options:
– The EDGE table: put everything in one table
– The Attribute tables: create a table for every tag name
– The inlining method: inline as much data as possible into the tables

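63 An Example DTD
<!DOCTYPE db [
  <!ELEMENT db ((book|publisher)*)>
  <!ELEMENT book (title,author*,year?)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT publisher (name,state)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
]>
PCDATA means Parsed Character Data (a mouthful for string)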

64 The Edge Approach sourceID tag destID destValue - Don’t need a DTD. - Very simple to implement.
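Shredding a document into the edge table needs no DTD: walk the tree and emit one `(sourceID, tag, destID, destValue)` row per element. The sketch below is one possible way to do it. The id numbering, the use of 0 as the source of the root, and storing a value only on leaves are our assumptions, not details from the lecture.

```python
import itertools
import xml.etree.ElementTree as ET

def shred_to_edges(root):
    """Return (sourceID, tag, destID, destValue) rows in document order."""
    ids = itertools.count(1)
    rows = []

    def visit(node, source_id):
        dest_id = next(ids)
        # Only leaves carry a value; internal nodes get None (SQL NULL).
        value = (node.text or "").strip() if len(node) == 0 else None
        rows.append((source_id, node.tag, dest_id, value))
        for child in node:
            visit(child, dest_id)

    visit(root, 0)  # 0 stands for the document itself
    return rows

DOC = ("<db><book><title>Transaction Processing</title>"
       "<author>Bernstein</author><author>Newcomer</author></book></db>")
edges = shred_to_edges(ET.fromstring(DOC))
for edge in edges:
    print(edge)
```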

65 The Attribute Approach
Book(rootID, bookID)
Title(bookID, title)
Author(bookID, author)
Publisher(rootID, pubID)
PubName(pubID, pubName)
PubState(pubID, state)

66 The In-lining Approach
Book(bookID, title)
BookAuthor(bookID, author)
Publisher(pubName, pubState)

67 Let the Querying Begin!
Matching data using element patterns.
WHERE Bernstein $t IN "www.a.b.c/bib.xml" CONSTRUCT $t

68 The Edge Approach
SELECT e2.destValue
FROM E AS e1, E AS e2, E AS e3
WHERE e1.tag = 'book'
  AND e1.destID = e2.sourceID AND e2.tag = 'title'
  AND e1.destID = e3.sourceID AND e3.tag = 'author'
  AND e3.destValue = 'Bernstein'
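To check the three-way self-join, the edge query can be run against a small hand-built edge table in SQLite. In this sketch the query returns the title edge's value and places the 'Bernstein' predicate on the author edge's `destValue`, which is what the XML-QL query (titles of books by Bernstein) asks for. The row ids and the in-memory database are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE E(sourceID INTEGER, tag TEXT, destID INTEGER, destValue TEXT)"
)
conn.executemany("INSERT INTO E VALUES (?, ?, ?, ?)", [
    (0, "db", 1, None),
    (1, "book", 2, None),
    (2, "title", 3, "Complete Guide to DB2"),
    (2, "author", 4, "Chamberlin"),
    (1, "book", 5, None),
    (5, "title", 6, "Transaction Processing"),
    (5, "author", 7, "Bernstein"),
    (5, "author", 8, "Newcomer"),
])

# e1 is the book edge; e2 and e3 are its title and author children.
QUERY = """
SELECT e2.destValue
FROM E AS e1, E AS e2, E AS e3
WHERE e1.tag = 'book'
  AND e1.destID = e2.sourceID AND e2.tag = 'title'
  AND e1.destID = e3.sourceID AND e3.tag = 'author'
  AND e3.destValue = 'Bernstein'
"""
edge_titles = [row[0] for row in conn.execute(QUERY)]
print(edge_titles)
```

Every additional step in the path pattern costs another self-join on E, which is the price of putting everything in one table.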

69 The Attribute Approach
SELECT Title.title
FROM Book, Title, Author
WHERE Book.bookID = Author.bookID
  AND Book.bookID = Title.bookID
  AND Author.author = 'Bernstein'

70 The In-lining Approach
SELECT Book.title
FROM Book, BookAuthor
WHERE Book.bookID = BookAuthor.bookID
  AND BookAuthor.author = 'Bernstein'
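The attribute and in-lining schemas can be compared side by side by loading the same two books into each and running the corresponding query. This is a sketch under assumptions: the table layouts follow our reading of the schema slides, the ids are made up, and each schema gets its own in-memory SQLite database because both use a table called Book.

```python
import sqlite3

def run(setup_sql, query):
    """Create a fresh in-memory database, load it, and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    return [row[0] for row in conn.execute(query)]

# Attribute approach: one table per tag name.
ATTRIBUTE_SETUP = """
CREATE TABLE Book(rootID INTEGER, bookID INTEGER);
CREATE TABLE Title(bookID INTEGER, title TEXT);
CREATE TABLE Author(bookID INTEGER, author TEXT);
INSERT INTO Book VALUES (1, 1);
INSERT INTO Book VALUES (1, 2);
INSERT INTO Title VALUES (1, 'Complete Guide to DB2');
INSERT INTO Title VALUES (2, 'Transaction Processing');
INSERT INTO Author VALUES (1, 'Chamberlin');
INSERT INTO Author VALUES (2, 'Bernstein');
INSERT INTO Author VALUES (2, 'Newcomer');
"""
ATTRIBUTE_QUERY = """
SELECT Title.title FROM Book, Title, Author
WHERE Book.bookID = Author.bookID
  AND Book.bookID = Title.bookID
  AND Author.author = 'Bernstein'
"""

# In-lining approach: the single-valued title lives in the Book table;
# only the repeated author needs a table of its own.
INLINING_SETUP = """
CREATE TABLE Book(bookID INTEGER, title TEXT);
CREATE TABLE BookAuthor(bookID INTEGER, author TEXT);
INSERT INTO Book VALUES (1, 'Complete Guide to DB2');
INSERT INTO Book VALUES (2, 'Transaction Processing');
INSERT INTO BookAuthor VALUES (1, 'Chamberlin');
INSERT INTO BookAuthor VALUES (2, 'Bernstein');
INSERT INTO BookAuthor VALUES (2, 'Newcomer');
"""
INLINING_QUERY = """
SELECT Book.title FROM Book, BookAuthor
WHERE Book.bookID = BookAuthor.bookID
  AND BookAuthor.author = 'Bernstein'
"""

attribute_titles = run(ATTRIBUTE_SETUP, ATTRIBUTE_QUERY)
inlining_titles = run(INLINING_SETUP, INLINING_QUERY)
print(attribute_titles, inlining_titles)
```

Both return the same answer; the in-lined schema gets there with one join fewer.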

71 Reconstructing XML Elements
Matching data using element patterns.
WHERE Bernstein $t ELEMENT_AS $e IN "www.a.b.c/bib.xml" CONSTRUCT $e

72 Open Questions
– Native query processing for XML
– To order or not to order?
– Combining IR-style keyword queries with DB-style structured queries
– Updates
– Automatic selection of a relational schema
– How should we extend relational engines to better support XML storage and querying?

