About XML/Xquery/RDF 4/1. TEXT Structured (relational) Data XML Less Structure More Structure.

1 about XML/Xquery/RDF 4/1

2 A spectrum from less structure to more structure: TEXT, XML, structured (relational) data

3 HTML vs. XML. HTML: Bibliography. Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995. Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999. XML: Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … XML is “self-describing”: –Schema info is part of the data –Good for data exchange (albeit baroque for storage)

4 HTML: Bibliography. Foundations of Databases, Abiteboul, Hull, Vianu, Addison Wesley, 1995. Data on the Web, Abiteboul, Buneman, Suciu, Morgan Kaufmann, 1999. XML: Foundations… Abiteboul Hull Vianu Addison Wesley 1995 … HTML describes presentation; XML describes content.

5 Why are database folks so excited about XML? XML is just a syntax for (self-describing) data. This is still exciting because: –There is no standard syntax for relational data –With XML, we can translate any legacy data to XML and exchange data in XML format (ship it over the web, input it to any application)

6 XML ≠ machine accessible meaning. This is what a web page in natural language looks like for a machine. (Jim Hendler)

7 XML ≠ machine accessible meaning. Tags: CV, name, education, work, private. XML allows “meaningful tags” to be added to parts of the text. (Jim Hendler)

8 XML ≠ machine accessible meaning. Tags: CV, name, education, work, private. But to your machine, the tags look like this… (Jim Hendler)

9 XML ≠ machine accessible meaning. Schemas help… by relating common terms between documents. (Jim Hendler)

10 But other people use other schemas. Tags: CV, name, education, work, private. Someone else has one like this… (Jim Hendler)

11 But other people use other schemas… which don’t fit in. Moral: there is still a need for ontology mapping. (Jim Hendler)

12 The X-standards… XML: an on-the-wire representation for data –XQuery: a query language for XML –XML Schema: a schema description language for XML data. RDF: a language for meta-data description. WSDL/SOAP/UDDI: languages for describing services.

13 XML Terminology. tags: book, title, author, …; start tag: <book>, end tag: </book>; elements: <book>…</book>, <author>…</author>; elements are nested; empty element: <book></book>, abbreviated <book/>; an XML document has a single root element; an XML document is well formed if it has matching tags.
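The well-formedness rule above (matching, properly nested tags under a single root) can be checked mechanically. The following is a minimal sketch using Python's standard-library parser; the helper name `is_well_formed` and the sample strings are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML: tags match, nest properly, one root."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = "<book><title>Data on the Web</title><author>Suciu</author></book>"
bad_nesting = "<book><title>Data on the Web</book></title>"  # tags cross over
two_roots = "<book/><book/>"                                  # more than one root

checks = [is_well_formed(good), is_well_formed(bad_nesting), is_well_formed(two_roots)]
```

Note that this only tests well-formedness; checking validity against a DTD needs a validating parser, which the standard `xml.etree` module does not provide.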


17 More XML: Attributes Foundations of Databases Abiteboul … 1995 Attributes are single-valued --No guidance on when to use them

18 More XML: Oids and References Jane Mary John oids and references in XML are just syntax Object identifiers

19 XML vs. Relational Data. XML is meant as a language that supports both text and structured data, which places conflicting demands on it. XML supports semi-structured data: in essence, the schema can be a union of multiple schemas, so it is easy to represent books with or without prices, books with any number of authors, etc. XML supports free mixing of text and data (using the #PCDATA type). XML is ordered, while relational data is unordered. Spectrum from less structure to more structure: TEXT, XML, structured (relational) data.

20 DTDs <!DOCTYPE paper [ ]> … Notice that a DTD is not in XML syntax… Semi-structured

21 XML Schemas More recent proposal (with XML syntax) unifies previous schema proposals generalizes DTDs uses XML syntax two documents: structure and datatypes –http://www.w3.org/TR/xmlschema-1 –http://www.w3.org/TR/xmlschema-2

22 XML Schema

23 RDF: Meta-data Standard for Web birds, butterflies, snakes John Smith Good’ol semantic networks..?

24 Querying XML. Requirements: –Need to handle lack of schema: we may not know much about the data, so we need to navigate the XML. –Need to support both “information retrieval” and “SQL-style” queries (ordered vs. unordered XML). –“Human readable” like SQL? Candidates: many, based on conflicting requirements. XSL: makes IR folks happy. XML-QL: makes DB folks happy. XQuery: W3C’s attempt to make everybody (un)happy.

25 XQuery Resources. XQuery 1.0: An XML Query Language –W3C Working Draft 20 December 2001. XML Query Use Cases –W3C Working Draft 20 December 2001. Microsoft.Net XQuery Language Demo –http://131.107.228.20/ –http://support.x-hive.com/xquery/index.html –Supports querying on the documents described in the W3C Use Cases. XQuery Tutorial by Fankhauser & Wadler –www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

26 FLoWeR Expressions. XQuery queries are made up of FLWR expressions that work on “paths”. For: binds variables to nodes. Let: computes aggregates. Where: applies a formula to find matching elements. Return: constructs the output elements. Path expressions are of the form: element//element/element[attrib=value]
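To make the four FLWR clauses concrete, here is a rough Python analogue, not XQuery itself. It is a sketch under stated assumptions: the sample document is modeled on the bib.xml use case, the `year` attribute values are assumed, and the function name `addison_wesley_after` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the bib.xml use case; the year values are assumed.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>"""


def addison_wesley_after(root, year):
    """Mimic: for $b in /bib/book where ... return <book year=...>{$b/title}</book>."""
    result = ET.Element("bib")
    for b in root.findall("book"):                  # for: bind b to each /bib/book node
        publisher = b.findtext("publisher")         # let: name an intermediate value
        if publisher == "Addison-Wesley" and int(b.get("year")) > year:  # where: filter
            out = ET.SubElement(result, "book", year=b.get("year"))      # return: construct
            ET.SubElement(out, "title").text = b.findtext("title")
    return result


result = addison_wesley_after(ET.fromstring(BIB), 1991)
titles = [t.text for t in result.iter("title")]
```

The loop body plays the role of the return clause: it builds new elements rather than handing back the input nodes, which is the “construction” ability the slides contrast with SQL.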

27 Comparison to SQL Look at the use case description on Xquery manual Supports all (?) SQL style queries (with different syntax of course) [default queries in the demo] Has support for –“construction”—outputting the answers in arbitrary XML formats (use case “XMP” ) –“path expressions” --- navigating the XML tree (use case “seq”) –Simple text queries [use case “text”] –Allows queries on “Tag” elements Removes the “data/meta-data” barrier in queries –For each book that has at least one author, list the title and first two authors, and an empty "et-al" element if the book has additional authors. [XMP use case 6]

28 DTD for http://www.bn.com/bib.xml

29 Example Query. Query: { for $b in /bib/book where $b/publisher = "Addison-Wesley" and $b/@year > 1991 return { $b/title } } “For all books after 1991, return with Year changed from a tag to an attribute.” Result: TCP/IP Illustrated; Advanced Programming in the Unix environment

30 Example Query (2): Join. Return the books that cost more at amazon than fatbrain. let $amazon := document("http://www.amazon.com/books.xml") let $fatbrain := document("http://www.fatbrain.com/books.xml") for $am in $amazon/books/book, $fat in $fatbrain/books/book where $am/isbn = $fat/isbn and $am/price > $fat/price return { $am/title, $am/price, $fat/price }
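The join above can be sketched in Python as follows. This is an illustration only: the two documents, their ISBNs, and their prices are made-up sample data, and `costs_more` is a hypothetical helper. The query reads as a nested loop over both sources; the sketch uses a dictionary keyed on ISBN instead, which gives the same answer when ISBNs are unique.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the two bookstore documents.
AMAZON = """<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>45.00</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>60.00</price></book>
</books>"""
FATBRAIN = """<books>
  <book><isbn>111</isbn><title>Data on the Web</title><price>39.95</price></book>
  <book><isbn>222</isbn><title>TCP/IP Illustrated</title><price>65.95</price></book>
</books>"""


def costs_more(first, second):
    """Return (title, first price, second price) for books pricier in `first`."""
    index = {b.findtext("isbn"): b for b in second.findall("book")}
    rows = []
    for a in first.findall("book"):
        match = index.get(a.findtext("isbn"))       # join condition: equal isbn
        if match is None:
            continue
        a_price = float(a.findtext("price"))
        m_price = float(match.findtext("price"))
        if a_price > m_price:                       # selection: price comparison
            rows.append((a.findtext("title"), a_price, m_price))
    return rows


rows = costs_more(ET.fromstring(AMAZON), ET.fromstring(FATBRAIN))
```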

31 XML frenzy in the DB Community Now that XML is there, what can we do with it? –Convert all databases from Relational to XML? Or provide XML views of relational databases? –Develop theory of native XML databases? Or assume that XML data will be stored in relational databases.. –Issues: What sort of storage mechanisms? What sort of indices?

32 XML middleware for Databases XML adapters (middle-ware) received significant attention in DB community –SilkRoute (AT&T) –Xperanto (IBM) Issues: – Need to convert relational data into XML Tagging (easy) –Need to convert Xquery queries into equivalent SQL queries Trickier as Xquery supports schema querying
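The “tagging” step is called easy above because it amounts to wrapping each row and column value in an element. A minimal sketch, assuming an in-memory SQLite table with made-up contents; the function `table_to_xml` and the table layout are hypothetical and do not describe how SilkRoute or Xperanto work.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, row_tag):
    """Tag every row of `table` as a <row_tag> element with one child per column.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            ET.SubElement(elem, col).text = str(value)
    return root


# Hypothetical relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bib (title TEXT, publisher TEXT, year INTEGER)")
conn.execute("INSERT INTO bib VALUES ('Data on the Web', 'Morgan Kaufmann', 1999)")

xml_text = ET.tostring(table_to_xml(conn, "bib", "book"), encoding="unicode")
```

The harder direction, translating XQuery into SQL, is not attempted here.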

33 Don’t look beyond this..

34 Xquery Tutorial Craig Knoblock University of Southern California

35 References. XQuery 1.0: An XML Query Language –W3C Working Draft 20 December 2001. XML Query Use Cases –W3C Working Draft 20 December 2001. Microsoft.Net XQuery Language Demo –http://131.107.228.20/ –Supports querying on the documents described in the W3C Use Cases. XQuery Tutorial by Fankhauser & Wadler –www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf

36 DTD for http://www.bn.com/bib.xml

37 Data for www.bn.com/bib.xml. Book: TCP/IP Illustrated; Stevens, W.; Addison-Wesley; 65.95. Book: Advanced Programming in the Unix environment; Stevens, W.; Addison-Wesley; 65.95.

38 Data for www.bn.com/bib.xml (cont.). Book: Data on the Web; Abiteboul, Serge; Buneman, Peter; Suciu, Dan; Morgan Kaufmann Publishers; 39.95. Book: The Economics of Technology and Content for Digital TV; Gerbarg, Darcy (CITI); Kluwer Academic Publishers; 129.95.

39 Document References. A document can be referenced either explicitly or in the default namespace. In the Microsoft demo: –/bib = document("http://www.bn.com/bib.xml")/bib. We will use /bib throughout, but you must use the expansion to run the demo. In Theseus, the document for the XQuery is passed as input.

40 Projection Return the names of all authors of books /bib/book/author = Stevens W. Abiteboul Serge Buneman Peter Suciu Dan

41 Projection (cont.) The same query can also be written as a for loop: /bib/book/author = for $bk in /bib/book return for $aut in $bk/author return $aut = Stevens W. Abiteboul Serge Buneman Peter Suciu Dan
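The equivalence between the path form and the nested for loops can be checked with a small Python sketch. The sample document is a cut-down, assumed version of bib.xml, and ElementTree's path syntax is only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Assumed, cut-down version of the bib.xml data.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
</bib>"""
root = ET.fromstring(BIB)   # root is the <bib> element

# Path form, analogous to /bib/book/author.
by_path = root.findall("book/author")

# Loop form, analogous to: for $bk in /bib/book return for $aut in $bk/author return $aut
by_loop = [aut for bk in root.findall("book") for aut in bk.findall("author")]

last_names = [a.findtext("last") for a in by_path]
```

Both forms yield the same author elements in document order.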

42 Selection Return the titles of all books published before 1997 /bib/book[@year < "1997"]/title = TCP/IP Illustrated Advanced Programming in the Unix environment

43 Selection (cont.) Return the titles of all books published before 1997 /bib/book[@year < "1997"]/title = for $bk in /bib/book where $bk/@year < "1997" return $bk/title = TCP/IP Illustrated Advanced Programming in the Unix environment
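A Python sketch of the same selection follows. The `year` attribute values are assumed sample data and `titles_before` is a hypothetical helper. ElementTree's path predicates only support equality tests, so the `<` comparison is written as an ordinary filter, and the year is converted to an integer rather than compared as a string.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; the year attribute values are illustrative.
BIB = """<bib>
  <book year="1994"><title>TCP/IP Illustrated</title></book>
  <book year="1992"><title>Advanced Programming in the Unix environment</title></book>
  <book year="2000"><title>Data on the Web</title></book>
</bib>"""


def titles_before(root, year):
    """Analogue of /bib/book[@year < year]/title."""
    return [bk.findtext("title")
            for bk in root.findall("book")
            if int(bk.get("year")) < year]


early = titles_before(ET.fromstring(BIB), 1997)
```

String comparison against "1997" happens to order four-digit years correctly, but numeric comparison is the safer habit.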

44 Selection (cont.) Return book with the title “Data on the Web” /bib/book[title = "Data on the Web"] = Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers 39.95

45 Selection (cont.) Return the price of the book “Data on the Web” /bib/book[title = "Data on the Web"]/price = 39.95 How would you return the book with a price of $39.95?

46 Selection (cont.) Return the book with a price of $39.95 for $bk in /bib/book where $bk/price = " 39.95" return $bk = Data on the Web Abiteboul Serge Buneman Peter Suciu Dan Morgan Kaufmann Publishers 39.95
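Selection on an element value, as in the title and price queries above, maps onto the `[tag='text']` predicate that Python's ElementTree supports. The sketch below uses assumed sample data. The predicate compares the complete text of the child element, so surrounding whitespace in the data would have to appear in the comparison string as well.

```python
import xml.etree.ElementTree as ET

# Assumed sample data without surrounding whitespace inside the elements.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>39.95</price></book>
</bib>"""
root = ET.fromstring(BIB)

# Analogue of /bib/book[title = "Data on the Web"]/price
price = root.findtext("book[title='Data on the Web']/price")

# Analogue of: for $bk in /bib/book where $bk/price = "39.95" return $bk
cheap = root.findall("book[price='39.95']")
cheap_titles = [bk.findtext("title") for bk in cheap]
```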

47 Construction Return year and title of all books published before 1997 for $bk in /bib/book where $bk/@year < "1997" return { $bk/@year, $bk/title } = TCP/IP Illustrated Advanced Programming in the Unix environment

48 Grouping Return titles for each author for $author in distinct(/bib/book/author/last) return { /bib/book[author/last = $author]/title } = TCP/IP Illustrated Advanced Programming in the Unix environment Data on the Web …
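The grouping query can be pictured as building a map from each distinct author to the titles of that author's books. A minimal Python sketch, using an assumed cut-down bib.xml; `titles_by_author` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed, cut-down version of the bib.xml data.
BIB = """<bib>
  <book><title>TCP/IP Illustrated</title><author><last>Stevens</last></author></book>
  <book><title>Advanced Programming in the Unix environment</title><author><last>Stevens</last></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last></author>
    <author><last>Buneman</last></author>
    <author><last>Suciu</last></author></book>
</bib>"""


def titles_by_author(root):
    """Group book titles under each distinct author last name."""
    groups = {}
    for book in root.findall("book"):
        for last in book.findall("author/last"):
            groups.setdefault(last.text, []).append(book.findtext("title"))
    return groups


groups = titles_by_author(ET.fromstring(BIB))
```

The dictionary keys correspond to distinct(/bib/book/author/last), and each value to the inner path expression evaluated for that author.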

49 Join. Return the books that cost more at amazon than fatbrain. let $amazon := document("http://www.amazon.com/books.xml") let $fatbrain := document("http://www.fatbrain.com/books.xml") for $am in $amazon/books/book, $fat in $fatbrain/books/book where $am/isbn = $fat/isbn and $am/price > $fat/price return { $am/title, $am/price, $fat/price }

50 Example Query 1 { for $b in /bib/book where $b/publisher = "Addison-Wesley" and $b/@year > 1991 return { $b/title } } What does this do?

51 Result Query 1 TCP/IP Illustrated Advanced Programming in the Unix environment

52 Example Query 2 { for $b in document("http://www.bn.com/bib.xml")/bib/book, $t in $b/title, $a in $b/author return { $t } { $a } }

53 Result Query 2 TCP/IP Illustrated Stevens Advanced Programming in the Unix environment Stevens Data on the Web Abiteboul Data on the Web Buneman Data on the Web Suciu

54 Example Query 3 { for $b in document("http://www.bn.com/bib.xml")//book, $a in document("http://www.amazon.com/reviews.xml")//entry where $b/title = $a/title return { $b/title } { $a/price/text() } { $b/price/text() } }

55 Result Query 3 TCP/IP Illustrated 65.95 Advanced Programming in the Unix environment 65.95 Data on the Web 34.95 39.95

56 Example Query 4 { for $b in document("www.bn.com/bib.xml")//book where $b/publisher = "Addison-Wesley" and $b/@year > "1991" return { $b/@year } { $b/title } sortby (title) }

57 Example Result 4 Advanced Programming in the Unix environment TCP/IP Illustrated

58 Impact of XML on Integration If and when all sources accept Xqueries and exchange data in XML format, then –Mediator can accept user queries in Xquery –Access sources using Xquery –Get data back in XML format –Merge results and send to user in XML format How about now? –Sources can use XML adapters (middle-ware)

59 Is XML standardization a magical solution for integration? If all web sources standardize on XML format: –Source access (wrapper generation issues) becomes easier to manage –BUT all other problems remain: we still need to relate source (XML) schemas to the mediator (XML) schema; still need to reason about source overlap, source access limitations, etc.; still need to manage execution in the presence of source/network uncertainties

60 “Semantic Web” The LAV/GAV approaches assume that some human expert will do the actual schema mapping The “semantic-web” initiative attempts to automate schema mapping –Idea: Allow pages to write logical axioms relating their vocabulary (tags) to other external tags –Support automatic inference of relations between source and mediator schema using these rules DAML+OIL

68 Data Model

69 Which will have XML Syntax

70 Document Type Definition: DTD part of the original XML specification an XML document may have a DTD terminology for XML: –well-formed: if tags are correctly closed –valid: if it has a DTD and conforms to it validation is useful in data exchange

71 Notice that DTD is not In XML syntax… 

72 Two ways to specify a DTD. External: Hello, world! Internal: <!DOCTYPE greeting [ ]> Hello, world!

74 DTDs as Grammars <!DOCTYPE paper [ ]> …

77 Shortcomings of DTDs Useful for documents, but not so good for data: No support for structural re-use –Object-oriented-like structures aren’t supported No support for data types –Can’t do data validation Can have a single key item (ID), but: –No support for multi-attribute keys –No support for foreign keys (references to other keys) –No constraints on IDREFs (reference only a Section)

78 XML Schema In XML format Includes primitive data types (integers, strings, dates, etc.) Supports value-based constraints (integers > 100) User-definable structured types Inheritance (extension or restriction) Foreign keys Element-type reference constraints

79 XML Schemas DTD: Pre-specified tags How many different RDBMS Schemas are needed here?

80 Sample XML Schema …

81 .//person[@ssn] @ssn Subtyping in XML Schema

82 DTDs as Schemas. Not so well suited: they impose unwanted constraints on order; references cannot be constrained; they can be too vague. Union of schemas..?

83 XML Schemas recent proposal unifies previous schema proposals generalizes DTDs uses XML syntax two documents: structure and datatypes –http://www.w3.org/TR/xmlschema-1 –http://www.w3.org/TR/xmlschema-2

84 Although DB folks have several beefs: “Give me the names of people who are listed either as editor or author of a book”

86 Differences between XML and SSD. Pure SSD uses edge-labeled graphs as its data model. XML is ordered; SSD is not. XML can mix text and elements: Making Java easier to type and easier to type Phil Wadler. XML has lots of other stuff: entities, processing instructions, comments.

87 XML vs. standard semi-structured data models. XML (node labeling): a person node with children name (Alan), age (42), email (ab@com). Semi-structured data (edge labeling): { person: &o123 { name: “Alan”, age: 42, email: “ab@com” } } and, with a reference, { person: { father: &o123 … } }. The two are similar on trees and different on graphs.

89 XML seen from (R)DBMS world RDBMS may want to “publish” data in XML [provide an XML view of their data] –“Tagging” the output –Support XML-based querying (which are then converted to SQL querying) Single XML-QL query may correspond to a set of SQL queries –E.g. Schema queries SilkRoute, Xperanto systems –Support XML-based updating Tukwila RDBMS can be used to provide an efficient storage for XML files –Efficient indexing/retrieval of path expressions

90 Other Important XML Standards XSL/XSLT*: –presentation and transformation standards RDF: –resource description framework (meta-info such as ratings, categorizations, etc.) Xpath/Xpointer/Xlink*: –standard for linking to documents and elements within Namespaces: –for resolving name clashes DOM: –Document Object Model for manipulating XML documents SAX: –Simple API for XML parsing

91 RDF http://www.w3.org/TR/REC-rdf-syntax (2/99) purpose: metadata for Web –help search engines syntax in XML semantics: edge-labeled graphs

92 RDF Metadata standard birds, butterflies, snakes John Smith

93 More RDF Examples

95 RDF Terminology subject object predicate statement

96 More RDF: Containers bag, sequence, alternative s1 s2

97 RDF Containers (cont’d) Bag s1 s2 a rdf:type rdf_1 rdf_2

98 More RDF: Higher Order Statements “the author of www.thispage.com says: ‘the topic of www.thatpage.com is environment’ “ www.thatpage.com environment topic www.thispage.com says author RDF uses reification
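Reification, as used in the higher-order example above, turns a statement into a resource that other statements can talk about. The sketch below models triples as plain Python tuples; the `rdf:subject`, `rdf:predicate`, `rdf:object`, and `rdf:Statement` names are the standard RDF reification vocabulary, while the identifiers `_:s1` and `_:a` and the helper `reify` are assumptions made for illustration.

```python
def reify(statement_id, triple):
    """Describe `triple` as a resource using the RDF reification vocabulary."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


# The inner statement: the topic of www.thatpage.com is environment.
inner = ("www.thatpage.com", "topic", "environment")

# "_:s1" and "_:a" are made-up identifiers for the statement and the author.
graph = reify("_:s1", inner)
graph.append(("www.thispage.com", "author", "_:a"))   # the author of www.thispage.com
graph.append(("_:a", "says", "_:s1"))                 # ...says the reified statement
```

Note that the inner triple itself is not asserted in `graph`; only the claim that someone says it is recorded.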

100 XML Parsers traditional: return data structure (DOM?) event based: SAX (Simple API for XML) –http://www.megginson.com/SAX –write handler for start tag and for end tag
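The event-based style can be illustrated with Python's standard `xml.sax` module: the parser calls a handler at each start tag and end tag instead of building a tree. The handler class `TagLogger` and the sample document are illustrative assumptions.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Record one event per start tag and end tag, in the order they are parsed."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TagLogger()
xml.sax.parseString(b"<book><title>Data on the Web</title></book>", handler)
events = handler.events
```

Because nothing is retained beyond what the handler chooses to store, this style suits documents too large to hold in memory as a tree.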

101 Need for Ontology standardization

102 An XML data model does not exist. Document Object Model (DOM): –http://www.w3.org/TR/REC-DOM-Level-1 (10/98) –class hierarchy (node, element, attribute,…) –objects have behavior –defines API to inspect/modify the document
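As a small illustration of the DOM's inspect/modify API, here is a sketch using Python's standard `xml.dom.minidom`; the document content and the added `price` element are made-up sample data.

```python
from xml.dom import minidom

doc = minidom.parseString("<book><title>Data on the Web</title></book>")

# Inspect: navigate the node hierarchy (document -> element -> text node).
title = doc.getElementsByTagName("title")[0]
title_text = title.firstChild.data

# Modify: create new nodes through the document and attach them to the tree.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("39.95"))
doc.documentElement.appendChild(price)

serialized = doc.documentElement.toxml()
```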

108 Start of 4/9 lecture

109 Querying XML

110 XML Data Model (Graph) Issues: distinguish between attributes and sub-elements? Should we conserve order? Think of the labels as names of binary relations.

111 Need for XML querying human-readable documents to retrieve individual documents, to provide dynamic indexes, to perform context-sensitive searching, and to generate new documents. data-oriented documents to query (virtual) XML representations of databases, to transform data into new XML representations, and to integrate data from multiple heterogeneous data sources. mixed-model documents to perform queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.

112 Querying XML Requirements: –Query a graph, not a relation. –The result should be a graph (representing an XML document), not a relation. –No schema. –We may not know much about the data, so we need to navigate the XML.

113 W3C requirements. The W3C Query Working Group has identified many technical requirements: at least one XML syntax; at least one human-readable syntax; must be declarative; must be protocol independent; must respect XML data model; must be namespace aware; must coordinate with XML Schema; must work even if schemas are unavailable; must support simple and complex datatypes; must support universal and existential quantifiers; must support operations on hierarchy and sequence of document structures; must combine information from multiple documents; must support aggregation; must be able to transform and to create XML structures; must be able to traverse ID references.

114 Query Languages XML-QL: Invented by DB folks –XML-QL is relational-complete (allows Joins) also supports path expressions Can extract as well as transform data into different formats (like XSL) –XML-QL is not in XML syntax XSL: can also be seen as a query language –Can transform data

115 XML-QL data model XML-QL works on an abstraction, called an XML graph, of the concrete XML document: comments and processing instructions are ignored; the relative order of elements is ignored; every node has an ID (autogenerated, if necessary); all leaves are character data. XML graphs are obtained from XML documents but are also generated by queries. A graph is mapped back into an XML document by choosing arbitrary orderings of element sequences. This abstraction is very similar to that from tables to relations: disregard the order of tuples and attributes.
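The effect of this abstraction can be sketched in Python: two documents that differ only in sibling order and comments map to the same order-insensitive structure. The helper `as_graph` is a hypothetical illustration that covers only the “ignore order, ignore comments” part; it skips attributes and node IDs.

```python
import xml.etree.ElementTree as ET


def as_graph(elem):
    """Order-insensitive view of an element: (tag, text, sorted child views)."""
    children = sorted(as_graph(child) for child in elem)
    return (elem.tag, (elem.text or "").strip(), tuple(children))


# Hypothetical documents: same content, different sibling order, one comment.
doc_a = ET.fromstring("<book><!-- note --><title>T</title><author>A</author></book>")
doc_b = ET.fromstring("<book><author>A</author><title>T</title></book>")
doc_c = ET.fromstring("<book><author>B</author><title>T</title></book>")

same = as_graph(doc_a) == as_graph(doc_b)
different = as_graph(doc_a) != as_graph(doc_c)
```

ElementTree's default parser already discards comments and processing instructions, so only the ordering needs explicit handling here.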

116 Extracting Data by Query Matching data using elements patterns. WHERE Addison-Wesley $t $a IN “www.a.b.c/bib.xml” CONSTRUCT $a “where” clause only specifies What must be in the pattern --pattern can have other stuff besides what is listed in where

117 Constructing XML Data WHERE Addison-Wesley $t $a IN “www.a.b.c/bib.xml” CONSTRUCT $a $t

118 Grouping with Nested Queries WHERE $t, Addison-Wesley CONTENT_AS $p IN “www.a.b.c/bib.xml” CONSTRUCT $t WHERE $a IN $p CONSTRUCT $a

119 Joining Elements by Value (also integration). Find all articles whose writers also published a book after 1995. WHERE $f $l ELEMENT_AS $e IN “www.a.b.c/artbib.xml” $f $l IN “www.a.b.c/bookbib.xml”, y > 1995 CONSTRUCT $e. Multiple queries that share values.

120 Tag variables (schema queries) WHERE $t 1995 Smith IN "www.a.b.c/bib.xml", $e IN {author, editor} CONSTRUCT $t Smith $p matches book and article. $e matches author and editor. this saves us from writing four queries. This finds all publications in 1995 where Smith is either author or editor
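The idea of a tag variable, where the element name itself is matched by the query, can be sketched in Python by iterating over elements regardless of tag and testing the tag against a set. The sample publications and the helper `publications_by` are hypothetical; the element layout is an assumption, since the original markup is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography mixing books and articles.
PUBS = """<bib>
  <book><title>Book A</title><year>1995</year><author>Smith</author></book>
  <article><title>Article B</title><year>1995</year><editor>Smith</editor></article>
  <book><title>Book C</title><year>1995</year><author>Jones</author></book>
  <article><title>Article D</title><year>1994</year><author>Smith</author></article>
</bib>"""


def publications_by(root, person, year, roles=("author", "editor")):
    """Find publications of any tag where `person` appears under any tag in `roles`."""
    hits = []
    for pub in root:                                   # $p: matches book and article
        if pub.findtext("year") != year:
            continue
        if any(child.tag in roles and child.text == person for child in pub):  # $e
            hits.append((pub.tag, pub.findtext("title")))
    return hits


hits = publications_by(ET.fromstring(PUBS), "Smith", "1995")
```

One loop with two variable tags replaces the four fixed-tag queries (book/author, book/editor, article/author, article/editor) mentioned on the slide.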

121 Path Expressions WHERE $r Ford IN "www.a.b.c/parts.xml" CONSTRUCT $r WHERE $r IN "www.a.b.c/parts.xml" CONSTRUCT $r Matches any sequence of nodes all of which are labeled part (can substitute $ for part in the above…)

122 Due 30th April
