Presentation is loading. Please wait.

Presentation is loading. Please wait.

XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 7, 2015.

Similar presentations


Presentation on theme: "XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 7, 2015."— Presentation transcript:

1 XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 7, 2015

2 2 Readings & Reminders  Reminder: Homework 1 Milestone 2 due tonight @ 11:59PM  Homework 2 pre-release is now posted  XML, DTD, Schema  XPath  XSLT  For next week: Altinel & Franklin paper on XFilter

3 3 Sample XML Kurt P. Brown PRPL: A Database Workload Specification Language 1992 Univ. of Wisconsin-Madison Paul R. McJones The 1995 SQL Reunion Digital System Research Center Report SRC1997-018 1997 db/labs/dec/SRC1997-018.html http://www.mcjones.org/System_R/SQL_Reunion_95/

4 4 XML Data Model Visualized Root ?xml dblp mastersthesis article mdate key authortitleyearschool editortitleyearjournalvolumeee mdate key 2002… ms/Brown92 Kurt P…. PRPL… 1992 Univ…. 2002… tr/dec/… Paul R. The… Digital… SRC… 1997 db/labs/dec http://www. attribute root p-i element text

5 5 XML Isn’t Enough on Its Own It’s too unconstrained for many cases!  How will we know when we’re getting garbage?  How will we query?  How will we understand what we got?

6 6 Document Type Definitions (DTDs) DTD is an EBNF grammar defining XML structure  XML document specifies an associated DTD, plus the root element  DTD specifies children of the root (and so on) DTD defines special significance for attributes:  IDs – special attributes that are analogous to keys for elements  IDREFs – references to IDs  IDREFS – space-delimited list of IDREFs

7 7 An Example DTD Example DTD: <!ATTLIST mastersthesis(mdateCDATA#REQUIRED keyID#REQUIRED advisorCDATA#IMPLIED> … Example use of DTD in XML file: …

8 8 DTDs Are Very Limited DTDs capture grammatical structure, but have some drawbacks:  Only string scalar types  Global ID/reference space is inconvenient  No way of defining OO-like inheritance

9 9 XML Schema: DTDs Rethought Features:  XML syntax  Better way of defining keys using XPaths  Type subclassing  … And, of course, built-in datatypes

10 10 Basic Constructs of Schema Separation of elements (and attributes) from types:  complexType is a structured type  It can have sequences or choices  element and attribute have name and type  Elements may also have minOccurs and maxOccurs Subtyping, most commonly using: …

11 11 Simple Schema Example

12 12 Embedding XML Schema a But the XML parser is actually free to ignore this – the schema is typically specified “from outside” the document

13 13 Manipulating XML Sometimes:  Need to restructure an XML document  Or simply need to retrieve certain parts that satisfy a constraint, e.g.:  All books  All books by author XYZ

14 14 Document Object Model (DOM) vs. Queries  Build a DOM tree (as we saw earlier) and access via Java (etc.) DOMNode object  DOM objects have methods like “getFirstChild()”, “getNextSibling”  Common way of traversing the tree  Can also modify the DOM tree – alter the XML – via insertAfter(), etc.  Alternate approach: a query language  Define some sort of a template describing traversals from the root of the directed graph  In XML, the basis of this template is called an XPath  Can also declare some constraints on the values you want  The XPath returns a node set of matches

15 15 XPaths In its simplest form, an XPath is like a path in a file system: /mypath/subpath/*/morepath  The XPath returns a node set representing the XML nodes (and their subtrees) at the end of the path  XPaths can have node tests at the end, returning only particular node types, e.g., text(), processing-instruction(), comment(), element(), attribute()  XPath is fundamentally an ordered language: it can query in order-aware fashion, and it returns nodes in order

16 16 Sample XML Kurt P. Brown PRPL: A Database Workload Specification Language 1992 Univ. of Wisconsin-Madison Paul R. McJones The 1995 SQL Reunion Digital System Research Center Report SRC1997-018 1997 db/labs/dec/SRC1997-018.html http://www.mcjones.org/System_R/SQL_Reunion_95/

17 17 XML Data Model Visualized Root ?xml dblp mastersthesis article mdate key authortitleyearschool editortitleyearjournalvolumeee mdate key 2002… ms/Brown92 Kurt P…. PRPL… 1992 Univ…. 2002… tr/dec/… Paul R. The… Digital… SRC… 1997 db/labs/dec http://www. attribute root p-i element text

18 18 Some Example XPath Queries  /dblp/mastersthesis/title  /dblp/*/editor  //title  //title/text()

19 19 Context Nodes and Relative Paths XPath has a notion of a context node: it’s analogous to a current directory  “.” represents this context node  “..” represents the parent node  We can express relative paths: subpath/sub-subpath/../.. gets us back to the context node  By default, the document root is the context node

20 20 Predicates – Filtering Operations A predicate allows us to filter the node set based on selection- like conditions over sub-XPaths: /dblp/article[title = “Paper1”] which is equivalent to: /dblp/article[./title/text() = “Paper1”] because of type coercion. What does this do: /dblp/article[@key = “123” and./title/text() = “Paper1” and./author/*/element()]

21 21 Axes: More Complex Traversals Thus far, we’ve seen XPath expressions that go down the tree (and up one step)  But we might want to go up, left, right, etc.  These are expressed with so-called axes:  self::path-step  child::path-stepparent::path-step  descendant::path-stepancestor::path-step  descendant-or-self::path-stepancestor-or-self::path-step  preceding-sibling::path-stepfollowing-sibling::path-step  preceding::path-stepfollowing::path-step  The previous XPaths we saw were in “abbreviated form”

22 22 Users of XPath  XML Schema uses simple XPaths in defining keys and uniqueness constraints  XLink and XPointer, hyperlinks for XML  XSLT – useful for converting from XML to other representations (e.g., HTML, PDF, SVG)  XQuery – useful for restructuring an XML document or combining multiple documents  Might well turn into the “glue” between Web Services, etc.

23 23 A Functional Language for XML  XSLT is based on a series of templates that match different parts of an XML document  There’s a policy for what rule or template is applied if more than one matches (it’s not what you’d think!)  XSLT templates can invoke other templates  XSLT templates can be nonterminating (beware!)  XSLT templates are based on XPath “match”es, and we can also apply other templates (potentially to “select”ed XPaths)  Within each template, directly describe what should be output

24 24 An XSLT Template  An XML document itself  XML tags create output OR are XSL operations  All XSL tags are prefixed with “xsl” namespace  All non-XSL tags are part of the XML output  Common XSL operations:  template with a match XPath  Recursive call to apply-templates, which may also select where it should be applied  Attach to XML document with a processing-instruction: http://www.com/my.xsl

25 25 An Example XSLT Stylesheet This is DBLP …

26 26 XSLT Processing Model  List of source nodes  result tree fragment(s)  Start with root  Find all template rules with matching patterns from root  Find “best” match according to some heuristics  Set the current node list to be the set of things it maches  Iterate over each node in the current node list  Apply the operations of the template  “Append” the results of the matching template rule to the result tree structure  Repeat recursively if specified to by apply-templates

27 27 What If There’s More than One Match?  Eliminate rules of lower precedence due to importing  Break a rule into any | branches and consider separately  Choose rule with highest computed or specified priority  Simple rules for computing priority based on “precision”:  QName preceded by XPath child/axis specifier: priority 0  NCName preceded by child/axis specifier: priority -0.25  NodeTest preceded by child/axis specifier: pririty -0.5  else priority 0.5

28 28 Other Common Operations  Iteration:  Conditionals:  Copying current node and children to the result set:

29 29 Creating Output Nodes  Return text/attribute data (this is a default rule):  Create an element from text (attribute is similar):  Copy nodes matching a path

30 30 Embedding Stylesheets  You can “import” or “include” one stylesheet from another: http://www.com/my.xsl http://www.com/my.xsl  “Include”: the rules get same precedence as in including template  “Import”: the rules are given lower precedence

31 31 XSLT Summary  A very powerful, template-based transformation language for XML document  other structured document  Commonly used to convert XML  PDF, SVG, GraphViz DOT format, HTML, WML, …  Primarily useful for presentation of XML or for very simple conversions  But sometimes we need more complex operations when converting data from one source to another  Joins – combining and correlating information from multiple sources  Aggregation – computing averages, counts, etc.

32 32 XSLT and Alternatives XSLT is focused on reformatting documents  Stylesheets are focused around one XML file  XML file must reference the stylesheet What if we want to:  Manage and combine collections of XML documents?  Make Web service requests for XML?  “Glue together” different Web service requests?  Query for keywords within documents, with ranked answers  This is where XQuery plays a role – see CIS 330 / 550 for details


Download ppt "XML Schemas and Queries Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems August 7, 2015."

Similar presentations


Ads by Google