© 2016 A. Haeberlen, Z. Ives CIS 455/555: Internet and Web Systems 1 University of Pennsylvania XML (continued) February 10, 2016.

1 © 2016 A. Haeberlen, Z. Ives CIS 455/555: Internet and Web Systems 1 University of Pennsylvania XML (continued) February 10, 2016

2 Announcements
HW1 MS1 is due TODAY
Reminder: No late submissions accepted without an extension
  To get an extension, use the web interface at https://alliance.seas.upenn.edu/~cis455/cgi-bin/submit.php
  Jokers must be used before the assignment is due (but it is OK to spend additional jokers to 'extend an extension')
Submission is via the web interface, as with HW0
Please test your solution carefully before submitting!
  At the very least, you should complete all the tests in the BTG handout
Reading: XSLT tutorial, http://www.w3schools.com/xsl/

3 Some advice on MS2
START NOW!
  Some time will be needed for testing and debugging
  Some features may be trickier than you think
FIX MS1 PROBLEMS FIRST
  MS2 will build on MS1
  It's just like building a house: The foundation has to be solid!
TEST EXTENSIVELY
  Not just with Firefox; also use apachebench and curl
  We'll try to make some testing guidelines available on the course webpage

4 Plan for today
Data interchange
  Extensible Markup Language (XML)
DTDs and XML Schema; DOM (NEXT)
  Document Type Definitions (DTDs)
  XML Schema
  Document Object Model (DOM)
XPath
  Query examples
  Axes
XSLT

5 Recap: DTDs
Why do we need them?
What do they check?
What are their limitations?

6 XML Schema: DTDs rethought
Features:
  XML syntax
  Better way of defining keys, using XPaths
  Subtyping
  Namespaces
  ... and, of course, built-in datatypes

7 Example XML Schema
[Schema listing not preserved; source: http://en.wikipedia.org/wiki/XML_Schema_%28W3C%29]
Callouts on the listing:
  Root of every XML Schema
  Structured type
  Elements can have minOccurs, maxOccurs
  Actual data types

8 Basic constructs of Schema
Separation of elements (and attributes) from types:
  complexType is a structured type, which can have sequences or choices
    Sequence: All elements in the sequence must be present, in that order
    Choice: Exactly one of the elements must be present
  element and attribute have name and type; elements may also have minOccurs and maxOccurs
Subtyping, most commonly using ...
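
Sketch (not from the slides): a rough illustration of what 'sequence' and 'choice' require of an element's children. This is not a schema validator, and the Python standard library does not ship one. The function names, the spec format, and the firstname/lastname/phone/email elements are all made up for illustration.

```python
import xml.etree.ElementTree as ET

def matches_sequence(parent, spec):
    """Check the children of `parent` against a sequence.

    `spec` is a list of (tag, min_occurs, max_occurs) in the required order.
    Simplification: adjacent entries are assumed to have different tags.
    """
    tags = [child.tag for child in parent]
    pos = 0
    for tag, min_occurs, max_occurs in spec:
        count = 0
        while pos < len(tags) and tags[pos] == tag and count < max_occurs:
            pos += 1
            count += 1
        if count < min_occurs:
            return False
    return pos == len(tags)  # no unexpected children may be left over

def matches_choice(parent, options):
    """Check that `parent` has exactly one child, drawn from `options`."""
    tags = [child.tag for child in parent]
    return len(tags) == 1 and tags[0] in options

# Hypothetical type: firstname, then lastname, then up to two phone elements.
PERSON_SPEC = [("firstname", 1, 1), ("lastname", 1, 1), ("phone", 0, 2)]

good = ET.fromstring(
    "<personinfo><firstname>A</firstname><lastname>B</lastname></personinfo>")
swapped = ET.fromstring(
    "<personinfo><lastname>B</lastname><firstname>A</firstname></personinfo>")
contact = ET.fromstring("<contact><email>a@example.org</email></contact>")

print(matches_sequence(good, PERSON_SPEC))          # order respected
print(matches_sequence(swapped, PERSON_SPEC))       # order violated
print(matches_choice(contact, {"phone", "email"}))  # exactly one alternative
```

The greedy matching is enough for this illustration; a real validator has to handle nested groups, repeated tags, and types as well.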

9 Some more examples
[Schema listings not preserved; source: http://www.w3schools.com/schema/]
Callout: Adds three elements to 'personinfo'

10 Designing an XML schema or DTD
Often we are given an existing DTD or schema
  Example: HTML DTD
If not, we need to design one
  What would be a good approach?
Idea: Orient the XML tree around the 'central' objects in the application of interest
  We've already discussed this in the context of mapping data structures to XML; XML Schema can specify such a mapping

11 Manipulating XML documents
Typical tasks:
  Restructure an XML document
  Add/remove/modify elements
    Example: Dynamically changing a document in response to inputs
  Retrieve certain elements that satisfy some constraint
    Examples: All books, all addresses in New Hampshire
How do we do this in a program?
  Need an interface that allows programs and scripts to dynamically access and update the content, structure, and style of documents
Solution: The Document Object Model (DOM)

12 The Document Object Model
Document components are represented by objects
Objects have methods like getFirstChild(), getNextSibling(), ... that can be used to traverse the tree
Can also modify the tree, and thus alter the XML, via insertBefore(), appendChild(), etc.
[Figure: an XML parser turns the XML document into a tree. The root has a processing instruction (?xml) and the element dblp. dblp has the children mastersthesis (attributes mdate, key; elements author, title, year, school) and article (attributes mdate, key; elements editor, title, year, journal, volume, ee). Each element has a text child. Legend: root, p-i, element, attribute, text.]
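
Sketch (not from the slides): DOM-style traversal and modification with Python's xml.dom.minidom. Python exposes firstChild and nextSibling as attributes rather than the getFirstChild()/getNextSibling() methods named above. The document is a cut-down version of the DBLP sample, and the helper name child_element_names is made up.

```python
from xml.dom import minidom

# No whitespace between tags, so every child node here is an element.
DOC = (
    "<dblp>"
    "<mastersthesis key='ms/Brown92'>"
    "<author>Kurt P. Brown</author>"
    "<year>1992</year>"
    "</mastersthesis>"
    "</dblp>"
)

def child_element_names(node):
    """Walk the children of `node` using firstChild / nextSibling."""
    names = []
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.tagName)
        child = child.nextSibling
    return names

dom = minidom.parseString(DOC)
thesis = dom.documentElement.firstChild  # the <mastersthesis> element
before = child_element_names(thesis)

# Modify the tree: build a <title> element and insert it before <year>.
title = dom.createElement("title")
title.appendChild(
    dom.createTextNode("PRPL: A Database Workload Specification Language"))
year = thesis.getElementsByTagName("year")[0]
thesis.insertBefore(title, year)
after = child_element_names(thesis)

print(before)
print(after)
```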

13 Isn't there an easier way?
What if we want to find all the author nodes, or all the title nodes that contain 'scalable'?
  Coding this manually can be quite cumbersome: need to traverse the entire tree, keep track of conditions, ...
Alternative: A query language
  Idea: Declaratively describe the nodes we're interested in, and let a query engine do all the hard work
  This can be done with XPath
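
Sketch (not from the slides): the same search written twice, once as a manual traversal and once with a path expression. The document, its titles, and the function names are invented for illustration. Python's ElementTree supports only a subset of XPath, so the 'contains' test is still done in Python; a full XPath engine could express it as //title[contains(., 'scalable')].

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only the element names follow the DBLP sample.
DOC = """
<dblp>
  <article>
    <author>A. Author</author>
    <title>A scalable web server</title>
  </article>
  <mastersthesis>
    <author>B. Author</author>
    <title>Notes on XML</title>
  </mastersthesis>
</dblp>
"""

def titles_containing_manual(node, word, found=None):
    """Manual version: visit every element and test each one by hand."""
    if found is None:
        found = []
    if node.tag == "title" and word in (node.text or ""):
        found.append(node.text)
    for child in node:
        titles_containing_manual(child, word, found)
    return found

def titles_containing_query(root, word):
    """Query version: './/title' lets the library do the traversal."""
    return [t.text for t in root.findall(".//title") if word in (t.text or "")]

root = ET.fromstring(DOC)
manual = titles_containing_manual(root, "scalable")
query = titles_containing_query(root, "scalable")
print(manual)
print(query)
```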

14 Recap: DTDs, XML Schema, DOM
Document Type Definitions (DTDs)
  An EBNF grammar that defines the structure of an XML document
  Special support for IDs and references
  Several limitations, e.g., no proper data types, no subtypes
XML Schema
  More expressive than DTDs; itself an XML document
  'Real' data types, subtyping, ...
Document Object Model (DOM)
  An interface for accessing/changing XML data from programs
  Document components are represented by a tree of objects

15 Problem: Presentation
What does this look like in the browser?
[XML listing, markup not preserved: http://www.cis.upenn.edu/~cis455/demo/example1.xml, with the student records "Benjamin Franklin, 0001, A+" and "George Washington, 0003, C-"]

16 Solution: XSLT
Let's add a style sheet (analogous to CSS)
[XML listing, markup not preserved: http://www.cis.upenn.edu/~cis455/demo/example2.xml, with the student records "Benjamin Franklin, 0001, A+" and "George Washington, 0003, C-"]

17 Solution: XSLT
[Stylesheet listing, markup not preserved; its literal output text is "CIS455/555 grades" and the column headers "Student", "PennID", "Grade"]
Callouts on the listing:
  Look for 'course' elements at the top level, and output the following for each
  For each 'student' element within the 'grades' element...
  Fill in the value of the 'grade' element here

18 Plan for today
Data interchange
  Extensible Markup Language (XML)
DTDs and XML Schema; DOM
  Document Type Definitions (DTDs)
  XML Schema
  Document Object Model (DOM)
XPath (NEXT)
  Query examples
  Axes
XSLT

19 XPaths
What is an XPath?
In its simplest form, like a path in a file system:
  /mypath/subpath/*/morepath
The XPath returns a node set, representing the XML nodes (and their subtrees) at the end of the path
XPaths can have node tests at the end, returning only specific node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
XPath is fundamentally an ordered language: it can query in an order-aware fashion, and it returns nodes in order
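
Sketch (not from the slides): evaluating the path /mypath/subpath/*/morepath with Python's ElementTree. The document is invented to fit the path. ElementTree paths are relative to the element they are called on, so the absolute path becomes ./subpath/*/morepath when evaluated at the mypath root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped to match the path from the slide.
DOC = """
<mypath>
  <subpath>
    <a><morepath>first</morepath></a>
    <b><morepath>second</morepath><other>skipped</other></b>
  </subpath>
  <subpath>
    <c><morepath>third</morepath></c>
  </subpath>
</mypath>
"""

root = ET.fromstring(DOC)  # the <mypath> element

# '*' matches any element at that level (here: a, b, and c).
nodes = root.findall("./subpath/*/morepath")
texts = [n.text for n in nodes]
print(texts)  # matches come back in document order
```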

20 Sample XML for our XPath examples
[XML listing, markup not preserved. Two DBLP records:]
  Kurt P. Brown; PRPL: A Database Workload Specification Language; 1992; Univ. of Wisconsin-Madison
  Paul R. McJones; The 1995 SQL Reunion; Digital System Research Center Report; SRC1997-018; 1997; db/labs/dec/SRC1997-018.html; http://www.mcjones.org/System_R/SQL_Reunion_95/
Side note: DBLP provides bibliographic information on major computer science journals and proceedings

21 Visualization
[Figure: the tree of the sample XML, as on slide 12, with one element highlighted]
Which XPath query returns this element?

22 Some XPath query examples
XPath queries can be relative or absolute
  /dblp/mastersthesis/title     Absolute (starts with a /)
Wildcards can be used
  /dblp/*/editor                All editors
  /*/*/ee                       All ee's with exactly 2 ancestors
Special syntax for selecting all elements
  //title                       All title elements
  //article/volume              All 'volume' parts of articles
Attributes are specified with @
  //@id                         All 'id' attributes
  //Journal[@issn]              All journals with ISSN attr
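
Sketch (not from the slides): several of the queries above, run with Python's ElementTree against a reconstruction of the sample XML. The element nesting is an assumption based on the tree figure; the mdate attributes and the article's key are left out because their values are truncated in the transcript. ElementTree evaluates paths relative to an element, so /dblp/... becomes ./... at the dblp root, and it cannot return attribute nodes, so the //@key case is collected by hand.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the DBLP sample (attributes partly omitted).
DOC = """
<dblp>
  <mastersthesis key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article>
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
</dblp>
"""

root = ET.fromstring(DOC)  # the <dblp> element

def texts(path):
    """Text content of every element matched by `path`, relative to <dblp>."""
    return [e.text for e in root.findall(path)]

thesis_titles = texts("./mastersthesis/title")  # /dblp/mastersthesis/title
editors = texts("./*/editor")                   # /dblp/*/editor
ees = texts("./*/ee")                           # /*/*/ee
all_titles = texts(".//title")                  # //title
volumes = texts(".//article/volume")            # //article/volume
with_key = [e.tag for e in root.findall(".//*[@key]")]  # //*[@key]

# //@key selects attribute nodes; emulate that by collecting the values.
keys = [e.get("key") for e in root.iter() if "key" in e.attrib]

print(all_titles)
print(keys)
```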

23 More XPath query examples
Square brackets to further specify elements
  /article/journal[1]            First journal child of article
  /article/journal[last()]       Last journal child of article
Predicates can filter the nodes that are returned
  //journal[@issn='123']         All journals with this issn
count() counts selected elements
  //*[count(ee)=2]               All elements with 2 ee children
  //*[count(*)=3]                All elements with 3 children
Other functions available
  //*[contains(name(),'ACM')]    All elements whose name contains ACM
  //*[string-length(name())>3]   Elements whose name has more than 3 characters
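
Sketch (not from the slides): position and attribute predicates with Python's ElementTree, on an invented article element. ElementTree understands [1], [last()], and [@attr='value'], but has no count(), contains(), or string-length(), so those queries are emulated with ordinary Python filters.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: journal names, ISSNs, and links are made up.
DOC = """
<article>
  <journal issn="123">First Journal</journal>
  <journal issn="456">Second Journal</journal>
  <ee>link-1</ee>
  <ee>link-2</ee>
</article>
"""

root = ET.fromstring(DOC)  # the <article> element

first = root.find("./journal[1]").text      # /article/journal[1]
last = root.find("./journal[last()]").text  # /article/journal[last()]
by_issn = [j.text for j in root.findall(".//journal[@issn='123']")]

# //*[count(ee)=2]: elements with exactly two ee children.
two_ee = [e.tag for e in root.iter() if len(e.findall("ee")) == 2]

# //*[string-length(name())>3]: elements whose name is longer than 3 chars.
long_names = [e.tag for e in root.iter() if len(e.tag) > 3]

print(first, "|", last)
print(by_issn, two_ee, long_names)
```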

24 Context nodes and relative paths
XPath has a notion of a context node
  Analogous to the current working directory under Unix
  XPath is evaluated relative to that specific node; defaults to the document root
  '.' represents the context node
  '..' represents the parent node
We can express relative paths:
  foo/bar/../.. gets us back to the context node
Example: Suppose we are at the 'author' child of the mastersthesis node, and we want to query the title
  The relative path ../title does this: up to mastersthesis, then down to its title child
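
Sketch (not from the slides): relative paths with '.' and '..' in Python's ElementTree, on a cut-down version of the sample XML. One ElementTree-specific limitation shows up here: its elements keep no parent pointers, so '..' cannot climb above the element the query was started on. A full XPath engine has no such restriction.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<dblp><mastersthesis key='ms/Brown92'>"
    "<author>Kurt P. Brown</author>"
    "<title>PRPL: A Database Workload Specification Language</title>"
    "</mastersthesis></dblp>"
)

root = ET.fromstring(DOC)
thesis = root.find("./mastersthesis")  # use <mastersthesis> as context node

# Down to author, back up with '..', then down to title.
title = thesis.find("./author/../title")

# 'author/..' leads back to the context node itself.
back = thesis.find("./author/..")

# ElementTree limitation: starting at <author>, '..' would have to leave
# the subtree the query started in, so nothing is found.
author = thesis.find("./author")
unreachable = author.find("../title")

print(title.text)
print(back is thesis, unreachable)
```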

25 More complex traversals with axes
So far, we have seen XPath queries that go down the tree (and up one step)
  But we can go up, left, right, etc.
  This is expressed with so-called axes
    self::path-step
    child::path-step                 parent::path-step
    descendant::path-step            ancestor::path-step
    descendant-or-self::path-step    ancestor-or-self::path-step
    preceding-sibling::path-step     following-sibling::path-step
    preceding::path-step             following::path-step
  Callouts: ancestor selects all ancestors (parent, grandparent, ...) of the current node; following selects everything after the closing tag of the current node
The XPaths we've seen so far were in 'abbreviated form':
  /AAA/BBB is equivalent to /child::AAA/child::BBB
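
Sketch (not from the slides): three of the axes, written out by hand with Python's xml.dom.minidom, whose nodes keep parent and sibling links. The helper names are made up, and each helper returns element nodes only, which corresponds to ancestor::*, following-sibling::*, and preceding-sibling::*.

```python
from xml.dom import minidom

DOC = (
    "<dblp><article>"
    "<editor>Paul R. McJones</editor>"
    "<title>The 1995 SQL Reunion</title>"
    "<year>1997</year>"
    "</article></dblp>"
)

def ancestors(node):
    """ancestor::* : parent, grandparent, ... (element nodes only)."""
    result = []
    parent = node.parentNode
    while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
        result.append(parent)
        parent = parent.parentNode
    return result

def following_siblings(node):
    """following-sibling::* : later children of the same parent."""
    result = []
    sibling = node.nextSibling
    while sibling is not None:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            result.append(sibling)
        sibling = sibling.nextSibling
    return result

def preceding_siblings(node):
    """preceding-sibling::* : earlier children, nearest first."""
    result = []
    sibling = node.previousSibling
    while sibling is not None:
        if sibling.nodeType == sibling.ELEMENT_NODE:
            result.append(sibling)
        sibling = sibling.previousSibling
    return result

dom = minidom.parseString(DOC)
title = dom.getElementsByTagName("title")[0]  # context node

up = [n.tagName for n in ancestors(title)]
right = [n.tagName for n in following_siblings(title)]
left = [n.tagName for n in preceding_siblings(title)]
print(up, right, left)
```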

26 Users of XPath
XML Schema uses simple XPaths in defining keys and uniqueness constraints
XLink and XPointer, hyperlinks for XML
XQuery: useful for restructuring an XML document or combining multiple documents
XSLT: useful for converting from XML to other representations (e.g., HTML, PDF, SVG); coming up next

27 Recap: XPath
A query language for XML
Queries select a set of nodes
  Some selection criteria: Type of node, attributes, position in the tree (relative or absolute), number of children, ...
Can traverse the tree in all directions: up, down, left, right
A building block in many other technologies, e.g., XSLT

28 Plan for today
Data interchange
  Extensible Markup Language (XML)
DTDs and XML Schema; DOM
XPath
  Query examples
  Axes
XSLT (NEXT)
  Templates
  Processing model
  Common operations

29 Extensible Stylesheet Language Transformations (XSLT)
What if we have an XML document, but want...
  ... an XML document in a different schema?
  ... an HTML document for displaying?
  ... a PDF, PostScript, or PNG with the same content?
Solution: XSLT
[Figure: an XSLT processor takes an XML document (Schema A) and an XSLT stylesheet, and produces an XML document (Schema B), a PDF document, or an HTML document]

30 A functional language for XML
XSLT is based on a series of templates that match different parts of an XML document
Functional model
  XSLT templates can invoke other templates
  XSLT templates can be nonterminating
  Beware! It is not difficult to shoot yourself in the foot!
XSLT templates are based on XPath matches
  We can also apply other templates, potentially to 'select'ed XPaths
  Within each template, directly describe what should be output

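31 A simple example
[Listings not preserved; source: http://en.wikipedia.org/wiki/XSLT]
What we have: an XML document whose text content is "Andreas Haeberlen" and "Zachary G. Ives"
What we want: an XML document whose text content is "Haeberlen" and "Ives"
An XSLT processor turns the first into the second; the stylesheet selects the parts to keep with XPath matches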

32 An XSLT Template
Itself an XML document
XML tags create output OR are XSL operations
  All XSL tags are prefixed with the "xsl" namespace
  All non-XSL tags are part of the XML output
Common XSL operations:
  template with a match XPath
  Recursive call to apply-templates, which may also select where it should be applied
Attach to XML document with a processing instruction (xml-stylesheet) that points to the stylesheet, e.g., http://www.com/my.xsl

33 XSLT processing model
The XSLT processor has:
  Input tree, current node list, result tree
Processing is as follows:
  Do a preorder traversal of the input tree until some node is found to match one of the templates
    If there is more than one template that matches, pick the 'best' one according to certain heuristics
  Set current node list to be the nodes the template matches
  Iterate over the nodes in the current node list
    For each, apply the operations in the template and 'append' the output to the result tree
    If specified by apply-templates, repeat recursively
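
Sketch (not from the slides): a toy Python version of the processing model, not an XSLT processor. Templates are plain functions keyed by element name, so 'pick the best match' shrinks to a dictionary lookup. Elements without a template fall back to a default rule that mimics XSLT's built-in behaviour of passing text through and recursing into children. All function names are made up, and the output is not escaped.

```python
import xml.etree.ElementTree as ET

DOC = """
<dblp>
  <mastersthesis>
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
  </mastersthesis>
  <article>
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
  </article>
</dblp>
"""

def default_rule(node, templates, out):
    """No template matched: copy text through, recurse into children."""
    if node.text and node.text.strip():
        out.append(node.text.strip())
    for child in node:
        apply_templates(child, templates, out)

def apply_templates(node, templates, out):
    """Run the template for `node`, appending its output to `out`."""
    template = templates.get(node.tag, default_rule)
    template(node, templates, out)

def thesis_template(node, templates, out):
    out.append("<li>")
    # Like apply-templates with select="title": recurse on chosen nodes only.
    for child in node.findall("title"):
        apply_templates(child, templates, out)
    out.append("</li>")

def title_template(node, templates, out):
    out.append("<b>" + node.text + "</b>")

TEMPLATES = {"mastersthesis": thesis_template, "title": title_template}

result = []
apply_templates(ET.fromstring(DOC), TEMPLATES, result)
output = "".join(result)
print(output)
```

The editor's name appears in the output even though no template mentions it: the default rule passes unmatched text through, which is a common surprise in real stylesheets too.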

34 Other common operations
[Code listings not preserved; the slide gave one example for each of the following]
Iteration
Conditionals
Copying current node and children to the result set

35 Creating output nodes
[Code listings not preserved; the slide gave one example for each of the following]
Return text/attribute data (this is a default rule)
Create an element from text (attribute is similar)
Copy nodes matching a path

36 XSLT example
[XML listing, markup not preserved: the DBLP sample from slide 20, i.e., the Kurt P. Brown master's thesis record (it appears twice in the transcript) and the Paul R. McJones article record, followed by "..."]

37 XSLT example
Goal: Display a list of master's theses
  Authors in red
  Titles in green
[Stylesheet listing, markup not preserved; its literal output text is "Master's theses"]
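
Sketch (not from the slides): the stylesheet itself is lost, and the Python standard library has no XSLT processor, so this shows the same goal written directly in Python with ElementTree. The HTML layout (a heading plus a bulleted list with coloured spans) is an assumption; only the heading text and the colour scheme come from the slide.

```python
import xml.etree.ElementTree as ET

DOC = """
<dblp>
  <mastersthesis>
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
  </mastersthesis>
  <article>
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
  </article>
</dblp>
"""

root = ET.fromstring(DOC)

# Build the result tree: one list item per master's thesis.
html = ET.Element("html")
body = ET.SubElement(html, "body")
ET.SubElement(body, "h1").text = "Master's theses"
listing = ET.SubElement(body, "ul")

for thesis in root.findall("./mastersthesis"):  # articles are skipped
    item = ET.SubElement(listing, "li")
    author = ET.SubElement(item, "span", style="color:red")
    author.text = thesis.findtext("author")
    author.tail = ": "  # plain text between the two spans
    title = ET.SubElement(item, "span", style="color:green")
    title.text = thesis.findtext("title")

result = ET.tostring(html, encoding="unicode")
print(result)
```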

38 XSLT Summary
A very powerful, template-based transformation language for XML document → other structured document
  Commonly used to convert XML → PDF, SVG, GraphViz DOT format, HTML, WML, ... (e.g., via XSL-FO)
Primarily useful for presentation of XML or for very simple conversions
But sometimes we need more complex operations when converting data from one source to another
  Joins: combine and correlate information from multiple sources
  Aggregation: computing averages, counts, etc.
  More details in CIS330/550

39 Plan for today
Data interchange
  Extensible Markup Language (XML)
DTDs and XML Schema; DOM
XPath
  Query examples
  Axes
XSLT
  Templates
  Processing model
  Common operations

