CIS 455/555: Internet and Web Systems, University of Pennsylvania
XML (continued), February 10, 2016
© 2016 A. Haeberlen, Z. Ives

Announcements
- HW1 MS1 is due TODAY
  - Reminder: no late submissions are accepted without an extension
  - To get an extension, use the web interface. Jokers must be used before the assignment is due (but it is OK to spend additional jokers to 'extend an extension')
- Submission is via the web interface, as with HW0
- Please test your solution carefully before submitting! At the very least, you should complete all the tests in the BTG handout
- Reading: XSLT tutorial

Some advice on MS2
- START NOW! Testing and debugging will take time, and some features may be trickier than you think
- FIX MS1 PROBLEMS FIRST: MS2 builds on MS1. It's just like building a house: the foundation has to be solid!
- TEST EXTENSIVELY: not just with Firefox; also use apachebench and curl. We'll try to make some testing guidelines available on the course webpage

Plan for today
- Data interchange
- Extensible Markup Language (XML)
- DTDs and XML Schema; DOM (next)
  - Document Type Definitions (DTDs)
  - XML Schema
  - Document Object Model (DOM)
- XPath
  - Query examples
  - Axes
- XSLT

Recap: DTDs
- Why do we need them?
- What do they check?
- What are their limitations?

XML Schema: DTDs rethought
Features:
- XML syntax
- Better way of defining keys using XPaths
- Subtyping
- Namespaces
- ... and, of course, built-in datatypes

Example XML Schema
[Figure: an example XML Schema document, annotated with: root of every XML Schema; structured type; actual data types; elements can have minOccurs, maxOccurs]

Basic constructs of Schema
- Separation of elements (and attributes) from types:
  - complexType is a structured type, which can contain sequences or choices
    - Sequence: the elements in the sequence must be present, in that order (as often as their minOccurs/maxOccurs allow)
    - Choice: exactly one of the alternative elements is present
  - element and attribute have a name and a type; elements may also have minOccurs and maxOccurs
- Subtyping, most commonly using ...
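
To make the sequence, choice, and minOccurs/maxOccurs rules concrete, here is a minimal Python sketch (standard library only) that checks an element's children against a simplified content model. The tuple encoding, the helper names `matches_sequence` and `matches_choice`, and the `person`/`contact` elements are hypothetical illustrations; this is not how a real XML Schema validator works, and it handles only flat lists of element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of a content model (not XML Schema syntax):
# a sequence is a list of (element name, minOccurs, maxOccurs) triples,
# where maxOccurs=None means "unbounded".
def matches_sequence(element, model):
    """True if the children appear in model order, each within its bounds."""
    children = [child.tag for child in element]
    pos = 0
    for name, min_occurs, max_occurs in model:
        count = 0
        # Consume the run of consecutive children with this name.
        while pos < len(children) and children[pos] == name:
            count += 1
            pos += 1
        if count < min_occurs or (max_occurs is not None and count > max_occurs):
            return False
    # Any leftover child is not allowed by the sequence.
    return pos == len(children)

def matches_choice(element, names):
    """True if there is exactly one child and it is one of the alternatives."""
    children = [child.tag for child in element]
    return len(children) == 1 and children[0] in names

person_model = [("firstname", 1, 1), ("lastname", 1, 1), ("email", 0, None)]

ok = ET.fromstring("<person><firstname>Benjamin</firstname><lastname>Franklin</lastname>"
                   "<email>a@example.org</email><email>b@example.org</email></person>")
wrong_order = ET.fromstring("<person><lastname>Franklin</lastname>"
                            "<firstname>Benjamin</firstname></person>")
contact = ET.fromstring("<contact><phone>555-0100</phone></contact>")

print(matches_sequence(ok, person_model))           # True
print(matches_sequence(wrong_order, person_model))  # False
print(matches_choice(contact, {"phone", "email"}))  # True
```

The greedy left-to-right scan is enough here because every particle in this toy model names a different element; a real validator also handles nested groups, attributes, and data types.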

Some more examples
[Figure: further XML Schema examples, with the annotation: adds three elements to 'personinfo']

Designing an XML schema or DTD
- Often we are given an existing DTD or schema (example: the HTML DTD)
- If not, we need to design one. What would be a good approach?
- Idea: orient the XML tree around the 'central' objects in the application of interest
- We've already discussed this in the context of mapping data structures to XML; an XML schema can specify such a mapping

Manipulating XML documents
Typical tasks:
- Restructure an XML document
- Add/remove/modify elements (example: dynamically changing a document in response to inputs)
- Retrieve certain elements that satisfy some constraint (examples: all books, all addresses in New Hampshire)
How do we do this in a program?
- We need an interface that allows programs and scripts to dynamically access and update the content, structure, and style of documents
- Solution: the Document Object Model (DOM)

The Document Object Model
- Document components are represented by objects
- Objects have methods like getFirstChild() and getNextSibling(), which can be used to traverse the tree
- We can also modify the tree, and thus alter the XML, via insertBefore(), appendChild(), removeChild(), etc.
[Figure: an XML parser turns the XML document into a DOM tree: a root node whose children are the ?xml processing instruction and the dblp element; dblp contains a mastersthesis and an article, each with mdate/key attributes and child elements (author, title, year, school; editor, title, year, journal, volume, ee) holding text. Legend: root, processing instruction, element, attribute, text.]
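
A minimal sketch using Python's built-in `xml.dom.minidom`, which implements the W3C DOM interfaces. In Python the DOM getters are exposed as attributes (`firstChild`, `nextSibling`) rather than as `getFirstChild()`-style methods. The W3C DOM has no `insertAfter()`; calling `insertBefore()` with the reference node's next sibling has the same effect. The document is a cut-down version of the DBLP example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<dblp><mastersthesis key='ms/Brown92'>"
    "<author>Kurt P. Brown</author><year>1992</year>"
    "</mastersthesis></dblp>")

def element_names(node):
    """Preorder traversal that uses only firstChild / nextSibling."""
    names = []
    child = node.firstChild
    while child is not None:
        if child.nodeType == child.ELEMENT_NODE:
            names.append(child.tagName)
            names.extend(element_names(child))  # descend into the subtree
        child = child.nextSibling
    return names

print(element_names(doc))  # ['dblp', 'mastersthesis', 'author', 'year']

# Modify the tree: put a <title> right after <author>.
thesis = doc.getElementsByTagName("mastersthesis")[0]
author = thesis.getElementsByTagName("author")[0]
title = doc.createElement("title")
title.appendChild(doc.createTextNode("PRPL: A Database Workload Specification Language"))
thesis.insertBefore(title, author.nextSibling)  # acts as "insert after author"

print(element_names(doc))  # ['dblp', 'mastersthesis', 'author', 'title', 'year']
```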

Isn't there an easier way?
- What if we want to find all the author nodes, or all the title nodes that contain 'scalable'?
- Coding this manually can be quite cumbersome: we need to traverse the entire tree, keep track of conditions, ...
- Alternative: a query language. Idea: declaratively describe the nodes we're interested in, and let a query engine do all the hard work
- This can be done with XPath
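
A small Python comparison, assuming a hypothetical bibliography in which one title contains 'scalable': the hand-written traversal has to recurse and track the condition itself, while `findall()` with ElementTree's limited XPath subset does the traversal for us. ElementTree does not support XPath functions such as `contains()`, so that condition is still checked in Python; a full XPath 1.0 engine (for example the third-party `lxml` package) would accept the whole expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical bibliography; the second title is made up to contain 'scalable'.
root = ET.fromstring("""<dblp>
  <mastersthesis><author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title></mastersthesis>
  <article><author>A. Author</author>
    <title>A scalable example</title></article>
</dblp>""")

# Manual approach: walk the whole tree ourselves and track the conditions.
def find_manually(node, tag, must_contain=None, out=None):
    if out is None:
        out = []
    if node.tag == tag and (must_contain is None or must_contain in (node.text or "")):
        out.append(node)
    for child in node:
        find_manually(child, tag, must_contain, out)
    return out

# Declarative approach: the path engine does the traversal.
authors = root.findall(".//author")                      # like //author
scalable_titles = [t for t in root.findall(".//title")   # like //title[contains(., 'scalable')]
                   if "scalable" in (t.text or "")]

print([a.text for a in authors])          # ['Kurt P. Brown', 'A. Author']
print([t.text for t in scalable_titles])  # ['A scalable example']
```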

Recap: DTDs, XML Schema, DOM
- Document Type Definitions (DTDs)
  - An EBNF grammar that defines the structure of an XML document
  - Special support for IDs and references
  - Several limitations, e.g., no proper data types, no subtypes
- XML Schema
  - More expressive than DTDs; itself an XML document
  - 'Real' data types, subtyping, ...
- Document Object Model (DOM)
  - An interface for accessing/changing XML data from programs
  - Document components are represented by a tree of objects

Problem: Presentation
What does this look like in the browser?
[Slide shows an XML document of course grades with one entry per student (name, PennID, grade): Benjamin Franklin, 0001, A+; George Washington, 0003, C-]

Solution: XSLT
Let's add a style sheet (analogous to CSS):
[Slide shows the same grades document with a style sheet attached: Benjamin Franklin, 0001, A+; George Washington, 0003, C-]

Solution: XSLT
[Slide shows the XSLT stylesheet, whose output is headed 'CIS455/555 grades' and has the columns Student, PennID, and Grade. Annotations: look for 'course' elements at the top level, and output the following for each; for each 'student' element within the 'grades' element...; fill in the value of the 'grade' element here]

Plan for today
- Data interchange
- Extensible Markup Language (XML)
- DTDs and XML Schema; DOM
  - Document Type Definitions (DTDs)
  - XML Schema
  - Document Object Model (DOM)
- XPath (next)
  - Query examples
  - Axes
- XSLT

XPaths
- What is an XPath? In its simplest form, like a path in a file system: /mypath/subpath/*/morepath
- The XPath returns a node set, representing the XML nodes (and their subtrees) at the end of the path
- XPaths can have node tests at the end, returning only specific node types, e.g., text(), processing-instruction(), comment(), element(), attribute()
- XPath is fundamentally an ordered language: it can query in an order-aware fashion, and it returns nodes in order
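
A minimal sketch with Python's `xml.etree.ElementTree`, assuming a made-up document whose elements are named after the slide's example path. ElementTree implements only a subset of XPath: paths are evaluated relative to the element you call `findall()` on, so the absolute `/mypath/...` becomes `subpath/*/morepath` from the root element, and node tests such as `text()` are not supported (the `.text` attribute plays that role). The matches come back in document order, each with its subtree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped to match /mypath/subpath/*/morepath.
root = ET.fromstring("""<mypath>
  <subpath>
    <a><morepath>first</morepath></a>
    <b><morepath>second</morepath><other/></b>
    <c><morepath>third</morepath></c>
  </subpath>
</mypath>""")

# The leading /mypath step is the root element itself, so start below it.
nodes = root.findall("subpath/*/morepath")
print([n.text for n in nodes])  # ['first', 'second', 'third'] (document order)
```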

Sample XML for our XPath examples
[Slide shows a DBLP excerpt: a mastersthesis (author Kurt P. Brown; title 'PRPL: A Database Workload Specification Language'; year 1992; school Univ. of Wisconsin-Madison) and an article (editor Paul R. McJones; title 'The 1995 SQL Reunion'; journal Digital System Research Center Report; plus volume and ee elements)]
Side note: DBLP provides bibliographic information on major computer science journals and proceedings

Visualization
[Figure: the sample document as a tree. The root node's children are the ?xml processing instruction and the dblp element; dblp contains a mastersthesis element (attributes mdate, key = ms/Brown92; children author, title, year, school) and an article element (attributes mdate, key = tr/dec/…; children editor, title, year, journal, volume, ee), each with its text. Legend: attribute, root, processing instruction, element, text.]
Which XPath query returns this element?

Some XPath query examples
- XPath queries can be relative or absolute:
  /dblp/mastersthesis/title   (absolute: starts with a /)
- Wildcards can be used:
  /dblp/*/editor   (all editors)
  /*/*/ee   (all ee's with exactly two ancestor elements)
- Special syntax (//) for selecting elements at any depth:
  //title   (all title elements)
  //article/volume   (all 'volume' parts of articles)
- Attributes are specified with @, e.g., to select all 'id' attributes, or all journals with an ISSN attribute
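
The path examples above can be tried with Python's ElementTree subset. The document below is a cut-down version of the DBLP sample; because the slides elide the attribute values and the ee URL, the article's `key`, its `volume`, and its `ee` text are placeholders. Each XPath is paired with the equivalent ElementTree path relative to the root element `<dblp>`; attribute steps such as `//@id` are not supported by ElementTree, so the last line reads attribute values directly.

```python
import xml.etree.ElementTree as ET

# Cut-down DBLP sample; the article's key, volume and ee are placeholders.
root = ET.fromstring("""<dblp>
  <mastersthesis key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article key="placeholder-key">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC</volume>
    <ee>placeholder.html</ee>
  </article>
</dblp>""")

# XPath -> ElementTree path evaluated from the root element <dblp>
queries = {
    "/dblp/mastersthesis/title": "mastersthesis/title",
    "/dblp/*/editor": "*/editor",
    "/*/*/ee": "*/ee",
    "//title": ".//title",
    "//article/volume": ".//article/volume",
}
for xpath, et_path in queries.items():
    print(xpath, [e.text for e in root.findall(et_path)])

# Rough stand-in for //@key: collect the key attribute of every element that has one.
print([e.get("key") for e in root.iter() if "key" in e.attrib])
```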

More XPath query examples
- Square brackets further specify elements:
  /article/journal[1]   (first journal child of article)
  /article/journal[last()]   (last journal child of article)
- Predicates can filter the nodes that are returned (e.g., journals with a given ISSN)
- count() counts selected elements:
  //*[count(ee)=2]   (all elements with two ee children)
  //*[count(*)=3]   (all elements with three children)
- Other functions are available:
  //*[contains(name(),'ACM')]   (all elements whose name contains 'ACM')
  //*[string-length(name())>3]   (elements whose name is longer than three characters)
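
A sketch of these predicates in Python, on a hypothetical `<article>` with two `journal` and two `ee` children. ElementTree supports the positional predicates `[1]` and `[last()]`, but not `count()`, `contains()`, `name()`, or `string-length()`, so those filters are emulated with ordinary Python over `root.iter()`, which (like `//*`) visits every element including the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an <article> with two <journal> and two <ee> children.
root = ET.fromstring(
    "<article>"
    "<journal>First Journal</journal><journal>Second Journal</journal>"
    "<ee>a.html</ee><ee>b.html</ee>"
    "</article>")

# Positional predicates are part of ElementTree's subset (evaluated on <article>).
first = root.findall("journal[1]")       # /article/journal[1]
last = root.findall("journal[last()]")   # /article/journal[last()]

# XPath functions are not, so emulate them with plain Python filters.
everything = list(root.iter())                                    # //*
two_ee = [e for e in everything if len(e.findall("ee")) == 2]     # //*[count(ee)=2]
three_children = [e for e in everything if len(e) == 3]           # //*[count(*)=3]
acm = [e for e in everything if "ACM" in e.tag]                   # //*[contains(name(),'ACM')]
long_names = [e.tag for e in everything if len(e.tag) > 3]        # //*[string-length(name())>3]

print([e.text for e in first], [e.text for e in last])  # ['First Journal'] ['Second Journal']
print(long_names)  # ['article', 'journal', 'journal']
```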

Context nodes and relative paths
- XPath has a notion of a context node, analogous to the current working directory under Unix
  - XPath is evaluated relative to that specific node; it defaults to the document root
  - '.' represents the context node
  - '..' represents the parent node
- We can express relative paths: foo/bar/../.. gets us back to the context node
- Example: suppose we are at the 'author' child of the mastersthesis node, and we want to query the title
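
To mirror the 'current working directory' analogy, here is a minimal Python sketch of evaluating relative paths from a context node. ElementTree elements do not record their parent, so the sketch builds a child-to-parent map; the helper `evaluate_relative` is hypothetical and understands only name, `*`, `.`, and `..` steps. For the slide's example, from the `author` context node the relative path `../title` reaches the thesis title.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<dblp><mastersthesis><author>Kurt P. Brown</author>"
    "<title>PRPL: A Database Workload Specification Language</title>"
    "</mastersthesis></dblp>")

# ElementTree elements do not know their parent, so build a child -> parent map.
parent = {child: node for node in root.iter() for child in node}

def evaluate_relative(context, path):
    """Hypothetical helper: evaluate name, '*', '.' and '..' steps from a context node."""
    current = [context]
    for step in path.split("/"):
        nxt = []
        for node in current:
            if step == ".":
                candidates = [node]
            elif step == "..":
                candidates = [parent[node]] if node in parent else []
            else:
                candidates = [c for c in node if step == "*" or c.tag == step]
            for c in candidates:
                if c not in nxt:  # keep a node set: no duplicates, order preserved
                    nxt.append(c)
        current = nxt
    return current

author = root.find("mastersthesis/author")  # the context node
print([t.text for t in evaluate_relative(author, "../title")])
print(evaluate_relative(root, "mastersthesis/author/../..")[0] is root)  # True
```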

More complex traversals with axes
- So far, we have seen XPath queries that go down the tree (and up one step)
- But we can go up, left, right, etc.
- This is expressed with so-called axes:
  self::path-step
  child::path-step   parent::path-step
  descendant::path-step   ancestor::path-step (all ancestors: parent, grandparent, ... of the current node)
  descendant-or-self::path-step   ancestor-or-self::path-step
  preceding-sibling::path-step   following-sibling::path-step
  preceding::path-step   following::path-step (everything after the closing tag of the current node)
- The XPaths we've seen so far were in 'abbreviated form': /AAA/BBB is equivalent to /child::AAA/child::BBB
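
The axes can be sketched in Python over ElementTree with a child-to-parent map and a preorder (document-order) list of elements. The helper functions below are hypothetical illustrations that consider element nodes only (no text, attribute, or namespace nodes). Note how `following` is 'everything after the closing tag', that is, later in document order but not a descendant, while `preceding` excludes the ancestors.

```python
import xml.etree.ElementTree as ET

# Hypothetical skeleton of the DBLP sample (elements only, no text or attributes).
root = ET.fromstring(
    "<dblp>"
    "<mastersthesis><author/><title/><year/><school/></mastersthesis>"
    "<article><editor/><title/><journal/></article>"
    "</dblp>")

parent = {child: node for node in root.iter() for child in node}
doc_order = list(root.iter())  # preorder traversal = document order

def ancestor(node):
    """Parent, grandparent, ... (nearest first)."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result

def siblings_of(node):
    return list(parent[node]) if node in parent else [node]

def preceding_sibling(node):
    sibs = siblings_of(node)
    return sibs[:sibs.index(node)]

def following_sibling(node):
    sibs = siblings_of(node)
    return sibs[sibs.index(node) + 1:]

def following(node):
    """Everything after the node's closing tag: later in document order, not a descendant."""
    inside = set(node.iter())
    return [n for n in doc_order[doc_order.index(node) + 1:] if n not in inside]

def preceding(node):
    """Everything before the node's opening tag, excluding its ancestors."""
    above = set(ancestor(node))
    return [n for n in doc_order[:doc_order.index(node)] if n not in above]

title = root.find("mastersthesis/title")
print([n.tag for n in ancestor(title)])           # ['mastersthesis', 'dblp']
print([n.tag for n in preceding_sibling(title)])  # ['author']
print([n.tag for n in following_sibling(title)])  # ['year', 'school']
print([n.tag for n in preceding(title)])          # ['author']
print([n.tag for n in following(title)])
# ['year', 'school', 'article', 'editor', 'title', 'journal']
```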

Users of XPath
- XML Schema uses simple XPaths in defining keys and uniqueness constraints
- XLink and XPointer, hyperlinks for XML
- XQuery: useful for restructuring an XML document or combining multiple documents
- XSLT: useful for converting from XML to other representations (e.g., HTML, PDF, SVG). Coming up next

Recap: XPath
- A query language for XML
- Queries select a set of nodes
- Some selection criteria: type of node, attributes, position in the tree (relative or absolute), number of children, ...
- Can traverse the tree in all directions: up, down, left, right
- A building block in many other technologies, e.g., XSLT

Plan for today
- Data interchange
- Extensible Markup Language (XML)
- DTDs and XML Schema; DOM
- XPath
  - Query examples
  - Axes
- XSLT (next)
  - Templates
  - Processing model
  - Common operations

Extensible Stylesheet Language Transformations (XSLT)
- What if we have an XML document, but want
  - an XML document in a different schema?
  - an HTML document for displaying?
  - a PDF, PostScript, or PNG with the same content?
- Solution: XSLT
[Figure: an XSLT processor, driven by an XSLT stylesheet, transforms an XML document (Schema A) into an XML document (Schema B), an HTML document, or a PDF document]

A functional language for XML
- XSLT is based on a series of templates that match different parts of an XML document
- Functional model:
  - XSLT templates can invoke other templates
  - XSLT templates can be nonterminating. Beware! It is not difficult to shoot yourself in the foot!
- XSLT templates are based on XPath matches
  - We can also apply other templates, potentially to the nodes chosen by a 'select' XPath
- Within each template, directly describe what should be output

A simple example
[Figure: what we have is an XML document listing Andreas Haeberlen and Zachary G. Ives; what we want is output containing only Haeberlen and Ives. The XSLT processor produces it using templates driven by XPath matches.]

An XSLT template
- Itself an XML document
- XML tags create output OR are XSL operations
  - All XSL tags are prefixed with the “xsl” namespace prefix
  - All non-XSL tags are part of the XML output
- Common XSL operations:
  - xsl:template with a match XPath
  - Recursive call to xsl:apply-templates, which may also select where it should be applied
- Attach the stylesheet to an XML document with a processing instruction (xml-stylesheet)

XSLT processing model
- The XSLT processor has: an input tree, a current node list, and a result tree
- Processing is as follows:
  - Do a preorder traversal of the input tree until some node is found to match one of the templates
  - If more than one template matches, pick the 'best' one according to certain heuristics
  - Set the current node list to be the nodes the template matches
  - Iterate over the nodes in the current node list
    - For each, apply the operations in the template and 'append' the output to the result tree
    - If specified by apply-templates, repeat recursively
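
The processing model can be illustrated with a deliberately tiny, hypothetical template engine in Python; it is not an XSLT implementation. Templates are keyed by element name, an 'exact name beats `*`' rule stands in for XSLT's conflict resolution, and elements without a template fall back to a simplified built-in rule (emit the element's own text, then process its children). The input markup for the 'Andreas Haeberlen / Zachary G. Ives' example is an assumption, since the original XML is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed input markup for the slide's example (the original XML is not shown).
doc = ET.fromstring(
    "<people>"
    "<person><first>Andreas</first><last>Haeberlen</last></person>"
    "<person><first>Zachary G.</first><last>Ives</last></person>"
    "</people>")

def apply_templates(nodes, templates, out):
    """Process the current node list in order, appending output to `out`."""
    for node in nodes:
        # Simplified 'best match' heuristic: an exact name beats the '*' wildcard.
        template = templates.get(node.tag) or templates.get("*")
        if template is not None:
            # The template gets a callback that plays the role of xsl:apply-templates.
            template(node, lambda selected: apply_templates(selected, templates, out), out)
        else:
            # Simplified built-in rule: emit the element's own text, then recurse.
            if node.text and node.text.strip():
                out.append(node.text.strip())
            apply_templates(list(node), templates, out)

templates = {
    # Like <xsl:template match="person"> containing <xsl:apply-templates select="last"/>.
    "person": lambda node, apply, out: apply(node.findall("last")),
    # Like <xsl:template match="last"> that wraps the value in a <name> element.
    "last": lambda node, apply, out: out.append("<name>" + node.text + "</name>"),
}

result = []
apply_templates([doc], templates, result)
print("".join(result))  # <name>Haeberlen</name><name>Ives</name>
```

The `person` template selects only the `last` children, which is why the first names never reach the output; a real XSLT processor also handles attributes, text nodes, priorities, and modes.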

Other common operations
- Iteration: xsl:for-each
- Conditionals: xsl:if
- Copying the current node and its children to the result set: xsl:copy-of select="."

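Creating output nodes
- Return text/attribute data (this is a default rule): a template matching text()|@* that outputs xsl:value-of select="."
- Create an element from text (attribute is similar): xsl:element (and xsl:attribute)
- Copy nodes matching a path: xsl:copy-of with a select XPath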

XSLT example
[Slide shows the DBLP sample document (a mastersthesis by Kurt P. Brown, 'PRPL: A Database Workload Specification Language', 1992, Univ. of Wisconsin-Madison, and an article edited by Paul R. McJones, 'The 1995 SQL Reunion', Digital System Research Center Report) next to a transformed version containing only the master's thesis]

XSLT example
Goal: display a list of master's theses, with authors in red and titles in green
[Slide shows the corresponding stylesheet; its output is headed 'Master's theses']

XSLT Summary
- A very powerful, template-based language for transforming XML documents into other structured documents
- Commonly used to convert XML to PDF, SVG, GraphViz DOT format, HTML, WML, ... (e.g., via XSL-FO)
- Primarily useful for presentation of XML or for very simple conversions
- But sometimes we need more complex operations when converting data from one source to another:
  - Joins: combine and correlate information from multiple sources
  - Aggregation: computing averages, counts, etc.
- More details in CIS330/550

Plan for today
- Data interchange
- Extensible Markup Language (XML)
- DTDs and XML Schema; DOM
- XPath
  - Query examples
  - Axes
- XSLT
  - Templates
  - Processing model
  - Common operations