XPath Xml processing as a tree. Introduction Although XML provides a flexible and expressive way of describing data, it does not have a mechanism for.

XPath: XML processing as a tree

Introduction Although XML provides a flexible and expressive way of describing data, it has no built-in mechanism for locating specific pieces of structured data within a document. Without one, finding information means parsing the whole document and then examining the elements returned, which is inefficient for large documents. XPath provides a way of locating specific parts of an XML document. Unlike XML, XPath is not a markup language: its expressions are plain strings, and they are shared by other XML technologies such as XSLT (which can be used to convert XML to HTML, for example) and XPointer (the XML Pointer Language), which points to information inside an XML document. XSLT and XPointer are covered later in the text.

XPath views the XML document as a tree XPath regards an XML document as a tree of nodes, and the structure is recursive: a node may have child nodes, which may in turn have children. There are seven node types: root, element, attribute, text, comment, processing instruction (PI) and namespace. Only element, comment, text and PI nodes may be child nodes. Attribute and namespace nodes have a parent element but are not considered children of it: they are not “contained in” the parent, they describe it. The root has no parent. The XPath tree is similar to the DOM tree (see chapter 8).

XPath nodes have a string representation Each node has a string-value, which XPath uses to compare nodes. The string-value of a text node is the character data it contains; markup delimiters are not part of it. The string-value of an element or root node is the concatenation, in document order, of the string-values of all its descendant text nodes. The string-value of an attribute node is the normalized attribute value. The string-value of a comment is just the comment text, excluding the <!-- and --> delimiters. Text that does not fall in a CDATA section is normalized by the parser (entity and character references are replaced by the characters they stand for), whereas text inside a CDATA section is taken literally. “Document order” is the order of a depth-first traversal of the tree. XPath also uses reverse document order, which is simply document order reversed.

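An example
<!-- Fig. 11.1 : simple.xml -->
<!-- Simple XML document -->
<book title = "…" edition = "3">
   <sample>
      <![CDATA[
         // C++ comment
         if ( this->getX() < 5 && value[ 0 ] != 3 )
            cerr << … displayError();
      ]]>
   </sample>
   C++ How to Program by Deitel &amp; Deitel
</book>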

has structure
root
   comment: Fig. 11.1 simple.xml
   comment: Simple XML document
   element: book
      attribute: title
      attribute: edition
      element: sample
         text: // C++ comment if ( … )
      text: C++ How to …

String values The string-value of book in this XML is the concatenation of its two descendant text nodes in document order: // C++ comment if … cerr << … followed by C++ How to … The second text node (C++ How to …) is not in a CDATA section, so it is normalized. The string-value of the root is the same as that of book, its only element child, because the comments contribute no text nodes. The string-value of the edition attribute is 3. The string-value of a comment is its text without the delimiters; in this example, Simple XML document.
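The string-value rules above can be tried out with a short Python sketch. It uses the standard-library xml.etree.ElementTree module, which is not an XPath data model: text is stored on elements rather than as separate text nodes, and comments are dropped by the default parser. The XML here is a simplified stand-in for simple.xml, and the title value and the shortened CDATA body are placeholders, not the original content.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for simple.xml (placeholder title, shortened CDATA body).
SIMPLE_XML = """<!-- Simple XML document -->
<book title="placeholder" edition="3">
   <sample><![CDATA[if ( x < 5 && y != 3 )]]></sample>
   C++ How to Program by Deitel &amp; Deitel
</book>"""

book = ET.fromstring(SIMPLE_XML)


def string_value(element):
    """Concatenate all descendant text in document order."""
    return "".join(element.itertext())


# Whitespace is collapsed here only to keep the printed line short.
print(" ".join(string_value(book).split()))

# An attribute's string-value is its (normalized) value.
print(book.get("edition"))
```

The CDATA text keeps its literal < and && characters, while the entity reference in the second text run is replaced by a plain ampersand.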

Another example: processing instruction and namespace nodes <deitel:book deitel:edition = "1" xmlns:deitel = "http://www.deitel.com/xmlhtp1"> XML How to Program

XPath for this example The root node contains three nodes: two comments and the html element. The parent of the namespace node (http://www.w3...) is html. The html element has three child nodes: head, a PI and body. head contains only title, which in turn contains only a text node. book contains a namespace node bound to the prefix deitel; that namespace node's parent is book. The title element is the only child of book. The string-value of a namespace node is the namespace URI. The string-value of a PI node is the text after the target, omitting the delimiters but including whitespace; in this case, the text example=“…”. A summary of node types appears in the XML text on page 304.

XPath node types
Node type | String-value | Expanded name | Description
root | Concatenation of the string-values of all text-node descendants in document order | none | The root node; may contain any other type of node
element | Same as for root | The element's name | An element of the document
attribute | Normalized attribute value | Name including namespace prefix, if any | Attribute of an element
text | Character data in the node | none | Character data content of an element
comment | Content of the comment | none | XML comment
processing instruction | The part of the PI following the target, plus any whitespace | The target of the PI | XML PI
namespace | URI of the namespace | The prefix | XML namespace

Location paths Location paths are expressions that describe how to navigate the XPath tree from one node to another. A location path consists of one or more location steps, separated by slashes. Each location step consists of an axis, a node test and an optional predicate. The context node is the node from which a step starts its search. The axis defines which nodes, relative to the context node, are candidates for the node test. Axes are either forward or reverse, following document order or reverse document order, respectively.
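To make the anatomy of a location step concrete, here is a minimal Python sketch that splits an unabbreviated step into its axis, node test and optional predicate. It is an illustration only, not a real XPath parser: the names split_step and split_path are made up for this example, and predicates that themselves contain a slash or nested brackets are not handled.

```python
import re

# axis::node-test[predicate], where the axis and the predicate are optional.
STEP_PATTERN = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?"     # optional axis, e.g. child::
    r"(?P<node_test>[^\[\]]+)"       # node test, e.g. book, * or text()
    r"(?:\[(?P<predicate>.*)\])?$"   # optional predicate in square brackets
)


def split_step(step):
    """Return (axis, node_test, predicate) for one location step."""
    match = STEP_PATTERN.match(step)
    if match is None:
        raise ValueError(f"not a simple location step: {step!r}")
    axis = match.group("axis") or "child"  # child is the default axis
    return axis, match.group("node_test"), match.group("predicate")


def split_path(path):
    """Split a relative location path into its steps."""
    return [split_step(step) for step in path.split("/") if step]


print(split_step("child::book[position()=3]"))
print(split_path("child::*/child::text()"))
```

The first call yields the axis child, the node test book and the predicate position()=3; the second yields two steps, each on the child axis with no predicate.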

XPath “axes” (searches) have forward or reverse document order
self: the context node itself
parent: the context node's parent, if any (at most one node, so the ordering is immaterial)
child: the children of the context node, if any (forward order)
ancestor: the context node's ancestors (reverse order)
ancestor-or-self: the context node and its ancestors (reverse order)
descendant: all descendants of the context node (forward order)
descendant-or-self: the context node and its descendants (forward order)
following: nodes that follow the context node, not including its descendants (forward order)
following-sibling: siblings that follow the context node (forward order)
preceding: nodes that precede the context node, not including its ancestors (reverse order)
preceding-sibling: siblings that precede the context node (reverse order)
attribute: the attribute nodes of the context node (forward order)
namespace: the namespace nodes of the context node (forward order)
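A few of these axes can be sketched in Python over an ElementTree tree. This is an approximation under stated assumptions: ElementTree only models elements (no text, comment or namespace nodes), it keeps no parent pointers, so a parent map is built by hand, and the tiny document and helper names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a has children b and c; b has d and e; c has f.
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")

# ElementTree has no parent links, so record each element's parent.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """ancestor axis: nearest ancestor first (reverse document order)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def descendants(node):
    """descendant axis: document order, excluding the node itself."""
    return list(node.iter())[1:]


def following_siblings(node):
    """following-sibling axis: document order."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


def preceding_siblings(node):
    """preceding-sibling axis: nearest sibling first (reverse order)."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[:siblings.index(node)][::-1]


e = root.find("b/e")
print([n.tag for n in ancestors(e)])
print([n.tag for n in descendants(root)])
print([n.tag for n in preceding_siblings(e)])
print([n.tag for n in following_siblings(root.find("b"))])
```

Note how the reverse axes return the node closest to the context node first, while the forward axes follow document order.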

Node tests
* selects all nodes of the same type as the axis's principal node type.
node() selects all nodes regardless of type.
The following tests select nodes based on the type specified:
– text()
– comment()
– processing-instruction()
– node-name

Some examples child::* selects all element-node children of the context node. child::text() selects all text-node children of the context node. Steps are combined with /. For example, child::*/child::text() selects the text-node grandchildren of the context node, because the second step is applied to the results of the first.

Abbreviations
child:: is the default axis, so it may always be omitted.
attribute:: is abbreviated as @.
/descendant-or-self::node()/ is abbreviated as //.
self::node() is abbreviated as a period (.).
parent::node() is abbreviated as two periods (..).
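The abbreviated forms can be tried in Python, whose standard-library xml.etree.ElementTree module accepts a limited subset of abbreviated XPath (it does not accept the unabbreviated axis:: syntax). The document below is a hypothetical one-book fragment; the element names and the edition value are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names and the edition value are assumed.
doc = ET.fromstring(
    "<books>"
    "<book><title>Moby Dick</title>"
    "<translation edition='1'>Dutch</translation></book>"
    "</books>"
)

# child::book, with the default child axis omitted
child_books = doc.findall("book")

# .// stands for ./descendant-or-self::node()/
translations = doc.findall(".//translation")

# .. stands for parent::node()
parents = doc.findall(".//translation/..")

# @edition stands for attribute::edition
with_edition = doc.findall(".//translation[@edition]")

print([b.tag for b in child_books])
print([t.text for t in translations])
print([p.tag for p in parents])
print([t.get("edition") for t in with_edition])
```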

Another example: a reading list
The Color Purple: Spanish, Czech, Mandarin, French
The Hamlet: Spanish, Chinese, Latin, French, English
The Old Man and the Sea: Spanish, Chinese, Japanese, French, Russian
Moby Dick: Tagalog, Portuguese, Dutch, Italian, Japanese
Grapes of Wrath: Korean, French, German, Italian, Japanese

XML structure
Root (Books)
   Book
      Title element
      Translation element
   Book

XML structure for a book node
Book
   Element: title
      Text: The Old Man and the Sea
   Element: translation
      Attribute: edition 1
      Text: Spanish

Example continued: an XSL stylesheet to list books in Japanese <xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"> books in Japanese
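The selection that such a stylesheet performs can be sketched outside XSLT. The Python below rebuilds the reading list with assumed element names (books, book, title, translation, taken from the structure slides; the original markup and edition attributes are not reproduced) and uses ElementTree's limited XPath support to pick the titles that have a Japanese translation. It is an equivalent query under those assumptions, not the stylesheet itself.

```python
import xml.etree.ElementTree as ET

# Titles and translations from the reading list; element names are assumed.
READING_LIST = {
    "The Color Purple": ["Spanish", "Czech", "Mandarin", "French"],
    "The Hamlet": ["Spanish", "Chinese", "Latin", "French", "English"],
    "The Old Man and the Sea":
        ["Spanish", "Chinese", "Japanese", "French", "Russian"],
    "Moby Dick": ["Tagalog", "Portuguese", "Dutch", "Italian", "Japanese"],
    "Grapes of Wrath": ["Korean", "French", "German", "Italian", "Japanese"],
}


def build_books(reading_list):
    """Build a books/book/title+translation tree from the dictionary."""
    books = ET.Element("books")
    for title, languages in reading_list.items():
        book = ET.SubElement(books, "book")
        ET.SubElement(book, "title").text = title
        for language in languages:
            ET.SubElement(book, "translation").text = language
    return books


books = build_books(READING_LIST)

# book elements having a translation child whose text is 'Japanese'
japanese_titles = [
    title.text
    for title in books.findall("book[translation='Japanese']/title")
]
print(japanese_titles)
```

With the list as given, three titles qualify: The Old Man and the Sea, Moby Dick and Grapes of Wrath.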

Node-set operators
| (pipe): union of two node-sets
/ (slash): separates location steps
// (double slash): abbreviates the path /descendant-or-self::node()/

Node-set functions
last() returns the number of nodes in the current node-set, that is, the position of its last node.
position() returns the position number of the current node in the node-set.
count(node-set) returns the number of nodes in the node-set.
id(string) returns the element node whose ID matches the string.
local-name(node-set) returns the local part of the expanded name of the first node in the node-set.
namespace-uri(node-set) returns the namespace URI of the expanded name of the first node in the node-set.
name(node-set) returns the qualified name of the first node in the node-set.

Examples from the reading list
head/title[last()] selects the last title element child of the head element.
book[position()=3] selects the third book child of the context node.
//book selects all book elements in the document.
count(*) returns the number of element-node children of the context node.
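Positional selection can be tried with Python's ElementTree, under a few assumptions: its XPath subset writes the position test as book[3] rather than book[position()=3], it supports [last()], and it has no count() function, so len() over a findall result stands in for it. The element names are again assumed from the structure slides.

```python
import xml.etree.ElementTree as ET

# Titles from the reading list, in document order; element names are assumed.
TITLES = [
    "The Color Purple",
    "The Hamlet",
    "The Old Man and the Sea",
    "Moby Dick",
    "Grapes of Wrath",
]

books = ET.Element("books")
for title in TITLES:
    book = ET.SubElement(books, "book")
    ET.SubElement(book, "title").text = title

# book[3] is the abbreviated form of book[position()=3]
third_title = books.find("book[3]/title").text

# book[last()] selects the last book child
last_title = books.find("book[last()]/title").text

# .//book plays the role of //book, relative to the books element
all_books = books.findall(".//book")

# stand-in for count(*): the number of element children
child_count = len(books.findall("*"))

print(third_title, "|", last_title, "|", len(all_books), child_count)
```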

Another example: stocks.xml
Intel Corporation
Cisco Systems, Inc.
Dell Computer Corporation
Microsoft Corporation
Sun Microsystems, Inc.
CMGI, Inc.

The stylesheet
<xsl:stylesheet version = "1.0" xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:if test = "starts-with(@symbol, 'C')">
<xsl:value-of select = "concat(@symbol,' - ', name)"/>
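The effect of the starts-with and concat calls can be mimicked in plain Python. ElementTree's XPath subset has neither function, so the test and the concatenation are done with str.startswith and string formatting. The stocks and stock element names and the symbols other than CSCO and CMGI are assumptions made for this sketch; only the company names, the symbol attribute and the name child come from the slides.

```python
import xml.etree.ElementTree as ET

# Assumed layout: stocks/stock with a symbol attribute and a name child.
# Symbols other than CSCO and CMGI are illustrative guesses.
STOCKS_XML = """<stocks>
  <stock symbol="INTC"><name>Intel Corporation</name></stock>
  <stock symbol="CSCO"><name>Cisco Systems, Inc.</name></stock>
  <stock symbol="DELL"><name>Dell Computer Corporation</name></stock>
  <stock symbol="MSFT"><name>Microsoft Corporation</name></stock>
  <stock symbol="SUNW"><name>Sun Microsystems, Inc.</name></stock>
  <stock symbol="CMGI"><name>CMGI, Inc.</name></stock>
</stocks>"""

stocks = ET.fromstring(STOCKS_XML)

lines = []
for stock in stocks.findall("stock"):
    symbol = stock.get("symbol", "")
    # Equivalent of test = "starts-with(@symbol, 'C')"
    if symbol.startswith("C"):
        # Equivalent of select = "concat(@symbol, ' - ', name)"
        lines.append(f"{symbol} - {stock.findtext('name')}")

print("\n".join(lines))
```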

stocks.xml in IE

The Xalan processor Xalan is an XSLT processor: it applies stylesheet transformations to XML, for example generating HTML from a given XML document.

Remove the XSL reference in stocks.xml and run Xalan from the DOS command line
Microsoft(R) Windows DOS
(C)Copyright Microsoft Corp 1990-2001.
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>java org.apache.xalan.xslt.Process -INDENT 3 -IN stocks.xml -XSL stocks.xsl -OUT stocks.html
========= Parsing file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xsl ==========
Parse of file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xsl took 381 milliseconds
========= Parsing file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xml ==========
Parse of file:C:/PROGRA~1/Java/JDK15~1.0_0/bin/stocks.xml took 50 milliseconds
=============================
Transforming...
transform took 40 milliseconds
XSLProcessor: done
C:\PROGRA~1\JAVA\JDK15~1.0_0\BIN>

Generates the HTML
CSCO - Cisco Systems, Inc.
CMGI - CMGI, Inc.
