Web Data Management XPath.

Web Data Management XPath

In this lecture Review of the XPath specification Resources:
data model examples syntax Resources: A formal semantics of patterns in XSLT by Phil Wadler. XML Path Language (XPath)

XPath http://www.w3.org/TR/xpath (11/99)
Building block for other W3C standards: XSL Transformations (XSLT) XML Link (XLink) XML Pointer (XPointer) XML Query Was originally part of XSL

XPath An expression language to be used in another host language (e.g., XSLT, XQuery). Allows the description of paths in an XML tree, and the retrieval of nodes that match these paths. Can also be used for performing some (limited) operations on XML data.

Example for XPath Queries
<bib> <book> <publisher> Addison-Wesley </publisher> <author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book> <book price=“55”> <publisher> Freeman </publisher> <author> Jeffrey D. Ullman </author> <title> Principles of Database and Knowledge Base Systems </title> <year> 1998 </year> </book> </bib>

Data Model for XPath XPath expressions operate over XML trees, which consist of the following node types: Document: the root node of the XML document; Element: element nodes; Attribute: attribute nodes, represented as children of an Element node; Text: text nodes, i.e., leaves of the XML tree.

Data Model for XPath The root The root element element attribute
bib The root Processing instruction Comment The root element book book Attr= “1” element attribute publisher author Much like the Xquery data model Addison-Wesley Serge Abiteboul text

Data Model for XPath The root node of an XML tree is the (unique) Document node; The root element is the (unique) Element child of the root node; A node has a name, or a value, or both an Element node has a name, but no value; a Text node has a value (a character string), but no name; an Attribute node has both a name and a value. Attributes are special! Attributes are not considered as first-class nodes in an XML tree. They must be addressed specifically, when needed.

XPath: Simple Expressions
/bib/book/year Result: <year> 1995 </year> <year> 1998 </year> /bib/paper/year Result: empty (there were no papers)

XPath Tree Nodes Seven nodes types:
root, element, attribute, text, comment, processing instruction and namespace Namespace and attribute nodes have parent nodes, but are not children of those parent nodes. The relationship between a parent node and a child node is containment Attribute nodes and namespace nodes describe their parent nodes

Xpath Tree Nodes <?xml version = "1.0"?>
  <html xmlns = " <head> <title>Processing Instruction and Namespace Nodes </title> </head> <?deitelprocessor example = "fig11_03.xml"?> <body> <deitel:book deitel:edition = "1" xmlns:deitel = " <deitel:title>XML How to Program</deitel:title> </deitel:book> </body> </html>

XPath Tree Nodes String-value: Each XPath tree node has a string representation that XPath uses to compare nodes. The string-value of a text node consists of the character data contained in the node. Document order: Nodes in an XPath tree have an ordering that is determined by the order in which the nodes appear in the original XML document. The reverse document order is the reverse ordering of the nodes in a document. The string-value for the html element node is determined by concatenating the string-values for all of its descendant text nodes in document order. The string-value for element node html is Processing Instruction and Namespace NodesXML How to Program Because all whitespace is removed when the text nodes are normalized, there is no space in the concatenation.

XPath Tree Nodes For processing instructions, the string-value consists of the remainder of the processing instruction after the target, including whitespace, but excluding the ending ?> The string-value for the processing instruction is example = "fig11_03.xml" Namespace-node string-values consist of the URI for the namespace. The string-value for the namespace declaration is

XPath Tree Nodes For the root node of the document, the string-value is also determined by concatenating the string-values of its text-node descendents in document order. The string-value of the root node is therefore identical to the string-value calculated for the html element node The string-value for the edition attribute node consists of its value, which is 3. The string-value for a comment node consists only of the comment's text, excluding . The string-value for the second comment node is therefore: Processing instructions and namespacess.

XPath Tree Nodes Expanded-name: Certain nodes (i.e., element, attribute, processing instruction and namespace) also have an expanded-name that can be used to locate specific nodes in the XPath tree. Expanded-names consist of both a local part and a namespace URI. The local part for the element node html is therefore html. If there is a prefix for the element node, the namespace URI of the expanded-name is the URI to which the prefix is bound. If there is no prefix for the element node, the namespace URI of the expanded name is the URI for the default namespace.

XPath Tree Nodes The local part of the expanded name for a processing instruction node corresponds to the target of the processing instruction in the XML document. For processing instructions, the namespace URI of the expanded-name is null The local part of the expanded-name for a namespace node corresponds to the prefix for the namespace, if one exists; or, if it is a default namespace, the local part is empty (i.e., the empty string). The namespace URI of the expanded-name for a namespace node is always null.

XPath Tree Nodes Node Type string-value expanded-name Description root
Determined by concatenating the string-values of all text-node descendents in document order. None Represents the root of an XML document. This node exists only at the top of the tree and may contain element, comment or processor-instruction children. element The element tag, including the namespace prefix (if applicable). Represents an XML element and may contain element, text, comment or processor-instruction children. attribute The normalized value of the attribute. The name of the attribute, including the namespace prefix (if applicable). Represents an attribute of an element. text The character data contained in the text node. None. Represents the character data content of an element comment The content of the comment (not including ). Represents an XML comment processing instruction The part of the processing instruction that follows the target and any whitespace The target of the processing instruction. Represents an XML processing instruction namespace The URI of the namespace The namespace prefix. Represents an XML namespace

XPath: Axes A location path is an expression that specifies how to navigate an XPath tree from one node to another. A location path is composed of location steps, each of which is composed of an "axis," a "node test" and an optional "predicate." Searching through an XML document begins at a context node in the XPath tree. Searches through the XPath tree are made relative to this context node. An axis indicates which nodes, relative to the context node, should be included in the search. The axis also dictates the ordering of the nodes in the set. Axes that select nodes that follow the context node in document order are called forward axes. Axes that select nodes that precede the context node in document order are called reverse axes.

XPath Context A step is evaluated in a specific context [< N1,N2, · · · ,Nn >, Nc] which consists of: a context list < N1,N2, · · · ,Nn > of nodes from the XML tree; a context node Nc belonging to the context list. The context length n is a positive integer indicating the size of a contextual list of nodes; it can be known by using the function last(); The context node position c  [1,n] is a positive integer indicating the position of the context node in the context list of nodes; it can be known by using the function position().

XPath Steps The basic component of XPath expression are steps, of the form: axis::node-test[P1][P2]. . . [Pn] axis is an axis name indicating what the direction of the step in the XML tree is (child is the default). node-test is a node test, indicating the kind of nodes to select. Pi is a predicate, that is, any XPath expression, evaluated as a boolean, indicating an additional condition. There may be no predicates at all. A step is evaluated with respect to a context, and returns a node list.

Path Expressions A path expression is of the form: [/]step1/step2/. . . /stepn A path that begins with / is an absolute path expression; A path that does not begin with / is a relative path expression. Examples /A/B is an absolute path expression denoting the Element nodes with name B, children of the root named A; ./B/descendant::text() is a relative path expression which denotes all the Text nodes descendant of an Element B, itself child of the context node; 2] denotes all the Attribute whose value is greater than 2.

Evaluation of Path Expressions
Each stepi is interpreted with respect to a context; its result is a node list. A step stepi is evaluated with respect to the context of stepi−1. More precisely: For i = 1 (first step) if the path is absolute, the context is a singleton, the root of the XML tree; else (relative paths) the context is defined by the environment; For i > 1 if N = < N1,N2, · · · ,Nn > is the result of step stepi−1, stepi is successively evaluated with respect to the context [N,Nj ], for each j  [1,n]. The result of the path expression is the node set obtained after evaluating the last step.

Evaluation of Path Expressions
Evaluation of The path expression is absolute: the context consists of the root node of the tree. The first step, A, is evaluated with respect to this context.

Evaluation of /A/B/@att1
The result is A, the root element. A is the context for the evaluation of the second step, B.

The result is a node list with two nodes B[1], B[2]. @att1 is first evaluated with the context node B[1].

The result is the attribute node of B[1].

@att1 is also evaluated with the context node B[2].

The result is the attribute node of B[2].

Final result: the node set union of all the results of the last

XPath: Axes Axes Ordering Description Self None
The context node itself. Parent Reverse The context node's parent, if one exists. Child Forward The context node's children, if they exist. Ancestor The context node's ancestors, if they exist. ancestor-or-self The context node's ancestors and also itself. Descendant The context node's descendants. descendant-or-self The context node's descendants and also itself. Following The nodes in the XML document following the context node, not including descendants. following-sibling The sibling nodes following the context node. Preceding The nodes in the XML document preceding the context node, not including ancestors. preceding-sibling The sibling nodes preceding the context node. Attribute The attribute nodes of the context node. Namespace The namespace nodes of the context node.

XPath: Axes An axis has a principal node type that corresponds to the type of node the axis may select. For attribute axes, the principal node type is attribute. For namespace axes, the principal node type is namespace. All other axes have a element principal node type.

XPath: Axes Child axis: denotes the Element or Text children of the context node. Important: An Attribute node has a parent (the element on which it is located), but an attribute node is not one of the children of its parent. Example: child::D

XPath: Axes Parent axis: denotes the parent of the context node.
The node test is either an element name, or * which matches all names, node() which matches all node types. Always a Element or Document node, or an empty node-set (if the parent does not match the node test or does not satisfy a predicate). .. is an abbreviation for parent::node(): the parent of the context Example: parent::node()

XPath: Axes Attribute axis: denotes the attributes of the context node. The node test is either the attribute name, or * which matches all the names. Example: attribute::*

XPath: Axes Descendant axis: all the descendant nodes, except the Attribute nodes. The node test is either the node name (for Element nodes), or * (any Element node) or text() (any Text node) or node() (all nodes). The context node does not belong to the result: use descendant-or-self instead. Example: descendant::node()

XPath: Axes Example: descendant::*

XPath: Axes Ancestor axis: all the ancestor nodes.
The node test is either the node name (for Element nodes), or node() (any Element node, and the Document root node). The context node does not belong to the result: use ancestor-or-self instead. Example: ancestor::node()

XPath: Axes Following axis: all the nodes that follows the context node in the document order. Attribute nodes are not selected. The node test is either the node name, * text() or node(). The axis preceding denotes all the nodes that precede the context node. Example: following::node()

XPath: Axes Following sibling axis: all the nodes that follows the context node, and share the same parent node. Same node tests as descendant or following. The axis preceding-sibling denotes all the nodes the precede the context node. Example: following-sibling::node()

Location Path Abbreviations
child:: This location path is used by default if no axis is supplied and may therefore be omitted attribute:: @ /descendant-or-self::node()/ // self::node() (.) parent::node() (..)

XPath: Node Tests The set of selected nodes is refined with node tests. node tests rely upon the principal node type of an axis for selecting nodes in a location path Node Test Description * Selects all nodes of the same principal node type. node() Selects all nodes, regardless of their type. text() Selects all text nodes. comment() Selects all comment nodes. processing-instruction() Selects all processing-instruction nodes. node name Selects all nodes with the specified node name.

XPath: Axes Location Paths Using Axes and Node Tests child::*
Location paths are composed of sequences of location steps. A location step contains an axis and a node test separated by a double-colon (::) and, optionally, a "predicate" enclosed in square brackets ([ ]). child::* The above location path selects all element-node children of the context node, because the principal node type for the child axis is element.

XPath: Wildcard //author/child::* or //author/*
Result: <first-name> Rick </first-name> <last-name> Hull </last-name> * Matches any element

XPath: Axes and Node Tests
child::text() selects all text-node children of the context node Combining two location steps to form the location path child::*/child::text() selects all text-node grandchildren of the context node

XPath: Node Tests /bib/book/author/text() Result: Serge Abiteboul
Victor Vianu Jeffrey D. Ullman Rick Hull doesn’t appear because he has firstname, lastname /bib/book/author/*/text() Result: Rick Hull

XPath: Restricted Kleene Closure
select all author element nodes in an entire document /descendent-or-self::node()/child::author Instead use the abbreviation: //author Result:<author> Serge Abiteboul </author> <author> <first-name> Rick </first-name> <last-name> Hull </last-name> </author> <author> Victor Vianu </author> <author> Jeffrey D. Ullman </author> /bib//first-name Result: <first-name> Rick </first-name>

XPath: Attribute Nodes
Result: “55” @price means that price has to be an attribute

XPath: Predicates Boolean expression, built with tests and the Boolean connectors and/or (negation is expressed with the not() function); a test is either an XPath expression, whose result is converted to a Boolean; a comparison or a call to a Boolean function. Important: predicate evaluation requires several rules for converting nodes and node sets to the appropriate type.

Predicate Evaluation A step is of the form axis::node-test[P]
First axis::node-test is evaluated: one obtains an intermediate result I Second, for each node in I, P is evaluated: the step result consists of those nodes in I for which P is true. /A/B/descendant::text()[1]

Predicate Evaluation Beware: an XPath step is always evaluated with respect to the context of the previous step. Here the result consists of those Text nodes, first descendant (in the document order) of a node B. /A/B//text()[1]

XPath: Predicates <?xml version = "1.0"?>
  <books> <book> <title>Java How to Program</title> <translation edition="1">Spanish</translation> <translation edition="1">Chinese</translation> <translation edition="1">Japanese</translation> <translation edition="2">French</translation> <translation edition="2">Japanese</translation> </book> <book> <title>C++ How to Program</title> <translation edition="1">Korean</translation> <translation edition="2">French</translation> <translation edition="2">Spanish</translation> <translation edition="3">Italian</translation> <translation edition="3">Japanese</translation> </books>

XPath: Predicates Select the title element node for each book that has a Japanese translation /books/book/translation[. = 'Japanese']/../title A predicate is a Boolean expression used as part of a location path to filter nodes from the search. Select the edition attribute node for books with Japanese translations

XPath 1.0 Type System Four primitive types:
The boolean(), number(), string() functions convert types into each other (no conversion to nodesets is defined), but this conversion is done in an implicit way most of the time. Rules for converting to a Boolean: A number is true if it is neither 0 nor NaN. A string is true if its length is not 0. A nodeset is true if it is not empty. Type Description Literals Examples Boolean Boolean values None true(), not($a=3) Number Floating-point 12, 12.5 1 div 33 String Ch. Strings "to", ’ti’ concat(’Hello’,’!’) Nodeset Node set /a/b[c=1

XPath 1.0 Type System Rules for converting a nodeset to a string:
The string value of a nodeset is the string value of its first item in document order. The string value of an element or document node is the concatenation of the character data in all text nodes below. The string value of a text node is its character data. The string value of an attribute node is the attribute value. Examples (Whitespace-only text nodes removed) <a toto="3"> <b titi=’tutu’><c /></b> <d>tata</d> </a> string(/) "tata" "3" boolean(/a/b) true() boolean(/a/e) false()

Operators Node-set operators allow to manipulate the node sets to form other node sets. Node-set Operators Description pipe (|) union of node-sets (Example: slash (/) Separates location steps double-slash (//) Abbreviation for the location path /descendant-or-self::node()/ +, -, *, div, mod standard arithmetic operators or, and Boolean operators and c=3) <, <=, >=, > relational operators (Example: ($a<2) and ($a>0))

Node-set Functions node-set functions perform an action on a node-set returned by a location path Node-set Functions Description last() returns a number equal to the context size from the expression evaluation context position() Returns the position number of the current node in the node-set being tested. count( node-set ) Returns the number of nodes in node-set. id( string ) Returns the element node whose ID attribute matches the value specified by argument string. local-name( node-set ) Returns the local part of the expanded-name for the first node in node-set. namespace-uri( node-set ) Returns the namespace URI of the expanded-name for the first node in node-set. name( node-set ) Returns the qualified name for the first node in node-set.

Node-set Functions //book/author[last()]
Returns the last author child of book node - Jeffrey D. Ullman //book/author[position() = 3] or //book/author[3] Selects the third author element of the book node /book[count(*)] return the total number of element-node children of the book node //book selects all book element nodes in the document

String Functions String Function Description concat($s1,...,$sn)
concatenates the strings $s1, , $sn starts-with($a,$b) returns true() if the string $a starts with $b contains($a,$b) returns true() if the string $a contains $b substring-before($a,$b) returns the substring of $a before the first occurrence of $b substring-after($a,$b) returns the substring of $a after the first occurrence of $b substring($a,$n,$l) returns the substring of $a of length $l starting at index $n (indexes start from 1). $l may be omitted string-length($a) returns the length of the string $a normalize-space($a) removes all leading and trailing whitespace from $a, and collapse all whitespace to a single character translate($a,$b,$c) returns the string $a, where all occurrences of a character from $b has been replaced by the character at the same place in $c

Boolean and Number Functions
Decsription not($b) returns the logical negation of the boolean $b sum($s) returns the sum of the values of the nodes in the nodeset $s floor($n) rounds the number $n to the next lowest integer ceiling($n) rounds the number $n to the next greatest integer round($n) rounds the number $n to the closest integer count(//*) returns the number of elements in the document normalize-space(’ titi toto ’) returns the string “titi toto” translate(’baba,’abcdef’,’ABCDEF’) returns the string “BABA” round(3.457) returns the number 3

XPath String functions
<?xml version = "1.0"?>   <xsl:stylesheet version = "1.0“ xmlns:xsl = " <xsl:template match = "/stocks"> <html> <body> <ul> <xsl:for-each select = "stock"> <xsl:if test = 'C')"> <li> <xsl:value-of select = - ',name)"/> </li> </xsl:if> </xsl:for-each> </ul> </body> </html> </xsl:template> </xsl:stylesheet> <?xml version = "1.0"?>   <stocks> <stock symbol = "INTC"> <name>Intel Corporation</name> </stock> <stock symbol = "CSCO"> <name>Cisco Systems, Inc.</name> </stock> <stock symbol = "DELL"> <name>Dell Computer Corporation</name> </stock> <stock symbol = "MSFT"> <name>Microsoft Corporation</name> </stock> <stock symbol = "SUNW"> <name>Sun Microsystems, Inc.</name> </stock> <stock symbol = "CMGI"> <name>CMGI, Inc.</name> </stock> </stocks>

XPath: Qualifiers /bib/book[@price < “60”]
/bib/book/author[first-name] Result: <first-name> Rick </first-name> < “60”] < “25”] /bib/book/author[text()]

XPath Examples child::A/descendant::B : B elements, descendant of an A element, itself child of the context node; Can be abbreviated to A//B. child::*/child::B : all the B grand-children of the context node descendant-or-self::B : elements B descendants of the context node, plus the context node itself if its name is B. child::B[position()=last()] : the last child named B of the context node. Abbreviated to B[last()]. following-sibling::B[1] : the first sibling of type B (in the document order) of the context node

XPath Examples /descendant::B[10] the tenth element of type B in the document. Not: the tenth element of the document, if its type is B! child::B[child::C] : child elements B that have a child element C. Abbreviated to B[C]. : elements B that have an attribute att1 or an attribute att2; Abbreviated to *[self::B or self::C] : children elements named B or C

XPath: Summary bib matches a bib element * matches any element
/ matches the root element /bib matches a bib element under root bib/paper matches a paper in bib bib//paper matches a paper in bib, at any depth //paper matches a paper at any depth paper|book matches a paper or a book @price matches a price attribute matches price attribute in book, in bib matches…

The Root and the Root <bib> <paper> 1 </paper> <paper> 2 </paper> </bib> bib is the “document element” The “root” is above bib /bib = returns the document element / = returns the root Why ? Because we may have comments before and after <bib>; they become siblings of <bib>

XPath: More Details Examples: What does this mean ?
child::author/child:lastname = author/lastname child::author/descendant::zip = author//zip child::author/parent::* = author/.. child::author/attribute::age = What does this mean ? paper/publisher/parent::*/author /bib//address[ancestor::book] /bib//author/ancestor::*//zip

XPath: Even More Details
name() = the name of the current node /bib//*[name()=book] same as /bib//book What does this mean ? /bib//*[ancestor::*[name()!=book]] In a different notation bib.[^book]*._ Navigation axis gives us strictly more power !

XPath 2.0 An extension of XPath 1.0, backward compatible with XPath 1.0. Main differences: Improved data model tightly associated with XML Schema.  a new sequence type, representing ordered set of nodes and/or values, with duplicates allowed.  XSD types can be used for node tests. More powerful new operators (loops) and better control of the output (limited tree restructuring capabilities) Extensible Many new built-in functions; possibility to add user-defined functions. XPath 2.0 is also a subset of XQuery 1.0.

Path expressions in XPath 2.0
New node tests in XPath 2.0: Nested paths expressions: Any expression that returns a sequence of nodes can be used as a step Node tests Description item() any node or atomic value element() any element (eq. to child::* in XPath 1.0) element(author) any element named author element(*, xs:person) any element of type xs:person attribute() any attribute /book/(author | editor)/name

XPath 1.0 Implementations
libxml2 Free C library for parsing XML documents, supporting XPath. java.xml.xpath Java package, included with JDK versions starting from 1.5. System.Xml.XPath .NET classes for XPath. XML::XPath Free Perl module, includes a command-line tool. DOMXPath PHP class for XPath, included in PHP5. PyXML Free Python library for parsing XML documents, supporting XPath.

Web Data Management XPath.

Similar presentations

Presentation on theme: "Web Data Management XPath."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Web Data Management XPath.

Similar presentations

Presentation on theme: "Web Data Management XPath."— Presentation transcript:

Similar presentations

About project

Feedback