
1 XPath 2.0
Roger L. Costello, 6 March 2010

2 Set this to XPath 2.0
[Screenshot of the Oxygen XPath version setting not preserved]

3 Using Namespaces in Oxygen
Suppose that in the Oxygen XPath expression evaluator you would like to write expressions such as this:
current-dateTime() - xs:dateTime('2008-01-14T00:00:00')
How do you tell Oxygen what namespace the "xs" prefix maps to? Here's how: go to Options > Preferences > XML > XSLT-FO-XQuery > XPath, and in the Default prefix-namespace mappings table add a new entry mapping xs to the XML Schema namespace (http://www.w3.org/2001/XMLSchema).

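4 XML Document
We will use this XML document throughout this tutorial, so spend a minute or two familiarizing yourself with it. It is planets.xml in the example01 folder. Please load it into Oxygen XML. The root planets element contains three planet elements. Each holds a name, mass, day, radius, and density, followed by one more numeric value:
name      mass    day      radius   density   (last value)
Mercury   .0553   58.65    1516     .983      43.4
Venus     .815    116.75   3716     .943      66.8
Earth     1       1        2107     1         128.4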

5 Sequences
Sequences are central to XPath 2.0: XPath 2.0 operates on sequences and generates sequences. A sequence is an ordered collection of nodes and/or atomic values.

6 Example Sequences
This sequence is composed of three atomic values: (1, 2, 3)
This sequence is also composed of three atomic values: ('red', 'white', 'blue')
This XPath expression generates a sequence composed of three nodes: (//planet/name)
See example01.

7 More Sequence Examples
The following XPath generates a sequence of six nodes; the first three are mass nodes, and the next three are name nodes:
(//planet/mass, //planet/name)
This sequence contains nodes followed by atomic values:
(//planet/name, 1, 2, 3)
See example02.

8 Definition of Sequence
A sequence is an ordered collection of zero or more items. An item is either an atomic value or a node.
An atomic value is a single, indivisible piece of data, e.g. 10, true, 2007, "hello world". (An atomic value is a value of an XML Schema atomic simpleType, such as xs:integer or xs:string.)
There are seven kinds of nodes: element, text, attribute, document, processing instruction (PI), comment, and namespace.
A sequence containing exactly one item is called a singleton sequence. A sequence containing zero items is called an empty sequence.

9 Sequence Constructor
A sequence is constructed by enclosing an expression in parentheses, with the items separated by commas. The comma is called the sequence constructor operator.

10 No Nested Sequences
If you have a sequence (1, 2) and nest it in another sequence, ((1, 2), 3), the resulting sequence is flattened to simply (1, 2, 3).
A nested empty sequence is removed: (1, (2, 3), (), 4, 5, 6) is flattened to simply (1, 2, 3, 4, 5, 6).
See example03.
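The flattening rule can be mimicked in Python. This is a rough sketch: a Python tuple stands in for a parenthesized XPath sequence, and a Python list stands in for the resulting flat sequence. make_sequence is a hypothetical helper, not part of any XPath library.

```python
def make_sequence(*items):
    """Build a flat sequence, splicing in nested tuples (like XPath's comma operator)."""
    result = []
    for item in items:
        if isinstance(item, tuple):
            # A nested (sub)sequence contributes its items, never itself;
            # an empty tuple, like (), contributes nothing at all.
            result.extend(make_sequence(*item))
        else:
            result.append(item)
    return result


print(make_sequence((1, 2), 3))               # [1, 2, 3]
print(make_sequence(1, (2, 3), (), 4, 5, 6))  # [1, 2, 3, 4, 5, 6]
```

Because the helper recurses, nesting at any depth still ends up as one flat list, which matches the "no sequence of sequences" rule.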

11 Extract Items from a Sequence
You can extract items from a sequence using the [...] operator (a predicate): (4, 5, 6)[2] returns the singleton sequence (5).
This XPath expression returns the second planet: //planet[2]
See example04.

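12 The index must be an integer
(sequence)[index]
When the predicate is a number, it is treated as a position. It selects the item whose position (counting from 1) equals index. For positional selection, use an integer, such as a value of the XML Schema integer datatype. A fractional number such as 2.5 matches no position, so it selects nothing. (A non-numeric predicate is instead treated as a boolean filter, as the following slides show.)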
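Here is a Python sketch of how a numeric predicate selects by position. Positions count from 1. A position that does not exist (for example 999, or a fractional 2.5) selects nothing, which corresponds to the empty sequence. Only the numeric case is modeled, and at_position is a hypothetical name.

```python
def at_position(seq, index):
    """Model (sequence)[index] for a numeric index: keep items whose 1-based position equals index."""
    if isinstance(index, bool) or not isinstance(index, (int, float)):
        raise TypeError("only numeric predicates are modeled here")
    return [item for position, item in enumerate(seq, start=1) if position == index]


print(at_position((4, 5, 6), 2))    # [5]
print(at_position((4, 5, 6), 999))  # []  (the empty sequence)
print(at_position((4, 5, 6), 2.5))  # []  (no item is at position 2.5)
```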

13 Initializing
Example: suppose an element may or may not have a discount attribute. If the element has the discount attribute, return its value; otherwise, return 0:
(@discount, 0)[1]
This works because an absent @discount contributes nothing to the sequence, so 0 becomes the first item.

14 Context Item
The dot "." stands for the current context item. The context item can be a node, e.g. //planet[.], or it can be an atomic value, e.g. (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)[. mod 2 = 0], which returns (2, 4, 6, 8, 10).
See example05.

15 count(sequence)
This function returns an integer representing the number of items in the sequence.
See example03.b.

16 Why Nested Parentheses?
Compare these two:
count((1, 2, 3))
count(1, 2, 3)
Notice the nested parentheses in the first one. Why is the first one correct and the other one incorrect?

17 Answer
The count function takes exactly one argument. The form count(1, 2, 3) provides three arguments to count, which is incorrect. The form count((1, 2, 3)) provides one argument to count (a sequence with three items).

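18 Sequence of Sequences?
There is no such thing as a sequence of sequences! There is only one sequence; all subsequences get flattened into a single sequence. This looks like a sequence of sequences, but its argument is one flat sequence of nine items (three planets, three integers, three strings), so it returns 9:
count((//planet, (1, 2, 3), ('red', 'white', 'blue')))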

19 The value of a non-existent node is the empty sequence, ()
/planets/planet[999]
There is no 999th planet, so the result of evaluating this XPath expression is the empty sequence, denoted by ().

20 () is not equal to ''
An empty sequence is not the same as a string of length zero. ('a', 'b', (), 'c') is not the same sequence as ('a', 'b', '', 'c'): the first has a count of 3, the second a count of 4.
See example03.a.

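21 This predicate [.] eliminates empty strings
The value of ('a', '')[.] is just ('a').
The value of ('a', 'b', '', 'c')[.] is just ('a', 'b', 'c').
This works because a zero-length string has an effective boolean value of false, so the predicate rejects it.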

22 Two built-in functions
XPath 2.0 has two built-in functions that return the boolean values: true() and false().

23 index-of(sequence, value)
The index-of() function returns the position of value in sequence:
index-of((1,3,5,7,9,11), 7)
Output: (4), because 7 is at the 4th index position.

24 Suppose the value occurs at multiple locations in the sequence
Then index-of returns a sequence of index positions. (In the last example the result was a sequence of length 1.) Here there are multiple 7's in the sequence:
index-of((1,3,5,7,9,11,7,7), 7)
Output: (4, 7, 8)
See example05.1.

25 remove(sequence, position)
The remove function removes the item at a specified position from a sequence. This removes the item at position 4 (the 7):
remove((1,3,5,7,9,11), 4)
Output: (1, 3, 5, 9, 11)
See example05.2.

26 The "to" Range Operator
The range operator, to, generates a sequence of consecutive integers: (1 to 10) returns the sequence (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).
This expression:
(1 to 100)[(. mod 10) = 0]
returns the sequence (10, 20, 30, 40, 50, 60, 70, 80, 90, 100).
This expression:
(1, 2, 10 to 14, 34, 99)
returns this non-contiguous sequence: (1, 2, 10, 11, 12, 13, 14, 34, 99)
See example06.
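Here is a rough Python counterpart of the to operator, offered as an illustration. Both operands must be integers. When the first operand is greater than the second, XPath returns the empty sequence, and Python's range reproduces that behavior. xpath_to is a hypothetical helper name.

```python
def xpath_to(first, last):
    """Model (first to last): consecutive integers, or [] when first > last."""
    for operand in (first, last):
        if isinstance(operand, bool) or not isinstance(operand, int):
            raise TypeError("operands of 'to' must be integers")
    return list(range(first, last + 1))


print(xpath_to(1, 10))                               # (1 to 10)
print([i for i in xpath_to(1, 100) if i % 10 == 0])  # (1 to 100)[(. mod 10) = 0]
print([1, 2] + xpath_to(10, 14) + [34, 99])          # (1, 2, 10 to 14, 34, 99)
```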

27 The operands of "to" must be integers
This is not valid: ('a' to 'z')
The error message you will get: "Error: Required type of first operand of 'to' is integer; supplied value has type string"

28 insert-before(sequence, position, value)
This inserts the value 2 before position 2 of a sequence from which 2 is missing:
insert-before((1,3,4,5,6,7,8,9), 2, 2)
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9)

29 Appending a value to the end
To append, specify a position greater than the length of the sequence:
insert-before(1 to 10, count(1 to 10) + 1, 2)
Output: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 2)

30 The inserted value can be a sequence
insert-before((1,3,4,5,6,7,8,9), 2, (2,3))
Output: (1, 2, 3, 3, 4, 5, 6, 7, 8, 9)
See example05.3.

31 Sequence Functions
index-of() returns the index (position) of a value.
[idx] returns the value at idx.
remove() returns the sequence minus the item at the specified index (position).
insert-before() returns the sequence plus the inserted value(s).
Do Lab8.
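These operations map naturally onto Python lists. The sketch below assumes 1-based positions, as in XPath, and follows the XPath rules for edge cases. remove() leaves the sequence unchanged for an out-of-range position. insert-before() treats a position below 1 as 1 and a position past the end as "append". The function names are hypothetical Python stand-ins, not an XPath binding.

```python
def index_of(seq, value):
    """All 1-based positions at which value occurs."""
    return [pos for pos, item in enumerate(seq, start=1) if item == value]


def remove(seq, position):
    """seq without the item at the 1-based position (unchanged if out of range)."""
    return [item for pos, item in enumerate(seq, start=1) if pos != position]


def insert_before(seq, position, inserts):
    """seq with the items of inserts placed before the 1-based position."""
    seq = list(seq)
    # Clamp: a position below 1 behaves like 1; a position past the end appends.
    i = min(max(position, 1), len(seq) + 1) - 1
    return seq[:i] + list(inserts) + seq[i:]


print(index_of((1, 3, 5, 7, 9, 11, 7, 7), 7))            # [4, 7, 8]
print(remove((1, 3, 5, 7, 9, 11), 4))                    # [1, 3, 5, 9, 11]
print(insert_before((1, 3, 4, 5, 6, 7, 8, 9), 2, (2,)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(insert_before(range(1, 11), 11, (2,)))             # 2 appended after 10
```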

32 Sequences are Ordered
Order matters. This generates a sequence composed of the mass elements followed by the name elements:
(//planet/mass, //planet/name)
See example07.

33 reverse(sequence)
This function reverses the order of the items in a sequence. For example, reverse((1, 2, 3)) returns (3, 2, 1). Notice that the items are wrapped in an extra pair of parentheses, which makes them a single sequence argument.
See example07.1.

34 The for Expression
Use the for expression to loop (iterate) over all items in a sequence. This is its general form:
for variable in sequence return expression
Here's an example that iterates over the integers 1-10, multiplying each integer by two:
for $i in (1 to 10) return $i * 2
It returns (2, 4, 6, 8, 10, 12, 14, 16, 18, 20).
See example08.

35 for Expression Examples
This iterates over each planet element and returns its radius element:
for $p in /planets/planet return $p/radius
This iterates over each radius element and returns it (the sequence generated is identical to the one above):
for $r in /planets/planet/radius return $r
This iterates over each letter of the alphabet:
for $i in ('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z') return $i
See example09.

36 More for Examples
This returns each radius converted from miles to kilometers (it returns numbers, not nodes):
for $r in /planets/planet/radius return $r * 1.61
This applies the avg() function to the sequence of radius nodes returned by the for expression:
avg(for $r in /planets/planet/radius return $r)
See example10.

37 Terminology
In "for variable in sequence return expression", variable is the range variable, sequence is the input sequence, and expression is the return expression. The return expression is evaluated once for each item in the input sequence.

38 Multiple Variables
Multiple variables can be used, separated by commas:
for variable1 in sequence1, variable2 in sequence2, ... return expression

39 Example of Multiple Variables
for $x in (1, 2), $y in (3, 4) return ($x * $y)
returns (3, 4, 6, 8).
See example11. Do Lab9.
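As an analogy for illustration only, a for expression with two range variables behaves like a Python comprehension with two nested for clauses. The first variable varies slowest, which is why the result is ordered (3, 4, 6, 8).

```python
# for $i in (1 to 10) return $i * 2
doubled = [i * 2 for i in range(1, 11)]

# for $x in (1, 2), $y in (3, 4) return ($x * $y)
products = [x * y for x in (1, 2) for y in (3, 4)]

print(doubled)   # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(products)  # [3, 4, 6, 8]
```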

40 The if Expression
The form of the if expression is:
if (boolean expression) then expression1 else expression2
If the boolean expression evaluates to true, the result is expression1; otherwise the result is expression2.
This if expression finds the minimum of two numbers:
if (10 < 20) then 10 else 20
This for loop returns all the positive numbers in the sequence, i.e. (5, 7, 2):
for $i in (0, -3, 5, 7, -1, 2) return if ($i > 0) then $i else ()
See example12.

41 Nested if-then-else
In if (boolean expr) then expr1 else expr2, both expr1 and expr2 can themselves be if-then-else expressions.

42 Notes about the if Expression
1. You must wrap the boolean expression in parentheses.
2. You must have an "else" part. There is no if-then expression, only if-then-else.
Do Lab10.

43 The some Expression
The form of the some expression is:
some variable in sequence satisfies boolean expression
The result of the expression is either true or false. The some expression returns true if at least one item in the sequence satisfies the boolean expression.

44 Examples of the some Expression
This example determines whether there are some (one or more) negative values in the sequence:
some $i in (2, 6, -1, 3, 9) satisfies $i < 0
Note that this produces the same boolean result:
(2, 6, -1, 3, 9) < 0
because "<" is a general comparison operator, i.e. it compares each item in the sequence until a match is found.
See example13.

45 More Examples of "some"
Is there some planet that has a radius greater than 2000?
some $i in /planets/planet satisfies $i/radius > 2000
Note that this produces the same boolean result:
/planets/planet/radius > 2000
See example14.

46 The every Expression
The form of the every expression is:
every variable in sequence satisfies boolean expression
The result of the expression is either true or false. The every expression returns true if every item in the sequence satisfies the boolean expression.

47 Examples of the every Expression
This example determines whether every item in the sequence is positive (it returns false, because of the -1):
every $i in (2, 6, -1, 3, 9) satisfies $i > 0
Note that this produces the same boolean result:
not((2, 6, -1, 3, 9) <= 0)
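The quantified expressions correspond to Python's any() and all(). The "same boolean result" shortcuts on these slides work because a general comparison is itself an existential test. Here is a sketch under that analogy, with a Python list standing in for the XPath sequence.

```python
values = [2, 6, -1, 3, 9]

# some $i in (2, 6, -1, 3, 9) satisfies $i < 0
some_negative = any(i < 0 for i in values)

# every $i in (2, 6, -1, 3, 9) satisfies $i > 0
every_positive = all(i > 0 for i in values)

# not((2, 6, -1, 3, 9) <= 0): the general comparison "<=" is true if ANY item is <= 0
every_positive_via_general = not any(i <= 0 for i in values)

print(some_negative, every_positive, every_positive_via_general)  # True False False
```

As in XPath, the edge case of an empty input follows the same logic: any([]) is False and all([]) is True.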

48 Multiple Universal Quantifiers
An XPath expression can quantify over multiple variables, separated by commas:
every variable1 in sequence1, variable2 in sequence2, ... satisfies condition
See example15.

49 Union Operator
The union operator combines two node sequences (you cannot union sequences of atomic values). Example:
/planets/planet/mass union /planets/planet/radius
produces the sequence of mass and radius elements whose values are: .0553, 1516, .815, 3716, 1, 2107

50 Equivalent
The union and | operators are equivalent; these two expressions produce the same result:
/planets/planet/mass union /planets/planet/radius
/planets/planet/mass | /planets/planet/radius

51 Duplicates are Eliminated
When you union two node sequences, any duplicates are eliminated. This yields 3 nodes, not 6:
/planets/planet/mass union /planets/planet/mass
See example16.

52 Intersect Operator
The intersect operator returns the intersection of two node sequences. Example: find all planets with a mass over .8 and a radius over 2000:
/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]
This returns the Venus and Earth planet elements.

53 Equivalent
/planets/planet[mass > .8] intersect /planets/planet[radius > 2000]
is equivalent to:
/planets/planet[(mass > .8) and (radius > 2000)]

54 Duplicates are Eliminated
When you intersect two node sequences, any duplicates are eliminated. This yields 2 nodes, not 4:
/planets/planet[mass > .8] intersect /planets/planet[mass > .8]
See example17.

55 Except Operator
The except operator returns the difference between two node sequences. Example: get all planets except Earth:
/planets/planet except /planets/planet[name='Earth']
This returns the Mercury and Venus planet elements.

56 Equivalent
/planets/planet except /planets/planet[name='Earth']
is equivalent to:
/planets/planet[name!='Earth']
See example18.

57 I posed a challenge to the xml-dev list to simplify an XPath expression. Their answer is awesome.
Problem: create an XPath expression for this: there must be one child Title element, zero or more child Author elements, and one child Date element, and nothing else.
Here's the XPath 2.0 expression I created:
count(Title) eq 1 and count(Author) ge 0 and count(Date) eq 1 and count(*[not(name() = ('Title','Author','Date'))]) eq 0
See the next slide for the solution created by the XPath masters on xml-dev.

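58 Title and Date and empty(* except (Title[1], Date[1], Author))
Incredible, don't you think? "Title and Date" requires at least one of each. The except then removes the first Title, the first Date, and all Authors. If anything is left over (a second Title or Date, or any other element), empty() returns false.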

59 No Duplicates, Document Order
The union, intersect, and except operators return their results as sequences in document order, without any duplicate items.

60 "Duplicate" is Based on Identity, Not Value
Two nodes are duplicates if and only if they are the exact same node. Two distinct elements can have the same value but different identities, so they are not duplicates.
Do Lab11.
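To see why identity, not value, decides what counts as a duplicate, here is a Python sketch of union, intersect, and except over xml.etree.ElementTree elements. It makes three simplifying choices:
- It uses a trimmed-down copy of planets.xml that keeps only name, mass, and radius.
- It treats Python object identity (id) as node identity.
- It sorts every result into document order.

The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of planets.xml (assumption: only name, mass, and radius are kept).
XML = """<planets>
  <planet><name>Mercury</name><mass>.0553</mass><radius>1516</radius></planet>
  <planet><name>Venus</name><mass>.815</mass><radius>3716</radius></planet>
  <planet><name>Earth</name><mass>1</mass><radius>2107</radius></planet>
</planets>"""

root = ET.fromstring(XML)
# Document order = the order of a depth-first walk over the tree.
DOC_ORDER = {id(el): i for i, el in enumerate(root.iter())}


def in_document_order(nodes):
    """Drop duplicate nodes (same identity, not same value) and sort by document order."""
    unique = {id(node): node for node in nodes}
    return sorted(unique.values(), key=lambda node: DOC_ORDER[id(node)])


def union(a, b):
    return in_document_order(list(a) + list(b))


def intersect(a, b):
    ids = {id(node) for node in b}
    return in_document_order(node for node in a if id(node) in ids)


def except_(a, b):  # trailing underscore: "except" is a Python keyword
    ids = {id(node) for node in b}
    return in_document_order(node for node in a if id(node) not in ids)


planets = root.findall("planet")
masses, radii = root.findall("planet/mass"), root.findall("planet/radius")
heavy = [p for p in planets if float(p.findtext("mass")) > .8]
big = [p for p in planets if float(p.findtext("radius")) > 2000]
earth = [p for p in planets if p.findtext("name") == "Earth"]

print([n.text for n in union(masses, radii)])  # ['.0553', '1516', '.815', '3716', '1', '2107']
print(len(union(masses, masses)))              # 3, not 6
print([p.findtext("name") for p in intersect(heavy, big)])  # ['Venus', 'Earth']
print([p.findtext("name") for p in except_(planets, earth)])  # ['Mercury', 'Venus']
```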

61 Multiple Node Tests
Recall that in XPath 1.0 an XPath expression is composed of steps separated by slashes: node-test/node-test/...
At each step you can specify only one node test. In XPath 2.0 you can specify multiple node tests in each step.

62 Example of Multiple Node Tests
Example: select the mass and radius for each planet:
/planets/planet/(mass|radius)
Result: the mass and radius elements, with values .0553, 1516, .815, 3716, 1, 2107

63 Equivalent
These four expressions are equivalent:
/planets/planet/(mass|radius)
/planets/planet/(mass union radius)
/planets/planet/mass | /planets/planet/radius
/planets/planet/*[(self::mass) or (self::radius)]
See example19.

64 Examples of Multiple Node Tests using Union and Intersect Operators
XML: a test element whose children a, b, c, d, e contain the text A, B, C, D, E.
XPath: /test/(a, b) union /test/(c, d, e)
Output: A B C D E
XPath: /test/(a, b, c) intersect /test/(b, c, d)
Output: B C
See example20.

65 Feed Nodes into a Function
In XPath 1.0 an expression following a slash identifies node(s). In XPath 2.0 an expression following a slash can be a function call, and each node selected before the slash is fed into the function in turn. Here the name of each planet is fed into substring():
/planets/planet/name/substring(.,1,1)
Output: ("M", "V", "E")
See example21.

66 Feed Nodes into a for loop
/planets/planet/day/(for $i in . return $i * 2)
Output: (117.3, 233.5, 2)
Note: be sure you wrap the for loop in parentheses.
See example22.

67 Can't Feed Atomic Values
The previous slides showed feeding nodes into a function and a for loop. You cannot feed atomic values; e.g., this is illegal:
(1 to 10)/(for $i in . return $i)
Here's the error message you get: "Error: Required item type of first operand of / is node(); supplied value has item type xs:integer"
See example22.a. Do Lab12.

68 Comments
XPath 2.0 expressions may be commented using this syntax: (: comment :)
(: multiply each day by two :)
/planets/planet/day/(for $i in . return $i * 2)

69 General Comparison Operators
Here are the general comparison operators: =, !=, <, <=, >, >=
These operators are used to compare sequences. Each item in one sequence is compared against each item in the other sequence; the comparison evaluates to true if one or more item-item comparisons evaluate to true.

70 How General Comparison Works
(item1, item2) op (item3, item4)
is evaluated as:
(item1 op item3) or (item1 op item4) or (item2 op item3) or (item2 op item4)
(1, 2) = (2, 3) is evaluated as (1 = 2) or (1 = 3) or (2 = 2) or (2 = 3), so it returns true.
(1, 2) = (3, 4) returns false, because there are no equal values between the sequences.
See example23.

71 Example
/planets/planet[mass > .8] = /planets/planet[density > .9]
The left side returns a sequence of two planets (Venus, Earth), and the right side returns a sequence of three planets (Mercury, Venus, Earth). The result is true.
See example24.

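72 Definition of Equal
When nodes are compared with a general comparison, each node is first reduced to its value. For the elements in this tutorial, that value is the text they contain, in document order. So two nodes are equal if their values are the same, the values are in the same order, and there is the same number of values. The tag names can be different: comparison is based on data, not markup.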

73 Example
Consider a document with two planet elements whose child elements use different tag names but contain the same data (Mercury, .0553, 58.65, 1516, .983, 43.4 in each). The expression
/planets/planet[1] = /planets/planet[2]
returns true.
See example25.

74 Equivalent?
Problem: find all planets whose name is not in the sequence ('Earth', 'Mars'). Are these equivalent?
/planets/planet[not(name = ('Earth', 'Mars'))]
/planets/planet[name != ('Earth', 'Mars')]

75 Not Equivalent!
/planets/planet[not(name = ('Earth', 'Mars'))] returns Mercury and Venus.
/planets/planet[name != ('Earth', 'Mars')] returns Mercury, Venus, and Earth.

76 Explanation
/planets/planet[not(name = ('Earth', 'Mars'))] asks, for each planet: is its name 'Earth' or 'Mars'? If so, don't return it; otherwise, return it.
/planets/planet[name != ('Earth', 'Mars')] asks, for each planet: is its name not 'Earth', or not 'Mars'? If so, return it; otherwise, don't.
Consider the planet whose name is Earth, using =:
not((Earth = Earth) or (Earth = Mars)) → not(true or false) → not(true) → false, so Earth is not returned.
Now using !=:
(Earth != Earth) or (Earth != Mars) → false or true → true, so Earth is returned.
Every planet's name differs from at least one of 'Earth' and 'Mars', so every planet is returned.
See example26.
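A general comparison can be modeled as an "is there any matching pair?" test over the cross product of the two sequences. This is a rough model: it ignores XPath's type-conversion rules and treats every item as a plain Python value. The sketch below, with a hypothetical helper name, reproduces the (1, 2) = (2, 3) results and the != pitfall, using the planet names as strings.

```python
import operator
from itertools import product


def general_compare(left, right, op):
    """True if at least one (a, b) pair, a from left and b from right, satisfies op."""
    return any(op(a, b) for a, b in product(left, right))


print(general_compare((1, 2), (2, 3), operator.eq))  # True, since 2 = 2
print(general_compare((1, 2), (3, 4), operator.eq))  # False

names = ["Mercury", "Venus", "Earth"]
excluded = ("Earth", "Mars")
# planet[not(name = ('Earth', 'Mars'))]
print([n for n in names if not general_compare([n], excluded, operator.eq)])  # ['Mercury', 'Venus']
# planet[name != ('Earth', 'Mars')]: every name differs from at least one of them
print([n for n in names if general_compare([n], excluded, operator.ne)])      # all three
```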

77 Value Comparison Operators
Here are the value comparison operators: eq, ne, lt, le, gt, ge
These operators are used to compare single atomic values.
Example: 10 lt 30 returns true.
Example: /planets/planet[1]/name eq 'Mercury' returns true.
See example27.

78 No Sequences Allowed!
Suppose the third planet contains two name elements: Earth and Mother Earth. Then
/planets/planet[3]/name eq 'Earth'
raises an error: "Error! A sequence of more than one item is not allowed as the first operand of 'eq'."
See example28.

79 However, this works
/planets/planet[3]/name = 'Earth'
returns true, because the "=" operator works with sequences.
See example29.
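Here is a sketch contrasting the two kinds of comparison, assuming each operand is a Python list of already-atomized values. A value comparison such as eq accepts at most one item per side. If either side is empty, the result is the empty sequence, modeled here as None. A general comparison such as = copes with the two-name planet. value_eq and general_eq are hypothetical names.

```python
def value_eq(left, right):
    """Model of 'eq': each operand may hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("A sequence of more than one item is not allowed as an operand of 'eq'")
    if not left or not right:
        return None  # stands in for the empty sequence ()
    return left[0] == right[0]


def general_eq(left, right):
    """Model of '=': true if any pair of items is equal."""
    return any(a == b for a in left for b in right)


third_planet_names = ["Earth", "Mother Earth"]
print(general_eq(third_planet_names, ["Earth"]))  # True
print(value_eq(["Mercury"], ["Mercury"]))         # True
try:
    value_eq(third_planet_names, ["Earth"])
except TypeError as err:
    print("error:", err)
```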

80 is Operator
You can test whether two expressions identify the same node by using the "is" operator:
expr1 is expr2
returns true only if expr1 and expr2 identify the same node. expr1 and expr2 must each be a single node (if either is the empty sequence, the result is the empty sequence). This expression
//planet[mass = .815] is //planet[day = 116.75]
returns true because both expressions identify the same planet element (Venus).
See example30.

81 << Operator
The expression expr1 << expr2 returns true if the node identified by expr1 comes before the node identified by expr2 in the document. This expression
//planet[mass = .0553] << //planet[mass = .815]
returns true because the left expression identifies Mercury, the right expression identifies Venus, and Mercury comes before Venus in the document.
See example31.

82 >> Operator
The expression expr1 >> expr2 returns true if the node identified by expr1 comes after the node identified by expr2 in the document. This expression
//planet[mass = .815] >> //planet[mass = .0553]
returns true because the left expression identifies Venus, the right expression identifies Mercury, and Venus comes after Mercury in the document.
See example32. Do Lab13.

83 Arithmetic Operators
Here are the arithmetic operators: +, -, *, div, mod, idiv
The idiv operator performs integer division: it returns an integer, truncating the quotient toward zero, e.g.
3 idiv 2 returns 1
-5 idiv 2 returns -2
See example33.

84 Equivalent
n idiv m is equivalent to:
floor(n div m) if n and m have the same sign (the quotient is positive)
ceiling(n div m) if exactly one of n and m is negative (the quotient is negative)
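Python's own // operator floors rather than truncates (-5 // 2 is -3), so it is not a drop-in replacement for idiv. The sketch below assumes integer operands. It models idiv as truncation toward zero and checks it against a formulation that uses floor when the quotient is non-negative and ceiling when it is negative. Both function names are hypothetical.

```python
import math


def idiv(n: int, m: int) -> int:
    """Model of XPath 'n idiv m' for integers: the quotient truncated toward zero."""
    q = abs(n) // abs(m)  # raises ZeroDivisionError when m == 0
    return q if (n < 0) == (m < 0) else -q


def idiv_floor_ceiling(n: int, m: int) -> int:
    """Same result, choosing floor or ceiling by the sign of the quotient."""
    quotient = n / m
    return math.floor(quotient) if quotient >= 0 else math.ceil(quotient)


print(idiv(3, 2), idiv(-5, 2), idiv(-5, -2))  # 1 -2 2
print(-5 // 2)                                # -3: Python floors instead
```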

85 current-dateTime Function
current-dateTime() is an XPath 2.0 function that returns the current date and time, e.g. 2008-01-19T14:19:26.406-05:00
The value returned by this function is of type xs:dateTime (the XML Schema dateTime datatype).
See example34.

86 The matches() Function
The form of the matches function is: matches(input string, regex)
It is a boolean function: it returns true if the input string matches the regular expression, and false otherwise.
if (matches(/planets/planet[2]/name, 'Venus')) then 'Success' else 'Failure'
The matches() function evaluates to true, so the result is 'Success'.

87 The matches() Function
if (matches(/planets/planet[2]/name, 'V[a-z]+s')) then 'Success' else 'Failure'
This regex says: a 'V', followed by one or more lowercase letters, followed by an 's'. Because XPath regexes are not anchored (see slide 97), the input matches if it contains such a substring anywhere; 'Venus' does.
See example44.

88 Regular Expressions
The following slides show examples of regular expressions:
Regular Expression   Examples
Chapter \d           Chapter 1
a*b                  b, ab, aab, aaab, …
[xyz]b               xb, yb, zb
a?b                  b, ab
a+b                  ab, aab, aaab, …
[a-c]x               ax, bx, cx

89 Regular Expressions (cont.)
Regular Expression   Examples
[a-c]x               ax, bx, cx
[-ac]x               -x, ax, cx
[ac-]x               ax, cx, -x
[^0-9]x              any non-digit char followed by x
\Dx                  any non-digit char followed by x
Chapter\s\d          Chapter followed by a whitespace character followed by a digit
(ho){2} there        hoho there
(ho\s){2}            ho followed by a whitespace character, twice (e.g. "ho ho ")
.abc                 any (one) char followed by abc
(a|b)+x              ax, bx, aax, bbx, abx, bax, ...

90 Regular Expressions (cont.)
a{1,3}x          ax, aax, aaax
a{2,}x           aax, aaax, aaaax, …
\w\s\w           a word character, followed by a whitespace character, followed by a word character (in XPath and XML Schema regexes, \w matches any character except punctuation, separators, and control characters, so it does not match the dash)
[a-zA-Z-[Ol]]*   a string composed of any lowercase and uppercase letters, except "O" and "l"
\.               the period "." (without the backslash, the period means "any character")

91 Regular Expressions (cont.)
^Hello    Hello (and it must be at the beginning)
Hello$    Hello (and it must be at the end)
^Hello$   Hello (and it must be the only value)

92 Regular Expressions (cont.)
\n   linefeed
\r   carriage return
\t   tab
\\   the backslash \
\|   the vertical bar |
\-   the hyphen -
\^   the caret ^
\?   the question mark ?
\*   the asterisk *
\+   the plus sign +
\{   the open curly brace {
\}   the close curly brace }
\(   the open paren (
\)   the close paren )
\[   the open square bracket [
\]   the close square bracket ]

93 Regular Expressions (concluded)
\p{L}    a letter, from any language
\p{Lu}   an uppercase letter, from any language
\p{Ll}   a lowercase letter, from any language
\p{N}    a number: Roman numerals, fractions, etc.
\p{Nd}   a digit, from any language
\p{P}    a punctuation symbol
\p{Sc}   a currency sign, from any language
\p{Sc}\p{Nd}+(\.\p{Nd}\p{Nd})?   a currency sign from any language, followed by one or more digits from any language, optionally followed by a period and two digits from any language

94 Different from the Regex in the XML Schema Pattern Facet
Consider an XML Schema declaration of a Free-text element whose type restricts xs:string with a pattern facet whose value is Hello. And suppose this is the input: <Free-text>Hello</Free-text>
The input validates against the schema; that is, the string "Hello" matches the regex in the pattern facet. Likewise, using the same input and regex, the matches function succeeds:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'

95 Different from the Regex in the XML Schema Pattern Facet
Next, consider this input: <Free-text>He said Hello World</Free-text>
The input does not validate against the schema; that is, the string "He said Hello World" does not match the regex in the pattern facet. However, the matches function does succeed:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'

96 XSD Regexes are Implicitly Anchored
When you give a regex in a pattern facet, there are implicit anchors around it: the regex "Hello" is effectively ^Hello$. The ^ matches the start of the input, and the $ matches the end of the input. Thus "Hello" matches only input that starts with H, ends with o, and has ello in between, i.e. exactly Hello.

97 No Implicit Anchors in XPath Regexes
The regex "Hello" in XPath has no implicit anchors; any anchors must be specified explicitly. Thus, the regex "Hello" matches any input that contains the string Hello, and
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
is equivalent to:
if (contains(//Free-text, 'Hello')) then 'Success' else 'Failure'
See example45.
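Python's re module draws the same line between anchored and unanchored matching, which makes the contrast easy to try. re.fullmatch behaves like an implicitly anchored XSD pattern facet, and re.search behaves like XPath's matches(). This is an analogy, not an implementation: Python's regex dialect differs from the XPath/XSD dialect in details such as \w and $. The two function names are hypothetical.

```python
import re


def pattern_facet_matches(pattern, text):
    """XSD pattern facet: implicitly anchored, so the whole value must match."""
    return re.fullmatch(pattern, text) is not None


def xpath_style_matches(pattern, text):
    """XPath matches(): unanchored, so a match anywhere in the input is enough."""
    return re.search(pattern, text) is not None


for text in ("Hello", "He said Hello World"):
    print(repr(text), pattern_facet_matches("Hello", text), xpath_style_matches("Hello", text))
# 'Hello' True True
# 'He said Hello World' False True
```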

98 Case-Insensitivity Mode
The matches function has an optional third argument: matches(input, regex, flags)
The "i" flag is used to perform a case-insensitive comparison of the input and the regex. Example: suppose this is the input: He said HELLO WORLD
Consider this XPath:
if (matches(//Free-text, 'Hello', 'i')) then 'Success' else 'Failure'
The result is 'Success' because the input is checked to see if it contains 'Hello', 'hello', 'HELLO', 'HeLLO', etc.

99 The Default is Case-Sensitive
If the "i" flag is not used, the matches function performs a case-sensitive comparison. With the same input (He said HELLO WORLD), consider this XPath:
if (matches(//Free-text, 'Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is checked to see if it contains 'Hello' with exactly that capitalization.
See example46.

100 Multiline Mode
The "m" flag indicates that the input should be treated as one or more lines, each with its own start and end. In this mode ^ and $ match at the start and end of each line. Example: suppose this is the input, on two lines:
He said
Hello World
Consider this XPath:
if (matches(//Free-text, '^Hello', 'm')) then 'Success' else 'Failure'
The regex says: does the input start with the string 'Hello'? The 'm' flag says: check each line. Thus the result is 'Success', since the second line starts with 'Hello'.

101 The Default is One Long String
If the "m" flag is not used, the matches function treats the input as one long string, with one start and one end. Consider this XPath:
if (matches(//Free-text, '^Hello')) then 'Success' else 'Failure'
The result is 'Failure' because the input is treated as one long string and 'Hello' does not start the string.
See example47.

102 Dot-all Mode
The "s" flag indicates that the dot (.) matches every character, including the newline (#x0A) character. If the "s" flag is not used, the default is for the dot to match every character except the newline. In these examples the input string contains a line break between Hello and World:
if (matches('Hello
World', 'H.*World')) then 'Success' else 'Failure'
The result is 'Failure'.
if (matches('Hello
World', 'H.*World', 's')) then 'Success' else 'Failure'
The result is 'Success'.
See example48.

103 Ignore Whitespace Mode
The "x" flag indicates that whitespace in the regex should be ignored. If the "x" flag is not used, any whitespace in the regex is treated as part of the regex.
if (matches('abcabc', '(a b c)+')) then 'Success' else 'Failure'
The result is 'Failure'. The regex matches only input that contains a b c, with the spaces.
if (matches('abcabc', '(a b c)+', 'x')) then 'Success' else 'Failure'
The result is 'Success'. With the spaces ignored, the regex matches input that contains abc, such as abc or abcabc.
See example49.

104 Multiple Flags
Zero or more flags can be specified; the default is used for any mode not specified. Here the input string contains a line break between Hello and World:
if (matches('Hello
World', '^WORLD$', 'im')) then 'Success' else 'Failure'
The result is 'Success'. The regex says: a line must consist of exactly the string 'WORLD'. The flags say: ignore case, and treat the input as 2 lines, checking each line.
See example50. Do Lab14.
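The four flags line up with Python's re flags, so the examples on these slides can be replayed as below:
- i corresponds to re.IGNORECASE
- m corresponds to re.MULTILINE
- s corresponds to re.DOTALL
- x corresponds to re.VERBOSE

Treat this as an approximation: Python's verbose mode also treats # as the start of a comment, which XPath's x flag does not. The helpers xpath_flags and matches are hypothetical.

```python
import re

FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def xpath_flags(flags: str) -> int:
    """Translate an XPath flags string such as 'im' into combined re flags."""
    combined = 0
    for letter in flags:
        combined |= FLAG_MAP[letter]
    return combined


def matches(text: str, regex: str, flags: str = "") -> bool:
    """Rough analogue of XPath matches(): an unanchored search."""
    return re.search(regex, text, xpath_flags(flags)) is not None


two_lines = "He said\nHello World"
print(matches("He said HELLO WORLD", "Hello", "i"))  # True
print(matches(two_lines, "^Hello", "m"))             # True
print(matches(two_lines, "^Hello"))                  # False
print(matches("Hello\nWorld", "H.*World", "s"))      # True
print(matches("abcabc", "(a b c)+", "x"))            # True
print(matches("Hello\nWorld", "^WORLD$", "im"))      # True
```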

105 The tokenize() Function
Use tokenize() to split a string into pieces (tokens). A regex specifies the separators between the tokens.
for $i in tokenize('12, 16, 3, 99', ',\s*') return $i
The result is: 12 16 3 99

106 Use Flags with tokenize()
The flags (i, m, s, x) we saw with the matches() function are also available with tokenize():
for $i in tokenize('12xx16XX3xX99', 'xx', 'i') return $i
The result is: 12 16 3 99
See example51.

107 Separators are Discarded
The separators are specified using a regex. The input string is processed from left to right, looking for substrings that match the regex. The separators are discarded; the remaining strings are collected to form the output sequence.

108 Example: Footnote References as Separators
Tokenize the input using [n] as the separators. For example, tokenize this:
XPath[1] XSLT[2]
into these tokens: XPath, XSLT
Will this work?
tokenize('XPath[1] XSLT[2]', '\[.+\]')

109 + is a Greedy Quantifier
The regex on the previous slide does not produce the desired result. Here's why: the + quantifier matches the longest string it can, so it is called a greedy quantifier.
\[.+\]
Read as: find the longest string that starts with '[' and ends with ']'. In 'XPath[1] XSLT[2]' that is '[1] XSLT[2]', so the only non-empty token is 'XPath'.
See example52.

110 Why Does This Work?
tokenize('XPath[1] XSLT[2]', '\[\d+\]')

111 Regex is for [ digit(s) ]
tokenize('XPath[1] XSLT[2]', '\[\d+\]')
Only digits are permitted inside the brackets, so a match cannot extend across '] XSLT['.
See example53.

112 +? is a non-Greedy Operator
If you want to match the shortest possible substring, add a '?' after the quantifier to make it non-greedy.
\[.+?\]
Read as: find the shortest string that starts with '[' and ends with ']'.
tokenize('XPath[1] XSLT[2]', '\[.+?\]')
This yields the tokens 'XPath' and ' XSLT' (note the leading space). They are followed by a zero-length string, because the input ends with a separator.
See example54.
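Python's re.split searches for separators from left to right in the same way as tokenize(). It also leaves a zero-length string at the end when the input finishes with a separator. That makes it a convenient way to watch greedy and non-greedy quantifiers at work. This is a sketch, not a claim that the two regex engines agree in every case.

```python
import re

text = "XPath[1] XSLT[2]"
print(re.split(r"\[.+\]", text))   # ['XPath', '']           greedy: one separator swallows "[1] XSLT[2]"
print(re.split(r"\[\d+\]", text))  # ['XPath', ' XSLT', '']  only digits allowed inside the brackets
print(re.split(r"\[.+?\]", text))  # ['XPath', ' XSLT', '']  non-greedy: shortest bracketed run
```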

113 * and + are Greedy
Above we saw that + is greedy; * is also greedy. To make them non-greedy, append a '?': *? and +?

114 Regex with 2 Alternatives, and Both Match
Consider this XPath:
tokenize('bab', 'a|ab')
What tokens will be generated: {b, b} or {b}?

115 First Alternative Wins!
If multiple alternatives match, the first one is used. Thus, the result is {b, b}.
Suppose that's not what we want: we want the longest alternative ('ab') to be used whenever possible.
See example55.

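116 Solution
Both of these regexes give the desired result:
ab|a
ab?
With either one, the separator is 'ab', so the result is a single 'b' token. A zero-length string follows it, because the input ends with the separator.
See example56.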

117 Separator Matches Beginning and Ending
Consider this XPath:
tokenize('aba', 'a')
The input string starts with the separator and ends with the separator. What will be the result?

118 Zero-length Strings
The output is a zero-length string, 'b', and a zero-length string: {'', 'b', ''}
See example57.

119 Regex Doesn't Match Input
If the regex doesn't match the input string, the result is the input string:
tokenize('bbb', 'a')
produces {'bbb'}
See example58. Do Lab15.
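The tokenize() edge cases on these slides can be reproduced with Python's re.split. Like XPath's matcher, Python's backtracking engine tries alternatives from left to right. This is offered as an analogy under that assumption.

```python
import re

print(re.split("a|ab", "bab"))  # ['b', 'b']     first alternative 'a' wins
print(re.split("ab|a", "bab"))  # ['b', '']      'ab' is tried first; input ends with a separator
print(re.split("ab?", "bab"))   # ['b', '']      greedy ? takes the 'b' too
print(re.split("a", "aba"))     # ['', 'b', '']  separators at both ends
print(re.split("a", "bbb"))     # ['bbb']        no match: the whole input
```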

120 What Separator?
Suppose you want to split (tokenize) this string:
W151TBH
into {'W', '151', 'TBH'}; that is, separate the numeric parts from the alphabetic parts. What regex would you use?

121 Need More Knowledge
The problem can't be solved given what we currently know. However, it can be solved by using the tokenize() function together with the replace() function, so let's learn about replace().

122 The replace() Function
The replace() function replaces every substring that matches the regex with a replacement string:
replace(input, regex, replacement)
Example: this removes all vowels:
replace('Hello World', '[aeiou]', '')
returns: 'Hll Wrld'
See example59.

123 Example
What is the result of this replace?
replace('banana', '(an)*a', '#')
See example60.

124 * is a Greedy Operator
The result of
replace('banana', '(an)*a', '#')
is b#, because (an)* looks for the longest string of 'anan…'; * is a greedy operator. To make it non-greedy, append ? to the *:
replace('banana', '(an)*?a', '#')
The result is: b#n#n#
See example61.

125 Two Matching Alternatives
Suppose the regex contains two alternatives, and both match:
replace('banana', 'a|an', '#')
What will be the result?

126 Leftmost Alternative Wins: The rule is that the first (leftmost) alternative wins: replace('banana', 'a|an', '#') results in b#n#n#. Switching the alternatives, replace('banana', 'an|a', '#') results in b###. See example62
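A Python analogue of the same rule, assuming (as holds for these simple patterns) that `re.sub` tries alternatives left to right the way the slide describes for XPath:

```python
import re

# Both 'a' and 'an' can match at each 'a'; whichever is listed first wins.
a_first = re.sub("a|an", "#", "banana")
an_first = re.sub("an|a", "#", "banana")

print(a_first)   # b#n#n#
print(an_first)  # b###
```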

127 Using Variables in the Replacement String: Consider a regex composed of a sequence of parenthesized expressions, ( … )( … )( … ), and a replacement string that refers to them as $1$2$3. $1 stands for the characters matched by the first parenthesized expression, $2 stands for the characters matched by the second parenthesized expression, …, and $9 stands for the characters matched by the ninth parenthesized expression.

128 Example: Insert Hyphens into a Date. replace('12March2008', '([0-9]+)([a-zA-Z]+)([0-9]+)', '$1-$2-$3') The result is: 12-March-2008. See example63
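The same capture-group substitution in Python, as an analogue only. The main notational difference is that Python's `re.sub` writes group references as `\1`, `\2`, … where XPath uses `$1`, `$2`, ….

```python
import re

# XPath: replace('12March2008', '([0-9]+)([a-zA-Z]+)([0-9]+)', '$1-$2-$3')
date_text = re.sub(r"([0-9]+)([a-zA-Z]+)([0-9]+)", r"\1-\2-\3", "12March2008")
print(date_text)  # 12-March-2008
```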

129 Regex Doesn't Match Input: If the regex doesn't match the input, then the input is returned unchanged: replace('aaaa', 'b', '#'). The result is: aaaa. See example64

130 Use Flags with replace(): replace() uses the same flags as matches() and tokenize(): i, m, s, x. Example: replace('Haha', 'h', 'b', 'i') returns: baba. See example65. Do Lab16
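In Python's `re`, which we use here only as an analogue, the closest counterpart of the XPath `i` flag is `re.IGNORECASE`:

```python
import re

# XPath: replace('Haha', 'h', 'b', 'i')
case_insensitive = re.sub("h", "b", "Haha", flags=re.IGNORECASE)
print(case_insensitive)  # baba
```

The other flags have rough Python counterparts (`s` with `re.DOTALL`, `m` with `re.MULTILINE`, `x` with `re.VERBOSE`), but their exact rules are not identical, so treat this mapping as approximate.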

131 Tokenize this String: How would you separate the numeric parts from the character parts, turning W151TBH into {'W', '151', 'TBH'}?

132 Step 1: Use replace() to append a hash mark (#) onto the end of each part, turning W151TBH into W#151#TBH#. This is accomplished using replace: replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#'). See example66

133 Step 2: Tokenize using # as the separator, turning W#151#TBH# into {'W', '151', 'TBH', ''}. This is accomplished by: tokenize('W#151#TBH#', '#'). See example67

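134 Step 3: Remove the zero-length string: ('W', '151', 'TBH', '')[.] The predicate [.] keeps only the items whose effective boolean value is true, and a zero-length string has the effective boolean value false. (Since the items are strings, not numbers, [.] is not a positional predicate here.) Recall that the value of ('a', '')[.] is just ('a'). See example68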

135 Putting it all Together: tokenize(replace('W151TBH', '([0-9]+|[a-zA-Z]+)', '$1#'), '#')[.] This produces: ('W', '151', 'TBH'). See example69
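The three steps can be mirrored in Python as a sketch. The function name `split_alnum` is our own, hypothetical name, and `re.sub`, `re.split` and a list filter stand in for `replace()`, `tokenize()` and the `[.]` predicate.

```python
import re

def split_alnum(s):
    # Step 1: append '#' after every run of digits or letters (XPath: '$1#').
    marked = re.sub(r"([0-9]+|[a-zA-Z]+)", r"\1#", s)
    # Step 2: split on '#'; the trailing '#' yields a final zero-length string.
    parts = re.split("#", marked)
    # Step 3: drop zero-length strings, like the XPath predicate [.]
    return [p for p in parts if p]

tokens = split_alnum("W151TBH")
print(tokens)  # ['W', '151', 'TBH']

# In Python specifically, re.findall can do this in one step.
shortcut = re.findall(r"[0-9]+|[a-zA-Z]+", "W151TBH")
print(shortcut)  # ['W', '151', 'TBH']
```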

136 What does the predicate apply to? What is the result of these expressions? //name[1] and (//name)[1]

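137 Answer: //name[1] returns the first name element in each planet element, because the predicate applies to the name step, i.e. to the position within each parent. –Number of elements returned: 3. (//name)[1] returns the first name element among all the name elements in all the planet elements, because the parentheses make the predicate apply to the whole sequence. –Number of elements returned: 1. See example70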
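Python's `xml.etree.ElementTree` supports only a small XPath subset, but its positional predicate `[n]` is evaluated per parent, so it can illustrate the difference. The XML below is a hypothetical, simplified stand-in for planets.xml that keeps only the `name` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified planets document (only the name elements kept).
xml = """<planets>
  <planet><name>Mercury</name></planet>
  <planet><name>Venus</name></planet>
  <planet><name>Earth</name></planet>
</planets>"""
root = ET.fromstring(xml)

# Like //name[1]: the predicate is applied per parent,
# so every planet contributes its first name element.
per_parent = root.findall(".//name[1]")

# Like (//name)[1]: gather all name elements first, then take the first one.
overall_first = root.findall(".//name")[0]

print(len(per_parent))     # 3
print(overall_first.text)  # Mercury
```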

138 Select the first Book by each Author: Given these three Book elements (title; author; year; ISBN; publisher):
–Illusions: The Adventures of a Reluctant Messiah; Richard Bach; 1977; 0-440-34319-4; Dell Publishing Co.
–The First and Last Freedom; J. Krishnamurti; 1954; 0-06-064831-7; Harper & Row
–Jonathan Livingston Seagull; Richard Bach; 1970; 0-684-84684-5; Simon & Schuster
Select the first two (the third Book has the same Author as the first).

139 Select the first Book by each Author: //Book[not(Author = preceding::Book/Author)] The predicate evaluates to true if the Author of the Book is not the same as the Author of any preceding Book. See example71. Do Lab17
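ElementTree has no `preceding::` axis, so this Python sketch emulates the predicate by remembering the authors seen so far in document order. The root element name `BookCatalogue` and the `Title` element are hypothetical; only `Book` and `Author` appear in the slide's XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the Books document.
xml = """<BookCatalogue>
  <Book><Title>Illusions: The Adventures of a Reluctant Messiah</Title><Author>Richard Bach</Author></Book>
  <Book><Title>The First and Last Freedom</Title><Author>J. Krishnamurti</Author></Book>
  <Book><Title>Jonathan Livingston Seagull</Title><Author>Richard Bach</Author></Book>
</BookCatalogue>"""
root = ET.fromstring(xml)

def first_book_by_each_author(root):
    # Emulates //Book[not(Author = preceding::Book/Author)].
    seen = set()
    result = []
    for book in root.iter("Book"):  # document order
        authors = {a.text for a in book.findall("Author")}
        # Keep the Book if none of its Authors equals an Author of a preceding Book.
        if not (authors & seen):
            result.append(book)
        seen |= authors
    return result

titles = [b.findtext("Title") for b in first_book_by_each_author(root)]
print(titles)
```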

140 XPath Functions: nctions.asp operators/#contents

141 XPath 2.0 Functions

142 distinct-values(values): This XPath function returns a sequence composed of unique values. distinct-values((2, 2, 3, 4, 1, 4, 2, 6, 3, 9)) Output: 2 3 4 1 6 9 (the specification leaves the order of the result implementation-dependent). Note that the sequence of integers is wrapped within a pair of parentheses. Why? Because the whole sequence is a single argument; without the inner parentheses, the commas would be read as separating ten arguments. See example72
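A minimal Python sketch of the same idea. It keeps first occurrences in order, which matches the output shown above, although XPath itself does not guarantee any order. The function name `distinct_values` is our own.

```python
def distinct_values(values):
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so this keeps the first occurrence of each value.
    return list(dict.fromkeys(values))

unique = distinct_values([2, 2, 3, 4, 1, 4, 2, 6, 3, 9])
print(unique)  # [2, 3, 4, 1, 6, 9]
```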

143 Another Example: Given members Jeff (lightgrey), David (lightblue), Roger (lightyellow), Sally (lightgrey) and Linda (purple): distinct-values(/FitnessCenter/Member/FavoriteColor) Output: lightgrey lightblue lightyellow purple. See example73. Do Lab18

144 doc(url): The doc(url) function is used to retrieve data from another XML document, e.g. doc('FitnessCenter2.xml'). You must put quotes around the file name, because the argument to doc() is a string containing a URL. See example74

145 data(item): This function returns the (atomic) value of a node, i.e., it "atomizes" the node. It is like the string(item) function, except that string() always returns the value of the item as a string, whereas data() returns the value of the item with its type intact.

146 data(item): string(/FitnessCenter/Member[1]/MembershipFee) + 1 gives an error; data(/FitnessCenter/Member[1]/MembershipFee) + 1 gives 341; data(340) + 1 gives 341. See example75

147 error(QName?, description): You can raise an error in your XPath using the error() function. for $i in /FitnessCenter/Member return if (number($i/MembershipFee) lt 0) then error((), 'Invalid value for MembershipFee') else true() See example76

148 trace(value, message): This is used for debugging, to monitor the execution. The trace() function does two things:
–it returns (outputs) value
–it displays message and information about value (the exact format of this diagnostic output depends on the processor)
for $i in /FitnessCenter/Member return trace($i/MembershipFee, 'The membership fee is:')
Output: 340 -500 340
Screen:
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[1]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[2]/MembershipFee[1]
The membership fee is: [1]: element(MembershipFee, untyped): /FitnessCenter/Member[3]/MembershipFee[1]
See example77

149 compare(string1, string2): This function performs a string comparison of string1 against string2. If string1 is less than string2, it returns -1; if string1 is equal to string2, it returns 0; if string1 is greater than string2, it returns 1. compare('ab','abc') compare('ab','ab') compare('abc','ab') Output: -1 0 1. See example78
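A small Python sketch of the three-way result, assuming the default collation compares strings by Unicode code point. Python's own string ordering is code-point based, so that is what this sketch uses.

```python
def compare(s1, s2):
    # (True - False) == 1, (False - True) == -1, equal strings give 0.
    return (s1 > s2) - (s1 < s2)

results = [compare("ab", "abc"), compare("ab", "ab"), compare("abc", "ab")]
print(*results)  # -1 0 1
```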

150 string-join(sequence, separator): The first argument is a sequence of any number of values. The function concatenates all the values, placing separator between each adjacent pair. string-join(('a','b','c'),' ') string-join(/FitnessCenter/Member/Name,'/') Output: 'a b c' and 'Jeff/David/Roger'. See example79

151 An elegant way of creating the XPath to any node: string-join(for $i in ancestor-or-self::* return name($i), '/') The for expression returns the name of the current element (self) plus the names of all its ancestor elements, in document order. Example: Suppose that the current node is FavoriteColor. Then the for expression returns FitnessCenter Member FavoriteColor, and string-join() concatenates these values, separating each value with /. Thus, the output is: FitnessCenter/Member/FavoriteColor. See example80. Do Lab19
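The same path-building idea, sketched with Python's ElementTree on a hypothetical one-member FitnessCenter document. ElementTree elements don't know their parent, so the sketch first builds a child-to-parent map. For namespaced elements, `elem.tag` would give `{uri}local` rather than the prefixed name that `name()` returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified FitnessCenter document.
xml = """<FitnessCenter>
  <Member><Name>Jeff</Name><FavoriteColor>lightgrey</FavoriteColor></Member>
</FitnessCenter>"""
root = ET.fromstring(xml)

# Map every child element to its parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

def path_to(elem):
    # Collect self and ancestors bottom-up, then reverse into document
    # order, like ancestor-or-self::*, and join the names with '/'.
    names = []
    while elem is not None:
        names.append(elem.tag)
        elem = parent_of.get(elem)
    return "/".join(reversed(names))

color_path = path_to(root.find("Member/FavoriteColor"))
print(color_path)  # FitnessCenter/Member/FavoriteColor
```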

152 starts-with(string-to-test, string): This function returns true if string-to-test starts with string, false otherwise. starts-with('abc', 'a') starts-with(/FitnessCenter/Member[1]/FavoriteColor, 'light') Output: true true. Note: this XPath function is also present in version 1.0. See example81

153 ends-with(string-to-test, string): This function returns true if string-to-test ends with string, false otherwise. ends-with('xyz', 'yz') ends-with(/FitnessCenter/Member[1]/FavoriteColor, 'grey') Output: true true. Note: this XPath function is not present in version 1.0. See example82

154 String Functions You Already Know (from XPath 1.0): contains(string-to-test, string), substring(string, starting-loc, length?), substring-before(string, match-string), substring-after(string, match-string), translate(string, from-pattern, to-pattern). See example83

155 normalize-space(string): This function strips leading and trailing whitespace (space, tab, carriage return, line feed) and replaces each run of whitespace within the data with a single space. normalize-space(' A cat ate the mouse ') normalize-space('There are two lines') Output: A cat ate the mouse, There are two lines. See example84
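A minimal Python sketch of normalize-space(). It deliberately treats only the four XPath whitespace characters as whitespace. Python's `str.split()` would also split on other characters, such as form feeds and Unicode spaces, so it is not used here. The test strings are our own.

```python
import re

def normalize_space(s):
    # Collapse each run of space/tab/CR/LF into one space, then trim the ends.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

sentence = normalize_space("  A cat   ate the  mouse  ")
two_lines = normalize_space("There are\ntwo lines")
print(sentence)   # A cat ate the mouse
print(two_lines)  # There are two lines
```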

156 upper-case(string), lower-case(string): upper-case('hello world') outputs HELLO WORLD; lower-case('BLUE SKY') outputs blue sky. See example85

157 escape-html-uri(uri): This function makes a URI usable by browsers by escaping non-ASCII characters: each character outside printable ASCII is replaced by the %HH escapes of its UTF-8 bytes. escape-html-uri('Π') Output: %CE%A0. See example86
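The escaping rule can be sketched in a few lines of Python. This hand-written helper, `escape_html_uri`, is our own illustration and not a library function. It leaves code points 32 to 126 alone and writes every other character as uppercase `%HH` escapes of its UTF-8 bytes.

```python
def escape_html_uri(uri):
    out = []
    for ch in uri:
        if 32 <= ord(ch) <= 126:
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            # Escape each UTF-8 byte of the character as %HH.
            out.append("".join("%{:02X}".format(b) for b in ch.encode("utf-8")))
    return "".join(out)

escaped = escape_html_uri("Π")
print(escaped)  # %CE%A0
```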

158 year-from-date(xs:date): The argument of this function is a date as defined in XML Schema. Recall that the format of a date is CCYY-MM-DD. year-from-date(xs:date('2009-09-19')) Output: 2009. See example87

159 Many Date, Time Functions!
year-from-dateTime(xs:dateTime)
month-from-dateTime(xs:dateTime)
day-from-dateTime(xs:dateTime)
hours-from-dateTime(xs:dateTime)
minutes-from-dateTime(xs:dateTime)
seconds-from-dateTime(xs:dateTime)
timezone-from-dateTime(xs:dateTime)
year-from-date(xs:date)
month-from-date(xs:date)
day-from-date(xs:date)
timezone-from-date(xs:date)
hours-from-time(xs:time)
minutes-from-time(xs:time)
seconds-from-time(xs:time)
timezone-from-time(xs:time)
See example88

160 root(node?): The node tree of the FitnessCenter document:
Document /
  PI
  Element FitnessCenter
    Element Member
      Element Name
        Text Jeff
      Element FavoriteColor
        Text lightgrey
    Element Member
      Element Name
        Text David
      Element FavoriteColor
        Text lightblue
    Element Member
      Element Name
        Text Roger
      Element FavoriteColor
        Text lightyellow
The root() function returns the document node, i.e. the root of the tree that contains the context node (or the node passed as the argument).

161 Useful if working with multiple documents: The root() function can be very useful if you are working with multiple documents. The following XPath expression outputs the name of every element in the document that contains the context node, whichever document that is: for $i in root()//* return name($i) See example89

162 subsequence(sequence, start-loc, length?): This function returns a portion of sequence, namely the items starting at index position start-loc (positions are numbered from 1). If length is not specified, it returns all the following items in the sequence; otherwise, it returns length items. subsequence((1 to 10), 2, 5) subsequence(//Name, 2) Output: 2,3,4,5,6 and David Roger. See example90. Do Lab20
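A Python sketch of subsequence() that uses 1-based positions, like XPath. It keeps the items whose position p satisfies start <= p < start + length. We assume integer arguments here; XPath also rounds non-integer ones, which this sketch does not do. Filtering by position rather than slicing also handles a start below 1 the way XPath does.

```python
def subsequence(seq, start, length=None):
    # Keep items at 1-based position p with start <= p (< start + length).
    return [item for p, item in enumerate(seq, start=1)
            if p >= start and (length is None or p < start + length)]

middle = subsequence(range(1, 11), 2, 5)
rest = subsequence(["Jeff", "David", "Roger"], 2)
print(middle)  # [2, 3, 4, 5, 6]
print(rest)    # ['David', 'Roger']
```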

163 zero-or-one(sequence), one-or-more(sequence), exactly-one(sequence): These functions are used to assert that a sequence contains the number of items that you expect. Each function generates an error if the sequence does not contain the expected number of items; if it does, the function simply returns the sequence. zero-or-one(/FitnessCenter/Member[1]/Name) one-or-more(/FitnessCenter/Member[1]/Phone) exactly-one(/FitnessCenter/Member[1]/FavoriteColor) See example91

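164 avg(sequence): avg((1 to 100)) avg(//MembershipFee) Output: 50.5 393.3333333333. Note that the avg() function has only one argument, so the whole sequence must be passed as a single argument. The parentheses around 1 to 100 are optional, because a range expression is already a single argument, but they are required when the sequence is written with commas, as in avg((1, 2, 3)). See example92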

165 max(sequence): The max() function returns the maximum value among a sequence of values. max((5, 3, 19, 2, -7)) max(//MembershipFee) Output: 19 500. See example93

166 min(sequence): The min() function returns the minimum value among a sequence of values. min((5, 3, 19, 2, -7)) min(//MembershipFee) Output: -7 340. See example94

167 Why 2 sets of parentheses? Did you notice that I used two sets of parentheses in the min and max functions? –min((2,1,3)) and max((2,1,3)). In fact, if you omitted the inner parentheses you would get an error message: –min(2,1,3) and max(2,1,3) Error!

168 Reason for 2 parentheses: Both the min and max functions have an optional second argument, collation: min(sequence, collation?) max(sequence, collation?) The collation argument enables you to specify the collating sequence that should be used to determine the min/max value. We will typically just use the default collating sequence, so we will not use the second argument. Do you now understand the need for the 2 parentheses? In min(2,1), is 1 a member of the sequence, or is it a collation? (The processor treats it as the collation.) Instead, you must write: min((2,1))

169 number(value), string(value): number(value) means "Hey, treat value as a number". string(value) means "Hey, treat value as a string". For example, 09 treated as a number represents the number 9, whose string value is '9'; treated as a string, it stays '09'. See example95

170 Lesson Learned: When you are comparing two values, it is very good practice to wrap your values in either number() or string(). That way you are explicitly telling the XPath processor how you want the values compared: as numeric values or as string values.

171 exists() function: This function returns either true or false. It is used to determine whether a sequence is non-empty, e.g. whether an element exists. if (exists(/FitnessCenter/Member[3])) then 'There is a 3rd Member' else 'Error! No 3rd Member' Output: There is a 3rd Member. if (exists(/FitnessCenter/Member[99])) then 'There is a 99th Member' else 'Error! No 99th Member' Output: Error! No 99th Member

172 exists(()) = false: exists(()) Output: false. "The empty sequence does not exist." See example96

173 empty() function: This function returns either true or false. It is used to determine whether an element does not exist. if (empty(/FitnessCenter/Member[3])) then 'No 3rd Member' else 'Error! There is a 3rd Member' Output: Error! There is a 3rd Member. if (empty(/FitnessCenter/Member[99])) then 'No 99th Member' else 'Error! There is a 99th Member' Output: No 99th Member. See example97

174 empty(()) = true: empty(()) Output: true. "The empty sequence is empty." See example97

175 empty() = not(exists()): empty(/FitnessCenter/Member[3]) eq not(exists(/FitnessCenter/Member[3])) Output: true. empty(/FitnessCenter/Member[99]) eq not(exists(/FitnessCenter/Member[99])) Output: true. See example98

176 deep-equal(sequence1, sequence2): This function returns true if the two sequences are identical in value and position. See example99

177 operand instance of datatype: You can use the XPath instance of boolean operator to determine whether an operand is of a particular datatype. To test an atomic datatype, the operand must be an atomic value, not a node, so first atomize the node, e.g. using data(.). instance of checks the datatype label on the operand: the label must match datatype (or a type derived from it). Thus 340 is an instance of xs:integer, but not of xs:positiveInteger.

178 operand instance of datatype: See example100

179 operand cast as datatype: You can use the XPath cast as operator to convert operand to a particular datatype; for example, '340' cast as xs:integer returns the integer 340. See example101

180 operand castable as datatype: You can use the XPath castable as boolean operator to determine whether an operand can be cast to a particular datatype: if (//Member[1]/MembershipFee castable as xs:integer) then (//Member[1]/MembershipFee cast as xs:integer) * 2 else false() See example102

181 name, local-name, namespace-uri: name() returns the full (qualified) name as written in the tag, including any prefix; local-name() returns the part of the name after the colon; namespace-uri() returns the namespace URI. See example103

182 string(node): This extracts the data of a node and returns it as a string. See example104

183 base-uri(node?), document-uri(node): base-uri() returns the base URI of a node (normally the file path or URL of the document it came from); document-uri() returns the URI of a document node. See example105

184 Kind Tests: Here are different ways to select a kind of node:
node(): selects any kind of node (element, attribute, text, comment, PI, namespace, document)
text(): selects a text node
element(): selects an element node
element(Member): selects Member element nodes
attribute(): selects attribute nodes
attribute(id): selects id attribute nodes
document-node(): selects the document node
comment(): selects a comment node
processing-instruction(): selects a PI node

185 Occurrence Indicators: Use + to indicate one or more. Use * to indicate zero or more. Use ? to indicate zero or one. For example, element(Member)+ means one or more Member elements.

186 Please look at example107; it illustrates the kind tests and occurrence indicators.

187 XPath 2.0 is a Strongly Typed Language: Each XPath 2.0 function returns a value of a specific datatype. The argument(s) passed to the function must be of the required datatype. Also, the XPath 2.0 operators require the operands to be of a required datatype. For example, you cannot perform arithmetic operations on strings without explicitly telling the processor to treat your strings like numbers.

188 XPath 2.0 is a Strongly Typed Language: Consider this expression: '3' + 2. Here's the error message that you will get (the exact wording depends on the processor): Arithmetic operator is not defined for arguments of types (xs:string, xs:integer). By contrast, in XPath 1.0 the processor automatically converts the string into a number. See example35

189 Advantages of a Strongly Typed System: Early and reliable identification of errors. –Example: '3' + 2 will generate an error because the type of the first operand is not appropriate for the operator. Implementations (XPath processors) can optimize performance if they know about the types of the data. –Example: Consider this comparison: //planet/* = 'mars'. If the processor knows the datatype of each child of planet, then it only needs to compare the string-typed children against 'mars'.

190 Disadvantages of a Strongly Typed System: XPath authoring is complicated because more attention must be paid to types. –Example: if you want to compare a number against a number that is represented as a string, then you have to explicitly cast one of them so that both have the same type, and then do the comparison. Supporting an extensive type system puts a burden on implementers of XPath. This is why schema awareness is optional for implementers.

191 XML Schema Datatypes: XPath 2.0 uses the datatypes defined in the XML Schema Datatypes Specification.

193 XPath Functions are Strongly Typed: Each XPath function requires its arguments to be of a certain datatype. Each XPath function returns a result of a certain datatype. Example: here is the signature of the current-dateTime function: current-dateTime() as xs:dateTime. Read as: "The current-dateTime function is invoked without any arguments; it returns a value that has the datatype XML Schema dateTime."

194 XPath Operators are Strongly Typed: Each XPath operator requires the operands to be of a certain datatype. Each XPath operator returns a result of a certain datatype. Example: you can subtract two dateTime values, and the result is of type xs:dayTimeDuration (a subtype of xs:duration): current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') returns P14275DT15H49M28.796S. Read as: "The duration between now (Jan. 31, 2009, 10:49am) and Jan. 01, 1970 is 14,275 days, 15 hours, 49 minutes, 28.796 seconds." See example36
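As a cross-check on the arithmetic, here is a Python sketch that subtracts two datetimes and formats the result in xs:dayTimeDuration style. The "now" instant 2009-01-31T15:49:28.796Z is our assumption: the slide gives only "10:49am" with no timezone, and 15 hours would correspond to a UTC-5 local time. The formatter `day_time_duration` is a hypothetical helper, not a standard library function.

```python
from datetime import date, datetime, timedelta, timezone

def day_time_duration(delta):
    """Format a non-negative timedelta in xs:dayTimeDuration style (sketch)."""
    if delta < timedelta(0):
        raise ValueError("negative durations are not handled in this sketch")
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds or delta.microseconds:
        secs = str(seconds)
        if delta.microseconds:
            # Fractional seconds without trailing zeros, e.g. 796000 microseconds -> ".796"
            secs += "." + f"{delta.microseconds:06d}".rstrip("0")
        time_part += secs + "S"
    day_part = f"{delta.days}D" if delta.days else ""
    if not day_part and not time_part:
        return "PT0S"
    return "P" + day_part + ("T" + time_part if time_part else "")

# Assumed instant for the slide's "now", expressed in UTC.
now = datetime(2009, 1, 31, 15, 49, 28, 796000, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
elapsed = day_time_duration(now - epoch)
print(elapsed)  # P14275DT15H49M28.796S

# Subtracting two dates gives whole days only.
days_only = day_time_duration(date(2009, 1, 31) - date(1970, 1, 1))
print(days_only)  # P14275D
```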

195 Constructor Functions: Constructor functions are used to construct atomic values with the specified types. Example: the constructor xs:dateTime('1970-01-01T00:00:00Z') constructs an atomic value whose type is xs:dateTime. The signature of the xs:dateTime constructor is: xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime? There is a constructor function for each of the W3C built-in atomic types. If the argument is a node, the atomic value is extracted and that value is cast to the type. If the argument is an empty sequence, the result is an empty sequence. See the complete list of constructor functions.

196 Constructor function signatures:
xs:string($arg as xs:anyAtomicType?) as xs:string?
xs:boolean($arg as xs:anyAtomicType?) as xs:boolean?
xs:decimal($arg as xs:anyAtomicType?) as xs:decimal?
xs:float($arg as xs:anyAtomicType?) as xs:float? (Implementations may return negative zero for xs:float("-0.0E0").)
xs:double($arg as xs:anyAtomicType?) as xs:double?
xs:duration($arg as xs:anyAtomicType?) as xs:duration?
xs:dateTime($arg as xs:anyAtomicType?) as xs:dateTime?
xs:time($arg as xs:anyAtomicType?) as xs:time?
xs:date($arg as xs:anyAtomicType?) as xs:date?
xs:gYearMonth($arg as xs:anyAtomicType?) as xs:gYearMonth?
xs:gYear($arg as xs:anyAtomicType?) as xs:gYear?
xs:gMonthDay($arg as xs:anyAtomicType?) as xs:gMonthDay?
xs:gDay($arg as xs:anyAtomicType?) as xs:gDay?
xs:gMonth($arg as xs:anyAtomicType?) as xs:gMonth?
xs:hexBinary($arg as xs:anyAtomicType?) as xs:hexBinary?
xs:base64Binary($arg as xs:anyAtomicType?) as xs:base64Binary?
xs:anyURI($arg as xs:anyAtomicType?) as xs:anyURI?
xs:QName($arg as xs:anyAtomicType) as xs:QName?
xs:normalizedString($arg as xs:anyAtomicType?) as xs:normalizedString?
xs:token($arg as xs:anyAtomicType?) as xs:token?
xs:language($arg as xs:anyAtomicType?) as xs:language?
xs:NMTOKEN($arg as xs:anyAtomicType?) as xs:NMTOKEN?
xs:Name($arg as xs:anyAtomicType?) as xs:Name?
xs:NCName($arg as xs:anyAtomicType?) as xs:NCName?
xs:ID($arg as xs:anyAtomicType?) as xs:ID?
xs:IDREF($arg as xs:anyAtomicType?) as xs:IDREF?
xs:ENTITY($arg as xs:anyAtomicType?) as xs:ENTITY?
xs:integer($arg as xs:anyAtomicType?) as xs:integer?
xs:nonPositiveInteger($arg as xs:anyAtomicType?) as xs:nonPositiveInteger?
xs:negativeInteger($arg as xs:anyAtomicType?) as xs:negativeInteger?
xs:long($arg as xs:anyAtomicType?) as xs:long?
xs:int($arg as xs:anyAtomicType?) as xs:int?
xs:short($arg as xs:anyAtomicType?) as xs:short?
xs:byte($arg as xs:anyAtomicType?) as xs:byte?
xs:nonNegativeInteger($arg as xs:anyAtomicType?) as xs:nonNegativeInteger?
xs:unsignedLong($arg as xs:anyAtomicType?) as xs:unsignedLong?
xs:unsignedInt($arg as xs:anyAtomicType?) as xs:unsignedInt?
xs:unsignedShort($arg as xs:anyAtomicType?) as xs:unsignedShort?
xs:unsignedByte($arg as xs:anyAtomicType?) as xs:unsignedByte?
xs:positiveInteger($arg as xs:anyAtomicType?) as xs:positiveInteger?
xs:yearMonthDuration($arg as xs:anyAtomicType?) as xs:yearMonthDuration?
xs:dayTimeDuration($arg as xs:anyAtomicType?) as xs:dayTimeDuration?
xs:untypedAtomic($arg as xs:anyAtomicType?) as xs:untypedAtomic?

197 New Datatypes: The XPath 2.0 working group decided that the XML Schema datatypes are not complete, so they created a few new ones, such as xs:anyAtomicType, xs:untypedAtomic, xs:dayTimeDuration and xs:yearMonthDuration, and added them to the XML Schema namespace.

198 xs:anyAtomicType: xs:anyAtomicType is an abstract type that is the base type of all atomic values. All atomic datatypes, including the original XML Schema datatypes, are subtypes of xs:anyAtomicType. "Abstract" means that it cannot be used directly; instead, a subtype must be used.

199 xs:untypedAtomic: Any value that has not been associated with a schema type has the type xs:untypedAtomic.

200 xs:dayTimeDuration: This is a subtype of xs:duration. It has only day, hour, minute, and second components. Subtracting two xs:date values yields a result of type xs:dayTimeDuration, e.g. current-date() - xs:date('1970-01-01'). (Diagram: an xs:duration such as P1Y2M3DT10H30M12.3S; its subtype xs:dayTimeDuration, e.g. P428DT10H30M12.3S.) See example37

201 Subtracting Two Dates: Here's an example of subtracting two xs:date values: current-date() - xs:date('1970-01-01'). The resulting value is an xs:dayTimeDuration value. Here's how it is specified in the XQuery 1.0 and XPath 2.0 Functions and Operators specification: op:subtract-dates($arg1 as xs:date, $arg2 as xs:date) as xs:dayTimeDuration? "When subtracting two values, each of type xs:date, the resulting value is of type xs:dayTimeDuration."

202 xs:yearMonthDuration: This is also a subtype of xs:duration. It has only the year and month components. (Diagram: an xs:duration such as P1Y2M3DT10H30M12.3S; its subtype xs:yearMonthDuration, e.g. P1Y2M.)

203 Datatype of Literals and Expressions:
datatype of current-dateTime() - xs:dateTime('1970-01-01T00:00:00Z') is xs:dayTimeDuration
datatype of current-date() - xs:date('1970-01-01') is xs:dayTimeDuration
datatype of 3 is xs:integer
datatype of 3.14 is xs:decimal
datatype of "3" is xs:string
datatype of true is xs:untypedAtomic (without parentheses, true is a path expression selecting child elements named true; their atomized values are untyped)
datatype of true() is xs:boolean
datatype of 1E3 is xs:double
See example38

204 Datatype of Input Data Unassociated with a Schema: the datatype of //planet[1]/mass is xs:untypedAtomic (after atomization); the datatype of //planet[1]/mass/text() is also xs:untypedAtomic (after atomization). See example39

205 Datatype of Arithmetic Operations:
datatype of 2 + 2 is xs:integer
datatype of 2.0 + 2.0 is xs:decimal
datatype of 2.0 + 2 is xs:decimal
datatype of 6 div 2 is xs:decimal (div on two integers returns xs:decimal; use idiv for integer division)
datatype of 6.0 div 2.0 is xs:decimal
datatype of 6.0 div 2 is xs:decimal
See example40

206 Numeric Types: The 4 main numeric types supported in XPath 2.0 are: –xs:decimal –xs:integer –xs:float –xs:double. All arithmetic operators and functions that can be performed on these types can also be performed on their subtypes.

207 xs:decimal: Numeric literals that contain only digits and a decimal point (no letter E or e) are considered to be decimal numbers with the type xs:decimal. Example: 25.5 and 25.0 are xs:decimal values.

208 xs:integer: Numeric literals that contain only digits (no decimal point or the letter E or e) are considered to be integer numbers with the type xs:integer. Example: 25 is an xs:integer value.

209 xs:float and xs:double: Numeric literals that contain the letter E or e are considered to be double numbers with the type xs:double. Example: 1E3 and 1e3 are xs:double values. There is no literal syntax for xs:float; use the xs:float() constructor function instead. See example41

210 How a Value becomes Numeric:
–The value is a numeric literal.
–The value is selected from an input document that is associated with a schema that declares it to have a numeric type.
–The value is the result of a function that returns a number, e.g. count(…) returns xs:integer.
–The value is the result of a numeric constructor function, e.g. xs:float("25.83") returns an xs:float value.
–The value is the result of an explicit cast, e.g. //planet[1]/mass cast as xs:decimal.
–The value is cast automatically when it is passed to a function that expects a number.

211 The number() Function: The number() function is almost equivalent to the xs:double() constructor function. Both return a value of type xs:double. Differences: –number("hi") = NaN –xs:double("hi") = error –number(()) = NaN –xs:double(()) = () (the empty sequence, as with every constructor function). See example42

212 Numeric Type Promotion: If an operation, such as a comparison or an arithmetic operation, is performed on values of two different primitive numeric types, one value's type is promoted to the type of the other.

213 Numeric Type Promotion:
Operand #1 | Operand #2 | Promoted to
xs:decimal | xs:float | xs:float
xs:decimal | xs:double | xs:double
xs:float | xs:double | xs:double

214 Numeric Type Promotion: Example: 1.0 + 1.2E0 = 2.2E0. Here 1.0 is an xs:decimal and 1.2E0 is an xs:double, so the xs:decimal is promoted to xs:double. Numeric type promotion happens automatically in arithmetic expressions and comparison expressions. It also occurs in calls to functions that expect numeric values. See example43
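The promotion table can be encoded as a small lookup. The Python sketch below is hypothetical: the names `RANK` and `promoted_type` are ours. It treats xs:integer as its base type xs:decimal (subtype substitution) and returns the type both operands end up with. As a contrast, it also shows that Python does not promote `Decimal` to `float` automatically; you have to convert explicitly.

```python
from decimal import Decimal

# Order of the primitive numeric types in the promotion table.
RANK = {"xs:decimal": 0, "xs:float": 1, "xs:double": 2}

def promoted_type(t1, t2):
    """Type both operands end up with before the operation (sketch)."""
    if t1 == t2:
        return t1  # no promotion needed, e.g. xs:integer + xs:integer
    # Subtype substitution: xs:integer is treated as its base type xs:decimal.
    base1 = "xs:decimal" if t1 == "xs:integer" else t1
    base2 = "xs:decimal" if t2 == "xs:integer" else t2
    return max(base1, base2, key=RANK.__getitem__)

print(promoted_type("xs:decimal", "xs:double"))  # xs:double

# Python refuses to mix Decimal and float implicitly...
try:
    Decimal("1.0") + 1.2
    implicit_mix_allowed = True
except TypeError:
    implicit_mix_allowed = False

# ...so the promotion XPath does for 1.0 + 1.2E0 must be done by hand.
promoted_sum = float(Decimal("1.0")) + 1.2
print(implicit_mix_allowed, promoted_sum)
```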

215 Subtype Substitution: Wherever a type is expected, you can substitute any of its derived types. Example: a function that expects an xs:decimal value can be invoked with an xs:integer value, since xs:integer derives from xs:decimal.
