Presentation is loading. Please wait.

Presentation is loading. Please wait.

Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 1 I18n Sensitive Processing.

Similar presentations


Presentation on theme: "Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 1 I18n Sensitive Processing."— Presentation transcript:

1 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 1 I18n Sensitive Processing with XQuery and XSLT Felix Sasaki World Wide Web Consortium

2 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 2 Purpose Enable the audience to use XQuery and XSLT for i18n sensitive processing and make them aware of i18n aspects of XQuery and XSLT which have to be handled carefully.

3 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 3 Topics Introduction The common underpinning: XPath 2.0 General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization XPath 2.0General processing Strings, numbers IRI processing Dates, language Output: serialization

4 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 4 Introduction 17 (!) specifications about "XQuery" and "XSLT", abbreviated as "QT" QT encompasses a bunch of i18n related features A complex architecture QT describes input, processing and output of XML data XPath 2.0General processing Strings, numbers IRI processing Dates, language Output: serialization

5 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 5 The different pieces of the cake 1.The common underpinning of XQuery and XSLT: XPath 2.0 data model & formal semantics 2.How to select information in XML documents: XPath Manipulating information: XPath functions and operators 4.Generating output: Serialization 5.The XQuery 1.0 and XSLT 2.0 specifications, which deploy 1-4 XPath 2.0General processing Strings, numbers IRI processing Dates, language Output: serialization

6 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 6 Attention! Basis of this presentation: A set of WORKING DRAFTS! Things might still change! XPath 2.0General processing Strings, numbers IRI processing Dates, language Output: serialization

7 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 7 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

8 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 8 The (very rough) big picture Input: XML documents, XML database, … QT-Processing Serialization: XML documents, XML database, … QT processing: defined in terms of XPath 2.0 data model XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

9 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 9 XPath 2.0 data model: sequences of items, i.e. nodes … –document node –element nodes: … –attribute nodes: –namespace nodes: … –text nodes: My yellow (and small) flower. –comment node: –processing instruction: and / or atomic values (see below) XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

10 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 10 Visualization of nodes document() mydoc.xml element() myDoc element() myEl element() myEl attribute() myAttr attribute() myAttr order of nodes is defined by document order:

11 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 11 Atomic values Nodes in XPath 2.0 have string values and typed values, i.e. a sequence of atomic values "string" function: returns a string value, e.g. –string(doc("mydoc.xml")) XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

12 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 12 i18n related typed values From XML Schema: built in primitive data types like anyURI, dateTime, gYearMonth, gYear, … specially for XPath 2.0: xdt:dayTimeDuration, … Good for: URI processing, time related processing XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

13 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 13 Not in the data model... is: –Character encoding schema –CDATA section boundaries –entity references –DOCTYPE declaration and internal DTD subset All this information might get lost during XQuery / XSLT processing Mainly XSLT allows the user to parameterize the output, i.e. the serialization of the data model XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

14 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 14 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

15 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 15 General processing of XQuery / XSLT XQuery: –Input: zero or more source documents –Output: zero or more result documents XSLT: –Input: zero or more source documents –Output: zero or more result documents What is the difference? XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

16 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 16 An example Processing input "mydoc.xml": Desired processing output "yourdoc.xml": XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

17 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 17 XSLT Template based processing Traversal of input document, match of templates "Push processing": Nodes from the input are pushed to matching templates...

18 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 18 Templates and matching nodes document() mydoc.xml element() myDoc element() myEl element() myEl attribute() myAttr attribute() myAttr a a c b 3 c 5 c b

19 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 19 "Pull processing": XPath expressions pull information out of document(s) xquery version "1.0"; { let $input := doc("mydoc.xml") for $elements in $input//myEl return } XQuery

20 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 20 document() mydoc.xml element() myDoc element() myEl element() myEl attribute() myAttr attribute() myAttr xquery version "1.0"; { let $input := doc("mydoc.xml") for $elements in $input//myEl return } XQuery

21 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 21 XPath 2.0 expressions …... xquery version "1.0"; { let $input := doc("mydoc.xml") for $elements in $input//myEl return

22 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 22 When to use XSLT Good for processing of mixed content, e.g. text with markup. Example task: My yellow and small flower. should become My yellow (and small) flower. Solution: push processing of the content … … XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

23 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 23 When to use XQuery Good for processing of multiple data sources in a single or multiple documents via For Let Where Order-by Return (FLWOR) expressions Example: creation of a citation index for $mybibl in ("my-bibl.xml")//entry for $citations in doc("mytext.xml") //cite where return

24 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 24 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

25 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 25 Aspects of string processing What is the scope: characters (code points) String counting Codepoint conversion String comparison: collations String comparison: regular expressions Normalization The role of schemas e.g. in the case of white space handling XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

26 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 26 Scope of string processing Basic operation: Counting 'characters' Good message: QT counts code points, not bytes or code units Attention: All string processing uses string values, not typed values! XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

27 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 27 With a schema: type = xs:date Works not works string-length(xs:string($myDoc/myEl/revision- XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization String values versus typed values

28 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 28 XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization String values versus typed values Difference: second example uses adequate type casting Type casting is not always possible: functions/#casting-from-primitive-to-primitive

29 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 29 Codepoints versus strings: XQuery {"string to code points: suçon becomes ", string-to-codepoints("suçon"), "code points to string: becomes ", codepoints-to-string((115, 117, 231, 111, 110)) } string to code points: suçon becomes code points to string: becomes suçon

30 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 30 Codepoints versus strings: XSLT string to code points: suçon becomes . code points to string: becomes

31 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 31 Collation functions: compare() compare("abc", "abc") Returns "0": Returns "-1": Returns "1": XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

32 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 32 Collation based function compare() compare("Strasse", "Straße", "myCollation") Example: returns "1" if 'myCollation' describes the order respectively: Identification of collation via an URI. XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

33 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 33 Collation identification Identification via an URI. Codepoint-based collation: functions/collation/codepoint Parameterization via an URI: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization lang=de;strength=primary

34 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 34 String comparison: regular expressions Based on regular expressions for XML Schema datatypes, with some additions Flags for case mapping based on Unicode case mapping tables: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

35 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 35 Normalization XML documents: not always with early unicode normalization Unicode collation algorithm ensures equivalent results Normalization can be ensured for NCF, NFD, NFKC, NFKD: suçon Output: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

36 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 36 White space and typed values Assuming a type Comparison of typed values via eq XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization Collation might also affect white space handling

37 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 37 White space and typed values Result: "false" or "true": –"false" if type collapses whitespace –"true" if type does not collapse whitespace XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

38 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 38 Number processing: rounding number / currency formatting: round(2.5) returns 3. round(2.4999) returns 2. round(-2.5) returns -2 does not deploy culture specific rounding conventions, e.g. –round 3rd digit less than 3 to 0 or drop it (Argentina) XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

39 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 39 XSLT-specific: Numbering Conversion of numbers into a string, controlled by various attributes: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

40 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 40 XSLT-specific: Numbering Erste ア๑ Zweite イ๒ Dritte ウ๓ Output for a sequence of three items: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

41 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 41 XSLT-specific: Numbering format-number(): designed for numeric quantities (not necessarily whole numbers) XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

42 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 42 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

43 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 43 Status of IRI in QT In the data model: Support for IRI will be normative. data type xs:anyURI: relies on xml schema anyURI, still defined in terms of URI XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

44 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 44 Functions for IRI / URI processing casting to xs:anyURI: from untyped values or string: xs:anyURI("http://example.müller.com") XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

45 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 45 Functions for IRI / URI processing escaping URI via escape-uri, escaped- reserved="false" escape-uri ("http://example.dürst.com",false()) output: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

46 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 46 Functions for IRI / URI processing http%3A%2F%2Fexample.d%C3%BCrst.com output with escaped-reserved="true": XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

47 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 47 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

48 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 48 Dates and time types Basis: –date and time types from XML Schema –QT specific extensions: xdt:yearMonthDuration, xdt:dayTimeDuration Operations: time comparison, time adjustment, timezone sensitive operations XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

49 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 49 Comparison of date types: xdt:yearMonthDuration("P1Y6M") eq xdt:yearMonthDuration("P1Y7M") XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization Comparison of date types output: false

50 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 50 Component extraction Extracting the timezone from a date value: timezone-from-date (xs:date(" :00")) output: PT7H XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

51 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 51 Arithmetic functions on dates and times Subtract dayTimeDurations: xdt:dayTimeDuration("P2DT12H") - xdt:dayTimeDuration("P2DT12H30M") output: -PT30M XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

52 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 52 XSLT: Formatting Dates / Times Some parameters for formatting conventions: picture string with [components]; presentation modifier; language XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

53 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 53 XSLT: Formatting Dates / Times XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization Output: September 7th September 2005

54 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 54 Processing of language information function lang: /myRoot/myEl/text()[lang("de")] returns the content of, assuming the document: Some german text. } XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

55 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 55 Processing of language information no value for xml:lang: lang("de") returns "false" XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

56 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 56 Topics Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

57 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 57 Serialization – basic concept XQuery / XSLT: process XML in terms of the XPath 2.0 data model Output: described in terms of serialization parameters XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

58 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 58 Some serialization parameters byte-order-mark cdata-section-elements encoding escape-uri-attributes media-type normalization-form use-character-maps XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

59 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 59 Output methods Pre-configuration of various serialization parameters for: –XML –XHTML –HTML –Text XQuery: –Mandatory output method: XML, version="1.0" –No need for implementations to support further serialization parameters XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

60 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 60 Output methods in XSLT Provides support for serialization parameters and output methods via –xsl:output Support also not mandatory XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

61 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 61 XSLT character maps Mapping characters to other characters Desired output: '/> XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

62 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 62 XSLT character maps Character map: XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

63 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 63 Regular expressions with XSLT XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

64 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 64 Regular expressions with XQuery xquery version "1.0"; declare function local:expandPUAChar($string as xs:string, $char as xs:string) as item()* { if (contains($string, $char)) then (substring-before($string, $char), element myChar { attribute code {string-to- codepoints($char)} }, local:expandPUAChar(substring-after($string, $char), $char)) else $string }; for $input in doc("replace-characters.xml")//text() return local:expandPUAChar($input,"") XPath 2.0 data model General processing Strings, numbers IRI processing Dates, language Output: serialization

65 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 65 Topics – finally! Introduction The common underpinning: XPath 2.0 data model General processing of XQuery / XSLT String and number processing IRI processing Dates, timezones, language information Generating output: serialization

66 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 66 Wrap up: Is it useful? Yes! QT: a power tool for i18n sensitive XML processing Quite hard to digest, but very tasty Some aspects of i18n related processing might be improved Remember: It's still a set of working drafts...

67 Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 67 I18n Sensitive Processing with XQuery and XSLT Felix Sasaki World Wide Web Consortium


Download ppt "Orlando, Florida, September 2005 I18n Sensitive Processing with XQuery and XSLT 28th Internationalization and Unicode Conference 1 I18n Sensitive Processing."

Similar presentations


Ads by Google