Datatypes in XML Jeni Tennison

Slides:



Advertisements
Similar presentations
Managing XML and Semistructured Data Lecture 12: XML Schema Prof. Dan Suciu Spring 2001.
Advertisements

Can schemas help SVG interwork with other markup vocabularies? MURATA Makoto (FAMILY Given) International University of Japan XHTML XForms SVG XML Events.
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
1 Web Data Management XML Schema. 2 In this lecture XML Schemas Elements v. Types Regular expressions Expressive power Resources W3C Draft:
XML Unit 6 October 31. XML, review XML is used to markup data Used to describe information Uses tags like HTML –But all tags are user-defined –Must be.
XML Schemas Microsoft XML Schemas W3C XML Schemas.
Introduction to XLink Transparency No. 1 XML Information Set W3C Recommendation 24 October 2001 (1stEdition) 4 February 2004 (2ndEdition) Cheng-Chia Chen.
Lecture 14 XML Validation. a simple element containing text attribute; attributes provide additional information about an element and consist of a name.
 2002 Prentice Hall, Inc. All rights reserved. ISQA 407 XML/WML Winter 2002 Dr. Sergio Davalos.
XML Simple Types CSPP51038 shortcourse. Simple Types Recall that simple types are composed of text-only values. All attributes are of simple type Elements.
XML Schema Matthias Hauswirth. Agenda 4 W3C Process 4 XML Schema Requirements 4 The Specifications 4 Schema Tools.
Outline IS400: Development of Business Applications on the Internet Fall 2004 Instructor: Dr. Boris Jukic XML.
XML Schemas and Namespaces Lecture 11, 07/10/02. BookStore.dtd.
Sunday, June 28, 2015 Abdelali ZAHI : FALL 2003 : XML Schemas XML Schemas Presented By : Abdelali ZAHI Instructor : Dr H.Haddouti.
Manohar – Why XML is Required Problem: We want to save the data and retrieve it further or to transfer over the network. This.
Overview of XPath Author: Dan McCreary Date: October, 2008 Version: 0.2 with TEI Examples M D.
17 Apr 2002 XML Schema Andy Clark. What is it? A grammar definition language – Like DTDs but better Uses XML syntax – Defined by W3C Primary features.
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
Sheet 1XML Technology in E-Commerce 2001Lecture 6 XML Technology in E-Commerce Lecture 6 XPointer, XSLT.
Validating DOCUMENTS with DTDs
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
XML Open Computing Institute, Inc. 1 eXtensible Markup Language (XML)
Dr. Azeddine Chikh IS446: Internet Software Development.
Neminath Simmachandran
XML Language Family Detailed Examples Most information contained in these slide comes from: These slides are intended.
Document Type Definitions Kanda Runapongsa Dept. of Computer Engineering Khon Kaen University.
XML Syntax - Writing XML and Designing DTD's
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 6 XSLT (Based on Møller and Schwartzbach,
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
CP3024 Lecture 9 XML: Extensible Markup Language.
New Perspectives on XML, 2nd Edition
An OO schema language for XML SOX W3C Note 30 July 1999.
1 CIS336 Website design, implementation and management (also Semester 2 of CIS219, CIS221 and IT226) Lecture 5 XML Schema (Based on Møller and Schwartzbach,
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
XML 2nd EDITION Tutorial 4 Working With Schemas. XP Schemas A schema is an XML document that defines the content and structure of one or more XML documents.
1 Tutorial 14 Validating Documents with Schemas Exploring the XML Schema Vocabulary.
Tutorial 13 Validating Documents with Schemas
SNU OOPSLA Lab. Logical structure © copyright 2001 SNU OOPSLA Lab.
Internet & World Wide Web How to Program, 5/e. © by Pearson Education, Inc. All Rights Reserved.2.
XML Validation II Schemas Robin Burke ECT 360. Outline Namespaces Documents  Data types XML Schemas Elements Attributes Derived data types RELAX NG.
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 4 1COMP9321, 15s2, Week.
Primer on XML Schema CSE 544 April, XML Schemas Generalizes DTDs Uses XML syntax Two parts: structure and datatypes Very complex –criticized –alternative.
QUALITY CONTROL WITH SCHEMAS CSC1310 Fall BASIS CONCEPTS SchemaSchema is a pass-or-fail test for document Schema is a minimum set of requirements.
Introduction to XML Schema John Arnett, MSc Standards Modeller Information and Statistics Division NHSScotland Tel: (x2073)
Copyright 2004 John Cowan 1 Infinite Diversity in Infinite Combinations why one schema language is not enough John Cowan.
Document Type Definition (DTD) Eugenia Fernandez IUPUI.
Lecture 23 XQuery 1.0 and XPath 2.0 Data Model. 2 Example 31.7 – User-Defined Function Function to return staff at a given branch. DEFINE FUNCTION staffAtBranch($bNo)
OVERVIEW AND PARSING JSON. What is JSON JavaScript Object Notation Used to format data Commonly used in Web as a vehicle to describe data being sent between.
 XML derives its strength from a variety of supporting technologies.  Structure and data types: When using XML to exchange data among clients, partners,
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
Rendering XML Documents ©NIITeXtensible Markup Language/Lesson 5/Slide 1 of 46 Objectives In this session, you will learn to: * Define rendering * Identify.
1 XML and XML in DLESE Katy Ginger November 2003.
Extensible Markup Language (XML) Pat Morin COMP 2405.
Unit 4 Representing Web Data: XML
Infinite Diversity in Infinite Combinations
XML QUESTIONS AND ANSWERS
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Session III Chapter 6 – Creating DTDs
Chapter 7 Representing Web Data: XML
Chapter X IXXXXXXXXXXXXXXXX.
Chapter 9 Web Services: JAX-RPC, WSDL, XML Schema, and SOAP
THE DATATYPES OF XML SCHEMA A Practical Introduction
Department of Computer Science Cal State East Bay, Hayward, CA
Session II Chapter 6 – Creating DTDs
Document Type Definition (DTD)
Extensible Markup Language (XML)
New Perspectives on XML
Presentation transcript:

Datatypes in XML Jeni Tennison www.jenitennison.com XML Schema: Data Types Datatypes in XML Jeni Tennison www.jenitennison.com jeni@jenitennison.com 26-Dec-18

Overview Context: DSDL Part 5 What is data in XML? XML Schema: Data Types Overview Context: DSDL Part 5 What is data in XML? Why do we need to type data? validation, documentation, authoring support, application support What kinds of data do we use? survey from different markup languages How do we specify datatypes currently? Datatype Library Language Outstanding issues 26-Dec-18 Datatypes in XML 26-Dec-18

DSDL ISO-standardised schema language(s) W3C XML Schema is for the big boys want to do data binding into databases or programming languages primarily data-oriented content database vendors, web service providers people who jumped on the XML bandwagon, and are distorting it to satisfy their requirements DSDL is for the rest of us redresses the balance back towards document-oriented content better for data-oriented content too! DSDL Part 5 addresses datatyping... 26-Dec-18 Datatypes in XML

Data in XML Data appears in: Sequences of Unicode characters (strings) attribute values content of text-only elements mixed-content elements, more rarely Sequences of Unicode characters (strings) includes control characters in XML 1.1 characters, not bytes "255" is three characters, not one byte Whitespace weirdness line endings normalised to #xA whitespace in attributes replaced with spaces whitespace chars can be escaped using entities 26-Dec-18 Datatypes in XML

Data in XML Uses in *real* XML human-readable rather than machine-readable for presentation rather than processing abbreviations and text-based formats for ease of writing to control the size of the XML to reuse existing text-based formats to enable reuse in e.g. URIs datatypes rarely match programming language or database datatypes care about characters, not bytes 26-Dec-18 Datatypes in XML

Reasons for Datatyping Validation is the value allowed? does the supplied value equal the fixed/listed/key value? Equality testing isn't straightforward <element name="example"> <attribute name="quality"> <choice> <value>good</value> <value>bad</value> </choice> </parse> </element> <example quality="Good" /> <example quality=" bad " /> 26-Dec-18 Datatypes in XML

Reasons for Datatyping Application support use datatypes in XPath/XForms etc. data binding translate to appropriate datatype in programming language/database General comparisons, not just equality <xsl:for-each select="Order[Total >= 1000]"> <xsl:sort select="Date" data-type="xs:date" /> … </xsl:for-each> 26-Dec-18 Datatypes in XML

Reasons for Datatyping Documentation so users can understand what kind of value an element/attribute takes so users can understand how they have to format that value Authoring support enable applications to prompt user for values rather than using separate validation step pop-ups for enumerated values calendars for dates sliders for numbers 26-Dec-18 Datatypes in XML

Survey of Data Well-known markup languages Designed by people who should know what they're doing, so bound to be good or "designed by committee, so bound to be bad" Used extensively Hard to change XML attributes DocBook XHTML SVG MathML Dublin Core XInclude XSLT XSL-FO XML Schema RELAX NG XForms 26-Dec-18 Datatypes in XML

Standard Atomic Datatypes Strings/text variable whitespace significance variable case significance limited/extended character sets restricted characters in XInclude accept attribute extended characters in MathML values variable length (usually single characters) Numbers integers (7) decimals (7.5) scientific format (7.5E3) Booleans true/false, 1/0, yes/no 26-Dec-18 Datatypes in XML

Enumerated Values Listings of possible values May be case-insensitive xml:space - "preserve" | "default" XInclude - "xml" | "text" May be case-insensitive XHTML LinkTypes May be listed separately, accessible by URI IANA registered media types DCMI type vocabulary ISO 15924 scripts May be part of structured value (see later) xml:lang - language and country codes May be subset of allowed values SVG color keywords Often have expansion/explanation/description 26-Dec-18 Datatypes in XML

Lists Lists of values whitespace-separated comma-separated most common NMTOKENS, IDREFS comma-separated XHTML URI lists XHTML media-type lists (as in CSS2) whitespace-or-comma-separated SVG lists semi-colon-separated Dublin Core Separated Values 26-Dec-18 Datatypes in XML

Values with Units Number coupled with unit designator lengths (36pt, 3px) frequencies (5Hz, 16kHz) angles (90deg) durations (3s, 150ms) proportions (5*) percentages (25%) Some units are absolute, others relative (see later) 26-Dec-18 Datatypes in XML

Simple Structured Values Values conforming to a regular grammar describable using a regular expression though perhaps not easily many examples: dates and times URI references colours in RGB notation SVG path data SVG transformations MathML group alignment XInclude accept, accept-language attributes XPath 2.0 sequence types XPath subsets (as in W3C XML Schema) P3P type names (as in XForms) 26-Dec-18 Datatypes in XML

Complex Structured Values Values not conforming to a regular grammar XPointers XPaths XSLT patterns XSL-FO expressions regular expressions 26-Dec-18 Datatypes in XML

Use of Context Information Validity/meaning of value depends on where it appears in XML document XML (Infoset) context qualified names namespace prefixes relative URIs declared unparsed entities and notations IDs and IDREFs application context lengths - ems and exs proportions - percentages and *s 'auto'/'inherit' in XSL-FO 26-Dec-18 Datatypes in XML

Datatype Support in XML Schema XML Schema hierarchy + XPath 2.0 datatypes 26-Dec-18 Datatypes in XML

Datatype Support in XML Schema Type has: value space lexical space canonical lexical representation (usually) Union types Whitespace-separated lists items all have to be the same type (though it can be a union type) Subtypes derived using facets pattern facet controls lexical representation other facets control value space 26-Dec-18 Datatypes in XML

Problems with WXS Datatypes No way to add primitive datatypes can't represent colours, for example No way to change lexical space can't have dates in format DD/MM/YYYY Possible for canonical lexical representation to be invalid two-decimal places in price 12.50 causes problems with round-tripping Lists must be space separated, and items must be of the same type Can use regexes for lengths etc., but then comparisons are string comparisons 26-Dec-18 Datatypes in XML

Datatype Support in RELAX NG Two basic types, string and token differ in whitespace treatment Supports whitespace-separated lists control over types of items Supports enumerated values Supports "except" and "choice" Datatype libraries can be used test validity of value, and equality of values pass in parameter values and context information usually uses XML Schema datatype library 26-Dec-18 Datatypes in XML

Datatype Library Language Language for describing datatypes Use in RELAX NG datatype library document datatype classes (or equiv.) compiler RNG schema validator 26-Dec-18 Datatypes in XML

Datatype Library Language Use in XSLT 2.0 datatype library document extension function definitions compiler XSLT stylesheet 26-Dec-18 Datatypes in XML

Lexical Datatyping Values are sequences of characters Values have properties accessible via API properties have different types aren't necessarily independent 3pc PT2M 2003-12-19 points: 36 units: pc seconds: 120 year: 2003 month: 12 day: 19 isLeapYear: false 26-Dec-18 Datatypes in XML

Datatypes as Annotated Sets Datatypes are annotated sets of values annotations include: collations for comparisons datatype parameters datatypes define properties for values of that type abstract datatypes define only properties and constraints on those properties concrete datatypes define lexical structure of strings as well Typed value is value + datatype adds property values typed values with same collation can be compared 26-Dec-18 Datatypes in XML

Example colourNames blue red: 0 green: 0 blue: 255 … collation: colourCollation params: lightest, darkest props: red, green, blue, hue, saturation, luminance blue red: 0 green: 0 blue: 255 … 26-Dec-18 Datatypes in XML

Datatype Definitions Extensible set of tests on values valid values must pass all the tests Parsing of values via: regex with named subexpressions list definition with particular separators enumeration of values implementation-defined extension methods EBNF, PEGs, … Sets of conditions testing property values cross-property conditions testing against parameter values Negative conditions 26-Dec-18 Datatypes in XML

Datatype Mapping How value in one datatype maps to value (or properties) in another unidirectional: every source value must be mappable to target Supports casting colourNames blue RRGGBB #0000FF 26-Dec-18 Datatypes in XML

Supertyping Ease definition of types Supertype doesn't imply superset inherit properties and collation can just alter values of parameters Supertype doesn't imply superset if supertype is concrete, all values of subtype are legal values of supertype if supertype is abstract, supertyping provides map to (properties defined by) that supertype 26-Dec-18 Datatypes in XML

Example byte 160 hexByte A0 colour colourName blue RRGGBB #0000FF 26-Dec-18 Datatypes in XML

Concrete hexByte and byte Types <datatype name="byte"> <super type="integer"> <param name="min" value="0" /> <param name="max" value="255" /> </super> </datatype> <datatype name="hexByte"> <parse> <regex>[0-9A-F]{2}</regex> </parse> <map to="byte" select="my:int(substring($this, 1, 1)) * 16 + my:int(substring($this, 2, 1))" /> <collate type="byte" /> 26-Dec-18 Datatypes in XML

Abstract colour Type <datatype name="colour"> <property name="red" type="byte" /> <property name="green" type="byte" /> <property name="blue" type="byte" /> <property name="hue" type="byte" select="..." /> <property name="saturation" type="byte" select="..." /> <property name="luminance" type="byte" select="..." /> <collate> <collate select="$this.hue" type="byte" /> <collate select="$this.saturation" type="byte" /> <collate select="$this.luminance" type="byte" /> </collate> ... </datatype> 26-Dec-18 Datatypes in XML

Abstract colour Type <datatype name="colour"> ... <param name="lightest" type="colour" subtype="le" /> <param name="darkest" type="colour" subtype="ge" /> <constraint test="$type.lightest.luminance >= $type.darkest.luminance" /> <condition test="$type.lightest.luminance >= $this.luminance" /> <condition test="$this.luminance >= </datatype> 26-Dec-18 Datatypes in XML

Concrete RRGGBB Type <datatype name="RRGGBB"> <param name="lightest" type="RRGGBB" subtype="le" /> <param name="darkest" type="RRGGBB" subtype="ge" /> <super type="colour"> <param name="lightest" select="$type.lightest" /> <param name="darkest" select="$type.darkest" /> </super> <parse name="RRGGBB"> <regex ignore-whitespace="true"> #(?[RR][0-9A-F]{2}) (?[GG][0-9A-F]{2}) (?[BB][0-9A-F]{2}) </regex> </parse> <property name="red" select="hexByte($RRGGBB/RR)" /> <property name="green" select="hexByte($RRGGBB/GG)" /> <property name="blue" select="hexByte($RRGGBB/BB)" /> </datatype> 26-Dec-18 Datatypes in XML

Concrete colourName Type <datatype name="colourName"> <super type="colour" /> <parse name="colour"> <enumeration code="@name" values="document('colours.xml')/colours/colour"/> </parse> <property name="red" select="$colour/@red" /> <property name="green" select="$colour/@green" /> <property name="blue" select="$colour/@blue" /> </datatype> <colours> ... <colour name="blue" red="0" green="0" blue="255" /> </colours> 26-Dec-18 Datatypes in XML

Partial Ordering Occurs with durations and date/times (due to timezones) Use min/max collations XPath comparisons based on two-value logic true/false, rather than true/false/unknown map unknown to empty sequence (false) xs:duration('P1M') = xs:duration('P30D') <collate type="xs:decimal" select.min="my:min-seconds($this)" select.max="my:max-seconds($this)" /> 26-Dec-18 Datatypes in XML

Context Information Standard extension functions for Infoset information: inf:ns-for-prefix($prefix) returns URI inf:prefix-declared($prefix) returns boolean inf:entity-declared($entity) returns boolean … Implementations can define additional extension functions for other context information 26-Dec-18 Datatypes in XML

Complex Structured Values No built-in support for complex structures: XPointers XPaths XSLT patterns XSL-FO expressions regular expressions But implementations can provide support via extension parse methods standardise these later 26-Dec-18 Datatypes in XML

XPath Datatyping Problem Want to use XPath to express: bindings to properties conditions that have to be met by values Want expressions to be datatype aware Should be possible in XPath 2.0 we know what type each value actually is and therefore how they should be compared <condition test="$type.lightest.luminance >= $this.luminance" /> <condition test="$this.luminance >= $type.darkest.luminance" /> 26-Dec-18 Datatypes in XML

XPath Datatyping Problem (cont…) But XPath 2.0 assumes WXS datatypes datatypes need to fit into type hierarchy no mechanisms for having different collations for comparison operators defining casts to known datatypes Use functions for comparisons <condition test="dt:ge($type.lightest.luminance, $this.luminance)" /> <condition test="dt:ge($this.luminance, $type.darkest.luminance)" /> 26-Dec-18 Datatypes in XML

Status Draft spec available at: http://www.jenitennison.com/datatypes schemas also available there comments, please! No implementations yet help, please! That's it questions, please! 26-Dec-18 Datatypes in XML