SE 5145 – eXtensible Markup Language (XML ) XML Schema 2011-12/Spring, Bahçeşehir University, Istanbul.

2 3rd Assignment: Validating XML with DTD & XML Schema (page 1/2)
The goal of this exercise is to understand the basic concepts of XML Schema and how it extends the capabilities of DTDs. You will use the XML Resume (CV) that you provided in Assignment 2.
Task 1 – XML Schema: Write an XML Schema definition for your XML Resume that satisfies the following requirements:
For any date in your Resume XML, make sure that your XML Schema checks for a valid date value.
Avoid xs:string as much as possible. If you think that something really is a string, use your own string type, which could, for example, check for a maximum length and a permitted character set (an xs:pattern facet can be used to achieve the latter).
Make sure that at least one of your types is used by more than one element (because reuse is good). In real-life applications, you would start by designing a type library and then use it to construct your schema from the ground up.
Use minOccurs and maxOccurs to restrict the cardinality of some elements.
(Continued on the next slide.)

3 3rd Assignment: Validating XML with DTD & XML Schema (page 2/2)
The following are recommended (but optional) for this assignment:
Depending on how similar or different your employer and education entries are, try to find some structural similarity between these entries and represent it using complex type derivation.
Try to add a targetNamespace to your schema, so that your Resume schema becomes a full-grown schema with its own namespace. Don't forget that you then have to change the instance (by using the namespace there) to match the schema.
Identity constraints can be used to check various aspects of the Resume, depending on what you think should be unique, a key, or a reference to an existing key. A typical example would be a key for institutions (educational institutions or companies), with each of your skills referencing this key so that you can represent where you acquired each skill.
Task 3 – Validate XML: Use a tool to validate your XML Resume (*.xml) against your XML Schema (*.xsd). A suitable online tool is … On the first page, provide the XML document and select "Validate against external XML schema". Click Validate and provide the XML Schema on the second page. Another tool is Altova XMLSpy (you can download and use a trial version). Alternatively, you can use the tools recommended by Melike (…) and Erokan (iexmltls.exe, msval.vbs) in the previous presentations.
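For the optional identity-constraint suggestion, here is a minimal sketch of a key/keyref pair. The element and attribute names (resume, institution, skill, id, acquiredAt) are hypothetical, and the unprefixed XPath expressions assume the schema has no targetNamespace; once you add one, the selector paths need a namespace prefix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical structure: institutions carry an id, skills point at one -->
  <xs:element name="resume">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="institution" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="id" type="xs:token" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
        <xs:element name="skill" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attribute name="acquiredAt" type="xs:token" use="required"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Each institution/@id must be present and unique within resume -->
    <xs:key name="institutionKey">
      <xs:selector xpath="institution"/>
      <xs:field xpath="@id"/>
    </xs:key>
    <!-- Each skill/@acquiredAt must match an existing institution/@id -->
    <xs:keyref name="skillSource" refer="institutionKey">
      <xs:selector xpath="skill"/>
      <xs:field xpath="@acquiredAt"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```

With this schema, `<skill acquiredAt="bau">XML Schema</skill>` is valid only if some institution in the same resume has `id="bau"`. Note that identity constraints are placed inside the element declaration, after its type.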

4 XML Schemas
"Schema" is a general term: DTDs are a form of XML schema too. According to the dictionary, a schema is a structured framework or plan. When we say "XML Schemas", we usually mean the W3C XML Schema language, also known as XML Schema Definition language, or XSD. It was introduced to overcome some commonly observed limitations of DTDs, most notably the lack of typing. DTDs, XML Schema, RELAX NG, and Schematron are all XML schema languages.

5 What's Wrong with DTDs?
DTDs do not support application-level datatypes: XML for B2B is very data-centric and needs typing, whereas SGML was created for documents, where typing was less important.
DTDs do not support any relationships between markup constructs: content models cannot be reused, attribute lists cannot be reused, and structural relationships cannot be exploited in the DTD.
DTDs provide a very weak specification language: there are no restrictions on text content, very little control over mixed content (text plus elements), and little control over the ordering of elements.
DTDs are written in a strange (non-XML) format, so you need separate parsers for DTDs and XML.

6 Why XML Schemas?
XML Schema Definition language (XSD) solves these problems. Like DTDs, XSD lets you constrain the content of XML documents, but it is much more powerful and sophisticated, allowing a much finer level of control over structure and content. XSD is written in XML syntax instead of a custom syntax like the one DTDs use.
XML Schema's simple datatypes provide some semantics: a formerly undescribed attribute can now be described as an xs:date, so it can be understood as a date and, for example, inserted into a calendar. But what kind of date is it? A birthday? An order date? A shipping date? That is a question of the context in which the xs:date appears.
XML Schema better supports model-level information; however, it still captures only part of the application semantics. An XML Schema is usually better than a DTD because it contains types, which provide information about the basic datatypes being used. Additional semantics (e.g., different kinds of dates) must be documented elsewhere.

7 Why Not XML Schemas?
DTDs have been around longer than XSD, so they are more widely used and more tools support them. The power of XSD comes at a price: XSD is a little harder to write and more verbose than DTDs, even by XML standards, and more advanced XML Schema constructs can be non-intuitive and confusing. Nevertheless, XSD is not likely to go away any time soon.

8 Validation and Typing
XML Schema does two things at the same time:
1. Validation checks for structural integrity (is the document schema-valid?): it checks elements and attributes for proper usage (as with DTDs), and it checks element contents and attribute values for proper values.
2. Type annotations make the types available to applications: instead of having to look at the schema, applications get the Post-Schema-Validation Infoset (PSVI), and type-based applications (such as XSLT 2.0) can work on the typed instance.

9 Schema-Validation and Applications

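10 Anatomy of a Schema
A schema uses the namespace http://www.w3.org/2001/XMLSchema, usually bound to the xsd or xs prefix in the XML code. The file extension is .xsd, and the root element is xs:schema, so a schema document starts with an XML declaration followed by the opening tag <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">.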
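A minimal complete schema document might look like the sketch below; the file name resume.xsd and the single global element resume are hypothetical placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- resume.xsd (hypothetical name): the root element is xs:schema,
     with the xs prefix bound to the XML Schema namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A single global element declaration with a built-in type -->
  <xs:element name="resume" type="xs:string"/>
</xs:schema>
```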

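11 Referring to a Schema
To refer to a DTD in an XML document, the reference goes before the root element, in a <!DOCTYPE ...> declaration. To refer to an XML Schema in an XML document, the reference goes in the root element, as an attribute from the XML Schema instance namespace (http://www.w3.org/2001/XMLSchema-instance) that says where your XML Schema definition can be found: xsi:noNamespaceSchemaLocation for a schema without a target namespace, or xsi:schemaLocation for a schema with one.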
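As a sketch, an instance document referring to a hypothetical resume.xsd could look like this (the DTD equivalent would be a `<!DOCTYPE resume SYSTEM "resume.dtd">` line before the root element). The namespace URI in the comment is a made-up placeholder for the targetNamespace case.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- If the schema had a targetNamespace (hypothetical URI), the root would
     instead carry a namespace/location pair:
     <resume xmlns="http://example.com/resume"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://example.com/resume resume.xsd"> -->
<resume xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="resume.xsd">
  <!-- resume content goes here -->
</resume>
```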

12 Types
A type is a set of values. The values can be enumerated (home, mobile, office) or described by extension (intervals, regular expressions).
DTDs have (almost) no types: element content is always #PCDATA (any number of any characters); attributes most often are CDATA (any number of any characters); attributes may have enumerated types (but no extensional types); and attributes may use ID/IDREF.

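13 Types: DTD vs. XML Schema
Concepts: DTD and XML Schema both rest on some conceptual model (formal or informal)
Types: DTD has ID/IDREF and (#P)CDATA; XML Schema has a hierarchy of simple and complex types
Markup constructs: DTD uses element type declarations (<!ELEMENT order ...>); XML Schema uses element definitions (<xs:element name="order" ...>)
Instances (documents): both describe the same instances, e.g. <order> order content </order>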

14 Simple and Complex Elements
A simple element is one that contains text and nothing else: it cannot have attributes, it cannot contain other elements, and it cannot be empty. However, the text can be of many different types and may have various restrictions applied to it.
If an element isn't simple, it's complex: a complex element may have attributes, and it may be empty or contain text, other elements, or both text and other elements.

15 Simple Types
Simple types describe values that are not structured by XML markup: they describe attribute values (date="2006-10-03") and element content (e.g., a phone number such as +1-510-6432253). Simple types can be used for elements or attributes, because XML Schema treats content in elements and attributes equally; simple type libraries can therefore be designed independently of their eventual use.
Simple types come in three flavors:
atomic types: one value of one type (one number in some range)
union types: one value from a union of types (a number or the string "undefined")
list types: a whitespace-separated list of values (phone type="home office")
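The three flavors can be sketched as follows, reusing the slide's own examples; the type names phoneKind, phoneKinds, and numberOrUndefined are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Atomic: one value of one type -->
  <xs:simpleType name="phoneKind">
    <xs:restriction base="xs:token">
      <xs:enumeration value="home"/>
      <xs:enumeration value="mobile"/>
      <xs:enumeration value="office"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- List: whitespace-separated items, e.g. type="home office" -->
  <xs:simpleType name="phoneKinds">
    <xs:list itemType="phoneKind"/>
  </xs:simpleType>
  <!-- Union: either a decimal number or the string "undefined" -->
  <xs:simpleType name="numberOrUndefined">
    <xs:union memberTypes="xs:decimal">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="undefined"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:schema>
```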

16 Named vs. Anonymous Types
Types can be named or anonymous: named types have a name and can be referenced (and thus be reused), whereas anonymous types have no name and can only be used where they are defined.
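A sketch of the difference, with hypothetical names: the named type shortText is declared once and reused by two elements (which also meets the assignment's reuse requirement), while the skill element carries an anonymous type that nothing else can refer to.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named type: declared at top level, reusable via type="..." -->
  <xs:simpleType name="shortText">
    <xs:restriction base="xs:string">
      <xs:maxLength value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="jobTitle" type="shortText"/>
  <xs:element name="degree" type="shortText"/>
  <!-- Anonymous type: no name, usable only inside this declaration -->
  <xs:element name="skill">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:minLength value="1"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```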

17 Type Definitions
Simple types are sets of values: named simple types are sets of values with a name (and thus reusable), while anonymous simple types are sets of values defined where they are needed.
Simple types are defined to represent model-level information. In most cases they have restrictions associated with them, but they may also simply be tags for semantics (fax and phone numbers share the same value space).
XML Schema has a library of built-in datatypes: ur-types are the conceptual grounding of all types, primitive types are the types that exist by definition, and derived types are based on primitive types. Users can derive their own types using simple type restriction.

18 Type Hierarchy

19 Built-In Types

20 Declaring Elements with Schema
Elements can be declared as having a simple or complex type, and types can be either built-in or defined by your schema. As in DTDs, elements can have mixed, empty, or element content. Elements can also be given a minimum and maximum number of times that they are allowed to occur.

21 Declaring Elements with Schema

22 Defining a Simple Element
A simple element is defined as <xs:element name="name" type="type" minOccurs="number" maxOccurs="number|unbounded"/>, where:
name is the name of the element;
the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, and xs:time;
minOccurs and maxOccurs are optional, with a default value of 1 (they are allowed only on local element declarations, not on top-level ones).
Other attributes a simple element may have:
default="default value": the value used if no other value is specified
fixed="value": no other value may be specified
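A sketch of these attributes in use, with hypothetical element names. Because minOccurs and maxOccurs are only allowed on local declarations, the simple elements are placed inside a complex type. Note that an element's default value is applied when the element appears with empty content; it does not create a missing element.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <!-- Exactly one (both occurrence attributes default to 1) -->
        <xs:element name="name" type="xs:string"/>
        <xs:element name="birthDate" type="xs:date"/>
        <!-- Zero or more -->
        <xs:element name="email" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- Used when <country/> appears empty -->
        <xs:element name="country" type="xs:string" default="Turkey"/>
        <!-- Any value other than 1 is invalid -->
        <xs:element name="version" type="xs:integer" fixed="1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```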

23 Custom Simple Types with Restriction
You can define your own simple types by deriving them from existing simple types by restriction: the base type must be a simple type, and the derived type will be a simple type. All simple types form a tree rooted at xs:anySimpleType.
Restrictions are based on facets: each restriction can use zero or more facets, and facets can be refined in further simple type restrictions.
XML Schema designers should try to restrict types as much as possible. Why?
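Refining facets in a further restriction can be sketched as a two-step derivation (type names hypothetical): passingPercentage inherits maxInclusive 100 from percentage and narrows the lower bound. A derived type may only narrow the inherited value space, never widen it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Step 1: restrict a built-in type -->
  <xs:simpleType name="percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Step 2: restrict the user-defined type further (allowed range 50..100) -->
  <xs:simpleType name="passingPercentage">
    <xs:restriction base="percentage">
      <xs:minInclusive value="50"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```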

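24 Restrictions
The general form for putting a restriction on a text value is an xs:element (or xs:attribute) that contains an xs:simpleType, which in turn contains an xs:restriction element whose base attribute names the type being restricted; the restrictions (facets) go inside xs:restriction.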
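A minimal instance of this general form, assuming a hypothetical age element that must hold an integer from 0 to 120:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <!-- base names the type being restricted -->
      <xs:restriction base="xs:integer">
        <!-- the restrictions (facets) -->
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```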

25 Facets
Facets define a certain way of restricting a simple type. Facets may be repeated at different levels of the type hierarchy. Not all facets are applicable to all types: the applicability depends on the primitive type being used.

26 Restrictions on Numbers
minInclusive -- the number must be ≥ the given value
minExclusive -- the number must be > the given value
maxInclusive -- the number must be ≤ the given value
maxExclusive -- the number must be < the given value
totalDigits -- the number must have no more than value digits in total
fractionDigits -- the number must have no more than value digits after the decimal point
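A sketch combining several numeric facets in a hypothetical amount type: values must be greater than 0, with at most 8 digits in total and at most 2 after the decimal point. Under these facets, 123456.78 is valid, while 0 and 12.345 are not.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="amount">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:totalDigits value="8"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="salary" type="amount"/>
</xs:schema>
```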

27 Restrictions on Strings
length -- the string must contain exactly value characters
minLength -- the string must contain at least value characters
maxLength -- the string must contain no more than value characters
pattern -- the value is a regular expression that the string must match
whiteSpace -- not really a restriction; it tells what to do with whitespace:
value="preserve": keep all whitespace
value="replace": change all whitespace characters to spaces
value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
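The assignment's suggestion of "your own string type" could be sketched like this. The name personName, the 50-character limit, and the allowed character set (any Unicode letter via \p{L}, plus space, period, apostrophe, and hyphen) are assumptions chosen for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="personName">
    <xs:restriction base="xs:string">
      <!-- Normalize whitespace before the other facets are checked -->
      <xs:whiteSpace value="collapse"/>
      <xs:minLength value="1"/>
      <xs:maxLength value="50"/>
      <!-- Letters (including Turkish ones such as ı, ş, ğ), space, . ' - -->
      <xs:pattern value="[\p{L} .'\-]+"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="firstName" type="personName"/>
  <xs:element name="lastName" type="personName"/>
</xs:schema>
```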

28 Patterns
Patterns restrict the lexical space of simple types, whereas most other facets restrict the value space (e.g., intervals of numbers); in many cases, patterns are useful additions to value-oriented facets.
Patterns are regular expressions that support many common regex constructs as well as Unicode. For example, the language pattern below allows de, de-CH, and other tags; it checks for lexical correctness, not against a code list:
([a-zA-Z]{2}|[iI]-[a-zA-Z]+|[xX]-[a-zA-Z]{1,8})(-[a-zA-Z]{1,8})*
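A sketch of a pattern facet for phone numbers shaped like the earlier example +1-510-6432253; the type name and the exact digit groups are assumptions. An XSD pattern must match the entire value, so no ^ or $ anchors are needed; if a single restriction lists several xs:pattern facets, a value only has to match one of them.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="phoneNumber">
    <xs:restriction base="xs:token">
      <!-- Country code, 3-digit area code, 7-digit number -->
      <xs:pattern value="\+[0-9]{1,3}-[0-9]{3}-[0-9]{7}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="phone" type="phoneNumber"/>
</xs:schema>
```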

29 Facet Limitations
Facets limit one dimension of a type's value space (using pattern, the lexical space can also be restricted). Restrictions should be made as specific as possible, but no limitations are possible beyond the predefined facets.
There is no connection to the context within the document: facets cannot make references to other values (e.g., neighboring attributes). Additional constraints should be documented, because documentation enables applications to implement the constraint checking; other schema languages (like Schematron) may be used to express these constraints.

30 Enumeration
An enumeration restricts content to a fixed list of allowable choices, with one xs:enumeration facet per allowed value.
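A minimal enumeration sketch; the type proficiency and its four values are hypothetical. Any other value, such as "expert", would make the languageLevel element invalid.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="proficiency">
    <xs:restriction base="xs:token">
      <xs:enumeration value="beginner"/>
      <xs:enumeration value="intermediate"/>
      <xs:enumeration value="advanced"/>
      <xs:enumeration value="native"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="languageLevel" type="proficiency"/>
</xs:schema>
```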

31 Simple Type Examples

32 What Is a Complex Type?
Complex types describe the allowed element content: they describe what the element may contain (the element's content model) and the attributes that an element may have (the element's attribute list).
Complex types do not define the element name: they define which content is allowed for the element, and the element definition uses the complex type to define its allowed content.
Complex types have properties similar to those of simple types: they can be named or anonymous, and complex type derivation can be used to construct a type hierarchy.

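33 Declaring Complex Elements
To declare an element with a complex type, either use the value xsd:anyType for the type attribute (which allows any content at all) or use the <xsd:complexType> tag in the definition. The structure is: <xsd:element name="name"> <xsd:complexType> ... information about the complex type ... </xsd:complexType> </xsd:element>. Remember that attributes always have simple types.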

34 Complex Elements
In a complex type, <xs:sequence> says that the child elements must occur in the order in which they are listed.
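A sketch of a complex element whose children must appear in a fixed order; the names course, code, title, and credits are hypothetical. An instance with title before code would be rejected.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="course">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="code" type="xs:token"/>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="credits" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```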

35 Complex Type Example

36 Complex Types & Content Types
Complex types can have different kinds of content: simple content is simple-type (text) content with additional attributes, and complex content is anything else (anything beyond simple type content). Complex type derivation depends heavily on this classification.

37 DTD Content Models
DTDs define element content with a compact syntax; XML Schema supports the same facilities with a more verbose syntax and adds features which DTDs do not support.
DTDs allow elements to be mandatory, optional (?), repeatable (+), or optional and repeatable (*); XML Schema allows the cardinality to be specified exactly.
DTDs allow sequences (,) and alternatives (|); XML Schema introduces a (very limited) operator for all groups.
Apart from the syntax, XML Schema content models are not very different.
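To compare the two notations, here is a sketch of a hypothetical DTD content model and its XML Schema equivalent, plus one constraint (two to three references) that a DTD cannot express directly.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- DTD: <!ELEMENT education (school, degree?, course+, award*, reference, reference, reference?)> -->
  <xs:element name="education">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="school" type="xs:string"/>
        <!-- degree? -->
        <xs:element name="degree" type="xs:string" minOccurs="0"/>
        <!-- course+ -->
        <xs:element name="course" type="xs:string" maxOccurs="unbounded"/>
        <!-- award* -->
        <xs:element name="award" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
        <!-- Exact cardinality: 2 to 3 occurrences -->
        <xs:element name="reference" type="xs:string" minOccurs="2" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```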

38 Empty Content
DTDs have a special keyword for empty elements: instead of a content model, the keyword EMPTY is used, and empty elements may still have attribute lists associated with them.
In XML Schema, empty types are defined implicitly: there is no explicit keyword for defining an empty type. If a type has no model group inside it, it is empty (it may still have attributes).
Declaring empty elements: for example, <xs:complexType/> declares an empty type with no attributes.

39 Mixed Content
DTDs define mixed content by mixing #PCDATA into the content model. DTDs always require mixed content to use the form (#PCDATA | a | b)*, so the occurrence of elements in mixed content cannot be controlled.
XML Schema defines mixed content outside of the content model: the content model is defined like an element-only content model, and the mixed attribute on the type marks the type as being mixed.
Example: (only one subtitle is allowed; why?)
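A sketch of a mixed type in which exactly one subtitle may appear among the text; the element names are hypothetical. In this sketch, the limit comes from the default minOccurs and maxOccurs of 1 on the subtitle particle, something the DTD form (#PCDATA | subtitle)* cannot express.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title">
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- minOccurs and maxOccurs both default to 1 -->
        <xs:element name="subtitle" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

For example, `<title>XML Schema <subtitle>A Primer</subtitle> for Beginners</title>` is valid, while a title with two subtitle elements is not.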

40 Mixed Content
XML Schema mixed content can use all model groups, so it is possible to constrain element occurrences in the same way as in element-only content. In practice, this feature is rarely used (mixed content is often very loosely defined).

41 Defining an Attribute
Attributes are always declared with simple types; any of the simple types that can be used for elements can also be used for attributes. An attribute is defined as <xs:attribute name="name" type="type"/>, where name and type are the same as for xs:element.

42 Defining an Attribute (continued)
Other attributes an attribute declaration may have:
default="default value": the value used if no other value is specified (only allowed together with use="optional")
fixed="value": no other value may be specified
use="required": the attribute must be present
use="optional": the attribute is not required (the default)
use="prohibited": the attribute cannot be used
Example: <xsd:attribute name="city" type="xsd:string" use="optional" default="istanbul"/>

43 Adding Attributes to Elements
Adding attributes to an element that has an empty content model
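A sketch for the empty-content case: the complex type has no model group, only attribute declarations. The names photo, src, and width are hypothetical; `<photo src="me.jpg" width="120"/>` would be a valid instance.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="photo">
    <xs:complexType>
      <!-- No sequence/choice/all, so the element must be empty -->
      <xs:attribute name="src" type="xs:anyURI" use="required"/>
      <xs:attribute name="width" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```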

44 Adding Attributes to Elements
Adding attributes to an element that has only character data content
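For text-only content, the usual construct is xs:simpleContent with an xs:extension of the text's type. In this sketch the element phone and its type attribute are hypothetical; `<phone type="mobile">+1-510-6432253</phone>` would be valid.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="phone">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The text content keeps its simple type; the attribute is added -->
        <xs:extension base="xs:string">
          <xs:attribute name="type" type="xs:token" default="home"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```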

45 Adding Attributes to Elements
Adding attributes to an element that has element or mixed content
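With element content, the attribute declarations follow the model group inside the complex type; for mixed content, the same structure is used with mixed="true" on xs:complexType. The names below are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="job">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="employer" type="xs:string"/>
        <xs:element name="position" type="xs:string"/>
      </xs:sequence>
      <!-- Attributes must come after the sequence/choice/all -->
      <xs:attribute name="current" type="xs:boolean" default="false"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```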

46 Global and Local Definitions
Elements declared at the top level of an xs:schema are available for use throughout the schema. Elements declared within an xs:complexType are local to that type; thus, elements such as firstName and lastName declared inside a complex type are only locally declared. The order of declarations at the top level of an xs:schema does not specify the order of elements in the XML instance document.
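A sketch of the distinction, with hypothetical names: person and note are global (any global element may also serve as the root of an instance document), while firstName and lastName exist only inside person's type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global: a direct child of xs:schema -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <!-- Local: visible only inside this complex type -->
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Also global; its position here says nothing about instance order -->
  <xs:element name="note" type="xs:string"/>
</xs:schema>
```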

47 Declaration and Use
So far we've been talking about how to declare types, not how to use them. To use a type we have declared, give its name as the value of type="...". Scope is important: you cannot use a type if it is local to some other type.

48 Declaring Elements with Element Content
Sequence: child elements must appear in order.
All: child elements can occur in any order.
Choice: any one of the child elements from a list.

49 Sequence
Child elements must appear in a specific order.

50 xs:all
Child elements can appear in any order. Despite the name, each member of an xs:all group can occur once or not at all. You can use minOccurs="n" and maxOccurs="n" to specify how many times an element may occur (the default value is 1); in this context, n may only be 0 or 1.
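A sketch of an xs:all group with hypothetical names: street and city must each appear exactly once, postalCode at most once, and the three may come in any order.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:all>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <!-- Optional; maxOccurs stays at its default of 1 -->
        <xs:element name="postalCode" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```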

51 xs:choice
Any one of the child elements from a list.
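A minimal xs:choice sketch with hypothetical names: a contactMethod element contains either one email or one phone, but not both. Adding maxOccurs="unbounded" to xs:choice would allow any mix of them to repeat.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contactMethod">
    <xs:complexType>
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```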

52 Referencing
Once you have defined an element or attribute at the top level of the schema (with name="..."), you can refer to it elsewhere with ref="..." instead of declaring it again.
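A sketch of ref for both an element and an attribute; the names email, lang, and contact are hypothetical. The referenced declarations are global, and the referencing particle may still add its own minOccurs and maxOccurs.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Global declarations that can be referenced -->
  <xs:element name="email" type="xs:string"/>
  <xs:attribute name="lang" type="xs:language"/>
  <xs:element name="contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="email" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute ref="lang"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```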

53 Text Element with Attributes
If a text element has attributes, it no longer has a simple type: its type is a complex type with simple content.

54 Empty Elements
Empty elements are (ridiculously) complex.

55 Mixed Elements
Mixed elements may contain both text and elements. We add mixed="true" to the xs:complexType element. The text itself is not mentioned in the type definition and may go anywhere between the child elements (it is basically ignored).

56 Extensions
You can base a complex type on another complex type and extend it with new content, using xs:complexContent containing xs:extension base="...".
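For the assignment's suggestion of representing the similarity between employer and education entries, here is a sketch of complex type derivation by extension. The type names periodEntry, jobEntry, and educationEntry and their children are hypothetical. The extended content model is the base type's sequence followed by the added sequence.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Shared base type -->
  <xs:complexType name="periodEntry">
    <xs:sequence>
      <xs:element name="institution" type="xs:string"/>
      <xs:element name="from" type="xs:date"/>
      <xs:element name="to" type="xs:date" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Extension: base content first, then the new elements -->
  <xs:complexType name="jobEntry">
    <xs:complexContent>
      <xs:extension base="periodEntry">
        <xs:sequence>
          <xs:element name="position" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="educationEntry">
    <xs:complexContent>
      <xs:extension base="periodEntry">
        <xs:sequence>
          <xs:element name="degree" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="job" type="jobEntry"/>
  <xs:element name="education" type="educationEntry"/>
</xs:schema>
```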

57 Predefined String Types
Recall that a simple element is defined as <xs:element name="name" type="type"/>. Here are a few of the possible string types:
xs:string -- a string
xs:normalizedString -- a string that doesn't contain tabs, newlines, or carriage returns
xs:token -- a string that doesn't contain any whitespace other than single spaces (and no leading or trailing spaces)
Allowable restrictions on strings: enumeration, length, maxLength, minLength, pattern, whiteSpace

58 Predefined Date and Time Types
xs:date -- a date in the format CCYY-MM-DD, for example, 2002-11-05
xs:time -- a time in the format hh:mm:ss (hours, minutes, seconds)
xs:dateTime -- format is CCYY-MM-DDThh:mm:ss
Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
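For the assignment's "valid date value" requirement, xs:date already rejects impossible dates such as 2011-02-30; range facets can narrow it further. The type name resumeDate and its bounds are arbitrary assumptions in this sketch.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="resumeDate">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="1950-01-01"/>
      <xs:maxInclusive value="2030-12-31"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- One named type reused by several elements -->
  <xs:element name="birthDate" type="resumeDate"/>
  <xs:element name="graduationDate" type="resumeDate"/>
</xs:schema>
```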

59 Predefined Numeric Types
Here are some of the predefined numeric types: xs:decimal, xs:byte, xs:short, xs:int, xs:long, xs:positiveInteger, xs:negativeInteger, xs:nonPositiveInteger, xs:nonNegativeInteger.
Allowable restrictions on numeric types: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace

60 Practice
