Neminath Simmachandran

XML Schema Neminath Simmachandran CS 486 – Spr’01

Overview
- XML, a brush-up
- Intro to Schemas
- Namespaces
- Elements, Attributes & Content Model
- Summary
- History

XML Brush-up
XML is a meta-markup language.
- A markup language uses tags embedded directly in the text to describe the various pieces and parts of that text.
- A Document Type Definition (DTD) describes sets of tags and attributes; it sets the rules by which its associated document must play.
- XML is a language for creating markup languages, each geared toward one specific type of content.
XML makes it easier for information consumers and producers to find each other. Many tasks involving search or information exchange can be automated with XML, because it provides a common framework for representing information. DTDs define the markup one can use to describe the contents of a document. The DTD is a part of the document itself even when it is stored in another file; that is, DTDs and documents are not two separate entities but a single unit separated into distinct sections.

Role of the DTD
- Element declaration: specifies a single markup element. Eg: <!ELEMENT book ANY> declares an element 'book' (every declaration names a content specification; ANY is the most permissive one).
- Attribute list: declares a set of attributes for a specific element. Eg: <!ATTLIST book class (fiction|horror) #IMPLIED>
- Content model: part of an element declaration; it describes what kind of content can be nested within an element. Types: data, element, mixed content. Eg: <!ELEMENT book (title, author, publisher, isbn)> says that title, author, publisher and isbn are elements that must all be contained, in that order, within the 'book' element.
All the structures in a DTD are designed to describe in exquisite detail the markup that can be used by its documents. Every aspect of the markup should be specified by the DTD: the attributes that can be used by elements, what kinds of values the attributes take, which attributes are required, and what the default value is for each attribute. Every tag that is used in a document is defined by an element declaration in the DTD. Each attribute list defines a set of attributes for a specific element. Not every element needs to have an attribute list.

Role of the DTD (cont.)
- Entity declaration: creates an entity, which is essentially an alias that associates a unique name with a group of data. Eg: <!ENTITY XML "Extensible Markup Language">. A document then refers to the entity as &XML;.
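As a rough illustration of how an entity declaration behaves, the sketch below uses Python's standard xml.etree.ElementTree module, whose underlying parser expands internal entities declared in a document's internal DTD subset. The surrounding document and the element name `note` are made up for this example; only the entity declaration comes from the slide.

```python
import xml.etree.ElementTree as ET

# A tiny document whose internal DTD subset declares the XML entity.
# The element name "note" is hypothetical, chosen only for this sketch.
DOC = """<!DOCTYPE note [
  <!ENTITY XML "Extensible Markup Language">
]>
<note>&XML; is a meta-markup language.</note>"""

note = ET.fromstring(DOC)

# The parser has already replaced &XML; with the entity's text.
expanded_text = note.text
print(expanded_text)
```

The application never sees the alias itself, only the replacement text.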

Role of the Document
A document uses the markup and guidelines specified in the DTD to describe content.
Structure:
- Prolog
- Document element
- Elements
- Attributes
- Content
- Comment
- Processing instructions
Documents use the structures defined in DTDs to describe content, so many document structures have names and functions similar to those in DTDs. Documents can also have a couple of structures of their own that are not found in DTDs.
The prolog contains all the information relevant to the document other than content or markup. The document element is the top element and includes all of the document's other elements and content. Elements are the main markup components and are defined in the DTD by element declarations; they are manifested in documents as markup tags. Attributes exist to provide additional information about an element. A processing instruction (PI) is used specifically to pass messages to the application that will be processing the XML document.
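To make these parts concrete, here is a minimal Python sketch that parses a small document with the standard xml.etree.ElementTree module and picks out the document element, its attributes, and its child elements with their content. The sample document is hypothetical and simply reuses the 'book' vocabulary from the DTD examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a prolog (XML declaration), a comment,
# and <book> as the document element.
DOC = """<?xml version="1.0"?>
<!-- a comment -->
<book class="fiction">
  <title>Cool XML</title>
  <author>Cool Guy</author>
</book>"""

root = ET.fromstring(DOC)

document_element = root.tag                 # name of the top element
attributes = dict(root.attrib)              # attributes of the top element
children = [(child.tag, child.text) for child in root]  # child elements and content

print(document_element, attributes, children)
```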

DTD - Limitations
DTDs call for elements to consist of one of three things:
- A text string
- A text string with other child elements mixed together
- A set of child elements
In addition:
- DTDs do not have XML syntax, so XML parsers cannot easily parse them into component parts
- They have a very primitive system of data types
- They are not modular, so it is not easy to reuse parts of a DTD
- They are not easily extensible

Intro to Schemas
Schemas are themselves XML documents, with markup, elements, attributes and comments. The XML Schema system aims to provide a rich grammatical structure for XML documents that overcomes the limitations of the DTD. To illustrate the power of the XML Schema mechanism, consider the following XML document fragment:
<InvoiceNo> </InvoiceNo>
<ProductID>J123456</ProductID>
The schema defines the elements that can appear within the document and the attributes that can be associated with an element. It also defines the structure of the document: which elements are child elements of others, the sequence in which the child elements can appear, and the number of child elements. It defines whether an element is empty or can include text. The schema can also define default values for attributes.
The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se; they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset information items.

Intro to Schemas (cont.)
DTD fragment describing the elements in the fragment above:
<!ELEMENT InvoiceNo (#PCDATA)>
<!ELEMENT ProductID (#PCDATA)>
XML Schema fragment describing the same elements:
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ProductCode'/>
<simpleType name='ProductCode' base='string'>
  <pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
The previous slide showed an excerpt of an XML document; here the same two elements are declared first in DTD syntax and then in the corresponding XML Schema syntax. Note that the syntax of the XML Schema fragment is itself XML syntax. Through the schema, a validating parser can verify that the element InvoiceNo is a positive integer and that the element ProductID consists of one letter between A and Z followed by six digits. By contrast, a validating parser referring to the DTD can only verify that these elements are represented as strings.
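The two checks that the schema makes possible can be approximated in a few lines of Python. This is only a sketch of what a validating parser would verify, not a schema validator; it assumes the pattern facet is meant to read [A-Z]{1}\d{6} (one capital letter followed by six digits, as the text describes), and the function names are invented for the example.

```python
import re

# Regular expression corresponding to the ProductCode pattern facet.
PRODUCT_CODE = re.compile(r"[A-Z]{1}\d{6}")


def is_product_code(text: str) -> bool:
    """True if text is one capital letter followed by exactly six digits."""
    return PRODUCT_CODE.fullmatch(text) is not None


def is_positive_integer(text: str) -> bool:
    """True if text is a run of ASCII digits denoting a value greater than zero."""
    return re.fullmatch(r"\+?[0-9]+", text) is not None and int(text) > 0


print(is_product_code("J123456"), is_positive_integer("42"))
```

A DTD that declares both elements as #PCDATA would accept every one of the strings rejected here.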

Namespaces
A given XML Schema defines a set of new names, such as the names of elements, types, attributes and attribute groups, whose definitions and declarations are written in the schema.
Why are namespaces needed? A document can use names from different schemas. Namespaces enable us to distinguish between declarations and definitions from different vocabularies. XML Namespaces form a mechanism for avoiding name conflicts in XML documents. A namespace itself has a fixed but arbitrary name that must follow the URL syntax.
In a collaborative world, one person may be processing documents from many other parties, and the different parties may want to represent their data elements differently. Moreover, in a single document, they may need to refer separately to elements with the same name that were created by different parties. This gives rise to naming conflicts. XML Schema uses the concept of namespaces to keep such definitions apart.
The names defined in a schema are said to belong to its target namespace. For example, the target namespace of the XML Schema fragment shown earlier can be given a URL-style name. Even though such a name looks like a web address, it does not refer to a file at that URL that contains the schema definition. In fact, the URL does not refer to any file at all, only to an assigned name.
Definitions and declarations in a schema can refer to names that belong to other namespaces. These namespaces are generally referred to as source namespaces. Each schema has one target namespace and possibly many source namespaces. In fact, every name in a given schema belongs to some namespace. The names of namespaces can be fairly long, but they can be abbreviated with xmlns declarations in the XML Schema document.
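A small Python sketch may help show how namespaces keep identically named elements apart. It uses the standard xml.etree.ElementTree module; both namespace names and the sample values are hypothetical, and, as the text says, the URL-style names are only identifiers, not file locations.

```python
import xml.etree.ElementTree as ET

# Two parties both define an element called ProductID.
# The namespace names below are invented for this sketch.
DOC = """<order xmlns:acc="http://example.com/accounting"
       xmlns:part="http://example.com/parts">
  <acc:ProductID>J123456</acc:ProductID>
  <part:ProductID>glue-7</part:ProductID>
</order>"""

root = ET.fromstring(DOC)

# After parsing, each name carries its namespace, so the two
# ProductID elements are no longer ambiguous.
qualified_tags = [child.tag for child in root]

# Look up one of them through a prefix-to-namespace mapping.
ns = {"acc": "http://example.com/accounting"}
accounting_id = root.find("acc:ProductID", ns).text

print(qualified_tags, accounting_id)
```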

Namespace (cont.)
- Target namespace: the names defined in a schema belong to it.
- Source namespace: a namespace other than the target whose names are referred to by definitions and declarations in the schema.
Eg: In the following piece of schema,
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ProductCode'/>
<simpleType name='ProductCode' base='string'>
  <pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
InvoiceNo, ProductID & ProductCode belong to the target namespace, which can be assigned an arbitrary name that follows URL syntax.

Namespace (cont.)
Eg: with namespaces (fragment code 1)
<xsd:schema targetNamespace=' xmlns:xsd=' xmlns:ACC= '
<xsd:element name='InvoiceNo' type='xsd:positive-integer'/>
<xsd:element name='ProductID' type='ACC:ProductCode'/>
<xsd:simpleType name='ProductCode' base='xsd:string'>
  <xsd:pattern value='[A-Z]{1}\d{6}'/>
</xsd:simpleType>
Fragment code 1 does not need to specify the locations of source schema files. For the overall "schema of schemas", from which the basic elements are derived, no location is needed because it is well known. For the ACC source namespace, no location is needed either, since it happens to be the target namespace that is being defined in this file.

Namespace (cont.)
Eg: with namespaces (fragment code 1)
In this example:
- Target namespace: contains InvoiceNo, ProductID & ProductCode.
- Source namespace: contains schema, element, simpleType, pattern, string & positive-integer. This source namespace has been abbreviated as 'xsd' through an 'xmlns' declaration.

Namespace (cont.)
Eg: with multiple source namespaces (fragment code 2)
<schema targetNamespace=' xmlns=' xmlns:ACC= ' xmlns:PART= '
<import namespace=' schemaLocation='
<element name='InvoiceNo' type='positive-integer'/>
<element name='ProductID' type='ACC:ProductCode'/>
<simpleType name='ProductCode' base='string'>
  <pattern value='[A-Z]{1}\d{6}'/>
</simpleType>
<element name='stickyGlue' type='PART:SuperGlueType'/>
Fragment code 2 contains one more namespace reference, bound to the prefix PART. This namespace is different from the target namespace and the standard namespaces. As a result, it must be imported using the import element, whose schemaLocation attribute specifies the location of the file that contains the schema.
The default namespace is the one whose xmlns declaration does not have a prefix. Every unqualified name, such as schema and element, belongs to the default namespace. If our schema refers to several names from one namespace, it is more convenient to designate that namespace as the default.
An XML instance document may refer to names of elements from multiple namespaces that are defined in multiple schemas. To refer to and abbreviate the name of a namespace, we can again use xmlns declarations. We use the schemaLocation attribute from the XML Schema instance namespace to specify the file locations.
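The effect of a default namespace can be observed with a short Python sketch. The namespace names in the slides are not preserved, so two stand-ins are assumed here: the XML Schema namespace name of the final 2001 Recommendation, and an invented target namespace. The helper `resolve` is hypothetical; it mimics how a prefixed or unprefixed name in an attribute value such as type is expanded.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"   # assumed: namespace of the 2001 Recommendation
TARGET_NS = "http://example.com/accounting"   # hypothetical target namespace

SCHEMA = f"""<schema xmlns="{XSD_NS}"
        xmlns:ACC="{TARGET_NS}"
        targetNamespace="{TARGET_NS}">
  <element name="ProductID" type="ACC:ProductCode"/>
</schema>"""

root = ET.fromstring(SCHEMA)
element_decl = root[0]


def resolve(qname, prefixes, default_ns):
    """Expand 'prefix:local' into (namespace, local); unprefixed names use the default."""
    prefix, _, local = qname.rpartition(":")
    return (prefixes[prefix] if prefix else default_ns, local)


prefixes = {"ACC": TARGET_NS}
print(root.tag)
print(resolve(element_decl.get("type"), prefixes, XSD_NS))
print(resolve("string", prefixes, XSD_NS))
```

The unprefixed element names schema and element end up in the default namespace, while ACC:ProductCode resolves to the target namespace.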

Elements, Attributes & Content Model
Element: has a name and a content model (defined by its type).
Type:
- Simple: cannot have elements or attributes in its value
- Complex: can embed elements and carry attributes
There is a major distinction between definitions, which create new types (both simple and complex), and declarations, which enable elements and attributes with specific names and types (both simple and complex) to appear in document instances.
Eg: elements of simple type,
<element name='age' type='integer'/>
<element name='price' type='decimal'/>
The XML Schema spec has predefined simple types. These include string, byte, integer, int, long, short, float, double, boolean, time, date, duration, name, language, ENTITY, NOTATION, NMTOKEN and many more. A derived simple type constrains the values of its base type.
In the example above, we have declared two elements, 'age' and 'price', whose types are the simple types integer and decimal respectively. Now, if we try adding an attribute 'currency' to the simple element price, we can't: an element of a simple type cannot have an attribute. So if we want to add an attribute, we must define price as a complex type.

Elements, Attributes & Content Model (cont.)
Complex type:
Eg:
<element name='price'>
  <complexType base='decimal' derivedBy='extension'>
    <attribute name='currency' type='string'/>
  </complexType>
</element>
In an XML instance document, we can write:
<price currency='US'>45.50</price>
In an XML document, an element may embed other elements. In a DTD, this requirement is expressed directly in the element declaration. XML Schema instead defines an element, which has a type, and that type can contain declarations of other elements and attributes. In our example we have defined what is called an anonymous type, where no explicit name is given to the complex type. In other words, the name attribute of the complexType element is not defined.
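Seen from a consuming application, an element of this complex type carries a decimal value plus one attribute. A minimal Python sketch, using only the standard library and the instance shown above:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

price_el = ET.fromstring("<price currency='US'>45.50</price>")

amount = Decimal(price_el.text)       # the simple (decimal) content
currency = price_el.get("currency")   # the attribute added by the extension
child_count = len(price_el)           # no child elements are embedded

print(amount, currency, child_count)
```

Decimal is used rather than float so that the lexical value 45.50 is kept exactly.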

Elements, Attributes & Content Model (cont.)
A comparison of complex data types in DTD and XML Schema:
XML document,
<Book>
  <Title>Cool XML</Title>
  <Author>Cool Guy</Author>
</Book>
DTD,
<!ELEMENT Book (Title, Author)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
XML Schema,
<element name='Book' type='BookType'/>
<complexType name='BookType'>
  <element name='Title' type='string'/>
  <element name='Author' type='string'/>
</complexType>
For the XML document, the DTD describes three elements: Book, Title & Author. The XML Schema declares an element Book and gives it a complex type, BookType. The complexType definition is more precise, in the sense that its element types are declared as string rather than just #PCDATA. Also, in the DTD all elements are global, whereas the XML Schema shown allows Title and Author to be declared locally, so that they occur only within the element Book. To give the elements Title & Author global scope, they must be declared outside the complexType definition and referenced from within it. The advantage of this is that, once declared, a global element or a global attribute can be referenced in one or more declarations using the 'ref' attribute.

Elements, Attributes & Content Model (cont.)
Constraints: XML Schema offers greater flexibility than DTD for expressing constraints on the content model of elements. Two constraints that are predefined in XML Schema are minOccurs and maxOccurs. Let's study this with an example:
<element name='Title' type='string'/>
<element name='Author' type='string'/>
<element name='Book'>
  <complexType>
    <element ref='Title' minOccurs='0'/>
    <element ref='Author' maxOccurs='2'/>
  </complexType>
</element>
At the simplest level, as in a DTD, you can associate attributes with an element declaration and indicate that a sequence of exactly one (1), zero or more (*), or one or more (+) elements from a given set of elements can occur in it. You can express additional constraints in XML Schema using, for example, the minOccurs and maxOccurs attributes of element.
An element is required to appear when the value of minOccurs is 1 or more. The maximum number of times an element may appear is determined by the value of the maxOccurs attribute in its declaration. This may be a positive integer such as 41, or the term unbounded to indicate that there is no maximum number of occurrences. The default value for both minOccurs and maxOccurs is 1. Thus, when an element is declared without a maxOccurs attribute, it may not occur more than once. But we must be sure that if we specify a value for only the minOccurs attribute, it is less than or equal to the default value of maxOccurs, i.e. it is 0 or 1. Similarly, if we specify a value for only the maxOccurs attribute, it must be greater than or equal to the default value of minOccurs, i.e. 1 or more. If both attributes are omitted, the element must appear exactly once.
In our example above, the occurrence of Title is optional in Book. However, there must be at least one, but no more than two, Author elements in Book.
The choice element allows only one of its children to appear in an instance.
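The occurrence rules for Book can be sketched as a simple counting check in Python. This is not a schema validator: it looks only at how often each child occurs, not at the order of the children, and the function and table names are invented for the example.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) for each child of Book, as in the schema fragment above.
# An omitted attribute falls back to the default of 1; None would stand for "unbounded".
BOOK_RULES = {"Title": (0, 1), "Author": (1, 2)}


def occurrence_errors(book, rules=BOOK_RULES):
    """Return messages for children that occur too rarely or too often."""
    errors = []
    for name, (min_occurs, max_occurs) in rules.items():
        count = len(book.findall(name))
        if count < min_occurs:
            errors.append(f"{name}: {count} < minOccurs {min_occurs}")
        if max_occurs is not None and count > max_occurs:
            errors.append(f"{name}: {count} > maxOccurs {max_occurs}")
    return errors


ok = ET.fromstring("<Book><Author>A</Author><Author>B</Author></Book>")
no_author = ET.fromstring("<Book><Title>Cool XML</Title></Book>")
too_many = ET.fromstring(
    "<Book><Author>A</Author><Author>B</Author><Author>C</Author></Book>"
)

for book in (ok, no_author, too_many):
    print(occurrence_errors(book))
```

A Book without a Title passes, because minOccurs is 0, while a Book without any Author or with three Authors is reported.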

Delving into Simple Types
Simple types like 'string' & 'number' are built into XML Schema, while others are derived from the built-ins. New simple types can be defined by restricting an existing simple type.
Eg: a new integer type whose value ranges from 10000 to 99999,
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>
We use the simpleType element to define and name the new simple type. We use the 'restriction' element to indicate the existing (base) type, and to identify the 'facets' that constrain the range of values. To define myInteger, we restrict the range of the 'integer' base type by employing two facets, minInclusive and maxInclusive.
XML Schema defines fifteen facets. Here are some: length, minLength, maxLength, pattern, enumeration, maxInclusive, maxExclusive, minInclusive, minExclusive.
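The restriction on myInteger amounts to a lexical check followed by a range check. A minimal Python sketch, taking the bounds from the minInclusive and maxInclusive facets in the schema fragment (the function name is made up for this example):

```python
import re

MIN_INCLUSIVE = 10000   # from <xsd:minInclusive value="10000"/>
MAX_INCLUSIVE = 99999   # from <xsd:maxInclusive value="99999"/>


def is_my_integer(text: str) -> bool:
    """True if text is an optionally signed run of digits within the inclusive bounds."""
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False    # not an integer at all (base type check)
    return MIN_INCLUSIVE <= int(text) <= MAX_INCLUSIVE


print([is_my_integer(t) for t in ("10000", "99999", "9999", "100000", "12.5")])
```

Both bounds are themselves valid values, which is what "inclusive" means in the facet names.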

Delving into Simple Types (cont.)
List type: XML Schema has the concept of a list type. List types are comprised of sequences of atomic types (integer, string, etc.); consequently the parts of a sequence are themselves meaningful, and a list value can be divided into them. XML Schema has three built-in list types: NMTOKENS, IDREFS, and ENTITIES.
New list types can be created by derivation from atomic types,
<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="integer"/>
</xsd:simpleType>
And an element that conforms to listOfMyIntType is:
<listOfMyInt> </listOfMyInt>
Several facets can be applied to list types: length, minLength, maxLength and enumeration. For example, to define a list of exactly six integers, we can use the 'length' facet, restricting the size to exactly six items. Elements of such a type must have six items, and each of the six items must be one of the (atomic) values of the type 'integer'.
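In an instance, a list value is written as whitespace-separated items. The following Python sketch splits such a value into integers and optionally enforces a length facet; the function name and the sample values are invented for illustration and do not come from the slides.

```python
def parse_int_list(text: str, length=None):
    """Split a list-type value on whitespace and convert each item with int().

    Raises ValueError if an item is not an integer or if the optional
    length facet (an exact item count) is violated.
    """
    items = [int(token) for token in text.split()]
    if length is not None and len(items) != length:
        raise ValueError(f"expected {length} items, got {len(items)}")
    return items


six = parse_int_list("1 2 3 5 8 13", length=6)
print(six)
```

Because each item is an atomic value in its own right, the application can work with the individual integers rather than with one opaque string.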

Delving into Simple Types (cont.)
Union type: this type allows an element or attribute value to be one or more instances of one type drawn from the union of multiple atomic and list types.
Eg: this example creates a union type for representing American states as letter abbreviations (string) or as lists of numeric codes. The zipUnion union type is built from one atomic type and one list type,
<xsd:simpleType name="zipUnion">
  <xsd:union memberTypes="string listOfMyIntType"/>
</xsd:simpleType>
Some valid instances of an element, say 'state', of type 'zipUnion' are:
<state>WV</state>
<state> </state>
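A union value is accepted if it fits any one of the member types. The Python sketch below illustrates the idea under one loud assumption: the string member is narrowed to a two-letter capital abbreviation such as WV, because a plain string member would accept every value and leave nothing to show. The function name and the numeric sample are hypothetical.

```python
import re


def parse_zip_union(text: str):
    """Return ("abbreviation", str) or ("codes", [int, ...]); raise ValueError otherwise."""
    value = text.strip()
    # Assumed shape of the string member: exactly two capital letters.
    if re.fullmatch(r"[A-Z]{2}", value):
        return ("abbreviation", value)
    # List member: one or more whitespace-separated runs of digits.
    tokens = value.split()
    if tokens and all(re.fullmatch(r"[0-9]+", t) for t in tokens):
        return ("codes", [int(t) for t in tokens])
    raise ValueError(f"not a zipUnion value: {text!r}")


print(parse_zip_union("WV"))
print(parse_zip_union("15 7 22"))
```

The value is always an instance of exactly one member type; mixing an abbreviation and codes in the same value is rejected.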

Element Content
Elements can contain:
- other elements
- only a simple type of value
- attributes as well as other elements
- attributes, but only a simple type of value
- other elements + character content
- no content
Here the first three are basic content, which is supported by DTDs as well. The last three are not supported by DTDs. These types are supported by Schema by deriving one type from another.

Element Content (cont.)
With attribute & simple value: we are trying to define a schema that will support this,
<internationalPrice currency="EUR">423.46</internationalPrice>
Solution,
<xsd:element name="internationalPrice">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:number">
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
The starting point is a schema that declares an 'internationalPrice' element of a simple type:
<xsd:element name="internationalPrice" type="number"/>
Now we need to add an attribute to this element, but simple types cannot have attributes. Therefore, we must define a complex type to carry the attribute declaration. We also want the content to be of the simple type number. So we should derive a new complex type from the simple type number, and that is what is done in the solution. We use the complexType element to start the definition of a new type. To indicate that the content model of the new type contains only character data and no elements, we use a simpleContent element. Finally, we derive the new type by extending the simple number type. The extension consists of adding a currency attribute using a standard attribute declaration.

Element Content (cont.)
Sub-elements + character content: construct a schema where character data can appear alongside sub-elements, and where character data is not confined to the deepest sub-element. We are trying to define this,
<letterBody>
  <salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
</letterBody>
Solution: the elements appearing in the instance are declared normally. To enable character data to appear between the child elements of letterBody, the 'mixed' attribute on the type definition is set to true. Under the XML Schema mixed model, the order and number of child elements appearing in an instance must agree with the order and number of child elements specified in the model.
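To see what mixed content looks like to an application, the following Python sketch parses the letterBody instance with the standard xml.etree.ElementTree module. In that API, character data before a child element is held in the parent's text, and character data following a child's end tag is held in that child's tail.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<letterBody>"
    "<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>"
    "</letterBody>"
)

salutation = ET.fromstring(DOC).find("salutation")
name = salutation.find("name")

before = salutation.text   # character data before the child element
inside = name.text         # character data inside the child element
after = name.tail          # character data after the child's end tag
full_text = "".join(salutation.itertext())

print(repr(before), repr(inside), repr(after), repr(full_text))
```

The character data is spread around the name element rather than confined to it, which is exactly the situation the mixed model is meant to permit.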

Annotations
We can annotate a schema for the benefit of both human readers and applications. The elements available for annotation are annotation, documentation & appInfo.
Eg:
<xsd:annotation>
  <xsd:documentation xml:lang="en">
    info to user goes here
  </xsd:documentation>
</xsd:annotation>
The documentation element is the recommended location for human-readable material. The 'xml:lang' attribute on 'documentation' is used to indicate the language of the information. The appInfo element can be used to provide information for tools, stylesheets and other applications. Both 'documentation' and 'appInfo' appear as subelements of annotation, which may itself appear at the beginning of most schema constructions, for example at the beginning of an element declaration or a complex type definition.
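Because a schema is itself XML, a tool can pull the annotations out with an ordinary XML parser. A minimal Python sketch follows; the enclosing xsd:schema element and its namespace name (that of the 2001 Recommendation) are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"               # assumed namespace name
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"   # how xml:lang appears after parsing

SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      info to user goes here
    </xsd:documentation>
  </xsd:annotation>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# Collect (language, text) for every documentation element in the schema.
docs = [
    (doc.get(XML_LANG), doc.text.strip())
    for doc in root.iter(f"{{{XSD_NS}}}documentation")
]

print(docs)
```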

Content Models
The content model for an <elementType> can be specified by using <element> to refer to other elementTypes. XML Schema enables a group of elements to be defined and named, so that the group can be used to build up the content models of complex types. To illustrate, we take a PurchaseOrderType definition, which gives information about the shipping and billing addresses for a purchase. We use two groups so that purchase orders may contain either separate shipping and billing addresses, or a single address for those cases in which both addresses are the same.
The definitions of complex types in any schema declare sequences of elements that must appear in the instance document. The occurrence of individual elements declared in the so-called content models of these types may be optional, as indicated by a 0 value for the attribute 'minOccurs', or otherwise constrained depending upon the values of 'minOccurs' and 'maxOccurs'. XML Schema also provides constraints that apply to groups of elements appearing in a content model.

Content Models (cont.)
<xsd:complexType name="PurchaseOrderType">
  <xsd:sequence>
    <xsd:choice>
      <xsd:group ref="shipAndBill"/>
      <xsd:element name="singleUSAddress" type="USAddress"/>
    </xsd:choice>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:sequence>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
<xsd:group name="shipAndBill">
  <xsd:sequence>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
  </xsd:sequence>
</xsd:group>
The 'choice' group element allows only one of its children to appear in an instance. One child is an inner 'group' element that references the named group shipAndBill, consisting of the element sequence shipTo, billTo; the second child is a singleUSAddress. Hence, in an instance document, the purchaseOrder element must contain either a shipTo element followed by a billTo element, or a singleUSAddress element. Here PurchaseOrder, USAddress and comment are assumed to be previously defined.
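The content model of PurchaseOrderType can be read as a small procedure over the child element names of an instance. The Python sketch below follows that reading; it checks only names and order, not the types of the children, and the function name is invented for the example.

```python
def valid_purchase_order(children):
    """Check a list of child element names (in document order) against the model."""
    rest = list(children)
    # xsd:choice: either the shipAndBill group (shipTo, billTo) or singleUSAddress.
    if rest[:2] == ["shipTo", "billTo"]:
        rest = rest[2:]
    elif rest[:1] == ["singleUSAddress"]:
        rest = rest[1:]
    else:
        return False
    # comment has minOccurs="0", so it may be absent.
    if rest[:1] == ["comment"]:
        rest = rest[1:]
    # items is required and closes the sequence.
    return rest == ["items"]


print(valid_purchase_order(["shipTo", "billTo", "comment", "items"]))
print(valid_purchase_order(["singleUSAddress", "items"]))
print(valid_purchase_order(["shipTo", "items"]))
```

An order that supplies both the address pair and a singleUSAddress is rejected, since the choice permits only one of its branches.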

Attribute Groups
Attributes are used to provide information about each item by adding attribute declarations to the element type definition. We can also create a named attribute group containing all the desired attributes of an element, and reference this group by name in the element declaration.

Extensibility
XML Schema provides three types of element models with regard to extensibility:
- In the open model, the content and attributes that have been declared for the element are required, but other content and attributes can be present. Authors can add their own attributes and elements to XML documents.
- The refinable model requires the content and attributes that have been declared for the element, and allows for those that have been explicitly declared in the refined sub-types.
- The closed model reflects the status quo of DTD, where additional child elements and attributes not in the element declaration are not allowed to be present.
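The difference between the open and the closed model can be sketched as a membership check on child element names. This Python sketch covers only those two models as described above (the refinable model is left out), treats every declared child as required, and uses invented names throughout.

```python
def conforms(children, declared, model):
    """Check child element names against a declared set under a given model.

    model is "open" (undeclared extras tolerated) or "closed" (extras rejected).
    In both cases every declared name must be present.
    """
    if model not in ("open", "closed"):
        raise ValueError(f"unknown model: {model!r}")
    present = set(children)
    if not set(declared) <= present:
        return False                  # something declared is missing
    extras = present - set(declared)
    return model == "open" or not extras


declared = ["title", "author"]
print(conforms(["title", "author", "note"], declared, "open"))
print(conforms(["title", "author", "note"], declared, "closed"))
```

The extra note element is tolerated under the open model and rejected under the closed one.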

Summary
XML Schema is a new language to describe the content & structure of XML documents.
In addition to all the capabilities of DTD, it provides:
- Built-in data types as well as user-defined data types
- Element occurrence constraints
- Export / import mechanism for schema constructs
- Extensibility
- Refinement, where elements can use the schema constraints of other elements

History
- Work started early in 1998
- Requirements document by Feb 1999
- Working draft by May 1999
- The latest Proposed Recommendation (standard) was submitted as recently as Feb 2001
- Most browsers, with the exception of IE, do not support XML Schema
- The XML Schema implementation provided with Internet Explorer 5 focuses on syntactic schemas, without support for inheritance or other object-oriented design features
