XML I
Learning Objectives: What is XML; Features of XML; Uses of XML; Structure of an XML document; Document Type Declaration; Document Type Definitions (DTDs)
What is XML?
XML stands for Extensible Markup Language. It is NOT a version of HTML. It is derived from SGML (Standard Generalized Markup Language), which was established in 1986 as a standard for generalized electronic document exchange. XML has 3 main features: structure, extensibility and validation. It defines a framework for transmitting structured data, so an XML document is essentially a structured document for storing information. It allows the creation of custom mark-up tags for describing virtually anything. XML documents are processed by an XML processor.
Uses of XML
XML is used to store structured data and to exchange it between the applications that form the core of a system. Examples of XML applications are Chemical Markup Language (CML), Extensible Financial Reporting Markup Language (XFRML), and Mathematical Markup Language (MathML). It is used in e-commerce to store and transmit product and other data, including financial information. It is used in Open Financial Exchange (OFX). It is used in search engines to store and search data, and it is applied in virtually every sector.
By including or referencing a Document Type Definition (DTD), XML documents can be validated.
XML Syntax Fundamentals
XML syntax describes the constructs used to define the structure and layout of an XML document, as well as the constraints involved. An XML processor is a software module that reads an XML document and provides access to its content and structure. XML processors typically process documents on behalf of applications, and are readily available as software plug-ins. Internet Explorer 5.0 is an example of an application that processes and displays XML documents.
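As a minimal sketch of what "provides access to its content and structure" means, the snippet below uses Python's standard `xml.etree.ElementTree` module as the XML processor. The document and its element names (`contact`, `name`) are assumptions made for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the child element names are illustrative only.
DOC = "<addressbook><contact><name>Tony Benn</name></contact></addressbook>"

root = ET.fromstring(DOC)                    # the processor reads the document
root_tag = root.tag                          # structure: name of the root element
names = [n.text for n in root.iter("name")]  # content: character data of <name>

print(root_tag, names)
```

The application never touches the raw angle brackets; it only sees the tree the processor builds.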
Entity: The basic building block of an XML document. It contains either parsed or unparsed data. Parsed data consists of characters that are treated as character data or mark-up and are processed by an XML processor. Unparsed data is handled as raw text and is not processed. E.g. in <name>John</name>, <name> and </name> are mark-up, while John is character data.
Markup: Used to describe a document's storage structure (entities) and logical structure (elements).
Elements: Describe the logical structure. They have start tags, e.g. <name>, and end tags (</name>), or a single empty tag (<name/>).
XML mark-up components include:
1. Tags: The most obvious component of XML syntax, used to describe elements.
2. Processing instructions: Passed by the parser to the application. They begin with <? and end with ?>. E.g. <?xml version="1.0"?> indicates that the document is based on XML version 1.0 (strictly, this line is the XML declaration, which uses the same syntax as a processing instruction).
3. Document type declarations: Used to specify information about the document, including the document's root element and the Document Type Definition (DTD). The declaration must appear after the XML declaration but before the root element, e.g.
<?xml version="1.0"?>
<!DOCTYPE addressbook SYSTEM "Addressbook.dtd">
<addressbook>
Here addressbook declared in line 2 must correspond to <addressbook> in line 3, the root element of the document.
4. Entity references: Used to assign aliases to pieces of data. They are written between an ampersand (&) and a semicolon (;). E.g. &apos; corresponds to an apostrophe (') while &amp; corresponds to &.
5. Comments: Used to present information that is technically not part of the document's content. They begin with <!-- and end with -->.
6. Marked (CDATA) sections: Used to block off text that is to be passed over by the parser. A section is defined by enclosing the text within <![CDATA[ and ]]>. E.g. <![CDATA[<name>John</name>]]>. In this example, the name element is not recognized as mark-up and John is not recognized as parsed character data. It is common to use CDATA sections to quote a piece of XML code, e.g. in a tutorial.
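The behaviour of entity references, comments and CDATA sections can be observed with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser with default settings; the `note`, `owner` and `sample` element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document exercising components 4 to 6 of the list above.
DOC = """<note>
  <!-- a comment: not part of the document's content -->
  <owner>Tony&apos;s &amp; Peter&apos;s</owner>
  <sample><![CDATA[<name>John</name>]]></sample>
</note>"""

root = ET.fromstring(DOC)
owner = root.find("owner").text   # entity references are replaced by the parser
sample = root.find("sample")      # CDATA content arrives as plain text

print(owner)                      # Tony's & Peter's
print(sample.text, len(sample))   # <name>John</name> 0
print(len(root))                  # 2 (the comment is dropped by the default parser)
```

Because the CDATA section is not parsed as mark-up, `sample` has no child elements: `<name>John</name>` is just a string.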
Styling XML for display
Accomplished in 2 ways: with CSS, or with XSL, which is more complex and advanced than CSS.
Parsing XML
Parsers can be validating or non-validating. Validating parsers validate XML documents against a DTD or XML Schema. Examples of XML parsers are the Lark and Larval XML parsers for Java, Sun's Project X Parser for Java, IBM's XML Parser for Java, the Oracle XML parser for Java, and IBM's XML Parser for C++.
Example of an XML Document
Tony Benn 210 Temple road London NW9 0RT 02082049565
Peter Bloggs 230 The Vale London NW6 2BT 02082029517
The above example is a well-formed XML document used to store contact information. However, it is not yet valid. Note that the root element (<addressbook>) has nested child elements, each delimited by an opening and a closing tag.
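To make "well-formed" concrete, here is a small sketch that asks a non-validating parser (Python's standard `xml.etree.ElementTree`) whether a string parses at all. Only the root name `addressbook` comes from the slides; the child element names `contact`, `name` and `phone` are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: every opening tag is closed in the right order.
GOOD = ("<addressbook><contact><name>Tony Benn</name>"
        "<phone>02082049565</phone></contact></addressbook>")
# Improperly nested: <name> is still open when </contact> appears.
BAD = "<addressbook><contact><name>Tony Benn</contact></name></addressbook>"

print(is_well_formed(GOOD), is_well_formed(BAD))  # True False
```

Passing this check says nothing about validity: a well-formed document only becomes valid once it is checked against a data model such as a DTD.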
XML Data Modelling
Involves describing the structure of XML documents for the purpose of validation. After defining a data model, you can create structured XML documents that must adhere to that model to be valid.
Valid vs well-formed XML: It is perfectly legal to create an XML document without a data model, in which case the document can be well-formed but is not valid.
There are 2 approaches to creating data models: DTDs (Document Type Definitions) and XML Schemas. The data model (DTD or XML Schema) defines the arrangement of mark-up and character data within a valid XML document, i.e. the order and nesting of the elements.
Modelling Data with DTDs
DTDs (Document Type Definitions) rely on a specialized syntax for describing the structure of an XML vocabulary (class of document). A DTD can be broken down into 2 subsets:
Internal or local DTD: Mark-up declarations contained in the prolog (the section of the document preceding the root element) of the same document.
External DTD: External mark-up declarations that can be referenced by one or more documents.
The 2 subsets may be combined, with the internal subset having higher precedence. The DTD declares every element, attribute and entity used in the XML document. It must be declared or referenced in the document type declaration.
Example: Addressbook.dtd Tony Benn 210 Temple road London NW9 0RT 02082049565
Document type declaration syntax: <!DOCTYPE rootElem ExtDTDRef [InternalDTDDecl]> where rootElem is the root element, ExtDTDRef is the external DTD reference, and InternalDTDDecl is the internal DTD declaration.
Illustration: Lord of the rings
External DTDs are more commonly used, and are especially useful when you are creating multiple documents of the same class, when you would like to use an existing DTD, or when you want to make your document as concise as possible.
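As a sketch of a document type declaration that carries an internal DTD subset, the snippet below declares two elements and one entity in the prolog and then parses the document with Python's standard `xml.etree.ElementTree`. The `city` element and the `ldn` entity are assumptions made for the example. The standard-library parser is non-validating, so it reads the declarations (and expands the entity) but does not check the document against them.

```python
import xml.etree.ElementTree as ET

# rootElem is addressbook; there is no external reference; the text between
# [ and ] is the internal DTD subset. All names below are hypothetical.
DOC = """<?xml version="1.0"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (city)>
  <!ELEMENT city (#PCDATA)>
  <!ENTITY ldn "London">
]>
<addressbook><city>&ldn;</city></addressbook>"""

root = ET.fromstring(DOC)
city = root.find("city").text  # the entity declared in the internal subset is expanded

print(root.tag, city)          # addressbook London
```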
Internal DTDs are preferable in situations where you're creating only one document, or to reduce the overhead associated with your documents.
Elements and Attributes
The primary contents described in a DTD are elements and attributes. Think of an element as a logical unit of information, and an attribute as a characteristic of that information. By looking at a document as a group of information objects, it is usually possible to associate each object with an element. Any leftover information would usually be represented as attributes. Another approach is to consider the type of information and how it will be used.
Attributes provide tighter constraints on information, while elements are very loosely constrained and are better suited to long strings of text. Attributes can be constrained against a predefined list of values, and can have default values. Attributes are very concise and are easier to parse. However, they cannot contain nested information.
Elements
Declared with element declarations in the DTD. Syntax: <!ELEMENT ElementName Type> where ElementName corresponds to the tag used to mark up that element in the XML document, and Type specifies the content. 4 types are supported in XML:
1. Empty type: The element doesn't contain any content, but may carry attributes. In the DTD, it is declared in the form <!ELEMENT ElementName EMPTY>. Empty elements are written in the XML document in 2 ways: a) as a start tag and an end tag with nothing in between, e.g. <ElementName></ElementName>; b) with an empty tag, e.g. <ElementName/> or <ElementName />.
2. Element-only type: The element contains only child elements. It is declared in the form <!ELEMENT ElementName ContentModel>. The content model is specified using a combination of special element declaration symbols and child element names. The symbols represent the relationship of the child to the container element.
Table of Special Symbols
Symbol: Usage
Parentheses ( ): Enclose a sequence or choice group of child elements
Comma (,): Separates the items in a sequence and establishes the order in which they must appear
Pipe (|): Separates the items in a choice group of elements
No symbol: The child element must appear exactly once
Question mark (?): The child element must appear exactly once or not at all
Asterisk (*): The child element can appear any number of times, including not at all
Plus sign (+): The child element must appear at least once
Example:
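The special symbols behave much like regular-expression operators, which suggests a rough way to try them out. The sketch below is an illustration only, not how a validating parser is implemented: it translates an element-only content model into a Python regular expression over the sequence of child tag names. The helper names `model_to_regex` and `matches`, and the `contact` content model, are hypothetical. It handles only element names and the symbols in the table (no #PCDATA).

```python
import re
import xml.etree.ElementTree as ET

# A token is either an element name or one of the special symbols.
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")

def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only content model into a regex over 'tag tag ... '."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")          # group, same role as DTD parentheses
        elif tok == ",":
            continue                     # sequence is plain concatenation
        elif tok in ")|?*+":
            parts.append(tok)            # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one child element
    return re.compile("".join(parts))

def matches(elem: ET.Element, model: str) -> bool:
    """Check the order and count of elem's direct children against the model."""
    children = "".join(child.tag + " " for child in elem)
    return model_to_regex(model).fullmatch(children) is not None

MODEL = "(name, address+, phone?)"  # hypothetical model for a contact element
ok = ET.fromstring("<contact><name/><address/><address/><phone/></contact>")
missing = ET.fromstring("<contact><name/><phone/></contact>")

print(matches(ok, MODEL), matches(missing, MODEL))  # True False
```

The second element fails because `address+` requires at least one address, while `phone?` would have allowed the phone to be left out.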
Mixed Elements
Contain both character data and child elements. They take the following form: <!ELEMENT ElementName (#PCDATA | ChildName1 | ChildName2)*>. The simplest mixed element is one declared to contain only character data, e.g. <!ELEMENT ElementName (#PCDATA)>.
ANY Elements
The ANY element, so named because it is declared with the keyword ANY, can contain any type of element, or a combination of elements. Due to its lack of structure, you should avoid using it. It is typically used during development of a DTD, but should not appear in a production DTD. Form: <!ELEMENT ElementName ANY>
Attributes
Used to specify additional information about elements. Within an element, attributes form name/value pairs that describe a particular property of the element. They are declared in a DTD with attribute list declarations, which take the form: <!ATTLIST ElementName AttrName AttrType Default>
There are 4 kinds of default that can be specified:
#REQUIRED: The attribute is required
#IMPLIED: The attribute is optional
#FIXED value: The attribute has a fixed value
default: The default value of the attribute
#REQUIRED means that you must define the attribute whenever you use the element.
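The effect of the different defaults can be seen with a short sketch. It relies on the behaviour of Python's standard `xml.etree.ElementTree` parser (built on Expat), which reads attribute list declarations in the internal DTD subset and supplies declared default and fixed values for attributes the document omits. It is non-validating, so a missing #REQUIRED attribute would not be reported. The attribute names other than `team`, and all the values, are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute list for a player element, one attribute per default kind.
DOC = """<!DOCTYPE player [
  <!ELEMENT player (#PCDATA)>
  <!ATTLIST player
      team     CDATA #REQUIRED
      nickname CDATA #IMPLIED
      sport    CDATA #FIXED "football"
      position CDATA "midfield">
]>
<player team="Rovers">J. Smith</player>"""

attrs = dict(ET.fromstring(DOC).attrib)

print(attrs["team"])        # Rovers   (given explicitly in the document)
print(attrs["sport"])       # football (supplied from the #FIXED declaration)
print(attrs["position"])    # midfield (supplied from the declared default)
print("nickname" in attrs)  # False    (#IMPLIED: optional, no value supplied)
```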
Attribute Type
Must be specified, in addition to the attribute default. XML supports 10 attribute types:
CDATA: Character data (a plain string)
Enumerated: A series of string values
NOTATION: A notation declared somewhere else in the DTD
ENTITY: An external binary entity
ENTITIES: Multiple external binary entities separated by whitespace
ID: A unique identifier
IDREF: A reference to an ID declared somewhere else in the document
IDREFS: Multiple references to IDs declared somewhere else in the document
NMTOKEN: A name consisting of XML token characters (letters, numbers, periods, dashes, colons and underscores)
NMTOKENS: Multiple names consisting of XML token characters
String Attributes
The most commonly used attribute type. Example: <!ATTLIST player team CDATA #REQUIRED> In the above example, the team to which a player belongs is a required character data attribute that must be defined in the player element. <!ATTLIST player team CDATA #IMPLIED> would have made the team optional. Another example: