
1 XML I

2 XML Meta-language
HTML, an application of SGML (Standard Generalized Markup Language), and XML, a simplified subset of SGML, both use tags (markup) to represent information to both humans and machines, and are meant to work on any device and system. In contrast to HTML (Hypertext Markup Language), which provides a fixed set of predefined formatting tags (e.g., for font face, size, and color; lists) to display a document, XML (Extensible Markup Language) allows users to define their own tags to structure a document. XML separates a document’s content from its format through structural information: it describes all parts of the information (i.e., elements), defines their relationships, and constrains the values that they can take.

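3 Syntax
While some HTML tags do not need an end tag (e.g., the paragraph tag <p> does not need a closing </p> tag), every XML element must be closed, either with an end tag or, for an empty element, with the empty-element form. In addition, XML tags must be properly nested: an element opened inside another element must be closed before its enclosing element is closed. For example, marking up ‘Mississippi’ with two elements whose end tags cross (the outer element closed before the inner one) is invalid; the valid form closes the inner element first.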
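A minimal Python sketch of the end-tag and nesting rules, using the standard library's xml.etree.ElementTree parser. The River and Name tags are hypothetical stand-ins, since the slide's original tags did not survive in the transcript; the parser rejects crossed or unclosed tags and accepts properly nested ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names: the slide's original tags were lost.
crossed = "<River><Name>Mississippi</River></Name>"  # end tags cross: invalid
nested = "<River><Name>Mississippi</Name></River>"   # inner element closed first: valid
unclosed = "<Name>Mississippi"                       # missing end tag: invalid


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(crossed))   # False
print(is_well_formed(nested))    # True
print(is_well_formed(unclosed))  # False
```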

4 Root element
All elements of an XML document must be contained in a single root element, and the whole document is structured as a tree. Each nested element is a child of its enclosing (parent) element, which is in turn a child of its own parent (the grandparent), and so on up to the root. XML tag names are case sensitive, i.e., ocean and Ocean are considered to be two different tags.
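The root-element and case-sensitivity rules can be checked with a short Python sketch built on xml.etree.ElementTree. The Oceans and Ocean element names are hypothetical examples, not taken from the course material.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two top-level elements: there is no single root, so the parser rejects it.
two_roots = "<Ocean>Atlantic</Ocean><Ocean>Pacific</Ocean>"
# The same data wrapped in one root element (hypothetical name "Oceans").
one_root = "<Oceans><Ocean>Atlantic</Ocean><Ocean>Pacific</Ocean></Oceans>"
# Tag names are case sensitive, so <ocean> cannot be closed by </Ocean>.
case_mismatch = "<ocean>Atlantic</Ocean>"

print(is_well_formed(two_roots))      # False
print(is_well_formed(one_root))       # True
print(is_well_formed(case_mismatch))  # False

# Walk the tree: every child element hangs off the root.
root = ET.fromstring(one_root)
print(root.tag, [child.text for child in root])  # Oceans ['Atlantic', 'Pacific']

# "ocean" and "Ocean" are two different element types.
mixed = ET.fromstring("<Oceans><ocean/><Ocean/></Oceans>")
print(len(mixed.findall("Ocean")))  # 1
```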

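5 Aquifer is the root element
The Aquifer start tag declares xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and xsi:noNamespaceSchemaLocation="file:/C:/...XML_schemas/Aquifer.xsd", which points to the schema for the document. The document's content gives the aquifer's name, Floridan, and its type, Confined.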
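A hedged Python reconstruction of the Aquifer instance document. The Name and Type element names are assumptions (the slide's tags were lost), and the schema path is shortened to a placeholder "Aquifer.xsd". Note that xml.etree.ElementTree only reads the xsi attribute; Python's standard library does not validate documents against an XSD schema.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical reconstruction: Name/Type element names and the schema path are placeholders.
aquifer_xml = f"""<Aquifer xmlns:xsi="{XSI}"
    xsi:noNamespaceSchemaLocation="Aquifer.xsd">
  <Name>Floridan</Name>
  <Type>Confined</Type>
</Aquifer>"""

root = ET.fromstring(aquifer_xml)
print(root.tag)  # Aquifer

# Namespaced attributes are keyed as "{namespace-URI}localName".
print(root.get(f"{{{XSI}}}noNamespaceSchemaLocation"))  # Aquifer.xsd
print(root.findtext("Name"), root.findtext("Type"))     # Floridan Confined
```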

6 Markup Languages
XML is a standard meta-language in the sense that it allows a community of users to create a markup language for their domain of interest (e.g., a hydrogeology markup language or a structural geology markup language). These markup languages are domain-specific vocabularies that include terms (e.g., Fault, Mineral), organized into a structure based on their relationships as understood by the community, together with their permissible values. Besides making domain-specific markup languages possible, one main use of XML is automatic data exchange among different applications.

7 XML Applications
Whereas HTML is mostly about appearance for web publishing, XML concentrates on structuring the content of documents. There are two types of XML applications, based on purpose: document and data. A document application manipulates information for human consumption (e.g., for publishing). A data application manipulates information for automatic software processing. Because XML expresses the structure of a document, it can be automatically converted to different formats and delivered via a variety of media.

8 Standards
XML has many so-called companion standards. These include XSL (Extensible Stylesheet Language) and CSS (Cascading Style Sheets), two style sheet languages for XML: XSL, through its XSLT transformation language, converts XML into HTML or other formats, while CSS styles XML for display. These languages are used for rendering XML on different media (screen, paper). The Document Object Model (DOM) and the Simple API for XML (SAX) are APIs that browsers, editors, and other programs use to access XML documents. Other standards include XLink and XPointer, which allow relationships among documents. The Namespace standard provides a global scope for element names, allowing existing elements to be reused in new vocabularies without naming conflicts.
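To contrast the two access APIs, here is a small sketch using Python's standard-library xml.dom.minidom (a DOM implementation) and xml.sax (a SAX implementation). The Rocks/Rock document is a hypothetical example. DOM builds the whole tree in memory; SAX streams events to a handler and keeps no tree.

```python
import xml.sax
from xml.dom import minidom

# A tiny hypothetical document used by both APIs.
doc_text = "<Rocks><Rock>granite</Rock><Rock>basalt</Rock></Rocks>"

# DOM: the whole document is loaded into an in-memory tree that can be navigated.
dom = minidom.parseString(doc_text)
dom_names = [node.firstChild.data for node in dom.getElementsByTagName("Rock")]
print(dom_names)  # ['granite', 'basalt']


# SAX: the parser streams events (start tag, text, end tag) to a handler.
class RockCounter(xml.sax.ContentHandler):
    """Count Rock start tags as the parser reports them; no tree is built."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "Rock":
            self.count += 1


handler = RockCounter()
xml.sax.parseString(doc_text.encode("utf-8"), handler)
print(handler.count)  # 2
```

DOM suits editing and random access; SAX suits very large documents that do not fit comfortably in memory.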

9 XML Code
Although any text editor will do, an XML editor such as XML Spy or Oxygen makes writing XML code easier. These editors also allow you to write style sheets and use other standards. XMetal is the easiest one; it hides the tags and works like a word processor. Other editors include XML Notepad, which is freely available from msdn.microsoft.com, and XML Pro (www.vervet.com). To view XML code on the Web you need an XML-aware browser; many browsers support XML. As a standard, XML allows exchanging and publishing information by providing mechanisms to define the structure (syntax) of the content. Note that XML is about syntax and does not provide any semantics about the document. That is why RDF, RDFS, and OWL were created to convey the meaning of the contents.

10 Unicode
XML uses the Unicode standard for the characters in a document, so an XML document can contain the characters of all human (natural) languages. Unicode text is stored using an encoding such as UTF-8 or UTF-16: UTF-16 uses 16-bit units (one unit for most common characters, two for the rest), while UTF-8 uses one to four bytes per character. Both can encode every Unicode character, and every XML parser must accept both. By contrast, legacy 8-bit Windows character sets (e.g., Windows-1252) are limited to 256 characters. Unicode support is very important for information globalization, for example to write the name of an element (e.g., ocean) in an XML document in different natural languages (e.g., French, Chinese).
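A short Python sketch of the encoding sizes and of non-ASCII element names. The French and Chinese words are illustrative choices; the byte counts follow from the UTF-8 and UTF-16 definitions.

```python
import xml.etree.ElementTree as ET

word = "océan"  # French for "ocean"
print(len(word))                      # 5 characters
print(len(word.encode("utf-8")))      # 6 bytes: "é" needs 2 bytes in UTF-8
print(len(word.encode("utf-16-le")))  # 10 bytes: one 16-bit unit per character here

# A character outside the Basic Multilingual Plane needs two UTF-16 units.
print(len("\U00010400".encode("utf-16-le")))  # 4 bytes (a surrogate pair)

# Element names may use non-ASCII letters, e.g., French or Chinese.
fr = ET.fromstring("<océan>Atlantique</océan>")
zh = ET.fromstring("<海洋>太平洋</海洋>")
print(fr.tag, zh.tag)  # océan 海洋
```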

11 Latin-1 character set
Other encodings must be declared in the XML declaration. For example, the second attribute in the following XML declaration (what is between <? and ?>) allows the use of the Latin-1 character set for Western European languages: <?xml version="1.0" encoding="ISO-8859-1"?>
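A minimal Python sketch showing why the encoding attribute matters: the same Latin-1 bytes parse correctly when the declaration names ISO-8859-1, and are rejected when the parser has to assume UTF-8. The Name element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Encode the document as Latin-1 bytes; the declaration names that encoding.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><Name>Benoît</Name>'.encode("latin-1")
print(b"\xee" in latin1_doc)  # True: "î" is the single byte 0xEE in Latin-1

root = ET.fromstring(latin1_doc)
print(root.text)  # Benoît

# Without the declaration the parser assumes UTF-8, where a lone 0xEE byte is invalid.
rejected = False
try:
    ET.fromstring("<Name>Benoît</Name>".encode("latin-1"))
except ET.ParseError:
    rejected = True
print(rejected)  # True
```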

12 This allows using accented letters (e.g., î), among other special characters. Alternatively, character references can be used to insert Unicode characters, e.g., to write foreign words with accents. Benoît Mandelbrot’s name can be written as: Beno&#238;t Mandelbrot. Here ‘238’ is the decimal Unicode code point (which is also the Latin-1 code) for ‘î’; ASCII itself only covers codes 0 to 127. The code can be looked up in a character map such as the one in MS Word.
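A quick Python check of the character reference: the decimal reference &#238; and the hexadecimal form &#xEE; both resolve to "î". The Name element is a hypothetical wrapper.

```python
import xml.etree.ElementTree as ET

print(ord("î"))       # 238: the decimal Unicode code point of "î"
print(hex(ord("î")))  # 0xee: the same code point in hexadecimal

decimal_ref = ET.fromstring("<Name>Beno&#238;t Mandelbrot</Name>")
hex_ref = ET.fromstring("<Name>Beno&#xEE;t Mandelbrot</Name>")
print(decimal_ref.text)                  # Benoît Mandelbrot
print(decimal_ref.text == hex_ref.text)  # True
```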

13 Predefined entities
Predefined entities are used to escape delimiter characters in element content or attribute values. XML has five predefined entities: &lt; for the ‘less than’ sign (<), &gt; for the ‘greater than’ sign (>), &amp; for the ampersand (&), &apos; for the apostrophe ('), and &quot; for the quotation mark ("). An XML parser reading a document with these entities substitutes the corresponding character for each entity. An XML parser recognizes the start tag of any element by the ‘<’ character, and the end tag by the ‘</’ characters.
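A small Python sketch of the substitution in both directions: the parser replaces the five predefined entities with their characters, and xml.sax.saxutils.escape produces the entities when writing text. The Symbols element is a hypothetical example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Parsing: each predefined entity is replaced by the character it stands for.
root = ET.fromstring("<Symbols>&lt; &gt; &amp; &apos; &quot;</Symbols>")
print(root.text)  # < > & ' "

# Writing: escape() replaces &, < and >; extra replacements can be passed in.
print(escape("a < b & c > d"))  # a &lt; b &amp; c &gt; d
print(escape("it's \"quoted\"", {"'": "&apos;", '"': "&quot;"}))
# it&apos;s &quot;quoted&quot;
```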

14 Entities cont’d
Because these characters are reserved, we cannot use them literally in the content of elements; we must use the entities instead. A ‘<’ (less than) in the content of an XML document must be written as &lt;, and a literal ‘&’ must be written as &amp;. A ‘>’ (greater than) is usually written as &gt;; a lone ‘>’ is allowed, but it must be escaped in the sequence ‘]]>’. The forward slash does not create a problem on its own, but ‘</’ would be read as the start of an end tag, so it must be written as &lt;/. For example, element content written as x<y will cause an error, because the parser interprets the ‘<’ sign after x as the beginning of a new tag, which is not closed. The correct way of expressing it is: x&lt;y
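The x<y example can be reproduced with a short Python sketch; "Relation" is a hypothetical element name, since the slide's tags were lost. Only the prefix of the error text is relied on, because the exact parser message may vary.

```python
import xml.etree.ElementTree as ET


def content_or_error(xml_text: str) -> str:
    """Return the root element's text, or a short error message if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return f"error: {err}"


# "Relation" is a hypothetical element name.
print(content_or_error("<Relation>x<y</Relation>"))     # error: ... (the "<" starts a tag)
print(content_or_error("<Relation>x&lt;y</Relation>"))  # x<y
print(content_or_error("<Relation>x>y</Relation>"))     # x>y (a lone ">" is allowed)
print(content_or_error("<Relation>x & y</Relation>"))   # error: ... (bare "&")
```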

15 … We can declare our own entities in the document's DTD using the construct <!ENTITY entityname "substitute">, where ‘substitute’ is the replacement content that the parser puts in place of the entity when it processes the document. We can reference the entity in a document with ‘&entityname;’, just like the predefined entities. The &blurb; and &cc; are good examples of user-defined entities. Notice that the ampersand (&) and semicolon (;) are used by the parser to delimit the entity name.
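A minimal Python sketch of a user-defined entity declared in the internal DTD subset. The entity name "fl" and its replacement text are hypothetical; xml.etree.ElementTree expands internal entities like this one and rejects references to entities that were never declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity "fl" declared in the document's internal DTD subset.
doc = """<!DOCTYPE Note [
  <!ENTITY fl "Floridan Aquifer">
]>
<Note>See the &fl; record.</Note>"""

print(ET.fromstring(doc).text)  # See the Floridan Aquifer record.

# Referencing an entity that was never declared is an error.
undefined_rejected = False
try:
    ET.fromstring("<Note>&cc;</Note>")
except ET.ParseError:
    undefined_rejected = True
print(undefined_rejected)  # True
```

Because the parser expands entities automatically, documents from untrusted sources are usually parsed with a hardened parser to avoid entity-expansion attacks.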

16 xml:lang
The xml:space attribute, which can be set to ‘preserve’ (or ‘default’), tells applications that the whitespace in an element’s content should be kept exactly as written, similar to HTML’s <pre> tag. The xml:lang attribute sets the language of an element’s content (and of its descendants); for example, xml:lang="en-US" specifies American English.
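A short Python sketch of reading these attributes. The reserved xml prefix is always bound to the namespace http://www.w3.org/XML/1998/namespace, so xml.etree.ElementTree reports xml:lang under that URI. The element names are hypothetical. ElementTree keeps the whitespace either way; xml:space is a signal for the application, not a parser switch.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the reserved "xml" prefix

root = ET.fromstring(
    '<Report xml:lang="en-US">'
    '<Title xml:lang="fr">Océan Atlantique</Title>'
    '<Table xml:space="preserve">a    b</Table>'
    "</Report>"
)
print(root.get(f"{{{XML_NS}}}lang"))                 # en-US
print(root.find("Title").get(f"{{{XML_NS}}}lang"))   # fr
print(root.find("Table").get(f"{{{XML_NS}}}space"))  # preserve
print(repr(root.find("Table").text))                 # 'a    b'
```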

17 Element Structure
XML documents are text and include markup, which is enclosed in angle brackets (<>), and character data, which lies between the markup, e.g., <Country>France</Country>. Here, ‘France’ is the content of an instance of the Country element. Together, the markup (tags) and the element content constitute the element. The content of the element (e.g., France) lies between the start tag (e.g., <Country>) and the end tag, which starts with a slash (e.g., </Country>). The start tag gives a name (generic identifier) that defines the element type (e.g., Country). Descriptive tag names (e.g., Rock, River) may provide informal semantics to humans, but not to software.

18 Element content
Elements can have the following types of content: element content (i.e., other sub-elements), character data (i.e., text), mixed content (text and other sub-elements), or no text or elements at all. An element with no text or element content is an empty element; instead of a start tag and an end tag, it can be written as a single tag with the slash at the end (e.g., <formula/>). Information in an empty element is carried in its attributes. Empty elements can be used for things that may not have content; for example, a document for minerals may have an element called formula, whose value may not be known by users.
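The four kinds of content can be told apart programmatically. This Python sketch classifies each child of a hypothetical Mineral element; in xml.etree.ElementTree, text before the first child is stored in .text and text after a child in that child's .tail, which is how mixed content shows up.

```python
import xml.etree.ElementTree as ET

# Hypothetical Mineral element with one child of each content type.
doc = ET.fromstring(
    "<Mineral>"
    "<Name>quartz</Name>"                               # character data only
    "<Properties><Hardness>7</Hardness></Properties>"   # element content
    "<Note>Found in <Rock>granite</Rock> veins</Note>"  # mixed content
    "<Formula/>"                                        # empty element
    "</Mineral>"
)


def content_kind(el: ET.Element) -> str:
    """Classify an element's content as text, element, mixed, or empty."""
    has_children = len(el) > 0
    has_text = bool((el.text or "").strip()) or any((c.tail or "").strip() for c in el)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "text"
    return "empty"


kinds = {child.tag: content_kind(child) for child in doc}
print(kinds)
# {'Name': 'text', 'Properties': 'element', 'Note': 'mixed', 'Formula': 'empty'}
```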

19 Naming rule
Valid element names, which are case sensitive, should start with a letter or an underscore, and can include letters (including those with accents), digits, underscores, dots, or hyphens, but no spaces. The following are all valid XML names: … Examples of invalid names are: …
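A rough Python sketch for testing candidate names: wrap each in an empty-element tag and ask the parser to accept it. All names below are hypothetical examples. This check does not flag names beginning with "xml", which the specification reserves but parsers generally accept.

```python
import xml.etree.ElementTree as ET


def is_valid_name(name: str) -> bool:
    """Check a candidate element name by asking the parser to accept <name/>."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False


# Hypothetical example names.
valid = ["Rock", "_mineral", "Fault-Zone", "rock.type", "Minéral", "Rock2"]
invalid = ["2Rock", "-Rock", ".rock", "Rock Type", "Rock@Site"]

print([is_valid_name(n) for n in valid])    # [True, True, True, True, True, True]
print([is_valid_name(n) for n in invalid])  # [False, False, False, False, False]
```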

20 … The string ‘xml’, in any combination of upper and lower case, is reserved and should not start a name, and characters other than the ones mentioned above cannot be included in a name. The colon (:) is technically allowed but should not be used, because namespaces use it in their declarations (see below). In this course, for the sake of consistency and distinction, element names are written in upper CamelCase (e.g., RockType), whereas attribute names are in lower camelCase (e.g., rockType).

21 Tree structure
An element can have child elements nested in it, forming a tree structure. The topmost element in a document is called the root element; it is the ancestor of all other elements, i.e., all elements must be nested in this mandatory element. For example, the ‘Aquifer’ element is the root element in the following instance document. The content between ‘<!--’ and ‘-->’ is a comment; it is written for people and is ignored by applications that process the data.

22 Aquifer Element Notice that the ‘name’ and ‘type’ elements defined for the Aquifer element can be defined as attributes because they have a simple structure
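A hedged Python sketch of slides 21 and 22 together: the Aquifer document with Name and Type child elements (hypothetical names, since the slide's document was lost), a comment that xml.etree.ElementTree drops by default, and the equivalent attribute form using lower-case attribute names.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the slide's instance document was not preserved.
as_elements = """<Aquifer>
  <!-- comments are for people; ElementTree drops them by default -->
  <Name>Floridan</Name>
  <Type>Confined</Type>
</Aquifer>"""
as_attributes = '<Aquifer name="Floridan" type="Confined"/>'

root = ET.fromstring(as_elements)
print([child.tag for child in root])  # ['Name', 'Type'] (no comment node)

# The same simple, atomic data read from either representation.
record_from_elements = {child.tag.lower(): child.text for child in root}
record_from_attributes = dict(ET.fromstring(as_attributes).attrib)
print(record_from_elements == record_from_attributes)  # True
```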

23 Attributes
Whereas every XML document has at least one element (the root element), elements can have zero or more attributes. Data in an XML document may be stored in the elements and/or in the attributes of these elements. Attributes are characteristics of elements that add information to the content of the elements without modifying the structure of the document. Always ask whether an attribute should instead be an element or part of an element. If the information is atomic, i.e., cannot be broken into smaller information fragments, it can remain an attribute; it should be an element if it could have its own sub-elements or attributes. For example, it does not make sense to make the structure of a rock an attribute, because structure is a complex entity that can have many elements with sub-elements and attributes. The density of a mineral, by contrast, could be an attribute if it only takes one value.

24 Attributes are given names and values; the values are enclosed in single or double quotation marks in the start tag of the element. For example, the Mineral element may have a Boolean monomineralic attribute, which can take a true or false value. Notice the required (single or double) quotation marks that enclose the names (which are themselves attribute values) in the W3C XSD schema. If an element has more than one attribute, the attributes must be uniquely named and separated by whitespace in the start tag. Empty elements can have attributes.
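A small Python sketch of the attribute rules on an empty Mineral element. The name attribute and its value are hypothetical; monomineralic comes from the slide. Quoted values, unique names, and whitespace between attributes are all enforced by the parser, while interpreting "true" as a Boolean is left to the application (or to a schema).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An empty Mineral element carrying two attributes, quoted with ' and " respectively.
mineral = ET.fromstring("<Mineral name='quartz' monomineralic=\"true\"/>")
print(mineral.attrib)  # {'name': 'quartz', 'monomineralic': 'true'}

# Attribute values are plain strings; reading them as Booleans is up to the application.
is_monomineralic = mineral.get("monomineralic") == "true"
print(is_monomineralic)  # True

print(is_well_formed("<Mineral monomineralic=true/>"))                 # False: unquoted value
print(is_well_formed('<Mineral name="a" name="b"/>'))                   # False: duplicate name
print(is_well_formed('<Mineral name="quartz"monomineralic="true"/>'))  # False: no space between
```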

25 Mineral element

26 Structure of XML documents
XML instance documents are text files with the .xml extension; W3C XSD schemas have the .xsd extension. An instance document starts with an XML declaration, which is put between <?xml and ?> and has three attributes: version, which is required and is normally “1.0”; encoding, which optionally defines the character set for the document, with a default of “UTF-8” (or UTF-16 if the file starts with a UTF-16 byte-order mark); and standalone, which is optional. The value “yes” means that no external markup declarations (e.g., an external DTD) affect the document’s content; if standalone is omitted, its value is taken to be “no”. Standalone can be set explicitly to “no” if the document needs an external DTD, for example: <?xml version="1.0" encoding="UTF-8" standalone="no"?>
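A minimal Python sketch that reads the three declaration attributes with the standard-library expat binding. Its XmlDeclHandler reports standalone as 1 ("yes"), 0 ("no"), or -1 (omitted). The Aquifer root is just a placeholder element.

```python
from xml.parsers import expat


def read_declaration(xml_bytes: bytes):
    """Return (version, encoding, standalone) from the XML declaration, or None if absent."""
    found = {}

    def on_decl(version, encoding, standalone):
        found["decl"] = (version, encoding, standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(xml_bytes, True)
    return found.get("decl")


print(read_declaration(b'<?xml version="1.0"?><Aquifer/>'))
# version "1.0", no encoding given, standalone -1 (omitted)
print(read_declaration(b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><Aquifer/>'))
# ('1.0', 'UTF-8', 0)
print(read_declaration(b"<Aquifer/>"))  # None: the declaration is optional
```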

27 Namespace
XML allows different communities of scientists (e.g., oceanography and atmospheric science) to independently develop their own markup languages. More often than not, there is a need to integrate these autonomously developed vocabularies into other applications. It is very common for two markup languages to contain elements that have the same name but different element structures. For example, suppose that the mineralogy and sedimentology vocabularies, developed separately by the Mineralogy and Sedimentology communities, both include an element named ‘Mineral’, albeit with different structures, as in the following XML document snippet.

28 Mineral instance document
First Mineral element (mineralogy): orthoclase / silicate / KAlSi3O8 / flesh / 6. Second Mineral element (sedimentology): quartz / grain / silicate / angular.

29 Avoiding name conflict
The two ‘Mineral’ elements have different sets of sub-elements and mean different things to human users (though not to software!). This is not a problem as long as each ‘Mineral’ element is only used locally in its own domain. If the two vocabularies are shared by an application, there will be a name conflict because of the two differently structured ‘Mineral’ elements: the XML processor cannot tell which is which, and processing or validation may fail with an error. Namespaces prevent this kind of name collision by assigning same-named elements (e.g., Mineral) that belong to different communities to different URIs that identify these communities. The namespace tells the parser that the similar elements belong to different namespaces.
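A short Python sketch of the collision itself: without namespaces, both Mineral elements have the same tag, so a program can only guess which vocabulary each one comes from by inspecting its children. The sub-element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sub-element names for the two vocabularies' Mineral elements.
lithology = ET.fromstring(
    "<Lithology>"
    "<Mineral><Name>orthoclase</Name><Formula>KAlSi3O8</Formula></Mineral>"
    "<Mineral><Name>quartz</Name><Roundness>angular</Roundness></Mineral>"
    "</Lithology>"
)

# Without namespaces, software sees one element type where people see two.
tags = {mineral.tag for mineral in lithology}
print(tags)  # {'Mineral'}

# A program looking for the mineralogy Formula has to guess from the children.
print([m.findtext("Formula") for m in lithology.findall("Mineral")])  # ['KAlSi3O8', None]
```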

30 Namespace prefix
A namespace is declared with the xmlns attribute, which lets us choose an optional prefix and a URI for the namespace. The prefix and the URI are put in the start tag of the outermost element where we want to use the namespace, using the format <MyElement xmlns:prefix="URI">. The declaration applies to MyElement and to all elements nested in it, and the URI is the identifier for the namespace.

31 Default namespace
If the optional prefix is not provided, we have a default namespace. The syntax for a default namespace is: <MyElement xmlns="URI">. In such a case, MyElement and all unprefixed elements nested in it are members of the default namespace. If we intend to share a document with others, it is a good idea to declare a default namespace. Other people can later use the URI to integrate the document with their own vocabulary, assigning their own prefix to this URI. Assume that we want to integrate the markups of mineralogy and sedimentology, both of which have an element called Mineral. In the following XML document, we declare the ‘min’ prefix for the mineralogy markup and the ‘sed’ prefix for the sedimentology markup:
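A minimal Python sketch of a default namespace, reusing the mineralogy URI exactly as spelled in the Lithology example; the Name and Formula sub-elements are hypothetical. xml.etree.ElementTree qualifies every unprefixed element with the default URI, and a consumer can attach any prefix of its own to that URI when searching.

```python
import xml.etree.ElementTree as ET

MIN_URI = "http://www.geology.org/minerlogy"

# xmlns="..." with no prefix declares a default namespace for Mineral and its
# unprefixed descendants. Sub-element names are hypothetical.
mineral = ET.fromstring(
    f'<Mineral xmlns="{MIN_URI}"><Name>orthoclase</Name><Formula>KAlSi3O8</Formula></Mineral>'
)
print(mineral.tag)  # {http://www.geology.org/minerlogy}Mineral
print([child.tag for child in mineral])
# ['{http://www.geology.org/minerlogy}Name', '{http://www.geology.org/minerlogy}Formula']

# Another user can later pick any prefix of their own for the same URI.
ns = {"m": MIN_URI}
print(mineral.findtext("m:Formula", namespaces=ns))  # KAlSi3O8
print(mineral.find("Formula"))  # None: the unqualified name does not match
```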

32 <Lithology xmlns:min="http://www.geology.org/minerlogy" xmlns:sed="http://www.geology.org/sedimentology"> contains a min:Mineral element (orthoclase / silicate / KAlSi3O8 / flesh / 6) and a sed:Mineral element (quartz / grain / silicate / angular).
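A hedged Python sketch of the Lithology document with the min and sed prefixes. The URIs are copied as given on the slide; the sub-element names are hypothetical and only a subset of the slide's values is used. The two Mineral elements end up with distinct URI-qualified names, and the prefixes used for searching are local aliases that only need to map to the same URIs.

```python
import xml.etree.ElementTree as ET

MIN_URI = "http://www.geology.org/minerlogy"
SED_URI = "http://www.geology.org/sedimentology"

# Sub-element names are hypothetical; values come from the slide.
lithology = ET.fromstring(f"""<Lithology xmlns:min="{MIN_URI}" xmlns:sed="{SED_URI}">
  <min:Mineral>
    <min:Name>orthoclase</min:Name>
    <min:Formula>KAlSi3O8</min:Formula>
  </min:Mineral>
  <sed:Mineral>
    <sed:Name>quartz</sed:Name>
    <sed:Roundness>angular</sed:Roundness>
  </sed:Mineral>
</Lithology>""")

# The two Mineral elements now have distinct, URI-qualified names.
print(sorted(child.tag for child in lithology))

# Search prefixes are local aliases: they only need to map to the same URIs.
ns = {"mineralogy": MIN_URI, "sediment": SED_URI}
print(lithology.findtext("mineralogy:Mineral/mineralogy:Formula", namespaces=ns))  # KAlSi3O8
print(lithology.findtext("sediment:Mineral/sediment:Roundness", namespaces=ns))    # angular
```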

