1 XML & XML Schema
Semantic Web - Spring 2007
Computer Engineering Department, Sharif University of Technology

2 Outline
Semantic web - Computer Engineering Dept. - Spring 2007
- Markup languages: SGML, HTML, XML
- XML building blocks
- XML applications
- Namespaces
- XML Schema

3 SGML (ISO 8879)
- Standard Generalized Markup Language
- The international standard for describing the structure and content of text documents
- Interchangeable: device-independent and system-independent
- Tags are not predefined
- Uses a DTD to validate the structure of the document
- Large, powerful, and very complex
- Heavily used in industry and commerce for over a decade

4 HTML (RFC 1866)
- HyperText Markup Language
- A small SGML application used on the Web (a DTD and a set of processing conventions)
- Uses only a predefined set of tags

5 What is XML?
- eXtensible Markup Language
- A simplified version of SGML that keeps its most useful parts
- Designed so that SGML can be delivered over the Web
- More flexible and adaptable than HTML
- XHTML: a reformulation of HTML 4 in XML 1.0

6 HTML vs. XML
- HTML is used to mark up text so it can be displayed to users.
- XML is used to mark up data so it can be processed by computers.
- HTML describes both structure and appearance.
- XML describes only content, or "meaning".
- HTML uses a fixed, unchangeable set of tags.
- In XML, you make up your own tags.

7 HTML vs. XML (2)
- HTML is for humans
  - HTML describes web pages
  - You don't want to see error messages about the web pages you visit
  - Browsers ignore and/or correct as many HTML errors as they can, so HTML is often sloppy
- XML is for computers
  - XML describes data
  - The rules are strict and errors are not allowed; in this way, XML is like a programming language
  - Current versions of most browsers can display XML; however, browser support of XML is spotty at best

8 XML-related technologies
- DTD (Document Type Definition) and XML Schemas are used to define legal XML tags and their attributes for particular purposes
- XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another
- SAX (Simple API for XML)

9 XML building blocks - Elements
- Delimited by angle brackets
- Identify the nature of the content they surround
- General format: <name> … </name>
- Empty element: <name/>
- Elements have relationships: they are related as parents and children
- Elements have content, which can be of different types: element, mixed, simple, or empty

10 XML building blocks - Attributes
- Name-value pairs that occur inside start tags after the element name, like: <element attr="value">
- Provide additional information about elements that often is not part of the data
- Attributes and elements are somewhat interchangeable, so: should I use an element or an attribute?
  - Example using just elements: <name><first>David</first><last>Matuszek</last></name>
  - Example using attributes: <name first="David" last="Matuszek"/>
- Rule of thumb: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements

11 XML building blocks - Entities
- Five special characters must be written as entities:
  - &amp; for & (almost always necessary)
  - &lt; for < (almost always necessary)
  - &gt; for > (not usually necessary)
  - &quot; for " (necessary inside double quotes)
  - &apos; for ' (necessary inside single quotes)
- These entities can be used even in places where they are not absolutely required.
- These are the only predefined entities in XML.
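
As a rough illustration of the five predefined entities, the sketch below escapes a string with Python's standard library and checks that an XML parser turns it back into the original characters. The sample text and the <note> element are made up for this example and do not come from the slides.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'AT&T says 1 < 2 and "quotes" are fine'

# escape() replaces &, < and > by default; the extra mapping adds the
# two quote entities, which matter inside attribute values.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# Wrapping the escaped text in a hypothetical <note> element and parsing it
# gives back the original characters.
element = ET.fromstring(f"<note>{escaped}</note>")
print(element.text == raw)
```

Escaping once at the point where text is written into the document is usually enough; escaping the same string twice would turn &amp; into &amp;amp;.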

12 XML building blocks - Declaration
- The XML declaration looks like this: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- The XML declaration is not required by browsers, and XML 1.0 makes it optional, but including it is good practice
- If present, the XML declaration must come first; not even whitespace may precede it
- Note that the brackets are <? and ?>
- version="1.0" is required (XML 1.1 also exists but is rarely used)
- encoding can be "UTF-8" (a superset of ASCII), "UTF-16", or something else, or it can be omitted; UTF-8 and UTF-16 are both Unicode encodings
- standalone tells whether the document depends on external markup declarations, such as a separate DTD

13 XML building blocks - Processing instructions
- PIs (Processing Instructions) may occur anywhere in the XML document (but usually first)
- A PI is a command to the program processing the XML document to handle it in a certain way
- XML documents are typically processed by more than one program
- Programs that do not recognize a given PI should just ignore it
- General format of a PI: <?target instructions?>

14 XML building blocks - Comments
- Comments can be put almost anywhere in an XML document (but not inside a tag)
- Comments are useful for:
  - Explaining the structure of an XML document
  - Commenting out parts of the XML during development and testing
- The character sequence -- cannot occur inside a comment
- Comments are not displayed by browsers, but can be seen by anyone who looks at the source code

15 CDATA
- By default, all text inside an XML document is parsed
- You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ … ]]>
- Any characters, even & and <, can occur inside a CDATA section
- Whitespace inside a CDATA section is (usually) preserved
- The only real restriction is that the character sequence ]]> cannot occur inside a CDATA section
- CDATA is useful when your text has a lot of illegal characters (for example, if your XML document contains some HTML text)
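
The CDATA behaviour described above can be sketched with Python's standard-library parser. The <snippet> element and its content are hypothetical; the point is only that markup characters inside the CDATA section reach the application as plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: an HTML fragment stored inside a CDATA section.
doc = "<snippet><![CDATA[<b>bold</b> & <i>italic</i>]]></snippet>"

snippet = ET.fromstring(doc)

# The parser hands back the characters exactly as written,
# and no child elements are created for <b> or <i>.
print(snippet.text)
print(len(snippet))
```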

16 XML syntax
- All XML elements must have a closing tag
- XML tags are case sensitive
- All XML elements must be properly nested
- All XML documents must have a root element
- Attribute values must always be quoted
- With XML, white space is preserved
- With XML, a new line is always stored as LF
- Comments in XML: <!-- This is a comment -->

17 Well-formed XML
- Every element must have both a start tag and an end tag, e.g. <name> … </name>
  - But empty elements can be abbreviated: <name/>
  - XML tags are case sensitive
  - XML tags may not begin with the letters xml, in any combination of cases
- Elements must be properly nested, e.g. not <b><i>bold and italic</b></i>
- Every XML document must have one and only one root element
- The values of attributes must be enclosed in single or double quotes, e.g. <element attr="value">
- Character data cannot contain < or &
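
One way to see these rules in action is to hand a few small documents to a parser and note which ones it rejects. The following is a minimal sketch using Python's standard library; the helper name is_well_formed and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule on the slide.
samples = {
    "properly nested": "<p><b><i>bold and italic</i></b></p>",
    "badly nested": "<p><b><i>bold and italic</b></i></p>",
    "missing end tag": "<p>text",
    "two roots": "<a/><b/>",
    "unquoted attribute": "<p size=3/>",
    "case mismatch": "<Note></note>",
    "bare ampersand": "<p>AT&T</p>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample should be accepted. Note that this checks well-formedness only; it says nothing about validity against a DTD or schema.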

18 Displaying XML
- XML documents do not carry information about how to display the data
- We can add display information to XML with:
  - CSS (Cascading Style Sheets)
  - XSL (eXtensible Stylesheet Language), which is preferred

19 XML applications (1): Separate data
- XML can separate data from HTML
- Store data in separate XML files
- Use HTML for layout and display
- Use data islands, which can be bound to HTML elements
- Benefit: changes in the underlying data will not require any changes to your HTML

20 XML applications (2): Exchange data
- XML is used to exchange data
- Text format
- Software-independent and hardware-independent
- Exchange data between incompatible systems, given that they agree on the same tag definitions
- Can be read by many different types of applications
- Benefits:
  - Reduces the complexity of interpreting data
  - Makes it easier to expand and upgrade a system

21 XML applications (3): Store data
- XML can be used to store data
- Plain text file
- Store data in files or databases
- Applications can be written to store and retrieve information from the store
- Other clients and applications can access your XML files as data sources
- Benefit: accessible to more applications

22 XML applications (4): Create new languages
- XML can be used to create new languages, e.g.:
  - WML (Wireless Markup Language), used to mark up Internet applications for handheld devices like mobile phones (WAP)
  - MusicXML, used for publishing musical scores

23 Names in XML
- Names (as used for tags and attributes) must begin with a letter or underscore, and can consist of:
  - Letters, both Roman (English) and foreign
  - Digits, both Roman and foreign
  - . (dot)
  - - (hyphen)
  - _ (underscore)
  - : (colon), which should be used only for namespaces
  - Combining characters and extenders (not used in English)

24 Namespaces
- Namespaces are a simple mechanism for creating globally unique names for the elements and attributes of your markup language.
- Benefits:
  - De-conflicts the meaning of identical names in different markup languages.
  - Allows different markup languages to be mixed together without ambiguity.
- Namespaces are implemented by requiring every XML name to consist of two parts, a prefix and a local part, written prefix:localPart

25 Namespaces and URIs
- A namespace is defined as a unique string
  - To guarantee uniqueness, typically a URI (Uniform Resource Identifier) is used, because the author "owns" the domain
  - It doesn't have to be a "real" URI; it just has to be a unique string
  - Example: http://ce.sharif.edu/sw
- There are two ways to use namespaces:
  - Declare a default namespace
  - Associate a prefix with a namespace, then use the prefix in the XML to refer to the namespace

26 Namespace syntax
- In any start tag you can use the reserved attribute name xmlns: <element xmlns="namespaceURI">
  - This namespace will be used as the default for all elements up to the corresponding end tag
  - You can override it with a specific prefix
- You can use almost this same form to declare a prefix: <element xmlns:prefix="namespaceURI">
  - Use this prefix on every tag and attribute you want to use from this namespace, including end tags; it is not a default prefix
- You can use the prefix in the start tag in which it is defined: <prefix:element xmlns:prefix="namespaceURI">
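
To make the default-namespace and prefix forms concrete, here is a small sketch that parses a document using both and prints the fully qualified names a parser sees. The element names, the lib prefix and the example.org URI are hypothetical; http://ce.sharif.edu/sw is the example string from the namespaces slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus one prefixed namespace.
doc = """
<course xmlns="http://ce.sharif.edu/sw"
        xmlns:lib="http://example.org/library">
  <title>Semantic Web</title>
  <lib:book lib:isbn="0000000000">XML in practice</lib:book>
</course>
"""

root = ET.fromstring(doc)

# ElementTree reports names as {namespaceURI}localPart, so the prefix
# itself disappears and only the namespace string matters.
print(root.tag)
for child in root:
    print(child.tag, child.attrib)

# When searching, any prefix can be mapped to the namespace URI.
book = root.find("lib:book", {"lib": "http://example.org/library"})
print(book.text)
```

The unprefixed <title> picks up the default namespace, while <lib:book> and its lib:isbn attribute belong to the prefixed one.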

27 Review of XML rules
- Start with <?xml version="1.0"?>
- XML is case sensitive
- You must have exactly one root element that encloses all the rest of the XML
- Every element must have a closing tag
- Elements must be properly nested
- Attribute values must be enclosed in double or single quotation marks
- There are only five pre-declared entities

28 XML as a tree
- An XML document represents a hierarchy; a hierarchy is a tree

    novel
      foreword
        paragraph: This is the great American novel.
      chapter number="1"
        paragraph: It was a dark and stormy night.
        paragraph: Suddenly, a shot rang out!
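
The tree view can be reproduced by parsing a document and walking it depth-first. In the sketch below the XML layout is an assumption reconstructed from the element names and sentences on the slide, and the outline helper is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the novel document from the slide.
doc = """<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>"""

def outline(element, depth=0):
    """Return one indented line per element, in depth-first order."""
    attrs = "".join(f' {key}="{value}"' for key, value in element.attrib.items())
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + attrs + (f": {text}" if text else "")]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each element becomes a node, its attributes stay attached to that node, and the nesting of tags gives the parent-child edges.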

29 Extended document standards
- You can define your own XML tag sets, but here are some already available:
  - XHTML: HTML redefined in XML
  - SMIL: Synchronized Multimedia Integration Language
  - MathML: Mathematical Markup Language
  - SVG: Scalable Vector Graphics
  - DrawML: Drawing MetaLanguage
  - ICE: Information and Content Exchange
  - ebXML: Electronic Business with XML
  - cXML: Commerce XML
  - CBL: Common Business Library

30 XML Schema

31 XML validation
- "Well-formed" XML document
  - Correct XML syntax
- "Valid" XML document
  - Well-formed
  - Conforms to the rules of a DTD
- XML DTD
  - Defines the legal building blocks of an XML document
  - Can be inline in the XML or an external reference
- XML Schema
  - An XML-based alternative to DTD, and more powerful
  - Supports namespaces and data types

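32 An example XML with DTD

    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend</body>
    </note>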

33 XML Schemas
- "Schema" is a general term
  - DTDs are a form of XML schemas
- When we say "XML Schemas," we usually mean the W3C XML Schema Language
  - This is also known as the "XML Schema Definition" language, or XSD

34 XSD vs. DTD
- DTDs provide a very weak specification language
  - You can't put any restrictions on text content
  - You have very little control over mixed content (text plus elements)
  - You have little control over ordering of elements
- DTDs are written in a strange (non-XML) format
  - You need separate parsers for DTDs and XML
- The XML Schema Definition language solves these problems
  - XSD gives you much more control over structure and content
  - XSD is written in XML

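35 Referring to a schema
- To refer to a DTD in an XML document, the reference goes before the root element:

    <?xml version="1.0"?>
    <!DOCTYPE rootElement SYSTEM "url">
    <rootElement> ... </rootElement>

- To refer to an XML Schema in an XML document, the reference goes in the root element:

    <?xml version="1.0"?>
    <rootElement
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="url.xsd">
      ...
    </rootElement>

  - The xsi:noNamespaceSchemaLocation attribute is where your XML Schema definition can be found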

36 The XSD document
- Since the XSD is written in XML, it can get confusing which document we are talking about
- The file extension is .xsd
- The root element is <schema>
- The XSD starts like this:

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

37 The <schema> element
- The <schema> element may have attributes:
  - xmlns:xs="http://www.w3.org/2001/XMLSchema"
    This is necessary to specify where all our XSD tags are defined
  - elementFormDefault="qualified"
    This means that all XML elements must be qualified (use a namespace)
- It is highly desirable to qualify all elements, or problems will arise when another schema is added
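
Because an XSD is itself an XML document, an ordinary XML parser can read it. The sketch below parses a tiny schema and lists what it declares; the title and pages elements are hypothetical and only serve to give the schema some content. This inspects the schema as XML and does not validate anything against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema for illustration; the element names are not from the slides.
xsd = f"""<?xml version="1.0"?>
<xs:schema xmlns:xs="{XS}" elementFormDefault="qualified">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="pages" type="xs:integer"/>
</xs:schema>"""

schema = ET.fromstring(xsd)

# The root is the schema element in the XML Schema namespace.
print(schema.tag)
print(schema.get("elementFormDefault"))

# Collect the top-level element declarations as name -> type.
declared = {e.get("name"): e.get("type") for e in schema.findall(f"{{{XS}}}element")}
print(declared)
```

The xs prefix is just a local abbreviation; what identifies the XSD vocabulary is the namespace string bound to it.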

38 "Simple" and "complex" elements
- A "simple" element is one that contains text and nothing else
  - A simple element cannot have attributes
  - A simple element cannot contain other elements
  - A simple element cannot be empty
  - However, the text can be of many different types, and may have various restrictions applied to it
- If an element isn't simple, it's "complex"
  - A complex element may have attributes
  - A complex element may be empty, or it may contain text, other elements, or both text and other elements

39 Defining a simple element
- A simple element is defined as <xs:element name="name" type="type" /> where:
  - name is the name of the element
  - the most common values for type are xs:boolean, xs:integer, xs:date, xs:string, xs:decimal, and xs:time
- Other attributes a simple element may have:
  - default="default value": used if no other value is specified
  - fixed="value": no other value may be specified

40 Defining an attribute
- Attributes themselves are always declared as simple types
- An attribute is defined as <xs:attribute name="name" type="type" /> where:
  - name and type are the same as for xs:element
- Other attributes an xs:attribute declaration may have:
  - default="default value": used if no other value is specified
  - fixed="value": no other value may be specified
  - use="optional": the attribute is not required (default)
  - use="required": the attribute must be present

41 Restrictions, or "facets"
- The general form for putting a restriction on a text value is:

    <xs:element name="name">   (or xs:attribute)
      <xs:simpleType>
        <xs:restriction base="type">
          ... the restrictions ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

42 Restrictions on numbers
- minInclusive: number must be ≥ the given value
- minExclusive: number must be > the given value
- maxInclusive: number must be ≤ the given value
- maxExclusive: number must be < the given value
- totalDigits: number must have no more than value digits in total
- fractionDigits: number must have no more than value digits after the decimal point
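
The four bound facets amount to simple comparisons. The following sketch reads them from a hypothetical declaration (an age element restricted to the range 0 to 140, chosen only for illustration) and checks a few values. It covers the bound facets only and is not a schema validator; the helper satisfies_bounds is an invented name.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical declaration: an "age" element restricted to 0..140.
xsd = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="140"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

# Each bound facet maps to the comparison the value must satisfy.
CHECKS = {
    "minInclusive": lambda v, bound: v >= bound,
    "minExclusive": lambda v, bound: v > bound,
    "maxInclusive": lambda v, bound: v <= bound,
    "maxExclusive": lambda v, bound: v < bound,
}

def satisfies_bounds(value: str, restriction: ET.Element) -> bool:
    """Check a lexical number against the bound facets of one xs:restriction."""
    number = Decimal(value)
    for facet in restriction:
        name = facet.tag.split("}")[1]  # drop the {namespace} part of the tag
        if name in CHECKS and not CHECKS[name](number, Decimal(facet.get("value"))):
            return False
    return True

restriction = ET.fromstring(xsd).find(f".//{{{XS}}}restriction")
for candidate in ["0", "37", "140", "141", "-1"]:
    print(candidate, satisfies_bounds(candidate, restriction))
```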

43 Restrictions on strings
- length: the string must contain exactly value characters
- minLength: the string must contain at least value characters
- maxLength: the string must contain no more than value characters
- pattern: the value is a regular expression that the string must match
- whiteSpace: not really a "restriction"; tells what to do with whitespace
  - value="preserve": keep all whitespace
  - value="replace": change all whitespace characters to spaces
  - value="collapse": remove leading and trailing whitespace, and replace all sequences of whitespace with a single space
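
The three whiteSpace settings can be sketched as a small normalisation function. This is an illustration of the behaviour described above, not code taken from any XSD processor; the function name and sample string are made up.

```python
import re

def apply_white_space(text: str, mode: str) -> str:
    """Normalise text the way the whiteSpace facet describes."""
    if mode == "preserve":
        return text
    # "replace": each tab, line feed and carriage return becomes one space.
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # After replacing, squeeze runs of spaces and trim both ends.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode}")

sample = "  Hello\t\tXML \n world  "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_white_space(sample, mode)))
```

Note that "replace" keeps the length of the string unchanged, while "collapse" can shorten it.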

44 Enumeration
- An enumeration restricts the value to be one of a fixed set of values
- Each allowed value is listed with its own xs:enumeration facet inside the xs:restriction

45 Complex elements
- A complex element is defined as:

    <xs:element name="name">
      <xs:complexType>
        ... information about the complex type ...
      </xs:complexType>
    </xs:element>

- <xs:sequence> says that the child elements must occur in the order given
- Remember that attributes are always simple types

46 Declaration and use
- So far we've been talking about how to declare types, not how to use them
- To use a type we have declared, use it as the value of type="..."
- Scope is important: you cannot use a type if it is local to some other type

47 xs:sequence
- We've already seen an example of a complex type whose elements must occur in a specific order

48 xs:all
- xs:all allows elements to appear in any order
- Despite the name, the members of an xs:all group can occur once or not at all
- You can use minOccurs="0" to specify that an element is optional (the default value is 1)
  - In this context, maxOccurs is always 1

49 Empty elements
- Empty elements are (ridiculously) complex

50 Mixed elements
- Mixed elements may contain both text and elements
- We add mixed="true" to the xs:complexType element
- The text itself is not mentioned in the declaration, and may go anywhere (it is basically ignored)

51 Extensions
- You can base a complex type on another complex type:

    <xs:complexType name="newType">
      <xs:complexContent>
        <xs:extension base="otherType">
          ...new stuff...
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>

52 Predefined string types
- Recall that a simple element is defined as: <xs:element name="name" type="type" />
- Here are a few of the possible string types:
  - xs:string: a string
  - xs:normalizedString: a string that doesn't contain tabs, newlines, or carriage returns
  - xs:token: a string that doesn't contain any whitespace other than single spaces
- Allowable restrictions on strings: enumeration, length, maxLength, minLength, pattern, whiteSpace

53 Predefined date and time types
- xs:date: a date in the format CCYY-MM-DD, for example, 2002-11-05
- xs:time: a time in the format hh:mm:ss (hours, minutes, seconds)
- xs:dateTime: format is CCYY-MM-DDThh:mm:ss
  - The T is part of the syntax
- Allowable restrictions on dates and times: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, pattern, whiteSpace
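
The basic lexical forms of these three types coincide with ISO 8601, so Python's datetime module can parse them. The sketch below is an approximation for the plain forms shown above only: it is not an XSD validator, and the helper parse_xs_value and the sample time are invented for illustration.

```python
from datetime import date, datetime, time

def parse_xs_value(kind: str, text: str):
    """Parse the basic lexical form of xs:date, xs:time or xs:dateTime."""
    parsers = {
        "date": date.fromisoformat,          # CCYY-MM-DD
        "time": time.fromisoformat,          # hh:mm:ss
        "dateTime": datetime.fromisoformat,  # CCYY-MM-DDThh:mm:ss
    }
    return parsers[kind](text)

print(parse_xs_value("date", "2002-11-05"))
print(parse_xs_value("time", "13:20:00"))
print(parse_xs_value("dateTime", "2002-11-05T13:20:00"))

# A value in another layout is rejected.
try:
    parse_xs_value("date", "05/11/2002")
except ValueError as error:
    print("rejected:", error)
```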

54 Predefined numeric types
- Here are some of the predefined numeric types:
  - xs:decimal
  - xs:byte
  - xs:short
  - xs:int
  - xs:long
  - xs:positiveInteger
  - xs:negativeInteger
  - xs:nonPositiveInteger
  - xs:nonNegativeInteger
- Allowable restrictions on numeric types: enumeration, minInclusive, minExclusive, maxInclusive, maxExclusive, fractionDigits, totalDigits, pattern, whiteSpace

55 Questions?

56 References
- http://www.w3.org/XML/
- http://www.w3.org/XML/Schema

