XML Fundamentals Transparency No. 1 XML Fundamentals Cheng-Chia Chen November 2004.


XML Fundamentals Transparency No. 1 XML Fundamentals Cheng-Chia Chen November 2004

XML Fundamentals Transparency No. 2 Well-formed XML Document An XML document is a sequence of characters. Each character is an atomic unit of text as specified by ISO/IEC 10646 [Unicode]. An XML document can be opened and edited with any program that knows how to read and write a text file; is usually given a .xml file name extension; has MIME media type application/xml or text/xml. Ex: <student> 張得功 </student>

XML Fundamentals Transparency No. 3 Characters used in XML A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646]. Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. Character Range: [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */ The character encoding may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings.
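
The Char production above translates directly into a range check on code points. The following is a minimal Python sketch of that check; the function names `is_xml_char` and `first_illegal_char` are illustrative and do not come from the slides.

```python
def is_xml_char(ch: str) -> bool:
    """Return True if the single character ch matches production [2] Char."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # up to the start of the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD        # excludes FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def first_illegal_char(text: str) -> int:
    """Return the index of the first character outside Char, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return -1


print(first_illegal_char("張得功\tok\n"))  # every character is legal
print(first_illegal_char("bad\x00byte"))   # NUL is not a legal XML character
```

A check like this is useful before serializing arbitrary strings, since control characters such as NUL cannot appear in an XML 1.0 document even as character references.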

XML Fundamentals Transparency No. 4 Whitespace White Space: [3] S ::= (#x20 | #x9 | #xD | #xA)+ S (white space) consists of one or more space (#x20) characters, tabs, carriage returns or line feeds. Whitespace can be used to separate otherwise indistinguishable parts of an XML document. …

XML Fundamentals Transparency No. 5 XML Declaration Besides using the file name extension, an XML document may use an XML declaration to identify itself as an XML document. If used, it must occur first in the document (no preceding whitespace allowed). Form: <?xml version="…" encoding="…" standalone="…"?> version: the version of the XML specification, 1.0 or 1.1. encoding: the character encoding of the document, expressed in Latin characters, e.g., UTF-8, UTF-16, or an ISO encoding name. standalone: no means parsing may be affected by the external DTD subset; yes means it is not affected.
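
As a rough illustration of the two points above (the declared encoding governs decoding, and the declaration must come first), here is a small sketch using Python's standard `xml.etree.ElementTree`. The element name `word`, the sample bytes, and the helper `parses` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The declared encoding tells the parser how to decode the bytes that follow.
# 0xE9 is "é" in ISO-8859-1.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><word>caf\xe9</word>'
word = ET.fromstring(latin1_doc)
print(word.text)


def parses(data: bytes) -> bool:
    """Return True if data parses as a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# The declaration is accepted at the very start of the document ...
print(parses(b'<?xml version="1.0"?><word/>'))
# ... but not after leading whitespace.
print(parses(b' <?xml version="1.0"?><word/>'))
```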

XML Fundamentals Transparency No. 6 Elements, tags and character data The previous example is composed of a single element named student. Start-tag: <student>. End-tag: </student>. Everything between the start-tag and the end-tag is called content. Content encompasses the real information. Whitespace is part of the content, though many applications will choose to ignore it. <student> and </student> are markup; 張得功 and its surrounding whitespace are character data.
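
The split between markup and character data can be observed with a parser. This is a minimal sketch using Python's `xml.etree.ElementTree`; the exact whitespace placed around the name is an assumption, since the slide's original layout is not preserved.

```python
import xml.etree.ElementTree as ET

# One element named "student"; the line breaks and indentation are part of its content.
doc = "<student>\n  張得功\n</student>"
student = ET.fromstring(doc)

print(student.tag)           # the element name taken from the start-tag
print(repr(student.text))    # character data, surrounding whitespace included
print(student.text.strip())  # an application may choose to ignore the whitespace
```

The parser reports the whitespace faithfully; dropping it is a decision made by the application, not by XML.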

XML Fundamentals Transparency No. 7 Structure of an element Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, and may have a set of attribute specifications. The name used in the start-tag and the end-tag must be identical. Note: XML is case sensitive, so, for example, <student> and <Student> are different names. Each attribute specification has a name and a value. Element: [39] element ::= EmptyElemTag | STag content ETag

XML Fundamentals Transparency No. 8 Element (cont’d) The text between the start-tag and end-tag is called the element's content. Content of Elements: [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* If an element is empty, it must be represented either by a start-tag immediately followed by an end-tag or by an empty-element tag. Tags for Empty Elements: [44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY.

XML Fundamentals Transparency No. 9 Examples of empty elements <IMG align="left" src="…" />
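
The two spellings of an empty element are equivalent to a parser. The sketch below, using Python's `xml.etree.ElementTree`, parses both forms; the element name `br` is chosen only for illustration.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<br/>")      # empty-element tag
long_form = ET.fromstring("<br></br>")   # start-tag immediately followed by end-tag

for elem in (short_form, long_form):
    # Same name, no character data, no child elements in either case.
    print(elem.tag, elem.text, len(elem))

# Serializing the long form again yields an empty-element tag.
print(ET.tostring(long_form, encoding="unicode"))
```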

XML Fundamentals Transparency No. 10 Start tag with attribute and end tag [Annotated figure of a start-tag and end-tag in a document. Labels: name (or type) of the element; name of the attribute; value or values of the attribute; single or double quotes (' or ") must match; each element may contain zero or more attributes; start-tag and end-tag names must match.]

XML Fundamentals Transparency No. 11 Attributes Attributes attach additional information to elements. An attribute is a name-value pair attached to an element’s start-tag. One element can have more than one attribute. Name and value are separated by = and optional whitespace. The attribute value is enclosed in double or single quotation marks. Attribute order is not significant. 趙得勝
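
A short sketch of these attribute rules with Python's `xml.etree.ElementTree` follows. The attribute names `id` and `dept` and their values are hypothetical; the slide's own example markup is not preserved.

```python
import xml.etree.ElementTree as ET

# Whitespace around "=" is optional, and either quote style may be used.
a = ET.fromstring("<student id=\"s1\" dept = 'cs'>趙得勝</student>")
# The same attributes written in the opposite order.
b = ET.fromstring('<student dept="cs" id="s1">趙得勝</student>')

print(a.attrib)
print(a.attrib == b.attrib)  # attribute order carries no meaning
```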

XML Fundamentals Transparency No. 12 Start Tag Start-tag: [40] STag ::= '<' Name (S Attribute)* S? '>' [ WFC: Unique Att Spec ] [41] Attribute ::= Name Eq AttValue Example: … End-tag: [42] ETag ::= '</' Name S? '>' Example: … vs …

XML Fundamentals Transparency No. 13 Use attribute or element? Should one use child elements or attributes to hold information? Attributes are for metadata about the element, while elements are for the information itself. Each element may have no more than one attribute with a given name. The value of an attribute is simply a text string, so it is limited in structure. An element-based structure is a lot more flexible and extensible. If you are designing your own XML vocabulary, it is up to you to decide when to use which.

XML Fundamentals Transparency No. 14 XML Names Rules for naming elements, attributes… Names and Tokens: [4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender [5] Name ::= (Letter | '_' | ':') (NameChar)* [6] Names ::= Name (#x20 Name)* [7] Nmtoken ::= (NameChar)+ [8] Nmtokens ::= Nmtoken (#x20 Nmtoken)* Names beginning with (x|X)(m|M)(l|L) are reserved. Name is used for naming elements, attributes, entities, etc., and for the values of ID and IDREF attributes (Names for IDREFS). Nmtoken (Nmtokens) is used for the values of NMTOKEN (NMTOKENS) attributes.
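
The Name and Nmtoken productions can be approximated with regular expressions. The sketch below is deliberately ASCII-only: it treats Letter as A-Z and a-z and Digit as 0-9, and omits CombiningChar and Extender, so it rejects legal non-ASCII names such as Chinese element names. The helper names are illustrative.

```python
import re

# ASCII-only approximation of productions [5] Name and [7] Nmtoken.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def is_name(s: str) -> bool:
    """True if s is an (ASCII) Name: a letter, '_' or ':' followed by name characters."""
    return NAME.fullmatch(s) is not None


def is_nmtoken(s: str) -> bool:
    """True if s is an (ASCII) Nmtoken: one or more name characters."""
    return NMTOKEN.fullmatch(s) is not None


def is_reserved(s: str) -> bool:
    """True if s begins with 'xml' in any mix of upper and lower case."""
    return s[:3].lower() == "xml"


for candidate in ("student", "_x-1.y", "1st", "XmlData"):
    print(candidate, is_name(candidate), is_nmtoken(candidate), is_reserved(candidate))
```

The difference between the two productions shows up with "1st": it cannot start a Name because of the leading digit, but it is a valid Nmtoken.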

XML Fundamentals Transparency No. 15 Attribute values AttValues (attribute value literals) are the strings that can occur as an attribute value. [10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'" An AttValue is enclosed by double or single quotes. It can contain entity or character references and any character data except <, &, and the enclosing quote character.
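
When a raw string has to be placed in an attribute, the characters excluded by AttValue must be replaced by references. One way to do this in Python is `xml.sax.saxutils.quoteattr`, which escapes the value and picks a quote style; the sketch below checks the result by parsing it back. The element name `expr` and attribute name `value` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

raw = 'x < y & "z"'

# quoteattr returns the value already wrapped in quotes, with <, & and,
# where needed, the quote character replaced by references.
literal = quoteattr(raw)
doc = f"<expr value={literal}/>"
print(doc)

# Parsing expands the references again, giving back the original string.
elem = ET.fromstring(doc)
print(elem.get("value"))
```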

XML Fundamentals Transparency No. 16 Comments Comments may appear 1. anywhere in a document outside other markup; 2. within the document type declaration at places allowed by the grammar. They are not part of the document's character data. The string "--" (double-hyphen) must not occur within comments. Comments: [15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->' Example: <!-- declarations for <head> & <body> -->

XML Fundamentals Transparency No. 17 Processing Instructions (PIs) Processing instructions (PIs) allow documents to contain instructions for applications. Processing Instructions: [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) The PI begins with a target (PITarget) used to identify the application. The target names "XML", "xml", and so on are reserved for standardization. Ex: …

XML Fundamentals Transparency No. 18 Processing instructions and comments A processing instruction may contain any characters except the string "?>"; a comment may contain any characters except the string "--".
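
Comments and processing instructions are visible to applications that use a DOM-style API. The sketch below uses Python's `xml.dom.minidom` to list the top-level nodes of a small document; the PI target `render`, its data, and the element name `report` are made up for the example.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString('<?render mode="draft"?><!-- a comment --><report/>')

found = []
for node in doc.childNodes:
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        found.append(("pi", node.target, node.data))   # target and the rest of the PI
    elif node.nodeType == Node.COMMENT_NODE:
        found.append(("comment", node.data))           # text between <!-- and -->
    elif node.nodeType == Node.ELEMENT_NODE:
        found.append(("element", node.tagName))
print(found)


def parses(text: str) -> bool:
    """Return True if text is accepted by the parser."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# A double hyphen inside a comment makes the document ill-formed.
print(parses("<!-- a -- b --><report/>"))
```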

XML Fundamentals Transparency No. 19 XML Document [1] document ::= prolog element Misc* The element is called the root or document element of the document. [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)? [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' [27] Misc ::= Comment | PI | S

XML Fundamentals Transparency No. 20 Character references What if the character data inside an element contains "<"? Ex: x+1 < … Instead of using '<', we can use a reference to its character code (60): &#60; (decimal 60) or &#x3c; (hexadecimal 3c).

XML Fundamentals Transparency No. 21 Entity reference A numeric code is hard to remember. We can use a name to denote a character or a sequence of characters; such a name is called an entity. Entity reference: if xxx is an entity, then &xxx; is its entity reference. When parsing an XML document, the XML processor replaces the entity reference with the actual characters to which it refers. XML predefines 5 entity references (you can define more): &lt; : the less-than sign (<); &amp; : the ampersand (&); &gt; : the greater-than sign (>); &quot; : the straight double quotation mark ("); &apos; : the straight single quote (').
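
Both kinds of reference are expanded by the parser before the application sees the text. The following sketch, using Python's `xml.etree.ElementTree`, shows character references, the five predefined entity references, and what happens with a name that has not been declared. The element name `expr` and the entity name `nbsp` are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Character references by decimal and hexadecimal code.
by_number = ET.fromstring("<expr>x+1 &#60; y &#x3c; z</expr>")
print(by_number.text)

# The five predefined entity references.
by_name = ET.fromstring("<expr>&lt; &amp; &gt; &quot; &apos;</expr>")
print(by_name.text)

# Any other entity must be declared before it is referenced; "nbsp" is not
# predefined in XML, so this document is rejected.
try:
    ET.fromstring("<expr>&nbsp;</expr>")
    undefined_ok = True
except ET.ParseError:
    undefined_ok = False
print(undefined_ok)
```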

XML Fundamentals Transparency No. 22 CDATA Section What if my element content has a lot of special characters? Ex: x < y && z < 1 Solution 1: x &lt; y &amp;&amp; z &lt; 1 (hard to read). Solution 2: <![CDATA[ x < y && z < 1 ]]>

XML Fundamentals Transparency No. 23 CDATA Sections CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. They begin with the string "<![CDATA[" and end with the string "]]>". CDATA Sections: [18] CDSect ::= CDStart CData CDEnd [19] CDStart ::= '<![CDATA[' [20] CData ::= (Char* - (Char* ']]>' Char*)) [21] CDEnd ::= ']]>' Within a CDATA section, only the CDEnd string ']]>' is recognized as markup, so that left angle brackets '<' and ampersands '&' may, and must, occur in their literal form. Example: <![CDATA[<greeting>Hello, world!</greeting>]]>
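
A CDATA section and the equivalent escaped text produce the same character data. The sketch below checks this with Python's `xml.etree.ElementTree`; the wrapper element names `expr` and `doc` are assumptions added so that each sample is a complete document.

```python
import xml.etree.ElementTree as ET

escaped = ET.fromstring("<expr>x &lt; y &amp;&amp; z &lt; 1</expr>")
cdata = ET.fromstring("<expr><![CDATA[x < y && z < 1]]></expr>")

print(cdata.text)
print(escaped.text == cdata.text)  # two spellings of the same content

# Inside CDATA, text that looks like tags is plain character data.
greeting = ET.fromstring("<doc><![CDATA[<greeting>Hello, world!</greeting>]]></doc>")
print(greeting.text, len(greeting))
```

Note that the parsed tree does not record which spelling was used; an application that needs to preserve CDATA sections has to use an API that exposes them.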

XML Fundamentals Transparency No. 24 Character Data and Markup An XML document consists of intermingled character data and markup. Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and white space outside the root element. All text that is not markup constitutes the character data of the document. That is, character data may occur in the content of an element or in the content of a CDATA section.

XML Fundamentals Transparency No. 25 Character Data and Markup (cont’d) In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>". To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;". Character Data: [14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*) i.e., any string containing none of "<", "&", or "]]>".

XML Fundamentals Transparency No. 26 Possible contents of an element Element: [39] element ::= EmptyElemTag | STag content ETag Content of Elements: [43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* In addition to character data and child elements, an element may also contain references, PIs, comments, or CDATA sections as children.
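
The content production allows character data to be interleaved with child elements, PIs, and comments. The sketch below shows how Python's `xml.etree.ElementTree` exposes such mixed content: text before the first child is stored in `text`, and text following a child is stored in that child's `tail`. The element names and the PI are hypothetical, and this parser, by default, leaves comments and PIs out of the tree.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring("<p>Hello <b>bold</b> world<?hint x?><!-- note --></p>")

print(repr(para.text))      # character data before the first child element
print(para[0].tag)          # the child element
print(repr(para[0].tail))   # character data that follows the child
print(len(para))            # comments and PIs are not kept by default

# All character data of the element, in document order.
print("".join(para.itertext()))
```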

XML Fundamentals Transparency No. 27 Rules for well-formed XML Documents Rule 1: Balance start and end tags. The set of tags is unlimited, but all start tags must have matching end tags. Example of legal XML: DeTsi Wang 20 Rule 2: There must be exactly one root element.

XML Fundamentals Transparency No. 28 Rules for well-formed XML Documents Rule 3: Proper element nesting. All tags must be nested correctly. Like HTML, XML can intermix tags and text, but tags may not overlap each other. Legal XML: DeTsi Wang 20 Illegal XML: <b>This text is bold <i>and italic</b> and italic</i>

XML Fundamentals Transparency No. 29 Rules for well-formed XML Documents Rule 4: Attribute values must be single or double quoted. Legal: … Illegal: … Rule 5: An element may not have two attributes with the same name. Rule 6: Comments and processing instructions may not appear inside tags. Illegal: … size = "6" /> Rule 7: No unescaped < or & signs may occur in the character data of an element or in attribute values. Illegal: 20&3 (write 20&amp;3 instead).
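
The seven rules can be exercised with any conforming parser, since a violation of any of them must be reported as an error. The sketch below wraps Python's `xml.etree.ElementTree` in a small helper; the helper name `is_well_formed` and all element and attribute names in the samples are illustrative, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if doc parses without a well-formedness error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


samples = {
    "rules 1-3 satisfied": "<student><name>DeTsi Wang</name><age>20</age></student>",
    "rule 1: missing end tag": "<student><name>DeTsi Wang</student>",
    "rule 2: two root elements": "<a/><b/>",
    "rule 3: overlapping tags": "<b>bold <i>both</b> italic</i>",
    "rule 4: unquoted attribute value": "<font size=6/>",
    "rule 5: duplicate attribute": '<font size="6" size="7"/>',
    "rule 6: comment inside a tag": '<font <!-- c --> size="6"/>',
    "rule 7: unescaped ampersand": "<age>20&3</age>",
    "rule 7 satisfied": "<age>20&amp;3</age>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

This only checks well-formedness; whether a document is valid against a DTD is a separate question that this parser does not answer.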