C# and Windows Programming XML Processing

Presentation transcript:

C# and Windows Programming XML Processing

2 Contents: Markup, XML, DTDs, XML Parsers, DOM

3 Markup: When we write text, it is just text. For example: John Smith, 123 Main St., Toronto, Ontario. We can all read this and understand it. A computer cannot; it needs additional information.

4 Markup: Markup is added to documents in the form of tags. A tag consists of text delimited by angle brackets. The name of the tag identifies it and the information it conveys.

5 Markup: Let’s add some semantic markup to our address: John Smith 123 Main St. Toronto Ontario. This identifies the information in the various parts of the address.

6 Markup: You will notice that tags occur in pairs: a start tag and a matching end tag with a “/” before the tag name. The text that the tags describe is enclosed between the start tag and the end tag. A single pair of tags is placed around the entire document. Matching, correctly nested start and end tags inside a single enclosing element make the document well-formed.
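A minimal sketch of what well-formed means in practice, assuming .NET's XmlDocument as the parser. The tag names (address, name, street, city, province, town) and the helper IsWellFormed are hypothetical and chosen only for illustration; they are not taken from the slides.

```csharp
using System;
using System.Xml;

// Hypothetical tag names wrapped around the address from the slides.
string wellFormed =
    "<address>" +
      "<name>John Smith</name>" +
      "<street>123 Main St.</street>" +
      "<city>Toronto</city>" +
      "<province>Ontario</province>" +
    "</address>";

// Here the end tag </town> does not match the start tag <city>.
string broken = "<address><city>Toronto</town></address>";

// Hypothetical helper: true when the parser accepts the text as well-formed XML.
bool IsWellFormed(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

Console.WriteLine(IsWellFormed(wellFormed)); // True
Console.WriteLine(IsWellFormed(broken));     // False
```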

7 XML: XML is the latest in a long line of markup languages. It is the eXtensible Markup Language. Unlike other markup languages, it lets you define your own tags. Any meaning associated with those tags is imposed by your program.

8 Uses of XML: SOAP (Simple Object Access Protocol, a type of remote procedure call), configuration files, web services, security information, electronic document exchange.

9 Defining Documents: If you can define your own tags, how do you know what should be in a document? A Document Type Definition (DTD) defines the allowable tags and their order; it is similar to a BNF grammar. A Schema, like a DTD, describes the tags and their order; it also describes the content which can be placed within the tags.

10 XML Structure: Here is a simple XML document: John Smith 123 Main St. Toronto Ontario

11 Attributes: A tag can also have attributes, which provide additional information about the tag: Toronto. A tag can have zero or more attributes.

12 The XML Declaration: The first line is the optional XML declaration. It consists of <?xml, which identifies this as the XML declaration, and version="1.0", the version of XML in the document.

13 The XML Declaration: encoding="ISO " gives the character set used in the document; various character sets can be used, including Unicode (UTF-8), an international character set. standalone="no" determines whether the document uses any external entities which are defined in other files; this will be discussed later in the course. In general, the order of attributes is not important, but it is in the XML declaration.
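A short sketch of how the three parts of the declaration surface in .NET, assuming XmlDocument is used to parse. The element name address and the UTF-8 encoding value are illustrative assumptions.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// The declaration must be the very first thing in the document text.
doc.LoadXml("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><address/>");

string version = "", encoding = "", standalone = "";

// When present, the declaration is the first child of the document node.
if (doc.FirstChild is XmlDeclaration declaration)
{
    version = declaration.Version;
    encoding = declaration.Encoding;
    standalone = declaration.Standalone;
}

Console.WriteLine(version);    // 1.0
Console.WriteLine(encoding);   // UTF-8
Console.WriteLine(standalone); // no
```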

14 The DOCTYPE Declaration: The optional DOCTYPE declaration follows the XML declaration. It is required only if you want to validate the document against a definition of the tags in the document.

15 The Root Element: This is the element which begins the document. It is the first element in the document, and it contains all other elements in the document.

16 Elements: An element consists of a start tag, character data, and an end tag: John Smith. A tag name must start with a letter or underscore. A tag name cannot contain spaces, and colons are reserved for namespace prefixes. The end tag must match the start tag exactly, including case.

17 Mixed Content: If an element contains just text, it has simple content: John Smith. If it contains a mix of text and elements, it is said to have mixed content: these are nested correctly.

18 Attributes: Attributes are name-value pairs which can be added to elements. Attributes allow you to provide additional information without changing the tag itself. The names for attributes follow the same rules as tag names. Every attribute name within the same tag must be unique.

19 Attributes: accountant sales. Note that these both contain a name attribute. That is OK, since the attributes are in separate elements. Attribute values are placed in either single or double quotes.
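The attribute rules can be tried out with a small sketch. The elements staff, position and department and the attribute values p1 and d1 are hypothetical stand-ins, not the markup from the slide.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// Two different elements each carry a "name" attribute; one value uses
// double quotes and the other single quotes.
doc.LoadXml("<staff><position name=\"p1\">accountant</position>" +
            "<department name='d1'>sales</department></staff>");

var position = (XmlElement)doc.GetElementsByTagName("position")[0];
var department = (XmlElement)doc.GetElementsByTagName("department")[0];
Console.WriteLine(position.GetAttribute("name"));   // p1
Console.WriteLine(department.GetAttribute("name")); // d1

// Repeating an attribute name inside one tag is rejected by the parser.
bool duplicateRejected = false;
try
{
    new XmlDocument().LoadXml("<position name='p1' name='p2'/>");
}
catch (XmlException)
{
    duplicateRejected = true;
}
Console.WriteLine(duplicateRejected); // True
```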

20 Comments: Comments are delimited by special brackets: <!-- and -->. Comments can add explanations, or remove XML which is not needed for a while.

21 Entities: The less-than and greater-than signs delimit tags. What if you want to type these symbols in a document and not have them delimit a tag? Then enter them as entities. To enter a less-than sign, type &lt;. All entities are referenced using &, the entity name, and ;.

22 Entities
Entity    Symbol    Description
&lt;      <         Less than
&gt;      >         Greater than
&amp;     &         Ampersand
&quot;    "         Double quote
&apos;    '         Apostrophe
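A brief sketch of the predefined entities in both directions, assuming XmlDocument: the parser turns entity references back into characters, and the writer escapes them again. The element name note and its text are made up for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<note>1 &lt; 2 &amp; 3 &gt; 2</note>");
var note = doc.DocumentElement;

// Reading: entity references arrive as ordinary characters.
string decoded = note.InnerText;
Console.WriteLine(decoded); // 1 < 2 & 3 > 2

// Writing: special characters in text are escaped on output.
note.InnerText = "R&D <draft>";
string encoded = note.OuterXml;
Console.WriteLine(encoded); // <note>R&amp;D &lt;draft&gt;</note>
```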

23 CDATA: Sometimes using entities is not enough, since you have many special characters to type. A CDATA section, which begins with <![CDATA[ and ends with ]]>, allows you to enter anything without having special characters interpreted.
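As an illustration, a CDATA section can be read and created through the DOM. This is a sketch assuming XmlDocument; the element name snippet and the section contents are invented for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
// The "<" and "&&" inside the section are not treated as markup.
doc.LoadXml("<snippet><![CDATA[if (a < b && b < c) { }]]></snippet>");

var section = doc.DocumentElement.FirstChild;
Console.WriteLine(section.NodeType); // CDATA
Console.WriteLine(section.Value);    // if (a < b && b < c) { }

// A CDATA section can also be created through the document.
var built = doc.CreateCDataSection("<b>not a tag</b>");
Console.WriteLine(built.OuterXml);   // <![CDATA[<b>not a tag</b>]]>
```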

24 Document Type Definitions: The DTD is one way to describe what should be in a valid XML document. There are other ways, which we will examine later in the course. A DTD describes each element and the elements which can occur within it, the attributes for each element, and the entities which can be used in the document.

25 Person DTD <!DOCTYPE persontype [ <!ELEMENT person (first, last, gender, employee-id) > ]>

26 Reading the DTD: There is an element person containing the elements first, last, gender, and employee-id. These elements are described below. Each of these contains PCDATA, meaning parsed character data. This means that these elements contain only text, not nested tags.
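A hedged sketch of validating against a DTD like this one, assuming .NET's XmlReader with DTD validation switched on. The four #PCDATA declarations are written out from the description above, the element values are invented, and the helper Validate is hypothetical. The DOCTYPE name is person here rather than persontype, because a validating parser requires it to match the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

string dtd =
    "<!DOCTYPE person [" +
    "<!ELEMENT person (first, last, gender, employee-id)>" +
    "<!ELEMENT first (#PCDATA)>" +
    "<!ELEMENT last (#PCDATA)>" +
    "<!ELEMENT gender (#PCDATA)>" +
    "<!ELEMENT employee-id (#PCDATA)>" +
    "]>";

// Hypothetical helper: returns the validation messages for a document.
List<string> Validate(string xml)
{
    var errors = new List<string>();
    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse, // DTDs are prohibited by default
        ValidationType = ValidationType.DTD
    };
    settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

    using (var reader = XmlReader.Create(new StringReader(xml), settings))
    {
        while (reader.Read()) { } // reading the document drives validation
    }
    return errors;
}

var validErrors = Validate(dtd +
    "<person><first>John</first><last>Smith</last>" +
    "<gender>M</gender><employee-id>42</employee-id></person>");

// gender and employee-id are missing, so the content model is violated.
var invalidErrors = Validate(dtd +
    "<person><first>John</first><last>Smith</last></person>");

Console.WriteLine(validErrors.Count == 0);  // True
Console.WriteLine(invalidErrors.Count > 0); // True
```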

27 XML Parsers: There are two types of XML parsers. DOM (the Document Object Model) parses the document into a tree-like structure called a DOM; the document is parsed all at once. SAX (the Simple API for XML) is a sequential parser which executes a callback when each part of the document is recognized; this is good for very large documents, since the entire document does not have to be in memory at once.
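The two styles can be compared with a rough sketch. Note the assumption: .NET does not ship a SAX parser, so the sequential side uses XmlReader, which is a forward-only pull parser (the program asks for the next node) rather than SAX's callback model. The people/person document is made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;

string xml = "<people><person>Ann</person><person>Bob</person></people>";

// DOM style: parse everything into a tree, then query the tree.
var doc = new XmlDocument();
doc.LoadXml(xml);
int domCount = doc.GetElementsByTagName("person").Count;

// Sequential style: visit nodes one at a time without building a tree.
int streamCount = 0;
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "person")
        {
            streamCount++;
        }
    }
}

Console.WriteLine($"{domCount} {streamCount}"); // 2 2
```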

28 What is DOM? DOM is an in-memory data structure. It describes an XML document as a tree structure. The nodes in the tree are described by the interface to them. This means that there can be many implementations that implement the interface.

29 So, how do we make a document into a tree? [Figure: DOM tree of a sample document. Node labels: Document, friend, whitespace, handle, Harold, whitespace, degree, close. Legend: Root, Element, Text, Attribute.]

30 Nodes: All nodes in a DOM implement the Node interface. All other interfaces in the tree extend the Node interface. This means that every node can be treated as a Node, and maybe more.

31 XmlNode: Represents every node in the DOM. Properties: ParentNode, Name, FirstChild, NextSibling, PreviousSibling, Value.

32 XmlNode: Methods: InsertBefore(), AppendChild(), RemoveChild(), Clone().
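A small sketch exercising these XmlNode members on a made-up document; the element names list, a, b, c and d are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<list><a/><c/></list>");
var root = doc.DocumentElement;
var a = root.FirstChild;
var c = a.NextSibling;

root.InsertBefore(doc.CreateElement("b"), c); // a, b, c
root.AppendChild(doc.CreateElement("d"));     // a, b, c, d
root.RemoveChild(a);                          // b, c, d

// Walk the children using FirstChild and NextSibling.
var names = new List<string>();
for (var node = root.FirstChild; node != null; node = node.NextSibling)
{
    names.Add(node.Name);
}
string joined = string.Join(",", names);
Console.WriteLine(joined); // b,c,d

// Clone() copies the node and its subtree; the copy has no parent yet.
var copy = root.Clone();
Console.WriteLine(copy.ChildNodes.Count);   // 3
Console.WriteLine(copy.ParentNode == null); // True
```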

33 XmlDocument: The node above the root node of the document. Can be used to represent an empty document. Properties: DocumentElement. Methods: CreateElement(), CreateTextNode(), GetElementsByTagName(), Load(), Save().

34 XmlElement: This represents an element. An element can have attributes. Properties: XmlAttributeCollection Attributes. Methods: GetElementsByTagName(), SetAttribute(string name, string value), string GetAttribute(string name).
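A minimal sketch that builds a document from nothing with the XmlDocument and XmlElement members listed above. The element and attribute names (address, city, province, country) are illustrative assumptions.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();                 // starts as an empty document
XmlElement address = doc.CreateElement("address");
doc.AppendChild(address);                    // becomes the DocumentElement

XmlElement city = doc.CreateElement("city");
city.AppendChild(doc.CreateTextNode("Toronto"));
city.SetAttribute("province", "Ontario");
address.AppendChild(city);

string markup = doc.DocumentElement.OuterXml;
Console.WriteLine(markup);
// <address><city province="Ontario">Toronto</city></address>

Console.WriteLine(city.GetAttribute("province"));      // Ontario
// A missing attribute comes back as an empty string, not null.
Console.WriteLine(city.GetAttribute("country") == ""); // True

doc.Save(Console.Out); // writes the document as indented text
```

Save() also accepts a file name or a stream, and Load() reads a document back the same way.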

35 XmlAttribute: This is an attribute. It can have either Text nodes or EntityReferences as children. The Name property gets the name; Value gets the value.

36 XmlText: This is the node representing text. The text has no markup. Even whitespace is represented as a text node.
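How whitespace shows up depends on the implementation. The sketch below assumes .NET's XmlDocument, where whitespace-only nodes are dropped unless PreserveWhitespace is set, and are then reported as Whitespace nodes rather than XmlText. The friend/handle markup borrows names from the tree figure, but its exact structure is an assumption.

```csharp
using System;
using System.Xml;

string xml = "<friend>\n  <handle>Harold</handle>\n</friend>";

// PreserveWhitespace defaults to false: the line breaks and indentation
// between elements do not become nodes.
var stripped = new XmlDocument();
stripped.LoadXml(xml);
int strippedCount = stripped.DocumentElement.ChildNodes.Count;
Console.WriteLine(strippedCount); // 1

// With PreserveWhitespace on, the whitespace before and after <handle>
// is kept as two extra nodes.
var preserved = new XmlDocument { PreserveWhitespace = true };
preserved.LoadXml(xml);
int preservedCount = preserved.DocumentElement.ChildNodes.Count;
var first = preserved.DocumentElement.FirstChild;
Console.WriteLine(preservedCount);   // 3
Console.WriteLine(first.NodeType);   // Whitespace
Console.WriteLine(first is XmlText); // False
```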

37 CDATASection Interface: This is a CDATA section. It is similar to a text node, but the content undergoes no interpretation.

38 Other Node Subinterfaces: Comment, Notation, Entity, EntityReference, ProcessingInstruction. These are all just the same as in XML.

39 Other Node Subinterfaces: DocumentFragment is part of a document tree which can be inserted into another tree. DOMImplementation provides the capabilities of the implementation and has the method for creating a document.

40 Other Node Subinterfaces: DOMException means something went wrong. NodeList is a list of nodes which has an iterator. NamedNodeMap is a map structure holding a collection of nodes.

41 Common .NET DOM Classes: XmlNode, XmlDocument, XmlElement, XmlText, XmlAttribute.

42 XmlNodeList: A list of nodes, returned by GetElementsByTagName(). Properties: Count (the number of nodes in the list) and an indexer (retrieves a node). Methods: Item(int n) (retrieves a node).
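A short sketch of the XmlNodeList members in use; the people/person/pet document is invented for the example.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<people><person>Ann</person><person>Bob</person><pet>Rex</pet></people>");

XmlNodeList people = doc.GetElementsByTagName("person");
Console.WriteLine(people.Count);             // 2
Console.WriteLine(people[0].InnerText);      // Ann  (indexer)
Console.WriteLine(people.Item(1).InnerText); // Bob  (Item method)

// The list can also be enumerated directly.
foreach (XmlNode person in people)
{
    Console.WriteLine(person.Name); // person (printed twice)
}
```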

43 XmlNamedNodeMap: A map of nodes indexed by name. It is the superclass of XmlAttributeCollection, which is returned by the Attributes property. Properties: Count. Methods: Item(int n), GetNamedItem(string name).
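A comparable sketch for XmlNamedNodeMap, reached here through an element's Attributes property. The city element and its attributes are illustrative.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<city name='Toronto' province='Ontario'/>");

// XmlAttributeCollection derives from XmlNamedNodeMap.
XmlNamedNodeMap map = doc.DocumentElement.Attributes;

Console.WriteLine(map.Count);                            // 2
Console.WriteLine(map.Item(0).Name);                     // name
Console.WriteLine(map.GetNamedItem("province").Value);   // Ontario
// Looking up a name that is not present returns null.
Console.WriteLine(map.GetNamedItem("country") == null);  // True
```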

44 Examples: see NodeLister and DocBuilder.