
1 C# and Windows Programming XML Processing

2 Contents
- Markup
- XML
- DTDs
- XML Parsers
- DOM

3 Markup
When we write text, it is just text. For example:
  John Smith
  123 Main St.
  Toronto
  Ontario
We can all read this and understand it. A computer cannot; it needs additional information.

4 Markup
Markup is added to documents in the form of tags. A tag consists of text delimited by angle brackets. The name of the tag identifies it and the information it conveys.

5 Markup
Let's add some semantic markup to our address by wrapping each part (John Smith, 123 Main St., Toronto, Ontario) in its own pair of tags. This identifies the information in the various parts of the address.
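
The tagged address can be sketched in C# as follows. The tag names (address, name, street, city, province) are assumptions made for illustration; the slide's own tag names are not preserved in this transcript.

```csharp
using System;
using System.Xml;

public static class AddressMarkup
{
    // Hypothetical tag names; only the text values come from the slide.
    public const string Xml =
        "<address>" +
        "<name>John Smith</name>" +
        "<street>123 Main St.</street>" +
        "<city>Toronto</city>" +
        "<province>Ontario</province>" +
        "</address>";

    // Returns the text inside the first element with the given tag name.
    public static string Field(string tagName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.GetElementsByTagName(tagName)[0].InnerText;
    }

    public static void Main()
    {
        // With tags in place, a program can ask for a part of the address by name.
        Console.WriteLine(Field("city"));
        Console.WriteLine(Field("province"));
    }
}
```

Once the parts are tagged, the program no longer has to guess which line is the city; it looks the value up by tag name.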

6 Markup
You will notice:
- Tags occur in pairs: a start tag and a matching end tag with a "/" before the tag name.
- The text that the tags describe is enclosed between the start tag and the end tag.
- A single pair of tags is placed around the entire document.
- Every start tag having a matching, correctly nested end tag is what makes the document well-formed.
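
One way to see these rules in action is to hand text to a parser and check whether it is accepted. This is a minimal sketch, assuming that "well-formed" can be tested by whether XmlDocument.LoadXml throws an XmlException; the sample tag names are hypothetical.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true when the parser accepts the text as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<city>Toronto</city>")); // matching end tag
        Console.WriteLine(IsWellFormed("<city>Toronto</town>")); // end tag does not match
        Console.WriteLine(IsWellFormed("<a>1</a><b>2</b>"));     // no single enclosing tag
    }
}
```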

7 XML
XML is the latest in a long line of markup languages. It is the eXtensible Markup Language. Unlike markup languages with a fixed tag set, such as HTML, XML lets you define your own tags. Any meaning associated with those tags is imposed by your program.

8 Uses of XML
- SOAP (Simple Object Access Protocol), a type of remote procedure call
- Configuration files
- Web services
- Security information
- Electronic document exchange

9 Defining Documents
If you can define your own tags, how do you know what should be in a document?
- Document Type Definition (DTD): defines the allowable tags and their order. It is similar to a BNF grammar.
- Schema: like a DTD, it describes the tags and their order. It also describes the content which can be placed within the tags.

10 XML Structure
Here is a simple XML document. It holds the address used earlier: John Smith, 123 Main St., Toronto, Ontario.

11 Attributes
A tag can also have attributes, which provide additional information about the tag. The slide's example adds an attribute to the element containing Toronto. A tag can have zero or more attributes.

12 The XML Declaration
The first line is the optional XML declaration. It consists of:
- <?xml
  Identifies this as the XML declaration.
- version="1.0"
  The version of XML used in the document.

13 The XML Declaration
- encoding="ISO-8859-1"
  The character encoding used in the document. Various encodings can be used, including UTF-8, an encoding of the international Unicode character set.
- standalone="no"
  States whether the document relies on external entities which are defined in other files. This will be discussed later in the course.
In general, the order of attributes is not important, but it is in the XML declaration: version comes first, then encoding, then standalone.
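
The parts of the declaration can be read back through the XmlDeclaration node. This is a sketch only; the root element name person is a placeholder, and because the input is a string rather than a byte stream, the encoding value is reported but not used for decoding.

```csharp
using System;
using System.Xml;

public static class DeclarationDemo
{
    // Loads the text and returns its XML declaration node, or null if there is none.
    public static XmlDeclaration Read(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.FirstChild as XmlDeclaration;
    }

    // Returns true if the parser accepts the text.
    public static bool Parses(string xml)
    {
        try { new XmlDocument().LoadXml(xml); return true; }
        catch (XmlException) { return false; }
    }

    public static void Main()
    {
        var decl = Read(
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"no\"?><person/>");
        Console.WriteLine(decl.Version);
        Console.WriteLine(decl.Encoding);
        Console.WriteLine(decl.Standalone);

        // Order matters inside the declaration: encoding before version is rejected.
        Console.WriteLine(Parses("<?xml encoding=\"UTF-8\" version=\"1.0\"?><person/>"));
    }
}
```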

14 The DOCTYPE Declaration
The optional DOCTYPE declaration follows the XML declaration. It is required only if you want to validate the document against a definition of the tags in the document.

15 The Root Element
This is the element which begins the document. It is the first element in the document, and it contains all other elements in the document.

16 Elements
An element consists of a start tag, character data, and an end tag; for example, an element whose character data is John Smith.
- A tag name must start with a letter or underscore.
- A tag name cannot contain spaces, and colons are reserved for namespace prefixes.
- The end tag must match the start tag exactly, including case.

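17 Mixed Content
If an element contains just text, it has simple content; an element holding only John Smith is an example. If it contains a mix of text and elements, it is said to have mixed content. The slide's example is the sentence "these are nested correctly" with tags around some of its words.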

18 Attributes
Attributes are name-value pairs which can be added to elements. They allow you to provide additional information without changing the tag itself. The names for attributes follow the same rules as tag names. Every attribute name within the same tag must be unique.

19 Attributes
The slide shows two elements, one containing accountant and one containing sales. Note that these both contain a name attribute. That is OK since the attributes are in separate elements. Attribute values are placed in either single or double quotes.

20 Comments
Comments are delimited by special brackets: they start with <!-- and end with -->.
Comments can:
- Add explanations
- Temporarily remove XML which is not needed for a while

21 Entities
The less than and greater than signs delimit tags. What if you want to type these symbols in a document and not have them delimit a tag? Then enter them as entities. To enter a less than sign, type &lt;
All entities are referenced using:
- &
- The entity name
- ;

22 Entities
Entity    Symbol   Description
&lt;      <        Less than
&gt;      >        Greater than
&amp;     &        Ampersand
&quot;    "        Double quote
&apos;    '        Apostrophe
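
In practice the DOM classes apply these entities for you. The sketch below, which uses a hypothetical element named note, shows text being escaped when it is written out as XML and restored when it is parsed.

```csharp
using System;
using System.Xml;

public static class EntityDemo
{
    // Puts raw text into an element and returns the XML form of that content.
    public static string Escape(string text)
    {
        var doc = new XmlDocument();
        XmlElement note = doc.CreateElement("note"); // hypothetical tag name
        note.InnerText = text;
        return note.InnerXml;
    }

    // Parses element content containing entities and returns the plain text.
    public static string Unescape(string content)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<note>" + content + "</note>");
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(Escape("1 < 2 & 3"));
        Console.WriteLine(Unescape("1 &lt; 2 &amp; 3"));
    }
}
```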

23 CDATA
Sometimes using entities is not enough, since you have many special characters to type. A CDATA section allows you to enter anything without having special characters interpreted. It starts with <![CDATA[ and ends with ]]>.
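
A short sketch of a CDATA section built through the DOM. The element name script and the sample text are assumptions chosen only to include several special characters.

```csharp
using System;
using System.Xml;

public static class CDataDemo
{
    // Wraps raw text in a CDATA section inside a hypothetical <script> element.
    public static string Wrap(string raw)
    {
        var doc = new XmlDocument();
        XmlElement script = doc.CreateElement("script");
        script.AppendChild(doc.CreateCDataSection(raw));
        return script.OuterXml;
    }

    // Parses XML and returns the text content of the root element.
    public static string ReadBack(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        string xml = Wrap("if (a < b && c > d) {}");
        Console.WriteLine(xml);           // the special characters appear unescaped
        Console.WriteLine(ReadBack(xml)); // and come back unchanged
    }
}
```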

24 Document Type Definitions
The DTD is one way to describe what should be in a valid XML document. There are other ways, which we will examine later in the course.
A DTD:
- Describes each element and the elements which can occur within it
- Describes the attributes for each element
- Describes entities which can be used in the document

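25 Person DTD
<!DOCTYPE person [
  <!ELEMENT person (first, last, gender, employee-id)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT gender (#PCDATA)>
  <!ELEMENT employee-id (#PCDATA)>
]>
The name after DOCTYPE must match the root element, person.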

26 Reading the DTD
There is an element person containing the elements:
- first
- last
- gender
- employee-id
These elements are described below the person declaration. Each of them contains PCDATA, meaning parsed character data. This means that these elements contain only text, not nested tags.
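
A DTD like this can be checked with a validating XmlReader. The sketch below is illustrative rather than the course's own code: the sample values (John, Smith, M, 42) are made up, the #PCDATA declarations are written out as the slide describes them, and the DOCTYPE name is set to person because a validating parser requires it to match the root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PersonDtdCheck
{
    const string Dtd =
        "<!DOCTYPE person [" +
        "<!ELEMENT person (first, last, gender, employee-id)>" +
        "<!ELEMENT first (#PCDATA)>" +
        "<!ELEMENT last (#PCDATA)>" +
        "<!ELEMENT gender (#PCDATA)>" +
        "<!ELEMENT employee-id (#PCDATA)>" +
        "]>";

    // Validates a <person> element against the DTD and returns any error messages.
    public static List<string> Validate(string body)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            ValidationType = ValidationType.DTD
        };
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(Dtd + body), settings))
        {
            while (reader.Read()) { } // reading the whole document drives validation
        }
        return errors;
    }

    public static void Main()
    {
        string ok = "<person><first>John</first><last>Smith</last>" +
                    "<gender>M</gender><employee-id>42</employee-id></person>";
        string missingGender = "<person><first>John</first><last>Smith</last>" +
                               "<employee-id>42</employee-id></person>";

        Console.WriteLine(Validate(ok).Count);
        foreach (string message in Validate(missingGender))
            Console.WriteLine(message);
    }
}
```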

27 XML Parsers
There are two types of XML parsers:
- DOM (the Document Object Model)
  This parses the document into a tree-like structure called a DOM. The document is parsed all at once.
- SAX (Simple API for XML)
  This is a sequential parser which executes a callback when each part of the document is recognized. This is good for very large documents, since the entire document does not have to be in memory at once.
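
The two approaches can be compared in C#. Note one substitution: the .NET base class library provides XmlReader, a forward-only pull parser, rather than a callback-driven SAX API, so the sketch uses XmlReader as the sequential counterpart to the DOM. The sample document is hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ParserComparison
{
    // DOM style: the whole document is loaded into a tree, then queried.
    public static int CountWithDom(string xml, string name)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName(name).Count;
    }

    // Sequential style: nodes are visited one at a time and then discarded.
    public static int CountWithReader(string xml, string name)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = "<staff><person/><person/><person/></staff>";
        Console.WriteLine(CountWithDom(xml, "person"));
        Console.WriteLine(CountWithReader(xml, "person"));
    }
}
```

Both methods give the same count; the difference is that the reader version never holds more than the current node, which is why the sequential style suits very large documents.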

28 What is DOM?
DOM is an in-memory data structure. It describes an XML document as a tree structure. The nodes in the tree are described by the interface to them, which means that there can be many implementations of that interface.

29 So, how do we make a document into a tree?
[Figure: DOM tree for a small sample document. Node labels: Document, friend, whitespace, handle, Harold, whitespace, degree, close. Legend: Root, Element, Text, Attribute.]
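
A tree like this can be printed by walking the nodes recursively. The sample document below is an assumption pieced together from the figure's labels (friend, handle, Harold, degree, close); in particular, which element carries the degree attribute is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TreeLister
{
    // Returns one indented line per node, with attributes listed under their element.
    public static List<string> Describe(XmlNode node)
    {
        var lines = new List<string>();
        Walk(node, 0, lines);
        return lines;
    }

    static void Walk(XmlNode node, int depth, List<string> lines)
    {
        lines.Add(new string(' ', depth * 2) + node.NodeType + ": " + node.Name);

        // Attributes are not children in the DOM; they hang off the element separately.
        if (node.Attributes != null)
        {
            foreach (XmlAttribute a in node.Attributes)
                lines.Add(new string(' ', (depth + 1) * 2) + "Attribute: " + a.Name + "=" + a.Value);
        }

        foreach (XmlNode child in node.ChildNodes)
            Walk(child, depth + 1, lines);
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        // Hypothetical document reconstructed from the figure's labels.
        doc.LoadXml("<friend degree=\"close\"><handle>Harold</handle></friend>");
        foreach (string line in Describe(doc))
            Console.WriteLine(line);
    }
}
```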

30 Nodes
All nodes in a DOM implement the Node interface. All other interfaces in the tree extend the Node interface. This means that every node can be treated as a Node, and possibly as something more specific.

31 XmlNode
The base class for every node in the DOM.
Properties:
- ParentNode
- Name
- FirstChild
- NextSibling
- PreviousSibling
- Value

32 XmlNode
Methods:
- InsertBefore()
- AppendChild()
- RemoveChild()
- Clone()
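
The properties and methods above combine as in this small sketch. The element names (list, a, b, c, d) are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeEditing
{
    // Lists the children of a node by following FirstChild and NextSibling.
    public static string ChildNames(XmlNode parent)
    {
        var names = new List<string>();
        for (XmlNode n = parent.FirstChild; n != null; n = n.NextSibling)
            names.Add(n.Name);
        return string.Join(",", names);
    }

    // Starts from <list><b/><c/></list> and edits the child list.
    public static XmlNode BuildEdited()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<list><b/><c/></list>");
        XmlNode root = doc.DocumentElement;

        root.InsertBefore(doc.CreateElement("a"), root.FirstChild); // a,b,c
        root.AppendChild(doc.CreateElement("d"));                   // a,b,c,d
        root.RemoveChild(root.LastChild.PreviousSibling);           // removes c
        return root;
    }

    public static void Main()
    {
        XmlNode root = BuildEdited();
        Console.WriteLine(ChildNames(root));

        // Clone() copies the node together with its subtree.
        XmlNode copy = root.Clone();
        Console.WriteLine(ChildNames(copy));
        Console.WriteLine(ReferenceEquals(copy, root));
    }
}
```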

33 XmlDocument
The node above the root node of the document. It can be used to represent an empty document.
Properties:
- DocumentElement
Methods:
- CreateElement()
- CreateTextNode()
- GetElementsByTagName()
- Load()
- Save()
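
Building a document from scratch with these methods might look like the following. This is a sketch, not the course's DocBuilder example; the element names follow the person DTD from earlier slides and the values are made up.

```csharp
using System;
using System.Xml;

public static class DocumentBuilding
{
    // Creates <person><first>John</first><last>Smith</last></person> node by node.
    public static XmlDocument Build()
    {
        var doc = new XmlDocument(); // starts out as an empty document

        XmlElement person = doc.CreateElement("person");
        doc.AppendChild(person);     // person becomes the DocumentElement

        XmlElement first = doc.CreateElement("first");
        first.AppendChild(doc.CreateTextNode("John"));
        person.AppendChild(first);

        XmlElement last = doc.CreateElement("last");
        last.AppendChild(doc.CreateTextNode("Smith"));
        person.AppendChild(last);

        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Build();
        Console.WriteLine(doc.DocumentElement.Name);
        doc.Save(Console.Out);       // Save() also accepts a file name or a stream
    }
}
```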

34 XmlElement
This represents an element. An element can have attributes.
Properties:
- XmlAttributeCollection Attributes
Methods:
- GetElementsByTagName()
- SetAttribute(string name, string value)
- string GetAttribute(string name)
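
A brief sketch of setting and reading attributes on an element. The tag and attribute names (address, city, province) are hypothetical.

```csharp
using System;
using System.Xml;

public static class ElementAttributes
{
    // Returns the <city> element after adding a province attribute to it.
    public static XmlElement CityWithProvince()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<address><city>Toronto</city></address>");

        var city = (XmlElement)doc.GetElementsByTagName("city")[0];
        city.SetAttribute("province", "Ontario");
        return city;
    }

    public static void Main()
    {
        XmlElement city = CityWithProvince();
        Console.WriteLine(city.OuterXml);
        Console.WriteLine(city.GetAttribute("province"));
        Console.WriteLine(city.Attributes.Count);

        // A missing attribute yields an empty string rather than null.
        Console.WriteLine(city.GetAttribute("country") == "");
    }
}
```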

35 XmlAttribute
This is an attribute. It can have either Text nodes or EntityReferences as children. The Name property gets the name, and the Value property gets the value.

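36 XmlText
This is the node representing text. The text has no markup. In the DOM, even whitespace between elements is represented as a node. XmlDocument keeps such whitespace-only nodes only when its PreserveWhitespace property is true, and it then represents them as XmlWhitespace nodes.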

37 CDATASection Interface
This is a CDATA section. It is similar to a text node, but the content undergoes no interpretation.

38 Other Node Subinterfaces
- Comment
- Notation
- Entity
- EntityReference
- ProcessingInstruction
Each of these corresponds to the XML construct of the same name.

39 Other Node Subinterfaces
- DocumentFragment
  Part of a document tree which can be inserted into another tree.
- DOMImplementation
  Provides the capabilities of the implementation. It has the method for creating a document.

40 Other Node Subinterfaces
- DOMException
  Something went wrong.
- NodeList
  A list of nodes which has an iterator.
- NamedNodeMap
  A map structure holding a collection of nodes.

41 Common .NET DOM Classes
[Figure: class hierarchy with XmlNode at the top and XmlDocument, XmlElement, XmlText, and XmlAttribute deriving from it.]

42 XmlNodeList
A list of nodes, returned by GetElementsByTagName().
Properties:
- Count: the number of nodes in the list
- Indexer: retrieves a node
Methods:
- Item(int n): retrieves a node
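
A minimal sketch of stepping through an XmlNodeList with Count, the indexer, and Item(). The document content is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class NodeListDemo
{
    // Collects the text of every element with the given tag name.
    public static List<string> Texts(string xml, string tag)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNodeList nodes = doc.GetElementsByTagName(tag);

        var result = new List<string>();
        for (int i = 0; i < nodes.Count; i++)
            result.Add(nodes[i].InnerText); // nodes.Item(i) returns the same node
        return result;
    }

    public static void Main()
    {
        string xml = "<staff><name>Ann</name><name>Bob</name></staff>";
        foreach (string text in Texts(xml, "name"))
            Console.WriteLine(text);
    }
}
```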

43 XmlNamedNodeMap
A map of nodes indexed by name. It is the superclass of XmlAttributeCollection, which is returned by the Attributes property.
Properties:
- Count
Methods:
- Item(int n)
- GetNamedItem(string name)
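
The attribute collection of an element can be handled through its XmlNamedNodeMap base class, as in this sketch. The element and attribute names (employee, name, id) are assumptions.

```csharp
using System;
using System.Xml;

public static class NamedNodeMapDemo
{
    // Returns the attributes of a hypothetical <employee> element as a named node map.
    public static XmlNamedNodeMap AttributesOf(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.Attributes;
    }

    public static void Main()
    {
        XmlNamedNodeMap map = AttributesOf("<employee name=\"accountant\" id=\"7\"/>");

        Console.WriteLine(map.Count);
        Console.WriteLine(map.GetNamedItem("name").Value); // lookup by name
        Console.WriteLine(map.Item(1).Name);               // lookup by position
        Console.WriteLine(map.GetNamedItem("salary") == null);
    }
}
```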

44 Examples
- See NodeLister
- See DocBuilder

