XML IST 421
XML eXtensible Markup Language Used for digital representation of documents Store, process, search, transmit, display and print documents www.w3c.org/xml - current information about XML
XML Basic building block is the element, defined by tags Root element contains all of the other elements Attributes describe properties of elements XML uses delimiters to differentiate markup from character data Markup enclosed between less than < and greater than > is called a tag
XML Elements Name the contents of the element. Typically in pairs with a start and end tag. Some elements take attributes. The structure describes the relationship between the elements. Example: <order_no>101</order_no>
XML Elements Names may start with a letter or an underscore May consist of: Letters Digits Underscore character Dot Hyphen Cannot start with the string “xml” (in any mix of upper and lower case)
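The naming rules above can be sketched as a small check. This is a simplified, ASCII-only approximation written for illustration: the XML specification also permits colons and many non-ASCII letters, and the function name is_valid_element_name is made up for this example.

```python
import re

# Simplified, ASCII-only version of the naming rules on the slide:
# the first character is a letter or underscore; the remaining characters
# may be letters, digits, underscores, dots, or hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name: str) -> bool:
    """Return True if name follows the simplified rules above."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["order_no", "_id", "item-1.a", "1st_item", "xmlData", "my item"]:
    print(candidate, is_valid_element_name(candidate))
```

The first three candidates pass; the last three fail because of a leading digit, the reserved prefix, and a space.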
XML Elements XML names are case sensitive, unlike HTML tags, which are not Must have exactly one root element Similar to the <HTML> … </HTML> pair in an HTML document Programmer defines the root name The XML declaration, when included, must be the first line <?xml version="1.0"?> (Note: ? means information is passed)
XML Comments <!-- ……… --> Any text except a double hyphen (--) can be placed between <!-- and --> Example: <!-- Updated Apr 3 -->
XML Processing Instructions Enables the passing of information to another application Format: <?… ?> The XML declaration uses the same format to specify the version. Example: <?xml version="1.0"?>
Let’s Practice Open an editor Declare the XML version as “1.0” Create a Root Element, called class_listing Note that every element must have both a beginning and ending tag Save as: class_listing.xml Add some other elements! Create data within each element! Test this in the browser.
Review the Practice XML data is hierarchical Elements contained within other elements are called children Elements that contain children are called parent elements – nesting Each XML document contains a root element Element names describe the data
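The parent and child relationships described above can be seen by walking a parsed document. The sketch below uses Python's standard xml.etree.ElementTree module; the student, name, and major elements and their contents are invented for illustration and are not part of the practice exercise.

```python
import xml.etree.ElementTree as ET

# Hypothetical class_listing document; the child element names are made up.
DOC = """<?xml version="1.0"?>
<class_listing>
  <student>
    <name>Pat Smith</name>
    <major>IST</major>
  </student>
  <student>
    <name>Lee Jones</name>
    <major>IST</major>
  </student>
</class_listing>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
for depth, tag in outline(root):
    print("  " * depth + tag)
```

The indentation of the printed outline mirrors the nesting: class_listing is the root, each student is its child, and name and major are children of student.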
Components of XML Documents XML Declaration First line of document Declaration tag begins with <?xml, for example: <?xml version="1.0" encoding="UTF-8" standalone="no"?> May contain 3 attributes: version="1.0" encoding="UTF-8" (default if not given) standalone="yes" or standalone="no" (default) UTF-8 = 8-bit Unicode character-encoding scheme. Others are UTF-16, UTF-32, and ISO-10646-UCS-2.
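As a rough illustration of where the declaration sits, the sketch below builds a tiny document with Python's standard xml.etree.ElementTree module and serializes it with a declaration. It assumes Python 3.8 or later (for the xml_declaration argument); the student element is a made-up placeholder. Note that this library writes the declaration with single quotes, which XML also accepts.

```python
import xml.etree.ElementTree as ET

# Build a minimal document in memory; the student element is hypothetical.
root = ET.Element("class_listing")
ET.SubElement(root, "student").text = "Pat Smith"

# Serialize to bytes with an XML declaration naming the encoding.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# The declaration is the first line of the serialized document.
first_line = data.split(b"\n", 1)[0]
print(first_line.decode("utf-8"))
```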
Attributes Attributes may be attached to elements Attributes have: Names Values Name is separated from value by the = sign Value must be enclosed in quotation marks, either double (" ") or single (' ')
Attributes Creates additional information It is often information about the ELEMENT content Nesting an ELEMENT within others may accomplish the same purpose
Let’s Practice Add an attribute to categorize student status <student status="sr"> Add this attribute to all students Remember to save with .xml extension.
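Once every student carries a status attribute, a program can filter on it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the student names and the "jr" value are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical class listing; only status="sr" comes from the exercise.
DOC = """<class_listing>
  <student status="sr">Pat Smith</student>
  <student status="jr">Lee Jones</student>
  <student status="sr">Sam Brown</student>
</class_listing>"""

root = ET.fromstring(DOC)


def students_with_status(listing, status):
    """Return the text of every student element whose status attribute matches."""
    return [s.text for s in listing.findall("student") if s.get("status") == status]


seniors = students_with_status(root, "sr")
print(seniors)
```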
XML Entities Entities are used as placeholders for content Two types of entities: General Parameter
XML General Entities Placeholders for any information contained in the root element Three types: Character – used in place of special characters Content – used to mark the place of a common block of content that you type often Unparsed – used for binary or nontext data like images or video clips
Character Entities Some tag delimiter characters have special meaning in XML <?xml version="1.0"?> <equation> 50 < 100 </equation> Causes a syntax error because the parser reads < as the start of a tag
Character Entities Solve problem by using character entities: &gt; for > &lt; for < &quot; for " &apos; for '
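A short sketch of the fix, using Python's standard library: xml.sax.saxutils.escape replaces the reserved characters with entity references, and the parser turns them back into the original characters when the document is read. The equation element is taken from the earlier slide.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "50 < 100"

# escape() replaces &, <, and > with the corresponding entity references.
escaped = escape(raw_text)
document = f"<equation>{escaped}</equation>"
print(document)

# Parsing converts the entity reference back into the literal character.
parsed_text = ET.fromstring(document).text
print(parsed_text)
```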
Content Entities Used to mark the place of a common block of content that you type often or that may change Internal entities – defined as part of the DTD within the XML document Example: <!DOCTYPE class_listing [ <!ENTITY campus "Harrisburg"> ]> <class_campus>Penn State &campus;</class_campus> External entities – information saved in an external file with a .xml extension
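The internal entity example above can be tried with Python's standard xml.etree.ElementTree module, which expands entities declared in the internal DTD subset. In this sketch the class_campus element is wrapped in a class_listing root so that the root name matches the DOCTYPE; that wrapping is an assumption added for the example.

```python
import xml.etree.ElementTree as ET

# The campus entity is declared once in the internal DTD subset and
# referenced as &campus; in the document body.
DOC = """<?xml version="1.0"?>
<!DOCTYPE class_listing [
  <!ENTITY campus "Harrisburg">
]>
<class_listing>
  <class_campus>Penn State &campus;</class_campus>
</class_listing>"""

root = ET.fromstring(DOC)
campus_text = root.find("class_campus").text
print(campus_text)
```

Changing the entity declaration changes every place the reference is used, which is the point of a content entity.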
Unparsed Entities Used for binary or nontext data like images or video clips <!ENTITY picture SYSTEM "sunset.gif" NDATA GIF> NDATA = notation data The unparsed entity declaration tells the processing system not to parse the data but rather to pass it through as is.
Well-Formed XML Document that adheres to XML syntax rules – well formed Rules: Must contain only one root element All elements must have a start and end tag Elements must be nested properly and cannot overlap Incorrect: <book><chapter> … </book></chapter> Correct: <book><chapter> … </chapter></book>
Well-Formed XML Rules (cont.) All attributes must have a value and must be enclosed in quotes <student status="sr"> Attributes must be placed in the start tag of an element and may appear only once Element names are case-sensitive <STUDENT> does not match </student>
Well-Formed XML Rules (cont.) Certain markup characters are reserved such as < and >. Must use a character entity instead Element names may start with letters or an underscore; names may contain only letters, numbers, hyphens, periods, and underscore Element names may not start with xml
XML Parser XML Parser is a program that checks an XML document to ensure it follows the rules and is well formed. Nonvalidating parser – looks for syntax errors according to the language rules Validating parser – checks your document against a DTD or schema Like compilers, one error may cause many messages
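A nonvalidating check can be sketched with Python's standard xml.etree.ElementTree module, which raises ParseError when a document breaks the well-formedness rules. The helper name is_well_formed and the sample strings are made up for this illustration; the parser does not check anything against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser reports an error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "proper nesting": "<book><chapter>One</chapter></book>",
    "overlapping tags": "<book><chapter>One</book></chapter>",
    "unescaped <": "<equation>50 < 100</equation>",
    "case mismatch": "<STUDENT>Pat</student>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<student status=sr/>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first sample is well formed; each of the others breaks one of the rules listed on the preceding slides.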
Lab Create an XML document for the following: Camping Trip Gear List The following is a list of items that are essential on any camping trip: Flashlight Hiking boots Sleeping bag Pocket knife Bug spray Compass Hatchet Lantern Shovel Tent Bucket Ground cloth
Document Type Definition (DTD)
DTD Add rules to an XML document that enforce structure Document Type Definition(DTD) XML Schema
DTD Validates a document against its model, i.e. declares what is legal Define the elements your document can contain Define the order in which elements appear Require that certain elements appear Define the allowed number of occurrences of a given element Define the type of data an element can contain Define child elements for a given element Define the attributes for each of your elements Assign constraints to the attribute values
DTD XML documents do not have to include a DTD DTD becomes a way to validate a document and guarantees consistency DTD is important when sharing an XML document with other programs A document is valid if it conforms to a DTD An XML document can be well-formed without being valid
DTD There are 2 kinds of DTD declarations: Internal DTD – provided as part of the document External DTD – an external file The syntax and rules for defining the 2 are the same
Element Declarations If you use a DTD with your XML document, DTD must declare all elements used Syntax for declaring an element: <!ELEMENT element_name (content model)>
Element Declarations
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE merchant_name [
  <!ELEMENT merchant_name (#PCDATA)>
]>
<merchant_name>Giant Foods</merchant_name>
Content Models for Elements Text: Supports text or character data <!ELEMENT item (#PCDATA)> Elements: Supports content that is another element <!ELEMENT item (element_name)> PCDATA stands for parsed character data Data type is not yet known
Content Models for Elements Mixed content: Supports both text and other elements. <!ELEMENT item (#PCDATA|element_name)*> #PCDATA must be first in declaration, and the group must end with *
Content Models for Elements Empty: Supports an element that has no content <!ELEMENT item EMPTY> Any: May contain text or elements <!ELEMENT item ANY> The keywords EMPTY and ANY are written without parentheses
<?xml version="1.0"?>
<invoice>
  <merchant_name>Giant Foods</merchant_name>
  <merchant_address>
    <street>123 Any Street</street>
    <city>Harrisburg</city>
    <state>PA</state>
  </merchant_address>
  <sales_date>04-10-03</sales_date>
  <items_purchased>
    <item price="2.35" quantity="1">Milk</item>
    <item price="0.99" quantity="1">Eggs 12 count</item>
    <item price="2.65" quantity="1">Tropicana Orange Juice</item>
  </items_purchased>
</invoice>
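To show how a program might use this invoice, the sketch below parses it with Python's standard xml.etree.ElementTree module and totals the items from their price and quantity attributes. The function name invoice_total is invented for the example, and Decimal is used only to keep the currency arithmetic exact.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

INVOICE = """<?xml version="1.0"?>
<invoice>
  <merchant_name>Giant Foods</merchant_name>
  <merchant_address>
    <street>123 Any Street</street>
    <city>Harrisburg</city>
    <state>PA</state>
  </merchant_address>
  <sales_date>04-10-03</sales_date>
  <items_purchased>
    <item price="2.35" quantity="1">Milk</item>
    <item price="0.99" quantity="1">Eggs 12 count</item>
    <item price="2.65" quantity="1">Tropicana Orange Juice</item>
  </items_purchased>
</invoice>"""


def invoice_total(invoice):
    """Sum price * quantity over every item element in the invoice."""
    total = Decimal("0")
    for item in invoice.find("items_purchased").findall("item"):
        # Attribute values are always strings, so convert before computing.
        total += Decimal(item.get("price")) * int(item.get("quantity"))
    return total


root = ET.fromstring(INVOICE)
print(root.findtext("merchant_name"), invoice_total(root))
```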
Internal DTD
<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ELEMENT invoice (merchant_name, merchant_address, sales_date, items_purchased)>
  <!ELEMENT merchant_name (#PCDATA)>
  <!ELEMENT merchant_address (street, city, state)>
  <!ELEMENT street (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT state (#PCDATA)>
  <!ELEMENT sales_date (#PCDATA)>
  <!ELEMENT items_purchased (item*)>
  <!ELEMENT item (#PCDATA)>
  <!ATTLIST item
    price NMTOKEN #REQUIRED
    quantity NMTOKEN #REQUIRED>
]>
DTD Guidelines Special symbols indicate how many times an element may appear. A * indicates a unit may appear as many times as needed or not at all A + indicates a unit must appear at least once and as many times as needed A ? indicates the unit is optional: it may appear once or not at all A , indicates the elements must appear in the order specified
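The occurrence symbols behave much like regular-expression quantifiers, which suggests a rough way to experiment with them. The sketch below is a hand-rolled illustration, not a DTD validator: each content model is rewritten by hand as a regular expression over the sequence of child element names. The gear_list and contact models are hypothetical, and the helper name children_match is made up.

```python
import re
import xml.etree.ElementTree as ET

# Each DTD content model is rewritten by hand as a regular expression over
# the child element names, with every name followed by one space.
CONTENT_MODELS = {
    "merchant_address": r"street city state ",  # (street, city, state)
    "items_purchased": r"(item )*",             # (item*)
    "gear_list": r"(item )+",                   # (item+), hypothetical
    "contact": r"name (phone )?",               # (name, phone?), hypothetical
}


def children_match(element) -> bool:
    """Check an element's direct children against its hand-written pattern."""
    pattern = CONTENT_MODELS[element.tag]
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(pattern, sequence) is not None


ordered = ET.fromstring("<merchant_address><street/><city/><state/></merchant_address>")
swapped = ET.fromstring("<merchant_address><city/><street/><state/></merchant_address>")
print(children_match(ordered), children_match(swapped))
```

A validating parser performs this kind of check for every element, along with attribute and data checks that this sketch leaves out.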
Attributes Syntax for a DTD attribute declaration <!ATTLIST element_name attribute_name data_type default_value attribute_name data_type default_value > Can be located anywhere in the DTD but it is good practice to keep it close to the corresponding element
Attribute Data Types CDATA: Stands for character/string data Contains any combination of characters except “<“ or “&” Is simple and easy to use ID: Defined to have a value that is unique within the document, like a key Must start with a letter or an underscore
Attribute Data Types IDREF: Defines an attribute that refers to the value of one of the ID attributes NMTOKEN or NMTOKENS: Restrict the value to name tokens NMTOKEN type attributes may not contain any white space NMTOKENS type attributes may contain white space, which separates the individual tokens Tokens may consist of letters, numbers, hyphens, periods, underscores, and colons.
Attribute Default Values #REQUIRED: attribute must contain some value #IMPLIED: attribute has no default value and may be omitted #FIXED fixedvalue: attribute must always be set to the value, fixedvalue Default: merely type a default value instead of the above
DTD Comments Comments may be written in the DTD in the same fashion as for an XML document <!-- This is a comment for a DTD -->
External DTDs Make sure the standalone value in the XML declaration is set to "no" DTD declaration must tell the parser where to find the DTD file, either by file name or by URL <!DOCTYPE invoice SYSTEM "invoice.dtd"> <!DOCTYPE invoice SYSTEM "http://www.personal.psu.edu/invoice.dtd">
XML Schemas May 2001, W3C released XML Schema Language recommendation Covers structure Covers data types Alternative to DTDs More powerful method to describe and set constraints on XML components
Homework Create a DTD for your XML document for the Camping Trip Gear List The following is a list of items that are essential on any camping trip: Flashlight Hiking boots Sleeping bag Pocket knife Bug spray Compass Hatchet Lantern Shovel Tent Bucket Ground cloth