Applied Component-Based Software Engineering XML Basics


1 Applied Component-Based Software Engineering XML Basics
CSE 668 / ECE 668 Prof. Roger Crawfis

2 XML Quiz What does XML stand for? Is XML a language?
What is HTML? What is XHTML? What is HTTPS? Is HTML a language?

3 XML Quiz
What does XML stand for? eXtensible Markup Language.
Is XML a language? No! It is a meta-language for defining markup languages.
What is HTML? What is XHTML? XHTML is HTML written as well-formed XML.
What is HTTPS?
Is HTML a language? Yes!

4 XML Motivation
Data interchange is critical in today’s networked world. Examples:
- Banking: funds transfer
- Order processing (especially inter-company orders)
- Scientific data. Chemistry: ChemML, … Genetics: BSML (Bio-Sequence Markup Language), …
Paper flow of information between organizations is being replaced by electronic flow of information. Each application area has its own set of standards for representing information. Earlier-generation formats were plain text with line headers indicating the meaning of fields. XML has become the basis for all new-generation data interchange formats.

5 Semi-structured Data
Nodes = objects.
Labels on arcs (attributes, relationships).
Atomic values at leaf nodes (nodes with no arcs out).
Flexibility: no restriction on the labels out of a node or on the number of successors with a given label.

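6 Example: Data Graph
Notice a new kind of data.
[Figure: a data graph whose root has arcs labeled beer, beer, and bar. Callouts mark the beer object for Bud and the bar object for Joe’s Bar. Arc labels include name, manf, prize, year, award, servedAt, and addr; leaf values include Bud, M’lob, A.B., 1995, Gold, Joe’s, and Maple.]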
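Here is a minimal Python sketch of this data graph. It uses a representation of our own: each node holds an optional atomic value plus a map from arc labels to lists of successors, so it allows any label and any number of successors per label. Which node each arc leads to is our reading of the figure, not something the slide states. That includes servedAt running from Bud to Joe’s Bar and both manf arcs reaching a single A.B. node. The Node class and the path_values helper are hypothetical names.

```python
from collections import defaultdict


class Node:
    """A semi-structured data node: an atomic value (leaves) and labelled arcs out."""

    def __init__(self, value=None):
        self.value = value              # atomic value; None for interior nodes
        self.arcs = defaultdict(list)   # arc label -> list of successor nodes

    def add(self, label, node):
        """Add an arc; any label and any number of successors per label are allowed."""
        self.arcs[label].append(node)
        return node

    def follow(self, label):
        return self.arcs.get(label, [])


def path_values(start, *labels):
    """Follow a path of arc labels from start and return the values reached."""
    frontier = [start]
    for label in labels:
        frontier = [succ for node in frontier for succ in node.follow(label)]
    return [node.value for node in frontier]


# Build the beer/bar graph (arc targets are an assumed reading of the figure).
root = Node()
bud = root.add("beer", Node())
bud.add("name", Node("Bud"))
ab = bud.add("manf", Node("A.B."))
mlob = root.add("beer", Node())
mlob.add("name", Node("M'lob"))
mlob.add("manf", ab)                    # shared node: a graph, not just a tree
prize = mlob.add("prize", Node())
prize.add("year", Node(1995))
prize.add("award", Node("Gold"))
joes = root.add("bar", Node())
joes.add("name", Node("Joe's"))
joes.add("addr", Node("Maple"))
bud.add("servedAt", joes)

print(path_values(root, "beer", "name"))              # ['Bud', "M'lob"]
print(path_values(root, "beer", "servedAt", "name"))  # ["Joe's"]
```

Because M’lob’s manf arc reuses the A.B. node, the structure is a graph rather than a tree.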

7 XML Standardization World Wide Web Consortium (W3C) More resources at Java-XML (and web services) info at .NET-XML (via web services) info at

8 XML Uses Example: the Ajax technology. Small volume browser-server communication in XML supports more interactive Web pages. Example: Web services. Marshalling and unmarshalling data in SOAP uses XML. Service descriptions use XML.

9 XML Uses Example: Data exchange formats. (Applications must agree on a common meaning for tags.) Older data exchange formats have been redesigned as instances of XML, e.g., HL7 in health informatics, FIX in the financial industry, etc. Even proprietary formats like MS Word now have open XML versions. Example: Software development configuration files, e.g., in W3C, Apache, Java EE, and .NET frameworks. (All this may be geek paradise, but it’s awfully verbose and the scarcity of visual editors is puzzling.)

10 Why People Like XML Can get data from all sorts of sources
Allows us to touch data we don’t own! Can integrate various data sources as if they were databases (almost) We can publish some of the data in our databases on the Web conveniently

11 Well-Formed and Valid XML
Well-Formed XML allows you to invent your own tags, similar to labels in semi-structured data. Valid XML is well-formed and also conforms to either:
- a DTD (Document Type Definition), a grammar for tags, or
- an XSD (XML Schema Definition), a grammar for tags written in XML format.

12 Well-Formed XML
A legal XML document, fully parsable by an XML parser:
- All open-tags have matching close-tags.
- Attributes (which are unordered) appear only once in an element.
- There’s a single root element.
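One way to check these rules is to hand documents to a non-validating XML parser and see whether it accepts them. The sketch below uses Python's standard xml.etree.ElementTree, whose fromstring raises ParseError on input that is not well-formed. The is_well_formed helper and the sample documents are illustrative, not from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "ok": "<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>",
    "missing close-tag": "<BARS><BAR></BARS>",            # BAR is never closed
    "repeated attribute": '<BEER name="Bud" name="Miller"/>',
    "two root elements": "<BAR/><BAR/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Each broken sample violates exactly one of the three rules above, and the parser rejects all three.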

13 Well-Formed XML
Start the document with a declaration, surrounded by <?xml … ?>. The normal declaration is:
    <?xml version="1.0" standalone="yes" ?>
standalone="yes" means that no external DTD is needed (strictly, the document does not depend on external markup declarations). The rest of the document is a root tag surrounding nested tags.

14 Tags Tags, as in HTML, are normally matched pairs, as <FOO> … </FOO> . Tags may be nested arbitrarily. XML tags are case sensitive.

15 Example: Well-Formed XML
    <?xml version="1.0" standalone="yes" ?>
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME>
          <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME>
          <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callouts: <NAME>Joe’s Bar</NAME> is a NAME subobject; each <BEER> … </BEER> is a BEER subobject.

16 XML and Semi-structured Data
Well-Formed XML with nested tags is exactly the same idea as trees of semi-structured data. Graphs (not just trees) are possible through indirection, e.g., ID and IDREF attributes.

17 Example
The <BARS> XML document is the tree:
    BARS
      BAR
        NAME: Joe’s Bar
        BEER
          NAME: Bud
          PRICE: 2.50
        BEER
          NAME: Miller
          PRICE: 3.00
      BAR . . .
      BAR . . .
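You can see the correspondence between nested tags and a tree by parsing the document. This is a minimal sketch using Python's xml.etree.ElementTree on a copy of the BARS document trimmed to a single BAR, with the elided BAR elements left out. The show helper is hypothetical.

```python
import xml.etree.ElementTree as ET

BARS_XML = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""


def show(elem, depth=0):
    """Return one indented line per element, with its text content if any."""
    text = (elem.text or "").strip()
    lines = ["  " * depth + elem.tag + (f": {text}" if text else "")]
    for child in elem:
        lines.extend(show(child, depth + 1))
    return lines


root = ET.fromstring(BARS_XML)
print("\n".join(show(root)))

# Navigating the tree: each BEER's NAME and PRICE children.
prices = [(beer.findtext("NAME"), float(beer.findtext("PRICE"))) for beer in root.iter("BEER")]
print(prices)  # [('Bud', 2.5), ('Miller', 3.0)]
```

The printed outline has the same shape as the tree drawn on the slide.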

18 XML as a Data Model
The XML “information set” includes 7 types of nodes:
- Document (root)
- Element
- Attribute
- Processing instruction
- Text (content)
- Namespace
- Comment
The XML data model includes this, plus order information and a few other things.
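To see several of these node types in a real parser's tree, here is a sketch using Python's xml.dom.minidom. It is a DOM implementation, so it follows the DOM's node types rather than the list above exactly:
- Attributes are reached through an element's attributes map, not its children.
- The XML declaration does not become a node.
- There is no separate namespace node type.

The sample document and the kinds helper are illustrative.

```python
from xml.dom import Node, minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="bars.css"?>
<!-- a comment node -->
<BARS><BAR name="Joe's Bar">Cheap beer</BAR></BARS>"""

# Human-readable names for the DOM node-type constants used here.
KIND = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}


def kinds(node, out=None):
    """Depth-first list of node kinds, visiting an element's attributes before its children."""
    out = [] if out is None else out
    out.append(KIND.get(node.nodeType, f"type {node.nodeType}"))
    if node.nodeType == Node.ELEMENT_NODE:
        for i in range(node.attributes.length):
            out.append(KIND[node.attributes.item(i).nodeType])
    for child in node.childNodes:
        kinds(child, out)
    return out


result = kinds(minidom.parseString(DOC))
print(result)
```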

19 XML Anatomy
Callouts label the processing instruction (the <?xml … ?> line), an open-tag, an element, an attribute, and a close-tag.
    <?xml version="1.0" encoding="ISO " ?>
    <dblp>
      <mastersthesis mdate=" " key="ms/Brown92">
        <author>Kurt P. Brown</author>
        <title>PRPL: A Database Workload Specification Language</title>
        <year>1992</year>
        <school>Univ. of Wisconsin-Madison</school>
      </mastersthesis>
      <article mdate=" " key="tr/dec/SRC ">
        <editor>Paul R. McJones</editor>
        <title>The 1995 SQL Reunion</title>
        <journal>Digital System Research Center Report</journal>
        <volume>SRC </volume>
        <year>1997</year>
        <ee>db/labs/dec/SRC html</ee>
        <ee>
      </article>

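20 A Visualization of XML Data
[Figure: the dblp document drawn as a tree. A root node has a p-i child (?xml) and the dblp element. dblp has mastersthesis and article children, each with mdate and key attribute nodes (values 2002…, ms/Brown92, tr/dec/…). Their element children are author, title, year, school and editor, title, journal, volume, year, ee, ee. The text leaves hold values such as Kurt P…, PRPL…, 1992, Univ…, Paul R., The…, Digital…, SRC…, 1997, and db/labs/dec. The legend distinguishes root, p-i, element, attribute, and text nodes.]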

21 Empty Elements
We can do all the work of an element in its attributes, like BEER in the previous example. Another example: SELLS elements could have an attribute price rather than a value that is a price. Example use:
    <SELLS theBeer="Bud" price="2.50"/>
Note the exception to the “matching tags” rule: the single tag is closed by />.
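An empty element with attributes can carry the same data as an element with subelements. The sketch below reads both forms of SELLS with Python's xml.etree.ElementTree. The element-based variant, with theBeer and price subelements, is a hypothetical counterpart written for comparison.

```python
import xml.etree.ElementTree as ET

# Data carried in attributes of an empty element (the slide's form).
as_attributes = ET.fromstring('<SELLS theBeer="Bud" price="2.50"/>')
# The same data carried in subelements (hypothetical counterpart).
as_elements = ET.fromstring("<SELLS><theBeer>Bud</theBeer><price>2.50</price></SELLS>")

from_attrs = dict(as_attributes.attrib)
from_elems = {child.tag: child.text for child in as_elements}

print(from_attrs)                # {'theBeer': 'Bud', 'price': '2.50'}
print(from_attrs == from_elems)  # True
print(len(as_attributes))        # 0: the empty element has no children
```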

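22 XML Namespaces
Namespaces allow us to specify a context for different tags. Two parts:
- Binding of a namespace prefix to a URI
- Qualified names
    <tag xmlns:myns="…">
      <thistag>is in namespace myns</thistag>
      <myns:thistag>is the same</myns:thistag>
      <otherns:thistag>is a different tag</otherns:thistag>
    </tag>
For the unprefixed <thistag> to be in namespace myns, the same URI must also be bound as the default namespace (xmlns="…"). With only the prefix binding, an unprefixed tag is in no namespace. The otherns prefix must likewise be bound to its own URI.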
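Python's xml.etree.ElementTree reports each qualified name in {URI}local form, which makes namespace membership visible. In this minimal sketch, urn:example:myns and urn:example:otherns are placeholder URIs, since the slide's URI is not given. A default-namespace declaration is added so that the unprefixed thistag really is in myns.

```python
import xml.etree.ElementTree as ET

DOC = """<tag xmlns="urn:example:myns"
     xmlns:myns="urn:example:myns"
     xmlns:otherns="urn:example:otherns">
  <thistag>in myns via the default namespace</thistag>
  <myns:thistag>the same qualified name</myns:thistag>
  <otherns:thistag>a different tag</otherns:thistag>
</tag>"""

names = [child.tag for child in ET.fromstring(DOC)]
for name in names:
    print(name)
# {urn:example:myns}thistag
# {urn:example:myns}thistag
# {urn:example:otherns}thistag

# Without a default namespace, an unprefixed tag is in no namespace at all.
bare = ET.fromstring('<tag xmlns:myns="urn:example:myns"><thistag/></tag>')
print(bare[0].tag)  # thistag
```

The comparison happens on the URI, not the prefix: two different prefixes bound to the same URI would name the same tag.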

23 XML Attributes
An (opening) tag may contain attributes. These are typically used to describe the content of an element:
    <entry>
      <word language="en"> cheese </word>
      <word language="fr"> fromage </word>
      <word language="ro"> branza </word>
      <meaning> A food made … </meaning>
    </entry>

24 XML Attributes
Another common use for attributes is to express dimension or type:
    <picture>
      <height dim="cm"> 2400 </height>
      <width dim="in"> 96 </width>
      <data encoding="gif" compression="zip"> ... </data>
    </picture>

25 When to use attributes
    <person ssno="123 45 6789">
      <name> F. MacNiel </name>
      < > </ >
      ...
    </person>
versus
    <person>
      <ssno> </ssno>
      <name> F. MacNiel </name>
      < > </ >
      ...
    </person>
The choice between representing data as attributes or as elements is sometimes unclear; it is partly a matter of taste.

26 Defining the structure of an XML file
We can check whether an XML file is well-formed by looking at it (maybe), or by loading it into a browser: if it is well-formed, it will be displayed. However, how can we check that a well-formed file contains the correct elements in the correct quantities? We need to write a specification for the XML file.

27 XML Needs Help
It’s too unconstrained for many cases! How will we know when we’re getting garbage? How will we query? How will we understand what we got?
We also need:
- Some idea of the structure
- Presentation, in some cases (CSS, XSL)
- Some way of interpreting the tags

28 Defining the structure of an XML file
There are two main alternatives:
- Document Type Definitions: original and simple
- XML Schema: more versatile and complex
We will look at both, concentrating on XML Schema. XML documents are not required to have an associated schema.

29 Document Type Definition (DTD)
The type of an XML document can be specified using a DTD. A DTD constrains the structure of XML data:
- What elements can occur
- What attributes an element can or must have
- What subelements can or must occur inside each element, and how many times
A DTD does not constrain data types; all values are represented as strings in XML. DTD syntax:
    <!ELEMENT element (subelements-specification) >
    <!ATTLIST element (attributes) >

30 Example: An Address Book
    <person ssn="4444">
      <name> Homer Simpson </name>
      <tel> 2543 </tel>
      <tel> 2544 </tel>
      < > </ >
    </person>
Callouts: ssn is an attribute; exactly one name; up to 4 tel nos; at least one of the last element; one or more persons.

31 Example: The Address Book2
    <person>
      <name> MacNiel, John </name>
      <greet> Dr. John MacNiel </greet>
      <addr>1234 Huron Street </addr>
      <addr> Rome, OH </addr>
      <tel> (321) </tel>
      <fax> (321) </fax>
      < > </ >
    </person>
Callouts: exactly one name; at most one greeting; as many address lines as needed (in order); mixed telephones and faxes; at least one of the last element.

32 DTD - Specifying the Structure
In a DTD, we can specify the permitted content for each element, using regular expressions For a person element, the regular expression is name, title?, tel*, +

33 What’s in a person Element?
This means:
- name = there must be a name element
- title? = there is an optional title element (i.e., 0 or 1 title elements)
- name, title? = the name element is followed by an optional title element
- tel* = there are 0 or more tel elements
- + = there are 1 or more elements

34 DTD For the Address Book2
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE addressbook [
      <!ELEMENT addressbook (person*)>
      <!ELEMENT person (name, title?, tel*, +)>
      <!ELEMENT name (#PCDATA)>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT tel (#PCDATA)>
      <!ELEMENT (#PCDATA)>
      <!ATTLIST person ssn CDATA #REQUIRED>
    ]>
Callouts: the content models in parentheses are regular expressions; PCDATA means parsed character data.

35 Attributes in a DTD
XML elements can have attributes. General syntax in a DTD:
    <!ATTLIST element-name attribute-name_1 type_1 default-value_1 … attribute-name_n type_n default-value_n>
Example:
    <!ATTLIST person ssn CDATA #REQUIRED>
CDATA means character data. The default value can be #REQUIRED, #IMPLIED (meaning optional), #FIXED "value", or a plain default value.

36 Example: DTD
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
A BARS object has zero or more BARs nested within. A BAR has one NAME and one or more BEER subobjects. A BEER has a NAME and a PRICE. NAME and PRICE are text.
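Each element declaration in a DTD is a regular expression over the sequence of child element names. The sketch below mimics that check for the BARS DTD using Python's re module, because the standard library's XML parsers do not validate against DTDs. A few points to note:
- The content models are hand-translated, with each child name followed by a space so that re.fullmatch can match the whole sequence.
- CONTENT_MODELS and validate are hypothetical names.
- Attributes and text content are not checked.

```python
import re
import xml.etree.ElementTree as ET

# The BARS DTD's content models, hand-translated to regexes over
# "NAME1 NAME2 ... " strings of child element names.
CONTENT_MODELS = {
    "BARS": r"(BAR )*",        # <!ELEMENT BARS (BAR*)>
    "BAR": r"NAME (BEER )+",   # <!ELEMENT BAR (NAME, BEER+)>
    "BEER": r"NAME PRICE ",    # <!ELEMENT BEER (NAME, PRICE)>
    "NAME": r"",               # (#PCDATA): text only, no child elements
    "PRICE": r"",
}


def validate(elem, errors=None):
    """Collect messages for elements whose children break their content model."""
    errors = [] if errors is None else errors
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element {elem.tag}")
        return errors
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        errors.append(f"{elem.tag}: children {children.split()} do not match {model!r}")
    for child in elem:
        validate(child, errors)
    return errors


good = ET.fromstring(
    "<BARS><BAR><NAME>Joe's Bar</NAME>"
    "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>"
)
bad = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")  # no BEER

print(validate(good))  # []
print(validate(bad))
```

The bad document is still well-formed; it is only the BAR (NAME, BEER+) content model that it violates.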

37 Use of DTDs
Set standalone = “no”. Either:
- include the DTD as a preamble of the XML document, or
- follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD can be found.

38 Use of DTDs
    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE BARS [
      <!ELEMENT BARS (BAR*)>
      <!ELEMENT BAR (NAME, BEER+)>
      <!ELEMENT NAME (#PCDATA)>
      <!ELEMENT BEER (NAME, PRICE)>
      <!ELEMENT PRICE (#PCDATA)>
    ]>
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME>
          <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME>
          <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callouts: the DOCTYPE block is the DTD; the BARS element is the document.

39 Use of DTDs
Assume the BARS DTD is in file bar.dtd.
    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE BARS SYSTEM "bar.dtd">
    <BARS>
      <BAR><NAME>Joe’s Bar</NAME>
        <BEER><NAME>Bud</NAME>
          <PRICE>2.50</PRICE></BEER>
        <BEER><NAME>Miller</NAME>
          <PRICE>3.00</PRICE></BEER>
      </BAR>
      <BAR> …
    </BARS>
Callout: SYSTEM "bar.dtd" gets the DTD from the file bar.dtd.

40 Valid Documents A document with a DTD is valid if it conforms to the DTD, i.e., the document conforms to the regular-expression grammar, types of attributes are correct, and constraints on references are satisfied

41 DTD Problems
DTDs are rather weak specifications by database and programming-language standards. Some limitations:
- Only one base type: PCDATA
- No constraints such as ranges of values or exact numbers of occurrences
- Not easily processed with XML tools, since DTDs are not themselves XML
- Not easy to express that element a has exactly the children c, d, e in any order

42 DTD Problems
Difficult to specify unordered sets of subelements:
- Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved).
- (A | B)* allows specification of an unordered set, but cannot ensure that each of A and B occurs only once.
Many other more complex problems.

43 XML Schema
DTDs are now being superseded by XML Schemas, which provide the following features:
- XML syntax, so schemas can be parsed and validated with standard XML tools
- Data types other than #PCDATA: there are built-in types such as integer, float, boolean, string, and many others
- Greater control over permitted constructs: you can specify maximum and minimum occurrences, and use regular expressions to set patterns to be matched
- Support for modularity and inheritance

44 Schema types
There are some basic built-in types such as xs:string, xs:decimal, xs:integer, and xs:ID. Each element has either a simple type or a complex type; a complex type is often a sequence of elements. The content of a type can be declared inline, as shown in the following example, or a type can be declared, named, and referred to. Notice the use of minOccurs and maxOccurs; their default is 1.

45 Simple Schema Example
    <?xml version="1.0" ?>
    <xs:schema xmlns:xs="…">
      <xs:element name="people">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="person" maxOccurs="unbounded">
              details of the person element (continued on the next slide)
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
Callouts: the first line is standard stuff; xmlns:xs declares the namespace; people is the top-level element.

46 Namespace declaration
So at the start of a document we must specify which namespaces we are using. In the schema example, we are using the XML Schema namespace with the xs prefix. We declare this namespace in an attribute of the top-level element:
    <xs:schema xmlns:xs="…">
We then use the xs prefix on all the XML Schema elements, e.g., complexType, sequence, element, etc.

47 Schema Example Continued
Details of the person element:
    <xs:element name="person" maxOccurs="unbounded">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="tel" type="xs:string"/>
          <xs:element name=" " type="xs:string" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
        <xs:attribute name="sssNo" type="xs:integer" use="required"/>
      </xs:complexType>
    </xs:element>
Callouts: <xs:element … /> is an empty element; a person is a complex type which is a sequence of elements and an attribute.
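Because a schema is itself XML, ordinary XML tools can read it. This sketch uses Python's xml.etree.ElementTree to list each xs:element declaration in a cut-down version of the people schema with its (minOccurs, maxOccurs) bounds. It applies the default of 1 when an attribute is absent and uses None for unbounded. Some caveats:
- It does not validate documents; Python's standard library has no XSD validator.
- The element whose name is missing from the slide is left out.
- occurrence_bounds is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # W3C XML Schema namespace, {URI} form

SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="people">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="tel" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="sssNo" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def occurrence_bounds(schema_text):
    """Map each declared element name to (minOccurs, maxOccurs); None = unbounded."""
    bounds = {}
    for decl in ET.fromstring(schema_text).iter(XS + "element"):
        low = int(decl.get("minOccurs", "1"))      # default is 1
        high_text = decl.get("maxOccurs", "1")     # default is 1
        high = None if high_text == "unbounded" else int(high_text)
        bounds[decl.get("name")] = (low, high)
    return bounds


print(occurrence_bounds(SCHEMA))
# {'people': (1, 1), 'person': (1, None), 'name': (1, 1), 'tel': (1, 1)}
```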

48 Restrictions on elements
You can also restrict the data values to a range:
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="120"/>
an enumerated list:
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
or a pattern:
    <xs:pattern value="([a-z])*"/>
which means 0 or more lowercase alphabetic characters.
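The three kinds of restriction can be mimicked in a few lines of Python to make their meaning concrete. This is a sketch under stated assumptions:
- The helper names are hypothetical.
- The range and enumeration values come from the slide.
- The pattern is checked with re.fullmatch because an XSD pattern must match the entire value. Python's regex dialect differs from the XSD one in general, though not for this pattern.

```python
import re

AGE_MIN, AGE_MAX = 0, 120               # xs:minInclusive / xs:maxInclusive
CAR_MODELS = {"Audi", "Golf", "BMW"}    # xs:enumeration values
LOWERCASE = r"([a-z])*"                 # xs:pattern value


def in_range(text: str) -> bool:
    """Inclusive range check, as minInclusive/maxInclusive specify."""
    return AGE_MIN <= int(text) <= AGE_MAX


def in_enumeration(text: str) -> bool:
    """The value must be one of the enumerated literals."""
    return text in CAR_MODELS


def matches_pattern(text: str) -> bool:
    """XSD patterns are implicitly anchored, so the whole value must match."""
    return re.fullmatch(LOWERCASE, text) is not None


print(in_range("120"), in_range("121"))                 # True False
print(in_enumeration("BMW"), in_enumeration("Ford"))    # True False
print(matches_pattern("abc"), matches_pattern("abc1"))  # True False
```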

49 XSD Built-in Types

50 Declaring your own types
Named types can be used for elements or attributes. Here’s an example which specifies restrictions on the attribute. A named type is declared:
    <xs:simpleType name="ssstype">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
      </xs:restriction>
    </xs:simpleType>
and used as the attribute type:
    <xs:attribute name="sssNo" type="ssstype" use="required"/>

51 More complex Schemas
The previous example shows a simple schema. It is also possible to make a schema easier to maintain by:
- declaring all the simple elements first and then referring to them in the body of the schema
- naming the declarations of simple and complex types, which can then be used later in the schema, more than once if necessary

52 Referring to a schema
Save your schema in a file with the extension .xsd. A schema definition is linked with a document using a special attribute of the document’s root node:
    <people xmlns:xsi="…" xsi:noNamespaceSchemaLocation="people.xsd">
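An application can read the schema location that the root element points to. The attribute is only a hint, and Python's standard library will not validate against it. Here is a minimal sketch with xml.etree.ElementTree. The XMLSchema-instance namespace URI below is the standard one; the document content is illustrative.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"  # the standard xsi namespace

DOC = f"""<people xmlns:xsi="{XSI}" xsi:noNamespaceSchemaLocation="people.xsd">
  <person sssNo="1"><name>Homer Simpson</name><tel>2543</tel></person>
</people>"""

root = ET.fromstring(DOC)
# Namespaced attributes are keyed as {URI}localname.
location = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(location)  # people.xsd
```

A validating tool would load people.xsd from this location and check the document against it.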

53 Validating Validators Many others on the web

54 XML Schema Example
    <xs:schema xmlns:xs="…">
      <xs:element name="bank" type="BankType"/>
      <xs:element name="account">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="account_number" type="xs:string"/>
            <xs:element name="branch_name" type="xs:string"/>
            <xs:element name="balance" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      ….. definitions of customer and depositor ….
      <xs:complexType name="BankType">
        <xs:sequence>
          <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
          <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>

55 Application Program Interface
Two standard application program interfaces to XML data (Java, C++, etc.):
- SAX (Simple API for XML; third-party for .NET): based on a parser model; the user provides event handlers (call-back functions) for parsing events, e.g., start of element and end of element. Not suitable for database applications.
- DOM (Document Object Model): XML data is parsed into a tree representation, with functions for accessing, traversing, and searching the DOM. The .NET DOM API provides the XmlNode class, with properties such as ParentNode, ChildNodes, NextSibling, FirstChild, and Attributes.
.NET adds a third method: LINQ to XML.
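Python's standard library offers both styles, which makes the contrast easy to see. This sketch uses xml.sax (event call-backs) and xml.dom.minidom (a DOM tree) to collect the PRICE values from a trimmed BARS document. PriceHandler is a hypothetical name. SAX sees the document as a stream of events and keeps only what the handler saves; DOM builds the whole tree first and then lets you search it.

```python
import xml.sax
from xml.dom import minidom

BARS_XML = b"""<BARS><BAR><NAME>Joe's Bar</NAME>
<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
<BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER></BAR></BARS>"""


class PriceHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each parsing event."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "PRICE":
            self._in_price = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it until the end tag.
        if self._in_price:
            self._buf.append(content)

    def endElement(self, name):
        if name == "PRICE":
            self.prices.append("".join(self._buf))
            self._in_price = False


handler = PriceHandler()
xml.sax.parseString(BARS_XML, handler)
print(handler.prices)  # ['2.50', '3.00']

# DOM: the whole document is parsed into a tree that we then search.
doc = minidom.parseString(BARS_XML)
dom_prices = [p.firstChild.data for p in doc.getElementsByTagName("PRICE")]
print(dom_prices)      # ['2.50', '3.00']
```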

