Software Engineering for the World Wide Web

Presentation on theme: "Software Engineering for the World Wide Web"— Presentation transcript:

1 Software Engineering for the World Wide Web
XML Overview
Jeff Offutt
SWE 642 Software Engineering for the World Wide Web
Sources:
Professional Java Server Programming, Patzer, Wrox, 2nd edition, Ch 5, 6
Programming the World Wide Web, Sebesta, Addison-Wesley, Ch 8
Michelle Lee and Ye Wu

2 Topics
Motivation
How does XML work?
Syntax of XML documents
XML and HTML
9/17/2018 © Offutt

3 XML eXtensible Markup Language
Markup languages insert “tags” into text files to describe presentation or other information
    SGML : Standard Generalized Markup Language
    HTML : Visual presentation
    LaTeX : Document formatting
    XML : Data description
W3C standard:

4 Why XML?
Passing data from one software component to another has always been difficult
The two components must agree on format, types, and organization
Web software applications have unique requirements for data passing:
    Very loose coupling
    Dynamic integration

5 Passing Data – 1978
Program P2 needs to use data produced by program P1
Data saved to a file as records (COBOL, Fortran, …)
The file format is often not documented
Source for P1 may not be available
Data saved in binary mode – not readable by hand
[Diagram: P1, P2, File]
Format of file deduced from executing P1 and from trial and error
MSU Computing Services, 1979 : Two weeks of trial and error executions of P1 to understand format of file

6 Passing Data – 1985
Program P2 needs to use data produced by program P1
Data saved to a file as records (C, Ada, …)
The file format is often not documented
Data saved as plain text
[Diagram: P1, P2, File, WM]
Both P1 and P2 access the file through a “wrapper module”
    Module needs to be repeatedly updated
    Module written by development team
    Data hard to validate
Mothra, 1985 : ~12 data files shared among 15 to 20 separate programs

7 Wrapper Method Problems
Slow – everything is a file in plain text
Sharing – Developers of P1 and P2 must agree to share source of WM
Maintenance – Who has control of WM?
Solution – data sharing that is :
    Independent of type
    Self documenting
    Easy to understand format
Especially important for web applications – XML

8 Passing Data – 21st Century
Data is passed directly between components
XML allows for self-documenting data
P1, P2 and P3 can see the format, contents, and structure of the data
Free parsers are available to put XML messages into a standard format
Information about type and format is readily available
[Diagram: Schema, P1, P2, P3, Parser, XML File]

9 Web Services
A Web Service is a program that offers software services over the Internet to other programs
    Internet-based
    Uses SOAP and XML
    Peer-to-peer communication
Web service components can integrate dynamically
    Finding other services during execution
Web services transmit data that are formatted in XML

10 Topics
Motivation
How does XML work?
Syntax of XML documents
XML and HTML

11 Introductory Example
Programmers can create their own tags
Tags have been designed for mathematics, formal specifications, resumes, recipes, addresses, …
Pizza Markup Language (PML):
<pizza>
  <topping extracheese="yes">Pepperoni</topping>
  <price>13.00</price>
  <size>large</size>
</pizza>

12 Markup Language – History
1969 : Charles Goldfarb, Ed Mosher, and Ray Lorie invented GML (Generalized Markup Language)
1974 : Goldfarb proved the concept of a validating parser for GML
Goldfarb worked on the team that made the transition from GML to ISO 8879, called SGML

13 Markup Language – History
1989 : Tim Berners-Lee and Anders Berglund developed a markup language for hyperlinked text documents: HTML
    HTML does not quite conform to SGML
1993 : A student at University of Illinois, Marc Andreessen, created a graphical Web browser called Mosaic to render HTML
Although Mosaic was not the first web browser, the NCSA ported it to the three most popular computing platforms and gave it away for free.

14 Markup Language – History
1994 : Jim Clark and Marc Andreessen co-founded Netscape Communications and created the Netscape Navigator web browser
To solve the problems of interoperability and scalability on the web, the W3C began work on a simplified version of SGML, called Extensible Markup Language (XML)
1998 : The XML 1.0 standard was approved and published by the World Wide Web Consortium (W3C)
In 1994, Jim Clark (founder of Silicon Graphics) and Marc Andreessen co-founded Netscape Communications (originally Mosaic Communications Corp.) and created the Netscape Navigator web browser. As the web grew in popularity, the weaknesses of HTML became apparent in several areas, including interactivity, data interchange, and precise text formatting. In response, vendors began extending HTML in proprietary ways to address these shortcomings. To solve the problems of interoperability and scalability on the web without further extending HTML, the W3C began work on a simplified version of SGML, the Extensible Markup Language (XML).

15 Markup Languages – Typesetting
Markup languages began in typesetting
Documents were marked up to represent how they would be printed
For example, words can be bold, italicized, or underlined
Typesetting only affects the printing of specific phrases or words, and not categories of phrases or words
For example, if a newspaper wanted all its headlines in boldface, the typesetter had to mark up every single headline. Markup languages were developed so that the typesetter could indicate in one command that all headlines should be boldface.

16 Markup Languages – Semantic Tags
Markup languages can be used to logically organize the contents of a document
For example, a document representing a book can contain the following organizational tags :
    Title
    Chapter headings
    Section headings

17 Markup Languages – Semantic Tags
A markup language can also provide semantic information (meta-data) about the text in a document
    Examples : First name, Last name, Phone number
Semantic tags can improve the accuracy of document queries
Documents can be searched using their tag assignments rather than the plain-text contents

18 Markup Languages – Semantic Tags
Use semantic tags to define the hierarchical structure of the document
Author
    First name
    Last name
Publisher
    Name
    Address

19 Markup Languages – Examples
Typesetting tags
    <bold> Chapter 1 </bold>
    <italic> Background </italic>
    <underline> Important text </underline>
Semantic tags
    <first name> Steffi </first name>
    <last name> Offutt </last name>
    <phone number> </phone number>

20 SGML
SGML : Standard Generalized Markup Language
Set up by the ISO in 1986
Superset of all markup languages – includes all the features of every markup language derived from it
Allows a document to be annotated with text that describes the semantic meanings of portions of the document
The standard for the markup of electronic documents

21 SGML
Separates the structure of the document from the content
The structure denotes the purpose of the document’s data
Uses Document Type Definitions (DTDs) to define the syntax of the annotations used in a document
SGML captures meta-data for a document by marking up the content
Data that describes other data is called meta-data.

22 Characteristics of XML
XML is extensible
    Tags have been designed for mathematics, formal specifications, resumes, recipes, addresses, …
XML has a strict structure
XML is validating
    Grammars (DTD & Schema) define XML languages
    Documents can be checked against the grammar
    Required fields, etc
    Allows programs to assume the data is formatted correctly, reducing the amount of checking the program must do

23 XML Provides Data Independence
Allows data to be used by any application
Requires every document to be in a clear and specific format
Fosters information sharing better than other markup languages

24 XML Simplifies Data Sharing
Plain text
    Create and edit files with any editor
    Easy to debug
    Scalability : suitable for both small-scale and large-scale data
Data identification
    Once different parts of the information have been identified, they can be used in different ways by different applications
Data transference
    Very easy to move between XML and form parameters
    Very easy to move between XML and databases
Plain Text: Since XML is not a binary format, you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs and makes XML useful for storing small amounts of data. At the other end of the spectrum, an XML front end to a database makes it possible to efficiently store large amounts of XML data as well. So XML provides scalability for anything from small configuration files to a company-wide data repository.
Data Identification: XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. In short, because the different parts of the information have been identified, they can be used in different ways by different applications. For example, a search of the address book can match a keyword against the last name field only.

25 XML Can Easily Be Displayed
The stylesheet standard, XSL, lets you dictate how to format and display the data
Since XML is inherently style free, you can use different stylesheets to produce different output formats
When display is important, the stylesheet standard, XSL, lets you dictate how to portray the data. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents.

26 Searching XML
Keywords can be compared to the entire contents of a document
Search can be limited to a single field or a set of fields
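
The two kinds of search described on this slide can be sketched in a few lines. This is only an illustration, assuming Python's standard `xml.etree.ElementTree` module; the `<mailbox>` wrapper, the second message, and the helper names `search_everywhere` and `search_field` are hypothetical and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical mailbox: the wrapper element and the second message are made up.
MAILBOX = """
<mailbox>
  <message>
    <to>you@yourAddress.com</to>
    <subject>XML Is Really Cool</subject>
    <text>How many ways is XML cool? Let me count the ways ...</text>
  </message>
  <message>
    <to>team@example.org</to>
    <subject>Lunch on Friday</subject>
    <text>Someone said XML was on the menu.</text>
  </message>
</mailbox>
"""

def search_everywhere(root, keyword):
    """Return messages whose text, in any field, contains the keyword."""
    keyword = keyword.lower()
    return [m for m in root.iter("message")
            if keyword in "".join(m.itertext()).lower()]

def search_field(root, field, keyword):
    """Return messages where the named child element contains the keyword."""
    keyword = keyword.lower()
    return [m for m in root.iter("message")
            if keyword in (m.findtext(field) or "").lower()]

root = ET.fromstring(MAILBOX)
all_hits = search_everywhere(root, "xml")          # both messages mention XML
subject_hits = search_field(root, "subject", "xml")  # only one subject does
print(len(all_hits), len(subject_hits))
```

Limiting the search to `subject` is possible only because the tags identify which part of the text is the subject line.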

27 XML Example
<message>
  <to> you@yourAddress.com </to>
  <from> </from>
  <subject> XML Is Really Cool </subject>
  <text> How many ways is XML cool? Let me count the ways ... </text>
</message>

28 Topics
Motivation
How does XML work?
Syntax of XML documents
XML and HTML

29 XML Structure
Containment : Tags can be contained in other tags
Tag names should be meaningful
All tags must have an end tag
    Note that HTML does not … meaning HTML is not fully SGML-compliant

30 XML Can Easily Be Validated
XML messages are described in grammars
Two ways to describe an XML language
    Document Type Definitions (DTD) : Older, easier to read and understand, but somewhat limited
    Schemas : Grammar plus types and facets
Documents can be checked against the grammar
Grammar can specify that certain fields are required
Allows programs to assume the data is formatted correctly, reducing the amount of checking the program must do
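
To make the "required fields" idea concrete, here is a rough sketch. It is not DTD or Schema validation: Python's standard `xml.etree.ElementTree` parser does not validate against a grammar, so the `REQUIRED_CHILDREN` table below is a hand-written, hypothetical stand-in for the rules a grammar would declare, and `missing_fields` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for what a DTD or Schema would declare.
REQUIRED_CHILDREN = {"pizza": ["topping", "price", "size"]}

def missing_fields(xml_text):
    """Return the required child elements that the root element lacks."""
    root = ET.fromstring(xml_text)
    required = REQUIRED_CHILDREN.get(root.tag, [])
    return [name for name in required if root.find(name) is None]

complete = ("<pizza><topping>Pepperoni</topping>"
            "<price>13.00</price><size>large</size></pizza>")
incomplete = "<pizza><topping>Pepperoni</topping></pizza>"

complete_missing = missing_fields(complete)
incomplete_missing = missing_fields(incomplete)
print(complete_missing)
print(incomplete_missing)
```

A program that receives only documents that passed such a check can read `price` and `size` without testing for their presence each time, which is the saving the slide describes.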

31 Syntax of XML
XML syntax is defined at two levels
    General syntax : defines syntax on all XML documents
        Correct documents said to be “well formed”
    Specific syntax : defines syntax on a specific group of documents
        Correct documents said to be “valid”
Statements in an XML document
    XML declaration – which version of XML
    Data elements – the primary contents of the document
    Markup declarations – instructions to XML parser
    Processing instructions – instructions to the program

32 XML Declaration
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
<? .. ?> Prolog declaration
version
    Identifies the version of the XML markup language used in the data
    This attribute is required
encoding
    Identifies the character set used to encode the data
    "ISO-8859-1" is "Latin-1", the Western European and English language character set
    Default is compressed Unicode: UTF-8
standalone
    Tells whether or not this document references an external entity or an external data type specification
    If there are no external references, use “yes”
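
The effect of the `encoding` attribute can be seen with a small experiment. This is a sketch assuming Python's standard `xml.etree.ElementTree` module; the `<name>` element and its content are made up for illustration.

```python
import xml.etree.ElementTree as ET

# b"Jos\xe9" is "José" encoded as Latin-1; the lone byte 0xE9 is not valid UTF-8.
latin1_doc = (b'<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>'
              b'<name>Jos\xe9</name>')
name = ET.fromstring(latin1_doc).text
print(name)

# Without a declaration the parser assumes UTF-8 and rejects the same bytes.
try:
    ET.fromstring(b'<name>Jos\xe9</name>')
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)
```

The declaration is what tells the parser how to turn the bytes of the file back into characters, so it has to match the encoding the file was actually saved in.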

33 XML Data Element Names
Must start with a letter or underscore, and can include digits, hyphens, and periods
XML names are case sensitive
    lastName, lastname, LASTNAME are all different
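
Case sensitivity is easy to check with a parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the `<person>` wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

person = ET.fromstring("<person><lastName>Offutt</lastName></person>")

exact = person.find("lastName")   # matches the element
lower = person.find("lastname")   # different name, so no match
upper = person.find("LASTNAME")   # different name, so no match
print(exact is not None, lower is None, upper is None)

# A closing tag that differs only in case does not close the element at all.
try:
    ET.fromstring("<lastName>Offutt</lastname>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False
print(mismatch_accepted)
```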

34 XML General Syntax Rules
Well-formed
Every XML document has a single root element
    Opening tag must be the first element tag in the document (only the prolog, such as the XML declaration, may come before it)
    All other elements are nested inside the root element
XML tags are surrounded by pointy brackets “< >”
Every XML tag must have a closing tag
    If no content: <empty/>
XML elements must be properly nested
    <B><I> … </B></I> is not well formed XML
All attribute values must be enclosed in quotes
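
These rules can be tried against a real parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module, whose parser checks well-formedness only (it does not validate against a DTD or Schema); the helper name `is_well_formed` and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "properly nested": "<p><B><I>text</I></B></p>",
    "overlapping tags": "<p><B><I>text</B></I></p>",
    "missing end tag": "<p><B>text</p>",
    "empty element": "<p><empty/></p>",
    "unquoted attribute": "<p size=large>text</p>",
    "two root elements": "<p>one</p><p>two</p>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

Only the properly nested sample and the empty-element sample are accepted; each of the others breaks one of the rules listed on the slide.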

35 XML Example
Pizza Markup Language (PML)
<pizza>
  <topping extracheese="yes">Pepperoni</topping>
  <price>13.00</price>
  <size>large</size>
</pizza>
[Callouts in the figure: root element, data value, attribute]
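
The three parts called out on this slide (root element, attribute, data value) map directly onto what a parser hands back. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

PML = """<pizza>
  <topping extracheese="yes">Pepperoni</topping>
  <price>13.00</price>
  <size>large</size>
</pizza>"""

pizza = ET.fromstring(PML)
topping = pizza.find("topping")

root_tag = pizza.tag                      # the root element's name
extra_cheese = topping.get("extracheese")  # an attribute value
topping_name = topping.text               # a data value
price = float(pizza.findtext("price"))    # data values arrive as text

print(root_tag, extra_cheese, topping_name, price)
```

Note that the parser returns every data value as a string; converting `price` to a number is the program's job unless a Schema-aware tool does it.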

36 XML Attributes
Product XML
<products>
  <product>
    <name>Monitor</name>
    <!-- Price can be USD, Euro, or Yuan -->
    <price currency="USD">200</price>
  </product>
</products>

37 Attributes vs. Nested Tags
In PML, “extracheese” could have been defined as an attribute or a nested tag
Images can only be attributes
It is easier to add new tags than attributes
Attributes cannot define structure
<… name="Yao Ming">
<name>Yao Ming</name>
<name>
  <familyName>Yao</familyName>
  <givenName>Ming</givenName>
</name>
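
The point that attributes cannot define structure shows up as soon as a program reads the data. A small sketch, assuming Python's standard `xml.etree.ElementTree` module; the element name `player` is hypothetical, since the slide elides it.

```python
import xml.etree.ElementTree as ET

# "player" is a made-up element name used only for this illustration.
as_attribute = ET.fromstring('<player name="Yao Ming"/>')
as_nested = ET.fromstring(
    "<player><name><familyName>Yao</familyName>"
    "<givenName>Ming</givenName></name></player>"
)

full_name = as_attribute.get("name")            # one opaque string
family = as_nested.findtext("name/familyName")  # structure is explicit
given = as_nested.findtext("name/givenName")

print(full_name, family, given)
```

With the attribute form, a program would have to guess which word is the family name; with nested tags the document says so itself.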

38 Attributes vs. Nested Tags (2)
Attributes are necessary when :
    Identifying numbers or names of elements
    Values are selected from a finite set
Attributes should be used when :
    No substructure
    Attribute describes information about the element

39 XML Entity References (Variables)
Entities are usually used to embed special characters into XML messages
Document Entity : The file that represents the document
Other entities have names
    Entity names start with a letter, underscore, or colon
    Can also contain digits, periods, and hyphens
References to entities surround the name with & and ; as in &entityName;
Some built-in XML entities: &lt; &gt; &amp; &quot; &apos;
Use entities to avoid malformed XML
    <pred> X < Y </pred> is malformed; use <pred> X &lt; Y </pred>
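
The `<pred>` example can be reproduced with a parser. This sketch assumes Python's standard `xml.etree.ElementTree` module and the `escape` helper from `xml.sax.saxutils`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "X < Y"

# A bare "<" inside element content is read as the start of a tag.
try:
    ET.fromstring(f"<pred>{raw}</pred>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# escape() replaces &, < and > with the built-in entity references.
escaped = escape(raw)
pred = ET.fromstring(f"<pred>{escaped}</pred>")

# The parser turns all five built-in entities back into plain characters.
builtins = ET.fromstring("<t>&lt; &gt; &amp; &quot; &apos;</t>").text

print(raw_accepted, escaped, pred.text, builtins)
```

The entity reference exists only in the serialized document; once parsed, the program sees the original character again.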

40 Topics
Motivation
How does XML work?
Syntax of XML documents
XML and HTML

41 XML vs. HTML
Unlike HTML, XML tags tell you what the data means, rather than how to display it
XML elements must be strictly nested
XML can represent data at any level of complexity
Both XML and HTML allow empty tags; in XML an empty tag must end with a forward slash: <emptyTag/>

42 XML vs. HTML
XML attribute values must be surrounded by single or double quotes
    HTML does not require quotes for single values
XML tags are case sensitive
    HTML tags are not
    This is really confusing at first !

43 XML Summary
XML gives software engineers an incredibly flexible, simple, and powerful way to represent data
    Works with all sorts of data
    Maps naturally to tables, spreadsheets and databases
    Grammatical rules can be defined (next slides …)
    Human readable
Performance costs
    Plain text files use more space on disk
    Takes time to read, write, and reformat XML to and from internal representations
    This cost is seldom important and almost never within web applications

