Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases.

Similar presentations


Presentation on theme: "1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases."— Presentation transcript:

1

2 1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases

3 2 Spring 2000 Christophides Vassilis Preliminary Issues

4 3 Spring 2000 Christophides Vassilis What is a document? l Content: the components (words, images etc). which make up a document l Structure: the organization and inter-relationship of the components l Presentation: how a document looks and what processes are applied to it

5 4 Spring 2000 Christophides Vassilis Separating these things means... l The content can be re-used u for printing u for querying u for exchanging l The structure can be formally validated l The presentation can be customized for u different media u different audiences l … in short, the information can be uncoupled from its processing

6 5 Spring 2000 Christophides Vassilis Documents vs Databases Document world l plenty of small documents u usually static l implicit structure u section, paragraph, toc, l tagging u human friendly l content u form/layout, annotation l paradigms u “Save as”, WYSIWYG l metadata u author name, date, subject Database world l a few large databases u usually dynamic l explicit structure u types l records u machine friendly l content u data, methods l paradigms u Data Independence, Transaction Management, Query Languages l metadata u schema description

7 6 Spring 2000 Christophides Vassilis DBMS ANSI/SPARC Architecture PHYSICAL SCHEMA LOGICAL SCHEMA INTERNAL LEVEL CONCEPTUAL LEVEL EXTERNAL LEVEL VIEW 1VIEW 2VIEW 3 INTEGRATION

8 7 Spring 2000 Christophides Vassilis What to do with them Documents l editing l spell-checking l counting words l retrieving (IR) l printingDatabase l updating l cleaning l querying l composing/transforming

9 8 Spring 2000 Christophides Vassilis Query Languages Document Retrieval Claude Monet and San Diego Museum of Art Database Querying select p from Artists a, a.artwork p where a.first = “Claude” and a.last = “Monet” and p.located = “San Diego Museum of Art”

10 9 Spring 2000 Christophides Vassilis The Long Road of Document Standards  Rick Jelliffe 1999

11 10 Spring 2000 Christophides Vassilis MONET, Claude Haystacks at Chailly at Sunrise 1865 Oil on canvas 30 x 60 cm (11 7/8 x 23 3/4 in.) San Diego Museum of Art What’s Wrong with HTML l If written properly, normal HTML may reflect document presentation, but it cannot adequately represent the semantics & structure of data Artist Name Date Artifact Title Dimensions Material Museum Image Reference

12 11 Spring 2000 Christophides Vassilis HTML Document Presentation vs. …

13 12 Spring 2000 Christophides Vassilis … XML Data Representation l A possible XML markup of the same information will retain the structure (and the semantics) of the various data objects Claude Monet Haystacks at Chailly at Sunrise 1865 Oil on canvas 30 60 11 7/8 23 3/4 San Diego Museum of Art

14 13 Spring 2000 Christophides Vassilis XML can be Published as normal Web Data

15 14 Spring 2000 Christophides Vassilis What is XML? l Markup Meta-Language for domain or application specific structured documentation u Mathematical, chemical, musical, publishing, etc. l Developed by the SGML Editorial Board formed under the auspices of the World Wide Web Consortium (W3C) u Founded in 1996 by Jon Bosac (Sun) and various Web/SGML vendors: Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArbortText, GRIF, SoftQuand l Subset of SGML optimized for use in the Inter/Intranet u SGML is proving difficult to implement for Web/Intranet applications u SGML has been hard to cost-justify to management l Opens the way for a new generation of Web applications u Improve precision during searching and retrieval u Enable multiple usage of the same data u Facilitate distributed processing with more versatile ways to manipulate data

16 15 Spring 2000 Christophides Vassilis Why XML? l XML provides key features for a new generation of Web applications: u Structuring: unlike HTML it preserves the structure of the data u Extensibility: not a fixed format like HTML but user-oriented tagging u Validation: provides the means to consuming applications to check data for structural validity on importation u Presentation Late Binding: describes data, not visual presentation u Human Readable: similar to HTML u Interchange: good for transmission of data from server to browser, and from application to application, or machine to machine u Open standard: non proprietary format l XML becomes an integral part of the Web infrastructure u Microsoft Explorer (V5.0) already offers XML browsing u Ongoing XML implementation by Netscape u Various XML middleware and manipulation tools

17 16 Spring 2000 Christophides Vassilis The XML Language Family l XML (Extensible Markup Language) u A subset of SGML (ISO 8879) designed for easy implementation l XLink (Extensible Linking Language) u A set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI) l XSL (Extensible Stylesheet Language) u A standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts   A

18 17 Spring 2000 Christophides Vassilis Interrelationships Among the Various W3C Efforts

19 18 Spring 2000 Christophides Vassilis XML Syntax and Semantics

20 19 Spring 2000 Christophides Vassilis An Example of XML Markup Claude Monet Haystacks at Chailly at Sunrise 1865 Oil on canvas 30 60 11 7/8 23 3/4 San Diego Museum of Art Element NameElement Content Empty Element Attribute Value Attribute Name

21 20 Spring 2000 Christophides Vassilis The Logical Tree Structure of XML ARTIST NAMEARTWORK FIRSTLASTARTIFACT TITLEDATE MATERIAL DIM IMAGE DIM LOCATION...hayricks.jpg ClaudeMONET Haystacks1865 Oil on canvas San Diego Mus. 3060 11 7/8 23 3/4 HWHW

22 21 Spring 2000 Christophides Vassilis XML Document Type Definitions <!DOCTYPE artist [...... ]>

23 22 Spring 2000 Christophides Vassilis XML Core Markup Features l Elements: Components of the tree logical structure defined by a DTD u identified in a document instance by descriptive markup, usually a start-tag and end-tag l Attributes: Characteristics associated to the elements (other than their content and type) u may be applied to one specific instance of a given element l Entities: Named fragments of information that can be stored separately from a document (or a DTD) u can be included in the document (or the DTD) one or more times by reference to their names

24 23 Spring 2000 Christophides Vassilis Definition of Element’s Content l Mixed models must be optional repeatable OR-groups, with #PCDATA first

25 24 Spring 2000 Christophides Vassilis What XML can express? Sequence «, » l Choice « | » l Option ( 1 or 0 ) « ? » <!ELEMENT artist (…, nationality?, …) l Repetition (1 or more ) « + » l Option and Repetition ( 0, 1 or more ) « * »

26 25 Spring 2000 Christophides Vassilis XML Content Models and Regular Expressions l Each element content model is defined by a regular expression  Example: name, addr*, email Each regular expression determines a corresponding finite state automaton l This suggests a simple parsing program l Content Models should be defined by unambiguous regular expressions name addr email

27 26 Spring 2000 Christophides Vassilis XML Regular Expressions: Another Example l Adding in the optional greet further complicates things u Example: name,address*,(tel | fax)*,email* name address tel fax email

28 27 Spring 2000 Christophides Vassilis Definition of Attribut’s Content l More types (e.g., DATE) may soon be part of the standard

29 28 Spring 2000 Christophides Vassilis Attribute Default Values Value ‘vi ’ u a given value from an enumeration of values #FIXED value u the value is the only possible instance for the attribute l #REQUIRED u the value must be supplied #IMPLIED u the value can be optionally supplied

30 29 Spring 2000 Christophides Vassilis XML Entities l Entities allow the definition of short strings to stand for more complex information, which can reside inside or outside the document or its DTD l Used for substitutions of data or markup: u DTD level e.g., markup declaration (Parameter entity) u Document level e.g., data and markup instances (General entity) l Used for references to external data or markup sources: u the content of the entity can be found using an XML system-specific storage location (Specific entity) u the content of the entity can be found by mapping a public identifier to a system-specific storage location (Public entity)

31 30 Spring 2000 Christophides Vassilis XML Parameter Entities l Parameter entities are used for extensible declarations (e.g., macros) of complex content models or attributes in a DTD l Parameter entities can be nested but we must avoid infinite loops l Replacement entity text can be found outside the DTD

32 31 Spring 2000 Christophides Vassilis XML General Entities l General entities are used for substitution of textual or not textual objects (e.g., constants) occurring many times or are volatiles in the document instances l Replacement text of general entities can contain tags, character references or other entities Extensible Markup Language &www; ”> but also we must avoid infinite loops l The content of a general entity can be found outside the DTD and it may have a particular format

33 32 Spring 2000 Christophides Vassilis XML Specific Entities l Specific entities can be viewed as “abstract storage objects” (e.g., data stream) that are mapped onto real ones by using a system-specific storage location l Sub-documents encoded in XML with a different DTD l Textual data encoded with a particular format l Non-SGML data

34 33 Spring 2000 Christophides Vassilis The Main XML Components

35 34 Spring 2000 Christophides Vassilis Well-Formed XML l A textual object is said to be a well-formed XML document if it meets all the well-formedness constraints (WFCs) of the XML syntax: u tags (etc.) are syntactically correct u every tag has an end-tag u tags are properly nested u there exists a root l By definition if a document is not well-formed, it is not XML u This means that there is no an XML document which is not well- formed, and XML processors are not required to do anything with such documents

36 35 Spring 2000 Christophides Vassilis Valid XML l A well-formed document is valid only if it contains a proper DTD and if the document obeys the constraints of that DTD and therefore the XML Validity Constraints (VCs) u only declared tags are used u all tag occurrences conform to specified content models l Examples: u The following XML Document is well-formed but not valid MONET, Claude u The following XML Document is not even well-formed Claude MONET

37 36 Spring 2000 Christophides Vassilis When do we need a DTD? l At document preparation time (definitely) u validation, checking, consistency l At document processing time (probably) u simplifies generic/specific processing u may clarify intended semantics l At document delivery time (possibly) u strictly unnecessary for well-formed docs u but reduces processing effort Creation Composition Validation Usage

38 37 Spring 2000 Christophides Vassilis Where is the behaviour of XML defined? l In a stylesheet u using XSL or CSS l Possibly embedded in a program applet, or script, or JAVA bean u defined for that particular DTD, set of tags, or tag l By reference to pre-existing mutual agreement amongst user communities u aka “namespaces” l By reference to a Document Object Model XMLXSL

39 38 Spring 2000 Christophides Vassilis type-checking constants macros void* header file #ifdef standard library namespace validation entity reference entity parameter ANY IDREF DTD conditional section key entities namespace Comparing XML and Programming Languages But no type inference, polymorphism, modules, etc.

40 39 Spring 2000 Christophides Vassilis XML DTDs vs. Database Schemas l By database standards, DTDs are rather weak specifications u Only one base type i.e., PCDATA u Only two element constructors i.e., sequence and alternative u No useful “abstractions” e.g., bulk types, inheritance u IDREFs are untyped  You point to something, but you don’t know what!  No integrity constraints e.g., child is inverse of parent u No methods u Tag definitions are global l Recent XML extensions impose something like a schema or type on an XML data (XML Schema)

41 40 Spring 2000 Christophides Vassilis XML vs. ODMG ODL: Example class Movie ( extent Movies, key title ) { attribute string title; attribute string director; relationship set casts inverse Actor::acted_In; attribute int budget; } ; class Actor ( extent Actors, key name ) { attribute string name; relationship set acted_In inverse Movie::casts; attribute int age; attribute set directed; } ;

42 41 Spring 2000 Christophides Vassilis XML vs. ODMG ODL: Example Waking Ned Divine Kirk Jones III 100,000 Dragonheart Rob Cohen 110,000 Moondance Dagmar Hirtz 90,000 David Kelly Sean Connery 68 Ian Bannen :

43 42 Spring 2000 Christophides Vassilis XML vs. ODMG ODL: Example <!DOCTYPE db [ ]>

44 43 Spring 2000 Christophides Vassilis Mapping Between XML and Objects

45 44 Spring 2000 Christophides Vassilis XML vs Relational DBMS projects: title budget managedBy employees: name ssn age <!DOCTYPE db [... ]> <!DOCTYPE db [... ]>

46 45 Spring 2000 Christophides Vassilis Recursive DTDs <DOCTYPE genealogy [ <!ELEMENT person ( name, dateOfBirth, person, -- mother person -- father )>... ]> What is the problem with this?

47 46 Spring 2000 Christophides Vassilis Recursive DTDs cont’d. <DOCTYPE genealogy [ <!ELEMENT person ( name, dateOfBirth, person?, -- mother person? )> -- father... ]> l What is now the problem with this?

48 47 Spring 2000 Christophides Vassilis Some Things are Hard to Specify l Each employee element is to contain name, age and ssn elements in some order <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | …)> l Suppose there were many more fields !

49 48 Spring 2000 Christophides Vassilis Specifying ID and IDREF Attributes <!DOCTYPE family [ <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]>

50 49 Spring 2000 Christophides Vassilis Some Conforming XML data Jane Doe John Doe Mary Doe Jack Doe

51 50 Spring 2000 Christophides Vassilis An Alternative XML DTD Specification <!DOCTYPE family [ ]>

52 51 Spring 2000 Christophides Vassilis The Revised XML Data Jane Doe John Doe...

53 52 Spring 2000 Christophides Vassilis Mapping between XML and Tables

54 53 Spring 2000 Christophides Vassilis Bluring the Frontiers between Data & Documents

55 54 Spring 2000 Christophides Vassilis Towards XML-enabled DBMS l Xml-enabled database system u Store XML data/documents into the database server u Query and search valid and well- formed XML u Generate XML data from the database server u Add XML capabilities in supporting database facilities l XML has the potential to impact four important markets u Web integration u Web publishing u Application integration u Electronic commerce DBMS Store XML Generate XML Integrate with other facilities

56 55 Spring 2000 Christophides Vassilis Storing XML Data l Enhance XML storage facilities in the database: u Utilities to load XML data into the database u Provide more efficient database storage (componentized storage, compression, indexing,…) u XML export tools from the server u Allow server-to-server replication of XML data Database HTML XML Database

57 56 Spring 2000 Christophides Vassilis Querying and Searching XML Data l Fine-grained access to XML documents l Search XML data efficiently u Special SQL queries over valid + well-formed XML u Content-based indexing (e.g. Text indexes) for searching XML data efficiently u Support for XML query languages (e.g. XQL) on XML data Database HTML XML Web

58 57 Spring 2000 Christophides Vassilis Generating and Manipulating XML l Generate XML from the database server u Map ODMG, SQL92, SQL3 and PL/SQL datatypes to XML u Provide mappings between java, SQL and XML types l Script XML content from the database u Allow SQL queries to return XML results u Provide embedded XML in stored procedures u Java scripting: support embedded XML in java u Common APIs to access any XML content in databases Database HTML XML Web

59 58 Spring 2000 Christophides Vassilis Database X Database Y XML Sorted XML Total

60 59 Spring 2000 Christophides Vassilis l 1960’s: Data Centric l 1970’s: Process Centric l 1980’s: Object Oriented l 1990’s: Component Based l 2000’s: XML? Epilogue

61 60 Spring 2000 Christophides Vassilis 60’s Data Record Layouts Printer Layouts System Flow Charts Decision Tables Batch Jobs were a Series of small Programs Data was our First Focus

62 61 Spring 2000 Christophides Vassilis 60’s Data 70’s Logic GOTO-Less Programming Structured Programming Top-Down Design Programs Became Very Large Then we Focused on Logic

63 62 Spring 2000 Christophides Vassilis 60’s Data 80’s OO 70’s Logic Common Terms for Analysis and Design Tightly Coupled Code Code Reuse was the Holy Grail, Rarely Achieved Object Oriented Programming Focused on Runtime Behavior

64 63 Spring 2000 Christophides Vassilis 60’s Data 90’s Comp 70’s Logic 80’s OO Serialization Tied to Code Code Reuse IDE-Based Composition Limited Acceptance Component Programming Shifted the Focus to Interfaces

65 64 Spring 2000 Christophides Vassilis 00’s XML 70’s Logic 80’s OO 90’s Comp XML Wrappers for Incompatible Systems Industry-Specific Markup Languages XML for Persistent Data and Composition XML Enables Middleware for Application-Specific Data XML Returns the Focus to Data

66 65 Spring 2000 Christophides Vassilis BIBLIOGRAPHY l Charles F. Goldfarb, Paul Prescod, Paper Michael, Leventhal, et al. “The XML Handbook”. Printice Hall, 1998. l David Megginson. “Structuring XML Documents”. Printice Hall, 1998. l Simon St. Laurent. “XML : Extensible Markup Language”. IDG Books, 1998. l Rick Jelliffe. “The XML and SGML Cookbook : Recipes for Structured Information”. Printice Hall, 1998. l Simon St. Laurent. “Xml : A Primer”. IDG Books, 1998. l Steven Holzner. “XML Complete”. McGraw-Hill, 1997. l Richard Light, Tim Bray. “Presenting Xml”. Macmillan Publishing, 1997. l Bryan Pfaffengerger. “Web Publishing With XML in Six Easy Steps”, 1997. l Steven J. DeRose. “The SGML FAQ Book : Understanding the Foundation of HTML and XML”. Kluwer Academic Publishers, 1997. l Sean McGrath. “XML by Example: Building E-Commerce Applications”. Printice Hall, 1998 l Charles F. Goldfarb, Steve Pepper, Chet Ensign. “SGML Buyer’s Guide : A Unique Guide to Determining Your Requirements and Choosing the Right SGML and XML Products and Services”. Printice Hall, 1998.


Download ppt "1 Spring 2000 Christophides Vassilis The eXtensible Markup Language: An Introduction to XML Documents & Databases."

Similar presentations


Ads by Google