Published by Lenard Rodgers. Modified over 8 years ago.

The eXtensible Markup Language: An Introduction to XML Documents & Databases
Christophides Vassilis, Spring 2000

Preliminary Issues

What is a document?
- Content: the components (words, images, etc.) that make up a document
- Structure: the organization and interrelationship of the components
- Presentation: how a document looks and what processes are applied to it

Separating these things means...
- The content can be reused
  - for printing
  - for querying
  - for exchanging
- The structure can be formally validated
- The presentation can be customized for
  - different media
  - different audiences
- In short, the information can be decoupled from its processing

Documents vs Databases
Document world:
- plenty of small documents, usually static
- implicit structure: section, paragraph, table of contents
- tagging: human friendly
- content: form/layout, annotation
- paradigms: "Save as", WYSIWYG
- metadata: author name, date, subject
Database world:
- a few large databases, usually dynamic
- explicit structure: types
- records: machine friendly
- content: data, methods
- paradigms: data independence, transaction management, query languages
- metadata: schema description

DBMS ANSI/SPARC Architecture
[Figure: three-level architecture. External level: views 1, 2 and 3. Conceptual level: logical schema (integration). Internal level: physical schema.]

What to do with them
- Documents: editing, spell-checking, counting words, retrieving (IR), printing
- Databases: updating, cleaning, querying, composing/transforming

Query Languages
- Document retrieval: Claude Monet and San Diego Museum of Art
- Database querying:
    select p
    from Artists a, a.artwork p
    where a.first = "Claude" and a.last = "Monet"
      and p.located = "San Diego Museum of Art"
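To make the contrast between the two styles concrete, here is a minimal Python sketch using the standard-library xml.etree.ElementTree. The element names (artists, artist, first, last, artwork, title, located) are assumptions modelled on the query above, and the second record is invented purely to show where keyword matching over-matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the first record follows the Monet example in these
# slides; the second is invented to show where keyword search over-matches.
DOC = """<artists>
  <artist><first>Claude</first><last>Monet</last>
    <artwork><title>Haystacks at Chailly at Sunrise</title>
      <located>San Diego Museum of Art</located></artwork></artist>
  <artist><first>A.</first><last>Painter</last>
    <artwork><title>Homage to Claude Monet</title>
      <located>San Diego Museum of Art</located></artwork></artist>
</artists>"""

def keyword_search(root, words):
    """Document retrieval: artists whose full text contains every word."""
    hits = []
    for artist in root.iter("artist"):
        text = " ".join(artist.itertext())
        if all(w in text for w in words):
            hits.append(artist.findtext("last"))
    return hits

def structured_query(root):
    """Database querying: the same conditions as the select/from/where query."""
    return [p.findtext("title")
            for a in root.iter("artist")
            if a.findtext("first") == "Claude" and a.findtext("last") == "Monet"
            for p in a.iter("artwork")
            if p.findtext("located") == "San Diego Museum of Art"]

root = ET.fromstring(DOC)
print(keyword_search(root, ["Claude", "Monet", "San Diego Museum of Art"]))  # ['Monet', 'Painter']
print(structured_query(root))  # ['Haystacks at Chailly at Sunrise']
```

The keyword search cannot tell an artist named Monet from a title that mentions Monet; the structured query can, because the markup says which text is which.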

The Long Road of Document Standards
[Figure: timeline of document standards, after Rick Jelliffe, 1999]

What's Wrong with HTML
[Figure: an HTML page showing "MONET, Claude / Haystacks at Chailly at Sunrise / 1865 / Oil on canvas / 30 x 60 cm (11 7/8 x 23 3/4 in.) / San Diego Museum of Art", annotated with the data items it carries: artist name, date, artifact title, dimensions, material, museum, image reference]
- Even when written properly, ordinary HTML reflects the presentation of a document, but it cannot adequately represent the semantics and structure of its data

HTML Document Presentation vs. …

… XML Data Representation
- A possible XML markup of the same information retains the structure (and the semantics) of the various data objects
[Figure: XML markup of the record: Claude Monet; Haystacks at Chailly at Sunrise; 1865; Oil on canvas; 30 x 60; 11 7/8 x 23 3/4; San Diego Museum of Art]

XML Can Be Published as Normal Web Data

What is XML?
- A markup meta-language for domain- or application-specific structured documentation
  - mathematical, chemical, musical, publishing, etc.
- Developed by the SGML Editorial Board, formed under the auspices of the World Wide Web Consortium (W3C)
  - founded in 1996 by Jon Bosak (Sun) and various Web/SGML vendors: Textuality, Netscape, Microsoft, INSO, HP, Highland, NCSA, ArborText, GRIF, SoftQuad
- A subset of SGML optimized for use on the Internet and on intranets
  - SGML is proving difficult to implement for Web/intranet applications
  - SGML has been hard to cost-justify to management
- Opens the way for a new generation of Web applications
  - improves precision in searching and retrieval
  - enables multiple uses of the same data
  - facilitates distributed processing with more versatile ways to manipulate data

Why XML?
- XML provides key features for a new generation of Web applications:
  - Structuring: unlike HTML, it preserves the structure of the data
  - Extensibility: not a fixed format like HTML, but user-defined tags
  - Validation: lets consuming applications check data for structural validity on import
  - Late binding of presentation: it describes data, not visual presentation
  - Human readable: like HTML
  - Interchange: good for transmitting data from server to browser, from application to application, or from machine to machine
  - Open standard: a non-proprietary format
- XML is becoming an integral part of the Web infrastructure
  - Microsoft Internet Explorer 5.0 already offers XML browsing
  - XML implementation is under way at Netscape
  - various XML middleware and manipulation tools

The XML Language Family
- XML (Extensible Markup Language)
  - a subset of SGML (ISO 8879) designed for easy implementation
- XLink (XML Linking Language)
  - a set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI)
- XSL (Extensible Stylesheet Language)
  - a standard stylesheet language for structured information, derived from DSSSL (ISO/IEC 10179) and key CSS concepts

Interrelationships Among the Various W3C Efforts

XML Syntax and Semantics

An Example of XML Markup
[Figure: the Monet record marked up in XML (Claude Monet; Haystacks at Chailly at Sunrise; 1865; Oil on canvas; 30 x 60; 11 7/8 x 23 3/4; San Diego Museum of Art), with callouts for element name, element content, empty element, attribute name and attribute value]

The Logical Tree Structure of XML
[Figure: the document as a tree. ARTIST has children NAME (FIRST: Claude, LAST: MONET) and ARTWORK; further down are ARTIFACT, TITLE (Haystacks), DATE (1865), MATERIAL (Oil on canvas), two DIM elements with H and W children (30 x 60 and 11 7/8 x 23 3/4), IMAGE (hayricks.jpg) and LOCATION (San Diego Mus.)]
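The tree view can be reproduced by parsing the markup and walking it recursively. This is a minimal Python sketch with xml.etree.ElementTree; the exact nesting and the UNIT and SRC attribute names below are assumptions reconstructed from the figure, not the original slide's markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the Monet record; UNIT and SRC are assumed names.
ARTIST = """<ARTIST>
  <NAME><FIRST>Claude</FIRST><LAST>MONET</LAST></NAME>
  <ARTWORK>
    <ARTIFACT>
      <TITLE>Haystacks at Chailly at Sunrise</TITLE><DATE>1865</DATE>
      <MATERIAL>Oil on canvas</MATERIAL>
      <DIM UNIT="cm"><H>30</H><W>60</W></DIM>
      <DIM UNIT="in"><H>11 7/8</H><W>23 3/4</W></DIM>
      <IMAGE SRC="hayricks.jpg"/>
    </ARTIFACT>
    <LOCATION>San Diego Museum of Art</LOCATION>
  </ARTWORK>
</ARTIST>"""

def outline(elem, depth=0):
    """Yield one line per element, indented by its depth in the tree."""
    text = (elem.text or "").strip()
    attrs = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
    label = elem.tag + (f" [{attrs}]" if attrs else "") + (f": {text}" if text else "")
    yield "  " * depth + label
    for child in elem:
        yield from outline(child, depth + 1)

root = ET.fromstring(ARTIST)
print("\n".join(outline(root)))
```

Element content shows up after the colon, attributes in brackets, and the empty IMAGE element carries only an attribute.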

XML Document Type Definitions
<!DOCTYPE artist [...... ]>

XML Core Markup Features
- Elements: the components of the logical tree structure defined by a DTD
  - identified in a document instance by descriptive markup, usually a start-tag and an end-tag
- Attributes: characteristics associated with elements (other than their content and type)
  - may be applied to one specific instance of a given element
- Entities: named fragments of information that can be stored separately from a document (or a DTD)
  - can be included in the document (or the DTD) one or more times by referencing their names

Definition of Element's Content
- Mixed content models must be optional, repeatable OR-groups with #PCDATA first

What can XML express?
- Sequence: ","
- Choice: "|"
- Option (0 or 1): "?", e.g. <!ELEMENT artist (…, nationality?, …)>
- Repetition (1 or more): "+"
- Option and repetition (0, 1 or more): "*"
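These operators line up almost one-to-one with regular-expression operators, which a short Python sketch can make visible. The function names below are hypothetical; the sketch assumes element-only content models (no #PCDATA) and encodes a child sequence as tag names each followed by one space.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")

def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate an element-only DTD content model into a Python regex.

    The regex matches a child sequence written as tag names, each followed
    by one space, e.g. "name addr email ". #PCDATA is not handled.
    """
    compact = re.sub(r"\s+", "", model)                          # drop layout whitespace
    wrapped = NAME.sub(lambda m: f"(?:{m.group(0)} )", compact)  # name -> "(?:name )"
    return re.compile(wrapped.replace(",", ""))                  # ',' is plain concatenation

def matches(model: str, children: list) -> bool:
    """True if the list of child tag names conforms to the content model."""
    sequence = "".join(tag + " " for tag in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

print(content_model_to_regex("name, addr*, email").pattern)
# (?:name )(?:addr )*(?:email )
print(matches("name, addr*, email", ["name", "addr", "addr", "email"]))  # True
```

"|", "?", "+", "*" and parentheses keep their meaning unchanged; only the sequence comma needs translating.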

XML Content Models and Regular Expressions
- Each element content model is defined by a regular expression, e.g. name, addr*, email
- Each regular expression determines a corresponding finite-state automaton
  - this suggests a simple parsing program
- Content models should be defined by unambiguous regular expressions
[Figure: automaton with transitions labelled name, addr, email]
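The "simple parsing program" can be sketched as a hand-built deterministic finite automaton for name, addr*, email. This Python sketch is illustrative (the table and function names are hypothetical); it relies on the model being unambiguous, which XML 1.0 requires for compatibility with SGML, so each child tag selects at most one next state.

```python
# Transition table of a deterministic finite automaton (DFA) for the
# content model  name, addr*, email
#   state 0: nothing read yet
#   state 1: name read, followed by any number of addr
#   state 2: email read (accepting)
TRANSITIONS = {
    (0, "name"): 1,
    (1, "addr"): 1,
    (1, "email"): 2,
}
ACCEPTING = {2}

def accepts(children):
    """Run the DFA over the child tag names; reject on any missing transition."""
    state = 0
    for tag in children:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts(["name", "addr", "addr", "email"]))  # True
print(accepts(["name", "email", "addr"]))          # False
```

One table lookup per child element is all the checking needs, so validation runs in time linear in the number of children.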

XML Regular Expressions: Another Example
- Adding the optional greet further complicates things
  - Example: name, address*, (tel | fax)*, email*
[Figure: automaton with transitions labelled name, address, tel, fax, email]

Definition of Attribute's Content
- More types (e.g., DATE) may soon become part of the standard

Attribute Default Values
- 'vi': a given value from an enumeration of values
- #FIXED value: the value is the only possible value for the attribute
- #REQUIRED: the value must be supplied
- #IMPLIED: the value may optionally be supplied
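What these four defaults mean to an application can be sketched as a function that takes the attributes written in a start-tag and returns what a validating parser would report. This is a minimal Python sketch; the (kind, value) representation of an ATTLIST entry and the attribute names id, unit, version and note are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical in-memory form of one ATTLIST entry: (default kind, value).
# Kind is "value" (plain default), "#FIXED", "#REQUIRED" or "#IMPLIED";
# the value is "" when the kind carries none.
AttDecl = Tuple[str, str]

def apply_defaults(decls: Dict[str, AttDecl], given: Dict[str, str]) -> Dict[str, str]:
    """Return the attributes an application sees after DTD defaulting."""
    result = dict(given)
    for name, (kind, value) in decls.items():
        if kind == "#REQUIRED":
            if name not in given:
                raise ValueError(f"missing required attribute {name!r}")
        elif kind == "#FIXED":
            if name in given and given[name] != value:
                raise ValueError(f"attribute {name!r} is fixed to {value!r}")
            result[name] = value
        elif kind == "value":
            result.setdefault(name, value)
        # "#IMPLIED": no default, so the attribute stays absent if not supplied
    return result

DECLS = {
    "id": ("#REQUIRED", ""),
    "unit": ("value", "cm"),
    "version": ("#FIXED", "1.0"),
    "note": ("#IMPLIED", ""),
}
print(apply_defaults(DECLS, {"id": "a1"}))
# {'id': 'a1', 'unit': 'cm', 'version': '1.0'}
```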

XML Entities
- Entities let a short name stand for more complex information, which can reside inside or outside the document or its DTD
- Used to substitute data or markup:
  - at DTD level, e.g. markup declarations (parameter entities)
  - at document level, e.g. data and markup instances (general entities)
- Used to reference external data or markup sources:
  - the content of the entity is found through a system-specific storage location (specific entity, declared with a SYSTEM identifier)
  - the content of the entity is found by mapping a public identifier to a system-specific storage location (public entity, declared with a PUBLIC identifier)

XML Parameter Entities
- Parameter entities are used for extensible declarations (e.g., macros) of complex content models or attributes in a DTD
- Parameter entities can be nested, but infinite loops must be avoided
- The replacement text of a parameter entity can reside outside the DTD
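Macro-style expansion, and the need to avoid infinite loops, can be sketched with a recursive expander that tracks which entities are currently being expanded. This Python sketch and its entity names are hypothetical; note that XML 1.0 only allows parameter-entity references inside markup declarations in the external DTD subset.

```python
import re

# A parameter-entity reference in a DTD looks like %name;
PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")

def expand(text, entities, active=()):
    """Recursively replace %name; references, refusing recursive definitions.

    `active` holds the chain of entities currently being expanded, so a
    reference back into that chain is reported instead of looping forever.
    """
    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError("recursive parameter entity: " + " -> ".join(active + (name,)))
        if name not in entities:
            raise KeyError(f"undeclared parameter entity: {name}")
        return expand(entities[name], entities, active + (name,))
    return PE_REF.sub(replace, text)

# Hypothetical macros for a person content model
ENTITIES = {
    "name.model": "first, last",
    "person.model": "(%name.model;), email?",
}
print(expand("<!ELEMENT person (%person.model;)>", ENTITIES))
# <!ELEMENT person ((first, last), email?)>
```

A pair such as a = "%b;" and b = "%a;" raises ValueError instead of recursing until the stack overflows.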

XML General Entities
- General entities are used to substitute textual or non-textual objects (e.g., constants) that occur many times or are volatile in document instances
- The replacement text of a general entity can contain tags, character references or other entity references (e.g. a replacement text such as "Extensible Markup Language &www;"), but here too infinite loops must be avoided
- The content of a general entity can be stored outside the DTD and may have a particular format
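Python's standard-library ElementTree parser, which is built on expat, expands general entities declared in the internal DTD subset, so the substitution can be observed directly. In this sketch the entity names www, full and w3c and the note/title/org elements are hypothetical; "full" shows a replacement text that contains both a tag and a nested entity reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the entity and element names are illustrative.
DOC = """<!DOCTYPE note [
  <!ENTITY www "World Wide Web">
  <!ENTITY full "<em>Extensible</em> Markup Language for the &www;">
  <!ENTITY w3c "the &www; Consortium">
]>
<note><title>&full;</title><org>&w3c;</org></note>"""

root = ET.fromstring(DOC)
em = root.find("title/em")        # the <em> tag came from the entity text
print(em.text)                    # Extensible
print(repr(em.tail))              # ' Markup Language for the World Wide Web'
print(root.findtext("org"))       # the World Wide Web Consortium
```

After parsing, the application sees only the expanded content; the entity references themselves are gone from the tree.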

XML Specific Entities
- Specific entities can be viewed as "abstract storage objects" (e.g., data streams) that are mapped onto real ones through a system-specific storage location, for example:
  - sub-documents encoded in XML with a different DTD
  - textual data encoded in a particular format
  - non-SGML data

The Main XML Components

Well-Formed XML
- A textual object is a well-formed XML document if it meets all the well-formedness constraints (WFCs) of the XML syntax:
  - tags (etc.) are syntactically correct
  - every start-tag has a matching end-tag (or is an empty-element tag)
  - tags are properly nested
  - there is a single root element
- By definition, if a document is not well-formed, it is not XML
  - that is, there is no such thing as an XML document that is not well-formed, and XML processors are not required to do anything with such documents
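A non-validating parser is enough to test well-formedness: if parsing succeeds, the constraints above hold. A minimal Python sketch using xml.etree.ElementTree, which raises ParseError on any well-formedness violation; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the text parses as a single, properly nested XML element tree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<artist><name>Claude MONET</name></artist>"))   # True
print(is_well_formed("<artist><name>Claude MONET</artist></name>"))   # False: bad nesting
print(is_well_formed("<first>Claude</first><last>MONET</last>"))      # False: two roots
print(is_well_formed("<artist>"))                                      # False: no end-tag
```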

Valid XML
- A well-formed document is valid only if it contains a proper DTD and obeys the constraints of that DTD, i.e. the XML validity constraints (VCs):
  - only declared tags are used
  - all tag occurrences conform to the specified content models
- Examples:
  - the following XML document is well-formed but not valid: MONET, Claude
  - the following XML document is not even well-formed: Claude MONET
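Python's standard library does not validate against DTDs, but the two validity constraints listed above can be sketched by hand on a parsed tree. In this sketch the mini-DTD (artist, name, nationality, first, last) is hypothetical and expressed as Python regexes over child tag names rather than DTD syntax, and the markup of the "not valid" example is an assumed reconstruction.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-DTD. Each declared element maps either to a regex over
# its children's tag names (each followed by one space) or to None, which
# stands for text-only (#PCDATA) content.
MODELS = {
    "artist": re.compile(r"(?:name )(?:nationality )?"),
    "name": re.compile(r"(?:first )(?:last )"),
    "nationality": None,
    "first": None,
    "last": None,
}

def is_valid(root):
    """Check 'only declared tags' and 'content models respected' on a tree."""
    for elem in root.iter():
        if elem.tag not in MODELS:          # undeclared tag
            return False
        model = MODELS[elem.tag]
        children = list(elem)
        if model is None:                   # text-only element: no child elements
            if children:
                return False
            continue
        # element-only content: no text before or between the children
        if (elem.text or "").strip() or any((c.tail or "").strip() for c in children):
            return False
        if not model.fullmatch("".join(c.tag + " " for c in children)):
            return False
    return True

good = ET.fromstring("<artist><name><first>Claude</first><last>MONET</last></name></artist>")
bad = ET.fromstring("<artist>MONET, Claude</artist>")   # well-formed, but not valid
print(is_valid(good), is_valid(bad))  # True False
```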

When do we need a DTD?
- At document preparation time (definitely)
  - validation, checking, consistency
- At document processing time (probably)
  - simplifies generic/specific processing
  - may clarify intended semantics
- At document delivery time (possibly)
  - strictly unnecessary for well-formed documents
  - but reduces processing effort
[Figure: document life cycle: creation, composition, validation, usage]

Where is the behaviour of XML defined?
- In a stylesheet
  - using XSL or CSS
- Possibly embedded in a program: an applet, a script or a JavaBean
  - defined for that particular DTD, set of tags, or tag
- By reference to pre-existing mutual agreement among user communities
  - aka "namespaces"
- By reference to a Document Object Model
[Figure: XML + XSL]

Comparing XML and Programming Languages
- type-checking: validation
- constants: entity references
- macros: parameter entities
- void*: ANY, IDREF
- header file: DTD
- #ifdef: conditional sections
- standard library: key entities
- namespace: namespace
- But no type inference, polymorphism, modules, etc.

XML DTDs vs. Database Schemas
- By database standards, DTDs are rather weak specifications
  - only one base type, i.e. PCDATA
  - only two element constructors, i.e. sequence and alternative
  - no useful "abstractions", e.g. bulk types, inheritance
  - IDREFs are untyped: you point to something, but you don't know what!
  - no integrity constraints, e.g. child is the inverse of parent
  - no methods
  - tag definitions are global
- Recent XML extensions (XML Schema) impose something like a schema or type on XML data

XML vs. ODMG ODL: Example
    class Movie ( extent Movies, key title ) {
        attribute string title;
        attribute string director;
        relationship set<Actor> casts inverse Actor::acted_In;
        attribute int budget;
    };
    class Actor ( extent Actors, key name ) {
        attribute string name;
        relationship set<Movie> acted_In inverse Movie::casts;
        attribute int age;
        attribute set directed;
    };

XML vs. ODMG ODL: Example
[Figure: XML instance for the schema above: Waking Ned Divine, Kirk Jones III, 100,000; Dragonheart, Rob Cohen, 110,000; Moondance, Dagmar Hirtz, 90,000; David Kelly; Sean Connery, 68; Ian Bannen; ...]

XML vs. ODMG ODL: Example
<!DOCTYPE db [ ]>

Mapping Between XML and Objects

XML vs Relational DBMS
- projects: title, budget, managedBy
- employees: name, ssn, age
<!DOCTYPE db [... ]>
<!DOCTYPE db [... ]>
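The simplest way to carry these two tables into XML is one element per row and one child element per column. This Python sketch shows only that flat mapping; the rows_to_xml helper and all row values are hypothetical, and managedBy is assumed to hold an employee's ssn (a foreign key), which XML itself does nothing to enforce.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(table, columns, rows, root):
    """Append one <table> element per row, with one child element per column."""
    for row in rows:
        record = ET.SubElement(root, table)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root

db = ET.Element("db")
# Hypothetical rows; managedBy holds the ssn of an employee (a foreign key).
rows_to_xml("project", ["title", "budget", "managedBy"], [("Project A", 100000, "111")], db)
rows_to_xml("employee", ["name", "ssn", "age"], [("Jane Doe", "111", 34)], db)
print(ET.tostring(db, encoding="unicode"))
```

A nested alternative would place each project inside the element of the employee who manages it, trading the foreign key for containment.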

Recursive DTDs
    <!DOCTYPE genealogy [
      <!ELEMENT person (
        name,
        dateOfBirth,
        person,    -- mother
        person     -- father
      )>
      ...
    ]>
- What is the problem with this?

Recursive DTDs cont'd.
    <!DOCTYPE genealogy [
      <!ELEMENT person (
        name,
        dateOfBirth,
        person?,   -- mother
        person?    -- father
      )>
      ...
    ]>
- What is now the problem with this?

Some Things are Hard to Specify
- Each employee element is to contain name, age and ssn elements in some order:
    <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) | … )>
- Suppose there were many more fields!
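Spelling out "in some order" means listing every permutation, so the content model grows factorially with the number of fields. This Python sketch (the function name is hypothetical) generates that model; because several alternatives begin with the same element, the result is also non-deterministic, which XML 1.0 flags as an error for SGML compatibility.

```python
from itertools import permutations
from math import factorial

def any_order_model(fields):
    """Build a DTD content model accepting the fields in every possible order."""
    alternatives = ("(" + ", ".join(p) + ")" for p in permutations(fields))
    return "(" + " | ".join(alternatives) + ")"

print("<!ELEMENT employee " + any_order_model(["name", "age", "ssn"]) + ">")
for n in (3, 5, 8):
    print(n, "fields ->", factorial(n), "alternatives")   # 6, 120, 40320
```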

Specifying ID and IDREF Attributes
    <!DOCTYPE family [
      <!ATTLIST person id       ID     #REQUIRED
                       mother   IDREF  #IMPLIED
                       father   IDREF  #IMPLIED
                       children IDREFS #IMPLIED>
    ]>

Some Conforming XML Data
[Figure: person elements for Jane Doe, John Doe, Mary Doe and Jack Doe]
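For IDREF and IDREFS attributes, a validating parser checks only that every reference names some ID in the document. This Python sketch performs that check on a hypothetical instance of the ATTLIST above (the family links between Jane, John, Mary and Jack are invented). Nothing here, or in the DTD, checks that mother actually points to a person, which is exactly what "IDREFs are untyped" means.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance for the ATTLIST above; the family links are invented.
FAMILY = ET.fromstring("""<family>
  <person id="jane" children="mary jack"><name>Jane Doe</name></person>
  <person id="john" children="mary jack"><name>John Doe</name></person>
  <person id="mary" mother="jane" father="john"><name>Mary Doe</name></person>
  <person id="jack" mother="jane" father="john"><name>Jack Doe</name></person>
</family>""")

REF_ATTRS = {"mother": False, "father": False, "children": True}  # True = IDREFS

def dangling_refs(root):
    """Return (element id, attribute, target) for every reference to a missing ID."""
    ids = {e.get("id") for e in root.iter() if e.get("id") is not None}
    missing = []
    for e in root.iter():
        for attr, multiple in REF_ATTRS.items():
            value = e.get(attr)
            if value is None:
                continue
            targets = value.split() if multiple else [value]  # IDREFS: space-separated
            missing += [(e.get("id"), attr, t) for t in targets if t not in ids]
    return missing

by_id = {e.get("id"): e for e in FAMILY.iter("person")}
print(dangling_refs(FAMILY))                                 # []
print(by_id[by_id["mary"].get("mother")].findtext("name"))   # Jane Doe
```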

An Alternative XML DTD Specification
<!DOCTYPE family [ ]>

The Revised XML Data
[Figure: Jane Doe, John Doe, ...]

Mapping between XML and Tables

Blurring the Frontiers between Data & Documents

Towards XML-enabled DBMS
- An XML-enabled database system should
  - store XML data/documents in the database server
  - query and search valid and well-formed XML
  - generate XML data from the database server
  - add XML capabilities to supporting database facilities
- XML has the potential to impact four important markets
  - Web integration
  - Web publishing
  - application integration
  - electronic commerce
[Figure: DBMS that stores XML, generates XML, and integrates with other facilities]

Storing XML Data
- Enhance XML storage facilities in the database:
  - utilities to load XML data into the database
  - more efficient database storage (componentized storage, compression, indexing, ...)
  - XML export tools from the server
  - server-to-server replication of XML data
[Figure: HTML and XML exchanged between databases]

Querying and Searching XML Data
- Fine-grained access to XML documents
- Efficient search over XML data
  - special SQL queries over valid and well-formed XML
  - content-based indexing (e.g., text indexes) for searching XML data efficiently
  - support for XML query languages (e.g., XQL) on XML data
[Figure: database serving HTML and XML to the Web]

Generating and Manipulating XML
- Generate XML from the database server
  - map ODMG, SQL92, SQL3 and PL/SQL datatypes to XML
  - provide mappings between Java, SQL and XML types
- Script XML content from the database
  - allow SQL queries to return XML results
  - provide embedded XML in stored procedures
  - Java scripting: support embedded XML in Java
  - common APIs to access any XML content in databases
[Figure: database serving HTML and XML to the Web]
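"Allow SQL queries to return XML results" can be approximated outside the server by wrapping an ordinary result set. This Python sketch uses the standard sqlite3 and xml.etree.ElementTree modules; the query_to_xml helper and the movie table are hypothetical, with titles and budgets taken from the ODMG example slides. It is not a database vendor's built-in XML feature.

```python
import sqlite3
import xml.etree.ElementTree as ET

def query_to_xml(conn, sql, params=(), root_tag="result", row_tag="row"):
    """Run an SQL query and wrap each result row as an XML element."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]  # column names become tag names
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = "" if value is None else str(value)
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, budget INTEGER)")
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [("Waking Ned Divine", 100000), ("Dragonheart", 110000), ("Moondance", 90000)])
result = query_to_xml(conn, "SELECT title, budget FROM movie WHERE budget >= ? ORDER BY title",
                      (100000,), row_tag="movie")
print(ET.tostring(result, encoding="unicode"))
```

Column names must be legal XML names for this direct mapping to work; a fuller version would need to escape or rename them.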

[Figure: Database X and Database Y, XML, sorted XML, total]

Epilogue
- 1960s: data centric
- 1970s: process centric
- 1980s: object oriented
- 1990s: component based
- 2000s: XML?

1960s: Data
- record layouts, printer layouts, system flow charts, decision tables
- batch jobs were a series of small programs
- Data was our first focus

1970s: Logic
- GOTO-less programming, structured programming, top-down design
- programs became very large
- Then we focused on logic

1980s: Object Orientation
- common terms for analysis and design
- tightly coupled code
- code reuse was the holy grail, rarely achieved
- Object-oriented programming focused on runtime behavior

1990s: Components
- serialization tied to code
- code reuse
- IDE-based composition
- limited acceptance
- Component programming shifted the focus to interfaces

2000s: XML
- XML wrappers for incompatible systems
- industry-specific markup languages
- XML for persistent data and composition
- XML enables middleware for application-specific data
- XML returns the focus to data