Fall 2001 CSE330 XML and Beyond: Parts I and II

Outline
– Background: documents (SGML/HTML) and databases (structured and semistructured data)
– XML basics and Document Type Descriptors
– XML APIs: Document Object Model (DOM), SAX (not covered in this course)
– XML query languages: XML-QL, XSL, Quilt

Part I: Background
What is the difference between the world of documents and information retrieval and the world of databases and query interfaces?

Documents vs Databases
Document world:
– plenty of small documents
– usually static
– implicit structure: section, paragraph, toc
– tagging
– human friendly
– content: form/layout, annotation
– paradigms: "Save as"
– meta-data: author name, date, subject
Database world:
– a few large databases
– usually dynamic
– explicit structure (schema)
– records
– machine friendly
– content: schema, data, methods
– paradigms: Atomicity, Consistency, Isolation, Durability
– meta-data: schema description

What to do with them
Documents:
– editing
– printing
– spell-checking
– counting words
– retrieving (IR)
– searching
Database:
– updating
– cleaning
– querying
– composing/transforming

HTML
– Lingua franca for publishing hypertext on the World Wide Web.
– Designed to describe how a Web browser should arrange text, images and push-buttons on a page.
– Easy to learn, but does not convey structure.
– Fixed tag set.
[Figure: annotated HTML fragment containing the text "Welcome to the XML course" and "Introduction"; labels: opening tag, text (PCDATA), closing tag, "bachelor" tag, attribute name, attribute value]

Thin red line
The line between the document world and the database world is not clear. In some cases, both approaches are legitimate. An interesting middle ground is data formats, of which XML is an example.
Examples:
– Personal address book

Personal address book over 20 years
1977
N Achison, Malcolm
F Dr. M.P. Achison
A Dept. of Computer Science
A University of Edinburgh
A Kings Buildings
A Edinburgh E12 8QQ
A Scotland
T ext (work)
T (home)
N Albani, Paolo
F Prof. Paolo Albani
A Dip. Informatica e Sistemistica
A Universita di Roma La Sapienza
N Achison, Malcolm
F Dr. M.P. Achison
A Dept. of Computer Science ....
T (home)
C
1990
N Achison, Malcolm
F Prof. M.P. Achison
A Dept. of Computing Science
A University of Glasgow
A Lilybank Gardens
A Glasgow G12 8QQ
A Scotland
T ext
T (private)
T (home)
X
C
N Achison, Malcolm
F Prof. M.P. Achison
A 34 Inverness Place
A Edinburgh, EH3 8UV
1997
N Achison, Malcolm
F Prof. M.P. Achison
A Department of Computing Science ...
T (home)
X
C
W
?

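The Structure of XML
– XML consists of tags and text.
– Tags come in pairs: an opening tag and a matching closing tag.
– They must be properly nested: an element opened inside another must be closed before the outer one closes (good); closing the outer element first is an error (bad).
– (Improper nesting is not allowed in HTML either.)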
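
The nesting rule can be checked mechanically with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `date` and `day` tag names and the helper `is_well_formed` are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical examples: inner element closed before the outer one (good),
# and outer element closed while the inner one is still open (bad).
good = "<date><day>25</day><month>8</month></date>"
bad = "<date><day>25</date></day>"

print(is_well_formed(good), is_well_formed(bad))
```

The parser rejects the second string at the first mismatched closing tag.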

XML text
– XML has only one "basic" type: text.
– It is bounded by tags, e.g. The Big Sleep placed between an opening and a closing tag; the content is still text.
– XML text is called PCDATA (for parsed character data).
– XML text is Unicode; a character can also be written as a numeric character reference, e.g. &#x05DE; for the Hebrew letter Mem.
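
As a small illustration of character references, the sketch below parses a reference to code point U+05DE, which Unicode names HEBREW LETTER MEM. The `letter` tag is a hypothetical name chosen for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The parser expands the numeric character reference into the character itself.
elem = ET.fromstring("<letter>&#x05DE;</letter>")
char = elem.text
name = unicodedata.name(char)

print(repr(char), name)
```

After parsing, the element content is ordinary text; the reference syntax is only a way of writing the character in the source file.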

XML structure
Nesting tags can be used to express various structures. E.g. a tuple (record):
<person> <name>Malcolm Atchison</name> <tel>(215)</tel> </person>

XML structure
We can represent a list by using the same tag repeatedly: ...

Terminology
The segment of an XML document between an opening and a corresponding closing tag is called an element.
[Figure: the record for Malcolm Atchison, (215), with three spans marked: an element; a span that is not an element; an element that is a sub-element of another]

XML is tree-like
[Figure: tree with root person and children name (Malcolm Atchison) and tel ((215))]
Semistructured data models typically put the labels on the edges.
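
The record and tree views can be made concrete with a short sketch. It assumes the person/name/tel element names shown in the tree above and keeps the telephone value as truncated in the transcript; the helper `outline` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

doc = "<person><name>Malcolm Atchison</name><tel>(215)</tel></person>"
person = ET.fromstring(doc)

# Record view: each child element acts as a named field of the tuple.
record = {child.tag: child.text for child in person}


def outline(elem, depth=0):
    """Tree view: one indented line per element, with its text if any."""
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag + (": " + text if text else "")
    lines = [line]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


print(record)
print("\n".join(outline(person)))
```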

Mixed Content
An element may contain a mixture of sub-elements and PCDATA.
Example: British Airways, World's favorite airline
Data of this form is not typically generated from databases. It is needed for consistency with HTML.
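
A brief sketch of how a parser exposes mixed content, using `xml.etree.ElementTree`. The tag names `motto` and `em` are assumptions made for this example, since the markup of the slide's example did not survive in the transcript.

```python
import xml.etree.ElementTree as ET

# Text and a sub-element are interleaved inside one parent element.
motto = ET.fromstring("<motto>World's <em>favorite</em> airline</motto>")

child = motto[0]
before = motto.text        # text before the first sub-element
inside = child.text        # text inside the sub-element
after = child.tail         # text following the sub-element
flattened = "".join(motto.itertext())

print([before, child.tag, inside, after])
print(flattened)
```

ElementTree stores the text that follows a sub-element in that sub-element's `tail`, which is why mixed content needs more care than record-like data.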

A Complete XML Document
Malcolm Atchison (215)

Representing relational DBs: Two ways
projects: title, budget, managedBy
employees: name, ssn, age

Project and Employee relations in XML
Projects and employees are intermixed:
Pattern recognition Joe Joe Sandra Auto guided vehicle Sandra :

Project and Employee relations in XML (cont'd)
Employees follow projects:
Pattern recognition Joe Auto guided vehicles Sandra :
Joe Sandra :

Project and Employee relations in XML (cont'd)
Or without "separator" tags:
Pattern recognition Joe Auto guided vehicles Sandra :
Joe Sandra :
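
The layouts with and without "separator" tags can be sketched by generating both from the same rows. This is an illustration under assumptions: the root tag `db`, the tag names `project`, `employee`, `projects` and `employees`, and the helper names are chosen for the example, and only the titles and names that survive in the transcript are used (budget, ssn and age values are omitted).

```python
import xml.etree.ElementTree as ET

projects = [("Pattern recognition", "Joe"), ("Auto guided vehicle", "Sandra")]
employees = ["Joe", "Sandra"]


def add_row(parent, tag, **fields):
    """Append one tuple as an element whose children are its fields."""
    row = ET.SubElement(parent, tag)
    for key, value in fields.items():
        ET.SubElement(row, key).text = value
    return row


def with_separators():
    """Employees follow projects, each relation wrapped in its own tag."""
    db = ET.Element("db")
    ps = ET.SubElement(db, "projects")
    for title, manager in projects:
        add_row(ps, "project", title=title, managedBy=manager)
    es = ET.SubElement(db, "employees")
    for name in employees:
        add_row(es, "employee", name=name)
    return db


def without_separators():
    """Same rows placed directly under the root, with no wrapper tags."""
    db = ET.Element("db")
    for title, manager in projects:
        add_row(db, "project", title=title, managedBy=manager)
    for name in employees:
        add_row(db, "employee", name=name)
    return db


print(ET.tostring(with_separators(), encoding="unicode"))
print(ET.tostring(without_separators(), encoding="unicode"))
```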

Attributes
An (opening) tag may contain attributes. These are typically used to describe the content of an element.
Example: cheese, fromage, branza; A food made …

Attributes (cont'd)
Another common use for attributes is to express dimension or type.
A document that obeys the "nested tags" rule and does not repeat an attribute within a tag is said to be well-formed.
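
A short sketch of both points: reading an attribute that expresses a dimension, and the parser rejecting a repeated attribute. The `height` element and `dimension` attribute follow the ATTLIST example later in the deck; the value 2400, the units and the helper `parses` are made up for illustration.

```python
import xml.etree.ElementTree as ET

height = ET.fromstring('<height dimension="cm">2400</height>')
unit = height.get("dimension")


def parses(text: str) -> bool:
    """Return True if the text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute within one tag violates well-formedness.
repeated = '<height dimension="cm" dimension="in">2400</height>'

print(unit, height.text, parses(repeated))
```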

When to use attributes
It's not always clear when to use attributes:
F. MacNiel
OR
F. MacNiel

Using IDs
Jane Doe
John Doe
Mary Doe
Jack Doe

An object-oriented schema
class Movie ( extent Movies, key title ) {
    attribute string title;
    attribute string director;
    relationship set<Actor> casts inverse Actor::acted_In;
    attribute int budget;
};
class Actor ( extent Actors, key name ) {
    attribute string name;
    relationship set<Movie> acted_In inverse Movie::casts;
    attribute int age;
    attribute set<Movie> directed;
};

An example
Waking Ned Divine, Kirk Jones III, 100,000
Dragonheart, Rob Cohen, 110,000
Moondance, Dagmar Hirtz, 90,000
:
David Kelly
Sean Connery, 68
Ian Bannen
:

Part II: Document Type Descriptors
Imposing structure on XML documents

Document Type Descriptors
– Document Type Descriptors (DTDs) impose structure on an XML document.
– A DTD is related to a database schema, but only loosely: additional "typing" systems are still needed.
– The DTD is a syntactic specification.

Example: An Address Book
MacNiel, John
Dr. John MacNiel
1234 Huron Street
Rome, OH
(321)
Annotations on the entry:
– Exactly one name
– At most one greeting
– As many address lines as needed (in order)
– Mixed telephones and faxes
– As many as needed

Specifying the structure
name            to specify a name element
greet?          to specify an optional (0 or 1) greet element
name,greet?     to specify a name followed by an optional greet

Specifying the structure (cont)
addr*           to specify 0 or more address lines
tel | fax       a tel or a fax element
(tel | fax)*    0 or more repeats of tel or fax
*               0 or more elements

Specifying the structure (cont)
So the whole structure of a person entry is specified by
name, greet?, addr*, (tel | fax)*, *
This is known as a regular expression. Why is it important?

Regular Expressions
Each regular expression determines a corresponding finite state automaton. Let's start with a simpler example:
name, addr*,
This suggests a simple parsing program.
[Figure: automaton with transitions labelled name and addr]
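
The "simple parsing program" suggested by the automaton can be sketched as a transition table. The third element name is missing from the transcript, so `email` is used here purely as a stand-in; the state names and the function `accepts` are likewise hypothetical.

```python
# Transition table for the content model: name, addr*, email
# ("email" is an assumed stand-in for the element name lost in the transcript).
TRANSITIONS = {
    ("start", "name"): "after_name",
    ("after_name", "addr"): "after_name",   # loop for addr*
    ("after_name", "email"): "done",
}
ACCEPTING = {"done"}


def accepts(children):
    """Run the automaton over the sequence of child element names."""
    state = "start"
    for tag in children:
        state = TRANSITIONS.get((state, tag))
        if state is None:          # no transition: the sequence is rejected
            return False
    return state in ACCEPTING


print(accepts(["name", "addr", "addr", "email"]))
print(accepts(["addr", "name", "email"]))
```

Because the automaton reads each child once and keeps only its current state, checking a content model takes time linear in the number of children.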

Another example
name, address*, (tel | fax)*, *
[Figure: automaton with transitions labelled name, address, tel and fax]
Adding in the optional greet further complicates things.

A DTD for the address book
<!DOCTYPE addressbook [
<!ELEMENT person (name, greet?, address*, (fax | tel)*, *)>
]>

Two DTDs for the relational DB
<!DOCTYPE db [... ]>
<!DOCTYPE db [... ]>

Recursive DTDs
<!DOCTYPE genealogy [
<!ELEMENT person ( name, dateOfBirth,
                   person,    -- mother
                   person )>  -- father
...
]>
What is the problem with this?

Recursive DTDs (cont'd)
<!DOCTYPE genealogy [
<!ELEMENT person ( name, dateOfBirth,
                   person?,    -- mother
                   person? )>  -- father
...
]>
What is now the problem with this?

Some things are hard to specify
Each employee element is to contain name, age and ssn elements in some order.
<!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) |... )>
Suppose there were many more fields!
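
To see how quickly the "any order" content model grows, the sketch below enumerates every ordering of the fields. The function name `any_order_model` is an assumption made for the example.

```python
from itertools import permutations
from math import factorial


def any_order_model(fields):
    """Build a content model listing every ordering of the given fields."""
    alternatives = ["(" + ", ".join(p) + ")" for p in permutations(fields)]
    return "(" + " | ".join(alternatives) + ")"


model = any_order_model(["name", "age", "ssn"])
print(model)

# n fields need n! alternatives: 3 fields give 6, 6 fields already give 720.
print(factorial(3), factorial(6))
```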

Summary of XML regular expressions
A          The tag A occurs
e1,e2      The expression e1 followed by e2
e*         0 or more occurrences of e
e?         Optional: 0 or 1 occurrences
e+         1 or more occurrences
e1 | e2    Either e1 or e2
(e)        Grouping
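
Since these operators coincide with ordinary regular-expression operators, a content model can be translated almost literally into a Python `re` pattern. This is a rough sketch, not a DTD validator: the helper names `model_to_regex` and `conforms` are hypothetical, and it handles only element names and the operators summarised above.

```python
import re


def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as 'name, greet?, addr*' to a regex.

    Each element name becomes a group matching 'name;', so that the
    sequence of children can be matched as one delimited string.
    """
    body = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:" + re.escape(m.group()) + ";)",
        model,
    )
    # Sequencing is plain concatenation; ?, *, +, | and () carry over as-is.
    body = body.replace(",", "").replace(" ", "")
    return re.compile(body)


def conforms(model: str, children) -> bool:
    """Check a list of child element names against a content model."""
    encoded = "".join(tag + ";" for tag in children)
    return model_to_regex(model).fullmatch(encoded) is not None


person_model = "name, greet?, addr*, (tel | fax)*"
print(conforms(person_model, ["name", "addr", "addr", "fax", "tel"]))
print(conforms(person_model, ["greet", "name"]))
```

The `;` delimiter keeps one element name from matching as a prefix of another.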

It's easy to get confused…
Cited from oagis_segments.dtd (one of the files in the Novell Developer Kit
Ben Franklin
Q. Which NAME is it?

Specifying attributes in the DTD
<!ATTLIST height
    dimension CDATA #REQUIRED
    accuracy  CDATA #IMPLIED >
The dimension attribute is required; the accuracy attribute is optional.
CDATA is the "type" of the attribute; it means string.
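
What the ATTLIST requires of a `height` element can be sketched as a small check. The declaration is transcribed by hand into a dictionary; the function `attribute_errors`, the error wording and the sample values are assumptions for illustration, and a real validating parser would do this from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Hand-transcribed from the ATTLIST: attribute name -> default declaration.
HEIGHT_ATTRS = {"dimension": "#REQUIRED", "accuracy": "#IMPLIED"}


def attribute_errors(elem, declared):
    """List violations of the declared attributes for one element."""
    errors = [
        f"missing required attribute {name!r}"
        for name, default in declared.items()
        if default == "#REQUIRED" and name not in elem.attrib
    ]
    errors += [
        f"undeclared attribute {name!r}"
        for name in elem.attrib
        if name not in declared
    ]
    return errors


ok = ET.fromstring('<height dimension="cm">2400</height>')
missing = ET.fromstring('<height accuracy="1">2400</height>')

print(attribute_errors(ok, HEIGHT_ATTRS))
print(attribute_errors(missing, HEIGHT_ATTRS))
```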

Specifying ID and IDREF attributes
<!DOCTYPE family [
<!ATTLIST person
    id       ID     #REQUIRED
    mother   IDREF  #IMPLIED
    father   IDREF  #IMPLIED
    children IDREFS #IMPLIED>
]>

Some conforming data
Jane Doe
John Doe
Mary Doe
Jack Doe

Consistency of ID and IDREF attribute values
– If an attribute is declared as ID, the associated values must all be distinct (no confusion).
– If an attribute is declared as IDREF, the associated value must exist as the value of some ID attribute (no dangling "pointers").
– Similarly for all the values of an IDREFS attribute.
– ID and IDREF attributes are not typed.
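
These two consistency rules can be sketched as a check over a parsed document. The attribute names follow the family ATTLIST (id, mother, father, children); the identifier values, the sample documents and the function `id_errors` are invented for the example.

```python
import xml.etree.ElementTree as ET

ID_ATTR = "id"
IDREF_ATTRS = ("mother", "father")
IDREFS_ATTRS = ("children",)      # whitespace-separated list of references


def id_errors(root):
    """Report duplicate IDs and references to IDs that do not exist."""
    seen, errors = set(), []
    for elem in root.iter():
        value = elem.get(ID_ATTR)
        if value is None:
            continue
        if value in seen:
            errors.append(f"duplicate ID {value!r}")
        seen.add(value)
    for elem in root.iter():
        refs = [elem.get(a) for a in IDREF_ATTRS if elem.get(a) is not None]
        for attr in IDREFS_ATTRS:
            refs.extend(elem.get(attr, "").split())
        errors.extend(f"dangling reference {r!r}" for r in refs if r not in seen)
    return errors


good = ET.fromstring(
    '<family>'
    '<person id="jane" children="mary jack"><name>Jane Doe</name></person>'
    '<person id="john" children="mary jack"><name>John Doe</name></person>'
    '<person id="mary" mother="jane" father="john"><name>Mary Doe</name></person>'
    '<person id="jack" mother="jane" father="john"><name>Jack Doe</name></person>'
    '</family>'
)
bad = ET.fromstring('<family><person id="jane"/><person id="jane" mother="nobody"/></family>')

print(id_errors(good))
print(id_errors(bad))
```

Note that the check cannot tell whether `mother` points at a person or at some other element with an ID, which is what "not typed" means here.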

An alternative specification
<!DOCTYPE family [ ]>

The revised data
Jane Doe
John Doe
...

A useful abbreviation
When an element has empty content we can write a single empty-element tag of the form <tag/> in place of the pair <tag></tag>.
For example: Jane Doe ...
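
A quick sketch showing that a parser treats the two spellings identically. The `mother` element and its `idref` attribute are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring('<mother idref="jane"></mother>')
short_form = ET.fromstring('<mother idref="jane"/>')

# Both parse to an element with the same tag and attributes and no content.
same = (
    long_form.tag == short_form.tag
    and long_form.attrib == short_form.attrib
    and long_form.text is None
    and short_form.text is None
    and len(long_form) == len(short_form) == 0
)
print(same)
```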

Back to the object-oriented schema
class Movie ( extent Movies, key title ) {
    attribute string title;
    attribute string director;
    relationship set<Actor> casts inverse Actor::acted_In;
    attribute int budget;
};
class Actor ( extent Actors, key name ) {
    attribute string name;
    relationship set<Movie> acted_In inverse Movie::casts;
    attribute int age;
    attribute set<Movie> directed;
};

Schema.dtd
<!DOCTYPE db [

Schema.dtd (cont'd)
]>

More on ODL and DTD
Last year (May 2000), the Object Data Management Group (ODMG) proposed OIFML, an XML document type for the Object Interchange Format.

Constraints on IDs and IDREFs
– ID stands for identifier. No two ID attributes in the same document may have the same value (values are XML names).
– IDREF stands for identifier reference. Every value associated with an IDREF attribute must exist as an ID attribute value.
– IDREFS specifies several identifiers.

Connecting the document with its DTD
In line: … ]>...
Another file:
A URL: <!DOCTYPE db SYSTEM "
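
The difference between an in-line DTD and one referenced by a system identifier is visible through the document's doctype node. The sketch below uses the standard `xml.dom.minidom`; the file name `schema.dtd` and the one-line internal subset are assumptions for the example, and minidom only records the identifier without fetching or validating against the external DTD.

```python
from xml.dom import minidom

# DTD kept in another file (or at a URL), named by a system identifier.
external = '<!DOCTYPE db SYSTEM "schema.dtd"><db/>'
# DTD declared in line, inside the document itself.
inline = '<!DOCTYPE db [<!ELEMENT db EMPTY>]><db/>'

ext_doctype = minidom.parseString(external).doctype
inl_doctype = minidom.parseString(inline).doctype

print(ext_doctype.name, ext_doctype.systemId)
print(inl_doctype.name, inl_doctype.systemId)
```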

Well-formed and Valid Documents
– Well-formed applies to any document (with or without a DTD): proper nesting of tags and unique attributes.
– Valid specifies that the document conforms to the DTD: it conforms to the regular expression grammar, the types of attributes are correct, and the constraints on references are satisfied.

DTDs vs. Schemas (or Types)
By database (or programming language) standards DTDs are rather weak specifications.
– Only one base type: PCDATA
– No useful "abstractions", e.g. sets
– IDREFs are untyped. You point to something, but you don't know what!
– No constraints, e.g. child is inverse of parent
– No methods
– Tag definitions are global
Some of the XML extensions impose something like a schema or type on an XML document. We'll see these later.

Lots of possibilities for schemas
– XML Schema (under W3C's spotlight)
– XDR (Microsoft's BizTalk)
– SOX (Schema for Object-Oriented XML)
– Schematron
– DSD (AT&T Labs and BRICS)
– and more

Some tools
– XML Authority _authority/index.htm
– XML Spy

Summary
– XML is a new data format. Its main virtues are widespread acceptance and the (important) ability to handle semistructured data (data without schema).
– DTDs provide some useful syntactic constraints on documents. As schemas they are weak.
Next slides: XML programming, XML querying.