What Are Real DTDs Like. Group Members: Xijie Zeng, Peiyu Cai. Presenter: Xijie Zeng.

Presentation transcript:

What Are Real DTDs Like. Group Members: Xijie Zeng, Peiyu Cai. Presenter: Xijie Zeng

Outline: Overview, Introduction, Local properties, Global properties

Overview: XML is widely used in a variety of areas. DTDs with different structures define XML documents for different uses. This is a survey based on a number of real-world DTDs.

Introduction: The DTDs are taken from the XML.org DTD repository. There are three DTD categories:
app: describes objects interchanged between programs/applications
data: describes data stored in databases
meta: describes the structure of document markup
Of the 60 DTDs, 7 are app, 13 are data, and 40 are meta.

Introduction (cont.): A DTD can be described as a collection of element declarations of the form e → α, where e is the element name and α is the content model. The content model is defined by:
α ::= ε | pcdata | e | (α, α) | (α | α) | α* | α+ | α?

Introduction (cont.): Example DTD
<!ATTLIST from name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST to name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST cc name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST attachment encoding (mime|binhex) "mime" file CDATA #REQUIRED>
(head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)

Introduction (cont.): Local properties describe content models in individual element declarations. Global properties describe the graph-theoretic structure of the whole DTD.

Local properties: Content model classification
(1) pcdata
(2) ε
(3) any: no restriction on subelements
(4) Mixed content
(5) “|” only, but not mixed content
(6) “,” only
(7) Complex content: contains both “|” and “,”, e.g. directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
(8) List: α* or α+
(9) Single: α?
Example declarations: body → (text, attachment*); text → (pcdata); body1 → (pcdata, attachment*)

Local properties (cont.) Content model classification

Local properties (cont.): Syntactic complexity
depth(ε) = 0;
depth(e) = depth(pcdata) = 1;
depth(α*) = depth(α+) = depth(α?) = depth(α) + 1;
depth(α1, α2, …, αn) = depth(α1 | α2 | … | αn) = max(depth(αi)) + 1;

Local properties (cont.): An example. For head → (from, to+, cc*, subject): depth(from, to+, cc*, subject) = depth(cc*) + 1 = depth(cc) + 1 + 1 = 1 + 1 + 1 = 3
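
The depth rules can be sketched in a few lines of Python. The class names (`Eps`, `PCData`, `Elem`, `Seq`, `Alt`, `Unary`) are an encoding assumed here for illustration, not something the slides define; the `depth` function follows the four cases of the definition.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Eps:            # the empty content model ε
    pass


@dataclass(frozen=True)
class PCData:         # pcdata
    pass


@dataclass(frozen=True)
class Elem:           # a single element name e
    name: str


@dataclass(frozen=True)
class Seq:            # α1, α2, ..., αn
    parts: Tuple["Model", ...]


@dataclass(frozen=True)
class Alt:            # α1 | α2 | ... | αn
    parts: Tuple["Model", ...]


@dataclass(frozen=True)
class Unary:          # α*, α+ or α?
    op: str
    arg: "Model"


Model = Union[Eps, PCData, Elem, Seq, Alt, Unary]


def depth(m: Model) -> int:
    """Syntactic depth of a content model, following the slide's definition."""
    if isinstance(m, Eps):
        return 0
    if isinstance(m, (PCData, Elem)):
        return 1
    if isinstance(m, Unary):
        return depth(m.arg) + 1
    # Seq and Alt: one more than the deepest component
    return max(depth(p) for p in m.parts) + 1


# head → (from, to+, cc*, subject)
head = Seq((Elem("from"), Unary("+", Elem("to")),
            Unary("*", Elem("cc")), Elem("subject")))
print(depth(head))  # 3
```

The result matches the worked example: the starred and plussed names have depth 2, and the enclosing sequence adds one more level.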

Local properties (cont.): Determinism. A content model is deterministic if it does NOT require lookahead when parsing. Non-deterministic content model: (a, b) | (a, c). Deterministic content model: a, (b | c). Result: the survey detects 5 non-deterministic content models in 4 DTDs.
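
One common way to test for determinism is to compute the first and follow position sets of the content model (the Glushkov construction) and check that no set contains two different positions with the same element name. The sketch below does that; it is not necessarily the procedure the survey used, and the nested-tuple encoding of content models is an assumption made here.

```python
from collections import defaultdict

# Assumed encoding of content models as nested tuples:
#   ("sym", name) | ("seq", a, b) | ("alt", a, b)
#   ("star", a)   | ("plus", a)   | ("opt", a)


def is_deterministic(model):
    symbols = []                # symbols[i] = element name at position i
    follow = defaultdict(set)   # follow[i] = positions that may directly follow i

    def walk(m):
        """Return (nullable, first, last) for the sub-model m."""
        kind = m[0]
        if kind == "sym":
            symbols.append(m[1])
            p = len(symbols) - 1
            return False, {p}, {p}
        if kind == "seq":
            n1, f1, l1 = walk(m[1])
            n2, f2, l2 = walk(m[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "alt":
            n1, f1, l1 = walk(m[1])
            n2, f2, l2 = walk(m[2])
            return n1 or n2, f1 | f2, l1 | l2
        # unary operators: star, plus, opt
        n, f, l = walk(m[1])
        if kind in ("star", "plus"):
            for p in l:
                follow[p] |= f      # the body may repeat
        return n or kind in ("star", "opt"), f, l

    def clash(positions):
        """True if two distinct positions carry the same element name."""
        names = [symbols[p] for p in positions]
        return len(names) != len(set(names))

    _, first, _ = walk(model)
    return not clash(first) and not any(clash(s) for s in follow.values())


a, b, c = ("sym", "a"), ("sym", "b"), ("sym", "c")
print(is_deterministic(("alt", ("seq", a, b), ("seq", a, c))))  # False
print(is_deterministic(("seq", a, ("alt", b, c))))              # True
```

In (a, b) | (a, c) both occurrences of a can start the match, so a parser seeing a cannot pick a branch without looking ahead; in a, (b | c) every choice is settled by the current element name.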

Local properties (cont.): Ambiguity. Definition: An expression R is ambiguous if and only if there exists some string s matched by R that can be parsed in more than one way. Example: partner → (name?, onetime?, partnrid?, partnrtype?, syncind?, name*, parentid?, partnridx?, partnrratg*). Here name occurs twice, so a single name element can be matched by either name? or name*. Result: the survey detects 2 ambiguous content models.
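
The definition can be made concrete by counting how many distinct ways a sequence of element names can be parsed against a content model. The brute-force counter below is only an illustrative sketch under an assumed tuple encoding (it supports names, sequences, alternatives, `?` and `*`, and lets each `*` iteration consume at least one name so that the count stays finite); it is not the survey's detection method.

```python
from functools import lru_cache

# Assumed encoding: ("sym", name) | ("seq", a1, ..., an) | ("alt", a1, ..., an)
#                   ("opt", a)    | ("star", a)


@lru_cache(maxsize=None)
def count_parses(m, s):
    """Number of distinct parses of the tuple of names s against model m."""
    kind = m[0]
    if kind == "sym":
        return 1 if s == (m[1],) else 0
    if kind == "alt":
        return sum(count_parses(p, s) for p in m[1:])
    if kind == "opt":
        # either skip the optional part (only possible on empty input) or use it
        return (1 if not s else 0) + count_parses(m[1], s)
    if kind == "star":
        if not s:
            return 1
        # the first iteration takes a non-empty prefix, the star handles the rest
        return sum(count_parses(m[1], s[:i]) * count_parses(m, s[i:])
                   for i in range(1, len(s) + 1))
    if kind == "seq":
        parts = m[1:]
        if not parts:
            return 1 if not s else 0
        rest = ("seq",) + parts[1:]
        return sum(count_parses(parts[0], s[:i]) * count_parses(rest, s[i:])
                   for i in range(len(s) + 1))
    raise ValueError(f"unknown node kind: {kind}")


def opt(name):
    return ("opt", ("sym", name))


def star(name):
    return ("star", ("sym", name))


partner = ("seq", opt("name"), opt("onetime"), opt("partnrid"),
           opt("partnrtype"), opt("syncind"), star("name"),
           opt("parentid"), opt("partnridx"), star("partnrratg"))

print(count_parses(partner, ("name",)))  # 2
```

A count above 1 for some input shows that the content model is ambiguous; here the single name can be consumed by name? or by name*.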

Global properties: Reachability. Definition: An element name e’ is reachable from e, denoted by e ⇒ e’, if either e → α and e’ occurs in α, or e ⇒ e” and e” ⇒ e’. An example: given (head, body) and head → (from, to+, cc*, subject), we have head ⇒ subject. Definition: An element name e is reachable if r ⇒ e, where r is the name of the root element. Otherwise, e is called unreachable or useless.
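
Reachability is a plain graph traversal over the "occurs in the content model of" relation. In the sketch below the DTD is reduced to a dictionary from element name to the set of names occurring in its content model. The names `root` and `draft` are hypothetical: the root element's name is not given in the transcript, and `draft` is added only to show an unreachable element.

```python
from collections import deque


def reachable_from_root(dtd, root):
    """Element names reachable from root (the root itself is included for convenience)."""
    seen = {root}
    queue = deque([root])
    while queue:
        e = queue.popleft()
        for child in dtd.get(e, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Element name -> names occurring in its content model.
dtd = {
    "root": {"head", "body"},          # hypothetical name for the root element
    "head": {"from", "to", "cc", "subject"},
    "from": set(),
    "to": set(),
    "cc": set(),
    "subject": set(),
    "body": {"text", "attachment"},
    "text": set(),
    "attachment": set(),
    "draft": {"text"},                 # hypothetical element, never referenced
}

unreachable = set(dtd) - reachable_from_root(dtd, "root")
print(unreachable)  # {'draft'}
```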

Global properties (cont.) Reachability Unreachable element names in DTDs

Global properties (cont.): Recursions. Definition: A content model α is derivable from an element name e, denoted by e ⇒ α, if either e → α, or e ⇒ α’, e’ → α”, and α = α’[e’/α”], where α’[e’/α”] denotes the content model obtained by substituting α” for all occurrences of e’ in α’. An example: substituting head → (from, to+, cc*, subject) into (head, body) derives (from, to+, cc*, subject, body). Definition: A DTD is recursive if and only if it has an element name e such that e ⇒ e and e is reachable.

Global properties (cont.): Recursions. Definition: A DTD is linear recursive if and only if it is recursive and, for any reachable element name e and any e ⇒ α, e occurs at most once in α and the occurrence is not enclosed in “*” or “+”. A DTD is said to be non-linear recursive if it is recursive but not linear recursive. An example of non-linear recursive: directory → (dirname, dirinfo?, dirdesc?, (file | directory)*). An example of linear recursive: e → (pcdata | e). Result: no linear recursive DTD is found in the sample DTDs. There are 7, 2 and 26 non-linear recursive DTDs in the app, data and meta categories respectively.
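
Whether a DTD is recursive can be checked by looking for a reachable element name that can reach itself. The sketch below covers only that test; it does not distinguish linear from non-linear recursion, which would also need the position of each occurrence inside the content model. The dictionary encoding is an assumption made for illustration.

```python
def descendants(dtd, start):
    """Element names reachable from start in one or more steps."""
    seen = set()
    stack = list(dtd.get(start, ()))
    while stack:
        e = stack.pop()
        if e not in seen:
            seen.add(e)
            stack.extend(dtd.get(e, ()))
    return seen


def is_recursive(dtd, root):
    """True if some reachable element name can reach itself."""
    candidates = descendants(dtd, root) | {root}
    return any(e in descendants(dtd, e) for e in candidates)


# directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
directory_dtd = {
    "directory": {"dirname", "dirinfo", "dirdesc", "file", "directory"},
    "dirname": set(),
    "dirinfo": set(),
    "dirdesc": set(),
    "file": set(),
}

# head → (from, to+, cc*, subject), a fragment with no cycle
head_dtd = {
    "head": {"from", "to", "cc", "subject"},
    "from": set(), "to": set(), "cc": set(), "subject": set(),
}

print(is_recursive(directory_dtd, "directory"))  # True
print(is_recursive(head_dtd, "head"))            # False
```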

Global properties (cont.): Chain of stars. An example: entity → (name*, contact*, location*, phone*, fax*); location → (city*, otherinfo?). There is a chain of 2 stars: location* inside entity, then city* inside location.
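
The slide gives only an example, so the sketch below assumes a working definition: a chain of stars is a path through the DTD in which every step goes from an element to a child that is starred in its content model, and the length of the chain is the number of such steps. Under that assumption, and assuming the starred edges contain no cycle, the longest chain can be computed recursively.

```python
from functools import lru_cache

# Element name -> children that appear under "*" in its content model.
# Unstarred children (such as otherinfo?) are left out on purpose.
STARRED_CHILDREN = {
    "entity": ("name", "contact", "location", "phone", "fax"),
    "location": ("city",),
}


@lru_cache(maxsize=None)
def longest_star_chain(element):
    """Length of the longest chain of starred steps starting at element.

    Assumes the starred edges form no cycle; a recursive DTD would need
    an explicit cycle check before calling this.
    """
    children = STARRED_CHILDREN.get(element, ())
    if not children:
        return 0
    return 1 + max(longest_star_chain(c) for c in children)


print(longest_star_chain("entity"))  # 2
```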

Global properties (cont.) Chain of stars

Global properties (cont.): Hubs. Definition: The fan-in of an element name e is the cardinality of the set {e’ | e’ → α and e occurs in α}. An element name with a large fan-in value is called a hub. An example:
(head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)
The fan-in value of the root element is 0, and the fan-in value of all other elements in this DTD is 1.
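
Fan-in follows directly from the definition: for each element name, count how many distinct declarations mention it in their content model. The sketch below uses the example DTD; `root` is a hypothetical name standing in for the root element, whose name does not appear in the transcript.

```python
from collections import Counter


def fan_in(dtd):
    """Map each element name to the number of declarations whose content model uses it."""
    counts = Counter({e: 0 for e in dtd})
    for parent, children in dtd.items():
        for child in set(children):   # each parent counts once per child
            counts[child] += 1
    return counts


dtd = {
    "root": ["head", "body"],          # hypothetical name for the root element
    "head": ["from", "to", "cc", "subject"],
    "from": [],
    "to": [],
    "cc": [],
    "subject": [],
    "body": ["text", "attachment"],
    "text": [],
    "attachment": [],
}

counts = fan_in(dtd)
print(counts["root"])     # 0
print(counts["subject"])  # 1
```

Sorting `counts` by value in descending order would surface the hubs in a larger DTD.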

Global properties (cont.): Result. Figures: fan-in of elements in data DTDs; fan-in of elements in meta DTDs.

Summary
Local properties: content model classification, syntactic complexity, determinism, ambiguity
Global properties: reachability, recursions, chain of stars, hubs
One drawback of this survey: it does not study any properties of attributes.