What Are Real DTDs Like. Group members: Xijie Zeng, Peiyu Cai. Presenter: Xijie Zeng.

1 What Are Real DTDs Like. Group members: Xijie Zeng, Peiyu Cai. Presenter: Xijie Zeng

2 Outline: Overview, Introduction, Local properties, Global properties

3 Overview: XML is widely used in a variety of areas. DTDs with different structures define XML documents for different uses. This talk presents a survey based on a number of real-world DTDs.

4 Introduction: The DTDs come from the XML.org DTD repository. There are three DTD categories: app (describes objects interchanged between programs/applications), data (describes data stored in databases), and meta (describes the structure of document markup). Of the 60 DTDs, 7 are app, 13 are data, and 40 are meta.

5 Introduction (cont.) A DTD can be described as a collection of element declarations of the form e → α, where e is the element name and α is the content model. The content model is defined by α ::= ε | pcdata | e | (α, α) | (α | α) | α* | α+ | α?

6 Introduction (cont.) Email DTD
<!ATTLIST from name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST to name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST cc name CDATA #IMPLIED address CDATA #REQUIRED>
<!ATTLIST attachment encoding (mime|binhex) "mime" file CDATA #REQUIRED>
email → (head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)

7 Introduction (cont.) Local properties describe content models in individual element declarations. Global properties describe the graph-theoretic structure of the whole DTD.

8 Local properties Content model classification
(1) pcdata
(2) ε
(3) any: no restriction on subelements
(4) Mixed content. Example: body → (text, attachment*), text → (pcdata)
(5) "|" only, but not mixed content
(6) "," only
(7) Complex content: contains both "|" and ",". Example: directory → (dirname, dirinfo?, dirdesc?, (file | directory)*)
(8) List: α* or α+
(9) Single: α?
body1 → (pcdata, attachment*)

9 Local properties (cont.) Content model classification

10 Local properties (cont.) Syntactic complexity
depth(ε) = 0
depth(e) = depth(pcdata) = 1
depth(α*) = depth(α+) = depth(α?) = depth(α) + 1
depth(α₁, α₂, …, αₙ) = depth(α₁ | α₂ | … | αₙ) = max(depth(αᵢ)) + 1

11 Local properties (cont.) An example: head → (from, to+, cc*, subject)
depth(from, to+, cc*, subject) = max(depth(from), depth(to+), depth(cc*), depth(subject)) + 1 = depth(cc*) + 1 = depth(cc) + 1 + 1 = 1 + 1 + 1 = 3
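The depth rules can be checked mechanically. The following is a minimal sketch, assuming a nested-tuple encoding of content models that is not part of the slides: plain strings for ε, pcdata and element names, and `(operator, argument)` tuples for everything else.

```python
# Hypothetical encoding of content models (an assumption, not from the slides):
#   "ε", "pcdata", or an element name       -> plain string
#   ("*", α), ("+", α), ("?", α)            -> unary operators
#   (",", [α1, ..., αn]), ("|", [α1, ...])  -> sequence and choice

def depth(model):
    """Return the syntactic depth of a content model, following the slide's rules."""
    if isinstance(model, str):
        # depth(ε) = 0; depth(e) = depth(pcdata) = 1
        return 0 if model == "ε" else 1
    op, arg = model
    if op in ("*", "+", "?"):
        # One more level than the operand.
        return depth(arg) + 1
    if op in (",", "|"):
        # One more level than the deepest component.
        return max(depth(part) for part in arg) + 1
    raise ValueError(f"unknown operator: {op!r}")


# head -> (from, to+, cc*, subject), the example from the slide
HEAD = (",", ["from", ("+", "to"), ("*", "cc"), "subject"])
print(depth(HEAD))  # 3
```

Under the same encoding, the directory content model (dirname, dirinfo?, dirdesc?, (file | directory)*) comes out at depth 4, because the choice is nested inside a star inside the sequence.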

12 Local properties (cont.) Determinism: A content model is deterministic if parsing it does not require lookahead. Non-deterministic content model: (a, b) | (a, c). Deterministic content model: a, (b | c). Result: the survey detects 5 non-deterministic content models in 4 DTDs.
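The slides do not say how non-determinism was detected. One standard way to test it is the Glushkov position construction: a content model is deterministic exactly when no two positions that compete for the next input symbol carry the same element name. The sketch below assumes a hypothetical nested-tuple encoding of content models (strings for names and ε, `(operator, argument)` tuples otherwise); `is_deterministic` is an illustrative name.

```python
def is_deterministic(model):
    """Glushkov-style test on a content model given as nested tuples.

    Encoding (an assumption): "ε" or an element name as a string;
    ("*", α), ("+", α), ("?", α); (",", [α1, ...]); ("|", [α1, ...]).
    """
    symbols = []  # symbols[i] is the element name at position i
    follow = {}   # follow[i] = positions that may come right after position i

    def walk(m):
        """Return (nullable, first, last) for sub-model m and fill in follow."""
        if m == "ε":
            return True, set(), set()
        if isinstance(m, str):
            pos = len(symbols)
            symbols.append(m)
            follow[pos] = set()
            return False, {pos}, {pos}
        op, arg = m
        if op in ("*", "+", "?"):
            nullable, first, last = walk(arg)
            if op in ("*", "+"):
                # Repetition: the end of the operand can loop back to its start.
                for p in last:
                    follow[p] |= first
            return nullable or op != "+", first, last
        parts = [walk(part) for part in arg]
        if op == "|":
            return (any(n for n, _, _ in parts),
                    set().union(*(f for _, f, _ in parts)),
                    set().union(*(l for _, _, l in parts)))
        # op == ",": fold the parts left to right.
        nullable, first, last = True, set(), set()
        for n, f, l in parts:
            for p in last:
                follow[p] |= f
            if nullable:
                first = first | f
            last = (last | l) if n else set(l)
            nullable = nullable and n
        return nullable, first, last

    _, first, _ = walk(model)

    def clash(positions):
        names = [symbols[p] for p in positions]
        return len(names) != len(set(names))

    return not clash(first) and not any(clash(f) for f in follow.values())


NON_DET = ("|", [(",", ["a", "b"]), (",", ["a", "c"])])  # (a, b) | (a, c)
DET = (",", ["a", ("|", ["b", "c"])])                    # a, (b | c)
```

With these definitions, `is_deterministic(NON_DET)` is false because both alternatives start with `a`, while `is_deterministic(DET)` is true.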

13 Local properties (cont.) Ambiguity. Definition: An expression R is ambiguous if and only if some string s matched by R can be parsed in more than one way. Example: partner → (name?, onetime?, partnrid?, partnrtype?, syncind?, name*, parentid?, partnridx?, partnrratg*). Here a lone name element can be matched by either name? or name*. Result: the survey detects 2 ambiguous content models.

14 Global properties Reachability. Definition: An element name e′ is reachable from e, denoted by e ⇒ e′, if either e → α and e′ occurs in α, or e ⇒ e″ and e″ ⇒ e′. Definition: An element name e is reachable if r ⇒ e, where r is the name of the root element. Otherwise element name e is called unreachable or useless.
An example: email → (head, body) and head → (from, to+, cc*, subject) give email ⇒ head, email ⇒ subject, and head ⇒ subject.
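Reachability is a plain graph search over the "occurs in the content model of" relation. The sketch below is one way to compute it, assuming the DTD is held as a dictionary from element name to content model string; the helper names `children`, `reachable_from` and `unreachable` are illustrative, and the root itself is treated as used.

```python
import re
from collections import deque

# The email DTD from the slides, as element name -> content model string.
EMAIL_DTD = {
    "email": "(head, body)",
    "head": "(from, to+, cc*, subject)",
    "from": "ε", "to": "ε", "cc": "ε",
    "subject": "(pcdata)",
    "body": "(text, attachment*)",
    "text": "(pcdata)",
    "attachment": "ε",
}


def children(model):
    """Element names occurring in a content model (pcdata is not an element)."""
    return {n for n in re.findall(r"[A-Za-z_][\w.-]*", model) if n != "pcdata"}


def reachable_from(dtd, start):
    """All element names reachable from start, found by breadth-first search."""
    seen, queue = set(), deque([start])
    while queue:
        for child in children(dtd.get(queue.popleft(), "")):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def unreachable(dtd, root):
    """Declared element names that cannot be reached from the root.

    Assumption: the root itself counts as used even if nothing refers to it.
    """
    return set(dtd) - reachable_from(dtd, root) - {root}
```

For the email DTD, `reachable_from(EMAIL_DTD, "head")` contains `subject`, and `unreachable(EMAIL_DTD, "email")` is empty. Adding a declaration that no content model mentions would make that name show up as unreachable.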

15 Global properties (cont.) Reachability Unreachable element names in DTDs

16 Global properties (cont.) Recursions. Definition: A content model α is derivable from an element name e, denoted by e ⇒ α, if either e → α, or e ⇒ α′, e′ → α″, and α = α′[e′/α″], where α′[e′/α″] denotes the content model obtained by substituting α″ for all occurrences of e′ in α′. An example: email → (head, body) and head → (from, to+, cc*, subject) give email ⇒ (from, to+, cc*, subject, body). Definition: A DTD is recursive if and only if it has an element name e such that e ⇒ e and e is reachable.

17 Global properties (cont.) Recursions. Definition: A DTD is linear recursive if and only if it is recursive and, for any reachable element name e and any α with e ⇒ α, e occurs at most once in α and the occurrence is not enclosed in "*" or "+". A DTD is said to be non-linear recursive if it is recursive but not linear recursive. An example of non-linear recursion: directory → (dirname, dirinfo?, dirdesc?, (file | directory)*). An example of linear recursion: e → (pcdata | e). Result: no linear recursive DTD is found in the sample DTDs. There are 7, 2 and 26 non-linear recursive DTDs in the app, data and meta categories respectively.
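Whether a DTD is recursive at all can be tested with the same kind of graph search: look for a reachable element name that can reach itself. The sketch below covers only that test, not the linear versus non-linear distinction. It assumes content models are given as strings, and the helper names are illustrative.

```python
import re


def children(model):
    """Element names occurring in a content model (pcdata is not an element)."""
    return {n for n in re.findall(r"[A-Za-z_][\w.-]*", model) if n != "pcdata"}


def reachable_from(dtd, start):
    """All element names reachable from start via one or more declarations."""
    seen, stack = set(), [start]
    while stack:
        for child in children(dtd.get(stack.pop(), "")):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_recursive(dtd, root):
    """True if some element name reachable from the root can reach itself."""
    return any(e in reachable_from(dtd, e) for e in reachable_from(dtd, root))


# Only the declarations shown on the slides; undeclared names are leaves here.
DIRECTORY_DTD = {"directory": "(dirname, dirinfo?, dirdesc?, (file | directory)*)"}
EMAIL_DTD = {"email": "(head, body)", "head": "(from, to+, cc*, subject)"}
```

Here `is_recursive(DIRECTORY_DTD, "directory")` holds because directory occurs in its own content model, while `is_recursive(EMAIL_DTD, "email")` does not.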

18 Global properties (cont.) Chain of stars. An example: entity → (name*, contact*, location*, phone*, fax*), location → (city*, otherinfo?). There is a chain of 2 stars: location* in entity, followed by city* in location.
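The slide gives only an example, not a definition, so the following sketch rests on an assumed reading: a chain of stars is a path of declarations in which each step enters a child that is directly marked with "*". Starred groups such as (a | b)* are ignored by this simplified version, recursion is cut off when an element repeats on the current path, and `longest_star_chain` is an illustrative name.

```python
import re

# The two declarations from the slide; undeclared names are treated as leaves.
ENTITY_DTD = {
    "entity": "(name*, contact*, location*, phone*, fax*)",
    "location": "(city*, otherinfo?)",
}


def starred_children(model):
    """Element names written directly as name* in a content model string."""
    return set(re.findall(r"([A-Za-z_][\w.-]*)\*", model))


def longest_star_chain(dtd, element, on_path=frozenset()):
    """Length of the longest chain of starred children starting at element."""
    if element in on_path:
        return 0  # stop on recursion instead of looping forever
    best = 0
    for child in starred_children(dtd.get(element, "")):
        best = max(best, 1 + longest_star_chain(dtd, child, on_path | {element}))
    return best
```

Under this reading, `longest_star_chain(ENTITY_DTD, "entity")` is 2, matching the slide's example.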

19 Global properties (cont.) Chain of stars

20 Global properties (cont.) Hubs. Definition: The fan-in of an element name e is the cardinality of the set {e′ | e′ → α and e occurs in α}. An element name with a large fan-in value is called a hub. An example:
email → (head, body)
head → (from, to+, cc*, subject)
from → (ε)
to → (ε)
cc → (ε)
subject → (pcdata)
body → (text, attachment*)
text → (pcdata)
attachment → (ε)
The fan-in value of the email element is 0, and the fan-in value of every other element in this DTD is 1.
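Fan-in can be computed by counting, for each element name, how many declarations mention it. The sketch below assumes the DTD is held as a dictionary from element name to content model string; `children` and `fan_in` are illustrative helper names.

```python
import re

# The email DTD from the slides, as element name -> content model string.
EMAIL_DTD = {
    "email": "(head, body)",
    "head": "(from, to+, cc*, subject)",
    "from": "ε", "to": "ε", "cc": "ε",
    "subject": "(pcdata)",
    "body": "(text, attachment*)",
    "text": "(pcdata)",
    "attachment": "ε",
}


def children(model):
    """Element names occurring in a content model (pcdata is not an element)."""
    return {n for n in re.findall(r"[A-Za-z_][\w.-]*", model) if n != "pcdata"}


def fan_in(dtd):
    """Map each element name to the number of declarations that mention it."""
    counts = {name: 0 for name in dtd}
    for model in dtd.values():
        # children() returns a set, so each declaration counts once per name.
        for child in children(model):
            counts[child] = counts.get(child, 0) + 1
    return counts
```

For the email DTD this gives 0 for email and 1 for every other element, as stated on the slide. A hub would be any name whose count is large relative to the rest; the slides do not give a threshold.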

21 Global properties (cont.) Result: [Figures: fan-in of elements in data DTDs; fan-in of elements in meta DTDs]

22 Summary. Local properties: content model classification, syntactic complexity, determinism, ambiguity. Global properties: reachability, recursions, chain of stars, hubs. One drawback of this survey: it does not study any properties of attributes.

