Sebastian Bitzer, Seminar Semistructured Data, University of Osnabrueck, May 2, 2003. XML: An introduction in relation to semistructured data

1 Sebastian Bitzer (sbitzer@uos.de)
Seminar Semistructured Data, University of Osnabrueck, May 2, 2003
XML: An introduction in relation to semistructured data

2 Overview
Background / History
Basic syntax
XML and semistructured data
Document type definitions
Extensions for XML
Paraphernalia

3 Overview
Background / History
– SGML
– SGML, HTML and XML
– World Wide Web Consortium
Basic syntax
XML and semistructured data
Document type definitions
Extensions for XML
Paraphernalia

4 Standard Generalized Markup Language (SGML)
models information exclusively on the basis of its internal rules and its function
⇒ platform-independent storage of structured information
standard: ISO 8879 from 1986

5 SGML, HTML and XML
SGML(web application) = HTML (HTML is one special instance of SGML)
XML ⊂ SGML

6 Why XML from SGML?
SGML:
– is exceedingly complex and difficult to understand
– is formally so complex that online applications have difficulty processing it in reasonable time
– has many properties that were not designed for use in network environments (remember that it is a standard from 1986)

7 World Wide Web Consortium
Nov 1996: initial XML draft
Dec 1997: XML 1.0 Proposed Recommendation
Feb 1998: W3C Recommendation: Extensible Markup Language (XML) 1.0
Oct 2000: XML 1.0, 2nd edition

8 Overview
Background / History
Basic syntax
– Elements
– Attributes
– Well-formed XML documents
XML and semistructured data
Document type definitions
Extensions for XML
Paraphernalia

9 Elements
element = <tag>content</tag>; <tag>, </tag> = markups
content = structures between markups
no predefined tags
basic content (no markups) is treated as text: PCDATA (Parsed Character Data)
abbreviation for empty elements: <tag/> for <tag></tag>

10 Example
<person>
  <name>John Cage</name>
  <function>Bearer</function>
</person>
<person>
  <name>Elaine Vassal</name>
  <function>chief secretary</function>
</person>
…
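A minimal sketch of how such elements look to a program, using Python's standard xml.etree.ElementTree. The tag names person, name and function are taken from the ssd expression on slide 15; the staff wrapper and the empty note element are hypothetical additions, so that the fragment forms a single document and shows the empty-element abbreviation.

```python
import xml.etree.ElementTree as ET

# <staff> and <note/> are hypothetical; person/name/function follow slide 15.
doc = """
<staff>
  <person><name>John Cage</name><function>Bearer</function></person>
  <person><name>Elaine Vassal</name><function>chief secretary</function></person>
  <note/>
</staff>
"""

root = ET.fromstring(doc)

# PCDATA between the markups is exposed as plain text.
names = [p.findtext("name") for p in root.findall("person")]
print(names)

# The abbreviated empty element has neither text nor children.
empty = root.find("note")
print(empty.text is None and len(empty) == 0)
```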

11 Attributes
sometimes called “property” in data models
(name=“value”) pairs
value is always a string (DTD types such as CDATA or NMTOKEN)
allow building groups of elements
ambiguity: information as attribute or element?

12 Example
John Cage Bearer
Elaine Vassal chief secretary
…
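One possible attribute-based encoding can be sketched as follows. Which pieces of information the slide actually moved into attributes is not known, so the choice of function as an attribute and the invented id attribute are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: "function" is moved from a subelement into an
# attribute, and an invented "id" attribute is added.
person = ET.fromstring(
    '<person id="p1" function="Bearer"><name>John Cage</name></person>'
)

# Attributes are (name, value) pairs attached to the element.
attributes = dict(person.attrib)
print(attributes)

# Every attribute value reaches the application as a string.
print(all(isinstance(value, str) for value in attributes.values()))
```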

13 Well-formed XML documents
an XML document is well-formed if:
– tags nest properly
– attributes are unique within one element
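Both conditions can be tried out with any XML parser. The sketch below uses Python's xml.etree.ElementTree, which raises ParseError on input that is not well-formed; the three sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = [
    is_well_formed("<a><b></b></a>"),    # tags nest properly
    is_well_formed("<a><b></a></b>"),    # tags overlap instead of nesting
    is_well_formed('<a x="1" x="2"/>'),  # attribute x occurs twice
]
print(results)
```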

14 Overview
Background / History
Basic syntax
XML and semistructured data
– Simple transformations
– Differences that make transformation more difficult
– Additional constructs
Document type definitions
Extensions for XML
Paraphernalia

15 Simple transformations
with basic XML syntax (no attributes, tree as data structure):
from XML to ssd:
<person><name>John Cage</name><function>Bearer</function></person>
⇒ {person : {name : “John Cage”, function : “Bearer”}}
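The XML-to-ssd direction can be sketched in a few lines of Python. This is only an illustration under the slide's own restrictions (no attributes, no mixed content); representing a complex ssd value as a list of (label, value) pairs and the function name to_ssd are assumptions of this sketch, not part of the talk.

```python
import xml.etree.ElementTree as ET

def to_ssd(elem):
    """Map an element to a (label, value) pair.

    A leaf element yields its text; an inner element yields the list of
    pairs built from its children. Attributes and mixed content are ignored.
    """
    children = list(elem)
    if not children:
        return elem.tag, (elem.text or "")
    return elem.tag, [to_ssd(child) for child in children]

xml_text = "<person><name>John Cage</name><function>Bearer</function></person>"
ssd = to_ssd(ET.fromstring(xml_text))
print(ssd)
```

A list of pairs is used instead of a dictionary because labels may repeat among the children of one element.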

16 Simple transformations II
from ssd to XML (transformation function T):
T(atomic value) = atomic value
T({l_1 : v_1, …, l_n : v_n}) = <l_1> T(v_1) </l_1> … <l_n> T(v_n) </l_n>
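A direct transcription of T into Python might look like this. It assumes that a complex ssd value is given as a list of (label, value) pairs, that every label is a legal XML name, and that each label becomes a pair of enclosing tags; escaping of atomic values is an addition of this sketch.

```python
from xml.sax.saxutils import escape

def T(value):
    """Translate an ssd value into XML text."""
    if isinstance(value, list):
        # Complex value: wrap the translation of each v_i in tags named l_i.
        return "".join(f"<{label}>{T(v)}</{label}>" for label, v in value)
    # Atomic value: emitted as character data, with <, > and & escaped.
    return escape(str(value))

ssd = [("person", [("name", "John Cage"), ("function", "bearer")])]
xml_text = T(ssd)
print(xml_text)
```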

17 Differences that make transformation more difficult
different semantics of labels
element or attribute
order
mixing elements and text

18 Semantics of labels
XML: graphs with labels on nodes
ssd: graphs with labels on edges
[Figure: the person record (name “Alan”, age 42, email “ab@com”) drawn once as a node-labelled XML tree and once as an edge-labelled ssd graph]
{person : {name : “Alan”, age : 42, email : “ab@com”}}

19 Element or attribute
ambiguity between representation of information as element or as attribute
⇒ different possibilities of encoding, in particular in combination with references
[Example and figure: two alternative encodings of “some string”, with labels a, b and c]

20 Order
ssd model is based on unordered collections
XML elements are ordered
but: XML attributes are not
unordered data can be processed more efficiently
⇒ data exchange applications ignore the order of XML elements
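An application that wants to ignore element order can compare documents through an order-insensitive key. The following is a sketch only: the helper unordered_key and the sample records are hypothetical, and text that follows a subelement (mixed content) is not taken into account.

```python
import xml.etree.ElementTree as ET

def unordered_key(elem):
    """Build a comparison key that does not depend on sibling order."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),              # attributes are unordered anyway
        (elem.text or "").strip(),
        sorted(unordered_key(child) for child in elem),
    )

a = ET.fromstring("<person><name>Alan</name><age>42</age></person>")
b = ET.fromstring("<person><age>42</age><name>Alan</name></person>")

same_ordered = ET.tostring(a) == ET.tostring(b)        # order-sensitive comparison
same_unordered = unordered_key(a) == unordered_key(b)  # order ignored
print(same_ordered, same_unordered)
```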

21 Mixing elements and text
XML allows mixing of PCDATA and subelements:
XML - An introduction in relation to semistructured data Sebastian Bitzer
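Mixed content can be made visible with a small parse. The tag names talk and author below are hypothetical stand-ins chosen for this sketch; only the text itself comes from the slide.

```python
import xml.etree.ElementTree as ET

# <talk> and <author> are invented names; the text is the slide's example.
talk = ET.fromstring(
    "<talk>XML - An introduction in relation to semistructured data "
    "<author>Sebastian Bitzer</author></talk>"
)

# The PCDATA in front of the first subelement is kept on the parent ...
print(repr(talk.text))

# ... while the subelement carries its own text.
author = talk[0]
print(author.tag, repr(author.text))
```

A pure tree of labelled values has no natural place for such free text between subelements, which is why this case complicates the translation to ssd.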

22 Additional constructs in XML
comments
processing instructions
CDATA (for escaping)
entities, e.g. “&auml;” for “ä”
but also external files can be declared as entities, e.g. a gif-file as “&pic-1;”
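The sketch below puts a comment, a processing instruction, a CDATA section and an internally declared entity into one small document and parses it with Python's xml.etree.ElementTree. The element name note, the processing-instruction target render and the sample text are invented for illustration; the entity auml is declared in the internal DTD subset because XML itself does not predefine it.

```python
import xml.etree.ElementTree as ET

doc = (
    '<!DOCTYPE note [<!ENTITY auml "&#228;">]>'
    "<note>"
    "<!-- a comment -->"
    "<?render fast?>"
    "<![CDATA[1 < 2 & 3]]>"   # CDATA: markup characters need no escaping
    " B&auml;r"               # entity reference, expanded by the parser
    "</note>"
)

note = ET.fromstring(doc)

# By default the comment and the processing instruction are not kept in the
# tree; CDATA content and the expanded entity arrive as ordinary text.
print(note.text)
```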

23 Overview
Background / History
Basic syntax
XML and semistructured data
Document type definitions
– DTDs as grammars
– DTDs as schemas
– Attributes
– Valid XML documents
– Limitations
Extensions for XML
Paraphernalia

24 DTDs as grammars
a document type definition (DTD) serves as grammar for the underlying XML document
it is precisely a context-free grammar (non-terminal ⇒ ordered list of one or more terminals and non-terminals)
can be recursive

25 Definitions
DTD: <!DOCTYPE root.name [ element-def.s ]>
element-def.s: <!ELEMENT el.name (content model)> …
content model: ordered list of names of elements which can occur in the outer element

26 Variations of content model
<!ELEMENT r1 (a?, b*, (c | d+))>
means that elements of type “r1” contain:
– 0 or 1 “a” (“a” is optional) and
– arbitrarily many “b” (0 - ∞) and
– either: exactly 1 “c” (“c” is obligatory)
  or: at least 1 “d” (“d” is required)
groups can be built, too: ((a, b)+, c?)
means: at least one sequence of “a” followed by “b” comes in front of the optional “c”
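Content models are regular expressions over element names, so they can be checked with an ordinary regex engine. The sketch below writes the two models described above as (a?, b*, (c | d+)) and ((a, b)+, c?), which is a reading of the verbal description, and encodes a sequence of child names as a space-terminated string; the helper names are hypothetical.

```python
import re

# Each child name is followed by one space so that names cannot run together.
R1 = re.compile(r"(a )?(b )*((c )|(d )+)")   # (a?, b*, (c | d+))
GROUPED = re.compile(r"(a b )+(c )?")        # ((a, b)+, c?)

def encode(children):
    """Turn a list of child element names into the string that is matched."""
    return "".join(name + " " for name in children)

def matches(model, children):
    return model.fullmatch(encode(children)) is not None

print(matches(R1, ["a", "b", "b", "c"]))   # optional a, two b, exactly one c
print(matches(R1, ["d", "d"]))             # no a, no b, at least one d
print(matches(R1, ["c", "d"]))             # c and d together are not allowed
print(matches(GROUPED, ["a", "b", "a", "b", "c"]))
print(matches(GROUPED, ["c"]))             # at least one (a, b) group is missing
```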

27 DTDs as schemas
DTD: <!DOCTYPE db [ … ]>
can be seen as a representation of the relational schema r1(a,b,c), r2(c,d)
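One plausible way to derive such a DTD from a relational schema is sketched below. The declarations inside the DOCTYPE are not given here, so the encoding chosen (the root holds any number of tuples per relation, each tuple lists its columns in order, each column holds PCDATA) is an assumption, as is the function name schema_to_dtd.

```python
def schema_to_dtd(root, relations):
    """Emit a DTD for a relational schema given as {relation: [columns]}."""
    lines = [f"<!DOCTYPE {root} ["]
    # Assumed encoding: the root contains zero or more tuples of each relation.
    lines.append(f"<!ELEMENT {root} ({', '.join(r + '*' for r in relations)})>")
    columns = []
    for relation, cols in relations.items():
        # Each tuple element lists its columns in a fixed order.
        lines.append(f"<!ELEMENT {relation} ({', '.join(cols)})>")
        for col in cols:
            if col not in columns:
                columns.append(col)
    for col in columns:
        # Element names are global, so a shared column is declared only once.
        lines.append(f"<!ELEMENT {col} (#PCDATA)>")
    lines.append("]>")
    return "\n".join(lines)

dtd = schema_to_dtd("db", {"r1": ["a", "b", "c"], "r2": ["c", "d"]})
print(dtd)
```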

28 Declaring attributes
<!ATTLIST el.name
  att.name1 type1 spec1
  att.name2 type2 spec2
  … >
el.name: element which is modified by the attributes
type: often “CDATA”, but also more restricted, e.g. “(m|f)” for male or female in attribute “sex”
spec: #REQUIRED, #IMPLIED, #FIXED or a default value

29 Unique identifiers
e.g.:
<!ATTLIST person
  id ID #REQUIRED
  mom IDREF #IMPLIED
  dad IDREF #IMPLIED
  children IDREFS #IMPLIED>
instance:

30 Valid XML documents
an XML document is valid if it:
– is well-formed
– additionally has a DTD
– conforms to that DTD:
  elements are only nested as described in the DTD
  only attributes allowed by the DTD are used
  all attributes of type ID have distinct values
  all IDREF and IDREFS values refer to existing identifiers
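The two identifier conditions can be checked by hand, which is useful because Python's xml.etree.ElementTree does not validate against a DTD. The attribute names below follow the ATTLIST for person on slide 29; the family data and the helper id_problems are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Invented instance data; p4 deliberately refers to a missing identifier.
DOC = """
<family>
  <person id="p1" children="p3"/>
  <person id="p2" children="p3"/>
  <person id="p3" mom="p1" dad="p2"/>
  <person id="p4" mom="p9"/>
</family>
"""

def id_problems(root):
    """Report duplicate ID values and IDREF/IDREFS values without a target."""
    persons = list(root.iter("person"))
    ids = [p.get("id") for p in persons if p.get("id") is not None]
    problems = [f"duplicate id {i}" for i in sorted(set(ids)) if ids.count(i) > 1]
    for p in persons:
        # IDREFS is a whitespace-separated list of identifiers.
        refs = [p.get("mom"), p.get("dad")] + p.get("children", "").split()
        for ref in refs:
            if ref is not None and ref not in ids:
                problems.append(f"{p.get('id')} refers to unknown id {ref}")
    return problems

problems = id_problems(ET.fromstring(DOC))
print(problems)
```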

31 Limitations of DTDs as schemas (summarized)
order (a DTD fixes the order of subelements)
only one atomic type (PCDATA, but no INT etc.)
names are global (partial solution: namespaces)
IDREFs are not constrained to a certain type (a “mother” reference should point to a “person”)

32 Overview
Background / History
Basic syntax
XML and semistructured data
Document type definitions
Extensions for XML
– DCD
– Document navigation
Paraphernalia

33 Document Content Description (DCD)
making typing more precise
seems to have been abandoned
recent approach: XML Schema, which must e.g.:
– provide for primitive data typing, including byte, date, integer, sequence, SQL & Java primitive data types, etc.
– allow creation of user-defined datatypes, such as datatypes that are derived from existing datatypes and which may constrain certain of its properties
– provide a mechanism for URI reference to standard semantic understanding of a construct
– …
(http://www.w3.org/TR/NOTE-xml-schema-req)

34 XLink & XPointer
pointing to arbitrary positions in documents using IDs or relative position
links can be defined externally to both source and target (files)

35 Overview
Background / History
Basic syntax
XML and semistructured data
Document type definitions
Extensions for XML
Paraphernalia
– RDF
– Stylesheets
– SAX and DOM

36 Resource Description Framework
for representing metadata
consists of a data model and a syntax
simple form: edge-labelled graph
additionally:
– containers (bag, sequence or alternative)
– higher-order statements (“John says that …”)

37 Stylesheets
to specify presentation of data
Cascading Style Sheets (CSS): associate a presentation with each element type
Extensible Stylesheet Language (XSL): specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary
http://www.w3.org/Style/XSL/

38 SAX and DOM
Application Programming Interfaces
Simple API for XML (SAX)
– standard for event-based parsing
Document Object Model (DOM): interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents
– parses the whole document and builds a tree representation of it
http://www.w3.org/DOM/
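The difference between the two interfaces shows up in a small example using Python's standard xml.sax and xml.dom.minidom modules. The document, the handler class PersonCounter and the element names are made up for this sketch.

```python
import xml.sax
from xml.dom import minidom

DOC = (
    "<staff>"
    "<person><name>John Cage</name></person>"
    "<person><name>Elaine Vassal</name></person>"
    "</staff>"
)

class PersonCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back for each event; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "person":
            self.count += 1

handler = PersonCounter()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(handler.count)

# DOM: the whole document is read into a tree that can then be navigated.
dom = minidom.parseString(DOC)
names = [node.firstChild.data for node in dom.getElementsByTagName("name")]
print(names)
```

SAX keeps memory use low on large documents because nothing is retained between events, while DOM trades memory for random access and in-place updates.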

39 Outlook
Database issues:
– How are we going to model XML? (graphs)
– How are we going to query XML? (XML-QL)
– How are we going to store XML? (in a relational database? object-oriented?)
– How are we going to process XML efficiently? (uh… well..., um..., ah..., get some good grad students!)
Raghu Ramakrishnan, http://www.cs.wisc.edu/~cs784-1/handouts/intro-ssxml.ppt

40 References
S. Abiteboul, P. Buneman, and D. Suciu, Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann Publishers, San Francisco, 2000
H. Lobin, Informationsmodellierung in XML und SGML, Berlin, Heidelberg, 2000
World Wide Web Consortium, Extensible Markup Language (XML), http://www.w3.org/XML/

