Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 XML and Databases. 2 Outline (ambitious) Background: documents (SGML/HTML) and databases (structured and semistructured data) XML Basics and Document.

Similar presentations


Presentation on theme: "1 XML and Databases. 2 Outline (ambitious) Background: documents (SGML/HTML) and databases (structured and semistructured data) XML Basics and Document."— Presentation transcript:

1 1 XML and Databases

2 2 Outline (ambitious) Background: documents (SGML/HTML) and databases (structured and semistructured data) XML Basics and Document Type Descriptors XML query languages: XML-QL and XSL. XML additions: Xlink, Xpointer, RDF, SOX, XML- Data Document Object Model (XML API's)

3 3 Some Useful Articles XML, Java, and the future of the web http://webreview.com/wr/pub/97/12/19/xml/index.html XML and the Second-Generation Web http://www.sciam.com/1999/0599issue/0599bosak.html Articles/standards for XML, XSL, XML-QL http://www.w3c.org/ http://www.w3.org/TR/REC-xml

4 4 Background What’s the difference between the world of documents and information retrieval and databases and query interfaces?

5 5 Documents vs Databases Document world > plenty of small documents > usually static > implicit structure section, paragraph, toc, > tagging > human friendly > content form/layout, annotation > Paradigms “Save as”, wysiwyg > meta-data author name, date, subject Database world > a few large databases > usually dynamic > explicit structure (schema) > records > machine friendly > content schema, data, methods > Paradigms Atomicity, Concurrency, Isolation, Durability > meta-data schema description

6 6 What to do with them Documents editing printing spell-checking counting words retrieving (IR) searching Database updating cleaning querying composing/transforming

7 7 HTML Publishing hypertext on the World Wide Web Designed to describe how a Web browser should arrange text, images and push-buttons on a page. Easy to learn, but does not convey structure. Fixed tag set. Welcome to the XML course Introduction Opening tag Text (PCDATA) Closing tag “Bachelor” tag Attribute nameAttribute value

8 8 The Structure of XML XML consists of tags and text Tags come in pairs... They must be properly nested...... --- good...... --- bad (You can’t do......... in HTML)

9 9 XML text XML has only one “basic” type -- text. It is bounded by tags e.g. The Big Sleep 1935 --- 1935 is still text XML text is called PCDATA (for parsed character data). It uses a 16-bit encoding, e.g. \&\#x0152 for the Hebrew letter Mem Later we shall see how new types are specified by XML-data

10 10 XML structure Nesting tags can be used to express various structures. E.g. A tuple (record) : Malcolm Atchison (215) 898 4321 mp@dcs.gla.ac.sc

11 11 XML structure (cont.) We can represent a list by using the same tag repeatedly:...

12 12 Terminology The segment of an XML document between an opening and a corresponding closing tag is called an element. Malcolm Atchison (215) 898 4321 mp@dcs.gla.ac.sc element not an element element, a sub-element of

13 13 XML is tree-like person name email tel Malcolm Atchison (215) 898 4321 mp@dcs.gla.ac.sc Semistructured data models typically put the labels on the edges

14 14 Mixed Content An element may contain a mixture of sub-elements and PCDATA British Airways World’s favorite airline Data of this form is not typically generated from databases. It is needed for consistency with HTML

15 15 A Complete XML Document Malcolm Atchison (215) 898 4321 mp@dcs.gla.ac.sc

16 16 Two ways of representing a DB projects: title budget managedBy employees: name ssn age

17 17 Project and Employee relations in XML Pattern recognition 10000 Joe Joe 344556 34 Sandra 2234 35 Auto guided vehicle 70000 Sandra : Projects and employees are intermixed

18 18 Pattern recognition 10000 Joe Auto guided vehicles 70000 Sandra : Project and Employee relations in XML (cont’d) Joe 344556 34 Sandra 2234 35 : Employees follow projects

19 19 Pattern recognition 10000 Joe Auto guided vehicles 70000 Sandra : Project and Employee relations in XML (cont’d) Joe 344556 34 Sandra 2234 35 : Or without “separator” tags …

20 20 Attributes An (opening) tag may contain attributes. These are typically used to describe the content of an element cheese fromage branza A food made … Order of attributes in an element does not matter XML elements are ordered

21 21 Attributes (cont’d) Another common use for attributes is to express dimension or type 2400 96 M05-.+C$@02!G96YE<FEC... A document that obeys the “nested tags” rule and does not repeat an attribute within a tag is said to be well-formed.

22 22 When to use attributes It’s not always clear when to use attributes F. MacNiel fmacn@dcs.barra.ac.sc... 123 45 6789 F. MacNiel fmacn@dcs.barra.ac.sc...

23 23 XML Misc. Apart from elements and attributes, XML allows processing instructions and comments. A processing instruction is a statement of the form: A comment takes the following form: enclose comments between

24 24 Document Type Descriptors Imposing structure on XML documents

25 25 Document Type Descriptors Document Type Descriptors (DTDs) impose structure on an XML document. There is some relationship between a DTD and a schema, but it is not close -- hence the need for additional “typing” systems. The DTD is a syntactic specification.

26 26 Example: The Address Book MacNiel, John Dr. John MacNiel 1234 Huron Street Rome, OH 98765 (321) 786 2543 jm@abc.com Exactly one name At most one greeting As many address lines as needed (in order) Mixed telephones and faxes As many as needed

27 27 Specifying the structure name to specify a name element greet? to specify an optional (0 or 1) greet elements name,greet? to specify a name followed by an optional greet

28 28 Specifying the structure (cont) addr* to specify 0 or more address lines tel | fax a tel or a fax element (tel | fax)* 0 or more repeats of tel or fax email* 0 or more email elements

29 29 Specifying the structure (cont) So the whole structure of a person entry is specified by name, greet?, addr*, (tel | fax)*, email* This is known as a regular expression. Why is it important?

30 30 Regular Expressions Each regular expression determines a corresponding finite state automaton. Let’s start with a simpler example: name, addr*, email This suggests a simple parsing program name addr email

31 31 Another example name,address*,(tel | fax)*,email* name address tel fax email Adding in the optional greet further complicates things email

32 32 A DTD for the address book <!DOCTYPE addressbook [ <!ELEMENT person (name, greet?, address*, (fax | tel)*, email*)> ]>

33 33 Our relational DB revisited projects: title budget managedBy employees: name ssn age

34 34 Two DTDs for the relational DB <!DOCTYPE db [... ]> <!DOCTYPE db [... ]>

35 35 Some things are hard to specify Each employee element is to contain name, age and ssn elements in some order. <!ELEMENT employee ( (name, age, ssn) | (age, ssn, name) | (ssn, name, age) |... )> Suppose there were many more fields !

36 36 Summary of XML regular expressions AThe tag A occurs e1,e2The expression e1 followed by e2 e*0 or more occurrences of e e?Optional -- 0 or 1 occurrences e+1 or more occurrences e1 | e2either e1 or e2 (e)grouping

37 37 Specifying attributes in the DTD <!ATTLIST height dimension CDATA #REQUIRED accuracy CDATA #IMPLIED > The dimension attribute is required; the accuracy attribute is optional. CDATA is the “type” of the attribute -- it means string.

38 38 The DTD Language Default modifiers in DTD attributes:

39 39 The DTD Language Datatypes in DTD attributes:

40 40 Consistency of ID and IDREF attribute values If an attribute is declared as ID –the associated values must all be distinct (no confusion) –Id is a poor cousin of a key in relational databases. If an attribute is declared as IDREF –the associated value must exist as the value of some ID attribute (no dangling “pointers”) –IDREF is a poor cousin of foreign key in relational databases. Similarly for all the values of an IDREFS attribute –An attribute of type IDREFS represent a space- separated list of strings of references to valid IDs. ID and IDREF attributes are not typed

41 41 Specifying ID and IDREF attributes <!DOCTYPE family [ <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED children IDREFS #IMPLIED> ]>

42 42 Some conforming data Jane Doe John Doe Mary Doe Jack Doe

43 43 An alternative specification <!DOCTYPE family [ ]>

44 44 The revised data Jane Doe John Doe...

45 45 The DTD Language Example: Sales Order Document “An order document is comprised of several sales orders. Each individual order has a number and it contains the customer information, the date when the order was received, and the items ordered. Each customer has a number, a name, street, city, state, and ZIP code. Each item has an item number, parts information and a quantity. The parts information contains a number, a description of the product and its unit price. The numbers should be treated as attributes.”

46 46 The DTD Language Example: Sales Order Document DTD

47 47 The DTD Language Example: Sales Order XML Document ABC Industries 123 Main St. Chicago IL 60609 10222000 Turkey wrench 9.95 10

48 48 A useful abbreviation When an element has empty content we can use for For example: Jane Doe...

49 49 Schema.dtd <!DOCTYPE db [

50 50 Schema.dtd (cont’d) ]>

51 51 Connecting the document with its DTD In line: … ]>... Another file : A URL: <!DOCTYPE db SYSTEM "http://www.schemaauthority.com/schema.dtd">

52 52 Well-formed and Valid Documents Well-formed applies to any document (with or without a DTD): proper nesting of tags and unique attributes Valid specifies that the document conforms to the DTD: conforms to regular expression grammar, types of attributes correct, and constraints on references satisfied

53 53 DTDs v.s Schemas (or Types) By database (or programming language) standards DTDs are rather weak specifications. –Only one base type -- PCDATA –No useful “abstractions” e.g., sets –IDREFs are untyped. You point to something, but you don’t know what! –No constraints e.g., child is inverse of parent –No methods –Tag definitions are global Some of the XML extensions impose something like a schema or type on an XML document. We’ll see these later

54 54 Lots of possibilities for schemas XML Schema (under W3C’s spotlight) XDR (Microsoft’s BizTalk) SOX (Schema for Object-Oriented XML) Schematron DSD (AT&T Labs and BRICS) and more.

55 55 Some tools XML Authority http://www.extensibility.com/tibco/solutions/xml _authority/index.htm XML Spy http://www.xmlspy.com/download.html

56 56 Summary XML is a new data format. Its main virtues are widespread acceptance and the (important) ability to handle semistructured data (data without schema) DTDs provide some useful syntactic constraints on documents. As schemas they are weak How to store large XML documents? How to query them? How to map between XML and other representations?


Download ppt "1 XML and Databases. 2 Outline (ambitious) Background: documents (SGML/HTML) and databases (structured and semistructured data) XML Basics and Document."

Similar presentations


Ads by Google