Presentation is loading. Please wait.

Presentation is loading. Please wait.

2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions.

Similar presentations


Presentation on theme: "2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions."— Presentation transcript:

1 2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs These slides use some figures, definitions, and explanations from Elmasri- Navathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems

2 2/6/05Salman Azhar: Database Systems2 Framework 1. Information Integration : u Making databases from various places work as one. 2. Semi-structured Data : u A new data model designed to cope with problems of information integration. 3. XML : u A standard language for describing semi- structured data schemas and representing data.

3 2/6/05Salman Azhar: Database Systems3 1. Information Integration u Generally databases in an enterprises have: u Several underlying database management systems u Oracle,MS SQL Server, DB2, u Informix,Sybase (SQL Server), MS Access, etc. u Several underlying database schemas u Information in an employee table can contain u Employee Name, SSN, DOB, title, hrsPerWeek. modifiedTime, modifiedBy u Employee Name, SSN, DOB, title, degree, createTime, createBy u Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

4 2/6/05Salman Azhar: Database Systems4 2. Semi-structured Data u A new data model designed to cope with problems of information integration u Accommodates of different DBMS u Oracle,MS SQL Server, DB2, u Informix,Sybase (SQL Server), MS Access, etc. u Integrates different schemas u Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy u Employee Name, SSN, DOB, title, degree, createTime, createBy u Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy

5 2/6/05Salman Azhar: Database Systems5 3. XML u A standard language for describing semi-structured data schemas and representing data.

6 2/6/05Salman Azhar: Database Systems6 The Information-Integration Problem Major bottleneck in enterprise application integration For example… Hewlett Packard split into HP and Agilent Need to separate data into different destinations HP bought Compaq Need to integrate data from different sources

7 2/6/05Salman Azhar: Database Systems7 The Information-Integration Problem Related data exists in many places and could, in principle, work together. But different databases differ in: 1. Model w relational, object-oriented? 2. Schema w normalized/denormalized? 3. Terminology w are consultants employees? Retirees? Subcontractors? 4. Conventions w meters versus feet?

8 2/6/05Salman Azhar: Database Systems8 Example Consider merger of two stores in a Mall may be some overlap in the products sold but the databases are different

9 2/6/05Salman Azhar: Database Systems9 Example Each company has a database One may use a relational DBMS the other keeps the data in an MS-Word document One stores the phones of distributors, the other does not One distinguishes products in one department the other doesn’t One counts inventory by number of items, the other by cases

10 2/6/05Salman Azhar: Database Systems10 Two Approaches to Integration 1. Warehousing u Makes a copy of the data u More developed of the two 2. Mediation u Creates a view of the data u Newer and less developed

11 2/6/05Salman Azhar: Database Systems11 Warehouse Diagram Warehouse Wrapper Source 1Source 2 User queryResult

12 2/6/05Salman Azhar: Database Systems12 A Mediator Mediator Wrapper Source 1Source 2 User queryQuery Result

13 2/6/05Salman Azhar: Database Systems13 Warehousing u Make copies of the data sources at a central site and transform it to a common schema Reconstruct data daily/weekly Do not try to keep it more up-to-date than that. Pro: very well-developed several commercial tools are available Con: data can be old since updates are expensive 24-hour availability threatened by large data updates

14 2/6/05Salman Azhar: Database Systems14 Mediation u Create a view of all sources, as if they were integrated Answers a view query by translating it to terminology of the sources and querying them Pro: Current data Con: Can be slow as it requires real time merger of different data sources Lack of tools available

15 2/6/05Salman Azhar: Database Systems15 Warehouse Diagram Warehouse Wrapper Source 1Source 2 User queryResult

16 2/6/05Salman Azhar: Database Systems16 A Mediator Mediator Wrapper Source 1Source 2 User queryQuery Result

17 2/6/05Salman Azhar: Database Systems17 Semi-structured: Motivation Most effective approach to Information Integration: Semi-structured Data Model or Semi-structured Objects

18 2/6/05Salman Azhar: Database Systems18 Semi-structured: Motivation Main limitation of Object-Oriented Models: Object Models are Strongly Typed Objects of a class have one structure only Semi-structured approach solves this problem

19 2/6/05Salman Azhar: Database Systems19 Semi-structured Data Purpose: Represent data from independent sources more flexibly than either relational or object-oriented models

20 2/6/05Salman Azhar: Database Systems20 Semi-structured Data Each object has a class of their own and properties are defined whatever labels are attached to that object Properties mean attributes, relationships, methods, etc.

21 2/6/05Salman Azhar: Database Systems21 Semi-structured Data Think of objects but with the type of each object is the objects its own business not that of its “class” Labels to indicate meaning of substructures

22 2/6/05Salman Azhar: Database Systems22 Semi-structured Graphs Easy to think of Semi-structured data as Graphs Nodes = objects Labels on arcs = attributes leading to a leaf node relationships leading to another node

23 2/6/05Salman Azhar: Database Systems23 Semi-structured Graphs Atomic values at leaf nodes nodes with no arcs out Flexibility: no restriction on… labels out of a node number of successors with a given label

24 2/6/05Salman Azhar: Database Systems24 Example: Data Graph Pepsi PepsiCo BestSeller 2003 Main St KFC Sobe soda rest manf sellsAt name addr prize yearaward root The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi) Notice a new kind of data. Root object represents the entire DB. Often look like trees, but are not. The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC)

25 2/6/05Salman Azhar: Database Systems25 Stage is Now Set for XML A technology has application to different situations foundations remain the same applications changes

26 2/6/05Salman Azhar: Database Systems26 Extensible Markup Language (XML) XML uses tags for semantics (e.g., “this is an address”) HTML uses tags for formatting (e.g., “italic”), Key idea: create tag sets for a domain (e.g., genomics) translate all data into properly tagged XML docs

27 2/6/05Salman Azhar: Database Systems27 Well-Formed and Valid XML Well-Formed XML allows you to invent your own tags similar to labels in semi-structured data graph Valid XML involves a DTD (Document Type Definition) DTD gives a grammar for the use of labels limits the set of labels our of node the order and number of times a label occurs

28 2/6/05Salman Azhar: Database Systems28 Well-Formed XML All XML documents have Header Body Header defines version specifies that the document is in well-formed XML Body can include root tag several properly matching tags

29 2/6/05Salman Azhar: Database Systems29 Well-Formed XML: Header Start the document with a declaration surrounded by. Normal declaration for Well-Formed XML is: Version indicates version number Standalone = “yes” means no DTD no DTD means well-formed XML

30 2/6/05Salman Azhar: Database Systems30 Well-Formed XML: Body Body of document is a root tag surrounding nested tags. Body can include: several properly matching tags (as in html structure) special tag called root tag can have a special meaning such as document type or can be generic

31 2/6/05Salman Azhar: Database Systems31 Tags Tags, as in HTML are normally matched pairs, as … may be nested arbitrarily some tags requiring no matching ending such as in HTML, are also permitted however, we will not use these in examples

32 2/6/05Salman Azhar: Database Systems32 Example: Well-Formed XML Taco Bell Pepsi 1.00 Sobe 2.00 … … Root tag RESTS surrounds the entire document One of several nested REST tags representing information about a single REST tag specifies the REST name tags have names and price for each Soda nested in and tags Literal Data items are contained at the atomic level

33 2/6/05Salman Azhar: Database Systems33 XML and Semi-structured Data Consider this… Is Well-Formed XML documents with nested tags is exactly the same idea as trees of semi-structured data? Tags are the labels on edges Nodes represent data between matching tags Parent-child relationship is immediate nesting in XML

34 2/6/05Salman Azhar: Database Systems34 XML and Semi-structured Data Semi-structured approach allows for non-tree structures We shall see that XML also enables non-tree structures mimics the semi-structured data model

35 2/6/05Salman Azhar: Database Systems35 Group Exercise Convert the following into a Semi- structured representation Taco Bell Pepsi 1.00 Sobe 2.00 … … Note: Do not turn over to the next page before attempting this exercise yourself!

36 2/6/05Salman Azhar: Database Systems36 Solution: The semi-structured representation Taco Bell Pepsi1.00Sobe2.00 PRICE REST RESTS NAME... REST PRICE NAME SODA NAME Note: Data is stored in leaf nodes and structure (tags) in internal nodes Taco Bell Pepsi 1.00 Sobe 2.00 … …

37 2/6/05Salman Azhar: Database Systems37 Valid XML Switching gears: Well-formed to Valid XML Valid XML is the most interesting use of XML Essentially a context-free grammar for describing XML tags and their nesting Specified by DTD Each domain of interest creates one DTD that describes all the documents this group will share For example, electronic components, travel industry, etc., will have their own DTDs

38 2/6/05Salman Azhar: Database Systems38 DTD Structure [ ( ) ]> Note: !DOCTYPE is key word with being the name of DOCTYPE Between [ … ] list of ELEMENT definition Each !ELEMENT has a with the allowed list of usually in the order listed

39 2/6/05Salman Azhar: Database Systems39 DTD Elements Element definition consists of its name (tag) and a parenthesized description of any nested tags includes order of subtags and their multiplicity (0, 1, or many times) Leaves (text elements) have #PCDATA in place of nested tags

40 2/6/05Salman Azhar: Database Systems40 Example: DTD <!DOCTYPE RESTS [ ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE SODA’s NAME and PRICE are data (#PCDATA) GROUP EXERCISE: COMPLETE THE DTD Note: Do not turn over to the next page before attempting this exercise yourself!

41 2/6/05Salman Azhar: Database Systems41 Example: DTD <!DOCTYPE RESTS [ ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE

42 2/6/05Salman Azhar: Database Systems42 Element Descriptions Rules Subtags must appear in order shown A tag may be followed by a symbol to indicate its multiplicity: Identical to UNIX regular expressions. * = zero or more. + = one or more. ? = zero or one. Alternative sequences of tags can be connected by the symbol |

43 2/6/05Salman Azhar: Database Systems43 Example: Element Description A name is Either an optional title (e.g., “Dr.”), a first name, and a last name, in that order, or it is an IP address <!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR )> Alternative symbol

44 2/6/05Salman Azhar: Database Systems44 Use of DTDs u In order to specify a document follows a particular DTD 1. Set STANDALONE = “no” a) Either include the DTD as a preamble of the XML document b) Follow DOCTYPE and the by SYSTEM and a path to the file where the DTD is stored

45 2/6/05Salman Azhar: Database Systems45 Example (a) <!DOCTYPE RESTS [ ]> Taco Bell Pepsi 1.00 Sobe 2.00 … … DTD Document Same as earlier but this time it conforms to the above DTD

46 2/6/05Salman Azhar: Database Systems46 Example (b) Assume the RESTS DTD is in file rest.dtd Taco Bell Pepsi 1.00 Sobe 2.00 … … Get the DTD from the file rest.dtd Document Same as earlier but this time it conforms to the DTD in rest.dtd

47 2/6/05Salman Azhar: Database Systems47 Attributes Attributes are another important component of DTD and XML docs Opening tags in XML can have attributes like in HTML In DTD … > gives a list of attributes and their data types for this element

48 2/6/05Salman Azhar: Database Systems48 Example: Attributes Rests can have an attribute kind which is either qsr, family, or other. The element definition is unchanged However, we add an ATTLIST. <!ATTLIST REST kind “qsr” | “family” | “other”>

49 2/6/05Salman Azhar: Database Systems49 Example: Attribute Use In a document that allows REST tags, we might see: KFC Pepsi 1.00... New info: kind = “qsr”

50 2/6/05Salman Azhar: Database Systems50 IDs and IDREFs Introduce links from one object to another Allows the structure of an XML document to be a general graph rather than just a tree. These are pointers from one object to another in analogy to HTML’s NAME = “blah” and HREF = “#blah”

51 2/6/05Salman Azhar: Database Systems51 Creating IDs We give an element Elephant an attribute Attention of type ID in the DTD When using tag in an XML document, give its attribute Attention a unique value. For example,

52 2/6/05Salman Azhar: Database Systems52 Creating IDREFs IDREFs are similar to IDs: To allow objects of type Fig to refer to another object with an ID attribute, give Fig an attribute of type IDREF (single string of type ID) Or, let the attribute have type IDREFS, so the Fig –object can refer to any number of other objects (any number strings of type ID).

53 2/6/05Salman Azhar: Database Systems53 Example: IDs and IDREFs Let us redesign our RESTS DTD to include both REST and SODA sub-elements Both rests and sodas will have ID attributes called name Rests have PRICE sub-objects, consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda Sodas have attribute soldBy, which is an IDREFS leading to all the rests that sell it

54 2/6/05Salman Azhar: Database Systems54 The DTD <!DOCTYPE Rests [ ]> RESTS have 0+ REST and 0+ SODA REST objects have name as an ID attribute and have one or more PRICE sub-objects PRICE objects have a number (the price) and one reference to a soda Soda objects have an ID attribute called name, and a soldBy attribute that is a set of Rest names

55 2/6/05Salman Azhar: Database Systems55 Example Document 1.00 2.00 … <SODA name = “Pepsi”, soldBy = “KFC, TacoBell,…”> … <!DOCTYPE Rests [ ]>

56 2/6/05Salman Azhar: Database Systems56 Recap Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs

57 2/6/05Salman Azhar: Database Systems57 Perspective Here XML is used as a EDI medium EDI = electronic data interchange There are many other using for XML Each has its own utilization

58 2/6/05Salman Azhar: Database Systems58 Questions? Questions??? Doesn’t mean you will get all the answers!


Download ppt "2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions."

Similar presentations


Ads by Google