2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions.

Slides:



Advertisements
Similar presentations
2/10/05Salman Azhar: Database Systems1 XML Query Languages Salman Azhar XPATH XQUERY These slides use some figures, definitions, and explanations from.
Advertisements

XML: Extensible Markup Language
1 DTD (Document Type Definition) Imposing Structure on XML Documents (W3Schools on DTDs)W3Schools on DTDs.
1 XML DTD & XML Schema Monica Farrow G30
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
CS 898N – Advanced World Wide Web Technologies Lecture 21: XML Chin-Chih Chang
From Semistructured Data to XML: Migrating The Lore Data Model and Query Language Roy Goldman, Jason McHugh, Jennifer Widom Stanford University
1 COS 425: Database and Information Management Systems XML and information exchange.
1 Statistics XML: –Altavista: 800,000 pages returned. –Amazon.com: 242 books. In comparison: –God: 12,000 books, 7 Million pages –Bible: 32,000 books,
Winter 2002Arthur Keller – CS 18018–1 Schedule Today: Mar. 12 (T) u Semistructured Data, XML, XQuery. u Read Sections Assignment 8 due. Mar. 14.
1 XML Document Type Definitions XML Schema. 2 Well-Formed and Valid XML uWell-Formed XML allows you to invent your own tags. uValid XML conforms to a.
Semi-structured Data. Facts about the Web Growing fast Popular Semi-structured data –Data is presented for ‘human’-processing –Data is often ‘self-describing’
Fall 2001Arthur Keller – CS 18017–1 Schedule Nov. 27 (T) Semistructured Data, XML. u Read Sections Assignment 8 due. Nov. 29 (TH) The Real World,
1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
XML(EXtensible Markup Language). XML XML stands for EXtensible Markup Language. XML is a markup language much like HTML. XML was designed to describe.
XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.
Introduction to XML This material is based heavily on the tutorial by the same name at
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
Tutorial 3: Adding and Formatting Text. 2 Objectives Session 3.1 Type text into a page Copy text from a document and paste it into a page Check for spelling.
4/20/2017.
XML – Data Model, DTD and Schema
XP New Perspectives on XML Tutorial 4 1 XML Schema Tutorial – Carey ISBN Working with Namespaces and Schemas.
XP New Perspectives on XML Tutorial 3 1 DTD Tutorial – Carey ISBN
IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.
XML-to-Relational Schema Mapping Algorithm ODTDMap Speaker: Artem Chebotko* Wayne State University Joint work with Mustafa Atay,
XML – what is it? eXtensible Markup Language Standard for publishing and interchange on the web and over the wire simpler version of SGML adapted to internet.
VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Document Type Definition.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
MIS 315 Bsharah An Introduction to XML 1MIS Bsharah.
Introduction to XML. XML - Connectivity is Key Need for customized page layout – e.g. filter to display only recent data Downloadable product comparisons.
CSCE 520- Relational Data Model Lecture 2. Relational Data Model The following slides are reused by the permission of the author, J. Ullman, from the.
XML 1 Enterprise Applications CE00465-M XML. 2 Enterprise Applications CE00465-M XML Overview Extensible Mark-up Language (XML) is a meta-language that.
Document Type Definitions XML Schema
XML Syntax - Writing XML and Designing DTD's
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
1 Tutorial 13 Validating Documents with DTDs Working with Document Type Definitions.
Avoid using attributes? Some of the problems using attributes: Attributes cannot contain multiple values (child elements can) Attributes are not easily.
Winter 2006Keller, Ullman, Cushing18–1 Plan 1.Information integration: important new application that motivates what follows. 2.Semistructured data: a.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
1 Chapter 10: XML What is XML What is XML Basic Components of XML Basic Components of XML XPath XPath XQuery XQuery.
Copyrighted material John Tullis 10/17/2015 page 1 04/15/00 XML Part 3 John Tullis DePaul Instructor
Modern Databases Willem Visser RW334. The Web is Changing the Game Databases used to be the domain of corporations with limited amounts of data and limited.
CIS 451: XML DTDs Dr. Ralph D. Westfall February, 2009.
1 CS1368 Introduction* Relational Model, Schemas, SQL Semistructured Model, XML * The slides in this lecture are adapted from slides used in Standford's.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 27 XML: Extensible Markup Language.
XML Instructor: Charles Moen CSCI/CINF XML  Extensible Markup Language  A set of rules that allow you to create your own markup language  Designed.
Lecture 16 Introduction to XML Boriana Koleva Room: C54
Chapter 27 The World Wide Web and XML. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.27-2 Topics in this Chapter The Web and the Internet.
Jeff Ullman: Introduction to XML 1 XML Semistructured Data Extensible Markup Language Document Type Definitions.
Semistructured Data Extensible Markup Language Document Type Definitions Zaki Malik November 04, 2008.
Chapter 23 XML. 2 Introduction  XML: eXtensible Markup Language (What is a Markup language?)  Defined by the WWW Consortium (W3C)  Originally intended.
Tutorial 13 Validating Documents with Schemas
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
Exam II Syllabus uStorage & Buffer Management uIndexing: Btrees & Hash uMulti-dimensional Indexing uQuery processing (relational ops) uQuery optimization.
Semistructured-Data Model. Lu Chaojun, SJTU 2 Semistructured Data Structured data has a separate schema to describe its structure. –Advantage: efficient.
CSCE 520- Relational Data Model Lecture 2. Oracle login Login from the linux lab or ssh to one of the linux servers using your cse username and password.
Introduction to DTD A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list.
SEMI-STRUCTURED DATA (XML) 1. SEMI-STRUCTURED DATA ER, Relational, ODL data models are all based on schema Structure of data is rigid and known is advance.
CHAPTER NINE Accessing Data Using XML. McGraw Hill/Irwin ©2002 by The McGraw-Hill Companies, Inc. All rights reserved Introduction The eXtensible.
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
Extensible Markup Language (XML) Pat Morin COMP 2405.
XML: Extensible Markup Language
Semi-Structured data (XML Data MODEL)
DTD (Document Type Definition)
Lecture 8: XML Data Wednesday, October
CSE591: Data Mining by H. Liu
CE223 Database Systems Introduction
Semi-Structured data (XML)
Presentation transcript:

2/6/05Salman Azhar: Database Systems1 XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs These slides use some figures, definitions, and explanations from Elmasri- Navathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems

2/6/05Salman Azhar: Database Systems2 Framework 1. Information Integration : u Making databases from various places work as one. 2. Semi-structured Data : u A new data model designed to cope with problems of information integration. 3. XML : u A standard language for describing semi- structured data schemas and representing data.

2/6/05Salman Azhar: Database Systems3 1. Information Integration u Generally databases in an enterprises have: u Several underlying database management systems u Oracle,MS SQL Server, DB2, u Informix,Sybase (SQL Server), MS Access, etc. u Several underlying database schemas u Information in an employee table can contain u Employee Name, SSN, DOB, title, hrsPerWeek. modifiedTime, modifiedBy u Employee Name, SSN, DOB, title, degree, createTime, createBy u Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

2/6/05Salman Azhar: Database Systems4 2. Semi-structured Data u A new data model designed to cope with problems of information integration u Accommodates of different DBMS u Oracle,MS SQL Server, DB2, u Informix,Sybase (SQL Server), MS Access, etc. u Integrates different schemas u Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy u Employee Name, SSN, DOB, title, degree, createTime, createBy u Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy

2/6/05Salman Azhar: Database Systems5 3. XML u A standard language for describing semi-structured data schemas and representing data.

2/6/05Salman Azhar: Database Systems6 The Information-Integration Problem Major bottleneck in enterprise application integration For example… Hewlett Packard split into HP and Agilent Need to separate data into different destinations HP bought Compaq Need to integrate data from different sources

2/6/05Salman Azhar: Database Systems7 The Information-Integration Problem Related data exists in many places and could, in principle, work together. But different databases differ in: 1. Model w relational, object-oriented? 2. Schema w normalized/denormalized? 3. Terminology w are consultants employees? Retirees? Subcontractors? 4. Conventions w meters versus feet?

2/6/05Salman Azhar: Database Systems8 Example Consider merger of two stores in a Mall may be some overlap in the products sold but the databases are different

2/6/05Salman Azhar: Database Systems9 Example Each company has a database One may use a relational DBMS the other keeps the data in an MS-Word document One stores the phones of distributors, the other does not One distinguishes products in one department the other doesn’t One counts inventory by number of items, the other by cases

2/6/05Salman Azhar: Database Systems10 Two Approaches to Integration 1. Warehousing u Makes a copy of the data u More developed of the two 2. Mediation u Creates a view of the data u Newer and less developed

2/6/05Salman Azhar: Database Systems11 Warehouse Diagram Warehouse Wrapper Source 1Source 2 User queryResult

2/6/05Salman Azhar: Database Systems12 A Mediator Mediator Wrapper Source 1Source 2 User queryQuery Result

2/6/05Salman Azhar: Database Systems13 Warehousing u Make copies of the data sources at a central site and transform it to a common schema Reconstruct data daily/weekly Do not try to keep it more up-to-date than that. Pro: very well-developed several commercial tools are available Con: data can be old since updates are expensive 24-hour availability threatened by large data updates

2/6/05Salman Azhar: Database Systems14 Mediation u Create a view of all sources, as if they were integrated Answers a view query by translating it to terminology of the sources and querying them Pro: Current data Con: Can be slow as it requires real time merger of different data sources Lack of tools available

2/6/05Salman Azhar: Database Systems15 Warehouse Diagram Warehouse Wrapper Source 1Source 2 User queryResult

2/6/05Salman Azhar: Database Systems16 A Mediator Mediator Wrapper Source 1Source 2 User queryQuery Result

2/6/05Salman Azhar: Database Systems17 Semi-structured: Motivation Most effective approach to Information Integration: Semi-structured Data Model or Semi-structured Objects

2/6/05Salman Azhar: Database Systems18 Semi-structured: Motivation Main limitation of Object-Oriented Models: Object Models are Strongly Typed Objects of a class have one structure only Semi-structured approach solves this problem

2/6/05Salman Azhar: Database Systems19 Semi-structured Data Purpose: Represent data from independent sources more flexibly than either relational or object-oriented models

2/6/05Salman Azhar: Database Systems20 Semi-structured Data Each object has a class of their own and properties are defined whatever labels are attached to that object Properties mean attributes, relationships, methods, etc.

2/6/05Salman Azhar: Database Systems21 Semi-structured Data Think of objects but with the type of each object is the objects its own business not that of its “class” Labels to indicate meaning of substructures

2/6/05Salman Azhar: Database Systems22 Semi-structured Graphs Easy to think of Semi-structured data as Graphs Nodes = objects Labels on arcs = attributes leading to a leaf node relationships leading to another node

2/6/05Salman Azhar: Database Systems23 Semi-structured Graphs Atomic values at leaf nodes nodes with no arcs out Flexibility: no restriction on… labels out of a node number of successors with a given label

2/6/05Salman Azhar: Database Systems24 Example: Data Graph Pepsi PepsiCo BestSeller 2003 Main St KFC Sobe soda rest manf sellsAt name addr prize yearaward root The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi) Notice a new kind of data. Root object represents the entire DB. Often look like trees, but are not. The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC)

2/6/05Salman Azhar: Database Systems25 Stage is Now Set for XML A technology has application to different situations foundations remain the same applications changes

2/6/05Salman Azhar: Database Systems26 Extensible Markup Language (XML) XML uses tags for semantics (e.g., “this is an address”) HTML uses tags for formatting (e.g., “italic”), Key idea: create tag sets for a domain (e.g., genomics) translate all data into properly tagged XML docs

2/6/05Salman Azhar: Database Systems27 Well-Formed and Valid XML Well-Formed XML allows you to invent your own tags similar to labels in semi-structured data graph Valid XML involves a DTD (Document Type Definition) DTD gives a grammar for the use of labels limits the set of labels our of node the order and number of times a label occurs

2/6/05Salman Azhar: Database Systems28 Well-Formed XML All XML documents have Header Body Header defines version specifies that the document is in well-formed XML Body can include root tag several properly matching tags

2/6/05Salman Azhar: Database Systems29 Well-Formed XML: Header Start the document with a declaration surrounded by. Normal declaration for Well-Formed XML is: Version indicates version number Standalone = “yes” means no DTD no DTD means well-formed XML

2/6/05Salman Azhar: Database Systems30 Well-Formed XML: Body Body of document is a root tag surrounding nested tags. Body can include: several properly matching tags (as in html structure) special tag called root tag can have a special meaning such as document type or can be generic

2/6/05Salman Azhar: Database Systems31 Tags Tags, as in HTML are normally matched pairs, as … may be nested arbitrarily some tags requiring no matching ending such as in HTML, are also permitted however, we will not use these in examples

2/6/05Salman Azhar: Database Systems32 Example: Well-Formed XML Taco Bell Pepsi 1.00 Sobe 2.00 … … Root tag RESTS surrounds the entire document One of several nested REST tags representing information about a single REST tag specifies the REST name tags have names and price for each Soda nested in and tags Literal Data items are contained at the atomic level

2/6/05Salman Azhar: Database Systems33 XML and Semi-structured Data Consider this… Is Well-Formed XML documents with nested tags is exactly the same idea as trees of semi-structured data? Tags are the labels on edges Nodes represent data between matching tags Parent-child relationship is immediate nesting in XML

2/6/05Salman Azhar: Database Systems34 XML and Semi-structured Data Semi-structured approach allows for non-tree structures We shall see that XML also enables non-tree structures mimics the semi-structured data model

2/6/05Salman Azhar: Database Systems35 Group Exercise Convert the following into a Semi- structured representation Taco Bell Pepsi 1.00 Sobe 2.00 … … Note: Do not turn over to the next page before attempting this exercise yourself!

2/6/05Salman Azhar: Database Systems36 Solution: The semi-structured representation Taco Bell Pepsi1.00Sobe2.00 PRICE REST RESTS NAME... REST PRICE NAME SODA NAME Note: Data is stored in leaf nodes and structure (tags) in internal nodes Taco Bell Pepsi 1.00 Sobe 2.00 … …

2/6/05Salman Azhar: Database Systems37 Valid XML Switching gears: Well-formed to Valid XML Valid XML is the most interesting use of XML Essentially a context-free grammar for describing XML tags and their nesting Specified by DTD Each domain of interest creates one DTD that describes all the documents this group will share For example, electronic components, travel industry, etc., will have their own DTDs

2/6/05Salman Azhar: Database Systems38 DTD Structure [ ( ) ]> Note: !DOCTYPE is key word with being the name of DOCTYPE Between [ … ] list of ELEMENT definition Each !ELEMENT has a with the allowed list of usually in the order listed

2/6/05Salman Azhar: Database Systems39 DTD Elements Element definition consists of its name (tag) and a parenthesized description of any nested tags includes order of subtags and their multiplicity (0, 1, or many times) Leaves (text elements) have #PCDATA in place of nested tags

2/6/05Salman Azhar: Database Systems40 Example: DTD <!DOCTYPE RESTS [ ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE SODA’s NAME and PRICE are data (#PCDATA) GROUP EXERCISE: COMPLETE THE DTD Note: Do not turn over to the next page before attempting this exercise yourself!

2/6/05Salman Azhar: Database Systems41 Example: DTD <!DOCTYPE RESTS [ ]> RESTS can have * (0 or more) REST REST has NAME and then + (1 or more) SODA… Order matters! NAME and PRICE are data (#PCDATA): No more tags just text SODA has NAME followed PRICE

2/6/05Salman Azhar: Database Systems42 Element Descriptions Rules Subtags must appear in order shown A tag may be followed by a symbol to indicate its multiplicity: Identical to UNIX regular expressions. * = zero or more. + = one or more. ? = zero or one. Alternative sequences of tags can be connected by the symbol |

2/6/05Salman Azhar: Database Systems43 Example: Element Description A name is Either an optional title (e.g., “Dr.”), a first name, and a last name, in that order, or it is an IP address <!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR )> Alternative symbol

2/6/05Salman Azhar: Database Systems44 Use of DTDs u In order to specify a document follows a particular DTD 1. Set STANDALONE = “no” a) Either include the DTD as a preamble of the XML document b) Follow DOCTYPE and the by SYSTEM and a path to the file where the DTD is stored

2/6/05Salman Azhar: Database Systems45 Example (a) <!DOCTYPE RESTS [ ]> Taco Bell Pepsi 1.00 Sobe 2.00 … … DTD Document Same as earlier but this time it conforms to the above DTD

2/6/05Salman Azhar: Database Systems46 Example (b) Assume the RESTS DTD is in file rest.dtd Taco Bell Pepsi 1.00 Sobe 2.00 … … Get the DTD from the file rest.dtd Document Same as earlier but this time it conforms to the DTD in rest.dtd

2/6/05Salman Azhar: Database Systems47 Attributes Attributes are another important component of DTD and XML docs Opening tags in XML can have attributes like in HTML In DTD … > gives a list of attributes and their data types for this element

2/6/05Salman Azhar: Database Systems48 Example: Attributes Rests can have an attribute kind which is either qsr, family, or other. The element definition is unchanged However, we add an ATTLIST. <!ATTLIST REST kind “qsr” | “family” | “other”>

2/6/05Salman Azhar: Database Systems49 Example: Attribute Use In a document that allows REST tags, we might see: KFC Pepsi New info: kind = “qsr”

2/6/05Salman Azhar: Database Systems50 IDs and IDREFs Introduce links from one object to another Allows the structure of an XML document to be a general graph rather than just a tree. These are pointers from one object to another in analogy to HTML’s NAME = “blah” and HREF = “#blah”

2/6/05Salman Azhar: Database Systems51 Creating IDs We give an element Elephant an attribute Attention of type ID in the DTD When using tag in an XML document, give its attribute Attention a unique value. For example,

2/6/05Salman Azhar: Database Systems52 Creating IDREFs IDREFs are similar to IDs: To allow objects of type Fig to refer to another object with an ID attribute, give Fig an attribute of type IDREF (single string of type ID) Or, let the attribute have type IDREFS, so the Fig –object can refer to any number of other objects (any number strings of type ID).

2/6/05Salman Azhar: Database Systems53 Example: IDs and IDREFs Let us redesign our RESTS DTD to include both REST and SODA sub-elements Both rests and sodas will have ID attributes called name Rests have PRICE sub-objects, consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda Sodas have attribute soldBy, which is an IDREFS leading to all the rests that sell it

2/6/05Salman Azhar: Database Systems54 The DTD <!DOCTYPE Rests [ ]> RESTS have 0+ REST and 0+ SODA REST objects have name as an ID attribute and have one or more PRICE sub-objects PRICE objects have a number (the price) and one reference to a soda Soda objects have an ID attribute called name, and a soldBy attribute that is a set of Rest names

2/6/05Salman Azhar: Database Systems55 Example Document … <SODA name = “Pepsi”, soldBy = “KFC, TacoBell,…”> … <!DOCTYPE Rests [ ]>

2/6/05Salman Azhar: Database Systems56 Recap Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs

2/6/05Salman Azhar: Database Systems57 Perspective Here XML is used as a EDI medium EDI = electronic data interchange There are many other using for XML Each has its own utilization

2/6/05Salman Azhar: Database Systems58 Questions? Questions??? Doesn’t mean you will get all the answers!