Models and languages for semistructured data Bridging documents and databases.

Presentation transcript:

Models and languages for semistructured data Bridging documents and databases

Lectures
1. Introduction to data models
2. Query languages for relational databases
3. Models and query languages for object databases
4. Models and query languages for semistructured data, XML
5. Embedded query languages
6. Guest lecture on Object Role Modelling

Why do we like types?
• Types facilitate understanding
• Types enable compact representations
• Types enable query optimisation
• Types facilitate consistency enforcement

Background assumptions for typed data
• Data stable over time
• Organisational body to control data
• Exercise: Give an example of a context where these assumptions do not hold

Semistructured data
Semistructured data is schemaless and self-describing.
The data and the description of the data are integrated.

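An example
{name: {first: “John”, last: “Smith”}, tel: ,
[Figure: the same data drawn as a tree; the root has edges name and tel, and the name node has edges first and last leading to “John” and “Smith”]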

Another example
[Figure: graph with two person edges leading to the objects &o1 (name “Eva”, age 40) and &o2 (name “Abel”, age 20), and a child edge from &o1 to &o2]
{person: &o1{name: “Eva”, age: 40, child: &o2}, person: &o2{name: “Abel”, age: 20}}
An object identifier, such as &o1, before a structure, binds the object identifier to the identity of that structure. The object identifier can then be used to refer to the structure.
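The binding of object identifiers can be made concrete with a small Python sketch. It is only one possible encoding, not something the slides prescribe: the names Ref, objects, follow and children are hypothetical, and each object is assumed to be a list of (label, value) edges.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """A reference to an object identifier such as &o2 (hypothetical helper)."""
    oid: str

# Each object identifier is bound to one structure: a list of (label, value) edges.
objects = {
    "&o1": [("name", "Eva"), ("age", 40), ("child", Ref("&o2"))],
    "&o2": [("name", "Abel"), ("age", 20)],
}
root = [("person", Ref("&o1")), ("person", Ref("&o2"))]

def follow(value):
    """Dereference a Ref to the structure it is bound to; atoms pass through."""
    return objects[value.oid] if isinstance(value, Ref) else value

def children(edges, label):
    """All values reached from a structure over edges carrying `label`."""
    return [follow(v) for (l, v) in edges if l == label]

eva = children(root, "person")[0]
abel_via_child = children(eva, "child")[0]
print(dict(abel_via_child)["name"])  # Abel
```

Because child holds a reference rather than a copy, the structure reached through Eva's child edge is the same object as the second person.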

Terminology
The following is an ssd-expression: &o1{name: “Eva”, age: 40, child: &o2}
Here &o1 is an object identifier, name, age and child are labels, and “Eva” and 40 are values.

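A database
[Figure: the database db drawn as a tree. Below the biblio edge there are a paper node n1 (authors Crick and Wallace, title DNA spiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]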

Path expressions
A path expression is a sequence of labels: l1.l2. ... .ln
A path expression results in a set of nodes.
Path properties are specified by regular expressions on two levels: on the alphabet of labels and on the alphabet of characters that comprise labels.

A path expression
[Figure: the biblio database tree with paper n1 and books n2 and n3]
biblio.book.author

A path expression
[Figure: the biblio database tree with paper n1 and books n2 and n3]
biblio.(book | paper).author

Examples of path expressions
• biblio.book.author - authors of books
• biblio.paper.author - authors of papers
• biblio.(book | paper).author - authors of books or papers
• biblio._.author - authors of anything
• biblio._*.author - nodes at the ends of paths starting with biblio, ending with author, and having an arbitrary sequence of labels between

Example of a label pattern
• ((b | B)ook | (a | A)uthor)(s)? - book, Book, author, Author, books, Books, authors, Authors
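The label pattern can be checked directly with Python's re module. This is a small sketch that assumes the pattern must match the whole label; label_matches is a hypothetical helper name.

```python
import re

# The label pattern from the slide, written without the spaces around "|".
LABEL_PATTERN = re.compile(r"((b|B)ook|(a|A)uthor)(s)?")

def label_matches(label):
    """True if the whole label matches the pattern."""
    return LABEL_PATTERN.fullmatch(label) is not None

accepted = ["book", "Book", "author", "Author",
            "books", "Books", "authors", "Authors"]
print(all(label_matches(label) for label in accepted))  # True
print(label_matches("paper"))                           # False
```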

An exercise
biblio._*.author.(“[s|S]ection”)
Which of the following paths match the path expression above?
1. Biblio.author.Section
2. Biblio.cat.rat.hat.author.section
3. Biblio.author
4. Biblio.cat.author.section.Section

A simple query
select author: X from biblio.book.author X
Result: {author: “Darwin”, author: “Marx”}

A query with a condition
select row: X from biblio._ X where “Crick” in X.author
Result: {row: {author: “Crick”, author: “Wallace”, date: 1956, title: “The spiral DNA”}, …}
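The select-from-where form maps naturally onto a list comprehension. The sketch below assumes the same hypothetical encoding of nodes as lists of (label, value) edges and covers only the three entries visible in the example database; values is a hypothetical helper.

```python
# Hypothetical encoding of the biblio node: a list of (label, value) edges.
biblio = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("date", 1956), ("title", "The spiral DNA")]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
]

def values(node, label):
    """All values reached from `node` over edges carrying `label`."""
    return [v for (l, v) in node if l == label]

# select row: X from biblio._ X where "Crick" in X.author
result = [("row", x) for (_label, x) in biblio if "Crick" in values(x, "author")]

print(len(result))                     # 1
print(values(result[0][1], "author"))  # ['Crick', 'Wallace']
```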

Two exercises
select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z

A database
[Figure: the biblio database tree with paper n1 and books n2 and n3]
select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z

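A database
[Figure: the database db drawn as a tree. Below the biblio edge there are a paper node n1 (authors Crick and Wallace, title DNA spiral, date 1956), a book node n2 (author Darwin, title Origin, date 1848), a book node n3 (author Marx, title Kapital, date 1860), and further entries.]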

Nested queries
select row: (select author: Y from X.author Y) from biblio.book X
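A nested query corresponds to a nested comprehension: the inner query is evaluated once per binding of X. The sketch assumes the hypothetical list-of-edges encoding and only the entries visible in the example database.

```python
# Hypothetical encoding of the biblio node: a list of (label, value) edges.
biblio = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
    ("book", [("author", "Marx"), ("title", "Kapital"), ("date", 1860)]),
]

def values(node, label):
    """All values reached from `node` over edges carrying `label`."""
    return [v for (l, v) in node if l == label]

# select row: (select author: Y from X.author Y) from biblio.book X
result = [("row", [("author", y) for y in values(x, "author")])
          for x in values(biblio, "book")]

print(result)
# [('row', [('author', 'Darwin')]), ('row', [('author', 'Marx')])]
```

Each book yields one row, and the authors of that book are grouped inside it.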

Three exercises
• Which authors have written a book or a paper in 1992?
• Which authors have written a book together with Jones?
• Which authors have written both a book and a paper?

Expressing relations
[Figure: two tables, r1 with columns a, b, c and r2 with columns b, d, e]
{ r1: { row: {a: 1, b: 2, c: 2}, row: {a: 1, b: 2, c: 2}, row: {a: 1, b: 2, c: 2} },
  r2: { row: {b: 1, d: 2, e: 2}, row: {b: 1, d: 2, e: 2}, row: {b: 1, d: 2, e: 2} } }

Expressing relational joins
select a: A, d: D from r1.row X, r2.row Y, X.a A, X.b B, Y.b B’, Y.d D where B = B’
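The join query binds one row from each relation and keeps the pairs that agree on b. A minimal Python sketch follows; the row values are hypothetical and chosen so that the join is non-empty, and each row is assumed to be a dict because labels are unique within a row.

```python
# Hypothetical rows; the attribute names a, b, c, d, e follow the slide.
r1 = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
r2 = [{"b": 2, "d": 7, "e": 8}, {"b": 9, "d": 10, "e": 11}]

# select a: A, d: D from r1.row X, r2.row Y, ... where B = B'
joined = [{"a": x["a"], "d": y["d"]}
          for x in r1
          for y in r2
          if x["b"] == y["b"]]

print(joined)  # [{'a': 1, 'd': 7}]
```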

Label variables
select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X)
L is a label variable.
[Figure: biblio tree with book n2 (author Shakespeare, title Macbeth, date 1622), book n3 (author Smith, title Best of Shakespeare, date 1992), and further entries]

Label variables
select L: X from biblio._*.L X where matches(“.*Shakespeare.*”, X)
{author: “Shakespeare”, title: “Best of Shakespeare”}
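A label variable ranges over edge labels, so the query above visits every edge below biblio and returns the label together with the matching value. The sketch assumes the hypothetical list-of-edges encoding, the two books shown on the slide, and that matches requires the pattern to cover the whole string; edges_below is a hypothetical helper.

```python
import re

# Hypothetical encoding of the two books shown on the slide.
biblio = [
    ("book", [("author", "Shakespeare"), ("title", "Macbeth"), ("date", 1622)]),
    ("book", [("author", "Smith"), ("title", "Best of Shakespeare"),
              ("date", 1992)]),
]

def edges_below(node):
    """Yield every (label, value) edge reachable from `node`, at any depth."""
    for label, value in node:
        yield label, value
        if isinstance(value, list):
            yield from edges_below(value)

# select L: X from biblio._*.L X where matches(".*Shakespeare.*", X)
result = [(label, x) for (label, x) in edges_below(biblio)
          if isinstance(x, str) and re.fullmatch(r".*Shakespeare.*", x)]

print(result)
# [('author', 'Shakespeare'), ('title', 'Best of Shakespeare')]
```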

Turning labels into data
select publ: {type: L, author: A} from biblio.L X, X.author A
[Figure: biblio tree with paper n1 (authors Crick and Wallace, title DNA spiral, date 1956) and book n2 (author Darwin, title Origin, date 1848)]
{publ: {type: “paper”, author: “Crick”}, publ: {type: “paper”, author: “Wallace”}, publ: {type: “book”, author: “Darwin”}}
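Since a label variable is bound to an ordinary string, it can be copied into the result as data. A minimal sketch, assuming the hypothetical list-of-edges encoding and only the two entries shown in the slide's figure:

```python
# Hypothetical encoding of the entries shown on the slide (n1 and n2).
biblio = [
    ("paper", [("author", "Crick"), ("author", "Wallace"),
               ("title", "The spiral DNA"), ("date", 1956)]),
    ("book", [("author", "Darwin"), ("title", "Origin"), ("date", 1848)]),
]

def values(node, label):
    """All values reached from `node` over edges carrying `label`."""
    return [v for (l, v) in node if l == label]

# select publ: {type: L, author: A} from biblio.L X, X.author A
result = [("publ", [("type", label), ("author", a)])
          for (label, x) in biblio
          for a in values(x, "author")]

for _publ, fields in result:
    print(dict(fields))
# {'type': 'paper', 'author': 'Crick'}
# {'type': 'paper', 'author': 'Wallace'}
# {'type': 'book', 'author': 'Darwin'}
```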

An exercise
• List all publications in 1992, their types, and titles.

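Basic XML syntax
XML is a textual representation of data.
An element is text bounded by tags: a start-tag, the content (John in the slide's example), and an end-tag.
An empty element, that is a start-tag immediately followed by its end-tag, can be abbreviated as a single empty-element tag.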
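The slide's markup did not survive extraction, so the following Python sketch uses hypothetical element names (name and married) to illustrate the terminology with the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; only the content "John" comes from the slide.
elem = ET.fromstring("<name>John</name>")  # start-tag, content, end-tag
print(elem.tag)   # name
print(elem.text)  # John

# An empty element and its abbreviated form parse to the same structure.
full = ET.fromstring("<married></married>")
short = ET.fromstring("<married/>")
print(full.tag == short.tag)                    # True
print((full.text or "") == (short.text or ""))  # True
print(len(full) == len(short) == 0)             # True: no subelements
```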

Basic XML syntax Elements may contain subelements John

XML attributes An attribute is defined by a name-value pair within a tag

XML attributes and elements widget 10 widget
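The same piece of information can be encoded either as an attribute or as a subelement. The slide's markup was lost, so the element names product, name and price below are assumptions; only the values widget and 10 come from the slide. A short sketch with xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the price stored as an attribute ...
as_attribute = ET.fromstring(
    '<product price="10"><name>widget</name></product>')
# ... and the same price stored as a subelement.
as_element = ET.fromstring(
    '<product><name>widget</name><price>10</price></product>')

print(as_attribute.get("price"))     # 10
print(as_element.findtext("price"))  # 10
print(as_attribute.findtext("name") == as_element.findtext("name"))  # True
```

An attribute can occur at most once per element and holds a plain string, whereas a subelement can be repeated and can carry further structure.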

XML and ssd-expressions John {person: {name: “John”, tel: ,

XML references John Peter element identifier reference attribute

Document Type Definitions <!DOCTYPE db [ ]>

An exercise on DTDs as schemas a1 b1 a2 b2 a1 b1 c2 d2 a1 b1 Write down a DTD for the data above!

Attributes in DTDs
trumpet
<!ATTLIST name language CDATA #REQUIRED department CDATA #IMPLIED>

Reference attributes in DTDs
<!DOCTYPE people [ <!ATTLIST person id ID #REQUIRED boss IDREF #REQUIRED friends IDREFS #IMPLIED> ]>
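What the ID, IDREF and IDREFS declarations demand can be sketched as a hand-written check. The standard library's xml.etree.ElementTree does not validate against a DTD, so the rules are coded explicitly here; the document, the person names and the helper check_references are all hypothetical, and the people root element is assumed from the DOCTYPE name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the attribute names come from the ATTLIST.
doc = """<people>
  <person id="anna" boss="bert" friends="bert carl">Anna</person>
  <person id="bert" boss="bert">Bert</person>
  <person id="carl" boss="dora">Carl</person>
</people>"""

def check_references(xml_text):
    """Return a list of violations of the id/boss/friends declarations."""
    persons = ET.fromstring(xml_text).findall("person")
    ids = [p.get("id") for p in persons]
    errors = []
    if None in ids:
        errors.append("id is #REQUIRED")
    if len(ids) != len(set(ids)):
        errors.append("ID values must be unique")
    for p in persons:
        who = p.get("id")
        boss = p.get("boss")
        if boss is None:
            errors.append(f"{who}: boss is #REQUIRED")
        elif boss not in ids:            # IDREF: exactly one existing ID
            errors.append(f"{who}: boss '{boss}' is not an ID")
        for friend in (p.get("friends") or "").split():  # IDREFS: list of IDs
            if friend not in ids:
                errors.append(f"{who}: friend '{friend}' is not an ID")
    return errors

print(check_references(doc))  # ["carl: boss 'dora' is not an ID"]
```

Note that the check cannot say what kind of element a reference points to, only that some element carries that ID.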

An exercise
id = “sven” boss = “olle”> Sven Svensson
id = “olle” friends = “nils eva”> Olle Olsson
id = “pelle” boss = “nils eva”> Per Persson
Does this XML element conform to the previous DTD?

Limitations of DTDs as schemas
• DTDs impose order
• No base types
• The types of IDREFs cannot be constrained

XSL - extensible stylesheet language t1 a1 a2 t2 a3 a4 t3 a5 a6

Template rules and XSL patterns } Template rule XSL pattern t1 t2 t3

Two exercises
select row: {title: Y, date: Z} from biblio.paper X, X.title Y, X.date Z
{row: {title: “The spiral DNA”, date: 1956}, {title: “Origin”, date: 1848}, {title: “Kapital”, date: 1860}}
select row: {author: Y, date: Z} from biblio.book X, X.author Y, X.date Z

Which authors have written a book or a paper in 1992?
select author: X from biblio.(book | paper) Y, Y.author X where Y.date = 1992

Which authors have written a book together with Jones?
select author: X from biblio.book Y, Y.author X where “Jones” in Y.author

Which authors have written both a book and a paper?
select author: A from biblio.book B, biblio.paper P, B.author A where B.author = P.author
select author: A1 from biblio.book B, biblio.paper P, B.author A1, P.author A2 where A1 = A2
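The second formulation binds one author from a book and one from a paper and keeps the pairs that are equal. A minimal Python sketch of that formulation follows; the data is hypothetical (the example database has no author with both a book and a paper), and a set is used so that each author is reported once.

```python
# Hypothetical data in the list-of-edges encoding.
biblio = [
    ("book", [("author", "Jones"), ("author", "Smith"), ("title", "T1")]),
    ("paper", [("author", "Smith"), ("title", "T2")]),
    ("paper", [("author", "Lee"), ("title", "T3")]),
]

def values(node, label):
    """All values reached from `node` over edges carrying `label`."""
    return [v for (l, v) in node if l == label]

books = values(biblio, "book")
papers = values(biblio, "paper")

# select author: A1 from biblio.book B, biblio.paper P,
#        B.author A1, P.author A2 where A1 = A2
both = sorted({a1
               for b in books for a1 in values(b, "author")
               for p in papers for a2 in values(p, "author")
               if a1 == a2})

print(both)  # ['Smith']
```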

List all publications in 1992, their types, and titles.
select publ: {type: L, title: T} from biblio.L X, X.title T where X.date = 1992

<!DOCTYPE db [ ]> a1 b1 a2 b2 a1 b1 c2 d2 a1 b1