Database Management Systems, R. Ramakrishnan: Introduction to Semistructured Data and XML. Chapter 27, Part D. Based on slides by Dan Suciu, University of Washington.

Presentation transcript:

Slide 1: Introduction to Semistructured Data and XML
Chapter 27, Part D
Based on slides by Dan Suciu, University of Washington

Slide 2: How the Web is Today
- HTML documents
  - often generated by applications
  - consumed by humans only
  - easy access: across platforms, across organizations
- No application interoperability:
  - HTML not understood by applications: screen scraping is brittle
  - Database technology: client-server, still vendor specific

Slide 3: New Universal Data Exchange Format: XML
A recommendation from the W3C
- XML = data
- XML generated by applications
- XML consumed by applications
- Easy access: across platforms, organizations

Slide 4: Paradigm Shift on the Web
- From documents (HTML) to data (XML)
- From information retrieval to data management
- For databases, also a paradigm shift:
  - from relational model to semistructured data
  - from data processing to data/query translation
  - from storage to transport

Slide 5: Semistructured Data
Origins:
- Integration of heterogeneous sources
- Data sources with non-rigid structure
  - Biological data
  - Web data

Slide 6: The Semistructured Data Model
Object Exchange Model (OEM)
[Figure: OEM graph rooted at Bib (&o1), with paper, book, and paper edges to &o12, &o24, and &o29. Internal nodes are complex objects; leaves such as “Serge”, “Abiteboul”, 1997, “Victor”, and “Vianu” are atomic objects. Edge labels include author, title, year, publisher, references, page, firstname, lastname, first, last.]

Slide 7: Syntax for Semistructured Data
Bib: &o1 { paper: &o12 { … },
           book: &o24 { … },
           paper: &o29
             { author: &o52 “Abiteboul”,
               author: &o96 { firstname: &o243 “Victor”,
                              lastname: &o206 “Vianu” },
               title: &o93 “Regular path queries with constraints”,
               references: &o12,
               references: &o24,
               pages: &o25 { first: &o64 122, last: &o92 133 }
             }
         }
Observe: nested tuples, set values, oids!
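The oid-based syntax above can be read as a graph: a table from oid to object, where a complex object is a list of labeled edges and an atomic object is a plain value. The following is a minimal Python sketch under that assumption. The `objects` table and the `children` helper are illustrative names, not part of OEM, and the two objects whose contents the slide elides are left empty.

```python
# Oid table: complex objects are lists of (label, oid) edges,
# atomic objects are plain Python values.
objects = {
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o12": [],  # contents elided on the slide
    "&o24": [],  # contents elided on the slide
    "&o29": [
        ("author", "&o52"),
        ("author", "&o96"),
        ("title", "&o93"),
        ("references", "&o12"),
        ("references", "&o24"),
        ("pages", "&o25"),
    ],
    "&o52": "Abiteboul",
    "&o96": [("firstname", "&o243"), ("lastname", "&o206")],
    "&o243": "Victor",
    "&o206": "Vianu",
    "&o93": "Regular path queries with constraints",
    "&o25": [("first", "&o64"), ("last", "&o92")],
    "&o64": 122,
    "&o92": 133,
}


def children(oid, label):
    """Return the oids reached from `oid` by edges named `label`."""
    return [target for (name, target) in objects[oid] if name == label]


# A label may repeat (set-valued attributes), and several edges may point
# at the same object, which makes the data a graph rather than a tree.
paper_oids = children("&o1", "paper")
cited = children("&o29", "references")
```

Storing edges as a list of pairs rather than a dict keeps repeated labels such as the two `author` edges.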

Slide 8: Syntax for Semistructured Data
May omit oids:
{ paper: { author: “Abiteboul”,
           author: { firstname: “Victor”,
                     lastname: “Vianu” },
           title: “Regular path queries …”,
           page: { first: 122, last: 133 }
         }
}
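Without oids the data is simply a tree of label/value pairs. A rough Python rendering, assuming a list-of-pairs encoding (the `values` helper is a hypothetical name), shows why a plain dictionary is not quite enough: the label `author` occurs twice.

```python
# Each complex object is a list of (label, value) pairs so that a label
# can occur more than once; atomic values are plain strings or numbers.
paper = [
    ("author", "Abiteboul"),
    ("author", [("firstname", "Victor"), ("lastname", "Vianu")]),
    ("title", "Regular path queries …"),
    ("page", [("first", 122), ("last", 133)]),
]
db = [("paper", paper)]


def values(obj, label):
    """Return every value stored under `label` in a complex object."""
    return [value for (name, value) in obj if name == label]


authors = values(paper, "author")          # two entries, of different types
last_page = values(values(paper, "page")[0], "last")
```

Note that the two `author` values have different shapes (a string and a nested object), which is exactly the irregularity that semistructured data permits.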

Slide 9: Characteristics of Semistructured Data
- Missing or additional attributes
- Multiple attributes
- Different types in different objects
- Heterogeneous collections
Self-describing, irregular data, no a priori structure

Slide 10: Comparison with Relational Data
name     phone
“John”   3634
“Sue”    6343
“Dick”   6363
The same relation as semistructured data:
{ row: { name: “John”, phone: 3634 },
  row: { name: “Sue”, phone: 6343 },
  row: { name: “Dick”, phone: 6363 }
}
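The encoding of a relation as semistructured data is mechanical: every tuple becomes a `row` object whose labels are the column names. A small Python sketch of that translation follows; the function name `relation_to_ssd` and the list-of-pairs encoding are assumptions made for illustration.

```python
columns = ("name", "phone")
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]


def relation_to_ssd(columns, rows):
    """Turn each tuple into a ("row", [(column, value), ...]) object."""
    return [("row", list(zip(columns, row))) for row in rows]


ssd = relation_to_ssd(columns, rows)
```

The column names are repeated in every row, which is what makes the result self-describing and also what makes it more verbose than the relational form.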

Slide 11: XML
- A W3C standard to complement HTML
- Origins: structured text, SGML
  - Large-scale electronic publishing
  - Data exchange on the web
- Motivation:
  - HTML describes presentation
  - XML describes content
- XML 1.0 Second Edition (W3C Recommendation, 10/2000)

Slide 12: From HTML to XML
HTML describes the presentation

Slide 13: HTML
Bibliography
Foundations of Databases
Abiteboul, Hull, Vianu
Addison Wesley, 1995
Data on the Web
Abiteboul, Buneman, Suciu
Morgan Kaufmann, 1999

Slide 14: XML
Foundations… Abiteboul Hull Vianu Addison Wesley 1995 …
XML describes the content
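To make "XML describes the content" concrete, the sketch below marks up the first bibliography entry and reads it back with Python's standard `xml.etree.ElementTree` module. The element names (`bibliography`, `book`, `title`, `author`, `publisher`, `year`) are assumptions chosen for illustration; the point is only that an application can address each piece of content by name instead of scraping it from a rendered page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the first bibliography entry.
doc = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
book = root.find("book")

# Content is addressed by element name rather than by position on a page.
authors = [author.text for author in book.findall("author")]
year = int(book.findtext("year"))
```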

Slide 15: Why are we DB’ers interested?
- It’s data, stupid. That’s us.
- Proof by Google: database+XML – 1,940,000 pages.
- Database issues:
  - How are we going to model XML? (graphs)
  - How are we going to query XML? (XQuery)
  - How are we going to store XML? (in a relational database? object-oriented? native?)
  - How are we going to process XML efficiently? (many interesting research questions!)

Slide 16: Document Type Definitions (DTDs)
- Sort of like a schema, but not really
- Inherited from the SGML DTD standard
- BNF grammar establishing constraints on element structure and content
- Definitions of entities

Slide 17: Shortcomings of DTDs
Useful for documents, but not so good for data:
- Element name and type are associated globally
- No support for structural re-use
  - Object-oriented-like structures aren’t supported
- No support for data types
  - Can’t do data validation
- Can have a single key item (ID), but:
  - No support for multi-attribute keys
  - No support for foreign keys (references to other keys)
  - No constraints on IDREFs (e.g., cannot require that an IDREF reference only a Section)

Slide 18: XML Schema
- In XML format
- Element names and types associated locally
- Includes primitive data types (integers, strings, dates, etc.)
- Supports value-based constraints (integers > 100)
- User-definable structured types
- Inheritance (extension or restriction)
- Foreign keys
- Element-type reference constraints

Slide 19: Sample XML Schema
…

Slide 20: Important XML Standards
- XSL/XSLT: presentation and transformation standards
- RDF: Resource Description Framework (meta-info such as ratings, categorizations, etc.)
- XPath/XPointer/XLink: standards for linking to documents and to elements within them
- Namespaces: for resolving name clashes
- DOM: Document Object Model for manipulating XML documents
- SAX: Simple API for XML parsing
- XQuery: query language

Slide 21: XML Data Model (Graph)
Issues:
- Distinguish between attributes and sub-elements?
- Should we preserve order?

Slide 22: XML Terminology
- Tags: book, title, author, …
  - start tag: <book>, end tag: </book>
- Elements: <book>…</book>, <author>…</author>
  - elements can be nested
  - empty element: <red></red> (can be abbreviated <red/>)
- XML document: has a single root element
- Well-formed XML document: has matching tags
- Valid XML document: conforms to a schema
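Well-formedness can be checked mechanically by any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on mismatched tags or on more than one root element. The helper name `is_well_formed` and the sample strings are made up for illustration. Note that this parser does not check validity against a DTD or schema, so it covers only the "well-formed" half of the terminology.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


nested_ok = is_well_formed("<book><title>Data on the Web</title></book>")
mismatched = is_well_formed("<book><title>Data on the Web</book></title>")
two_roots = is_well_formed("<book/><book/>")

# An empty element written in abbreviated form has no text and no children.
empty = ET.fromstring("<empty/>")
```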

Slide 23: More XML: Attributes
<book price = “55” currency = “USD”>
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Attributes are alternative ways to represent data
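The claim that attributes are an alternative representation can be illustrated by encoding the same two facts both ways. In the sketch below the attribute names `price` and `currency` follow the book example used in the data-model slides, while the sub-element variant is an assumed alternative encoding.

```python
import xml.etree.ElementTree as ET

# Price and currency carried as attributes of <book>.
as_attributes = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "</book>"
)

# The same information carried as sub-elements (assumed alternative).
as_elements = ET.fromstring(
    "<book>"
    "<price>55</price><currency>USD</currency>"
    "<title>Foundations of Databases</title>"
    "</book>"
)

price_from_attribute = as_attributes.get("price")
price_from_element = as_elements.findtext("price")
```

The two encodings are not fully interchangeable: an attribute name can appear at most once per element and attributes are unordered, whereas sub-elements can repeat and keep their order.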

Slide 24: More XML: Oids and References
Jane Mary John
oids and references in XML are just syntax
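"Just syntax" means that a reference is an ordinary attribute value that happens to match another element's identifier; following it is up to the application. The sketch below is a hypothetical illustration: the `id`, `mother`, and `children` attributes, the id values, and the family relationships are all invented here, with only the three names taken from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: ids and references are plain attribute strings.
doc = """
<people>
  <person id="p1"><name>Jane</name></person>
  <person id="p2" children="p3 p1"><name>Mary</name></person>
  <person id="p3" mother="p2"><name>John</name></person>
</people>
"""

root = ET.fromstring(doc)

# The application builds its own index from id value to element.
by_id = {person.get("id"): person for person in root.iter("person")}


def name_of(oid):
    """Dereference an id string and return that person's name."""
    return by_id[oid].findtext("name")


mother_of_john = name_of(by_id["p3"].get("mother"))
children_of_mary = [name_of(oid) for oid in by_id["p2"].get("children").split()]
```

Nothing in this parser enforces that a referenced id exists; a DTD with ID/IDREF declarations and a validating parser would be needed for that.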

Slide 25: XML-Query Data Model
- Describes XML data as a tree
- Node ::= DocNode | ElemNode | ValueNode | AttrNode | NSNode | PINode | CommentNode | InfoItemNode | RefNode

Slide 26: XML-Query Data Model
Element node (simplified definition):
- elemNode : (QNameValue, {AttrNode}, [ElemNode | ValueNode]) → ElemNode
- QNameValue means “a tag name”
Reads: “Give me a tag, a set of attributes, a list of elements/values, and I will return an element”

Slide 27: XML Query Data Model
Example:
<book price = “55” currency = “USD”>
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
book1 = elemNode(book, {price2, currency3}, [title4, author5, author6, author7, year8])
price2 = attrNode(…) /* next */
currency3 = attrNode(…)
title4 = elemNode(title, string9)
…

Slide 28: XML Query Data Model
Attribute node:
- attrNode : (QNameValue, ValueNode) → AttrNode

Slide 29: XML Query Data Model
Example:
<book price = “55” currency = “USD”>
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
price2 = attrNode(price, string10)
string10 = valueNode(…) /* next */
currency3 = attrNode(currency, string11)
string11 = valueNode(…)

Slide 30: XML Query Data Model
Value node:
- ValueNode = StringValue | BoolValue | FloatValue …
- stringValue : string → StringValue
- boolValue : boolean → BoolValue
- floatValue : float → FloatValue

Slide 31: XML Query Data Model
Example:
<book price = “55” currency = “USD”>
  <title> Foundations … </title>
  <author> Abiteboul </author>
  <author> Hull </author>
  <author> Vianu </author>
  <year> 1995 </year>
</book>
price2 = attrNode(price, string10)
string10 = valueNode(stringValue(“55”))
currency3 = attrNode(currency, string11)
string11 = valueNode(stringValue(“USD”))
title4 = elemNode(title, string9)
string9 = valueNode(stringValue(“Foundations…”))
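The constructor signatures on the preceding slides can be mirrored with a few Python dataclasses. This is only a sketch of the simplified model, not the W3C data model itself: the class and field names are assumptions, the set of attributes is held in a list (dataclass instances are unhashable by default), and the slide's shorthand `elemNode(title, string9)` is expanded to an empty attribute list plus a one-item child list.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ValueNode:
    value: Union[str, bool, float]  # StringValue | BoolValue | FloatValue


@dataclass
class AttrNode:
    name: str            # QNameValue, i.e. the attribute name
    value: ValueNode


@dataclass
class ElemNode:
    name: str                                           # the tag name
    attributes: List[AttrNode] = field(default_factory=list)
    children: List[Union["ElemNode", ValueNode]] = field(default_factory=list)


# Build part of the book example bottom-up, reusing the slide's variable names.
string10 = ValueNode("55")
price2 = AttrNode("price", string10)
string11 = ValueNode("USD")
currency3 = AttrNode("currency", string11)
string9 = ValueNode("Foundations…")
title4 = ElemNode("title", [], [string9])

# Only the title child is constructed here; authors and year are omitted.
book1 = ElemNode("book", [price2, currency3], [title4])
```

Keeping `children` as a list preserves document order, matching the bracketed list in the `elemNode` signature.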

Slide 32: XML vs. Semistructured Data
- Both described best by a graph
- Both are schema-less, self-describing
- XML is ordered, ssd is not
- XML can mix text and elements: Making Java easier to type and easier to type Phil Wadler
- XML has lots of other stuff: attributes, entities, processing instructions, comments
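Two of these differences, mixed content and ordering, are easy to observe with Python's standard `xml.etree.ElementTree`. In the sketch below the element names `talk` and `speaker` are hypothetical, chosen only to wrap the slide's text; the second document is an invented example showing that sibling order is part of the data.

```python
import xml.etree.ElementTree as ET

# Mixed content: text and a child element side by side in one parent.
talk = ET.fromstring(
    "<talk>Making Java easier to type and easier to type"
    "<speaker>Phil Wadler</speaker></talk>"
)
text_before_child = talk.text          # text preceding the first child
speaker = talk.find("speaker").text

# Order: children come back in document order, including repeated tags.
ordered = [child.tag for child in ET.fromstring("<r><b/><a/><b/></r>")]
```

In the label/value model of semistructured data there is no natural place for the free text next to `speaker`, and no notion of which sibling comes first.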