From Semistructured Data to XML: Migrating The Lore Data Model and Query Language
Roy Goldman, Jason McHugh, Jennifer Widom
Stanford University


Introduction
Lore
  – Originally a DBMS designed specifically for semistructured data
  – Semistructured data models and XML share many similarities
  – Migrating Lore to work with XML
      Modifications to data model
      Changes to query language
      Changes to DataGuides

OEM (Object Exchange Model)
Lore’s original data model
All entities are atomic or complex objects
Each object has a unique object identifier (oid)
Atomic objects contain a value from one of the atomic types (integer, real, string, etc.)
Complex objects are sets of <label, oid> pairs
Can be thought of as a labeled directed graph
  – objects are nodes
  – complex objects have labeled outgoing edges
  – atomic objects contain their value
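
The graph view of OEM can be sketched in a few lines of Python. This is only an illustration, not Lore's implementation: the class name `OEMObject`, the helper `follow`, and the sample oids and labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple, Union

Atomic = Union[int, float, str]


@dataclass
class OEMObject:
    """Hypothetical OEM object: atomic (holds a value) or complex (holds edges)."""
    oid: str
    atom: Optional[Atomic] = None                       # set only for atomic objects
    children: Set[Tuple[str, str]] = field(default_factory=set)  # (label, oid) pairs

    @property
    def is_atomic(self) -> bool:
        return self.atom is not None


def follow(db: Dict[str, OEMObject], oid: str, label: str) -> Set[str]:
    """Return the oids reached from `oid` along outgoing edges labeled `label`."""
    return {target for (lbl, target) in db[oid].children if lbl == label}


# Made-up sample database: one complex root with two Member subobjects.
db = {
    "&1": OEMObject("&1", children={("Member", "&2"), ("Member", "&3")}),
    "&2": OEMObject("&2", children={("Name", "&4")}),
    "&3": OEMObject("&3", children={("Name", "&5")}),
    "&4": OEMObject("&4", atom="Jennifer Widom"),
    "&5": OEMObject("&5", atom="Jeff Ullman"),
}

members = follow(db, "&1", "Member")
```

Because a complex object is a set of pairs, the two Member edges carry no order, which is one of the points where XML differs.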

Differences between XML and OEM
XML has attributes
XML is ordered, OEM is not
XML does not directly support graph structure
  – Uses special attribute types to encode graph structure
  – Example: attribute Id is of type ID, Colleague is of type IDREF, and Author is of type IDREFS
    [XML example not recoverable from the transcript; surviving fragments: Colleague, Author, Jennifer Widom, Jeff Ullman]
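
A minimal sketch of how ID/IDREF(S) attributes encode graph links, using Python's standard `xml.etree.ElementTree`. The document below is hypothetical (it is not the slide's example), and because ElementTree does not read attribute types from a DTD, the typing of Id, Colleague, and Author is hard-coded as an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names and id values are made up.
DOC = """
<DBGroup>
  <Member Id="m1" Colleague="m2"><Name>Jennifer Widom</Name></Member>
  <Member Id="m2"><Name>Jeff Ullman</Name></Member>
  <Paper Author="m1 m2"><Title>Example</Title></Paper>
</DBGroup>
"""

ID_ATTR = "Id"                          # assumed to be of type ID
IDREF_ATTRS = ("Colleague", "Author")   # assumed IDREF / IDREFS


def resolve_links(root):
    """Return (source tag, attribute name, target Name text) for every reference."""
    by_id = {el.get(ID_ATTR): el for el in root.iter() if el.get(ID_ATTR) is not None}
    links = []
    for el in root.iter():
        for attr in IDREF_ATTRS:
            # An IDREFS value is a whitespace-separated list of ids.
            for ref in el.get(attr, "").split():
                links.append((el.tag, attr, by_id[ref].findtext("Name")))
    return links


links = resolve_links(ET.fromstring(DOC.strip()))
```

Read as plain text, the document is a tree; once the references are resolved, the Colleague and Author attributes become edges of a graph.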

Literal vs. Semantic Data Model
Should an XML data model be a literal tree corresponding to XML’s text representation? (where IDREF(S) are nothing but string attributes)
Or should it be a graph that includes all the intended links? (preserving the semantic graph structure)
It should be... BOTH!
  – Both literal and semantic modes should be supported
  – The user or application can select between the two

Lore’s XML Data Model
An XML element is a pair <eid, value>
eid is a unique element identifier
value is either an atomic text string or a complex value containing the following four components:
  – A string-valued tag corresponding to the XML tag for that element
  – An ordered list of attribute-name/atomic-value pairs (attribute-name is a string, atomic-value has an atomic type)
  – An ordered list of crosslink subelements of the form <label, eid>, where label is a string. Crosslink subelements are introduced via an attribute of type IDREF(S)
  – An ordered list of normal subelements of the form <label, eid>, where label is a string. Normal subelements are introduced via lexical nesting within an XML document

XML Document/Graph Example
eids appear within nodes (&1, &2, etc.)
Attributes appear within brackets next to the nodes
Two types of edges:
  – Normal subelement edges labeled with destination subelement’s tag (solid line)
  – Crosslink edges labeled with the attribute name that introduced the link (dashed line)
Semantic vs. Literal:
  – In semantic mode, omit attributes of type IDREF(S)
  – In literal mode, omit crosslink edges
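
The two modes can be sketched as two views over one stored element. The `Element` class and `view` function below are hypothetical names, not Lore's API. The sketch also assumes that an attribute is of type IDREF(S) exactly when its name labels one of the element's crosslinks, and it leaves out atomic text elements for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pair = Tuple[str, str]


@dataclass
class Element:
    """Hypothetical complex element: tag plus three ordered lists."""
    eid: str
    tag: str
    attributes: List[Pair] = field(default_factory=list)   # (name, value)
    crosslinks: List[Pair] = field(default_factory=list)   # (label, eid)
    subelements: List[Pair] = field(default_factory=list)  # (label, eid)


def view(e: Element, mode: str) -> Tuple[List[Pair], List[Pair]]:
    """Return (visible attributes, visible edges) for the chosen mode."""
    if mode == "semantic":
        # Omit the IDREF(S) attributes; their links show up as crosslink edges.
        linked = {label for label, _ in e.crosslinks}
        attrs = [(n, v) for n, v in e.attributes if n not in linked]
        return attrs, e.subelements + e.crosslinks
    if mode == "literal":
        # Keep every attribute as a plain string; omit crosslink edges.
        return list(e.attributes), list(e.subelements)
    raise ValueError(f"unknown mode: {mode}")


member = Element(
    eid="&2",
    tag="Member",
    attributes=[("Id", "m1"), ("Colleague", "m2")],
    crosslinks=[("Colleague", "&3")],
    subelements=[("Name", "&4")],
)
semantic = view(member, "semantic")
literal = view(member, "literal")
```

Storing both the attribute and the crosslink, and filtering at read time, is one way to let the user or application switch modes without reloading the data.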

Migrating Lorel (Lore’s query language)
Distinguishing between attributes and subelements
  – Lorel uses path expressions
      A sequence of labels such as DBGroup.Member.Project.Title
      Can also contain wildcards and regular expressions
  – Path expression qualifiers differentiate between attributes and subelements
      Placing a '>' before a label matches subelements only
      Placing a '@' before a label matches attributes only
      Absence of qualifier means match both
  – Examples:
      DBGroup.Member.>Name will match Name elements that are subelements of DBGroup.Member elements
      DBGroup.Member.@Name will match Name attributes of DBGroup.Member elements
      DBGroup.Member.Name will match both
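
The effect of the qualifiers can be imitated over a small document with Python's `xml.etree.ElementTree`. This is a rough sketch, not the Lorel evaluator: it assumes `@` as the attribute qualifier, handles a qualifier only on the last label, requires at least two labels, and ignores wildcards and regular expressions. The function name `evaluate` and the sample document are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which Name occurs both as attribute and subelement.
DOC = """
<DBGroup>
  <Member Name="widom"><Name>Jennifer Widom</Name></Member>
  <Member><Name>Jeff Ullman</Name></Member>
</DBGroup>
"""


def evaluate(root, path):
    """Evaluate a dotted label path; the last label may carry '>' or '@'."""
    first, *middle, last = path.split(".")
    if root.tag != first:
        return []
    current = [root]
    for label in middle:
        current = [child for el in current for child in el if child.tag == label]
    qualifier, name = (last[0], last[1:]) if last[0] in ">@" else ("", last)
    matches = []
    for el in current:
        if qualifier != "@":   # '>' or no qualifier: look at subelements
            matches += [("subelement", c.text) for c in el if c.tag == name]
        if qualifier != ">" and name in el.attrib:   # '@' or no qualifier
            matches.append(("attribute", el.attrib[name]))
    return matches


root = ET.fromstring(DOC.strip())
only_subelements = evaluate(root, "DBGroup.Member.>Name")
only_attributes = evaluate(root, "DBGroup.Member.@Name")
both = evaluate(root, "DBGroup.Member.Name")
```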

Migrating Lorel (continued...)
Comparisons
  – How do we compare two different things? (for example, comparing constants with attribute values)
      All XML components are treated as atomic values...
Functions that transform elements into strings:
  – Flatten(e): Ignoring all tags, recursively serialize all text values in the subtree rooted at element e
  – Concatenate(e): Concatenates all immediate text children of element e (subelements are ignored)
  – Tag(e): Returns the XML tag of element e
  – Eid(e): Returns the eid of element e as a string
  – XML(e): Transforms the graph, starting with element e, into an XML document
Default semantics (when no functions are specified):
  – atomic (Text) element: the text itself
  – elements with no attributes and only one or more Text elements as children: concatenation of the children’s text values
  – all others: the element’s eid represented as a string
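
Three of these functions have close analogues over Python's `xml.etree.ElementTree`, which gives a feel for how they differ. The lowercase function names and the sample element are assumptions for the sketch. Whitespace handling is also a guess, and Eid(e) and XML(e) are omitted because ElementTree has no notion of an eid.

```python
import xml.etree.ElementTree as ET


def flatten(e):
    """Ignore tags and join every text value in the subtree rooted at e."""
    return "".join(e.itertext())


def concatenate(e):
    """Join only the text directly inside e, skipping the subelements' own text."""
    # In ElementTree, text that follows a child lives in that child's `tail`.
    return (e.text or "") + "".join(child.tail or "" for child in e)


def tag(e):
    """Return the XML tag of e."""
    return e.tag


# Hypothetical element with mixed content.
e = ET.fromstring("<Name>Jennifer<Nick>Jen</Nick>Widom</Name>")
flat = flatten(e)
concat = concatenate(e)
```

Comparing `flat` with `concat` shows why the choice of function matters when an element is compared against a string constant.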

Migrating Lorel (continued...)
Range qualifiers
  – The expression [range] can be optionally applied to any path expression component or variable
      Example: select y from DBGroup.Member x, x.Office[1-2] y
        – returns the first two Office subelements of every group member
      Example: select y[1-2] from DBGroup.Member x, x.Office y
        – returns the first two Office subelements over ALL members
Order-by clause
  – Query results are ordered lists of eids that identify the elements selected by the query (attributes are coerced into elements)
  – order-by-document-order orders results based on original XML document
      Newly constructed elements are placed at the end of the document order with no specified order among them
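
The difference between the two range placements can be shown with plain Python lists. The member identifiers and office names are made up, and `take_range` is a hypothetical helper that mimics a 1-based inclusive range such as [1-2].

```python
def take_range(items, lo, hi):
    """Return items lo..hi, counting from 1 and including both ends."""
    return items[lo - 1:hi]


# Hypothetical data: each member with its Office subelements in document order.
members = [
    ("x1", ["Gates 432", "Gates 434", "Gates 436"]),
    ("x2", ["Gates 411"]),
]

# x.Office[1-2] y : the range is applied per member, before the bindings are combined.
per_member = [y for _, offices in members for y in take_range(offices, 1, 2)]

# y[1-2] : the range is applied to the variable, over all bindings of y.
all_offices = [y for _, offices in members for y in offices]
overall = take_range(all_offices, 1, 2)
```

Here `per_member` holds three offices (two from the first member, one from the second), while `overall` holds only the first two offices in the whole result.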

Migrating Lorel (continued...)
Transformations and structured results
  – Using queries to restructure XML data
The with clause (added to the standard select-from-where construct)
  – Query result will replicate all data selected by the select clause, along with all data reachable via a set of path expressions in the with clause
Skolem functions
  – Allow more expressive data restructuring
  – Accept a list of variables as arguments and produce one unique element for every binding of elements and/or attributes to the arguments
Updates
  – Lorel supports an expressive update language
  – Changes for XML model:
      ability to create both attributes and elements
      order-relevant updates
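
The defining property of a Skolem function, one unique element per distinct binding of its arguments, amounts to memoized element creation. The class below is a sketch under that reading. Its name, the generated identifier format, and the sample arguments are assumptions and do not reflect Lorel syntax.

```python
import itertools


class Skolem:
    """Hypothetical Skolem function: same arguments always yield the same new element."""

    def __init__(self, name):
        self.name = name
        self._created = {}                 # argument tuple -> generated element id
        self._counter = itertools.count(1)

    def __call__(self, *args):
        if args not in self._created:
            self._created[args] = f"{self.name}#{next(self._counter)}"
        return self._created[args]


group_by_project = Skolem("ProjectGroup")

a = group_by_project("&10")         # first binding creates an element
b = group_by_project("&10")         # same binding reuses it
c = group_by_project("&11")         # different binding creates another
```

In a restructuring query, this lets several result tuples attach their data to one shared new element, for example grouping members under one element per project.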

Migrating Lorel (continued...)
DataGuides
  – Can be used when a DTD is not supplied
  – A notion of order must be introduced
      Problem - could result in very large DataGuides
  – When DTDs exist, DataGuides are built from those DTDs
  – Combining DTDs and DataGuides
      DTDs available for specific portions of an XML database
      DataGuides can be used over portions not specified by DTDs

Conclusion
As of June 1999, the migration of Lore to an XML model is nearly complete