IS432: Semi-Structured Data Dr. Azeddine Chikh. 1. Semi Structured Data Object Exchange Model.



Introduction
From a database perspective:
- The Web has generated an enormous demand for recently developed database-integration architectures, such as data warehouses and mediation systems.
- The Web has led to the development of the semistructured data model, with languages adapted to this model.

Introduction
The emergence of XML as a standard for data representation on the Web is expected to greatly facilitate the publication of electronic data by providing a simple syntax for data that is both human- and machine-readable.

Introduction
Although the document and database viewpoints were, until quite recently, irreconcilable, there is now a convergence in technologies brought about by the development of XML for data on the Web and the closely related development of semistructured data in the database community.

Unstructured Data
- data can be of any type
- not necessarily following any format or sequence
- does not follow any rules
- is not predictable
- examples include text, video, sound, images

Structured Data
- data is organized in semantic chunks (entities)
- similar entities are grouped together (relations or classes)
- entities in the same group have the same descriptions (attributes)
- descriptions for all entities in a group (schema)
  - have the same defined format
  - have a predefined length
  - are all present
  - and follow the same order

Semi-Structured Data
- idea predates XML but not HTML
- data is available electronically in
  - database systems
  - file systems, e.g., bibliographic data, Web data
  - data exchange formats, e.g., EDI, scientific data
- attempt to reconcile database and document "worlds"
- semi-structured data
  - organized in semantic entities
  - similar entities are grouped together
  - entities in same group may not have same attributes
  - order of attributes not necessarily important
  - not all attributes may be required
  - size of same attributes in a group may differ
  - type of same attributes in a group may differ

Example of Semi-Structured Data
name: Azeddine CHIKH
name:
  first name: Mourad
  last name: Benchikh
name: Ashraf Youcef
affiliation: IS Department
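The irregularity in this example can be made concrete by holding the data in a structure that enforces no schema. The sketch below is illustrative only: it assumes the slide shows three person records and that the affiliation belongs to the third one (the flattened text does not show record boundaries), and the helper `attribute_shapes` is a hypothetical name.

```python
# Three records from the example, held as plain dicts with no shared schema.
# Assumption: the affiliation belongs to the third record; the source text
# does not mark where one record ends and the next begins.
people = [
    {"name": "Azeddine CHIKH"},
    {"name": {"first name": "Mourad", "last name": "Benchikh"}},
    {"name": "Ashraf Youcef", "affiliation": "IS Department"},
]


def attribute_shapes(records):
    """Map each attribute label to the set of value types seen for it."""
    shapes = {}
    for record in records:
        for label, value in record.items():
            shapes.setdefault(label, set()).add(type(value).__name__)
    return shapes


shapes = attribute_shapes(people)
# "name" occurs both as a string and as a nested record,
# and "affiliation" is present in only one record.
```

A relational table would force one type per attribute and a value (or NULL) for every attribute; here neither holds.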

Semi-Structured Data Models
- based on labelled graphs rather than labelled trees
- used for data exchange among, and integration of, heterogeneous data sources
- schema information is in the edge labels
  - sometimes called schemaless or self-describing
- data stored at the leaves

Graph Terminology (1)
- a (directed) graph $G = (N, E)$ consists of a set $N$ of nodes and a set $E$ of edges
- each edge in $E$ is an (ordered) pair of nodes $(x, y)$, where $x$ is the source and $y$ is the target
- a path from $x_1$ to $x_n$ is a sequence of edges $(x_1, x_2), (x_2, x_3), \ldots, (x_{n-1}, x_n)$
- the length of a path is the number of edges in it
- a node $r$ is a root for graph $G$ if there is a path from $r$ to every other node in $G$
- a cycle is a path from a node to itself
- a graph with no cycles is called acyclic
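These definitions translate almost directly into code. The following is a minimal sketch, assuming a graph is given as a set of nodes plus a set of (source, target) pairs; the function names are illustrative and do not come from the slides.

```python
from collections import deque


def successors(edges):
    """Index the edge set by source node."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    return succ


def reachable(edges, start):
    """Nodes reachable from `start` by a path of one or more edges."""
    succ = successors(edges)
    seen, queue = set(), deque(succ.get(start, ()))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(succ.get(node, ()))
    return seen


def is_root(nodes, edges, r):
    """r is a root if there is a path from r to every other node."""
    return (set(nodes) - {r}) <= reachable(edges, r)


def is_acyclic(nodes, edges):
    """A cycle is a path from a node to itself; acyclic means none exists."""
    return all(n not in reachable(edges, n) for n in nodes)


N = {"a", "b", "c"}
E = {("a", "b"), ("b", "c")}
```

With this small example, `is_root(N, E, "a")` holds because both other nodes are reachable from `a`, and adding the edge `("c", "a")` would introduce a cycle.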

Graph Terminology (2)
- a graph is rooted if it has a single root
- a tree is a rooted graph $G$ in which there is a unique path from the root to every other node in $G$
- a node is a leaf if it is not the source of any edge
- graphs can have node labels and/or edge labels
- in an edge-labelled graph $G = (N, E, F_E)$, $F_E$ is an edge labelling function that maps each edge to a label
- in a node-labelled graph $G = (N, E, F_N)$, $F_N$ is a node labelling function that maps each node to a label
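A short sketch of leaves, trees, and an edge labelling function follows. It assumes the labelling function is stored as a dict from edges to labels, and it tests for a tree with the in-degree characterisation (root has no incoming edge, every other node has exactly one, all nodes reachable from the root), which is an equivalent check chosen here for brevity rather than something stated on the slide.

```python
def leaves(nodes, edges):
    """A node is a leaf if it is not the source of any edge."""
    return set(nodes) - {x for x, _ in edges}


def is_tree(nodes, edges, root):
    """Check that every non-root node has exactly one path from `root`.

    Uses the in-degree test: the root has in-degree 0, every other node
    has in-degree 1, and every node is reachable from the root.
    """
    indegree = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for x, y in edges:
        indegree[y] += 1
        succ[x].append(y)
    if indegree[root] != 0:
        return False
    if any(indegree[n] != 1 for n in nodes if n != root):
        return False
    seen, stack = {root}, [root]
    while stack:
        for y in succ[stack.pop()]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == set(nodes)


# Edge-labelled graph (N, E, F_E): the labelling function as a dict.
N = {1, 2, 3, 4}
F_E = {(1, 2): "book", (2, 3): "author", (2, 4): "title"}
E = set(F_E)
```

Here nodes 3 and 4 are the leaves, and adding an edge `(3, 4)` would give node 4 two incoming edges, so the graph would no longer be a tree.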

Object Exchange Model (OEM)
- original OEM used only node labels
- we use a variant in which the edges are labelled
- an OEM data graph is a rooted, labelled, directed graph
  - its edge labels map to strings
  - only its leaf nodes have labels which map to data values
  - no ordering of edges leaving a node

OEM Syntax
- example may be written as
    { book: { author: "Coetzee", title: "Disgrace", year: 1999 } }
- simple label-value pairs
- labels can be repeated, e.g., for multiple authors
- this is a serialization syntax for the graph
- what about graphs that are not trees?
  - introduce object identifiers (oids) for nodes
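As an illustration of this serialization, the sketch below prints a tree-shaped object in the brace syntax. It is not an official OEM tool: it assumes a complex object is held as a list of (label, value) pairs, so that labels can repeat, and it does not escape quotes inside strings.

```python
def to_oem(value):
    """Serialize a nested value into brace syntax with label-value pairs."""
    if isinstance(value, list):  # complex object: list of (label, child) pairs
        inner = ", ".join(f"{label}: {to_oem(child)}" for label, child in value)
        return "{ " + inner + " }"
    if isinstance(value, str):   # atomic string value
        return '"' + value + '"'
    return str(value)            # other atomic values, e.g. integers


book = [("book", [("author", "Coetzee"), ("title", "Disgrace"), ("year", 1999)])]
text = to_oem(book)
# text == '{ book: { author: "Coetzee", title: "Disgrace", year: 1999 } }'
```

A list of pairs is used instead of a dict because a dict cannot hold two `author` entries under the same key.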

[Figure: Example of OEM Data Graph (1)]

[Figure: Example of OEM Data Graph (2)]

Example of OEM Syntax
    bib: &1 { paper: &2 {... },
              book: &3 {... },
              paper: &4 { author: &10 { firstname: &15 "Serge", lastname: &16 "Abiteboul" },
                          author: &11 {... },
                          title: &12 {... },
                          pages: &13 { first: &17 122, last: & },
                          references: &2,
                          references: &3 } }
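The role of oids can be sketched with a serializer that writes a node in full the first time it is reached and writes only its oid afterwards, which is how shared nodes and references are expressed. The graph below is a small hypothetical one (not the slide's bibliography, whose elided parts are unknown), stored as a dict from oid to either a list of (label, target oid) pairs or an atomic value.

```python
def to_oem(graph, oid, seen=None):
    """Serialize the graph from `oid`, emitting each node's content once."""
    seen = set() if seen is None else seen
    if oid in seen:                      # already written: refer to it by oid
        return f"&{oid}"
    seen.add(oid)
    value = graph[oid]
    if isinstance(value, list):          # complex node with outgoing edges
        parts = [f"{label}: {to_oem(graph, child, seen)}" for label, child in value]
        return f"&{oid} {{ " + ", ".join(parts) + " }"
    literal = f'"{value}"' if isinstance(value, str) else str(value)
    return f"&{oid} {literal}"           # leaf node carrying a data value


# Hypothetical data: paper &4 references paper &2, so &2 has two parents.
graph = {
    1: [("paper", 2), ("paper", 4)],
    2: [("title", 5)],
    4: [("title", 6), ("references", 2)],
    5: "First paper",
    6: "Second paper",
}
text = "bib: " + to_oem(graph, 1)
```

The second occurrence of node 2 is written as the bare reference `&2`, so the output stays finite even if the graph contains cycles.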

Characteristics of SSData
- structure is irregular: missing or additional attributes (labels)
- parts of data lack structure, e.g., images
- some may yield little structure, e.g., plain text
- a-priori schema vs a-posteriori dataguide
  - db: fix the schema, then populate the db
  - web: design pages, then design schema to facilitate access
- schema is large
- schema is often ignored, e.g., information retrieval queries
- schema is rapidly evolving

Schema Graphs
- given some semi-structured data, might want to extract a schema that describes it
- useful for
  - browsing the data by types
  - optimizing queries by reducing the number of paths searched
  - improving storage of data
- schema graph specifies what edges are permitted in a data graph
- every path in the data graph occurs in the schema graph

[Figure: Example of a Schema Graph]

Data Graph Satisfying a Schema Graph
- given data graph D and schema graph S
- D is an instance of S (or D satisfies S) if there exists a simulation R from D to S such that (root(D), root(S)) is in R
- a simulation is a relation R between nodes: if (u,v) is in R and (u,x) labelled l is in D, then there exists (v,y) labelled l in S such that (x,y) is in R
- for our example:
  - node &1 in D related under R to node at target of edge labelled bib in S
  - &2 and &4 related to node at target of edge labelled paper
  - &3 related to node at target of edge labelled book
  - note that above two cases need to satisfy requirements of edges labelled references as well
  - &10 and &11 related to node at target of edge labelled author
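The simulation condition can be checked mechanically. The sketch below computes the largest simulation by starting from all node pairs and repeatedly discarding pairs that violate the condition; D satisfies S exactly when the pair of roots survives. Graphs are assumed to be sets of (source, label, target) triples, and both example graphs are hypothetical stand-ins because the slide's figures are not available.

```python
def greatest_simulation(d_edges, s_edges):
    """Largest relation R such that (u,v) in R and u -l-> x in D
    imply some v -l-> y in S with (x,y) in R."""
    d_nodes = {n for u, _, x in d_edges for n in (u, x)}
    s_nodes = {n for v, _, y in s_edges for n in (v, y)}
    relation = {(u, v) for u in d_nodes for v in s_nodes}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(relation):
            for (src, label, x) in d_edges:
                if src != u:
                    continue
                matched = any(
                    s == v and l == label and (x, y) in relation
                    for (s, l, y) in s_edges
                )
                if not matched:          # edge of u cannot be mimicked by v
                    relation.discard((u, v))
                    changed = True
                    break
    return relation


def satisfies(d_edges, d_root, s_edges, s_root):
    """D is an instance of S if the roots are related by some simulation."""
    return (d_root, s_root) in greatest_simulation(d_edges, s_edges)


# Hypothetical data graph D (integers echo the oids used in the example).
D = {
    ("r", "bib", 1), (1, "paper", 2), (1, "book", 3), (1, "paper", 4),
    (4, "author", 10), (4, "references", 2), (4, "references", 3),
}
# Hypothetical schema graph S.
S = {
    ("s", "bib", "Bib"), ("Bib", "paper", "Paper"), ("Bib", "book", "Book"),
    ("Paper", "author", "Author"),
    ("Paper", "references", "Paper"), ("Paper", "references", "Book"),
}
```

Under these assumptions `satisfies(D, "r", S, "s")` holds, while giving the book node an `author` edge breaks it, because the hypothetical schema permits no `author` edge below `book`.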

[Figure: A Less Specific Schema Graph]

Data Guides
- a data guide is a concise and accurate summary of a data graph
  - accurate: every path in the data occurs in the data guide, and vice versa
  - concise: every path in the data guide occurs exactly once
- data guide is the most specific schema graph for a given data graph
  - i.e., there is a simulation from the data guide to every other schema graph the data graph satisfies
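One common way to obtain a summary with these two properties is to group data nodes by the label paths that reach them, in the same way a nondeterministic automaton is made deterministic. The sketch below follows that idea; it is an illustration under the assumption that the data graph is a set of (source, label, target) triples, and the example graph is hypothetical.

```python
def data_guide(edges, root):
    """Build a guide whose nodes are sets of data nodes.

    Each guide node is the set of data nodes reached by one label path,
    so every label path of the data appears in the guide exactly once.
    """
    start = frozenset([root])
    guide_edges = set()
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        by_label = {}
        for (u, label, x) in edges:      # group targets of this state by label
            if u in state:
                by_label.setdefault(label, set()).add(x)
        for label, targets in by_label.items():
            target = frozenset(targets)
            guide_edges.add((state, label, target))
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, guide_edges


# Hypothetical data graph: two papers and one book under a bib node.
D = {
    ("r", "bib", 1), (1, "paper", 2), (1, "book", 3), (1, "paper", 4),
    (4, "author", 10), (4, "references", 2), (4, "references", 3),
}
start, guide = data_guide(D, "r")
```

The two `paper` edges of the data collapse into a single guide edge leading to the node set {2, 4}, which is what makes the guide concise.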

[Figure: Example of a Data Guide (1)]

[Figure: Example of a Data Guide (2)]

References
- Tutorial on semi-structured data by Peter Buneman, from Symposium on Principles of Database Systems, www-db.stanford.edu/lore/research/data.html
- Abiteboul S., Buneman P., Suciu D., «Data on the Web - From Relations to Semistructured Data and XML», Morgan Kaufmann Publishers, San Francisco, California