Querying Tree-Structured Data Using Dimension Graphs. Dimitri Theodoratos (New Jersey Institute of Technology, USA), Theodore Dalamagas (National Technical University of Athens, Greece)

Tree-structured Data Management: Tree structures are a means of organizing information on the Web. Examples: taxonomies, thematic categories, concept hierarchies, product catalogs, etc. Organizing data in tree structures (tree-structured data) has become widely established due to the popularity of XML, the W3C standard data exchange format on the Web. Data is either stored natively in tree structures, or made publicly available in tree structures to enable its automatic processing by programs, scripts, and agents.

Tree-structured Data Management: Querying tree-structured data is based on path expression queries. Popular query languages for tree-structured data are XPath and XQuery (W3C), e.g.: FOR $i IN /brand/type[price<900] RETURN {$i/id, $i/condition, $i/price} (find products cheaper than 900, and display their id, condition, and price). Querying tree-structured data runs into two major obstacles: the semistructured nature of the data and the lack of semantics. This is the penalty one pays for the flexibility offered by XML technologies. [XML fragment with the values: Sony, laptop, 1, used]

Semistructured Nature of Tree-structured Data: Because of the first obstacle (the semistructured nature of the data), querying tree-structured data requires resolving structural differences and inconsistencies. The reason is that the same information can be organized in tree structures in different ways. Structural differences: certain ‘nodes’ (categories, elements, etc.) exist in one tree-structured data source but not in another. Structural inconsistencies: variations in ‘node’ sequences (even within a single tree-structured data source).

[Figure: Product Catalog A and Product Catalog B, two product trees rooted at r, with categories such as Notebooks, Desktops, Servers, PDAs, and Multimedia, brands such as Mac, HP, Sony, IBM, and Dell, and the conditions New and Used.] Structural difference: Product Catalog A has a finer categorization of notebooks than Catalog B, e.g. Custom/Ultralight, and 10''/8'' for the ultralight notebooks.

[Figure: Product Catalog A and Product Catalog B, as on the previous slide, with the New/Used condition nodes highlighted.] Structural inconsistency: Product Catalog A classifies notebooks first by brand and then by condition, while Catalog B does it the other way around (Sony/Used vs. Used/Sony).

Semistructured Nature of Tree-structured Data: Structural inconsistency (cont.): one XML document includes the element sequence brand, type, condition (values Sony, laptop, used), while another one, for the same data, includes type, condition, brand (values laptop, used, Sony). Such inconsistencies are observed even within the tree-structured data of a single data source.

Semistructured Nature of Tree-structured Data: How do structural differences and inconsistencies affect the querying of tree-structured data? The user has to specify them explicitly as part of the query, which is extremely cumbersome. E.g., the user has to spell out disjunctions of the possible alternative node sequences: /brand/type[price<900] OR /type/condition[price<900] OR /condition/type[price<900] ...
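
To give a feel for how quickly such disjunctions grow, here is a minimal Python sketch that builds one path expression per ordering of a set of steps. The helper name alternative_paths and the choice to enumerate full three-step orderings (rather than the two-step paths shown on the slide) are assumptions made for illustration only.

```python
from itertools import permutations

def alternative_paths(steps, predicate):
    """Build one path expression per possible ordering of the given steps."""
    return ["/" + "/".join(order) + predicate for order in permutations(steps)]

# Hypothetical example: three element names whose relative order is unknown.
paths = alternative_paths(["brand", "type", "condition"], "[price<900]")
query = " OR ".join(paths)

print(len(paths))  # 3! = 6 alternatives the user would have to write by hand
print(query)
```

With n steps of unknown order, the user would have to write n! alternatives, which is the burden the slide calls cumbersome.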

Semistructured Nature of Tree-structured Data: However, the need to specify alternative node sequences does not always come from structural differences and inconsistencies. Users should be able to pose queries even if they do not know (or do not care about) the exact structure of the tree-structured data sources, e.g. ‘find products cheaper than 900, and display their id, condition, and price, but I do not know (or do not care!) whether condition comes before brand and type’. Currently, query formulation on tree-structured data is strictly dependent on the structure of the data. Only the ancestor/descendant relationship allows relaxed path expressions (e.g. brand//type).

Lack of Semantics in Tree-structured Data: Reminder: querying tree-structured data runs into two major obstacles, the semistructured nature of the data (just explained) and the lack of semantics. Tree-structured data provides mainly syntactic, not semantic, information. However, there are inherent semantics in tree-structured data. Sets of nodes in a catalog are usually related under a semantic interpretation, e.g. Mac, HP, and Sony all refer to a brand name. Such information can be exploited in query formulation and to support query optimization. Currently, query formulation on tree-structured data ignores it.

Our Approach: We introduce the notion of dimension graphs to capture semantic information in tree-structured data. We design a query language for tree-structured data. Queries are not cast on the structure of tree-structured data. Queries can handle structural differences and inconsistencies effectively. We discuss query evaluation issues. We show how dimension graphs can be used to query multiple tree-structured data sources.

Data Model: We use value trees to represent tree-structured data. Values (i.e. nodes) in value trees are grouped to form dimensions. A dimension is a set of semantically related nodes (i.e. values) in the value tree. The semantic interpretation is given by the user. Two nodes in the same path cannot belong to the same dimension.
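
The constraint that two nodes on the same path cannot belong to the same dimension can be checked mechanically. The sketch below is one possible way to do so, assuming a value tree encoded as nested (value, children) tuples and a dictionary dim_of from values to dimension names; both the encoding and the small example trees are hypothetical, not taken from the slides.

```python
def violates_dimension_rule(node, dim_of, seen=frozenset()):
    """Return True if some root-to-leaf path holds two values of one dimension."""
    value, children = node
    dim = dim_of.get(value)  # values without a dimension (e.g. the root) are skipped
    if dim is not None:
        if dim in seen:
            return True
        seen = seen | {dim}  # new frozenset, so sibling branches are unaffected
    return any(violates_dimension_rule(child, dim_of, seen) for child in children)

# Hypothetical dimension assignment and value trees.
dim_of = {"Notebooks": "pc_type", "Desktops": "pc_type",
          "Sony": "brand", "HP": "brand", "Used": "condition"}
good = ("r", [("Notebooks", [("Sony", [("Used", [])])]),
              ("Desktops", [("Used", [("HP", [])])])])
bad = ("r", [("Notebooks", [("Sony", [("HP", [])])])])

print(violates_dimension_rule(good, dim_of))  # False
print(violates_dimension_rule(bad, dim_of))   # True: Sony and HP share a path
```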

Data Model: [Figure: a value tree rooted at r, with its nodes grouped into the dimensions pc_type, pc_category, brand, and condition.] E.g. dimensions pc_type = {Notebooks, Desktops, PDAs}, pc_category = {Servers, Multimedia}, brand = {Mac, Sony, HP, IBM, Dell}, etc.

Data Model: We use dimension graphs to capture relationships between dimensions. The nodes of a dimension graph represent dimensions. There is an edge from dimension D1 to D2 if a value of D1 is the parent of some value in D2.
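
The edge rule above translates almost directly into code. The following sketch derives the edge set of a dimension graph from a value tree, assuming the same hypothetical encoding (nested (value, children) tuples plus a value-to-dimension dictionary) and assuming that the root value r is mapped to the root dimension R.

```python
def dimension_graph(tree, dim_of):
    """Collect an edge (D1, D2) whenever a value of D1 is the parent of a value of D2."""
    edges = set()

    def visit(node):
        value, children = node
        for child in children:
            edges.add((dim_of[value], dim_of[child[0]]))
            visit(child)

    visit(tree)
    return edges

# Hypothetical value tree: one branch orders brand before condition, the other reverses it.
dim_of = {"r": "R", "Notebooks": "pc_type", "Desktops": "pc_type",
          "Sony": "brand", "HP": "brand", "Used": "condition"}
tree = ("r", [("Notebooks", [("Sony", [("Used", [])])]),
              ("Desktops", [("Used", [("HP", [])])])])

for edge in sorted(dimension_graph(tree, dim_of)):
    print(edge)
```

Because the two branches order brand and condition differently, the resulting graph contains both (brand, condition) and (condition, brand), which is how a dimension graph records a structural inconsistency.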

Data Model: [Figure: Value Tree T, rooted at r, with its nodes grouped into the dimensions pc_type, pc_category, brand, and condition, shown next to the Dimension Graph of T, whose nodes are R, pc_type, pc_category, brand, and condition.]

Data Model: A dimension graph can be automatically extracted from a value tree, given the dimensions. It provides an abstraction of the structural information of value trees. It provides semantic guidance for posing queries on tree-structured data in the presence of structural differences and inconsistencies. It also supports query evaluation and optimization, as explained on the following slides.

Querying Tree-structured Data: Queries are defined on dimension graphs and not directly on value trees. The user annotates some dimensions. She also has the choice of not specifying, or only partially specifying, parent-child and ancestor-descendant relationships between the annotated dimensions in a query. Our system identifies the possible ‘valid’ orderings of dimensions by exploiting the dimension graph. These orderings are used as patterns for constructing a set of path expressions to be sent directly to the value trees.

Querying: [Figure: Value Tree T and a query on the Dimension Graph of T, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}.] Legend: a marked node is an annotated dimension; ‘= ?’ means the dimension can have any value; ‘= {...}’ means the dimension should have one of the specified values.

Querying: [Figure: Value Tree T and the query on the Dimension Graph of T, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}.] ‘Find all used Sony and IBM products’, i.e. find paths in T from r to a leaf node that contain any of the values of dimension pc_type, the value ‘used’ of dimension condition, and either value ‘Sony’ or ‘IBM’ of dimension brand.

Querying: [Figure: Value Tree T and the query on the Dimension Graph of T, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}.] Notice how the query handles the structural inconsistencies!

Querying: [Figure: Value Tree T and the query on the Dimension Graph of T, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}, with an edge from condition to brand.] ‘Find all used Sony and IBM products, where the nodes referring to the brand name come after the node used’, i.e. find paths in T from r to a leaf node that contain any of the values of dimension pc_type, the value ‘used’ of dimension condition, and either value ‘Sony’ or ‘IBM’ of dimension brand, with the additional restriction that values of condition should be parents of values of brand.

Query Evaluation: Query evaluation exploits dimension graphs to detect answer paths. An answer path is a path in a dimension graph that starts from R, includes all annotated dimensions, and ends on an annotated dimension. [Figure: a query on the Dimension Graph of T, with nodes R, pc_type, mobile_type, pc_category, condition, and brand, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}.] Examples of answer paths: /R/pc_type/condition/brand, /R/pc_type/pc_category/brand/condition, ...
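
The definition of an answer path can be sketched as a depth-first enumeration over the dimension graph. The code below is an illustration under stated assumptions, not the authors' algorithm: the edge set is hypothetical (chosen so that the two example answer paths above exist), and paths are restricted to simple paths (no dimension visited twice), which the slides do not state explicitly.

```python
def answer_paths(edges, root, annotated):
    """Enumerate simple paths from root that contain every annotated
    dimension and end on an annotated dimension."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    annotated = set(annotated)
    found = []

    def walk(path):
        last = path[-1]
        if last in annotated and annotated <= set(path):
            found.append("/" + "/".join(path))
        for nxt in sorted(succ.get(last, [])):
            if nxt not in path:  # assumption: a dimension appears at most once per path
                walk(path + [nxt])

    walk([root])
    return sorted(found)

# Hypothetical dimension graph, not a reconstruction of the slide's figure.
edges = {("R", "pc_type"), ("pc_type", "brand"), ("pc_type", "condition"),
         ("pc_type", "pc_category"), ("pc_category", "brand"),
         ("brand", "condition"), ("condition", "brand")}

for p in answer_paths(edges, "R", ["pc_type", "condition", "brand"]):
    print(p)
# /R/pc_type/brand/condition
# /R/pc_type/condition/brand
# /R/pc_type/pc_category/brand/condition
```

An empty result for a query would mean that no ordering of the annotated dimensions is compatible with the graph.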

Query Evaluation: [Figure: Value Tree T and the query on the Dimension Graph of T, annotated condition = {used}, pc_type = ?, brand = {Sony, IBM}.] Answer paths are used to generate path expressions that can be evaluated by, e.g., an XQuery engine to retrieve the answers from a value tree. E.g. /R/pc_type/condition/brand gives /r/(Notebooks|Desktops)/Used/(Sony|IBM)
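
A rough sketch of this translation step follows. It replaces each dimension on an answer path by the alternation of its allowed values. The function name, the dictionaries, and the use of None for the ‘= ?’ annotation are assumptions for illustration. Note that this sketch lists every value of an unrestricted dimension, so it emits PDAs for pc_type as well, whereas the slide's expression lists only Notebooks and Desktops; the slide does not say how that narrowing is obtained.

```python
def path_expression(dims, annotations, values_of, root_value="r"):
    """Turn an answer path (a list of dimensions) into a path expression
    over the values of the value tree."""
    steps = [root_value]
    for dim in dims[1:]:  # dims[0] is the root dimension R
        wanted = annotations.get(dim)  # None stands for '= ?' or no annotation
        allowed = [v for v in values_of[dim] if wanted is None or v in wanted]
        steps.append(allowed[0] if len(allowed) == 1
                     else "(" + "|".join(allowed) + ")")
    return "/" + "/".join(steps)

# Dimension contents follow the slides; New as a condition value is taken from the catalog figure.
values_of = {"pc_type": ["Notebooks", "Desktops", "PDAs"],
             "condition": ["New", "Used"],
             "brand": ["Mac", "Sony", "HP", "IBM", "Dell"]}
annotations = {"pc_type": None, "condition": {"Used"}, "brand": {"Sony", "IBM"}}

expr = path_expression(["R", "pc_type", "condition", "brand"], annotations, values_of)
print(expr)  # /r/(Notebooks|Desktops|PDAs)/Used/(Sony|IBM)
```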

Query Evaluation: The answer paths help to detect the orderings of values that can possibly exist in a value tree. Only these value orderings are used to compute the answer of a query on the value tree. This detection is performed before query evaluation reaches the value tree. Detecting answer paths in a dimension graph is not a costly task, since dimension graphs are much smaller than value trees.

Query Evaluation: Query evaluation exploits dimension graphs to detect unsatisfiable queries (i.e. queries with empty answers in the value tree). [Figures: three example queries over the dimensions R, pc_type, mobile_type, pc_category, condition, and brand.] Reasons for unsatisfiability illustrated by the examples: there are no answer paths; two children have the same parent; there is no path from condition to mobile_type.
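
One of the reasons above, a missing path between two dimensions, amounts to a reachability test on the dimension graph. The sketch below shows such a test with a breadth-first search. The edge set is hypothetical and only meant to reproduce the situation in which mobile_type cannot be reached from condition.

```python
from collections import deque

def reachable(edges, src, dst):
    """Breadth-first search: is there a directed path from src to dst?"""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    queue, seen = deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical dimension graph with a mobile_type dimension.
edges = {("R", "pc_type"), ("R", "mobile_type"), ("pc_type", "brand"),
         ("pc_type", "condition"), ("brand", "condition"),
         ("condition", "brand"), ("mobile_type", "brand")}

print(reachable(edges, "condition", "brand"))        # True
print(reachable(edges, "condition", "mobile_type"))  # False: such a query is unsatisfiable
```

A query requiring condition to precede mobile_type could then be rejected on the dimension graph alone, without touching the value tree.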

Query Evaluation: Dimension graphs can be used to query multiple value trees. Consider value trees T1, T2, ..., Tn over a dimension set D. Let G1, G2, ..., Gn be their dimension graphs. Construct a global dimension graph G by merging G1, G2, ..., Gn. Queries are formed on G. The annotations are transferred to G1, G2, ..., Gn. Query evaluation is performed as described before.
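
The slides do not define the merge operation. A natural reading, assumed here, is the union of the edge sets of the local graphs. The sketch below also includes a hypothetical helper, covers, that tests whether a local graph contains every annotated dimension, which is a necessary condition for that source to contribute answers.

```python
def merge(graphs):
    """Global dimension graph as the union of the local edge sets (an assumption)."""
    return set().union(*graphs)

def covers(graph, annotated):
    """True if every annotated dimension occurs in the given graph."""
    dims = {d for edge in graph for d in edge}
    return set(annotated) <= dims

# Hypothetical local dimension graphs of two value trees.
g1 = {("R", "pc_type"), ("pc_type", "brand"), ("brand", "condition")}
g2 = {("R", "pc_type"), ("pc_type", "pc_category"), ("pc_category", "brand")}

g = merge([g1, g2])
print(len(g))  # 5 distinct edges

annotated = ["pc_type", "condition", "brand"]
print(covers(g1, annotated))  # True
print(covers(g2, annotated))  # False: g2 has no condition dimension
```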

Conclusions: Querying tree-structured data using dimension graphs. Dimension graphs capture semantic information in tree-structured data and are used for query formulation and evaluation. Queries are not cast on the structure of tree-structured data but on dimension graphs. Queries can handle structural differences and inconsistencies in value trees. Query evaluation exploits dimension graphs to generate appropriate path expressions to be evaluated on the value trees. Dimension graphs can also be used to query multiple value trees.