
Published by Neil Marske. Modified over 2 years ago.

1
Fuzzy Querying of XML Documents: The Minimum Spanning Tree
Abdeslame ALILAOUAR, Florence SEDES
IRIT - CNRS
IRIT: Research Institute for Computer Science of Toulouse (France)

2
Talk Outline
- The XML model
- The problem of querying XML documents
- Proposed techniques
- Our approach
- Implementation details
- Conclusion and future tasks

3
The XML Data Model
Document-centric vs. data-centric
Document-centric:
- Less regular or irregular structure
- The order of sibling elements is important
- Examples: emails, books, etc.
Data-centric:
- More structured
- The order of sibling elements is often unimportant
- Examples: sales orders, configuration files, etc.

4
The XML Data Model (continued)
- Data are commonly modeled by a tree structure
- Nodes represent objects
- Edges represent relationships between objects
- Atomic values are attached to leaf nodes
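The tree model above can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not the authors' code: the sample document and the helper names `edges` and `leaf_values` are hypothetical. Each parent-child pair is one edge (a relationship between objects), and the atomic values sit in the leaf elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not the one in the slides' figure)
DOC = """
<cotglist>
  <cottage identifier="40">
    <nbeds>4</nbeds>
    <price>1300</price>
  </cottage>
</cotglist>
"""

def edges(elem):
    """Yield (parent_tag, child_tag) pairs: each pair is one edge of the tree."""
    for child in elem:
        yield (elem.tag, child.tag)
        yield from edges(child)

def leaf_values(elem):
    """Collect (tag, text) for leaf elements, where the atomic values are attached."""
    if len(elem) == 0:
        return [(elem.tag, (elem.text or "").strip())]
    result = []
    for child in elem:
        result.extend(leaf_values(child))
    return result

root = ET.fromstring(DOC.strip())
print(list(edges(root)))   # [('cotglist', 'cottage'), ('cottage', 'nbeds'), ('cottage', 'price')]
print(leaf_values(root))   # [('nbeds', '4'), ('price', '1300')]
```

In this sketch the `identifier` attribute stays on the `cottage` node rather than becoming a separate leaf; other data models treat attributes as child nodes.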

5
The XML Data Model (continued): Variations in Structure
[Figure: a cotglist tree with two cottage elements (identifiers "40" and "23") whose character, nbeds, room and price information is structured differently; recoverable values include nbeds 4 and 2, prices 1300, 1700 and 1100, and summer and winter labels.]

6
The Problem of Querying XML Documents
[Figure: a query (content + structure) is matched against an XML document (content + unknown, irregular structure) through structure matching and content matching (R.I.) to produce the result.]
Irregular structure: in most cases, the queries return an empty or incomplete set of answers.
- Data has structural variations: relationships between objects are represented differently in different parts of the documents.
- Data has ontology variations: different labels are used to describe objects of the same type (e.g. house, cottage).
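The structural-variation problem can be reproduced in a few lines. In the hypothetical document below, two cottages store a price at different depths. A rigid pattern (price as a direct child of cottage) misses the second cottage, while a plain descendant path finds both. The descendant axis is only a simple illustration of loosening structure; it is not the fuzzy approach presented in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same kind of information (a price) at two different depths
DOC = """<cotglist>
  <cottage identifier="40"><nbeds>4</nbeds><price>1300</price></cottage>
  <cottage identifier="23"><room><nbeds>4</nbeds><price>1700</price></room></cottage>
</cotglist>"""

root = ET.fromstring(DOC)

# Rigid structural pattern: price must be a direct child of cottage
rigid = [c.get("identifier") for c in root.findall("cottage")
         if c.find("price") is not None]

# Looser pattern: price may appear at any depth below cottage
relaxed = [c.get("identifier") for c in root.findall("cottage")
           if c.find(".//price") is not None]

print(rigid)    # ['40']
print(relaxed)  # ['40', '23']
```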

7
The Problem of Querying XML Documents (continued)
Solution:
- Queries should deal with different data structures.
- Queries should not be rigid (structural) patterns.
- Queries should be handled flexibly, so as to find not only the answers that match exactly but also those with a similar structure and/or content.

8
Proposed Techniques
- Query relaxation (S. Amer-Yahia, AT&T, 2002)
- Tree-edit distance (D. Shasha, K. Zhang, 1989)
- Correlation (A. Tversky, 1977)
- Data relaxation (Damiani & Tanca, 2000)
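As one concrete example of a feature-based similarity in the spirit of the Tversky (1977) entry, the sketch below implements Tversky's ratio model with set cardinality as the salience measure. Applying it to the sets of child labels of two XML nodes is an assumption made for illustration; the slide does not say how the technique was applied. With alpha = beta = 1 the formula reduces to the Jaccard index, and with alpha = beta = 0.5 to the Dice coefficient.

```python
def tversky(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky's ratio model, using set size as the salience measure f.

    S(a, b) = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)
    """
    common = len(a & b)
    if common == 0 and not (a - b) and not (b - a):
        return 1.0  # two empty feature sets: treated as identical (a convention chosen here)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0

# Hypothetical feature sets: child labels of two cottage nodes
A = {"nbeds", "price", "character"}
B = {"nbeds", "price", "room"}
print(tversky(A, B))         # Dice-like weighting (alpha = beta = 0.5)
print(tversky(A, B, 1, 1))   # Jaccard weighting: 2 shared labels out of 4 distinct ones
```

Choosing alpha different from beta makes the measure asymmetric, which is the point of Tversky's model: a query node's missing features can be penalized more (or less) than extra features in the document node.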

9
Our Approach: The Minimum Spanning Tree (MST)
- An optimization problem
- Input: a weighted graph
- Output: the cheapest subset of edges that keeps the graph in one connected component
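The definition above can be checked directly on a small graph: among all subsets of n - 1 edges that connect every vertex without a cycle, keep the cheapest. This brute-force sketch is exponential and only meant to make the definition concrete; the edge list is hypothetical.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """True if the given n - 1 edges connect all n vertices without forming a cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True  # n - 1 acyclic edges on n vertices form one connected component

def brute_force_mst(n, edges):
    """Try every subset of n - 1 edges; return (cost, edges) of the cheapest spanning tree."""
    best = None
    for subset in combinations(edges, n - 1):
        if is_spanning_tree(n, subset):
            cost = sum(w for _, _, w in subset)
            if best is None or cost < best[0]:
                best = (cost, list(subset))
    return best  # None if the graph is not connected

# Hypothetical weighted graph: (u, v, weight)
EDGES = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 4), (0, 2, 3)]
print(brute_force_mst(4, EDGES))  # (4, [(0, 1, 1), (1, 2, 2), (2, 3, 1)])
```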

10
Proposed algorithms:
- Prim's algorithm (1957): compute a minimum spanning tree by beginning with any vertex as the current tree. At each step, add a least-weight edge between a vertex not in the tree and a vertex in the tree. Continue until all vertices have been added.
- Kruskal's algorithm (1956): maintain a set of partial minimum spanning trees, and repeatedly add the shortest edge in the graph whose endpoints lie in different partial minimum spanning trees.
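Both algorithms described above can be sketched in standard Python: Prim grows a single tree using a binary heap of candidate edges, and Kruskal keeps a forest in a union-find structure and scans edges by increasing weight. The edge list is hypothetical; both functions return a tree of the same total weight.

```python
import heapq

def prim(n, edges, start=0):
    """Prim: grow one tree from `start`, always adding the lightest edge leaving it."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = [False] * n
    in_tree[start] = True
    heap = list(adj[start])
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # both endpoints are already in the tree
        in_tree[v] = True
        tree.append((u, v, w))
        for edge in adj[v]:
            if not in_tree[edge[2]]:
                heapq.heappush(heap, edge)
    return tree

def kruskal(n, edges):
    """Kruskal: keep a forest of partial trees; add the shortest edge joining two of them."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # endpoints lie in different partial trees
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

EDGES = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 4), (0, 2, 3)]
print(sum(w for _, _, w in prim(4, EDGES)))     # 4
print(sum(w for _, _, w in kruskal(4, EDGES)))  # 4
```

Kruskal's bottom-up merging of partial trees is the behavior the next slide borrows when building answer subtrees.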

11
Querying XML Documents with MST
- Define a similarity function that is used to estimate the matching degree of the preferences.
- The importance level determines the priority between the preferences.
- Replace the criteria by preferences with their importance levels.
- The satisfaction degree of a preference must be at least equal to its importance level.
- The answer subtrees are built gradually, like Kruskal's algorithm: start by evaluating the leaf nodes and the most important preferences, then go up until the answer tree is constructed.
Example: represent the query by a weighted tree pattern.
[Figure: query tree pattern cottage with children nbeds = 4 and price = 1400, with importance weights 0.8 and 0.6.]
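The evaluation steps above can be sketched for one level of the tree (leaf preferences up to their cottage parent). Several parts are assumptions, not the authors' method: the `Preference` class, the relative-difference similarity `numeric_sim` (which does not reproduce the slide's similarity values), and the use of `min` as the fuzzy conjunction that combines satisfied preferences at the parent. It also assumes nbeds carries weight 0.8 and price 0.6.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Preference:
    label: str         # element name in the query tree pattern, e.g. "price"
    value: float       # preferred atomic value
    importance: float  # importance level in [0, 1]

def numeric_sim(a: float, b: float) -> float:
    """Hypothetical similarity: 1 minus the relative difference, clipped at 0."""
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))

def evaluate(candidate: dict, prefs: list) -> Optional[float]:
    """Score one candidate subtree (leaf label -> value) against weighted preferences.

    Leaf preferences are evaluated most important first; the candidate is pruned as
    soon as a satisfaction degree falls below its importance level. Surviving degrees
    are combined at the parent with min (an assumed fuzzy conjunction).
    """
    degrees = []
    for p in sorted(prefs, key=lambda p: p.importance, reverse=True):
        if p.label not in candidate:
            return None  # the candidate lacks this leaf entirely
        d = numeric_sim(candidate[p.label], p.value)
        if d < p.importance:
            return None  # preference not satisfied: discard this subtree early
        degrees.append(d)
    return min(degrees)

QUERY = [Preference("nbeds", 4, 0.8), Preference("price", 1400, 0.6)]
print(evaluate({"nbeds": 4, "price": 1300}, QUERY))  # about 0.93
print(evaluate({"nbeds": 2, "price": 1100}, QUERY))  # None
```

Evaluating the most important preference first means weak candidates are usually rejected before the cheaper-to-skip, less important checks run.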

12
Example:
[Figure: the weighted query pattern (cottage with nbeds = 4 and price = 1400, weights 0.8 and 0.6) matched against the cotglist document tree (cottages with identifiers "140" and "123", with character, nbeds, room and price nodes, values 1700, 1100 and 1300, and summer and winter labels); matching edges are labeled with similarity values 1, 1.0, 0.9 and 0.7.]
Sim(price, price) = 1
Sim(1300, 1400) = 0.9
Sim(1300, 1700) = 0.7
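The threshold check in this example can be replayed with the slide's own similarity values stored in a lookup table. Two assumptions are added here: pairs missing from the table score 1.0 if equal and 0.0 otherwise, and the weights are read in label order (nbeds 0.8, price 0.6).

```python
# Similarity values read off the example slide; other pairs fall back to exact match
SLIDE_SIM = {
    frozenset({"price"}): 1.0,     # Sim(price, price) = 1
    frozenset({1300, 1400}): 0.9,  # Sim(1300, 1400) = 0.9
    frozenset({1300, 1700}): 0.7,  # Sim(1300, 1700) = 0.7
}

def sim(a, b):
    """Symmetric lookup in the slide's table, with an assumed exact-match fallback."""
    return SLIDE_SIM.get(frozenset({a, b}), 1.0 if a == b else 0.0)

# Query pattern: label -> (preferred value, importance level); weight order assumed
QUERY = {"nbeds": (4, 0.8), "price": (1400, 0.6)}

def satisfied(candidate):
    """Check each preference's satisfaction degree against its importance level."""
    return {label: sim(candidate[label], value) >= importance
            for label, (value, importance) in QUERY.items()}

print(satisfied({"nbeds": 4, "price": 1300}))  # {'nbeds': True, 'price': True}
```

Here the cottage priced 1300 satisfies the price preference because 0.9 is at least its importance level of 0.6.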

13
Some Implementation Details: The Architecture of Our Querying System
[Figure: the index builder turns an XML collection (XML documents) into an indexed collection with a tag index, an attribute index, a data index and a term index; the query processor takes a query and returns an answer list.]
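An index builder with the four indexes named in the architecture could be sketched as below. The index layouts (what each key maps to) and the preorder node ids are assumptions for illustration; the slide gives only the index names. `Element.iter()` walks the tree in document (depth-first) order, so the ids are preorder ranks.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_indexes(xml_text: str):
    """Assign preorder ids to elements and fill four inverted indexes (hypothetical layout)."""
    tag_index = defaultdict(list)        # tag name -> node ids
    attribute_index = defaultdict(list)  # (attribute name, value) -> node ids
    data_index = defaultdict(list)       # non-empty text value of an element -> node ids
    term_index = defaultdict(list)       # lower-cased word in a text value -> node ids
    root = ET.fromstring(xml_text)
    for node_id, elem in enumerate(root.iter()):  # document (preorder) traversal
        tag_index[elem.tag].append(node_id)
        for name, value in elem.attrib.items():
            attribute_index[(name, value)].append(node_id)
        text = (elem.text or "").strip()
        if text:
            data_index[text].append(node_id)
            for term in re.findall(r"\w+", text.lower()):
                term_index[term].append(node_id)
    return tag_index, attribute_index, data_index, term_index

DOC = ('<cotglist><cottage identifier="40"><nbeds>4</nbeds><price>1300</price>'
       '<desc>Quiet stone cottage</desc></cottage></cotglist>')
tag_index, attribute_index, data_index, term_index = build_indexes(DOC)
print(tag_index["price"], attribute_index[("identifier", "40")], term_index["stone"])  # [3] [1] [4]
```

A query processor built on such indexes would look up candidate node ids per query label or term and then check structural relationships between them, which is what the ancestor-descendant test on the next slide is for.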

14
Indexing Method
- Goal: efficiently determine the ancestors and descendants of any node.
- Dietz's method (1982): a straightforward method that uses traversal order to determine the ancestor-descendant relationship.
- Why Dietz's method: for two given nodes x and y of a tree T, x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.
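Dietz's numbering can be sketched as follows: one depth-first traversal assigns each node a preorder and a postorder rank, after which the ancestor test is two integer comparisons. The tree below, given as a hypothetical node-to-children dictionary, mirrors the cottage example with invented node names.

```python
def number_nodes(tree, root):
    """Return preorder and postorder ranks for every node (tree: node -> list of children)."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre)  # rank assigned on entry
        for child in tree.get(node, []):
            visit(child)
        post[node] = len(post)  # rank assigned after all descendants

    visit(root)
    return pre, post

def is_ancestor(x, y, pre, post):
    """Dietz (1982): x is an ancestor of y iff pre(x) < pre(y) and post(x) > post(y)."""
    return pre[x] < pre[y] and post[x] > post[y]

# Hypothetical tree with invented node names
TREE = {
    "cotglist": ["cottage1", "cottage2"],
    "cottage1": ["nbeds1", "price1"],
    "cottage2": ["room", "price2"],
    "room": ["nbeds2"],
}
pre, post = number_nodes(TREE, "cotglist")
print(is_ancestor("cottage2", "nbeds2", pre, post))  # True
print(is_ancestor("cottage1", "nbeds2", pre, post))  # False
```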

15
Future Work
- Experiments within INEX (Initiative for the Evaluation of XML Retrieval)
- Improving the similarity functions (using a thesaurus, etc.)
- Introducing qualitative preferences (cheapest, nearest, small, etc.)

16
Thank You
Questions?
