Published by Bailey Knop. Modified over 3 years ago.

1
Cooperative Query Answering for Semistructured Data
Speakers: Chuan Lin & Xi Zhang
By Michael Barg and Raymond K. Wong

2
Outline
– Motivations
– Overview
– Basic Concepts
– Cooperative Query Processing
– Experiment

3
Motivations
XML data
– same semantic content
– very different structures

4
Example: same semantics, different structures
User Query: insurance claims related to smoking for woman
Court Transcript: insurance claim, plaintiff, woman, smoking
Insurance Record: insurance claim, insurer, woman, smoking

5
Motivations
No exact query result
User Query:
– phone number of Bob
– Who is the new sales manager
Data: personnel, sales manager, Joe, phone number, assistant sales manager, Bob, phone number, salesman

6
Overview
Goal:
– Return approximate answers for XML queries
– approximate: semantically and structurally similar
Solution:
– Return a set of results
– ranked by an overall score
score: indicates how well the subgraph containing the result satisfies the query criteria.

7
Basic Concepts: Query Tree
Query: /restaurant[.//Soho]/phone_number
Result Term: phone_number
For each edge:
– head: the end that is closer to the nearest result term
– tail: the other end
– In case of a tie, head is the end closer to the root
Query Tree: [Figure: query tree over restaurant, soho and phone_number, with the markers r, h and t on its nodes and edge ends]
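The head/tail rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_head_tail`, the edge-list input and the use of breadth-first search are not from the presentation, and the query tree is assumed to have restaurant as its root with soho and phone_number directly below it.

```python
from collections import deque

def assign_head_tail(edges, root, result_terms):
    """Label each query-tree edge (parent, child) with a head and a tail.

    Hypothetical helper: the head is the end nearer to the nearest result
    term; on a tie, the head is the end nearer to the root.
    """
    # Undirected adjacency list of the query tree.
    adj = {}
    for p, c in edges:
        adj.setdefault(p, []).append(c)
        adj.setdefault(c, []).append(p)

    def bfs(sources):
        # Distance from every node to the nearest node in `sources`.
        dist = {s: 0 for s in sources}
        queue = deque(sources)
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    to_result = bfs(list(result_terms))
    depth = bfs([root])

    labelled = []
    for p, c in edges:
        # Smaller distance to a result term wins; depth breaks ties.
        head, tail = sorted((p, c), key=lambda n: (to_result[n], depth[n]))
        labelled.append({"head": head, "tail": tail})
    return labelled

edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
labels = assign_head_tail(edges, "restaurant", {"phone_number"})
```

Under these assumptions the edge between restaurant and soho gets restaurant as its head, while the edge between restaurant and phone_number gets phone_number as its head.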

8
Basic Concepts: Converging Order
– Order of edges considered in query processing
– Converge on a result term
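The slide does not spell out how the converging order is computed. One plausible reading, offered here purely as an assumption, is that edges farthest from the result term are processed first, so that processing ends at the result term. The function name `converging_order` and the single-result-term restriction are choices made for this sketch.

```python
from collections import deque

def converging_order(edges, result_term):
    """Order query-tree edges so that processing ends at the result term.

    Assumption (not from the slides): an edge's rank is the distance of
    its nearer end from the result term, and larger ranks go first.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    # Breadth-first distances from the result term.
    dist = {result_term: 0}
    queue = deque([result_term])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    # Farthest edges first; the sort is stable, so ties keep input order.
    return sorted(edges, key=lambda e: min(dist[e[0]], dist[e[1]]), reverse=True)

edges = [("restaurant", "phone_number"), ("restaurant", "soho")]
order = converging_order(edges, "phone_number")
```

For the restaurant query this visits the edge to soho before the edge to phone_number.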

9
Basic Concepts: Similarity
Semantically similar topologies
[Figure: five topologies, (a) to (e), relating restaurant and soho; the nodes shown are restaurant, soho, address, eating_places and shopping_center]

10
Basic Concepts: Similarity (cont.)
Deviation Proximity (DP)
– Measures how far one structure deviates from a desired structure
– Given:
  r_a: data node with value a
  r_b: data node with value b
  Q(a,b): query tree edge
– DP: the distance from the actual position of r_b to the nearest position, r_b', that satisfies the topological relationship specified by Q(a,b)
Topological relationship: parent-child, ancestor-descendant

11
Deviation Proximity
Q(restaurant, soho) requires parent-child relationship
[Figure: the five topologies over restaurant, soho, address, eating_places and shopping_center, with some soho positions shown in parentheses as (soho)]
DP(restaurant, soho): 0, 2, 3, 1, 3

12
Deviation Proximity
Q(restaurant, soho) requires ancestor-descendant relationship
[Figure: the five topologies over restaurant, soho, address, eating_places and shopping_center, with some soho positions shown in parentheses as (soho)]
DP(restaurant, soho): 0, 2, 3, 0, 3
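To make the DP definition concrete, here is a minimal sketch under one interpretation that seems consistent with the values on the two Deviation Proximity slides; it is an assumption, not the authors' algorithm. The tree is a hypothetical child-to-parent dictionary, and DP is taken to be the number of edges between r_b and the nearest position that would satisfy the required relationship with r_a.

```python
def ancestors(parent, node):
    """Return [node, parent(node), ..., root] for a child-to-parent map."""
    chain = [node]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain

def tree_distance(parent, x, y):
    """Number of edges on the path between x and y in the tree."""
    ax, ay = ancestors(parent, x), ancestors(parent, y)
    depth_in_ay = {n: i for i, n in enumerate(ay)}
    for i, n in enumerate(ax):
        if n in depth_in_ay:  # first common ancestor
            return i + depth_in_ay[n]
    raise ValueError("nodes are not in the same tree")

def deviation_proximity(parent, r_a, r_b, relation):
    """Assumed reading of DP for relation 'parent-child' or
    'ancestor-descendant' required by the query edge Q(a, b)."""
    chain = ancestors(parent, r_b)
    if r_a in chain[1:]:
        k = chain.index(r_a)  # r_b sits k levels below r_a
        return 0 if relation == "ancestor-descendant" else k - 1
    # r_b is not below r_a: walk to r_a, then one step down to a child slot.
    return tree_distance(parent, r_b, r_a) + 1

# Hypothetical trees, each written as child -> parent.
child = {"restaurant": None, "soho": "restaurant"}
grandchild = {"restaurant": None, "address": "restaurant", "soho": "address"}
reversed_pair = {"soho": None, "restaurant": "soho"}
siblings = {"eating_places": None, "restaurant": "eating_places",
            "soho": "eating_places"}

dp_grandchild_pc = deviation_proximity(grandchild, "restaurant", "soho", "parent-child")
dp_grandchild_ad = deviation_proximity(grandchild, "restaurant", "soho", "ancestor-descendant")
```

With this reading, a soho node nested one level too deep deviates by 1 under a parent-child requirement and by 0 under an ancestor-descendant requirement, which is the only place where the two slides differ.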

13
Cooperative Query Processing
Input: a Query Tree Q_T, an XML Document Tree D_T
Output: ordered list of results, ranked by score
Cooperative Query Processing
– Structural proximity calculation
– Progressive Score

14
Cooperative Query Processing (cont.)
Progressively matching edges in Q_T with D_T
– Consider edges in converging order
– For each edge Q_T(a,b), where a is head and b is tail, get a list of (r_a, score) pairs, where
  r_a is a node in D_T with value a
  score is the progressive score of r_a w.r.t. the nearest r_b
– Use graph encoding to calculate the structural proximity of r_a and r_b

15
Structural Proximity Calculation
Encodings and Compressed Arrays
– Compact
– Preserve relationship to a larger graph
– Facilitate distance calculations
Proximity Searching

16
Encodings and Compressed Arrays
Basic Concepts:
– Common Node
– Terminal Node
– Annotated Node
Path representation
– Representing Single Path
– Representing Multiple Paths
– Representing Multiple Elements
Compressed Arrays
– Each encoding is a path/multi-path for a node/a set of nodes

17
Encodings and Compressed Arrays

18
Representing Single Path
1.1.1 y_1
1.2.1.1.1.1 y_2

19
Representing Multiple Paths
1.3 B.B.2.1.1 C.3 C.C.2 y_3

20
Representing Multiple Elements
1 A.A.1.1 y_1.2.1.1.1.1 y_2.3 B.B.2.1.1 C.3 C.C.2 y_3

21
Compressed Arrays

22
Drawback of Encoding
1 A.A.1 B.B.1 D.2 E. ?.2 C.C.1 F.2 G

23
Proximity Searching
Multi-Element Comparison
– Input:
  A compressed array, caN, containing the multi-element encoding of the Near Set.
  A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
– Output: dist, the length of the shortest path from EF to the closest element in the Near Set
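The idea of finding the closest Near Set element can be illustrated with a deliberately simplified stand-in. The sketch below does not use the compressed-array encoding from the slides: it assumes a plain tree in which every element has exactly one dotted root-to-node path, and it compares paths by their common prefix. The function names and the sample paths are hypothetical.

```python
def path_distance(path_a, path_b):
    """Edges between two tree nodes given their dotted root paths."""
    a, b = path_a.split("."), path_b.split(".")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    # Climb from each node to the deepest shared ancestor.
    return (len(a) - common) + (len(b) - common)

def min_dist(near_paths, find_path):
    """Shortest distance from the find element to any Near Set element."""
    return min(path_distance(p, find_path) for p in near_paths)

# Hypothetical paths, not taken from the slides.
near_set = ["1.1.1", "1.2.1"]
dist = min_dist(near_set, "1.2.3.1")
```

Here the element at 1.2.1 shares the prefix 1.2 with the find element, so it is closer than the element at 1.1.1. A graph with several paths per element, which is the case the compressed arrays are designed for, would need the minimum taken over all path pairs.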

24
Proximity Searching
[Figure: three examples, with MinDist = 5, MinDist = 4 and MinDist = 2]

25
Progressive Score
Accumulative Deviation Proximity (DP)
– Calculated from structural proximity
Boolean operator at Query Tree branches (node a with children b and c):
– prog(a) = prog(b) + prog(c)
– prog(a) = min(prog(b), prog(c))
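The two branch formulas can be sketched as a small recursive function. The slide text does not say which Boolean operator goes with which formula; the sketch assumes that the sum belongs to AND (both branches must be satisfied, so deviations accumulate) and the minimum to OR (the better branch suffices). The dictionary layout and the leaf DP values are hypothetical.

```python
def prog(node):
    """Progressive score of a query-tree node (lower means closer match).

    Assumed mapping, not confirmed by the slides: 'and' sums the child
    scores, 'or' takes the smallest child score.
    """
    op = node["op"]
    if op == "leaf":
        return node["dp"]
    scores = [prog(child) for child in node["children"]]
    if op == "and":
        return sum(scores)
    if op == "or":
        return min(scores)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical deviation values for the two children b and c.
b = {"op": "leaf", "dp": 1}
c = {"op": "leaf", "dp": 3}

score_and = prog({"op": "and", "children": [b, c]})
score_or = prog({"op": "or", "children": [b, c]})
```

With these sample values the AND branch scores 4 and the OR branch scores 1.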

26
Experiment
Query: //restaurant/soho
XML: [figure]
Query Result: [figure]

27
Thank you!

28
Questions & Answers
