Cooperative Query Answering for Semistructured Data Speakers: Chuan Lin & Xi Zhang By Michael Barg and Raymond K. Wong.


1 Cooperative Query Answering for Semistructured Data. Speakers: Chuan Lin & Xi Zhang. By Michael Barg and Raymond K. Wong

2 Outline: Motivations; Overview; Basic Concepts; Cooperative Query Processing; Experiment

3 Motivations: XML data with the same semantic content can have very different structures.

4 Example: same semantics, different structures. User query: insurance claims related to smoking for a woman. A court transcript uses the elements insurance claim, plaintiff, woman and smoking, while an insurance record uses insurance claim, insurer, woman and smoking.

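5 Motivations: no exact query result. User query: the phone number of Bob, who is the new sales manager. Data: the personnel records list a sales manager (Joe, with a phone number), an assistant sales manager (Bob, with a phone number) and a salesman. Bob is not recorded as sales manager, so the query has no exact match.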

6 Overview. Goal: return approximate answers to XML queries, where an answer is approximate if it is semantically and structurally similar to what the query asks for. Solution: return a set of results ranked by an overall score, which indicates how well the subgraph containing each result satisfies the query criteria.

7 Basic Concepts: Query Tree. Query: /restaurant[.//Soho]/phone_number, whose result term is phone_number. For each edge, the head is the end closer to the nearest result term and the tail is the other end; in case of a tie, the head is the end closer to the root. Query tree (figure): root restaurant (r) with children soho and phone_number, each edge labelled with its head (h) and tail (t).
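The head/tail rule can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes every query node is identified by a unique label, counts distances in query-tree edges, and `assign_heads` is a hypothetical helper name.

```python
from collections import deque


def assign_heads(edges, root, result_terms):
    """Return {(parent, child): (head, tail)} for every query-tree edge.

    edges: list of (parent, child) label pairs; root: the root label;
    result_terms: labels of the nodes the query returns.
    """
    # Undirected adjacency list of the query tree.
    adj = {}
    for p, c in edges:
        adj.setdefault(p, []).append(c)
        adj.setdefault(c, []).append(p)

    def bfs(sources):
        # Multi-source BFS: edges from each node to the nearest source.
        dist = {s: 0 for s in sources}
        queue = deque(sources)
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist

    to_result = bfs(list(result_terms))
    depth = bfs([root])
    heads = {}
    for p, c in edges:
        # Head = end closer to the nearest result term; ties go to the
        # end closer to the root (smaller depth).
        head = min((p, c), key=lambda n: (to_result[n], depth[n]))
        tail = c if head == p else p
        heads[(p, c)] = (head, tail)
    return heads
```

For the slide's query tree, edge (restaurant, soho) gets head restaurant and tail soho, while edge (restaurant, phone_number) gets head phone_number and tail restaurant.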

8 Basic Concepts: Converging Order. The converging order is the order in which query-tree edges are considered during query processing; it converges on a result term.
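The slide does not say how the order is constructed. A minimal sketch, assuming a single result term and assuming that "converging" means handling the edges farthest from the result term first, so that each edge is processed only after the edges beyond it; `converging_order` is a hypothetical name.

```python
from collections import deque


def converging_order(edges, result_term):
    """Return query edges as (head, tail) pairs, farthest from result_term first.

    Assumes a single result term, so adjacent nodes never tie on distance.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # BFS distance (in edges) from the result term to every query node.
    dist = {result_term: 0}
    queue = deque([result_term])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Orient each edge: the head is the end nearer the result term.
    oriented = [(u, v) if dist[u] < dist[v] else (v, u) for u, v in edges]
    # Edges whose tails are farthest away come first.
    return sorted(oriented, key=lambda e: dist[e[1]], reverse=True)
```

For /restaurant[.//Soho]/phone_number this yields (restaurant, soho) before (phone_number, restaurant), so the soho constraint is folded into restaurant before restaurant is matched against phone_number.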

9 Basic Concepts: Similarity. Semantically similar topologies (figure, panels (a) to (e): small trees built from the nodes restaurant, soho, address, eating_places and shopping_center in different arrangements).

10 Basic Concepts: Similarity (cont.). Deviation Proximity (DP) measures how far one structure deviates from a desired structure. Given $r_a$, a data node with value a, $r_b$, a data node with value b, and $Q(a,b)$, a query-tree edge, DP is the distance from the actual position of $r_b$ to the nearest position $r_b'$ that satisfies the topological relationship specified by $Q(a,b)$. Topological relationships: parent-child and ancestor-descendant.

11 Deviation Proximity (figure: the five trees of slide 9, each with the nearest valid position of soho marked). Q(restaurant, soho) requires a parent-child relationship; DP(restaurant, soho) for the five trees, as listed in the figure: 0, 2, 3, 1, 3.

12 Deviation Proximity (figure: the same five trees). Q(restaurant, soho) requires an ancestor-descendant relationship; DP(restaurant, soho) for the five trees: 0, 2, 3, 0, 3. Only the fourth value changes (from 1 to 0), because the looser relationship is already satisfied there.
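Under one reading of the definition, DP counts the edges between $r_b$ and the nearest position, existing or newly created, at which $r_b$ would satisfy $Q(a,b)$. The sketch below implements that reading. It is an assumption, not the paper's algorithm; the helper names are hypothetical, and it walks plain parent maps instead of the encodings introduced later.

```python
def ancestors(parent, n):
    """Return [n, parent(n), ..., root] for a tree given as a parent map."""
    path = [n]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(parent, x, y):
    """Number of edges on the path between x and y (same tree assumed)."""
    ax, ay = ancestors(parent, x), ancestors(parent, y)
    common = set(ax) & set(ay)
    # Steps from each node up to the lowest common ancestor.
    up_x = next(i for i, n in enumerate(ax) if n in common)
    up_y = next(i for i, n in enumerate(ay) if n in common)
    return up_x + up_y


def deviation_proximity(parent, r_a, r_b, relation):
    """Edges from r_b to the nearest position satisfying Q(a, b).

    relation: "parent-child" or "ancestor-descendant".
    """
    path = ancestors(parent, r_b)
    if r_a in path[1:]:  # r_b is already a proper descendant of r_a
        if relation == "ancestor-descendant":
            return 0
        # path.index(r_a) is 1 for a child; slide r_b up to the child level.
        return path.index(r_a) - 1
    # Otherwise r_b must move to a (possibly new) child position under r_a.
    return tree_distance(parent, r_a, r_b) + 1
```

For the tree restaurant/address/soho this returns 1 under parent-child and 0 under ancestor-descendant, which is consistent with the single value that drops from 1 to 0 between slides 11 and 12.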

13 Cooperative Query Processing. Input: a query tree $Q_T$ and an XML document tree $D_T$. Output: an ordered list of ⟨result node, score⟩ pairs. Cooperative query processing has two parts: structural proximity calculation and progressive scoring.

14 Cooperative Query Processing (cont.). Edges of $Q_T$ are matched progressively against $D_T$, in converging order. For each edge $Q_T(a,b)$, where a is the head and b is the tail, get a list of ⟨$r_a$, score⟩ pairs, where $r_a$ is a node in $D_T$ with value a and score is the progressive score of $r_a$ with respect to the nearest $r_b$. The graph encoding is used to calculate the structural proximity of $r_a$ and $r_b$.
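One matching step can be sketched as follows, under stated assumptions: plain tree distance stands in for the encoding-based structural proximity, "nearest $r_b$" is read as the $r_b$ that minimises proximity plus its own progressive score, and `match_edge` and its parameters are hypothetical names.

```python
def ancestors(parent, n):
    """Return [n, parent(n), ..., root] for a data tree given as a parent map."""
    path = [n]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def tree_distance(parent, x, y):
    """Edges between x and y; a stand-in for structural proximity."""
    ax, ay = ancestors(parent, x), ancestors(parent, y)
    common = set(ax) & set(ay)
    # Index of the lowest common ancestor in each upward path.
    return next(i for i, n in enumerate(ax) if n in common) + next(
        i for i, n in enumerate(ay) if n in common
    )


def match_edge(parent, label, head, tail_scores, proximity=tree_distance):
    """Return [(r_a, score)] sorted by score for a query edge with head value `head`.

    tail_scores maps each data node r_b carrying the tail value to its
    progressive score from earlier edges (0 if the tail is a query leaf).
    """
    results = []
    for r_a, value in label.items():
        if value != head:
            continue
        best = min(
            (proximity(parent, r_a, r_b) + s for r_b, s in tail_scores.items()),
            default=None,  # no candidate r_b at all
        )
        if best is not None:
            results.append((r_a, best))
    return sorted(results, key=lambda pair: pair[1])
```

The returned list, keyed by $r_a$, becomes the `tail_scores` input for the next edge in converging order, whose tail is this edge's head.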

15 Structural Proximity Calculation. Two parts: encodings and compressed arrays, which are compact, preserve their relationship to the larger graph and facilitate distance calculations; and proximity searching.

16 Encodings and Compressed Arrays. Basic concepts: common node, terminal node, annotated node. Path representation: representing a single path, representing multiple paths, representing multiple elements. Compressed arrays: each encoding is a path or multi-path for a node or a set of nodes.

17 Encodings and Compressed Arrays

18 Representing Single Path (figure: single-path encodings of elements y1 and y2)
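A single path can be written as the sequence of child positions from the root, in the style of Dewey numbering. This sketch assumes that reading and a 1-based numbering with the root written as 1; it omits the annotated and terminal markers of the paper's encoding, and `path_encodings` and `encoding_string` are hypothetical names.

```python
def path_encodings(children, root):
    """Map every node to its root-to-node path of 1-based child positions."""
    codes = {root: (1,)}  # assumption: the root itself is numbered 1
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(children.get(node, []), start=1):
            codes[child] = codes[node] + (i,)
            stack.append(child)
    return codes


def encoding_string(code):
    """Render a path tuple in dotted form, e.g. (1, 2, 1) -> "1.2.1"."""
    return ".".join(str(i) for i in code)
```

For a root with children a and b, where a has child y1 and b has a child c with child y2, the encodings are 1.1.1 for y1 and 1.2.1.1 for y2.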

19 Representing Multiple Paths (figure: a multi-path encoding of element y3 involving nodes B and C)

20 Representing Multiple Elements (figure: one encoding covering elements y1, y2 and y3, involving nodes A, B and C)
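One way to represent several elements at once is to merge their root-to-element paths into a prefix tree. In this sketch, prefixes shared by two or more paths play the role of common nodes and full paths play the role of terminal nodes. That mapping is an assumption about the slide's terms, annotated nodes are not modelled, and `merge_paths` is a hypothetical name.

```python
def merge_paths(paths):
    """Merge root-to-element paths (tuples of child positions) into a prefix tree.

    Returns (trie, common, terminal):
      trie     maps each prefix to the set of child positions below it,
      common   holds prefixes shared by two or more paths,
      terminal holds the full paths themselves.
    """
    counts = {}
    trie = {(): set()}
    for path in paths:
        for depth in range(1, len(path) + 1):
            prefix = path[:depth]
            counts[prefix] = counts.get(prefix, 0) + 1
            trie.setdefault(prefix, set())
            # Link the prefix to its parent prefix via its last position.
            trie[path[:depth - 1]].add(path[depth - 1])
    common = {p for p, n in counts.items() if n >= 2}
    terminal = set(paths)
    return trie, common, terminal
```

Shared prefixes are stored once, which is the compactness the encodings aim for.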

21 Compressed Arrays

22 Drawback of Encoding (figure: an encoding over nodes A to G in which one position is marked "?")

23 Proximity Searching: Multi-Element Comparison. Input: a compressed array caN containing the multi-element encoding of the Near Set, and a compressed array caF containing the multi-path or path encoding of all paths from the root to the specified element $E_F$ of the Find Set. Output: dist, the length of the shortest path from $E_F$ to the closest element in the Near Set.
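With root-to-node paths of child positions, the tree distance between two nodes follows from their longest common prefix, which is the path to their lowest common ancestor. The sketch below applies this to the Near Set/Find Set comparison. It works on plain path tuples rather than the compressed arrays caN and caF, and `path_distance` and `min_dist` are hypothetical names.

```python
def lcp_length(p, q):
    """Length of the longest common prefix of two position paths."""
    n = 0
    for x, y in zip(p, q):
        if x != y:
            break
        n += 1
    return n


def path_distance(p, q):
    """Tree distance between two nodes given their root-to-node paths."""
    k = lcp_length(p, q)  # depth of the lowest common ancestor path
    return (len(p) - k) + (len(q) - k)


def min_dist(near_set, e_f):
    """Shortest distance from the Find Set element e_f to any Near Set element."""
    return min(path_distance(e_f, e_n) for e_n in near_set)
```

For example, the paths 1.1.1 and 1.2.3 share only the prefix 1, so they are 2 + 2 = 4 edges apart.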

24 Proximity Searching (figure: three stages of the search, showing MinDist = 5, MinDist = 4 and MinDist = 2)

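25 Progressive Score. The progressive score is the accumulated deviation proximity (DP), calculated from structural proximity. Boolean operators at query-tree branches combine child scores: for a branch node a with children b and c, a conjunction gives prog(a) = prog(b) + prog(c) and a disjunction gives prog(a) = min(prog(b), prog(c)).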
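The two branch rules can be applied recursively over the query tree. This minimal sketch assumes that lower scores mean closer matches, that leaf scores (their accumulated DP) are already known, and that branches default to conjunction; `prog` and its parameters are hypothetical names that mirror the slide's notation.

```python
def prog(node, children, op, leaf_score):
    """Progressive score of a query node; lower means a closer match.

    children:   query node -> list of child query nodes
    op:         branch node -> "and" or "or" (default "and")
    leaf_score: query leaf -> its accumulated DP
    """
    kids = children.get(node, [])
    if not kids:
        return leaf_score[node]
    scores = [prog(k, children, op, leaf_score) for k in kids]
    # AND: every branch must be matched, so deviations accumulate.
    # OR: the best-matching branch suffices.
    return sum(scores) if op.get(node, "and") == "and" else min(scores)
```

With leaf scores prog(b) = 1 and prog(c) = 3, node a scores 4 under AND and 1 under OR.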

26 Experiment. Query: //restaurant/soho (figure: the input XML document and the query result)

27 Thank you!

28 Questions & Answers

