# Cooperative Query Answering for Semistructured Data Speakers: Chuan Lin & Xi Zhang By Michael Barg and Raymond K. Wong.



Outline
- Motivations
- Overview
- Basic Concepts
- Cooperative Query Processing
- Experiment

Motivations
- XML data with the same semantic content can have very different structures.

Example: same semantics, different structures
- User query: insurance claims related to smoking for women
- Court Transcript (node labels): insurance claim, plaintiff, woman, smoking
- Insurance Record (node labels): insurance claim, insurer, woman, smoking

Motivations
- A query may have no exact result.
- User query: phone number of Bob, who is the new sales manager
- Data (node labels in the figure): personnel, sales manager, Joe, phone number, assistant sales manager, Bob, phone number, salesman

Overview
- Goal:
  - Return approximate answers for XML queries
  - Approximate: semantically and structurally similar
- Solution:
  - Return a set of results
  - Results are ranked by an overall score
- Score: indicates how well the subgraph containing the result satisfies the query criteria.

Basic Concepts: Query Tree
- Query: /restaurant[.//Soho]/phone_number
- Result term: phone_number
- For each edge:
  - head: the end that is closer to the nearest result term
  - tail: the other end
  - In case of a tie, the head is the end closer to the root.
- Query tree (figure): nodes restaurant, soho and phone_number, annotated with the labels r, h and t.

Basic Concepts: Converging Order
- The order in which edges are considered in query processing
- The order converges on a result term
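The head/tail labelling and one possible converging order can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: the function names `orient_edges` and `converging_order` are hypothetical, and processing the edges farthest from the result term first is an assumption about what "converging" means.

```python
from collections import deque


def bfs_dist(adj, sources):
    """Edge distance from every node to the nearest node in `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def orient_edges(edges, root, result_terms):
    """Label each query-tree edge as (head, tail)."""
    adj = {}
    for parent, child in edges:
        adj.setdefault(parent, []).append(child)
        adj.setdefault(child, []).append(parent)
    to_result = bfs_dist(adj, list(result_terms))
    to_root = bfs_dist(adj, [root])
    oriented = []
    for u, v in edges:
        # Head: the end closer to the nearest result term;
        # on a tie, the end closer to the root.
        head, tail = sorted((u, v), key=lambda n: (to_result[n], to_root[n]))
        oriented.append((head, tail))
    return oriented, to_result


def converging_order(oriented, to_result):
    # Assumption: edges whose head is farthest from a result term come first,
    # so processing moves step by step toward the result term.
    return sorted(oriented, key=lambda e: to_result[e[0]], reverse=True)


# Query tree for /restaurant[.//Soho]/phone_number
edges = [("restaurant", "soho"), ("restaurant", "phone_number")]
oriented, to_result = orient_edges(edges, "restaurant", {"phone_number"})
order = converging_order(oriented, to_result)
```

For the example query, the edge to soho is handled before the edge that ends in the result term phone_number.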

Basic Concepts: Similarity
- Semantically similar topologies
- Figure: five example structures, (a) to (e), built from the nodes restaurant, soho, address, eating_places and shopping_center.

Basic Concepts: Similarity (cont.)
- Deviation Proximity (DP) measures how far one structure deviates from a desired structure.
- Given:
  - r_a: data node with value a
  - r_b: data node with value b
  - Q(a,b): query tree edge
- DP: the distance from the actual position of r_b to the nearest position, r_b', that satisfies the topological relationship specified by Q(a,b).
- Topological relationship: parent-child or ancestor-descendant.

Deviation Proximity
- Q(restaurant, soho) requires a parent-child relationship.
- Figure: the five example structures, with positions marked (soho).
- DP(restaurant, soho) values shown: 0, 2, 3, 1, 3

Deviation Proximity
- Q(restaurant, soho) requires an ancestor-descendant relationship.
- Figure: the five example structures, with positions marked (soho).
- DP(restaurant, soho) values shown: 0, 2, 3, 0, 3
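One plausible reading of Deviation Proximity is the number of tree edges that r_b would have to move to reach the nearest position satisfying the required relationship. The sketch below follows that reading; it is an assumption, since the slides do not spell out the distance computation. The function name `deviation_proximity`, the parent-pointer representation and the toy trees are hypothetical.

```python
def path_to_root(parent, node):
    """Return [node, parent(node), ..., root] using a parent-pointer dict."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, u, v):
    """Number of edges on the tree path between u and v."""
    up_u = path_to_root(parent, u)
    steps_from_u = {n: i for i, n in enumerate(up_u)}
    for j, n in enumerate(path_to_root(parent, v)):
        if n in steps_from_u:  # first common ancestor
            return steps_from_u[n] + j
    raise ValueError("nodes are in different trees")


def deviation_proximity(parent, r_a, r_b, relation):
    """Assumed DP: distance from r_b to the nearest satisfying position."""
    up_b = path_to_root(parent, r_b)
    if r_a in up_b[1:]:  # r_b is a proper descendant of r_a
        depth = up_b.index(r_a)
        # anc-desc is already satisfied; parent-child needs depth 1
        return 0 if relation == "anc-desc" else depth - 1
    # Otherwise the nearest satisfying position is a child slot of r_a,
    # one edge beyond r_a on the path from r_b.
    return tree_distance(parent, r_a, r_b) + 1


# Hypothetical toy trees: child -> parent
direct = {"restaurant": None, "soho": "restaurant"}
nested = {"restaurant": None, "address": "restaurant", "soho": "address"}
inverted = {"soho": None, "restaurant": "soho"}
siblings = {"shopping_center": None,
            "restaurant": "shopping_center",
            "soho": "shopping_center"}
```

Under this reading, an intermediate address node costs 1 for a parent-child edge and 0 for an ancestor-descendant edge, which is the only case where the two relationships give different values.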

Cooperative Query Processing
- Input: a query tree Q_T and an XML document tree D_T
- Output: an ordered list of scored results
- Components:
  - Structural proximity calculation
  - Progressive score

Cooperative Query Processing (cont.)
Progressively match the edges in Q_T against D_T:
- Consider edges in converging order.
- For each edge Q_T(a,b), where a is the head and b is the tail, get a list of (r_a, score) pairs, where
  - r_a is a node in D_T with value a
  - score is the progressive score of r_a with respect to the nearest r_b
- Use graph encoding to calculate the structural proximity of r_a and r_b.

Structural Proximity Calculation
- Encodings and Compressed Arrays
  - Compact
  - Preserve the relationship to a larger graph
  - Facilitate distance calculations
- Proximity Searching

Encodings and Compressed Arrays
- Basic concepts:
  - Common Node
  - Terminal Node
  - Annotated Node
- Path representation:
  - Representing a Single Path
  - Representing Multiple Paths
  - Representing Multiple Elements
- Compressed Arrays:
  - Each encoding is a path or multi-path for a node or a set of nodes.

Encodings and Compressed Arrays

Representing Single Path 1.1.1 y 1 1.2.1.1.1.1 y 2

Representing Multiple Paths 1.3 B.B.2.1.1 C.3 C.C.2 y 3

Representing Multiple Elements 1 A.A.1.1 y 1.2.1.1.1.1 y 2.3 B.B.2.1.1 C.3 C.C.2 y 3

Compressed Arrays

Drawback of Encoding 1 A.A.1 B.B.1 D.2 E. ?.2 C.C.1 F.2 G

Proximity Searching: Multi-Element Comparison
- Input:
  - A compressed array, caN, containing the multi-element encoding of the Near Set.
  - A compressed array, caF, containing the multi-path encoding or path encoding of all paths from the root to the specified element of the Find Set, EF.
- Output: dist, the length of the shortest path from EF to the closest element in the Near Set.
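The idea of computing the shortest distance from a Find Set element to the Near Set can be illustrated with plain root-to-node paths. This is a simplified sketch: it stores each path as a tuple of child positions (a Dewey-style label), which is an assumption and not the compressed-array encoding described in the slides. The names `path_distance` and `min_dist` are hypothetical.

```python
def path_distance(p, q):
    """Tree distance between two nodes given their root-to-node paths."""
    common = 0
    for x, y in zip(p, q):
        if x != y:
            break
        common += 1
    # Climb from p to the deepest shared ancestor, then descend to q.
    return (len(p) - common) + (len(q) - common)


def min_dist(near_paths, find_path):
    """Shortest distance from the Find Set element to any Near Set element."""
    return min(path_distance(find_path, p) for p in near_paths)


# Hypothetical paths: each tuple lists child positions from the root.
near_set = [(1, 1, 1), (1, 2, 1, 1)]
find_element = (1, 2, 3)
dist = min_dist(near_set, find_element)
```

Here the second Near Set element shares the prefix (1, 2) with the Find Set element, so it is closer than the first.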

Proximity Searching (figure): MinDist = 5, MinDist = 4, MinDist = 2

Progressive Score
- Accumulated Deviation Proximity (DP)
  - Calculated from structural proximity
- Boolean operator at query tree branches (node a with children b and c):
  - prog(a) = prog(b) + prog(c)
  - prog(a) = min(prog(b), prog(c))
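The two branch formulas can be combined into a small recursive function. The slide does not state which Boolean operator goes with which formula; the sketch below assumes that the sum belongs to AND and the minimum to OR, and that a lower score means less deviation. The dictionary layout and the `prog` signature are hypothetical.

```python
def prog(node):
    """Progressive score of a query-tree node (lower assumed to be better)."""
    children = node.get("children", [])
    if not children:
        # Leaf: use the score already attached to it.
        return node["score"]
    scores = [prog(child) for child in children]
    if node["op"] == "and":
        # Assumed AND branch: prog(a) = prog(b) + prog(c)
        return sum(scores)
    # Assumed OR branch: prog(a) = min(prog(b), prog(c))
    return min(scores)


b = {"score": 1}
c = {"score": 3}
and_branch = {"op": "and", "children": [b, c]}
or_branch = {"op": "or", "children": [b, c]}
```

With these values, the AND branch accumulates both deviations (4), while the OR branch keeps only the better alternative (1).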

Experiment
- Query: //restaurant/soho
- XML: (figure)
- Query Result: (figure)

Thank you!