Published by Lily Foley. Modified over 2 years ago.

1
An Abstract Framework for Generating Maximal Answers to Queries
Sara Cohen, Yehoshua Sagiv
ICDT 2005

2
- Motivation
- Queries and Databases
- Answers and Semantics
- Graph Properties

3
The Problem
- In many different domains, we are given the option to query some source of information
- Usually, the user only gets results if the query can be completely answered (satisfied)
- In many domains, this is not appropriate, e.g.:
  - The user is not familiar with the database
  - The database does not contain complete information
  - There is a mismatch between the ontology of the user and that of the database
  - The query is a search that is not expected to be correct

4
Search for papers by Smith that appeared in ICDT 2004

5
Sorry, no matching record found

6
Search for buses from Haifa-Technion to Ben Gurion Airport

7
There is no direct bus line between the required destinations

8
Search for buses to Ben Gurion Airport

9
Must choose From and To

10
What Do Users Need?
- Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist
- These partial answers should contain maximal information
Main Problems:
- What should be the semantics of partial answers?
- How can all partial answers be efficiently computed?

11
Previous Work
- Many solutions have been given for the main problems
- Solutions differ according to the problem domain
Examples:
- Full disjunctions: Galindo-Legaria (94); Rajaraman, Ullman (96); Kanza, Sagiv (03)
- Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99)
- FleXPath: Amer-Yahia, Lakshmanan, Pandit (04)
- Interconnections: Cohen, Kanza, Sagiv (03)

12
Our Contribution
- In the past, for each semantics considered, the query evaluation problem had to be studied anew
In this paper, we:
- Present a general framework for defining semantics for partial answers
  - The framework is general enough to cover most previously studied semantics
  - The query evaluation problem can be solved once within this framework and reused for new semantics
  - Results improve upon previous evaluation algorithms
- Present the relationship between this problem and the maximal P-subgraph problem

14
Databases
Databases are modeled as data graphs: (V, E, r, l_V, l_E)
- r: Can have a designated root
- l_V: Labels on the vertices
- l_E: Labels on the edges
Note:
- Nodes correspond to data items
- Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases

15
XML as a Data Graph
[Figure: data graph of an XML document. Labels and values appearing in the figure: Technion, University, Name, Dept, Faculty, Professor, Lecturer, Teaches, Computer Science, Chana Israeli, Databases, Bioinformatics, Avi Levy, Biology, Molecular Biology.]

16
Relational Database as a Data Graph

Climates
Country  Climate
Canada   diverse
UK       temperate
USA      temperate

Accommodations
Country  City      Hotel
UK       London    Plaza
Canada   Montreal  Hilton
Canada   Toronto   Ramada

Sites
Country  City    Site
UK       London  Buckingham
USA      NY      Metropolitan

17-20
Relational Database as a Data Graph
Relation by relation, each tuple becomes a node labeled with its relation and its values:
- (C, (Canada, diverse))
- (C, (UK, temperate))
- (C, (USA, temperate))
- (A, (UK, London, Plaza))
- (A, (Canada, Montreal, Hilton))
- (A, (Canada, Toronto, Ramada))
- (S, (UK, London, Buckingham))
- (S, (USA, NY, Metropolitan))
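The tuple-to-node modeling above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `DataGraph`, its field names, the helper `from_relations`, and the node ids (`c1`, `a2`, ...) are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Set, Tuple


@dataclass
class DataGraph:
    """Data graph (V, E, r, l_V, l_E). Field names are illustrative assumptions."""
    vertices: Set[Hashable] = field(default_factory=set)                  # V
    edges: Set[Tuple[Hashable, Hashable]] = field(default_factory=set)    # E
    root: Optional[Hashable] = None                                       # r (optional)
    vertex_labels: Dict[Hashable, object] = field(default_factory=dict)   # l_V
    edge_labels: Dict[Tuple[Hashable, Hashable], object] = field(default_factory=dict)  # l_E


def from_relations(relations):
    """Create one node per tuple, labeled (relation name, tuple). No edges are created."""
    graph = DataGraph()
    for name, rows in relations.items():
        for i, row in enumerate(rows, start=1):
            node = f"{name.lower()}{i}"  # hypothetical ids such as "c1", "a2"
            graph.vertices.add(node)
            graph.vertex_labels[node] = (name, row)
    return graph


db = from_relations({
    "C": [("Canada", "diverse"), ("UK", "temperate"), ("USA", "temperate")],
    "A": [("UK", "London", "Plaza"), ("Canada", "Montreal", "Hilton"),
          ("Canada", "Toronto", "Ramada")],
    "S": [("UK", "London", "Buckingham"), ("USA", "NY", "Metropolitan")],
})
print(db.vertex_labels["a2"])
```

The graph has eight labeled nodes and no edges, matching the node list on the slide; relationships between tuples are expressed later through the query's edge constraints.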

21
Queries
Queries are modeled as query graphs: (V, E, r, C_V, C_E, s)
- r: Can have a designated root
- C_V: Vertex constraints on the vertices (basically, a boolean function on vertices)
- C_E: Edge constraints on the edges (basically, a boolean function on pairs of vertices)
- s: A structural constraint, one of the letters C, R, N (defines the required structure of answers, i.e., connected, rooted or none)
Note: Nodes correspond to query variables

22
XML Query as a Graph
Returns faculty members from the Biology Department
[Figure: query graph. Vertex constraints: "= University", "= Dept and ContainsText(Biology)", "= Faculty", "= Name". Edge constraints: Is Descendant, Is GrandChild, Is Child.]
Structural Constraint: Rooted

23
Join Query as a Graph
Query nodes:
- q1: Belongs to: C
- q2: Belongs to: A
- q3: Belongs to: S
Edge constraints:
- (q1, q2): C.Country = A.Country
- (q1, q3): C.Country = S.Country
- (q2, q3): A.Country = S.Country and A.City = S.City
Structural Constraint: Connected
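As a rough sketch, the join query can be written down as a query graph (V, E, r, C_V, C_E, s) in Python. The class `QueryGraph`, its field names, and the representation of a database node as a `(relation, tuple)` pair are assumptions made for illustration; the last edge constraint compares countries and cities, as the Sites relation has no other shared columns.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

Label = tuple  # assumed shape of a database node label: (relation name, tuple of values)


@dataclass
class QueryGraph:
    """Query graph (V, E, r, C_V, C_E, s). Field names are illustrative assumptions."""
    vertices: Set[str]                                                     # V
    edges: Set[Tuple[str, str]]                                            # E
    vertex_constraints: Dict[str, Callable[[Label], bool]]                 # C_V
    edge_constraints: Dict[Tuple[str, str], Callable[[Label, Label], bool]]  # C_E
    structural: str = "N"                                                  # s: "C", "R" or "N"
    root: Optional[str] = None                                             # r (optional)


def belongs_to(relation):
    """Vertex constraint: the database node comes from the given relation."""
    return lambda label: label[0] == relation


def country(label):
    return label[1][0]


def city(label):
    return label[1][1]


join_query = QueryGraph(
    vertices={"q1", "q2", "q3"},
    edges={("q1", "q2"), ("q1", "q3"), ("q2", "q3")},
    vertex_constraints={
        "q1": belongs_to("C"),
        "q2": belongs_to("A"),
        "q3": belongs_to("S"),
    },
    edge_constraints={
        ("q1", "q2"): lambda c, a: country(c) == country(a),
        ("q1", "q3"): lambda c, s: country(c) == country(s),
        ("q2", "q3"): lambda a, s: country(a) == country(s) and city(a) == city(s),
    },
    structural="C",  # connected
)
```

Keeping the constraints as plain boolean functions mirrors the slide's description of C_V and C_E as boolean functions on vertices and on pairs of vertices.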

25
Assignment Graphs
- Assignment graphs are used to compactly represent assignments of query nodes to database nodes
- Basically, the assignment graph for Q and D, written Q × D, has:
  - A node (q, d) for each pair q ∈ Q and d ∈ D such that d satisfies the constraint on q
  - An edge ((q, d), (q', d')) if there is an edge (q, q') in Q and (d, d') satisfies the constraint on (q, q')
  - Possibly a root (details omitted)

26
[Figure: assignment graph for the join query over the relational database, built up step by step]
Query nodes: q1 (Belongs to: C), q2 (Belongs to: A), q3 (Belongs to: S)
Edge constraints: C.Country = A.Country; C.Country = S.Country; A.Country = S.Country and A.City = S.City
Database nodes (labeled c1, c2, c3, a1, a2, a3, s1, s2 in the figure):
- (C, (Canada, diverse))
- (C, (UK, temperate))
- (C, (USA, temperate))
- (A, (UK, London, Plaza))
- (A, (Canada, Montreal, Hilton))
- (A, (Canada, Toronto, Ramada))
- (S, (UK, London, Buckingham))
- (S, (USA, NY, Metropolitan))
Assignment graph nodes: (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)
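The assignment graph construction can be sketched directly from its definition. The code below is a minimal illustration, not the authors' implementation: the function name `assignment_graph` is hypothetical, and it assumes that the labels c1, c2, c3, a1, a2, a3, s1, s2 refer to the tuples in the order they are listed.

```python
# Database nodes: id -> (relation, tuple). The id-to-tuple mapping is an assumption.
data = {
    "c1": ("C", ("Canada", "diverse")),
    "c2": ("C", ("UK", "temperate")),
    "c3": ("C", ("USA", "temperate")),
    "a1": ("A", ("UK", "London", "Plaza")),
    "a2": ("A", ("Canada", "Montreal", "Hilton")),
    "a3": ("A", ("Canada", "Toronto", "Ramada")),
    "s1": ("S", ("UK", "London", "Buckingham")),
    "s2": ("S", ("USA", "NY", "Metropolitan")),
}

# Vertex constraints of the join query ("Belongs to" a relation).
vertex_constraints = {
    "q1": lambda label: label[0] == "C",
    "q2": lambda label: label[0] == "A",
    "q3": lambda label: label[0] == "S",
}

# Edge constraints of the join query (country is field 0, city is field 1).
edge_constraints = {
    ("q1", "q2"): lambda c, a: c[1][0] == a[1][0],
    ("q1", "q3"): lambda c, s: c[1][0] == s[1][0],
    ("q2", "q3"): lambda a, s: a[1][0] == s[1][0] and a[1][1] == s[1][1],
}


def assignment_graph(vertex_constraints, edge_constraints, data):
    """Build the nodes (q, d) and edges ((q, d), (q', d')) of the assignment graph."""
    # Node (q, d) whenever d satisfies the vertex constraint on q.
    nodes = {(q, d)
             for q, ok in vertex_constraints.items()
             for d, label in data.items() if ok(label)}
    # Edge whenever (q, q') is a query edge and (d, d') satisfies its constraint.
    edges = set()
    for (q1, q2), ok in edge_constraints.items():
        for (qa, d1) in nodes:
            for (qb, d2) in nodes:
                if qa == q1 and qb == q2 and ok(data[d1], data[d2]):
                    edges.add(((q1, d1), (q2, d2)))
    return nodes, edges


nodes, edges = assignment_graph(vertex_constraints, edge_constraints, data)
for edge in sorted(edges):
    print(edge)
```

Under these assumptions the graph has the eight nodes listed above and six edges, for example between (q2, a1) and (q3, s1), since both tuples refer to London in the UK.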

32
Partial Assignment
- A partial assignment is any subgraph of Q × D that does not contain two different nodes (q, d) and (q, d')
  - Otherwise, it would map the node q to two different database nodes
- Can distinguish special types of partial assignments:
  - Vertex complete: every query node must appear in the partial assignment
  - Edge complete: every edge constraint between query variables in the partial assignment holds
  - Structurally consistent: the partial assignment satisfies the query's structural constraint
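These definitions translate into small predicates. The sketch below makes several assumptions for illustration: a partial assignment is represented by its node set (that is, as an induced subgraph), the structural constraint is "connected", and the edge list is the one obtained for the join example when c1, c2, c3, a1, a2, a3, s1, s2 name the tuples in listed order. All function names are hypothetical.

```python
QUERY_VERTICES = {"q1", "q2", "q3"}
QUERY_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}

# Edges of the assignment graph for the join example (assumed node naming).
ASSIGNMENT_EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def is_partial_assignment(nodes):
    """No query node is mapped to two different database nodes."""
    query_nodes = [q for q, _ in nodes]
    return len(query_nodes) == len(set(query_nodes))


def vertex_complete(nodes):
    """Every query node appears."""
    return {q for q, _ in nodes} == QUERY_VERTICES


def edge_complete(nodes):
    """Every query edge between assigned query nodes is satisfied."""
    chosen = dict(nodes)
    return all(((q1, chosen[q1]), (q2, chosen[q2])) in ASSIGNMENT_EDGES
               for q1, q2 in QUERY_EDGES if q1 in chosen and q2 in chosen)


def connected(nodes):
    """Structural constraint C: the induced subgraph is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacent = {n: set() for n in nodes}
    for u, v in ASSIGNMENT_EDGES:
        if u in nodes and v in nodes:
            adjacent[u].add(v)
            adjacent[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for m in adjacent[stack.pop()] - seen:
            seen.add(m)
            stack.append(m)
    return seen == nodes


full = {("q1", "c2"), ("q2", "a1"), ("q3", "s1")}
partial = {("q1", "c1"), ("q2", "a2")}
print(vertex_complete(full), edge_complete(full), connected(full))
print(vertex_complete(partial), edge_complete(partial), connected(partial))
```

Here `full` satisfies all three properties, while `partial` is edge complete and connected but not vertex complete, because q3 is unassigned.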

33
Example
[Figure: three partial assignments over the nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2), each annotated with the properties Vertex Complete, Edge Complete, Structurally Consistent]

34
Semantics
- All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in Q × D
- A semantics defines which subgraphs of the assignment graph (i.e., which partial assignments) are in fact answers, e.g.:
  - S_ves allows all partial assignments that are vertex complete, edge complete and structurally consistent
  - S_es allows all partial assignments that are edge complete and structurally consistent
  - S_s allows all partial assignments that are structurally consistent
- Usually, we are only interested in maximal partial assignments

35
Example: Join
[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
Using semantics S_ves we get the natural join

36
Example: Join becomes a Full Disjunction
[Figure: assignment graph with nodes (q1, c1), (q1, c2), (q1, c3), (q2, a1), (q2, a2), (q2, a3), (q3, s1), (q3, s2)]
Using semantics S_es we get the full disjunction
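The two examples can be checked by brute force on the small join instance. The sketch below enumerates all node subsets, which is exponential and is not the evaluation algorithm of the talk; it only illustrates what the maximal answers under the two semantics look like. It assumes the node names c1, c2, c3, a1, a2, a3, s1, s2 refer to the tuples in listed order, and all function names are hypothetical.

```python
from itertools import combinations

QUERY_EDGES = {("q1", "q2"), ("q1", "q3"), ("q2", "q3")}
NODES = [("q1", "c1"), ("q1", "c2"), ("q1", "c3"),
         ("q2", "a1"), ("q2", "a2"), ("q2", "a3"),
         ("q3", "s1"), ("q3", "s2")]
EDGES = {
    (("q1", "c1"), ("q2", "a2")), (("q1", "c1"), ("q2", "a3")),
    (("q1", "c2"), ("q2", "a1")), (("q1", "c2"), ("q3", "s1")),
    (("q1", "c3"), ("q3", "s2")), (("q2", "a1"), ("q3", "s1")),
}


def is_partial_assignment(sub):
    query_nodes = [q for q, _ in sub]
    return len(query_nodes) == len(set(query_nodes))


def edge_complete(sub):
    chosen = dict(sub)
    return all(((q1, chosen[q1]), (q2, chosen[q2])) in EDGES
               for q1, q2 in QUERY_EDGES if q1 in chosen and q2 in chosen)


def connected(sub):
    sub = set(sub)
    start = next(iter(sub))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in EDGES:
            for x, y in ((a, b), (b, a)):  # treat edges as undirected
                if x == u and y in sub and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return seen == sub


def s_es(sub):
    """Edge complete and structurally consistent (connected)."""
    return is_partial_assignment(sub) and edge_complete(sub) and connected(sub)


def s_ves(sub):
    """Additionally vertex complete."""
    return s_es(sub) and {q for q, _ in sub} == {"q1", "q2", "q3"}


def maximal_answers(allowed):
    """All allowed node sets that are not strictly contained in another allowed set."""
    good = [frozenset(c)
            for k in range(1, len(NODES) + 1)
            for c in combinations(NODES, k) if allowed(frozenset(c))]
    return {s for s in good if not any(s < t for t in good)}


natural_join = maximal_answers(s_ves)
full_disjunction = maximal_answers(s_es)
print(len(natural_join), len(full_disjunction))
```

Under these assumptions, S_ves yields the single join result that combines the UK tuples, while S_es additionally keeps the partial combinations for Canada (one per hotel) and for the USA.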

37
Other Examples
- Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99)
  - Weak semantics modeled by S_es; or-semantics modeled by S_s
- FleXPath: Amer-Yahia, Lakshmanan, Pandit (SIGMOD 04)
  - Modeled by S_es
- Interconnections: Cohen, Kanza, Sagiv (03)
  - Complete interconnection can be modeled by S_es; reachable interconnection can be modeled by S_s

39
Semantics are a type of Graph Property
- A graph property P is a set of graphs, e.g., "is a clique", "is a bipartite graph"
- A semantics defines a set of graphs for every Q, D (these graphs are subgraphs of Q × D)
- Therefore, semantics are a type of graph property

40
Hereditary Graph Properties and their Variants
- There are several interesting types of graph properties that have been studied in graph theory
- A graph property P is hereditary if every induced subgraph of a graph in P is also in P (e.g., "is a clique", "is a forest")
- A graph property P is connected-hereditary if every connected induced subgraph of a graph in P is also in P (e.g., "is a tree")
- Rooted-hereditary can be defined similarly
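A tiny example may help separate the two notions. The sketch below checks them on a single three-node path, so it illustrates the definitions rather than proving anything about the properties in general; the helper names are hypothetical.

```python
from itertools import combinations

PATH_NODES = ["a", "b", "c"]
PATH_EDGES = [("a", "b"), ("b", "c")]  # the path a - b - c, which is a tree


def induced_edges(nodes, edges):
    """Edges of the subgraph induced by the given node set."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


def is_forest(nodes, edges):
    """Acyclicity test with a simple union-find over the induced subgraph."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v in induced_edges(nodes, edges):
        ru, rv = find(u), find(v)
        if ru == rv:  # joining two nodes already connected closes a cycle
            return False
        parent[ru] = rv
    return True


def is_tree(nodes, edges):
    """A non-empty forest with exactly n - 1 edges is connected, hence a tree."""
    return (len(nodes) > 0 and is_forest(nodes, edges)
            and len(induced_edges(nodes, edges)) == len(nodes) - 1)


subsets = [set(c) for k in range(1, 4) for c in combinations(PATH_NODES, k)]

# "is a forest": every induced subgraph of this path is again a forest.
print(all(is_forest(s, PATH_EDGES) for s in subsets))

# "is a tree": the induced subgraph on {a, c} is disconnected, so it is not a tree,
# which is why "is a tree" is only connected-hereditary.
print(is_tree({"a", "c"}, PATH_EDGES))

# The connected induced subgraphs of the path are all trees.
connected_subsets = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
print(all(is_tree(s, PATH_EDGES) for s in connected_subsets))
```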

41
Semantics are usually Hereditary
- Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers
- Many semantics require connectivity of results (e.g., full disjunctions)
- Some require answers to be rooted (e.g., FleXPath)

42
Maximal P-Subgraph Problem
- Given a graph property P and a graph G, the maximal P-subgraph problem is: find all maximal induced subgraphs of G that have property P
- Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem
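To make the problem statement concrete, here is a brute-force sketch that works for any property given as a predicate on node sets. It enumerates all subsets and is therefore exponential; it is not one of the efficient algorithms discussed in the talk. The function names and the small example graph are assumptions chosen for illustration, with P taken to be "is a clique".

```python
from itertools import combinations


def maximal_p_subgraphs(nodes, has_property):
    """Return all maximal node sets whose induced subgraph satisfies has_property."""
    nodes = list(nodes)
    satisfying = [frozenset(c)
                  for k in range(1, len(nodes) + 1)
                  for c in combinations(nodes, k)
                  if has_property(frozenset(c))]
    # Keep only sets that are not strictly contained in another satisfying set.
    return {s for s in satisfying if not any(s < t for t in satisfying)}


# Example graph: a triangle a-b-c with a pendant edge c-d.
GRAPH_NODES = ["a", "b", "c", "d"]
GRAPH_EDGES = {frozenset(e) for e in [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]}


def is_clique(node_set):
    """P = 'is a clique': every pair of nodes in the set is adjacent."""
    return all(frozenset(pair) in GRAPH_EDGES for pair in combinations(node_set, 2))


result = maximal_p_subgraphs(GRAPH_NODES, is_clique)
print(sorted(sorted(s) for s in result))
```

Replacing `is_clique` with a predicate that encodes a semantics over an assignment graph turns the same routine into a (slow) generator of maximal query answers.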

43
Efficient Query Evaluation
- There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected-hereditary and rooted-hereditary properties
  - Efficient in terms of the input and the output (i.e., incremental polynomial time)
- Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc.
  - Improves upon previous results

44
Conclusion
- Presented an abstract framework
- Can model many different types of queries, databases and semantics in the framework
- Semantics in the framework are graph properties
- Solve the maximal P-subgraph problem once and reuse it to find maximal query answers

45
Future Work
- It is convenient to define ranking functions and return answers in ranking order
  - How/when can this be done in our framework?
- Note: From the modeling it is immediately apparent that ranking cannot always be performed efficiently
  - The problem of finding a maximal P-subgraph of size k is NP-complete for hereditary and connected-hereditary graph properties (Yannakakis, STOC 78)

46
Thank you! Questions?
