

1 Answering Imprecise Queries over Autonomous Web Databases
Ullas Nambiar, Dept. of Computer Science, University of California, Davis
Subbarao Kambhampati, Dept. of Computer Science, Arizona State University
5th April 2006, ICDE 2006, Atlanta, USA

2 Dichotomy in Query Processing

Databases:
- User knows what she wants
- User query completely expresses the need
- Answers exactly match the query constraints

IR Systems:
- User has an idea of what she wants
- User query captures the need only to some degree
- Answers are ranked by degree of relevance

3 Why Support Imprecise Queries?

Want a 'sedan' priced around $7000.

A feasible query: Make = "Toyota", Model = "Camry", Price ≤ $7000

But what about the price of a Honda Accord? Is there a Camry for $7100?

Solution: support imprecise queries.

Make     Model    Price    Year
Toyota   Camry    $7000    1999
Toyota   Camry    $7000    2001
Toyota   Camry    $6700    2000
Toyota   Camry    $6500    1998
...

4 Others are following …

5 What Does Supporting Imprecise Queries Mean?

The Problem: given a conjunctive query Q over a relation R, find the set of tuples that will be considered relevant by the user:

    Ans(Q) = { x | x ∈ R, Relevance(Q, x) > c }

Objectives:
- Minimal burden on the end user
- No changes to the existing database
- Domain independence

Motivation: how far can we go with a relevance model estimated from the database itself? Tuples represent real-world objects and the relationships between them; the estimated relevance model is used to provide a ranked set of tuples similar to the query.
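The problem statement above can be sketched directly. Here `relevance` is a placeholder for the model AIMQ estimates from the database, and the threshold `c` is an assumed parameter:

```python
def answer_imprecise_query(query, relation, relevance, c=0.5):
    """Ans(Q) = {x | x in R, Relevance(Q, x) > c}, returned ranked.

    `relevance` is a stand-in for the learned relevance model; any
    function mapping (query, tuple) to [0, 1] will do.
    """
    scored = [(relevance(query, t), t) for t in relation]
    kept = [(s, t) for s, t in scored if s > c]
    # Highest-relevance tuples first.
    return sorted(kept, key=lambda st: st[0], reverse=True)
```

Any concrete similarity function, such as the one developed on the later slides, can be plugged in as `relevance`.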

6 Challenges

Estimating query–tuple similarity:
- Weighted summation of attribute similarities
- Need to estimate semantic similarity

Measuring attribute importance:
- Not all attributes are equally important
- Users cannot quantify importance

7 Our Solution: AIMQ

8 An Illustrative Example

Relation: CarDB(Make, Model, Price, Year)

Imprecise query Q: CarDB(Model like "Camry", Price like "10k")

Base query Q_pr: CarDB(Model = "Camry", Price = "10k")

Base set A_bs:
- Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
- Make = "Toyota", Model = "Camry", Price = "10k", Year = "2001"

9 Obtaining the Extended Set

Problem: given the base set, find tuples from the database that are similar to the tuples in the base set.

Solution:
- Consider each tuple in the base set as a selection query, e.g. Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
- Relax each such query to obtain "similar" precise queries, e.g. Make = "Toyota", Model = "Camry", Price = "", Year = "2000"
- Execute the relaxed queries and keep the tuples whose similarity is above some threshold.

Challenge: which attribute should be relaxed first? Make? Model? Price? Year?
Solution: relax the least important attribute first.
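The relaxation loop above can be sketched as follows; `relaxed_queries` is a hypothetical helper, and the attribute importance order (least important first) is taken as given:

```python
def relaxed_queries(bound_tuple, importance_order):
    """Yield progressively relaxed selection queries from a base-set
    tuple, dropping the least important bound attribute first."""
    q = dict(bound_tuple)
    for attr in importance_order:        # least -> most important
        if attr in q and len(q) > 1:
            del q[attr]                  # relax this attribute
            yield dict(q)
```

For example, with the CarDB relaxation order Price, Model, Year, Make (from slide 11), the first relaxed query drops the Price binding, matching the example on this slide.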

10 Least Important Attribute

Definition: an attribute whose binding value, when changed, has minimal effect on the values binding the other attributes.
- It does not decide the values of other attributes
- Its value may depend on other attributes
E.g. changing/relaxing Price will usually not affect other attributes, but changing Model usually affects Price.

Deciding relative importance requires knowing the dependence between attributes:
- Attribute dependence information is not provided by the sources
- Learn it using Approximate Functional Dependencies (AFDs) and approximate keys

Approximate Functional Dependency (AFD): X → A is an FD over r′, r′ ⊆ r. If error(X → A) = |r − r′| / |r| < 1, then X → A is an AFD over r. "Approximate" in the sense that the dependency is obeyed by a large percentage (but not all) of the tuples in the database.

TANE, an algorithm by Huhtala et al. [1999], is used to mine the AFDs and approximate keys. It is exponential in the number of attributes and linear in the number of tuples.
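The AFD error defined above (the removal-based error that TANE minimizes) can be computed with a short sketch, assuming the relation is a list of dicts:

```python
from collections import Counter, defaultdict

def afd_error(rows, X, A):
    """error(X -> A) = |r - r'| / |r|: the minimum fraction of tuples
    that must be removed for X -> A to hold exactly over the rest."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[x] for x in X)][row[A]] += 1
    # In each X-group, keep only the tuples carrying the majority A-value.
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1 - kept / len(rows)
```

An attribute set X with low `afd_error` for many A's is a strong "decider", which is what the next slide exploits to order attributes.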

11 Deciding Attribute Importance

- Mine AFDs and approximate keys
- Create a dependence graph using the AFDs; the graph is strongly connected, hence a topological sort is not possible
- Using the approximate key with the highest support, partition the attributes into a deciding set and a dependent set, then sort the subsets using dependence and influence weights
- Measure attribute importance from the resulting order

Example: CarDB(Make, Model, Year, Price)
- Decides: Make, Year
- Depends: Model, Price
- Order: Price, Model, Year, Make
- 1-attribute relaxations: {Price, Model, Year, Make}
- 2-attribute relaxations: {(Price, Model), (Price, Year), (Price, Make), …}

The attribute relaxation order is all non-keys first, then keys: a greedy multi-attribute relaxation.

12 Query–Tuple Similarity

Tuples in the extended set show different levels of relevance. They are ranked according to their similarity to the corresponding tuples in the base set, computed as a weighted summation of attribute similarities:

    Sim(Q, t) = Σ_{i=1..n} W_imp(A_i) × Sim(Q.A_i, t.A_i)

where n = Count(Attributes(R)) and W_imp is the importance weight of attribute A_i. Euclidean distance is used as the similarity for numerical attributes (e.g. Price, Year); VSim, the semantic value similarity estimated by AIMQ, is used for categorical attributes (e.g. Make, Model).
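A minimal sketch of this weighted summation follows. The `1 / (1 + |q − t|)` form for numeric attributes is an assumed stand-in for the distance-based similarity, and `vsim` stands for the learned categorical similarity of the next slide:

```python
def query_tuple_similarity(query, tup, weights, vsim):
    """Weighted sum of per-attribute similarities between a query
    (partial binding) and a tuple."""
    score = 0.0
    for attr, w in weights.items():
        q = query.get(attr)
        if q is None:                 # attribute not bound in the query
            continue
        if isinstance(q, (int, float)):
            # Distance-based similarity for numeric attributes.
            score += w * (1.0 / (1.0 + abs(q - tup[attr])))
        else:
            # Learned semantic similarity for categorical attributes.
            score += w * vsim(attr, q, tup[attr])
    return score
```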

13 Categorical Value Similarity

Two words are semantically similar if they have a common context (an idea borrowed from NLP). The context of a value is represented as a set of bags of co-occurring values, called a supertuple.

Supertuple for the concept Make = "Toyota":

Attribute   Bag of co-occurring values
Model       Camry: 3, Corolla: 4, …
Year        2000: 6, 1999: 5, 2001: 2, …
Price       5995: 4, 6500: 3, 4000: 6

Value similarity is estimated as the percentage of common {Attribute, Value} pairs, measured as the Jaccard similarity among the supertuples representing the values:

    JaccardSim(A, B) = |A ∩ B| / |A ∪ B|
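The supertuple construction and Jaccard comparison can be sketched as below, using `collections.Counter` for bag intersection and union; averaging the per-attribute Jaccard scores is an assumption about how the bags are combined:

```python
from collections import Counter

def supertuple(rows, attr, value):
    """One bag per other attribute, of values co-occurring with attr=value."""
    st = {}
    for row in rows:
        if row[attr] != value:
            continue
        for a, v in row.items():
            if a != attr:
                st.setdefault(a, Counter())[v] += 1
    return st

def jaccard(a, b):
    """JaccardSim(A, B) = |A ∩ B| / |A ∪ B| over bags (Counters)."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0

def value_similarity(rows, attr, v1, v2):
    """VSim: average Jaccard similarity of the two values' supertuples."""
    s1, s2 = supertuple(rows, attr, v1), supertuple(rows, attr, v2)
    sims = [jaccard(s1.get(a, Counter()), s2.get(a, Counter()))
            for a in set(s1) | set(s2)]
    return sum(sims) / len(sims) if sims else 0.0
```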

14 Empirical Evaluation

Goal:
- Test the robustness of the learned dependencies
- Evaluate the effectiveness of the query relaxation and similarity estimation

Databases:
- Used-car database CarDB(Make, Model, Year, Price, Mileage, Location, Color), based on Yahoo Autos and populated with 100k tuples
- Census database from the UCI Machine Learning Repository, populated with 45k tuples

Algorithms:
- AIMQ
  - RandomRelax: randomly picks the attribute to relax
  - GuidedRelax: uses the relaxation order determined from approximate keys and AFDs
- ROCK: RObust Clustering using linKs (Guha et al., ICDE 1999)
  - Computes neighbours and links between every pair of tuples (a neighbour is a tuple similar to another; a link is the number of common neighbours between two tuples), then clusters tuples having common neighbours
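For reference, the neighbour/link computation at the heart of the ROCK baseline can be sketched as follows; the similarity function and threshold θ are parameters of the method:

```python
def rock_links(items, sim, theta):
    """Neighbours of an item: other items with similarity >= theta.
    link(i, j): number of common neighbours of items i and j."""
    nbrs = [{j for j, u in enumerate(items) if j != i and sim(t, u) >= theta}
            for i, t in enumerate(items)]
    return {(i, j): len(nbrs[i] & nbrs[j])
            for i in range(len(items)) for j in range(i + 1, len(items))}
```

ROCK then merges clusters with many links; only the neighbour/link step is sketched here.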

15 Robustness of Dependencies

The attribute dependence order and key quality are unaffected by sampling.

16 Robustness of Value Similarities

Value             Similar values    Similarity (25k / 100k sample)
Make = "Kia"      Hyundai           0.17
                  Isuzu             0.15
                  Subaru            0.13
Make = "Bronco"   Aerostar          0.19 / 0.21
                  F-350             0.12
                  Econoline Van     0.11
Year = "1985"     1986              0.16
                  1984              0.13 / 0.14
                  1987              0.12

17 Efficiency of Relaxation

Random relaxation: on average, 8 tuples are extracted per relevant tuple for ε = 0.5, increasing to 120 tuples for ε = 0.7. Not resilient to changes in ε.

Guided relaxation: on average, 4 tuples are extracted per relevant tuple for ε = 0.5, going up to 12 tuples for ε = 0.7. Resilient to changes in ε.

18 Accuracy over CarDB

14 queries over 100k tuples; similarity learned using the 25k sample. Mean Reciprocal Rank (MRR) is estimated as the average over queries of the reciprocal rank of the first relevant answer. The overall high MRR shows the high relevance of the suggested answers.
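The MRR metric used here can be sketched as follows, assuming each query yields a ranked answer list and a set of user-relevant answers:

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average over queries of 1 / rank of the first relevant answer
    (a query contributes 0 if no relevant answer is returned)."""
    total = 0.0
    for answers, relevant in zip(ranked_lists, relevant_sets):
        for rank, ans in enumerate(answers, start=1):
            if ans in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```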

19 Accuracy over CensusDB

1000 randomly selected tuples were used as queries. The overall high MRR for AIMQ shows the higher relevance of its suggested answers.

20 AIMQ: Summary

An approach for answering imprecise queries over Web databases:
- Mines and uses AFDs to determine the attribute relaxation order
- Domain-independent semantic similarity estimation technique
- Automatically computes attribute importance scores

Empirical evaluation shows:
- Efficiency and robustness of the algorithms
- Better performance than current approaches
- High relevance of the suggested answers
- Domain independence

21 Answering Imprecise Queries over Autonomous Web Databases

