The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the.

1 Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Zhen Zhang, Bin He, and Kevin C. Chang. The Database and Info. Systems Lab, University of Illinois at Urbana-Champaign.

2 MetaQuerier 2 The Context: MetaQuerier @ UIUC. Exploring and integrating the deep Web. Explorer (FIND sources; db of dbs): source discovery, source modeling, source indexing. Integrator (QUERY sources; unified query interface): source selection, schema integration, query mediation. Example sources: Amazon.com, Cars.com, 411localte.com, Apartments.com.

3 The Need: Querying alternative sources in the same domain. Sources are proliferating in the same domain: a 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web. Each query can often find many useful DBs, and different queries need different sources. How to query across dynamic sources?

4 The Problem: Query translation on-the-fly. Challenge: no pre-configured source-specific translation knowledge. Requirements: within a domain, source generality; across domains, domain portability.

5 Dynamic query translation: Essential tasks. Reconcile three levels of query heterogeneity: attribute level (schema matching), predicate level (predicate mapping), and query level (query rewriting).

6 Demo: Form Assistant to help navigate the deep Web.

7 Translation objective: Closest among the valid. Input: source query Q_s on source form S (example value: Tom Clancy) and target query form T. Output: target query Q_t*, a union query together with a filter σ (title contain “red storm” and price 12). Two goals: syntactically valid and semantically close.

8 What is valid? Each source has a query model. Vocabulary: predicate templates {P1, P2, P3, P4, P5}. Syntax: valid combinations of predicate templates {F1, F2, F3, F4, F5, F6, F7, F8}. Figure: example form showing templates P1 to P5 and which templates each valid combination F1 to F8 uses.

9 What is close? Define semantic closeness: minimal subsuming C_min. No false positive: miss no answer. Minimizing false negative: fewest extra answers. Clear semantics: DB content independent. Modular translation: reduce translation complexity. Figure: numeric range example comparing source predicate s with target predicates t1, t2, t3 and the unions t1 ∨ t2 and t1 ∨ t2 ∨ t3 to identify C_min.
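
The minimal subsuming objective can be written down as a short formal sketch. The notation here is an assumption for illustration and is not taken from the slides: $Q(D)$ denotes the answers of query $Q$ on database $D$, and $\mathcal{V}(T)$ denotes the set of valid queries on target form $T$. Quantifying over every $D$ is what makes the definition independent of database content.

```latex
% Subsuming: on any database D, the target query misses no answer of the source query.
Q_s \subseteq Q_t \;\iff\; \forall D:\; Q_s(D) \subseteq Q_t(D)

% Minimal subsuming: Q_t^* is valid, subsumes Q_s, and no other valid
% target query fits strictly between Q_s and Q_t^*.
Q_t^* \in \mathcal{V}(T), \qquad Q_s \subseteq Q_t^*, \qquad
\nexists\, Q_t' \in \mathcal{V}(T):\; Q_s \subseteq Q_t' \subsetneq Q_t^*
```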

10 What mechanism? Figure: translation from source query to target query, shown both as enumerating the valid target queries and searching for the closest (C_min), and as a pipeline of attribute matching, predicate mapping, and query rewriting.

11 System architecture: Modular & lightweight. Modularized mechanism; lightweight domain knowledge. Components: the Form Extractor processes the target query form QI; the source query Q_s passes through the Attribute Matcher (syntax-based schema matching), the Predicate Mapper (type-based search-driven mapping), and the Query Rewriter (constraint-based query rewriting) to yield the target query Q_t*. Domain knowledge: domain-specific thesaurus and domain-specific type handlers. Citations on the slide: [RahmBernstein-VLDBJ01], [Halevy-VLDBJ01], [ZhangHC-SIGMOD04], [HeChang-SIGMOD03], [WuYDM-SIGMOD04].

12 The core challenge: Predicate mapping. Input: a source predicate. Output: a union of target predicates t*. Tasks: choose the operator and fill in the values. Objective: minimal subsuming.

13 Is source-specific translation applicable? Example rules: adult = $t → passenger = $t … price<$t → if $t<25: [price:between:0,25] elseif $t<45: … …

14 Enable source-generic predicate mapping? What is the scope of translation? What is the mechanism of translation?

15 The right scope? Survey of 150 sources for the Correspondence Matrix. Correspondences occur within localities!

16 The right scope? Correspondence locality → Type-based translation. Correspondences occur within localities, so translation is done by type handler. Predicate Mapper: given source predicate s and target template P, the Type Recognizer selects a handler (Domain Specific Handler, Text Handler, Numeric Handler, Datetime Handler), which produces the target predicate t*.
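
The type-based dispatch on this slide can be sketched as a small lookup from recognized type to handler. Everything named here (`recognize_type`, the handler functions, the regular expressions, and the bracketed output format) is hypothetical and only illustrates the structure; the slides do not say how the Type Recognizer decides, so this sketch assumes it inspects the syntax of the predicate value.

```python
import re
from typing import Callable, Dict

def recognize_type(value: str) -> str:
    """Hypothetical type recognizer: classify a predicate value by its syntax."""
    if re.fullmatch(r"-?\d+(\.\d+)?", value):
        return "numeric"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "datetime"
    return "text"

# Placeholder handlers: a real handler would search for the closest
# target predicate; these only tag the predicate with its type.
def numeric_handler(attr: str, value: str) -> str:
    return f"[{attr}:numeric:{value}]"

def datetime_handler(attr: str, value: str) -> str:
    return f"[{attr}:datetime:{value}]"

def text_handler(attr: str, value: str) -> str:
    return f"[{attr}:text:{value}]"

HANDLERS: Dict[str, Callable[[str, str], str]] = {
    "numeric": numeric_handler,
    "datetime": datetime_handler,
    "text": text_handler,
}

def map_predicate(attr: str, value: str) -> str:
    """Route a source predicate to the handler for its recognized type."""
    return HANDLERS[recognize_type(value)](attr, value)

print(map_predicate("price", "25"))
print(map_predicate("title", "red storm"))
```

Because handlers are keyed by type rather than by source, supporting a new source of an existing type needs no new entry in the table under this design.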

17 The right mechanism: Is a pairwise-rule-based mechanism suitable? Example rule: attr<$t → if $t<25: [attr:between:0,25] elseif $t<45: … … With n existing templates, adding one new template (the (n+1)-th) requires adding 2n rules, and requires knowledge of the old templates.
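
The 2n figure follows from counting directed rules, assuming one rule is needed for each ordered pair of distinct templates (a rule from template A to template B, and another from B to A). The symbol $R(n)$ for the rule count is introduced here for illustration.

```latex
% One rule per ordered pair of distinct templates.
R(n) = n(n-1)

% Cost of adding the (n+1)-th template: n rules into it and n rules out of it.
R(n+1) - R(n) = (n+1)\,n - n\,(n-1) = 2n
```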

18 A more extensible mechanism? Search-driven. Treat the values of the type as a virtual database (for numbers, from -infinity to +infinity). An evaluator evaluates the source predicate and the templates of the same type over this “database”, and the evaluation results are searched for the closest mapping. Figure: number-line example with source predicate s, target predicates t1 and t2, and the union t1 ∨ t2.
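
A minimal sketch of the search-driven idea, under stated assumptions: the virtual database is a finite sample of integers (`VIRTUAL_DB`), predicates are plain Python functions, and the search is exhaustive over unions of target predicates. The function names and the example ranges are hypothetical and are not the values from the slide's figure.

```python
from itertools import combinations

# Hypothetical virtual database: a finite sample of the numeric type's values.
VIRTUAL_DB = range(0, 101)

def evaluate(predicate, db=VIRTUAL_DB):
    """Materialize a predicate as the set of virtual-database values it accepts."""
    return {v for v in db if predicate(v)}

def closest_union(source, targets, db=VIRTUAL_DB):
    """Search all unions of target predicates for one that covers every
    source answer with the fewest extra answers. Returns (names, extras)."""
    wanted = evaluate(source, db)
    results = {name: evaluate(p, db) for name, p in targets.items()}
    best = None
    for k in range(1, len(targets) + 1):
        for names in combinations(sorted(targets), k):
            covered = set().union(*(results[n] for n in names))
            if wanted <= covered:  # subsuming: no source answer is missed
                extra = len(covered - wanted)
                if best is None or extra < best[1]:
                    best = (names, extra)
    return best

# Assumed example: the source asks for values below 35; the target form
# only offers three fixed ranges.
source = lambda v: v < 35
targets = {
    "t1": lambda v: 0 <= v < 25,
    "t2": lambda v: 25 <= v < 45,
    "t3": lambda v: 45 <= v < 65,
}
print(closest_union(source, targets))  # (('t1', 't2'), 10)
```

Adding a new template under this design means adding one more entry to `targets`; no rule relating it to the existing templates is written.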

19 Greedy search to construct the C_min mapping. Find the mapping iteratively: in each iteration, greedily choose the target predicate that covers the most still-uncovered values of the source predicate. Figure: source predicate s with target predicates t1, t2, t3.
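
The iteration described on this slide resembles greedy set cover, and can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: predicates are represented directly as sets of values they accept, ties are broken by name, and the function name `greedy_cover` and the example ranges are hypothetical.

```python
def greedy_cover(source_values, target_values):
    """Pick target predicates one at a time, each time taking the one that
    covers the most still-uncovered source values.
    Returns the chosen names, or None if the source cannot be covered."""
    uncovered = set(source_values)
    chosen = []
    while uncovered:
        # Gain of each unused target = number of uncovered values it accepts.
        name, gain = max(
            ((n, len(vals & uncovered))
             for n, vals in sorted(target_values.items())
             if n not in chosen),
            key=lambda item: item[1],
            default=(None, 0),
        )
        if gain == 0:
            return None  # no remaining target helps
        chosen.append(name)
        uncovered -= target_values[name]
    return chosen

# Assumed example: source accepts 0..34; targets are three fixed ranges.
source = set(range(0, 35))
targets = {
    "t1": set(range(0, 25)),
    "t2": set(range(25, 45)),
    "t3": set(range(45, 65)),
}
print(greedy_cover(source, targets))  # ['t1', 't2']
```

In this example the first iteration picks t1 (25 uncovered values), the second picks t2 (the remaining 10), and t3 is never chosen because it covers nothing the source asks for.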

20 Experiments. Translating 120 queries in total, between randomly paired sources from 8 domains (Basic: 3 domains; New: 5 domains), with a domain thesaurus but no type handler. Accuracy is measured as the ratio of correct conditions per query. Figures: average accuracy; error distribution over matching, extraction, and mapping (chart values: 18%, 40%, 42%).
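
The accuracy measure can be sketched as below. The slide does not define the denominator, so this sketch assumes accuracy is the number of correctly translated conditions divided by the number of expected conditions, averaged over queries. The function names and the sample conditions are made up for illustration and are not experimental data.

```python
def query_accuracy(translated, expected):
    """Fraction of expected conditions that appear in the translated query."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(set(translated) & expected) / len(expected)

def average_accuracy(pairs):
    """Mean per-query accuracy over (translated, expected) pairs."""
    scores = [query_accuracy(t, e) for t, e in pairs]
    return sum(scores) / len(scores)

# Made-up example with two queries: one half correct, one fully correct.
pairs = [
    (["title contains red storm", "price < 20"],
     ["title contains red storm", "price < 25"]),
    (["author = Tom Clancy"], ["author = Tom Clancy"]),
]
print(average_accuracy(pairs))  # 0.75
```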

21 Conclusion. System: Form assistant for querying Web databases. Problem: Dynamic query translation. Contributions: a light-weight domain-based architecture (framework); type-based search-driven predicate mapping (techniques); holistic integration holds promise (insight)!

22 Thank You! For more information: http://metaquerier.cs.uiuc.edu kcchang@cs.uiuc.edu

23 What is close? Define semantic closeness: minimal subsuming C_min. No false positive: miss no correct answer. Minimizing false negative: contain fewest extra answers. Clear semantics: database content independent. Modular translation: reduce translation complexity. Figure: numeric range example comparing source predicate s with target predicates t1, t2, t3 and the unions t1 ∨ t2 and t2 ∨ t3 to identify C_min.

24 Experiment: Accuracy distribution. Figures: accuracy distribution for the Basic dataset; accuracy distribution for the New dataset.

25 Text handler: Search space. Conceptually, the union of all target predicates. Practically, a closed-world assumption.

26 Text handler: Closeness estimation. Ideally, logic reasoning. Practically, evaluation-by-materialization: materialize the query against a “complete” database.

