The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly

Presentation transcript:

The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly Zhen Zhang, Bin He, and Kevin C. Chang

MetaQuerier 2 The Context: UIUC. Exploring and integrating the deep Web. Explorer (FIND sources, building a db of dbs): source discovery, source modeling, source indexing. Integrator (QUERY sources through a unified query interface): source selection, schema integration, query mediation. Example sources: Amazon.com, Cars.com, 411localte.com, Apartments.com.

MetaQuerier 3 The Need: Querying alternative sources in the same domain. Sources are proliferating within each domain: a 2004 survey found that 10% of Web sites are “deep”, totaling 450,000 DBs on the Web. Each query can often find many useful DBs, and different queries need different sources. How can we query across dynamic sources?

MetaQuerier 4 The Problem: Query translation on-the-fly. Challenge: no pre-configured source-specific translation knowledge. Requirements: source generality within a domain; domain portability across domains.

MetaQuerier 5 Dynamic query translation: Essential tasks. Reconcile three levels of query heterogeneity. Attribute level: schema matching. Predicate level: predicate mapping. Query level: query rewriting.

MetaQuerier 6 Demo. Form Assistant to help navigate the deep Web.

MetaQuerier 7 Translation objective: Closest among the valid. Input: source query Q_s on source form S (the example fills in “Tom Clancy”) and target query form T. Output: union query Q_t* on the target form, followed by a filter σ (title contain “red storm” and price 12). Two goals: syntactically valid and semantically close.

MetaQuerier 8 What is valid? Each source has a query model. Vocabulary: predicate templates {P1, P2, P3, P4, P5}. Syntax: valid combinations of predicate templates {F1, F2, F3, F4, F5, F6, F7, F8}. [Figure: a source form annotated with templates P1 to P5 and filled in with “Tom Clancy”, a table marking which templates each valid form uses, and example forms F5 and F6.]
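
The query model on this slide can be read as a membership test: a query is valid if the set of templates it fills in is one of the allowed forms. A minimal Python sketch follows. The template names come from the slide, but the three forms listed in SYNTAX are hypothetical, since the slide's table is not recoverable, and is_valid is an illustrative helper rather than part of the system.

```python
# Vocabulary from the slide: the predicate templates a source form offers.
VOCABULARY = frozenset({"P1", "P2", "P3", "P4", "P5"})

# Syntax: the valid combinations of templates. These three forms are
# hypothetical placeholders; the real forms F1..F8 are not recoverable.
SYNTAX = {
    frozenset({"P1"}),
    frozenset({"P1", "P5"}),
    frozenset({"P2", "P5"}),
}


def is_valid(templates, vocabulary=VOCABULARY, syntax=SYNTAX):
    """A query is valid if it uses only known templates in an allowed combination."""
    used = frozenset(templates)
    return used <= vocabulary and used in syntax


print(is_valid(["P1", "P5"]))  # allowed combination
print(is_valid(["P1", "P2"]))  # known templates, but not an allowed combination
```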

MetaQuerier 9 What is close? Define semantic closeness: minimal subsuming (C_min). No false negatives: miss no answer. Minimal false positives: fewest extra answers. Clear semantics: independent of DB content. Modular translation: reduces translation complexity. Example: s covers 0 to 35; t1 covers 0 to 25, t2 covers 25 to 45, t3 covers 45 to 65. Candidates: t1 ∨ t2 (0 to 45) and t1 ∨ t2 ∨ t3 (0 to 65). Which one is C_min?
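
To make the minimal-subsuming idea concrete, here is a small Python sketch over numeric ranges. It is an illustration under stated assumptions, not the system's method: predicates are modeled as closed intervals, the interval values are illustrative, and "fewest extra answers" is approximated by the length of the union that falls outside the source range. All function names are hypothetical.

```python
from itertools import combinations


def merge(intervals):
    """Merge overlapping or touching closed intervals into a sorted list."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def subsumes(union, s):
    """True if the source range s lies inside one merged piece of the union."""
    return any(lo <= s[0] and s[1] <= hi for lo, hi in merge(union))


def extra(union, s):
    """Length of the union outside s (assumed proxy for extra answers)."""
    pieces = merge(union)
    total = sum(hi - lo for lo, hi in pieces)
    overlap = sum(max(0, min(hi, s[1]) - max(lo, s[0])) for lo, hi in pieces)
    return total - overlap


def minimal_subsuming(targets, s):
    """Enumerate every union of targets; keep the subsuming one with least extra."""
    best = None
    for k in range(1, len(targets) + 1):
        for combo in combinations(targets, k):
            if subsumes(combo, s) and (best is None or extra(combo, s) < extra(best, s)):
                best = combo
    return best


targets = [(0, 25), (25, 45), (45, 65)]
print(minimal_subsuming(targets, (0, 35)))
```

Exhaustive enumeration is exponential in the number of target predicates, which is fine for a handful of templates but is only meant to show the definition.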

MetaQuerier 10 What mechanism? Source query → enumerate valid → search for closest (C_min?) → target query. Translation components: Attribute Match, Predicate Mapping, Query Rewriter.

MetaQuerier 11 System architecture: Modular & lightweight. Inputs: source query Q_s and target query form QI. Components (modularized mechanism): Form Extractor; Attribute Matcher (syntax-based schema matching); Predicate Mapper (type-based search-driven mapping); Query Rewriter (constraint-based query rewriting). Output: target query Q_t*. Lightweight domain knowledge: domain-specific thesaurus and domain-specific type handlers. Citations shown on the slide: [RahmBernstein-VLDBJ01], [Halevy-VLDBJ01], [ZhangHC-SIGMOD04], [HeChang-SIGMOD03], [WuYDM-SIGMOD04]; one component is marked “?”.

MetaQuerier 12 The core challenge: Predicate mapping. Input: a source predicate. Output: a union of target predicates, t*. Tasks: choose the operator and fill in the values. Objective: minimal subsuming.

MetaQuerier 13 Is source-specific translation applicable? Example source-specific rules: adult = $t → passenger = $t; …; price < $t → if $t < 25: [price:between:0,25] elseif $t < 45: … …

MetaQuerier 14 Enable source-generic predicate mapping? What is the scope of translation? What is the mechanism of translation?

MetaQuerier 15 The right scope? Survey of 150 sources for the correspondence matrix. Finding: correspondences occur within localities!

MetaQuerier 16 The right scope? Correspondence locality → type-based translation. Correspondences occur within localities, so translation is done by type handlers. Predicate Mapper: source predicate s and target template P enter a Type Recognizer, which dispatches to a Text Handler, Numeric Handler, Datetime Handler, or Domain Specific Handler; the output is target predicate t*.
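
The dispatch structure on this slide can be sketched in a few lines of Python. This is only an illustration: recognizing the type from a Python value is an assumption (the slide does not say what the Type Recognizer inspects), the handlers are stubs that merely report which handler was chosen, and every name below is hypothetical.

```python
from datetime import date, datetime


def recognize_type(value):
    """Stand-in for the Type Recognizer: guess a type from a Python value."""
    if isinstance(value, (date, datetime)):
        return "datetime"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numeric"
    return "text"


def make_stub(kind):
    """Build a stub handler that only reports which handler ran."""
    return lambda attr, op, value: f"{kind} handler maps [{attr}; {op}; {value}]"


# One handler per type; a real handler would return target predicates.
HANDLERS = {kind: make_stub(kind) for kind in ("text", "numeric", "datetime")}


def map_predicate(attr, op, value, handlers=HANDLERS):
    """Route a source predicate to the handler for its recognized type."""
    return handlers[recognize_type(value)](attr, op, value)


print(map_predicate("price", "<", 35))
print(map_predicate("author", "=", "Tom Clancy"))
```

Because handlers are keyed by type rather than by source, supporting a new source with the same types needs no new entries in this sketch.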

MetaQuerier 17 The right mechanism: Is a pairwise-rule-based mechanism suitable? With n existing templates (1 … n), adding one new template (n+1) requires 2n new rules, and writing them requires knowledge of the old templates. Example rule: attr < $t → if $t < 25: [attr:between:0,25] elseif $t < 45: … …
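
The 2n figure follows from a short count, assuming one directed translation rule for every ordered pair of distinct templates (an assumption about how the rules are organized, not something the slide states):

```latex
\underbrace{(n+1)\,n}_{\text{rules for } n+1 \text{ templates}}
\;-\;
\underbrace{n\,(n-1)}_{\text{rules for } n \text{ templates}}
\;=\; 2n
```

Under that assumption the new template needs n rules translating into each old template and n rules translating out of them.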

MetaQuerier 18 A more extensible mechanism? Search-driven. Treat the values of the type as a virtual database, evaluate the source predicate and the templates of the same type over that “database”, and search the evaluation results for the closest. [Figure: an evaluator places s, t1, t2, and t1 ∨ t2 on a number line running from -infinity to +infinity.]
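
One way to picture the evaluator is as set arithmetic over a sample of values. The Python sketch below assumes a virtual database of integers from 0 to 100 and two hypothetical operators, less_than and between; the actual value domain and operator set are not given on the slide.

```python
# Hypothetical virtual database: a finite sample of values of the numeric type.
VIRTUAL_DB = range(0, 101)


def less_than(t):
    """Predicate factory for 'attr < t'."""
    return lambda v: v < t


def between(lo, hi):
    """Predicate factory for 'attr between lo and hi' (inclusive)."""
    return lambda v: lo <= v <= hi


def evaluate(predicate, db=VIRTUAL_DB):
    """Return the set of database values that satisfy the predicate."""
    return {v for v in db if predicate(v)}


def closeness(candidates, source):
    """Return (missed, extra) value counts for a union of candidate predicates."""
    src = evaluate(source)
    tgt = set().union(*(evaluate(p) for p in candidates))
    return len(src - tgt), len(tgt - src)


s = less_than(35)
print(closeness([between(0, 25)], s))
print(closeness([between(0, 25), between(25, 45)], s))
```

A candidate with zero missed values subsumes the source predicate on this sample; among those, fewer extra values means closer. No source-to-target rule is written anywhere, which is the point of the search-driven approach.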

MetaQuerier 19 Greedy search to construct the C_min mapping. Find the mapping iteratively: in each iteration, greedily choose the target predicate that covers the most of the still-uncovered part of s. Example: s covers 0 to 35; t1 covers 0 to 25, t2 covers 25 to 45, t3 covers 45 to 65.
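
The iteration described on this slide resembles greedy set cover, and can be sketched as follows. This is a sketch under assumptions: predicates are represented by the sets of integer values they select, the example values are illustrative, and greedy_cover is a hypothetical name. Greedy selection is a heuristic, so the sketch does not by itself guarantee the fewest extra answers.

```python
def greedy_cover(source, targets):
    """Pick targets one at a time, each covering the most uncovered source values.

    source:  set of values selected by the source predicate
    targets: dict mapping a target predicate name to its set of values
    Returns the chosen names in order and any values left uncovered.
    """
    uncovered = set(source)
    chosen = []
    while uncovered:
        name = max(targets, key=lambda n: len(targets[n] & uncovered))
        gain = targets[name] & uncovered
        if not gain:
            break  # no target covers any remaining value
        chosen.append(name)
        uncovered -= gain
    return chosen, uncovered


source = set(range(0, 35))
targets = {
    "t1": set(range(0, 26)),
    "t2": set(range(25, 46)),
    "t3": set(range(45, 66)),
}
print(greedy_cover(source, targets))
```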

MetaQuerier 20 Experiments. Translated 120 queries in total, between randomly paired sources from 8 domains (Basic dataset: 3 domains; New dataset: 5 domains), with a domain thesaurus but no type handler. Accuracy is measured as the ratio of correct conditions per query. [Figures: average accuracy per dataset; error distribution over matching, extraction, and mapping, with shares of 18%, 40%, and 42% (pairing not recoverable from the transcript).]
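
The accuracy measure can be sketched as below. The slide does not define the denominator, so this assumes it is the number of conditions in the expected (hand-checked) translation; the condition tuples and function names are hypothetical.

```python
def query_accuracy(produced, expected):
    """Fraction of expected conditions that the translation reproduced exactly."""
    expected = set(expected)
    if not expected:
        return 1.0
    return len(set(produced) & expected) / len(expected)


def average_accuracy(pairs):
    """Mean per-query accuracy over (produced, expected) pairs."""
    return sum(query_accuracy(p, e) for p, e in pairs) / len(pairs)


expected = [("author", "=", "Tom Clancy"), ("price", "between", (0, 45))]
produced = [("author", "=", "Tom Clancy"), ("price", "between", (0, 25))]
print(query_accuracy(produced, expected))
```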

MetaQuerier 21 Conclusion. System: form assistant for querying Web databases. Problem: dynamic query translation. Contributions: a light-weight domain-based architecture (framework); type-based search-driven predicate mapping (techniques); holistic integration holds promise (insight)!

MetaQuerier 22 Thank You! For more information:

MetaQuerier 23 What is close? Define semantic closeness: minimal subsuming (C_min). No false negatives: miss no correct answer. Minimal false positives: contain the fewest extra answers. Clear semantics: independent of database content. Modular translation: reduces translation complexity. [Figure: number line comparing s with t1, t2, t3 and the candidate unions t1 ∨ t2 and t2 ∨ t3; which one is C_min?]

MetaQuerier 24 Experiment: Accuracy distribution. [Figures: accuracy distribution for the Basic dataset; accuracy distribution for the New dataset.]

MetaQuerier 25 Text handler: Search space. Conceptually, the union of all target predicates. Practically, a closed-world assumption.

MetaQuerier 26 Text handler: Closeness estimation. Ideally, logical reasoning. Practically, evaluation-by-materialization: materialize the query against a “complete” database.
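
Evaluation-by-materialization for text predicates can be illustrated with a toy corpus. Everything below is hypothetical: the four titles stand in for the “complete” database, and the three operators (exact phrase, all words, any word) are common search-form options chosen for illustration, not operators confirmed by the slides.

```python
# Toy stand-in for the "complete" database of text values.
CORPUS = [
    "red storm rising",
    "storm of red dust",
    "the red badge",
    "storm front",
]


def exact_phrase(q):
    return lambda text: q in text


def all_words(q):
    return lambda text: all(w in text.split() for w in q.split())


def any_word(q):
    return lambda text: any(w in text.split() for w in q.split())


def materialize(predicate, corpus=CORPUS):
    """Evaluate a text predicate by listing the corpus entries it matches."""
    return {text for text in corpus if predicate(text)}


def subsuming_candidates(source, candidates):
    """Names of candidates whose results contain the source's, tightest first."""
    src = materialize(source)
    results = {name: materialize(p) for name, p in candidates.items()}
    ok = {name: r for name, r in results.items() if src <= r}
    return sorted(ok, key=lambda name: len(ok[name]))


source = exact_phrase("red storm")
candidates = {"all_words": all_words("red storm"), "any_word": any_word("red storm")}
print(subsuming_candidates(source, candidates))
```

The ranking depends entirely on the corpus, which is why the slide stresses a “complete” database: a corpus that lacks distinguishing entries cannot separate two operators.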