EasyQuerier: A Keyword Interface in Web Database Integration System Xian Li 1, Weiyi Meng 2, Xiaofeng Meng 1 1 WAMDM Lab, RUC & 2 SUNY Binghamton.


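Traditional Integrated Interface
[Figure: the user manually picks a domain from the domain list, then submits query Q through the integrated interface of that domain (Job in the example).]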

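What does EasyQuerier look like?
[Figure: the user submits query Q to EasyQuerier, which automatically forwards it to the integrated interface of the matching domain (Job in the example), replacing the manual domain selection.]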

New Features of EasyQuerier
- Automatic domain mapping
  - Users do not need to select a domain from a long list
- More flexible keyword queries
  - Different data types: text, numeric, currency, date
  - More logic relations covered: "and", "or", "between ... and"
  - Example Q1: New York or Washington, education, $2000-$3000
    - U1 = {New York, Washington}, logic: or
    - U2 = {education}
    - U3 = {$2000, $3000}, logic: range
- Automatic query translation
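The Q1 example can be made concrete with a small parser. The sketch below is only an illustration of how a comma-separated keyword query might be split into units with a logic relation. The function names, the dictionary layout, and the regular expressions are assumptions made here, not part of EasyQuerier.

```python
import re

# Hypothetical patterns: a numeric/currency range such as "$2000-$3000",
# and the phrase form "between X and Y".
RANGE_DASH = re.compile(r"^\s*(\$?\d+(?:\.\d+)?)\s*-\s*(\$?\d+(?:\.\d+)?)\s*$")
RANGE_BETWEEN = re.compile(r"^\s*between\s+(.+?)\s+and\s+(.+?)\s*$", re.IGNORECASE)


def parse_unit(text):
    """Turn one comma-separated part of the query into a unit."""
    text = text.strip()
    match = RANGE_BETWEEN.match(text) or RANGE_DASH.match(text)
    if match:
        return {"values": [match.group(1), match.group(2)], "logic": "range"}
    for word in ("or", "and"):
        # Split only on the standalone word, so "York" is left intact.
        parts = re.split(rf"\s+{word}\s+", text, flags=re.IGNORECASE)
        if len(parts) > 1:
            return {"values": [p.strip() for p in parts], "logic": word}
    return {"values": [text], "logic": None}


def parse_query(query):
    """Split a keyword query on commas and parse each part as a unit."""
    return [parse_unit(part) for part in query.split(",") if part.strip()]


units = parse_query("New York or Washington, education, $2000-$3000")
for unit in units:
    print(unit)
```

Splitting on commas first keeps each unit independent, which matches the way Q1 is decomposed into U1, U2, and U3 on the slide.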

EasyQuerier: overview
- Part 1: Domain mapping
  - Collect the domain knowledge from candidate domains
  - Similarity-based domain mapping strategy
- Part 2: Query translation
  - Partial keyword-attribute mapping
  - Holistic keyword-attribute mapping

Challenge 1: Domain Mapping
- Problem statement
  - Map a user query to the correct domain automatically, without requiring domain information to be entered separately.
- Our solution
  - Domain representation model
  - Term weight assignment
  - Query-domain similarity

Domain mapping (1): Domain representation model
- \(D = \langle d\_ID, CT, AT, VT \rangle\)
- d_ID: unique domain identifier.
- \(CT = \{ct_i \mid i = 1, 2, \ldots\}\) is a set of Conceptual Terms, which describe the overall domain concept.
- \(AT = \bigcup_{A_i \in D} DAL(d\_ID, A_i)\) is a set of Attribute Label Terms consisting of the attribute labels of the products in this domain: InteLabel, LocalLabel, OtherLabel.
- \(VT = \bigcup_{A_i \in D} DAV(d\_ID, A_i)\) is a set of Value Terms associated with the products' attributes in the domain.
  - Text attribute: InteValue, LocalValue, OtherValue
  - Non-text attribute: VT can be characterized by the pre-defined ranges available on the integrated interfaces.
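As a rough illustration of the representation model, the sketch below stores the four components in a Python dataclass. The class name, field names, and the sample "job" domain with its attributes are hypothetical and chosen only for this example. Per-attribute dictionaries are an assumed layout that makes the unions AT and VT easy to compute.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical container for D = <d_ID, CT, AT, VT>."""
    d_id: str
    ct: set = field(default_factory=set)    # conceptual terms
    dal: dict = field(default_factory=dict)  # attribute -> label terms, DAL(d_ID, A_i)
    dav: dict = field(default_factory=dict)  # attribute -> value terms, DAV(d_ID, A_i)

    @property
    def at(self):
        """AT: union of the label terms of all attributes."""
        return set().union(*self.dal.values())

    @property
    def vt(self):
        """VT: union of the value terms of all attributes."""
        return set().union(*self.dav.values())


# Made-up sample data for a "job" domain.
job = Domain(
    d_id="job",
    ct={"job", "career", "employment"},
    dal={"location": {"location", "city"}, "category": {"category", "field"}},
    dav={"location": {"New York", "Washington"}, "category": {"education", "finance"}},
)
print(sorted(job.at))
print(sorted(job.vt))
```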

Domain mapping (2): Term weight assignment
- Different terms differ in their ability to differentiate domains. For example, "price" is less powerful than "title" in differentiating the book domain from others.
- Adopt the idea of CVV (cue validity variance), which is used to measure the skew of the distribution of terms across all document databases.
- \(if_{ij}\): the number of times \(t_j\) appears in either AT or VT of \(D_i\).
- \(CVV_j\): the CVV of \(t_j\).
- \(Weight(D_i, t_j) = CVV_j \cdot if_{ij}\)
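The weighting rule can be tried out on toy numbers. The slide does not give the CVV formula, so the sketch below substitutes a simplified skew measure: the population variance of each term's share of occurrences across domains. That substitute, the function name, and the sample counts are assumptions for illustration only. Only the final product CVV_j * if_ij comes from the slide.

```python
from statistics import pvariance


def term_weights(freq):
    """freq[domain][term] holds if_ij; returns weights[domain][term]."""
    domains = list(freq)
    terms = {t for counts in freq.values() for t in counts}
    weights = {d: {} for d in domains}
    for t in terms:
        counts = [freq[d].get(t, 0) for d in domains]
        total = sum(counts)
        if total == 0:
            cvv = 0.0  # term never occurs, so it carries no weight
        else:
            # Simplified stand-in for CVV: variance of the term's
            # share of occurrences in each domain.
            cvv = pvariance([c / total for c in counts])
        for d, c in zip(domains, counts):
            weights[d][t] = cvv * c  # Weight(D_i, t_j) = CVV_j * if_ij
    return weights


# Made-up counts: "price" is spread evenly, "title" is specific to books.
freq = {
    "book": {"title": 4, "price": 3},
    "job": {"title": 0, "price": 3},
}
weights = term_weights(freq)
print(weights["book"])
```

With these counts, "price" gets zero weight in both domains because it is evenly distributed, while "title" is weighted only in the book domain, which mirrors the "price" versus "title" remark on the slide.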

Domain mapping (3)
- \(Q = \{u_1, u_2, \ldots, u_n\}\), \(u_i = \{v_i^1, v_i^2, \ldots\}\)
- Q1 example: \(u_1 = \{\text{New York}, \text{Washington}\}\), with \(v_1^1\) = New York and \(v_1^2\) = Washington
- For each query value, only the best-matching term \(t_j\) in VT or AT is recorded.
- [Similarity formulas not preserved in the transcript.]
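Since the similarity formulas did not survive in the transcript, the following sketch only illustrates the "record the best-matching term" idea under assumptions made here: word-level Jaccard overlap as the string similarity, and a query-domain score that sums, for each query value, the best weighted match. All names and sample weights are hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard overlap, an assumed stand-in for Sim(v, t)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def query_domain_similarity(query_values, weights):
    """Sum, over query values, of the best weighted match among the domain's terms."""
    score = 0.0
    for value in query_values:
        score += max((jaccard(value, t) * w for t, w in weights.items()), default=0.0)
    return score


def map_domain(query_values, weights_by_domain):
    """Pick the domain with the highest query-domain similarity."""
    return max(
        weights_by_domain,
        key=lambda d: query_domain_similarity(query_values, weights_by_domain[d]),
    )


# Made-up term weights per domain.
weights_by_domain = {
    "job": {"new york": 2.0, "education": 1.5, "salary": 1.0},
    "book": {"title": 3.0, "education": 0.5},
}
query_values = ["New York", "education"]
print(map_domain(query_values, weights_by_domain))
```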

Challenge 2: Query translation
- Problem statement
  - Translate the query to the integrated interface
  - This is like filling in the integrated interface with a set of keywords
- Computation model
  - Def 4.1 (Keyword-Attribute Matching (KAM)): KAM(u, A).
  - Def 4.2 (Degree of Matching (DM)): each KAM has a matching degree.
  - Def 4.3 (Query Translation Solution (QTS)): a QTS represents a strategy for filling in the query interface. A QTS comprises several KAMs.
  - Def 4.4 (Conviction): this measurement determines whether a QTS is reasonable.
- The larger the DM of a KAM, the more reasonable the KAM is. Combining such KAMs generates the optimal QTS.

Query translation (1): Computation of DM
- For \(Q = \{u_1, u_2, \ldots, u_n\}\), \(u_i = \{v_i^1, v_i^2, \ldots\}\)
- \(Sim(v_i^x, A_j)\) is the maximum of all \(Sim(v_i^x, t_j)\), where \(t_j\) ranges over the VT of \(A_j\).
- \(Sim(v_i^x, t_j)\) is computed in the same way as in domain mapping.
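The maximum over value terms can be sketched as follows. The slide states only that Sim(v, A) is the maximum of Sim(v, t) over the value terms of A. How the per-value scores are combined into the DM of a unit is not given, so averaging them is an assumption made here, as are the word-level Jaccard similarity and all names and sample data.

```python
def term_sim(v, t):
    """Word-level Jaccard overlap, an assumed stand-in for Sim(v, t)."""
    tv, tt = set(v.lower().split()), set(t.lower().split())
    union = tv | tt
    return len(tv & tt) / len(union) if union else 0.0


def value_attribute_sim(value, value_terms):
    """Sim(v, A): the best match between v and any value term of A."""
    return max((term_sim(value, t) for t in value_terms), default=0.0)


def degree_of_matching(unit, value_terms):
    """Assumed DM(u, A): mean of Sim(v, A) over the values in unit u."""
    if not unit:
        return 0.0
    return sum(value_attribute_sim(v, value_terms) for v in unit) / len(unit)


# Made-up value terms for a "location" attribute.
location_terms = {"New York", "Washington", "Boston"}
print(degree_of_matching(["New York", "Washington"], location_terms))
print(degree_of_matching(["education"], location_terms))
```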

Query translation (2): Conviction
- The conviction value of a QTS is a weighted sum of the DMs of the related KAMs.
- Why weight? If an attribute appears in more local interfaces of a domain, it is more important in the domain.
- Each attribute \(A_j\) within the domain D is given a weight \(w(A_j)\) based on its interface frequency \(if_j\).
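A minimal sketch of the conviction computation follows. The weighted sum is taken from the slide. The exhaustive search over assignments, the rule that each attribute receives at most one unit, and all DM values and weights are assumptions for illustration.

```python
from itertools import permutations


def conviction(qts, dm, weights):
    """Weighted sum of the DMs of the KAMs (unit, attribute) in a QTS."""
    return sum(weights[attr] * dm[(unit, attr)] for unit, attr in qts)


def best_qts(units, attributes, dm, weights):
    """Try every assignment of units to distinct attributes and keep the best.

    Assumes each attribute is filled by at most one unit.
    """
    best, best_score = None, float("-inf")
    for assigned in permutations(attributes, len(units)):
        qts = list(zip(units, assigned))
        score = conviction(qts, dm, weights)
        if score > best_score:
            best, best_score = qts, score
    return best, best_score


# Made-up DM values and interface-frequency-based weights.
units = ["u1", "u2"]
attributes = ["location", "category", "salary"]
dm = {
    ("u1", "location"): 1.0, ("u1", "category"): 0.1, ("u1", "salary"): 0.0,
    ("u2", "location"): 0.2, ("u2", "category"): 0.9, ("u2", "salary"): 0.0,
}
weights = {"location": 0.8, "category": 0.6, "salary": 0.9}
solution, score = best_qts(units, attributes, dm, weights)
print(solution, score)
```

Exhaustive enumeration is only practical for the handful of units and attributes found on a typical query interface. A larger setting would call for a proper assignment algorithm.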

Experiment
- Settings
  - 9 domains, each covering 50 web databases
  - 10 students, 20 keyword queries for each domain
- Measurement
  - Correct / acceptable / wrong
  - Overall / with domain / with attribute label / value only
- Fig. 1: domain mapping accuracy
- Fig. 2: query translation accuracy

Conclusion
- In this paper, we proposed EasyQuerier, a novel keyword-based interface system that lets ordinary users query structured data in various Web databases.
- We developed solutions to two technical challenges:
  - mapping a keyword query to the appropriate domain
  - translating the keyword query into a query for the integrated search interface of that domain

Thank you