Querying Web Data – The WebQA Approach
Authors: Sunny K.S. Lam and M. Tamer Özsu
CSI5311 presentation by Dongmei Jiang and Zhiping Duan



2 Agenda
– Properties of Web Data
– Approaches to Web Data Searching
– WebQA Introduction
– WebQA System Architecture
– WebQA Implementation
– WebQA System Evaluation
– Conclusion

3 Web Data Searching
– Is a search engine enough?
– Is web data querying necessary?

4 Characteristics of Web Data
Properties of Web Data
– Wide distribution, large volume
– High volatility
– Unstructured and redundant data, with inconsistencies among redundant copies
– Heterogeneous representation
– Dynamism
Difficulties of Querying Web Data (DB Perspective)
– No schema
– Poor scalability when searching the whole web
– No exact web query language

5 Web Data Searching Approaches
Information Retrieval Approach
– Search engines and metasearchers
Database-Oriented Web Querying
– Information Integration
– Semistructured Data Querying
– Special Web Query Languages
– Question-Answer

6 Question-Answer Approach
Basic principle
– Web pages that could contain the answer to the user query are retrieved.
– The answer is extracted from these pages.
Technologies
– NLP and Information Retrieval (IR) technologies are used to find the pages.
– The answer is extracted with Information Extraction (IE) techniques.
Example Systems
– Mulder [Kwok et al., 2001]
– WebQA [Lam & Özsu, 2002]
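The two-step principle above can be sketched in a few lines. This is only a toy illustration under loud assumptions: the "web" is an in-memory dictionary, retrieval is plain keyword overlap, and extraction is a single regular expression for years. The names retrieve_pages and extract_year are hypothetical and do not come from Mulder or WebQA.

```python
import re
from typing import Dict, List, Optional


def retrieve_pages(pages: Dict[str, str], keywords: List[str], limit: int = 3) -> List[str]:
    """Step 1 (IR): return the texts of pages sharing the most keywords with the query."""
    def overlap(text: str) -> int:
        words = set(re.findall(r"\w+", text.lower()))
        return sum(1 for k in keywords if k.lower() in words)

    scored = [(overlap(text), url) for url, text in pages.items()]
    hits = [(score, url) for score, url in scored if score > 0]
    # Highest overlap first; ties broken by URL so the order is deterministic.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pages[url] for _, url in hits[:limit]]


def extract_year(texts: List[str]) -> Optional[str]:
    """Step 2 (IE): pull the first four-digit year out of the retrieved texts."""
    for text in texts:
        match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    toy_web = {
        "a": "Canada became a confederation in 1867.",
        "b": "Pancakes need flour and milk.",
    }
    candidates = retrieve_pages(toy_web, ["Canada", "confederation"])
    print(extract_year(candidates))
```

A real system would replace both steps with far richer components; the point here is only the separation between finding candidate pages and extracting the answer from them.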

7 WebQA
Question-answer approach
– Accepts short factual queries
– Returns the exact answers
Aims to:
– Accept fuzziness in user queries
– Return actual answers, not URLs
– Query the entire web and scale easily with new data sources

8 WebQA System Architecture
[Figure: WebQA system architecture. Components: Query Parser, Semantic Cache Manager, Answer Formatter, Resource Locator/Decomposer, Answer Collector, Complex Query Evaluation. External elements: User, Search Engine, Search Engine Cache, Web Data Source, Web Site.] Reference [1]

9 WebQA Prototype Architecture
[Figure: WebQA prototype architecture. Components: Query Parser (QP), Summary Retriever (SR), Answer Extractor (AE). External elements: User, Search Engine, Search Engine Cache, Web Data Source, Web Site. Data-flow labels: Valid WebQAL Query, Keywords, Category, Keywords/Description, List of Ranked Records.] Reference [1]

10 Query Parser
Query example: Which country produced the most computers in the world?
WebQAL syntax: [-output ] -keywords
WebQAL for the example: place -output country -keywords producer most computers
[Figure: Query Parser. Components: Categorizer, WebQAL Generator, WebQAL Checker. Data-flow labels: User query (NL question), category, WebQAL, Valid WebQAL.] Reference [1]
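To make the WebQAL example concrete, here is a minimal parsing sketch. The grammar is an assumption inferred only from the single example on the slide (a leading category, an optional -output option, then -keywords followed by the keyword list); the actual WebQAL checker may accept more or less than this. WebQALQuery and parse_webqal are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WebQALQuery:
    category: str
    output: Optional[str]
    keywords: List[str]


def parse_webqal(text: str) -> WebQALQuery:
    """Parse 'category [-output X] -keywords k1 k2 ...' (assumed grammar)."""
    # Slides often turn '-' into an en-dash; normalise before tokenising.
    tokens = text.replace("\u2013", "-").split()
    if not tokens or tokens[0].startswith("-"):
        raise ValueError("query must start with a category")

    category = tokens[0]
    output: Optional[str] = None
    keywords: List[str] = []
    i = 1
    while i < len(tokens):
        if tokens[i] == "-output":
            if i + 1 >= len(tokens) or tokens[i + 1].startswith("-"):
                raise ValueError("-output needs a value")
            output = tokens[i + 1]
            i += 2
        elif tokens[i] == "-keywords":
            # Everything after -keywords is treated as the keyword list.
            keywords = tokens[i + 1:]
            break
        else:
            raise ValueError(f"unexpected token {tokens[i]!r}")

    if not keywords:
        raise ValueError("-keywords with at least one keyword is required")
    return WebQALQuery(category, output, keywords)


if __name__ == "__main__":
    print(parse_webqal("place -output country -keywords producer most computers"))
```

Rejecting malformed input with an error plays the role of the WebQAL Checker in this sketch; how the real checker reports invalid queries is not stated on the slide.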

11 Summary Retriever
[Figure: Summary Retriever. Input: WebQAL. Components: Keyword Generator, Source Ranker, Record Retrievers (one per wrapper), Wrappers #1 to #N, Record Consolidator/Ranker. Sources: Web Site, Remote Database, Search Engine. Output: List of Ranked Records.] Reference [1]
– The Source Ranker identifies the data sources best suited to answering certain types of questions.
– Records are ranked by source ranking first and by local ranking second.
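The "source ranking first, local ranking second" rule amounts to a two-key sort. The sketch below assumes that both ranks are integers where a smaller number means a better rank, and that every record's source appears in the source ranking; Record and consolidate are hypothetical names, not taken from the WebQA code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    source: str      # which wrapper/source returned the record
    local_rank: int  # position within that source's own result list (1 = best)
    text: str


def consolidate(records: List[Record], source_rank: Dict[str, int]) -> List[Record]:
    """Order records by source rank first and local rank second."""
    return sorted(records, key=lambda r: (source_rank[r.source], r.local_rank))


if __name__ == "__main__":
    ranks = {"factbook": 1, "search_engine": 2}
    records = [
        Record("search_engine", 1, "snippet A"),
        Record("factbook", 2, "entry B"),
        Record("factbook", 1, "entry C"),
    ]
    for record in consolidate(records, ranks):
        print(record.source, record.local_rank, record.text)
```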

12 Answer Extractor
[Figure: Answer Extractor. Input: List of Ranked Records. Components: Candidate Retriever, Rearrange, Output Converter. Output: Top ten answers (user readable).] Reference [1]
– Candidates are scored by how frequently the answer occurs and by the score of the rule that adds it to the candidate list.
– The higher the score, the more likely the candidate is the answer to the user's query.
– The shorter the answer, the higher the score.
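The slide names three scoring signals (frequency, rule score, answer length) but not how they are combined. The sketch below is one possible combination, chosen purely for illustration: rule scores are summed over every occurrence of a candidate, and the total is divided by the candidate's word count so that shorter answers score higher. The formula and the name rank_candidates are assumptions, not WebQA's actual scoring.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_candidates(hits: List[Tuple[str, float]], top_n: int = 10) -> List[str]:
    """Rank candidate answers from (answer_text, rule_score) pairs.

    Assumed scoring: sum of rule scores over all occurrences,
    divided by the number of words in the answer.
    """
    totals: Dict[str, float] = defaultdict(float)
    for answer, rule_score in hits:
        # Normalise case and whitespace so repeated answers accumulate.
        totals[answer.strip().lower()] += rule_score

    scored = [
        (total / len(answer.split()), answer)
        for answer, total in totals.items()
        if answer  # skip empty candidates
    ]
    # Best score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [answer for _, answer in scored[:top_n]]


if __name__ == "__main__":
    hits = [
        ("China", 1.0),
        ("china", 1.0),
        ("the People's Republic of China", 1.0),
        ("Japan", 0.5),
    ]
    print(rank_candidates(hits))
```

The default of ten results mirrors the "top ten answers" output shown on the slide.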

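13 WebQA Implementation Architecture
[Figure: WebQA implementation architecture. Components: Clients #1 to #N, Web Server (JSPs, HTMLs), QA Server with QA Server Threads, QA Engine. Data-flow labels: Question/Answer (HTTP) between clients and the Web Server; question and answer passed as strings to and from the QA Server.] Reference [3]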
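The idea of a QA server that hands each incoming question string to a worker thread can be modelled roughly as follows. This sketch leaves out the web server and the network entirely and uses a Python thread pool; the slide's components (JSPs, QA Server Threads) suggest a Java implementation whose details are not given here. QAEngine, QAServer and their methods are hypothetical, and the engine is a placeholder that does no real question answering.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class QAEngine:
    """Placeholder engine: takes a question string, returns an answer string."""

    def answer(self, question: str) -> str:
        return f"(no answer found for: {question})"


class QAServer:
    """Dispatches each question to a worker thread that calls the engine."""

    def __init__(self, engine: QAEngine, max_threads: int = 4) -> None:
        self._engine = engine
        self._pool = ThreadPoolExecutor(max_workers=max_threads)

    def submit(self, question: str) -> Future:
        # Returns immediately; the answer string is available via .result().
        return self._pool.submit(self._engine.answer, question)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    server = QAServer(QAEngine())
    pending = [server.submit(q) for q in ("Who wrote Hamlet?", "How tall is Everest?")]
    for future in pending:
        print(future.result())
    server.close()
```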

14 System Evaluation
The evaluation uses TREC-9 and measures two aspects: accuracy and efficiency. Reference [3]
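The slide does not say how accuracy was computed. For context, the TREC-9 question answering track scored systems with mean reciprocal rank (MRR) over the top five responses per question, so the sketch below shows that metric. Whether WebQA's evaluation used exactly this measure is an assumption. The exact-string match used here is also a simplification, since TREC answers were judged by human assessors.

```python
from typing import List


def mean_reciprocal_rank(ranked_answers: List[List[str]], gold: List[str], cutoff: int = 5) -> float:
    """MRR: average of 1/rank of the first correct answer, 0 if none within the cutoff."""
    if not gold:
        return 0.0
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        target = correct.strip().lower()
        for rank, answer in enumerate(answers[:cutoff], start=1):
            if answer.strip().lower() == target:
                total += 1.0 / rank
                break
    return total / len(gold)


if __name__ == "__main__":
    system_output = [["Ottawa", "Toronto"], ["1866", "1867"], ["unknown"]]
    answer_key = ["Ottawa", "1867", "Paris"]
    print(mean_reciprocal_rank(system_output, answer_key))
```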

15 Conclusion
WebQA follows the question-answer approach.
– Query as input, exact answer as output
– Uses NLP, IR and IE technologies
It is independent of any data schema.
It queries multiple Web sources:
– Search engines
– Data sources (CIA World Factbook)
– Web sites

16 Future Work
To develop a full-fledged Web query system
– Execution algorithms for more complex queries
– Common aggregation functions over retrieved answers
To consider other query types
– Continuous query. Example: Notify me whenever the temperature in Ottawa drops below zero.
– Procedural query. Example: How do I make pancakes?

17 References
1. S. Lam and M.T. Özsu. Querying Web Data - The WebQA Approach. WISE 2002.
2. D. Florescu, A. Levy and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3):59-74, 1998.
3. M.T. Özsu. Web Data Management - Some Issues. Course Slides.