Survey on Long Queries in Keyword Search: Phrase-based IR
Sungchan Park, 2008.08.07


Copyright  2008 by CEBT Survey So Far…  Jaehui Term Proximity Scoring  Jung-Yeon Semantic Query  Jongheum Index Structure Optimized for Multi-keyword Query 2

Copyright  2008 by CEBT My Topic: Phrase-based IR  Why? The presence of phrases is one significant difference between single word queries and multi word queries. And identifying phrases is important for understanding real meanings of sentences. – Ex) “hot dog” Thus, how to identify and use phrases in queries is important in devising processing strategy for multi word queries.  Focus of Survey Using Phrases(Judging Relevance) – Skipped the contents about identifying phrases 3

Copyright  2008 by CEBT Early Researches on Phrase-based IR  Using fixed proximity constraints(window size) “The Use of Phrase and Structural Queries in Information Retrieval”(1991) “Evaluation of Syntactic Phrase Indexing”(1996) … 4 word#1word#2word#3 Relevant Document Query Phrase word#1 word#2 word#3 Window

Copyright  2008 by CEBT Progress #1: Structural Proximity  “Phrase-based Information Retrieval” A.T. Arampatiz et al Identifying noun phrases in documents, and using the noun phrases for criteria of “nearness” 5 … A noun phrase identified by NLP engine … radioprogramsBBC Relevant Document Query Phrase The studios for later BBC on radio programs

Copyright  2008 by CEBT Progress #1: Structural Proximity, Experiment  Experiment Result Gained high precision But loses recall – The auhors wrote it can be addressed by taking into account linguistic variation and anaphora. 6

Copyright  2008 by CEBT Progress #2: Varied Window Size  “An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases” Shuang Liu et al – Their consequent work was published in 2007 Classifying phrases into four types – Proper name – Dictionary phrase – Simple phrase – Complex phrase – Proximity constraints of each types are different! 7

Copyright  2008 by CEBT Progress #2: Varied Window Size, Example 8 SungchanPark NOT Relevant DocumentQuery Phrase #1 Sungchan Park … was hospitalized for mental problem … and had been on lithium for his illness Recently … mentalillness Relevant DocumentQuery Phrase #2 mental illness

Copyright  2008 by CEBT Progress #2: Varied Window Size, Solution  Solution Learning the window size for each phrase types. – Result by Decision Tree Proper name : 0 Dictionary phrase : 16 Simple phrase : 48 Complex phrase : 78 9

Copyright  2008 by CEBT Progress #2: Varied Window Size, Experiment  Experiment Result The author did not compare their approach with naïve approach. In my focus, above result only shows that phrase-based IR can improve performance of IR system. 10

Copyright  2008 by CEBT Conclusion  Phrase-based relevance model have been researched by only few researchers However, the progresses are interesting – Determine nearness via sentence structure. – Varying proximity constraints according to type of query phrase. 11

Copyright  2008 by CEBT References  The Use of Phrase and Structural Queries in Information Retrieval, 1991  Evaluation of Syntactic Phrase Indexing, 1996  Phrase-based Information Retrieval, 1998  Phrase Recognition and Expansion for Short, Precision-biased Queries based on a Query log, 1999  The Use of Phrases from Query Texts in Information Retrieval, 2000  An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases, 2004  The Role of Multi-word Units in Interactive Information Retrieval, 2005  Recognition and Classification of Noun Phrases in Queries for Effective Retrieval,