Survey
Jaehui Park
2008. 07. 17.

Copyright  2008 by CEBT Introduction  Members Jung-Yeon Yang, Jaehui Park, Sungchan Park, Jongheum Yeon  We are interested in Issues in Information Retrieval – About crawling, indexing, searching and ranking methods How to process multi-term queries in information retrieval environments – Ex) Today US Today Today Weather Paris Today Weather -> Multi-term queries express more complex information need than single queries. 2

Copyright  2008 by CEBT Main Topic  Long Queries in Keyword Search  Keywords: – Compound query, Evidence Combination, Phrasal Query, Multi-term Query, Multiple Keyword Search, Multiword Unit, and so on.  Issues proximity or distance syntactic structure (order) semantic NLP remedies … 3

Copyright  2008 by CEBT Proximity  An intuitive concept for processing multiple term queries  Readings Term Proximity Scoring for Keyword-Based Retrieval Systems – [ECIR 2003] Yves Rasolofo and Jacques Savoy Efficiency vs. Effectiveness in Terabyte-Scale Information Retrieval – [TREC 2005] Stefan Buttcher and Charles L. A. Clarke Efficient Text Proximity Search – [SPIRE 2007] Ralf Schenkel, et al. Why Bigger Windows Are Better Than Smaller Ones – [TR-UM 1997] Ron Papka and James Allan … 4

Term Proximity Scoring for Keyword-Based Retrieval Systems
Yves Rasolofo and Jacques Savoy
European Colloquium on IR Research (ECIR) 2003, LNCS
Presented by Jaehui Park

Copyright  2008 by CEBT Introduction  Phrase, term proximity or term distance in IR Focus on adding a word pair scoring module Okapi probabilistic model + proximity measurement  Previous work Salton & McGil [1983] – Generating statistical phrases based on word co-occurrence Fagan [1987] – Considering syntactic relation or syntactic structures Mitra et al. [1997] – “Once a good basic ranking scheme is used, the use of phrases do not have a major effect on precision at high ranks” Arampatzis et al.[2000] – The lack of success when using NLP technique in IR Hawking & Thistlewaite [1996] – The use of proximity scoring within the PADRE system (Z-mode method) 6

Copyright  2008 by CEBT Okapi  Okapi [Robertson & Spark Jones 1976] Document ranking function according to their relevance to a given search query based on the probabilistic retrieval model Considering – Term frequency – Document length The weight for a given term t i in document d 7

Copyright  2008 by CEBT Okapi  Okapi [Robertson & Spark Jones 1976] (continued) The weight for the term t i within a query The retrieval status value (for a document according to a query) 8

Copyright  2008 by CEBT Term Proximity Weighting  Improving retrieval performance by using term proximity scoring  Assumption If a document contains sentences having at least two query terms within them, the probability that this document will be relevant must be greater. The closer are the query terms, the higher is the relevance probability.  Objective Assigning more importance to those keywords having a short distance between their occurrences. 9

Copyright  2008 by CEBT Term Proximity Weighting  1. expand the request(query) using keyword pairs extracted from the query’s wording  2. compute a term pair instance weight “information retrieval “ : 1.0 “the retrieval of medical information” : 0.11 (1/9) 10

Copyright  2008 by CEBT Term Proximity Weighting  3. sum all the corresponding term pairs  4. compute the contribution of all occurring term pairs in the document  5. compute the final retrieval status value 11

Copyright  2008 by CEBT Experiments  Test Collections TREC-8 document (528,155 docs) – Financial Times, Federal Register, Foreign Broadcast Information Service, LA Times TREC-9, TREC-10 (1,692,096 docs) 12

Copyright  2008 by CEBT Experiments  Evaluation 13

Copyright  2008 by CEBT Experiments  Evaluation 14

Copyright  2008 by CEBT Experiments  Evaluation 15

Copyright  2008 by CEBT Conclusion  The impact of a new term proximity algorithm on retrieval effectiveness for keyword-based system was examined. Improve ranking for documents having query term pairs occurring within a given distance constraint.  The term proximity scoring approach Improve precision after retrieving a few documents 16