Survey on Long Queries in Keyword Search : Phrase-based IR Sungchan Park 2008. 08. 07.

1 Survey on Long Queries in Keyword Search : Phrase-based IR Sungchan Park 2008. 08. 07.

2 Survey So Far…
Copyright © 2008 by CEBT
• Jaehui: Term Proximity Scoring
• Jung-Yeon: Semantic Query
• Jongheum: Index Structure Optimized for Multi-keyword Query

3 My Topic: Phrase-based IR
• Why?
  – The presence of phrases is one significant difference between single-word queries and multi-word queries.
  – Identifying phrases is important for understanding the real meaning of a sentence. Ex) “hot dog”
  – Thus, how to identify and use phrases in queries is important in devising a processing strategy for multi-word queries.
• Focus of the survey
  – Using phrases (judging relevance)
  – Identifying phrases is skipped

4 Early Research on Phrase-based IR
• Using fixed proximity constraints (window size)
  – “The Use of Phrase and Structural Queries in Information Retrieval” (1991)
  – “Evaluation of Syntactic Phrase Indexing” (1996)
  – …
[Figure: the words of a query phrase (word#1, word#2, word#3) all fall inside one window of a relevant document.]
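
A minimal sketch of the fixed-window idea, not taken from the cited papers: a document counts as matching a query phrase when all phrase words occur inside some run of `window` consecutive tokens. The function name `phrase_in_window`, the whitespace tokenization, and the sample text are assumptions made for illustration.

```python
def phrase_in_window(doc_tokens, phrase_tokens, window):
    """Return True if every phrase word occurs within `window` consecutive tokens."""
    needed = set(phrase_tokens)
    # Slide a window of fixed size over the document, one token at a time.
    for start in range(len(doc_tokens)):
        span = doc_tokens[start:start + window]
        if needed.issubset(span):
            return True
    return False


# Hypothetical document and query phrase, tokenized by whitespace.
doc = "the studios for later bbc programs on radio".split()
print(phrase_in_window(doc, ["bbc", "radio"], window=4))  # True
print(phrase_in_window(doc, ["bbc", "radio"], window=3))  # False
```

The same window size applies to every phrase here, which is the limitation the later slides address.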

5 Progress #1: Structural Proximity
• “Phrase-based Information Retrieval”, A. T. Arampatzis et al., 1998
  – Identify noun phrases in documents and use the noun phrases as the criterion for “nearness”.
[Figure: a query phrase (words: radio, programs, BBC) matched against a relevant document containing “… The studios for later BBC on radio programs …”, in which a noun phrase identified by an NLP engine is marked.]
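
A rough sketch of structural proximity under strong assumptions: the noun phrases are taken as already identified (here they are hard-coded lists standing in for the output of an NLP engine), and "near" simply means that all query words fall inside the same noun phrase. The name `matches_structurally` and the sample noun phrases are hypothetical, not from the paper.

```python
def matches_structurally(noun_phrases, query_phrase):
    """Return True if some noun phrase contains every word of the query phrase.

    noun_phrases: list of token lists, assumed to come from an external chunker.
    """
    needed = {word.lower() for word in query_phrase}
    return any(
        needed.issubset({token.lower() for token in np_tokens})
        for np_tokens in noun_phrases
    )


# Hypothetical chunker output for one document.
chunks = [["the", "studios"], ["later", "BBC", "programs", "on", "radio"]]
print(matches_structurally(chunks, ["BBC", "radio", "programs"]))  # True
```

Unlike a fixed window, the boundary of "nearness" here is set by sentence structure rather than by a token count.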

6 Progress #1: Structural Proximity, Experiment
• Experiment result
  – Gained high precision
  – But lost recall
    – The authors wrote that this can be addressed by taking linguistic variation and anaphora into account.

7 Progress #2: Varied Window Size
• “An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases”, Shuang Liu et al., 2004
  – Their follow-up work was published in 2007.
• Classifying phrases into four types
  – Proper name
  – Dictionary phrase
  – Simple phrase
  – Complex phrase
  – The proximity constraint differs for each type!

8 Progress #2: Varied Window Size, Example
• Query Phrase #1: “Sungchan Park”
  – [Figure: “Sungchan” and “Park” marked separately in a document] NOT relevant
• Query Phrase #2: “mental illness”
  – Document: “… was hospitalized for mental problem … and had been on lithium for his illness. Recently …”: relevant

9 Progress #2: Varied Window Size, Solution
• Solution
  – Learn the window size for each phrase type.
  – Result by decision tree:
      Proper name: 0
      Dictionary phrase: 16
      Simple phrase: 48
      Complex phrase: 78
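
The learned window sizes can be plugged into a type-dependent matcher, sketched below. The four values are the ones listed on the slide; everything else is an assumption for illustration, in particular the reading of a window size as "the number of extra tokens allowed in the span that covers all phrase words" (so 0 forces the words to be contiguous), the names `WINDOW_BY_TYPE` and `matches_phrase`, and treating “mental illness” as a dictionary phrase.

```python
# Window sizes per phrase type, as reported on the slide.
WINDOW_BY_TYPE = {
    "proper_name": 0,
    "dictionary": 16,
    "simple": 48,
    "complex": 78,
}


def matches_phrase(doc_tokens, phrase_tokens, phrase_type):
    """Return True if all phrase words fit in a span allowed for this phrase type.

    Assumption: the span may hold len(phrase) + window tokens, so a window
    of 0 means the phrase words must be adjacent (in any order).
    """
    max_span = len(phrase_tokens) + WINDOW_BY_TYPE[phrase_type]
    needed = set(phrase_tokens)
    return any(
        needed.issubset(doc_tokens[i:i + max_span])
        for i in range(len(doc_tokens))
    )


doc = ("was hospitalized for mental problem and had been "
       "on lithium for his illness").split()
# The phrase type chosen here is a guess made only for the demo.
print(matches_phrase(doc, ["mental", "illness"], "dictionary"))   # True
print(matches_phrase(doc, ["mental", "illness"], "proper_name"))  # False
```

With the proper-name window, the separated words no longer match, which mirrors the “Sungchan Park” case in the example slide.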

10 Progress #2: Varied Window Size, Experiment
• Experiment result
  – The authors did not compare their approach with a naïve approach.
  – From the viewpoint of my focus, their result only shows that phrase-based IR can improve the performance of an IR system.

11 Conclusion
• Phrase-based relevance models have been studied by only a few researchers.
• However, the progress so far is interesting:
  – Determining nearness via sentence structure.
  – Varying proximity constraints according to the type of query phrase.

12 References
• The Use of Phrase and Structural Queries in Information Retrieval, 1991
• Evaluation of Syntactic Phrase Indexing, 1996
• Phrase-based Information Retrieval, 1998
• Phrase Recognition and Expansion for Short, Precision-biased Queries based on a Query log, 1999
• The Use of Phrases from Query Texts in Information Retrieval, 2000
• An Effective Approach to Document Retrieval via Utilizing Wordnet and Recognizing Phrases, 2004
• The Role of Multi-word Units in Interactive Information Retrieval, 2005
• Recognition and Classification of Noun Phrases in Queries for Effective Retrieval, 2007

