1
LING 573: Deliverable 3
Group 7: Ryan Cross, Justin Kauhl, Megan Schneider
2
The Basics
– Implemented in Python with Indri
– For document retrieval, used the standard #combine( “query” ) operator, which takes the geometric mean of the term scores:
  $\text{\#combine}(x_1\, x_2 \ldots x_n) = \prod_{i=1}^{n} \text{score}(x_i)^{1/n}$
– Used #combine[passageN:M] to get windows for passage retrieval (N = window size, M = increment): 100:50, 150:50, 150:75, also 150:10, 150:15, and longer windows
– Used regexes to clean up the Indri printPassages output
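The #combine formula above is a geometric mean, which Indri evaluates in log space by averaging log beliefs. The following is a minimal sketch of that arithmetic, not Indri's own code; the function name `combine_score` and the example belief values are hypothetical.

```python
import math


def combine_score(beliefs):
    """Geometric mean of per-term beliefs, i.e. the #combine formula.

    Computed as exp(mean(log b)) rather than a raw product, which avoids
    floating-point underflow when many small probabilities are combined.
    """
    if not beliefs:
        raise ValueError("need at least one term belief")
    if any(b <= 0 for b in beliefs):
        raise ValueError("beliefs must be positive")
    mean_log = sum(math.log(b) for b in beliefs) / len(beliefs)
    return math.exp(mean_log)


# Three hypothetical term beliefs for one document.
print(round(combine_score([0.2, 0.05, 0.4]), 4))  # 0.1587
```

Because each term contributes with exponent 1/n, one very low-scoring term pulls the whole document down, so documents that match all query terms moderately tend to beat documents that match one term strongly.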
3
Approaches
– Stemming
– Stop word removal
– Question word removal
– Query expansion
4
Approaches (cont.)
Stemming
– Tried stemming in the index and in the query
– Porter and Krovetz stemmers
– Krovetz performed better (it is less aggressive)

TREC 2004 (150/75/20)   MAP     MRR (strict)   MRR (lenient)
Porter                  .3105   .34048         .49745
Krovetz                 .3698   .39126         .54396
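The stemmer comparison is easier to read as relative change. This sketch only recomputes it from the table above (values copied verbatim); `relative_gain` is a hypothetical helper.

```python
# TREC 2004 (150/75/20) scores copied from the stemming table.
porter = {"MAP": 0.3105, "MRR strict": 0.34048, "MRR lenient": 0.49745}
krovetz = {"MAP": 0.3698, "MRR strict": 0.39126, "MRR lenient": 0.54396}


def relative_gain(baseline, new):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


for metric in porter:
    gain = relative_gain(porter[metric], krovetz[metric])
    print(f"{metric}: {gain:+.1%}")
# MAP: +19.1%
# MRR strict: +14.9%
# MRR lenient: +9.3%
```

Krovetz gains most on MAP, which is consistent with the less aggressive stemmer conflating fewer unrelated word forms across the whole ranked list.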
5
Approaches (cont.)
Stop word removal
– Removing stop words from the index made runtime faster
– Removing them from queries improved results in every configuration we tried
Question word removal
– Removed question words from the query in almost all runs; gave some improvement
– Largely intuitive; however, some questions scored slightly better with question words left in, because the corpus contains Q&A files
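A minimal sketch of the query-side cleanup, assuming simple lowercase tokenization. The stop word and question word lists are short hypothetical stand-ins, since the lists actually used are not given; setting `drop_question=False` gives the "left in" variant.

```python
import re

# Hypothetical, abbreviated word lists (the real lists are not given).
STOP_WORDS = {"a", "an", "and", "are", "did", "do", "does", "for", "in",
              "is", "of", "on", "the", "to", "was", "were"}
QUESTION_WORDS = {"how", "what", "when", "where", "which", "who", "whom", "why"}


def clean_query(question, drop_stop=True, drop_question=True):
    """Lowercase and tokenize a question, optionally dropping stop words
    and question words. Only the query is changed; the index is untouched."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    kept = []
    for tok in tokens:
        if drop_stop and tok in STOP_WORDS:
            continue
        if drop_question and tok in QUESTION_WORDS:
            continue
        kept.append(tok)
    return kept


print(clean_query("When was the Hubble Space Telescope launched?"))
# ['hubble', 'space', 'telescope', 'launched']
```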
6
Approaches (cont.)
Query expansion
– Tried adding synonyms from WordNet
– Only added synonyms for nouns, verbs, adjectives, and adverbs
– Restricted the added synonyms to a word's POS (as predicted by NLTK's pos_tag)
– Also tried not restricting synonyms by POS
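The two expansion variants can be sketched as follows. In a real run the (word, tag) pairs would come from `nltk.pos_tag(tokens)` and the synonyms from WordNet, for example by collecting `lemma_names()` from `nltk.corpus.wordnet.synsets(word, pos=...)`. To keep the sketch self-contained it takes a pluggable `lookup` function and uses a tiny hypothetical lexicon; the Penn-tag-to-WordNet mapping and all function names are assumptions, not the group's code.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag (as nltk.pos_tag emits) to a WordNet POS
    letter; None means the word is not in an open class and is not expanded."""
    for prefix, wn_pos in (("NN", "n"), ("VB", "v"), ("JJ", "a"), ("RB", "r")):
        if tag.startswith(prefix):
            return wn_pos
    return None


def expand_query(tagged: List[Tuple[str, str]],
                 lookup: Callable[[str, Optional[str]], Iterable[str]],
                 restrict_pos: bool = True) -> List[str]:
    """Append synonyms after each noun, verb, adjective, or adverb.

    With restrict_pos=True only synonyms for the predicted POS are used;
    otherwise lookup is called with pos=None and returns all senses."""
    expanded: List[str] = []
    for word, tag in tagged:
        expanded.append(word)
        wn_pos = penn_to_wordnet(tag)
        if wn_pos is None:
            continue
        for syn in lookup(word, wn_pos if restrict_pos else None):
            syn = syn.replace("_", " ")  # WordNet joins multiword lemmas with "_"
            if syn != word and syn not in expanded:
                expanded.append(syn)
    return expanded


# Hypothetical toy lexicon standing in for WordNet.
TOY = {("launch", "v"): ["establish", "found"],
       ("launch", "n"): ["motorboat"],
       ("telescope", "n"): ["scope"]}


def toy_lookup(word: str, pos: Optional[str]) -> List[str]:
    if pos is None:  # unrestricted: every sense of the word
        return [s for (w, _), syns in TOY.items() if w == word for s in syns]
    return TOY.get((word, pos), [])


tagged = [("launch", "VB"), ("the", "DT"), ("telescope", "NN")]
print(expand_query(tagged, toy_lookup))
print(expand_query(tagged, toy_lookup, restrict_pos=False))
```

The unrestricted run pulls in "motorboat", a noun sense of "launch" that has nothing to do with the question; this is the kind of misleading term that hurt the full-synonym results.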
7
Approaches (cont.)
Query expansion
– In both cases, retrieval results were worse with query expansion

TREC 2004 data         MAP     MRR (strict)   MRR (lenient)
No synonyms            .3698   .39126         .54396
Restricted synonyms    .1677   .23641         .36416
Full synonyms          .1172   .18949         .29462
8
Approaches (cont.)
Passage retrieval
– Used the Indri #combine[passageW:I]( “query” ) operator (W = window size, I = increment)
– Originally intended to run it only over documents returned by the document retrieval phase
– Decided instead to run passage retrieval as a standalone system
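A minimal sketch of assembling the two Indri query strings from cleaned query terms. The helper names are hypothetical, and dropping non-alphanumeric tokens is an assumption made here because characters such as parentheses and "#" are part of Indri's query syntax.

```python
import re


def _query_terms(terms):
    # Keep plain alphanumeric tokens only; stray punctuation such as
    # parentheses or "#" can be misread as Indri query-language syntax.
    return [t for t in terms if re.fullmatch(r"[A-Za-z0-9]+", t)]


def indri_doc_query(terms):
    """Document retrieval: plain #combine over the query terms."""
    return "#combine( " + " ".join(_query_terms(terms)) + " )"


def indri_passage_query(terms, window, increment):
    """Passage retrieval: #combine scored over window/increment passages."""
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    body = " ".join(_query_terms(terms))
    return f"#combine[passage{window}:{increment}]( {body} )"


terms = ["hubble", "space", "telescope", "launched"]
print(indri_doc_query(terms))
print(indri_passage_query(terms, 150, 75))
# #combine( hubble space telescope launched )
# #combine[passage150:75]( hubble space telescope launched )
```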
9
Approaches (cont.)
Passage retrieval results
– Tried several parameter settings
– Krovetz stemming; stop words and question words removed
– Goal: a window size that did not return too many characters, with a meaningful increment

TREC 2004 data
Window size/Increment   MRR (strict)   MRR (lenient)
100/50                  0.34941        0.47146
150/50                  0.36947        0.52955
150/75                  0.39126        0.54396
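To see what the window size/increment pair controls, this sketch tiles a document of `n_tokens` terms with fixed-size windows, each starting `increment` terms after the previous one. It is an approximation for illustration; Indri's exact treatment of the last, partial window may differ, and `passage_windows` is a hypothetical name.

```python
def passage_windows(n_tokens, window, increment):
    """Return (start, end) term offsets of overlapping fixed-size windows.

    The final window is clipped to the document length."""
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    if n_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += increment
    return spans


print(passage_windows(400, 150, 75))
# [(0, 150), (75, 225), (150, 300), (225, 375), (300, 400)]
```

With 150/75 most terms fall into exactly two windows, while 150/10 yields 26 heavily overlapping windows for the same 400-term document, which multiplies near-duplicate passages in the output.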
10
Overall
– Krovetz stemmer
– Stop words removed from the query (kept in the index)

150/75/20    MAP     MRR (strict)   MRR (lenient)
TREC 2004    .3698   .39126         .54396
TREC 2005    .3105   .31485         .51904
11
Critical Analysis
Our query expansion attempts did not help
– Too many misleading terms were introduced
Stop word results were unexpected
– We had assumed that removing stop words from the index would help; the final configuration removes them only from the query
Passage retrieval yielded better results than document retrieval
– A query term is more meaningful when it appears within a short passage than anywhere in a long document
12
References
Hitesh Sabnani and Prasenjit Majumder. Question Answering System: Retrieving Relevant Passages. In Proceedings of the Cross-Language Evaluation Forum (CLEF).
Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, and Gregory Marton. 2003. Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
13
Questions?