An investigation of query expansion terms
Gheorghe Muresan
Rutgers University, School of Communication, Information and Library Science
4 Huntington St., New Brunswick, NJ 08901, USA

Context
- Personalization in Information Retrieval
  – Customizing ranking algorithms based on the user's profile (domain knowledge, familiarity with the topic, interests, etc.) and context (activities being conducted in parallel, etc.)
  – Relevance feedback
    - Explicit: direct elicitation of a profile; relevance judgments
    - Implicit: interpretation of behavioral cues and actions

Research questions
- Is it worth involving users in interactive IR?
  – Is it worth trying to elicit more information about the searcher's information need?
  – Is it worth asking the searcher to judge candidate terms for query expansion?
- Are automatic alternatives to query reformulation more effective?
  – Pseudo-relevance feedback (PRF)
  – Language-modeling-based query expansion derived from the collection (to improve query clarity)
  – Web-based query expansion (the Web as an external, balanced, large training corpus)

Evaluation of queries and expansion terms
- Overlap with the "optimal" topic representation
  – Is there an "optimal" representation? How can it be built?
- Effectiveness of retrieval
  – Quality of the expansion terms
  – Quality of the expanded query
- Combination of evidence
  – Combination of rankings/scores vs. combination of terms
  – Weighted vs. un-weighted terms

[Figure: an "optimal" topic representation is compared against candidate representations at term cut-off thresholds 1 through n; expansion based on term combination is contrasted with expansion based on score combination.]

Conclusions – general
- The system's interaction with a human information seeker is less likely to produce good query terms, and therefore less likely to achieve retrieval effectiveness superior to that of fully automatic methods.
- Machines are better than humans at computing document and collection statistics, and at generating topic representations.
- More attention should be given to systems based on ostension and mediation.
- There is a significant correlation between the term representations of topics and retrieval effectiveness: use query terms known to work well.

Conclusions – specific
- Using a small number of relevance judgments is nearly as effective as using a large number.
- Cheap personalization is possible: no relevance judgments are necessary.
- Term weighting is important: if possible, use weighted queries; otherwise, repeat the important terms.
- Term ranking is important: use the top-ranking terms.
- Pseudo-relevance feedback is a reliable technique that consistently improves performance for most queries.

Experimental setting
- HARD TREC 2005 setting
  – Given a topic, use knowledge about the user's profile and context to return a personalized list of hits.
  – Involves interactions with human users.
  – Effectiveness measured by MAP, R-Precision, and P@10.
- The optimal topic representation is built from TREC relevance judgments.

Some results
- Using the documents judged relevant (rel) provides a reasonable upper-bound performance.
- Using just a small random fraction of rel as judgments does not significantly degrade performance.
- Blind relevance feedback works: assume relevant all the documents retrieved/opened by the user.
- Term ranking is important: a cut-off at 30 terms performs well.
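The conclusions single out pseudo-relevance feedback as reliable and a term cut-off at 30 as effective. A minimal sketch of PRF (the function, its parameters, and the tf × idf term scoring are our illustrative choices, not the poster's implementation): treat the top-ranked documents as relevant, score their terms, and append the top-ranking terms to the query.

```python
from collections import Counter
from math import log

def prf_expand(query, ranked_docs, n_docs=10, n_terms=30):
    """Pseudo-relevance feedback: assume the top-ranked documents are
    relevant, score their terms by tf x idf over the retrieved set,
    and append the best terms to the original query."""
    pseudo_rel = ranked_docs[:n_docs]
    df = Counter()                      # document frequency within the run
    for doc in ranked_docs:
        df.update(set(doc.split()))
    scores = Counter()
    for doc in pseudo_rel:
        for term in doc.split():
            # idf over the retrieved set stands in for collection statistics
            scores[term] += log(1 + len(ranked_docs) / df[term])
    expansion = [t for t, _ in scores.most_common()
                 if t not in query.split()][:n_terms]
    return query.split() + expansion
```

The default `n_terms=30` mirrors the cut-off the poster reports as performing well.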
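The "combination of evidence" item contrasts combining rankings/scores with combining terms. A hedged sketch of both strategies (CombSUM-style score fusion and reciprocal-rank term merging are standard techniques chosen for illustration; the function names are ours):

```python
def combine_scores(runs, weights=None):
    """Score-based combination: each run maps doc_id -> score; fuse
    with a (possibly weighted) CombSUM and re-rank the documents."""
    weights = weights or [1.0] * len(runs)
    fused = {}
    for run, w in zip(runs, weights):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + w * score
    return sorted(fused, key=fused.get, reverse=True)

def combine_terms(term_lists, k=30):
    """Term-based combination: merge ranked candidate-term lists from
    several sources by summed reciprocal rank, keep the top k, and
    issue a single expanded query."""
    merged = {}
    for terms in term_lists:
        for rank, t in enumerate(terms):
            merged[t] = merged.get(t, 0.0) + 1.0 / (rank + 1)
    return sorted(merged, key=merged.get, reverse=True)[:k]
```

Score combination requires running one retrieval per expansion source; term combination builds a single expanded query first and retrieves once.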
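The advice "if possible, use weighted queries; otherwise, repeat the important terms" can be illustrated as follows. The `#weight(...)` string mimics Indri/InQuery-style structured queries; treat the exact syntax as an assumption about the target engine, and the repetition trick as the stated fallback:

```python
def weighted_query(weighted_terms):
    """Preferred form when the engine supports term weights; the
    #weight(...) syntax mimics Indri/InQuery structured queries."""
    inner = " ".join(f"{w:g} {t}" for t, w in weighted_terms)
    return f"#weight( {inner} )"

def repeated_query(weighted_terms):
    """Fallback for engines without weighted queries: approximate a
    term's weight by repeating the term proportionally often."""
    return " ".join(" ".join([t] * max(1, round(w)))
                    for t, w in weighted_terms)
```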
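Language-model-based expansion is motivated by improving query clarity. Clarity is conventionally measured as the KL divergence between the query language model and the collection language model; a minimal sketch, assuming both models arrive as term-to-probability dicts (the smoothing via `eps` is our addition):

```python
from math import log

def clarity(query_model, collection_model, eps=1e-12):
    """Query clarity as the KL divergence (in bits) between the query
    language model and the collection language model; a higher score
    suggests a more focused, less ambiguous query."""
    return sum(p * log(p / max(collection_model.get(t, 0.0), eps), 2)
               for t, p in query_model.items() if p > 0)
```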
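Effectiveness is reported as MAP, R-Precision, and P@10. These are standard TREC measures, computable as follows (textbook definitions, not poster-specific code):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant docs."""
    return precision_at_k(ranked, relevant, len(relevant))

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant
    document appears; MAP averages this over topics."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, 1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0
```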