The Loquacious ("Talkative") User: A Document-Independent Source of Terms for Query Expansion. Diane Kelly et al., University of North Carolina at Chapel Hill.

The Loquacious ("Talkative") User: A Document-Independent Source of Terms for Query Expansion
Diane Kelly et al., University of North Carolina at Chapel Hill
SIGIR 2005
TREC 2004 HARD Track

Objective
Expand IR queries by asking the searcher to fill out a form. The form contains four questions:
–(Q1) How many times have you searched for information about this topic in the past?
–(Q2) Describe what you already know about the topic.
–(Q3) Why do you want to know about this topic?
–(Q4) Please input any additional keywords that describe your topic.

The Form

Query Expansion
Add all terms appearing in the responses to Q2, Q3 and Q4 to the query.
Q1 ("How many times have you searched for information about this topic in the past?") is not used for query expansion. It measures the searcher's familiarity with the topic, which is not explored in this paper.
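
The expansion step above can be sketched in a few lines of Python. This is only an illustration: the tokenizer, the function names, and the sample strings are assumptions and do not come from the paper. Repeated terms are kept, because the slide says to add all terms.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query, responses):
    """Append every term from the form responses (Q2-Q4) to the query terms.

    Duplicates are kept on purpose: the slide says to add *all* terms.
    """
    terms = tokenize(query)
    for response in responses:
        terms.extend(tokenize(response))
    return terms


# Hypothetical topic and responses, for illustration only.
expanded = expand_query(
    "hypothetical topic title",
    ["what I already know", "why I want to know", "extra keywords"],
)
print(expanded)
```

Q1 is deliberately not passed in, since its answer is not used for expansion.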

Weights for New Terms (the presenter's guess)
Robertson Selection Value
N: number of documents in the collection
n: number of documents containing the term
R: number of documents known to be relevant
r: number of relevant documents containing the term
R = r = 0 if no relevance information is available
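
The formula on this slide did not survive in the transcript. One standard weight defined over exactly these four quantities is the Robertson/Sparck Jones relevance weight, log[(r + 0.5)(N - n - R + r + 0.5) / ((n - r + 0.5)(R - r + 0.5))]. Whether the slide showed this exact form is an assumption. The function name and the example numbers below are illustrative.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    N: documents in the collection
    n: documents containing the term
    R: documents known to be relevant (0 if unknown)
    r: relevant documents containing the term (0 if unknown)
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


# With R = r = 0 the weight reduces to log((N - n + 0.5) / (n + 0.5)),
# an IDF-like quantity: rarer terms receive larger weights.
print(rsj_weight(N=1000, n=10))
print(rsj_weight(N=1000, n=500))
```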

TREC 2004 HARD Track
HARD = High Accuracy Retrieval from Documents
Task 2: determine whether a single, highly focused interaction with the user can be used to improve retrieval

HARD Protocol
–The corpus consists of 1.5 GB of English news articles.
–13 human judges propose 50 topics in total.
–Participants can submit 1 to 3 forms for each topic.
–Forms for different topics can be different.
–The authors use the same form for all topics.

HARD Protocol (Cont.)
–Human judges fill out the forms and return them to the participants.
–Participants decide what to do with the completed forms.
–Document relevance is judged using the pooling method.

Evaluation Metric
The authors use MAP (Mean Average Precision).
For each topic, calculate the precision at the rank of each relevant document retrieved, then divide the sum by the total number of relevant documents for the topic. This is the topic's average precision (AP).
For a set of topics, sum the AP of all topics and divide by the number of topics.
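
The two steps above can be written out as a short Python sketch. The function names and the toy rankings are assumptions made for illustration. Relevant documents that are never retrieved contribute zero precision, which is the usual convention.

```python
import math


def average_precision(ranked_doc_ids, relevant_ids):
    """Average precision for one topic."""
    relevant_ids = set(relevant_ids)
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    # Divide by ALL relevant documents, retrieved or not.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """runs: list of (ranked_doc_ids, relevant_ids) pairs, one per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy example: two topics with hypothetical document IDs.
topic_a = (["d1", "d2", "d3", "d4"], {"d1", "d3"})  # AP = (1/1 + 2/3) / 2
topic_b = (["d2", "d1"], {"d1"})                     # AP = (1/2) / 1
print(mean_average_precision([topic_a, topic_b]))
```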

Authors’ Runs
Okapi BM25, the "king of IR" (baseline)
Okapi + pseudo relevance feedback (baseline)
–Examine the top 10 retrieved documents
–Expand the query with the top 5, 10, 20 or 50 terms
–Run names: pseudo?? (?? = number of expansion terms)
Okapi + relevance feedback using known relevant documents (upper-bound baseline)
–Examine 10 randomly selected relevant documents
–Expand the query with the top 5, 10, 20 or 50 terms
–Run names: rfrel?? (?? = number of expansion terms)
Okapi + query expansion with different combinations of responses to Q2, Q3 and Q4
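
A minimal sketch of the term-selection step shared by the two feedback baselines. The slides do not say how candidate terms are ranked, so ranking by the number of feedback documents that contain a term is an assumption made here for illustration, as are the function name and the toy documents.

```python
from collections import Counter


def select_expansion_terms(query_terms, feedback_docs, k):
    """Pick the k candidate terms found in the most feedback documents.

    feedback_docs: list of token lists, e.g. the top 10 retrieved documents
    (pseudo feedback) or 10 known relevant documents (upper bound).
    Ranking by feedback-document count is an assumed criterion.
    """
    query_set = set(query_terms)
    doc_counts = Counter()
    for doc_terms in feedback_docs:
        # Count each term once per document and skip terms already in the query.
        doc_counts.update(set(doc_terms) - query_set)
    # Sort by descending count, breaking ties alphabetically for determinism.
    ranked = sorted(doc_counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical feedback documents, already tokenized.
docs = [
    ["oil", "price", "opec"],
    ["price", "opec", "market"],
    ["price", "barrel"],
]
print(select_expansion_terms(["oil"], docs, k=2))
```

Only the source of `feedback_docs` differs between the pseudo-feedback baseline and the upper-bound baseline.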

Experiment Results
Not on the graph:
rfrel05: 0.4367
rfrel10: 0.5284
rfrel20: 0.5743
rfrel50: 0.6129

Revelation: Longer Query => Better?
Q2 responses are longer than Q3 responses, which are longer than Q4 responses.
Expansion with Q2 gives higher MAP than expansion with Q3, which gives higher MAP than expansion with Q4.

Secret Graphical Evidence

Conclusion
–Interactive relevance feedback is better than pseudo relevance feedback.
–Long queries are better than short queries.
