
1 Probabilistic Query Expansion Using Query Logs Hang Cui Tianjin University, China Ji-Rong Wen Microsoft Research Asia, China Jian-Yun Nie University of Montreal Wei-Ying Ma Microsoft Research Asia, China

2 Outline Motivations Central ideas Establishing correlations between query terms and document terms Query expansion based on term correlations Evaluations Conclusions

3 Motivations More severe challenges on web searching: very short queries (fewer than two words) and inconsistency of term usage on the two sides. The Web is not well-organized. Users express queries with their own vocabulary. Most search engines are keyword based. Previous query expansion techniques focus on one side only – documents. Our solution – concentrate on both sides.

4 Big gap between the query space and the document space Query space and document space. For each document, measure the cosine of the angle between its representations in the two spaces. Big gap: 73.68 degrees on average (cos A = 0.28).
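The measurement on this slide can be illustrated with a minimal sketch. It assumes each document is represented by two sparse term-weight vectors, one built from document terms and one built from the terms of queries that led to it; the vectors, weights, and function names below are hypothetical and are not taken from the slides.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an empty vector
    return dot / (norm_u * norm_v)


def angle_degrees(u, v):
    """Angle between the two vectors, in degrees."""
    c = max(-1.0, min(1.0, cosine(u, v)))  # guard against rounding drift
    return math.degrees(math.acos(c))


# Hypothetical toy vectors for one document (illustrative weights only).
doc_vector = {"windows": 0.9, "os": 0.6, "microsoft": 0.4}
query_vector = {"microsoft": 1.0, "bill": 0.5, "gates": 0.5}

print(cosine(doc_vector, query_vector))
print(angle_degrees(doc_vector, query_vector))

# A cosine of 0.28 corresponds to an angle of roughly 73.7 degrees.
print(math.degrees(math.acos(0.28)))
```

A small cosine means the two vectors share little weighted vocabulary, which is the gap the slide describes.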

5 Outline Motivations Central ideas Establishing correlations between query terms and document terms Query expansion based on term correlations Evaluations Conclusions

6 Principle of exploiting query logs Query logs: a means to explore the query side. session := <query text> [clicked document] Central idea: log-based query expansion, using probabilistic correlations between query terms and the index terms of the documents clicked for the respective queries.

7 Assumption The clicked documents are relevant to the given query. Reasonable because: Users do not click documents randomly. Stable from a statistical view Our previous work on query clustering proved it.

8 Compared with Local Feedback and Relevance Feedback

9 Characteristics of the log-based query expansion Local technique in general. Feasibility in computation. No initial retrieval. Reflects most users’ intentions. An example. Evolves with the accumulation of user usage.

10 Outline Motivations Central ideas Establishing term correlations Query expansion based on term correlations Evaluations Conclusions

11 Query sessions as a bridge [Figure: query sessions (*Query1, *Query2, *Query3) link query terms to clicked documents (#Doc1 to #Doc4), bridging the query space and the document space. Terms shown: Netscape, Bill Gates, Java, Microsoft, Programming, Windows, OS.]

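12 Correlations between query terms and document terms [Figure: weighted links between terms in the query space and terms in the document space. Terms shown: Bill Gates, Java, Windows, Netscape, Microsoft, Programming, OS. Link weights shown: 0.83, 0.89, 0.24, 0.17, 0.67, 0.04.]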

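13 Term-Term Probabilistic correlations Term-term correlations are represented as the conditional probability of an index term given a query term. [Figure: a query term linked to index terms through the clicked documents (#Doc1, #Doc2) of a query session (*Query).]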

14 Term-Term probabilistic correlations (Cont) Estimate of the two conditional probabilities.
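The formulas on these two slides did not survive in the transcript, so the following is only a sketch under stated assumptions. It assumes the correlation of a document term with a query term is obtained by summing, over clicked documents, P(document term | document) times P(document | query term). It further assumes the first factor is the term's weight divided by the largest term weight in that document, and the second is the fraction of sessions containing the query term in which that document was clicked. The session layout and all names are hypothetical.

```python
from collections import defaultdict


def term_correlations(sessions, doc_weights):
    """Estimate a correlation score for (query term, document term) pairs.

    sessions:    list of (query_terms, clicked_doc_ids) pairs (assumed layout)
    doc_weights: doc_id -> {term: weight}, e.g. TF-IDF weights (assumed)
    Returns:     query_term -> {doc_term: score}
    """
    sessions_with_term = defaultdict(int)             # query term -> session count
    clicks = defaultdict(lambda: defaultdict(int))    # query term -> doc -> sessions

    for query_terms, clicked_docs in sessions:
        for q in set(query_terms):
            sessions_with_term[q] += 1
            for doc_id in set(clicked_docs):
                clicks[q][doc_id] += 1

    corr = defaultdict(dict)
    for q, docs in clicks.items():
        for doc_id, n in docs.items():
            # Assumed estimate of P(document | query term).
            p_doc_given_q = n / sessions_with_term[q]
            weights = doc_weights.get(doc_id, {})
            max_w = max(weights.values(), default=0.0)
            if max_w <= 0.0:
                continue
            for term, w in weights.items():
                # Assumed estimate of P(document term | document).
                p_term_given_doc = w / max_w
                corr[q][term] = corr[q].get(term, 0.0) + p_term_given_doc * p_doc_given_q
    return corr


# Hypothetical toy log: two sessions for the query "java".
toy_sessions = [(["java"], ["d1"]), (["java"], ["d2"])]
toy_weights = {"d1": {"programming": 2.0, "sun": 1.0}, "d2": {"coffee": 1.0}}
print(dict(term_correlations(toy_sessions, toy_weights)))
```

When sessions contain several clicks, the per-document fractions for one query term can sum to more than one, so the result is better read as a ranking score than as a strictly normalized probability.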

15 Outline Motivations Central ideas Establishing term correlations Query expansion based on term correlations Evaluations Conclusions

16 Query expansion based on term correlations For a whole query, we have to select candidate expansion terms. Top ranked document terms are added into the original query to formulate a new one.
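The selection step can be sketched as follows. The slide's scoring formula is not in the transcript, so this sketch assumes one plausible way to combine per-term correlations for a whole query: the logarithm of the product of (correlation + 1) over the query terms. The function name, the parameter `k`, and the toy correlation table are hypothetical.

```python
import math


def expand_query(query_terms, corr, k=5):
    """Append the k highest-scoring candidate document terms to the query.

    corr: query_term -> {doc_term: correlation}, e.g. estimated from query logs.
    """
    candidates = set()
    for q in query_terms:
        candidates.update(corr.get(q, {}))
    candidates -= set(query_terms)  # do not re-add original query terms

    scores = {}
    for term in candidates:
        # Assumed combination: ln of the product of (correlation + 1),
        # computed as a sum of logs. A missing correlation contributes ln(1) = 0.
        scores[term] = sum(
            math.log(corr.get(q, {}).get(term, 0.0) + 1.0) for q in query_terms
        )

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Hypothetical correlation table (illustrative values only).
toy_corr = {
    "bill": {"microsoft": 0.8, "windows": 0.2},
    "gates": {"microsoft": 0.9, "fence": 0.3},
}
print(expand_query(["bill", "gates"], toy_corr, k=2))
```

Adding 1 before taking the logarithm keeps a term that correlates with only some of the query terms from being scored as minus infinity, while still favoring terms that correlate with all of them.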

17 Outline Motivations Central ideas Establishing term correlations Query expansion based on term correlations Evaluations Conclusions

18 Data and methodology Data: two-month query logs (Oct 2000-Dec 2000), 41,942 documents, 30 evaluation queries (mostly short queries). Document relevance judged by human assessors. Comparing our method with the baseline and with Local Context Analysis (LCA).

19 Experiment I---Retrieval effectiveness Average Improvement 75.42% over Baseline 38.95% over LCA Significant improvement from a statistical view

20 Experiment II---Quality of expansion terms Examining 50 expansion terms obtained by the log-based method and LCA.
Relevant terms (%): LC Analysis (base) 23.27; Log Based 30.73; Improvement +32.03%.
Example – “Steve Jobs”: “Apple Computer”, “CEO”, “Macintosh”, “Microsoft”, “GUI”, “Personal Computers”

21 Experiment III---Impact of phrases For TREC queries, phrases may not be as effective as expected. This is not the case in the short-query context. An example. Phrases are extracted from user logs. Experiments show an 11.37% improvement on average when using phrases.

22 Experiment IV---Impact of number of expansion terms The more expansion terms, the better? The best performance can be achieved by adding 40 to 60 expansion terms.

23 Summary for evaluation The log-based query expansion produces significant improvements over the baseline and LCA in terms of precision and recall. Query expansion is of great importance for short queries on the Web. Phrases can improve the performance of search engines.

24 Outline Motivations Central ideas Establishing term correlations Query expansion based on term correlations Evaluations Conclusions

25 Conclusions We show how big the gap is between the query space and the document space. We propose a new log-based query expansion method that considers both sides of the problem. Experimental results show our solution is effective for short queries in Web searching. User log mining is a promising direction for future research.

26 Thanks !

