1 ACM CIKM 2008, Oct. 26-30, Napa Valley
Mining Term Association Patterns from Search Logs for Effective Query Reformulation
Xuanhui Wang and ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign

2 Ineffective Queries
Example: reduce space command latex

3 Effective Queries
Example: squeeze space command latex

4 More Examples
If you want to wash your vehicle:
– vehicle wash, auto wash
– car wash, truck wash
If you want to buy a car:
– auto quotes
– auto sale quotes?
– auto insurance quotes?

5 What Makes a Query Ineffective?
Vocabulary mismatch
– reduce space command latex vs. squeeze space command latex
– auto wash vs. car wash
Lack of discrimination
– auto quotes vs. auto sale quotes
…
How can we help improve ineffective queries?
– Term substitution
– Term addition

6 Our Contribution
We cast query reformulation as term-level pattern mining from search logs.
We define two basic types of patterns at the term level and propose probabilistic methods for mining them:
– Context-sensitive term substitution: auto → car | _wash, car → auto | _trade
– Context-sensitive term addition: +sale | auto_quotes
We evaluate our methods on commercial search engine logs and show their effectiveness.

7 Problem Formulation
[Diagram] Search logs → query collection → Task 1: Contextual Models → Task 2: Translation Models → Task 3: Pattern Mining → patterns. The pipeline is divided into an offline part and an online part.
Example: for the input query q = auto wash, the mined patterns
– auto → car | _wash
– auto → truck | _wash
– +southland | _auto wash
– …
yield the reformulations car wash, truck wash, southland auto wash, …

8 Task 1: Contextual Models
Syntagmatic relations: capture terms that frequently co-occur with w inside queries.
G: general context
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
Model P_G(* | car): rental: 0.375, enterprise: 0.125, budget: 0.125, pricing: 0.125, …
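The numbers on this slide can be reproduced with a plain relative-frequency estimate. The following is a minimal sketch, assuming the sample collection splits into the six queries listed in the code and that no smoothing is applied; the function name `general_context_model` is hypothetical and not taken from the talk.

```python
from collections import Counter

# Sample queries as read off the slide; the split into queries is an assumption.
QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]

def general_context_model(queries, w):
    """Unsmoothed estimate of P_G(. | w): relative frequency of the
    terms that co-occur with w anywhere in the same query."""
    counts = Counter()
    for q in queries:
        terms = q.split()
        if w in terms:
            counts.update(t for t in terms if t != w)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

p_g = general_context_model(QUERIES, "car")
print(p_g["rental"], p_g["enterprise"])  # 0.375 0.125
```

With 8 co-occurring term tokens in total, rental (3 occurrences) gets 3/8 = 0.375 and every other term gets 1/8 = 0.125, matching the slide.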

9 Task 1: Contextual Models
Syntagmatic relations: capture terms that frequently co-occur with w inside queries.
L1: 1st left context
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
Model P_L1(* | car): rental: 0.333, enterprise: 0.333, budget: 0.333, …

10 Task 1: Contextual Models
Syntagmatic relations: capture terms that frequently co-occur with w inside queries.
R1: 1st right context
Sample query collection: enterprise car rental; rental car; budget car rental; car pricing; car pictures; car accidents; …
Model P_R1(* | w), here with w = car: rental: 0.4, pricing: 0.2, pictures: 0.2, accidents: 0.2, …
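The position-sensitive models L1 and R1 differ from the general context only in which neighbor is counted. A minimal sketch, again assuming the six-query split of the sample collection and an unsmoothed estimate; `positional_context_model` and its `offset` parameter are hypothetical names introduced for illustration.

```python
from collections import Counter

# Sample queries as read off the slides; the split into queries is an assumption.
QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]

def positional_context_model(queries, w, offset):
    """Unsmoothed P(. | w) over the term found `offset` positions away
    from each occurrence of w (offset=-1 is L1, offset=+1 is R1)."""
    counts = Counter()
    for q in queries:
        terms = q.split()
        for i, t in enumerate(terms):
            j = i + offset
            if t == w and 0 <= j < len(terms):
                counts[terms[j]] += 1
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}

p_l1 = positional_context_model(QUERIES, "car", -1)  # 1st left context
p_r1 = positional_context_model(QUERIES, "car", +1)  # 1st right context
print(p_l1)
print(p_r1)
```

Three terms appear immediately to the left of car (enterprise, rental, budget), so each gets 1/3; five term tokens appear immediately to the right, two of them rental, which gives 0.4 for rental and 0.2 for the rest.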

11 Task 2: Translation Models
Paradigmatic relations (e.g., car and auto): capture terms that are substitutable for w.
Similar contexts → high translation probability.
Translation model: the probability of generating s's context from w's contextual model.
[Formula not preserved; its annotated terms are the size of the L1 context and the size of the R1 context.]
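Since the formula itself did not survive in this transcript, the following is only a rough sketch of the idea "similar contexts give a high translation probability", not the authors' exact model. It assumes additive smoothing, a length-normalized likelihood of s's context terms under w's context model, and a combination of the L1 and R1 scores weighted by the sizes of w's L1 and R1 contexts. All function names, the smoothing constant, and the toy counts are made up for illustration.

```python
import math

def smoothed_prob(counts, term, vocab_size, alpha=0.1):
    """Additive smoothing so unseen context terms keep a non-zero probability."""
    total = sum(counts.values())
    return (counts.get(term, 0) + alpha) / (total + alpha * vocab_size)

def context_similarity(ctx_s, ctx_w, vocab_size):
    """exp of the average log-probability of s's context terms
    under w's smoothed context model (assumed scoring function)."""
    n = sum(ctx_s.values())
    if n == 0:
        return 0.0
    log_lik = sum(c * math.log(smoothed_prob(ctx_w, t, vocab_size))
                  for t, c in ctx_s.items())
    return math.exp(log_lik / n)

def translation_probs(w, candidates, left, right, vocab_size):
    """left[t] / right[t] hold L1 / R1 context counts for term t.
    Returns scores normalized over the candidate set."""
    n_left, n_right = sum(left[w].values()), sum(right[w].values())
    if n_left + n_right == 0:
        return {}
    scores = {}
    for s in candidates:
        sim_l = context_similarity(left.get(s, {}), left[w], vocab_size)
        sim_r = context_similarity(right.get(s, {}), right[w], vocab_size)
        # Weight each side by the size of w's context on that side.
        scores[s] = (n_left * sim_l + n_right * sim_r) / (n_left + n_right)
    z = sum(scores.values())
    return {s: v / z for s, v in scores.items()} if z else {}

# Toy context counts (made up for illustration).
left = {"car": {"rental": 1, "budget": 1, "enterprise": 1},
        "auto": {"budget": 1, "rental": 1},
        "idol": {"american": 2}}
right = {"car": {"rental": 2, "pricing": 1, "wash": 1},
         "auto": {"rental": 1, "wash": 1},
         "idol": {"winner": 1}}
t = translation_probs("car", ["auto", "idol"], left, right, vocab_size=10)
print(t)
```

In this toy data auto shares most of its left and right neighbors with car, so it receives nearly all of the probability mass, while idol, whose contexts do not overlap with car's, receives very little.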

12 Task 3.1: Pattern Mining – Term Substitution
q = [w_1 … w_{i-1} w_i w_{i+1} … w_n] → q' = [w_1 … w_{i-1} s w_{i+1} … w_n]
Substitute w_i by s. Which word s should be chosen?
– Local factor
– Global factor: translation model

13 Estimating Local Factor
Context of the substituted position: w_1 … w_{i-1} _ w_{i+1} … w_n, with the candidate s filling the blank.
– Independence assumption
– Ignore terms that are far away
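One way to read the two factors together is as a product of a global translation score and a local context fit. The sketch below assumes the local factor uses only the immediately adjacent terms (the "ignore far-away terms" idea taken to a window of one) and treats the left and right neighbors as independent; the function name, the `eps` floor for unseen pairs, and all probabilities are hypothetical.

```python
def substitution_scores(query, i, candidates, translation, left_ctx, right_ctx, eps=1e-6):
    """Rank candidates s for replacing query[i] by
    t(s | w_i) * P_L1(w_{i-1} | s) * P_R1(w_{i+1} | s).
    A missing neighbor (query boundary) contributes a factor of 1."""
    w = query[i]
    ranked = []
    for s in candidates:
        score = translation.get(w, {}).get(s, eps)                 # global factor
        if i > 0:
            score *= left_ctx.get(s, {}).get(query[i - 1], eps)    # left neighbor
        if i + 1 < len(query):
            score *= right_ctx.get(s, {}).get(query[i + 1], eps)   # right neighbor
        ranked.append((s, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Toy probabilities (made up for illustration).
translation = {"auto": {"car": 0.6, "automobile": 0.3, "truck": 0.1}}
right_ctx = {"car": {"wash": 0.2, "rental": 0.5},
             "truck": {"wash": 0.3},
             "automobile": {"club": 0.5}}
left_ctx = {}

ranked = substitution_scores(["auto", "wash"], 0,
                             ["car", "truck", "automobile"],
                             translation, left_ctx, right_ctx)
print(ranked)
```

Here automobile has a higher translation score than truck but never precedes wash in the toy data, so the local factor pushes it below truck; this is the sense in which the substitution is context sensitive.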

14 Task 3.2: Pattern Mining – Term Addition
q = [w_1 … w_{i-1} w_i … w_n] → q' = [w_1 … w_{i-1} r w_i … w_n]
Adding r before w_i.
– Similar to the local factor in term substitution patterns
– Uniform
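Term addition can be sketched in the same style as the local factor for substitution. The example below assumes that "Uniform" means every candidate r gets the same global weight, so only the fit with the adjacent terms matters, and it tries every insertion position including the end of the query. The function name, the `eps` floor, and the probabilities are hypothetical.

```python
def addition_candidates(query, candidates, left_ctx, right_ctx, eps=1e-6):
    """Score inserting each candidate r at each position of the query by
    P_L1(left neighbor | r) * P_R1(right neighbor | r), with a uniform
    (constant) global factor. Returns (score, new_query) pairs, best first."""
    scored = []
    for pos in range(len(query) + 1):
        for r in candidates:
            score = 1.0
            if pos > 0:
                score *= left_ctx.get(r, {}).get(query[pos - 1], eps)
            if pos < len(query):
                score *= right_ctx.get(r, {}).get(query[pos], eps)
            scored.append((score, query[:pos] + [r] + query[pos:]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Toy context probabilities (made up for illustration).
left_ctx = {"sale": {"auto": 0.4}, "insurance": {"auto": 0.5, "car": 0.3}}
right_ctx = {"sale": {"quotes": 0.3}, "insurance": {"quotes": 0.5}}

scored = addition_candidates(["auto", "quotes"], ["sale", "insurance"],
                             left_ctx, right_ctx)
best = " ".join(scored[0][1])
print(best)  # auto insurance quotes
```

With these toy numbers the two strongest refinements of auto quotes are auto insurance quotes (0.25) and auto sale quotes (0.12), which mirrors the ambiguity raised on slide 4.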

15 Evaluation: Data Preparation
Data from Microsoft Live Labs.
Timeline: 5/1/2006 to 5/31/2006, divided at 5/20/2006 into history logs and future logs.
History collection: 4.4M queries, of which 1.6M are distinct
1.3M user sessions
Used to construct test cases

16 Examples of Contextual Models
Left and right contexts are different.
The general context mixes them together.

17 Examples of Translation Models
Conceptually similar keywords have high translation probabilities.
They make interactive exploratory search possible.

18 Examples of Term Substitution
Substitution is context sensitive.
Intuitively, the reworded queries are more effective.

19 Effectiveness Comparison of Term Substitution – Experiment Design
[Diagram] A session contains the queries Q_1, Q_2, …, Q_k; each query Q_j returns results R_j1, R_j2, R_j3, …; the documents clicked in the session are C_1, C_2, C_3.
How well can a reformulated query rank C_1, C_2, and C_3 at the top?
Reformulations of Q_1 and their top results (dx denotes a document other than C_1, C_2, C_3):
Q'_1: dx C_3 C_1 C_2 dx …    P@5 = 0.6
Q'_2: dx C_1 dx dx dx …    P@5 = 0.2
Q'_3: dx C_2 dx C_3 dx …    P@5 = 0.4
Best P@5 = 0.6
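The P@5 values in this example follow directly from counting clicked documents in the top five results. A minimal sketch, assuming the three result lists shown on the slide; the document identifiers d1 to d4 stand in for the unnamed dx entries, and the function name is hypothetical.

```python
def precision_at_k(ranked_docs, clicked, k=5):
    """Fraction of the top-k results that are clicked documents from the session."""
    return sum(1 for d in ranked_docs[:k] if d in clicked) / k

clicked = {"C1", "C2", "C3"}

# Top-5 results of three reformulations of Q_1, as laid out on the slide.
results = {
    "Q'_1": ["d1", "C3", "C1", "C2", "d2"],
    "Q'_2": ["d1", "C1", "d2", "d3", "d4"],
    "Q'_3": ["d1", "C2", "d2", "C3", "d3"],
}

scores = {q: precision_at_k(docs, clicked) for q, docs in results.items()}
best = max(scores.values())
print(scores, best)
```

The three reformulations score 3/5, 1/5, and 2/5, so the best P@5 over the recommended set is 0.6.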

20 Results
Our method reformulates queries more effectively.
[Figure: [Jones06] compared with our method; x-axis: number of recommended queries]

21 Term Addition Patterns
Term addition patterns can refine a broad query.

22 Related Work
Query suggestions [e.g., Jones06, Sahami et al06]
– Discover patterns at the query level
– Rely on external resources or training data
– Do not consider effectiveness
Query modifications in IR [Rocchio71, Anick03]
– Expand queries from returned documents
– Do not rely on search logs; mostly add terms
Related work in the NLP community [Lin98, Rapp02]
– Finding synonyms or near-synonyms
– Syntagmatic and paradigmatic relations
– Not used for query reformulation

23 Conclusions and Future Work
We propose a new way to mine search logs for patterns to address ineffective queries:
– Vocabulary mismatch
– Lack of discrimination
We define and mine two basic patterns at the term level:
– Context-sensitive term substitution patterns
– Context-sensitive term addition patterns
Experiments show the effectiveness of our methods.
In the future:
– Use relevance judgments instead of clicks
– Exploit click information for better query reformulation

24 Thank You!

25 Offline Efficiency
Linear scalability with data size
[Figure; axis annotation: more data]

26 Enhancement by User Sessions
Improve translation models by user sessions
– t(express|idol) is very high
– american express and american idol are frequent
Method: Normalized Mutual Information, top-N thresholding
Example: w = idol, t(idols|idol) = 1

27 Formal Definitions
A query is a sequence of keywords
– q = [w_1 w_2 … w_n]
Context-sensitive term substitution
– [w → w' | c_L _ c_R]
Context-sensitive term addition
– [+w | c_L _ c_R]
Query rewording: replace a word w_i by s
– q = [w_1 … w_{i-1} w_i w_{i+1} … w_n] → q' = [w_1 … w_{i-1} s w_{i+1} … w_n]
Query refinement: add a new word r
– q = [w_1 … w_i w_{i+1} … w_n] → q' = [w_1 … w_i r w_{i+1} … w_n]
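The two pattern types map naturally onto small record types plus a matching routine. The sketch below is one possible representation, assuming single-term left and right contexts and treating an empty context slot as "no constraint"; the class and function names are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class SubstitutionPattern:
    """[w -> s | c_L _ c_R]; a context of None means no constraint (assumption)."""
    w: str
    s: str
    left: Optional[str] = None
    right: Optional[str] = None

@dataclass(frozen=True)
class AdditionPattern:
    """[+w | c_L _ c_R]: insert w between c_L and c_R."""
    w: str
    left: Optional[str] = None
    right: Optional[str] = None

def apply_substitution(query: List[str], p: SubstitutionPattern) -> List[List[str]]:
    """Return every rewording of the query that the pattern licenses."""
    out = []
    for i, term in enumerate(query):
        if term != p.w:
            continue
        left_ok = p.left is None or (i > 0 and query[i - 1] == p.left)
        right_ok = p.right is None or (i + 1 < len(query) and query[i + 1] == p.right)
        if left_ok and right_ok:
            out.append(query[:i] + [p.s] + query[i + 1:])
    return out

def apply_addition(query: List[str], p: AdditionPattern) -> List[List[str]]:
    """Return every refinement of the query that the pattern licenses."""
    out = []
    for pos in range(len(query) + 1):
        left_ok = p.left is None or (pos > 0 and query[pos - 1] == p.left)
        right_ok = p.right is None or (pos < len(query) and query[pos] == p.right)
        if left_ok and right_ok:
            out.append(query[:pos] + [p.w] + query[pos:])
    return out

reworded = apply_substitution(["auto", "wash"],
                              SubstitutionPattern("auto", "car", right="wash"))
refined = apply_addition(["auto", "quotes"],
                         AdditionPattern("sale", left="auto", right="quotes"))
print(reworded, refined)
```

The pattern auto → car | _wash rewrites auto wash to car wash but leaves auto trade untouched, and +sale | auto_quotes turns auto quotes into auto sale quotes.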

