
1 A General Optimization Framework for Smoothing Language Models on Graph Structures Qiaozhu Mei, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign

2 Kullback-Leibler Divergence Retrieval Method

Document d (a text mining paper) -> Doc Language Model θ_d, p(w|θ_d):
  text       4/100 = 0.04
  mining     3/100 = 0.03
  clustering 1/100 = 0.01
  ...
  data      = 0
  computing = 0

Query q = "data mining" -> Query Language Model θ_q, p(w|θ_q):
  data   1/2 = 0.5
  mining 1/2 = 0.5

Smoothed Query LM θ_q', p(w|θ_q'):
  data       = 0.4
  mining     = 0.4
  clustering = 0.1
  ...

Smoothed Doc LM θ_d', p(w|θ_d'):
  text       = 0.039
  mining     = 0.028
  clustering = 0.01
  ...
  data      = 0.001
  computing = 0.0005

The query LM and the smoothed doc LM are compared with a similarity function.
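The slide's scoring step can be sketched in a few lines. This is a minimal, rank-equivalent form of KL-divergence retrieval (the cross-entropy part of -KL(θ_q || θ_d)), using the toy probabilities from the slide; it also shows why smoothing is needed: with the unsmoothed doc LM, p(data|θ_d) = 0 and the log is undefined.

```python
import math

def kl_score(query_lm, doc_lm):
    """Rank-equivalent KL-divergence retrieval score:
    sum_w p(w|theta_q) * log p(w|theta_d).
    Requires the doc LM to give non-zero probability to every
    query word -- exactly what smoothing guarantees."""
    return sum(p_q * math.log(doc_lm[w])
               for w, p_q in query_lm.items() if p_q > 0)

# Toy numbers from the slide: the smoothed doc LM theta_d' now
# covers the previously unseen query word "data".
query_lm = {"data": 0.5, "mining": 0.5}
smoothed_doc_lm = {"text": 0.039, "mining": 0.028, "clustering": 0.01,
                   "data": 0.001, "computing": 0.0005}
score = kl_score(query_lm, smoothed_doc_lm)
```

Documents are then ranked by this score (higher is better; all scores are negative log-probabilities).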

3 Smoothing a Document Language Model

Retrieval performance <- estimated LM <- smoothed LM

MLE from sparse data:
  text       4/100 = 0.04
  mining     3/100 = 0.03
  Assoc.     1/100 = 0.01
  clustering 1/100 = 0.01
  ...
  data      = 0
  computing = 0

Goals of smoothing:
- Assign non-zero probability to unseen words, e.g.
  text = 0.039, mining = 0.028, Assoc. = 0.009, clustering = 0.01, ..., data = 0.001, computing = 0.0005
- Estimate a more accurate distribution from sparse data, e.g.
  text = 0.038, mining = 0.026, Assoc. = 0.008, clustering = 0.01, ..., data = 0.002, computing = 0.001

4 Previous Work on Smoothing

Estimate a reference language model θ_ref and interpolate the MLE with it. The reference LM has been built from:
- the whole collection (corpus) [Ponte & Croft 98]
- document clusters [Liu & Croft 04]
- nearest-neighbor documents [Kurland & Lee 04]
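The common core of these methods, interpolating the document MLE with a reference LM, can be sketched as follows (a minimal Jelinek-Mercer-style mixture; the toy vocabulary is made up):

```python
def interpolate(mle, ref_lm, lam=0.5):
    """Mix a document's MLE with a reference LM theta_ref so that
    words unseen in the document get non-zero probability."""
    vocab = set(mle) | set(ref_lm)
    return {w: (1 - lam) * mle.get(w, 0.0) + lam * ref_lm.get(w, 0.0)
            for w in vocab}

# "data" is unseen in the document but present in the reference
# model, so it receives probability lam * p_ref("data").
doc_mle = {"text": 0.6, "mining": 0.4}
ref = {"text": 0.3, "mining": 0.2, "data": 0.5}
smoothed = interpolate(doc_mle, ref, lam=0.2)
```

The methods above differ only in how θ_ref is built (collection, cluster, or neighborhood) and how the interpolation weight is set.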

5 Problems of Existing Methods

- Smoothing with a global background: ignores collection structure
- Smoothing with document clusters: ignores local structure inside each cluster
- Smoothing using neighbor documents: ignores global structure
- Different heuristics for θ_ref and interpolation:
  - no clear objective function to optimize
  - no guidance on how to further improve the existing methods

6 Research Questions

- What is the right corpus structure to use?
- What are the criteria for a good smoothing method? An accurate language model? What do we end up optimizing?
- Could there be a general optimization framework?

7 Our Contribution

- Formulation of smoothing as optimization over graph structures
- A general optimization framework for smoothing both document LMs and query LMs
- Novel instantiations of the framework that lead to more effective smoothing methods

8 A Graph-based Formulation of Smoothing

A novel and general view of smoothing:
- Collection = a graph of documents (vertices d_1, d_2, ...)
- {p(w|d)} = a surface on top of the graph: the MLE p(w|d) is the height of the surface above each document vertex (its projection on a plane)
- Smoothed LM = smoothed surface

9 Covering Existing Models

Collection = graph; smoothed LM = smoothed surface:
- Smoothing with a global background: a star graph centered on the background node
- Smoothing with document clusters: a forest with pseudo-documents as cluster roots (C_1 ... C_4)
- Smoothing with nearest neighbors: a local graph around each document d

10 Instantiations of the Formulation

Language models to be smoothed, by graph type:

  Graph type                    | Document LM                                   | Query LM
  Star graph w/ background node | [Ponte & Croft 98], [Hiemstra & Kraaij 98],   | N/A
                                | [Miller et al. 99], [Zhai & Lafferty 01], ... |
  Forest w/ cluster roots       | [Liu & Croft 04]                              | N/A
  Local kNN graph               | [Kurland & Lee 04], [Tao et al. 06]           | N/A
  Document similarity graph     | Novel                                         | N/A
  Word similarity graph         | Novel                                         | Novel
  Other graphs                  | ???                                           | ???

The first four rows are document graphs; only the word similarity graph supports smoothing query LMs as well.

11 Smoothing over Word Graphs

On a similarity graph of words: given d, {p(w|d)} = a surface over the word graph, with height p(w_u|d) at word vertex u (degree-normalized as p(w_u|d)/Deg(u)).
Smoothed LM = smoothed surface.

12 The General Objective of Smoothing

The objective trades off:
- fidelity to the MLE, and
- smoothness of the surface,
weighted by the importance of vertices w(u) and the weights of edges w(u,v) (1/distance).

13 The Optimization Framework

Criteria:
- Fidelity: keep close to the MLE
- Surface smoothness: local and global consistency
- Constraint: the smoothed values remain a valid probability distribution

Unified optimization objective: a weighted combination of a fidelity-to-MLE term and a surface-smoothness term.
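The objective function on this slide was an image and did not survive the transcript. A plausible reconstruction, following the quadratic graph-regularization form of (Zhou et al. 04) cited in the related work, with the trade-off parameter λ, vertex weights w(u), edge weights w(u,v), surface values f_u, MLE values f̂_u, and the Deg(u) normalization named on the slides (the exact normalization is an assumption):

```latex
O(\{f_u\}) \;=\; (1-\lambda)\sum_{u} w(u)\,\bigl(f_u - \hat{f}_u\bigr)^2
\;+\; \lambda \sum_{(u,v)\in E} w(u,v)\,\biggl(\frac{f_u}{\mathrm{Deg}(u)} - \frac{f_v}{\mathrm{Deg}(v)}\biggr)^2
```

The first sum is the fidelity term; the second penalizes surfaces that vary sharply across strongly weighted edges.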

14 The Procedure of Smoothing

1. Define the graph: construct a document/word graph; define reasonable w(u) and w(u,v).
2. Define the surfaces: define a reasonable f_u.
3. Smooth the surfaces: iterative updating, followed by additional Dirichlet smoothing.
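The iterative-updating step can be sketched as a label-propagation-style fixed-point iteration: each round mixes the original surface values (fidelity) with the weighted neighborhood average (smoothness). This is a generic sketch of that style of update, not the paper's exact update rule:

```python
import numpy as np

def smooth_surface(f_hat, W, alpha=0.5, iters=100):
    """Iteratively smooth surface values f_hat over a weighted graph W.
    Each step mixes fidelity to the original values with the
    neighborhood average given by the row-normalized graph."""
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    S = W / deg                      # row-stochastic neighbor averaging
    f = f_hat.copy()
    for _ in range(iters):
        f = (1 - alpha) * f_hat + alpha * (S @ f)
    return f

# A 3-node chain 0-1-2: mass at node 0 propagates to its neighbors,
# decaying with graph distance.
W = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
f = smooth_surface(np.array([1., 0., 0.]), W)
```

Because alpha < 1, the iteration is a contraction and converges quickly, consistent with the fast convergence reported on the parameter-tuning slide.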

15 Smoothing Language Models using a Document Graph

- Graph: construct a kNN graph of documents; w(u) = Deg(u), w(u,v) = cosine similarity.
- Surface: f_u = p(w|d_u), the document language model; alternatively f_u = s(q, d_u), the document relevance score, e.g. (Diaz 05).
- Smooth the surface iteratively, then apply additional Dirichlet smoothing.
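The graph-construction step above can be sketched directly: a kNN graph over document vectors with cosine-similarity edge weights (the vectorization itself, e.g. TF-IDF, is assumed to have happened already):

```python
import numpy as np

def knn_graph(vectors, k=2):
    """Build a kNN graph with cosine-similarity edge weights:
    each document links to its k most similar documents."""
    X = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    sim = (X / norms) @ (X / norms).T
    np.fill_diagonal(sim, -np.inf)   # exclude self-loops
    W = np.zeros_like(sim)
    for u in range(len(X)):
        for v in np.argsort(sim[u])[-k:]:   # k most similar neighbors
            W[u, v] = sim[u, v]
    return W
```

Note the resulting W is generally asymmetric (u may be a nearest neighbor of v without the converse); whether to symmetrize it is a design choice not specified here.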

16 Smoothing Language Models using a Word Graph

- Graph: construct a kNN graph of words; w(u) = Deg(u), w(u,v) = pointwise mutual information (PMI).
- Surface: f_u = the document language model p(w_u|d), or the query language model.
- Smooth the surface iteratively, then apply additional Dirichlet smoothing.
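The PMI edge weights can be sketched as below. Co-occurrence is counted at the document level here for simplicity; the actual co-occurrence window used in the talk is not specified in the transcript:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_weights(docs):
    """Pointwise mutual information between word pairs:
    PMI(u, v) = log( p(u, v) / (p(u) * p(v)) ),
    with probabilities estimated from document-level co-occurrence."""
    n = len(docs)
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        words = set(doc)
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    pmi = {}
    for pair, c in pair_df.items():
        u, v = tuple(pair)
        pmi[pair] = math.log((c / n) / ((word_df[u] / n) * (word_df[v] / n)))
    return pmi
```

In practice only positive-PMI pairs (words co-occurring more than chance) are worth keeping as edges.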

17 Intuitive Interpretation - Smoothing using a Word Graph

The smoothed p(w|d) is the stationary distribution of a Markov chain over words: writing a document = a random walk on the word Markov chain, writing down w whenever the walk passes w.
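This random-walk reading can be checked numerically: for a walk whose transition probabilities are proportional to edge weights on an undirected graph, the stationary distribution is proportional to vertex degree. A small sketch with a made-up 3-word graph:

```python
import numpy as np

def stationary_distribution(W, iters=200):
    """Stationary distribution of the random walk on a weighted
    graph: transition probabilities proportional to edge weights,
    computed by power iteration."""
    P = W / W.sum(axis=1, keepdims=True)
    pi = np.full(len(W), 1.0 / len(W))
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

# Triangle graph with edge weights 2, 1, 1: stationary probability
# of each word is its weighted degree over the total degree.
W = np.array([[0., 2., 1.], [2., 0., 1.], [1., 1., 0.]])
pi = stationary_distribution(W)
```

Words sitting in densely connected neighborhoods of the word graph thus receive more probability mass, which is the smoothing effect.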

18 Intuitive Interpretation - Smoothing using a Document Graph

The smoothed value is an absorption probability into the "1" state: writing a word w in a document = a random walk on the document Markov chain, writing down w if the walk reaches "1". A document acts as its neighbors do.
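The absorption probability can likewise be computed explicitly. The doc-graph construction itself is not reproduced here; this is the standard (I - Q)^{-1} R computation on a hypothetical 4-state chain whose two end states are absorbing:

```python
import numpy as np

def absorb_prob(P, absorbing, target):
    """For a row-stochastic transition matrix P, return for every
    state the probability that the walk is eventually absorbed
    at `target`, solving (I - Q) h = r over the transient block."""
    n = len(P)
    transient = [s for s in range(n) if s not in absorbing]
    Q = P[np.ix_(transient, transient)]          # transient -> transient
    r = P[np.ix_(transient, [target])].ravel()   # transient -> target
    h_transient = np.linalg.solve(np.eye(len(transient)) - Q, r)
    h = np.zeros(n)
    h[target] = 1.0
    for i, s in enumerate(transient):
        h[s] = h_transient[i]
    return h

# Gambler's-ruin chain 0-1-2-3 with absorbing ends 0 and 3:
# states closer to "3" are more likely to be absorbed there.
P = np.array([[1., 0., 0., 0.],
              [0.5, 0., 0.5, 0.],
              [0., 0.5, 0., 0.5],
              [0., 0., 0., 1.]])
h = absorb_prob(P, {0, 3}, target=3)
```

In the slide's picture, "1" plays the role of the target absorbing state, so a document surrounded by neighbors likely to emit w inherits a high probability of emitting w itself.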

19 Experiments

Data sets:

  Data set | # docs | Avg doc length | Queries | # relevant docs
  AP88-90  | 243k   | 273            | 51-150  | 21829
  LA       | 132k   | 290            | 301-400 | 2350
  SJMN     | 90k    | 266            | 51-150  | 4881
  TREC8    | 528k   | 477            | 401-450 | 4728

Methods compared (evaluated using MAP):
- DMDG: smooth document LM on document graph
- DMWG: smooth document LM on word graph
- DSDG: smooth relevance score on document graph
- QMWG: smooth query LM on word graph
- Baselines: (Liu & Croft 04), (Tao et al. 06)

20 Effectiveness of the Framework

  Data set | Dirichlet | DMDG              | DMWG†             | DSDG              | QMWG
  AP88-90  | 0.217     | 0.254*** (+17.1%) | 0.252*** (+16.1%) | 0.239*** (+10.1%) | 0.239 (+10.1%)
  LA       | 0.247     | 0.258** (+4.5%)   | 0.257** (+4.5%)   | 0.251** (+1.6%)   | 0.247
  SJMN     | 0.204     | 0.231*** (+13.2%) | 0.229*** (+12.3%) | 0.225*** (+10.3%) | 0.219 (+7.4%)
  TREC8    | 0.257     | 0.271*** (+5.4%)  | 0.271** (+5.4%)   | 0.261 (+1.6%)     | 0.260 (+1.2%)

† DMWG reranks the top 3000 results, which usually yields lower performance than ranking all documents.
Wilcoxon test: *, **, *** denote significance levels 0.1, 0.05, 0.01.

Graph-based smoothing >> baseline; smoothing the doc LM >> the relevance score >> the query LM.

21 Comparison with Existing Models

  Data set | CBDM (Liu & Croft) | DELM (Tao et al.) | DMDG  | DMDG (1 iteration)
  AP88-90  | 0.233              | 0.250             | 0.254 | 0.252
  LA       | 0.259              | 0.265             | 0.260 | 0.258
  SJMN     | 0.217              | 0.227             | 0.235 | 0.229
  TREC8    | N/A                | 0.267             | 0.271 | 0.270

Graph-based smoothing > state of the art; more iterations > a single iteration (which behaves similarly to DELM).

22 Combined with Pseudo-Feedback

Smoothing the query LM on the word graph, combined with feedback (FB):

  Data set | FB    | FB+QMWG
  AP88-90  | 0.271 | 0.273
  LA       | 0.258 | 0.267
  SJMN     | 0.245 | 0.246
  TREC8    | 0.278 | 0.280

Smoothing the document LM on the word graph, combined with feedback:

  Data set | DMWG  | FB    | FB+DMWG
  AP88-90  | 0.252 | 0.266 | 0.271**
  LA       | 0.257 | —     | 0.267**
  SJMN     | 0.229 | 0.241 | 0.249**
  TREC8    | 0.271 | 0.278 | 0.292***

(Figure: the query q is smoothed over the word graph w, then used to rerank the top documents.)

23 Related Work

- Language modeling in information retrieval; smoothing using the collection model: (Ponte & Croft 98), (Hiemstra & Kraaij 98), (Miller et al. 99), (Zhai & Lafferty 01), etc.
- Smoothing using corpus structures:
  - Cluster structure: (Liu & Croft 04), etc.
  - Nearest neighbors: (Kurland & Lee 04), (Tao et al. 06)
- Relevance score propagation: (Diaz 05), (Qin et al. 05)
- Graph-based learning: (Zhu et al. 03), (Zhou et al. 04), etc.

24 Conclusions

- Smoothing language models using document/word graphs
- A general optimization framework with various effective instantiations
- Improved performance over the state of the art
- Future work: combine document graphs with word graphs; study alternative ways of constructing graphs

25 Thanks!

26 Parameter Tuning

Fast convergence. (Figure: parameter-sensitivity and convergence curves.)

