Passage Retrieval using HMMs. HARD 2004. University of Illinois at Urbana-Champaign. Jing Jiang, ChengXiang Zhai.


1 Passage Retrieval using HMMs
HARD 2004
University of Illinois at Urbana-Champaign
Jing Jiang, ChengXiang Zhai

2 Motivation – Variable Length Passages
Documents: APE20030911.0887, APE20030922.0156
Excerpt: Nokia, the world’s biggest … acquired Sega … Japanese video game maker, … its mobile N-Gage game … features of a cell phone, MP3-player … Nokia is the cell phone market leader …
Excerpt: Nintendo Co.’s … now works as a videophone … which makes mobile and Internet equipment … Nintendo has sold more than 10 million Game Boy …

3 Motivation – Variable Length Passages
Passage length is document-dependent.
Query: HARD-422 video game crash
Documents: APE20030911.0887, APE20030922.0156 (same Nintendo and Nokia excerpts as on slide 2)

4 Motivation – Variable Length Passages
Passage length is query-dependent.
Queries: HARD-422 video game crash; HARD-443 hand-held electronics
Documents: APE20030911.0887, APE20030922.0156 (same Nintendo and Nokia excerpts as on slide 2)

5 Research Question
Passage length is document-dependent and query-dependent.
How to detect variable-length passages?

6 Previous Work on Passage Retrieval
Structural or semantic boundary: passage is not query-specific.
Fixed-length: passage length is not query-specific; passage content may not be coherent.
Arbitrary (MultiText): only query words are considered; heuristics are used to reduce search space.
HMM-based: the method is promising, but previous work didn’t fully explore its potential.

7 HMM-Based Method
Document: ww…ww…wwwww…ww

8 HMM-Based Method
Q: hand-held electronics
Document: ww…ww…wwwww…ww (relevant passage marked)
HMM states: B1, R, B2
Transition probabilities: p(B1|B1) = 0.9, p(R|B1) = 0.1, p(R|R) = 0.95, p(B2|R) = 0.05, p(B2|B2) = 1
Output probabilities:
p(w|B1): the: 0.060, …, cell: 0.00001, mp3: 0.000005, …
p(w|R): the: 0.031, cell: 0.033, mp3: 0.016, …
p(w|B2): the: 0.060, …, cell: 0.00001, mp3: 0.000005, …

9 HMM-Based Method
Q: hand-held electronics
HMM states, transition probabilities, and output probabilities as on slide 8.
Document: ww…ww…wwwww…ww
State sequence: BR…BB…RRRRB…BR (relevant passage marked)
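The decoding step pictured on slides 8 and 9 can be sketched with a standard Viterbi pass. This is a minimal illustration, not the authors' implementation: it reads B1 and B2 as background states before and after the relevant state R (an interpretation of the diagram), forces the path to start in B1, and uses a hypothetical `FLOOR` probability for words the slide does not list. The toy document is also an assumption.

```python
import math

# Parameters copied from slide 8. FLOOR is a hypothetical fallback for
# words whose probabilities the slide elides.
STATES = ["B1", "R", "B2"]
TRANS = {
    "B1": {"B1": 0.9, "R": 0.1},
    "R": {"R": 0.95, "B2": 0.05},
    "B2": {"B2": 1.0},
}
BACKGROUND = {"the": 0.060, "cell": 0.00001, "mp3": 0.000005}
EMIT = {
    "B1": BACKGROUND,
    "R": {"the": 0.031, "cell": 0.033, "mp3": 0.016},
    "B2": BACKGROUND,
}
FLOOR = 1e-4


def log_emit(state, word):
    """Log output probability of `word` in `state`."""
    return math.log(EMIT[state].get(word, FLOOR))


def viterbi(words):
    """Most likely state sequence for `words`, assuming the path starts in B1."""
    neg_inf = float("-inf")
    # score[s]: best log-probability of any path ending in state s
    score = {s: neg_inf for s in STATES}
    score["B1"] = log_emit("B1", words[0])
    back = []  # one dict of back-pointers per position after the first
    for word in words[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev, best = None, neg_inf
            for prev in STATES:
                p = TRANS[prev].get(s)
                if p is None or score[prev] == neg_inf:
                    continue  # transition not allowed or prev unreachable
                cand = score[prev] + math.log(p)
                if cand > best:
                    best_prev, best = prev, cand
            pointers[s] = best_prev
            new_score[s] = best + log_emit(s, word) if best_prev else neg_inf
        back.append(pointers)
        score = new_score
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy document (an assumption): background words around three topical words.
doc = ["the"] * 3 + ["cell", "mp3", "cell"] + ["the"] * 6
path = viterbi(doc)
passage = [w for w, s in zip(doc, path) if s == "R"]
print(path)
print(passage)
```

With p(R|R) = 0.95 and p(B2|R) = 0.05, leaving R is costly, so the decoded passage only closes when enough background-like words follow it; the toy document is padded with trailing background words for that reason.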

10 Constructing the HMM
States: B1, R, B2

11 Constructing the HMM
States: B1, R, B2, E (end-of-doc state)

12 Constructing the HMM
States: B3, B1, Q, B2, E (end-of-doc state)
Transition probabilities shown: 0.01, 0.99, 0.005
Smoothing is achieved by transitions.

13 Constructing the HMM
States: B3, B1, FB, B2, E (end-of-doc state)
Transition probabilities shown: 0.01, 0.99, 0.005
FB: expanded query LM to incorporate feedback.
Smoothing is achieved by transitions.

14 Constructing the HMM
States: B3, B1, FB, B2, E (end-of-doc state)
Transition probabilities shown: 0.01, 0.99, 0.005
FB: expanded query LM to incorporate feedback.
Smoothing is achieved by transitions.
Transition probabilities are trained for each document.

15 Passage Extension
HMM states: B3, B1, FB, B2, E
ww…www…wwwww…ww: short passage with artificial boundary
ww…www…wwwww…www: passage extended to the natural topical boundary
ww…www…wwwww…ww: true passage

16 Retrieval – Approach 1

17 Ranking: 1, 2, 3, …, n

18 Retrieval – Approach 1
Ranking (1, 2, 3, …, n) and passage extraction

19 Retrieval – Approach 2
Ranking (1, 2, 3, …, n) and passage extraction

20 Retrieval – Our Approach
Pipeline: fixed-length passages, ranking (1, 2, 3, …, n), HMM (1, 2, 3, …, n)
Runs:
b0: whole-document ranking, pseudo-feedback
f0: 120-word passages, relevance feedback
f1: HMM-extended 60-word passages, relevance feedback (our focus)

21 Passage-Level Results
Overall, baseline was the best.

| Run | Explanation | BPref @ 12K chars | Rec @ 10 psgs | Prec @ 10 psgs |
|---|---|---|---|---|
| b0 | whole-doc, pseudo FB | 0.2710 | 0.2517 | 0.1570 |
| f0 | fixed 120 psgs, rel FB | 0.2080 | 0.1067 | 0.2391 |
| f1 | fixed 60 psgs + HMM, rel FB | 0.1860 | 0.1494 | 0.1411 |

22 Effectiveness of HMM method

| Method | BPref @ 12K | Prec @ 12K | CharRPrec |
|---|---|---|---|
| Fixed 60 | 0.1208 | 0.1623 | 0.0776 |
| Fixed 60 + HMM | 0.1868 | 0.2143 | 0.1424 |
| Relative improvement | 54.6% | 32.0% | 83.5% |
| Fixed 120 | 0.1738 | 0.2088 | 0.1043 |
| Fixed 120 + HMM | 0.2131 | 0.2265 | 0.1562 |
| Relative improvement | 22.6% | 8.48% | 49.8% |

HMM method improved performance over fixed-length passages.
Less improvement if the fixed length is closer to the optimal length.
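The relative-improvement rows on slide 22 follow from the scores above them as (HMM score minus fixed-length score) divided by the fixed-length score. A short check, using only the numbers on the slide (the function and variable names are illustrative):

```python
def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# (fixed-length score, fixed-length + HMM score) per metric, from slide 22.
scores = {
    "Fixed 60": {
        "BPref @ 12K": (0.1208, 0.1868),
        "Prec @ 12K": (0.1623, 0.2143),
        "CharRPrec": (0.0776, 0.1424),
    },
    "Fixed 120": {
        "BPref @ 12K": (0.1738, 0.2131),
        "Prec @ 12K": (0.2088, 0.2265),
        "CharRPrec": (0.1043, 0.1562),
    },
}

gains = {
    setting: {m: relative_improvement(a, b) for m, (a, b) in metrics.items()}
    for setting, metrics in scores.items()
}
for setting, metrics in gains.items():
    for metric, gain in metrics.items():
        print(f"{setting:10s} {metric:12s} {gain:6.2f}%")
```

The computed values agree with the slide to the precision it reports, and every gain for 120-word passages is smaller than the matching gain for 60-word passages.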

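23 Diagnosis Runs

| Factor | Feedback | Ranking | HMM | Overall |
|---|---|---|---|---|
| b0 | pseudo FB | doc | yes | N/A |
| f1 | rel FB | passage | no | N/A |
| f1 vs. b0 (BPref@12K) | -12.1% | -29.7% | +10.0% | -32.1% |

Notes on the slide, in order: non-optimal parameter setting; KL-divergence works poorly on passages; HMM improves boundaries.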

24 Discussions and Conclusions
HMM method improved the performance over fixed-length passages.
LM (KL-divergence) method gives worse performance on passage ranking than on document ranking.
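For readers unfamiliar with KL-divergence ranking, the sketch below scores a text against a query language model using the rank-equivalent form of negative KL divergence, sum over w of p(w|Q) log p(w|text). The slides do not specify the smoothing method or parameters; Dirichlet smoothing, the value of `mu`, the `floor` fallback, and all toy data here are assumptions for illustration only.

```python
import math
from collections import Counter


def neg_kl_score(query_lm, text_words, collection_lm, mu=1000.0, floor=1e-6):
    """Rank-equivalent form of -KL(query || text).

    The text model is Dirichlet-smoothed with the collection model
    (an assumed choice; the slides do not say which smoothing was used).
    """
    counts = Counter(text_words)
    length = len(text_words)
    score = 0.0
    for word, p_q in query_lm.items():
        p_c = collection_lm.get(word, floor)
        p_text = (counts[word] + mu * p_c) / (length + mu)
        score += p_q * math.log(p_text)
    return score


# Hypothetical query model and collection model.
query_lm = {"cell": 0.5, "mp3": 0.5}
collection_lm = {"cell": 0.001, "mp3": 0.0005, "the": 0.06}

on_topic = ["the", "cell", "phone", "mp3", "player"]
off_topic = ["the", "market", "the", "leader", "the"]
score_on = neg_kl_score(query_lm, on_topic, collection_lm)
score_off = neg_kl_score(query_lm, off_topic, collection_lm)
print(score_on, score_off)
```

The same function applies to whole documents and to passages; only `text_words` changes, so short passages supply far fewer counts relative to the smoothing mass `mu`.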

25 The End Questions?

