Huizhong Duan, Yunbo Cao, Chin-Yew Lin and Yong Yu
Shanghai Jiao Tong University & MSRA
ACL 2008
2008/7/9, Rick Liu

3 Question Search
  - Help users to search previous answers
  - Any nice hotels in Berlin or Hamburg?
  - How long does it take to Hamburg from Berlin?
  - Cheap hotels in Berlin?

5
  - Identifying question topic & focus
    - Question tree
    - Determining the tree cut
  - Modeling question topic & focus for search
    - Language model

6
  - Topic terms: BaseNP, WH-ngram
  - Topic profile: probability distribution of categories
  - Specificity: inverse of the entropy of the topic profile
  - Topic chain: topic terms ordered by specificity value (descending)
  - Topic tree
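The definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the topic profiles below are made-up numbers, the function names are hypothetical, and the small constant `eps` is an assumption added so that a zero-entropy profile does not cause a division by zero.

```python
import math

def specificity(profile, eps=1e-3):
    """Inverse of the entropy of a topic profile.

    profile: dict mapping category -> probability (assumed to sum to 1).
    eps: assumed smoothing constant, not taken from the slides.
    """
    entropy = -sum(p * math.log(p) for p in profile.values() if p > 0)
    return 1.0 / (entropy + eps)

def topic_chain(terms, profiles):
    """Order topic terms by specificity, most specific first."""
    return sorted(terms, key=lambda t: specificity(profiles[t]), reverse=True)

# Hypothetical topic profiles for two topic terms.
profiles = {
    "hamburg": {"travel": 0.9, "other": 0.1},
    "hotels": {"travel": 0.5, "business": 0.5},
}
chain = topic_chain(["hotels", "hamburg"], profiles)
```

A term concentrated in a single category has low entropy and therefore high specificity, so it comes first in the chain.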

8
  - M = (Γ, θ)
  - Γ = [C1, C2, ..., Ck], the tree cut
  - θ = [P(C1), P(C2), ..., P(Ck)], the probability parameter vector
  - A cut is any set of nodes that defines a partition of the tree's nodes
  - Σ i=1..k P(Ci) = 1
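As a minimal sketch, the pair M = (Γ, θ) can be held in a small Python container that checks the constraint Σ P(Ci) = 1. The class name and the example class labels are hypothetical, and the check that Γ really partitions the tree is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class TreeCutModel:
    cut: list    # Γ = [C1, ..., Ck]
    probs: list  # θ = [P(C1), ..., P(Ck)]

    def __post_init__(self):
        # One probability per class in the cut.
        if len(self.cut) != len(self.probs):
            raise ValueError("cut and probs must have the same length")
        # The class probabilities must sum to 1.
        if not math.isclose(sum(self.probs), 1.0):
            raise ValueError("class probabilities must sum to 1")

# Hypothetical two-class cut.
model = TreeCutModel(cut=["C1", "C2"], probs=[0.4, 0.6])
```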

9 (figure not reproduced; node groups recovered from the slide)
  - [n0, n11], [n12, n21, n22, n23], [n13, n24]
  - [n11, n21, n22, n23, n24]

10 Minimum Description Length
  - Ref: Li and Abe, 1998
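The slide gives no formula, so the following is only a sketch of how an MDL criterion could compare candidate cuts. It assumes the usual tree cut formulation associated with Li and Abe (1998): a parameter cost of (k/2)·log |S| with k free parameters, plus a data cost of −Σ log P(n), where the probability of a class is shared uniformly among its nodes. The node names and counts are hypothetical.

```python
import math

def description_length(classes, counts):
    """Description length of a cut (assumed formulation, in bits).

    classes: list of node-name lists, one list per class in the cut.
    counts: dict mapping node name -> observed frequency.
    """
    total = sum(counts.get(n, 0) for cls in classes for n in cls)
    free_params = len(classes) - 1  # probabilities sum to 1
    param_dl = (free_params / 2) * math.log2(total)
    data_dl = 0.0
    for cls in classes:
        freq = sum(counts.get(n, 0) for n in cls)
        if freq == 0:
            continue  # an unobserved class contributes no data cost
        p_node = (freq / total) / len(cls)  # class mass spread over its nodes
        data_dl -= freq * math.log2(p_node)
    return param_dl + data_dl

# Hypothetical uniform counts: the coarser cut should be cheaper.
counts = {"a": 1, "b": 1, "c": 1, "d": 1}
coarse = description_length([["a", "b", "c", "d"]], counts)
fine = description_length([["a", "b"], ["c", "d"]], counts)
```

Under this assumed criterion, the cut with the smallest description length is selected; with uniform counts the finer cut pays for an extra parameter without fitting the data any better.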

12 P(q | q̃)
  - q: queried question
  - q̃: targeted question
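The slides do not spell out how P(q | q̃) is computed. The sketch below assumes that the λ mentioned later in the deck interpolates a topic component and a focus component, each scored with a unigram language model. The additive smoothing, the function names, and the toy data are all assumptions made for illustration.

```python
def unigram_prob(term, target_terms, vocab_size, alpha=1.0):
    """Additively smoothed unigram probability (smoothing choice is assumed)."""
    return (target_terms.count(term) + alpha) / (len(target_terms) + alpha * vocab_size)

def generation_prob(query_terms, target_terms, vocab_size):
    """Probability of generating the query terms from the target's unigram model."""
    prob = 1.0
    for term in query_terms:
        prob *= unigram_prob(term, target_terms, vocab_size)
    return prob

def score(q_topic, q_focus, t_topic, t_focus, vocab_size, lam=0.7):
    """Assumed mixture: lam weights the topic part, (1 - lam) the focus part."""
    return (lam * generation_prob(q_topic, t_topic, vocab_size)
            + (1 - lam) * generation_prob(q_focus, t_focus, vocab_size))

# Hypothetical queried question split into topic and focus terms.
same_topic = score(["berlin"], ["hotels"], ["berlin"], ["hotels"], vocab_size=10)
other_topic = score(["berlin"], ["hotels"], ["hamburg"], ["hotels"], vocab_size=10)
```

With these toy inputs, the target that shares both topic and focus terms scores higher than the one that shares only the focus term.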

13
  - Yahoo! Answers
  - Resolved questions
    - travel: 314,616 items
    - computers & internet: 210,785 items
  - Three fields
    - title (the only field used)
    - description
    - answers

14
  - Employed Vector Space Model
  - Manual judgments: relevant / irrelevant
  - Baselines: VSM, LMIR
  - Evaluation: MAP, R-precision, MRR
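For reference, the three evaluation measures can be computed per query as in the sketch below; MAP and MRR are then the means of average precision and reciprocal rank over all queries. The function names and the toy ranking are illustrative only.

```python
def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant items appear."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical ranking for one queried question.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d"}
ap = average_precision(ranked, relevant)
rp = r_precision(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
```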

18
  - Examine the correctness of question topics and question foci
  - 200 queried questions => 69 questions incorrect
    - (a) Only have the head part (59)
    - (b) Incorrect order (10)
  - (a) explains why λ is 0.7

19
  - FAQ data
  - Community based
    - Jeon et al., 2005
    - Compared four different retrieval methods
      - Vector space model
      - Okapi
      - Language model
      - Translation-based model
    - Translation-based model performed the best

20
  - Lexical chasm
    - Where to stay in Hamburg?
    - The best hotel in Hamburg?
  - IBM model 1
    - Use question titles and question descriptions as the parallel corpus
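To make the IBM model 1 step concrete, here is a minimal EM sketch for estimating word translation probabilities from paired token lists. It is a simplified textbook version (no NULL word, uniform initialization), not the setup used in the cited work, and the toy "parallel corpus" below is invented.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (simplified IBM model 1).

    pairs: list of (source_tokens, target_tokens) tuples.
    Returns a dict keyed by (target_word, source_word).
    """
    target_vocab = {t for _, tgt in pairs for t in tgt}
    t_prob = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    frac = t_prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] += frac
        # M-step: renormalize the expected counts per source word.
        for (t, s), c in count.items():
            t_prob[(t, s)] = c / total[s]
    return t_prob

# Invented toy pairs standing in for (title, description) token lists.
pairs = [
    (["stay"], ["hotel"]),
    (["stay", "hamburg"], ["hotel", "hamburg"]),
    (["hamburg"], ["hamburg"]),
]
table = train_ibm1(pairs)
```

On this toy data the model learns that "stay" is more likely to translate to "hotel" than to "hamburg", which is the kind of association meant to bridge the lexical chasm between the two example questions.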

22
  1) Data structure
  2) Use MDL-based tree cut model to identify question topic & focus
  3) A new form of language modeling for question search
  4) Extensive experiments
  - Now only community-based
  - From forum sites / FAQ sites
