A Probabilistic Model for Fine-Grained Expert Search
Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu
June 16-18, 2008, Columbus, Ohio
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Introduction
Expert search answers the question "Who is an expert on X?": a user submits a query (e.g., "Semantic Web Search Engine") to a search engine, which returns a ranked list of experts.
Introduction
Pioneering expert search systems, by evidence source:
- Log data in software development: Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc.
- Email communications: Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc.
- General documents: Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.
Introduction
Expert search at TREC: a new task at TREC 2005, 2006, and 2007 (Craswell et al., 2005; Soboroff et al., 2006; Bailey et al., 2007). Many approaches have been proposed:
- Two generative models (Balog et al., 2006)
- Prior distribution and relevance feedback (Fang et al., 2006)
- Hierarchical language model (Petkova and Croft, 2006)
- Voting and data fusion (Macdonald and Ounis, 2006)
- …
Introduction
Existing methods take a coarse-grained approach: expert search is carried out at the granularity of whole documents, so further improvements are hard to achieve. In fact, different blocks of an electronic document have different functions and qualities, and therefore different impacts on expert search.
Examples: Windowed Section Relation. Text within a window around the queried topic is treated as relevant evidence; text outside the window is irrelevant.
Examples: Title-Author Relation (query: "Timed Text"). The author name attached to a matching title provides expert evidence.
Examples: Reference Section Relation.
Examples: Section Title-Body Relation (query: "W3C Management Team").
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Fine-grained Expert Search: Evidence Extraction
Query: Who are experts on Semantic Web Search Engine?
Document-001 yields two pieces of fine-grained evidence for the candidate Tim Berners-Lee:
E1: "…a high-level plan of the architecture of the semantic web by Tim Berners-Lee…"
E2: "…later, Berners-Lee describes a semantic web search engine experience…"
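The extraction step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the fixed character window, and the exact-match name lookup are all assumptions.

```python
import re

def extract_evidence(doc_text, candidate_names, window=60):
    """Collect (candidate, passage) evidence pairs: for every mention of a
    candidate name, keep a character window around the mention.
    Toy sketch; the real system uses richer block/relation analysis."""
    evidence = []
    for name in candidate_names:
        for m in re.finditer(re.escape(name), doc_text):
            start = max(0, m.start() - window)
            end = min(len(doc_text), m.end() + window)
            evidence.append((name, doc_text[start:end]))
    return evidence
```

Run over Document-001's text with the candidate "Tim Berners-Lee", this yields one evidence snippet per mention.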
Fine-grained Expert Search: Search Model
A piece of evidence (t, p, r, d) connects an expert candidate (c) and a query (q): the Expert Matching Model scores the candidate against the evidence, and the Evidence Matching Model scores the evidence against the query.
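The two-stage scoring can be sketched with toy estimates. This is a hypothetical sketch, not the paper's probabilistic model: evidence matching is approximated by query-term coverage of the evidence text, and expert matching by an exact match on the evidence's person field.

```python
def score_candidate(candidate, query_terms, evidence_list):
    """Sum evidence-matching scores over all evidence whose person field
    matches the candidate (toy stand-ins for the two matching models)."""
    score = 0.0
    for person, text in evidence_list:
        if person != candidate:          # expert matching (exact match only)
            continue
        tokens = text.lower().split()
        # evidence matching: fraction of query terms present in the text
        score += sum(t.lower() in tokens for t in query_terms) / len(query_terms)
    return score
```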
Fine-grained Expert Search: Expert Matching

Mask               Sample
Full Name          Ritu Raj Tiwari
Email Name         rtiwari@nuance.com
Combined Name      Tiwari, Ritu R
Abbr. Name         Ritu Raj; Ritu
Short Name         RRT
Alias (new email)  rtiwari@hotmail.com ( for short)
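The name masks in the table can be generated mechanically. A minimal sketch; the function name and the specific mask rules (surname-first combination, initial-based short name) are assumptions inferred from the samples above.

```python
def name_masks(full_name, email):
    """Derive the mask forms shown above from a full name and an email."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    return {
        "full_name": full_name,                       # "Ritu Raj Tiwari"
        "email_name": email.split("@")[0],            # "rtiwari"
        # surname first, middle names abbreviated: "Tiwari, Ritu R"
        "combined_name": last + ", " + " ".join([first] + [p[0] for p in parts[1:-1]]),
        # abbreviated forms: drop the surname, or keep the first name only
        "abbr_names": [" ".join(parts[:-1]), first],  # ["Ritu Raj", "Ritu"]
        "short_name": "".join(p[0] for p in parts),   # initials "RRT"
    }
```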
Fine-grained Expert Search: Evidence Matching

Type       Sample
Query      Semantic Web Search Engine
Phrase     "Semantic Web Search Engine"
Bi-gram    "Semantic Web" "Search Engine"
Proximity  "Semantic … Web Search Engine"
Fuzzy      "Samentic Web Saerch Engine"
Stemmed    "Semantic Web Search Engin"

Relation types: Same Section; Windowed Section; Reference Section; Title-Author; Section Title-Body
Quality types: Dynamic Quality; Static Quality
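The query variants in the table can be produced from the raw query. A toy sketch under stated assumptions: it uses sliding-window bi-grams (the table shows non-overlapping pairs, which are a subset of these) and a crude suffix-stripping rule standing in for a real stemmer such as Porter's.

```python
def query_variants(query):
    """Produce phrase, bi-gram, and stemmed variants of a query."""
    terms = query.split()
    bigrams = [" ".join(terms[i:i + 2]) for i in range(len(terms) - 1)]

    def crude_stem(word):
        # naive suffix stripping; a real system would use a proper stemmer
        for suffix in ("es", "e", "s"):
            if word.lower().endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    return {
        "phrase": f'"{query}"',
        "bigrams": bigrams,
        "stemmed": " ".join(crude_stem(t) for t in terms),
    }
```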
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Experimental Results: Setup
Data: W3C corpus, 331,307 web pages
Topics: 10 training topics of TREC 2005; 50 test topics of TREC 2005; 49 test topics of TREC 2006
Evaluation metrics: mean average precision (MAP), R-precision (R-P), top-N precision (P@N)
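The three metrics can be computed as follows; a standard IR sketch (function names are my own), shown for a single topic. The reported numbers average the per-topic values over all test topics.

```python
def average_precision(ranked, relevant):
    """AP for one topic: mean of the precision values at each relevant hit."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def precision_at(ranked, relevant, n):
    """P@N: fraction of the top-N ranked items that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n

def r_precision(ranked, relevant):
    """R-P: precision at rank R, where R is the number of relevant items."""
    return precision_at(ranked, relevant, len(relevant))
```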
Experimental Results: Query Matching

                      TREC 2005               TREC 2006
                  MAP     R-P     P@10     MAP     R-P     P@10
Baseline         0.1840  0.2136  0.3060   0.3752  0.4585  0.5604
+Bi-gram         0.1957  0.2438  0.3320   0.4140  0.4910  0.5799
+Proximity       0.2024  0.2501  0.3360   0.4530  0.5137  0.5922
+Fuzzy, Stemmed  0.2030  0.2501  0.3360   0.4580  0.5112  0.5901
Improv.          10.33%  17.09%  9.80%    22.07%  11.49%  5.30%
T-test p-values: 0.0084 (TREC 2005), 0.0000 (TREC 2006)
Experimental Results: Person Matching

                        TREC 2005               TREC 2006
                    MAP     R-P     P@10     MAP     R-P     P@10
Baseline           0.2030  0.2501  0.3360   0.4580  0.5112  0.5901
+Combined Name     0.2056  0.2539  0.3463   0.4709  0.5152  0.5931
+Abbr. Name        0.2106  0.2545  0.3400   0.5010  0.5181  0.6000
+Short Name        0.2111  0.2578  0.3400   0.5121  0.5192  0.6000
+Alias, new email  0.2156  0.2591  0.3400   0.5221  0.5212  0.6000
Improv.            6.21%   3.60%   1.19%    14.00%  1.96%   1.68%
T-test p-values: 0.0064 (TREC 2005), 0.0057 (TREC 2006)
Experimental Results: Multiple Relations

                          TREC 2005               TREC 2006
                      MAP     R-P     P@10     MAP     R-P     P@10
Baseline             0.2156  0.2591  0.3400   0.5221  0.5212  0.6000
+Windowed Section    0.2158  0.2633  0.3380   0.5255  0.5311  0.6082
+Reference Section   0.2160  0.2630  0.3380   0.5272  0.5314  0.6061
+Title-Author        0.2234  0.2634  0.3580   0.5354  0.5355  0.6245
+Section Title-Body  0.2586  0.3107  0.3740   0.5657  0.5669  0.6510
Improv.              19.94%  19.91%  10.00%   8.35%   8.77%   8.50%
T-test p-values: 0.0013 (TREC 2005), 0.0043 (TREC 2006)
Experimental Results: Evidence Quality

                      TREC 2005               TREC 2006
                  MAP     R-P     P@10     MAP     R-P     P@10
Baseline         0.2586  0.3107  0.3740   0.5657  0.5669  0.6510
+Static quality  0.2711  0.3188  0.3720   0.5900  0.5813  0.6796
+Dynamic quality 0.2755  0.3252  0.3880   0.5943  0.5877  0.7061
Improv.          6.13%   4.67%   3.74%    2.86%   3.67%   8.61%
Rank 1 @ TREC    0.2749  0.3330  0.4520   0.5947  0.5783  0.7041
T-test p-values: 0.0360 (TREC 2005), 0.0252 (TREC 2006)
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Conclusion
- Proposed fine-grained expert search
- Developed a probabilistic model and its implementation
- Evaluated the approach on the TREC data sets