1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.

Presentation transcript:

Formal Models for Expert Finding on DBLP Bibliography Data
Presented by: Hongbo Deng
Co-worked with: Irwin King and Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Dec. 16, 2008, ICDM 2008

Hongbo Deng, Irwin King and Michael R. Lyu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, ICDM

Introduction
- Traditional information retrieval
- Expert finding task
- Data mining

Outline
- Introduction
- Related work
- Methodology
  - Modeling expertise
  - Statistical language model
  - Topic-based model
  - Hybrid model
- Experiments
- Conclusions

II. Introduction
- Expert finding has received increased interest
  - W3C collection in 2005 and 2006 (introduced and used by TREC)
  - CSIRO collection in 2007
- Nearly all of the work has been evaluated on the W3C collection
- We address the expert finding task in a real-world academic field
  - An important practical problem
  - Some special problems and difficulties

Problems (II. Introduction)
- How to represent the expertise of a researcher?
  - The publications of a researcher
- How to identify experts for a given query?
  - Relevance between a query and publications
  - Publications act as the "bridge" between the query and experts
- What dataset can be used?
  - DBLP bibliography (limited information)
  - Use Google Scholar as a data supplement
- How to measure the relevance between a query and documents?
  - Language model, vector space model, etc.
- Should we treat each publication equally?

Our Work (II. Introduction)
- Our setting: DBLP bibliography and Google Scholar
  - More than 955,000 articles with over 574,000 authors
  - About 20 GB of metadata crawled from Google Scholar
- Differs from the W3C setting
  - Covers a wider range of topics
  - Contains many more expert candidates
- Applications
  - Find experts for consultation on a new research field
  - Assign papers to reviewers automatically
  - Recommend panels of reviewers for grant applications

Related Work
- Document model & candidate model (Balog et al., SIGIR'06 & SIGIR'07)
- Hierarchical language models (Petkova and Croft, ICTAI'06)
- Voting model (Macdonald and Ounis, CIKM'06)
- Author-Persona-Topic model (Mimno and McCallum, KDD'07)
- ……
- These approaches do not consider the importance of documents
- They are hard to apply to large-scale expert finding

Expertise Modeling (III. Methodology)
- Expert finding
  - p(ca|q): what is the probability of a candidate ca being an expert given the query topic q?
  - Rank candidates ca according to this probability.
- Approach: using Bayes' theorem,
      p(ca|q) = p(ca, q) / p(q)
  where p(ca, q) is the joint probability of a candidate and a query, and p(q) is the probability of the query. Since p(q) is the same for every candidate, ranking by p(ca, q) gives the same order.

Expertise Modeling (III. Methodology)
- Problem: how to estimate p(ca, q)?
- Model 1: Statistical language model
  - Document-based approach
  - Find the experts from the associated publications
- Model 2: Topic-based model
  - Association between the query and several similar topics
- Model 3: Hybrid model
  - Combination of Model 1 and Model 2

Basic Language Model (III. Model 1: Statistical language model)
- The probability p_l(ca, q): [equation not preserved in the transcript; its annotations read "Language Model" and "Conditionally independent"]
- Fig. 1. Baseline model
  - Find documents relevant to the query
  - Model the knowledge of an expert from the associated documents
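The document-based idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' exact model: it assumes the joint probability decomposes as p_l(ca, q) = Σ_d p(d) · p(q|d) · p(ca|d) (query and candidate conditionally independent given the document), with a uniform p(d), a Jelinek-Mercer smoothed unigram p(q|d), and p(ca|d) split evenly among co-authors. The function names, the smoothing weight `lam`, and the toy documents are all hypothetical.

```python
from collections import Counter, defaultdict

def query_likelihood(query, doc_tokens, coll_counts, coll_len, lam=0.5):
    """p(q|d) under a unigram model with Jelinek-Mercer smoothing (assumed choice)."""
    tf = Counter(doc_tokens)
    p = 1.0
    for term in query:
        p_doc = tf[term] / len(doc_tokens) if doc_tokens else 0.0
        p_coll = coll_counts[term] / coll_len
        p *= (1 - lam) * p_doc + lam * p_coll
    return p

def rank_experts(query, docs):
    """docs: list of (tokens, authors). Returns [(author, score)], best first."""
    coll_counts = Counter(t for tokens, _ in docs for t in tokens)
    coll_len = sum(coll_counts.values())
    p_d = 1.0 / len(docs)  # uniform document prior (assumption)
    scores = defaultdict(float)
    for tokens, authors in docs:
        p_q_d = query_likelihood(query, tokens, coll_counts, coll_len)
        for author in authors:
            # p(ca|d): each co-author gets an equal share (assumption)
            scores[author] += p_d * p_q_d * (1.0 / len(authors))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration only.
docs = [
    ("language model information retrieval".split(), ["alice"]),
    ("information retrieval evaluation".split(), ["alice", "bob"]),
    ("face recognition committee machine".split(), ["carol"]),
]
ranking = rank_experts(["information", "retrieval"], docs)
```

Publications act as the bridge here: a candidate accumulates score only through the documents they authored, so a prolific author of relevant papers rises to the top.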

Weighted Language Model (III. Model 1: Statistical language model)
- Fig. 2. A query example
- Fig. 3. Weighted model
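The figures for the weighted model are not preserved here, but later slides mention the citation count of each publication and the importance of the document prior. A hedged sketch of one way to turn citation counts into a non-uniform p(d) follows. The log-scaled weight `log(e + citations)` and the function names are assumptions for illustration; the transcript does not state the weighting formula.

```python
import math

def citation_prior(citations):
    """Map citation counts to a normalized document prior p(d).

    Uses log(e + c) so that an uncited paper still gets weight 1
    (an assumed weighting, not taken from the slides).
    """
    weights = [math.log(math.e + c) for c in citations]
    total = sum(weights)
    return [w / total for w in weights]

def weighted_expert_score(p_q_given_d, p_ca_given_d, citations):
    """Sum over documents of p(d) * p(q|d) * p(ca|d) with a citation-based p(d)."""
    prior = citation_prior(citations)
    return sum(pd * pq * pc
               for pd, pq, pc in zip(prior, p_q_given_d, p_ca_given_d))

prior = citation_prior([0, 10, 1000])
```

With this kind of prior, a highly cited publication contributes more to its authors' expertise score than an uncited one with the same query likelihood.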

Topic-based Model (III. Model 2: Topic-based model)
- Observation: researchers usually describe their expertise as a combination of several topics
- Each candidate is represented as a weighted sum of multiple topics Z
  - Similarity between the query and topics z
  - Each topic z is then used as a query to estimate p
- Fig. 4. Topic-based model

Topic-based Model (III. Model 2: Topic-based model)
- Topic z: "Information retrieval"
- Records returned by Google Scholar represent θ_z:
  1. Introduction to Modern Information retrieval
  2. Information retrieval
  3. Modern Information retrieval
  5. A language modeling approach to information retrieval
  7. Information filtering and information retrieval
  ……
  99. Cross-language information retrieval
  100. On modeling information retrieval with probabilistic inference

Topic-based Model (III. Model 2: Topic-based model)
- Challenge: which similar topics should be selected?
- T1: calculate p(q|θ_z) and select the top K ranked topics
  - Assumes topics are independent
- Ideal similar topics:
  - Include topics from many different subtopics
  - Do not include topics with high redundancy
  - Define a conditional probability function to quantify the novelty and penalize the redundancy of a topic
- T2: [formula not preserved in the transcript]
- T3: [formula not preserved in the transcript]

Topic Selection Algorithm (III. Model 2: Topic-based model)
- T2: [algorithm not preserved in the transcript]
- T3: [algorithm not preserved in the transcript]
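The T2 and T3 selection rules themselves are not preserved in this transcript. To illustrate the stated goal (reward relevance to the query, penalize redundancy with already selected topics), here is a sketch using a different, well-known technique: greedy selection in the style of maximal marginal relevance (MMR). It is a stand-in, not the authors' algorithm; the `relevance` scores stand in for p(q|θ_z), and `beta`, `jaccard`, and the toy topics are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two topic names, in [0, 1]."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def select_topics(relevance, similarity, k, beta=0.5):
    """Greedy MMR-style selection of up to k topics.

    relevance: dict topic -> relevance score (stand-in for p(q|theta_z))
    similarity: function (topic, topic) -> value in [0, 1]
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def gain(z):
            # Redundancy is the highest similarity to any topic already chosen.
            redundancy = max((similarity(z, s) for s in selected), default=0.0)
            return beta * relevance[z] - (1 - beta) * redundancy
        best = max(sorted(candidates), key=gain)  # sorted() makes ties deterministic
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy scores, invented for illustration only.
relevance = {
    "information retrieval": 0.90,
    "modern information retrieval": 0.85,
    "text mining": 0.60,
}
top_k_only = sorted(relevance, key=relevance.get, reverse=True)[:2]  # T1-style
diverse = select_topics(relevance, jaccard, k=2)
```

Plain top-K (T1) picks two near-duplicate topics here, while the redundancy penalty swaps the second one for a less similar topic.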

Hybrid Model (III. Model 3: Hybrid model)
- Aggregates the advantages of p_l and p_t
- Defined as: [equation not preserved in the transcript]
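The defining equation of the hybrid model is not preserved in this transcript. One common way to combine two scoring models is linear interpolation, sketched below purely as an assumption; the mixing weight `alpha`, the function name, and the example scores are hypothetical, and the authors' actual combination may differ.

```python
def hybrid_scores(p_l, p_t, alpha=0.5):
    """Linearly interpolate language-model and topic-model scores per candidate.

    p_l, p_t: dict candidate -> score; a missing candidate counts as 0.
    alpha: weight on the language-model score (assumed parameter).
    """
    candidates = set(p_l) | set(p_t)
    return {
        ca: alpha * p_l.get(ca, 0.0) + (1 - alpha) * p_t.get(ca, 0.0)
        for ca in candidates
    }

# Toy scores, invented for illustration only.
combined = hybrid_scores({"alice": 0.6, "bob": 0.4},
                         {"alice": 0.2, "carol": 0.8})
```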

Experiments (IV. Experiments)
- DBLP collection
  - Limitation
    - No abstract and index terms
    - Hard to represent the document
- Representation for documents
  - Use Google Scholar for data supplementation
    - Title as query, crawl the top 10 returned records
    - Up to 20 GB of metadata (HTML pages)
    - The citation count of the publication

Topic Collection (IV. Experiments)
- 2,498 well-defined topics from eventseer
- Crawl the top 100 returned records from Google Scholar

Benchmark Dataset (IV. Experiments)
- A benchmark dataset with 7 topics and expert lists

Evaluation Metrics (IV. Experiments)
- Precision at rank n
- Mean Average Precision (MAP): [formula not preserved in the transcript]
- Bpref: the score is a function of the number of non-relevant candidates
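Since the metric formulas are not preserved in this transcript, the sketch below uses the standard textbook definitions of precision at n, average precision (MAP is its mean over queries), and bpref in the capped form used by common evaluation tools. Whether the authors used exactly these variants is an assumption; function names and the toy ranking are hypothetical.

```python
def precision_at(ranked, relevant, n):
    """Fraction of the top-n ranked candidates that are relevant."""
    return sum(1 for c in ranked[:n] if c in relevant) / n

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant candidate appears."""
    hits, total = 0, 0.0
    for i, c in enumerate(ranked, start=1):
        if c in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def bpref(ranked, relevant, nonrelevant):
    """Penalize each relevant candidate by the judged non-relevant ones ranked above it."""
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    seen_nonrel, total = 0, 0.0
    for c in ranked:
        if c in nonrelevant:
            seen_nonrel += 1
        elif c in relevant:
            penalty = min(seen_nonrel, R) / denom if denom else 0.0
            total += 1 - penalty
    return total / R

# Toy ranking, invented for illustration only.
ranked = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
nonrelevant = {"x", "y"}
p3 = precision_at(ranked, relevant, 3)
ap = average_precision(ranked, relevant)
bp = bpref(ranked, relevant, nonrelevant)
```

Unlike precision and MAP, bpref only looks at judged candidates, which makes it less sensitive to incomplete expert lists.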

Preliminary Experiments (IV. Experiments)
- Performed on two corpora using the basic language model (B1)
  - "Title" corpus: only using the title
  - "GS" corpus: the representation from Google Scholar
- Evaluation results on two corpora (%)
  - More effective to represent d using Google Scholar

Model 1: Statistical Language Models (IV. Experiments)
- Evaluation results of language models
  - Weighted language models B3 and B2 outperform B1
  - Important to consider the prior probability

Model 2: Topic-based Models (IV. Experiments)
- Vary the number of topics (K) from 5 to 100
- Results using different values of K
  - The number of topics is cut off automatically for T2 & T3

Model 2: Topic-based Models (IV. Experiments)
- Comparison of the three topic-based models

Model 3: Hybrid Models (IV. Experiments)
- Evaluation results of the hybrid model
  - The hybrid model outperforms the pure language model and the topic-based model on most of the metrics

Conclusions and Future Work
- Conclusions
  - Address the expert finding task in a real-world academic field
  - Propose a weighted language model
  - Investigate a topic-based model to interpret the expert finding task
  - Integrate the language model with the topic-based model
  - Demonstrate that the hybrid model achieves the best performance in the evaluation results
- Future work
  - Take into account other types of information
  - Refine the results by utilizing social network analysis

Q&A
Thanks!

Comparison to Other Systems
- Evaluation results of our language models and the method TS

Example results