Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou.

Presentation transcript:

Meta-Path-Based Ranking with Pseudo Relevance Feedback on Heterogeneous Graph for Citation Recommendation By: Xiaozhong Liu, Yingying Yu, Chun Guo, Yizhou Sun Presenter: Zubair Amjad

Outline
- Introduction
- Preliminaries and Problem Definition
- Research Methods
  - Context-rich Heterogeneous Network Construction
  - Pseudo Feedback Generation
  - Restricted Meta-Path-Based Ranking with Feedback
  - Combine Different Ranking Features via Learning to Rank
- Experiment Study
- Results
- Conclusion

Introduction
- Researchers need academic retrieval systems to efficiently locate the scientific publications they are looking for as candidate citations.
- The paper presents an innovative publication ranking method with pseudo relevance feedback (PRF) that leverages a number of meta-paths on the heterogeneous bibliographic graph.
- The paper proposes the "restricted meta-path", facilitated by a new context-rich heterogeneous network extracted from full-text publication content along with citation context.
- Twofold contribution
  - An innovative ranking method with PRF that employs a number of meta-paths and learning to rank on the heterogeneous graph for the citation recommendation task
  - Use of meta-path plus random walk as the PRF ranking functions, which can prioritize important publications on the graph based on a number of seed nodes
- Seed nodes
  - Publication seed nodes (top-ranked papers in the retrieved result)
  - Keyword seed nodes (from user queries)

Introduction
- Candidate cited papers are ranked based on the likelihood that the paper is written by a relevant paper's authors.
- A "context-rich heterogeneous graph" is constructed from full-text publication data along with citation motivation modeling.
- A citation is a node (C) on the graph instead of an edge.

Preliminaries and Problem Definition
- Heterogeneous information network: a network with more than one type of node or link.
- Meta-path: a meta-path P is a path defined on the graph of the network schema $T_G = (A, R)$.
  - Example: P - K - P denotes a meta-path between papers that are connected through shared keyword(s).
- The goal is to enhance citation recommendation performance based on a piece of text query and a number of user-provided keywords.
  - Required input: a piece of text that briefly summarizes the research work
  - Optional input: scientific keywords
  - Output: a ranked list of papers that could potentially be cited given the user's input
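
The P - K - P example can be illustrated with a minimal sketch. The toy graph, the node identifiers and the helper names (`reverse`, `follow_meta_path`) are assumptions made for illustration and do not come from the paper; the sketch only shows which nodes a meta-path reaches, not how they are scored.

```python
from collections import defaultdict

# Hypothetical toy graph: "P" = paper, "K" = keyword.
# Each typed relation maps a source node to its neighbours of the target type.
edges = {
    ("P", "K"): {"p1": ["k1", "k2"], "p2": ["k2"], "p3": ["k3"]},
}


def reverse(adjacency):
    """Build the inverse relation (here: keyword -> papers)."""
    inverse = defaultdict(list)
    for src, dsts in adjacency.items():
        for dst in dsts:
            inverse[dst].append(src)
    return dict(inverse)


edges[("K", "P")] = reverse(edges[("P", "K")])


def follow_meta_path(start, meta_path):
    """Return the set of nodes reachable from `start` along `meta_path`."""
    frontier = {start}
    for src_type, dst_type in zip(meta_path, meta_path[1:]):
        adjacency = edges[(src_type, dst_type)]
        frontier = {dst for node in frontier for dst in adjacency.get(node, [])}
    return frontier


# Papers that share at least one keyword with p1 (p1 itself included).
print(sorted(follow_meta_path("p1", ["P", "K", "P"])))
```

Storing one adjacency map per (source type, target type) pair keeps each step of the meta-path a single lookup, which is why the path can be given simply as a list of node types.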

Research Methods: Context-rich Heterogeneous Network Construction
- Citations are extracted from the full-text publication data using regular expressions.
- Using the text window before and after each target citation, the topical motivation of the citation is inferred with the Labeled LDA (LLDA) algorithm.
- The heterogeneous graph is constructed on the basis of this information.
- For any vertex on the graph, the weights of its outgoing links of the same type sum to 1.
- Limitation: a large number of publications in the corpus do not have keyword metadata.
  - Greedy matching is used to generate pseudo-keywords for each paper.
- Using the PageRank with Prior algorithm, the contribution of each paper is estimated.
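
The constraint that the outgoing links of each type sum to 1 can be sketched as a per-type normalization. The edge tuples, the link-type names (`written_by`, `has_keyword`) and the raw weights are hypothetical; the slides do not say how the raw weights are obtained.

```python
from collections import defaultdict


def normalize_by_link_type(raw_edges):
    """Rescale weights so that, for every source node, the outgoing
    links of each link type sum to 1.

    raw_edges: list of (source, link_type, target, weight) tuples.
    """
    totals = defaultdict(float)
    for src, link_type, _dst, weight in raw_edges:
        totals[(src, link_type)] += weight
    return [
        (src, link_type, dst, weight / totals[(src, link_type)])
        for src, link_type, dst, weight in raw_edges
        if totals[(src, link_type)] > 0  # skip groups with no weight at all
    ]


# Hypothetical raw edges for one paper node.
raw_edges = [
    ("p1", "written_by", "a1", 1.0),
    ("p1", "written_by", "a2", 1.0),
    ("p1", "has_keyword", "k1", 3.0),
    ("p1", "has_keyword", "k2", 1.0),
]

for edge in normalize_by_link_type(raw_edges):
    print(edge)
```

Normalizing per link type, rather than over all outgoing links, lets each step of a random walk along a meta-path treat the weights of the chosen link type as transition probabilities.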

Pseudo Feedback Generation
- Based on previous studies, two kinds of seed nodes can be employed:
  - Explicit relevant keyword nodes from the user's initial query
  - Pseudo relevance feedback paper nodes from the top-ranked papers
- A key parameter should be optimized for pseudo relevance feedback: the number of seed paper nodes (fbDocs).
- When only a few paper seed nodes are used, they are more likely to be relevant to the initial query.
- However, if too few paper nodes are used, the number of selected authors is also small, and the feedback ranking result can be biased toward the authors of those top-ranked papers.
- The authors hypothesize that the optimal number of paper seed nodes differs across meta-paths.
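
The role of fbDocs can be sketched as follows. The initial ranking, the paper-to-author map and the function names are made up for illustration; the sketch only shows how a small fbDocs narrows the set of authors a feedback walk can reach.

```python
def select_seed_nodes(initial_ranking, query_keywords, fb_docs):
    """Pick the top `fb_docs` papers of the initial ranking as paper seeds
    and use the query keywords as keyword seeds."""
    ranked = sorted(initial_ranking, key=lambda item: item[1], reverse=True)
    return {
        "P": [paper_id for paper_id, _score in ranked[:fb_docs]],
        "K": list(query_keywords),
    }


def authors_reached(paper_seeds, paper_authors):
    """Authors that can be reached in one step from the paper seeds."""
    return {a for p in paper_seeds for a in paper_authors.get(p, [])}


# Hypothetical initial text-retrieval scores and authorship data.
initial_ranking = [("p1", 9.1), ("p2", 7.4), ("p3", 6.8), ("p4", 2.0)]
paper_authors = {"p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a3"], "p4": ["a4"]}

for fb_docs in (1, 3):
    seeds = select_seed_nodes(initial_ranking, ["meta-path"], fb_docs)
    reached = sorted(authors_reached(seeds["P"], paper_authors))
    print(fb_docs, seeds["P"], reached)
```

With fbDocs = 1 the walk can only start from the authors of a single paper, which is the bias the slide warns about; a larger fbDocs widens the author set at the cost of possibly admitting less relevant seeds.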

Restricted Meta-Path-Based Ranking with Feedback
- The feedback generated by the previous step gives the seed nodes in the network that are most relevant to the query.
- Meta-path-based ranking functions on the heterogeneous graph are then used to find the papers most relevant to these seeds.
- Restricted meta-path
  - Restricted meta-paths are proposed to confine the path instances of interest.
  - A restricted meta-path can be represented as $\sigma_{S_1}(A_1) \xrightarrow{R_1} \sigma_{S_2}(A_2) \xrightarrow{R_2} \cdots \xrightarrow{R_l} \sigma_{S_{l+1}}(A_{l+1})$, where $\sigma$ is a selection operator and $\sigma_{S_i}(A_i)$ means that only objects in $A_i$ that satisfy predicate $S_i$ are considered.
  - Type $A_1$ is the type with seeds, and type $A_{l+1}$ is the type of nodes to be queried.

Restricted Meta-Path
- To quantify the ranking score of candidates relevant to the seeds following the meta-path, a random-walk-based measure is proposed to compute the relevance between objects in $A_{l+1}$ (e.g., the candidate cited papers) and objects in $A_1$ (e.g., the seed papers P).
- In many cases, the node prior probability needs to be added to the random walk function.
- Combined restricted meta-path
  - Two or more parallel meta-paths leading to the same type of query nodes are also considered.
  - The similarity from different sets of objects to a result node through different meta-paths is then defined.
- 18 meta-paths are investigated in this study.
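
A random walk along a restricted meta-path can be sketched as probability mass flowing from the seed nodes through one transition table per step, with a predicate filtering the nodes allowed at each step. This is a simplified illustration, not the paper's exact measure: the transition tables, the predicates and the "candidate must not be a seed" restriction are assumptions, and node priors are left out.

```python
from collections import defaultdict


def walk_restricted_meta_path(seed_mass, steps):
    """Propagate probability mass from seed nodes along a meta-path.

    seed_mass: {seed_node: probability}
    steps: list of (transitions, keep) pairs, one per edge of the meta-path,
        where transitions is {src: {dst: probability}} and keep(dst) is the
        restriction predicate applied to the nodes reached by that step.
    """
    current = dict(seed_mass)
    for transitions, keep in steps:
        reached = defaultdict(float)
        for node, mass in current.items():
            for dst, prob in transitions.get(node, {}).items():
                if keep(dst):
                    reached[dst] += mass * prob
        current = dict(reached)
    return current


# Hypothetical P -> A -> P walk with per-type normalized link weights.
paper_to_author = {"p1": {"a1": 1.0}, "p2": {"a1": 0.5, "a2": 0.5}}
author_to_paper = {"a1": {"p1": 0.5, "p3": 0.5}, "a2": {"p2": 0.5, "p4": 0.5}}
seeds = {"p1": 0.5, "p2": 0.5}

scores = walk_restricted_meta_path(
    seeds,
    [
        (paper_to_author, lambda author: True),           # no restriction on authors
        (author_to_paper, lambda paper: paper not in seeds),  # candidates exclude seeds
    ],
)
print(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Here p3 scores higher than p4 because its author a1 wrote both seed papers. Scores from several such walks can be added per candidate to mimic a combined meta-path, although the slides do not give the exact combination formula.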

Meta-Path PRF features

Combine Different Ranking Features via Learning to Rank
- To statistically combine different ranking features while avoiding manual parameter tuning, learning to rank is used.
- A simple algorithm, Coordinate Ascent, is used for learning to rank.
  - It iteratively optimizes a multivariate objective ranking function and is used for meta-path PRF feature integration and algorithm evaluation.
- The paper abstract and author-provided keywords are used as the initial user query.
- The references provided by the paper (cited papers) are used as the relevant publications.
- MAP or NDCG can be used as the metric for training and evaluating the ranking function.
  - MAP: a binary judgment is provided for each candidate cited paper.
  - NDCG: estimates the cumulative relevance gain a user receives by examining recommendation results up to a given rank on the list.
  - An importance score from 0 to 4 is assigned to each candidate cited paper to calculate NDCG scores.
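
The two metrics and a coordinate-ascent weight search can be sketched as below. This is a simplified illustration under stated assumptions: the NDCG gain is taken as 2^g - 1 with a log2 discount (one common variant; the slides do not say which is used), the search tries a small fixed grid of values per weight instead of an adaptive step, and the toy query and feature values are invented.

```python
import math


def average_precision(ranked_ids, relevant_ids):
    """AP for one query with binary relevance judgments."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k with graded gains (e.g. importance scores 0-4)."""
    def dcg(values):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(values))
    actual = dcg([gains.get(doc, 0) for doc in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def rank(features, weights):
    """Order documents by the weighted sum of their feature values."""
    def score(doc):
        return sum(w * f for w, f in zip(weights, features[doc]))
    return sorted(features, key=score, reverse=True)


def coordinate_ascent(queries, n_features, grid=(0.0, 0.25, 0.5, 0.75, 1.0), sweeps=3):
    """Tune one weight at a time, keeping a change only if mean AP improves.

    queries: list of (features, relevant_ids), features = {doc: [f1, f2, ...]}.
    """
    def mean_ap(weights):
        return sum(average_precision(rank(f, weights), rel) for f, rel in queries) / len(queries)

    weights = [1.0] * n_features
    best = mean_ap(weights)
    for _ in range(sweeps):
        for i in range(n_features):
            for value in grid:
                trial = weights[:i] + [value] + weights[i + 1:]
                trial_score = mean_ap(trial)
                if trial_score > best:
                    best, weights = trial_score, trial
    return weights, best


# Hypothetical query: feature 0 is misleading, feature 1 is informative.
queries = [({"d1": [0.9, 0.1], "d2": [0.1, 0.8], "d3": [0.5, 0.2]}, {"d2"})]
print(coordinate_ascent(queries, n_features=2))
```

Because each trial only changes one coordinate and is accepted only on strict improvement, a feature that adds nothing is simply left at a weight that does not hurt the metric.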

Experiment Study: Data and Preprocessing
- 41,370 publications (the candidate citation collection) from 111 journals and 1,442 conference proceedings or workshops on computer science were used for the experiment.
- These papers were published between 1951 and 2011.
- Of a total of 223,810 references (paper1 cites paper2 relations), 94,051 were successfully identified.
- An author disambiguation algorithm was employed to enhance authorship quality.
  - It takes an author's name, affiliation, paper title, co-authors, and position in the author list as input and matches the author to a canonical author record in the ACM database.

LLDA Topic Model Training and Graph Construction
- 10,000 publications (with full text) were sampled to train the LLDA topic model.
- Author-provided keywords were used as topic labels.
  - If a keyword appeared fewer than 10 times in the selected publications, it was removed from the training topic space.
- For publication content, tokenization was used to extract words from the title, abstract, and publication full text.
  - Words with fewer than three characters were removed.
  - Snowball stemming was then employed to extract the root of each word.
  - The 100 most frequent stemmed words and words that appeared fewer than three times in the training collection were also removed.
- The LLDA model was trained with 3,911 topics (keywords).
- The heterogeneous graph was constructed with 41,370 paper nodes, 63,323 author nodes, 369 venue nodes, 3,911 keyword (topic) nodes, and 168,554 citation nodes.
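
The text preprocessing steps can be sketched as a small pipeline. The tokenizer regex and the `naive_stem` function are assumptions: `naive_stem` is a crude placeholder and not the Snowball algorithm, so a real Snowball stemmer would be passed in through the `stem` argument. The default thresholds mirror the numbers on the slide.

```python
import re
from collections import Counter


def naive_stem(word):
    """Placeholder suffix stripper. NOT the Snowball algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(documents, stem=naive_stem, min_chars=3, top_n=100, min_count=3):
    """Tokenize, drop short words, stem, then drop the `top_n` most frequent
    stems and the stems seen fewer than `min_count` times in the collection."""
    tokenized = []
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        tokenized.append([stem(w) for w in words if len(w) >= min_chars])

    counts = Counter(token for doc in tokenized for token in doc)
    too_common = {token for token, _count in counts.most_common(top_n)}
    vocabulary = {
        token for token, count in counts.items()
        if count >= min_count and token not in too_common
    }
    return [[token for token in doc if token in vocabulary] for doc in tokenized]


# Tiny hypothetical collection, with thresholds scaled down to match its size.
docs = ["the graph ranking of graphs", "the ranking the graph", "the of meta"]
print(preprocess(docs, top_n=1, min_count=2))
```

Filtering on collection-wide counts after stemming means that variants such as "graph" and "graphs" are pooled before the frequency cut-offs are applied.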

Pseudo Relevance Feedback Experiment Result
- Meta-path ranking performance comparison
  - Different meta-paths for pseudo relevance feedback ranking were validated and compared.
  - The figure shows how ranking MAP changes as the number of paper seed nodes (fbDocs) varies from 3 to 100.
  - Three selected meta-paths and the KL-divergence mixture model are compared.

Pseudo Relevance Feedback Experiment Result
- The table compares MAP and NDCG for the 18 experimental meta-paths.

Important findings
- Different meta-paths have different optimal numbers of paper seed nodes.
- KL-divergence feedback performs worse than meta-path-based PRF, and its performance consistently decreases as fbDocs increases.
- While performance differs across meta-paths, in most cases complex meta-paths outperform simple ones.
- The restrictions are, in most cases, very helpful in enhancing ranking performance.
- Across all meta-paths, the kinds of relationships rank as Citation > Author > Venue from a ranking performance perspective.

PRF ranking integration via learning-to-rank
- Different ranking functions were integrated via learning to rank.
- From the table: some meta-path functions do not perform as well as others.
  - They are still used for ranking integration, because a learning-to-rank algorithm can use a ranking feature as long as the feature provides new, useful ranking information.
- For all the PRF methods (text, PageRank, and meta-path-based PRF), a language model with Dirichlet prior smoothing was employed as the initial ranking algorithm.
- 10-fold cross-validation was employed for the learning-to-rank-based ranking evaluation.
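
The 10-fold cross-validation protocol can be sketched as a split generator. How the paper assigns queries to folds is not stated on the slides; the shuffled round-robin assignment, the fixed seed and the function name are assumptions.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) pairs so that every item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test


# Hypothetical query identifiers.
query_ids = list(range(25))
for train, test in k_fold_splits(query_ids, k=10):
    print(len(train), len(test))
```

In each round the ranking weights would be learned on `train` and the metric computed on `test`, with the reported score averaged over the ten rounds.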

Baseline Feature Groups Employed
- T: text ranking features
- PR: PageRank ranking (query-independent) with homogeneous paper-paper citation relationship
- T-PRF(5): text-based pseudo relevance feedback with top 5 feedback papers (fbDocs = 5)
- T-PRF(10): text-based pseudo relevance feedback with top 10 feedback papers (fbDocs = 10)
- PR-PRF(5): PageRank-based pseudo relevance feedback with top 5 feedback papers (fbDocs = 5)
- PR-PRF(10): PageRank-based pseudo relevance feedback with top 10 feedback papers (fbDocs = 10)
- Experimental ranking feature groups used with meta-path-based PRF
  - MP-PRF(5): meta-path-based pseudo feedback with top 5 feedback papers (fbDocs = 5)
  - MP-PRF(10): meta-path-based pseudo feedback with top 10 feedback papers (fbDocs = 10)

Results

Conclusion
- The paper proposes a new ranking method with pseudo relevance feedback by investigating a number of hypothesis-driven meta-paths on the scholarly heterogeneous graph.
- It proposes the restricted meta-path and the combined meta-path, facilitated by an innovative context-rich heterogeneous graph.
- Experimental results on ACM full-text data show that meta-path-based PRF is very effective for the scholarly citation recommendation task compared with text-based and PageRank-based PRF.
- The restricted meta-path is found to be efficient for ranking.

Thank You