Heterogeneous Cross Domain Ranking in Latent Space

Presentation transcript:

1 Heterogeneous Cross Domain Ranking in Latent Space
Bo Wang (1), Jie Tang (2), Wei Fan (3), Songcan Chen (1), Zi Yang (2), Yanzhu Liu (4)
(1) Nanjing University of Aeronautics and Astronautics
(2) Tsinghua University
(3) IBM T.J. Watson Research Center, USA
(4) Peking University

2 Introduction
The web is becoming more and more heterogeneous.
Ranking is the fundamental problem on the web:
–unsupervised vs. supervised
–homogeneous vs. heterogeneous

3 Motivation
Heterogeneous cross domain ranking
Main challenges:
1) How to capture the correlation between heterogeneous objects?
2) How to preserve the preference orders between objects across heterogeneous domains?

4 Outline
Related Work
Heterogeneous cross domain ranking
Experiments
Conclusion

5 Related Work
Learning to rank
–Supervised: [Burges, 05] [Herbrich, 00] [Xu and Li, 07] [Yue, 07]
–Semi-supervised: [Duh, 08] [Amini, 08] [Hoi and Jin, 08]
–Ranking adaptation: [Chen, 08]
Transfer learning
–Instance-based: [Dai, 07] [Gao, 08]
–Feature-based: [Jebara, 04] [Argyriou, 06] [Raina, 07] [Lee, 07] [Blitzer, 06] [Blitzer, 07]
–Model-based: [Bonilla, 08]

6 Outline
Related Work
Heterogeneous cross domain ranking
–Basic idea
–Proposed algorithm: HCDRank
Experiments
Conclusion

7 [Figure: basic idea, illustrated for the query “data mining”. Labels: Conference, Expert, Latent Space, Source Domain, Target Domain, mis-ranked pairs]
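The "mis-ranked pairs" label in this figure can be made concrete with a small sketch. It assumes each object has a model score and a graded relevance label; the function name `count_misranked_pairs` and the toy numbers are hypothetical and do not come from the slides.

```python
from itertools import combinations


def count_misranked_pairs(scores, labels):
    """Count pairs whose predicted order contradicts the label order.

    A pair (i, j) with labels[i] > labels[j] is mis-ranked when the model
    does not score i strictly above j. Pairs with equal labels carry no
    preference and are skipped.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misranked = 0
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue
        # Orient the pair so that `hi` is the more relevant object.
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        if scores[hi] <= scores[lo]:
            misranked += 1
    return misranked


# Toy example (made-up numbers): three candidates for one query.
toy_scores = [0.9, 0.2, 0.5]
toy_labels = [2, 1, 0]
toy_misranked = count_misranked_pairs(toy_scores, toy_labels)
```

In the toy example only the second and third candidates are in the wrong order, so one pair is mis-ranked.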

8 The Proposed Algorithm: HCDRank
How to optimize? How to define?
Non-convex
Dual problem

9 Alternately optimize matrix M and D: O(2T*sN log N)
Construct transformation matrix: O(d^3)
Learning in latent space: O(sN log N)
Total: O((2T+1)*sN log N + d^3)
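The per-step costs on this slide add up to the stated overall bound, which can be checked term by term. The slide does not define T, s, N, or d; reading T as the number of alternating iterations and d as the feature dimension is an assumption made only for this worked sum.

```latex
\underbrace{O(2T \cdot sN\log N)}_{\text{alternately optimize } M \text{ and } D}
+ \underbrace{O(d^{3})}_{\text{construct transformation matrix}}
+ \underbrace{O(sN\log N)}_{\text{learning in latent space}}
= O\big((2T+1)\, sN\log N + d^{3}\big)
```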

10 Outline
Related Work
Heterogeneous cross domain ranking
Experiments
–Ranking on Homogeneous data
–Ranking on Heterogeneous data
–Ranking on Heterogeneous tasks
Conclusion

11 Experiments
Data sets
–Homogeneous data set: LETOR_TR, with 50/75/106 queries and 44/44/25 features for TREC2003_TR, TREC2004_TR, and OHSUMED_TR
–Heterogeneous academic data set: ArnetMiner.org, with 14,134 authors, 10,716 papers, and 1,434 conferences
–Heterogeneous task data set: 9 queries, 900 experts, 450 best supervisor candidates
Evaluation measures
–MAP
–NDCG
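The two evaluation measures can be sketched as follows. This is a minimal illustration, not the evaluation script used for the experiments: it assumes binary relevance for MAP, assumes every relevant item appears in the ranked list, and uses the common 2^rel - 1 gain with a log2 discount for NDCG, which may differ from the exact variant used with LETOR.

```python
import math


def average_precision(relevance):
    """Average precision for one ranked list of binary relevance labels."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """MAP: the mean of average precision over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


def dcg_at_k(gains, k):
    """DCG@k with gain 2^rel - 1 and discount log2(rank + 1) (assumed variant)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """NDCG@k: DCG normalised by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

For example, `average_precision([1, 0, 1])` averages the precision at ranks 1 and 3, and `ndcg_at_k` returns 1.0 for any list that is already sorted by relevance.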

12 Ranking on Homogeneous data
LETOR_TR
–We slightly revised LETOR 2.0 to fit the cross-domain ranking scenario
–Three sub-datasets: TREC2003_TR, TREC2004_TR, and OHSUMED_TR
Baselines

13 [Figure: results on OHSUMED_TR, TREC2004_TR, and TREC2003_TR. Panel annotations, in extraction order: Cosine Similarity = 0.01, 0.23, 0.18]
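For reference, cosine similarity between two vectors can be computed as in the sketch below. The slide does not say which vectors are compared for each data set, so the inputs here are generic, and returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: a zero vector has no direction
    return dot / (norm_u * norm_v)
```

A value near 0, such as the 0.01 annotation, means the two vectors are almost orthogonal; a value near 1 means they point in nearly the same direction.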

14 Training Time

15 Ranking on Heterogeneous data
ArnetMiner data set (14,134 authors, 10,716 papers, and 1,434 conferences)
Training and test data set:
–44 most frequently queried keywords from the log file
Author collection: Libra, Rexa and ArnetMiner
Conference collection: Libra, ArnetMiner
Ground truth:
–Conference: online resources
–Expert: two faculty members and five graduate students from CS provided human judgments for expert ranking

16 Feature Definition
Features   Description
L1-L10     Low-level language model features
H1-H3      High-level language model features
S1         How many years the conference has been held
S2         The total number of citations of the conference during the most recent 5 years
S3         The total number of citations of the conference during the most recent 10 years
S4         How many years have passed since his/her first paper
S5         The total number of citations of all the publications of one expert
S6         How many papers have been cited more than 5 times
S7         How many papers have been cited more than 10 times
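The expert-side features S4 to S7 are simple aggregates over an expert's publication list and can be sketched as below. The `Paper` record, the `expert_features` function, and the choice of zeros for an expert with no papers are hypothetical; the slides do not describe how the features were actually extracted.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    year: int        # publication year
    citations: int   # citation count of this paper


def expert_features(papers, current_year):
    """Compute S4-S7 from the feature table for one expert."""
    if not papers:
        # Assumption for this sketch: an expert with no papers gets zeros.
        return {"S4": 0, "S5": 0, "S6": 0, "S7": 0}
    first_year = min(p.year for p in papers)
    return {
        "S4": current_year - first_year,                   # years since first paper
        "S5": sum(p.citations for p in papers),            # total citations
        "S6": sum(1 for p in papers if p.citations > 5),   # papers cited > 5 times
        "S7": sum(1 for p in papers if p.citations > 10),  # papers cited > 10 times
    }


# Toy publication list (made-up numbers).
toy_papers = [Paper(2001, 12), Paper(2005, 6), Paper(2008, 5)]
toy_features = expert_features(toy_papers, current_year=2009)
```

Note that the thresholds are strict: a paper with exactly 5 citations does not count toward S6.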

17 Expert Finding Results

18 Feature Correlation Analysis

19 Ranking on Heterogeneous tasks
Expert finding task vs. best supervisor finding task
Training and test data set:
–Expert finding task: ranking lists from ArnetMiner or annotated lists
–Best supervisor finding task: 9 most frequent queries from the log file of ArnetMiner
For each query, we collected 50 best supervisor candidates and sent emails to 100 researchers for annotation
Ground truth:
–Collection of feedback about the candidates (yes/no/not sure)

20 Feature Definition
Features            Description
L1-L10              Low-level language model features
H1-H3               High-level language model features
B1                  The year he/she published his/her first paper
B2                  The number of papers of an expert
B3                  The number of papers in recent 2 years
B4                  The number of papers in recent 5 years
B5                  The number of citations of all his/her papers
B6                  The number of papers cited more than 5 times
B7                  The number of papers cited more than 10 times
B8                  PageRank score
SumCo1-SumCo8       The sum of coauthors’ B1-B8 scores
AvgCo1-AvgCo8       The average of coauthors’ B1-B8 scores
SumStu1-SumStu8     The sum of his/her advisees’ B1-B8 scores
AvgStu1-AvgStu8     The average of his/her advisees’ B1-B8 scores
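The SumCo/AvgCo and SumStu/AvgStu rows aggregate the B1-B8 scores of a researcher's coauthors or advisees. A minimal sketch, assuming each neighbour is represented by a list of its B scores; the function name `sum_and_avg` and the all-zero result for a researcher with no neighbours are assumptions of this example.

```python
def sum_and_avg(neighbour_vectors, n_features=8):
    """Return (sums, averages) per feature over a list of neighbour vectors.

    Each vector holds one neighbour's B1..B8 scores in order. The same
    function serves coauthors (SumCo/AvgCo) and advisees (SumStu/AvgStu).
    """
    if not neighbour_vectors:
        # Assumption for this sketch: no neighbours means all-zero features.
        return [0.0] * n_features, [0.0] * n_features
    sums = [sum(vec[k] for vec in neighbour_vectors) for k in range(n_features)]
    avgs = [s / len(neighbour_vectors) for s in sums]
    return sums, avgs


# Toy example with two coauthors and only two features for brevity.
toy_sums, toy_avgs = sum_and_avg([[1, 4], [3, 6]], n_features=2)
```

With the toy coauthors the sums are 4 and 10, and the averages are 2.0 and 5.0.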

21 Best supervisor finding results

22 Experimental Results

23 Outline
Related Work
Heterogeneous cross domain ranking
Experiments
Conclusion

24 Conclusion
We formally define the problem of heterogeneous cross domain ranking and propose a general framework.
We provide a preferred solution under the regularized framework by simultaneously minimizing two ranking loss functions in two domains.
Experimental results on three different genres of data sets verify the effectiveness of the proposed algorithm.
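The idea of minimizing two ranking losses at once under a regularizer can be illustrated with a generic sketch. This is not the HCDRank objective, whose exact formulation is not reproduced in the transcript: the pairwise hinge loss, the squared L2 regularizer, and the parameters `C` and `lam` are stand-ins chosen for illustration, and the latent-space projection is omitted.

```python
def pairwise_hinge_loss(w, pairs):
    """Sum of hinge losses over (better, worse) pairs of feature vectors."""
    total = 0.0
    for better, worse in pairs:
        # Score margin of the preferred object under the linear model w.
        margin = sum(wk * (b - c) for wk, b, c in zip(w, better, worse))
        total += max(0.0, 1.0 - margin)
    return total


def joint_objective(w_source, w_target, source_pairs, target_pairs,
                    C=1.0, lam=0.1):
    """Source loss + C * target loss + lam * squared L2 norm (illustrative)."""
    reg = sum(x * x for x in w_source) + sum(x * x for x in w_target)
    return (pairwise_hinge_loss(w_source, source_pairs)
            + C * pairwise_hinge_loss(w_target, target_pairs)
            + lam * reg)


# Toy example (made-up numbers): one preference pair per domain.
toy_value = joint_objective(
    w_source=[1.0, 0.0], w_target=[0.0, 1.0],
    source_pairs=[([2.0, 0.0], [0.0, 0.0])],   # margin 2.0, hinge loss 0.0
    target_pairs=[([0.0, 0.5], [0.0, 0.0])],   # margin 0.5, hinge loss 0.5
)
```

In the toy case the source pair is ranked with enough margin, the target pair is not, and the regularizer contributes 0.2, so the objective is 0.7.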

25 Data Set

26 Ranking on Heterogeneous data
A subset of ArnetMiner (14,134 authors, 10,716 papers, and 1,434 conferences)
44 most frequently queried keywords from the log file
Author collection:
–For each query, we gathered the top 30 experts from Libra, Rexa and ArnetMiner
Conference collection:
–For each query, we gathered the top 30 conferences from Libra and ArnetMiner
Ground truth:
–Three online resources
–Two faculty members and five graduate students from CS provided human judgments

27 Best supervisor finding
Training/test set and ground truth
–724 mails sent
–[Figure: fragment of mail]
–More than 82 effective feedback responses (and increasing)
–Rate each candidate by the definite feedback (yes/no)
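Rating a candidate from definite feedback only can be sketched as follows. The slide says that candidates are rated by the yes/no answers, but not how; using the fraction of "yes" among definite answers, and returning `None` when there are none, are assumptions of this example, as is the function name `rate_candidate`.

```python
from collections import Counter


def rate_candidate(feedback):
    """Fraction of 'yes' among definite answers; 'not sure' is ignored."""
    counts = Counter(feedback)
    definite = counts["yes"] + counts["no"]
    if definite == 0:
        return None  # no definite feedback, so no rating can be formed
    return counts["yes"] / definite


# Toy feedback for one candidate (made-up answers).
toy_rating = rate_candidate(["yes", "no", "not sure", "yes"])
```

Here the "not sure" answer is dropped, leaving two "yes" out of three definite answers.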

28 Ranking on Heterogeneous tasks
For the expert finding task, we can use results from ArnetMiner or annotated lists as training data
For the best supervisor task, the 9 most frequent queries from the log file of ArnetMiner are used
–For each query, we sent emails to 100 researchers
  Top 50 researchers by ArnetMiner
  Top 50 researchers who started publishing papers only in recent years (91.6% of them are currently graduates or postdoctoral researchers)
–Collection of feedback
  50 best supervisor candidates (yes/no/not sure)
  Also add other candidates
–Ground truth