1 Cross-domain Collaboration Recommendation
Jie Tang (1), Sen Wu (1), Jimeng Sun (2), Hang Su (1)
(1) Tsinghua University, (2) IBM TJ Watson Research Center


2 Cross-domain Collaboration
Interdisciplinary collaborations have generated huge impact, for example:
– 51 (>1/3) of the KDD 2012 papers are the result of cross-domain collaborations between graph theory, visualization, economics, medical informatics, DB, NLP, and IR
– Research field evolution
[Figure labels: Biology, Computer Science, bioinformatics]

3 Cross-domain Collaboration (cont.)
Increasing trend of cross-domain collaborations
[Figure: domains shown are Data Mining (DM), Medical Informatics (MI), Theory (TH), Visualization (VIS)]

4 Challenges
1. Sparse connection: <1%
2. Complementary expertise
3. Topic skewness: 9%
[Figure: Data Mining and Theory domains. Topic labels: large graph, heterogeneous network, social network, graph theory, complexity theory, automata theory]

5 Related Work: Collaboration recommendation
Collaborative topic modeling for recommending papers
– C. Wang and D. M. Blei. [2011]
On social networks and collaborative recommendation
– I. Konstas, V. Stathopoulos, and J. M. Jose. [2009]
CollabSeer: a search engine for collaboration discovery
– H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. [2007]
Referral web: Combining social networks and collaborative filtering
– H. Kautz, B. Selman, and M. Shah. [1997]
Fab: content-based, collaborative recommendation
– M. Balabanović and Y. Shoham. [1997]

6 Related Work: Expert finding and matching
Topic level expertise search over heterogeneous networks
– J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su. [2011]
Formal models for expert finding in enterprise corpora
– K. Balog, L. Azzopardi, and M. de Rijke. [2006]
Expertise modeling for matching papers with reviewers
– D. Mimno and A. McCallum. [2007]
On optimization of expertise matching with various constraints
– W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li. [2012]

7 Approach Framework: Cross-domain Topic Learning

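8 Author Matching
[Figure: source graph G_S with authors v_1, v_2, ..., v_N and query user v_q; target graph G_T with authors v'_1, v'_2, ..., v'_N'. Domain labels: Data Mining, Medical Informatics. Legend: author, coauthorships, query user, cross-domain coauthorships.]
Challenge 1: Sparse connection!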
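Slide 15 describes the Author Matching baseline as a random walk with restart on the collaboration graph. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the dictionary-based graph layout, the default restart probability of 0.15, and the toy graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, query, restart=0.15, max_iter=200, tol=1e-12):
    """Return visit probabilities of a walk that keeps restarting at `query`.

    adj maps each node to a dict {neighbor: edge weight}; every neighbor
    must also appear as a key of adj. All names here are illustrative.
    """
    scores = {v: 0.0 for v in adj}
    scores[query] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in adj}
        nxt[query] += restart  # restart mass always returns to the query
        for u, mass in scores.items():
            total = sum(adj[u].values())
            if total == 0:
                # Isolated node: send its walking mass back to the query.
                nxt[query] += (1.0 - restart) * mass
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * mass * w / total
        delta = sum(abs(nxt[v] - scores[v]) for v in adj)
        scores = nxt
        if delta < tol:
            break
    return scores


def recommend(adj, query, target_authors, top_k=10, **kwargs):
    """Rank target-domain authors by their score for `query`."""
    scores = random_walk_with_restart(adj, query, **kwargs)
    ranked = sorted(target_authors, key=lambda v: scores.get(v, 0.0), reverse=True)
    return ranked[:top_k]


# Toy coauthor graph (hypothetical): q, a, b in the source domain,
# x, y in the target domain; a-x is the only cross-domain coauthorship.
graph = {
    "q": {"a": 1.0, "b": 1.0},
    "a": {"q": 1.0, "x": 1.0},
    "b": {"q": 1.0},
    "x": {"a": 1.0, "y": 1.0},
    "y": {"x": 1.0},
}
print(recommend(graph, "q", ["x", "y"]))
```

With a single cross-domain edge, every target-domain score has to flow through that one link, which illustrates why the sparse-connection challenge limits this baseline.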

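9 Topic Matching
[Figure: source graph G_S with authors v_1, v_2, ..., v_N, query user v_q, and topics z_1, z_2, z_3, ..., z_T; target graph G_T with authors v'_1, v'_2, ..., v'_N' and topics z'_1, z'_2, z'_3, ..., z'_T. Domain labels: Data Mining, Medical Informatics. Annotations: topics, topics extraction, topics correlations.]
Challenge 2: Complementary expertise!
Challenge 3: Topic skewness!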

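10 Cross-domain Topic Learning
[Figure: source graph G_S with authors v_1, v_2, ..., v_N and query user v_q; target graph G_T with authors v'_1, v'_2, ..., v'_N'; shared topics z_1, z_2, z_3, ..., z_K. Domain labels: Data Mining, Medical Informatics.]
Identify “cross-domain” topics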

11 Collaboration Topics Extraction
Step 1:
Step 2:

12 Intuitive explanation of Step 2 in CTL
Collaboration topics

13 Experiments

14 Data Set and Baselines
Arnetminer (available at http://arnetminer.org/collaboration)
Baselines
– Content Similarity (Content)
– Collaborative Filtering (CF)
– Hybrid
– Katz
– Author Matching (Author), Topic Matching (Topic)

Domain               Authors  Relationships  Source
Data Mining          6,282    22,862         KDD, SDM, ICDM, WSDM, PKDD
Medical Informatics  9,150    31,851         JAMIA, JBI, AIM, TMI, TITB
Theory               5,449    27,712         STOC, FOCS, SODA
Visualization        5,268    19,261         CVPR, ICCV, VAST, TVCG, IV
Database             7,590    37,592         SIGMOD, VLDB, ICDE
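Of the baselines listed above, Katz is usually defined as a sum over walks between two nodes, with a walk of length l weighted by beta^l. The slides do not say how it was computed here, so the following is only a sketch of a truncated version: `katz_scores`, the damping factor `beta`, the cutoff `max_len`, and the toy graph are assumptions for illustration.

```python
def katz_scores(adj, source, beta=0.05, max_len=4):
    """Truncated Katz score from `source` to every reachable node.

    adj maps each node to a set of neighbors (unweighted graph).
    score[v] = sum over l = 1..max_len of beta**l * (#walks of length l
    from source to v). Names and defaults are illustrative only.
    """
    walks = {source: 1}  # number of length-0 walks ending at each node
    scores = {}
    for length in range(1, max_len + 1):
        nxt = {}
        # Extend every walk by one edge.
        for u, count in walks.items():
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0) + count
        # Add this length's damped contribution.
        for v, count in nxt.items():
            scores[v] = scores.get(v, 0.0) + (beta ** length) * count
        walks = nxt
    return scores


# Hypothetical three-author chain: a - b - c.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(katz_scores(chain, "a", beta=0.5, max_len=2))
```

Candidate collaborators would then be ranked by their score, typically after dropping the query author and existing coauthors.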

15 Performance Analysis
Cross domain: Data Mining (S) to Theory (T)

ALG      P@10  P@20  MAP   R@100  ARHR-10  ARHR-20
Content  10.3  10.2  10.9  31.4   4.9      2.1
CF       15.6  13.3  23.1  26.2   4.9      2.8
Hybrid   17.4  19.1  20.0  29.5   5.0      2.4
Author   27.2  22.3  25.7  32.4   10.1     6.4
Topic    28.0  26.0  32.4  33.5   13.4     7.1
Katz     30.4  29.8  21.6  27.4   11.2     5.9
CTL      37.7  36.4  40.6  35.6   14.3     7.5

Content Similarity (Content): based on similarity between authors’ publications
Collaborative Filtering (CF): based on existing collaborations
Hybrid: a linear combination of the scores obtained by the Content and the CF methods
Katz: the best link predictor in the link-prediction problem for social networks
Author Matching (Author): based on the random walk with restart on the collaboration graph
Topic Matching (Topic): combining the extracted topics into the random walk algorithm
Training: collaborations before 2001
Validation: 2001-2005
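The table reports P@10, P@20, MAP, and R@100 among its metrics. As a reading aid, here is a small sketch of how these are conventionally computed for a ranked list of recommended collaborators. The slides do not give formulas, so the definitions below are the usual textbook ones and the function names are assumptions; ARHR is left out because its exact definition is not stated.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are true collaborators."""
    hits = sum(1 for v in ranked[:k] if v in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of true collaborators that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for v in ranked[:k] if v in relevant)
    return hits / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a hit occurs.

    Assumes `ranked` contains no duplicates.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, v in enumerate(ranked, start=1):
        if v in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query author."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical example: four recommendations, two of them correct.
ranked = ["x", "y", "z", "w"]
relevant = {"x", "z"}
print(precision_at_k(ranked, relevant, 2), average_precision(ranked, relevant))
```

The table gives its values as percentages, so a function result of 0.377 would correspond to an entry of 37.7.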

16 Performance on New Collaboration Prediction
CTL still maintains a MAP of about 0.3, which is significantly higher than the baselines.

17 Parameter Analysis
(a) varying the number of topics T
(b) varying the α parameter
(c) varying the restart parameter τ in the random walk
(d) convergence analysis

18 Prototype System
http://arnetminer.org/collaborator
Treemap: representing subtopics in the target domain
Recommend collaborators and their relevant publications

19 Conclusion
– Studied the problem of cross-domain collaboration recommendation
– Proposed the cross-domain topic model for recommending collaborators
– Experimental results on a coauthor network demonstrate the effectiveness and efficiency of the proposed approach

20 Future work
– Connect cross-domain collaborative relationships with social theories (e.g. social balance, social status, structural hole)
– Apply the proposed method to other networks

21 System: http://arnetminer.org/collaborator
Code & Data: http://arnetminer.org/collaboration
Thanks!

22 Challenges always come side by side with opportunities!
Sparse connection:
– cross-domain collaborations are rare;
Complementary expertise:
– cross-domain collaborators often have different expertise and interests;
Topic skewness:
– cross-domain collaborations are focused on a subset of topics.
How to handle these patterns?

23 Performance Analysis
Cross domain: Medical Info. (S) to Database (T)

ALG      P@10  P@20  MAP   R@100  ARHR-10  ARHR-20
Content  10.1  10.9  12.5  45.9   3.6      2.1
CF       18.3  20.2  21.4  47.6   5.3      3.9
Hybrid   25.0  26.5  28.4  59.1   6.4      4.2
Author   26.2  29.6  32.2  54.8   10.5     5.4
Topic    29.4  26.3  34.7  59.3   11.5     5.2
Katz     27.5  28.3  30.7  57.2   10.5     5.0
CTL      32.5  30.0  36.9  59.8   11.4     5.4

24 Performance Analysis
Cross domain: Medical Info. (S) to Data Mining (T)

ALG      P@10  P@20  MAP   R@100  ARHR-10  ARHR-20
Content  5.8   5.7   9.5   19.8   1.9      0.9
CF       13.7  17.8  18.9  34.3   2.7      1.3
Hybrid   18.0  19.0  19.8  36.7   3.4      1.3
Author   20.1  23.8  29.3  64.4   5.3      2.1
Topic    26.0  25.0  33.9  48.1   10.7     5.6
Katz     21.2  23.8  32.4  48.1   10.2     4.8
CTL      30.0  24.0  35.6  49.6   12.2     6.0

25 Performance Analysis
Cross domain: Visual. (S) to Data Mining (T)

ALG      P@10  P@20  MAP   R@100  ARHR-10  ARHR-20
Content  9.6   11.8  13.2  18.9   3.1      1.8
CF       14.0  20.8  26.4  29.4   6.9      4.3
Hybrid   16.0  20.0  27.6  30.1   6.3      4.4
Author   22.0  25.2  27.7  31.1   11.9     6.7
Topic    26.3  25.0  32.3  31.4   13.2     8.8
Katz     23.0  25.1  29.3  30.2   10.4     5.4
CTL      28.3  26.0  32.8  36.3   14.0     9.1

