Cross-domain Link Prediction and Recommendation


1 Cross-domain Link Prediction and Recommendation
Jie Tang Department of Computer Science and Technology Tsinghua University 1

2 Networked World
[Slide figure: user statistics for major online platforms: 1.26 billion users and 700 billion minutes/month; 280 million users, 80% of them born in the 1980s-90s; 555 million users and 0.5 billion tweets/day; 560 million users influencing our daily life; 79 million users per month and 9.65 billion items/year; 800 million users with ~50% of revenue from network life; 500 million users and 35 billion in sales on 11/11.]

3 Challenge: Big Social Data
We generate 2.5×10^18 bytes of data per day, and 90% of that data was generated in the past two years. Big social data pushes us from mining within a single data center toward mining deep knowledge from multiple data sources.

4 Social Networks: Info. Space vs. Social Space
[Slide figure: the information space (opinion mining, innovation diffusion, business intelligence) interacting with the social space; the goal is understanding the mechanisms of interaction dynamics.]

5 Core Research in Social Network
[Slide figure: a layered map of the field: applications (information diffusion, search, prediction, advertising) built on social network analysis at the macro (BA model, ER model), meso (community, group behavior, structural hole), and micro (social tie, action, social influence) levels, grounded in social theories and algorithmic foundations over BIG social data.]

6 (KDD 2010, PKDD 2011 Best Paper Runner-up)
Part A: Let us start with a simple case, "inferring social ties in a single network" (KDD 2010, PKDD 2011 Best Paper Runner-up)

7 Real social networks are complex...
Nobody exists in merely one social network: public vs. private networks, business vs. family networks. However, existing services (e.g., Facebook and Twitter) try to lump everyone into one big network. Facebook and QQ address this with lists and groups, and Google+ with circles; still, you probably do not want your family network or your business network to hear that you went to a bar last weekend.

8 Even more complex than we imagined!
Only 16% of mobile phone users in Europe have created custom contact groups: users do not take the time to create them, or do not know how to group their friends. The problem is that online social networks are black and white...

9 Example 1. From BW to Color (KDD’10)

10 Example 2. From BW to Color (PKDD'11, Best Paper Runner-up)
Enterprise network: how to infer who is the CEO, a manager, or an employee? User interactions may form implicit groups.

11 What is behind?
[Slide figure: three networks side by side: a publication network, Twitter's following network, and a mobile communication network annotated with call times (from home 08:40, from office 11:35, both in office 08:00-18:00, 15:20, from outside 21:30, 17:55).]

12 What is behind?
Questions: What are the fundamental forces that form the structure of these different networks (publication, Twitter following, mobile communication)? Is there a generalized framework for inferring social ties? How can we connect the different networks?

13 Inferring social ties in a single network
Learning Framework

14 Problem Analysis
Input: a dynamic collaborative network with a partially labeled subnetwork. Output: potential types of relationships and their probabilities: (type, prob, [s_time, e_time]). [1] C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, and J. Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD'10, pages

15 Overall Framework
Notation: ai: author i; pj: paper j; py: paper year; pn: paper count; sti,yi: starting time; edi,yi: ending time; ri,yi: probability. Steps: (1) filter the candidates using the constraints; (2) for the remaining candidates, model the joint probability of all the hidden variables with a Time-constrained Probabilistic Factor Graph (TPFG); (3) compute the result by maximizing the joint probability, using forward-backward propagation. The problem is cast as, for each node, identifying which neighbor has the highest probability of being his/her advisor, i.e., P(yi = j | xi, x~i, y), where ai and aj are neighbors.

16 Time-constrained Probabilistic Factor Graph (TPFG)
Hidden variables: yx is ax's advisor; stx,yx and edx,yx are the starting and ending times; Yx is the set of potential advisors of ax; g(yx, stx, edx) is a pairwise local feature. Suppose g(Ax, Sx, Ex | X) is a predefined local feature. Maximize P = ∏_{x∈X} g(Ax, Sx, Ex | X) under the constraint E_{Ax} < Sx < Ex for each x. A simplification: maximize P({yx}) = ∏x fx(yx | {z | x ∈ Yz}), where fx(yx | Z) = max g(yx, stx,yx, edx,yx | X) under the time constraint. The problem then becomes optimization over {yx}: ri,j = max P({yx} | yi = j) = ∏x fx(yx | {z | x ∈ Yz}).

17 Maximum likelihood estimation
A general likelihood objective function can be defined as a product of factor functions Φ(·), where Φ(·) can be instantiated in different ways.

18 Inference algorithm of TPFG
rij = max P(y1, ..., yna | yi = j) = exp(sentij + recvij). Exact inference (max-product + junction tree) has low efficiency and consumes a lot of memory. Our algorithm is an approximation based on message passing, with complexity O(∑i di²), where di is the degree of node i. Other approaches (different models): regression, heuristics.

19 Results of Model 1
DBLP data: 654,628 authors, 1,076,946 publications, years provided. Ground truth: MathGenealogy Project, AI Genealogy Project, and faculty homepages. Testing data (3 sets): TEST1 = Teacher(257) + MathGP(1909) + Colleague(2166), manually copied from advisors' homepages; TEST2 = PHD(100) + MathGP(1909) + Colleague(4351), manually labeled by Jingyi Guo; TEST3 = AIGP(666) + Colleague(459), crawled from the Mathematics Genealogy Project by Yuntao Jia. Accuracy (methods as listed on the slide: RULE, SVM, IndMAX, Model 1 variants): TEST1: 69.9% / 73.4% / 75.2% / 78.9% / 80.2% / 84.4%; TEST2: 69.8% / 74.6% / 79.0% / 81.5% / 84.3%; TEST3: 80.6% / 86.7% / 83.1% / 90.9% / 88.8% / 91.3%. Prediction from the ranking: choose the top-k potential advisors with r > θ, where θ is an empirical parameter, an optimized parameter, or learned with supervised learning.

20 Results [1] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08, pages

21 Part B: Extend the problem to cross-domain “cross-domain collaboration recommendation”
(KDD 2012, WSDM)

22 Cross-domain Collaboration
Interdisciplinary collaborations have generated huge impact. For example, 51 (more than 1/3) of the KDD 2012 papers are the result of cross-domain collaborations between graph theory, visualization, economics, medical informatics, databases, NLP, and IR. Research fields also evolve through such collaborations: collaborations between biology and computer science revolutionized the field of bioinformatics, making originally extremely expensive tasks such as DNA sequence analysis scalable and affordable to a much broader population. [1] J. Tang, S. Wu, J. Sun, and H. Su. Cross-domain Collaboration Recommendation. KDD'12, pages (Full Presentation & Best Poster Award)

23 Cross-domain Collaboration (cont.)
There is an increasing trend of cross-domain collaborations over the past fifteen years across different domains in a publication database, e.g., Data Mining (DM), Medical Informatics (MI), Theory (TH), and Visualization (VIS) (cf. § 4 for details). In most cases the increasing trend is clear.

24 Challenges
1. Sparse connection (<1%): compared to collaborations within a domain, cross-domain collaborations are very rare, partly because it is difficult for an outsider to find the right collaborator in a field they do not know. The lack of training samples also makes it hard to directly apply supervised learning.
2. Complementary expertise: cross-domain collaborators often have different expertise and interests. Data mining researchers can easily identify whom they want to work with inside data mining, because they are familiar with its subtopics; but for a cardiologist who wants to apply data mining techniques to predict heart failures, the two fields have entirely different terminology and problems, so identifying the right data mining topics and collaborators is hard.
3. Topic skewness (9%): not all topics are relevant for cross-domain collaborations. In our study, less than one-tenth of all possible topic pairs across domains have collaborations, so cross-domain collaboration recommendation should focus on modeling those topics with a high probability of cross-domain collaboration.

25 Related Work-Collaboration recommendation
Collaborative topic modeling for recommending papers C. Wang and D.M. Blei. [2011] On social networks and collaborative recommendation I. Konstas, V. Stathopoulos, and J. M. Jose. [2009] CollabSeer: a search engine for collaboration discovery H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. [2007] Referral web: Combining social networks and collaborative filtering H. Kautz, B. Selman, and M. Shah. [1997] Fab: content-based, collaborative recommendation M. Balabanovi and Y. Shoham. [1997]

26 Related Work-Expert finding and matching
Topic level expertise search over heterogeneous networks J. Tang, J. Zhang, R. Jin, Z. Yang, K. Cai, L. Zhang, and Z. Su. [2011] Formal models for expert finding in enterprise corpora K. Balog, L. Azzopardi, and M.de Rijke. [2006] Expertise modeling for matching papers with reviewers D. Mimno and A. McCallum. [2007] On optimization of expertise matching with various constraints W. Tang, J. Tang, T. Lei, C. Tan, B. Gao, and T. Li. [2012]

27 cross-domain collaboration recommendation
Approach Framework —Cross-domain Topic Learning

28 Author Matching
Based on the historic cross-domain collaborations, we create a collaboration graph: a source domain GS (e.g., Data Mining) and a target domain GT (e.g., Medical Informatics), with co-authorships within each domain and cross-domain co-authorships between them. The problem is to rank relevant nodes in the target domain GT for a given query node vq in the source domain GS. The relatedness of two nodes can be measured with Random Walks with Restarts (RWR) [22, 28]: starting from node vq, the walk follows a link at each step according to the link weight, and at every step there is a probability of returning to vq. The relatedness score of node vi w.r.t. vq is the stationary probability ri that the random walk arrives at vi. One problem with author matching is that connections across the two domains are rare, which makes it difficult for the random walk to reach the target domain. Another problem is that it only considers the network structure and ignores the content (topic) information.

29 Recall Random Walk
Let us begin with PageRank [1]. [Slide figure: a five-node graph; each node starts with score 0.2, and scores are updated by propagating each node's score uniformly along its out-links.] [1] L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical Report SIDL-WP, Stanford University, 1999.
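The PageRank recalled above can be sketched with a few lines of power iteration. This is a minimal sketch: the five-node toy graph and the damping factor 0.85 are illustrative assumptions, not the slide's exact example.

```python
import numpy as np

def pagerank(A, d=0.85, iters=100):
    """Power iteration for PageRank; assumes every node has an out-link."""
    n = A.shape[0]
    W = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)                # start from the uniform distribution
    for _ in range(iters):
        # Each node keeps (1-d)/n base score plus damped in-link contributions.
        r = (1 - d) / n + d * (W.T @ r)
    return r

# Toy graph: nodes 1-4 all link to node 0; node 0 links back to node 1.
A = np.zeros((5, 5))
A[1:, 0] = 1.0
A[0, 1] = 1.0
r = pagerank(A)
# Node 0 collects links from all other nodes, so it ranks highest.
```

Because every node here has an out-link, no dangling-node correction is needed and the scores stay a probability distribution.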

30 Random Walk with Restart[1]
0.1 1/3 3 0.1 4 2 1/3 1/3 0.15 0.25 q 1 1 Uq=1 0.4 [1] J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. In ICDM’05, pages 418–425, 2005.
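The Random Walk with Restart used for author matching can be sketched the same way; scores are the stationary probabilities of a walk that teleports back to the query node. The toy path graph and the restart probability 0.15 below are illustrative assumptions.

```python
import numpy as np

def rwr_scores(A, q, restart=0.15, tol=1e-10, max_iter=1000):
    """Stationary RWR distribution, restarting at query node q each step
    with probability `restart`."""
    n = A.shape[0]
    # Column-normalize so W[:, j] is the transition distribution out of node j.
    W = A / A.sum(axis=0, keepdims=True)
    e = np.zeros(n)
    e[q] = 1.0                             # restart vector
    r = e.copy()
    for _ in range(max_iter):
        r_new = (1 - restart) * (W @ r) + restart * e
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy 4-node path graph 0-1-2-3, queried from node 0;
# the farthest node gets the lowest relatedness score.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = rwr_scores(A, q=0)
```

The restart term is what distinguishes RWR from plain PageRank: probability mass concentrates around the query node, which is exactly what a per-user relatedness ranking needs.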

31 Author Matching (cont.)
The same collaboration graph, now highlighting the key difficulty: the cross-domain co-authorships between GS and GT are extremely sparse (<1%), so a random walk from the query user vq rarely crosses into the target domain, and the structure-only method ignores the content (topic) information.

32 Topic Matching
To handle complementary expertise and topic skewness, we first apply a topic model such as LDA or the ACT model to the source and target domains separately, obtaining two sets of topic distributions. We then estimate the alignment (correlation) between the topics of the two domains from the historic cross-domain collaborations.

33 Recall Topic Model
Usage of a theme: summarize topics/subtopics, navigate documents, retrieve documents, segment documents, and all other tasks involving unigram language models.

34 Topic Model
A generative model for the co-occurrence of documents d ∈ D = {d1,…,dD} and terms w ∈ W = {w1,…,wW}, associated with latent variables z ∈ Z = {z1,…,zZ}. The generative process draws a document with P(d), a topic with P(z|d), and a word with P(w|z). [1] T. Hofmann. Probabilistic latent semantic indexing. SIGIR'99, pages 50–57, 1999.

35 Topic Model
[Slide figure: the graphical model with documents d1…dD, topics z1…zZ, and words w1…wW, linked by P(d), P(z|d), and P(w|z).]

36 Maximum-likelihood Definition
We have a density function P(x|Θ) governed by a set of parameters Θ; e.g., P might be a mixture of Gaussians and Θ the means and covariances. We also have a data set X = {x1,…,xN}, supposedly drawn i.i.d. from P. The log-likelihood function is L(Θ) = ∑_{i=1}^{N} log P(x_i|Θ). The log-likelihood is viewed as a function of the parameters Θ with the data X fixed, and our goal is to find the Θ that maximizes L: Θ* = argmax_Θ L(Θ).
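A tiny numeric illustration of the definition above: for i.i.d. Gaussian data with known unit variance, the maximizer of the log-likelihood in the mean parameter is the sample average. The data values are arbitrary.

```python
import numpy as np

# Toy data, assumed drawn i.i.d. from a Gaussian with unknown mean, variance 1.
x = np.array([1.0, 3.0, 2.0, 6.0])

def loglik(mu):
    """Gaussian log-likelihood in mu, up to additive constants."""
    return -0.5 * ((x - mu) ** 2).sum()

# The MLE of the mean is the sample average.
mu_hat = x.mean()
# loglik(mu_hat) beats the log-likelihood at any nearby mu.
```

Checking a few nearby values of mu confirms that the sample mean is where the log-likelihood peaks.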

37 Topic Model
Following the likelihood principle, we determine P(d), P(z|d), and P(w|z) by maximizing the log-likelihood L = ∑_d ∑_w n(d, w) log P(d, w), where n(d, w) is the co-occurrence count of d and w and P(d, w) = P(d) ∑_z P(z|d) P(w|z). The topic assignments z are the unobserved (hidden) data; the co-occurrence counts are the observed data.

38 Jensen's Inequality
Recall that f is a convex function if f″(x) ≥ 0, and strictly convex if f″(x) > 0. Let f be a convex function and let X be a random variable; then E[f(X)] ≥ f(E[X]). Moreover, if f is strictly convex, then E[f(X)] = f(E[X]) holds if and only if X = E[X] with probability 1 (i.e., X is a constant).
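Jensen's inequality is easy to check numerically for a convex f on a small discrete random variable; the values and probabilities below are arbitrary.

```python
import numpy as np

# Convex f(x) = x**2 on a discrete random variable X.
x = np.array([1.0, 2.0, 5.0])    # support of X
p = np.array([0.5, 0.3, 0.2])    # probabilities (sum to 1)

lhs = (p * x**2).sum()           # E[f(X)] = 0.5*1 + 0.3*4 + 0.2*25 = 6.7
rhs = (p * x).sum() ** 2         # f(E[X]) = (2.1)**2 = 4.41
# Jensen: E[f(X)] >= f(E[X]); equality only if X is constant.
```

For the concave f(x) = log x used in the EM derivation on the next slides, the inequality flips direction, which is what yields a lower bound on the log-likelihood.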

39 Basic EM Algorithm
Optimizing the likelihood function directly is analytically intractable, but it can be simplified by assuming the existence of additional missing (hidden) variables. Since maximizing L(Θ) explicitly is difficult, the strategy is to repeatedly construct a lower bound on L (E-step), and then optimize that lower bound (M-step). For each i, let Qi be some distribution over z (∑z Qi(z) = 1, Qi(z) ≥ 0); then L(Θ) = ∑_i log ∑_z Qi(z) · P(x_i, z|Θ)/Qi(z) ≥ ∑_i ∑_z Qi(z) log [P(x_i, z|Θ)/Qi(z)]. The derivation uses Jensen's inequality: f(x) = log x is concave, since f″(x) = −1/x² < 0.

40 Parameter Estimation Using EM
Applying basic EM to the PLSA likelihood, we define Q(z) to be the posterior P(z|d, w) of the hidden topic given a (document, word) pair. By Jensen's inequality, this choice makes the lower bound tight at the current parameters, so maximizing the bound improves the likelihood.

41 (1) Solve P(w|z)
We introduce a Lagrange multiplier λ with the constraint ∑w P(w|z) = 1, set the derivative of the Lagrangian to zero, and solve the resulting equation.

42 The final update equations
E-step: P(z|d, w) = P(z|d) P(w|z) / ∑_{z′} P(z′|d) P(w|z′). M-step: P(w|z) ∝ ∑_d n(d, w) P(z|d, w) and P(z|d) ∝ ∑_w n(d, w) P(z|d, w).
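The standard PLSA E/M updates can be exercised on a toy corpus. This is a minimal sketch: the 4×6 count matrix, 2 topics, and iteration count are illustrative assumptions, not the talk's setup.

```python
import numpy as np

# Toy document-term count matrix n(d, w): 4 documents, 6 words, 2 topics.
n_dw = np.array([[3, 1, 0, 0, 2, 0],
                 [2, 2, 1, 0, 0, 0],
                 [0, 0, 2, 3, 0, 1],
                 [0, 1, 0, 2, 3, 1]], dtype=float)
D, W = n_dw.shape
Z = 2

rng = np.random.default_rng(0)
p_z_d = rng.dirichlet(np.ones(Z), size=D)   # P(z|d), shape (D, Z)
p_w_z = rng.dirichlet(np.ones(W), size=Z)   # P(w|z), shape (Z, W)

for _ in range(50):
    # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
    post = p_z_d[:, :, None] * p_w_z[None, :, :]       # shape (D, Z, W)
    post /= post.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate from expected counts n(d,w) * P(z|d,w).
    c = n_dw[:, None, :] * post                        # shape (D, Z, W)
    p_w_z = c.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = c.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
```

After a few dozen iterations the two topics tend to separate the two word groups in the toy counts; the key invariant is that both P(w|z) and P(z|d) remain proper distributions after every M-step.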

43 PLSI (SIGIR'99)
[Slide figure: the PLSI plate diagram: document d, topic z, word w, with plate Nd over words and plate D over documents.] [1] T. Hofmann. Probabilistic latent semantic indexing. SIGIR'99, pages 50–57, 1999.

44 LDA (JMLR'03)
[Slide figure: the LDA plate diagram: α → θ (document-specific distribution over topics, plate D over documents) → z → w, with the topic-word distributions Φ (plate T over topics) drawn from β, and plate Nd over words.] The priors α and β smooth the distributions so that words that never appear still get nonzero probability. [1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3:993–1022, 2003.
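To make the LDA plate diagram concrete, here is a tiny collapsed Gibbs sampler sketch (not the ACT or CTL models from this talk). The integer-coded toy corpus, topic count, and priors α, β are all illustrative assumptions.

```python
import numpy as np

# Toy corpus: 4 documents over a 7-word vocabulary; docs 0-1 use words 0-3,
# docs 2-3 use words 4-6, so two topics should roughly separate them.
docs = [[0, 0, 1, 2], [0, 1, 1, 3], [4, 5, 5, 6], [4, 4, 6, 5]]
V, T, alpha, beta = 7, 2, 0.1, 0.01
rng = np.random.default_rng(1)

# Random initial topic assignment for every token, plus count tables.
z = [[rng.integers(T) for _ in d] for d in docs]
ndt = np.zeros((len(docs), T))   # doc-topic counts
ntw = np.zeros((T, V))           # topic-word counts
nt = np.zeros(T)                 # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1

for _ in range(200):             # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]          # remove the current assignment
            ndt[d, t] -= 1; ntw[t, w] -= 1; nt[t] -= 1
            # Collapsed conditional: (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
            p = (ndt[d] + alpha) * (ntw[:, w] + beta) / (nt + V * beta)
            t = rng.choice(T, p=p / p.sum())
            z[d][i] = t          # resample and add back
            ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1

# Smoothed topic-word distributions (the role of beta noted above).
phi = (ntw + beta) / (nt[:, None] + V * beta)
```

The β smoothing is visible in the last line: even words a topic never generated keep a small nonzero probability, exactly the point made on the slide.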

45 Cross-domain Topic Learning
The topic matching method does not discriminate the "collaboration" topics from topics that exist in only one domain, so topics irrelevant to collaboration may hurt recommendation performance. We develop a new topic modeling approach, Cross-domain Topic Learning (CTL), that models the topics of the source domain and the target domain simultaneously and identifies the cross-domain topics.

46 Collaboration Topics Extraction
The basic idea is to use two correlated generative processes to model the source and target domains together. The first process models documents written by authors from a single domain (either source or target). The second models collaborated documents: for each word in a collaborated document, a Bernoulli variable determines whether it is generated from a "collaboration" topic or from a topic specific to one domain only.

47 Intuitive explanation of Step 2 in CTL
Figure 4 illustrates CTL learning. Before learning, each author has a topic distribution in only the source or the target domain (zero probability on topics from the other domain). CTL then smooths topic distributions across the two domains: users from the source domain also get probabilities over topics extracted from the target domain, and vice versa. After training, we also obtain a set of "collaboration topics" between the two domains, i.e., topics whose posterior probability P(z|s = 0, ·) in the collaborated documents is highest (above a threshold), where · denotes all the other parameters involved in calculating the probability. In the right-hand side of Figure 4, the box indicates these collaboration topics.

48 cross-domain collaboration recommendation
Experiments

49 Data Set and Baselines
Data: ArnetMiner (available at arnetminer.org), with five sub-domains:
Domain               Authors  Relationships  Source
Data Mining          6,282    22,862         KDD, SDM, ICDM, WSDM, PKDD
Medical Informatics  9,150    31,851         JAMIA, JBI, AIM, TMI, TITB
Theory               5,449    27,712         STOC, FOCS, SODA
Visualization        5,268    19,261         CVPR, ICCV, VAST, TVCG, IV
Database             7,590    37,592         SIGMOD, VLDB, ICDE
Baselines: Content Similarity (Content), based on the similarity between authors' publications; Collaborative Filtering (CF), which leverages existing collaborations; Hybrid, a linear combination of the Content and CF scores; Katz, the best link predictor in the link-prediction literature for social networks; Author Matching (Author), random walk with restart on the collaboration graph; and Topic Matching (Topic), which combines the extracted topics into the random walk.
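The Katz baseline scores a node pair by a damped sum over all paths connecting them, Katz(i, j) = ∑_l β^l (A^l)_{ij}. A minimal sketch with an illustrative 4-node graph; the decay β = 0.1 is an assumption and must satisfy β < 1/λ_max(A) for the series to converge.

```python
import numpy as np

def katz_matrix(A, beta=0.1):
    """All-pairs Katz scores via the closed form of the geometric series:
    sum_{l>=1} beta^l A^l = (I - beta*A)^{-1} - I."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

# Toy undirected graph: triangle 0-1-2 plus pendant node 3 attached to 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = katz_matrix(A)
# Pairs joined by more and shorter paths score higher, e.g. K[0,1] > K[0,3].
```

Short paths dominate because of the β^l damping, which is why Katz behaves like a smoothed common-neighbor count on sparse graphs.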

50 Performance Analysis: Data Mining (S) to Theory (T)
Training: collaborations before the cutoff; validation: collaborations after. Metrics: MAP and ARHR (at cutoffs 10 and 20). Scores per method, as laid out on the slide: Content 10.3 / 10.2 / 10.9 / 31.4 / 4.9 / 2.1; CF 15.6 / 13.3 / 23.1 / 26.2 / 2.8; Hybrid 17.4 / 19.1 / 20.0 / 29.5 / 5.0 / 2.4; Author 27.2 / 22.3 / 25.7 / 32.4 / 10.1 / 6.4; Topic 28.0 / 26.0 / 33.5 / 13.4 / 7.1; Katz 30.4 / 29.8 / 21.6 / 27.4 / 11.2 / 5.9; CTL 37.7 / 36.4 / 40.6 / 35.6 / 14.3 / 7.5. CTL outperforms every baseline on every metric.

51 Performance on New Collaboration Prediction
When predicting entirely new collaborations, CTL still maintains a MAP of about 0.3, significantly higher than the baselines.

52 Parameter Analysis (a) varying the number of topics T (b) varying α parameter (c) varying the restart parameter τ in the random walk (d) Convergence analysis

53 Prototype System http://arnetminer.org/collaborator
Treemap: representing subtopics in the target domain. The system recommends collaborators together with their relevant publications.

54 Part C: Further incorporate user feedback “interactive collaboration recommendation”
(ACM TKDD, TIST, WSDM)

55 Example: Finding co-inventors in IBM (>300,000 employees)
A user asks: "Find me a partner to collaborate on healthcare..." Based on existing co-inventor relationships, the system recommends candidates (e.g., Luo Gang, Kun-Lung Wu, Philip S. Yu, Jimeng Sun). The user gives interactive feedback ("Philip is not a healthcare person", "Kun-Lung Wu is a match for me"), and interactive learning refines the recommendations (e.g., Ching-Yung Lin, Milind R. Naphade). [1] S. Wu, J. Sun, and J. Tang. Patent Partner Recommendation in Enterprise Social Networks. WSDM'13, pages

56 Challenges
In this work we study the problem of recommending patent partners in enterprise social networks, and address three challenges. First, what are the fundamental factors that influence the formation of co-invention relationships? Do inventors with similar interests tend to collaborate, do they collaborate because of complementary research interests, or simply because of geographical proximity? Second, how do we design an interactive mechanism so that the user can give the system feedback to refine the recommendations? Third, technically, how do we learn the interactive recommendation framework in an online mode?

57 interactive collaboration recommendation
Learning framework

58 RankFG Model
Each (user, recommended collaborator) pair is mapped to a random variable node in the graphical model, subject to ranking constraints; pairwise factor functions capture attributes of each pair, and social correlation factor functions capture dependencies between pairs. The problem is cast as identifying, for each relationship, the type with the highest probability.

59 Modeling with the exponential family
The model is defined in the exponential family over a partially labeled network, which yields the likelihood objective function to be maximized.

60 Ranking Factor Graphs
The model combines a pairwise factor function and a correlation factor function into a log-likelihood objective function for model learning.

61 Learning Algorithm
Expectations are computed with Loopy Belief Propagation.

62 Still a challenge
How to incrementally incorporate users' feedback?

63 Learning Algorithm Incremental estimation

64 Interactive Learning
1) Add new factor nodes to the factor graph built in the model learning process. 2) l-step message passing: start from the new variable node (root node), send messages to all of its neighboring factors, propagate the messages up to l steps, then perform a backward message passing. 3) Calculate an approximate value of the marginal probabilities of the new factors.

65 From passive interaction to active learning
Strategies for selecting which instances to query: entropy, threshold, and influence models. [1] Z. Yang, J. Tang, and B. Xu. Active Learning for Networked Data Based on Non-progressive Diffusion Model. WSDM'14. [2] L. Shi, Y. Zhao, and J. Tang. Batch Mode Active Learning for Networked Data. ACM Transactions on Intelligent Systems and Technology (TIST), Volume 3, Issue 2 (2012), Pages 33:1--33:25.

66 Active learning via a non-progressive diffusion model
Selecting the query set amounts to maximizing the diffusion, which is NP-hard!

67 MinSS Greedily expand Vp

68 MinSS(cont.)

69 Lower Bound and Upper Bound

70 Approximation Ratio

71 interactive collaboration recommendation
Experiments

72 Data Set and Baselines
Data: PatentMiner (pminer.org). Baselines: Content Similarity (Content), Collaborative Filtering (CF), Hybrid, SVM-Rank.
DataSet  Inventors  Patents  Avg. increase #patents  Avg. increase #co-inventions
IBM      55,967     46,782   8.26%                   11.9%
Intel    18,264     54,095   18.8%                   35.5%
Sony     8,505      31,569   11.7%                   13.0%
Exxon    19,174     53,671   10.6%                   14.7%
[1] J. Tang, B. Wang, Y. Yang, P. Hu, Y. Zhao, X. Yan, B. Gao, M. Huang, P. Xu, W. Li, and A. K. Usadi. PatentMiner: Topic-driven Patent Analysis and Mining. KDD'12, pages

73 Performance Analysis: IBM
Training: collaborations before the cutoff; validation: after. MAP scores per method, as laid out on the slide: Content 23.0 / 23.3 / 18.8 / 15.6 / 24.0 / 33.7; CF 13.8 / 12.8 / 11.3 / 11.5 / 21.7 / 36.4; Hybrid 13.9 / 21.8 / 36.7; SVMRank 13.3 / 11.9 / 9.6 / 9.8 / 22.2 / 43.5; RankFG 31.1 / 27.5 / 25.6 / 22.4 / 40.5 / 46.8; RankFG+ 31.2 / 26.6 / 22.9 / 42.1 / 51.0. RankFG+ uses the proposed RankFG model with 1% interactive feedback.

74 Interactive Learning Analysis
With only 0.6% feedback, the result of interactive learning is already very similar to complete retraining, and interactive learning achieves performance close to complete learning with only 1/100 of the running time used for complete training.

75 Parameter Analysis
Factor contribution analysis and convergence analysis. RankFG-C ignores the referral-chaining factor functions; RankFG-CH additionally ignores homophily; RankFG-CHR further ignores recency.

76 Results of Active Learning

77 Summary
Inferring social ties in a single network: a time-dependent factor graph model. Cross-domain collaboration recommendation: cross-domain topic learning. Interactive collaboration recommendation: a ranking factor graph model, with active learning via a non-progressive diffusion model.

78 Future Work
[Slide figure: inferring social ties (family? friend?) jointly with reciprocity and triadic closure, illustrated with "You", "Lady Gaga", and "Shiteng".]

79 References Tiancheng Lou, Jie Tang, John Hopcroft, Zhanpeng Fang, Xiaowen Ding. Learning to Predict Reciprocity and Triadic Closure in Social Networks. In TKDD, 2013. Yi Cai, Ho-fung Leung, Qing Li, Hao Han, Jie Tang, Juanzi Li. Typicality-based Collaborative Filtering Recommendation. IEEE Transaction on Knowledge and Data Engineering (TKDE). Honglei Zhuang, Jie Tang, Wenbin Tang, Tiancheng Lou, Alvin Chin, and Xia Wang. Actively Learning to Infer Social Ties. DMKD, Vol. 25, Issue 2 (2012), pages Lixin Shi, Yuhang Zhao, and Jie Tang. Batch Mode Active Learning for Networked Data. ACM Transactions on Intelligent Systems and Technology (TIST), Volume 3, Issue 2 (2012), Pages 33:1--33:25. Jie Tang, Jing Zhang, Ruoming Jin, Zi Yang, Keke Cai, Li Zhang, and Zhong Su. Topic Level Expertise Search over Heterogeneous Networks. Machine Learning Journal, Vol. 82, Issue 2 (2011), pages Zhilin Yang, Jie Tang, and Bin Xu. Active Learning for Networked Data Based on Non-progressive Diffusion Model. WSDM’14. Sen Wu, Jimeng Sun, and Jie Tang. Patent Partner Recommendation in Enterprise Social Networks. WSDM’13, pages Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain Collaboration Recommendation. KDD’12, pages (Full Presentation & Best Poster Award) Jie Tang, Bo Wang, Yang Yang, Po Hu, Yanting Zhao, Xinyu Yan, Bo Gao, Minlie Huang, Peng Xu, Weichang Li, and Adam K. Usadi. PatentMiner: Topic-driven Patent Analysis and Mining. KDD’12, pages Jie Tang, Tiancheng Lou, and Jon Kleinberg. Inferring Social Ties across Heterogeneous Networks. WSDM’12, pages Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu, and Jingyi Guo. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD'10, pages Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08, pages

80 Thank you!
Collaborators: John Hopcroft and Jon Kleinberg (Cornell); Jiawei Han and Chi Wang (UIUC); Tiancheng Lou (Google); Jimeng Sun (IBM); Jing Zhang, Zhanpeng Fang, Zi Yang, Sen Wu (THU). Jie Tang, KEG, Tsinghua U. All data & codes available for download.

