LINK PREDICTION IN CO-AUTHORSHIP NETWORK
Le Nhat Minh (A0074403N)
Supervisor: Dongyuan Lu



2 Introduction
- Link prediction: inferring future connections within the scope of a network.
- Co-authorship network: a network of collaborations among researchers, scientists, and academic writers.

3 Introduction
Potential applications:
- Recommend experts or groups of researchers to an individual researcher.

4 Outline
- Problem Background
- Related Work
- Workflow
- Conclusion
- Result Analysis
- Research Plan

5 Problem Background
What connects researchers to one another?
Given an instance of a co-authorship network, a researcher is connected to another if they have collaborated on at least one paper.
[Figure: researchers X and Y, with the years 2001 and 2004]

6 Problem Background
How can a link be predicted? Based on these criteria:
- Co-authorship network topology
- Researchers’ personal information
- Researchers’ papers
Boosting link prediction performance:
- A recommended link should be genuinely relevant to the authors’ interests, or should at least be a feasible collaboration.

7 Related Work
Link prediction problems in social networks:
- Liben-Nowell, D., & Kleinberg, J., 2007
- Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S., 2013
In a social network, interactions among users are very dynamic:
- New links are created within a few days.
- Existing links are deleted or replaced.
The two kinds of network present different features:
- Characteristics of an individual researcher: citations, affiliations, institutions, ...
- Characteristics of a person: marital status, age, workplace, ...

8 Three mainstream approaches for link prediction:
- Similarity-based estimation: Liben-Nowell, D., & Kleinberg, J., 2007
- Maximum likelihood estimation: Murata, T., & Moriyasu, S., 2008; Guimerà, R., & Sales-Pardo, M., 2009
- Supervised learning model: Pavlov, M., & Ichise, R., 2007; Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006

9 Similarity Based Estimation
- Use metrics to estimate the proximity of each pair of researchers.
- Rank the pairs of researchers by those proximities.
- The top-ranked pairs are the likely recommendations.

10 Similarity Based Estimation
Network-structure-based measurement
Some conventions:

11 Similarity Based Estimation
Common Neighbors: the number of neighbors that X and Y have in common.
[Figure: example graph with nodes X and Y]

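12 Similarity Based Estimation
Jaccard’s coefficient: the number of common neighbors of X and Y divided by the number of distinct nodes that are neighbors of X or of Y.
[Figure: example graph with nodes X and Y]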

13 Similarity Based Estimation
Preferential Attachment: the number of neighbors of X multiplied by the number of neighbors of Y.
[Figure: example graph with nodes X and Y]

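14 Similarity Based Estimation
Adamic/Adar: a sum over the common neighbors Z of X and Y, where each Z contributes 1 / log(number of neighbors of Z), so that rarely connected common neighbors count for more.
[Figure: example graph with nodes X, Y and Z]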
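
The four local measures above can be sketched in a few lines. This is a minimal illustration using the standard definitions from the link prediction literature, not code from the presentation; the adjacency-set representation, the function names and the toy graph are assumptions made for the example.

```python
import math

# Assumed representation: adj maps each node to the set of its neighbors
# in an undirected graph (hypothetical toy data, not from the slides).
adj = {
    "X": {"A", "B", "Z"},
    "Y": {"B", "Z"},
    "A": {"X"},
    "B": {"X", "Y"},
    "Z": {"X", "Y", "C"},
    "C": {"Z"},
}

def common_neighbors(adj, x, y):
    """Number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    """Shared neighbors divided by all distinct neighbors of x or y."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def preferential_attachment(adj, x, y):
    """Product of the two node degrees."""
    return len(adj[x]) * len(adj[y])

def adamic_adar(adj, x, y):
    """Sum of 1 / log(degree) over shared neighbors.

    For distinct x and y, a shared neighbor has degree >= 2,
    so the logarithm is always positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

print(common_neighbors(adj, "X", "Y"))         # shared neighbors B and Z
print(jaccard(adj, "X", "Y"))
print(preferential_attachment(adj, "X", "Y"))
print(adamic_adar(adj, "X", "Y"))
```

In this toy graph, Z has more neighbors than B, so Z contributes less to the Adamic/Adar score than B does.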

15 Similarity Based Estimation
- Shortest Path: the minimum number of edges connecting two nodes.
- PageRank: a random walk on the graph that assigns each node the probability of being reached. The proximity of a pair of nodes can be taken as the sum of the two nodes’ PageRank values.
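
A rough sketch of the two global measures follows. It assumes an adjacency-set graph, a breadth-first search for path length, and a plain power-iteration PageRank; the damping factor 0.85, the iteration count and all names are illustrative assumptions rather than settings reported in the presentation.

```python
from collections import deque

def shortest_path_length(adj, source, target):
    """Minimum number of edges between source and target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph (each edge walks both ways)."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in adj}
        for v, neighbors in adj.items():
            if neighbors:
                share = damping * rank[v] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            else:
                # Isolated node: spread its rank uniformly over all nodes.
                for u in adj:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank

def pagerank_proximity(rank, x, y):
    """Pair score described on the slide: the sum of the two PageRank values."""
    return rank[x] + rank[y]

# Hypothetical toy graph: a simple path A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
rank = pagerank(adj)
print(shortest_path_length(adj, "A", "D"))
print(pagerank_proximity(rank, "B", "C"), pagerank_proximity(rank, "A", "D"))
```

On the path graph, the two inner nodes are reached more often by the random walk, so the pair (B, C) scores higher than the pair (A, D).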

16 Maximum Likelihood Estimation
- Predefines specific rules for a network.
- Requires prior knowledge of the network.
- The likelihood of any non-connected link is calculated according to those rules.

17 Supervised Learning Model
- Construct multi-dimensional feature vectors.
- Feed these vectors to classifiers to optimize a target function (model training).
- Link prediction becomes a binary classification problem.

18 Supervised Learning Model
Related work (Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M., 2006) using:
- Decision Tree
- SVM (linear kernel)
- K-nearest neighbors
- Multilayer Perceptron
- Naive Bayes
- Bagging
Combining many classifiers (Pavlov, M., & Ichise, R., 2007):
- Decision stump + AdaBoost
- Decision Tree + AdaBoost
- SMO + AdaBoost

19 Summary
- Similarity-based estimation: does not perform very well.
- Maximum likelihood: depends on the network.
- Supervised learning model: performs better than similarity-based estimation.

20 Workflow
[Figure: workflow diagram; surviving labels: Features, Classifier Model]

21 Graph Description
Co-authorship graph: undirected graph G(V, E)
- Node or vertex (author): author ID, author name
- Link or edge (co-authorship): pair of author IDs, plus a list of publication years, each followed by the paper title (e.g. 2004: “Introduction to …”)
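
One way to hold such a graph in memory is sketched below. The input format (a list of papers with year, title and author list), the function name and the sample records are hypothetical; the presentation does not say how the data was actually stored.

```python
from collections import defaultdict
from itertools import combinations

def build_coauthorship_graph(papers):
    """Build node and edge tables from (year, title, [(author_id, name), ...]) records.

    Returns:
      authors: author_id -> author name
      edges:   (smaller_id, larger_id) -> list of (year, title) for shared papers
    """
    authors = {}
    edges = defaultdict(list)
    for year, title, paper_authors in papers:
        for author_id, name in paper_authors:
            authors[author_id] = name
        ids = sorted({author_id for author_id, _ in paper_authors})
        # Every pair of co-authors on the paper gets (or extends) an undirected edge.
        for u, v in combinations(ids, 2):
            edges[(u, v)].append((year, title))
    return authors, dict(edges)

# Hypothetical sample records.
papers = [
    (2001, "Paper P1", [(1, "X"), (2, "Y")]),
    (2004, "Paper P2", [(1, "X"), (2, "Y"), (3, "Z")]),
]
authors, edges = build_coauthorship_graph(papers)
for pair, shared in sorted(edges.items()):
    print(pair, shared)
```

Storing each pair with the smaller ID first keeps the graph undirected: (1, 2) and (2, 1) always map to the same edge.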

22 Setting up data
- The dataset is split into two time spans: 2000-2010 and 2010-2013. The first is used for training, the second for testing.
- The 2000-2013 network currently contains 134,307 researchers.
- Authors who do not appear in the testing period are removed, leaving 104,265 researchers.
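
The split can be sketched as follows. The slide lists 2010 in both spans, so where exactly the boundary falls is an assumption here: the sketch treats years up to and including 2010 as training and later years as testing. Treating "appears in the testing period" as "has at least one co-authored paper in that period" is also an assumption, as are the function names and the sample data.

```python
def split_by_period(edges, last_training_year=2010):
    """Split edges into training and testing link sets by publication year.

    edges maps an author-ID pair to a list of (year, title) entries.
    A pair can land in both sets if it has papers in both periods.
    """
    training, testing = set(), set()
    for pair, shared_papers in edges.items():
        years = [year for year, _ in shared_papers]
        if any(year <= last_training_year for year in years):
            training.add(pair)
        if any(year > last_training_year for year in years):
            testing.add(pair)
    return training, testing

def restrict_to_test_authors(training, testing):
    """Keep only authors seen in the testing period, and the training links among them."""
    active = {author for pair in testing for author in pair}
    kept_training = {pair for pair in training
                     if pair[0] in active and pair[1] in active}
    return active, kept_training

# Hypothetical sample edges.
edges = {
    (1, 2): [(2001, "P1"), (2012, "P2")],
    (1, 3): [(2005, "P3")],
    (2, 4): [(2011, "P4")],
}
training, testing = split_by_period(edges)
active, kept_training = restrict_to_test_authors(training, testing)
print(sorted(training), sorted(testing))
print(sorted(active), sorted(kept_training))
```

Author 3 has no paper after 2010 in this sample, so the author and the training link (1, 3) are both dropped.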

23 Setting up data
- Choose a subset of the 104,265 researchers.
- Experiment on 937 researchers.
Network              No. of nodes   No. of links (2000-2010)   No. of links (2010-2013)
Real network         104,265        413,691                    35,558
Experiment network   937            3,093                      57

24 Baseline Features
Features extracted from the network structure:
- Local similarity: Common Neighbors, Adamic/Adar, Preferential Attachment, Jaccard’s coefficient
- Global similarity: Shortest Path, PageRank

25 Baseline Features
Feature specific to the co-authorship network:
- Keyword matching (Cohen, S., & Ebel, L., 2013): a suggested metric that measures textual relevance using a TF-IDF-based function.
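
To give a feel for a TF-IDF-based relevance score, here is a generic sketch that compares two authors by the cosine similarity of their TF-IDF keyword vectors. It is not the function defined by Cohen and Ebel (2013), whose exact form is not given on the slide; the weighting scheme, the names and the keyword lists are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(keywords_by_author):
    """Map each author to a {keyword: tf * idf} vector.

    tf  = keyword count / total keywords of the author
    idf = log(number of authors / number of authors using the keyword)
    """
    n_authors = len(keywords_by_author)
    doc_freq = Counter()
    for words in keywords_by_author.values():
        doc_freq.update(set(words))
    vectors = {}
    for author, words in keywords_by_author.items():
        counts = Counter(words)
        total = len(words)
        vectors[author] = {
            word: (count / total) * math.log(n_authors / doc_freq[word])
            for word, count in counts.items()
        } if total else {}
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical keyword lists taken from each author's papers.
keywords_by_author = {
    "X": ["link", "prediction", "graph"],
    "Y": ["link", "prediction", "network"],
    "Z": ["protein", "folding"],
}
vectors = tfidf_vectors(keywords_by_author)
print(cosine(vectors["X"], vectors["Y"]))  # share two keywords
print(cosine(vectors["X"], vectors["Z"]))  # share none
```

A keyword used by every author gets an IDF of zero under this weighting, so only distinctive vocabulary contributes to the score.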

26 Proposed Features
Productivity of the authors
- Observe the “history” of an author.
- For example, at a particular node A:
i    T_i     n    m
0    2000    3    1
1    2004    4    2
2    2005    6    2
3    2006    7    3
n: number of shared papers; m: number of collaborators

27 Proposed Features
Productivity of the authors
- Observe the “history” of an author.
- α: a constant that assigns the weight of each time period.
- The “productivity” of node A:

28 Training set
Setting up the training data:
- With n nodes, there are n(n-1)/2 possible links.
- Among those, distinguish two kinds of link:
  - Positive links: links that appear in the training years.
  - Negative links: the remaining links, which do not exist in the training years.
- Note: avoid biased training by balancing the number of instances with true and false labels.
- Classify all the non-existent links.
- Compare the predictions with the testing data.
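
The construction above might look like the sketch below. Balancing by randomly undersampling the negative links is an assumption; the slide only says that the two labels should be balanced. Names and the toy data are hypothetical.

```python
import random
from itertools import combinations

def build_training_set(nodes, training_links, seed=0):
    """Return balanced (pair, label) examples plus the full list of non-existent links.

    Positive examples are the links present in the training years;
    negatives are sampled at random from the remaining pairs (assumed strategy).
    """
    positives = {tuple(sorted(pair)) for pair in training_links}
    all_pairs = list(combinations(sorted(nodes), 2))  # n(n-1)/2 unordered pairs
    negatives = [pair for pair in all_pairs if pair not in positives]
    rng = random.Random(seed)
    sample_size = min(len(positives), len(negatives))
    sampled_negatives = rng.sample(negatives, sample_size)
    examples = [(pair, 1) for pair in sorted(positives)]
    examples += [(pair, 0) for pair in sampled_negatives]
    return examples, negatives

# Hypothetical toy network: 5 authors, 3 links in the training years.
nodes = [1, 2, 3, 4, 5]
training_links = [(1, 2), (2, 3), (4, 5)]
examples, candidates = build_training_set(nodes, training_links)
print(len(examples), len(candidates))

# For the 937-researcher experiment network, the number of possible links is:
print(937 * 936 // 2)
```

After training on the balanced examples, every pair in `candidates` would be scored by the classifier and compared against the links that actually appear in the testing period.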

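29 Experimental Results
Measurement of performance (TP: true positives, FP: false positives, FN: false negatives):
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- Harmonic mean (F1) = 2 · Precision · Recall / (Precision + Recall)
New links to predict: 57 links
                      Predicted true link   Predicted false link
Actual true link      26                    31
Actual false link     5,588                 429,778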
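
As a worked check, the three measures can be computed from the confusion matrix on this slide. The sketch assumes the rows are the actual labels and the columns the predictions, which fits the 57 new links (26 + 31); the function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts read from the slide (rows assumed to be actual labels).
tp, fn = 26, 31          # actual true links: predicted true / predicted false
fp, tn = 5588, 429778    # actual false links: predicted true / predicted false

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
# prints "precision=0.46% recall=45.61% f1=0.92%"
```

Recall is moderate because 26 of the 57 new links are found, but the 5,588 false positives push precision, and therefore F1, below 1%.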

30 Result Analysis
Possible reasons:
- Features
- Small set of data (sampling problem)
- Instances of the negative links used for training

31 Research Plan
- Use a weighted graph with parameters: number of papers, number of neighbors, number of citations.
- Focus on features that specifically target the co-authorship network: citations, institutions.
- Enlarge the experiment dataset.
Thank you

32 References
- Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211-230.
- Al Hasan, M., Chaoji, V., Salem, S., & Zaki, M. (2006). Link prediction using supervised learning. In SDM’06: Workshop on Link Analysis, Counter-terrorism and Security.
- Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
- Pavlov, M., & Ichise, R. (2007). Finding experts by link prediction in co-authorship networks. FEWS, 290, 42-55.
- Murata, T., & Moriyasu, S. (2008). Link prediction based on structural properties of online social networks. New Generation Computing, 26(3), 245-257.
- Guimerà, R., & Sales-Pardo, M. (2009). Missing and spurious interactions and the reconstruction of complex networks. Proceedings of the National Academy of Sciences, 106(52), 22073-22078.
- Bliss, C. A., Frank, M. R., Danforth, C. M., & Dodds, P. S. (2013). An evolutionary algorithm approach to link prediction in dynamic social networks. arXiv preprint arXiv:1304.6257.
- Cohen, S., & Ebel, L. (2013). Recommending collaborators using keywords. In Proceedings of the 22nd International Conference on World Wide Web Companion (pp. 959-962).

33 The number of links per year in the training set is greater than in the testing set:
- In the testing period, only “new” collaborations are considered.
- Any collaboration between researchers who already have a link is disregarded.
               2000-2010   2010-2013
No. of nodes   937         937
No. of links   3,093       57

34 Results with different classifiers
Precision is the positive predictive value, recall is the hit rate, and F1 is their harmonic mean.
Classifier              Precision (%)   Recall (%)   F1 (%)
Decision Tree           0.3             24.6         0.5
SMO                     0.5             45.6         0.9
Bagging                 0.4             28.1         0.7
Naive Bayes             0.2             77.2         0.3
Multilayer Perceptron   0.4             47.3         0.8

35 Proposed Feature
Reasons for proposing this feature:
- It keeps track of each researcher’s tendency.
- It gives a “bonus” to researchers who tend to collaborate with “new” colleagues rather than “old” ones.
- It also gives a high score to prolific researchers (based on the number of published papers).

36 Stochastic Block Model
Guimerà, R., & Sales-Pardo, M., 2009

37 Stochastic Block Model
[Figure: example network with nodes numbered 1 to 7 and nodes X and Y]
The reliability of an individual link is:

