CO-AUTHOR RELATIONSHIP PREDICTION IN HETEROGENEOUS BIBLIOGRAPHIC NETWORKS
Yizhou Sun, Rick Barber, Manish Gupta, Charu C. Aggarwal, Jiawei Han



Content
  Background and motivation
  Problem definition
  PathPredict: meta path-based relationship prediction model
    Meta path-based topological features
    The supervised learning framework and model
  Experiments
  Conclusions

Background
  Homogeneous networks
    One type of object
    One type of link
    Example: the friendship network in Facebook
  Link prediction in homogeneous networks
    Predict whether a link between two objects will appear in the future, according to:
      Topological features of the network
      Attribute features of the objects (usually cannot be fully obtained)

Motivation
  In reality, heterogeneous networks are ubiquitous
    Multiple types of objects
    Multiple types of links
    Examples: bibliographic networks, movie networks
  From link prediction to relationship prediction
    A relationship between two objects can be a composition of two or more links
    E.g., two authors have a co-author relationship if and only if they have co-written a paper
    Topological features need to be redesigned for heterogeneous information networks
  Our goal: study which topological features of heterogeneous networks help predict the formation of co-author relationships


Heterogeneous Information Networks

The DBLP Bibliographic Network
  The underlying meta structure is the network schema.

Co-author Relationship Prediction


The PathPredict Model
  PathPredict: meta path-based relationship prediction model
    Proposes meta path-based topological features for heterogeneous information networks (HIN)
      Topological features designed for homogeneous networks cannot be used directly
    Uses logistic regression-based supervised learning to learn the coefficient associated with each feature


Existing Topological Measures in Homogeneous Networks

Meta Path-based Topological Features

Meta Paths for Co-authorship Prediction in DBLP
  List of all the meta paths between authors of length up to 4
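The list itself did not survive in this transcript. As a rough illustration of how such a list can be generated, the sketch below enumerates author-to-author meta paths over a small schema. The schema used here (authors A, papers P, venues V, terms T, with an undirected paper-paper citation link) is an assumption for illustration, not the schema from the slides, and the function name is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical undirected schema (an assumption, not taken from the slides):
# A = author, P = paper, V = venue, T = term; P-P stands for citation.
SCHEMA: Dict[str, Set[str]] = {
    "A": {"P"},
    "P": {"A", "V", "T", "P"},
    "V": {"P"},
    "T": {"P"},
}


def enumerate_meta_paths(schema: Dict[str, Set[str]], start: str,
                         end: str, max_len: int) -> List[str]:
    """Return all type sequences from start to end with 1..max_len links."""
    results: List[str] = []

    def extend(path: List[str]) -> None:
        hops = len(path) - 1
        if hops >= 1 and path[-1] == end:
            results.append("-".join(path))
        if hops == max_len:
            return
        for nxt in sorted(schema[path[-1]]):
            extend(path + [nxt])

    extend([start])
    # Shorter meta paths first, then alphabetical order.
    return sorted(results, key=lambda p: (len(p), p))


AUTHOR_META_PATHS = enumerate_meta_paths(SCHEMA, "A", "A", 4)
for meta_path in AUTHOR_META_PATHS:
    print(meta_path)
```

Under this assumed schema, A-P-A is the co-author relation itself (the prediction target), so it would be excluded from the feature set. Treating citation as directed would split each P-P step into two variants.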

Four Meta Path-based Measures

Example: A-P-V-P-A meta path
  PC(J,M) = 7
  NPC(J,M) = (7+7)/(7+9)
  RW(J,M) = ½
  SRW(J,M) = ½ + 1/16
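The four measures in this example are path count (PC), normalized path count (NPC), random walk (RW), and symmetric random walk (SRW). A minimal sketch of how they can be computed for the A-P-V-P-A meta path follows. The formulas are assumed, since the defining slide did not survive here: NPC divides the forward plus reverse path count by the paths leaving the first author plus the paths arriving at the second, RW divides the path count by the paths leaving the first author, and SRW adds the random walk in both directions. The toy network is invented, so its numbers differ from those on the slide.

```python
from fractions import Fraction
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


# Invented toy data: 3 authors (Jim, Mike, Ann), 4 papers, 2 venues.
AP = [[1, 1, 0, 0],   # Jim wrote p0, p1
      [0, 0, 1, 0],   # Mike wrote p2
      [0, 0, 0, 1]]   # Ann wrote p3
PV = [[1, 0], [1, 0], [1, 0], [0, 1]]  # p0..p2 in venue v0, p3 in v1

AV = matmul(AP, PV)
APVPA = matmul(AV, transpose(AV))  # entry [i][j] = number of A-P-V-P-A paths


def path_count(m: Matrix, i: int, j: int) -> int:
    return m[i][j]


def normalized_path_count(m: Matrix, i: int, j: int) -> Fraction:
    # Reverse-path count from j to i equals the forward count m[i][j].
    out_i = sum(m[i])                  # paths starting at i
    in_j = sum(row[j] for row in m)    # paths ending at j
    return Fraction(2 * m[i][j], out_i + in_j)


def random_walk(m: Matrix, i: int, j: int) -> Fraction:
    return Fraction(m[i][j], sum(m[i]))


def symmetric_random_walk(m: Matrix, i: int, j: int) -> Fraction:
    # Forward walk from i plus reverse walk from j (uses the transpose).
    return random_walk(m, i, j) + random_walk(transpose(m), j, i)


JIM, MIKE = 0, 1
print(path_count(APVPA, JIM, MIKE))             # 2
print(normalized_path_count(APVPA, JIM, MIKE))  # 4/9
print(random_walk(APVPA, JIM, MIKE))            # 1/3
print(symmetric_random_walk(APVPA, JIM, MIKE))  # 1
```

Exact fractions are used so the results can be compared with hand calculations in the style of the slide.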


Supervised Learning Framework
  Training: T0-T1 time framework
    T0: feature collection (x)
    T1: relationship label collection (y)
  Testing: T0’-T1’ time framework, which may be shifted in time relative to the training stage
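A minimal sketch of this framework, under stated assumptions: feature vectors for author pairs are taken from interval T0, labels record whether the pair co-authored in T1, and a logistic regression is fitted on the result. All pairs, feature values, and function names below are invented for illustration, and plain batch gradient descent stands in for whatever estimation procedure the authors used.

```python
import math
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

# Invented T0 features per author pair, e.g. [path count, random walk].
features_t0: Dict[Pair, List[float]] = {
    ("a1", "a2"): [3.0, 0.6],
    ("a1", "a3"): [0.0, 0.0],
    ("a2", "a4"): [2.0, 0.4],
    ("a3", "a4"): [1.0, 0.1],
    ("a1", "a4"): [4.0, 0.7],
    ("a2", "a3"): [0.0, 0.1],
}
# Invented T1 observations: pairs that became co-authors.
coauthored_in_t1: Set[Pair] = {("a1", "a2"), ("a2", "a4"), ("a1", "a4")}

xs = [features_t0[pair] for pair in sorted(features_t0)]
ys = [1.0 if pair in coauthored_in_t1 else 0.0 for pair in sorted(features_t0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], x: List[float]) -> float:
    """Probability of a future co-author relationship; w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(xs: List[List[float]], ys: List[float],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            grad[0] += err
            for k, xk in enumerate(x):
                grad[k + 1] += err * xk
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


weights = fit_logistic(xs, ys)
# Testing stage: features from a later interval T0' (values invented).
p_high = predict(weights, [3.5, 0.6])
p_low = predict(weights, [0.0, 0.05])
print(p_high > p_low)
```

The learned weights play the role of the per-feature coefficients: a larger positive weight means the corresponding topological feature contributes more to the predicted probability.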

Prediction Model


Experiment Setting

Homogeneous Measures vs. Heterogeneous Measures
  Homogeneous measures:
    Consider only the co-author sub-network: common neighbor, rooted PageRank
    Consider the whole network and mix all link types together: total path count
  Heterogeneous measure:
    Heterogeneous path count
  Heterogeneous path count produces the best accuracy
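For contrast with the meta path-based features, the common neighbor measure on the co-author sub-network can be sketched as below. The tiny graph and the function name are made up for illustration. The measure sees only who has co-authored with whom and ignores venues, terms, and citations.

```python
from typing import Dict, Set

# Invented co-author sub-network: author -> set of past co-authors.
coauthors: Dict[str, Set[str]] = {
    "Jim": {"Ann", "Bob"},
    "Mike": {"Ann", "Bob", "Tom"},
    "Ann": {"Jim", "Mike"},
    "Bob": {"Jim", "Mike"},
    "Tom": {"Mike"},
}


def common_neighbors(graph: Dict[str, Set[str]], u: str, v: str) -> int:
    """Number of authors who have co-authored with both u and v."""
    return len(graph[u] & graph[v])


print(common_neighbors(coauthors, "Jim", "Mike"))  # 2
print(common_neighbors(coauthors, "Jim", "Tom"))   # 0
```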

Over Four Datasets

Comparison among Different Heterogeneous Measures
  Normalized path count is slightly better than the other single measures, and the hybrid measure that combines all measures is the best

Over Four Datasets

Impact of Collaboration Frequency on Different Measures
  Symmetric random walk is better at predicting frequent co-author relationships

Model Generalization Over Time
  Conclusion: a model trained on historical data can help predict future relationship building

Learned Significance for Each Topological Feature
  Co-attended venues and shared co-authors are critical in determining two authors’ future collaboration

Case Studies for Queries


Conclusions
  Problem: extend the link prediction problem in homogeneous networks to relationship prediction in HIN, using co-authorship prediction as a case study
  Solution
    Propose meta path-based topological features and measures in HIN
    Use logistic regression-based supervised learning to learn the coefficient associated with each feature
  Results
    Heterogeneous measures beat homogeneous measures
    The hybrid measure beats single measures

Thank you!
Q & A