Graph Data Management Lab School of Computer Science 2012, Bristol, UK.

Presentation transcript:

1 Which Topic Will You Follow?
Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang, Sheng Huang
School of Computer Science, Fudan University, China
Graph Data Management Lab (GDM@FUDAN), School of Computer Science, www.gdm.fudan.edu.cn
Email: yangdeqing@fudan.edu.cn
ECML-PKDD 2012, Bristol, UK

2 Outline
- Introduction
- Preliminaries
- Empirical Study
- Modeling

3 Who are the most appropriate candidates to receive a call for papers or a call for participation? How can you deliver call-for-papers emails to the authors who are interested in the proposed topic, instead of sending them blindly to everyone?

4 What session topics should we propose for next year's conference? Furthermore, how many sessions does each topic need?

5 Can we predict the topic of an author's next paper?

6 Basic Idea
- Use features of authors in the Scientific Collaboration Network (SCN) to model an author's topic-following behavior.
- Two candidate features:
  - Social influence: an individual tends to adopt the behaviors of his neighbors or friends.
  - Homophily: the tendency of individuals to choose friends with similar characteristics.

7 Contributions
- We verify that social influence and homophily are the two factors driving topic diffusion in the SCN.
- We propose a Multiple Logistic Regression (MLR) model to predict an author's topic-following behavior.
- We conduct extensive experiments showing that the model achieves good prediction performance.

8 Outline
- Introduction
- Preliminaries
- Empirical Study
- Modeling

9 Scientific Collaboration Network
- SCN: a temporal, undirected, edge-weighted graph
  - Vertex: author
  - Edge: coauthorship relation
  - Edge weight: number of papers coauthored by the two endpoints of the edge
- Settings: DBLP dataset, 25 representative topics
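
The SCN described above can be sketched as a dictionary of weighted edges. This is a minimal illustration, not the authors' code. The input format `(year, [authors])` and the helper names `build_scn` and `neighbors` are hypothetical, and a temporal snapshot is produced by ignoring papers published after a cutoff year.

```python
from collections import defaultdict
from itertools import combinations


def build_scn(papers, up_to_year):
    """Build one snapshot of an undirected, edge-weighted coauthorship graph.

    papers: iterable of (year, [author, ...]) tuples (hypothetical input format).
    Only papers published in or before `up_to_year` are counted.
    Returns a dict mapping frozenset({a, b}) -> number of papers a and b coauthored.
    """
    weights = defaultdict(int)
    for year, authors in papers:
        if year > up_to_year:
            continue  # outside this temporal snapshot
        # Every unordered pair of distinct coauthors gains +1 for this paper.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def neighbors(scn, u):
    """N(u): all authors who share at least one edge with u."""
    return {v for edge in scn if u in edge for v in edge if v != u}
```

For example, `build_scn([(2004, ["A", "B"]), (2005, ["A", "B", "C"])], 2008)` gives the edge {A, B} a weight of 2 because A and B coauthored two papers.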

10 Homophily
- We use topic similarity to characterize homophily.
- A 25-dimensional vector u represents an author's topic history (its i-th dimension is the number of papers the author published on topic i).
- Topic similarity between two authors u and v: [equation not preserved in transcript]
- Topic similarity between an author u and a group of authors U: [equation not preserved in transcript]. Here the group's topic vector is also 25-dimensional, and its i-th dimension is the number of papers on topic i published by all authors in U.
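
The slide's similarity formulas are not preserved in this transcript. For illustration only, the sketch below assumes cosine similarity, a common choice for comparing count vectors. The group vector is the per-topic sum described on the slide. The function names are hypothetical.

```python
import math

NUM_TOPICS = 25  # the slides use 25 representative DBLP topics


def cosine(x, y):
    """Cosine similarity of two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def group_vector(vectors):
    """Group topic vector: i-th entry = papers on topic i by all authors in the group."""
    total = [0] * NUM_TOPICS
    for v in vectors:
        for i, count in enumerate(v):
            total[i] += count
    return total


def topic_similarity(u_vec, v_vec):
    """Author-to-author similarity (assumed cosine)."""
    return cosine(u_vec, v_vec)


def topic_similarity_to_group(u_vec, group_vecs):
    """Author-to-group similarity: compare u with the summed group vector (assumed cosine)."""
    return cosine(u_vec, group_vector(group_vecs))
```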

11 Outline
- Introduction
- Preliminaries
- Empirical Study
- Modeling

12 Driving Forces of Topic-Following
- $U = U_0 \cup V_0$, $U_0 \cap V_0 = \emptyset$
  - $U_0$: the authors who have published papers on a given topic before a certain year
  - $V_0$: the remaining authors, divided into $U_1$ to $U_4$ ($N(u)$ is the neighbor set of u)
    - $U_1$: affected by both social influence and homophily
    - $U_2$: affected by social influence only
    - $U_3$: affected by homophily only
    - $U_4$: affected by neither force
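
One way to assign the authors in $V_0$ to the four groups is sketched below. The slide's exact membership criteria are not preserved in the transcript. This sketch assumes that "social influence" means having at least one neighbor in $U_0$, and that "homophily" means a topic similarity to $U_0$ of at least a hypothetical threshold `theta`. The function and parameter names are illustrative.

```python
def partition_followers(candidates, U0, neighbors_of, sim_to_U0, theta=0.5):
    """Split V0 = U minus U0 into the four groups U1..U4.

    Hypothetical criteria (not the talk's exact definitions):
      - social influence: N(u) contains at least one author from U0
      - homophily: topic similarity between u and the group U0 is >= theta
    neighbors_of: dict author -> set of neighbors, i.e. N(u)
    sim_to_U0:    dict author -> precomputed topic similarity to U0
    """
    groups = {"U1": set(), "U2": set(), "U3": set(), "U4": set()}
    for u in candidates:
        if u in U0:
            continue  # U0 and V0 are disjoint
        influenced = bool(neighbors_of.get(u, set()) & U0)
        similar = sim_to_U0.get(u, 0.0) >= theta
        if influenced and similar:
            groups["U1"].add(u)  # both social influence and homophily
        elif influenced:
            groups["U2"].add(u)  # social influence only
        elif similar:
            groups["U3"].add(u)  # homophily only
        else:
            groups["U4"].add(u)  # neither force
    return groups
```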

13 Driving Forces of Topic-Following (cont.)
- The two forces act together on topic-following.
- Their impacts are time-sensitive and decay exponentially over time.
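
To make the exponential decay concrete, the following sketch down-weights older events by $e^{-\lambda \Delta t}$, where $\Delta t$ is the gap in years. The decay rate $\lambda$ (`decay_rate`) and its default value are assumptions for illustration, not values from the talk.

```python
import math


def decayed_weight(event_year, target_year, decay_rate=0.5):
    """Weight exp(-decay_rate * gap) of an event from `event_year` when predicting `target_year`.

    decay_rate is a hypothetical parameter; the talk only states that impacts
    decrease exponentially over time.
    """
    gap = target_year - event_year
    if gap < 0:
        raise ValueError("event occurs after the target year")
    return math.exp(-decay_rate * gap)


def decayed_exposure(event_years, target_year, decay_rate=0.5):
    """Sum of decayed weights, e.g. over the years in which neighbors adopted a topic."""
    return sum(decayed_weight(y, target_year, decay_rate) for y in event_years)
```

With `decay_rate = ln 2`, each event's weight halves every year. For example, two adoptions one year before the target year contribute about 1.0 in total.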

14 Social Influence
- An author is more likely to adopt a topic when more of his neighbors have already followed it.
- An author is more likely to follow topics already adopted by neighbors (direct propagation) with whom he has coauthored more papers.
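
The two trends on this slide suggest two simple features: the number of neighbors who already adopted the topic, and a tie-strength-weighted version of that count. The sketch below computes both from the edge-weight dictionary of an SCN snapshot. It is a hypothetical feature for illustration, not the talk's social-influence variable, whose exact definition is not preserved here.

```python
def social_influence(u, adopters, edge_weight):
    """Hypothetical social-influence features for author u and one topic.

    adopters:    set of authors who already published on the topic (e.g. U0)
    edge_weight: dict frozenset({a, b}) -> number of papers a and b coauthored
                 (assumed to contain no self-loops)
    Returns (number of adopting neighbors, sum of their edge weights).
    Both grow with the slide's two trends: more adopting neighbors, and
    adopting neighbors with whom u coauthored more papers.
    """
    count = 0
    weighted = 0
    for edge, w in edge_weight.items():
        if u not in edge:
            continue
        (v,) = edge - {u}  # the other endpoint of the edge
        if v in adopters:
            count += 1
            weighted += w
    return count, weighted
```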

15 Outline
- Introduction
- Preliminaries
- Empirical Study
- Modeling

16 Model Variables
- Model selection: predicting topic-following is a two-category (binary) classification problem, so we use a Multiple Logistic Regression (MLR) model.
- Explanatory variables:
  - Social influence ($F_{SI}$): author u's tendency to follow topic s in year t [equation not preserved in transcript]

17 Model Variables (cont.)
- Explanatory variables (cont.):
  - Homophily ($F_{TS}$): with respect to the authors who followed topic s before t (i.e., the set $U_0$), we measure u's homophily as the topic similarity between u and that group.
- The whole MLR model is then
  $\ln\frac{p}{1-p} = \beta_0 + \beta_1 F_{SI} + \beta_2 F_{TS}$,
  where p is the probability that u follows topic s in year t.
- Baseline: Anagnostopoulos et al., 2008
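
Assuming the standard logit form for a multiple logistic regression with two explanatory variables, $\ln\frac{p}{1-p} = \beta_0 + \beta_1 F_{SI} + \beta_2 F_{TS}$, a prediction for one author, topic and year can be sketched as follows. The function names and the 0.5 cutoff are illustrative assumptions.

```python
import math


def follow_probability(f_si, f_ts, beta0, beta1, beta2):
    """P(u follows topic s in year t) under the assumed two-variable logit model."""
    z = beta0 + beta1 * f_si + beta2 * f_ts
    # Numerically stable logistic (sigmoid) function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def will_follow(f_si, f_ts, betas, cutoff=0.5):
    """Two-category decision: predict 'follows' when p >= cutoff (hypothetical cutoff)."""
    return follow_probability(f_si, f_ts, *betas) >= cutoff
```

Moving the cutoff away from 0.5 trades recall against specificity. That matters for the evaluation metrics listed later in the talk.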

18 Parameter Estimation
- Parameters are estimated by maximum likelihood on the training set (years 2004 to 2008).
- $\beta_2$ has a larger Wald statistic than $\beta_1$. This indicates that $F_{TS}$ (homophily) has a stronger effect on topic-following behavior than $F_{SI}$ (social influence).
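
Maximum-likelihood estimation for logistic regression is commonly done by Newton-Raphson, which is equivalent to iteratively reweighted least squares. The dependency-free sketch below fits the coefficients and reports each one's Wald statistic as $(\hat\beta_j / \mathrm{SE}_j)^2$, using standard errors from the inverse Fisher information. Some tools report the unsquared $z = \hat\beta_j / \mathrm{SE}_j$ instead. The helper names are hypothetical, and the fit assumes the training data are not perfectly separable, since otherwise the MLE does not exist.

```python
import math


def _solve(a, b):
    """Solve a @ x = b for a small dense system (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _score_and_information(beta, rows, y):
    """Gradient of the log-likelihood and Fisher information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += w * r[i] * r[j]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for logistic regression.

    X: feature rows, e.g. [F_SI, F_TS]; an intercept column is prepended.
    y: 0/1 labels (1 = the author followed the topic).
    Returns (betas, wald) with wald[j] = beta_j**2 / Var(beta_j).
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _score_and_information(beta, rows, y)
        step = _solve(info, grad)  # Newton step: info^{-1} @ score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(beta, rows, y)
    k = len(beta)
    # Diagonal of the inverse information matrix = squared standard errors.
    var = [_solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j] for j in range(k)]
    wald = [beta[j] ** 2 / var[j] for j in range(k)]
    return beta, wald
```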

19 Evaluation Results
- Model evaluation metrics (testing set: 2009): recall (sensitivity), specificity, precision, accuracy, and AUC (area under the ROC curve)
- Results for the topic XML: AUC of 0.743 (MLR) vs. 0.638 (baseline)
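
The listed metrics can be computed from 0/1 labels, hard predictions and predicted probabilities, as sketched below. AUC is computed through its rank interpretation: the probability that a randomly chosen follower is scored above a randomly chosen non-follower, with ties counting one half. This is exact but quadratic in the number of examples. The function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Recall (sensitivity), specificity, precision and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a, b):
        return a / b if b else 0.0  # avoid division by zero on degenerate sets

    return {
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


def auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```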

20 Evaluation Results (cont.)
- For four other representative topics, MLR also outperforms the baseline in both accuracy and $F_\beta$.
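
$F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}$ combines precision $P$ and recall $R$ into one number. Values of $\beta > 1$ weight recall more heavily. The slides do not say which $\beta$ was used, so the sketch below keeps it as a parameter with an illustrative default of 1.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 favors recall, beta < 1 favors precision. The default of 1.0 is
    only illustrative; the talk does not state the beta it used.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```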

21 Thank you! Questions are welcome.

