Xintao Wu Jan 18, 2013 Retweeting Behavior and Spectral Graph Analysis in Social Media.


1 Retweeting Behavior and Spectral Graph Analysis in Social Media
Xintao Wu, Jan 18, 2013

2 Social Media Customer Analytics
Network topology
  name  sex  age  disease  salary
  Ada   F    18   cancer   25k
  Bob   M    25   heart    110k
  …
  id  sex  age  address  income
  5   F    Y    NC       25k
  3   M    Y    SC       110k
Structured profile; retweet sequence; unstructured text (e.g., blog, tweet)
Customer profile; customer transaction; inventory; product description and review; …
Entity resolution; patterns; temporal/spatial; scalability; visualization; sentiment; privacy

3 Outline
Examining retweeting behavior to understand information propagation
  Multi-factor interaction analysis
  Coverage prediction
  Burst detection
Spectral graph analysis
  Community partition
  Fraud detection

4 Multi-factor interaction analysis
For each following relationship, what factors affect user A’s decision on whether to forward messages from B to A’s followers?
We examine users’ retweet behaviors using various features:
  Power ratio (A)
  Link structure (B)
  Location factor (C)
  Gender factor (D)
  …
We fit a log-linear model to capture and interpret interaction patterns among features A-D and retweet (E).
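As a rough illustration of what fitting a log-linear model to such features involves, the sketch below runs iterative proportional fitting (IPF) on a small three-way contingency table. It is not the model from the talk: the counts are invented, only three binary factors are used (link structure B, gender D, retweet E), and the fitted model is the one with all two-way interactions but no three-way interaction.

```python
from itertools import product

# Hypothetical counts keyed by (b, d, e): b = friend link (0/1),
# d = gender (0/1), e = retweeted (0/1). Invented for illustration only.
counts = {
    (0, 0, 0): 420, (0, 0, 1): 30,
    (0, 1, 0): 380, (0, 1, 1): 60,
    (1, 0, 0): 150, (1, 0, 1): 90,
    (1, 1, 0): 170, (1, 1, 1): 70,
}

def margin(table, axes):
    """Sum a 3-way table down to the given axes, e.g. (0, 2) for B x E."""
    out = {}
    for cell, value in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + value
    return out

def fit_no_three_way(observed, iterations=200):
    """Fit the log-linear model [BD][BE][DE] by iterative proportional fitting."""
    fitted = {cell: 1.0 for cell in product((0, 1), repeat=3)}
    for _ in range(iterations):
        # Rescale the fitted table to match each observed two-way margin in turn.
        for axes in ((0, 1), (0, 2), (1, 2)):
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                fitted[cell] *= target[key] / current[key]
    return fitted

def odds_ratio(table, b):
    """Gender-by-retweet odds ratio within link level b."""
    return (table[(b, 0, 0)] * table[(b, 1, 1)]) / (table[(b, 0, 1)] * table[(b, 1, 0)])

fitted = fit_no_three_way(counts)
for b in (0, 1):
    print(b, round(odds_ratio(counts, b), 3), round(odds_ratio(fitted, b), 3))
```

Under this model the gender-by-retweet odds ratio is forced to be the same at every link level, so a large gap between the observed and fitted odds ratios indicates a three-way interaction, that is, a gender effect that depends on link structure.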

5 Interpreting interaction effect

6 Interpretation example
Neither gender nor location alone has a significant effect on retweeting. However, once link structure is considered:
  Females are more conservative: they have a lower tendency to retweet messages from non-friend (especially female) users, but a higher tendency to retweet messages from friends or superstars.
  Males are more open-minded: they have a higher tendency to retweet messages from non-friend (especially female) users.

7 Outline
Examining retweeting behavior to understand information propagation
  Multi-factor interaction analysis
  Coverage prediction
  Burst detection
Spectral graph analysis
  Community partition
  Fraud detection

8 Retweet Sequence
Information dynamically flows through the network.
[Diagram labels: Alice, Bob, Cathy, David, Ellen, Fred; D1, D2, D3, …]
Records:
  t1 m1 A

9 Retweet Sequence
Information dynamically flows through a social network.
[Diagram labels: Alice, Bob, Cathy, David, Ellen, Fred; D1, D2, D3, …]
Records:
  t1 m1 A
  t2 m2 B | t1 m1 A

10 Flow Through Tree Structure
Information dynamically flows through a social network.
[Diagram labels: Alice, Bob, Cathy, David, Ellen, Fred; D1, D2, D3, …]
Records:
  t1 m1 A
  t2 m2 B | t1 m1 A
  t3 m3 D | B | t1 m1 A

11 Flow Through Tree Structure
Information dynamically flows through a social network.
[Diagram labels: Alice, Bob, Cathy, David, Ellen, Fred; D1, D2, D3, …]
Records:
  t1 m1 A
  t2 m2 B | t1 m1 A
  t3 m3 D | B | t1 m1 A
  t4 m4 C | t1 m1 A
  …
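The tree structure behind such records can be sketched in a few lines. The record layout below, (time, message id, user, retweeted-from user), is an assumption made for illustration; the slides only show entries in which a retweet carries a reference back to the original message by Alice.

```python
from collections import defaultdict

# Hypothetical records: (time, message_id, user, retweeted_from).
# retweeted_from is None for the original post. This layout is assumed,
# not taken from the data set described in the talk.
records = [
    (1, "m1", "Alice", None),
    (2, "m2", "Bob", "Alice"),
    (3, "m3", "David", "Bob"),
    (4, "m4", "Cathy", "Alice"),
]

def build_tree(rows):
    """Return (root user, {user: [users who retweeted from them]})."""
    children = defaultdict(list)
    root = None
    for _, _, user, parent in sorted(rows, key=lambda row: row[0]):
        if parent is None:
            root = user
        else:
            children[parent].append(user)
    return root, dict(children)

def depths(root, children):
    """Path length from the original author to every user in the tree."""
    out = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            out[child] = out[node] + 1
            stack.append(child)
    return out

root, children = build_tree(records)
print(root, children, depths(root, children))
```

Keeping the parent link on each record is what allows path length, a quantity the burst analysis later relies on, to be computed per retweet.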

12 WISE 2012 Challenge
Sina Weibo data:
  # of users: 5,636,858
  # of tweets: 46,584,914
  # of retweets: 190,920,026
Test data: 33 messages, each with 100 initial retweets, composed by 27 users from 6 events
For each message, predict:
  M1: the number of retweets in 30 days
  M2: the number of possible views in 30 days

13 Idea
We treat the retweeting activity of each original message in the training data as a time series.
Each value corresponds to the number of times the original message was retweeted during time period t.
For each message in the test data:
  The start of the series is known from the 100 initial retweets.
  Use ARMA to predict the remainder.
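To make the time-series idea concrete, here is a minimal sketch that fits a first-order autoregressive model, AR(1), by least squares and extrapolates it. This is a deliberately simplified stand-in for ARMA, not the model used in the challenge entry; the hourly counts are invented (chosen to sum to 100), and the 24-step horizon is arbitrary.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

def forecast(series, steps):
    """Iterate the fitted recurrence forward, clipping counts at zero."""
    c, phi = fit_ar1(series)
    out = []
    last = series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        out.append(last)
    return out

# Hypothetical retweets per hour for one message (the known 100 retweets).
observed = [40, 24, 15, 9, 6, 3, 2, 1]

# Estimated total = retweets already seen + forecast for later periods.
predicted_total = sum(observed) + sum(forecast(observed, 24))
print(round(predicted_total, 1))
```

A full ARMA model adds a moving-average term on past forecast errors; in practice one would use an established time-series library rather than this hand-rolled fit.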

14 Prediction Result
Runner-up award (2nd place) in the WISE 2012 Challenge, Mining Track.
[Figure panels: Death of Steve Jobs; Xiaomi Release; Yao Jiaxin Murder Case; Xiaomi Release]

15 Outline
Examining retweeting behavior to understand information propagation
  Multi-factor interaction analysis
  Coverage prediction
  Burst detection
Spectral graph analysis
  Community partition
  Fraud detection

16 Bursts
[Figure labels: peak time, duration time]

17 Topic

18 Retweet vs. Time

19 Retweet vs. Time

20 Burst Analysis: Users
Top 100 users tend to have:
  shorter path length
  shorter peak time
  shorter duration time

21 Burst Prediction
Extract features:
  User-related, including profile and history information
  Tweet-related, including time series and retweet tree
Run classifiers:
  Logistic regression
  Random forest
  Decision tree
  Naive Bayes
  SVM
  KNN
Achieve 83.2% accuracy
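The feature-then-classify pipeline can be illustrated with one of the listed classifiers, k-nearest neighbours, written in plain Python. Everything concrete here is assumed: the two features (log10 of the author's follower count and retweets in the first hour), the training rows, and k = 3 are invented and do not reproduce the features or the 83.2% result from the talk.

```python
import math
from collections import Counter

# Hypothetical training rows: ((log10 followers, retweets in first hour), label)
# where label 1 means the message later burst. Invented for illustration.
train = [
    ((3.0, 5.0), 0), ((3.5, 8.0), 0), ((4.0, 12.0), 0), ((3.2, 4.0), 0),
    ((5.5, 60.0), 1), ((6.0, 85.0), 1), ((5.8, 70.0), 1), ((6.2, 90.0), 1),
]

def knn_predict(train_rows, query, k=3):
    """Majority vote among the k training rows closest to the query.
    Features are used unscaled here; real data would need normalisation."""
    by_distance = sorted(train_rows, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_rows, test_rows, k=3):
    """Fraction of held-out rows whose predicted label matches the true one."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in test_rows)
    return hits / len(test_rows)

held_out = [((5.9, 75.0), 1), ((3.1, 6.0), 0)]
print(accuracy(train, held_out))
```

Accuracy measured this way, on rows kept out of training, is the kind of figure a reported classification accuracy normally refers to.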

22 Outline
Examining retweeting behavior to understand information propagation
  Multi-factor interaction analysis
  Coverage prediction
  Burst detection
Spectral graph analysis
  Community partition
  Fraud detection

23 Spectral graph analysis
Spectral coordinate: Polbook Network
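A small sketch of spectral coordinates, assuming the term means each node's entries in the leading eigenvectors of the adjacency matrix. The graph below is a toy example (two 4-node cliques joined by one edge), not the Polbook network, and the eigenvectors are found by plain power iteration with deflation rather than a numerical library.

```python
import math

def matvec(matrix, vec):
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def top_eigenvectors(adj, k=2, iterations=500):
    """Leading k eigenvectors of a symmetric adjacency matrix.
    The matrix is shifted by n*I so all eigenvalues are positive and power
    iteration converges to the most positive ones first."""
    n = len(adj)
    shifted = [[adj[i][j] + (n if i == j else 0) for j in range(n)] for i in range(n)]
    found = []
    for _ in range(k):
        vec = normalize([1.0 + 0.1 * i for i in range(n)])  # fixed, asymmetric start
        for _ in range(iterations):
            vec = matvec(shifted, vec)
            for prev in found:  # deflation: drop components along earlier eigenvectors
                dot = sum(a * b for a, b in zip(vec, prev))
                vec = [a - dot * b for a, b in zip(vec, prev)]
            vec = normalize(vec)
        found.append(vec)
    return found

# Toy graph: nodes 0-3 and 4-7 form two cliques, linked by the edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
n = 8
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

vectors = top_eigenvectors(adj, k=2)
coords = list(zip(*vectors))  # coords[i] is the 2-D spectral coordinate of node i
for node, point in enumerate(coords):
    print(node, [round(x, 3) for x in point])
```

In this toy case the second coordinate takes opposite signs on the two cliques, which is the kind of separation a community-partition method can exploit.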

24 Accuracy of AdjCluster
Compared methods:
  Lap [Miller and Teng, 1998]: Laplacian based
  Ncut [Shi and Malik, 2000]: normalized cut
  HE’ [Wakita and Tsurumi, 2007]: modularity-based agglomerative clustering
  SpokEn [Prakash et al., 2010]: EigenSpoke
Accuracy: [formula lost in extraction], where [symbol lost] is the i-th community produced by the different algorithms
Refer to IJCAI 11 for details.

25 Evaluation on Web spam challenge data
SPCTRA fraud detection
GREEDY: based on outer-triangles [Shrivastava, ICDE, 2008]
100-1000 times faster
Refer to ICDE 11 for details.

26 Acknowledgments
This work was supported in part by U.S. National Science Foundation grants CNS-0831204 and CCF-1047621, and by the UNC Charlotte Chancellor’s Special Fund.
Thank You! Questions?

