Presentation is loading. Please wait.

Presentation is loading. Please wait.

D IFFERENT A PPROACHES TO C OMMUNITY E VOLUTION P REDICTION IN B LOGOSPHERE Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław.

Similar presentations


Presentation on theme: "D IFFERENT A PPROACHES TO C OMMUNITY E VOLUTION P REDICTION IN B LOGOSPHERE Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław."— Presentation transcript:

1 D IFFERENT A PPROACHES TO C OMMUNITY E VOLUTION P REDICTION IN B LOGOSPHERE Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Kolak

2 O UTLINE : Introduction and motivation Methods of events identification in group evolution: SCGI GED Predicting group evolution in the social network Dataset and experiment setup Classifiers – reminder For each method we will compare results between different classifiers conclusion 2 Different Approaches to Community Evolution Prediction in Blogosphere

3 G ENERAL I DEA Predicting the future direction of community evolution allows to determine which characteristics describing communities have importance from the point of view of their future behavior. 3 Different Approaches to Community Evolution Prediction in Blogosphere

4 M OTIVATION Making decision concerning investing in contact with members of a given community and carrying out actions to achieve a key position in it Allows to determine effective ways of forming opinions. Allows to protect group participants against such activities. 4 Different Approaches to Community Evolution Prediction in Blogosphere

5 INTRODUCTION – PREDICTION Link prediction (Best investigated) link prediction problem: predicting the existence of a link (relation) between two nodes (users) within a social network. Liben-Nowell - focused on path and common neighbours between pair of nodes Lichtenwalter consider degrees and mutual information between them. 5 Different Approaches to Community Evolution Prediction in Blogosphere

6 INTRODUCTION – PREDICTION Link sign prediction - Sign in this context means that predicted relation between users may be positive or negative Symeonidis looked at paths between the node pair and use the notion of similarity to predict the sign Leskovec use degree and mutual information between pair of nodes for link prediction and profits from the theory of balance and status to predict the link sign. Richter and Wai-Ho faced the very important task of churn prediction (the number of individuals moving out of a collective over a specific period of time). Richter presented a new approach and tried to predict churn based on analysis of group behavior. This approach touches another aspect, not well studied yet, where evolution of the whole group is being predicted, i.e. which event will be next in group lifetime. 6 Different Approaches to Community Evolution Prediction in Blogosphere

7 P REDICTION OF THE GROUP EVOLUTION. What is a group? Set of vertices which communicate to each other more frequently than with vertices outside of a group A new method for future event prediction has been developed - based on stable group changes identification algorithm (SGCI) has been developed Prediction in this method is being made based on previous events in group lifetime extracted by SGCI group profile described by group size, cohesion, leadership and density 7 Different Approaches to Community Evolution Prediction in Blogosphere

8 METHODS OF EVENTS IDENTIFICATION IN GROUP EVOLUTION 8 Different Approaches to Community Evolution Prediction in Blogosphere

9 SGCI ALGORITHM Stable group changes identification Step 1. Identification of fugitive groups in the separate time frames. Whole network is divided into time frames In each time frame the method of finding communities in network is applied. Step 2. Identification of group continuation – assigning transitions between groups in neighboring time steps. After extracting communities in time frames: The communities from neighboring time frames are matched and algorithm assigns transitions between them (from group in time frame t to group in time frame t+1) 9 Different Approaches to Community Evolution Prediction in Blogosphere

10 SGCI ALGORITHM Algorithm for stable group changes identification Step 1. Identification of fugitive groups in the separate time frames. Whole network is divided into time frames In each time frame the method of finding communities in network is applied. Step 2. Identification of group continuation – assigning transitions between groups in neighboring time steps. After extracting communities in time frames: The communities from neighboring time frames are matched and algorithm assigns transitions between them (from group in time frame t to group in time frame t+1) 10 Different Approaches to Community Evolution Prediction in Blogosphere For each pair of non-empty groups A,B from neighboring time slots we will calculate: MJ (- Modified Jaccard Measure) ds (- difference in size) If MJ(A,B) is above a defined threshold and ds(A,B) between these groups is no more than specified, then the algorithm make transition between these groups. For each pair of non-empty groups A,B from neighboring time slots we will calculate: MJ (- Modified Jaccard Measure) ds (- difference in size) If MJ(A,B) is above a defined threshold and ds(A,B) between these groups is no more than specified, then the algorithm make transition between these groups.

11 SGCI ALGORITHM Step 3. Separation of the stable groups (lasting for at least required subsequent time steps). In this step, the stable groups are retrieved. Step 4. Identification of types of group changes. Assigning events describing the change of the state of the group to the transitions. Each transition between stable groups from neighboring time frames. We can define some types of group changes (A and B are the groups from the first and the second time transitions accordingly). sh and dh are some thresholds. 11 Different Approaches to Community Evolution Prediction in Blogosphere

12 SGCI ALGORITHM addition - when a small group attaches to a large one: deletion - when a small group detaches from a large one: merge - many groups in one time frame form a new larger group in the next time frame. split – group divides into some smaller groups in next time frame. split_merge - occurs when a group divides into at least 2 groups in the next time frame and one of this groups from next time frame is a result of merging with another from a previous time frame. constancy - simple transition without significant change of the group size: change size – simple transition with the change of the group size: decay - group does not exist in next time frame. 12 Different Approaches to Community Evolution Prediction in Blogosphere dh

13 SGCI ALGORITHM For a given group it is possible to match more than one event from this group to groups in the next time frame. Some events can coexist with other ones but some of them cannot. Constancy event, can’t coexist with change size, merge or split event, Constancy event, can coexist with addition or deletion events. The addition and the deletion events can coexist with each event type, except the decay event. The decay event is always a single event for the group. 13 Different Approaches to Community Evolution Prediction in Blogosphere

14 GED: G ROUP E VOLUTION D ISCOVERY For GED method we will calculate inclusion measure. It allows to evaluate the inclusion of one group in another. The inclusion of group G1 in group G2 is: NI G 1 (x) – the importance of the node x in group G 1. The GED method takes into account both the quantity and quality of the group members. * Quantity can be expressed by any user importance measure e.g. centrality degree, betweenness degree, page rank, social position etc. 14 Different Approaches to Community Evolution Prediction in Blogosphere group quality group quantity *

15 PREDICTING GROUP EVOLUTION IN THE SOCIAL NETWORK 15 Different Approaches to Community Evolution Prediction in Blogosphere

16 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS This approach for prediction future events of groups employs classifier. Structure: sequences of 3 states of groups (present time and two previous times) 16 Different Approaches to Community Evolution Prediction in Blogosphere

17 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS Measures for the state of each group: leadership - measure describing centralization in graph or group (the largest value is for star network) d - max means maximum value of degree in group n - number of nodes in group. density - measure expressing how many connections between nodes are present in network in relation to all possible connections between them [16] where a(i,j) =1 when there is connection from node i to node j 17 Different Approaches to Community Evolution Prediction in Blogosphere

18 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS Measures for the state of each group – cont.: cohesion - measure characterizing strength of connections inside group in relation to connections outside group (from group members) where w is function assigning weight between nodes, G is group, n - number of nodes in group and N - number of nodes in network group size - number of nodes in group 18 Different Approaches to Community Evolution Prediction in Blogosphere

19 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS Described sequence of group states is an input for classifier. The predicted variable is the dominating next event for the last group in a sequence. 19 Different Approaches to Community Evolution Prediction in Blogosphere

20 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS Dominating event - one of events assigned for a given group. The event with the highest priority among the assigned events is chosen. We use the following order of events (from the highest priority to the lowest one): constancy, change size, split, merge, addition, deletion, split_merge, decay. 20 Different Approaches to Community Evolution Prediction in Blogosphere

21 P REDICTING G ROUP E VOLUTION U SING SGCI R ESULTS The group Gn,1 has two assigned events: change size and addition, so the dominating event for group Gn,1 is change size because this event has higher priority. 21 Different Approaches to Community Evolution Prediction in Blogosphere

22 P REDICTING G ROUP E VOLUTION U SING GED R ESULTS The idea is using a simple sequence as an input for the classifier: preceding groups profiles and events. The learnt model will be able to produce very good results even for simple classifiers The sequences of groups sizes and events between time frames can be extracted from the GED results. For each event - four group profiles in four previous time frames together with three associated events are identified as the input for the classification model, separately for each group. A single group in a given time frame (Tn) is a case (instance) for classification, for which its event TnTn+1 is being predicted. 22 Different Approaches to Community Evolution Prediction in Blogosphere

23 P REDICTING G ROUP E VOLUTION U SING GED R ESULTS The sequence presented in Figure 2 is used as an input for classification. The first part of the sequence is used as input features (variables): the group profiles per timeframe and the event types between them. The goal of classification is to predict (classify) Event T n T n+1 type – out of the six possible classes: growing, continuing, shrinking, dissolving, and splitting. Forming was excluded since it can only start the sequence. 23 Different Approaches to Community Evolution Prediction in Blogosphere

24 Dataset description: DATASET AND EXPERIMENT SETUP 24 Different Approaches to Community Evolution Prediction in Blogosphere Data from which contains many blogs (mainly political) Data from which contains many blogs (mainly political) 26,722 users 285,532 posts 4,173,457 comments For tests we will use half of the data set: 04/04/2010 – 31/03/2012 Each time frame lasts 7 days Time frames overlap each other by 4 days Yields a total of 182 time frames

25 DATASET AND EXPERIMENT SETUP Group extraction: After separation of time frames the groups were extracted in each of the time frames. Done using CPM method (CPMd version) from CFinder tool (http://www.cfinder.org/) for k=5. CFinder is a tool for finding and visualizing overlapping dense groups of nodes in networks, based on the Clique Percolation Method (CPM) 25 Different Approaches to Community Evolution Prediction in Blogosphere

26 DATASET AND EXPERIMENT SETUP Group sizes As we can notice in Figure 3 there are many small groups and groups with size 5 outnumber other ones. 26 Different Approaches to Community Evolution Prediction in Blogosphere

27 DATASET AND EXPERIMENT SETUP Experiment setup: SGCI method experiments were conducted using following parameters: MJ=0.5, ds=50, sh=10 and dh=0.05. GED method was run on the dataset with all combination of GED parameters from the set: Quantity: {50%, 60%, 70%, 80%, 90%, 100%}. Quality (node importance): social position measure was utilized (measure similar to page rank). 27 Different Approaches to Community Evolution Prediction in Blogosphere Reminder: group quality group quantity *

28 DATASET AND EXPERIMENT SETUP Experiment setup: To describe the group profile, its size, density, cohesion and leadership were used Seven different classifiers were utilized with default settings All classifiers were utilized for both approaches: SGCI and GED 28 Different Approaches to Community Evolution Prediction in Blogosphere

29 DATASET AND EXPERIMENT SETUP Classifiers – reminder: What is a classifier? Adaptive system that learns to perform the best action given its input - identifying to which of a set of categories (sub-populations) a new observation belongs. What Is Multiclass Classification? Each training point belongs to one of N di ff erent classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs 29 Different Approaches to Community Evolution Prediction in Blogosphere

30 DATASET AND EXPERIMENT SETUP Multi-Class Classification: direct approaches : Nearest Neighbor Generative approach & Naïve Bayes Linear classification: Multi-label classification: 30 Different Approaches to Community Evolution Prediction in Blogosphere Is it eatable? Is it sweet? Is it a fruit? Is it a banana? Is it an apple? Is it an orange? Is it a pineapple? Is it a banana? Is it yellow? Is it sweet? Is it round? Nested/ HierarchicalExclusive/ Multi-class General/Structured

31 DATASET AND EXPERIMENT SETUP Multi-Class Classification – real world examples: 31 Different Approaches to Community Evolution Prediction in Blogosphere Object recognition 100 Automated protein classification Digit recognition 10 Phoneme recognition

32 DATASET AND EXPERIMENT SETUP A Simple Idea — One-vs-All Classification Pick a good technique for building binary classifiers. Build N different binary classifiers. For the i’th classifier, let the positive examples be all the points in class i, and let the negative examples be all the points not in class i. Let fi be the i’th classifier. Classify with single classifier is trained per class to distinguish that class from all other classes 32 Different Approaches to Community Evolution Prediction in Blogosphere

33 DATASET AND EXPERIMENT SETUP 33 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7

34 DATASET AND EXPERIMENT SETUP 34 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A

35 DATASET AND EXPERIMENT SETUP 35 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY

36 DATASET AND EXPERIMENT SETUP 36 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2

37 DATASET AND EXPERIMENT SETUP 37 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 B B

38 DATASET AND EXPERIMENT SETUP 38 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY COHISION >0.2 <0.2 B B

39 DATASET AND EXPERIMENT SETUP 39 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8

40 DATASET AND EXPERIMENT SETUP 40 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B

41 DATASET AND EXPERIMENT SETUP 41 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B <0.8 GROUP SIZE

42 DATASET AND EXPERIMENT SETUP 42 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B <0.8 GROUP SIZE <10

43 DATASET AND EXPERIMENT SETUP 43 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B <0.8 GROUP SIZE <10 C C

44 DATASET AND EXPERIMENT SETUP 44 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B <0.8 GROUP SIZE <10 C C >10

45 DATASET AND EXPERIMENT SETUP 45 Different Approaches to Community Evolution Prediction in Blogosphere LeadershipDensityCohesionGroup sizeGroup A 0, A 0, B A C B B C A B B A LEADERSHIP >0.7 A A <0.7 DENSITY <0.2 COHISION >0.2 B B >0.8 B B <0.8 GROUP SIZE <10 C C >10 A A

46 EXPERIMENTS Predicting Group Evolution Using SGCI Results The measure selected is F-measure (AKA F1-measure) – represents accuracy of result program's precision = program's recall = The F measure is: 46 Different Approaches to Community Evolution Prediction in Blogosphere

47 EXPERIMENTS Predicting Group Evolution Using SGCI Results Results of prediction events for different classifiers: Tree classifiers (J48, Random Forest and Simple CART) and Decision Table (rule classifier) achieved the best results. Notably worse results are for Naive Bayes and IBk. 47 Different Approaches to Community Evolution Prediction in Blogosphere

48 EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont. Results of classification for 3 tree classifiers. One can see that results for these 3 classifiers are very similar - the biggest difference is for the decay event which seemed harder to classify. Other events are well classified. Results of event classification for decision tree classifiers 48 Different Approaches to Community Evolution Prediction in Blogosphere

49 Predicting Group Evolution Using SGCI Results – cont. Results of prediction obtained by probabilistic classifiers. BayesNet achieved quite good results, but NaiveBayes much worse. Explenatuon: this classifier is based on assumption of independence features used to classification task. This requirement is not met because some values of one measure are correlated with values of another measure e.g. generally density has higher values for smaller groups. Results of event classification for probabilistic classifiers EXPERIMENTS 49 Different Approaches to Community Evolution Prediction in Blogosphere

50 EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont. Here we can see results for other tested classifiers. Decay event is significantly worse classified than other events (as seen before). The Ibk classifier accomplished worse results of prediction than DecisionTable one. For Ibk classifier the hardest event to classify seemed to be constancy. Results of event classification for other classifiers 50 Different Approaches to Community Evolution Prediction in Blogosphere

51 EXPERIMENTS Predicting Group Evolution Using SGCI Results – cont. Most popular event is the addition event (there is significantly more events of this type than other types of events). This is why this event is very well classified for each tested classifier. The percentage of events in dataset. 51 Different Approaches to Community Evolution Prediction in Blogosphere

52 EXPERIMENTS Predicting Group Evolution Using GED Results F-measure comparison for all event types (classes) and all classifiers. 3 tree classifiers achieved the best results (the worst F-measure value is 0.57 for continuing) From the rest the Decision Table also achieved quite good results. F-measure for each event type (class) and each classifier. 52 Different Approaches to Community Evolution Prediction in Blogosphere

53 EXPERIMENTS Predicting Group Evolution Using GED Results – cont. Results of event classification for decision tree classifiers 53 Different Approaches to Community Evolution Prediction in Blogosphere

54 EXPERIMENTS Predicting Group Evolution Using GED Results – cont. Results of event classification for probabilistic classifiers 54 Different Approaches to Community Evolution Prediction in Blogosphere

55 EXPERIMENTS Predicting Group Evolution Using GED Results – cont. Results of event classification for probabilistic classifiers 55 Different Approaches to Community Evolution Prediction in Blogosphere

56 EXPERIMENTS Predicting Group Evolution Using GED Results – cont. Results of event classification for other classifiers 56 Different Approaches to Community Evolution Prediction in Blogosphere

57 EXPERIMENTS Predicting Group Evolution Using GED Results – cont. each classifier achieves the best results for splitting, merging and dissolving events and the worst for continuing, shrinking and growing. Why? 57 Different Approaches to Community Evolution Prediction in Blogosphere because of uneven distribution of different event types instances

58 EXPERIMENTS The number of splitting events is much higher than for the rest of events probably because the time frame size is too short for the most communities and they continuously splits and merge as service users migrates from one topic to another. For the merging and dissolving events, most classifiers are able to produce very good results, despite the fact that they constitute only a small fraction of all events. 58 Different Approaches to Community Evolution Prediction in Blogosphere

59 DISCUSSION, CONCLUSIONS AND FUTURE WORK The new method for future event prediction based on SGCI algorithm presented with comparison to the method based on GED algorithm. A high level of prediction quality was obtained using both presented methods 59 Different Approaches to Community Evolution Prediction in Blogosphere

60 DISCUSSION, CONCLUSIONS AND FUTURE WORK The best results: In the case of both methods, best results were obtained using different decision tree classifiers. The worst results: In the SGCI method - using Naive Bayesian classifier In GED.- Naive Bayesian and Bayes Network classifiers. 60 Different Approaches to Community Evolution Prediction in Blogosphere

61 THE END Different Approaches to Community Evolution Prediction in Blogosphere 61


Download ppt "D IFFERENT A PPROACHES TO C OMMUNITY E VOLUTION P REDICTION IN B LOGOSPHERE Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław."

Similar presentations


Ads by Google