A Framework For Community Identification in Dynamic Social Networks Chayant Tantipathananandh Tanya Berger-Wolf David Kempe Presented by Victor Lee.

Outline of Presentation –The Challenge: Dynamic Social Networks –Framework and Problem Formulation –Individual and Group Colorings –Group Coloring Heuristics –Experimental Results –Future Directions

The Problem Many well-known approaches to identify communities in social networks –Graph Partitioning –Clustering –Various measures of closeness or density But, these approaches generally assume static networks Most social networks are dynamic

Dynamic Social Networks Social Networks change over time –Membership changes –Interaction changes Most community identification techniques: –Use a single snapshot –Or use time-averaged measurements –Lose important information

Importance of Dynamic Information Networks 1 and 2: same average characteristics, but… –Network 1 shows an oscillation –Network 2 suggests that C joins the community [Figure: groups observed at times T1 to T6. Network 1: AB, ABC, AB, ABC, AB, ABC. Network 2: AB, AB, AB, ABC, ABC, ABC.]
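A minimal sketch of why time-averaged measurements lose this information. The 0/1 encoding of whether C is observed with A and B at each time step, and the helper names, are assumptions made for illustration.

```python
# 1 if C is observed together with A and B at that time step, else 0 (T1..T6).
network_1 = [0, 1, 0, 1, 0, 1]  # AB, ABC, AB, ABC, AB, ABC
network_2 = [0, 0, 0, 1, 1, 1]  # AB, AB, AB, ABC, ABC, ABC


def time_average(presence):
    """Fraction of time steps in which C is observed with A and B."""
    return sum(presence) / len(presence)


def transitions(presence):
    """Number of times C's membership flips between consecutive steps."""
    return sum(a != b for a, b in zip(presence, presence[1:]))


print(time_average(network_1), time_average(network_2))  # 0.5 0.5
print(transitions(network_1), transitions(network_2))    # 5 1
```

Both networks have the same average, but the number of transitions separates the oscillation from the single join.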

Proposal New framework for modeling social networks over time Algorithms and Heuristics to identify dynamic communities Experiments to verify the concept and the computational performance

Problem Formulation Given: –A set of individuals –A sequence of snapshot observations Find: –A best-fit set of time-varying communities C(t) –Best-fit time-varying community membership for each individual Approach: –Combinatorial optimization –Graph coloring

Model: Individuals and Groups Set of individuals X = {i_1, i_2, …, i_n} Sequence of observations –Discrete time –Record interaction between individuals The set of individuals interacting at time t defines a group. –If A interacts with B, and B interacts with C, then {A, B, C} ⊆ a group

Group vs Community Snapshot Graph –Individual is a vertex –Interaction is an edge –Group is a connected component –Assumption: interaction is sufficiently limited so that the graph is not connected (we have disjoint groups) Group ≠ Community –Groups capture observed interaction at a point in time –Communities extend over time
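A short sketch of how the groups of one snapshot could be extracted as connected components of the interaction graph. The function name and the edge-list input format are assumptions for illustration, not part of the presentation.

```python
from collections import defaultdict


def groups_from_interactions(individuals, interactions):
    """Return the groups (connected components) of one snapshot graph."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, groups = set(), []
    for start in individuals:
        if start in seen:
            continue
        # Depth-first search collects everyone reachable from `start`.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(frozenset(component))
    return groups


# A interacts with B and B with C, so {A, B, C} ends up in one group.
snapshot = groups_from_interactions("ABCDE", [("A", "B"), ("B", "C"), ("D", "E")])
print(snapshot)
```

In this sketch an individual with no recorded interaction forms a singleton group.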

Graphing the Observations Each time slice is one observation Edges within a time slice show observed interaction at time t Add edges joining all observations of the same individual No edges between groups from one time to another ○ = individual □ = group

Refine the Problem A community appears as a sequence of groups, of at most one group per time slice. Tasks: –Assign each group to a community (color the group vertices) –Assign each individual to a community, for each time step (color individual vertices) More Assumptions: –Individuals belong to one community at a time –Individuals don’t change community frequently –Individuals frequently appear in their community

Cost Model Quantify a “good” community identification Assign costs to undesirable behavior: –I-cost: α when an individual changes color. –G-costs: β₁ when an individual is absent from its community. β₂ when an individual is present in a different community. –C-cost: γ for each color that an individual uses Find a coloring with minimum cost

Coloring Choices and Costs Coloring 1: C changes community and then changes back. –Cost = 2α (+ γ if this color hasn’t been used before) Coloring 2: C stays in its original community and just visits. –Cost = β₁ + β₂ The optimal coloring depends on whether (β₁ + β₂) < (2α + γ), or (2α) if the color was already in use. [Figure: Coloring 1 and Coloring 2 of the same observations over times T1 to T4.] At time T3, C temporarily changes its interaction.

Finding Optimal Colorings Finding the optimal solution is NP-hard Partition the problem: 1. Find an optimal set of communities 2. Find optimal assignment of individuals to communities If Phase 1 (Group Coloring) is completed first: –Phase 2 is reduced from O(2^N) to O(2^G), N = # of individuals, G = # of groups –The cost incurred by one individual’s coloring is independent of the colors chosen by others.

Independence of Individual Color Choice Proof: Cost of an individual’s behavior = I-cost + G-cost + C-cost Costs are assessed individually: –I-cost = α ∗ (# of color changes) –G-cost = β₁ ∗ (# absences from its group) + β₂ ∗ (# visits to other groups) –C-cost = γ ∗ (# of colors that an individual uses) So, we can solve for each individual one at a time. Moreover, we can assess cost incrementally, from time t to time t+1…
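The per-individual cost can be sketched as a single function. The parameter names alpha, beta1, beta2, gamma for the four cost weights and the list-based encoding are assumptions for illustration. The sketch also assumes an absence is charged whenever the individual is not seen in a group of its own color, and a visit whenever it is seen in a group of another color.

```python
def individual_cost(colors, observed, alpha, beta1, beta2, gamma):
    """Total cost of one individual's coloring.

    colors[t]   : community (color) assigned to the individual at step t
    observed[t] : color of the group the individual was seen in, or None
    """
    # I-cost: one charge per color change between consecutive steps.
    i_cost = alpha * sum(a != b for a, b in zip(colors, colors[1:]))
    # G-cost: absences from the own community plus visits to another one.
    absences = sum(c != g for c, g in zip(colors, observed))
    visits = sum(g is not None and c != g for c, g in zip(colors, observed))
    g_cost = beta1 * absences + beta2 * visits
    # C-cost: one charge per distinct color used.
    c_cost = gamma * len(set(colors))
    return i_cost + g_cost + c_cost


# An individual seen in a "red" group except at the third step ("blue").
observed = ["red", "red", "blue", "red"]
switching = individual_cost(["red", "red", "blue", "red"], observed, 1, 0, 1, 1)
visiting = individual_cost(["red", "red", "red", "red"], observed, 1, 0, 1, 1)
print(switching, visiting)  # 4 2
```

Because the function only reads this individual's colors and the already fixed group colors, it can be evaluated for each individual separately.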

Individual Coloring Algorithm C = set of all colors observed to be used by an individual i P(t) = {S ⊆ C: 1 ≤ |S| ≤ t}, all possible sets of colors used up to time t G(t, x) = G-cost to use color x at time t I(t, y, x) = I-cost to use color y at time t−1 and color x at time t C(x, R) = C-cost to use color x when color set R has been used M(t, S, x) = min. cost up to time t, using color x at time t, with color set S used: At time 1: M(1, {x}, x) = G(1, x) At time t: M(t, S, x) = G(t, x) + min [ M(t−1, R, y) + I(t, y, x) + C(x, R) ] over all R and y, where R ∈ P(t−1), y ∈ R, R ∪ {x} = S i-cost: changing color g-cost: wrong group c-cost: new color

Optimal Individual Coloring Given a group coloring, the minimum cost of coloring the individual i is min M(T, S, x) over S ∈ P(T), x ∈ S Time complexity is O(n T |C|² 2^|C|) Space requirement is O(|C| 2^|C|) If the number of colors |C| is not large, the computation is tractable.
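A sketch of the dynamic program for one individual, following the recurrence above. The function name, the cost-weight names alpha, beta1, beta2, gamma, and the encoding of observations are assumptions for illustration. As in the base case above, the first color is not charged a C-cost, which only shifts every total by a constant.

```python
def optimal_individual_cost(observed, palette, alpha, beta1, beta2, gamma):
    """Minimum cost of coloring one individual, by dynamic programming.

    observed[t] : color of the group the individual was seen in at step t
                  (None if unobserved); must be non-empty
    palette     : the candidate colors C for this individual
    """
    def g_cost(t, x):
        g = observed[t]
        return beta1 * (x != g) + beta2 * (g is not None and x != g)

    # best[(S, x)] = min cost so far, having used color set S and ending in x.
    best = {(frozenset([x]), x): g_cost(0, x) for x in palette}
    for t in range(1, len(observed)):
        nxt = {}
        for (used, y), cost in best.items():
            for x in palette:
                step = cost + g_cost(t, x)
                step += alpha * (x != y)         # i-cost: color change
                step += gamma * (x not in used)  # c-cost: new color
                key = (used | {x}, x)
                if step < nxt.get(key, float("inf")):
                    nxt[key] = step
        best = nxt
    return min(best.values())


# Seen in "red" groups except for one step in a "blue" group.
print(optimal_individual_cost(["red", "red", "blue", "red"],
                              ["red", "blue"], 1, 0, 1, 1))
```

The table holds at most |C| · 2^|C| states per time step, which matches the space bound stated above.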

Optimal Group Coloring Determine the best mapping of groups at time t to groups at time t+1 Groups that are mapped across time are part of the same community and have the same color A coloring is good if most individuals can retain their color from step to step. A possible coloring

Bipartite Matching Heuristic Matching Graph –For each pair of groups g, g′ at times t, t′ = t+1, add a weighted edge from v_{g,t} to v_{g′,t′} –Weight = |g ∩ g′| (similarity of g to g′) Find the maximum weight bipartite matching Evaluation –Weights i-cost more than g-cost –Performs well if membership is fairly stable –No long range perspective –More efficient heuristics? i-cost: changing color g-cost: wrong group c-cost: new color
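A small sketch of the matching step between two consecutive time steps. To stay self-contained it finds the maximum weight matching by brute force over permutations, which is an assumption made for illustration and is only practical for a handful of groups; the function name is hypothetical.

```python
from itertools import permutations


def max_weight_matching(groups_t, groups_next):
    """Match groups at time t to groups at time t+1, maximizing total overlap."""
    n = max(len(groups_t), len(groups_next))
    # Pad the shorter side with empty groups, which contribute weight 0.
    left = list(groups_t) + [frozenset()] * (n - len(groups_t))
    right = list(groups_next) + [frozenset()] * (n - len(groups_next))

    best_weight, best_perm = -1, None
    for perm in permutations(range(n)):
        weight = sum(len(left[i] & right[j]) for i, j in enumerate(perm))
        if weight > best_weight:
            best_weight, best_perm = weight, perm

    # Keep only matched pairs that actually share members.
    pairs = [(i, j) for i, j in enumerate(best_perm) if left[i] & right[j]]
    return best_weight, pairs


weight, pairs = max_weight_matching(
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D", "E"}],
)
print(weight, pairs)  # 4 [(0, 0), (1, 1)]
```

Matched groups would then receive the same color, so that most individuals keep their color from one step to the next.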

Greedy Heuristics for Group Coloring Approach: Maximize pairwise similarity between groups, for all pairs of groups over all timesteps Jaccard’s index: Jac(g, g′) = |g ∩ g′| / |g ∪ g′| (overlap between g and g′, scaled to size of g and g′) Weighted for temporal proximity: JacD(g, g′) = Jac(g, g′) / |t − t′|
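The two similarity measures can be sketched directly. The function names are assumptions, and groups are represented as Python sets of individuals observed at distinct time steps.

```python
def jaccard(g, g2):
    """Jaccard's index: size of the overlap divided by size of the union."""
    union = g | g2
    return len(g & g2) / len(union) if union else 0.0


def jaccard_time_weighted(g, t, g2, t2):
    """Jaccard's index divided by the time distance between the two groups."""
    if t == t2:
        raise ValueError("groups must come from different time steps")
    return jaccard(g, g2) / abs(t - t2)


print(jaccard({"A", "B", "C"}, {"B", "C", "D"}))                      # 0.5
print(jaccard_time_weighted({"A", "B", "C"}, 1, {"B", "C", "D"}, 3))  # 0.25
```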

Greedy Heuristics for Group Coloring Greedy Heuristic 1 (time is not a factor) –Construct a square similarity matrix of size (# of groups) × (# of groups) –Use agglomerative clustering Greedy Heuristic 2 (look backwards in time) For t = 1 to T do –Match most similar pairs g, g′ for any time t′ < t –If similarity = 0 or all colors have been used, add a new color Greedy Heuristic 3 (look back the shortest interval) –Like Heuristic 2, but use only the t′ closest to t for which some similarity(g, g′) > 0
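One possible reading of Greedy Heuristic 2, sketched under stated assumptions: similarity is the plain Jaccard index, each color is used at most once per time step, and ties are broken by input order. The function name and the nested-list input format are hypothetical.

```python
def greedy_backward_coloring(groups_by_time):
    """Color groups step by step, reusing the color of the most similar
    earlier group; unmatched groups receive a new color."""
    colored = []    # (group, color) for every group from earlier time steps
    coloring = []   # coloring[t][k] = color of the k-th group at step t
    next_color = 0
    for groups in groups_by_time:
        # Collect every (similarity, group index, earlier color) candidate.
        candidates = []
        for k, g in enumerate(groups):
            for g_prev, color in colored:
                union = g | g_prev
                sim = len(g & g_prev) / len(union) if union else 0.0
                if sim > 0:
                    candidates.append((sim, k, color))
        candidates.sort(key=lambda c: -c[0])  # most similar pairs first

        assigned, used = {}, set()
        for sim, k, color in candidates:
            if k not in assigned and color not in used:
                assigned[k] = color
                used.add(color)
        # Similarity 0 to everything, or all matching colors taken: new color.
        for k in range(len(groups)):
            if k not in assigned:
                assigned[k] = next_color
                next_color += 1

        coloring.append([assigned[k] for k in range(len(groups))])
        colored.extend((g, assigned[k]) for k, g in enumerate(groups))
    return coloring


steps = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}],
]
print(greedy_backward_coloring(steps))  # [[0, 1], [0, 1], [0, 1]]
```

Restricting the candidates to the nearest earlier time step with a nonzero similarity would give a variant in the spirit of Heuristic 3.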

Experiment 1: Verify the Framework Does the framework capture the intuitive concept of dynamic community? Procedure –Construct small, synthetic datasets –Use exhaustive search to get a truly optimal coloring

Experiment 1A: “Assembly Line” At each time step, 1 member leaves and 1 enters a group, resulting in a complete membership change in 3 steps. Results change as costs change. (A) (α, β₁, β₂, γ) = (1, 0, 1, 1) favors stable membership. (B) (α, β₁, β₂, γ) = (1, 0, 3, 1) allows for more fluid membership.

Experiment 1B: “Dutiful Children” 2, 3, and 4 are Children. 0 and 1 are Parents that visit a different child each timestep. Results: Framework succeeds at detecting the individual children as well as the visitation pattern. (A) (α, β₁, β₂, γ) = (1, 0, 1, 1) (B) (α, β₁, β₂, γ) = (1, 0, 3, 1)

Experiment 2: Quality of Heuristic Results Do the heuristics obtain colorings similar to those of an exhaustive search? Procedure –Re-test the synthetic datasets using the various heuristics Results: At least one Heuristic method obtains the same coloring and total cost as Exhaustive Search

Experiment 3: Real World Datasets Do the framework and heuristics together obtain expected results using real-world datasets?

Experiment 3A: “Southern Women” Eighteen women in 1933 in Natchez, Mississippi Tracks their attendance at 14 social events

Experiment 3A: Prior Results Twenty-one analyses (1941 to 2001) all show similar results –Two clear communities –The membership of individuals 8, 9, and 16 is less certain.

Experiment 3A: Results Detects 4 communities, which are subsets of the traditional 2 communities Individuals 6 and 10 change membership over time By adjusting cost factors, the results of most of the 21 prior analyses can be duplicated (α, β₁, β₂, γ) = (1, 1, 1, 1)

Experiment 3B: “Grevy’s Zebra” 28-member zebra herd observed 44 times over 3 months in 2002 The graph to the left shows the aggregate interaction. Temporal information is lost.

Experiment 3B: Results Inferred communities agree with manual results obtained by biologists. –4 stable communities –Some short-lived communities and some visiting

Conclusions We present a framework for identifying communities in dynamic social networks The framework produces meaningful results compared to traditional methods Heuristic methods produce near-optimal solutions Future Directions –Develop an approximation algorithm which guarantees the quality of the result –Investigate scalability over network size and time –Relax assumptions about interaction and dynamics
