Published by Zander Bigwood. Modified about 1 year ago.

1
THE COMMUNITY-SEARCH PROBLEM AND HOW TO PLAN A SUCCESSFUL COCKTAIL PARTY
Mauro Sozio and Aristides Gionis
Presented by: Raghu Rangan, Jialiang Bao, Ge Wang

2
Introduction
Graphs are one of the most popular data representations and have a wide range of applications.
Communities and social networks modeled as graphs have gained attention: people are represented as nodes, and connections between people are edges.
This paper focuses on the query-dependent variant of the community-search problem.

3
Planning a Cocktail Party
Participants should be “close” to the organizers (e.g., a friend of a friend).
Everybody should know some of the participants.
The graph should be connected.
The number of participants should be neither too small nor too large.
Satisfying all of these requirements at once is difficult.
[Figure: example social graph with nodes Alice, Bob, Charlie, and David]

4
Community Search Problem
We need to find the community that a given set of users belongs to.
Given a graph and a set of query nodes, find a densely connected subgraph that contains all the query nodes.

5
Related Work
Connectivity subgraphs: prior work finds a subgraph that connects a set of query nodes. This is not enough: we need to extract the best community that the query nodes define.
Community detection: finding communities in large graphs and social networks. The typical approach optimizes a modularity measure. The problem is that most methods consider the static community-detection problem, with no reference to query nodes.

6
Related Work
Team formation: Lappas et al. studied this problem. Given a network whose nodes are labeled with sets of skills, find a subgraph in which all skills are present and the communication cost is small.
A variant of this problem arises in cocktail-party planning.

7
Problem definition
Problem 1: Given an undirected, connected graph G = (V, E), a set of query nodes Q, and a goodness function f, find the densest subgraph H = (V_H, E_H) of G such that:
1. V_H contains Q (all query nodes must be included);
2. H is connected;
3. f(H) is maximized among all feasible choices of H (the larger, the better).

8
Query node and goodness function?
Recall Problem 1: find a connected subgraph H of G that contains Q and maximizes f(H).
What is a query node? It is one of the given nodes whose community we want to find; every query node must be in H.
What is a goodness function? It measures how dense a subgraph is. Two candidates:
Average degree
Minimum degree

9
Why not choose the average degree function?
It leads to unintuitive results.
It makes it easy to add an unrelated but dense part of the graph to the community.
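
The point about average degree can be checked on a toy graph. The sketch below is only an illustration: the graph, the node names, and the helper functions `degrees`, `min_degree`, and `avg_degree` are made up for this example and do not come from the slides. A triangle around the query node "q" is joined by a single edge to a 6-clique that has nothing to do with "q".

```python
from itertools import combinations

def degrees(nodes, edges):
    """Degree of each node in the subgraph induced by `nodes`."""
    nodes = set(nodes)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            deg[u] += 1
            deg[v] += 1
    return deg

def min_degree(nodes, edges):
    """Goodness function: smallest degree in the induced subgraph."""
    return min(degrees(nodes, edges).values())

def avg_degree(nodes, edges):
    """Goodness function: mean degree in the induced subgraph."""
    deg = degrees(nodes, edges)
    return sum(deg.values()) / len(deg)

# Hypothetical toy graph: triangle {q, a, b} plus a 6-clique attached to b.
triangle = [("q", "a"), ("a", "b"), ("b", "q")]
clique_nodes = [f"c{i}" for i in range(6)]
clique = list(combinations(clique_nodes, 2))
edges = triangle + [("b", "c0")] + clique

small = {"q", "a", "b"}              # the intuitive community of q
large = small | set(clique_nodes)    # same, plus the unrelated clique

print("small:", avg_degree(small, edges), min_degree(small, edges))
print("large:", avg_degree(large, edges), min_degree(large, edges))
```

In this example the average degree grows when the unrelated clique is added, while the minimum degree stays the same, so the minimum degree gives no reward for including it.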

10
Problem definition
Problem 2: Given an undirected, connected graph G = (V, E), a set of query nodes Q, a goodness function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
1. V_H contains Q (all query nodes must be included);
2. H is connected;
3. D_Q(H) <= d;
4. f(H) is maximized among all feasible choices of H (the larger, the better).
Compared with Problem 1, we now have a distance constraint.
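
The slide does not spell out how D_Q(H) is computed. The sketch below assumes one possible definition: for each node v, sum the squared shortest-path distances from v to every query node, and take the maximum over all nodes of H. Both this definition and the function names `bfs_distances` and `query_distance` are assumptions made for illustration. The graph is an adjacency dictionary and is assumed to be connected.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def query_distance(adj, query):
    """Assumed D_Q(H): max over nodes v of sum over q in Q of dist(v, q)^2.

    Requires a connected graph; otherwise some distance is undefined.
    """
    per_query = [bfs_distances(adj, q) for q in query]
    return max(sum(d[v] ** 2 for d in per_query) for v in adj)

# Path graph a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(query_distance(path, ["a"]))       # farthest node d is 3 hops away
print(query_distance(path, ["b", "c"]))  # end nodes are 1 and 2 hops away
```

Under this assumed definition, a subgraph satisfies the constraint when `query_distance(adj, Q) <= d`.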

11
Maximizing the minimum degree
Greedy algorithm:
Steps:
1. Set G_0 = G.
2. Delete the minimum-degree node and all its edges; repeat step 2.
Termination condition (either one):
At least one of the query nodes in Q has minimum degree.
The query nodes in Q are no longer connected.
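
The steps above can be sketched as follows. This is a minimal, unoptimized sketch, not the authors' implementation. It assumes that the answer is the best intermediate graph seen during the peeling, restricted to the connected component that contains the query nodes; the slide does not state the output rule, so that part is an assumption, as are the names `component_of` and `greedy_min_degree`.

```python
def component_of(adj, alive, start):
    """Nodes reachable from `start` using only nodes in `alive`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def greedy_min_degree(adj, query):
    """Peel minimum-degree nodes; return (best node set, its min degree)."""
    query = set(query)
    alive = set(adj)
    deg = {v: len(adj[v]) for v in adj}
    best_nodes, best_score = None, -1
    start = next(iter(query))
    while True:
        comp = component_of(adj, alive, start)
        if not query <= comp:
            break  # termination: query nodes are no longer connected
        # Inside a connected component, deg[v] is the degree within comp.
        score = min(deg[v] for v in comp)
        if score > best_score:
            best_nodes, best_score = comp, score
        lowest = min(deg[v] for v in alive)
        if any(deg[q] == lowest for q in query):
            break  # termination: a query node has minimum degree
        victim = next(v for v in alive if deg[v] == lowest)
        alive.remove(victim)
        for w in adj[victim]:
            if w in alive:
                deg[w] -= 1
    return best_nodes, best_score

# Triangle {q, a, b} with a dangling chain b - x - y
g = {"q": ["a", "b"], "a": ["q", "b"], "b": ["q", "a", "x"],
     "x": ["b", "y"], "y": ["x"]}
print(greedy_min_degree(g, ["q"]))
```

This version recomputes the component and the minimum at every step, so it is quadratic in the worst case; it favors readability over speed.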

12
Time complexity?
Greedy can be implemented in linear time. Idea:
1. Keep a separate list of the nodes with degree d, for d = 1, …, n.
2. When a node u is removed from G, each neighbor of u with degree d is moved from list d to list d - 1. The total number of moves is O(m), where m is the number of edges.
3. The minimum-degree node can be located in O(1) time, so the running time is O(n + m).
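
The degree-list idea can be sketched with one bucket per degree. The code below only produces the order in which minimum-degree nodes are peeled and omits the query-node termination checks; the function name `peel_order` and the use of Python sets as buckets are choices made for this illustration. Buckets start at degree 0 so that isolated nodes are handled too.

```python
def peel_order(adj):
    """Repeatedly remove a minimum-degree node.

    Returns a list of (node, degree at removal time) in removal order.
    """
    deg = {v: len(adj[v]) for v in adj}
    max_deg = max(deg.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]  # buckets[d] = nodes of degree d
    for v, d in deg.items():
        buckets[d].add(v)
    removed = set()
    order = []
    d = 0
    for _ in range(len(adj)):
        # After one removal the minimum degree can drop by at most one,
        # so the scan restarts just below the previous minimum.
        d = max(d - 1, 0)
        while not buckets[d]:
            d += 1
        u = buckets[d].pop()
        removed.add(u)
        order.append((u, d))
        for w in adj[u]:
            if w not in removed:
                # Move neighbor w from list deg[w] to list deg[w] - 1.
                buckets[deg[w]].remove(w)
                deg[w] -= 1
                buckets[deg[w]].add(w)
    return order

# Triangle {q, a, b} with a dangling chain b - x - y
g = {"q": ["a", "b"], "a": ["q", "b"], "b": ["q", "a", "x"],
     "x": ["b", "y"], "y": ["x"]}
print(peel_order(g))
```

Each edge causes at most two bucket moves, and the scan pointer moves forward at most about n plus the maximum degree steps in total, which matches the O(n + m) bound on the slide.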

13
Generalization to monotone functions
The minimum degree function is a member of the family of node-monotone functions.
Sometimes we want to use other functions from this family to define density.

14
Problem definition
Problem 3: Given an undirected, connected graph G = (V, E), a set of query nodes Q, a node-monotone function f, and a distance bound d, find the densest subgraph H = (V_H, E_H) of G such that:
1. V_H contains Q (all query nodes must be included);
2. H is connected;
3. D_Q(H) <= d;
4. f(H) is maximized among all feasible choices of H (the larger, the better).
Compared with Problem 2, f is now a node-monotone function.

15
GreedyGen
Greedy algorithm:
Steps:
1. Set G_0 = G.
2. Delete the minimum-degree node.
3. Delete the node v for which f(G, v) is minimum, together with all its edges; repeat step 3.
Termination condition (either one):
At least one of the query nodes in Q has the minimum f(G, v).
The query nodes in Q are no longer connected.
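
A rough sketch of the generalized peeling follows. It takes the per-node function as a parameter, assumes that the value of a subgraph is the minimum per-node value, and leaves out the distance bound to stay short; all three choices are assumptions for illustration. The names `reachable`, `greedy_gen`, and `node_score` are hypothetical.

```python
def reachable(adj, alive, start):
    """Nodes reachable from `start` using only nodes in `alive`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in alive and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def greedy_gen(adj, query, node_score):
    """Peel the node with the smallest node_score(adj, nodes, v).

    Returns (best node set, its value), where the value of a node set is
    assumed to be the minimum node score inside it.
    """
    query = set(query)
    alive = set(adj)
    best_nodes, best_value = None, None
    start = next(iter(query))
    while True:
        comp = reachable(adj, alive, start)
        if not query <= comp:
            break  # termination: query nodes are no longer connected
        scores = {v: node_score(adj, comp, v) for v in comp}
        value = min(scores.values())
        if best_value is None or value > best_value:
            best_nodes, best_value = comp, value
        if any(scores[q] == value for q in query):
            break  # termination: a query node attains the minimum
        victim = min(comp, key=scores.get)
        alive = comp - {victim}  # other components cannot contain Q
    return best_nodes, best_value

def induced_degree(adj, nodes, v):
    """Example node score: degree of v inside `nodes`."""
    return sum(1 for w in adj[v] if w in nodes)

g = {"q": ["a", "b"], "a": ["q", "b"], "b": ["q", "a", "x"],
     "x": ["b", "y"], "y": ["x"]}
print(greedy_gen(g, ["q"], induced_degree))
```

With `induced_degree` as the score, this reduces to the minimum-degree peeling; other node scores can be plugged in without changing the loop.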

16
Communities with Size Restriction
Drawback of the previous algorithms: they may return very large subgraphs.

17
Complexity
Formal definition: maximize the minimum degree subject to an upper bound on the size.
An integer k is given as the size constraint.
The subgraph H must have at most k nodes.
This problem is NP-hard.

18
Algorithm
Two heuristics can be used to find communities with bounded size: GreedyDist and GreedyFast.
Both are inspired by the Greedy algorithm for maximizing the minimum degree.

19
Algorithm GreedyDist
The tighter the distance constraint is, the smaller the communities are.

20
Algorithm GreedyDist
Invoke GreedyGen.
If the query nodes are connected but the size constraint is not satisfied, re-execute GreedyGen with a tighter distance constraint.
Repeat until the size constraint is satisfied or the query nodes are disconnected.
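
The retry loop can be sketched independently of the inner solver. In the code below, `solve` is a hypothetical callable standing in for GreedyGen: it takes a distance bound and returns a set of nodes, or `None` when the query nodes end up disconnected. Tightening the bound by one at each round is also an assumption; the slide only says "tighter".

```python
def greedy_dist(solve, d, k):
    """Re-run `solve` with tighter distance bounds until the size fits.

    solve(d) -> set of nodes, or None if the query nodes are disconnected.
    Returns a community with at most k nodes, or None if none was found.
    """
    while d >= 0:
        community = solve(d)
        if community is None:
            return None          # query nodes are disconnected: give up
        if len(community) <= k:
            return community     # size constraint satisfied
        d -= 1                   # assumed tightening step
    return None

# Stand-in solver for demonstration only: smaller bounds give smaller
# communities, and bounds below 2 disconnect the query nodes.
def fake_solve(d):
    return set(range(3 * d)) if d >= 2 else None

print(greedy_dist(fake_solve, d=4, k=7))
print(greedy_dist(fake_solve, d=4, k=2))
```

Passing the solver in as a parameter keeps the loop itself easy to test and makes no claim about how GreedyGen is implemented.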

21
Algorithm GreedyFast
Preprocessing: the input graph is restricted to the k’ closest nodes to the query nodes.
Execute Greedy on the restricted graph.
Rationale: the closer a node is to the query nodes, the more related it is to them and the more likely it is to belong to their community.
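
The preprocessing step can be sketched with a breadth-first search started from all query nodes at once. Measuring closeness as the hop distance to the nearest query node is an assumption made here; the slide does not say which distance is used. The function names `closest_nodes` and `restrict` are hypothetical, and running Greedy on the result is left out.

```python
from collections import deque

def closest_nodes(adj, query, k_prime):
    """Query nodes plus their nearest neighbors, up to k_prime nodes.

    Nodes are collected in BFS order, so ties at the cut-off distance are
    broken by visiting order. All query nodes are always kept.
    """
    picked = list(dict.fromkeys(query))  # ordered, without duplicates
    picked_set = set(picked)
    queue = deque(picked)
    while queue and len(picked) < k_prime:
        u = queue.popleft()
        for w in adj[u]:
            if w not in picked_set:
                picked_set.add(w)
                picked.append(w)
                queue.append(w)
                if len(picked) == k_prime:
                    break
    return picked_set

def restrict(adj, nodes):
    """Adjacency of the subgraph induced by `nodes`."""
    return {v: [w for w in adj[v] if w in nodes] for v in nodes}

# Path graph a - b - c - d - e
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d"]}
kept = closest_nodes(path, ["a"], k_prime=3)
print(kept, restrict(path, kept))
```

The restricted adjacency can then be handed to whatever implementation of Greedy is in use.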

22
Experiment Evaluation
DBLP: a coauthorship graph extracted from a recent snapshot of the DBLP database; 226K nodes, 1.4M edges.
Tag: a tag graph extracted from the Flickr photo-sharing portal; 38K nodes, 1.3M edges.
BIOMINE: a graph extracted from the database of the Biomine project; 16K nodes, 491K edges.

23
Quantitative Results
BASELINE: a simple and natural baseline algorithm
|Q|: the number of query nodes
d: distance bound
k: size bound
l: inter-distance between query nodes

24
Quantitative Results

26
Conclusion
The aim is to find a compact, densely connected community that contains the given query nodes.
Measurement is based on constraints: minimum degree, distance, and size.
Heuristics: GreedyGen, GreedyDist, GreedyFast.

27
Questions?
