Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 2.5K-Graphs: from Sampling to Generation Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou UC Irvine, ETZH ‡

Similar presentations


Presentation on theme: "1 2.5K-Graphs: from Sampling to Generation Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou UC Irvine, ETZH ‡"— Presentation transcript:

1 1 2.5K-Graphs: from Sampling to Generation Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou UC Irvine, ETZH ‡

2 Motivation Generate synthetic topologies that resemble real ones – development and testing of algorithms, protocols – anonymization Obtaining full topology is not always possible – data owner not willing to release it – exhaustive measurement impractical due to size 2

3 Our methodology 3 Node Samples Synthetic Graph Real Graph ( unknown ) Crawl using UIS, WIS, RW Estimated Graph Properties Estimate Generate

4 Which graph properties? dk-series framework [Mahadevan et al, Sigcomm ’06] – 0K specifies the average node degree – 1K specifies the node degree distribution – 2K specifies the joint node degree distribution (JDD) – 3K specifies the distribution of subgraphs of 3 nodes –.. – nK specifies the entire graph Accuracy vs practicality 4 2.5K

5 2.5K-Graphs Joint Node Degree Distribution (2K): Degree-dependent Average Clustering Coefficient Distribution: 5 all triangles using node u all possible triangles 1a 2a 4a 3b 3a 1b 4b k all nodes of degree k c 3a = 2 3

6 Related Work Network Modeling – Exponential Random Graph Models [Handcock et al, ‘10] – Kronecker graphs [Kim & Leskovec, ‘11] – dk-Series [Mahadevan et al, Sigcomm ’06] Graph Generation – Simple 1K graphs [Molloy & Reed ’95] – 1K + [Bansal et. al. ’09] – 1K + Triangle Sequence [Newman ’09] – 1K + [Serrano & Boguna, ’05] – 2K [Krioukov et. al. ‘06, Stanton & Pinar ’11] – 3K [Krioukov et. al. ’06] 6 2.5K

7 7 Node Samples Synthetic Graph Real Graph (unknown) Crawl using UIS, WIS, RW Estimated c(k) & JDD Estimate Generate

8 Estimation Node Samples 8 UIS – Uniform Independence Sample WIS – Weighted Independence Sample sampling probability w(v) proportional to node degree RW – Random Walk Sample

9 Estimation c(k) 9 all triangles using node a all possible triangles UIS WIS - all nodes of degree k - neighbors of node a - # shared partners for nodes (a,b) - all sampled nodes of degree k - number of samples for node b - sampling weight of node a when ((RW)

10 Estimation JDD 10 UIS WIS - all nodes of degree k - all edges - all sampled nodes of degree k - sampling weight of node a when (RW)

11 Estimation Random Walks RW Traversed Edges Induced Edges Induced Edges with safety margin M=1 Hybrid estimator combines Traversed Edges + Induced Edges with margin M with safety margin M=0 = WIS

12 12 Node Samples Real Graph (unknown) Crawl using UIS, WIS, RW Estimated c(k) & JDD Generate Estimate Synthetic Graph

13 Node Samples 2.5K Graph Real Graph (unknown) Crawl using UIS, WIS, RW Estimate Target c(k) Target JDD simple graph exact JDD many triangles 2K Construction Smart Double Edge Swaps Generate

14 2K Construction (1) Step K k k# K

15 2K Construction (2) Step k# vk v 1a 1b 2a 3a 3b 4a 4b 1a1a 2a2a 4a4a 3b3b 3a3a 1b1b 4b4b 2K 1K

16 2K Construction (1) Step a 2a 4a 3b 3a 1b 4b v k v r v 1a 1b 2a 3a 3b 4a 4b a 2a b 4b 90 1b 4a 70 3a k

17 2K Construction (2) Step a 2a 4a 3b 3a 1b 4b v k v r 1a 1b 2a 3a 3b 4a 4b (3b,4b) (1a,4a) (1a,2a) a 2a b 4b 90 1b 4a 70 3a k (1b,2a) (1b,4b) (2a,4b) (2a,3b) (1b,3a) (2a,4a) (1a,3b) (1b,3b) (4a,4b) (3b,4a)(1a,4b) (2a,3a) (3a,3b) (3a,4b) (3a,4a) (1a,1b)(1b,4a) (1a,3a)

18 2K Construction Step a 2a 4a 3b 3a 1b 4b v k v r 1a 1b 2a 3a 3b 4a 4b k b 4a 3a 3b k=3 k=4 Degree pair (4,3) Construction stuck. Iterate over all unsaturated degree pairs Iterate over all unsaturated node pairs for the degree pair (4,3) Move 1: Add edge (4b, 3a) Move 2: Add edge (4b,3*) or (3a,4*) Move 3: Add edge (4*, 3*)

19 Node Samples 2.5K Graph Real Graph (unknown) Crawl using UIS, WIS, RW Estimate Target c(k) Target JDD simple graph exact JDD many triangles 2K Construction Smart Double Edge Swaps Generate

20 “Smart” double edge swaps 2K-preserving c(k)-targeting 20 degree k 1 Target Destroy triangles Create triangles k1 k2 k1 k3 Double edge swap [ Mahadevan et. al., Sigcomm ‘06 ] How we select edges for double edge swaps – Triangle creation » select edges with low number of shared partners – Triangle destruction » select random edges Current at end of 2K JDD(k1,k2) JDD(k1,k3) unchanged

21 21 Node Samples Real Graph (unknown) Crawl using UIS, WIS, RW Generate Estimate Evaluation 2.5K Graph Estimated c(k) & JDD

22 Evaluation Total Number of Triangles Average Clustering Coefficient Number of Nodes Number of Edges Average Degree

23 Evaluation Efficiency of Estimators (RW) 23 Error Metric : Hybrid combines advantages of IE & TE Facebook New Orleans

24 Evaluation Speed of 2.5K Graph Generation 24 our 2.5K prior 2.5K Higher gain for graphs with more triangles

25 Evaluation From Sampling to Generation 25 Facebook New Orleans

26 Conclusion Complete and practical methodology to generate synthetic graphs from node samples – novel estimators that measure c(k) and JDD – novel and fast 2.5K generator Python implementation – 26 Node Samples Synthetic Graph Real Graph (unknown) Crawl using UIS, WIS, RW c(k) & JDD Estimate Generate


Download ppt "1 2.5K-Graphs: from Sampling to Generation Minas Gjoka, Maciej Kurant ‡, Athina Markopoulou UC Irvine, ETZH ‡"

Similar presentations


Ads by Google