Searching for large cliques in large scale networks
Workshop on Clustering and Search Techniques in Large Scale Networks (3-8 Nov. 2014)


1 Searching for large cliques in large scale networks
Workshop on Clustering and Search Techniques in Large Scale Networks (3-8 Nov. 2014)
Pablo San Segundo Carrillo (Associate Professor at UPM)
This work is partially funded by the Spanish National Government (DPI2010-21247-C02) and CAR (UPM-CSIC).

2 Overview
- Basic concepts related to exact (large) clique search
  - Enumeration
  - Pruning scheme: greedy sequential coloring
- K-core analysis
  - An O(|E|) algorithm
- Bit string encoding of graphs
  - BITSCAN / GRAPH C++ libraries
  - Encoding of sparse graphs
- BBMCS: a new maximum clique algorithm for large scale networks
  - Pseudocode
  - Results
- Summary

3 Basic clique enumeration
[Figure: binomial search tree (with repetitions) over vertices 1 to 4; nodes are labeled with candidate sets such as {2,3,4}, {1,4} and {4}]
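The enumeration idea can be illustrated with a minimal sketch. It assumes a small graph stored as a boolean adjacency matrix; the names `Graph`, `expand` and `max_clique_enum` are hypothetical and not taken from the slides. Unlike the tree "with repetitions" shown on the slide, this version only extends a clique with later candidates, so each clique is visited once.

```cpp
#include <cstddef>
#include <vector>

// Adjacency matrix of a small undirected graph: g[u][v] is true iff {u, v} is an edge.
using Graph = std::vector<std::vector<bool>>;

// Extends the clique `current` with vertices from `candidates` (every candidate
// is adjacent to all vertices of `current`) and records the largest clique seen.
void expand(const Graph& g, std::vector<int>& current,
            const std::vector<int>& candidates, std::vector<int>& best) {
    if (current.size() > best.size()) best = current;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int v = candidates[i];
        // Child candidate set: later candidates that are neighbors of v.
        std::vector<int> next;
        for (std::size_t j = i + 1; j < candidates.size(); ++j)
            if (g[v][candidates[j]]) next.push_back(candidates[j]);
        current.push_back(v);
        expand(g, current, next, best);
        current.pop_back();
    }
}

// Exhaustive search with no pruning: exponential, for illustration only.
std::vector<int> max_clique_enum(const Graph& g) {
    std::vector<int> all(g.size()), current, best;
    for (std::size_t v = 0; v < g.size(); ++v) all[v] = static_cast<int>(v);
    expand(g, current, all, best);
    return best;
}
```

Without a bound the search tree grows exponentially, which is what the coloring-based pruning of the next slides addresses.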

4 Basic pruning scheme: greedy coloring (I)
SEQ: greedy coloring procedure
1. Define a vertex ordering
2. Color vertices sequentially with the least possible color
Proposition 1 (Balas & Yu, 1986): The size of any feasible coloring C(G) is an upper bound on the size of a maximum clique in G (ω(G) ≤ |C(G)|).
How to define a good ordering?
[Figure: example graph on vertices 1 to 5 with color classes C1 = {1, 3}, C2 = {2, 4}, C3 = {5}]
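The two steps of the SEQ procedure can be sketched as follows. This is a minimal illustration, assuming a boolean adjacency matrix and a caller-supplied vertex order; `greedy_coloring` and `color_bound` are hypothetical names, not part of any library mentioned in the slides.

```cpp
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<bool>>;

// Colors the vertices in the given order, assigning each one the smallest
// color (0, 1, 2, ...) not used by an already colored neighbor.
// Vertices that do not appear in `order` keep the value -1.
std::vector<int> greedy_coloring(const Graph& g, const std::vector<int>& order) {
    std::vector<int> color(g.size(), -1);
    for (int v : order) {
        std::vector<bool> used(order.size() + 1, false);
        for (int u : order)
            if (g[v][u] && color[u] >= 0) used[color[u]] = true;
        int c = 0;
        while (used[c]) ++c;
        color[v] = c;
    }
    return color;
}

// Number of colors used; by Proposition 1 this bounds the clique number from above.
int color_bound(const std::vector<int>& color) {
    int k = 0;
    for (int c : color)
        if (c + 1 > k) k = c + 1;
    return k;
}
```

The bound depends on the vertex order passed in, which is why the choice of ordering matters.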

5 Basic pruning scheme: greedy coloring (II)
Search node at depth level k: is it worth selecting vertex 1 as candidate?
Application of color bound (figure labels: U', G[U], size of current growing clique, size of current champion)
Since the current largest clique cannot be improved, vertex 1 is pruned.
[Figure: search node at depth level k over candidate vertices 1, 2, 3 and 4]
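A common way to apply the color bound inside the search is sketched below. It follows the general pattern of coloring-based branch and bound (candidates are colored greedily and visited from the highest color down). It is not claimed to be the exact procedure of the slide, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<bool>>;

// Greedy coloring of the candidates; returns (color, vertex) pairs with
// 1-based colors, sorted by non-decreasing color.
static std::vector<std::pair<int, int>> color_sort(const Graph& g,
                                                   const std::vector<int>& cand) {
    std::vector<std::pair<int, int>> out;
    for (int v : cand) {
        int c = 1;
        bool clash = true;
        while (clash) {  // try colors 1, 2, ... until one has no neighbor of v
            clash = false;
            for (const auto& p : out)
                if (p.first == c && g[v][p.second]) { clash = true; ++c; break; }
        }
        out.push_back({c, v});
    }
    std::sort(out.begin(), out.end());
    return out;
}

static void branch(const Graph& g, std::vector<int>& cur,
                   const std::vector<int>& cand, std::vector<int>& best) {
    const auto colored = color_sort(g, cand);
    for (std::size_t i = colored.size(); i-- > 0;) {
        const int c = colored[i].first, v = colored[i].second;
        // Color bound: v and the candidates before it fit in c color classes,
        // so no clique grown from here can exceed |cur| + c vertices.
        if (cur.size() + static_cast<std::size_t>(c) <= best.size()) return;
        std::vector<int> next;
        for (std::size_t j = 0; j < i; ++j)
            if (g[v][colored[j].second]) next.push_back(colored[j].second);
        cur.push_back(v);
        if (cur.size() > best.size()) best = cur;
        branch(g, cur, next, best);
        cur.pop_back();
    }
}

std::vector<int> max_clique_color(const Graph& g) {
    std::vector<int> all(g.size()), cur, best;
    for (std::size_t v = 0; v < g.size(); ++v) all[v] = static_cast<int>(v);
    branch(g, cur, all, best);
    return best;
}
```

The `return` inside the loop is the pruning step: once the bound fails for one candidate, it also fails for every candidate with a lower color.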

6 Initial sorting of nodes for maximum clique
How should vertices be sorted initially?
Proposition II: Initially vertices should be sorted in non-decreasing degree order.
- Absolute
- Degenerate
  - At each step each selected vertex is removed from the original graph and degrees are recomputed
Example (figure vertex labels: 1 2 1 3 1):
Absolute: 0 (1), 2 (1), 3 (1), 4 (2), 1 (3)
Degenerate: 0 (1), 2 (1), 3 (1), 1 (1), 4 (1)
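The degenerate ordering can be sketched as repeated removal of a minimum-degree vertex. The version below is a simple quadratic illustration on an adjacency matrix; the name `degenerate_order` and the tie-breaking rule (smallest index first) are assumptions, so its output on a given graph may differ from the slide's example where ties occur.

```cpp
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<bool>>;

// Returns the vertices in the order in which they are removed: at each step
// the remaining vertex of minimum current degree is selected (ties broken by
// smallest index), removed, and the degrees of its neighbors are updated.
std::vector<int> degenerate_order(const Graph& g) {
    const std::size_t n = g.size();
    std::vector<int> deg(n, 0), order;
    std::vector<bool> removed(n, false);
    for (std::size_t u = 0; u < n; ++u)
        for (std::size_t v = 0; v < n; ++v)
            if (g[u][v]) ++deg[u];
    for (std::size_t step = 0; step < n; ++step) {
        int pick = -1;
        for (std::size_t v = 0; v < n; ++v)
            if (!removed[v] && (pick < 0 || deg[v] < deg[pick]))
                pick = static_cast<int>(v);
        removed[pick] = true;
        order.push_back(pick);
        for (std::size_t v = 0; v < n; ++v)
            if (!removed[v] && g[pick][v]) --deg[v];
    }
    return order;
}
```

An absolute ordering, by contrast, would sort once by the degrees of the original graph and never update them.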

7 State of the Art (last decade): middle size graphs
- MCQ: Tomita & Seki 2003
  - Heuristic decision based on color
- MaxClique-Dyn: Konc & Janezic 2007
- MCS: Tomita & al. 2011
- BBMC: San Segundo & al. 2011
  - Use of bitstrings
  - Initial order of vertices fixed
- BBMCL: San Segundo & al. 2013
- Impact of an initial large clique: Batsyn & al. 2013
- MaxSAT: Li & al. 2010, 2013
- BBMCX: San Segundo, Batsyn, Nikolaev 2014
- Initial sorting improvements: San Segundo, Batsyn, Nikolaev 2014
- Vertical coloring: Nikolaev, Batsyn, San Segundo 2014
REAL GRAPHS

8 K-CORE DECOMPOSITION

9 Preliminaries
Definition I (k-core of a graph): A maximal subgraph in which every vertex has degree at least k.
Definition II (core number K(v) of a vertex): The largest k such that the vertex belongs to a k-core.
Proposition III: The k-core decomposition is hierarchical.
Proposition IV: The core number of a graph plus 1 is an upper bound for the maximum clique (ω(G) ≤ K(G)+1).
[Figure: example graph with nested 0-core, 1-core, 2-core and 3-core; vertex labels 1 1 1 1 1; degenerate ordering 0 (1), 2 (1), 3 (1), 1 (1), 4 (1)]

10 Quality of core number bounds for clique
Proposition V: ω(G) ≤ |C(G)| ≤ K(G)+1 ≤ Δ_G+1
Proposition VI (Batagelj & Zaversnik 2002): There exists an O(|E|) algorithm to compute the k-core decomposition.
Sketch of proof:
I. Order vertices by degree using bin-sort.
II. Critical operation: reduce the degree of a vertex keeping all vertices sorted by degree. Swap the vertex with the first vertex in the same bin and increment the bin pointer by one.
[Figure: bins of degree 0, 1, 2 over the sorted vertex array 0 2 3 5 1 4, which becomes 0 2 3 5 4 1 after the swap]
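The bin-sort scheme in the proof sketch can be written out as below. This is an illustrative rendering of the well-known Batagelj and Zaversnik procedure on adjacency lists; the function name and the array names (`bin`, `pos`, `vert`) are choices made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Core number of every vertex in O(|V| + |E|); adj[v] lists the neighbors of v.
std::vector<int> core_numbers(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> deg(n), pos(n), vert(n);
    int maxdeg = 0;
    for (int v = 0; v < n; ++v) {
        deg[v] = static_cast<int>(adj[v].size());
        if (deg[v] > maxdeg) maxdeg = deg[v];
    }
    // Step I: bin-sort the vertices by degree. bin[d] = start of bin d in `vert`.
    std::vector<int> bin(maxdeg + 1, 0);
    for (int v = 0; v < n; ++v) ++bin[deg[v]];
    int start = 0;
    for (int d = 0; d <= maxdeg; ++d) {
        const int count = bin[d];
        bin[d] = start;
        start += count;
    }
    for (int v = 0; v < n; ++v) {
        pos[v] = bin[deg[v]];
        vert[pos[v]] = v;
        ++bin[deg[v]];
    }
    for (int d = maxdeg; d >= 1; --d) bin[d] = bin[d - 1];
    bin[0] = 0;
    // Step II: process vertices in sorted order, lowering neighbor degrees.
    for (int i = 0; i < n; ++i) {
        const int v = vert[i];
        for (int u : adj[v]) {
            if (deg[u] > deg[v]) {
                const int du = deg[u], pu = pos[u];
                const int pw = bin[du], w = vert[pw];
                if (u != w) {  // swap u with the first vertex of its bin
                    pos[u] = pw; vert[pu] = w;
                    pos[w] = pu; vert[pw] = u;
                }
                ++bin[du];  // the bin now starts one position later
                --deg[u];
            }
        }
    }
    return deg;  // deg[v] now holds the core number K(v)
}
```

Each edge triggers at most a constant amount of work, which is where the linear bound comes from.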

11 Pruning with core numbers
Proposition VII: Given a clique of size ω_c, any vertex v s.t. K(v) < ω_c cannot be part of a larger clique, so it may be pruned.
[Figure: example graph with vertex degrees 1 1 1 2 2 1 and core numbers 1 1 1 1 1 1; any clique of size 2 cuts all vertices]
Can the coloring of a vertex c(v) be used in the same manner?
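Proposition VII translates into a simple filter. The sketch below assumes the core numbers have already been computed; `prune_by_core` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Keeps only the vertices that could still belong to a clique larger than
// `clique_size`: every vertex of such a clique has at least `clique_size`
// neighbors inside it, hence a core number of at least `clique_size`.
std::vector<int> prune_by_core(const std::vector<int>& core, int clique_size) {
    std::vector<int> kept;
    for (std::size_t v = 0; v < core.size(); ++v)
        if (core[v] >= clique_size) kept.push_back(static_cast<int>(v));
    return kept;
}
```

With the slide's example (all core numbers equal to 1), any clique of size 2 makes the filter return an empty set.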

12 ENCODING OF THE MAXIMUM CLIQUE PROBLEM WITH BITSTRINGS

13 Preliminaries
- Membership to a set
  - 1-bit: member
  - 0-bit: not a member
- Storage of a subset of natural numbers
Masks (C/C++), where A_b and B_b are the bitstrings of sets A and B:
A ∪ B: A_b | B_b
A ∩ B: A_b & B_b
A − B: A_b & ~B_b
Relation test between A and B: {B_b & ~A_b} ≠ ∅
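The mask operations map directly onto machine words. The sketch below assumes sets of natural numbers in the range 0 to 63 stored in one 64-bit word; the helper names are hypothetical, and the subset test is a standard formulation added here for illustration, not necessarily the one shown on the slide.

```cpp
#include <cstdint>

// Bit i of the word is 1 iff the natural number i (0..63) belongs to the set.
using Mask = std::uint64_t;

inline Mask singleton(int i)                 { return Mask{1} << i; }
inline Mask set_union(Mask a, Mask b)        { return a | b; }
inline Mask set_intersection(Mask a, Mask b) { return a & b; }
inline Mask set_difference(Mask a, Mask b)   { return a & ~b; }
// A is a subset of B iff no element of A lies outside B.
inline bool is_subset(Mask a, Mask b)        { return (a & ~b) == 0; }
inline bool contains(Mask a, int i)          { return ((a >> i) & 1u) != 0; }
```

Each operation costs one or two instructions regardless of how many of the 64 elements are present, which is the source of the bit-parallel speedup.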

14 BITSCAN: a C++ library for bitstrings
- Inspired by optimization requirements for bit string data structures found during 10 years of research in combinatorial optimization problems.
- Implementation of exact algorithms for NP-hard problems related to graphs (maximum clique: BBMC, vertex coloring: PASS, etc.)
- Some of these requirements
  - Fast bitscanning loops
    - Forward and reverse directions
    - Destructive and non-destructive
  - Sparsity
  - Semi-sparsity
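The difference between destructive and non-destructive scans, and between forward and reverse directions, can be illustrated on a single 64-bit word. This is a portable sketch with hypothetical function names; it does not use the BITSCAN API, and a tuned library would typically rely on hardware bit-scan instructions instead of the loops below.

```cpp
#include <cstdint>
#include <vector>

// Index of the least significant 1-bit of a non-zero word.
inline int lsb_index(std::uint64_t w) {
    int i = 0;
    while ((w & 1u) == 0) { w >>= 1; ++i; }
    return i;
}

// Index of the most significant 1-bit of a non-zero word.
inline int msb_index(std::uint64_t w) {
    int i = 0;
    while (w >>= 1) ++i;
    return i;
}

// Destructive forward scan: the word itself is consumed while scanning.
std::vector<int> scan_forward_destructive(std::uint64_t& w) {
    std::vector<int> out;
    while (w != 0) {
        out.push_back(lsb_index(w));
        w &= w - 1;  // clear the bit that was just reported
    }
    return out;
}

// Non-destructive reverse scan: the word is only read; the scan position is
// kept in `last` and bits at or above it are masked out on each step.
std::vector<int> scan_reverse(std::uint64_t w) {
    std::vector<int> out;
    int last = 64;  // 64 means that no bit has been reported yet
    while (true) {
        const std::uint64_t below =
            (last == 64) ? w : (w & ((std::uint64_t{1} << last) - 1));
        if (below == 0) break;
        last = msb_index(below);
        out.push_back(last);
    }
    return out;
}
```

The destructive variant is cheaper per step but requires a copy when the set is needed afterwards.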

15 GRAPH: Graph encoding with BITSCAN
[Figure: example undirected graph on vertices 0 to 4; each row of the adjacency matrix is stored as one bitarray (bitarray 0 to bitarray 4)]
Adjacency matrix:
    Vertices  0 1 2 3 4
    0         x 1 1 0 0
    1         1 x 1 1 0
    2         1 1 x 0 1
    3         0 1 0 x 1
    4         0 0 1 1 x
Dense encoding (ugraph):
    #include "pablodev/graph/graph.h"
    #define NUMBER_OF_VERTICES 5
    int main() {
        // undirected graph
        ugraph ug(NUMBER_OF_VERTICES);
        ug.add_edge(0, 1);
        ug.add_edge(0, 2);
        ug.add_edge(1, 2);
        ug.add_edge(1, 3);
        ug.add_edge(3, 4);
        //…
    }
Sparse encoding (sparse_ugraph):
    #include "pablodev/graph/graph.h"
    #define NUMBER_OF_VERTICES 1000000
    int main() {
        // undirected graph
        sparse_ugraph ug(NUMBER_OF_VERTICES);
        ug.add_edge(0, 1);
        ug.add_edge(0, 2);
        ug.add_edge(1, 2);
        ug.add_edge(1, 3);
        ug.add_edge(3, 4);
        //…
    }
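To make the row-per-bitarray idea concrete without depending on the GRAPH library, here is a self-contained sketch. `BitGraph` is a hypothetical type limited to 64 vertices (one machine word per row); it is not the `ugraph` class used above.

```cpp
#include <cstdint>
#include <vector>

// Undirected graph with at most 64 vertices: row[u] is the bitstring of the
// neighbors of u (bit v is 1 iff {u, v} is an edge).
struct BitGraph {
    std::vector<std::uint64_t> row;

    explicit BitGraph(int n) : row(n, 0) {}

    void add_edge(int u, int v) {
        row[u] |= std::uint64_t{1} << v;
        row[v] |= std::uint64_t{1} << u;
    }

    // Neighbors of u restricted to the vertex set U: a single AND.
    std::uint64_t neighbors_in(int u, std::uint64_t U) const {
        return row[u] & U;
    }
};
```

Computing the candidate set of a child node in the search then reduces to one bitwise AND per word, instead of a loop over vertices.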

16 Subgraphs and sets of vertices as bitstrings
- For large scale networks it is CRITICAL to use a sparse bitstring encoding.
Example for G = (V, E) with vertices 1 to 5:
V = {1, 2, 3, 4, 5} / G: 11111
W = {1, 2, 4} / G[W]: 11010
U = {2, 3, 5} / G[U]: 01101
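One possible sparse encoding stores only the non-zero 64-bit blocks together with their block index. The sketch below is an assumption about how such a structure could look, with hypothetical names; it is not the layout used by BITSCAN.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Sparse bitstring: (block index, 64-bit word) pairs sorted by block index;
// blocks that would be all zeros are simply not stored.
struct SparseBits {
    std::vector<std::pair<std::uint32_t, std::uint64_t>> blocks;

    void set(std::uint32_t bit) {
        const std::uint32_t idx = bit / 64;
        const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
        auto it = blocks.begin();
        while (it != blocks.end() && it->first < idx) ++it;
        if (it != blocks.end() && it->first == idx) it->second |= mask;
        else blocks.emplace(it, idx, mask);
    }

    bool test(std::uint32_t bit) const {
        for (const auto& b : blocks)
            if (b.first == bit / 64) return ((b.second >> (bit % 64)) & 1u) != 0;
        return false;
    }
};

// Intersection by merging the two sorted block lists.
SparseBits intersect(const SparseBits& a, const SparseBits& b) {
    SparseBits out;
    std::size_t i = 0, j = 0;
    while (i < a.blocks.size() && j < b.blocks.size()) {
        if (a.blocks[i].first < b.blocks[j].first) ++i;
        else if (b.blocks[j].first < a.blocks[i].first) ++j;
        else {
            const std::uint64_t w = a.blocks[i].second & b.blocks[j].second;
            if (w != 0) out.blocks.emplace_back(a.blocks[i].first, w);
            ++i; ++j;
        }
    }
    return out;
}
```

A set with two members out of a million vertices occupies two blocks here, whereas a dense bitstring would need more than 15,000 words.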

17 NEW BBMCS MAXIMUM CLIQUE ALGORITHM FOR LARGE SCALE NETWORKS

18 The new maximum clique algorithm (I)
BBMCS (G=(V, E))
Initial operations: U = V
1. K = core numbers of U // computed in O(|E|)
2. H = initial heuristic clique
3. Remove vertices s.t. K(v) < |H|
4. […]
5. repeat while |U| > 0
6.   select vertex u with minimum kcore
7.   INIT_BRANCH(U, u) // unrolling of first level
8.   remove u from U
9. end-repeat
10. return ω(G)

19 The maximum clique algorithm (II)
- BRANCH is the new implementation of BBMC for sparse graphs
INIT_BRANCH(U, u) // unrolling of first level
1. P = N_U(u) + u // neighbor set of u (w.r.t. remaining vertices) plus u (a sparse bitstring)
2. if |P| < |H| return // cut based on size
3. if |COLOR(P)| ≤ |H| return // a good H possibly solves the graph
4. K_p = core numbers of P
5. if K_p(P) < |H| return // graph core number cut
6. Remove any vertex v from P s.t. K_p(v) < |H| // vertex core number cut
7. L = P sorted by non-decreasing K(v)
8. BRANCH(P, L) // BRANCH is the extension of BBMC to the sparse case
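The overall control flow (core numbers, first-level unrolling by minimum core number, size and core cuts) can be sketched in a simplified form. This is not the authors' BBMCS: it uses adjacency lists instead of sparse bitstrings, a single vertex as the initial clique H, a plain recursive search in place of BRANCH, a quadratic peeling routine for the core numbers, and it omits the color cut and the recomputation of core numbers inside P. All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Adj = std::vector<std::vector<int>>;  // undirected adjacency lists

// Core numbers by repeatedly removing a minimum-degree vertex (O(n^2) here).
std::vector<int> core_numbers(const Adj& adj) {
    const std::size_t n = adj.size();
    std::vector<int> deg(n), core(n, 0);
    std::vector<bool> gone(n, false);
    for (std::size_t v = 0; v < n; ++v) deg[v] = static_cast<int>(adj[v].size());
    int k = 0;
    for (std::size_t step = 0; step < n; ++step) {
        int pick = -1;
        for (std::size_t v = 0; v < n; ++v)
            if (!gone[v] && (pick < 0 || deg[v] < deg[pick])) pick = static_cast<int>(v);
        k = std::max(k, deg[pick]);
        core[pick] = k;
        gone[pick] = true;
        for (int u : adj[pick]) if (!gone[u]) --deg[u];
    }
    return core;
}

static bool adjacent(const Adj& adj, int u, int v) {
    return std::find(adj[u].begin(), adj[u].end(), v) != adj[u].end();
}

// Stand-in for BRANCH: plain recursive search with a size cut.
static void branch(const Adj& adj, std::vector<int>& cur,
                   const std::vector<int>& cand, std::vector<int>& best) {
    if (cur.size() > best.size()) best = cur;
    for (std::size_t i = 0; i < cand.size(); ++i) {
        if (cur.size() + (cand.size() - i) <= best.size()) return;
        std::vector<int> next;
        for (std::size_t j = i + 1; j < cand.size(); ++j)
            if (adjacent(adj, cand[i], cand[j])) next.push_back(cand[j]);
        cur.push_back(cand[i]);
        branch(adj, cur, next, best);
        cur.pop_back();
    }
}

std::vector<int> max_clique_sketch(const Adj& adj) {
    const std::size_t n = adj.size();
    const std::vector<int> core = core_numbers(adj);
    std::vector<int> best;                 // H: trivial initial clique
    if (n > 0) best.push_back(0);
    std::vector<int> U(n);
    for (std::size_t v = 0; v < n; ++v) U[v] = static_cast<int>(v);
    std::stable_sort(U.begin(), U.end(),
                     [&](int a, int b) { return core[a] < core[b]; });
    std::vector<bool> in_U(n, true);
    for (int u : U) {                      // vertex with minimum core number first
        if (core[u] >= static_cast<int>(best.size())) {   // vertex core cut
            std::vector<int> P;            // N_U(u), filtered by core number
            for (int v : adj[u])
                if (in_U[v] && core[v] >= static_cast<int>(best.size())) P.push_back(v);
            if (P.size() + 1 > best.size()) {             // size cut (P plus u)
                std::vector<int> cur{u};
                branch(adj, cur, P, best);
            }
        }
        in_U[u] = false;                   // remove u from U
    }
    return best;
}
```

Because u is removed from U after its subproblem, every clique is examined exactly once, in the subproblem of its first vertex in the core-number order.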

20 Experiments
- PMC algorithm
  - Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage, Ryan Rossi et al., arXiv.org, 2013
  - THE state of the art algorithm by far
- HW: XEON 20 core, Linux Server, 64GB RAM
  - Only one core used in all cases
- Datasets
  - http://www.networkrepository.com/

21 Results DIMACS 10 (I)
category, name, |V|, |E|, Δ, d_avg, K(G)+1, ω0, ω
DIMACS 10 (massive), hugebubbles-00020, 21198119, 31790179, 3, 3.0, 3, 2, 2
DIMACS 10 (triangular), delaunay_n24, 16777216, 50331601, 26, 6.0, 5, 3, 4
DIMACS 10 (massive), hugetrace-00010, 12057441, 18082179, 3, 3.0, 3, 2, 2
DIMACS 10 (triangular), delaunay_n23, 8388608, 25165784, 28, 6.0, 5, 3, 4
DIMACS 10 (massive), hugetric-00020, 7122792, 10680777, 3, 3.0, 3, 2, 2
DIMACS 10, adaptive, 6815744, 13624320, 4, 4.0, 3, 2, 2
DIMACS 10 (massive), hugetric-00010, 6592765, 9885854, 3, 3.0, 3, 2, 2
DIMACS 10 (massive), hugetric-00000, 5824554, 8733523, 3, 3.0, 3, 2, 2
DIMACS 10, channel-500x100x100-b050, 4802000, 42681372, 18, 17.8, 10, 4, 4
DIMACS 10 (massive), hugetrace-00000, 4588484, 6879133, 3, 3.0, 3, 2, 2
DIMACS 10 (triangular), delaunay_n22, 4194304, 12582869, 23, 6.0, 5, 3, 4
DIMACS 10, packing-500x100x100-b050, 2145852, 17488243, 18, 16.3, 10, 4, 4
DIMACS 10 (triangular), delaunay_n21, 2097152, 6291408, 23, 6.0, 5, 3, 4
DIMACS 10 (triangular), delaunay_n20, 1048576, 3145686, 23, 6.0, 5, 3, 4
DIMACS 10 (random geometric), rgg_n_2_20_s0, 1048576, 6891620, 36, 13.1, 18, 13, 17
DIMACS 10 (triangular), delaunay_n19, 524288, 1572823, 21, 6.0, 5, 3, 4
DIMACS 10, auto, 448695, 3314611, 37, 14.8, 10, 5, 7
DIMACS 10, citationCiteseer, 268495, 1156647, 1318, 8.6, 16, 10, 13
DIMACS 10 (triangular), delaunay_n18, 262144, 786396, 21, 6.0, 5, 3, 4
DIMACS 10, m14b, 214765, 1679018, 40, 15.6, 10, 7, 7
DIMACS 10, 144, 144649, 1074393, 26, 14.9, 10, 6, 7
DIMACS 10, fe-ocean, 143437, 409593, 6, 5.7, 5, 2, 2
DIMACS 10 (triangular), delaunay_n17, 131072, 393176, 17, 6.0, 5, 3, 4
DIMACS 10, 598a, 110971, 741934, 26, 13.4, 9, 5, 7
DIMACS 10, fe_rotor, 99617, 662431, 125, 13.3, 9, 4, 5
DIMACS 10, fe-tooth, 78136, 452591, 39, 11.6, 8, 4, 5
DIMACS 10 (triangular), delaunay_n16, 65536, 196575, 17, 6.0, 5, 3, 4
DIMACS 10, wing, 62032, 121544, 4, 3.9, 4, 3, 3
DIMACS 10, fe-body, 45087, 163734, 28, 7.3, 7, 4, 6
DIMACS 10 (triangular), delaunay_n15, 32768, 98274, 18, 6.0, 5, 3, 4

22 Results DIMACS 10 (II)
category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
DIMACS 10 (massive), hugebubbles-00020, 21198119, 31790179, 34.89, 4.14, 88.14, 8.43
DIMACS 10 (triangular), delaunay_n24, 16777216, 50331601, 29.60, 8.81, 70.24, 3.36
DIMACS 10 (massive), hugetrace-00010, 12057441, 18082179, 14.09, 2.4, 82.97, 5.87
DIMACS 10 (triangular), delaunay_n23, 8388608, 25165784, 15.08, 4.19, 72.21, 3.60
DIMACS 10 (massive), hugetric-00020, 7122792, 10680777, 8.71, 1.39, 84.03, 6.26
DIMACS 10, adaptive, 6815744, 13624320, 6.83, 1.36, 80.09, 5.02
DIMACS 10 (massive), hugetric-00010, 6592765, 9885854, 8.07, 1.34, 83.39, 6.02
DIMACS 10 (massive), hugetric-00000, 5824554, 8733523, 6.12, 1.06, 82.69, 5.78
DIMACS 10, channel-500x100x100-b050, 4802000, 42681372, 20.65, 3.84, 81.40, 5.38
DIMACS 10 (massive), hugetrace-00000, 4588484, 6879133, 5.00, 0.88, 82.41, 5.69
DIMACS 10 (triangular), delaunay_n22, 4194304, 12582869, 7.10, 2.1, 70.41, 3.38
DIMACS 10, packing-500x100x100-b050, 2145852, 17488243, 8.91, 3.14, 64.76, 2.84
DIMACS 10 (triangular), delaunay_n21, 2097152, 6291408, 3.39, 1.02, 69.91, 3.32
DIMACS 10 (triangular), delaunay_n20, 1048576, 3145686, 1.62, 0.47, 71.06, 3.46
DIMACS 10 (random geometric), rgg_n_2_20_s0, 1048576, 6891620, 0.45, <.001, ,
DIMACS 10 (triangular), delaunay_n19, 524288, 1572823, 0.63, 0.23, 63.52, 2.74
DIMACS 10, auto, 448695, 3314611, 1.58, 0.46, 70.92, 3.44
DIMACS 10, citationCiteseer, 268495, 1156647, 0.13, 0.03, 76.04, 4.17
DIMACS 10 (triangular), delaunay_n18, 262144, 786396, 0.30, 0.1, 66.86, 3.02
DIMACS 10, m14b, 214765, 1679018, 0.71, 0.21, 70.53, 3.39
DIMACS 10, 144, 144649, 1074393, 0.44, 0.12, 72.92, 3.69
DIMACS 10, fe-ocean, 143437, 409593, 0.15, 0.04, 73.78, 3.81
DIMACS 10 (triangular), delaunay_n17, 131072, 393176, 0.16, 0.05, 68.73, 3.20
DIMACS 10, 598a, 110971, 741934, 0.30, 0.11, 62.97, 2.70
DIMACS 10, fe_rotor, 99617, 662431, 0.27, 0.1, 63.36, 2.73
DIMACS 10, fe-tooth, 78136, 452591, 0.18, 0.06, 66.26, 2.96
DIMACS 10 (triangular), delaunay_n16, 65536, 196575, 0.07, 0.02, 73.21, 3.73
DIMACS 10, wing, 62032, 121544, 0.06, 0.01, 84.19, 6.33
DIMACS 10, fe-body, 45087, 163734, 0.01, , 8.85, 1.10
DIMACS 10 (triangular), delaunay_n15, 32768, 98274, 0.04, 0.01, 72.77, 3.67

23 Results: Social (I)
category, name, |V|, |E|, Δ, d_avg, K(G)+1, ω0, ω
Social facebook, socfb-A-anon, 3097165, 23667394, 4915, 15.3, 75, 17, 25
Social facebook, socfb-B-anon, 2937612, 20959854, 4356, 14.3, 64, 11, 24
Social, soc-flixster, 2523386, 7918801, 1474, 6.3, 69, 30, 31
Web graphs, web-wikipedia2009, 1864433, 4507315, 2624, 4.8, 67, 6, 31
Social, soc-pokec, 1632803, 22301964, 14854, 27.3, 48, 14, 29
Social, soc-lastfm, 1191805, 4519330, 5150, 7.6, 71, 12, 14
Social, soc-youtube-snap, 1134890, 2987624, 28754, 5.3, 52, 12, 17
Social, soc-digg, 770799, 5907132, 17643, 15.3, 237, 26, 50
Social, soc-FourSquare, 639014, 3214986, 106218, 10.1, 64, 26, 30
Social, soc-delicious, 536108, 1365961, 3216, 5.1, 34, 16, 21
Social, soc-flickr, 513969, 3190452, 4369, 12.4, 310, 53, 58
Social, soc-youtube, 495957, 1936748, 25409, 7.8, 50, 11, 16
Social, soc-twitter-follows, 404719, 713319, 626, 3.5, 29, 3, 6
Social, soc-gowalla, 196591, 950327, 14730, 9.7, 52, 14, 29
Social, soc-douban, 154908, 327162, 287, 4.2, 16, 5, 11
Social, soc-LiveMocha, 104103, 2193083, 2980, 42.1, 93, 8, 15
Social, soc-buzznet, 101163, 2763066, 64289, 54.6, 154, 28, 31
Social, soc-BlogCatalog, 88784, 2093195, 9444, 47.2, 222, 41, 45
Social, soc-slashdot, 70068, 358647, 2507, 10.2, 54, 25, 26
Social facebook, socfb-OR, 63392, 816886, 1098, 25.8, 53, 24, 30
Social, soc-brightkite, 56739, 212945, 1134, 7.5, 53, , 37
Social facebook, socfb-Penn94, 41536, 1362220, 4410, 65.6, 63, 29, 44
Social facebook, socfb-Texas84, 36364, 1590651, 6312, 87.5, 82, 47, 51
Social facebook, socfb-UF, 35111, 1465654, 8246, 83.5, 84, 47, 55
Social facebook, socfb-UIllinois, 30795, 1264421, 4632, 82.1, 86, 18, 57
Social facebook, socfb-Indiana, 29732, 1305757, 1358, 87.8, 77, 37, 48
Social, soc-epinions, 26588, 100120, 443, 7.5, 33, 12, 16
Social facebook, socfb-Wisconsin87, 23831, 835946, 3484, 70.2, 61, 34, 37
Social facebook, socfb-Berkeley13, 22900, 852419, 3434, 74.4, 65, 33, 42
Social facebook, socfb-UCLA, 20453, 747604, 1180, 73.1, 66, 40, 51
Social facebook, socfb-UConn, 17206, 604867, 1709, 70.3, 66, 42, 50

24 Results: Social (II)
category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
Social facebook, socfb-A-anon, 3097165, 23667394, 20.47, 10.39, 49.25, 1.97
Social facebook, socfb-B-anon, 2937612, 20959854, 18.53, 14.19, 23.42, 1.31
Social, soc-flixster, 2523386, 7918801, 1.97, 0.45, 77.15, 4.38
Web graphs, web-wikipedia2009, 1864433, 4507315, 0.75, 0.17, 77.41, 4.43
Social, soc-pokec, 1632803, 22301964, 9.40, 8.72, 7.24, 1.08
Social, soc-lastfm, 1191805, 4519330, 2.64, 0.69, 73.85, 3.82
Social, soc-youtube-snap, 1134890, 2987624, 2.20, 0.27, 87.71, 8.14
Social, soc-digg, 770799, 5907132, 10.39, 1.89, 81.81, 5.50
Social, soc-FourSquare, 639014, 3214986, 48.77, 0.36, 99.26, 135.47
Social, soc-delicious, 536108, 1365961, 0.18, 0.04, 77.98, 4.54
Social, soc-flickr, 513969, 3190452, 20.20, 2.37, 88.27, 8.52
Social, soc-youtube, 495957, 1936748, 1.42, 0.24, 83.14, 5.93
Social, soc-twitter-follows, 404719, 713319, 0.29, 0.06, 79.23, 4.81
Social, soc-gowalla, 196591, 950327, 0.25, 0.06, 76.36, 4.23
Social, soc-douban, 154908, 327162, 0.06, 0.03, 50.81, 2.03
Social, soc-LiveMocha, 104103, 2193083, 2.99, 1.00, 66.59, 2.99
Social, soc-buzznet, 101163, 2763066, 15.73, 1.19, 92.43, 13.22
Social, soc-BlogCatalog, 88784, 2093195, 12.17, 2.43, 80.03, 5.01
Social, soc-slashdot, 70068, 358647, 0.08, 0.01, 87.98, 8.32
Social facebook, socfb-OR, 63392, 816886, 0.28, 0.09, 67.49, 3.08
Social, soc-brightkite, 56739, 212945, 0.03, 0.01, 60.16, 2.51
Social facebook, socfb-Penn94, 41536, 1362220, 0.93, 0.35, 62.26, 2.65
Social facebook, socfb-Texas84, 36364, 1590651, 1.24, 0.38, 69.46, 3.27
Social facebook, socfb-UF, 35111, 1465654, 1.03, 0.30, 70.96, 3.44
Social facebook, socfb-UIllinois, 30795, 1264421, 0.69, 0.38, 45.10, 1.82
Social facebook, socfb-Indiana, 29732, 1305757, 0.88, 0.36, 59.25, 2.45
Social, soc-epinions, 26588, 100120, 0.02, 0.01, 55.92, 2.27
Social facebook, socfb-Wisconsin87, 23831, 835946, 0.56, 0.18, 67.72, 3.10
Social facebook, socfb-Berkeley13, 22900, 852419, 0.58, 0.20, 65.33, 2.88
Social facebook, socfb-UCLA, 20453, 747604, 0.40, 0.14, 65.11, 2.87
Social facebook, socfb-UConn, 17206, 604867, 0.21, 0.09, 57.91, 2.38

25 Results: infrastructure
category, name, |V|, |E|, Δ, d_avg, K(G)+1, ω0, ω
DIMACS 10 (infrastructure), inf-europe_osm, 50912018, 54054660, 13, 2.1, 4, 3, 4
Infrastructure, inf-road-usa, 23947347, 28854312, 9, 2.4, 4, 3, 4
DIMACS 10 (infrastructure), inf-road_usa, 23947347, 28854312, 9, 2.4, 4, 3, 4
DIMACS 10 (infrastructure), inf-road_central, 14081816, 16933413, 8, 2.4, 4, 3, 4
DIMACS 10 (infrastructure), inf-germany_osm, 11548845, 12369181, 13, 2.1, 4, 3, 3
DIMACS 10 (infrastructure), inf-great-britain_osm, 7733822, 8156517, 8, 2.1, 4, 3, 3
DIMACS 10 (infrastructure), inf-netherlands_osm, 2216688, 2441238, 7, 2.2, 4, 3, 3
DIMACS 10 (infrastructure), inf-belgium_osm, 1441295, 1549970, 10, 2.2, 4, 3, 3
Infrastructure, inf-roadNet-PA, 1087562, 1541514, 9, 2.8, 4, 3, 4
DIMACS 10 (infrastructure), inf-luxembourg_osm, 114599, 119666, 6, 2.1, 3, 2, 3

category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
DIMACS 10 (infrastructure), inf-europe_osm, 50912018, 54054660, ts, <.001, ,
Infrastructure, inf-road-usa, 23947347, 28854312, 7.65, <.001, 99.99, 7650
DIMACS 10 (infrastructure), inf-road_usa, 23947347, 28854312, 7.96, <.001, 99.99, 7964
DIMACS 10 (infrastructure), inf-road_central, 14081816, 16933413, 5.08, <.001, 99.98, 5077
DIMACS 10 (infrastructure), inf-germany_osm, 11548845, 12369181, 2.94, <.001, 99.97, 2944
DIMACS 10 (infrastructure), inf-great-britain_osm, 7733822, 8156517, 2.01, <.001, 99.95, 2012
DIMACS 10 (infrastructure), inf-netherlands_osm, 2216688, 2441238, 0.47, <.001, 99.79, 472
DIMACS 10 (infrastructure), inf-belgium_osm, 1441295, 1549970, 0.29, <.001, 99.66, 291
Infrastructure, inf-roadNet-PA, 1087562, 1541514, 0.25, <.001, 99.60, 250
DIMACS 10 (infrastructure), inf-luxembourg_osm, 114599, 119666, ts, <.001, ,

26 Results: technological (I)
category, name, |V|, |E|, Δ, d_avg, K(G)+1, ω0, ω
DIMACS 10 (technological), venturiLevel3, 4026819, 8054237, 6, 4.0, 4, 2, 3
technological, tech-as-skitter, 1694616, 11094209, 35455, 13.1, 112, 56, 67
Scientific computing, sc-ldoor, 952203, 20770807, 76, 43.6, 35, , 21
Scientific computing, sc-msdoor, 415863, 9378650, 76, 45.1, 35, , 21
Scientific computing, sc-pwtk, 217891, 5653221, 179, 51.9, 36, 16, 24
DIMACS 10 (technological), tech-caidaRouterLevel, 192244, 609066, 1071, 6.3, 33, 7, 17
technological, tech-RL-caida, 190914, 607610, 1071, 6.4, 33, 7, 17
Scientific computing, sc-shipsec5, 179104, 2200076, 75, 24.6, 30, , 24
Scientific computing, sc-shipsec1, 140385, 1707759, 67, 24.3, 25, 20, 24
Scientific computing, sc-pkustk13, 94893, 3260967, 299, 68.7, 42, 30, 36
Scientific computing, sc-pkustk11, 87804, 2565054, 131, 58.4, 48, 24, 36
technological, tech-p2p-gnutella, 62561, 147878, 95, 4.7, 7, 2, 4
DIMACS 10 (technological), t60k, 60005, 89440, 3, 3.0, 3, 2, 2
Scientific computing, sc-nasasrb, 54870, 1311227, 275, 47.8, 36, , 24
technological, tech-internet-as, 40164, 85123, 3370, 4.2, 24, 14, 16
technological, tech-as-caida2007, 26475, 53381, 2628, 4.0, 23, 15, 16
technological, tech-WHOIS, 7476, 56943, 1079, 15.2, 89, 57, 58

27 Results: technological (II)
category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
DIMACS 10 (technological), venturiLevel3, 4026819, 8054237, 3.52243, 1.00, 71.61, 3.52
technological, tech-as-skitter, 1694616, 11094209, 1.50194, 0.09, 94.01, 16.69
Scientific computing, sc-ldoor, 952203, 20770807, 10.204, 0.98, 90.40, 10.41
Scientific computing, sc-msdoor, 415863, 9378650, 4.92939, 0.44, 91.07, 11.20
Scientific computing, sc-pwtk, 217891, 5653221, 2.48166, 0.2, 91.94, 12.41
DIMACS 10 (technological), tech-caidaRouterLevel, 192244, 609066, 0.0820131, 0.03, 63.42, 2.73
technological, tech-RL-caida, 190914, 607610, 0.0817599, 0.02, 75.54, 4.09
Scientific computing, sc-shipsec5, 179104, 2200076, 0.135009, 0.01, 92.59, 13.50
Scientific computing, sc-shipsec1, 140385, 1707759, 0.10541, 0.02, 81.03, 5.27
Scientific computing, sc-pkustk13, 94893, 3260967, 1.83109, 0.15, 91.81, 12.21
Scientific computing, sc-pkustk11, 87804, 2565054, 0.580464, 0.08, 86.22, 7.26
technological, tech-p2p-gnutella, 62561, 147878, 0.05217, 0.02, 61.66, 2.61
DIMACS 10 (technological), t60k, 60005, 89440, 0.0490289, 0.01, 79.60, 4.90
Scientific computing, sc-nasasrb, 54870, 1311227, 0.565396, 0.05, 91.16, 11.31
technological, tech-internet-as, 40164, 85123, 0.015501, <.001, 93.55, 15.50
technological, tech-as-caida2007, 26475, 53381, 0.011528, <.001, 91.33, 11.53
technological, tech-WHOIS, 7476, 56943, 0.04405, <0.001, 97.73, 44.05

28 Results: trivially solved during unrolling
category, name, |V|, |E|, Δ, d_avg, K(G)+1, ω0, ω
DIMACS 10 (random geometric), rgg_n_2_23_s0, 8388608, 63501393, 40, 15.1, , , 21
DIMACS 10 (random geometric), rgg_n_2_22_s0, 4194304, 30359198, 36, 14.5, , , 20
Social, soc-livejournal, 4033137, 27933062, 2651, 13.9, , , 214
DIMACS 10 (random geometric), rgg_n_2_21_s0, 2097152, 14487995, 37, 13.8, , , 19
temporal reachability, scc_retweet-crawl, 1131801, 24015, 195, 0.04, , , 20
Collaboration, ca-hollywood-2009, 1069126, 56306653, 11467, 105.3, , , 2209
Collaboration, ca-coauthors-dblp, 540486, 15245729, 3299, 56.4, , , 337
DIMACS 10, co-papers-dblp, 540486, 15245729, 3299, 56.4, , , 337
DIMACS 10 (random geometric), rgg_n_2_19_s0, 524288, 3269766, 30, 12.5, , , 18
Web graphs, web-it-2004, 509338, 7178413, 469, 28.2, , , 432
DIMACS 10, co-papers-citeseer, 434102, 16036720, 1188, 73.9, , , 845
Collaboration, ca-MathSciNet, 332689, 820644, 496, 4.9, , , 25
Collaboration, ca-dblp-2012, 317080, 1049866, 343, 6.6, , , 114
DIMACS 10, coAuthorsCiteseer, 227320, 814134, 1372, 7.2, , , 87
Collaboration, ca-dblp-2010, 226413, 716460, 238, 6.3, , , 75
Web graphs, web-arabic-2005, 163598, 1747269, 1102, 21.4, , , 102
DIMACS 10 (random geometric), rgg_n_2_17_s0, 131072, 728753, 28, 11.1, , , 15
Web graphs, web-uk-2005, 129632, 11744049, 850, 181.2, , , 500
Web graphs, web-sk-2005, 121422, 334419, 590, 5.5, , , 82
recommendation Net, rec-amazon, 91813, 125704, 5, 2.7, 5, 5, 5
DIMACS 10 (random geometric), rgg_n_2_16_s0, 65536, 342127, 27, 10.4, , , 14
DIMACS 10 (random geometric), rgg_n_2_15_s0, 32768, 160240, 24, 9.8, , , 13
Collaboration, ca-CondMat, 21363, 91286, 279, 8.5, , , 26
Collaboration, ca-AstroPh, 17903, 196972, 504, 22.0, , , 57
Web graphs, web-webbase-2001, 16062, 25593, 1679, 3.2, , , 33
Web graphs, web-BerkStan, 12305, 19500, 59, 3.2, , , 29
Web graphs, web-indochina-2004, 11358, 47606, 199, 8.4, , , 50
Collaboration, ca-HepPh, 11204, 117619, 491, 21.0, , , 239
temporal reachability, scc_infect-dublin, 10972, 175573, 219, 32.0, , , 84

29 Summary
- The main ideas behind finding the largest clique in large scale networks have been described
  - Coloring and k-core bounds
  - Initial sorting decision heuristic
  - Sparse bitstring data structures
- A new algorithm, BBMCS, has been presented and compared with the state of the art reference algorithm PMC.
- BBMCS has clearly outperformed PMC in extensive empirical tests

30 Related bibliography
- Initial sorting of vertices in the maximum clique problem reviewed. Pablo San Segundo, Alvaro Lopez, Mikhail Batsyn. LION 8 Conf., Florida, February 2014.
- Relaxed approximate coloring in exact maximum clique search. Pablo San Segundo, Cristobal Tapia. COR, 2014.
- An improved bit parallel exact maximum clique algorithm. Pablo San Segundo et al. OPL, 2011.
- A new DSATUR-based algorithm for exact vertex coloring. Pablo San Segundo. COR, 2011.
- An exact bit-parallel algorithm for the maximum clique problem. Pablo San Segundo et al. COR, 2011.
- Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage. Ryan Rossi et al. arXiv.org, 2013.
- Efficient Search Using Bitboard Models. Pablo San Segundo et al. ICTAI Conf., 2006.

