Searching for large cliques in large scale networks. WORKSHOP ON CLUSTERING AND SEARCH TECHNIQUES IN LARGE SCALE NETWORKS (3-8 Nov. 2014)


Searching for large cliques in large scale networks
WORKSHOP ON CLUSTERING AND SEARCH TECHNIQUES IN LARGE SCALE NETWORKS (3-8 Nov. 2014)
Pablo San Segundo Carrillo (Associate professor in UPM)
This work is partially funded by the Spanish National Government (DPI C02) and CAR (UPM-CSIC)

Overview
- Basic concepts related to exact (large) clique search
  - Enumeration
  - Pruning scheme: greedy sequential coloring
- K-core analysis
  - An O(|E|) algorithm
- Bit string encoding of graphs
  - BITSCAN / GRAPH C++ libraries
  - Encoding of sparse graphs
- BBMCS: a new maximum clique algorithm for large scale networks
  - Pseudocode
  - Results
- Summary

Basic clique enumeration
[Figure: binomial search tree (with repetitions); nodes are labelled with candidate sets such as {2,3,4}, {1,4} and {4}]
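As a rough illustration of the enumeration idea, the sketch below grows a clique one vertex at a time and keeps, at each node, only the candidates adjacent to everything chosen so far. The names `AdjMatrix`, `make_graph`, `expand` and `max_clique_by_enumeration` are hypothetical helpers introduced here, not part of the talk. Unlike the tree on the slide, this version only considers later candidates at each step, so it does not generate repetitions.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Adjacency matrix of a small undirected graph (hypothetical helper type).
using AdjMatrix = std::vector<std::vector<bool>>;

// Build an adjacency matrix from an edge list.
AdjMatrix make_graph(std::size_t n, const std::vector<std::pair<int, int>>& edges) {
    AdjMatrix adj(n, std::vector<bool>(n, false));
    for (const auto& e : edges) {
        adj[e.first][e.second] = true;
        adj[e.second][e.first] = true;
    }
    return adj;
}

// Extend the clique `current` with vertices from `candidates`; every candidate
// is adjacent to all vertices already in `current`. Records the largest clique.
void expand(const AdjMatrix& adj, std::vector<int>& current,
            const std::vector<int>& candidates, std::vector<int>& best) {
    if (current.size() > best.size()) best = current;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        const int v = candidates[i];
        // New candidate set: later candidates that are neighbors of v.
        std::vector<int> next;
        for (std::size_t j = i + 1; j < candidates.size(); ++j)
            if (adj[v][candidates[j]]) next.push_back(candidates[j]);
        current.push_back(v);
        expand(adj, current, next, best);
        current.pop_back();
    }
}

// Exhaustive search over all cliques; exponential in the worst case.
std::vector<int> max_clique_by_enumeration(const AdjMatrix& adj) {
    std::vector<int> all(adj.size());
    for (std::size_t v = 0; v < adj.size(); ++v) all[v] = static_cast<int>(v);
    std::vector<int> current, best;
    expand(adj, current, all, best);
    return best;
}
```

On the 5-vertex graph with edges 0-1, 0-2, 1-2, 1-3, 3-4, this search returns the triangle {0, 1, 2}.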

Basic pruning scheme: greedy coloring (I)
[Figure: example graph partitioned into color classes C1, C2, C3]
SEQ: greedy coloring procedure
1. Define a vertex ordering
2. Color vertices sequentially with the least possible color
Proposition 1 (Balas & Yu, 1986): The size of any feasible coloring C(G) is an upper bound on the size of a maximum clique in G (ω(G) ≤ |C(G)|).
How to define a good ordering?
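The two steps of SEQ can be sketched as follows. This is a minimal illustration on adjacency lists, not the bitstring implementation used in the talk; the function names `greedy_coloring` and `color_count` are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Sequential greedy coloring: visit vertices in `order` and give each one the
// smallest color not used by an already-colored neighbor.
// `adj` is an adjacency list; the result holds one 0-based color per vertex.
std::vector<int> greedy_coloring(const std::vector<std::vector<int>>& adj,
                                 const std::vector<int>& order) {
    std::vector<int> color(adj.size(), -1);
    for (int v : order) {
        std::vector<bool> used(adj.size(), false);
        for (int w : adj[v])
            if (color[w] >= 0) used[color[w]] = true;
        int c = 0;
        while (used[c]) ++c;  // least color free among the neighbors
        color[v] = c;
    }
    return color;
}

// Number of colors used, i.e. |C(G)|, an upper bound on the clique number.
int color_count(const std::vector<int>& color) {
    int k = 0;
    for (int c : color)
        if (c + 1 > k) k = c + 1;
    return k;
}
```

The number of colors depends on the vertex ordering passed in, which is why the choice of ordering matters for the quality of the bound.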

Basic pruning scheme: greedy coloring (II)
Search node at depth level k, with candidate subgraph G[U].
Is it worth selecting vertex 1 as a candidate?
[Figure: application of the color bound to U', relating the size of the current growing clique to the size of the current champion]
Since the current largest clique cannot be improved, vertex 1 is pruned.
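The color bound is commonly written as the inequality below. The symbols S (current growing clique), S_max (current champion) and C(G[U']) (a feasible coloring of the subgraph induced by U') are notation assumed for this sketch rather than taken from the slide.

```latex
|S| + |C(G[U'])| \le |S_{\max}| \;\Longrightarrow\; \text{prune the candidate}
```

The coloring bound gives ω(G[U']) ≤ |C(G[U'])|, so no clique that extends S inside U' can exceed |S| + |C(G[U'])|. For instance, with |S| = 2, |C(G[U'])| = 2 and |S_max| = 4, the left-hand side is 4 ≤ 4 and the candidate is pruned.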

Initial sorting of nodes for maximum clique
How should vertices be sorted initially?
Proposition II: Initially vertices should be sorted in non-decreasing degree order.
- Absolute
- Degenerate
  - At each step each selected vertex is removed from the original graph and degrees are recomputed
Example:
  Absolute:   0 (1), 2 (1), 3 (1), 4 (2), 1 (3)
  Degenerate: 0 (1), 2 (1), 3 (1), 1 (1), 4 (1)
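The difference between the two orderings can be sketched as below. This is a simple quadratic-time illustration on adjacency lists; the function names and the tie-breaking rule (lowest vertex index first) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Absolute ordering: sort once by degree in the original graph.
std::vector<int> absolute_order(const std::vector<std::vector<int>>& adj) {
    std::vector<int> order(adj.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&adj](int a, int b) {
        return adj[a].size() < adj[b].size();
    });
    return order;
}

// Degenerate ordering: repeatedly pick a vertex of minimum degree in the
// remaining graph, remove it, and recompute the degrees of its neighbors.
std::vector<int> degenerate_order(const std::vector<std::vector<int>>& adj) {
    const std::size_t n = adj.size();
    std::vector<std::size_t> deg(n);
    std::vector<bool> removed(n, false);
    for (std::size_t v = 0; v < n; ++v) deg[v] = adj[v].size();

    std::vector<int> order;
    order.reserve(n);
    for (std::size_t step = 0; step < n; ++step) {
        // Linear scan for the remaining vertex of minimum degree.
        std::size_t best = n;
        for (std::size_t v = 0; v < n; ++v)
            if (!removed[v] && (best == n || deg[v] < deg[best])) best = v;
        removed[best] = true;
        order.push_back(static_cast<int>(best));
        for (int w : adj[best])
            if (!removed[w]) --deg[w];
    }
    return order;
}
```

The linear scan keeps the sketch short; a bin-sorted structure avoids it and brings the cost down to O(|E|).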

State of the art (last decade): middle size graphs
- MCQ: Tomita & Seki 2003
  - Heuristic decision based on color
- MaxClique-Dyn: Konc & Janežič 2007
- MCS: Tomita et al.
- BBMC: San Segundo et al.
  - Use of bitstrings
  - Initial order of vertices fixed
- BBMCL: San Segundo et al.
- Impact of an initial large clique: Batsyn et al.
- MaxSAT: Li et al. 2010, 2013
- BBMCX: San Segundo, Batsyn, Nikolaev 2014
- Initial sorting improvements: San Segundo, Batsyn, Nikolaev 2014
- Vertical coloring: Nikolaev, Batsyn, San Segundo 2014
REAL GRAPHS

K-CORE DECOMPOSITION

Preliminaries
Definition I (k-core of a graph): A maximal subgraph in which every vertex has degree at least k.
Definition II (core number K(v) of a vertex): The largest k such that the vertex belongs to the k-core.
Proposition III: k-core decomposition is hierarchical.
Proposition IV: The core number of a graph plus one is an upper bound for the maximum clique (ω(G) ≤ K(G)+1).
Example (degenerate ordering): 0 (1), 2 (1), 3 (1), 1 (1), 4 (1)
[Figure: nested 0-core, 1-core, 2-core and 3-core]

Quality of core number bounds for clique
Proposition V: ω(G) ≤ |C(G)| ≤ K(G)+1 ≤ ∆(G)+1
Proposition VI (Batagelj & Zaversnik 2002): There exists an O(|E|) algorithm to compute the k-core decomposition.
Sketch of proof:
I. Order vertices by degree using bin-sort
II. Critical operation: reduce the degree of a vertex while keeping all vertices sorted by degree
    - Swap the vertex with the first vertex in the same bin and increment the bin pointer by one
[Figure: vertices arranged in bins by degree (0, 1, 2)]
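The bin-sort scheme in the proof sketch can be written out roughly as follows. This is a sketch in the spirit of Batagelj & Zaversnik's algorithm on plain adjacency lists; the function name `core_numbers` and the array names `bin`, `pos` and `vert` are choices made for this example.

```cpp
#include <algorithm>
#include <vector>

// Core number of every vertex in O(|V| + |E|) using bins of equal degree.
std::vector<int> core_numbers(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> deg(n), pos(n), vert(n);
    int max_deg = 0;
    for (int v = 0; v < n; ++v) {
        deg[v] = static_cast<int>(adj[v].size());
        max_deg = std::max(max_deg, deg[v]);
    }
    // bin[d] = index in `vert` of the first vertex with current degree d.
    std::vector<int> bin(max_deg + 2, 0);
    for (int v = 0; v < n; ++v) ++bin[deg[v] + 1];
    for (int d = 1; d <= max_deg + 1; ++d) bin[d] += bin[d - 1];
    // Step I: place the vertices into `vert`, sorted by degree (bin-sort).
    std::vector<int> next(bin);
    for (int v = 0; v < n; ++v) {
        pos[v] = next[deg[v]]++;
        vert[pos[v]] = v;
    }
    // Step II: process vertices by increasing degree; "removing" v lowers
    // the degree of each neighbor u that still has a larger degree.
    for (int i = 0; i < n; ++i) {
        const int v = vert[i];
        for (int u : adj[v]) {
            if (deg[u] > deg[v]) {
                const int du = deg[u];
                const int pu = pos[u];
                const int pw = bin[du];  // first slot of u's bin
                const int w = vert[pw];
                if (u != w) {            // swap u with the first vertex of its bin
                    pos[u] = pw; vert[pw] = u;
                    pos[w] = pu; vert[pu] = w;
                }
                ++bin[du];               // the bin now starts one slot later
                --deg[u];                // u has moved to the next lower bin
            }
        }
    }
    return deg;  // deg[v] now holds the core number K(v)
}
```

The maximum entry of the returned vector is the core number of the graph, K(G), so K(G)+1 is the clique upper bound.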

Pruning with core numbers
Proposition VII: Given a clique of size ω_c, any vertex v s.t. K(v) < ω_c cannot be part of a larger clique, so it may be pruned.
[Figure: example graph annotated with degrees and core numbers; any clique of size 2 cuts all vertices]
Can the coloring of a vertex c(v) be used in the same manner?
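Proposition VII amounts to a simple filter over the core numbers. The sketch below assumes the core numbers are already available in a vector indexed by vertex; the function name `prune_by_core` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Keep only vertices that may still belong to a clique larger than the one
// already found: a vertex in a clique of size s has core number >= s - 1,
// so a clique of size clique_size + 1 needs K(v) >= clique_size.
std::vector<int> prune_by_core(const std::vector<int>& core, int clique_size) {
    std::vector<int> kept;
    for (std::size_t v = 0; v < core.size(); ++v)
        if (core[v] >= clique_size) kept.push_back(static_cast<int>(v));
    return kept;
}
```

If the filter leaves no vertices, the clique already in hand is maximum and the search can stop.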

ENCODING OF THE MAXIMUM CLIQUE PROBLEM WITH BITSTRINGS

Preliminaries
- Membership to a set
  - 1-bit: member
  - 0-bit: not a member
- Storage of a subset of natural numbers
Masks (C/C++), with A_b and B_b the bitstrings of sets A and B:
  A ∪ B      A_b | B_b
  A ∩ B      A_b & B_b
  A - B      A_b & ~B_b
  (A B)?     {B_b & ~A_b} ≠
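For sets of at most 64 elements, the masks above map directly onto operations on a single machine word. The sketch below uses `std::uint64_t`; the type alias `Mask` and the function names are assumptions for this example, and the subset test shown is the standard one (A ⊆ B exactly when A - B is empty).

```cpp
#include <cstdint>

using Mask = std::uint64_t;  // bit i set <=> element i is a member (0 <= i < 64)

inline Mask singleton(int i) { return Mask{1} << i; }
inline Mask set_union(Mask a, Mask b) { return a | b; }
inline Mask set_intersection(Mask a, Mask b) { return a & b; }
inline Mask set_difference(Mask a, Mask b) { return a & ~b; }
inline bool is_subset(Mask a, Mask b) { return (a & ~b) == 0; }
inline bool contains(Mask a, int i) { return ((a >> i) & 1u) != 0; }
```

Each operation costs one or two instructions regardless of how many elements the sets hold, which is the source of the bit-parallel speed-up. Larger sets use an array of such words.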

BITSCAN: a C++ library for bitstrings
- Inspired by optimization requirements for bit string data structures found during 10 years of research in combinatorial optimization problems.
- Implementation of exact algorithms for NP-hard problems related to graphs (maximum clique-BBMC, vertex coloring-PASS etc.)
- Some of these requirements
  - Fast bitscanning loops
    - Forward and reverse directions
    - Destructive and non-destructive
  - Sparsity
  - Semi-sparsity
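The scanning variants listed above can be illustrated on a single 64-bit word. This sketch does not use the BITSCAN API; the function names are hypothetical, and the portable loops stand in for the hardware bit-scan instructions that an optimized library would typically rely on.

```cpp
#include <cstdint>
#include <vector>

// Index of the least significant 1-bit of a non-zero word.
int lowest_bit(std::uint64_t w) {
    int i = 0;
    while ((w & 1u) == 0) { w >>= 1; ++i; }
    return i;
}

// Destructive forward scan: every reported bit is erased from `w`,
// so the word is empty when the loop ends.
std::vector<int> scan_forward_destructive(std::uint64_t& w) {
    std::vector<int> out;
    while (w != 0) {
        out.push_back(lowest_bit(w));
        w &= w - 1;  // clear the lowest set bit
    }
    return out;
}

// Non-destructive forward scan: the word is left untouched.
std::vector<int> scan_forward(std::uint64_t w) {
    std::vector<int> out;
    for (int i = 0; i < 64; ++i)
        if ((w >> i) & 1u) out.push_back(i);
    return out;
}

// Non-destructive reverse scan: from the most significant bit down.
std::vector<int> scan_reverse(std::uint64_t w) {
    std::vector<int> out;
    for (int i = 63; i >= 0; --i)
        if ((w >> i) & 1u) out.push_back(i);
    return out;
}
```

The destructive form is convenient when the candidate set is consumed anyway, since no scan position has to be stored between iterations.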

GRAPH: Graph encoding with BITSCAN
[Figure: adjacency matrix of the 5-vertex example graph, stored as one bitarray per vertex (bitarray 0 to bitarray 4)]

Dense encoding:

    #include "pablodev/graph/graph.h"
    #define NUMBER_OF_VERTICES 5

    int main() {
        // undirected graph, one bitarray per adjacency-matrix row
        ugraph ug(NUMBER_OF_VERTICES);
        ug.add_edge(0, 1);
        ug.add_edge(0, 2);
        ug.add_edge(1, 2);
        ug.add_edge(1, 3);
        ug.add_edge(3, 4);
        //…
    }

Sparse encoding:

    #include "pablodev/graph/graph.h"
    #define NUMBER_OF_VERTICES  // value missing in the transcript

    int main() {
        // undirected graph with sparse bitstring rows
        sparse_ugraph ug(NUMBER_OF_VERTICES);
        ug.add_edge(0, 1);
        ug.add_edge(0, 2);
        ug.add_edge(1, 2);
        ug.add_edge(1, 3);
        ug.add_edge(3, 4);
        //…
    }

Subgraphs and sets of vertices as bitstrings
- For large scale networks it is CRITICAL to use a sparse bitstring encoding
G = (V, E)
  V = {1, 2, 3, 4, 5} / G       11111
  W = {1, 2, 4}       / G[W]    11010
  U = {2, 3, 5}       / G[U]    01101
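The encoding above, and one possible sparse variant, can be sketched as follows. The first function reproduces the bitstrings on the slide (vertex 1 leftmost). The second stores only the non-empty 64-bit blocks of a large, mostly empty set; that layout, along with the names `SparseBlock` and `to_sparse`, is an assumption for illustration and not necessarily how BITSCAN represents sparse bitstrings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode a subset of {1..n} as a bitstring with vertex 1 in the leftmost position.
std::string to_bitstring(const std::vector<int>& subset, int n) {
    std::string bits(static_cast<std::size_t>(n), '0');
    for (int v : subset) bits[static_cast<std::size_t>(v - 1)] = '1';
    return bits;
}

// One non-empty 64-bit block of a sparse bitstring (hypothetical layout).
struct SparseBlock {
    std::size_t index;   // which 64-bit block of the full bitstring
    std::uint64_t word;  // the bits of that block
};

// Sparse encoding of a set of 0-based vertices given in ascending order:
// blocks that would be all zeros are simply not stored.
std::vector<SparseBlock> to_sparse(const std::vector<int>& sorted_subset) {
    std::vector<SparseBlock> blocks;
    for (int v : sorted_subset) {
        const std::size_t idx = static_cast<std::size_t>(v) / 64;
        if (blocks.empty() || blocks.back().index != idx)
            blocks.push_back(SparseBlock{idx, 0u});
        blocks.back().word |= std::uint64_t{1} << (v % 64);
    }
    return blocks;
}
```

With this layout a set of three vertices drawn from a million-vertex graph occupies at most three blocks instead of roughly 15,000 words, which is what makes neighbor sets of low-degree vertices cheap to store and scan.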

NEW BBMCS MAXIMUM CLIQUE ALGORITHM FOR LARGE SCALE NETWORKS

The new maximum clique algorithm (I)
BBMCS (G=(V, E))
Initial operations: U = V
 1. K = core numbers of U                        // computed in O(|E|)
 2. H = initial heuristic clique
 3. Remove vertices s.t. K(v) < |H|              // a good H possibly solves the graph
 4. Sort vertices in G by non-decreasing K(v)    // typical degeneracy order
 5. repeat while |U| > 0
 6.     select vertex u with minimum k-core
 7.     INIT_BRANCH(U, u)                        // unrolling of first level
 8.     remove u from U
 9. end-repeat
10. return ω(G)

The maximum clique algorithm (II)
- BRANCH is the new implementation of BBMC for sparse graphs
INIT_BRANCH(U, u)  // unrolling of first level
1. P = N_U(u) + u                                // neighbor set of u (w.r.t. remaining vertices) plus u (a sparse bitstring)
2. if |P| < |H| return                           // CUT based on size
3. if |COLOR(P)| ≤ |H| return                    // a good H possibly solves the graph
4. K_p = core numbers of P
5. if K_p(P) < |H| return                        // graph core number cut
6. Remove any vertex v from P s.t. K_p(v) < |H|  // vertex core number cut
7. L = P sorted by non-decreasing K(v)
8. BRANCH(P, L)                                  // BRANCH is the extension of BBMC to the sparse case
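The first-level unrolling can be sketched in a much simplified form. The code below is not BBMCS: it omits the coloring and core-number cuts and uses plain vectors instead of sparse bitstrings, with an ordinary recursive search standing in for BRANCH. All names (`Graph`, `adjacent`, `branch`, `max_clique_unrolled`) are hypothetical. It only shows how each vertex u is searched together with its neighbors that come later in the initial order, and how the size cut skips hopeless subproblems.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists

bool adjacent(const Graph& g, int a, int b) {
    return std::find(g[a].begin(), g[a].end(), b) != g[a].end();
}

// Plain recursive search standing in for BRANCH (no coloring, no bitstrings).
void branch(const Graph& g, std::vector<int>& clique,
            const std::vector<int>& cand, std::vector<int>& best) {
    if (clique.size() > best.size()) best = clique;
    for (std::size_t i = 0; i < cand.size(); ++i) {
        // Size cut: even taking every remaining candidate cannot beat `best`.
        if (clique.size() + (cand.size() - i) <= best.size()) return;
        std::vector<int> next;
        for (std::size_t j = i + 1; j < cand.size(); ++j)
            if (adjacent(g, cand[i], cand[j])) next.push_back(cand[j]);
        clique.push_back(cand[i]);
        branch(g, clique, next, best);
        clique.pop_back();
    }
}

// First-level unrolling: visit vertices in `order` (for example by
// non-decreasing core number); u is searched with its later neighbors only,
// which has the same effect as removing u from U afterwards.
std::vector<int> max_clique_unrolled(const Graph& g, const std::vector<int>& order) {
    std::vector<std::size_t> rank(g.size());
    for (std::size_t i = 0; i < order.size(); ++i) rank[order[i]] = i;
    std::vector<int> best;
    for (int u : order) {
        std::vector<int> p;  // neighbors of u among the remaining vertices
        for (int w : g[u])
            if (rank[w] > rank[u]) p.push_back(w);
        if (p.size() + 1 <= best.size()) continue;  // cut based on size
        std::vector<int> clique{u};
        branch(g, clique, p, best);
    }
    return best;
}
```

Because low-core vertices are handled first, their subproblems are small, and by the time the dense part of the graph is reached the champion is often large enough for the size cut to discard most remaining vertices.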

Experiments
- PMC algorithm
  - Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage, Ryan Rossi et al., arXiv.org, 2013
  - THE state of the art algorithm by far
- HW: XEON 20 core, Linux Server, 64GB RAM
  - Only one core used in all cases
- Datasets

Results DIMACS 10 (I)
Columns: category, name, |V|, |E|, ∆, d avg, K(G)+1, wo, w
category | name
DIMACS 10 (massive) | hugebubbles
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (massive) | hugetrace
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (massive) | hugetric
DIMACS 10 | adaptive
DIMACS 10 (massive) | hugetric
DIMACS 10 (massive) | hugetric
DIMACS 10 | channel-500x100x100-b
DIMACS 10 (massive) | hugetrace
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | packing-500x100x100-b
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (random geometric) | rgg_n_2_20_s
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | auto
DIMACS 10 | citationCiteseer
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | m14b
DIMACS |
DIMACS 10 | fe-ocean
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | 598a
DIMACS 10 | fe_rotor
DIMACS 10 | fe-tooth
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | wing
DIMACS 10 | fe-body
DIMACS 10 (triangular) | delaunay_n

Results DIMACS 10 (II)
Columns: category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
category | name
DIMACS 10 (massive) | hugebubbles
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (massive) | hugetrace
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (massive) | hugetric
DIMACS 10 | adaptive
DIMACS 10 (massive) | hugetric
DIMACS 10 (massive) | hugetric
DIMACS 10 | channel-500x100x100-b
DIMACS 10 (massive) | hugetrace
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | packing-500x100x100-b
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 (random geometric) | rgg_n_2_20_s | <.001
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | auto
DIMACS 10 | citationCiteseer
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | m14b
DIMACS |
DIMACS 10 | fe-ocean
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | 598a
DIMACS 10 | fe_rotor
DIMACS 10 | fe-tooth
DIMACS 10 (triangular) | delaunay_n
DIMACS 10 | wing
DIMACS 10 | fe-body
DIMACS 10 (triangular) | delaunay_n

Results: Social (I)
Columns: category, name, |V|, |E|, ∆, d avg, K(G)+1, wo, w
category | name
Social facebook | socfb-A-anon
Social facebook | socfb-B-anon
Social | soc-flixster
Web graphs | web-wikipedia
Social | soc-pokec
Social | soc-lastfm
Social | soc-youtube-snap
Social | soc-digg
Social | soc-FourSquare
Social | soc-delicious
Social | soc-flickr
Social | soc-youtube
Social | soc-twitter-follows
Social | soc-gowalla
Social | soc-douban
Social | soc-LiveMocha
Social | soc-buzznet
Social | soc-BlogCatalog
Social | soc-slashdot
Social facebook | socfb-OR
Social | soc-brightkite
Social facebook | socfb-Penn
Social facebook | socfb-Texas
Social facebook | socfb-UF
Social facebook | socfb-UIllinois
Social facebook | socfb-Indiana
Social | soc-epinions
Social facebook | socfb-Wisconsin
Social facebook | socfb-Berkeley
Social facebook | socfb-UCLA
Social facebook | socfb-UConn

Results: Social (II)
Columns: category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
category | name
Social facebook | socfb-A-anon
Social facebook | socfb-B-anon
Social | soc-flixster
Web graphs | web-wikipedia
Social | soc-pokec
Social | soc-lastfm
Social | soc-youtube-snap
Social | soc-digg
Social | soc-FourSquare
Social | soc-delicious
Social | soc-flickr
Social | soc-youtube
Social | soc-twitter-follows
Social | soc-gowalla
Social | soc-douban
Social | soc-LiveMocha
Social | soc-buzznet
Social | soc-BlogCatalog
Social | soc-slashdot
Social facebook | socfb-OR
Social | soc-brightkite
Social facebook | socfb-Penn
Social facebook | socfb-Texas
Social facebook | socfb-UF
Social facebook | socfb-UIllinois
Social facebook | socfb-Indiana
Social | soc-epinions
Social facebook | socfb-Wisconsin
Social facebook | socfb-Berkeley
Social facebook | socfb-UCLA
Social facebook | socfb-UConn

Results: infrastructure
Columns: category, name, |V|, |E|, ∆, d avg, K(G)+1, wo, w
category | name
DIMACS 10 (infrastructure) | inf-europe_osm
Infrastructure | inf-road-usa
DIMACS 10 (infrastructure) | inf-road_usa
DIMACS 10 (infrastructure) | inf-road_central
DIMACS 10 (infrastructure) | inf-germany_osm
DIMACS 10 (infrastructure) | inf-great-britain_osm
DIMACS 10 (infrastructure) | inf-netherlands_osm
DIMACS 10 (infrastructure) | inf-belgium_osm
Infrastructure | inf-roadNet-PA
DIMACS 10 (infrastructure) | inf-luxembourg_osm

Columns: category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
category | name
DIMACS 10 (infrastructure) | inf-europe_osm | ts<.001
Infrastructure | inf-road-usa | <
DIMACS 10 (infrastructure) | inf-road_usa | <
DIMACS 10 (infrastructure) | inf-road_central | <
DIMACS 10 (infrastructure) | inf-germany_osm | <
DIMACS 10 (infrastructure) | inf-great-britain_osm | <
DIMACS 10 (infrastructure) | inf-netherlands_osm | <
DIMACS 10 (infrastructure) | inf-belgium_osm | <
Infrastructure | inf-roadNet-PA | <
DIMACS 10 (infrastructure) | inf-luxembourg_osm | ts<.001

Results: technological (I)
Columns: category, name, |V|, |E|, ∆, d avg, K(G)+1, wo, w
category | name
DIMACS 10 (technological) | venturiLevel
technological | tech-as-skitter
Scientific computing | sc-ldoor
Scientific computing | sc-msdoor
Scientific computing | sc-pwtk
DIMACS 10 (technological) | tech-caidaRouterLevel
technological | tech-RL-caida
Scientific computing | sc-shipsec
Scientific computing | sc-shipsec
Scientific computing | sc-pkustk
Scientific computing | sc-pkustk
technological | tech-p2p-gnutella
DIMACS 10 (technological) | t60k
Scientific computing | sc-nasasrb
technological | tech-internet-as
technological | tech-as-caida
technological | tech-WHOIS

Results: technological (II)
Columns: category, name, |V|, |E|, PMC, BBMCS, %imp, ratio imp
category | name
DIMACS 10 (technological) | venturiLevel
technological | tech-as-skitter
Scientific computing | sc-ldoor
Scientific computing | sc-msdoor
Scientific computing | sc-pwtk
DIMACS 10 (technological) | tech-caidaRouterLevel
technological | tech-RL-caida
Scientific computing | sc-shipsec
Scientific computing | sc-shipsec
Scientific computing | sc-pkustk
Scientific computing | sc-pkustk
technological | tech-p2p-gnutella
DIMACS 10 (technological) | t60k
Scientific computing | sc-nasasrb
technological | tech-internet-as | <
technological | tech-as-caida | <
technological | tech-WHOIS | <

Results: trivially solved during unrolling
Columns: category, name, |V|, |E|, ∆, d avg, K(G)+1, wo, w
category | name
DIMACS 10 (random geometric) | rgg_n_2_23_s
DIMACS 10 (random geometric) | rgg_n_2_22_s
Social | soc-livejournal
DIMACS 10 (random geometric) | rgg_n_2_21_s
temporal reachability | scc_retweet-crawl
Collaboration | ca-hollywood
Collaboration | ca-coauthors-dblp
DIMACS 10 | co-papers-dblp
DIMACS 10 (random geometric) | rgg_n_2_19_s
Web graphs | web-it
DIMACS 10 | co-papers-citeseer
Collaboration | ca-MathSciNet
Collaboration | ca-dblp
DIMACS 10 | coAuthorsCiteseer
Collaboration | ca-dblp
Web graphs | web-arabic
DIMACS 10 (random geometric) | rgg_n_2_17_s
Web graphs | web-uk
Web graphs | web-sk
recommendation Net | rec-amazon
DIMACS 10 (random geometric) | rgg_n_2_16_s
DIMACS 10 (random geometric) | rgg_n_2_15_s
Collaboration | ca-CondMat
Collaboration | ca-AstroPh
Web graphs | web-webbase
Web graphs | web-BerkStan
Web graphs | web-indochina
Collaboration | ca-HepPh
temporal reachability | scc_infect-dublin

Summary
- The main ideas behind finding the largest clique in large scale networks have been described
  - Coloring and k-core bounds
  - Initial sorting decision heuristic
  - Sparse bitstring data structures
- A new algorithm, BBMCS, has been presented and compared with the state-of-the-art reference algorithm PMC
- BBMCS has clearly outperformed PMC in extensive empirical tests

Related bibliography
- Initial sorting of vertices in the maximum clique problem reviewed. Pablo San Segundo, Alvaro Lopez, Mikhail Batsyn. LION 8 Conf., February, Florida.
- Relaxed approximate coloring in exact maximum clique search. Pablo San Segundo, Cristobal Tapia. COR.
- An improved bit parallel exact maximum clique algorithm. Pablo San Segundo et al. OPL.
- A new DSATUR-based algorithm for exact vertex coloring. Pablo San Segundo. COR.
- An exact bit-parallel algorithm for the maximum clique problem. Pablo San Segundo et al. COR.
- Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage. Ryan Rossi et al. arXiv.org, 2013.
- Efficient Search Using Bitboard Models. Pablo San Segundo et al. ICTAI Conf., 2006.