Graph Modeled Data Clustering: Fixed Parameter Algorithms for Clique Generation J. Gramm, J. Guo, F. Hüffner and R. Niedermeier Theory of Computing Systems.

Presentation transcript:

Graph Modeled Data Clustering: Fixed Parameter Algorithms for Clique Generation J. Gramm, J. Guo, F. Hüffner and R. Niedermeier Theory of Computing Systems (2005) Student: Vishal Kapoor

Presentation Outline
Problem Introduction
Past Research
Results of the paper
CLUSTER EDITING
–Kernelization
–Search Tree
CLUSTER DELETION
Questions

Problem Statement
Make at most k changes to the edge set of an input graph to obtain a cluster graph, i.e. a union of vertex-disjoint cliques: each connected component of the resulting graph is a clique.
CLUSTER EDITING
–Both edge additions and deletions are allowed
CLUSTER DELETION
–Only edge deletions are allowed
Used in clustering of data: vertices are adjacent iff their similarity exceeds a threshold.

Past Research
[2000] The study of both problems was started by Shamir et al., who proved that they are NP-complete and APX-hard.
[1996] Cai studied graph modification by edge additions, edge deletions and vertex deletions for certain graph properties and showed that it is FPT.
[2001] Natanzon et al. gave a general c-approximation for deletion and editing problems on bounded-degree graphs, for certain graph properties.
[2002] Khot and Raman investigated the complexity of vertex deletion problems for finding subgraphs with hereditary properties.

Results of this paper
CLUSTER EDITING – O(2.27^k + |V|^3)
CLUSTER DELETION – O(1.77^k + |V|^3)
By using certain reduction rules, the resulting kernel has size O(k^3)
–It has at most 2k^2 + k vertices and 2k^3 + k^2 edges.

CLUSTER EDITING
[Figure: two vertices u and v with a common neighbor and a non-common neighbor]

Reduction Rules
Rule 1: For every pair of vertices u, v:
a. If u and v have more than k common neighbors, then {u,v} is set to ADDED and added to E if not already there.
b. If u and v have more than k non-common neighbors, then {u,v} is set to DELETED and deleted from E if it is there.
c. If u and v have both more than k common neighbors and more than k non-common neighbors, then the instance has no solution.
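The classification step of Rule 1 can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `classify_pairs`, the adjacency-dict representation and the string labels are assumptions, and the sketch only computes the labels on the current graph without modifying E or updating k. A non-common neighbor is taken to be a third vertex adjacent to exactly one of u and v.

```python
from itertools import combinations

def classify_pairs(adj, k):
    """Rule 1 labels for an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours (hypothetical format).
    Returns (labels, feasible); labels maps frozenset({u, v}) to "ADDED"/"DELETED".
    """
    labels = {}
    for u, v in combinations(sorted(adj), 2):
        # Third vertices adjacent to both u and v.
        common = (adj[u] & adj[v]) - {u, v}
        # Third vertices adjacent to exactly one of u and v.
        non_common = (adj[u] ^ adj[v]) - {u, v}
        if len(common) > k and len(non_common) > k:
            return labels, False          # Rule 1c: no solution with budget k
        if len(common) > k:
            labels[frozenset((u, v))] = "ADDED"      # Rule 1a
        elif len(non_common) > k:
            labels[frozenset((u, v))] = "DELETED"    # Rule 1b
    return labels, True

# u and v are non-adjacent but share three neighbours; with k = 2 they must be joined.
g = {"u": {"a", "b", "c"}, "v": {"a", "b", "c"},
     "a": {"u", "v"}, "b": {"u", "v"}, "c": {"u", "v"}}
labels, feasible = classify_pairs(g, 2)
```

Each pair costs O(|V|) set work here, so one pass over all pairs stays within the cubic bound quoted for the kernelization.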

Reduction Rules
Rule 2: For every 3 vertices u, v and w:
a. If {u,v} = ADDED and {u,w} = ADDED, then {v,w} should be set to ADDED and added to E if not already there.
b. If {u,v} = ADDED and {u,w} = DELETED, then {v,w} should be set to DELETED and deleted from E if present.
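Rule 2 is a closure over the labels, which can be sketched as a fixpoint loop. The function name `propagate_labels` and the label encoding are assumptions for illustration; returning `None` when two labels contradict each other is also an assumption, since the slides do not say how a conflict is handled. The sketch only propagates labels and leaves the update of E to the caller.

```python
from itertools import permutations

def propagate_labels(labels, vertices):
    """Close a label map under Rule 2; return None on a contradiction."""
    labels = dict(labels)                 # work on a copy
    changed = True
    while changed:
        changed = False
        for u, v, w in permutations(vertices, 3):
            uv = labels.get(frozenset((u, v)))
            uw = labels.get(frozenset((u, w)))
            if uv != "ADDED" or uw is None:
                continue
            # ADDED + ADDED implies ADDED, ADDED + DELETED implies DELETED.
            implied = uw
            vw = frozenset((v, w))
            if vw not in labels:
                labels[vw] = implied
                changed = True
            elif labels[vw] != implied:
                return None               # assumed handling of a conflict
    return labels

start = {frozenset(("a", "b")): "ADDED",
         frozenset(("a", "c")): "ADDED",
         frozenset(("a", "d")): "DELETED"}
closed = propagate_labels(start, ["a", "b", "c", "d"])
```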

Running Time
What is checked?
–Every pair of vertices
–Every vertex which is a neighbor of both of them
Takes time O(|V|^3)

Kernel Size
The kernel contains at most (2k+1)·k vertices and at most (2k+1 choose 2)·k edges.
Proof skipped.
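As a quick check, the two bounds can be expanded by plain arithmetic (this expansion is not on the slide), which shows that the kernel has O(k^2) vertices and O(k^3) edges:

```latex
\begin{align*}
(2k+1)\cdot k &= 2k^2 + k,\\
\binom{2k+1}{2}\cdot k &= \frac{(2k+1)\,2k}{2}\cdot k = 2k^3 + k^2 .
\end{align*}
```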

Branch and Search Algorithm
Identify a bad triple (of 3 vertices) in the kernel and repair it by adding/deleting edges to/from it, to transform the graph into disjoint cliques.
Overall at most k edge additions/deletions are allowed.
2 branching strategies:
–Basic = O(3^k)
–Advanced = O(2.27^k)

Basic Branching
Lemma: A graph consists of disjoint cliques iff there are no three vertices u, v, w such that {u,v} and {u,w} are edges but {v,w} is not an edge, i.e. any three vertices must induce at most one edge or a triangle.
Thus if a graph is not a union of disjoint cliques, then a bad triple can be found and repaired.
[Figure: a bad triple u, v, w with edges {u,v} and {u,w} and no edge {v,w}]
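The lemma translates directly into a test for cluster graphs. A minimal sketch, assuming an adjacency-dict representation and a hypothetical helper name `find_bad_triple`:

```python
from itertools import combinations

def find_bad_triple(adj):
    """Return (u, v, w) with edges {u,v}, {u,w} and no edge {v,w}, or None.

    adj: dict mapping each vertex to the set of its neighbours (assumed format).
    By the lemma, None means the graph is a union of disjoint cliques.
    """
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w not in adj[v]:
                return u, v, w
    return None

# Two disjoint triangles: a cluster graph, so no bad triple exists.
triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2},
             4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
# A path a - b - c: the middle vertex b is the centre of a bad triple.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```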

Basic Branch Algorithm
1. If G is a union of disjoint cliques, return SUCCESS.
2. If k <= 0, return FAIL.
3. Otherwise, find 3 vertices u, v, w such that edges {u,v} and {u,w} exist and {v,w} does not, and branch on 3 instances G’ as follows:
a. E’ = E – {u,v}, k’ = k-1, and set {u,v} = DELETED
b. E’ = E – {u,w}, k’ = k-1, and set {u,w} and {v,w} = DELETED, {u,v} = ADDED
c. E’ = E + {v,w}, k’ = k-1, and set {u,v}, {u,w} and {v,w} = ADDED
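The three-way branching can be sketched as a short recursive search. This is a simplified illustration under stated assumptions: the names `cluster_editing`, `find_bad_triple` and `toggle` are hypothetical, the graph is an adjacency dict, and the ADDED/DELETED labels are omitted, so the sketch may explore redundant branches but still decides correctly whether at most k edits suffice.

```python
from itertools import combinations

def find_bad_triple(adj):
    """Return (u, v, w) with edges {u,v}, {u,w} but no edge {v,w}, or None."""
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w not in adj[v]:
                return u, v, w
    return None

def toggle(adj, a, b):
    """Flip the edge {a, b}: delete it if present, add it otherwise."""
    if b in adj[a]:
        adj[a].remove(b)
        adj[b].remove(a)
    else:
        adj[a].add(b)
        adj[b].add(a)

def cluster_editing(adj, k):
    """Return a list of at most k edits yielding disjoint cliques, or None."""
    triple = find_bad_triple(adj)
    if triple is None:
        return []                      # step 1: already a cluster graph
    if k <= 0:
        return None                    # step 2: budget exhausted
    u, v, w = triple
    # Step 3: delete {u,v}, delete {u,w}, or add {v,w}.
    for op, a, b in (("del", u, v), ("del", u, w), ("add", v, w)):
        toggle(adj, a, b)
        rest = cluster_editing(adj, k - 1)
        toggle(adj, a, b)              # undo the edit before trying the next branch
        if rest is not None:
            return [(op, a, b)] + rest
    return None

def path4():
    """Path a - b - c - d; one deletion of the middle edge suffices."""
    return {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

solution = cluster_editing(path4(), 1)
```

Every recursive call spends one unit of budget and spawns at most three children, which gives the O(3^k) search tree size.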

Branching Rules
[Figure: the bad triple u, v, w and the graphs resulting from the three branches BR1, BR2 and BR3]

Running time
The algorithm solves CLUSTER EDITING in time O(3^k · k^2 + |V|^3)
1. O(|V|^3) is the time required to find all bad triples.
2. O(3^k) is the size of the search tree.
3. The kernel (modified input G’) has |V| = O(k^2) vertices, so a newly added/deleted edge can create/delete at most O(k^2) bad triples. [The edge list can then be updated only for the vertices affected by that edge, in O(k^2) time.]

NOTE: The time can be improved to O(3^k + |V|^3) by repeating the kernelization at every search tree node whenever possible, which works because the problem kernel has polynomial size.
Similarly, CLUSTER DELETION can be solved in time O(2^k + |V|^3).

Advanced Branch Algorithm
1. Bad triples are still considered, but they are further classified into three cases, C1, C2 and C3.
[Figure: the three cases C1, C2 and C3 for a bad triple u, v, w]

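Branching for each case
For C1: BR3 cannot give a solution better than both BR1 and BR2 and can be omitted.
Writing N(x) for the number of neighbors of x: if N(v) >= N(w), then the total number of edges changed to make 1 clique >= the total number of edges changed to make 2 cliques.
[Figure: case C1, the bad triple u, v, w with further neighbors u1, u2, v1, v2, w1, w2]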

Edges added to make 1 clique =
–{v,w} added = +1
–{v,N(w)} added – {u,v} existing = N(w) – 1
–{w,N(v)} added – {u,w} existing = N(v) – 1
–joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
–joining each N(v) and N(w) with u = N(v) + N(w)
–Total = 2·[N(v) + N(w)] + ([N(w)+N(v)] choose 2) – 1 => (A)
Edges changed to make 2 cliques =
–N(w) deleted = N(w)
–{v,N(w)} added – {u,v} existing = N(w) – 1
–joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
–joining each N(v) and N(w) with u = N(v) + N(w)
–Total = N(v) + 3·N(w) + ([N(w)+N(v)] choose 2) – 1 => (B)
Conclusion: (A) – (B) = N(v) – N(w), so as N(v) >= N(w), (A) >= (B).
[Figure: case C1, the bad triple u, v, w with further neighbors u1, u2, v1, v2, w1, w2]

Thus only BR1 and BR2 need to be used:
The resulting graphs are G\{u,v} or G\{u,w}, and the branching vector is (1,1).
The final recurrence relation is T(k) = 2·T(k-1), with root 2.
So the final tree size for C1 = 2^k.
[Figure: the two remaining branches BR1 and BR2 for the bad triple u, v, w]

For C2: Branching Vector = (1,2,3,2,3)

For C3: Branching Vector = (1,2,3,2,3)

Overall Running Time
Solve T(k) = T(k-1) + 2·[T(k-2) + T(k-3)]
So the final worst-case search tree size = O(2.27^k)
Thus CLUSTER EDITING can be solved in O(2.27^k + |V|^3)

Cases for CLUSTER DELETION: Branching Vector = (2,3,2,3) and running time = O(1.77^k + |V|^3)
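The bases 3, 2, 2.27 and 1.77 quoted on the slides are the branching numbers of the corresponding branching vectors: for a vector (d1, ..., dr), the search tree size is O(x^k), where x > 1 solves x^(-d1) + ... + x^(-dr) = 1. A small numerical sketch, using bisection and a hypothetical helper name `branching_number`:

```python
def branching_number(vector, tol=1e-9):
    """Root x > 1 of sum(x**-d for d in vector) == 1, found by bisection.

    Assumes at least two branches and every entry d >= 1, so that the
    root lies between 1 and len(vector) + 1.
    """
    def excess(x):
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, float(len(vector)) + 1.0   # excess(lo) >= 0 > excess(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid                          # root is to the right of mid
        else:
            hi = mid
    return hi

basic_editing = branching_number((1, 1, 1))           # basic branching
case_c1 = branching_number((1, 1))                    # case C1
advanced_editing = branching_number((1, 2, 3, 2, 3))  # cases C2 and C3
deletion = branching_number((2, 3, 2, 3))             # CLUSTER DELETION
```

The values for the last two vectors lie just below 2.27 and 1.77 respectively, which is where the rounded bases in the running times come from.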

Questions? Thanks.