Clustering and greedy algorithms — Part 2 Prof. Noah Snavely CS1114

Administrivia
• Prelim 3 on Thursday
–Will be comprehensive, but focused on Markov chains and clustering
–Review session Wednesday at 7pm, Upson 315
–Steve will proctor the exam

Administrivia
• Final projects
–Due May 8th – we'll have sign-ups for demo sessions (tentative times: 2-6pm)
–There will be prizes for the best demos!
• Next year:
–We hope to offer CS1114 again in SP10
–Possibly will start a fall version as well
–We want you for the course staff!

Life after CS1114?
• Computer Science major

Reversing a Markov chain?
• What is the probability that a lecture will be followed by a prelim?
• What is the probability that a prelim was preceded by a lecture?
–Why isn't the answer 0.2?
–Why isn't the answer 1/3?
–The real answer is ~0.556, or 55.6%

Reversing a Markov chain?
• PQLLQLLPLQQLQLLQLQQPLLQLLLLLLQPQLPLLLQLPQLLLLPPLQLLLQPLLQLPLLQPPLLQQPLPLPLPLPLLLPLLLPQPLPQQLQLQLLQQP
• The reason a lecture is more likely to precede a prelim is that lectures are more frequent overall than prelims or quizzes
• You'll learn more in a course on probability / statistics (cf. Bayes' Rule)
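As a sketch, the reverse probability can be estimated empirically from the sample sequence above just by counting transitions. (The ~0.556 quoted on the previous slide comes from the chain's actual parameters; an estimate from one short sample will only approximate it.)

```python
# Estimate P(previous = lecture | current = prelim) by counting, over every
# occurrence of 'P' that has a predecessor, how often that predecessor is 'L'.
seq = ("PQLLQLLPLQQLQLLQLQQPLLQLLLLLLQPQL"
       "PLLLQLPQLLLLPPLQLLLQPLLQLPLLQPPLLQ"
       "QPLPLPLPLPLLLPLLLPQPLPQQLQLQLLQQP")

prelims = 0          # 'P's that have a predecessor in the sequence
lecture_before = 0   # ...whose predecessor is 'L'
for prev, cur in zip(seq, seq[1:]):
    if cur == "P":
        prelims += 1
        if prev == "L":
            lecture_before += 1

p = lecture_before / prelims
print(f"P(lecture before prelim) ~ {p:.3f}")
```

Note that the estimate is a ratio of counts, which is exactly what Bayes' Rule gives when you expand P(prev = L | cur = P) in terms of the chain's forward probabilities.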

Clustering
[Figure from Johan Everts]

One approach: k-means
• Suppose we are given n points, and want to find k clusters
• We will find k cluster centers (or means), and assign each of the n points to the nearest cluster center
–A cluster is the subset of the n points assigned to a given center
–We'll call each cluster center a mean
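As an illustrative sketch (not from the slides), the assignment step and the usual k-means objective, the sum of squared distances from each point to its assigned mean, look like this:

```python
# Given n points and k candidate means, assign each point to its nearest
# mean and compute the k-means objective (sum of squared distances).
def assign_and_score(points, means):
    total = 0.0
    clusters = [[] for _ in means]
    for p in points:
        # squared Euclidean distance from p to each mean
        dists = [sum((a - b) ** 2 for a, b in zip(p, m)) for m in means]
        best = dists.index(min(dists))
        clusters[best].append(p)
        total += dists[best]
    return clusters, total

points = [(0, 0), (1, 0), (10, 10), (11, 10)]
clusters, score = assign_and_score(points, [(0, 0), (10, 10)])
# two tight clusters; the objective is 0 + 1 + 0 + 1 = 2
```

The hard part, of course, is choosing the k means so that this score is as small as possible, which is what the next slides are about.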

k-means
• How do we define the best k means and clusters?
[Figure legend: centers (means); clusters]

Optimizing k-means
• The bad news: this is not a convex optimization
• The worse news: it is practically impossible to find the global minimum of this objective function
–No one has ever come up with an algorithm that is faster than exponential time (and probably never will)
• There are many problems like this (called NP-hard)

Greedy algorithms
• Many CS problems can be solved by repeatedly doing whatever seems best at the moment
–I.e., without needing a long-term plan
• These are called greedy algorithms
• Example: gradient descent for convex function minimization
• Example: sorting by swapping out-of-order pairs (e.g., bubble sort)
• Example: making change (with US currency)
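The change-making example can be sketched in a few lines: with US denominations, always taking the largest coin that still fits is optimal (with arbitrary made-up denominations this greedy rule can fail, which is part of the point of the example).

```python
# Greedy change-making: repeatedly take the largest coin that fits.
def make_change(cents, coins=(25, 10, 5, 1)):
    used = []
    for c in coins:          # coins listed in decreasing order
        while cents >= c:
            cents -= c
            used.append(c)
    return used

print(make_change(67))  # two quarters, a dime, a nickel, two pennies
```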

A greedy method for k-means
• Pick a random point to start with; this is your first cluster mean
• Find the farthest point from that mean; this is a new cluster mean
• Find the farthest point from any cluster mean and add it as another mean
• Repeat until we have k means
• Assign each point to its closest mean
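The steps above (often called farthest-first traversal) can be sketched as follows; the function names are mine, and for simplicity it starts from the first point rather than a random one:

```python
import math

def farthest_first(points, k):
    # Step 1: start from an arbitrary point (the slides say random).
    means = [points[0]]
    # Steps 2-4: repeatedly add the point farthest from every current mean.
    while len(means) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, m) for m in means))
        means.append(nxt)
    # Final step: assign each point to its closest mean.
    assignment = [min(range(k), key=lambda i: math.dist(p, means[i]))
                  for p in points]
    return means, assignment

points = [(0, 0), (1, 1), (9, 9), (10, 10)]
means, assignment = farthest_first(points, 2)
```

On this toy input the two means end up at opposite corners, and each point joins the nearer one.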

A greedy method for k-means
[Figure: the greedy method run on example points]

A greedy method for k-means
• Unfortunately, this doesn't work that well in general
• The answer we get could be much worse than the optimum

The k-centers problem
• Let's look at a related problem: k-centers
• Find k cluster centers that minimize the maximum distance between any point and its nearest center
–We want the worst point in the worst cluster to still be good (i.e., close to its center)
–Concrete example: place k hospitals in a city so as to minimize the maximum distance from a hospital to a house

k-centers
• What objective function does this correspond to?
• We can use the same greedy algorithm

An amazing property
• This algorithm gives you a solution that is no worse than twice the optimum
• (k-centers is still NP-hard, just like k-means)
• Such results are sometimes difficult to achieve, and are the subject of much research
–Mostly in CS6810, a bit in CS4820
–You can't find the optimum, yet you can prove something about it!
• Sometimes related problems (e.g., k-means vs. k-centers) have very different guarantees

Detour into graphs
• We can also associate a weight with each edge (e.g., the distance between cities)
[Figure: weighted graph whose vertices are European cities (Paris, Berlin, London, Rome, Frankfurt, Vienna, Prague, Naples, Warsaw, Hamburg)]

Spanning trees
• A spanning tree of a graph is a subgraph that (a) connects all the vertices and (b) is a tree
[Figure: three different spanning trees of the city graph]
• Q: How many edges are there in a spanning tree on n vertices?

Graph costs
• We'll say the cost of a graph is the sum of its edge weights
[Figure: two spanning trees of the city graph, with Cost = 1950 and Cost = 2000]

Minimum spanning trees
• We define the minimum spanning tree (MST) of a graph as the spanning tree with minimum cost
• (Suppose we want to build the minimum length of track possible while still connecting all the cities.)
[Figure: the MST of the city graph; Cost = 1750 (Eurorail needs to build 1750 mi of track at minimum)]

Minimum spanning trees
• How do we find the minimum spanning tree?
• Can you think of a greedy algorithm to do this?
[Figure: the city graph]

Minimum spanning tree
• Greedy algorithm:
[Figure: the greedy algorithm run step by step on the city graph]

Minimum spanning tree
• This greedy algorithm is called Kruskal's algorithm
• Not that simple to prove that it gives the MST
• How many connected components are there after adding the kth edge?
[Figure: Kruskal's algorithm on the city graph]
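A sketch of Kruskal's algorithm, using a union-find structure to detect cycles: sort the edges by weight, then add each edge whose endpoints are still in different components. The small graph and weights below are made up for illustration; they are not the city mileages from the slides.

```python
def kruskal(n, edges):
    """n vertices labeled 0..n-1; edges is a list of (weight, u, v)."""
    parent = list(range(n))

    def find(x):
        # Walk up to the root of x's component, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):          # cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                       # endpoints in different components:
            parent[ru] = rv                # adding this edge creates no cycle
            mst.append((w, u, v))
    return mst

edges = [(4, 0, 1), (2, 1, 2), (5, 0, 2), (1, 2, 3), (7, 1, 3)]
tree = kruskal(4, edges)
cost = sum(w for w, _, _ in tree)          # 3 edges, total cost 1 + 2 + 4 = 7
```

Note that a spanning tree on n vertices always has exactly n-1 edges, which answers the question from the earlier slide.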

Back to clustering
• We can define the clustering problem on graphs

Clustering using graphs
• Clustering = breaking apart the graph by cutting long edges
• Which edges do we break?

Spacing as a clustering metric
• Another objective function for clustering:
–Maximize the minimum distance between clusters
–(Called the spacing.)
[Figure: two clusters with the spacing marked]

Cool fact
• We compute the clusters with the maximum spacing during MST construction!
• To compute the best k clusters, just stop the MST construction k-1 edges early
[Figure: the city graph with the two clusters of maximum spacing (= 400)]
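A sketch of this stop-early rule: run Kruskal's algorithm but add only n-k edges instead of the full n-1; the connected components that remain are the k clusters with maximum spacing. The points below are illustrative, not from the slides.

```python
def max_spacing_clusters(n, edges, k):
    """edges: list of (weight, u, v). Returns a cluster label per vertex."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    added = 0
    for w, u, v in sorted(edges):
        if added == n - k:             # stop k-1 edges before a full tree
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            added += 1
    # Relabel components 0, 1, ... in order of first appearance.
    roots = [find(x) for x in range(n)]
    labels = {r: i for i, r in enumerate(dict.fromkeys(roots))}
    return [labels[r] for r in roots]

pos = [0, 1, 10, 11]                   # 4 points on a line, two tight pairs
edges = [(abs(pos[i] - pos[j]), i, j)
         for i in range(4) for j in range(i + 1, 4)]
print(max_spacing_clusters(4, edges, 2))
```

The cheap edges inside each pair get added first; the expensive edge that would merge the two pairs is exactly the one the early stop refuses to add, and its weight is the spacing.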

Proof of cool fact
• Suppose this wasn't true – then someone could give us a different clustering with a bigger spacing
• Let C* be our MST clustering, and let C be the purportedly better one
• There must be two nodes u and v in different clusters in C but in the same cluster in C*
• There's a path between u and v in C*, and at some point this path crosses a cluster boundary in C

Proof of cool fact
• Let this boundary-crossing edge be called e
• We know that weight(e) ≤ the weight of the next edge we would add to the MST (why?)
• weight(e) ≤ spacing of C*
• Since e crosses a cluster boundary in C, spacing of C ≤ weight(e)
• So spacing of C ≤ spacing of C*
• So C wasn't really better after all…

Pictorial proof
[Figure: the city graph with clustering C and the boundary-crossing edge of weight 400]

Conclusions
• Greedy algorithms work sometimes (e.g., with MST)
• Some clustering objective functions are easier to optimize than others:
–k-means → very hard
–k-centers → very hard, but we can use a greedy algorithm to get within a factor of two of the best answer
–maximum spacing → very easy! Just do MST and stop early

Next time
• Make sure to fill in course evals online (2 pts)
• Review session tomorrow, 7pm, Upson 315
• Prelim on Thursday