Joint work with Chandrashekhar Nagarajan (Yahoo!)

Slides:



Advertisements
Similar presentations
A Dependent LP-Rounding Approach for the k-Median Problem Moses Charikar 1 Shi Li 1 1 Department of Computer Science Princeton University ICALP 2012, Warwick,
Advertisements

Lindsey Bleimes Charlie Garrod Adam Meyerson
Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Incremental Clustering Previous clustering algorithms worked in “batch” mode: processed all points at essentially the same time. Some IR applications cluster.
Centrality of Trees for Capacitated k-Center Hyung-Chan An École Polytechnique Fédérale de Lausanne July 29, 2013 Joint work with Aditya Bhaskara & Ola.
Primal-Dual Algorithms for Connected Facility Location Chaitanya SwamyAmit Kumar Cornell University.
The Randomization Repertoire Rajmohan Rajaraman Northeastern University, Boston May 2012 Chennai Network Optimization WorkshopThe Randomization Repertoire1.
Robust hierarchical k- center clustering Ilya Razenshteyn (MIT) Silvio Lattanzi (Google), Stefano Leonardi (Sapienza University of Rome) and Vahab Mirrokni.
Clustering Geometric Data Streams Jiří Skála Ivana Kolingerová ZČU/FAV/KIV2007.
Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, /20/2013.
A Constant Factor Approximation Algorithm for the Multicommodity Rent-or-Buy Problem Amit Kumar Anupam Gupta Tim Roughgarden Bell Labs CMU Cornell joint.
Approximation Algorithms
Wroclaw University, Sept 18, Approximation via Doubling (Part II) Marek Chrobak University of California, Riverside Joint work with Claire Kenyon-Mathieu.
Krakow, Jan. 9, Outline: 1. Online bidding 2. Cow-path 3. Incremental medians (size approximation) 4. Incremental medians (cost approximation) 5.
Tracking Moving Objects in Anonymized Trajectories Nikolay Vyahhi 1, Spiridon Bakiras 2, Panos Kalnis 3, and Gabriel Ghinita 3 1 St. Petersburg State University.
Optimization problems INSTANCE FEASIBLE SOLUTIONS COST.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Randomized Competitive Analysis for Two Server Problems Wolfgang Bein Kazuo Iwama Jun Kawahara.
Integer Programming Difference from linear programming –Variables x i must take on integral values, not real values Lots of interesting problems can be.
Facility Location with Nonuniform Hard Capacities Tom Wexler Martin Pál Éva Tardos Cornell University ( A 9-Approximation)
Primal-Dual Algorithms for Connected Facility Location Chaitanya SwamyAmit Kumar Cornell University.
Facility Location with Nonuniform Hard Capacities Martin Pál Éva Tardos Tom Wexler Cornell University.
Facility Location using Linear Programming Duality Yinyu Ye Department if Management Science and Engineering Stanford University.
Performance guarantees for hierarchical clustering Sanjoy Dasgupta University of California, San Diego Philip Long Genomics Institute of Singapore.
ESA 2003M. Pál. Universal Facility Location1 Universal Facility Location Mohammad Mahdian MIT Martin Pál Cornell University.
1 Krakow, Jan. 9, 2008 Approximation via Doubling Marek Chrobak University of California, Riverside Joint work with Claire Kenyon-Mathieu.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Integrated Logistics PROBE Princeton University, 10/31-11/1.
Approximation Algorithms: Bristol Summer School 2008 Seffi Naor Computer Science Dept. Technion Haifa, Israel TexPoint fonts used in EMF. Read the TexPoint.
Sampling-based Approximation Algorithms for Multi-stage Stochastic Optimization Chaitanya Swamy Caltech and U. Waterloo Joint work with David Shmoys Cornell.
LP-based Algorithms for Capacitated Facility Location Chaitanya Swamy Joint work with Retsef Levi and David Shmoys Cornell University.
Approximation Algorithms for Stochastic Combinatorial Optimization Part I: Multistage problems Anupam Gupta Carnegie Mellon University.
LP-Based Algorithms for Capacitated Facility Location Hyung-Chan An EPFL July 29, 2013 Joint work with Mohit Singh and Ola Svensson.
Computational Geometry Piyush Kumar (Lecture 5: Linear Programming) Welcome to CIS5930.
1 By: MOSES CHARIKAR, CHANDRA CHEKURI, TOMAS FEDER, AND RAJEEV MOTWANI Presented By: Sarah Hegab.
A Polynomial Time Approximation Scheme For Timing Constrained Minimum Cost Layer Assignment Shiyan Hu*, Zhuo Li**, Charles J. Alpert** *Dept of Electrical.
Primal-Dual Algorithms for Connected Facility Location Chaitanya SwamyAmit Kumar Cornell University.
Approximating Soft- Capacitated Facility Location Problem Mohammad Mahdian, MIT Yinyu Ye, Stanford Jiawei Zhang, Stanford.
Packing Rectangles into Bins Nikhil Bansal (CMU) Joint with Maxim Sviridenko (IBM)
LP-Based Algorithms for Capacitated Facility Location Hyung-Chan An EPFL July 29, 2013 Joint work with Mohit Singh and Ola Svensson.
Approximation Algorithms for Stochastic Optimization Chaitanya Swamy Caltech and U. Waterloo Joint work with David Shmoys Cornell University.
Approximation Algorithms: Éva Tardos Cornell University problems, techniques, and their use in game theory.
CLUSTERABILITY A THEORETICAL STUDY Margareta Ackerman Joint work with Shai Ben-David.
Randomized Composable Core-sets for Submodular Maximization Morteza Zadimoghaddam and Vahab Mirrokni Google Research New York.
1 Combinatorial Algorithms Local Search. A local search algorithm starts with an arbitrary feasible solution to the problem, and then check if some small,
Another story on Multi-commodity Flows and its “dual” Network Monitoring Rohit Khandekar IBM Watson Joint work with Baruch Awerbuch JHU TexPoint fonts.
1 Wroclaw University, Sept 18, 2007 Approximation via Doubling Marek Chrobak University of California, Riverside Joint work with Claire Kenyon-Mathieu.
“Fault Tolerant Clustering Revisited” - - CCCG 2013 Nirman Kumar, Benjamin Raichel خوشه بندی مقاوم در برابر خرابی سپیده آقاملائی.
Feature Selection in k-Median Clustering Olvi Mangasarian and Edward Wild University of Wisconsin - Madison.
Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.
The Effectiveness of Lloyd-type Methods for the k-means Problem Chaitanya Swamy University of Waterloo Joint work with Rafi Ostrovsky, Yuval Rabani, Leonard.
Heuristics for Minimum Brauer Chain Problem Fatih Gelgi Melih Onus.
Tightening LP Relaxations for MAP using Message-Passing David Sontag Joint work with Talya Meltzer, Amir Globerson, Tommi Jaakkola, and Yair Weiss.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
PRIMAL-DUAL APPROXIMATION ALGORITHMS FOR METRIC FACILITY LOCATION AND K-MEDIAN PROBLEMS K. Jain V. Vazirani Journal of the ACM, 2001.
Facility Location with Service Installation Costs Chaitanya Swamy Joint work with David Shmoys and Retsef Levi Cornell University.
Approximation Algorithms Duality My T. UF.
Clustering Data Streams A presentation by George Toderici.
Approximation Algorithms based on linear programming.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Clustering – Definition and Basic Algorithms Seminar on Geometric Approximation Algorithms, spring 11/12.
Clustering Data Streams
Data Driven Resource Allocation for Distributed Learning
Stream-based Geometric Algorithms
Haim Kaplan and Uri Zwick
Finding Subgraphs with Maximum Total Density and Limited Overlap
Fault-Tolerant Facility Location
Optimization Problems Online with Random Demands
Fault Tolerant Facility Location
Guess Free Maximization of Submodular and Linear Sums
Presentation transcript:

Joint work with Chandrashekhar Nagarajan (Yahoo!) An experimental evaluation of incremental and hierarchical k-median algorithms David P. Williamson (Cornell University, on sabbatical at TU Berlin) Joint work with Chandrashekhar Nagarajan (Yahoo!) 10th International Symposium on Experimental Algorithms SEA 2011 May 6, 2011

The k-median problem Input: A set of points (P) in a metric space and an integer k. Let |P| = n, cij be the distance from i to j. Output: A set S of k points to “open” minimizing the sum of distances of each point to the nearest open point. k = 3 open

The k-median problem NP-hard, so work done on approximation algorithms An -approximation algorithm is a polynomial time algorithm for finding a solution S such that where OPTk is cost of optimal solution on k points. Factor Method Charikar, Guha, Shmoys, Tardos (1999) 6.66 LP rounding Jain, Vazirani (1999) 6 Primal-dual algorithm with Lagrangean relaxation Jain, Mahdian, Markakis, Saberi, Vazirani (2003) 4 Greedy algorithm with Lagrangean relaxation Arya, Garg, Khandekar, Meyerson, Munagala, Pandit (2001) 3 + ε Local search

The incremental k-median problem Goal: find a sequence of all the points such that opening the first k in the sequence is a near-optimal solution to the k-median problem, for every k.

The incremental k-median problem Input: Same as the k-median problem without the integer k Output: An ordering of points i1,i2,…,in Competitive ratio = Goal: Find an ordering with minimum competitive ratio. Competitive ratio Note Mettu and Plaxton (2003) 29.86 Complicated greedy algorithm Chrobak, Kenyon, Noga, Young (2008) Lin, Nagarajan, Rajaraman, W (2010) 24 +  Uses k-median α-approx alg as a black box (factor is 8α) 16 Uses Langrangean multiplier preserving facility location algorithm

Incremental k-median algorithm Use an -approximation algorithm for the k-median problem to get solutions for k = 1 to n. Bucket these solutions according to their costs into buckets of form (2l,2l+1] for integer l. Pick the maximum cost solutions from each bucket to obtain S1,S2,…,Sr. 2l+1 2l S1 S2 Sr Sr-1 Cost decreases

Algorithm (cont’d) S1 S2 Sr Sr-1 Recursively combine these solutions using a nesting routine Vr = Sr Nr-1 N1 N2 Nr-2 Obtain an ordering maintaining this nesting N1 N2 N3 Nr

Hierarchical clustering Collection of k-clusterings for all values of k (k-1)-clustering is formed by merging two clusters in k-clustering.

Hierarchical clustering with cluster centers Hierarchical clustering with a center for each cluster in the clustering Merged clusters center should be one of the two original centers

Hierarchical median problem Output: Hierarchical clustering with cluster centers. Clustering cost = sum of the distances of points to their cluster center. Competitive ratio = Objective: Minimize the competitive ratio. Competitive ratio Note Plaxton (2006) 238.9 Uses incremental k-median β-competitive alg as a black box (factor is 8β) Lin, Nagarajan, Rajaraman, W (2010) 62.13 Uses k-median α-approx alg as a black box (factor is 20.71α) 48 Uses Langrangean multiplier preserving facility location algorithm

Experimental Results Comparisons The k-median datasets k-median algorithms Incremental k-median algorithms Hierarchical k-median algorithms The k-median datasets OR library datasets (40) [Beasley 1985] Galvao datasets (2) [Galvao and ReVelle 1996] Alberta dataset [Alp, Erkut, Drezner 2003] We compare the quality of solutions and the running times of these algorithms on the datasets

The k-median algorithms We compare the following k-median algorithms on the datasets CPLEX LP optimum CPLEX IP optimum Single swap local search algorithm (=5) Arya, Garg, Khandekar, Meyerson, Munagala, Pandit [2004] Greedy facility location algorithm (=2) Jain, Mahdian, Markakis, Saberi, Vazirani [2003] LP rounding algorithm (=8) Charikar, Guha, Tardos, Shmoys [1999]

Quality of the k-median solutions

Quality of k-median solutions

Running times of k-median algorithms

Incremental k-median algorithms We compare the following incremental k-median algorithms Our incremental k-median algorithm using Arya et al.’s local search k-median solutions 5-approximation algorithm → Competitive ratio = 40 using Jain et al.’s greedy facility location algorithm 2-approximation algorithm → Competitive ratio = 16 using Charikar et al.’s LP rounding solutions 8-approximation algorithm → Competitive ration = 64 Mettu and Plaxton’s incremental k-median algorithm (Competitive Ratio = 29.86)

Quality of incremental k-median solutions

Quality of incremental k-median solutions

Running times of Incremental k-median algorithms

Hierarchical k-median algorithms We compare the following hierarchical k-median algorithms Our hierarchical k-median algorithm using Arya et al.’s local search k-median solutions using Jain et al.’s bounded envelope using Charikar et al.’s LP rounding solutions Plaxton’s hierarchical k-median algorithm using Mettu and Plaxton’s incremental k-median solutions using our incremental k-median solutions obtained by using Arya et al.’s k-median solutions

Quality of hierarchical k-median solutions

Quality of hierarchical k-median solutions

Summary of experimental results Charikar et al.’s LP rounding k-median algorithm works faster Arya et al.’s local search k-median algorithm gives better solutions Our incremental algorithms better in terms of quality than Mettu and Plaxton’s algorithm Our hierarchical algorithms are better in terms of quality than Plaxton’s algorithm, even when Plaxton is given our incremental solutions However the running times of our incremental algorithms are much worse than Mettu and Plaxton’s algorithm Main takeaway: despite the strong constraints of coming up with incremental and hierarchical solutions, these algorithms deliver quite good solutions in practice.

Open questions Implementations are slowed because we need calculate approximate k-medians for all values of k, but we really only need the solution of at most cost 2l for all values of l. Can this be computed faster than by doing binary search? Or maybe another primitive could find a solution per bucket? How do these algorithms compare with standard hierarchical clustering algorithms that have no provable competitive ratio?

Thanks. Any questions?