Network Lasso: Clustering and Optimization in Large Graphs

Presentation transcript:

Network Lasso: Clustering and Optimization in Large Graphs David Hallac, Jure Leskovec, Stephen Boyd Stanford University Presented by Yu Zhao

What is this paper about? The (standard) lasso problem. The lasso solution is unique when rank(X) = p, because in that case the criterion is strictly convex.
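The formula on this slide is not captured in the transcript; for reference, the standard lasso problem it refers to is

\[
\underset{\beta \in \mathbf{R}^p}{\text{minimize}} \;\; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda \|\beta\|_1,
\]

where X ∈ R^{n×p} is the data matrix, y ∈ R^n the response, and λ ≥ 0 the regularization parameter. When rank(X) = p, the quadratic term is strictly convex in β, which is what makes the minimizer unique.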

What is this paper about? The network lasso problem. The variables are x_i ∈ R^p for i = 1, …, m, where m = |V| is the number of nodes. (The total number of scalar variables is mp.) Here x_i is the variable at node i, f_i is the cost function at node i, and g_jk is the cost function associated with edge (j, k).

Outline: Convex problem definition; Proposed solution (ADMM); Non-convex extension; Experiments.

Convex problem definition: the general graph-structured problem (1) and the network lasso problem (2), as given on the slide.
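The two equations themselves are not captured in the transcript; following the formulation in the Hallac, Leskovec, and Boyd paper, they are

\[
\text{(1)} \quad \underset{x_1,\dots,x_m}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} f_i(x_i) \;+\; \sum_{(j,k) \in \mathcal{E}} g_{jk}(x_j, x_k),
\]

\[
\text{(2)} \quad \underset{x_1,\dots,x_m}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} f_i(x_i) \;+\; \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk}\,\|x_j - x_k\|_2,
\]

where λ ≥ 0 scales the total edge penalty and w_{jk} ≥ 0 are the edge weights.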

Convex problem definition. A distributed and scalable method was developed for solving the network lasso problem, in which each vertex variable x_i is controlled by one "agent", and the agents exchange (small) messages over the graph to solve the problem iteratively.

Convex problem definition. General settings for different applications, e.g. a control system: Nodes: possible states. x_i: the actions to take in state i. Graph: state transitions. Weights: how much we care about the actions in neighboring states differing.

Convex problem definition. General settings for different applications. The sum-of-norms regularization that we use is like group lasso: it encourages not just x_j ≈ x_k for an edge (j, k), but x_j = x_k, i.e., consensus across the edge.

Convex problem definition. Regularization path. λ = 0: each x_i is simply a minimizer of f_i, so the problem splits into purely local computations. λ → ∞ (λ ≥ λ_critical): the solution is in consensus, i.e. all x_i are equal (for a connected graph), and the problem reduces to minimizing the sum of the f_i over a single common variable.

Convex problem definition. Network lasso and clustering: the ℓ2-norm (not squared) edge penalty is what defines the network lasso; it drives adjacent variables to agree exactly, so nodes group into clusters that share a common value. Cluster sizes grow with λ.

Convex problem definition. Inference on new nodes: we can interpolate the solution to estimate the value x_j at a new node j, using the edges that connect j to the existing graph.
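The interpolation formula is not captured in the transcript; a plausible form, consistent with the network lasso objective, is to hold the solved variables fixed and minimize only the edge terms incident to the new node j with neighbors N(j):

\[
\hat{x}_j = \underset{x_j}{\operatorname{argmin}} \;\; \sum_{k \in N(j)} w_{jk}\,\|x_j - x_k\|_2,
\]

a weighted geometric-median (Weber point) problem over the neighbors' solutions.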

Proposed solution (ADMM). Alternating Direction Method of Multipliers (ADMM). Reference: S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

Proposed solution (ADMM). ADMM in network lasso, step 1): introduce a copy of x_i, called z_ij, at each edge (i, j).
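With these edge copies, the problem can be rewritten in the consensus form ADMM needs. The slide's equation is not in the transcript; a reconstruction based on the paper is

\[
\begin{aligned}
\underset{x,\, z}{\text{minimize}} \quad & \sum_{i \in \mathcal{V}} f_i(x_i) \;+\; \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk}\,\|z_{jk} - z_{kj}\|_2 \\
\text{subject to} \quad & x_j = z_{jk}, \;\; x_k = z_{kj} \quad \text{for all } (j,k) \in \mathcal{E}.
\end{aligned}
\]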

Proposed solution (ADMM). ADMM in network lasso, step 2): form the augmented Lagrangian. Reference: M. R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:302–320, 1969.
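The augmented Lagrangian itself does not appear in the transcript; in scaled dual form, with penalty parameter ρ > 0 and scaled dual variables u_{jk} (and dropping terms that do not depend on x or z), it reads

\[
L_\rho(x, z, u) = \sum_{i \in \mathcal{V}} f_i(x_i) + \sum_{(j,k) \in \mathcal{E}} \Big( \lambda w_{jk}\,\|z_{jk} - z_{kj}\|_2 + \tfrac{\rho}{2}\,\|x_j - z_{jk} + u_{jk}\|_2^2 + \tfrac{\rho}{2}\,\|x_k - z_{kj} + u_{kj}\|_2^2 \Big).
\]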

Proposed solution (ADMM). ADMM in network lasso, step 3): the ADMM updates (alternating x-, z-, and u-updates).
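The update formulas are not captured in the transcript; reconstructed from the paper's description (iteration index t, N(i) the neighbors of node i), they are roughly

\[
\begin{aligned}
x_i^{t+1} &= \underset{x_i}{\operatorname{argmin}} \;\; f_i(x_i) + \sum_{j \in N(i)} \tfrac{\rho}{2}\,\|x_i - z_{ij}^{t} + u_{ij}^{t}\|_2^2, \\
(z_{jk}^{t+1}, z_{kj}^{t+1}) &= \underset{z_{jk},\, z_{kj}}{\operatorname{argmin}} \;\; \lambda w_{jk}\,\|z_{jk} - z_{kj}\|_2 + \tfrac{\rho}{2}\,\|x_j^{t+1} - z_{jk} + u_{jk}^{t}\|_2^2 + \tfrac{\rho}{2}\,\|x_k^{t+1} - z_{kj} + u_{kj}^{t}\|_2^2, \\
u_{jk}^{t+1} &= u_{jk}^{t} + x_j^{t+1} - z_{jk}^{t+1},
\end{aligned}
\]

with the x-update done in parallel at each node and the z- and u-updates in parallel at each edge; the z-update has a closed-form (block soft-thresholding) solution.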

Proposed solution (ADMM). Regularization path: compute the regularization path as a function of λ to gain insight into the network structure. Start at λ = λ_initial, warm-starting each solve from the previous solution; update λ := αλ with α > 1; stop once the solution reaches consensus (λ ≥ λ_critical). A sketch of this sweep is shown below.
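A minimal Python sketch of the sweep, under the assumption that `solve_network_lasso` is a user-supplied routine performing one ADMM solve at a fixed λ and returning the node variables plus its internal (x, z, u) state for warm-starting; the helper names are illustrative, not from the paper's code:

```python
import numpy as np

def is_in_consensus(x, tol=1e-4):
    """True if every node variable (row of x) is within tol of the mean row."""
    x = np.asarray(x)
    return float(np.max(np.linalg.norm(x - x.mean(axis=0), axis=1))) <= tol

def regularization_path(solve_network_lasso, graph, lam_init=1e-3, alpha=2.0, tol=1e-4):
    """Sweep lambda geometrically, warm-starting each ADMM solve from the previous one."""
    lam, state, path = lam_init, None, []
    while True:
        # One ADMM solve at fixed lambda; `state` carries (x, z, u) for warm-starting.
        x, state = solve_network_lasso(graph, lam, warm_start=state)
        path.append((lam, x))
        if is_in_consensus(x, tol):   # consensus reached => lambda >= lambda_critical, stop
            break
        lam *= alpha                  # lambda := alpha * lambda, with alpha > 1
    return path
```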

Non-convex extension: replace the group lasso penalty with a monotonically nondecreasing concave function φ(u), with φ(0) = 0 and domain u ≥ 0, applied to u = ||x_j − x_k||_2. ADMM is then not guaranteed to converge, and even if it does, it need not converge to a global optimum.

Non-convex extension. Heuristic solution: keep track of the iterate that yields the minimum objective seen so far, and return that as the solution instead of the most recent iterate.

Non-convex extension. Non-convex z-update: compared to the convex case, the only difference in the ADMM algorithm is the z-update, which now uses φ in place of the ℓ2 norm (formula shown on the slide).
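The slide's exact formula is not in the transcript; a reconstruction, obtained from the convex z-update above with the norm passed through φ, is

\[
(z_{jk}^{t+1}, z_{kj}^{t+1}) = \underset{z_{jk},\, z_{kj}}{\operatorname{argmin}} \;\; \lambda w_{jk}\, \phi\!\left(\|z_{jk} - z_{kj}\|_2\right) + \tfrac{\rho}{2}\,\|x_j^{t+1} - z_{jk} + u_{jk}^{t}\|_2^2 + \tfrac{\rho}{2}\,\|x_k^{t+1} - z_{kj} + u_{kj}^{t}\|_2^2,
\]

which is no longer convex in general and may only be solved approximately.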

Experiments, 1. Network-Enhanced Classification. We first analyze a synthetic network in which each node has a support vector machine (SVM) classifier but does not have enough training data to accurately estimate it. Idea: nodes "borrow" training examples from their relevant neighbors to improve their own results; edges between neighbors with different underlying models incur non-zero lasso penalties.

Experiments, 1. Network-Enhanced Classification. Dataset: randomly generate a dataset containing 1000 nodes, each with its own classifier, a support vector machine in R^50. Each node tries to predict y ∈ {−1, 1}, where the label is generated by the node's underlying (group-level) classifier. Network: the 1000 nodes are split into 20 equally-sized groups. Each group has a common underlying classifier, while different groups have independent models.

Experiments, 1. Network-Enhanced Classification. Objective function (shown on the slide):
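The formula itself is not captured in the transcript; a plausible reconstruction, treating x_i = (w_i, b_i) as node i's SVM parameters and using hinge loss with a small ridge term (the exact constants on the slide may differ), is

\[
f_i(x_i) = \sum_{l=1}^{n_i} \max\!\big(0,\; 1 - y_{il}\,(w_i^{\top} a_{il} + b_i)\big) + c\,\|w_i\|_2^2,
\qquad
\underset{x}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} f_i(x_i) + \lambda \sum_{(j,k) \in \mathcal{E}} \|x_j - x_k\|_2,
\]

where (a_{il}, y_{il}) are node i's training pairs.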

Experiments, 1. Network-Enhanced Classification. Results (regularization path):

Experiments, 1. Network-Enhanced Classification. Results (prediction accuracy):

Experiments, 1. Network-Enhanced Classification. Results (timing): convergence comparison between the centralized and ADMM methods for the SVM problem.

Experiments, 1. Network-Enhanced Classification. Results (timing): convergence time for a large-scale 3-regular graph solved at a single (constant) value of λ.

Experiments, 2. Spatial Clustering and Regressors. Attempt to estimate the price of homes based on latitude/longitude data and a set of features.

Experiments, 2. Spatial Clustering and Regressors. Dataset: a list of real estate transactions over a one-week period in May 2008 in the Greater Sacramento area. Network: build the graph using the latitude/longitude coordinates of each house; connect every remaining house to the five nearest homes, with an edge weight inversely proportional to the distance between the houses. The resulting graph has 785 nodes, 2447 edges, and a diameter of 61.

Experiments, 2. Spatial Clustering and Regressors. Optimization parameter and objective function: at each node, solve for a local regression model (equation shown on the slide). Objective function (equation shown on the slide):
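Neither formula is captured in the transcript; a plausible reconstruction, with x_i = (w_i, b_i) a per-house linear regressor, a_i the house's feature vector, p_i its sale price, and μ a small ridge parameter (the exact loss and constants on the slide may differ), is

\[
f_i(x_i) = \big(w_i^{\top} a_i + b_i - p_i\big)^2 + \mu\,\|w_i\|_2^2,
\qquad
\underset{x}{\text{minimize}} \;\; \sum_{i \in \mathcal{V}} f_i(x_i) + \lambda \sum_{(j,k) \in \mathcal{E}} w_{jk}\,\|x_j - x_k\|_2.
\]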

Experiments, 2. Spatial Clustering and Regressors. Results:

Experiments, 2. Spatial Clustering and Regressors. Results (continued):

Conclusion. The network lasso is a useful way of representing convex optimization problems, and the magnitude of the improvements in the experiments shows that this approach is worth exploring further, as there are many potential ideas to build on. The non-convex method gave comparable performance to the convex approach; the analysis of different non-convex functions φ(u) is left for future work. We could attempt to iteratively reweight the edge weights to attain some desired outcome. Within the ADMM algorithm, there are many ways to improve speed, performance, and robustness.

Questions?