Optimization with Big Data = Extreme* Mountain Climbing
* in a billion dimensional space on a foggy day
Peter Richtárik, School of Mathematics



BIG DATA
Sources: digital images & videos, transaction records, government records, health records, defence, internet activity (social media, Wikipedia, ...), scientific measurements (physics, climate models, ...)
BIG Volume, BIG Velocity, BIG Variety

Western General Hospital (Creutzfeldt–Jakob Disease)
Arup (Truss Topology Design)
Ministry of Defence, Dstl lab (Algorithms for Data Simplicity)
Royal Observatory (Optimal Planet Growth)

GOD’S Algorithm = Teleportation

If you are not a God... x_0 → x_1 → x_2 → x_3

Optimization as Lock Breaking
Setup: x = (x_1, x_2, x_3, x_4); F(x) = F(x_1, x_2, x_3, x_4) is a number representing the "quality" of a combination. The combination maximizing F opens the lock.
Optimization Problem: find the combination maximizing F.
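The lock metaphor can be made concrete with a toy sketch (the 4-dial scoring function `F` and its target combination below are invented for illustration): with only a handful of dials, one can simply score every combination, which is exactly what becomes impossible at scale.

```python
import itertools

def F(x):
    # Hypothetical "quality" score of a 4-dial combination;
    # the lock opens at the combination maximizing F.
    target = (3, 1, 4, 1)
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))

# Brute force: score every combination of 4 dials with 10 positions each.
# This is 10^4 evaluations here, but 10^n in general -- hopeless for large n.
best = max(itertools.product(range(10), repeat=4), key=F)
print(best)
```

With a billion dials, enumerating combinations is out of the question, which is why the structure of F matters so much in what follows.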

Optimization Algorithm

How to Open a Lock with a Billion Interconnected Dials?
F : R^n → R, # variables/dials = n = 10^9
Assumption: F = F_1 + F_2 + ... + F_n, where F_j depends on the neighbours of x_j only.
Example: F_1 depends on x_1, x_2, x_3 and x_4; F_2 depends on x_1 and x_2; ...
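A minimal sketch of why this assumption helps (the chain-structured terms F_j below are a made-up example, not from the slides): when each F_j touches only a few variables, changing one coordinate requires re-evaluating only the terms that mention it, not all n of them.

```python
import random

# Hypothetical partially separable objective on a cycle:
# F(x) = sum_j F_j, where F_j depends only on x_j and x_{j+1}.
n = 1000

def F_j(j, x):
    return -(x[j] - 0.5 * x[(j + 1) % n]) ** 2

def F(x):
    return sum(F_j(j, x) for j in range(n))

def coordinate_step(x, i, step=0.1):
    # Only F_i and F_{i-1} mention x_i, so we compare just those
    # two terms instead of re-evaluating all n of them.
    touching = [i, (i - 1) % n]
    before = sum(F_j(j, x) for j in touching)
    for delta in (step, -step):
        x[i] += delta
        if sum(F_j(j, x) for j in touching) > before:
            return  # keep the improving move
        x[i] -= delta  # revert and try the other direction

x = [random.random() for _ in range(n)]
old = F(x)
for _ in range(5000):
    coordinate_step(x, random.randrange(n))
```

Each step costs O(1) evaluations of F_j rather than O(n), and F never decreases, since a move is kept only when it improves the touched terms.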

Optimization Methods × Computing Architectures
Architectures: multicore CPUs, GPU accelerators, clusters / clouds
Goals: effectiveness, efficiency, scalability
Ingredients: parallelism, distribution, asynchronicity, randomization
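Two of these ingredients, randomization and parallelism, combine naturally: sample a batch of coordinates at random, compute their updates concurrently, then apply them. The sketch below is an illustrative toy (the separable objective and thread-based batching are invented, not the deck's actual method).

```python
from concurrent.futures import ThreadPoolExecutor
import random

# Toy separable objective f(x) = sum_i (x_i - 1)^2, minimized at x_i = 1.
n = 8
x = [0.0] * n

def proposed_update(i):
    # Gradient step on coordinate i with step 0.5: df/dx_i = 2 (x_i - 1),
    # so the update lands exactly on the minimizer 1.0.
    return i, x[i] - 0.5 * 2.0 * (x[i] - 1.0)

random.seed(0)
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(3):
        batch = random.sample(range(n), 4)  # random batch of coordinates
        for i, new_value in pool.map(proposed_update, batch):
            x[i] = new_value  # synchronous: apply after the batch is computed
```

This is the synchronous variant; asynchronous methods drop the barrier and let workers read and write x while others are still computing.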

Optimization Methods for Big Data
Randomized Coordinate Descent
– P. R. and M. Takáč: Parallel coordinate descent methods for big data optimization, ArXiv: [can solve a problem with 1 billion variables in 2 hours using 24 processors]
Stochastic (Sub)Gradient Descent
– P. R. and M. Takáč: Randomized lock-free methods for minimizing partially separable convex functions [can be applied to optimize an unknown function]
Both of the above
– M. Takáč, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for SVMs, ArXiv:1302.xxxx
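For orientation, here is a serial sketch of randomized coordinate descent on a tiny least-squares problem (the matrix `A` and vector `b` are invented illustration data; this is not the parallel method analyzed in the cited papers): each step picks a random coordinate and minimizes over it exactly.

```python
import random

# Minimize f(x) = 0.5 * ||A x - b||^2 by exact coordinate minimization.
A = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
b = [2.0, 3.0, 2.0]
n = 2
x = [0.0, 0.0]
rows = range(len(A))

def residual():
    return [sum(A[r][c] * x[c] for c in range(n)) - b[r] for r in rows]

random.seed(0)
for _ in range(200):
    i = random.randrange(n)            # pick a random coordinate
    r = residual()
    grad_i = sum(A[row][i] * r[row] for row in rows)   # i-th partial derivative
    col_norm_sq = sum(A[row][i] ** 2 for row in rows)  # (A^T A)_{ii}
    x[i] -= grad_i / col_norm_sq       # exact minimization along coordinate i

print(x)
```

For this data the least-squares solution is x = (1, 1), and the iterates converge to it; scaling this idea to a billion variables is what requires the parallel and distributed machinery above.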

Theory vs Reality

Parallel Coordinate Descent

TOOLS: Probability, Machine Learning, Matrix Theory, HPC