Published by Bailee Goulder. Modified over 9 years ago.
1
Peter Richtarik: Why parallelizing like crazy and being lazy can be good
3
I. Optimization
4
Optimization with Big Data = Extreme* Mountain Climbing (* in a billion-dimensional space on a foggy day)
5
Western General Hospital (Creutzfeldt-Jakob Disease); Arup (Truss Topology Design); Ministry of Defence dstl lab (Algorithms for Data Simplicity); Royal Observatory (Optimal Planet Growth)
6
Big Data: digital images & videos; transaction records; government records; health records; defence; internet activity (social media, Wikipedia, ...); scientific measurements (physics, climate models, ...). BIG Volume, BIG Velocity, BIG Variety.
7
God’s Algorithm = Teleportation
8
If You Are Not a God... [Figure: a sequence of iterates $x_0, x_1, x_2, x_3$]
9
II. Randomized Coordinate Descent Methods [the cardinal directions of big data optimization]
10
P. R. and M. Takáč, "Iteration complexity of randomized block coordinate descent methods for minimizing a composite function", Mathematical Programming A, 2012. Yu. Nesterov, "Efficiency of coordinate descent methods on huge-scale optimization problems", SIAM J Optimization, 2012.
11
2D Optimization. Goal: find the minimizer of a function. [Figure: contours of the function]
12
Randomized Coordinate Descent in 2D [Figure: compass directions N, S, E, W]
13-20
[Animation: steps 1 through 8 are taken one at a time along the N, S, E, W directions, ending with "SOLVED!"]
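The 2D animation can be imitated with a few lines of Python. This is only a minimal sketch under stated assumptions: the quadratic `f`, its minimizer at (1, -2), the starting point, and all function names are hypothetical and not taken from the slides. Each iteration picks one axis at random (an East/West or North/South move) and minimizes exactly along it.

```python
import random

def f(x, y):
    # Hypothetical convex quadratic with elliptical contours; minimizer at (1, -2).
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2 + (x - 1.0) * (y + 2.0)

def minimize_along_x(y):
    # Solve df/dx = 2(x - 1) + (y + 2) = 0 for x with y fixed (East/West move).
    return 1.0 - (y + 2.0) / 2.0

def minimize_along_y(x):
    # Solve df/dy = 4(y + 2) + (x - 1) = 0 for y with x fixed (North/South move).
    return -2.0 - (x - 1.0) / 4.0

def randomized_coordinate_descent(x, y, steps, seed=0):
    rng = random.Random(seed)
    history = [f(x, y)]
    for _ in range(steps):
        if rng.random() < 0.5:
            x = minimize_along_x(y)   # move only in the x direction
        else:
            y = minimize_along_y(x)   # move only in the y direction
        history.append(f(x, y))
    return x, y, history

x, y, history = randomized_coordinate_descent(5.0, 5.0, steps=200)
print(x, y, history[-1])
```

Because every move is an exact minimization along one axis, the function value never increases from one step to the next, which is what makes such cheap single-direction moves usable at all.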
21
1 Billion Rows & 100 Million Variables
22
Bridges are Indeed Optimal!
23
P. R. and M. Takáč, "Parallel coordinate descent methods for big data optimization", arXiv:1212.0873, 2012. M. Takáč, A. Bijral, P. R. and N. Srebro, "Mini-batch primal and dual methods for SVMs", ICML 2013.
24
Failure of Naive Parallelization [Animation over slides 24 to 28: from point 0, the parallel steps 1a and 1b are combined into point 1; from point 1, the parallel steps 2a and 2b are combined into point 2]
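A toy calculation shows how adding up coordinate steps that were computed in parallel can fail. The sketch below is an illustration under assumptions, not the example from the slides: the objective `f`, the starting point, and the damping factor `scale` are all hypothetical. Both coordinate steps are computed from the same old point, as two parallel workers would do, and then applied together.

```python
def f(x):
    # Hypothetical objective with strongly coupled coordinates.
    return 0.5 * sum(x) ** 2

def coordinate_step(x, i):
    # Exact minimization over coordinate i with the others held fixed.
    # For this f the best step is the same for every i: it drives sum(x) to zero.
    return -sum(x)

def parallel_update(x, scale):
    # Every step is computed from the same (old) point, then all steps are
    # applied at once, each multiplied by `scale`.
    steps = [coordinate_step(x, i) for i in range(len(x))]
    return [xi + scale * h for xi, h in zip(x, steps)]

def run(x, scale, iterations):
    values = [f(x)]
    for _ in range(iterations):
        x = parallel_update(x, scale)
        values.append(f(x))
    return values

start = [1.0, 1.0]
naive = run(start, scale=1.0, iterations=4)    # add both steps in full
damped = run(start, scale=0.5, iterations=4)   # average the two steps
print(naive)    # [2.0, 2.0, 2.0, 2.0, 2.0]
print(damped)   # [2.0, 0.0, 0.0, 0.0, 0.0]
```

With the full steps the iterate jumps from (1, 1) to (-1, -1) and back, so the objective never improves, while averaging the two steps lands on a minimizer at once. The factor 0.5 was picked by hand for this toy function; how to choose such a factor in general is beyond this sketch.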
29
Parallel Coordinate Descent
31
Theory
32
Reality
33
A Problem with Billion Variables
34
P. R. and M. Takáč, "Distributed coordinate descent methods for big data optimization", Manuscript, 2013.
36
Distributed Coordinate Descent: a 1.2 TB LASSO problem solved on the HECToR supercomputer with 2048 cores
37
III. Randomized Lock-Free Methods [optimization as lock breaking]
38
A Lock with 4 Dials. Setup: a combination is $x = (x_1, x_2, x_3, x_4)$, and $F(x) = F(x_1, x_2, x_3, x_4)$ is a function representing the “quality” of a combination. The combination maximizing $F$ opens the lock. Optimization Problem: find the combination maximizing $F$.
39
Optimization Algorithm
40
P. R. and M. Takáč, "Randomized lock-free gradient methods", Manuscript, 2013. F. Niu, B. Recht, C. Re, and S. Wright, "HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent", NIPS, 2011.
41
A System of Billion Locks with Shared Dials. # dials = n = # locks; the dials are $x_1, x_2, x_3, x_4, \dots, x_n$. 1) Nodes in the graph correspond to dials. 2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge.
42
How do we Measure the Quality of a Combination? $F : \mathbb{R}^n \to \mathbb{R}$. Each lock $j$ has its own quality function $F_j$ depending on the dials it owns. However, it does NOT open when $F_j$ is maximized. The system of locks opens when $F = F_1 + F_2 + \dots + F_n$ is maximized.
43
An Algorithm with (too much?) Randomization: 1) Randomly select a lock. 2) Randomly select a dial belonging to the lock. 3) Adjust the value on the selected dial based only on the info corresponding to the selected lock.
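The three steps can be sketched in Python on a small system of locks. Everything specific here is an assumption made for illustration: the ring graph in which lock j owns dials j and j + 1, the quadratic quality functions F_j, the step size, and all function names are hypothetical, and the dial adjustment is a plain gradient step on the selected F_j rather than the exact rule from the cited manuscript.

```python
import random

def make_ring_system(n, seed=0):
    # Hypothetical setup: lock j owns dial j and its ring neighbour (j + 1) % n.
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # combination that opens all locks
    locks = []
    for j in range(n):
        coeffs = {j: 2.0, (j + 1) % n: 1.0}
        rhs = sum(c * target[i] for i, c in coeffs.items())
        locks.append((coeffs, rhs))
    return locks, target

def residual(lock, x):
    coeffs, rhs = lock
    return sum(c * x[i] for i, c in coeffs.items()) - rhs

def total_quality(locks, x):
    # F = F_1 + ... + F_n with F_j(x) = -0.5 * residual_j(x)^2 (maximum value 0).
    return sum(-0.5 * residual(lock, x) ** 2 for lock in locks)

def randomized_lock_dial_ascent(locks, n, iterations, step=0.2, seed=1):
    rng = random.Random(seed)
    x = [0.0] * n
    for _ in range(iterations):
        lock = rng.choice(locks)               # 1) randomly select a lock
        coeffs, _ = lock
        dial = rng.choice(list(coeffs))        # 2) randomly select one of its dials
        grad = -residual(lock, x) * coeffs[dial]   # partial derivative of F_j w.r.t. that dial
        x[dial] += step * grad                 # 3) adjust the dial using this lock only
    return x

n = 20
locks, target = make_ring_system(n)
x = randomized_lock_dial_ascent(locks, n, iterations=10000)
print(total_quality(locks, [0.0] * n), total_quality(locks, x))
```

Each iteration touches a single dial and reads only the dials of a single lock, so its cost does not grow with the total number of locks.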
44
Synchronous Parallelization [Figure: timeline of jobs J1 to J9 on Processor 1, Processor 2 and Processor 3; processors sit IDLE while waiting for each other: WASTEFUL]
45
Crazy (Lock-Free) Parallelization [Figure: timeline of jobs J1 to J9 on Processor 1, Processor 2 and Processor 3, run back to back with no waiting: NO WASTE]
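The difference between the two timelines can be reproduced with a small scheduling simulation. This is a sketch of the timing picture only, under assumptions: the job durations are made up, and the "no waiting" schedule is modelled as a shared queue from which a free processor takes the next job, which is not necessarily how the methods in the slides assign work.

```python
import heapq

def synchronous_schedule(durations, processors):
    # Jobs run in rounds of `processors` jobs; a barrier after each round makes
    # every processor wait for the slowest job of that round.
    total, idle = 0.0, 0.0
    for start in range(0, len(durations), processors):
        batch = durations[start:start + processors]
        round_time = max(batch)
        total += round_time
        idle += sum(round_time - d for d in batch)   # time spent waiting at the barrier
    return total, idle

def asynchronous_makespan(durations, processors):
    # No barrier: each job goes to whichever processor becomes free first.
    free_at = [0.0] * processors
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest available processor
        heapq.heappush(free_at, start + d)
    return max(free_at)

durations = [3.0, 1.0, 2.0, 1.0, 4.0, 2.0, 2.0, 1.0, 3.0]   # hypothetical J1..J9
sync_total, sync_idle = synchronous_schedule(durations, processors=3)
async_total = asynchronous_makespan(durations, processors=3)
print(sync_total, sync_idle)   # 10.0 11.0
print(async_total)             # 8.0
```

With these durations the barriers cost 11 time units of waiting across the three processors, and dropping them shortens the overall run from 10 to 8 time units.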
46
Crazy Parallelization
50
Theoretical Result [Formula not preserved in this transcript; its labelled quantities are: average # of dials in a lock, average # of dials common to 2 locks, # locks, # processors]
51
Computational Insights
56
IV. Final Two Slides
57
Why parallelizing like crazy and being lazy can be good? [Diagram keywords: Randomization; Effectivity; Tractability; Efficiency; Scalability (big data); Parallelism; Distribution; Asynchronicity; Parallelization]
58
Tools: Probability, Machine Learning, Matrix Theory, HPC