Peter Richtarik Why parallelizing like crazy and being lazy can be good.


1 Peter Richtarik Why parallelizing like crazy and being lazy can be good

2

3 I. Optimization

4 Optimization with Big Data = Extreme* Mountain Climbing (* in a billion-dimensional space on a foggy day)

5 Western General Hospital (Creutzfeldt-Jakob Disease); Arup (Truss Topology Design); Ministry of Defence dstl lab (Algorithms for Data Simplicity); Royal Observatory (Optimal Planet Growth)

6 Big Data: digital images & videos; transaction records; government records; health records; defence; internet activity (social media, wikipedia, ...); scientific measurements (physics, climate models, ...). BIG Volume, BIG Velocity, BIG Variety

7 God’s Algorithm = Teleportation

8 If You Are Not a God... x_0, x_1, x_2, x_3

9 II. Randomized Coordinate Descent Methods [the cardinal directions of big data optimization]

10 P. R. and M. Takáč, Iteration complexity of randomized block coordinate descent methods for minimizing a composite function, Mathematical Programming A, 2012. Yu. Nesterov, Efficiency of coordinate descent methods on huge-scale optimization problems, SIAM J Optimization, 2012.

11 2D Optimization. Contours of a function. Goal: find the minimizer of the function.

12-20 Randomized Coordinate Descent in 2D [animation: steps 1 to 8, each along one of the compass directions N, S, E, W] SOLVED!
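The 2D walk above can be sketched in a few lines. This is a minimal illustration, not the method from the cited papers: it assumes a hypothetical convex quadratic f(x) = 0.5 x^T A x - b^T x (the matrix `A` and vector `b` are made up for the example) and, at each step, picks one coordinate at random and minimizes f exactly along it, which corresponds to moving along the N-S or the E-W axis.

```python
import random

def randomized_coordinate_descent(A, b, x0, iters=200, seed=0):
    """Minimize 0.5*x^T A x - b^T x by exact minimization along random coordinates."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        i = rng.randrange(n)  # pick a direction (coordinate) uniformly at random
        # Setting the i-th partial derivative to zero gives the exact 1D minimizer.
        off_diagonal = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - off_diagonal) / A[i][i]
    return x

# Hypothetical 2D problem with elliptical contours (assumed data).
A = [[2.0, 0.5], [0.5, 1.0]]
b = [1.0, 1.0]
x_min = randomized_coordinate_descent(A, b, x0=[0.0, 0.0])
print(x_min)
```

Each step only touches one row of `A`, which is why the per-iteration cost stays small even when the number of variables is huge.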

21 1 Billion Rows & 100 Million Variables

22 Bridges are Indeed Optimal!

23 P. R. and M. Takáč, Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873, 2012. M. Takáč, A. Bijral, P. R. and N. Srebro, Mini-batch primal and dual methods for SVMs, ICML 2013.

24-28 Failure of Naive Parallelization [animation: points labelled 0, 1a, 1b, 1, then 1, 2a, 2b, 2]

29 Parallel Coordinate Descent
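One way to see why naive parallelization can fail, and how a parallel method can repair it, is the toy sketch below. It assumes a hypothetical fully coupled function f(x) = 0.5 (x_1 + ... + x_n - 1)^2 with n = 3. Every "processor" computes the exact minimizer along its own coordinate from the same old point; adding all updates at once overshoots, while damping them by 1/n does not. The 1/n damping is an assumption chosen for this toy problem, not the step-size rule derived in the cited papers.

```python
def f(x):
    """Fully coupled toy objective (assumed for illustration)."""
    return 0.5 * (sum(x) - 1.0) ** 2

def parallel_step(x, damping):
    """All coordinates are updated simultaneously from the same old point x."""
    s = sum(x)
    # Exact minimization of f along any single coordinate moves it by (1 - s).
    deltas = [1.0 - s for _ in x]
    return [xi + damping * d for xi, d in zip(x, deltas)]

def run(damping, n=3, steps=5):
    x = [0.0] * n
    values = [f(x)]
    for _ in range(steps):
        x = parallel_step(x, damping)
        values.append(f(x))
    return values

naive_values = run(damping=1.0)         # add the full updates: overshoots
damped_values = run(damping=1.0 / 3.0)  # scale the updates down by 1/n
print(naive_values)
print(damped_values)
```

In the naive run the objective grows from one step to the next, because three individually correct moves add up to a move three times too long.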

30

31 Theory

32 Reality

33 A Problem with Billion Variables

34 P. R. and M. Takáč, Distributed coordinate descent methods for big data optimization, Manuscript, 2013.

35

36 Distributed Coordinate Descent: 1.2 TB LASSO problem solved on the HECToR supercomputer with 2048 cores
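For readers unfamiliar with LASSO, a serial, single-machine sketch of randomized coordinate descent on this problem class is given below. It assumes the standard formulation min 0.5 ||Ax - y||^2 + lam ||x||_1 and uses the textbook soft-thresholding update. It says nothing about how the distributed run on HECToR was implemented; the function names and the small data set are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Closed-form minimizer of 0.5*(u - z)^2 + t*|u| over u."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_rcd(A, y, lam, iters=2000, seed=0):
    """Randomized coordinate descent for 0.5*||Ax - y||^2 + lam*||x||_1."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = [-yk for yk in y]  # residual A x - y at the starting point x = 0
    col_sq = [sum(A[k][i] ** 2 for k in range(m)) for i in range(n)]
    for _ in range(iters):
        i = rng.randrange(n)
        if col_sq[i] == 0.0:
            continue  # an all-zero column cannot reduce the residual
        grad_i = sum(A[k][i] * r[k] for k in range(m))
        new_xi = soft_threshold(x[i] - grad_i / col_sq[i], lam / col_sq[i])
        delta = new_xi - x[i]
        if delta != 0.0:
            for k in range(m):  # keep the residual in sync with x
                r[k] += A[k][i] * delta
            x[i] = new_xi
    return x

# Hypothetical tiny data set (assumed).
A_small = [[1.0, 0.2], [0.0, 1.0], [0.5, 0.5]]
y_small = [1.0, 2.0, 0.5]
x_lasso = lasso_rcd(A_small, y_small, lam=0.1)
print(x_lasso)
```

Maintaining the residual `r` incrementally means one iteration costs one column of `A`, not a full matrix-vector product.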

37 III. Randomized Lock-Free Methods [optimization as lock breaking]

38 A Lock with 4 Dials. Setup: x = (x_1, x_2, x_3, x_4); F(x) = F(x_1, x_2, x_3, x_4) is a function representing the “quality” of a combination. The combination maximizing F opens the lock. Optimization Problem: Find the combination maximizing F.

39 Optimization Algorithm

40 P. R. and M. Takáč, Randomized lock-free gradient methods, Manuscript, 2013. F. Niu, B. Recht, C. Ré, and S. Wright, HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, NIPS, 2011.

41 A System of Billion Locks with Shared Dials. # dials = n = # locks. Dials: x_1, x_2, x_3, x_4, ..., x_n. 1) Nodes in the graph correspond to dials. 2) Nodes in the graph also correspond to locks: each lock (= node) owns the dials connected to it in the graph by an edge.

42 How do we Measure the Quality of a Combination? F : R^n → R. Each lock j has its own quality function F_j depending on the dials it owns. However, it does NOT open when F_j is maximized. The system of locks opens when F = F_1 + F_2 + ... + F_n is maximized.

43 An Algorithm with (too much?) Randomization: 1) Randomly select a lock. 2) Randomly select a dial belonging to the lock. 3) Adjust the value on the selected dial based only on the info corresponding to the selected lock.
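The three steps above can be sketched serially as follows. Everything concrete here is an assumption made for illustration: the graph is a ring of 10 nodes, each lock owns its own dial and its two neighbours' dials, and the quality of lock j is F_j(x) = -0.5 (sum_i w_ji x_i - b_j)^2 with hypothetical weights `w_ji`. The "adjustment" in step 3 is a small gradient ascent step on F_j alone; the slides do not specify the actual rule.

```python
import random

N = 10  # number of dials = number of locks (toy size, assumed)

def owned(j):
    """Dials owned by lock j on a ring graph (assumed topology)."""
    return [(j - 1) % N, j, (j + 1) % N]

def weight(j, i):
    """Hypothetical weight of dial i inside lock j."""
    return 3.0 if i == j else 1.0

TARGET = [float(k % 4) for k in range(N)]  # hidden opening combination (assumed)
B = [sum(weight(j, i) * TARGET[i] for i in owned(j)) for j in range(N)]

def residual(j, x):
    return sum(weight(j, i) * x[i] for i in owned(j)) - B[j]

def total_quality(x):
    """F = F_1 + ... + F_n, maximized (value 0) at the opening combination."""
    return -0.5 * sum(residual(j, x) ** 2 for j in range(N))

def randomized_lock_dial(x0, steps=60000, step_size=0.01, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        j = rng.randrange(N)      # 1) randomly select a lock
        i = rng.choice(owned(j))  # 2) randomly select one of its dials
        # 3) ascent step on F_j only: dF_j/dx_i = -weight(j, i) * residual(j, x)
        x[i] -= step_size * weight(j, i) * residual(j, x)
    return x

x_start = [0.0] * N
x_end = randomized_lock_dial(x_start)
print(total_quality(x_start), total_quality(x_end))
```

Each step reads three dials and writes one, so its cost does not depend on the total number of locks.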

44 Synchronous Parallelization [timeline: jobs J1 to J9 spread over Processor 1, Processor 2 and Processor 3, with IDLE gaps] WASTEFUL

45 Crazy (Lock-Free) Parallelization [timeline: jobs J1 to J9 run back to back on Processor 1, Processor 2 and Processor 3] NO WASTE
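The lock-free pattern can be sketched with plain threads that all update one shared list without any synchronization, in the spirit of the HOGWILD! reference. This is an illustration of the access pattern only, under several assumptions: the ring graph, the weights, and the quadratic per-lock quality are hypothetical, and CPython's global interpreter lock means the threads interleave instead of running truly in parallel, so no speed-up should be expected from this code.

```python
import random
import threading

N = 10  # number of dials = number of locks (toy size, assumed)

def owned(j):
    return [(j - 1) % N, j, (j + 1) % N]  # ring graph (assumed topology)

def weight(j, i):
    return 3.0 if i == j else 1.0  # hypothetical weights

TARGET = [float(k % 4) for k in range(N)]  # hidden opening combination (assumed)
B = [sum(weight(j, i) * TARGET[i] for i in owned(j)) for j in range(N)]

def residual(j, x):
    return sum(weight(j, i) * x[i] for i in owned(j)) - B[j]

def total_quality(x):
    return -0.5 * sum(residual(j, x) ** 2 for j in range(N))

def worker(x, steps, step_size, seed):
    """One processor: never waits for, or locks against, the others."""
    rng = random.Random(seed)
    for _ in range(steps):
        j = rng.randrange(N)
        i = rng.choice(owned(j))
        # Unsynchronized read-modify-write on the shared list: the values read
        # may be slightly stale and an occasional update may be overwritten.
        x[i] -= step_size * weight(j, i) * residual(j, x)

shared_x = [0.0] * N
quality_before = total_quality(shared_x)
threads = [
    threading.Thread(target=worker, args=(shared_x, 20000, 0.01, seed))
    for seed in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
quality_after = total_quality(shared_x)
print(quality_before, quality_after)
```

Because each update touches only a few dials, conflicting writes are rare relative to the total work, which is the intuition behind tolerating them.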

46 Crazy Parallelization

47

48

49

50 Theoretical Result [formula with annotated terms: average # dials in a lock; average # of dials common to 2 locks; # locks; # processors]

51 Computational Insights

52

53

54

55

56 IV. Final Two Slides

57 Why parallelizing like crazy and being lazy can be good? Randomization, Effectivity, Tractability, Efficiency, Scalability (big data), Parallelism, Distribution, Asynchronicity, Parallelization

58 Tools: Probability, Machine Learning, Matrix Theory, HPC

