
1 Map-Reduce for Machine Learning on Multicore C. Chu, S.K. Kim, Y. Lin, Y.Y. Yu, G. Bradski, A.Y. Ng, K. Olukotun (NIPS 2006) Shimin Chen Big Data Reading Group

2 Motivations Industry-wide shift to multicore. No good framework for parallelizing ML algorithms. Goal: develop a general and exact technique for parallel programming of a large class of ML algorithms on multicore processors.

3 Idea Statistical Query Model → Summation Form → Map-Reduce

4 Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion

5 Valiant Model [Valiant’84] x is the input; y is a function of x that we want to learn. In the Valiant model, the learning algorithm uses randomly drawn examples to learn the target function.

6 Statistical Query Model [Kearns’98] A restriction of the Valiant model: the learning algorithm uses aggregates over the examples, not the individual examples. More precisely, the learning algorithm interacts with a statistical query oracle: it asks about a function f(x, y), and the oracle returns an estimate of the expectation E[f(x, y)] over the data distribution.

7 Summation Form Aggregate over the data: Σ_{i=1..m} f(x_i, y_i). Divide the data set into pieces, compute the partial aggregate on each core, and combine all results at the end.
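
To make this concrete, here is a minimal Python sketch (my illustration, not the paper's code): the data is split into pieces, each core computes a partial aggregate of an example function f, and the partials are combined at the end. The data, f(x, y) = x·y, and num_cores are all illustrative placeholders.

```python
# Minimal sketch of the summation form: map partial sums over pieces
# of the data, then reduce by combining them.
from multiprocessing import Pool

def partial_sum(chunk):
    # Map: aggregate f(x, y) over one piece of the data set.
    return sum(x * y for x, y in chunk)   # f(x, y) = x * y as a stand-in

if __name__ == "__main__":
    data = [(float(i), float(i % 3)) for i in range(1000)]
    num_cores = 4
    chunks = [data[i::num_cores] for i in range(num_cores)]  # divide
    with Pool(num_cores) as pool:
        partials = pool.map(partial_sum, chunks)             # map
    total = sum(partials)                                    # reduce
    print(total)
```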

8 Example: Linear Regression using Least Squares Model: y = θ^T x. Goal: minimize Σ_i (θ^T x_i − y_i)^2. Given m examples (x_1, y_1), (x_2, y_2), …, (x_m, y_m), write the matrix X with x_1^T, …, x_m^T as rows and the column vector y = (y_1, y_2, …, y_m)^T. Solution: θ* = (X^T X)^{-1} X^T y. Parallel computation: X^T X = Σ_i x_i x_i^T and X^T y = Σ_i y_i x_i are sums over examples, so cut the data into num_processors pieces of m/num_processors examples each and sum each piece on its own core.
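
A hedged NumPy sketch of this scheme (again my illustration, not the paper's code): each chunk contributes partial sums to X^T X and X^T y, and the reducer solves the combined normal equations. The sizes, seed, and serial loop standing in for worker cores are all illustrative.

```python
import numpy as np

def map_partial(X_chunk, y_chunk):
    # Map: per-chunk contributions to X^T X and X^T y.
    return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.01 * rng.normal(size=1000)

num_pieces = 4
A = np.zeros((5, 5)); b = np.zeros(5)
for Xc, yc in zip(np.array_split(X, num_pieces),
                  np.array_split(y, num_pieces)):
    dA, db = map_partial(Xc, yc)
    A += dA; b += db                # Reduce: add the partial sums
theta = np.linalg.solve(A, b)       # theta* = (X^T X)^{-1} X^T y
```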

9 Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion

10 Lighter Weight Map-Reduce for Multicore

11 Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion

12 Locally Weighted Linear Regression (LWLR) Solve Aθ = b, where A = Σ_i w_i x_i x_i^T and b = Σ_i w_i y_i x_i. Mappers: one set computes partial sums for A, the other set computes partial sums for b. Two reducers combine the pieces into A and b, and a final step solves Aθ = b for θ. When all w_i = 1, this reduces to ordinary least squares.
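
A hedged NumPy sketch of the LWLR summation form (illustrative, not the paper's code); the locality weights w_i here are an arbitrary example.

```python
import numpy as np

def map_A(Xc, wc):
    return (wc[:, None] * Xc).T @ Xc    # partial sum of w_i x_i x_i^T

def map_b(Xc, yc, wc):
    return Xc.T @ (wc * yc)             # partial sum of w_i y_i x_i

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 3)); y = rng.normal(size=600)
w = np.exp(-np.sum(X ** 2, axis=1))     # illustrative locality weights

A = np.zeros((3, 3)); b = np.zeros(3)
for Xc, yc, wc in zip(np.array_split(X, 4), np.array_split(y, 4),
                      np.array_split(w, 4)):
    A += map_A(Xc, wc)                  # reducer for A
    b += map_b(Xc, yc, wc)              # reducer for b
theta = np.linalg.solve(A, b)           # final solve of A theta = b
```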

13 Naïve Bayes (NB) Goal: estimate P(xj = k | y = 1) and P(xj = k | y = 0). Computation: count the occurrences of (xj = k, y = 1) and (xj = k, y = 0), count the occurrences of y = 1 and y = 0, then compute the ratios. Mappers: count over a subgroup of training samples. Reducer: aggregate the intermediate counts and calculate the final result.
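
A hedged Python sketch of the counting step (illustrative; real NB code would also apply Laplace smoothing): each mapper counts feature/label co-occurrences on its chunk, and the reducer adds the counters and divides.

```python
from collections import Counter

def map_counts(chunk):
    # Map: count (x_j = k, y) and y occurrences over one chunk.
    joint, labels = Counter(), Counter()
    for x, y in chunk:                  # x is a tuple of discrete features
        labels[y] += 1
        for j, k in enumerate(x):
            joint[(j, k, y)] += 1
    return joint, labels

data = [((0, 1), 1), ((1, 1), 0), ((0, 0), 1), ((1, 0), 0)] * 50
chunks = [data[i::4] for i in range(4)]
joint, labels = Counter(), Counter()
for jc, lc in map(map_counts, chunks):  # reduce: sum the counters
    joint.update(jc); labels.update(lc)
# P(x_j = k | y) as the ratio of the aggregated counts (no smoothing)
p = {key: joint[key] / labels[key[2]] for key in joint}
```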

14 Gaussian Discriminative Analysis (GDA) Goal: classify x into classes of y, assuming each class-conditional density is a Gaussian with its own mean but a shared covariance. Computation: the class prior, the class means, and the covariance are all sums over the examples. Mappers: compute the partial sums for a subset of training samples. Reducer: aggregate the intermediate results.
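
A hedged NumPy sketch of the GDA sufficient statistics (my reconstruction under the shared-covariance assumption; data and sizes are illustrative): mappers accumulate class counts, per-class feature sums, and the global sum of x x^T, and the reducer turns them into the prior, the means, and the covariance.

```python
import numpy as np

def map_stats(Xc, yc, d):
    # Map: partial class counts, per-class sums, and sum of x x^T.
    n = np.zeros(2); s = np.zeros((2, d)); outer = np.zeros((d, d))
    for x, y in zip(Xc, yc):
        n[y] += 1; s[y] += x; outer += np.outer(x, x)
    return n, s, outer

rng = np.random.default_rng(2)
d, m = 2, 400
X = np.vstack([rng.normal(0, 1, (m // 2, d)),
               rng.normal(3, 1, (m // 2, d))])
y = np.array([0] * (m // 2) + [1] * (m // 2))

n = np.zeros(2); s = np.zeros((2, d)); outer = np.zeros((d, d))
for Xc, yc in zip(np.array_split(X, 4), np.array_split(y, 4)):
    dn, ds, do = map_stats(Xc, yc, d)
    n += dn; s += ds; outer += do       # reduce: add partial statistics
phi = n[1] / m                          # class prior P(y = 1)
mu = s / n[:, None]                     # per-class means
sigma = (outer - sum(n[c] * np.outer(mu[c], mu[c]) for c in (0, 1))) / m
```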

15 K-means Each iteration computes the Euclidean distance between sample vectors and centroids, then recalculates the centroids. Divide the computation into subgroups handled by map-reduce, as sketched below.
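
A hedged NumPy sketch of one map-reduce k-means iteration (k, the data, and the chunking are my illustrative choices): each mapper assigns its points to the nearest centroid and emits per-cluster partial sums and counts, and the reducer averages them into new centroids.

```python
import numpy as np

def map_assign(Xc, centroids):
    # Map: nearest-centroid assignment plus per-cluster partial sums.
    k, d = centroids.shape
    dists = ((Xc[:, None, :] - centroids) ** 2).sum(-1)  # squared Euclidean
    labels = np.argmin(dists, axis=1)
    sums = np.zeros((k, d)); counts = np.zeros(k)
    for x, c in zip(Xc, labels):
        sums[c] += x; counts[c] += 1
    return sums, counts

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
centroids = X[rng.choice(len(X), 3, replace=False)]
for _ in range(10):
    sums = np.zeros_like(centroids); counts = np.zeros(3)
    for Xc in np.array_split(X, 4):
        s, c = map_assign(Xc, centroids)
        sums += s; counts += c                     # reduce
    # Recalculate centroids (empty clusters crudely left at the origin)
    centroids = sums / np.maximum(counts, 1)[:, None]
```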

16 Expectation Maximization (EM) The E-step computes posterior probabilities (or expected counts) per training example; the M-step combines these values to update the parameters. Both steps can be parallelized using map-reduce.
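
As an illustration (not the paper's experiment), here is a hedged sketch of map-reduce EM for a two-component 1-D Gaussian mixture: the E-step runs per chunk and returns partial sufficient statistics, and the M-step re-estimates the parameters from their sums.

```python
import numpy as np

def e_step(xc, pi, mu, var):
    # Map/E-step: responsibilities r[i, c] ~ pi_c * N(x_i | mu_c, var_c),
    # reduced to partial sufficient statistics for this chunk.
    lik = pi * np.exp(-(xc[:, None] - mu) ** 2 / (2 * var)) \
          / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(1, keepdims=True)
    return r.sum(0), r.T @ xc, r.T @ (xc ** 2)

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 300)])
pi, mu, var = np.array([.5, .5]), np.array([-1., 1.]), np.array([1., 1.])
for _ in range(20):
    n = np.zeros(2); sx = np.zeros(2); sxx = np.zeros(2)
    for xc in np.array_split(x, 4):              # map over chunks
        dn, dsx, dsxx = e_step(xc, pi, mu, var)
        n += dn; sx += dsx; sxx += dsxx          # reduce
    # M-step: update mixing weights, means, variances from the sums
    pi, mu, var = n / n.sum(), sx / n, sxx / n - (sx / n) ** 2
```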

17 Neural Network (NN) Back-propagation on a 3-layer network: an input layer, a hidden layer, and 2 output nodes. Goal: compute the weights of the NN by back-propagation. Mapper: propagates its set of training data through the network and back-propagates the errors to calculate partial gradients for the weights. Reducer: sums the partial gradients and performs a batch gradient-descent update of the weights.
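
A hedged NumPy sketch of this scheme (my illustration with a tanh hidden layer and a squared-error loss; the transcript does not specify these details): each mapper back-propagates over its chunk and returns partial gradients, and the reducer sums them and takes one batch gradient-descent step.

```python
import numpy as np

def map_grads(Xc, Yc, W1, W2):
    # Map: forward pass, then back-propagate errors to partial gradients.
    H = np.tanh(Xc @ W1)            # input layer -> hidden layer
    P = H @ W2                      # hidden layer -> 2 linear outputs
    dP = P - Yc                     # gradient of 0.5 * squared error
    dW2 = H.T @ dP
    dW1 = Xc.T @ ((dP @ W2.T) * (1 - H ** 2))   # through tanh
    return dW1, dW2

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 4)); Y = rng.normal(size=(400, 2))
W1 = 0.1 * rng.normal(size=(4, 8)); W2 = 0.1 * rng.normal(size=(8, 2))
lr = 0.1
for _ in range(100):
    g1 = np.zeros_like(W1); g2 = np.zeros_like(W2)
    for Xc, Yc in zip(np.array_split(X, 4), np.array_split(Y, 4)):
        d1, d2 = map_grads(Xc, Yc, W1, W2)
        g1 += d1; g2 += d2          # reduce: sum the partial gradients
    W1 -= lr * g1 / len(X)          # batch gradient-descent update
    W2 -= lr * g2 / len(X)
```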

18 Principal Components Analysis (PCA) Compute the principal eigenvectors of the covariance matrix Σ = (1/m) Σ_i x_i x_i^T − μμ^T. Both terms are sums over the examples, so clearly we can compute this summation form using map-reduce.
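
A hedged NumPy sketch (illustrative data and sizes): mappers emit per-chunk sums of x_i x_i^T and of x_i, and the reducer assembles the covariance and runs the eigendecomposition.

```python
import numpy as np

def map_stats(Xc):
    # Map: partial sum of outer products and partial sum of x.
    return Xc.T @ Xc, Xc.sum(0)

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))  # correlated data
outer = np.zeros((5, 5)); s = np.zeros(5)
for Xc in np.array_split(X, 4):
    do, ds = map_stats(Xc)
    outer += do; s += ds            # reduce
m = len(X); mu = s / m
cov = outer / m - np.outer(mu, mu)  # (1/m) sum x x^T - mu mu^T
vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
top2 = vecs[:, -2:]                 # principal eigenvectors
```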

19 Other Algorithms Logistic Regression Independent Component Analysis Support Vector Machine

20 Time Complexity

21 Outline Introduction Statistical Query Model and Summation Form Architecture (inspired by Map-Reduce) Adopted ML Algorithms Experiments Conclusion

22 Setup Compare the map-reduce version against the sequential version on 10 data sets. Machines: a dual-processor 700 MHz Pentium III with 1 GB RAM, and a 16-way Sun Enterprise 6000 (these are SMP machines, not multicore).

23 Dual-Processor Speedups

24 2–16 Processor Speedups More data in the paper.

25 Multicore Simulator Results The paper devotes a paragraph to this: results on the simulated multicore are better than on the multiprocessor machines, possibly because of lower communication cost between cores.

26 Conclusion Parallelize the summation forms of ML algorithms using a map-reduce framework on a single multicore machine.

