

1 Chao Liu Internet Services Research Center Microsoft Research-Redmond

2  Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
   ▪ Classification: Naïve Bayes
   ▪ Clustering: Nonnegative Matrix Factorization
   ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
   ▪ Click Modeling
   ▪ Behavior Targeting
 Conclusions

3  Data on the Web
 Scale: terabyte-to-petabyte data
   ▪ Around 20TB of log data per day from Bing
 Dynamics: evolving data streams
   ▪ Click data streams with evolving/emerging topics
 Applications: non-traditional ML tasks
   ▪ Predicting clicks & ads

4  Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
   ▪ Classification: Naïve Bayes
   ▪ Clustering: Nonnegative Matrix Factorization
   ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
   ▪ Click Modeling
   ▪ Behavior Targeting
 Conclusions

5  Parallel computing
   ▪ All processors have access to a shared memory, which can be used to exchange information between processors
 Distributed computing
   ▪ Each processor has its own private memory (distributed memory); processors communicate over the network
   ▪ Message passing (MPI)
   ▪ MapReduce

6  MPI is for task parallelism
   ▪ Suitable for CPU-intensive jobs
   ▪ Fine-grained communication control, powerful computation model
 MapReduce is for data parallelism
   ▪ Suitable for data-intensive jobs
   ▪ A restricted computation model

7  Word count on MapReduce (figure): a web corpus is spread over multiple machines. Each Mapper reads its local (docId, doc) pairs and, for each word w in a doc, emits (w, 1). The intermediate (key, value) pairs are aggregated by key, and each Reducer is copied to a machine to run over the intermediate data locally, summing the counts for its words, e.g., (w1, 3), (w2, 2), (w3, 3).
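The flow on this slide can be condensed into a single-machine simulation; a minimal sketch (function names and the in-memory shuffle are illustrative, not from the deck), assuming whitespace-tokenized documents:

```python
from collections import defaultdict

def mapper(doc_id, doc):
    # Mapper: for each word w in a doc, emit (w, 1)
    for word in doc.split():
        yield (word, 1)

def reducer(word, counts):
    # Reducer: sum all values aggregated under the same key
    return (word, sum(counts))

def map_reduce(docs):
    # Shuffle phase: group intermediate (key, value) pairs by key
    groups = defaultdict(list)
    for doc_id, doc in docs.items():
        for word, one in mapper(doc_id, doc):
            groups[word].append(one)
    return dict(reducer(w, c) for w, c in groups.items())

docs = {1: "w1 w2 w3", 2: "w1 w3", 3: "w1 w2 w3"}
print(map_reduce(docs))  # {'w1': 3, 'w2': 2, 'w3': 3}
```

On a real cluster the shuffle is done by the framework and each reducer sees only its own keys; the dictionary grouping here stands in for that step.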

8  A big picture: not omnipotent, but good enough

                       Standard ML algorithms                          Customized ML algorithms
MapReduce friendly     Classification: Naïve Bayes, logistic           PageRank
                       regression, MART, etc.                          Click models
                       Clustering: k-means, NMF, co-clustering, etc.   Behavior targeting
                       Modeling: EM algorithm, Gaussian mixture,
                       Latent Dirichlet Allocation, etc.
MapReduce unfriendly   Classification: SVM                             Learning-to-rank
                       Clustering: spectral clustering

9  Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
   ▪ Classification: Naïve Bayes
   ▪ Clustering: Nonnegative Matrix Factorization
   ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
   ▪ Click Modeling
   ▪ Behavior Targeting
 Conclusions

10  Naïve Bayes: P(C|X) ∝ P(C) P(X|C) = P(C) ∏_j P(X_j|C)
 Each Mapper reads its (x^(i), y^(i)) records and emits (j, x_j^(i), y^(i)) triples
 Reduce on y^(i) to estimate P(C); reduce on j to estimate P(X_j|C)
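A minimal single-machine sketch of this counting scheme (function names and the Counter layout are illustrative assumptions, not from the deck): the map phase emits one label record per example and one (feature index, value, label) triple per feature; the reduce phase counts them to estimate P(C) and P(X_j|C).

```python
from collections import Counter

def mapper(record):
    # Each record is (x, y); emit the label for P(C) and
    # one (feature index, value, label) triple per feature for P(X_j | C)
    x, y = record
    yield ("label", y)
    for j, xj in enumerate(x):
        yield ("feat", (j, xj, y))

def reduce_counts(records):
    # Reducers simply count occurrences of each emitted key
    label_counts, feat_counts = Counter(), Counter()
    for record in records:
        for kind, value in mapper(record):
            (label_counts if kind == "label" else feat_counts)[value] += 1
    return label_counts, feat_counts

def estimate(records):
    label_counts, feat_counts = reduce_counts(records)
    n = sum(label_counts.values())
    prior = {c: cnt / n for c, cnt in label_counts.items()}     # P(C)
    cond = {(j, v, c): cnt / label_counts[c]                    # P(X_j = v | C = c)
            for (j, v, c), cnt in feat_counts.items()}
    return prior, cond

data = [((1, 0), "pos"), ((1, 1), "pos"), ((0, 1), "neg"), ((0, 0), "neg")]
prior, cond = estimate(data)
print(prior["pos"], cond[(0, 1, "pos")])  # 0.5 1.0
```

Because training reduces to counting, the reducers are trivially parallel: one group of reducers aggregates label counts, another aggregates per-feature counts.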

11  Effective tool to uncover latent relationships in nonnegative matrices, with many applications [Berry et al., 2007; Sra & Dhillon, 2006]
 Interpretable dimensionality reduction [Lee & Seung, 1999]
 Document clustering [Shahnaz et al., 2006; Xu et al., 2006]
 Challenge: can we scale NMF to million-by-million matrices?

12
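The MapReduce pipeline on the following slides distributes a multiplicative update scheme; for reference, the standard updates of the cited Lee & Seung [1999] for fitting A ≈ WH (a reconstruction, with ∘ and the fractions taken elementwise) are:

```latex
H \leftarrow H \circ \frac{W^{\top} A}{W^{\top} W H},
\qquad
W \leftarrow W \circ \frac{A H^{\top}}{W H H^{\top}}
```

Each update is nonnegativity-preserving and does not increase the Frobenius reconstruction error, which is what makes alternating them safe at scale.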

13  Data partition: A, W and H are partitioned across machines

14

15  (Diagram: one NMF iteration as a five-stage MapReduce pipeline, Map-I/Reduce-I through Map-V/Reduce-V)

16  (Diagram: stages Map-I/Reduce-I and Map-II/Reduce-II)

17  (Diagram: stages Map-III and Map-IV feeding Reduce-III)

18  (Diagram: stage Map-V/Reduce-V)

19  (Diagram: one NMF iteration as a five-stage MapReduce pipeline, Map-I/Reduce-I through Map-V/Reduce-V)
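As an illustration of how such a pipeline distributes one multiplicative update (the exact staging of the five map/reduce steps is in the figure and not reproduced here), a hedged sketch: assume A and W are partitioned into aligned row blocks, each mapper emits the partial products Wᵢᵀ Aᵢ and Wᵢᵀ Wᵢ, and a reducer sums them before updating H. All names are illustrative.

```python
import numpy as np

def h_update(A_blocks, W_blocks, H):
    # Map: each machine holds aligned row blocks A_i, W_i and emits
    # partial products W_i^T A_i (k x n) and W_i^T W_i (k x k)
    partials = [(Wi.T @ Ai, Wi.T @ Wi) for Ai, Wi in zip(A_blocks, W_blocks)]
    # Reduce: sum the partials to obtain W^T A and W^T W
    WtA = sum(p[0] for p in partials)
    WtW = sum(p[1] for p in partials)
    # Multiplicative update; in the real pipeline this is applied
    # column-block-wise since H is itself partitioned
    return H * WtA / (WtW @ H + 1e-9)

rng = np.random.default_rng(0)
A = rng.random((6, 5)); W = rng.random((6, 2)); H = rng.random((2, 5))
H_new = h_update(np.split(A, 3), np.split(W, 3), H)
# The multiplicative update does not increase the reconstruction error
print(np.linalg.norm(A - W @ H_new) <= np.linalg.norm(A - W @ H))
```

Only the small k × n and k × k aggregates cross the network; the large matrix A never moves, which is the point of the partitioning on slide 13.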

20  3 hours per iteration; 20 iterations would take around 20 × 3 × 0.72 ≈ 43 hours
 Less than 7 hours on a 43.9M-by-769M matrix with 4.38 billion nonzero values

21  Map
   ▪ Evaluate the posterior of the latent variables (E-step)
   ▪ Compute the per-record sufficient statistics
 Reduce
   ▪ Aggregate the statistics and update the parameters (M-step)
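A toy instance of this pattern for a two-component, unit-variance 1-D Gaussian mixture (the mixture choice and all names are illustrative assumptions; the deck does not specify the model):

```python
import math

def em_step(data_shards, params):
    # params: (pi, mu1, mu2) for a 1-D mixture of two unit-variance Gaussians
    pi, mu1, mu2 = params
    # Map: for each shard, evaluate responsibilities (E-step) and accumulate
    # the sufficient statistics (sum of r, sum of r*x, sum of x, count)
    stats = []
    for shard in data_shards:
        s_r = s_rx = s_x = n = 0.0
        for x in shard:
            p1 = pi * math.exp(-0.5 * (x - mu1) ** 2)
            p2 = (1 - pi) * math.exp(-0.5 * (x - mu2) ** 2)
            r = p1 / (p1 + p2)          # responsibility of component 1
            s_r += r; s_rx += r * x; s_x += x; n += 1
        stats.append((s_r, s_rx, s_x, n))
    # Reduce: sum shard statistics and update the parameters (M-step)
    S_r = sum(s[0] for s in stats); S_rx = sum(s[1] for s in stats)
    S_x = sum(s[2] for s in stats); N = sum(s[3] for s in stats)
    return S_r / N, S_rx / S_r, (S_x - S_rx) / (N - S_r)

shards = [[-2.1, -1.9, -2.0], [2.0, 1.9, 2.1]]
print(em_step(shards, (0.5, -1.0, 1.0)))
```

Because the sufficient statistics are sums over records, the shards can live on different machines and the reducer only sees a handful of numbers per shard.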

22  Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
   ▪ Classification: Naïve Bayes
   ▪ Clustering: Nonnegative Matrix Factorization
   ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
   ▪ Click Modeling
   ▪ Behavior Targeting
 Conclusions

23  Clicks are good…
 Are these two clicks equally “good”?
 Non-clicks may have excuses:
   ▪ Not relevant
   ▪ Not examined

24 2411/2/2015

25  (Diagram: a query with URL_1…URL_4; each position i has a snippet S_i reflecting relevance, an examination variable E_i, and an observed clickthrough C_i)

26  (Diagram: the variables S_i, E_i, C_i unrolled over positions; E_i depends on the preceding click position before i)

27  Ultimate goal
 Observation: conditional independence

28  Likelihood of a search instance
 From S to R:

29  Posterior
 Re-organized by the R_j’s:
   ▪ How many times d_j was clicked
   ▪ How many times d_j was not clicked when it is at position (r + d) and the preceding click is at position r

30  Exact inference: the joint posterior is available in closed form
 The joint posterior factorizes, so the R_j’s are mutually independent
 At most M(M+1)/2 + 1 numbers fully characterize each posterior
 Count vector:

31  Compute the count vector for R_4 (figure: the entries N_4 and N_{4,r,d})

32  Map: emit ((q, u), idx)
 Reduce: construct the count vector

33  Map outputs (three mappers):
   ▪ (U1, 0), (U2, 4), (U3, 0)
   ▪ (U1, 1), (U3, 0), (U4, 7)
   ▪ (U1, 1), (U3, 0), (U4, 0)
 Reduce groups the indices by URL:
   ▪ (U1, 0, 1, 1), (U2, 4), (U3, 0, 0, 0), (U4, 0, 7)
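The example above can be replayed directly; a minimal sketch of the shuffle-and-reduce step (the vector length 8 is an arbitrary stand-in for the M(M+1)/2 + 1 entries from slide 30, and all names are illustrative):

```python
from collections import defaultdict

def reduce_count_vectors(mapper_outputs, size):
    # Shuffle: group intermediate (url, idx) pairs by url; then each
    # reducer builds that url's count vector, where vector[idx] counts
    # how often the corresponding click / no-click event was observed
    groups = defaultdict(list)
    for shard in mapper_outputs:
        for url, idx in shard:
            groups[url].append(idx)
    vectors = {}
    for url, idxs in groups.items():
        v = [0] * size
        for idx in idxs:
            v[idx] += 1
        vectors[url] = v
    return vectors

shards = [[("U1", 0), ("U2", 4), ("U3", 0)],
          [("U1", 1), ("U3", 0), ("U4", 7)],
          [("U1", 1), ("U3", 0), ("U4", 0)]]
print(reduce_count_vectors(shards, 8)["U1"])  # [1, 2, 0, 0, 0, 0, 0, 0]
```

Since each (query, url) pair's posterior is characterized by its count vector alone, one pass of this job over the logs completes the inference.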

34  Setup: 8 weeks of data, 8 jobs; job k takes the first k weeks of data
 Experiment platform: SCOPE, Easy and Efficient Parallel Processing of Massive Data Sets [Chaiken et al., VLDB’08]

35  Increasing computation load: more queries, more URLs, more impressions
 Near-constant elapsed time on SCOPE: about 3 hours per job
 Scans 265 terabytes of data
 Computes full posteriors for 1.15 billion (query, URL) pairs

36  Behavior targeting
 Ad serving based on users’ historical behaviors
 Complementary to sponsored ads and content ads

37  Goal: given ads in a certain category, locate qualified users based on their past behaviors
 Data
   ▪ A user is identified by a cookie
   ▪ Past behavior, profiled as a vector x, includes ad clicks, ad views, page views, search queries, clicks, etc.
 Challenges
   ▪ Scale: e.g., 9TB of ad data with 500B entries in Aug ’08
   ▪ Sparsity: e.g., the CTR of automotive display ads is 0.05%
   ▪ Dynamics: user behavior changes over time

38  CTR = ClickCnt / ViewCnt
 One model to predict the expected click count, another to predict the expected view count
 Linear Poisson model
 MLE on w

39  Learning
   ▪ Map: compute the per-user expected counts and sufficient statistics
   ▪ Reduce: aggregate and update w
 Prediction
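One common way to fit a linear Poisson model λ = wᵀx with nonnegative w by MLE is an EM-style multiplicative update, which has exactly this map/reduce shape: mappers emit per-shard numerator and denominator sums, a reducer aggregates them and rescales w. A sketch under that assumption (the update choice, synthetic data, and names are illustrative, not from the deck):

```python
import numpy as np

def poisson_update(shards, w):
    # Map: each shard of (X, c) user records emits the partial sums
    # sum_i x_i * c_i / (w . x_i)  and  sum_i x_i
    num = np.zeros_like(w); den = np.zeros_like(w)
    for X, c in shards:
        lam = X @ w                    # expected counts for this shard
        num += X.T @ (c / lam)
        den += X.sum(axis=0)
    # Reduce: aggregate partials and apply the multiplicative MLE update
    return w * num / den

rng = np.random.default_rng(1)
X = rng.random((100, 3)) + 0.1            # nonnegative behavior features
true_w = np.array([0.5, 1.0, 2.0])
c = rng.poisson(X @ true_w)               # observed click counts
shards = [(X[:50], c[:50]), (X[50:], c[50:])]
w = np.ones(3)
for _ in range(200):
    w = poisson_update(shards, w)
print(w)
```

Each pass touches the data only through two length-d sums per shard, so the 9TB of behavior data stays put while only small vectors are shuffled.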

40  Motivation & Challenges
 Background on Distributed Computing
 Standard ML on MapReduce
   ▪ Classification: Naïve Bayes
   ▪ Clustering: Nonnegative Matrix Factorization
   ▪ Modeling: EM Algorithm
 Customized ML on MapReduce
   ▪ Click Modeling
   ▪ Behavior Targeting
 Conclusions

41  Challenges imposed by Web data
   ▪ Scalability of standard algorithms
   ▪ Application-driven customized algorithms
 The capability to consume huge amounts of data outweighs algorithmic sophistication
   ▪ Simple counting is no less powerful than sophisticated algorithms when data is abundant or even infinite
 MapReduce: a restricted computation model
   ▪ Not omnipotent, but powerful enough
   ▪ The things we want to do turn out to be things we can do

42  Thank You! SEWM ’10 Keynote, Chengdu, China

