Large-scale Recommender Systems on Just a PC LSRS 2013 keynote (RecSys ’13 Hong Kong) Aapo Kyrölä Ph.D. CMU

Presentation transcript:

1 Large-scale Recommender Systems on Just a PC LSRS 2013 keynote (RecSys ’13 Hong Kong) Aapo Kyrölä Ph.D. candidate @ CMU http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov Big Data – small machine

2 My Background Academic: 5th-year Ph.D. student @ Carnegie Mellon. Advisors: Guy Blelloch, Carlos Guestrin (UW). + Shotgun: parallel L1-regularized regression solver (ICML 2011). + Internships at MSR Asia (2011) and Twitter (2012). Startup entrepreneur (2009, 2012). Habbo: founded 2000.

3 Outline of this talk 1. Why single-computer computing? 2. Introduction to graph computation and GraphChi 3. Recommender systems with GraphChi 4. Future directions & conclusion

4 Why on a single machine? Can’t we just use the Cloud? Large-Scale Recommender Systems on Just a PC

5 Why use a cluster? Two reasons: 1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast.

6 Why use a cluster? Two reasons: 1. One computer cannot handle my problem in a reasonable time. 2. I need to solve the problem very fast. Our work expands the space of feasible (graph) problems on one machine: – Our experiments use graphs as big as, or bigger than, those in previous papers on distributed graph computation. (+ we can do the Twitter graph on a laptop) – Most data is not that “big”. Our work raises the bar on the performance required to justify a “complicated” system.

7 Benefits of single machine systems Assuming it can handle your big problems… 1. Programmer productivity – Global state – Can use “real data” for development 2. Inexpensive to install and administer; uses less power. 3. Scalability.

8 Efficient Scaling [Figure: tasks scheduled over time T on 6 machines vs. 12 machines. Distributed graph system: (significantly) less than 2x throughput with 2x machines. Single-computer system (capable of big tasks), one task per machine: exactly 2x throughput with 2x machines.]

9

10 GRAPH COMPUTATION AND GRAPHCHI

11 Why graphs for recommender systems? Graph = matrix: edge(u,v) = M[u,v] – Note: always sparse graphs. Intuitive, human-understandable representation – Easy to visualize and explain. Unifies collaborative filtering (typically matrix based) with recommendation in social networks. – Random walk algorithms. Local view -> vertex-centric computation
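The "graph = matrix" correspondence on this slide can be sketched in a few lines. This is only an illustration: the toy matrix `M` and the helper `matrix_to_edges` are hypothetical names, and a zero entry is assumed to mean "no rating".

```python
# Hypothetical toy ratings matrix M: rows are users, columns are movies.
# Assumption: 0 means "no rating", so only non-zero entries become edges.
M = [
    [4, 0, 3],
    [0, 2, 0],
    [5, 0, 0],
]


def matrix_to_edges(matrix):
    """Return one (u, v, value) edge per non-zero entry matrix[u][v]."""
    return [
        (u, v, value)
        for u, row in enumerate(matrix)
        for v, value in enumerate(row)
        if value != 0
    ]


edges = matrix_to_edges(M)
```

Because the matrix is sparse, the edge list stores 4 entries here instead of 9 cells.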

12 Vertex-Centric Computational Model Graph G = (V, E) – directed edges: e = (source, destination) – each edge and vertex associated with a value (user-defined type) – vertex and edge values can be modified (structure modification also supported)

13 Vertex-centric Programming “Think like a vertex”. Popularized by the Pregel and GraphLab projects. MyFunc(vertex) { // modify neighborhood }
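A minimal sketch of what a "think like a vertex" function might look like, assuming a toy in-memory graph. The `Vertex` and `Edge` classes, `build_graph`, and the PageRank-style formula are illustrative assumptions, not the GraphChi, Pregel, or GraphLab API.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    src: int
    dst: int
    value: float = 0.0


@dataclass
class Vertex:
    vid: int
    value: float = 1.0
    in_edges: list = field(default_factory=list)
    out_edges: list = field(default_factory=list)


def build_graph(num_vertices, edge_pairs):
    """Build a toy graph; each Edge object is shared by its two endpoints."""
    vertices = [Vertex(v) for v in range(num_vertices)]
    for src, dst in edge_pairs:
        edge = Edge(src, dst)
        vertices[src].out_edges.append(edge)
        vertices[dst].in_edges.append(edge)
    return vertices


def my_func(vertex):
    """Update one vertex using only its own neighborhood (PageRank-like)."""
    # Read the values that neighbors left on the in-edges.
    vertex.value = 0.15 + 0.85 * sum(e.value for e in vertex.in_edges)
    # Write this vertex's contribution onto its out-edges.
    if vertex.out_edges:
        share = vertex.value / len(vertex.out_edges)
        for e in vertex.out_edges:
            e.value = share


vertices = build_graph(3, [(0, 1), (1, 2), (2, 0)])
for v in vertices:
    my_func(v)
```

The function never touches global state: everything it reads or writes sits on the vertex or its adjacent edges, which is what lets a framework decide how and where to run it.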

14 What is GraphChi Both in OSDI’12!

15 The Main Challenge of Disk-based Graph Computation: Random Access. Storage throughput: 100s of reads/writes per sec; ~100K reads/sec (commodity); ~1M reads/sec (high-end arrays). All << the 5-10 M random edges/sec needed to achieve “reasonable performance”.

16 Parallel Sliding Windows Only P large reads for each interval (sub-graph). P^2 reads on one full pass. Details: Kyrola, Blelloch, Guestrin: “Large-scale graph computation on just a PC” (OSDI 2012)
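The read counts can be illustrated with an in-memory sketch. It follows my understanding of the layout described in the cited OSDI 2012 paper: vertices are split into P intervals, shard p holds the edges whose destination lies in interval p, and each shard is sorted by source. All names here (`build_shards`, `load_subgraph`, `bounds`) are hypothetical, and Python lists stand in for on-disk shard files.

```python
from bisect import bisect_left, bisect_right


def build_shards(edges, bounds):
    """Shard p holds edges whose destination is in [bounds[p], bounds[p+1])."""
    shards = [[] for _ in range(len(bounds) - 1)]
    for src, dst in edges:
        shards[bisect_right(bounds, dst) - 1].append((src, dst))
    for shard in shards:
        shard.sort()  # sorted by source, so out-edges of an interval are contiguous
    return shards


def load_subgraph(shards, bounds, p):
    """Collect in- and out-edges of interval p, counting the large reads."""
    lo, hi = bounds[p], bounds[p + 1]
    in_edges = list(shards[p])  # read 1: the whole shard p
    out_edges = [e for e in in_edges if lo <= e[0] < hi]  # already in memory
    reads = 1
    for q, shard in enumerate(shards):
        if q == p:
            continue
        # One contiguous "window" per other shard: sources inside interval p.
        start = bisect_left(shard, (lo,))
        end = bisect_left(shard, (hi,))
        out_edges.extend(shard[start:end])
        reads += 1
    return in_edges, out_edges, reads


edges = [(0, 1), (0, 3), (1, 2), (2, 0), (3, 1), (3, 2), (1, 0)]
bounds = [0, 2, 4]  # P = 2 intervals: vertices {0, 1} and {2, 3}
shards = build_shards(edges, bounds)
P = len(shards)
total_reads = sum(load_subgraph(shards, bounds, p)[2] for p in range(P))
```

Each interval costs P reads (one full shard plus P - 1 windows), so a full pass over all P intervals costs P^2 reads, all of them sequential.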

17 GraphChi Program Execution
For T iterations:
    For p = 1 to P:
        For v in interval(p):
            updateFunction(v)
which visits the vertices in the same order as:
For T iterations:
    For v = 1 to V:
        updateFunction(v)
“Asynchronous”: updates immediately visible (vs. bulk-synchronous).
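To make the asynchronous vs. bulk-synchronous distinction concrete, here is a toy sketch that propagates the minimum vertex label along a chain (a simple connected-components computation). The `run` function and its parameters are illustrative assumptions, not GraphChi code.

```python
def run(neighbors, intervals, iterations, asynchronous=True):
    """Propagate the minimum label; return the labels after the given iterations."""
    labels = list(range(len(neighbors)))
    for _ in range(iterations):
        # Asynchronous: read the live array, so earlier updates are visible.
        # Bulk-synchronous: read a snapshot taken at the start of the iteration.
        visible = labels if asynchronous else list(labels)
        for start, end in intervals:      # For p = 1 to P
            for v in range(start, end):   # For v in interval(p)
                labels[v] = min([visible[v]] + [visible[n] for n in neighbors[v]])
    return labels


# Chain 0 - 1 - 2 - 3, split into two intervals.
chain = [[1], [0, 2], [1, 3], [2]]
intervals = [(0, 2), (2, 4)]
async_labels = run(chain, intervals, iterations=1, asynchronous=True)
sync_labels = run(chain, intervals, iterations=1, asynchronous=False)
sync_labels_3 = run(chain, intervals, iterations=3, asynchronous=False)
```

On this chain the asynchronous schedule converges in one iteration, while the bulk-synchronous one needs three, because a label can travel only one hop per iteration when updates are not immediately visible.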

18 Performance GraphChi can compute on the full Twitter follow-graph with just a standard laptop. ~ as fast as a very large Hadoop cluster! (size of the graph Fall 2013, > 20B edges [Gupta et al 2013])

19 GraphChi is Open Source C++ and Java versions on GitHub: http://github.com/graphchi – Java version has a Hadoop/Pig wrapper. If you really really want to use Hadoop.

20 RECSYS MODEL TRAINING WITH GRAPHCHI

21 Overview of Recommender Systems for GraphChi Collaborative Filtering toolkit (next slide) Link prediction in large networks – Random-walk based approaches (Twitter) – Talk on Wednesday.

22 GraphChi’s Collaborative Filtering Toolkit Developed by Danny Bickson (CMU / GraphLab Inc) Includes: – Alternating Least Squares (ALS) – Sparse-ALS – SVD++ – LibFM (factorization machines) – GenSGD – Item-similarity based methods – PMF – CliMF (contributed by Mark Levy) – …. Note: In the C++ version. Java version in development by a CMU team. See Danny’s blog for more information: http://bickson.blogspot.com/2012/12/collaborative-filtering-with-graphchi.html

23 TWO EXAMPLES: ALS AND ITEM-BASED CF

24 Example: Alternating Least Squares Matrix Factorization (ALS) Reference: Y. Zhou, D. Wilkinson, R. Schreiber, R. Pan: “Large-Scale Parallel Collaborative Filtering for the Netflix Prize” (2008) Task: Predict ratings for items (movies) by users. Model: – Latent factor model (see next slide)

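25 ALS: Product – Item bipartite graph [Figure: bipartite graph linking users to the movies City of God, Wild Strawberries, The Celebration, La Dolce Vita, and Women on the Verge of a Nervous Breakdown; edges carry ratings (4, 3, 2, 5) and vertices carry latent factor vectors such as (0.4, 2.3, -1.8, 2.9, 1.2), (-3.2, 2.8, 0.9, 0.2, 4.1), (8.7, 2.9, 0.04, 2.1, 3.141), (2.3, 2.5, 3.9, 0.02, 0.04).] User’s rating of a movie is modeled as a dot-product of the user’s and the movie’s latent factor vectors.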
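The formula itself did not survive in this transcript. In standard latent factor notation, the dot-product model can be written as below. The symbols are assumptions introduced here: x_u is the factor vector of user u, y_i the factor vector of movie i, and K the number of latent dimensions.

```latex
\hat{r}_{ui} \;=\; x_u^{\top} y_i \;=\; \sum_{k=1}^{K} x_{uk}\, y_{ik}
```

With K = 5, as in the five-component vectors on the slide, each predicted rating is a sum of five products.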

26 ALS: GraphChi implementation Update function handles one vertex at a time (user or movie). For each user: – Estimate latent(user): minimize least squares of dot-product predicted ratings. GraphChi executes the update function for each vertex (in parallel), and loads edges (ratings) from disk – Latent factors in memory: need O(V) memory. – If factors don’t fit in memory, can replicate them to edges and thus store them on disk. Scales to very large problems!
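The per-user step can be sketched as a small least-squares solve: with the movie factors held fixed, the user's factor vector x solves (sum of y yᵀ + reg·I) x = sum of r·y over the movies that user rated. The code below is a plain-Python illustration under assumptions: `update_user`, `solve`, and the fixed `reg` term are hypothetical, and the toolkit's actual implementation and regularization scheme may differ.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def update_user(rated, item_factors, reg=0.0):
    """Update for one user vertex; rated is a list of (item_id, rating) edges."""
    k = len(next(iter(item_factors.values())))
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for item, rating in rated:
        y = item_factors[item]
        for i in range(k):
            b[i] += rating * y[i]          # accumulate sum of r * y
            for j in range(k):
                A[i][j] += y[i] * y[j]     # accumulate sum of y y^T
    return solve(A, b)


# Hypothetical 2-dimensional movie factors and one user's two ratings.
item_factors = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
plain = update_user([("a", 4.0), ("b", 2.0)], item_factors)
regularized = update_user([("a", 4.0), ("b", 2.0)], item_factors, reg=1.0)
```

The update for a movie vertex is symmetric: hold the user factors fixed and solve the same kind of system over the users who rated that movie. Alternating between the two sides is what gives the method its name.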

27 ALS: Performance Matrix Factorization (Alternating Least Squares) Remark: Netflix is not a big problem, but GraphChi will scale at most linearly with input size (ALS is CPU-bound, so should be sub-linear in #ratings).

28 Example: Item-Based CF Task: compute a similarity score [e.g. Jaccard] for each movie pair that has at least one viewer in common. – Similarity(X, Y) ~ # common viewers – Output top K similar items for each item to a file. – … or: create an edge between X and Y containing the similarity. Problem: enumerating all pairs takes too much time.
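A toy sketch of the task, assuming the data fits in memory as a dictionary from movie to its set of viewers. The function name and data layout are hypothetical. Instead of testing all movie pairs, it only generates pairs that share a viewer by walking each viewer's movie list.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_similarities(viewers_by_movie):
    """Jaccard score for every movie pair with at least one common viewer."""
    # Invert the mapping: which movies did each viewer watch?
    movies_by_viewer = defaultdict(set)
    for movie, viewers in viewers_by_movie.items():
        for viewer in viewers:
            movies_by_viewer[viewer].add(movie)

    # Count common viewers, touching only pairs that actually co-occur.
    common = defaultdict(int)
    for movies in movies_by_viewer.values():
        for x, y in combinations(sorted(movies), 2):
            common[(x, y)] += 1

    # Jaccard = |common viewers| / |union of viewers|.
    return {
        (x, y): n / len(viewers_by_movie[x] | viewers_by_movie[y])
        for (x, y), n in common.items()
    }


viewers = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {5}}
sims = jaccard_similarities(viewers)
```

Movie C shares no viewer with A or B, so no pair involving C is ever generated.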

29 [Figure: graph of the movies City of God, Wild Strawberries, The Celebration, La Dolce Vita, and Women on the Verge of a Nervous Breakdown, with an edge labeled 3.] Solution: Enumerate all triangles of the graph. New problem: how to enumerate triangles if the graph does not fit in RAM?

30 Enumerating Triangles (Item-CF) Triangles with edge (u, v) = intersection(neighbors(u), neighbors(v)) Iterative memory efficient solution (next slide)
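The intersection formula can be sketched directly, assuming each neighbor list is kept sorted so that the intersection is a single merge-style scan. Function names are illustrative.

```python
def sorted_intersection(a, b):
    """Intersect two sorted lists with one merge-style pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def triangles_with_edge(u, v, neighbors):
    """Every common neighbor w of u and v closes a triangle (u, v, w)."""
    return [(u, v, w) for w in sorted_intersection(neighbors[u], neighbors[v])]


# Undirected toy graph with edges 0-1, 0-2, 0-3, 1-2, 2-3.
neighbors = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
found = triangles_with_edge(0, 2, neighbors)
```

In the item-CF setting, the size of this intersection for two movies is their number of common viewers.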

31 Algorithm (pivots): Let pivots be a subset of the vertices. Load all neighbor lists (adjacency lists) of the pivots into RAM. Now use GraphChi to load all vertices from disk, one by one, and compare their adjacency lists to the pivots’ adjacency lists (similar to a merge). Repeat with a new subset of pivots.
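A sketch of the pivot loop, under stated assumptions: a Python dict stands in for the on-disk graph, `batch_size` models how many pivot adjacency lists fit in RAM, and set lookups replace the merge-style comparison mentioned on the slide. To count each triangle exactly once, it is attributed to its smallest vertex.

```python
def count_triangles(adjacency, batch_size):
    """Count triangles while keeping only batch_size adjacency lists 'in RAM'."""
    vertices = sorted(adjacency)
    total = 0
    for start in range(0, len(vertices), batch_size):
        # "Load" the adjacency lists of the current pivots into memory.
        pivots = {p: set(adjacency[p]) for p in vertices[start:start + batch_size]}
        # Stream every vertex (a stand-in for GraphChi loading it from disk).
        for v in vertices:
            for p in adjacency[v]:
                if p in pivots and p < v:
                    # Count w with p < v < w adjacent to both p and v.
                    total += sum(1 for w in adjacency[v] if w > v and w in pivots[p])
    return total


# Undirected toy graph with edges 0-1, 0-2, 0-3, 1-2, 2-3 (two triangles).
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
counts = [count_triangles(adjacency, size) for size in (1, 2, 4)]
```

A smaller batch means more passes over the graph but less memory; the result does not depend on the batch size.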

32 Triangle Counting Performance

33 FUTURE DIRECTIONS & FINAL REMARKS

34 Single-Machine Computing in Production? GraphChi supports incremental computation with dynamic graphs: – Can keep on running indefinitely, adding new edges to the graph -> constantly fresh model. – However, requires engineering – not included in the toolkit. Compare to a cluster-based system (such as Hadoop) that needs to compute from scratch.

35 Efficient Scaling Businesses need to compute hundreds of distinct tasks on the same graph – Example: personalized recommendations. Two options: parallelize each task, or parallelize across tasks.

36 Single Machine vs. Cluster Most “Big Data” computations are I/O-bound – Single machine: disk bandwidth + seek latency – Distributed memory: network bandwidth + network latency Complexity / challenges: – Single machine: algorithms and data structures that reduce random access – Distributed: admin, coordination, consistency, fault tolerance Total cost – Programmer productivity – Specialized vs. Generalized frameworks

37 Unified Recsys Platform for GraphChi? Working with master’s students at CMU. Goal: ability to easily compare different algorithms and parameters – Unified input, output. – General programmable API (not just file-based) – Evaluation process: several evaluation metrics; cross-validation, held-out data… – Run many algorithm instances in parallel, on the same graph. – Java. Scalable from the get-go.

38

39

40 Recent developments: Disk-based Graph Computation Two disk-based graph computation systems were recently published: – TurboGraph (KDD’13) – X-Stream (SOSP’13 in October) Significantly better performance than GraphChi on many problems – Avoid preprocessing (“sharding”) – But GraphChi can do some computation that X-Stream cannot (triangle counting and related); TurboGraph requires SSD – Hot research area!

41 Do you need GraphChi – or any system? Heck, for many algorithms, you can just mmap() over your (binary) adjacency list / sparse matrix, and write a for-loop. – See Lin, Chau, Kang: “Leveraging Memory Mapping for Fast and Scalable Graph Computation on a PC” (Big Data ’13). Obviously good to have a common API – And some algos need more advanced solutions (like GraphChi, X-Stream, TurboGraph). Beware of the hype!
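The "mmap() plus a for-loop" idea can be sketched as follows. The binary layout is an assumption made for this example (a flat edge list of little-endian uint32 pairs rather than an adjacency list), and the helper names are hypothetical. The operating system pages the file in on demand, so the loop can scan a file larger than RAM.

```python
import mmap
import os
import struct
import tempfile

# Assumed record layout: two little-endian uint32 values (src, dst) per edge.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    """Write the edge list as fixed-size binary records."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def out_degrees(path, num_vertices):
    """Compute out-degrees with a plain for-loop over the memory-mapped file."""
    degrees = [0] * num_vertices
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), EDGE.size):
                src, _dst = EDGE.unpack_from(mm, offset)
                degrees[src] += 1
    return degrees


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.bin")
    write_edges(path, [(0, 1), (0, 2), (2, 1)])
    degrees = out_degrees(path, 3)
```

A sequential scan like this suits algorithms that touch each edge once per pass; algorithms with heavy random access are where the more advanced systems named on the slide come in.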

42 Conclusion Very large recommender algorithms can now be run on just your PC or laptop. – Additional performance from multi-core parallelism. – Great for productivity – scale by replicating. In general, good single machine scalability requires care with data structures and memory management -> natural with C/C++; with Java (etc.) you need low-level byte massaging. – Frameworks like GraphChi hide the low-level details. More work is needed to “productize” current work.

43 Thank you! Aapo Kyrölä Ph.D. candidate @ CMU – soon to graduate! (Currently visiting UW) http://www.cs.cmu.edu/~akyrola Twitter: @kyrpov

