PACMan: Coordinated Memory Caching for Parallel Jobs

Presentation transcript:

PACMan: Coordinated Memory Caching for Parallel Jobs. Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica. Presented by Subarno Banerjee. EECS 582 – F16

Motivation (figure: jobs submitted to the cluster scheduler)

Motivation: Heavy-tailed distribution of job input sizes. The smallest 96% of Facebook's jobs account for only 10% of the overall data; the input data of most jobs can be cached in 32GB of memory, and 96% of jobs in Facebook's Hadoop cluster fit in memory. Data accesses also show a popularity skew.

Motivation: IO-intensive phases constitute a significant portion of datacenter execution (79% of runtime, 69% of resources). The majority of jobs are small in size and data accesses show a popularity skew, which together make in-memory caching attractive.

PACMan: Parallel All-or-nothing Cache Manager (architecture figure: a central PACMan Coordinator, per-machine PACMan Clients co-located with the DFS slaves, and the DFS master). PACMan globally coordinates access to its distributed memory caches across machines. Two main tasks: support queries for the set of machines where a block is cached, and mediate cache replacement globally across machines.

PACMan Coordinator: keeps track of the changes made by clients, maintains a mapping between every cached block and the machines that cache it, and implements the cache eviction policies (LIFE and LFU-F). A secondary coordinator is kept on cold standby as a backup.
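To make the coordinator's bookkeeping concrete, here is a minimal sketch (hypothetical class and method names, not PACMan's actual code) of the block-to-machine mapping and the query/update calls that clients and the scheduler would use.

```python
from collections import defaultdict
from typing import Dict, List, Set

class Coordinator:
    """Toy PACMan-style coordinator: tracks which machines cache which
    blocks and answers location queries, so eviction can be mediated
    globally instead of independently on each machine."""

    def __init__(self) -> None:
        # block id -> set of machines currently caching that block
        self.block_locations: Dict[str, Set[str]] = defaultdict(set)

    def report_add(self, block_id: str, machine: str) -> None:
        # a client reports that it just cached a block
        self.block_locations[block_id].add(machine)

    def report_evict(self, block_id: str, machine: str) -> None:
        # a client reports that it evicted a block (e.g., per LIFE/LFU-F)
        self.block_locations[block_id].discard(machine)
        if not self.block_locations[block_id]:
            del self.block_locations[block_id]

    def where_cached(self, block_id: str) -> List[str]:
        # query: the set of machines where a block is cached
        return sorted(self.block_locations.get(block_id, set()))
```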

PACMan Client: services requests for blocks already in the cache and manages newly cached blocks. Data is cached at the destination (the machine where the task reads it), rather than at the source.
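A minimal sketch of the client's read path, assuming the Coordinator sketch above and a caller-supplied dfs_read function (both hypothetical); it illustrates the destination-side caching described on this slide.

```python
class Client:
    """Toy PACMan-style client: serves cached blocks from local memory and
    caches new blocks at the destination, reporting changes upward."""

    def __init__(self, machine: str, coordinator) -> None:
        self.machine = machine
        self.coordinator = coordinator   # Coordinator sketch from above
        self.local_cache: dict = {}      # block id -> block bytes held in memory

    def read_block(self, block_id: str, dfs_read) -> bytes:
        # hit: serve directly from this machine's memory cache
        if block_id in self.local_cache:
            return self.local_cache[block_id]
        # miss: fetch from the DFS, cache at the destination (here), and
        # tell the coordinator so future tasks can be placed on this machine
        data = dfs_read(block_id)
        self.local_cache[block_id] = data
        self.coordinator.report_add(block_id, self.machine)
        return data
```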

Insight: All-or-Nothing Property. Tasks of small jobs run simultaneously in a single wave (figure: two-slot timelines comparing completion time with no inputs cached, some inputs cached, and all inputs cached). All-or-nothing: unless all of a job's inputs are cached, there is no benefit to its completion time.

Insight: All-or-Nothing Property. The all-or-nothing property matters for efficiency as well: when only some inputs are cached, tasks that finish early sit idle waiting for the slowest uncached read, so resources are not freed any sooner (figure: read/compute/idle timelines for map and reduce tasks with uncached vs. cached input).
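A toy calculation (made-up durations) showing why partial caching gives no benefit: a single-wave job finishes only when its slowest task does.

```python
UNCACHED, CACHED = 10.0, 2.0   # hypothetical task durations in seconds

def wave_completion(cached_flags):
    # the wave completes when its slowest task completes
    return max(CACHED if cached else UNCACHED for cached in cached_flags)

print(wave_completion([False, False, False]))  # 10.0 - nothing cached
print(wave_completion([True, True, False]))    # 10.0 - partial caching, no benefit
print(wave_completion([True, True, True]))     #  2.0 - all inputs cached
```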

Problems with Traditional Policies: simply maximizing the cache hit-rate may not improve performance. For example, with Job 1 and Job 2, where Job 2 depends on the result of Job 1, spreading cache hits across both jobs speeds up neither, whereas caching one job's input completely does (figure: four-slot timelines contrasting the two cache allocations). Sticky Policy: evict blocks of incomplete files first, so fully cached files stay intact.

Eviction Policy – LIFE. Goal: minimize the average completion time of jobs. On eviction: are there any incomplete files? If YES, evict the largest incomplete file; if NO, evict the largest complete file. "Largest" means the file with the largest wave-width, where wave-width is the number of parallel tasks of a job.

Eviction Policy – LFU-F. Goal: maximize the resource efficiency (utilization) of the cluster. On eviction: are there any incomplete files? If YES, evict the least frequently accessed incomplete file; if NO, evict the least frequently accessed complete file.
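LIFE and LFU-F share the sticky-policy skeleton and differ only in how eviction candidates are ranked. The sketch below (hypothetical types and names, operating on whole files rather than PACMan's block-level bookkeeping) illustrates that structure.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CachedFile:
    name: str
    wave_width: int     # number of parallel tasks of the job reading this file
    accesses: int       # how often the file has been read (used by LFU-F)
    cached_blocks: int
    total_blocks: int

    @property
    def complete(self) -> bool:
        # all-or-nothing: a file only helps if every one of its blocks is cached
        return self.cached_blocks == self.total_blocks

def pick_victim(files: List[CachedFile], policy: str) -> Optional[CachedFile]:
    """Sticky policy: prefer evicting from incomplete files so that fully
    cached files are preserved; rank the candidates per LIFE or LFU-F."""
    if not files:
        return None
    incomplete = [f for f in files if not f.complete]
    candidates = incomplete if incomplete else files
    if policy == "LIFE":
        # minimize average completion time: evict the largest wave-width first
        return max(candidates, key=lambda f: f.wave_width)
    if policy == "LFU-F":
        # maximize cluster efficiency: evict the least-accessed file first
        return min(candidates, key=lambda f: f.accesses)
    raise ValueError(f"unknown policy: {policy}")
```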

Eviction Policy – LIFE vs LFU-F. Example: Job 1 has wave-width 3, capacity 3, frequency 2; Job 2 has wave-width 2, capacity 4, frequency 1. LIFE evicts the file with the highest wave-width (Job 1's), retaining Job 2's input at a required capacity of 4. LFU-F evicts the file with the lowest access frequency (Job 2's), retaining Job 1's input at a required capacity of 3.
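Running the slide's numbers through the pick_victim sketch above (reusing its CachedFile definition) reproduces the two outcomes:

```python
job1 = CachedFile("job1-input", wave_width=3, accesses=2, cached_blocks=3, total_blocks=3)
job2 = CachedFile("job2-input", wave_width=2, accesses=1, cached_blocks=4, total_blocks=4)

print(pick_victim([job1, job2], "LIFE").name)   # job1-input: highest wave-width evicted
print(pick_victim([job1, job2], "LFU-F").name)  # job2-input: lowest frequency evicted
```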

Evaluation. Experimental setup: 100-node cluster on Amazon EC2; each machine has 34.2GB of memory (20GB allocated to the PACMan cache), 13 cores, and 850GB of storage. Workloads replayed from Facebook and Bing traces.

Results: PACMan on Hadoop. Significant reduction in completion time for small jobs; better cluster efficiency for larger jobs.

Results: PACMan vs Traditional Policies. LIFE performs significantly better than MIN, despite having a lower hit ratio for most applications. The sticky policy helps LFU-F achieve better cluster efficiency.

Takeaways: Most datacenter jobs are small in size and their inputs can fit in memory. PACMan is a coordinated cache management system that takes the all-or-nothing nature of parallel jobs into account to improve completion time (LIFE policy) and resource utilization (LFU-F policy), achieving a 53% improvement in runtime and a 54% improvement in resource utilization over Hadoop.

Reviews. Strengths: exploits the all-or-nothing property and wave-based execution; reduces completion time and increases cluster efficiency; significant performance gains; theoretical analysis; coordination architecture is independent of the replacement policy. Concerns: overheads; fairness; no prefetching; does it generalize to other parallel workloads?; fault tolerance (coordinator failure recovery is slow); scalability (more than 10 tasks per machine, single-coordinator bottleneck, communication overhead with the scheduler for prefetching); wave-width is only approximated; more policies?; reliability if clients become uncooperative; priority inversion (long-running jobs suffer due to many small jobs).

Discussion: Should PACMan be scaled down to coordinate shared cache management for multicore processors? Are there distributed workloads where the "all-or-nothing" property does not hold? PACMan assumes remote memory access is constrained by the network; how does RDMA affect PACMan's design?

Results: PACMan's Scalability. The PACMan client saturates at 10-12 simultaneous tasks per machine for block sizes of 64/128/256MB, which is typical of Hadoop. The coordinator maintains a constant ~1.2ms latency up to 10,300 requests per second; for comparison, Hadoop handles about 3,200 requests/s.

Results: PACMan's Sensitivity to Cache Size. Both LIFE and LFU-F degrade gracefully as the cache size is reduced. Even with a 12GB cache, LIFE yields a 35% reduction in completion time and LFU-F yields a 29% improvement in cluster efficiency.