Investigating Distributed Caching Mechanisms for Hadoop. Gurmeet Singh, Puneet Chandra, Rashid Tahir.


1 Investigating Distributed Caching Mechanisms for Hadoop. Gurmeet Singh, Puneet Chandra, Rashid Tahir

2 Goal: Explore the feasibility of a distributed caching mechanism inside Hadoop.

3 Presentation Overview: Motivation, Design, Experimental Results, Future Work.

4 Motivation: Disk access times are a bottleneck in cluster computing. A large amount of data is read from disk. Existing systems: DARE, RAMClouds, and PACMan (coordinated cache replacement). We want to strike a balance between RAM and disk storage.

5 Our Approach: Integrate Memcached with Hadoop, using Quickcached and Spymemcached. Reserve a portion of main memory at each node to serve as a local cache. The local caches aggregate into a distributed caching layer governed by Memcached. Greedy caching strategy. Least Recently Used (LRU) cache eviction policy.
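The per-node cache with greedy insertion and LRU eviction described on this slide could look roughly like the sketch below. It is not the authors' implementation: the class name `LRUBlockCache`, the block identifiers, and the use of a block count as the capacity unit are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical per-node cache holding at most `capacity` blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()  # block_id -> data, oldest entry first

    def get(self, block_id):
        if block_id not in self._blocks:
            return None  # cache miss
        self._blocks.move_to_end(block_id)  # mark as most recently used
        return self._blocks[block_id]

    def put(self, block_id, data):
        # Greedy strategy (assumed meaning): cache every block that is read.
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used


cache = LRUBlockCache(capacity=2)
cache.put("blk_1", b"a")
cache.put("blk_2", b"b")
cache.get("blk_1")        # blk_1 becomes the most recently used entry
cache.put("blk_3", b"c")  # capacity exceeded, so blk_2 is evicted
```

In this sketch a read refreshes an entry's position, so a block that is read repeatedly stays cached while blocks read only once are the first to be evicted.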

6 Design Overview

7 Memcached

8 Design Choice 1: Send simultaneous requests to the Namenode and Memcached. Minimizes access latency at the cost of additional network overhead.
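One way to picture Design Choice 1 is a client that issues both lookups concurrently and prefers the cache answer on a hit. The sketch below uses in-memory dictionaries as stand-ins for Memcached and the Namenode; `CACHE`, `NAMENODE`, `query_cache`, `query_namenode`, and `locate_parallel` are hypothetical names, not part of Hadoop or Memcached.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the two remote services.
CACHE = {"blk_1": b"cached bytes"}
NAMENODE = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn3"]}


def query_cache(block_id):
    return CACHE.get(block_id)  # None on a cache miss


def query_namenode(block_id):
    return NAMENODE[block_id]  # replica locations for the block


def locate_parallel(block_id):
    """Send both requests at once; use the cache answer if there is one."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cache_future = pool.submit(query_cache, block_id)
        namenode_future = pool.submit(query_namenode, block_id)
        cached = cache_future.result()
        if cached is not None:
            return ("cache", cached)
        return ("datanodes", namenode_future.result())
```

The Namenode request is always issued, even on a cache hit, which is the additional network overhead the slide mentions; in exchange, a miss does not pay for a second round trip.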

9 Design Choice 2: Send a request to the Namenode only on a cache miss. Minimizes network overhead at the cost of increased latency.
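Design Choice 2 can be sketched as a sequential lookup in which the Namenode is contacted only after the cache reports a miss. As before, the dictionaries and the names `locate_on_miss` and `stats` are hypothetical stand-ins chosen for illustration.

```python
# Hypothetical stand-ins for Memcached and the Namenode.
CACHE = {"blk_1": b"cached bytes"}
NAMENODE = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn3"]}
stats = {"namenode_requests": 0}  # counts how often the Namenode is asked


def locate_on_miss(block_id):
    """Check the cache first; fall back to the Namenode only on a miss."""
    cached = CACHE.get(block_id)
    if cached is not None:
        return ("cache", cached)
    stats["namenode_requests"] += 1
    return ("datanodes", NAMENODE[block_id])


hit = locate_on_miss("blk_1")   # served from the cache, no Namenode request
miss = locate_on_miss("blk_2")  # cache miss, one Namenode request
```

A hit costs a single request, but a miss costs two requests issued back to back, which is the increased latency noted on the slide.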

10 Design Choice 3: Datanodes send requests only to Memcached. Memcached checks for cached blocks. If a cache miss occurs, it contacts the Namenode and returns the replicas' addresses to the Datanodes.
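In Design Choice 3 the caller talks only to the cache layer, which resolves misses itself. A minimal sketch of that read-through behavior follows, assuming a hypothetical `ReadThroughCache` class that is handed a Namenode lookup function; none of these names come from the presentation.

```python
class ReadThroughCache:
    """Hypothetical cache front end that contacts the Namenode on a miss."""

    def __init__(self, namenode_lookup):
        self._blocks = {}                      # block_id -> cached data
        self._namenode_lookup = namenode_lookup

    def store(self, block_id, data):
        self._blocks[block_id] = data

    def request(self, block_id):
        if block_id in self._blocks:
            return {"hit": True, "data": self._blocks[block_id]}
        # Miss: the cache layer, not the caller, asks the Namenode
        # and passes the replica addresses back.
        replicas = self._namenode_lookup(block_id)
        return {"hit": False, "replicas": replicas}


namenode = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn3"]}  # stand-in block map
front = ReadThroughCache(namenode_lookup=namenode.__getitem__)
front.store("blk_1", b"cached bytes")
```

The caller has a single endpoint to contact in this design, so the lookup logic from the other two design choices moves out of the client and into the cache layer.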

11 Global Cache Replacement: LRU-based global cache eviction scheme.
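The slide does not say how the global scheme is realized. One possible reading, sketched below purely as an assumption, is a coordinator that tracks the last access time of every cached block across all nodes and evicts the globally oldest one when the total capacity is exceeded. `GlobalLRU`, `touch`, and the logical clock are hypothetical.

```python
import itertools


class GlobalLRU:
    """Hypothetical coordinator tracking recency across all nodes."""

    def __init__(self, capacity):
        self.capacity = capacity         # total cached blocks, cluster-wide
        self._clock = itertools.count()  # logical time, one tick per access
        self._last_used = {}             # (node, block_id) -> logical time

    def touch(self, node, block_id):
        """Record an access; return the evicted (node, block_id) or None."""
        self._last_used[(node, block_id)] = next(self._clock)
        if len(self._last_used) <= self.capacity:
            return None
        # Evict the entry with the oldest access time on any node.
        victim = min(self._last_used, key=self._last_used.get)
        del self._last_used[victim]
        return victim


tracker = GlobalLRU(capacity=2)
tracker.touch("node1", "blk_a")
tracker.touch("node2", "blk_b")
tracker.touch("node1", "blk_a")            # refreshes blk_a
evicted = tracker.touch("node2", "blk_c")  # blk_b is now the oldest entry
```

Unlike independent per-node LRU caches, this variant can evict a block on one node because of accesses on another, which requires some coordination traffic between nodes.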

12 Prefetching

13 Simulation Results: Test data ranging from 2 GB to 24 GB. Workloads: Word Count and Grep.

14 Word Count


16 Grep


18 Future Work: Implement a prefetching mechanism. Customize caching policies based on access patterns. Compare and contrast caching with locality-aware scheduling.

19 Conclusion: Caching can improve the performance of cluster-based systems, depending on the access patterns of the workload being executed.

