An In-Memory Framework for Extended MapReduce. 2011 Third IEEE International Conference on Cloud Computing Technology and Science.


1 An In-Memory Framework for Extended MapReduce. 2011 Third IEEE International Conference on Cloud Computing Technology and Science

2 BACKGROUND The MapReduce programming model simplifies the design and implementation of certain parallel algorithms. The original MapReduce model need not support updates and consistency: it allows efficient and simple parallelization with static input data. In the extended MapReduce models, however, concurrent random-access write operations occur, and the original model does not provide consistency. The original model also does not apply to iterative or online computations, so extensions of the model have been proposed. We propose to use in-memory storage for extended MapReduce.

3 IN-MEMORY STORAGE In-memory storage avoids the latency of disk accesses and can support consistent data updates. It increases the flexibility of the MapReduce parallel programming model without requiring additional communication facilities to propagate data updates, transparently replicates data to computing nodes, and is able to guarantee nontrivial data consistency. The EMR framework demonstrates the utility of distributed in-memory storage for extended MapReduce.

4 OUTLINE The MapReduce Model Data Access in MapReduce Consistency and Fault Tolerance in Extended MapReduce The EMR Framework

5 THE MAPREDUCE MODEL Two important properties: the simplicity of parallelizing adequate computations, and the applicability to many common data processing problems. The model is not suitable for problems which are not embarrassingly parallel, so extensions to the original model have been proposed. Extended MapReduce models: (1) enable online aggregation of data, (2) allow continuous queries for online processing of data streams, and (3) support long-running computations, requiring stronger data consistency and fault tolerance.
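The two-phase model the slide describes can be illustrated with the canonical word-count example. This is a minimal single-process sketch of the programming model only, not the EMR framework's API; all function names here are illustrative.

```python
from collections import defaultdict

def map_phase(records, map_fn):
    """Apply the user-defined map function to every input record and
    group the emitted (key, value) pairs by key."""
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    return intermediate

def reduce_phase(intermediate, reduce_fn):
    """Apply the user-defined reduce function to each key group."""
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

# Word count: each map call emits (word, 1); reduce sums the ones.
def word_map(doc):
    return [(word, 1) for word in doc.split()]

def word_reduce(key, values):
    return sum(values)

counts = reduce_phase(map_phase(["a b a", "b c"], word_map), word_reduce)
# counts == {"a": 2, "b": 2, "c": 1}
```

Because every map call and every reduce call is independent, each can run on a different worker with static input data, which is exactly the property the original model exploits.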

6 EXTENDED MAPREDUCE Extended models increase the scope of the original MapReduce: they may update data instead of operating on immutable input data and written-once intermediate data, and workers do not necessarily run the map and reduce phases in lock-step. Extended MapReduce models can be implemented using data-oriented communication. In-memory storage enables data-oriented communication for extended MapReduce: it allows for adaptive caching and makes intermediate results accessible to all nodes without requiring an explicit exchange of messages.

7 THE EMR FRAMEWORK: AN IN-MEMORY FRAMEWORK FOR EXTENDED MAPREDUCE The framework includes job management, storage abstractions and synchronization facilities. ECRAM implements a distributed transactional memory (DTM) for strong data consistency. Storage descriptors offer data access in a generic manner. The different execution models require specialized synchronization.

8 ARCHITECTURE Three components: the extended MapReduce run-time environment, the underlying ECRAM in-memory storage, and a utility library. The run-time includes job management, storage abstractions and synchronization facilities. ECRAM virtually stores all data and enables MapReduce applications to access data in a location-transparent and fault-tolerant way. The utility library implements generic data structures.

9 EXECUTION MODEL The master and the workers create storage objects dynamically; map and reduce jobs emerge on demand. The master assigns jobs to workers using a job queue. ECRAM stores the computational tasks, job descriptors, the job queue and node information blocks. The master calls mapreduce to start a MapReduce run; a worker calls job_run to run jobs.
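The master/worker interplay above can be sketched with a shared job queue. The names mapreduce and job_run come from the slide; everything else (the queue contents, the poison-pill shutdown, the placeholder map function) is an assumption for illustration, since in EMR the queue itself lives in the ECRAM in-memory storage rather than in a local data structure.

```python
import queue
import threading

jobs = queue.Queue()       # stands in for the job queue kept in ECRAM
results = {}
results_lock = threading.Lock()
NUM_WORKERS = 3

def mapreduce(inputs):
    """Master: create one map job per input split, then signal shutdown."""
    for job_id, split in enumerate(inputs):
        jobs.put(("map", job_id, split))
    for _ in range(NUM_WORKERS):
        jobs.put(None)     # one poison pill per worker

def job_run():
    """Worker: fetch and execute jobs until told to stop."""
    while True:
        job = jobs.get()
        if job is None:
            break
        kind, job_id, data = job
        with results_lock:
            results[job_id] = data.upper()  # placeholder for the map function

mapreduce(["foo", "bar", "baz"])
workers = [threading.Thread(target=job_run) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# results == {0: "FOO", 1: "BAR", 2: "BAZ"}
```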

10 JOB SCHEDULER The scheduler implements scheduling based on ECRAM's update notifications and represents all processing nodes and all jobs as in-memory objects. Nodes have work queues, and there is a global work queue for jobs. The runtime system takes advantage of the wait/notify mechanism and of the transaction properties.
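The wait/notify scheduling the slide mentions can be mimicked with a condition variable: workers block until the shared job list is updated, instead of polling it. This is only a local sketch of the notification pattern, under the assumption that ECRAM's update notifications behave like a notify on a changed object; it does not model transactions or distribution.

```python
import threading

class NotifyingQueue:
    """Workers block in get() until put() changes the job list and
    notifies them, mimicking wait/notify on a shared in-memory object."""
    def __init__(self):
        self._jobs = []
        self._cond = threading.Condition()

    def put(self, job):
        with self._cond:
            self._jobs.append(job)
            self._cond.notify()        # wake one waiting worker

    def get(self):
        with self._cond:
            while not self._jobs:      # no busy-waiting: sleep until notified
                self._cond.wait()
            return self._jobs.pop(0)

q = NotifyingQueue()
got = []
worker = threading.Thread(target=lambda: got.append(q.get()))
worker.start()          # worker blocks: the queue is still empty
q.put("job-1")          # the update wakes the worker
worker.join()
# got == ["job-1"]
```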

11 EXECUTION OF DIFFERENT MAPREDUCE MODELS Conventional MapReduce jobs: an integer object holds the number of finished map jobs; one reduce phase follows one map phase. Iterative MapReduce: creates map and reduce jobs alternately; the worker nodes run exactly the same code as with conventional MapReduce. Online MapReduce: requires less job synchronization and creates map and reduce jobs as soon as input data is available.
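The iterative variant amounts to a driver loop that alternates the two phases until a convergence test passes. The sketch below is a hypothetical driver, not the EMR master's code; the toy map/reduce/convergence functions are stand-ins chosen only so the loop terminates.

```python
def iterative_mapreduce(state, map_fn, reduce_fn, converged, max_iters=100):
    """Alternate map and reduce phases on the current state until the
    user-supplied convergence predicate holds (or max_iters is reached)."""
    for _ in range(max_iters):
        intermediate = map_fn(state)     # would fan out to map jobs
        new_state = reduce_fn(intermediate)  # would fan out to reduce jobs
        if converged(state, new_state):
            return new_state
        state = new_state
    return state

# Toy fixed-point iteration: halve the value until it changes by < 0.5.
result = iterative_mapreduce(
    16.0,
    map_fn=lambda s: s / 2,
    reduce_fn=lambda x: x,
    converged=lambda old, new: abs(old - new) < 0.5,
)
# result == 0.25
```

The point of the slide is that the workers need not change between the conventional and iterative variants: only the driver that creates jobs differs.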

12 FAULT TOLERANCE ECRAM implements fault tolerance using atomic transactions and replicated objects, and EMR builds upon ECRAM's fault-tolerance mechanisms: operations are implemented as transactions. When a job is scheduled to run on a node, it is assigned a timestamp. If an old timestamp identifies a lagging job, the node is removed from the worker queue and the job is re-scheduled on another node.
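The timestamp-based detection of lagging jobs can be sketched as follows. The timeout value and the shape of the job table are assumptions; the slide only specifies that a job gets a timestamp at scheduling time and is re-scheduled when that timestamp grows too old.

```python
import time

TIMEOUT = 5.0  # seconds a job may run before it counts as lagging (assumed value)

def find_lagging(running_jobs, now=None):
    """Return the ids of jobs whose scheduling timestamp is older than
    TIMEOUT. The master would then drop the node from the worker queue
    and re-schedule each such job on another node.

    running_jobs maps job id -> (node name, start timestamp)."""
    if now is None:
        now = time.time()
    return [jid for jid, (node, started) in running_jobs.items()
            if now - started > TIMEOUT]

jobs = {"j1": ("node-a", 100.0), "j2": ("node-b", 104.0)}
lagging = find_lagging(jobs, now=106.0)  # j1 has run 6 s, j2 only 2 s
# lagging == ["j1"]
```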

13 EVALUATION A cluster system consisting of 33 computing nodes, each equipped with 2 AMD Opteron processors and 2 GB ccNUMA RAM. The experiments ran with one master node and a varying number of workers, keeping the problem size constant.

14 The map phase: each node scans its own partition of the input file, so there is no concurrency and the phase scales quite well in the number of nodes. The reduce phase: all nodes write to the final results concurrently, so an increasing number of workers leads to more conflicts.

15 REAL-TIME RAYTRACING Raytracing transforms a 3D scene graph into a 2D image, and each ray can be traced separately. Raytracing can improve an image iteratively, and nodes can update the image of a changing scene, making a case for online MapReduce. The master node splits the 2D image plane into equally-sized partitions, and the reduce phase simply collates the distinct partitions into a complete image.
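Splitting the image plane into equally-sized partitions, one map job each, might look like the sketch below. The choice of horizontal strips is an assumption; the slides only say the partitions are equally sized.

```python
def partition_image(width, height, n_workers):
    """Split a width x height image plane into n_workers horizontal
    strips, returned as (left, top, right, bottom) rectangles.
    The last strip absorbs any remainder rows."""
    rows_per_strip = height // n_workers
    strips = []
    for i in range(n_workers):
        top = i * rows_per_strip
        bottom = height if i == n_workers - 1 else top + rows_per_strip
        strips.append((0, top, width, bottom))
    return strips

strips = partition_image(640, 480, 4)
# strips == [(0, 0, 640, 120), (0, 120, 640, 240),
#            (0, 240, 640, 360), (0, 360, 640, 480)]
```

Each strip becomes one map job that traces its rays independently; the reduce phase only concatenates the finished strips, which is why it stays cheap.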

16 The reduce phase takes almost constant time, while the image computation during the map phase scales almost inversely with the number of nodes.

17 CONCLUSION The original MapReduce model does not apply to certain computational problems, and extensions of the original model necessitate caring for consistency and reliable execution once again. In-memory storage benefits consistency and fault handling, giving rise to the idea of an in-memory framework for extended MapReduce.


