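Department of Computer Science and Information Engineering, Intelligent Systems Laboratory (iLab), Southern Taiwan University of Science and Technology. Optimizing Cloud MapReduce for Processing Stream Data using Pipelining. Source: 2011 UKSim 5th European Symposium on Computer Modeling and Simulation.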

Optimizing Cloud MapReduce for Processing Stream Data using Pipelining
Source: 2011 UKSim 5th European Symposium on Computer Modeling and Simulation
Authors: Rutvik Karve, Devendra Dahiphale, Amit Chhajer
Presenter: 邵建銘

Outline
1. Introduction
2. Literature Survey
3. Our Proposed Architecture
4. Advantages, Features and Applications
5. Conclusions and Future Work

Introduction (1/2)
Cloud MapReduce (CMR) is gaining popularity among small companies for processing large data sets in cloud environments.
The current implementation of CMR is designed for batch processing of data.
Processing streaming data in CMR requires significant changes to the existing CMR architecture.

Introduction (2/2)
We use pipelining between the Map and Reduce phases to support stream data processing.
In the current implementation, Reducers do not start working until all Mappers have finished.
In our architecture, by contrast, the Reduce phase also receives a continuous stream of data and can produce continuous output.

Literature Survey (1/4)
MapReduce
– MapReduce is a programming model developed by Google for processing large data sets in a distributed fashion.
– The model consists of two phases: the Map phase and the Reduce phase.
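The two phases can be illustrated with a minimal single-process sketch. Word counting is used here only as a familiar example; the function names `map_phase`, `reduce_phase`, and `run_mapreduce` are hypothetical and do not come from the slides, and a real deployment would distribute the work across machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def run_mapreduce(documents):
    # Group intermediate values by key between the two phases.
    grouped = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            grouped[word].append(count)
    # In this batch form, reducing starts only after every document is mapped.
    return dict(reduce_phase(w, c) for w, c in grouped.items())

result = run_mapreduce(["stream data", "batch data"])
print(result)  # {'stream': 1, 'data': 2, 'batch': 1}
```

Note that the Reduce step here runs only after all Map output has been collected, which is the batch behaviour the proposed pipelining is meant to relax.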

Literature Survey (2/4)
Hadoop
– Hadoop is an implementation of the MapReduce programming model developed by Apache.
– It incorporates a distributed file system called HDFS.
– Hadoop is popular for processing huge data sets, especially in social networking, targeted advertising, and internet log processing.

Literature Survey (3/4)
Cloud MapReduce
– Cloud MapReduce is a light-weight implementation of the MapReduce programming model on top of the Amazon cloud OS.
– The CMR architecture consists of one input queue, multiple reduce queues, a master reduce queue that holds pointers to the reduce queues, and an output queue that holds the final results.
– The S3 file system stores the data to be processed, and SimpleDB is used to communicate the status of the worker nodes.
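The queue layout described above can be pictured with a small in-memory stand-in. This is only a sketch: the class name `CMRQueues` and its fields are hypothetical, plain Python deques replace the cloud queues, and integer indices stand in for the "pointers" held by the master reduce queue.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CMRQueues:
    """In-memory stand-in for the queues named on the slide (not the CMR code)."""
    num_reduce_queues: int
    input_queue: deque = field(default_factory=deque)
    reduce_queues: list = field(default_factory=list)
    master_reduce_queue: deque = field(default_factory=deque)
    output_queue: deque = field(default_factory=deque)

    def __post_init__(self):
        # One staging queue per reduce partition.
        self.reduce_queues = [deque() for _ in range(self.num_reduce_queues)]
        # The master reduce queue holds pointers (here: indices) to the reduce queues.
        self.master_reduce_queue.extend(range(self.num_reduce_queues))

queues = CMRQueues(num_reduce_queues=3)
```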

Literature Survey (4/4)
Online MapReduce (Hadoop Online Prototype)
– The Hadoop Online Prototype (HOP) is a modification of the traditional Hadoop framework that incorporates pipelining between the Map and Reduce phases.
– A downstream dataflow element can begin processing before an upstream producer finishes.
– HOP provides support for processing streaming data.
– It also supports snapshots of output data.
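The idea of pipelining with snapshots can be sketched with Python generators, which let a consumer process each record as soon as the producer emits it. This is an illustration of the concept only, not HOP's code; the names `mapper`, `pipelined_reducer`, and the `snapshot_every` parameter are assumptions made for the example.

```python
from collections import Counter

def mapper(stream):
    """Emit (word, 1) as soon as each input line arrives."""
    for line in stream:
        for word in line.split():
            yield word, 1

def pipelined_reducer(pairs, snapshot_every):
    """Consume pairs as they arrive and yield snapshots of the running totals."""
    totals = Counter()
    seen = 0
    for key, value in pairs:
        totals[key] += value
        seen += 1
        if seen % snapshot_every == 0:
            yield dict(totals)  # snapshot of the output so far
    yield dict(totals)  # final result once the stream ends

lines = iter(["a b", "a c"])
snapshots = list(pipelined_reducer(mapper(lines), snapshot_every=2))
print(snapshots[0])   # {'a': 1, 'b': 1}
print(snapshots[-1])  # {'a': 2, 'b': 1, 'c': 1}
```

The first snapshot is available after only the first input line has been mapped, which is the behaviour the second bullet describes.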

Our Proposed Architecture (1/8)
The existing implementations suffer from the following drawbacks:
1. HOP is unsuitable for the cloud because it lacks the cloud's inherent scalability and flexibility.
2. In HOP, the code that handles HDFS, reliability, scheduling, etc. is part of the Hadoop framework itself, which makes it large and heavy-weight.
3. Cloud MapReduce does not support stream data processing.

Our Proposed Architecture (2/8)
Our proposal aims to bridge the gap between the heavy-weight HOP and the light-weight, scalable Cloud MapReduce implementation by adding support for stream data processing to CMR.
The challenges involved in the implementation include:
1. Providing support for streaming data at input.
2. A novel design for output aggregation.
3. Handling Reducer failures.
4. Handling windows based on timestamps.

Our Proposed Architecture (3/8)
Currently, no open-source implementation exists for processing streaming data using MapReduce on top of the cloud.
To the best of our knowledge, this work is the first to integrate stream data processing capability with MapReduce on Amazon Web Services using EC2.
The architecture of the Pipelined CMR approach is described in three parts: Input, Mapper Operation, and Reduce Phase.

Our Proposed Architecture (4/8)
Input
– A drop-box concept can be used, where a folder on S3 holds the data to be processed by Cloud MapReduce.
– The user is responsible for placing data in the drop-box, from which it is sent to the input SQS queue.
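A rough local sketch of the drop-box idea follows. It assumes a local temporary directory in place of the S3 folder and a Python deque in place of the input SQS queue; the function `poll_drop_box` and the file names are hypothetical, and the slides do not say how new data is detected.

```python
from collections import deque
from pathlib import Path
import tempfile

def poll_drop_box(folder, input_queue, already_queued):
    """Enqueue every file that has appeared in the folder since the last poll."""
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name not in already_queued:
            input_queue.append(path.read_text())
            already_queued.add(path.name)

with tempfile.TemporaryDirectory() as tmp:
    drop_box = Path(tmp)  # stand-in for the S3 folder
    input_queue, already_queued = deque(), set()

    (drop_box / "part-0.txt").write_text("first record")
    poll_drop_box(drop_box, input_queue, already_queued)

    # Data that arrives later is picked up by the next poll, without re-queuing part-0.
    (drop_box / "part-1.txt").write_text("second record")
    poll_drop_box(drop_box, input_queue, already_queued)

    queued = list(input_queue)

print(queued)  # ['first record', 'second record']
```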

Our Proposed Architecture (5/8)
Mapper Operation
– Whenever a Mapper is free, it pops one message from the input SQS queue, which hides the message from the queue for the duration of a visibility timeout, and processes it according to the user-defined Map function.
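The visibility-timeout behaviour can be modelled with a toy queue. This is a simplified sketch, not the SQS API: the class `VisibilityQueue` and its methods are hypothetical, and the current time is passed in explicitly so the example stays deterministic.

```python
class VisibilityQueue:
    """Toy queue with SQS-style visibility timeouts (not the real SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> (body, time at which it is visible again)
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = (body, 0.0)
        self._next_id += 1

    def receive(self, now):
        """Return (id, body) of one visible message and hide it for the timeout."""
        for msg_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by a Mapper once it has processed the message successfully."""
        self._messages.pop(msg_id, None)

queue = VisibilityQueue(visibility_timeout=30)
queue.send("line one")
first = queue.receive(now=0)    # a free Mapper takes the message
hidden = queue.receive(now=10)  # other Mappers cannot see it yet
retry = queue.receive(now=31)   # never deleted, so it becomes visible again
```

In this model a message that is received but never deleted reappears after the timeout, so another Mapper can pick it up if the first one fails.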

Our Proposed Architecture (6/8)
Reduce Phase
– The Mapper writes the intermediate records it produces to ReduceQueues.
– ReduceQueues are intermediate staging queues, implemented using SQS, that hold the Mapper output.
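One common way to decide which ReduceQueue receives a record is to hash its key, so that all records with the same key land in the same queue. The slides do not state how CMR chooses the queue, so the hash-based routing below is an assumption; `reduce_queue_index` and `write_intermediate` are hypothetical names, and deques stand in for SQS queues.

```python
import zlib
from collections import deque

def reduce_queue_index(key, num_queues):
    """Pick a ReduceQueue for a key; a stable hash keeps equal keys together."""
    return zlib.crc32(key.encode("utf-8")) % num_queues

def write_intermediate(records, reduce_queues):
    """Mapper side: append each (key, value) record to its ReduceQueue."""
    for key, value in records:
        index = reduce_queue_index(key, len(reduce_queues))
        reduce_queues[index].append((key, value))

reduce_queues = [deque() for _ in range(4)]
write_intermediate([("a", 1), ("b", 1), ("a", 1)], reduce_queues)
```

`zlib.crc32` is used instead of the built-in `hash` because string hashing in Python is randomized per process, which would route the same key differently on different workers.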

Our Proposed Architecture (7/8)
[Figure; not reproduced in the transcript]

Our Proposed Architecture (8/8)
Handling Reducer Failures
– The Status field is used for handling Reducer failures.
– Status can be one of Live, Dead, or Idle.
– If a Reducer has not updated its status, its Status is set to Dead.
– A Reducer whose Status is Idle can take over as the new Reducer: it is assigned the Reduce Queue Pointers and Output Pointers previously held by the old Reducer.
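One possible reading of this failure-handling scheme is sketched below. Several details are assumptions not stated on the slide: that a Reducer counts as silent after a fixed `timeout`, that a separate routine (`handle_failures`) performs the check, and that the first available Idle Reducer is chosen as the replacement. The `Reducer` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Reducer:
    name: str
    status: str         # "Live", "Dead", or "Idle"
    last_update: float  # time of the last status update
    queue_pointers: list = field(default_factory=list)
    output_pointers: list = field(default_factory=list)

def handle_failures(reducers, now, timeout):
    """Mark silent Live reducers Dead and hand their pointers to Idle reducers."""
    idle = [r for r in reducers if r.status == "Idle"]
    for r in reducers:
        if r.status == "Live" and now - r.last_update > timeout:
            r.status = "Dead"
            if idle:
                replacement = idle.pop(0)
                # The new Reducer inherits the old Reducer's pointers.
                replacement.queue_pointers = r.queue_pointers
                replacement.output_pointers = r.output_pointers
                replacement.status = "Live"
                replacement.last_update = now

reducers = [
    Reducer("r1", "Live", last_update=0.0,
            queue_pointers=["rq-0"], output_pointers=["out-0"]),
    Reducer("r2", "Idle", last_update=95.0),
]
handle_failures(reducers, now=100.0, timeout=60.0)
```

After the call, `r1` is marked Dead and `r2` is Live with the pointers `r1` used to hold.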

Advantages, Features and Applications (1/4)
The design has the following advantages:
1. It allows parallelism between the Map and Reduce phases.
2. A downstream processing element can start processing as soon as some data is available from an upstream element.
3. The network is better utilized because data is continuously pushed from one phase to the next.

Advantages, Features and Applications (2/4)
4. The final output is computed incrementally.
5. Introducing a pipeline between the Reduce phase of one job and the Map phase of the next job supports cascaded MapReduce jobs.

Advantages, Features and Applications (3/4)
Other features of the design include:
– Time windows
– Snapshots
– Cascaded MapReduce jobs
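As an illustration of the time-window feature, the sketch below groups timestamped events into fixed-size, non-overlapping windows and counts them per key. The slides do not specify the window semantics, so fixed-size windows are an assumption, and `window_start` and `count_per_window` are hypothetical names.

```python
from collections import defaultdict

def window_start(timestamp, window_seconds):
    """Start of the fixed-size window that contains the timestamp."""
    return timestamp - (timestamp % window_seconds)

def count_per_window(events, window_seconds):
    """events: iterable of (timestamp_in_seconds, key) pairs."""
    counts = defaultdict(int)
    for timestamp, key in events:
        counts[(window_start(timestamp, window_seconds), key)] += 1
    return dict(counts)

events = [(3, "click"), (58, "click"), (61, "click"), (75, "view")]
per_window = count_per_window(events, window_seconds=60)
print(per_window)  # {(0, 'click'): 2, (60, 'click'): 1, (60, 'view'): 1}
```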

Advantages, Features and Applications (4/4)
Applications:
– With these features, the design is particularly suited to stream processing of data.
– Popular stream processing applications include the analysis and processing of web feeds, click-streams, micro-blogging, and stock market quotes.
– The design can also be used to process real-time data.

Conclusions and Future Work
– The design fulfills a real need for processing streaming data using MapReduce.
– Further work can be done on maintaining intermediate output information to support rolling windows.
– Future scope also includes designing a generic system that is portable across several cloud operating systems.