Genetic Algorithms by using MapReduce Fei Teng Doga Tuncay 12/5/2011.

Outline
– Onemax problem
– Hadoop genetic algorithm
– Twister genetic algorithm
– Performance discussion
– References

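Onemax problem
The Onemax problem tries to maximize the number of ones in a bitstring. Formally, it can be described as finding a string $\vec{x} = (x_1, \dots, x_L) \in \{0,1\}^L$ that maximizes the following equation:
$$F(\vec{x}) = \sum_{i=1}^{L} x_i$$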
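As a minimal sketch, assuming an individual is stored as a Python string of '0' and '1' characters (a representation chosen here for illustration, not taken from the slides), the Onemax fitness is simply the count of ones. The optimum is the all-ones string, whose fitness equals its length.

```python
def onemax(bits: str) -> int:
    """Onemax fitness: the number of ones in a bitstring such as "10110"."""
    # Reject anything that is not a pure 0/1 string.
    if set(bits) - {"0", "1"}:
        raise ValueError("expected a string of '0' and '1' characters")
    return bits.count("1")
```

For example, `onemax("10110")` is 3, and `onemax("1" * 8)` reaches the maximum of 8.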

Hadoop genetic algorithm
Making Hadoop support iterative MapReduce:
– Start a new job for each iteration
– Put each iteration's output in HDFS
– Override interfaces to make a customized value type
– Map input key-value pair
– Reduce input key-value pair
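The "new job per iteration" pattern can be sketched in-process as follows. This is a hypothetical stand-in, not Hadoop's API and not the authors' code: `run_job` plays the role of one MapReduce job (one GA generation), and the dict `hdfs` stands in for HDFS, holding one output per iteration that the next job reads back. The mapper evaluates Onemax fitness and assigns each individual to a random reducer; the reducer applies binary tournament selection, uniform crossover, and bit-flip mutation within its sub-population. These GA operators and the names `NUM_REDUCERS`, `mapper`, `reducer`, `run_job`, and `driver` are assumptions for illustration.

```python
import random
from collections import defaultdict

NUM_REDUCERS = 4  # hypothetical number of reduce tasks


def mapper(individual, rng):
    # Map: evaluate Onemax fitness and key the record by a random reducer id,
    # so each reducer receives a shuffled sub-population.
    yield rng.randrange(NUM_REDUCERS), (individual, individual.count("1"))


def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b for b in bits)


def reducer(records, rng):
    # Reduce: binary tournament selection, uniform crossover and mutation
    # within one sub-population; emits as many children as records received.
    def tournament():
        a, b = rng.choice(records), rng.choice(records)
        return a[0] if a[1] >= b[1] else b[0]

    for _ in range(len(records)):
        p1, p2 = tournament(), tournament()
        child = "".join(rng.choice(pair) for pair in zip(p1, p2))
        yield mutate(child, 1 / len(child), rng)


def run_job(population, rng):
    """One 'job' = one generation: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for ind in population:
        for key, value in mapper(ind, rng):
            groups[key].append(value)
    next_gen = []
    for key in sorted(groups):
        next_gen.extend(reducer(groups[key], rng))
    return next_gen


def driver(pop_size=64, length=32, max_iters=200, seed=0):
    rng = random.Random(seed)
    hdfs = {}  # dict standing in for HDFS: one output "file" per iteration
    hdfs["gen-0"] = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for it in range(max_iters):
        population = hdfs[f"gen-{it}"]  # each job reads the previous job's output
        if max(ind.count("1") for ind in population) == length:
            return it, population
        hdfs[f"gen-{it + 1}"] = run_job(population, rng)  # a new job per iteration
    return max_iters, hdfs[f"gen-{max_iters}"]
```

In a real Hadoop deployment each iteration would be a separate job submission reading the previous job's output from HDFS, so job start-up and disk I/O are paid once per generation.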

HGA dataflow [figure: Client, Mappers, Reducers, HDFS, initial population, sub-populations]

Twister genetic algorithm
Twister supports iterative semantics natively:
– No file system or hard-disk I/O involved
– Use the combiner to store the next-generation population
– Override interfaces to make a new value type
– Map output key-value pair
– Reduce output key-value pair
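For contrast, a Twister-style loop can be imitated in memory. The class below is a hypothetical sketch, not Twister's API: configuration such as the string length is treated as static data set up once, while the population is dynamic data that flows map → reduce → combine on every iteration without touching a file system. The `combine` step mirrors the slide's use of the combiner to gather the next-generation population in the driver and re-split it into sub-populations. The GA operators (binary tournament, uniform crossover, bit-flip mutation) are illustrative assumptions.

```python
import random
from collections import defaultdict


class TwisterLikeGA:
    """Hypothetical in-memory imitation of a Twister-style iterative GA."""

    def __init__(self, num_tasks, length, seed=0):
        # Static data: configured once and reused by every iteration.
        self.num_tasks = num_tasks
        self.length = length
        self.rng = random.Random(seed)

    def map(self, sub_population):
        # Evaluate fitness and key each record by a random reduce task.
        for ind in sub_population:
            yield self.rng.randrange(self.num_tasks), (ind, ind.count("1"))

    def reduce(self, records):
        # Binary tournament selection, uniform crossover, bit-flip mutation.
        rng = self.rng
        for _ in range(len(records)):
            p1 = max(rng.choice(records), rng.choice(records), key=lambda r: r[1])[0]
            p2 = max(rng.choice(records), rng.choice(records), key=lambda r: r[1])[0]
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            yield "".join(
                ("1" if b == "0" else "0") if rng.random() < 1 / self.length else b
                for b in child
            )

    def combine(self, reduce_outputs):
        # Combiner: collect all reducer outputs in the driver, then re-split
        # the next-generation population into sub-populations for the map tasks.
        population = [ind for out in reduce_outputs for ind in out]
        return [population[i::self.num_tasks] for i in range(self.num_tasks)]

    def run(self, sub_populations, max_iters):
        # Dynamic data (the population) stays in memory between iterations.
        for it in range(max_iters):
            if any(ind.count("1") == self.length for sub in sub_populations for ind in sub):
                return it, sub_populations
            groups = defaultdict(list)
            for sub in sub_populations:
                for key, value in self.map(sub):
                    groups[key].append(value)
            outputs = [list(self.reduce(groups[k])) for k in sorted(groups)]
            sub_populations = self.combine(outputs)
        return max_iters, sub_populations
```

To use it, build a list of sub-populations (one per map task) and call `run`; it returns the iteration at which an all-ones string first appears (or `max_iters`) together with the final sub-populations.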

Twister workflow [figure: Twister driver, sub-population, Map, Reducer, Combiner, intermediate data, new sub-populations]

Hadoop/Twister performance
Testing configuration:
– FutureGrid: 8 nodes × 8 cores; CPU: 2.93 GHz; memory: 24 GB
– Input size: 5120 genes
– Gene length: 2 KB
– Both converge to the optimal point
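To get a feel for the scale of this configuration, the short calculation below relies on two assumptions not stated on the slide: the population is split evenly with one sub-population per core, and "2 KB" means 2,048 bytes per gene.

```python
NODES = 8
CORES_PER_NODE = 8
GENES = 5120
GENE_BYTES = 2 * 1024  # assumption: "2 KB" read as 2,048 bytes per gene

total_cores = NODES * CORES_PER_NODE                 # 64 cores in total
genes_per_core = GENES // total_cores                # 80 genes per core (one sub-population each)
population_mib = GENES * GENE_BYTES / (1024 * 1024)  # 10.0 MiB for the whole population

print(total_cores, genes_per_core, population_mib)
```

Under these assumptions each core handles 80 genes and the whole population occupies about 10 MiB, which is small enough to stay in memory between iterations.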

TGA performance test
The reducer is the key to performance, because the mappers simply count the number of ones in each gene and emit the results.
Testing environment:
– Quarry cluster
– Ten nodes; memory: 16 GB; CPU: 2.33 GHz × 8 cores
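The imbalance between map and reduce work can be sketched as follows, assuming (hypothetically; the slides do not show TGA's code) a design where one reducer receives every (fitness, gene) pair and builds the whole next generation. The mapper makes a single counting pass per gene, while the reducer has to sort, select, recombine, and mutate. The names `tga_map` and `tga_reduce`, truncation selection, and one-point crossover are illustrative choices, not necessarily what TGA uses.

```python
import random


def tga_map(gene):
    # Mapper: one cheap pass to count the ones, then emit (fitness, gene).
    return gene.count("1"), gene


def tga_reduce(pairs, rng, mutation_rate):
    # Reducer (hypothetical single reducer): sees the whole population and
    # builds the next generation. Needs at least 2 genes of length >= 2.
    ranked = sorted(pairs, reverse=True)  # best fitness first
    parents = [g for _, g in ranked[: max(2, len(ranked) // 2)]]  # truncation selection
    length = len(parents[0])
    children = []
    while len(children) < len(pairs):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, length)  # one-point crossover
        child = a[:cut] + b[cut:]
        # Bit-flip mutation with probability `mutation_rate` per bit.
        child = "".join(
            ("0" if c == "1" else "1") if rng.random() < mutation_rate else c for c in child
        )
        children.append(child)
    return children
```

Because every gene passes through this single reduce step, its sorting and recombination work, rather than the counting in the mappers, dominates the cost of each generation in this design.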

TGA performance results

TGA performance results (cont'd)

Discussion
| | Hadoop GA | Twister GA |
|---|---|---|
| Performance | Low for GA | High for GA |
| Programmability | Straightforward because of HDFS; hard to make mistakes | Requires a clear understanding of which data is static and how the dynamic data flows |
| Iterative support | No | Yes |
| Scalability | Good, according to [2] | Good |
| Configuration and test | Many parameters to set; supports unit tests | Easy to deploy, but testing relies mainly on "printf" |
| Administration | Administered and monitored through a web browser | Mainly by checking the daemon/driver output |

References
[1] Chao Jin, Christian Vecchiola, and Rajkumar Buyya. MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms.
[2] Abhishek Verma, Xavier Llora, and David E. Goldberg. Scaling Simple and Compact Genetic Algorithms using MapReduce.
[3]
[4] Di-Wei Huang and Jimmy Lin. Scaling Populations of a Genetic Algorithm for Job Shop Scheduling Problems using MapReduce.

Thank you. Questions?