Introduction of Apache Hama Edward J. Yoon, October 11, 2011.

Slides:



Advertisements
Similar presentations
Pregel: A System for Large-Scale Graph Processing
Advertisements

1 Fault-Tolerance Techniques for Mobile Agent Systems Prepared by: Wong Tsz Yeung Date: 11/5/2001.
© Hortonworks Inc Running Non-MapReduce Applications on Apache Hadoop Hitesh Shah & Siddharth Seth Hortonworks Inc. Page 1.
Distributed Graph Processing Abhishek Verma CS425.
APACHE GIRAPH ON YARN Chuan Lei and Mohammad Islam.
Look Who’s Talking: Discovering Dependencies between Virtual Machines Using CPU Utilization HotCloud 10 Presented by Xin.
1 Machine Learning with Apache Hama Tommaso Teofili tommaso [at] apache [dot] org.
Piccolo – Paper Discussion Big Data Reading Group 9/20/2010.
MPI and C-Language Seminars Seminar Plan  Week 1 – Introduction, Data Types, Control Flow, Pointers  Week 2 – Arrays, Structures, Enums, I/O,
Scaling Distributed Machine Learning with the BASED ON THE PAPER AND PRESENTATION: SCALING DISTRIBUTED MACHINE LEARNING WITH THE PARAMETER SERVER – GOOGLE,
Big Data Open Source Software and Projects ABDS in Summary XIII: Level 14A I590 Data Science Curriculum August Geoffrey Fox
Lecture 2 – MapReduce CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative.
Next Generation of Apache Hadoop MapReduce Arun C. Murthy - Hortonworks Founder and Architect Formerly Architect, MapReduce.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
Introduction to Parallel Programming MapReduce Except where otherwise noted all portions of this work are Copyright (c) 2007 Google and are licensed under.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
Cloud MapReduce : a MapReduce Implementation on top of a Cloud Operating System Speaker : 童耀民 MA1G Authors: Huan Liu, Dan Orban Accenture.
Tyson Condie.
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Presented By HaeJoon Lee Yanyan Shen, Beng Chin Ooi, Bogdan Marius Tudor National University of Singapore Wei Lu Renmin University Cang Chen Zhejiang University.
1 Fast Failure Recovery in Distributed Graph Processing Systems Yanyan Shen, Gang Chen, H.V. Jagadish, Wei Lu, Beng Chin Ooi, Bogdan Marius Tudor.
Bulk Synchronous Parallel Processing Model Jamie Perkins.
Hadoop Basics -Venkat Cherukupalli. What is Hadoop? Open Source Distributed processing Large data sets across clusters Commodity, shared-nothing servers.
Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Introduction to Hadoop and HDFS
f ACT s  Data intensive applications with Petabytes of data  Web pages billion web pages x 20KB = 400+ terabytes  One computer can read
Contents HADOOP INTRODUCTION AND CONCEPTUAL OVERVIEW TERMINOLOGY QUICK TOUR OF CLOUDERA MANAGER.
Pregel: A System for Large-Scale Graph Processing Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and.
CPS216: Advanced Database Systems (Data-intensive Computing Systems) Introduction to MapReduce and Hadoop Shivnath Babu.
An Introduction to HDInsight June 27 th,
RAM, PRAM, and LogP models
LogP and BSP models. LogP model Common MPP organization: complete machine connected by a network. LogP attempts to capture the characteristics of such.
 Apache Airavata Architecture Overview Shameera Rathnayaka Graduate Assistant Science Gateways Group Indiana University 07/27/2015.
1 Bulk Synchronous Parallel Computing Trevor Schaub Jim Sellers This presentation was prepared for Professor Stefan Dobrev in partial fulfillment of the.
Apache Hadoop Daniel Lust, Anthony Taliercio. What is Apache Hadoop? Allows applications to utilize thousands of nodes while exchanging thousands of terabytes.
U N I V E R S I T Y O F S O U T H F L O R I D A Hadoop Alternative The Hadoop Alternative Larry Moore 1, Zach Fadika 2, Dr. Madhusudhan Govindaraju 2 1.
Programming in Hadoop Guangda HU Huayang GUO
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
Data Structures and Algorithms in Parallel Computing Lecture 4.
DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng.
HAMA: An Efficient Matrix Computation with the MapReduce Framework Sangwon Seo, Edward J. Woon, Jaehong Kim, Seongwook Jin, Jin-soo Kim, Seungryoul Maeng.
Cluster and Warehouse-Scale Computing Responsible for Internet Services Websites (google) Content services (Netflix) Gaming (World of Warcraft) “Clouds”
ApproxHadoop Bringing Approximations to MapReduce Frameworks
Data Structures and Algorithms in Parallel Computing
HADOOP Carson Gallimore, Chris Zingraf, Jonathan Light.
Pregel: A System for Large-Scale Graph Processing Nov 25 th 2013 Database Lab. Wonseok Choi.
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
{ Tanya Chaturvedi MBA(ISM) Hadoop is a software framework for distributed processing of large datasets across large clusters of computers.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Database Management System Architecture 2004, Spring Pusan National University.
Next Generation of Apache Hadoop MapReduce Owen
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
Copyright © 2016 Pearson Education, Inc. Modern Database Management 12 th Edition Jeff Hoffer, Ramesh Venkataraman, Heikki Topi CHAPTER 11: BIG DATA AND.
INTRODUCTION TO HADOOP. OUTLINE  What is Hadoop  The core of Hadoop  Structure of Hadoop Distributed File System  Structure of MapReduce Framework.
Department of Computer Science, Johns Hopkins University Pregel: BSP and Message Passing for Graph Computations EN Randal Burns 14 November 2013.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.
Parallel Programming Models EECC 756 David D. McGann 18 May, 1999.
MapReduce Compilers-Apache Pig
Jeremy Martin Alex Tiskin
Introduction to Distributed Platforms
Chapter 10 Data Analytics for IoT
Hadoop Tutorials Spark
Spark Presentation.
Hadoop Clusters Tess Fulkerson.
Data Structures and Algorithms in Parallel Computing
湖南大学-信息科学与工程学院-计算机与科学系
Introduction to Scheduling Chapter 1
MapReduce: Simplified Data Processing on Large Clusters
Presentation transcript:

Introduction of Apache Hama Edward J. Yoon, October 11, 2011

About Me  Founder of Apache Hama.  Committer of Apache Bigtop.  Employee for KT. 

What Is Hama?  Apache Incubator Project.  BSP (Bulk Synchronous Parallel) for massive scientific computations.  Written In Java.  Currently 2 releases, 3 main committers.

Hama Characteristics  Provides a Pure BSP model.  Job submission and management interface.  Multiple tasks per node.  Checkpoint recovery.  Supports to run in the Clouds using Apache Whirr.  Supports to run with Hadoop nextGen.

Bulk Synchronous Parallel?  Parallel programming model introduced by Valiant.  Consist of a sequence of supersteps.  Conceptually simple and intuitive from a programming standpoint.  Used for a variety of applications e.g., scientific computing, genetic programming, …

Schematic diagram of a superstep Local Computation Idle Communication ………. Barrier Synchronization

Internals  Hadoop RPC is used for BSP tasks to communicate each other.  Collection and bundling of messages as a technique to reduce network overheads and contentions.  Zookeeper is used for Barrier Synchronization.

Pi Calculation  Each task executes locally its portion of the loop a number of times.  One task acts as master and collects the results through the BSP communication interface.

Structural Analysis of Network Traffic Flows  Traffic flows in KT clouds.  traffic engineering, anomaly detection, traffic forecasting and capacity planning  Currently BSP jobs are experimentally running on 512 multi-cores machines.

Random Communication Benchmarks  Benchmarked on 16 1U servers using 10 tasks per server.  X axis is the time (sec.)of BSP job execution (32 supersteps).  Y axis is the number of messages to be sent to random BSP tasks in each superstep.

What’s Next?  Support Input/Output Formatter like MapReduce.  Message Compression for High Performance.  Add some frameworks on top of Hama.

More Information  