Youngil Kim Awalin Sopan Sonia Ng Zeng.  Introduction  System architecture  Implementation – HDFS  Implementation – System Analysis ◦ System Information.


Youngil Kim Awalin Sopan Sonia Ng Zeng

• Introduction
• System architecture
• Implementation – HDFS
• Implementation – System Analysis
  ◦ System Information Logger (SIL)
  ◦ System Information Gatherer (SIG)
  ◦ Map/Reduce
• Implementation – Visualization
• Implementation – P2P Application
• Demo

• How can we collect system information from many nodes?
  ◦ With too many nodes, it is hard to track which node has a problem
• But… DFS & Map/Reduce make it easy!
  ◦ Analyze system information using Map/Reduce
  ◦ A kind of network management system like HP

[Figure: system architecture diagram. Labels: System Info Gatherer (Hadoop Master), Hadoop Slave Node, HDFS, System Manager (Visualization), Local P2P app., Sys Info Logger, System Control Network, P2P Network, System Information]

• Hadoop for DFS & Map/Reduce framework
  ◦ Master: brood00
  ◦ Slaves: currently tested with 5 nodes (bug51 ~ bug55)
  ◦ Each node uses its local storage (not the home directory)
  ◦ Network ports: HDFS (9000), JobTracker (9001), NameNode interface (50070), JobTracker interface (50030)

• mr_syslog.py
  ◦ Implemented in Python
  ◦ Saves information to both local storage and HDFS
  ◦ Gathers information about every 10 seconds
  ◦ Creates log files named by time
• Information for each node is saved in the following format
  ◦ bug : mem(75.50), cpu(1.00), disk(10.00)
  ◦ bug : mem(75.50), cpu(1.50), disk(10.00)
  ◦ bug : mem(75.51), cpu(0.40), disk(10.00)
  ◦ bug : mem(75.51), cpu(0.50), disk(10.00)
  ◦ bug : mem(75.50), cpu(0.50), disk(10.00)
  ◦ bug : mem(75.50), cpu(0.40), disk(10.00)
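A minimal sketch of how such a logger loop might be organized. This is not the actual mr_syslog.py: the function names, the file-naming scheme, and the injected `sample` and `sink` callbacks are all hypothetical, and the node name is passed in as a parameter. Only the record layout and the roughly 10-second interval come from the slide.

```python
import time


def log_filename(ts):
    """Name the log file after the time (assumed scheme: one file per minute, UTC)."""
    return time.strftime("syslog_%Y%m%d_%H%M.log", time.gmtime(ts))


def format_record(node, mem, cpu, disk):
    """Render one sample in the 'node : mem(..), cpu(..), disk(..)' layout."""
    return f"{node} : mem({mem:.2f}), cpu({cpu:.2f}), disk({disk:.2f})"


def run_logger(node, sample, sink, rounds, interval=10.0,
               sleep=time.sleep, clock=time.time):
    """Take `rounds` samples, one every `interval` seconds.

    sample() -> (mem, cpu, disk) percentages; how they are measured is left open.
    sink(filename, line) stores the record, e.g. locally and then in HDFS.
    """
    for _ in range(rounds):
        mem, cpu, disk = sample()
        sink(log_filename(clock()), format_record(node, mem, cpu, disk))
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without waiting; a real deployment would simply use the defaults.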

• Functions
  ◦ Find the current resource usage of each node using Map/Reduce
    ▪ Currently, it shows the maximum values per one-minute time slot
  ◦ Communication gateway between the nodes and the visualization tool
    ▪ Sends “QUERY” to each P2P application
    ▪ Sends node status to the visualization tool (node ID, active/inactive, CPU usage, memory usage, storage)

• Map:
  ◦ Input – each node log file
    ▪ Key: position in the file
    ▪ Value: raw data, one line per key
  ◦ Output
    ▪ Key: node ID
    ▪ Value: set of system information (CPU/memory/storage usage)
    ▪ Eg:
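The map step described above could be sketched in plain Python as follows. This is an illustration only, not the project's code: the `map_record` name and the regular expression are assumptions, based on the `node : mem(..), cpu(..), disk(..)` log layout, and the input key (file position) is accepted but unused.

```python
import re

# Assumed record layout: "<node> : mem(<pct>), cpu(<pct>), disk(<pct>)"
RECORD = re.compile(
    r"^(?P<node>\S+)\s*:\s*"
    r"mem\((?P<mem>\d+(?:\.\d+)?)\),\s*"
    r"cpu\((?P<cpu>\d+(?:\.\d+)?)\),\s*"
    r"disk\((?P<disk>\d+(?:\.\d+)?)\)$"
)


def map_record(offset, line):
    """Map one log line to (node ID, (cpu, memory, storage)).

    `offset` is the position of the line in the file (the map input key).
    Lines that do not match the assumed layout are skipped.
    """
    match = RECORD.match(line.strip())
    if match is None:
        return
    yield match["node"], (
        float(match["cpu"]),
        float(match["mem"]),
        float(match["disk"]),
    )
```

Writing the mapper as a generator lets a malformed line produce no output instead of raising, which is the usual choice when scanning raw logs.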

• Reduce:
  ◦ Input – from Map
    ▪ Key: node ID
    ▪ Value: set of sets of system information
    ▪ Eg:
  ◦ Output
    ▪ Key: node ID
    ▪ Value: maximum value for each piece of information
    ▪ Eg:
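A hedged sketch of the reduce step, again in plain Python rather than against the Hadoop API. The names `reduce_max` and `run_job` are hypothetical, and the in-memory grouping in `run_job` merely stands in for the shuffle that the framework performs between map and reduce.

```python
from collections import defaultdict


def reduce_max(node, values):
    """Reduce all (cpu, memory, storage) tuples of one node to per-field maxima."""
    return node, tuple(max(column) for column in zip(*values))


def run_job(pairs):
    """Group (node, tuple) pairs by node, then reduce each group."""
    grouped = defaultdict(list)
    for node, value in pairs:
        grouped[node].append(value)
    return dict(reduce_max(node, values) for node, values in grouped.items())
```

For example, feeding `run_job` the pairs for one node with CPU readings 1.0, 1.5 and 0.4 keeps 1.5 as that node's CPU value, alongside the largest memory and storage readings.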

• Not a real application
  ◦ Only demonstrates how to control an application or system on each node through the visualization tool
  ◦ Supports only STOP/RESUME operations
• Functions
  ◦ Responds to “QUERY”
    ▪ Reports active/inactive status
  ◦ Responds to “CONTROL”
    ▪ Changes status based on the control argument
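The QUERY/CONTROL behaviour described above amounts to a two-state machine. The sketch below is one possible shape for it; the class name, the space-separated message format, and the reply strings are assumptions made for illustration, since the slides do not give the wire format.

```python
class P2PApp:
    """Toy application that can only be stopped and resumed."""

    def __init__(self):
        self.active = True

    def status(self):
        return "active" if self.active else "inactive"

    def handle(self, message):
        """Process one message (assumed format: 'QUERY' or 'CONTROL <STOP|RESUME>')."""
        command, _, argument = message.strip().partition(" ")
        if command == "QUERY":
            return self.status()
        if command == "CONTROL":
            if argument == "STOP":
                self.active = False
            elif argument == "RESUME":
                self.active = True
            else:
                return "error: unknown control argument"
            return self.status()
        return "error: unknown command"
```

A gateway could call `handle("QUERY")` on each node's application to collect the active/inactive flag, and forward `CONTROL` messages issued from the visualization tool.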

• System set-up and initialization (video file)
• Show NameNode & JobTracker interfaces
• Show Map/Reduce jobs
• Show visualization tool
  ◦ Status changes of each node
  ◦ Control of each P2P application