FutureGrid Dynamic Provisioning Experiments including Hadoop
Fugang Wang, Archit Kulshrestha, Gregory G. Pike, Gregor von Laszewski, Geoffrey C. Fox

Hadoop at FG

FG will provide a Hadoop environment to users. It is currently in the development and test phase:
- The environment is installed and configured on NFS-mounted space.
- Users can request a virtual Hadoop cluster with a specified number of nodes and cores to execute a Hadoop application.
- FG provides tools to dynamically configure the virtual cluster.
- The FG software generates a job for the Hadoop application and submits it through the Torque queuing system (a sketch of such a job follows below).
- Activities are logged; the output depends on the Hadoop application itself.
- Job status monitoring currently relies on Torque.
- CLI: fg-hadoop -[n|nodes] nodesNumber -[c|coresPerNode] coresPerNode -[i|jobname] jobName -[e|cmd] hadoopAppCmd(quoted) -[v|verbose]
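For illustration, here is a minimal sketch of the kind of Torque job the fg-hadoop tool might generate for a four-node, four-core request. The PBS directives are standard Torque; everything else (the per-job configuration directory, the use of Hadoop's stock start-all.sh/stop-all.sh scripts, the paths) is an assumption, not the actual FutureGrid implementation.

    #!/bin/bash
    #PBS -N swg300_4Nodes4Cores
    #PBS -l nodes=4:ppn=4
    #PBS -j oe

    # Hypothetical sketch only: paths and helpers are illustrative
    # assumptions, not the actual FutureGrid tooling.

    # The first allocated node acts as the Hadoop master, the rest as
    # workers; record them in a per-job Hadoop configuration directory.
    CONF="$HOME/hadoop-conf-$PBS_JOBID"
    mkdir -p "$CONF"
    head -n 1 "$PBS_NODEFILE"            > "$CONF/masters"
    tail -n +2 "$PBS_NODEFILE" | sort -u > "$CONF/slaves"

    # Bring the transient cluster up, run the application, tear it down.
    export HADOOP_CONF_DIR="$CONF"
    "$HADOOP_HOME/bin/start-all.sh"
    "$HADOOP_HOME/bin/hadoop" jar ~/swg-hadoop.jar ~/AluY_300.txt swgResult1 \
        ~/swgTiming300_4Nodes4Cores.txt
    "$HADOOP_HOME/bin/stop-all.sh"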

Hadoop at FG – cont'd

The SWG Hadoop application is used to test the current setup (see the next slide for an introduction to the application; thanks to Judy Qiu and the SALSA group for providing it).
- A sample run:
  fg-hadoop -v -n 4 -c 4 -i swg300_4Nodes4Cores -cmd "~/swg-hadoop.jar ~/AluY_300.txt swgResult1 ~/swgTiming300_4Nodes4Cores.txt"
- Sample result: a line reporting the fields #seq, #block, STtime, input, dataDistTime, and output; here the input was /N/u/fuwang/AluY_300.txt, with a timing value of 974 and output swgResult1.

Future improvement plans for this activity:
- Preinstall and configure the Hadoop environment on FG resources, or let users customize an image that includes it.
- Use a persistent Hadoop filesystem instead of dynamically setting one up and tearing it down.
- With the proposed FG Experiment Management it will be more convenient for users to monitor job execution and retrieve results; until then, jobs can be tracked with standard Torque commands (see the sketch below).
- The CLI could also be augmented when Experiment Management is ready, and users will be able to access this functionality through the FG Web portal.
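As a usage sketch of the current Torque-based monitoring: submit the application with fg-hadoop as above, then track it with stock Torque commands. The $JOBID placeholder stands for whatever job id Torque assigned at submission.

    # Submit the Hadoop application, then monitor it through Torque.
    fg-hadoop -v -n 4 -c 4 -i swg300_4Nodes4Cores \
      -cmd "~/swg-hadoop.jar ~/AluY_300.txt swgResult1 ~/swgTiming300_4Nodes4Cores.txt"

    qstat -u "$USER"     # list this user's jobs and their states
    qstat -f "$JOBID"    # full record for one job, given its Torque id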

DNA/Protein Sequence Alignment Using Hadoop *

- Smith-Waterman-Gotoh pairwise alignment: performs local alignment of either DNA or protein sequences.
- [Figure: MapReduce data flow. The user program splits the input FASTA file into partitions; each Map task runs SWG to pairwise-align the sequences in its input file, producing a partial distance score matrix; the Reduce step combines the partial matrices to form the full matrix. A sketch of the partitioning step follows below.]

* Slide courtesy of Stephen Tak-Lon Wu and the SALSA group at IU
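The "partition the input FASTA file" step can be illustrated on its own. This is a minimal standalone sketch assuming the simplest FASTA layout (one ">" header line followed by exactly one sequence line per record) and an arbitrary block count; the real SWG job performs its own splitting inside Hadoop.

    # Round-robin the records of AluY_300.txt across 4 map-task inputs.
    BLOCKS=4
    mkdir -p fasta-blocks

    # Join each ">header"/sequence pair onto one tab-separated line,
    # then deal the lines out across $BLOCKS files.
    paste - - < AluY_300.txt | awk -v n="$BLOCKS" \
      '{ print > ("fasta-blocks/block-" (NR % n)) }'

    # Restore the two-line FASTA layout inside each block.
    for f in fasta-blocks/block-*; do
      tr '\t' '\n' < "$f" > "$f.fasta" && rm "$f"
    done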

Dynamic Provisioning on FutureGrid

Dynamic Provisioning at FG

FutureGrid will allow for dynamic provisioning at multiple levels:
- Core software and services will be dynamically provisioned on bare hardware.
- Services such as Eucalyptus and Nimbus will allow provisioning of VMs on nodes deployed as Eucalyptus or Nimbus nodes.

This will be used to support both HPC and Cloud activities, giving users greater power and control: build your own cluster with custom kernels, network drivers, and new paradigms of computing. (Both levels are sketched below.)
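As a concrete illustration of the two levels: the bare-metal path goes through the Moab/Torque scheduler (this is the exact command used in the experiments later in this deck), while the VM path goes through the respective cloud client. The Eucalyptus image id and instance parameters below are illustrative placeholders.

    # Bare-metal level: have the scheduler reprovision nodes with a
    # stateless RHEL 5 image before running the job.
    msub -l os=statelessrhel5 testjob.sh

    # VM level: start instances on nodes already deployed as
    # Eucalyptus resources (image id is a placeholder).
    euca-run-instances -n 4 -t m1.small emi-12345678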

Dynamic Provisioning Experiment Logical View

Dynamic Provisioning Performance

Experiments show very good scalability:
- The experiment ran the command: msub -l os=statelessrhel5 testjob.sh
- Provisioning a node took an average of 3 minutes and 45 seconds in the experiment.
- As the number of provisioned nodes requested grows from 2 to 32, the fluctuation in the time taken to provision nodes is less than 10%.
- When provisioning 32 nodes, the provisioning time is quite uniform, with a standard deviation of 14 seconds.

(A sketch of how such timings could be collected follows below.)
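A rough sketch of how the queue-to-start delay could be measured, assuming msub prints the assigned job id on stdout and using a crude poll of Torque's queue listing for the queued-to-running transition.

    # Record wall-clock submission time, submit the provisioning job,
    # and poll Torque until it leaves the queued (Q) state.
    SUBMIT_TS=$(date +%s)
    JOBID=$(msub -l os=statelessrhel5 testjob.sh | tr -d '[:space:]')

    while qstat "$JOBID" 2>/dev/null | grep -q " Q "; do
      sleep 5
    done

    START_TS=$(date +%s)
    echo "queue-to-start delay (incl. provisioning): $((START_TS - SUBMIT_TS)) s"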

Dynamic Provisioning Results

Time elapsed between requesting a job and the job's reported start time on the provisioned node. The numbers here are an average of two sets of experiments.

Provisioning times for nodes in a 32-node request

The nodes took an average of 3 minutes and 45 seconds to switch from the stateful to the stateless image, with a standard deviation of 14 seconds.

Phase III Process View

Credits

- NSF: This work was supported in part by the National Science Foundation (NSF) under Grant No. to Indiana University for "FutureGrid: An Experimental, High-Performance Grid Test-bed."
- IU Research Technologies Team
- IU SALSA Team