SALSA HPC Group School of Informatics and Computing Indiana University.

Slides:



Advertisements
Similar presentations
MAP REDUCE PROGRAMMING Dr G Sudha Sadasivam. Map - reduce sort/merge based distributed processing Best for batch- oriented processing Sort/merge is primitive.
Advertisements

Twister4Azure Iterative MapReduce for Windows Azure Cloud Thilina Gunarathne Indiana University Iterative MapReduce for Azure Cloud.
SCALABLE PARALLEL COMPUTING ON CLOUDS : EFFICIENT AND SCALABLE ARCHITECTURES TO PERFORM PLEASINGLY PARALLEL, MAPREDUCE AND ITERATIVE DATA INTENSIVE COMPUTATIONS.
Hybrid MapReduce Workflow Yang Ruan, Zhenhua Guo, Yuduo Zhou, Judy Qiu, Geoffrey Fox Indiana University, US.
SALSASALSASALSASALSA Chemistry in the Digital Age Workshop, Penn State University, June 11, 2009 Geoffrey Fox
Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.
Two Broad Categories of Software
Parallel Data Analysis from Multicore to Cloudy Grids Indiana University Geoffrey Fox, Xiaohong Qiu, Scott Beason, Seung-Hee.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
MapReduce in the Clouds for Science CloudCom 2010 Nov 30 – Dec 3, 2010 Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox {tgunarat, taklwu,
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Google Distributed System and Hadoop Lakshmi Thyagarajan.
Panel Session The Challenges at the Interface of Life Sciences and Cyberinfrastructure and how should we tackle them? Chris Johnson, Geoffrey Fox, Shantenu.
Big Data Challenge Mega 10^6 Giga 10^9 Tera 10^12 Peta 10^15 Pig Latin.
Applying Twister to Scientific Applications CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.
Virtualization Concept. Virtualization  Real: it exists, you can see it.  Transparent: it exists, you cannot see it  Virtual: it does not exist, you.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
School of Informatics and Computing Indiana University
Science in Clouds SALSA Team salsaweb/salsa Community Grids Laboratory, Digital Science Center Pervasive Technology Institute Indiana University.
SALSASALSA Twister: A Runtime for Iterative MapReduce Jaliya Ekanayake Community Grids Laboratory, Digital Science Center Pervasive Technology Institute.
Test Of Distributed Data Quality Monitoring Of CMS Tracker Dataset H->ZZ->2e2mu with PileUp - 10,000 events ( ~ 50,000 hits for events) The monitoring.
DISTRIBUTED COMPUTING
MapReduce: Hadoop Implementation. Outline MapReduce overview Applications of MapReduce Hadoop overview.
Extreme scale parallel and distributed systems – High performance computing systems Current No. 1 supercomputer Tianhe-2 at petaflops Pushing toward.
Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure Thilina Gunarathne Bingjing Zhang, Tak-Lon.
Guide to Linux Installation and Administration, 2e1 Chapter 2 Planning Your System.
FutureGrid Dynamic Provisioning Experiments including Hadoop Fugang Wang, Archit Kulshrestha, Gregory G. Pike, Gregor von Laszewski, Geoffrey C. Fox.
SALSASALSASALSASALSA Design Pattern for Scientific Applications in DryadLINQ CTP DataCloud-SC11 Hui Li Yang Ruan, Yuduo Zhou Judy Qiu, Geoffrey Fox.
Parallel Applications And Tools For Cloud Computing Environments Azure MapReduce Large-scale PageRank with Twister Twister BLAST Thilina Gunarathne, Stephen.
SALSA HPC Group School of Informatics and Computing Indiana University.
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
CE Operating Systems Lecture 3 Overview of OS functions and structure.
SALSASALSASALSASALSA FutureGrid Venus-C June Geoffrey Fox
 Apache Airavata Architecture Overview Shameera Rathnayaka Graduate Assistant Science Gateways Group Indiana University 07/27/2015.
Using SWARM service to run a Grid based EST Sequence Assembly Karthik Narayan Primary Advisor : Dr. Geoffrey Fox 1.
SALSASALSASALSASALSA Clouds Ball Aerospace March Geoffrey Fox
SALSA HPC Group School of Informatics and Computing Indiana University.
Virtual Private Grid (VPG) : A Command Shell for Utilizing Remote Machines Efficiently Kenji Kaneda, Kenjiro Taura, Akinori Yonezawa Department of Computer.
Virtual Appliances CTS Conference 2011 Philadelphia May Geoffrey Fox
Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox
Security: systems, clouds, models, and privacy challenges iDASH Symposium San Diego CA October Geoffrey.
Cloud Computing Paradigms for Pleasingly Parallel Biomedical Applications Thilina Gunarathne, Tak-Lon Wu Judy Qiu, Geoffrey Fox School of Informatics,
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
SALSA Group Research Activities April 27, Research Overview  MapReduce Runtime  Twister  Azure MapReduce  Dryad and Parallel Applications 
MATRIX MULTIPLY WITH DRYAD B649 Course Project Introduction.
SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu
Parallel Applications And Tools For Cloud Computing Environments CloudCom 2010 Indianapolis, Indiana, USA Nov 30 – Dec 3, 2010.
HPC in the Cloud – Clearing the Mist or Lost in the Fog Panel at SC11 Seattle November Geoffrey Fox
Memcached Integration with Twister Saliya Ekanayake - Jerome Mitchell - Yiming Sun -
SALSASALSASALSASALSA Data Intensive Biomedical Computing Systems Statewide IT Conference October 1, 2009, Indianapolis Judy Qiu
Microsoft Cloud Solution.  What is the cloud?  Windows Azure  What services does it offer?  How does it all work?  How to go about using it  Further.
ETRI Site Introduction Han Namgoong,
Cloud Computing – UNIT - II. VIRTUALIZATION Virtualization Hiding the reality The mantra of smart computing is to intelligently hide the reality Binary->
SALSASALSA Dynamic Virtual Cluster provisioning via XCAT on iDataPlex Supports both stateful and stateless OS images iDataplex Bare-metal Nodes Linux Bare-
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
SALSA HPC Group School of Informatics and Computing Indiana University Workshop on Petascale Data Analytics: Challenges, and.
Implementation of Classifier Tool in Twister Magesh khanna Vadivelu Shivaraman Janakiraman.
ETRI Site Introduction
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CLOUD COMPUTING
Chapter 1: Introduction
Introduction to Distributed Platforms
3.2 Virtualisation.
MapReduce and Data Intensive Applications XSEDE’12 BOF Session
Our Objectives Explore the applicability of Microsoft technologies to real world scientific domains with a focus on data intensive applications Expect.
Applying Twister to Scientific Applications
20409A 7: Installing and Configuring System Center 2012 R2 Virtual Machine Manager Module 7 Installing and Configuring System Center 2012 R2 Virtual.
Scientific Data Analytics on Cloud and HPC Platforms
Twister4Azure : Iterative MapReduce for Azure Cloud
Group 15 Swathi Gurram Prajakta Purohit
Cloud versus Cloud: How Will Cloud Computing Shape Our World?
Presentation transcript:

SALSA HPC Group School of Informatics and Computing Indiana University

Gene Sequences (N = 1 Million) Distance Matrix Interpolative MDS with Pairwise Distance Calculation Multi- Dimensional Scaling (MDS) Visualization 3D Plot Reference Sequence Set (M = 100K) N - M Sequence Set (900K) Select Referenc e Reference Coordinates x, y, z N - M Coordinates x, y, z Pairwise Alignment & Distance Calculation O(N 2 )

Performance with/without data caching Speedup gained using data cache Scaling speedup Increasing number of iterations

BLAST Sequence Search Cap3 Sequence Assembly Smith Watermann Sequence Alignment

Configuration Program to setup Twister environment automatically on a cluster Full mesh network of brokers for facilitating communication New messaging interface for reducing the message serialization overhead Memory Cache to share data between tasks and jobs

This demo is for real time visualization of the process of multidimensional scaling(MDS) calculation. We use Twister to do parallel calculation inside the cluster, and use PlotViz to show the intermediate results at the user client computer. The process of computation and monitoring is automated by the program.

MDS projection of 100,000 protein sequences showing a few experimentally identified clusters in preliminary work with Seattle Children’s Research Institute

Master Node Twister Driver Twister-MDS ActiveMQ Broker ActiveMQ Broker MDS Monitor PlotViz I. Send message to start the job II. Send intermediate results Local Disk III. Write data IV. Read data Client Node

MDS Output Monitoring Interface Pub/Sub Broker Network Worker Node Worker Pool Twister Daemon Master Node Twister Driver Twister-MDS Worker Node Worker Pool Twister Daemon map reduc e map reduc e calculateStres s calculateBC

Twister Driver Node Twister Daemon Node ActiveMQ Broker Node Broker-Daemon Connection Broker-Broker Connection Broker-Driver Connection 7 Brokers and 32 Computing Nodes in total

copy instantiate … Virtual Machines Group VPN Virtual IP - DHCP Virtual IP - DHCP GroupVPN Credentials (from Web site)

Linux HPC Bare-system Linux HPC Bare-system Amazon Cloud Windows Server HPC Bare-system Windows Server HPC Bare-system Virtualization CPU Nodes Virtualization Infrastructure Hardware Azure Cloud Grid Appliance GPU Nodes Cross Platform Iterative MapReduce (Collectives, Fault Tolerance, Scheduling) Kernels, Genomics, Proteomics, Information Retrieval, Polar Science Scientific Simulation Data Analysis and Management Dissimilarity Computation, Clustering, Multidimentional Scaling, Generative Topological Mapping Kernels, Genomics, Proteomics, Information Retrieval, Polar Science Scientific Simulation Data Analysis and Management Dissimilarity Computation, Clustering, Multidimentional Scaling, Generative Topological Mapping Applications Programming Model Services and Workflow High Level Language Distributed File Systems Data Parallel File System Runtime Storage Object Store Security, Provenance, Portal

Development of library of Collectives to use at Reduce phase Broadcast and Gather needed by current applications Discover other important ones Implement efficiently on each platform – especially Azure Better software message routing with broker networks using asynchronous I/O with communication fault tolerance Support nearby location of data and computing using data parallel file systems Clearer application fault tolerance model based on implicit synchronizations points at iteration end points Later: Investigate GPU support Later: run time for data parallel languages like Sawzall, Pig Latin, LINQ

SALSA HPC Group Indiana University