AOGS, Singapore, August 11-14, 2009
Geoffrey Fox (1,2) and Marlon Pierce (1)


(1) Community Grids Laboratory, Pervasive Technology Institute and (2) School of Informatics, Indiana University

Clouds as Cost Effective Data Centers
- Exploit the Internet by allowing one to build giant data centers with hundreds of thousands of computers, packaged roughly a shipping container at a time.
- "Microsoft will cram between 150 and 220 shipping containers filled with data center gear into a new 500,000 square foot Chicago facility. This move marks the most significant, public use of the shipping container systems popularized by the likes of Sun Microsystems and Rackable Systems to date."

Cloud Computing: Infrastructure and Runtimes
- Cloud infrastructure: outsourcing of servers, computing, data, file space, etc.
  - Handled through Web services that control virtual machine (Xen, VMware, OpenVZ, ...) lifecycles.
  - Compare to Grid interfaces such as Globus, Unicore, etc.
- Cloud runtimes: tools for using clouds to do data-parallel computations.
  - Apache Hadoop, Google MapReduce, Microsoft Dryad, and others.
  - Designed for information retrieval, but excellent for a wide range of machine learning and data-centric science applications.
  - Example: Apache Mahout for machine learning.
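The slide says cloud infrastructure is exposed as Web services that control virtual machine lifecycles. The following is a minimal sketch of that idea as a small state machine that a service would enforce on each request. The state names and allowed transitions are assumptions loosely modeled on IaaS services such as EC2. They are not taken from Eucalyptus, Nimbus, or any real cloud API, and `VirtualMachine` and `apply` are hypothetical names.

```python
from enum import Enum


class VMState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"
    TERMINATED = "terminated"


# Hypothetical lifecycle table: (current state, requested action) -> next state.
TRANSITIONS = {
    (VMState.PENDING, "boot"): VMState.RUNNING,
    (VMState.RUNNING, "stop"): VMState.STOPPED,
    (VMState.STOPPED, "start"): VMState.RUNNING,
    (VMState.RUNNING, "terminate"): VMState.TERMINATED,
    (VMState.STOPPED, "terminate"): VMState.TERMINATED,
}


class VirtualMachine:
    """One VM as tracked by a (hypothetical) lifecycle-control service."""

    def __init__(self, image: str):
        self.image = image
        self.state = VMState.PENDING

    def apply(self, action: str) -> VMState:
        """Apply a lifecycle action, rejecting transitions the table does not allow."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} a VM in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

For example, a VM created from an image goes `boot` → `stop` → `start` → `terminate`. Any request after termination is rejected. Keeping the rules in one table makes the service's contract explicit and easy to audit.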

Commercial Cloud Software
Cloud/Service | Amazon | Microsoft Azure | Google (and Apache)
Data | S3, EBS, SimpleDB | Blob, Table, SQL Services | GFS, BigTable
Computing | EC2, Elastic MapReduce (runs Hadoop) | Compute Service | MapReduce (not public, but Hadoop)
Service Hosting | EC2 with load balancing | Web Hosting Service | AppEngine/AppDrop
Boldfaced names have open source versions.

Open Architecture Clouds
- Amazon, Google, Microsoft, et al., don't tell you how to build a cloud.
  - This is proprietary knowledge.
- Indiana University and others want to document this publicly.
  - What is the right way to build and run a cloud?
  - It is more than just running software.
- What is the minimum-sized organization that can run a cloud?
  - Department? University? University consortium? Outsource it all?
  - Analogous issues arise in government, industry, and enterprise.

IU's Cloud Testbed
- Host hardware: IBM iDataPlex = 84 nodes
  - 32 nodes for Eucalyptus
  - 32 nodes for Nimbus
  - 20 nodes for test and/or reserve capacity
  - 2 dedicated head nodes
- Node specs:
  - 2 x Intel Xeon L5420, 2.50 GHz (4 cores/CPU)
  - 32 GB memory
  - 160 GB local hard drive
- Gigabit network
  - No support in Xen for InfiniBand or Myrinet (10 Gbps)
- Part of IU's Research Computing Infrastructure; hopefully it will grow soon.
  - Tempest is a similar machine that supports both Linux and Windows Server 2008.
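The per-node figures above determine the testbed's aggregate capacity. The following is a small worked calculation. It assumes the 2 head nodes are in addition to the 84 compute nodes, because 32 + 32 + 20 = 84, and it counts only the compute nodes. `totals` is a hypothetical helper name.

```python
# Node counts and per-node specs as listed on the slide (head nodes excluded).
PARTITIONS = {"Eucalyptus": 32, "Nimbus": 32, "test/reserve": 20}
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 4      # the Xeon L5420 is a quad-core part
MEMORY_GB_PER_NODE = 32
DISK_GB_PER_NODE = 160


def totals(partitions: dict) -> dict:
    """Aggregate node, core, memory, and local-disk capacity for some partitions."""
    nodes = sum(partitions.values())
    return {
        "nodes": nodes,
        "cores": nodes * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "memory_gb": nodes * MEMORY_GB_PER_NODE,
        "disk_gb": nodes * DISK_GB_PER_NODE,
    }
```

`totals(PARTITIONS)` gives 84 nodes, 672 cores, 2,688 GB of memory, and 13,440 GB of local disk. Each 32-node cloud partition (Eucalyptus or Nimbus) therefore has 256 cores.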

Cloud Runtimes
What science can you do on a cloud?

Data-File Parallelism and Clouds
- Now that you have a cloud, you may want to do large-scale processing with it.
- Classic problems perform the same (sequential) algorithm on fragments of extremely large data sets.
- Cloud runtime engines manage these replicated algorithms in the cloud.
  - They can be chained together in pipelines (Hadoop) or DAGs (Dryad).
  - Runtimes manage problems like failure control.
- We are exploring both scientific applications and classic parallel algorithms (clustering, matrix multiplication) using clouds and cloud runtimes.
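The pattern described above applies the same sequential code to every fragment, chains the stages, and retries failures. The following is a single-machine sketch of that pattern. It stands in for what a runtime such as Hadoop or Dryad does across many nodes. It is not either runtime's API. `split`, `run_with_retry`, and `pipeline` are hypothetical names, and the retry loop is a toy version of failure control.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Callable, List, Sequence


def split(data: Sequence, n_parts: int) -> List[Sequence]:
    """Cut a data set into at most n_parts contiguous fragments."""
    if not data:
        return []
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_with_retry(task: Callable, fragment, retries: int = 2):
    """Re-run a fragment whose task raised: a toy stand-in for failure control."""
    for attempt in range(retries + 1):
        try:
            return task(fragment)
        except Exception:
            if attempt == retries:
                raise


def pipeline(data: Sequence, stages: List[Callable], n_parts: int = 4) -> list:
    """Apply each stage's sequential code to every fragment in parallel.

    A fragment's output from one stage is its input to the next stage,
    i.e. a linear chain rather than a general DAG.
    """
    fragments = split(data, n_parts)
    for stage in stages:
        with ThreadPoolExecutor(max_workers=n_parts) as pool:
            fragments = list(pool.map(partial(run_with_retry, stage), fragments))
    return fragments
```

For example, `pipeline(list(range(10)), [lambda f: [x * x for x in f], sum])` squares each element of four fragments and then sums each fragment, giving `[5, 50, 149, 81]`. Threads keep the sketch simple. For CPU-bound Python code, a `ProcessPoolExecutor` or a real cluster runtime would be needed to get actual parallel speedup.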

Dryad supports general dataflow. MapReduce, implemented by Hadoop, is built from two functions:
- map(key, value)
- reduce(key, list of values)
Example: Word Histogram
- Start with a set of words.
- Each map task counts the number of occurrences in each data partition.
- The reduce phase adds these counts.
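The word-histogram example above can be traced end to end in plain Python. In the following sketch, each map task emits per-partition counts, a shuffle step groups the counts by word, and each reduce call adds them. This is an in-process illustration of the MapReduce model, not Hadoop's Java `Mapper`/`Reducer` API. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def map_task(partition_id: int, words: List[str]) -> Iterable[Tuple[str, int]]:
    """map(key, value): key is the partition id (unused), value is its words."""
    for word, count in Counter(words).items():
        yield word, count


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the runtime does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_task(word: str, counts: List[int]) -> Tuple[str, int]:
    """reduce(key, list of values): add the per-partition counts."""
    return word, sum(counts)


def word_histogram(partitions: List[List[str]]) -> Dict[str, int]:
    """Run every map task, shuffle the results, then run every reduce."""
    intermediate = (pair
                    for pid, part in enumerate(partitions)
                    for pair in map_task(pid, part))
    return dict(reduce_task(w, c) for w, c in shuffle(intermediate).items())
```

With partitions `[["the", "cloud", "the"], ["cloud", "grid"], ["the"]]`, the result is `{"the": 3, "cloud": 2, "grid": 1}`. Counting inside each map task before emitting corresponds to what Hadoop calls a combiner. It shrinks the data that must be shuffled.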

Geospatial Examples
- Image processing and mining
  - Ex: SAR images from the PolarGrid project (J. Wang)
  - Apply to 20 TB of data
- Flood modeling
  - Chaining flood models over a geographic area
  - Parameter fits and inversion problems
  - Earthquake modeling equivalents
- GPS processing: real time and archival
  - Robert Granat, JPL
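Parameter fits and inversion problems map naturally onto clouds because each candidate parameter set can be evaluated independently. The following is a sketch of a brute-force grid search under that assumption. `toy_model` is a hypothetical linear stand-in for an expensive forward model such as a single flood-model run. The observations are synthetic, and the squared-residual misfit is one common choice among several.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_model(params, xs):
    """Hypothetical stand-in for an expensive forward model run."""
    a, b = params
    return [a * x + b for x in xs]


def misfit(params, xs, observed):
    """Sum of squared residuals between model output and observations."""
    return sum((m - o) ** 2 for m, o in zip(toy_model(params, xs), observed))


def grid_search(xs, observed, a_values, b_values, workers=4):
    """Evaluate every (a, b) candidate independently and keep the best fit."""
    candidates = list(product(a_values, b_values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: misfit(p, xs, observed), candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]
```

Fitting `observed = [1, 3, 5, 7]` at `xs = [0, 1, 2, 3]` over a = 1..3 and b = 0..2 recovers `(2, 1)` with zero misfit. In a real inversion, each `misfit` call would be a separate cloud task.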

Alternative Elastic Block Store Components
Components: VBS Client, VBS Web Service, Volume Server with its Volume Delegate, Virtual Machine Manager (Xen Dom0) with its Xen Delegate, and a Xen DomU guest using a virtual block device (VBD), connected over iSCSI.
- Volume Delegate operations: Create Volume, Export Volume, Create Snapshot, etc.
- Xen Delegate operations: Import Volume, Attach Device, Detach Device, etc.
There's more than one way to build an Elastic Block Store. We need to find the best way to do this.
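The split of operations between the two delegates above can be illustrated with an in-memory sketch. The storage side creates, exports, and snapshots volumes. The Dom0 side imports an exported volume and attaches it to a guest. All class and method names, the target-name format, and the `/dev/sdX` device paths are hypothetical. Nothing here talks to Xen or iSCSI, and it is not the VBS implementation.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Volume:
    vol_id: str
    size_gb: int
    snapshots: List[str] = field(default_factory=list)
    exported_as: str = ""


class VolumeDelegate:
    """Hypothetical volume-server role: owns volumes and exports them."""

    def __init__(self):
        self.volumes: Dict[str, Volume] = {}
        self._ids = itertools.count(1)

    def create_volume(self, size_gb: int) -> str:
        vol_id = f"vol-{next(self._ids)}"
        self.volumes[vol_id] = Volume(vol_id, size_gb)
        return vol_id

    def export_volume(self, vol_id: str) -> str:
        target = f"iqn.example:{vol_id}"  # made-up target name for illustration
        self.volumes[vol_id].exported_as = target
        return target

    def create_snapshot(self, vol_id: str) -> str:
        vol = self.volumes[vol_id]
        snap_id = f"{vol_id}-snap-{len(vol.snapshots) + 1}"
        vol.snapshots.append(snap_id)
        return snap_id


class XenDelegate:
    """Hypothetical Dom0 role: imports an exported volume, attaches it to a guest."""

    def __init__(self):
        self.imported: Dict[str, str] = {}       # target -> local device path
        self.attached: Dict[str, Set[str]] = {}  # guest name -> attached devices

    def import_volume(self, target: str) -> str:
        device = f"/dev/sd{chr(ord('b') + len(self.imported))}"
        self.imported[target] = device
        return device

    def attach_device(self, vm: str, device: str) -> None:
        self.attached.setdefault(vm, set()).add(device)

    def detach_device(self, vm: str, device: str) -> None:
        self.attached.get(vm, set()).discard(device)


def provision(storage: VolumeDelegate, dom0: XenDelegate, vm: str, size_gb: int) -> str:
    """Create -> export -> import -> attach, following the operation split above."""
    vol_id = storage.create_volume(size_gb)
    target = storage.export_volume(vol_id)
    device = dom0.import_volume(target)
    dom0.attach_device(vm, device)
    return device
```

In this sketch, the role of the VBS Web Service is reduced to the `provision` function. Keeping the storage-side and Dom0-side operations in separate delegates lets either one be swapped for a different backend. That flexibility is the kind of alternative design the slide calls for comparing.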

More Information
See publications at … Examples:
- Geoffrey Fox, Seung-Hee Bae, Jaliya Ekanayake, Xiaohong Qiu, and Huapeng Yuan, "Parallel Data Mining from Multicore to Cloudy Grids"
- Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies"
- Sangmi Lee Pallickara, Marlon Pierce, Qunfeng Dong, and ChinHua Kong, "Enabling Large Scale Scientific Computations for Expressed Sequence Tag Sequencing over Grid and Cloud Computing Clusters"
See also … and …