Digital Science Center
February 12, 2010, Bloomington
Geoffrey Fox and Judy Qiu
School of Informatics and Computing and Community Grids Laboratory, Digital Science Center
Pervasive Technology Institute, Indiana University

PTI Activities in the Digital Science Center

Community Grids Laboratory, led by Fox
– Gregor von Laszewski: FutureGrid architect, GreenIT
– Marlon Pierce: Grids, Services, and Portals, including earthquake science, chemistry, and polar science applications
– Judy Qiu: Multicore and Data Intensive Computing (cyberinfrastructure), including biology and cheminformatics applications

Open Software Laboratory, led by Andrew Lumsdaine
– Software such as MPI and scientific computing environments
– Parallel graph algorithms

Complex Networks and Systems, led by Alex Vespignani
– Very successful H1N1 spread simulations run on Big Red
– Can be extended to other epidemics and to “critical infrastructure” simulations such as transportation

FutureGrid: September 10, 2009 Press Release

BLOOMINGTON, Ind. -- The future of scientific computing will be developed with the leadership of Indiana University and nine national and international partners as part of a $15 million project largely supported by a $10.1 million grant from the National Science Foundation (NSF). The award will be used to establish FutureGrid, one of only two experimental systems (the other is a GPU-enhanced cluster) in the NSF Track 2 program, which funds the most powerful, next-generation scientific supercomputers in the nation.

FutureGrid

FutureGrid is part of TeraGrid, NSF’s national network of supercomputers, and is aimed at providing a distributed testbed of ~9 clusters for both application and computer scientists exploring
– Clouds
– Grids
– Multicore and architecture diversity
The testbed is enabled by virtual machine technology, including virtual networks; dedicated network connections allow experiments to be isolated. It has a modest number of cores (5000) but will be relatively large as a Science Cloud.

Add a 768-core Windows Server system at IU and a Network Fault Generator.

Indiana University is already part of the base TeraGrid through Big Red and related services.

Biology MDS and Clustering Results

Alu Families: This visualizes Alu repeats from the chimpanzee and human genomes. Young families (green, yellow) are seen as tight clusters. This is a projection of the repeats, each with about 400 base pairs, to 3D by MDS dimension reduction.

Metagenomics: This visualizes a dimension reduction to 3D of gene sequences from an environmental sample. The many different genes are classified by a clustering algorithm and visualized by MDS dimension reduction.
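As a rough sketch of the computation behind such plots (not the SALSA group's parallel implementation), the following Python uses scikit-learn to embed a precomputed dissimilarity matrix into 3D and cluster the result; the random data is a hypothetical stand-in for the sequence dissimilarities.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

# Hypothetical stand-in for an all-pairs sequence dissimilarity matrix;
# the real pipeline derives these from pairwise sequence alignments.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 40))        # 500 "sequences", 40 latent features
diss = np.linalg.norm(features[:, None] - features[None, :], axis=-1)

# Metric MDS: embed the dissimilarities into 3D for visualization.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords3d = mds.fit_transform(diss)

# Cluster in the embedded space so each family can be colored in the plot.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords3d)
print(coords3d.shape, np.bincount(labels))
```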

High Performance Data Visualization

Developed parallel MDS and GTM algorithms to visualize large, high-dimensional data. Processed 0.1 million PubChem data points with 166 dimensions; parallel interpolation can process up to 2M PubChem points.

MDS for 100k PubChem data: 100k PubChem data points with 166 dimensions are visualized in 3D space. Colors represent 2 clusters separated by their structural proximity.

GTM for 930k genes and diseases: genes (green) and diseases (other colors) are plotted in 3D space, aiming at finding cause-and-effect relationships.

GTM with interpolation for 2M PubChem data: 2M PubChem data points are plotted in 3D with the GTM interpolation approach. Red points are 100k sampled data and blue points are 4M interpolated points.

[3] PubChem project,
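The interpolation idea: run the expensive MDS/GTM only on a sample, then place each remaining point using its distances to nearby sampled anchors. A minimal, hypothetical sketch follows; the group's actual method minimizes MDS stress over the anchors, whereas simple inverse-distance weighting is used here for brevity.

```python
import numpy as np

def interpolate_point(x, sample_X, sample_coords, k=10):
    """Place one out-of-sample point in an existing 3D embedding.

    x:             feature vector of the new point
    sample_X:      (n, d) features of the already-embedded sample
    sample_coords: (n, 3) 3D coordinates of the sample from full MDS/GTM
    """
    # Distances from the new point to every sampled point in feature space.
    d = np.linalg.norm(sample_X - x, axis=1)
    nn = np.argsort(d)[:k]                    # k nearest sampled anchors
    w = 1.0 / (d[nn] + 1e-12)                 # inverse-distance weights
    return (w[:, None] * sample_coords[nn]).sum(axis=0) / w.sum()
```

Because each out-of-sample point depends only on the shared sample, millions of points can be placed independently, which is why the interpolation step parallelizes so well.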

Applications using Dryad & DryadLINQ (2)

PhyloD [2], a project from Microsoft Research, derives associations between HLA alleles and HIV codons, and between codons themselves.

[Figures: scalability of the DryadLINQ PhyloD application; output of PhyloD showing the associations.]

[5] Microsoft Computational Biology Web Tools,
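For orientation only: the underlying question is whether an HLA allele co-occurs with a codon mutation more often than chance. A plain Fisher's exact test on a hypothetical 2x2 table illustrates this; PhyloD itself uses a phylogeny-corrected model, which this sketch does not attempt.

```python
from scipy.stats import fisher_exact

# Hypothetical counts: patients with/without a given HLA allele versus
# presence/absence of an escape mutation at one HIV codon.
#                codon mutated  codon wild-type
table = [[30, 10],   # allele present
         [15, 45]]   # allele absent

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3g}")
# PhyloD additionally corrects for the phylogenetic relatedness of the
# HIV sequences, which a plain contingency test ignores.
```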

Dynamic Virtual Clusters

Switchable clusters on the same hardware (~5 minutes to move between different OS stacks such as Linux+Xen and Windows+HPCS), with support for virtual clusters. SW-G (Smith-Waterman-Gotoh dissimilarity computation) is a pleasingly parallel problem suitable for MapReduce-style applications, run as SW-G using Hadoop on Linux and SW-G using DryadLINQ on Windows.

[Architecture diagram, "Dynamic Cluster Architecture": iDataplex bare-metal nodes (32 nodes) under XCAT infrastructure host virtual/physical clusters (Linux bare-system, Linux on Xen, Windows Server 2008 bare-system); a monitoring and control infrastructure connects a pub/sub broker network, summarizer, switcher, and monitoring interface.]
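A minimal sketch of why SW-G is pleasingly parallel: the all-pairs dissimilarity matrix splits into independent blocks, each handled by one map-style task. Hypothetical Python, with a toy scoring function standing in for a real Smith-Waterman-Gotoh alignment:

```python
from concurrent.futures import ProcessPoolExecutor

def swg_dissimilarity(a: str, b: str) -> float:
    """Toy stand-in for a Smith-Waterman-Gotoh alignment-based dissimilarity."""
    matches = sum(x == y for x, y in zip(a, b))
    return 1.0 - matches / max(len(a), len(b))

def map_block(block):
    """One 'map task': all pairwise dissimilarities for one block pair."""
    (i0, rows), (j0, cols) = block
    out = []
    for i, a in enumerate(rows):
        for j, b in enumerate(cols):
            if i0 + i <= j0 + j:              # matrix is symmetric: upper triangle only
                out.append((i0 + i, j0 + j, swg_dissimilarity(a, b)))
    return out

if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTTCGT", "TTGTACGA", "ACGAACGT"]  # toy sequences
    bs = 2                                    # block size
    blocks = [((i, seqs[i:i + bs]), (j, seqs[j:j + bs]))
              for i in range(0, len(seqs), bs)
              for j in range(i, len(seqs), bs)]
    with ProcessPoolExecutor() as pool:       # blocks run with no shared state
        results = [t for part in pool.map(map_block, blocks) for t in part]
    print(sorted(results))
```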

SALSA HPC Dynamic Virtual Clusters Demo

At the top, three clusters switch applications on a fixed environment, which takes ~30 seconds. At the bottom, one cluster switches between environments (Linux; Linux + Xen; Windows + HPCS), which takes about 7 minutes. The demo illustrates the concept of Science on Clouds using a FutureGrid cluster.