

Is there an app for that? Challenges in scalable analysis for life sciences
Nirav Merchant, UA BioComputing + iPlant, Arizona Research Laboratories, University of Arizona

Topic Coverage
- Formula for success (and failure)
- Flavors of bio-information
- What is iPlant?
- Typical non-NGS workflow
- Data life cycle issues (some)
- Application life cycle issues (some)
- Why "app"?

Simple Formula [figure: image + image = image]

The Reality
- Languages: Perl, Python, Java, Ruby, Fortran, C, C#, C++, R, Matlab, etc.
- Compute platforms: Amazon, Azure, Rackspace, campus HPC, XSEDE, etc.
- And lots of glue...

Simple Formula [figure: image + image = image]

Life science: Going across scales

Putting it all to work [cartoon: Wayne Stayskal, The Tampa Tribune]

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences
The iPlant CI is designed as infrastructure. This means it is a platform upon which other projects can build. Use of the iPlant infrastructure can take one of several forms:
- Storage
- Computation
- Hosting
- Web services
- Scalability

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences
- For a challenge as broad as "plant science," focusing on specific applications/tools means chasing a moving target, and it is never enough.
- It is most important to build a platform that can support diverse and constantly evolving needs.
- "Cyberinfrastructure" is, in fact, infrastructure. The platform can lift all the apps, not select winners and losers.
"The useful lifetime of our analysis toolchains is now 6 months" - Matthew Trunnell, Broad Institute

The iPlant Collaborative: Cyberinfrastructure for the Plant Sciences [figure labels: End Users, Computational Users, TeraGrid, XSEDE]

BioInformation :: Data Flavors
- Sequences
- Structures
- Images
- Video
- Audio
- Pathways (graphs)
- Text (publications)
- Traces
- Combinations (e.g. video and traces)
- And much more...

Life scientist :: Data Wrestler
- Volume of data is increasing
- Resolution of data is increasing
- Number of data repositories is increasing
- Analysis options are ever increasing
- Demands to share data and collaborate (team science)
- Do you know where your data is? (And your collaborators' data!)

[Figure labels: Systems Biology, Genomics, Functional Genomics, Metabolomics, Proteomics, Pharmacogenomics, Modeling, Clinical, Pathways]

X Prize for sequencing (the guidelines have since changed; this graphic is dated)

X Prize for analyzing it?

The Lifecycle [source: The Fourth Paradigm: Data-Intensive Scientific Discovery]


Why is this hard when we have...
- Pegasus
- Taverna
- Kepler
- Condor (DAGMan)
- Gearman
- Makeflow
- myExperiment
- Science pipes
- We have X (take your pick)

What did the scientists do?
Used the "parametric launcher". Essentially it's a very functional "submit" script!
Why use it?
- A directory full of files and one executable
- Simple linear flow (no branching)
- Needed results "yesterday" for a conference/working group
- Needs to be run ONCE every year
- Not sexy but functional
- Serial runs are important
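The pattern on the slide above (one executable, a directory full of files, a linear serial flow) can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual parametric launcher: the function names `build_commands` and `run_serially` are hypothetical, and the sketch assumes the executable takes a single input file as its only argument.

```python
import subprocess
from pathlib import Path


def build_commands(executable, input_dir, pattern="*"):
    """Return one command (argv list) per regular file in input_dir, sorted by name."""
    files = sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
    return [[executable, str(p)] for p in files]


def run_serially(commands):
    """Run each command in order, one at a time; return (command, exit code) pairs."""
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd)  # blocks until this run finishes
        results.append((cmd, completed.returncode))
    return results


# Hypothetical usage (tool name and directory are placeholders):
#   for cmd, code in run_serially(build_commands("./my_tool", "inputs/")):
#       print(code, " ".join(cmd))
```

Keeping command construction separate from execution means the list can be inspected before anything runs, and the same list could later be handed to a scheduler if serial runs stop being enough.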

Python in HPC: OMG

Data issues

DLM: Issues
- Most "pipelines/analysis" are data intensive
  Sadly, data originates from slow desktops, external hard drives, and file servers using ftp, http, etc. (and ends up there)
- Hard to stage data to begin computation!
  No place to bring things together (quickly)
- Data needs substantial pre- and post-processing
  Metadata is usually not adequate
- RDBMS are part of workflows
  Do you need better indexing of flat files?
- It does not have to be this way!
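As one illustration of what "better indexing of flat files" could mean, the sketch below builds a byte-offset index over a FASTA-style file so that a single record can be read with a seek instead of a full scan. This is an assumption made for the example, not something the slides describe: the FASTA format, the function names `build_offset_index` and `fetch_record`, and the choice of the first word of the header as the record ID are all hypothetical.

```python
def build_offset_index(path):
    """Map each record ID (first word after '>') to the byte offset of its header line."""
    index = {}
    with open(path, "rb") as handle:
        offset = 0
        for line in handle:
            if line.startswith(b">"):
                fields = line[1:].split()
                record_id = fields[0].decode("ascii") if fields else ""
                index[record_id] = offset
            offset += len(line)  # count raw bytes so offsets are valid for seek()
    return index


def fetch_record(path, index, record_id):
    """Seek directly to one record and return its sequence as a single string."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        handle.readline()  # skip the header line
        chunks = []
        for line in handle:
            if line.startswith(b">"):  # next record begins
                break
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")
```

The index is small compared with the data and can be rebuilt at any time from the flat file, which is one reason this kind of side index is often tried before moving data into an RDBMS.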


Data Lifecycle: Our effort

What can users do?


But I don't get throughput
Networking is a huge BLACK BOX, and there is too much finger pointing

Compute Issues: Cloud

What is cloud computing?

The application lifecycle

The iPlant Collaborative: iPlant Discovery Environment
- A rich web client
  Provides a consistent interface to a range of bioinformatics tools
  Provides a portal to users not wishing to interact with lower level infrastructure
- An integrated, extensible system of applications and services
  Provides additional intelligence above low level APIs: provenance, collaboration, etc.

The iPlant Collaborative: Project Atmosphere™ (Custom Cloud Computing)
- API-compatible implementation of Amazon EC2/S3 interfaces
- Virtualize the execution environment for applications and services
- Get up to 12-core / 48 GB instances
- Access to cloud storage + EBS
- 1008 users
- 167 users launched 657 instances (May 2012)
- 227 were terminated outside of Atmosphere due to idleness (per user's request)
- 430 instances ran for an average of 1 day, 16 hours, and 13 minutes; the longest ran for 30 days
- Run servers, CloudBurst desktop use cases. Big data and the desktop are co-local again!
- More than 60 hosted applications in Atmosphere today, including users from USDA, Forest Service, data providers, etc.
- 30+ private images for postdocs and grad students for training classes

Atmosphere: Collaboration iPlant Data Store

Lifecycle

How to Connect

Different Ways to Log in to VMs

Steps to get started !

My wish list for CCL (parrot)
- Improved performance for iRODS transfers (parallel transfers?)
- File permission calls (iRODS ACL)*
- Ability to provide throughput/transfer stats
- Thanks for updating iRODS support to

My wish list for CCL (makeflow)
- *Bundle dependencies along with script and binaries, e.g. CDE: automatically create portable Linux applications
- Progress reporting and performance profiling, e.g. the equivalent of a progress bar
*Not a makeflow issue but a good feature

Staff: Greg Abram Sonali Aditya Roger Barthelson Brad Boyle Todd Bryan Gordon Burleigh John Cazes Mike Conway Karen Cranston Rion Doodey Andy Edmonds Dmitry Fedorov Michael Gatto Utkarsh Gaur Cornel Ghiban Michael Gonzales Hariolf Häfele Matthew Hanlon Executive Team: Steve Goff Dan Stanzione Faculty Advisors & Collaborators: Ali Akoglu Greg Andrews Kobus Barnard Sue Brown Thomas Brutnell Michael Donoghue Casey Dunn Brian Enquist Damian Gessler Ruth Grene John Hartman Matthew Hudson Dan Kliebenstein Jim Leebens-Mack David Lowenthal Robert Martienssen Students: Peter Bailey Jeremy Beaulieu Devi Bhattacharya Storme Briscoe Ya-Di Chen John Donoghue Steven Gregory Yekatarina Khartianova Monica Lent Amgad Madkour B.S. Manjunath Nirav Merchant David Neale Brian O’Meara Sudha Ram David Salt Mark Schildhauer Doug Soltis Pam Soltis Edgar Spalding Alexis Stamatakis Ann Stapleton Lincoln Stein Val Tannen Todd Vision Doreen Ware Steve Welch Mark Westneat Andrew Lenards Zhenyuan Lu Eric Lyons Naim Matasci Sheldon McKay Robert McLay Angel Mercer Dave Micklos Nathan Miller Steve Mock Martha Narro Praveen Nuthulapati Shannon Oliver Shiran Pasternak William Peil Titus Purdin J.A. Raygoza Garay Dennis Roberts Jerry Schneider Anthony Heath Barbara Heath Matthew Helmke Natalie Henriques Uwe Hilgert Nicole Hopkins Eun-Sook Jeong Logan Johnson Chris Jordan B.D. Kim Kathleen Kennedy Mohammed Khalfan Seung-jin Kim Lars Koersterk Sangeeta Kuchimanchi Kristian Kvilekval Aruna Lakshmanan Sue Lauter Tina Lee Bruce Schumaker Sriramu Singaram Edwin Skidmore Brandon Smith Mary Margaret Sprinkle Sriram Srinivasan Josh Stein Lisa Stillwell Kris Urie Peter Van Buren Hans Vasquez-Gross Matthew Vaughn Fusheng Wei Jason Williams John Wregglesworth Weijia Xu Jill Yarmchuk Aniruddha Marathe Kurt Michaels Dhanesh Prasad Andrew Predoehl Jose Salcedo Shalini Sasidharan Gregory Striemer Jason Vandeventer Kuan Yang Postdocs: Barbara Banbury Jamie Estill Bindu Joseph Christos Noutsos Brad Ruhfel Stephen A. Smith Chunlao Tang Lin Wang Liya Wang Norman Wickett The iPlant Collaborative