Providing National Cyberinfrastructure to Biologists, esp. Genomicists. William K. Barnett, Ph.D. (Director) Thomas G. Doak (Manager & Domain Biologist)

Presentation transcript:

Providing National Cyberinfrastructure to Biologists, esp. Genomicists. William K. Barnett, Ph.D. (Director) Thomas G. Doak (Manager & Domain Biologist) National Center for Genome Analysis Support

Lab of Mike Lynch

Thank You
OSG Team: Rob Quick and Soichi Hayashi
Questions? Bill Barnett, Le-Shin Wu, Carrie Ganote, Tom Doak

An outline:
The science and research NCGAS addresses
What tools and infrastructure NCGAS provides to researchers
What is the near to mid-term future of bioinformatics research

Genomics, Proteomics, Transcriptomics
MetaGenomics, MetaProteomics, MetaTranscriptomics
‘omics is expanding to include everything; then Population Genomics, etc.

Cost per Genome

03/23/2015


Making it easier for Biologists
Web interface to NCGAS resources
Supports many bioinformatics tools
Available for both research and instruction
[Figure labels: Common, Rare; Computational Skills: LOW, HIGH]

Researchers must balance cost, ease, and availability.

NCGAS’s primary goals:
Provide bioinformatics expertise
Maintain a curated set of applications
Provide access to HPC resources, esp. large-memory clusters (= Mason)
Build Galaxy instances for our software
Pursue outreach to biologists

NCGAS is embedded in Research Technologies


THE FACTS
16 nodes, 500GB RAM
10TB project space
Bioinformatics software
Galaxy instance
50TB archive space/user
The fine print
We ask that you acknowledge our grant in any published work that uses our resources.
Collaborations and authorship are requested for intellectual contributions.

Mason and NCGAS use over time [Figure; axis label: Users]

Mason Use

CASE STUDY
Suspect: Horned Dung Beetle
Scientific Name: Onthophagus taurus, O. sagittarius, and O. nigriventris
Wanted for: Nutrition, metabolism, and horn development.
Warning! Subject may be armed with horns which “vary in size, number, position on the body, and degree of sexual dimorphism”.
Rapidly evolving genes in three closely related species may be implicated in the diversity of these structures.
Suspects’ genomes are under current investigation for strong signals of selection.
PI: Melissa Pespeni (lab of Armin Moczek)

Our role in Melissa’s research
We recommended assembly procedures and Unix commands: when and how to concatenate data sets together to retrieve the desired information
We wrote customized scripts to get the data in the format required by the programs requested
We troubleshot issues with the system that were beyond user experience
We assisted with the data moving process and advised steps to address data corruption and failures
We added new users to the project and brought them up to speed on the project and on Unix
…With a smile
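As an illustration of the "concatenate data sets together" step, here is a minimal sketch, not the scripts NCGAS actually wrote. It assumes the inputs are plain text read files (for example one FASTQ file per sequencing lane) that can simply be appended in order; the function name `concatenate_files` and the file names in the usage line are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_files(inputs, output):
    """Append each input file to `output` in the order given.

    Assumption: the inputs are plain record-based text files (such as
    FASTQ) whose records stay valid when the files are joined end to end.
    Returns the total number of bytes copied.
    """
    total = 0
    with open(output, "wb") as out:
        for path in inputs:
            # Stream the file in chunks so large read sets never have
            # to fit in memory.
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
            total += Path(path).stat().st_size
    return total
```

Usage would look like `concatenate_files(["lane1.fastq", "lane2.fastq"], "all_lanes.fastq")`. Comparing the returned byte count against the sum of the input sizes is one cheap check for the kind of transfer failures mentioned above.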

GALAXY.NCGAS.ORG Model
Virtual box hosting Galaxy.ncgas.org
The host for each tool is configured individually
[Diagram labels: Quarry, Mason, Data Capacitor, Archive]
NCGAS establishes tools, hardens them, and moves them into production.
Custom Galaxy tools can be made for moving data
Individual projects can get duplicate boxes, provided they support it themselves.
Policies on the DC guarantee that untouched data is removed with time.
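The removal policy for untouched data can be pictured with a small sketch. This is an assumption-laden illustration, not the Data Capacitor's actual mechanism: the slide gives no retention window, so `max_age_days` is a hypothetical parameter, and using a file's modification time as the measure of "untouched" is likewise an assumption.

```python
import time
from pathlib import Path


def find_untouched(root, max_age_days, now=None):
    """List files under `root` not modified within `max_age_days`.

    Hypothetical policy check: the real retention window and the real
    definition of "untouched" are not stated in the slides.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The function only reports candidates. Deleting them, or copying them to archive space first, is left as a separate and deliberate step.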

Simplify this!

From our recent NSF survey:

From our NSF survey: “The biggest impediment to discovery by biologists is the need to rely on others with knowledge of impenetrable systems and obscure acronyms to process and interpret data. Don't know how to fix this, but on some level user friendly platforms programs like Geneious more than make up for their lack of power by providing an intuitive platform that encourages free exploration and experimentation with data.”

The end…

Mason Use
