Heterogeneous community Research computing at CSU There is HPC Activities in all 8 colleges Computing being done by Professional staff as well as grad.

Slides:



Advertisements
Similar presentations
Supercomputing Institute for Advanced Computational Research © 2009 Regents of the University of Minnesota. All rights reserved. The Minnesota Supercomputing.
Advertisements

O AK R IDGE N ATIONAL L ABORATORY U. S. D EPARTMENT OF E NERGY Center for Computational Sciences Cray X1 and Black Widow at ORNL Center for Computational.
ELIDA Panel on Data Marketplaces and digital preservation Fabrizio Gagliardi, Vassilis Christophides, David Giaretta, Robert Sharpe, Elena Simperl.
Life and Health Sciences Summary Report. “Bench to Bedside” coverage Participants with very broad spectrum of expertise bridging all scales –From molecule.
The Protein Folding Problem David van der Spoel Dept. of Cell & Mol. Biology Uppsala, Sweden
Copyright (c) John Y. Cheung, 2002 ECE Recruiting,ppt Slide 1 What is an Electrical and Computer Engineer?
Structural Genomics – an example of transdisciplinary research at Stanford Goal of structural and functional genomics is to determine and analyze all possible.
Jeffery Loo NLM Associate Fellow ’03 – ’05 chemicalinformaticsforlibraries.
The Golden Age of Biology DNA -> RNA -> Proteins -> Metabolites Genomics Technologies MECHANISMS OF LIFE Health Care Diagnostics Medicines Animal Products.
NICLS: Development of Biomedical Computing and Information Technology Infrastructure Presented by Simon Sherman August 15, 2005.
Workshop in Bioinformatics 2010 What is it ? The goals of the class… How we do it… What’s in the class Why should I take the class..
UK -Tomato Chromosome Four Sarah Butcher Bioinformatics Support Service Centre For Bioinformatics Imperial College London
Computer Science Prof. Bill Pugh Dept. of Computer Science.
Luxembourg, Sep 2001 Pedro Fernandes Inst. Gulbenkian de Ciência, Oeiras, Portugal EMBER A European Multimedia Bioinformatics Educational Resource.
SAN DIEGO SUPERCOMPUTER CENTER Developing a CUAHSI HIS Data Node, as part of Cyberinfrastructure for the Hydrologic Sciences David Valentine Ilya Zaslavsky.
David L. Spooner1 IT Education: An Interdisciplinary Approach David L. Spooner Rensselaer Polytechnic Institute.
1 The Department of Computational Biology University of Pittsburgh School of Medicine Hagai Meirovitch.
Institutional Research Computing at WSU: Implementing a community-based approach Exploratory Workshop on the Role of High-Performance Computing in the.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
Scientific Data Infrastructure in CAS Dr. Jianhui Scientific Data Center Computer Network Information Center Chinese Academy of Sciences.
Institute of Systems Biology (INBIOSIS)/ School of Biosciences & Biotechnology (Faculty of Science & Technology), Bioinformatics Development in Malaysia.
DRAFT 1 Institutional Research Computing at WSU: A community-based approach Governance model, access policy, and acquisition strategy for consideration.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
University of Southampton Clusters: Changing the Face of Campus Computing Kenji Takeda School of Engineering Sciences Ian Hardy Oz Parchment Southampton.
Stern Center for Research Computing
Institute for Digital Research and Education (IDRE) UCLA’s CI Vision Research CI DataNet Cyber Learning Institute for Digital Research and Education (IDRE)
SYSTEMS BIOLOGY AND NEUROENGINEERING Christine P. Hendon, PhD Assistant Professor Electrical Engineering.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
UPPMAX and UPPNEX: Enabling high performance bioinformatics Ola Spjuth, UPPMAX
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE Molecular Science in NPACI Russ B. Altman NPACI Molecular Science Thrust Stanford Medical.
The Research Computing Center Nicholas Labello
Report on CSU HPC (High-Performance Computing) Study Ricky Yu–Kwong Kwok Co-Chair, Research Advisory Committee ISTeC August 18,
IPlant cyberifrastructure to support ecological modeling Presented at the Species Distribution Modeling Group at the American Museum of Natural History.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Parallel and Distributed Simulation Introduction and Motivation.
Bioinformatics Core Facility Guglielmo Roma January 2011.
CCS Overview Rene Salmon Center for Computational Science.
A Framework for Visualizing Science at the Petascale and Beyond Kelly Gaither Research Scientist Associate Director, Data and Information Analysis Texas.
Futures Lab: Biology Greenhouse gasses. Carbon-neutral fuels. Cleaning Waste Sites. All of these problems have possible solutions originating in the biology.
Genomes To Life Biology for 21 st Century A Joint Initiative of the Office of Advanced Scientific Computing Research and Office of Biological and Environmental.
Master of Science in Biological Informatics PROGRAM DESCRIPTION The MS in Biological Informatics program program aims.
© 2009 IBM Corporation Motivation for HPC Innovation in the Coming Decade Dave Turek VP Deep Computing, IBM.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop GWAS/QTL Apps Overview.
Modelling proteins and proteomes using Linux clusters Ram Samudrala University of Washington.
Implementing a National Data Infrastructure: Opportunities for the BIO Community Peter McCartney Program Director Division of Biological Infrastructure.
Comprehensive Scientific Support Of Large Scale Parallel Computation David Skinner, NERSC.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
Meeting Users Needs with Octave A survey of the use of Matlab™ and Octave at the NIH Tom Holroyd NIMH MEG Core Facility Presented at the Octave 2006 Workshop.
NICS Update Bruce Loftis 16 December National Institute for Computational Sciences University of Tennessee and ORNL partnership  NICS is the 2.
Interdisciplinary Research Interests.
Network Systems Lab. Korea Advanced Institute of Science and Technology No.1 Ch. 1 Introduction EE692 Parallel and Distribution Computation | Prof. Song.
HPC University Requirements Analysis Team Training Analysis Summary Meeting at PSC September Mary Ann Leung, Ph.D.
Northwest Indiana Computational Grid Preston Smith Rosen Center for Advanced Computing Purdue University - West Lafayette West Lafayette Calumet.
Center for Bioinformatics and Genomic Systems Engineering Bioinformatics, Computational and Systems Biology Research in Life Science and Agriculture.
SCIENCE FAIR CATEGORIES MIDDLE SCHOOL :
Building on virtualization capabilities for ExTENCI Carol Song and Preston Smith Rosen Center for Advanced Computing Purdue University ExTENCI Kickoff.
Scientific Computing at Fermilab Lothar Bauerdick, Deputy Head Scientific Computing Division 1 of 7 10k slot tape robots.
INTRODUCTION TO XSEDE. INTRODUCTION  Extreme Science and Engineering Discovery Environment (XSEDE)  “most advanced, powerful, and robust collection.
EGI-InSPIRE RI EGI Compute and Data Services for Open Access in H2020 Tiziana Ferrari Technical Director, EGI.eu
EGI-InSPIRE RI An Introduction to European Grid Infrastructure (EGI) March An Introduction to the European Grid Infrastructure.
Accelerated B.S./M.S An approved Accelerated BS/MS program allows an undergraduate student to take up to 6 graduate level credits as an undergraduate.
Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming
CyVerse Tools and Services
Tools and Services Workshop
Joslynn Lee – Data Science Educator
Science Fair Categories
HII Technical Infrastructure
Panel on Research Challenges in Big Data
The Barcelona Supercomputing Center
Presentation transcript:

Heterogeneous community Research computing at CSU There is HPC Activities in all 8 colleges Computing being done by Professional staff as well as grad students & post-docs Users need fast processing, or large memory, or large disk (or 2 out of 3) Application programs being used are homegrown, or open source, or commercial 59 respondents to survey

Majority of computation at CSU (by # of people or grant funding) is in support of experimental programs Computation a critical part of a project, not the project There is a significant developmental presence across the University Research computing at CSU Active 53 funding (from VPR database) Computing, modeling, simulation, bioinformatics…$78M For reference: Solar, biofuels, methane, wind, powerhouse…$41M Tuberculosis$46M Total CSU$1.1B

Application areas include: Atmospheric, fire, and ecological sciences Ecosystem modeling Geosciences Robotics Machine Learning - Exercise science - Atmospheric science - Biomedical engineering Distributed Computing and Big Data - Atmospheric Science - Epidemiology - Healthcare Civil Engineering (ordinary & partial differential equations) Simulation - greenhouse gas emissions - financial modeling - large-scale models in ECE - large scale network - molecular (bio, nano, polymer) - MCNPX Monte Carlo simulations - resource management for HPC systems - data center energy and thermal properties Statistics - large datasets - large number of datasets Smart Grid research Power systems engineering Computational Electromagnetics Electronic structure (small molecule, bio, solids) Sequential Image Capture and Processing GIS analysis Bioinformatics Math Physics Computational Biology Genomics (short-reads analysis, sequence analysis, etc.) Transcriptomics Proteomics Metabolomics Population genetic evaluation Structural RNA, RNA-RNA interaction prediction Protein structure determination

Needs/usage Ram (16 GB up to TB) Disk (TB up to PBs) Processors fast serial, single node-multicore, Fast interconnect up to 1000s of cores, GPUs, accelerators most applications don’t need to be/aren’t here Cray 2,016 CPU cores 2.5 TB RAM (32 GB/node) 32 TB disk Based on survey (hardware that people use) applications tend to need/use large memory (TB) or large disk (PB) or fast parallel (1000s of cores/GPUs) Perhaps due to non-linearity of pricing with speed

Applications programs Open Source Roll your own Commercial open source > commercial > roll you own (based on # of apps) Hardware needs generally dictated by software “vendor” a number of applications take advantage of parallel processing but many can not, yet

Computers Department/college Individual/group Cray/National resource Work is getting done, grants are being funded, but more resources are needed to remain competitive 50%/10% 30%/30% 20%/60% Based on number of cores Based on number of active users

Heterogeneous User base Professional staff & Faculty busy doing their day job need help in small bites Grad students and post-docs large learning curve users & developers Support needs consulting scripting (python, matlab) courses (discipline-specific & computing) consulting scripting (python, matlab) Personal aside: If programming assistance were available, perhaps: scripts->applications serial->parallel

Courses ANEQ 575 Computational Biology in Animal Breeding. BC 441 3D Molecular Models for Biochemistry. BZ577/MIP577 Computer Analysis in Population Genetics. CIVE542 Water Quality Modeling. CIVE556 Seepage and Earth Dams. CIVE607 Computational Fluid Dynamics. CIVE631 Computational Methods in Subsurface Systems. CS475 Parallel Programming CS570 Advanced Computer Architecture. CS575 Parallel Processing. GRAD510 Fundamentals of High Performance Computing. GRAD511 High Performance Computing and Visualization. MECH650 Computational Materials from First Principles. NB650 Computer Analysis of Neuronal Proteins. SOCR731 Plant Breeding Data Management.

Bioinformatics Courses BSPM 576/MIP 576 Bioinformatics. Technical computing across platforms using bioinformatics tools in molecular analyses. CS 425 Introduction to Bioinformatics Algorithms. Algorithms for analysis of large scale biological data. CS 548/STAT 548 Bioinformatics Algorithms. Computational methods for analysis of DNA/protein sequences and other biological data. CS (3-2-0). Machine Learning in Bioinformatics. Recent research on the supplications of machine learning in bioinformatics.

Work is getting done, grants are being funded, but more resources are needed to remain competitive Hardware needs generally dictated by software “vendor” a number of applications can take advantage of parallel processing but many can not, yet Majority of computation at CSU is in support of experimental programs A critical part of a project, not the project There is a significant developmental presence across the University (that can be tapped through GSAs) Applications tend to need/use large memory (TB) or large disk (PB) or fast parallel (1000s of cores/GPUs) Support needed Courses (discipline-specific & computing) Consulting Scripting (python, matlab) Summary

My personal opinion: We need a heterogeneous environment modeled after campus usage/need with a) large ram subsystem (several nodes) b) gpu-rich subsystem c) subsystem with large (cheap) scratch d) subsystem with large number of nodes with fast interconnect e) distributed nodes & fast communication to (pre)process where data is being generated Over time users could/would add notes to the subsystem type(s) that they need Need sophisticated queuing system and larger support staff HPC Support staff should include GSAs in parallel programming and discipline-specific computing (eg. bioinformatics, statistics) A lot of work going on a CSU, very grassroots Central leverage could help funding success. Grants tend to be "modular". For example, in chemistry one can ask for $450K (not going to get much more), you can either buy hardware or pay people. It's better for CSU if you pay people (hardware has no overhead)

What do you think we need/what will help your research? What should be in central infrastructure? Software site licenses? Support infrastructure?