Trip report: Visit to UPPNEX

Trip report: Visit to UPPNEX
UPPmax NExt generation sequencing Cluster & Storage
University of Uppsala, Uppsala, Sweden
Presented by Hershel Safer in Ron Shamir’s group meeting on 16/11/2011
Trip report: Visit to UPPNEX – Hershel Safer 16 November 2011

Outline
- Background about UPPNEX
- Growth in UPPNEX projects
- UPPNEX resources
- Technical details
- Projects under development

SciLifeLab 1
- The Science for Life Laboratory (SciLifeLab - www.scilifelab.se) is Sweden’s main center for high-throughput biosciences and translational medicine.
- SciLifeLab is a collaboration between four universities:
  - Stockholm University
  - Karolinska Institute
  - Royal Institute of Technology (KTH)
  - Uppsala University
- 2 sites: Stockholm & Uppsala
- The Stockholm site is in a new office building near the Karolinska Institute

SciLifeLab 2
- Platforms
  - Genomics
  - Proteomics
  - Bioimaging & functional biology
  - Comparative genetics
- Research programs (intramural and extramural)
  - Biology
  - Medicine
  - Environmental sciences
- >100 research groups

How to manage all the data?
Need to support at least:
- Immediate use (initial analysis)
- Short-term storage (ongoing analysis)
- Long-term archive
- Collaboration (intra- and inter-institute)

UPPNEX
- UPPNEX (www.uppnex.uu.se) is a national resource that provides computing and storage resources for the Next Generation Sequencing (NGS) community in Sweden. It was founded in 2009.
- UPPNEX is a project of the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX - www.uppmax.uu.se). UPPMAX was founded in 2003.
- UPPMAX is one of six centers in the Swedish National Infrastructure for Computing (SNIC - www.snic.vr.se), a national strategic resource to support high-performance computing in Sweden’s academic research community.

The visit
- On 12 September 2011, I spent the morning at UPPNEX and the afternoon at SciLifeLab Stockholm.
- My goal was to learn how they support inter-institute data transfer and data management, and to bring back information that we can use for our projects.
- At UPPNEX, I was hosted by:
  - Martin Dahlö, who works with bioinformatics applications
  - Samuel Lampa, who works with computing systems
- I visited because of upcoming needs to support inter-institute data management for NGS as part of I-Core.

[Figure: Number of projects]

Current projects (July 2011)
University      # projects
uu.se           58
ki.se           19
scilifelab.se   13
slu.se          9
kth.se          4
su.se
umu.se          3
gu.se           2
liu.se          1
lu.se
Total           161

[Figure: Storage use]

[Figure: CPU use (core hours / month)]

Community
- UPPNEX – SciLifeLab Bioinformatics Forum: Monthly meeting to discuss computing questions and issues
- SNIC – UPPMAX User Meetings: Hands-on presentations about how to use UPPNEX resources
- E-mail lists:
  - UPPNEX users
  - Bioinformatics support (planned)
  - UPPMAX support
- Online knowledge base

Staffing
- Bioinformatics applications
  - UPPNEX has 1.5 FTEs (2 people)
  - Another 6 application experts are available to help
- Systems
  - UPPNEX has 1 FTE
  - UPPMAX has 5.5 FTEs. They help with UPPNEX.
  - Overall, UPPNEX is using 2-3 FTEs for systems.
- Scientific coordinator: 1 person @ 50%

National computing resources
- SweStore: “Infinite” space for archiving. Cheaper, slower storage for large-scale projects in all areas of science.
- SUNet: 10 Gbit/s inter-university network
- SweGrid: SNIC’s grid-based computer and storage system. High-energy physics gets 1/3 of it; the rest is assigned to other areas on a peer-review basis.

UPPNEX computing resources
- SMP computer with 64 cores, 2 TB RAM
- 900k core hours / month in UPPMAX’s Kalkyl computing cluster
- ~420 TB cluster-attached parallel storage for short-term work
- SweStore: Currently using >1 PB (10^15 bytes) of space, and growing
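To put the 900k core hours / month in perspective, the short calculation below converts a monthly core-hour budget into the average number of cores it keeps busy around the clock. This is only a back-of-the-envelope sketch: the 30-day month and the function name are assumptions, not figures from UPPNEX.

```python
HOURS_PER_DAY = 24


def equivalent_cores(core_hours_per_month: float, days_in_month: int = 30) -> float:
    """Average number of cores kept busy 24/7 by a monthly core-hour budget.

    The 30-day default is an assumption made for illustration.
    """
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    return core_hours_per_month / (days_in_month * HOURS_PER_DAY)


if __name__ == "__main__":
    # 900k core hours / month is the allocation quoted on the slide.
    print(equivalent_cores(900_000))
```

Under these assumptions the allocation corresponds to roughly 1250 cores running continuously.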

Money
- Funding
  - Initial funding from a private foundation
  - Personnel support from SNIC
- Charges
  - Users are not currently charged for storage space
  - Each project is initially allocated 0.5 TB
  - Projects last ~1.5 years, which is longer than expected in the initial plan

NGS workflow
- Data are initially stored and analyzed locally at SciLifeLab. Thomas Svensson at SciLifeLab Stockholm has a group of ~20 people who deal with data analysis.
- After initial analysis, data are transferred to UPPNEX over SUNet. At present, transfers are initiated manually.
- Data are stored at UPPNEX for ongoing analysis. Access to each project is restricted to authorized users. Job scheduling is done with SLURM.
- After analysis is complete, data are archived to SweStore.
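Since job scheduling is done with SLURM, an analysis step is typically submitted as a batch script. The sketch below generates such a script as text. It is a hypothetical illustration, not UPPNEX's actual tooling: the project name, job name and command are placeholders, and only the standard sbatch options -A (account), -J (job name), -n (number of tasks) and -t (time limit) are used.

```python
def build_sbatch_script(project: str, job_name: str, cores: int,
                        hours: int, command: str) -> str:
    """Return the text of a minimal SLURM batch script.

    All argument values are supplied by the caller; nothing here is
    specific to UPPNEX.
    """
    if cores < 1 or hours < 1:
        raise ValueError("cores and hours must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH -A {project}",          # account (project) to charge
        f"#SBATCH -J {job_name}",         # job name shown in the queue
        f"#SBATCH -n {cores}",            # number of tasks (cores)
        f"#SBATCH -t {hours:02d}:00:00",  # wall-clock limit, HH:MM:SS
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "example_project" and the echo command are placeholders.
    print(build_sbatch_script("example_project", "align", 8, 12,
                              "echo 'analysis step goes here'"))
```

The resulting text would be saved to a file and handed to sbatch. Restricting access per project, as described above, is done separately through file-system permissions.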

[Figure: Architecture]

Cluster configuration 1
- Linux: Currently Scientific Linux (based on Red Hat). Possibly moving to Rocks Clusters for convenient installation.
- RAM: UPPMAX Kalkyl cluster has 24 GB / node, i.e. 3 GB / core. Some NGS jobs, such as assembly, need much more, so they purchased a dedicated machine with 2 TB of memory.
- Storage: Parallel file system to obviate the need to copy files to / from local storage.
- Connectivity: InfiniBand to facilitate inter-node communication. This is expensive and superfluous for bioinformatics, since we don’t do much work that requires this capability.
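The RAM trade-off can be made concrete with a small sketch that picks the smallest machine able to hold a job in memory. The machine figures come from the slides (24 GB per Kalkyl node, a 64-core SMP machine with 2 TB RAM). The 8 cores per node are inferred from 24 GB / 3 GB per core, and treating 2 TB as 2048 GB is an assumption. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    cores: int
    ram_gb: int

    @property
    def ram_per_core_gb(self) -> float:
        return self.ram_gb / self.cores


# 8 cores per node is inferred from 24 GB / node and 3 GB / core.
KALKYL_NODE = Machine("kalkyl-node", cores=8, ram_gb=24)
# 2 TB is taken as 2048 GB here (assumption).
SMP_MACHINE = Machine("smp", cores=64, ram_gb=2048)


def choose_machine(job_ram_gb: float) -> Machine:
    """Return the smallest machine whose RAM can hold the job."""
    for machine in (KALKYL_NODE, SMP_MACHINE):
        if job_ram_gb <= machine.ram_gb:
            return machine
    raise ValueError(f"no machine has {job_ram_gb} GB of RAM")


if __name__ == "__main__":
    print(KALKYL_NODE.ram_per_core_gb)  # GB per core on a standard node
    print(choose_machine(500).name)     # a large assembly job
```

A job that needs a few hundred GB, as an assembly might, does not fit on a standard node and falls through to the large-memory machine.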

Cluster configuration 2
- Security issues with software (e.g., Galaxy) that runs as a single server process and is used by multiple users:
  - The software needs to access each user’s protected files.
  - It therefore needs to be run as root (very dangerous), be able to setuid to any user (dangerous), or be a member of all the projects’ groups (insecure).
  - They are trying various ways to grapple with this.
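The dilemma follows from how Unix read permission is decided. The sketch below is a simplified model of that check, written only to illustrate the point: it ignores ACLs, directory search permission and other mechanisms, and all uid/gid numbers are made up.

```python
import stat


def can_read(mode: int, file_uid: int, file_gid: int,
             uid: int, gids: set[int]) -> bool:
    """Simplified POSIX-style read check for a process (uid, gids)."""
    if uid == 0:
        return True                        # root bypasses permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IRUSR)   # owner class applies
    if file_gid in gids:
        return bool(mode & stat.S_IRGRP)   # group class applies
    return bool(mode & stat.S_IROTH)       # everyone else


if __name__ == "__main__":
    # Hypothetical project file: rw-r----- owned by user 1001, group 5001.
    mode, owner, project_gid = 0o640, 1001, 5001
    server_uid = 2000  # hypothetical uid of the shared server process
    print(can_read(mode, owner, project_gid, server_uid, {5002}))
    print(can_read(mode, owner, project_gid, server_uid, {5001}))
    print(can_read(mode, owner, project_gid, 0, set()))
```

In this model a single server process can read a protected project file only if it is root, runs as the owning user, or belongs to the project's group, which are exactly the three options listed above.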

Issues
- Data growth is much faster than expected
- Data duplication
- Too much manual intervention in data management
- Users (biologists) need help with Linux data-management tools
- NGS users work more interactively than many other UPPMAX users, such as physicists. This allows the possibility of sharing workload.

Upcoming systems for automated data transfer
- Data transfer from sequencing machines to UPPNEX storage
  - Their solution is based on the Blue Collar Bioinformatics (bcbb) package by Brad Chapman at the Harvard School of Public Health.
  - They have their own fork of bcbb.
- Data transfer between fast storage and cheap storage
  - Installing iRODS for automated, policy-based data migration between fast storage (UPPNEX servers) and cheaper storage (SweStore)
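As an illustration of what "policy-based" migration means, the sketch below selects files that have not been accessed for a given period as candidates for moving to cheaper storage. It is plain Python, not iRODS rule code or the iRODS API; the 90-day threshold, the record fields and the paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    last_access: datetime


def select_for_archive(files: list[FileRecord], now: datetime,
                       max_idle: timedelta = timedelta(days=90)) -> list[FileRecord]:
    """Return the files idle for longer than max_idle (hypothetical policy)."""
    return [f for f in files if now - f.last_access > max_idle]


if __name__ == "__main__":
    now = datetime(2011, 11, 16)
    files = [
        FileRecord("/proj/a/run1.bam", 10**9, datetime(2011, 3, 1)),
        FileRecord("/proj/a/run2.bam", 10**9, datetime(2011, 11, 1)),
    ]
    for record in select_for_archive(files, now):
        print(record.path)
```

A real deployment would express the policy in the data-management system itself; the point here is only that the decision is made by a rule rather than by a person.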

Bioclipse workbench for working with data
- Based on Eclipse
- They are among the lead developers of Bioclipse
- Features include:
  - Secure communication using SSH
  - Job configuration wizard
  - Graphic interface to command-line tools
  - Integrated genome viewer
  - Galaxy integration

[Figure: Bioclipse client]

Galaxy Web portal
- Some results are provided to users via a local Galaxy installation
- Web-based interface to command-line tools
- Create and run re-usable workflows

[Figure: Galaxy in Web browser]

[Figure: Galaxy in Bioclipse browser]

[Figure: Planned architecture]