DDN & iRODS at ICBR, by Alex Oumantsev


History of ICBR
- Campus-wide Interdisciplinary Center for Biotechnology Research
- Core facility
- Funded by the state of Florida in 1987
- On average 58 staff
  - 22% faculty
  - 45% full-time staff
  - 33% postgrad

Current Cores at ICBR
- Bioinformatics
- Cytometry
- Electron Microscopy
- Gene Expression & Genotyping
- Monoclonal Antibody
- NextGen DNA Sequencing
- Proteomics & Mass Spectrometry
- Sanger Sequencing

Diverse Environment
- Over 400 services provided
- Multiple diverse platforms
- Diverse user base
- Varying analysis pipelines
- Wide range of data
- Growing data set sizes

Computational Challenges
- Data storage
- Data processing
- Data delivery

Current Computational Environment
- Several storage silos with NFS/SMB (~600 TB)
- Mix of 10 GbE and 1 GbE
- Compute systems of various sizes (~1000 cores)
- Workstations connected over 1 GbE
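To put the 1 GbE and 10 GbE figures in perspective, here is a rough back-of-the-envelope sketch of how long it takes to move a dataset at line rate. The function name and the `efficiency` parameter are illustrative assumptions, not part of the presentation. Real throughput over NFS/SMB is normally lower than the nominal link speed.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb terabytes (decimal, 10**12 bytes) over a link.

    efficiency is an assumed fraction of the nominal link speed actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for gbps in (1, 10):
        print(f"1 TB over {gbps} GbE: {transfer_hours(1, gbps):.2f} h")
```

At full line rate a single terabyte needs more than two hours over 1 GbE, which is the link speed the workstations use.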

Current Storage
- Segmented
- Slow
- Not high availability

Current Data Delivery Methods
- Hardware-encrypted USB drives
- University-provided file transfer service (5 GB maximum single-file size)
- Clients' personal USB drives for self-service instruments
- Various unsupported options…

iRODS at ICBR
- Set up a system that can store all instrument data
- Maintain archive of all of the instrument data
- Check out data for analysis
- Check results back in
- Manage permissions
- Electronic data delivery
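The check-out, check-in, and permission steps map naturally onto the standard iRODS icommands (`iget`, `iput`, `ichmod`). The sketch below only builds the command lines and does not run them. The zone, collection paths, and user name are hypothetical, and the slides do not say which client tools ICBR actually uses.

```python
import shlex


def checkout_cmd(irods_coll: str, local_dir: str) -> list[str]:
    # iget -r copies a collection recursively to local disk; -K verifies checksums.
    return ["iget", "-r", "-K", irods_coll, local_dir]


def checkin_cmd(local_dir: str, irods_coll: str) -> list[str]:
    # iput -r uploads a directory recursively; -K verifies checksums after transfer.
    return ["iput", "-r", "-K", local_dir, irods_coll]


def grant_read_cmd(user: str, irods_path: str) -> list[str]:
    # ichmod -r applies the access level recursively to a collection.
    return ["ichmod", "-r", "read", user, irods_path]


if __name__ == "__main__":
    # Hypothetical paths and user, for illustration only.
    steps = [
        checkout_cmd("/exampleZone/home/core/run42", "/scratch/run42"),
        checkin_cmd("/scratch/run42_results", "/exampleZone/home/core/run42_results"),
        grant_read_cmd("client1", "/exampleZone/home/core/run42_results"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

Keeping each command as an argument list means it can later be passed to `subprocess.run` without shell quoting problems.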

DDN at ICBR
- Rapid growth of instrument output dataset size
- SFA 12KXE
  - Fast and scalable storage
  - Ability to run custom images on storage
  - GPFS
  - Ability to run some compute tasks directly on storage
  - High availability

iRODS on DDN
- Scalable, high performance, high availability
- Set up iRODS on the VMs that run on SFA 12KXE
- All of the VMs see common storage namespace
- A pool of iRODS resource servers running on the VMs
  - Each has full view of the namespace
  - Microservices take advantage of built-in compute
- Use SSD from the SFA 12KXE to store iCAT
  - Run on MySQL Cluster edition
  - Prevents single point of failure
  - Distributed for performance

iRODS on DDN (architecture diagram)
- VM-0, VM-1, VM-2, VM-3: each runs iRODS, iDORP, iCAT, GPFS, and the SFA Driver
- Underlying storage: RAID DISK (two groups) and RAID SSD (two groups)

iRODS at ICBR
- Automated instrument data and metadata ingestion into iRODS
  - Set up most used instruments
  - Set up new instruments as they arrive
- Provide download link to clients via some LIMS
- Create custom Web front end
  - Uniform look
  - Data portal with identical interface for all of the Cores
  - Custom views supporting mobile platforms
- Have option to transfer client data to other University compute resources
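One way to picture the automated ingestion step is as a plan: walk an instrument's output directory, compute a checksum for each file, and pair each upload with the metadata to attach. The sketch below is a minimal illustration under assumptions. The attribute names (`instrument`, `core`, `sha256`, `size_bytes`), the function names, and the paths are hypothetical. It only generates `iput` and `imeta add -d` command lines and does not contact an iRODS server.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large instrument files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def plan_ingest(run_dir: str, target_coll: str, instrument: str, core: str):
    """Return a list of (destination, commands) pairs for every file under run_dir."""
    root = Path(run_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        dest = f"{target_coll.rstrip('/')}/{rel}"
        # Hypothetical attribute names; a real deployment would define its own schema.
        avus = {
            "instrument": instrument,
            "core": core,
            "sha256": sha256_of(path),
            "size_bytes": str(path.stat().st_size),
        }
        cmds = [["iput", "-K", str(path), dest]]
        cmds += [["imeta", "add", "-d", dest, attr, value] for attr, value in avus.items()]
        plan.append((dest, cmds))
    return plan
```

Files in subdirectories need their target collections to exist first (for example via `imkdir -p`), which this sketch leaves out for brevity.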