UAB Research Computing Day


UAB Research Computing Day, September 15, 2011. Challenges in data acquisition, storage and processing for NIH-funded studies. Stephen Barnes, PhD, Department of Pharmacology & Toxicology and the Targeted Metabolomics and Proteomics Laboratory.

Synopsis: Proteomic and genomic “cats”; federal rules for maintaining data from funded grants; economics of storing and transferring data; local cloud vs commercial cloud; media for cloud storage.

How many cats in 2011? If we start with 2 cats (male and female) in 2001 and cats breed every 3 months, producing a litter of 4 kittens, how many cats will we have today? At 3 months, we’ll have 4 kittens. At 6 months, we’ll have 8 kittens. At 12 months, we’ll have 32 kittens (expanded by 16). At 5 years, we’ll have 16^5 kittens = 2^(4*5) = about one million. At 10 years, we’ll have 16^10 kittens = 2^40. Convert to OMG units, or as they’re better known in IT departments, terabytes.
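The doubling arithmetic on this slide can be checked with a short sketch. It assumes, as the slide's own numbers do, that the population simply doubles every 3 months (a 16-fold increase per year) and that no cats are lost; the function and constant names are illustrative only.

```python
QUARTERS_PER_YEAR = 4  # one doubling every 3 months (assumption taken from the slide)


def growth_factor(years: int) -> int:
    """Fold-increase after `years` if the population doubles every quarter."""
    return 2 ** (QUARTERS_PER_YEAR * years)


# 2**40 bytes is one terabyte in binary units (a tebibyte).
BYTES_PER_TEBIBYTE = 1024 ** 4

for years in (1, 5, 10):
    print(f"{years:>2} years: {growth_factor(years):,}-fold")
```

Ten years of quarterly doubling gives the same number as the count of bytes in a binary terabyte, which is the point of the slide's "OMG units" remark.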

The problem: Analysis is exploding. Imaging, NextGen sequencing, proteomics and metabolomics have created a world where there are “routine” TB datasets.

Recent increases in speed of DNA sequencing: Since 2005, DNA sequencing rates have increased 500-fold and are headed higher. The annualized increase is 4.5-fold, i.e., about 20-fold (4.5 × 4.5) every two years, an order of magnitude more than Moore’s law in computing. Mardis ER, Nature 470:198, 2011.

Expense of Deep Sequencing: Deep DNA sequencing is leading to terabytes of data per month. If the genomes of the population of the USA were sequenced, this would amount to >1,000 petabytes of data. 1 terabyte of storage is ~$100. If simple scaling is possible, then it would cost $100 million a year (the expected lifetime of the drives) to store the data, or approaching $7 billion for the average person’s lifetime (assuming no further population increase). And that’s without backups, without using the data in any way, and assuming that Microsoft doesn’t structure your saved files!
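The scaling estimate on this slide can be reproduced with a few lines. This is a sketch under the slide's stated figures (1,000 PB of data, about $100 per TB, drives replaced yearly); the 70-year lifespan is an assumption chosen here because it matches the "approaching $7 billion" figure, and the names are illustrative.

```python
TB_PER_PB = 1_000  # decimal units, as the slide's arithmetic implies


def annual_storage_cost_usd(petabytes: float, usd_per_tb: float = 100.0) -> float:
    """Cost of buying enough drives once, assuming simple linear scaling."""
    return petabytes * TB_PER_PB * usd_per_tb


def lifetime_storage_cost_usd(petabytes: float, years: int = 70) -> float:
    """Cost if the drives are replaced every year for `years` years.

    The 70-year default is an assumed lifespan, not a figure from the slide.
    """
    return annual_storage_cost_usd(petabytes) * years


print(f"Per year:     ${annual_storage_cost_usd(1_000):,.0f}")
print(f"Per lifetime: ${lifetime_storage_cost_usd(1_000):,.0f}")
```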

NIH requirements for data collected from funded grants: NIH requires you to make available datasets created from federally supported studies. How long should this be after termination of the grant? Federal rules about “data” state that you must keep them for 3 years (http://grants.nih.gov/grants/policy/nihgps_2010/nihgps_ch8.htm#_Toc271264950). So, who pays for keeping TB datasets? The investigator doesn’t have financial authority once a grant is over. So, UAB?

Does the “cloud” present a viable option? Yes, if we can transfer the data to other computers with greater processing power and cheaper long-term storage, but...

Whither the cloud? It depends on the size and capacity of the pipe from the acquisition computer to the computers in the cloud. If we can get data to the cloud, can the software, particularly commercial software, be used for analysis in the cloud? There is an opportunity for companies to move to a different business model in which there is one, always up-to-date version of their software in the cloud and users pay a small fee each time they use it.
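To make the dependence on the pipe concrete, here is a rough sketch of how long a dataset takes to move at a given link speed. The link speeds and the `efficiency` factor are illustrative assumptions, not measurements of any UAB connection, and real transfers will be slower because of protocol overhead and shared links.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_tb` terabytes over a `link_gbps` Gb/s link.

    `efficiency` is the fraction of the nominal bandwidth actually achieved
    (1.0 means the ideal case with no overhead).
    """
    bits = size_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Hypothetical link speeds chosen for illustration only.
for gbps in (0.1, 1.0, 10.0):
    print(f"1 TB over {gbps:>4} Gb/s: {transfer_hours(1, gbps):6.2f} h")
```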

Principal issues posed at Bio-IT 2011: Can cloud infrastructure support high-throughput data analysis pipelines designed for next-generation sequencing data? “The cloud is a valuable option for small research centers that lack the resources to purchase and maintain in-house infrastructure.” “For large centers like the Broad Institute (or UAB), which require almost constant compute power to manage and move files ranging from 1 gigabyte to 1 terabyte in size, the cloud, at least for the present, does not seem to be a cost-effective option.”

Economics of the cloud: Estimated cost of traditional storage is $3.75 per gigabyte-month; Amazon cost is $0.15 per gigabyte-month. CPU cost is estimated at $2.63-$3.33 per CPU-hour; Microsoft Azure cost is $0.12 per CPU-hour. The cloud, if you can get there, offers substantial savings. From Virtualization and Cloud Computing, Digital Realty Trust, February 2011.
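The size of the savings follows directly from the quoted prices. The sketch below only divides the slide's figures; it makes no claim about current pricing, and the variable names are illustrative.

```python
# Prices as quoted on the slide (2011 figures).
TRADITIONAL_STORAGE_USD_PER_GB_MONTH = 3.75
CLOUD_STORAGE_USD_PER_GB_MONTH = 0.15
TRADITIONAL_CPU_USD_PER_HOUR = (2.63, 3.33)  # low and high estimates
CLOUD_CPU_USD_PER_HOUR = 0.12


def savings_ratio(traditional: float, cloud: float) -> float:
    """How many times cheaper the cloud price is than the traditional price."""
    return traditional / cloud


storage_ratio = savings_ratio(TRADITIONAL_STORAGE_USD_PER_GB_MONTH,
                              CLOUD_STORAGE_USD_PER_GB_MONTH)
cpu_ratios = [savings_ratio(price, CLOUD_CPU_USD_PER_HOUR)
              for price in TRADITIONAL_CPU_USD_PER_HOUR]

print(f"Storage: about {storage_ratio:.0f}x cheaper")
print(f"CPU:     about {cpu_ratios[0]:.0f}x to {cpu_ratios[1]:.0f}x cheaper")
```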

TB storage costs in the Cloud: $0.15/GB/month = $150/TB/month. Annual cost = $1,800/TB/year. Commercially, “life-time storage” is $3,000. For our group, ONE machine generates 2 TB each month. Annualized cost: $78,000. Conclusion: Cloud HD storage is not viable.
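The per-terabyte conversions on this slide can be sketched as follows. The code assumes 1 TB = 1,000 GB (which is what the slide's $150 figure implies) and treats the $3,000 "life-time storage" price as a per-terabyte price, which the slide does not state explicitly. It does not attempt to reproduce the slide's annualized figure, whose derivation is not given.

```python
GB_PER_TB = 1_000  # decimal units, as implied by $0.15/GB -> $150/TB


def usd_per_tb_month(usd_per_gb_month: float) -> float:
    """Convert a per-gigabyte monthly price into a per-terabyte monthly price."""
    return usd_per_gb_month * GB_PER_TB


def usd_per_tb_year(usd_per_gb_month: float) -> float:
    """Yearly cost of keeping one terabyte at a monthly per-gigabyte price."""
    return usd_per_tb_month(usd_per_gb_month) * 12


def breakeven_months(lifetime_usd_per_tb: float, usd_per_gb_month: float) -> float:
    """Months of monthly billing after which a one-off lifetime price is cheaper."""
    return lifetime_usd_per_tb / usd_per_tb_month(usd_per_gb_month)


print(f"${usd_per_tb_month(0.15):,.0f}/TB/month")
print(f"${usd_per_tb_year(0.15):,.0f}/TB/year")
print(f"Lifetime price pays off after {breakeven_months(3_000, 0.15):.0f} months")
```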

Other models to consider: Do we really need to have high-speed access to old data? Is a tape backup system viable? Google still uses it as their long-term storage system. Tape is one tenth of the cost of HD storage. This reduces 1 TB of storage to $15/month or $180/year.

Costs of tape storage (http://www.bitbunker.com/pricing): If an investigator uploads a 200 GB file, this costs $20. It also costs $20 each time it is downloaded. This will be a cost borne by the investigator.
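The transfer charges can be estimated for other file sizes with a small sketch. It assumes the price scales linearly with size at the rate implied by the slide's example ($20 for 200 GB, i.e. $0.10 per GB in either direction); the actual price list may have tiers or minimums that this ignores, and the function name is illustrative.

```python
# Rate implied by the slide's example: $20 per 200 GB transferred.
USD_PER_GB_TRANSFERRED = 20 / 200


def transfer_cost_usd(size_gb: float, downloads: int = 0) -> float:
    """Cost of one upload plus `downloads` later retrievals of the same file.

    Assumes linear pricing and the same rate for uploads and downloads.
    """
    return size_gb * USD_PER_GB_TRANSFERRED * (1 + downloads)


print(f"Upload only:          ${transfer_cost_usd(200):.2f}")
print(f"Upload + 3 downloads: ${transfer_cost_usd(200, downloads=3):.2f}")
```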

The robotic tape storage system: Download time is consistent with the time to get a coffee or a Coke, or take a quick bathroom break.

Summary and the future: Biomedical research is generating volumes of data that strain the current system for data transport and storage. There are two options: (1) gaining access to very fast pipes from UAB NextGen and other UAB data-generating centers to existing regional fast pipes and on to the commercial cloud; (2) creating very fast pipes from UAB NextGen and other UAB data-generating centers to a UAB cloud. Storage costs of large data sets have become an economic heavyweight. Is a tape system the solution for NIH data? Software may not be transferable to the cloud. Security issues need good solutions for all parties.

Acknowledgements: David Shealy, PhD; Chiquito Crasto, PhD; Jonas Almeida, PhD; John-Paul Robinson; Scott Sweeney; Landon Wilson; Mikako Kawai; Chandrahas Narne. NCCAM R21 AT004661; NCRR S10 RR027822.