
Ben Rogers ITS – Research Services 1.  Data Awareness  Data Management  Data Storage  Campus Resources  Questions 2.


1 Ben Rogers, ITS – Research Services

2
• Data Awareness
• Data Management
• Data Storage
• Campus Resources
• Questions

3
• Sequencing technology changes faster than IT
• Understand the data you will produce
• Understand the data you will keep
• Understand how the data will move

4
• Understand the size of the data each instrument produces
  ◦ How often will you collect this data?
  ◦ What IT resources are needed for each data set?
• How will you handle:
  ◦ Raw data
  ◦ Intermediate data
  ◦ Derived data
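The sizing questions above can be turned into a rough yearly estimate per instrument. This is a minimal sketch, assuming intermediate and derived data can be approximated as fixed multiples of the raw data; the class name, instrument name, and every number in the example are hypothetical placeholders to be replaced with your own measurements.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    """Output profile of one instrument (all values are hypothetical inputs)."""
    name: str
    raw_gb_per_run: float       # size of the raw data from one run
    runs_per_year: int          # how often the data is collected
    intermediate_factor: float  # intermediate data as a multiple of raw
    derived_factor: float       # derived data as a multiple of raw


def yearly_storage_gb(inst: Instrument) -> dict:
    """Estimated GB per year for raw, intermediate, and derived data."""
    raw = inst.raw_gb_per_run * inst.runs_per_year
    return {
        "raw": raw,
        "intermediate": raw * inst.intermediate_factor,
        "derived": raw * inst.derived_factor,
    }


if __name__ == "__main__":
    seq = Instrument("sequencer-A", raw_gb_per_run=600, runs_per_year=50,
                     intermediate_factor=2.0, derived_factor=0.1)
    usage = yearly_storage_gb(seq)
    for kind, gb in usage.items():
        print(f"{kind:>12}: {gb / 1000:.1f} TB/year")
    print(f"{'total':>12}: {sum(usage.values()) / 1000:.1f} TB/year")
```

Splitting the estimate by data class makes it easier to attach a different retention rule to each class later.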

5
• You must decide what data to keep
  ◦ For how long?
  ◦ How will it be stored?
• Instead of storing the data, is it cheaper to:
  ◦ Rerun the experiment?
  ◦ Rerun the analysis?
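The keep-or-rerun question is ultimately a cost comparison. The sketch below assumes storage cost grows linearly with size and time, and that the rerun costs are known lump sums; all figures in the example (cost per TB-year, rerun costs) are hypothetical.

```python
from __future__ import annotations


def storage_cost(data_tb: float, years: float, cost_per_tb_year: float) -> float:
    """Cost of keeping data_tb terabytes for the given number of years."""
    return data_tb * years * cost_per_tb_year


def cheapest(options: dict[str, float]) -> str:
    """Name of the lowest-cost option."""
    return min(options, key=options.get)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    options = {
        "keep the data": storage_cost(data_tb=2.0, years=5, cost_per_tb_year=300.0),
        "rerun the analysis": 500.0,     # compute plus staff time (assumed)
        "rerun the experiment": 5000.0,  # sample prep plus sequencing (assumed)
    }
    print(cheapest(options))
```

Rerunning the analysis is only an option if the raw data it starts from is itself kept, so that storage cost belongs in the comparison too.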

6
• Data captured by the instrument must be moved
• Terabytes of data may be involved
• Moving terabytes of data across networks is non-trivial
  ◦ The network is not always the bottleneck
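Why the network is not always the bottleneck: a single copy can go no faster than its slowest stage (source disk read, network, destination disk write). A minimal sketch, using decimal units (1 TB = 1,000,000 MB); the ~1200 MB/second figure for a 10Gb link is an assumption, while 100 MB/second (desktop hard drive) and 600 MB/second (fastest campus network filesystem) come from the transfer-speed table in this deck.

```python
def effective_rate_mb_s(source_read: float, network: float, dest_write: float) -> float:
    """A single copy can go no faster than its slowest stage."""
    return min(source_read, network, dest_write)


def transfer_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes (1 TB = 1,000,000 MB) at rate_mb_s."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


if __name__ == "__main__":
    # Desktop hard drive -> assumed ~1200 MB/second 10Gb link -> fast network filesystem.
    rate = effective_rate_mb_s(source_read=100, network=1200, dest_write=600)
    print(f"bottleneck rate: {rate} MB/second")  # limited by the desktop disk, not the network
    print(f"1 TB takes about {transfer_hours(1, rate):.1f} hours")
```

Upgrading the network in this example would not shorten the transfer at all; only a faster source disk would.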

7
• Common data movements
  ◦ Instrument to local capture storage
  ◦ Capture storage to shared storage
  ◦ Shared storage to HPC resource
  ◦ Shared storage to desktop
  ◦ Shared storage to backup/replication
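Each of these movements is a separate copy, so the time for data to reach its final destination adds up across hops. This sketch assumes each hop starts only after the previous one finishes and uses hypothetical per-hop rates; replace them with rates you have measured.

```python
# Hypothetical sustained rates (MB/second) for each hop; replace with measured values.
HOPS = [
    ("Instrument -> local capture storage", 100.0),
    ("Capture storage -> shared storage", 80.0),
    ("Shared storage -> HPC resource", 120.0),
    ("Shared storage -> backup/replication", 60.0),
]


def hop_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours for one hop, using 1 TB = 1,000,000 MB."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


def total_hours(size_tb: float, hops) -> float:
    """Total time if each hop starts only after the previous one finishes."""
    return sum(hop_hours(size_tb, rate) for _, rate in hops)


if __name__ == "__main__":
    for name, rate in HOPS:
        print(f"{name:40s} {hop_hours(1, rate):5.2f} h")
    print(f"{'Total for 1 TB':40s} {total_hours(1, HOPS):5.2f} h")
```

Hops that leave shared storage (to HPC, desktop, or backup) can overlap in practice, so the sequential total is a pessimistic upper bound.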

8
• Globus Online – fastest for big files, but requires GridFTP
• scp – clients include Fetch (Mac) and WS_FTP (Windows)
• Network drive file copy – slowest, but simplest
• External hard drive – reasonably fast, but requires physical movement
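The trade-offs above can be read as a simple decision rule. This is a rough heuristic sketch only: the 100 GB cut-off for "big files" and the function and parameter names are hypothetical, and the ordering simply follows the slide's descriptions (Globus fastest but needs GridFTP, a network drive copy slowest but simplest).

```python
def suggest_mechanism(size_gb: float, gridftp_available: bool,
                      can_carry_drive: bool) -> str:
    """Pick a transfer mechanism using the slide's rules of thumb."""
    BIG_GB = 100  # hypothetical threshold for "big files"; tune for your environment
    if size_gb >= BIG_GB and gridftp_available:
        return "Globus Online"            # fastest for big files, needs GridFTP
    if size_gb >= BIG_GB and can_carry_drive:
        return "External hard drive"      # reasonably fast, needs physical movement
    if size_gb >= BIG_GB:
        return "scp"                      # faster than a network drive copy
    return "Network drive file copy"      # slowest, but simplest


if __name__ == "__main__":
    print(suggest_mechanism(500, gridftp_available=False, can_carry_drive=True))
```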

9
Transfer mechanism                      Transfer speed
External hard drive                     100 MB/second read + 100 MB/second write + walking time
Gigabit Ethernet                        Up to 120 MB/second
Typical desktop hard drive              100 MB/second
Typical desktop SSD                     300 MB/second
GridFTP over 1Gb                        120 MB/second
CIFS over 1Gb                           60-80 MB/second
scp over 1Gb                            60-100 MB/second
Fastest network filesystem on campus    600 MB/second single copy; 6 GB/second aggregate

Moving 1TB can easily take 3 hours or more!
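The "3 hours or more" figure follows directly from the table: at 100 MB/second, 1 TB (taken here as 1,000,000 MB) needs 10,000 seconds, about 2.8 hours. The sketch below applies the same arithmetic to each row; splitting ranges into low/high entries and the 15-minute default walking time are assumptions for illustration.

```python
# Rates (MB/second) from the transfer-speed table; ranges are split into low/high.
RATES_MB_S = {
    "Gigabit Ethernet (best case)": 120,
    "GridFTP over 1Gb": 120,
    "CIFS over 1Gb (low)": 60,
    "CIFS over 1Gb (high)": 80,
    "scp over 1Gb (low)": 60,
    "scp over 1Gb (high)": 100,
    "Fastest campus network filesystem (single copy)": 600,
}


def hours_to_move(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb terabytes at a sustained rate (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


def external_drive_hours(size_tb: float, read_mb_s: float = 100,
                         write_mb_s: float = 100, walking_hours: float = 0.25) -> float:
    """Copy onto the drive, carry it over, copy it off.

    walking_hours is a hypothetical placeholder for the physical trip.
    """
    copy_on = hours_to_move(size_tb, write_mb_s)   # writing to the external drive
    copy_off = hours_to_move(size_tb, read_mb_s)   # reading back off the external drive
    return copy_on + walking_hours + copy_off


if __name__ == "__main__":
    for name, rate in RATES_MB_S.items():
        print(f"{name:50s} {hours_to_move(1, rate):5.2f} h per TB")
    label = "External hard drive (write + walk + read)"
    print(f"{label:50s} {external_drive_hours(1):5.2f} h per TB")
```

With binary units (1 TiB) every time is roughly 10% longer.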

10
• Data management is very important
• There are many solutions
  ◦ Wiki, spreadsheet, database, etc.
  ◦ Campus options: Campus Wiki, SharePoint, REDCap, Galaxy
• Make sure you have backups!
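Of the options listed, a small database is easy to sketch. The example below uses Python's built-in `sqlite3` module to track where each data set lives and whether it has been backed up; the table layout, column names, and paths are hypothetical, not a schema used by any of the campus services named above.

```python
import sqlite3


def make_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tracking database with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS datasets (
            sample_id TEXT NOT NULL,
            kind      TEXT NOT NULL CHECK (kind IN ('raw', 'intermediate', 'derived')),
            location  TEXT NOT NULL,
            size_gb   REAL,
            backed_up INTEGER NOT NULL DEFAULT 0
        )""")
    return conn


def not_backed_up(conn: sqlite3.Connection) -> list:
    """Data sets that still need a backup."""
    rows = conn.execute(
        "SELECT sample_id, kind, location FROM datasets "
        "WHERE backed_up = 0 ORDER BY sample_id")
    return rows.fetchall()


if __name__ == "__main__":
    conn = make_tracker()
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?, ?)", [
        ("S001", "raw", "/path/to/shared/S001", 120.0, 1),           # hypothetical paths
        ("S001", "derived", "/path/to/shared/S001.bam", 40.0, 0),
    ])
    conn.commit()
    print(not_backed_up(conn))
```

If the tracker is kept in a file rather than in memory, that file needs a backup too.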

11
• Cheap storage is easy
  ◦ 2TB external USB drive
• Big storage is harder
  ◦ 50TB storage server
• Big, fast, cheap, safe storage is much harder
  ◦ 50TB storage server pair
  ◦ Checksums
  ◦ High-performance network
  ◦ Backups
• Cost of storage does not scale linearly
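Checksums are what let you confirm that a copy (to a server pair, a backup, or an external drive) matches the original. A minimal sketch using SHA-256 from Python's standard `hashlib`, reading in chunks so multi-gigabyte files do not need to fit in memory; the file names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_is_intact(original: str, copy: str) -> bool:
    """Compare checksums of the source and destination after a transfer."""
    return sha256_of(original) == sha256_of(copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "reads.fastq")        # hypothetical file name
        dst = os.path.join(d, "reads_copy.fastq")
        with open(src, "wb") as f:
            f.write(b"@read1\nACGT\n+\nIIII\n")
        shutil.copyfile(src, dst)
        print("copy intact:", copy_is_intact(src, dst))
```

Storing the digest alongside the data lets the same check be repeated later to detect silent corruption.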


13
• Where you store the data can affect how fast you can analyze it.
• On Helium, during testing we saw a difference of over 100% in BWA analysis time depending on where the data was stored.
• If doing analysis on your desktop, fast storage will likely improve NGS analysis time.
• Galaxy is being optimized to take advantage of this.
• If running directly on a cluster, ask for recommendations.
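One crude way to compare storage locations is to time a sequential read of the same large file stored on each. This is a sketch, not a substitute for timing the real analysis (BWA's access pattern differs from a single sequential read). Caveat: a file that was recently read or written may be served from the operating system's page cache, which measures memory rather than disk, so use files larger than RAM or ones not touched recently. The script name and paths in the usage comment are hypothetical.

```python
import os
import sys
import time


def read_throughput_mb_s(path: str, chunk_size: int = 8 * 1024 * 1024) -> float:
    """Sequential read rate of one file, in MB/second (1 MB = 1,000,000 bytes)."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return size / 1_000_000 / elapsed


if __name__ == "__main__":
    # Pass one large data file per storage location you want to compare, e.g.
    #   python read_rate.py /path/on/desktop/sample.fastq /path/on/scratch/sample.fastq
    for path in sys.argv[1:]:
        print(f"{path}: {read_throughput_mb_s(path):.0f} MB/second")
```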

14
• Galaxy
• REDCap
• Helium
  ◦ Colocation
  ◦ /nfsscratch – 110TB
  ◦ /glusterscratch – 146TB
• R Drive
• ITS Research Data Storage Service Pilot
• Lab/shared ZFS systems
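Before staging a large data set onto shared scratch space, it is worth checking that it will fit. A minimal sketch using `shutil.disk_usage` from the standard library; the 10% headroom and the 2 TB data set size are hypothetical, and the mount points are the Helium scratch paths listed above. Note that `disk_usage` reports free space on the whole filesystem, not any per-user quota.

```python
import shutil


def fits_on(path: str, needed_bytes: int, headroom: float = 0.10) -> bool:
    """True if needed_bytes plus a safety margin fits in the free space at path.

    headroom is a hypothetical 10% margin for intermediate files.
    """
    free = shutil.disk_usage(path).free
    return needed_bytes * (1 + headroom) <= free


if __name__ == "__main__":
    dataset_bytes = 2 * 10**12  # a hypothetical 2 TB data set
    for scratch in ("/nfsscratch", "/glusterscratch"):
        try:
            print(scratch, "fits" if fits_on(scratch, dataset_bytes) else "does not fit")
        except FileNotFoundError:
            print(scratch, "is not mounted on this machine")
```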

15
• Today
  ◦ All data in Galaxy should be considered transient: it is deleted after 30 days
  ◦ Galaxy is a data processing platform only
  ◦ Please back up all data that is valuable to you!
• Future
  ◦ Solutions to allow longer-term storage of data
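Given the 30-day deletion window, a simple reminder can flag data sets whose deletion date is approaching so they are backed up in time. This sketch assumes the 30 days are counted from the upload date (the exact rule is not stated here) and that you keep your own record of upload dates; the 7-day warning window and file names are hypothetical.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # Galaxy data is deleted after 30 days


def deletion_date(uploaded: date) -> date:
    """Assumed deletion date: 30 days after upload."""
    return uploaded + timedelta(days=RETENTION_DAYS)


def needs_backup_soon(datasets: dict, today: date, warn_days: int = 7) -> list:
    """Names whose assumed deletion date is within warn_days of today (or already past)."""
    return sorted(name for name, uploaded in datasets.items()
                  if deletion_date(uploaded) - today <= timedelta(days=warn_days))


if __name__ == "__main__":
    uploads = {"run1.fastq": date(2012, 7, 1), "run2.fastq": date(2012, 7, 20)}
    print(needs_backup_soon(uploads, today=date(2012, 7, 25)))
```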

16
• Increased availability of 10Gb networking
• Research Data Storage Service
• Backup service
• Cloud storage
• Galaxy Data Libraries



