The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store.


1 The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store

2 Welcome to the iPlant Data Store. Manage and share your data across iPlant's tools and services.

3 Working with Big Data. Challenges. "Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set." (Wikipedia, http://en.wikipedia.org/wiki/Big_data)

4 Working with Big Data. Challenges: rapid technological progress.

5 Working with Big Data. Challenges: biology is more than sequence data. Biologists work with and require access to diverse data types.

6 Working with Big Data. Why isn't saving/moving/copying big data as simple as using the tools we already have?

7 Working with Big Data. Challenges: moving to a big data mindset. Changes in scale introduce quantitative and qualitative complications: difficult/slow transfers; expense for storage/backup; difficult to share and publish; metadata; analysis.

8 iPlant Data Store Overview. The Data Store services all iPlant platforms. Access your data from multiple iPlant services. Automatic data backup (redundant between University of Arizona and University of Texas). Default 100GB allocation; >1TB allocations available with justification.

9 iPlant Data Store Overview. Avoid reinventing the wheel. iRODS is an open-source data management system. iRODS supports many data-intensive projects like NSF TeraGrid, Large Synoptic Survey Telescope, etc. iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems.

10 iPlant Data Store Overview. Benefits: get science done, reproducibility, productivity. Store any type of files related to your research. An evolving “Data Commons” lets you access important datasets. Metadata captures information needed for reproducibility. Automatic backup and accessibility support your project’s data management plan. iRODS makes high-speed transfers possible (100GB in ~30min)*. Share data instantly with collaborators within iPlant.

11 iPlant Data Store Overview. Multiple ways to access. Point-and-click: Discovery Environment, iDrop Desktop. Command line: iCommands.
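For the command-line route, a minimal Python sketch of how iCommands calls might be assembled is given here. It only builds and prints the command lines; nothing is executed unless `run()` is called. The collection path, file names, and metadata attribute are hypothetical placeholders, and the helper function names are illustrative rather than part of any iPlant tool. The flags used (`-P` for progress output, `-K` for checksum verification, `imeta add -d` for data-object metadata) are standard iCommands options, but check them against your installed version.

```python
import shutil
import subprocess

# Hypothetical placeholder: replace with your own Data Store collection.
REMOTE_HOME = "/iplant/home/your_username"


def upload_cmd(local_path, remote_dir=REMOTE_HOME):
    """Build an iput call: copy a local file into a Data Store collection."""
    return ["iput", "-P", "-K", local_path, remote_dir]


def download_cmd(remote_path, local_dir="."):
    """Build an iget call: copy a Data Store file to a local directory."""
    return ["iget", "-P", "-K", remote_path, local_dir]


def add_metadata_cmd(remote_path, attribute, value):
    """Build an imeta call: attach an attribute/value pair to a data object."""
    return ["imeta", "add", "-d", remote_path, attribute, value]


def run(cmd):
    """Run a command only if the iCommands client is installed."""
    if shutil.which(cmd[0]) is None:
        print("iCommands not found; would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)


# Print the command lines for a hypothetical file without running them.
remote_file = REMOTE_HOME + "/reads.fastq"
for cmd in (
    upload_cmd("reads.fastq"),
    add_metadata_cmd(remote_file, "organism", "Arabidopsis thaliana"),
    download_cmd(remote_file),
):
    print(" ".join(cmd))
```

Keeping each command as a list of arguments avoids shell quoting problems with file names that contain spaces.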

12 iPlant Data Store Overview. Some important things we will not “see” in the demo. Replication between Arizona and Texas. Key component of your NSF data management. Worry free!

13 iPlant Data Store Overview. Some important things we will not “see” in the demo.

Source              Destination   Copy Method   Time (seconds)
CD                  My Computer   cp            320
Berkeley Server     My Computer   scp           150
External Drive      My Computer   cp            36
USB2.0 Flash        My Computer   cp            30
iPlant Data Store   My Computer   iget          18
My Computer                       cp            15

Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley. 100GB: 29m15s (1 GB / 17.5 seconds).
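As a rough check on the two timing figures quoted on this slide, the arithmetic can be sketched in Python. The function names are illustrative, and decimal units (1 GB = 1000 MB) are an assumption, since the slide does not say which convention was used.

```python
def throughput_mb_per_s(size_gb: float, seconds: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000.0 / seconds


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


# Figures quoted on the slide.
big_transfer_s = to_seconds(29, 15)             # 100 GB in 29m15s
rate_100gb = throughput_mb_per_s(100, big_transfer_s)
rate_1gb = throughput_mb_per_s(1, 17.5)         # 1 GB per 17.5 seconds

print(f"100 GB in {big_transfer_s} s: {rate_100gb:.1f} MB/s")
print(f"1 GB in 17.5 s: {rate_1gb:.1f} MB/s")
```

Under that assumption both figures work out to roughly 57 MB/s, which suggests they describe the same measurement at two scales.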

14 iPlant Data Store Overview. Some important things we will not “see” in the demo. One of the complications of big data transfers is that you will always be limited by your local connection and institutional policies. http://www.speedtest.net/
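To see how much the local connection matters, here is a small Python sketch that estimates best-case transfer time from a measured link speed (for example, one reported by a speed test). The link speeds in the loop are hypothetical examples, the function names are illustrative, and the estimate assumes decimal units and no protocol overhead, so real transfers will be slower.

```python
def estimate_transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Best-case time to move size_gb over a link of the given speed.

    Assumes decimal units (1 GB = 8000 megabits) and ignores protocol
    overhead, congestion, and institutional rate limits.
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    return size_gb * 8000.0 / link_mbit_per_s


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as e.g. '2h13m20s'."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


# Hypothetical link speeds in Mbit/s, applied to a 100 GB data set.
for mbps in (10, 100, 1000):
    eta = estimate_transfer_seconds(100, mbps)
    print(f"{mbps:>5} Mbit/s: {format_duration(eta)}")
```

A tenfold difference in link speed translates directly into a tenfold difference in transfer time, regardless of how fast the Data Store itself can serve the data.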

15 iPlant Data Store Overview. Hands-on demo.

16 By the end of this demo you should be able to: import files from a URL; upload/download “large” files; share data via a public link and via the Discovery Environment; view and manage file metadata.

17 iPlant Data Store Overview. User perspectives and possible applications. Bench Scientist: uploads all of his fastq files along with 50 GB of root growth videos; shares his analysis results with his thesis advisor. Bioinformatician: created a metadata template for assembled genomes that her students and collaborators will place in a shared folder; uses public links in the supplemental materials of her publications. Core Facilities: developed a script to automate transfer of data to core users; uses a shared folder to make large datasets accessible. Images from personas based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies, PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1003496

18 Keep asking: ask.iplantcollaborative.org

19 The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).

