The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store – Managing Your ‘Big’ Data.


1 The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop iPlant Data Store – Managing Your ‘Big’ Data

2 Welcome to the iPlant Data Store. Manage and share your data across iPlant's tools and services.

3 Working with Big Data. Challenges: the scope and scale of life sciences data continue to grow. Big Data: data sets whose size and complexity is beyond the capabilities of commonly used tools to capture, manage, and process the data within a tolerable time frame. Big Data: constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in single data sets, with different types of data sets potentially deeply intertwined. - Wikipedia (http://en.wikipedia.org/wiki/Big_data)

4 Working with Big Data. Challenges (sequencing example): data generation is cheaper and faster.

5 Working with Big Data. Challenges: biology encompasses more than sequence data. Biologists work with diverse data types: Advanced Imaging, Geospatial, Network.

6 Working with Big Data. Challenges: changes in data require changes in tools. Changes in scale introduce quantitative and qualitative challenges. Transfer: difficult/slow; Store/backup: expensive; Share/publish: all of the above; Analyze: tools complex, format & reformat; Understand: don’t forget your metadata.

7 iPlant Data Store Overview. The Data Store services all iPlant platforms. Access your data from multiple iPlant services; automatic backup (redundant between University of Arizona and University of Texas); default 100 GB allocation, >1 TB allocations available with justification.

8 iPlant Data Store Overview. Avoid reinventing the wheel. iRODS (integrated Rule-Oriented Data System) is an established, scalable, open-source data management system. iRODS supports many data-intensive projects. iRODS abstracts data services from data storage to facilitate executing services across heterogeneous, distributed storage systems. Critical for effective data management; works under the hood; Folder = Collection.

9 iPlant Data Store Overview. Benefits: Get Science Done, Reproducibility, Productivity. Store files of any type related to your research; access key data sets in “Data Commons”; capture data about data in metadata; base your data management plan on iPlant’s automatic backup and accessibility; take advantage of iRODS high-speed transfers (100 GB in ~30 min)*; share any data instantly from within iPlant.

10 iPlant Data Store Overview. Multiple ways to access for varied skill levels. Point-and-click: Discovery Environment, iDrop Desktop. Command line: iCommands.

11 iPlant Data Store Overview. Some important things we will not “see” in the demo.
Data Backups: replication between Texas and Arizona. Key component of your data management. Worry free!
Data Transfers (source -> destination, copy method: time in seconds):
CD -> My Computer, cp: 320
Berkeley Server -> My Computer, scp: 150
External Drive -> My Computer, cp: 36
USB2.0 Flash -> My Computer, cp: 30
iPlant Data Store -> My Computer, iget: 18
My Computer, cp: 15
Close to optimum conditions; transfer between Univ. of Arizona and UC Berkeley: 100 GB in 29m15s, 1 GB per 17.5 seconds.
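The transfer figures quoted on this slide can be sanity-checked with a little arithmetic. The sketch below uses only the two numbers given (100 GB in 29 min 15 s); the function name and the use of decimal megabytes are assumptions made for illustration, and real transfers will vary with local network conditions.

```python
# Figures quoted on the slide: 100 GB transferred in 29 min 15 s.
TOTAL_GB = 100
TOTAL_SECONDS = 29 * 60 + 15  # 1755 seconds

# Seconds needed per gigabyte at the quoted rate.
seconds_per_gb = TOTAL_SECONDS / TOTAL_GB

# Approximate throughput, assuming 1 GB = 1000 MB (decimal units).
mb_per_second = 1000 / seconds_per_gb


def estimate_transfer_seconds(size_gb: float) -> float:
    """Estimate transfer time at the quoted near-optimum rate (hypothetical helper)."""
    return size_gb * seconds_per_gb


if __name__ == "__main__":
    print(f"{seconds_per_gb:.2f} s per GB, about {mb_per_second:.0f} MB/s")
    print(f"50 GB would take about {estimate_transfer_seconds(50) / 60:.1f} minutes")
```

The per-gigabyte figure agrees with the "1 GB per 17.5 seconds" stated on the slide, which was measured under close to optimum conditions.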

12 iPlant Data Store Overview. Some important things we will not “see” in the demo: local connections and institutional policies limit data transfers (http://www.speedtest.net/).

13 iPlant Data Store Overview. User perspectives and potential applications. Personas: Bench Scientist, Bioinformatician, Core Facilities. Uploads all of his fastq files along with 50 GB of root growth videos; shares his analysis results with his thesis advisor. Creates a metadata template for assembled genomes her students and collaborators will place in a shared folder; uses public links in supplemental materials for her publications. Develops a script to automate transfer of data to core users; uses a shared folder to make large datasets accessible. Images from personas based on: Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies, PLOS Computational Biology, DOI: 10.1371/journal.pcbi.1003496

14 Hands-on demo iPlant Data Store Overview

15 Hands-on demo: Managing “Big” Data. By the end of this demo you will know how to: import files from a URL; upload/download large files; share data; view and manage file metadata.

16 iPlant Data Store Overview. Hands-on demo: Managing “Big” Data. Packet pages 8-13 (iDrop, Sharing) & 29-32 (URL Import, iDrop Lite, Managing and Adding Metadata). Demo Components: 1. Upload/Download files (via DE, via iDrop, via iCommands) 2. Import files from a URL (http://goo.gl/w7sIWL) 3. Share data via a public link via the DE (practice with a neighbor) 4. View, generate and manage metadata
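For the iCommands route in step 1, a scripted upload or download might be sketched as below. This is only an illustration, not the workshop's own script: `iput` and `iget` are the iRODS commands for upload and download, `-r` requests a recursive transfer, and the helper names and the collection path are hypothetical placeholders. The sketch only builds the command lines; it does not run them, so it works without iCommands installed.

```python
import shlex
from typing import List


def build_iput_command(local_path: str, collection: str, recursive: bool = False) -> List[str]:
    """Build an `iput` command line to upload a local file or folder (hypothetical helper)."""
    cmd = ["iput"]
    if recursive:
        cmd.append("-r")  # recurse into folders
    cmd += [local_path, collection]
    return cmd


def build_iget_command(data_path: str, local_dir: str, recursive: bool = False) -> List[str]:
    """Build an `iget` command line to download from the Data Store (hypothetical helper)."""
    cmd = ["iget"]
    if recursive:
        cmd.append("-r")
    cmd += [data_path, local_dir]
    return cmd


if __name__ == "__main__":
    # Placeholder collection path; substitute your own Data Store folder.
    upload = build_iput_command("reads", "/iplant/home/your_username/reads", recursive=True)
    print(" ".join(shlex.quote(part) for part in upload))
```

To actually run a command built this way, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where iCommands is installed and initialized.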

17 Time for Summaries and Tips? iPlant Data Store Overview

18 Searching in the Discovery Environment. Basic search bar searches all files and folders where you have permission. Advanced search allows searching based on metadata, permissions, and share status. Create auto-updated ‘smart’ folders based on searches.

19 Summary: Upload and Download in the Discovery Environment. ‘Simple’, for small files (~5 files, <1.9 GB); ‘Bulk’, for larger files and folders (<10 GB); Import from URL (no size limit). Advantages (+): covers most upload/download sharing needs; point and click. Disadvantages (-): some size/speed limitations.
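The size guidance on this slide can be expressed as a small decision helper. This is a sketch under stated assumptions: the thresholds come from the slide, but applying them to the total size of a batch (rather than to each file) is an interpretation, and the function name and return labels are hypothetical. Batches too large for the Discovery Environment fall through to iDrop or iCommands, which the iDrop summary describes as handling sizes up to your total allocation.

```python
from typing import List

# Thresholds taken from the slide; applying them to the batch total is an assumption.
SIMPLE_MAX_FILES = 5
SIMPLE_MAX_GB = 1.9
BULK_MAX_GB = 10.0


def choose_upload_method(file_sizes_gb: List[float]) -> str:
    """Suggest an upload route for a batch of files, given their sizes in GB."""
    total_gb = sum(file_sizes_gb)
    if len(file_sizes_gb) <= SIMPLE_MAX_FILES and total_gb < SIMPLE_MAX_GB:
        return "simple"  # DE 'Simple' upload
    if total_gb < BULK_MAX_GB:
        return "bulk"  # DE 'Bulk' upload
    return "idrop-or-icommands"  # beyond the DE size guidance


if __name__ == "__main__":
    print(choose_upload_method([0.5, 0.2]))
    print(choose_upload_method([3.0]))
    print(choose_upload_method([25.0]))
```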

20 Tips: Spaces / Special Characters. Many software packages are sensitive to spaces in file names and/or the special characters below. Rename uploaded files before using them in an analysis. Good advice for any transfer method. ~` !@# $ % ^& *()+ = {}[]|\:;"'<>,?/
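One way to follow this tip is to clean file names with a short script before uploading. The sketch below replaces whitespace and the special characters listed on the slide with underscores; the function name and the choice of underscore as the replacement are assumptions for illustration. It is meant for bare file names, not full paths, since `/` is among the replaced characters.

```python
import re

# Special characters listed on the slide (spaces are handled by \s below).
FORBIDDEN = "~`!@#$%^&*()+={}[]|\\:;\"'<>,?/"

# Match any run of whitespace or forbidden characters.
_PATTERN = re.compile("[\\s" + re.escape(FORBIDDEN) + "]+")


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace each run of spaces/special characters with a single replacement string."""
    return _PATTERN.sub(replacement, name)


if __name__ == "__main__":
    for original in ["my file (v2).fastq", "root growth #3.mov", "clean_name.txt"]:
        print(original, "->", sanitize_filename(original))
```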

21 Summary: Faster Transfers with iDrop. Drag and drop files and folders; file sizes up to your total allocation; fast transfers; synchronize folders with the Data Store. Advantages (+): upload/download large or many files. Disadvantages (-): sharing and permission features more complex.

22 Summary: Sharing Files in the Data Store. 2 easy ways to share data from the Discovery Environment. Discovery Environment Sharing: share files/folders instantly; control access permissions; manage sharing between collaborators. Sharing via Public Link: no iPlant account required; limited to individual files; URLs are public (less secure, can revoke).

23 Tips: When sharing, use this chart to decide appropriate permissions. Chart rows (permission levels): Read, Write, Own. Chart columns (actions): Read, Download, Metadata, Rename, Move, Delete.

24 Viewing and Editing Metadata in the DE. User metadata is stored as AVUs (Attribute – Value – Unit). Template-based metadata.
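An AVU can be pictured as a simple three-field record, and a metadata template as a list of attributes a file is expected to carry. The sketch below is illustrative only: the class, the helper, the template name, and the attribute names are all hypothetical and do not reflect how the Discovery Environment stores templates internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AVU:
    """One Attribute-Value-Unit triple; the unit is optional."""
    attribute: str
    value: str
    unit: str = ""


def missing_attributes(avus: List[AVU], template: List[str]) -> List[str]:
    """Return the template attributes that no AVU in the list provides."""
    present = {avu.attribute for avu in avus}
    return [name for name in template if name not in present]


# Hypothetical template for an assembled genome.
GENOME_TEMPLATE = ["organism", "assembly_version", "sequencing_platform"]

if __name__ == "__main__":
    file_metadata = [
        AVU("organism", "Arabidopsis thaliana"),
        AVU("read_length", "100", "bp"),
    ]
    print(missing_attributes(file_metadata, GENOME_TEMPLATE))
```

Checking files against a template in this way is one route to the reproducibility benefit that consistent metadata is meant to provide.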

25 Tips: Metadata in the DE. Can only use one template at a time (this will change). Can create custom metadata templates. Not just for data management, think reproducibility!

26 Detailed instructions with videos, manuals, and documentation in the Learning Center. Keep asking: ask.iplantcollaborative.org

27 The iPlant Collaborative is funded by a grant from the National Science Foundation Plant Cyberinfrastructure Program (#DBI-0735191).

