MANAGING, SHARING, AND PUBLISHING DATA WITH THE CYVERSE DATA STORE


CyVerse Focus Forum
http://www.cyverse.org/blog/events/webinar-managing-sharing-and-publishing-data-cyverse-data-store
Ramona Walls, Tony Edgin, Nirav Merchant
Sep. 15, 2017

Topics
Overview of the CyVerse Data Store
Uploading and downloading data
Managing data
Publishing data
A future Focus Forum will cover accessing the Data Store via API, iRODS federation, and content delivery.
Uploading and downloading data to CyVerse using the Discovery Environment (DE, our scientific analysis and data management web interface), iCommands (a command line tool), CyberDuck (an open source desktop client), and FUSE (an open source tool for viewing and editing cloud-based file systems that works on Mac or Linux). We’ll cover working with data in Atmosphere virtual machines and connecting data to genome browsers.
Strategies and best practices for managing data using CyVerse tools for sharing and organizing data, including metadata in the DE.
How to publish data using CyVerse, including publishing sequence data to NCBI, how to request a DOI in the CyVerse Data Commons, and how to create a Community Released Data folder in the Data Commons, so others can find and re-use your data.

[Diagram: CyVerse platforms: BisQue, Discovery Environment, Data Store, Data Commons, Atmosphere]

CyVerse Data Store
~2.5 PB of data
~90 million files
Growing at about 600 GB/day
Built on the open source iRODS platform

Moving data in and out of CyVerse
Uploading and downloading data to CyVerse using:
  Discovery Environment (DE, web interface)
  CyberDuck (desktop client)
  iCommands (command line)
  Atmosphere (virtual machines)
  FUSE (cloud-based file system): not recommended for most uses
Download from the Data Commons

Discovery Environment
Home page: https://de.cyverse.org/de/
Manual: https://wiki.cyverse.org/wiki/display/DEmanual/Table+of+Contents
https://wiki.cyverse.org/wiki/display/DEmanual/Managing+Data+Files+and+Folders

DE - uploads
Simple upload: up to 5 files, each <1.9 GB
Bulk upload: use another method
Import from URL
  Example: ftp://ftp.gramene.org/pub/gramene/archives/PAST_RELEASES/release39/data/fasta/brassica_rapa/cdna/README
  For password-protected sites, you can include the username and password in the URL, but this is not recommended: ftp://username:password@hostname/$URL
    The URL being opened may be determinable by other users on the same machine on which you are browsing (as from a command line).
    The URL retrieved from the remote machine may be logged in some non-secure place on the remote machine.
    Your browser history would then also contain a copy of your password.
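The credentials warning above can be checked mechanically before a URL is pasted into an import dialog. The following is a minimal Python sketch, not part of the DE; the function names `has_embedded_credentials` and `strip_credentials` are hypothetical, and only the standard library's `urllib.parse` is used.

```python
from urllib.parse import urlsplit, urlunsplit


def has_embedded_credentials(url: str) -> bool:
    """Return True if the URL carries a username or password before the host."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None


def strip_credentials(url: str) -> str:
    """Return the URL with any user:password@ prefix removed.

    Assumption: plain host names only; IPv6 literals are not handled here.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    url = "ftp://username:password@hostname/pub/README"
    if has_embedded_credentials(url):
        print("Warning: credentials in URL; safer form:", strip_credentials(url))
```

A check like this only flags the problem; the site will still need some other, safer way to authenticate.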

DE - downloads
Simple download: up to 5 files, each <1.9 GB
Bulk download: use another method

CyberDuck
For Mac or Windows users
Not developed by CyVerse; works with iRODS
Recommend using the latest version
Installation: see instructions at https://wiki.cyverse.org/wiki/display/DS/Using+Cyberduck+for+Uploading+and+Downloading+to+the+Data+Store
Configuration
  Download the configuration file
  Enter connection details: keep defaults, add user name and password
  Choose “Open multiple connections”
Can store multiple connections

Using CyberDuck
Upload from your computer
Download to your computer
Anonymous data access: for public data on CyVerse
  Do not attempt to browse to iplant/home or iplant/! The large number of folders in these directories will cause CyberDuck to hang.
Accessing shared data
A paid, mounted version of CyberDuck is available: MountainDuck.

iCommands
Command line access to iRODS
https://wiki.cyverse.org/wiki/display/DS/Using+iCommands
iCommands documentation for each command: https://docs.irods.org/4.2.0/icommands/user/

Using iCommands
Logging in (iinit)
Browsing (icd)
Uploading (iput)
Downloading (iget)
Sharing/permissions (ichmod)
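As a rough illustration of how the first four commands on this slide fit together, here is a Python sketch that only builds the command lines; it does not run them, since that requires iCommands to be installed and configured. The helper name `basic_session`, the user name, and the file name are hypothetical placeholders, and the `/iplant/home/<username>` layout is assumed from the paths shown elsewhere in this talk.

```python
import shlex
from typing import List


def basic_session(user: str, local_file: str) -> List[List[str]]:
    """Build argument lists for a login, browse, upload, download round trip.

    Assumption: local_file is a bare file name in the current directory.
    """
    home = f"/iplant/home/{user}"           # assumed home collection layout
    remote_file = f"{home}/{local_file}"
    return [
        ["iinit"],                          # log in (prompts for connection details)
        ["icd", home],                      # change the current iRODS collection
        ["iput", local_file, remote_file],  # upload a local file
        ["iget", remote_file, "."],         # download it to the current directory
    ]


if __name__ == "__main__":
    for argv in basic_session("your_username", "reads.fastq"):
        # Print a shell-safe version of each command for copy and paste.
        print(shlex.join(argv))
```

Each list could be passed to `subprocess.run` on a machine where iCommands is available; printing them first makes it easy to review what would be executed.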

Atmosphere
Use iCommands
Mount a volume: a virtual hard drive that you attach to one or more instances
New tool: kanki, https://github.com/ilarik/kanki-irodsclient

Atmosphere - using volumes
https://wiki.cyverse.org/wiki/display/atmman/Using+Volumes
Steps:
  1. Create the volume (as part of a project)
  2. Click on the volume and attach it to an instance
  3. Grant users of the image access to the volume
  4. Save data generated on Atmosphere to your volume
  5. When finished, back up and detach the volume
Before detaching a volume, be sure to back up your data to the Data Store!
Data can be restored to the same or another instance
https://wiki.cyverse.org/wiki/display/atmman/Backing+Up+and+Restoring+Your+Data+to+the+Data+Store

FUSE (Filesystem in Userspace)
https://wiki.cyverse.org/wiki/display/DS/Using+FUSE+to+Mount+the+CyVerse+Data+Store
Mounts a Data Store directory to a local directory.
For most use cases, other methods are more efficient.

Download public data from the Data Commons
http://datacommons.cyverse.org/
Files <2 GB can be downloaded directly
For larger files, use one of the methods described above
Change any data browsing URL to a direct link to the data by substituting “download” for “browse”:
  http://datacommons.cyverse.org/browse/iplant/home/shared/commons_repo/curated/VertNet_Traits/ReadMe.txt
  >>>
  http://datacommons.cyverse.org/download/iplant/home/shared/commons_repo/curated/VertNet_Traits/ReadMe.txt
OR use the DE data service:
  https://de.cyverse.org/anon-files/iplant/home/shared/commons_repo/curated/VertNet_Traits/ReadMe.txt
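The “browse” to “download” substitution can be scripted when many links need converting. This is a small Python sketch under the assumption that the URL patterns are exactly those shown on this slide; the function names `browse_to_download` and `de_anon_url` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DE_ANON_PREFIX = "https://de.cyverse.org/anon-files"  # taken from the slide


def browse_to_download(url: str) -> str:
    """Turn a Data Commons browse URL into the matching direct-download URL."""
    parts = urlsplit(url)
    prefix = "/browse/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a browse URL: {url}")
    # Replace only the leading path segment, not any later 'browse' in the path.
    new_path = "/download/" + parts.path[len(prefix):]
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


def de_anon_url(data_store_path: str) -> str:
    """Build the DE data service URL for a public Data Store path."""
    if not data_store_path.startswith("/"):
        raise ValueError("expected an absolute Data Store path")
    return DE_ANON_PREFIX + data_store_path


if __name__ == "__main__":
    browse = ("http://datacommons.cyverse.org/browse/iplant/home/shared/"
              "commons_repo/curated/VertNet_Traits/ReadMe.txt")
    print(browse_to_download(browse))
    print(de_anon_url("/iplant/home/shared/commons_repo/curated/VertNet_Traits/ReadMe.txt"))
```

Anchoring the replacement to the first path segment avoids rewriting a folder that happens to be named “browse” deeper in the path.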

Special topic: connecting data to genome browsers
https://wiki.cyverse.org/wiki/display/DEmanual/Viewing+Genome+Files+in+a+Genome+Browser
File types: bam, vcf, gff, gtf, bed, bigBed, and bigWig
Browsers: Ensembl, UCSC, IGV, GBrowse, JBrowse, and WashU EpiGenome Browser
Bam and vcf files require a matching index file (bam.bai or vcf.vci)
Gff, gtf, bed, bigBed, and bigWig files require that the name of the reference genome's fasta header match the gene name in the genome file
Files must be tagged with the correct info type
Fasta infotype files can also be viewed in CoGe (https://genomevolution.org/)
If you have an issue, you may need to change https:// to http:// in the URL
Demo:
  use iCommands to copy genome file to my dir (bai file is already copied, because it is large)
  view the info type
  send to browser: notice that this creates a public link
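The index-file requirement is easy to overlook, so a quick check on a folder listing can save a failed browser launch. The Python sketch below covers only the bam case and assumes the index sits next to the data file under the name `<file>.bam.bai`, as stated on the slide; the function name `missing_bam_indexes` is hypothetical.

```python
from typing import Iterable, List


def missing_bam_indexes(filenames: Iterable[str]) -> List[str]:
    """Return the .bam files that have no matching .bam.bai in the same listing."""
    names = set(filenames)
    return sorted(
        name for name in names
        if name.endswith(".bam") and f"{name}.bai" not in names
    )


if __name__ == "__main__":
    listing = ["sample1.bam", "sample1.bam.bai", "sample2.bam", "genes.gff"]
    for name in missing_bam_indexes(listing):
        print(f"{name}: no index file found")
```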

Common problems with data transfer
University firewalls block access for CyberDuck or iCommands
  Contact CyVerse support and your university’s IT department
Uploading 1000s of files at one time
  Bundle them up before upload using the tar command
  The ibun command extracts the tar file in place on the Data Store
  ibun can also be used to bundle files within the Data Store
Time-outs for very large files
  Usually a network error
Other random problems
  Make sure the name is Unix friendly (no spaces or special characters)
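Two of the fixes above, bundling many files into one tar archive and checking names before upload, can be sketched in Python with the standard library. The names `is_unix_friendly` and `bundle_directory` are hypothetical, and the definition of “Unix friendly” used here (letters, digits, dot, underscore, hyphen) is an assumption, since the slide does not list the allowed characters. Uploading the archive and extracting it with ibun are not shown.

```python
import re
import tarfile
from pathlib import Path
from typing import List

# Assumption: a "Unix friendly" name uses only these characters.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def is_unix_friendly(name: str) -> bool:
    """Return True if the name has no spaces or special characters."""
    return bool(_SAFE_NAME.match(name))


def bundle_directory(src_dir: str, tar_path: str) -> List[str]:
    """Pack src_dir into an uncompressed tar archive.

    Returns the names inside src_dir that are not Unix friendly, so they can
    be renamed before upload. tar_path should live outside src_dir.
    """
    src = Path(src_dir)
    unfriendly = [p.name for p in sorted(src.rglob("*")) if not is_unix_friendly(p.name)]
    with tarfile.open(tar_path, "w") as tar:
        # Store paths relative to the directory name, not the absolute path.
        tar.add(str(src), arcname=src.name)
    return unfriendly
```

Calling `bundle_directory("reads", "reads.tar")` would produce a single file to upload in place of thousands of small ones, and the returned list shows which names to clean up first.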

Publishing Data
Make your data FAIR
Publish sequence data to NCBI
Request a DOI in the CyVerse Data Commons
Create a Community Released Data folder in the Data Commons

[Diagram: data sharing levels, from Private/Single user to Public/Many users: Default data allocation, Lab group on PI’s allocation, Community Folder, Public Folder, Published to Repository]

Good metadata is key!
Follow relevant data and metadata standards
Use open source formats
Available via web services, or directly from a URL

Publish Data to NCBI
SRA (Sequence Read Archive) for raw sequences and alignments (NGS data): https://goo.gl/163Z9L
WGS (Whole Genome Shotgun) for incomplete assemblies: https://goo.gl/9mJb3N
Tutorials walk you through creating a submission package, including BioProject, BioSamples, and data

Request a DOI
Is the CyVerse Data Commons right for you?
CyVerse Curated data are:
  Stable
  “Permanent”
  linked to a permanent identifier (DOI or ARK)
  managed by CyVerse staff
  described using DataCite metadata, plus scientific metadata
DOI (Digital Object Identifier)
  Can be used to cite your data
  Points to the dataset landing page, even if the data moves
Go to DC home page and show link
See http://datacommons.cyverse.org/

Community Released Data Folders
Community Released data are:
  managed by community members
  publicly available
  possibly evolving
  not permanent
  described using Dublin Core metadata
  scientific metadata recommended
See http://datacommons.cyverse.org/

Data Management Tips
Strategies and best practices for managing data using CyVerse tools:
  Sharing data
  Organizing data
  Using metadata in the DE

Sharing Data
Can share any file or folder with any CyVerse user
  Don’t need to know their user name
  Grant read, write, or own permission
Create a public link to a file, to share with non-CyVerse users
https://wiki.cyverse.org/wiki/display/DEmanual/Sharing+Data+Files+and+Folders
To share using iCommands, use ichmod
Demo sharing in DE
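For sharing from the command line, the general shape of an ichmod call is a permission level, a user, and a path. The Python sketch below builds such a command without running it; the helper name `ichmod_cmd`, the user name, and the path are hypothetical placeholders, and the permission levels are limited to the three named on this slide.

```python
import shlex
from typing import List

# Permission levels named on the slide.
VALID_PERMISSIONS = ("read", "write", "own")


def ichmod_cmd(permission: str, user: str, path: str, recursive: bool = False) -> List[str]:
    """Build an ichmod argument list granting `permission` on `path` to `user`."""
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"permission must be one of {VALID_PERMISSIONS}")
    cmd = ["ichmod"]
    if recursive:
        cmd.append("-r")  # apply to everything inside a folder
    return cmd + [permission, user, path]


if __name__ == "__main__":
    argv = ichmod_cmd("read", "collaborator_name",
                      "/iplant/home/your_username/project", recursive=True)
    print(shlex.join(argv))
```

Validating the permission level up front catches typos before the command reaches the server.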

Organizing Data
https://pods.iplantcollaborative.org/wiki/display/DC/Using+CyVerse+for+a+Shared+Project
Coming soon: teams
Use search to create “smart folders” from metadata
https://www.dataone.org/best-practices
Using CyVerse in your data management plans:
  4. Data dissemination
  5. Policies for data sharing, public access, and re-use
  6. Plans for archiving data, samples, software, and other research products

Using metadata in the DE
Add and edit metadata
Apply a metadata template (needed to publish data)
Copy metadata from one object to another
Apply metadata in bulk; video tutorial: https://goo.gl/7EmhP9

CyVerse is supported by the National Science Foundation under Grants No. DBI-0735191 and DBI-1265383.