ORNL is managed by UT-Battelle for the US Department of Energy

Tools Available for Transferring Large Data Sets Over the WAN
Suzanne Parete-Koon, Chris Fuson, Jake Wynne
Oak Ridge Leadership Computing Facility

2 Data Management User Guide
We have organized a Data Management User Guide covering:
– Data management policy
– Directory structures of the file systems
– Data transfer
Look for this icon on the system guide pages.

3 Network File Service

User home
– Description: Home directories are located in a Network File Service (NFS) that is accessible from all OLCF resources. You log in to this location. COMPILE HERE
– Location: /ccs/home/$USER
– Quota: 10 GB (default)
– Purge: Never purged; always backed up
– Access: Full access for the user; read and execute for the group

Project home
– Description: Storage area in the NFS-mounted file system intended for data, code, and other files that are of interest to all members of a project. COMPILE HERE
– Location: /ccs/proj/[projid]
– Quota: 50 GB
– Purge: Never purged
– Access: Full access for user and group

4 Directory Structure

Member Work
– Description: Scratch area
– Location: $MEMBERWORK
– Quota: 10 TB
– Purge: 14 days
– Access: May alter permissions to share with project members

Project Work
– Description: Scratch area for sharing data within a project
– Location: $PROJWORK
– Quota: 100 TB
– Purge: 90 days
– Access: All project members have access

World Work
– Description: Scratch area for sharing data between projects
– Location: $WORLDWORK
– Quota: 10 TB
– Purge: 14 days
– Access: All OLCF users can access

5 Data at the OLCF

6 Data Transfer Nodes
– 4 interactive DTNs
– 8 batch-schedulable DTNs
– 7 batch-scheduled DTNs dedicated to HSI transfers to/from HPSS; triggered only from the Titan login nodes, and only for HSI (not HTAR)

7 Moving to/from the HPSS Archive
Send a file to HPSS:
  hsi put file.txt
Get a file from HPSS:
  hsi get file.txt
data-with-hsi-and-htar/
Files over 1 TB in size get RAIT. This is like having two copies on tape, so data is not lost in a tape failure, yet it takes up less space than two full copies.
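The companion htar tool bundles many files into a single tar archive directly in HPSS. A minimal sketch, assuming a hypothetical directory my_data/:

  htar -cvf my_data.tar my_data/   # create the archive my_data.tar in HPSS from my_data/
  htar -xvf my_data.tar            # extract the archive from HPSS into the current directory

The -c, -x, -v, and -f flags mirror tar's, which keeps the commands familiar for users who already archive locally.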

8 Moving to/from the HPSS Archive

9 Batch DTN Example
You can script data transfers as part of your workflow (see the sketch below).
To cross-submit jobs, the key is "qsub -q host script.pbs", which submits the file script.pbs to the batch queue on the specified host.
ss-system-batch-submission/
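A minimal sketch of such a batch transfer script; the project ID, file name, and the "dtn" queue name are assumptions for illustration:

  #!/bin/bash
  #PBS -A PRJ123              # hypothetical project allocation
  #PBS -l walltime=1:00:00    # time allowed for the transfer
  cd $MEMBERWORK/prj123       # scratch area holding the data
  hsi put results.tar         # send the output to HPSS

Submitted from a Titan login node with: qsub -q dtn transfer.pbs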

10 Data Transfer Tools
Questions to ask when selecting among the tools available at the OLCF:
– Availability?
– Handles failure?
– Authentication?
– Data validation?
– Speed?
Tools: scp, rsync, bbcp, GridFTP, Globus

11 Tool Availability
Is the tool available on both client and server?
– If not, can I install it, and do I need to open ports?
scp, rsync
– Available on most UNIX-like systems
bbcp, GridFTP
– Require installation
– Binary, RPM, or source code available
Globus
– Endpoints
– OLCF endpoint: olcf#dtn

12 Does the Tool Handle Failure?
Large/long transfers should plan for possible timeout or failure.

Tool      Restart
scp       No
rsync     '--partial'
bbcp      '-a -k'
GridFTP   '-sync'
Globus    Yes

Notes:
– rsync automatically checks size and modification time; without '--partial' it deletes partial files
– bbcp: without '-k', the partial file is removed upon failure; '-a' creates a checkpoint file in ~/.bbcp
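A hedged example of restartable transfers using the flags above; the host and file names are hypothetical:

  rsync -av --partial bigfile.dat user@remote.host:/dest/   # keep the partial file so a rerun resumes
  bbcp -a -k bigfile.dat user@remote.host:/dest/            # '-a' checkpoints in ~/.bbcp, '-k' keeps the partial file

Rerunning the same command after a failure picks the transfer back up rather than starting over.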

13 Authentication
One-time or recurring transfer?
Workflows
– Automate the transfer process
– Each tool has a scriptable command-line interface
ssh
X.509 certificates
– Globus, GridFTP
– Globus makes it easier to use differing endpoint certificates

14 Data Validation
Verify copied data now, or question it later?

Tool      Validation
scp       No
rsync     Default
bbcp      '-E md5'
GridFTP   '-sync -sync-level 3'
Globus    Yes

Notes:
– Validation is expensive
– scp: use md5sum
– GridFTP: re-transfer with '-sync -sync-level 3'
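For scp, which has no built-in validation, a minimal manual check with md5sum; the host and paths are hypothetical:

  md5sum bigfile.dat                               # checksum on the source system
  scp bigfile.dat user@remote.host:/dest/
  ssh user@remote.host md5sum /dest/bigfile.dat    # compare against the source checksum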

15 Data Transfer Software
Break the transfer up into multiple parallel streams, e.g. 4 parallel streams:
– bbcp: '-s 4'
– GridFTP: '-p 4'
[Chart: transfer speeds for scp, rsync, bbcp, and GridFTP]
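A hedged sketch of four-stream transfers using the flags above; the host and paths are hypothetical:

  bbcp -s 4 bigfile.dat user@remote.host:/dest/
  globus-url-copy -p 4 file:///local/bigfile.dat gsiftp://remote.host/dest/bigfile.dat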

16 Transfer to NERSC

17 Speed: Data Size and Structure
How is your data stored? There is overhead for large numbers of files/directories.
– Consider combining many small files into larger files
– GridFTP: increase concurrent FTP connections with '-cc' (see the example below)
– bbcp: use program pipes instead of '-r':
  bbcp -N io 'gtar -c -O -C /local/path DirToTransfer' 'RemoteSys:gtar -x -C /remote/path'
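A hedged example of the GridFTP '-cc' option for a directory holding many files; the concurrency level, host, and paths are assumptions:

  globus-url-copy -cc 4 -r file:///local/path/dir/ gsiftp://remote.host/remote/path/dir/   # 4 concurrent connections, recursive copy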

18 Other Considerations
– Connection between endpoints, and any firewalls along the way
– Client/server configuration: CPU speed, memory
– File system load
– Shared resources: variable load means variable transfer times
– Reduce the data to transfer: should you transfer everything? Compression can help, but its value depends on the data and the cost of compressing
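A minimal sketch of reducing the data before transfer, assuming a hypothetical directory and host; as noted above, whether compression pays off depends on the data:

  tar -czf dataset.tar.gz dataset/             # combine and compress many small files
  scp dataset.tar.gz user@remote.host:/dest/   # move one archive instead of many files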

19 Questions/Feedback
We would like to hear from you:
– Workflow, problems, goals, suggestions
– More information –