High Performance GridFTP Transport of Earth System Grid (ESG) Data
Center for Enabling Distributed Petascale Science


Description
Transfer 10 TB of climate data onto the SC09 show floor from three sites: the Argonne Leadership Computing Facility (ALCF), the National Energy Research Scientific Computing Center (NERSC), and LLNL. As the data arrives at its destination in the University of Utah's SC09 booth, it will be stored on disks provided by Data Direct Networks. The data will be processed using a climate data analysis and visualization tool and then publicly displayed along with graphs depicting the characteristics of the transfer.
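To give a sense of scale for the 10 TB figure, the following sketch estimates how long such a transfer takes at a sustained rate. The rates used here (1 and 10 Gb/s) are illustrative assumptions, not measurements from the challenge, and the calculation ignores protocol overhead and disk limits.

```python
def transfer_time_hours(volume_tb: float, rate_gbps: float) -> float:
    """Hours needed to move volume_tb (decimal terabytes) at a sustained rate_gbps."""
    bits = volume_tb * 1e12 * 8            # terabytes -> bits
    seconds = bits / (rate_gbps * 1e9)     # gigabits per second -> bits per second
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rates; real transfers rarely hold the full line rate.
    for rate in (1, 10):
        print(f"{rate} Gb/s: {transfer_time_hours(10, rate):.1f} h")
```

At a sustained 10 Gb/s the 10 TB set needs a little over two hours, which is why end-to-end tuning matters for a demonstration of this kind.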

End-to-End Flow

Scientific Purpose
Climate data is moved in this challenge.
- Climate is a highly collaborative discipline, and its datasets are distributed across the globe.
- An interesting feature of climate data is that individual files are not very large compared with those of other sciences.
- Climate researchers, however, need to move hundreds or thousands of files in a single transfer.
- The volume of data to be moved across the network is massive.
Multiple TB of data from the Climate Research Program Coupled Model Intercomparison Project, Phase 3 (CMIP3) is moved.
- This data was used in the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4).
- This data is used in anticipation of the approaching IPCC Fifth Assessment Report (AR5).

How Computing and Network Map into Climate Modeling Efforts
Each climate modeling task maps onto these strategic objectives from:

Network Challenges in ESG
- Independent gateways federating metadata and users
- Individual data nodes responsible for publishing services
- Designed for model output data sets

Technical Approach and Methods
- Transfers initiated by the climate community can run between a client and a server, or between two remote servers under the control of a user on a third machine.
- GridFTP and other data movement tools developed by the Center for Enabling Distributed Petascale Science (CEDPS) are ideal for these types of transfers.
- GridFTP is optimized for high-bandwidth, wide-area networks.
- The Globus implementation of GridFTP provides a software suite optimized for a broad range of data access applications, including bulk file transfer and data extraction from complex storage systems.

GridFTP Advantages
- Performance: orders-of-magnitude performance improvements over standard FTP
  - Uses parallel TCP streams and non-TCP protocols such as UDT
  - Coordinated transfer using multiple computers at source and destination
- Secure: GridFTP supports the PKI/X.509-based Grid Security Infrastructure (GSI), with simple options to encrypt and integrity-check data. GridFTP also supports SSH security.
- Robust: restart markers allow interrupted transfers to restart with minimal overhead.
- Extensible: clear abstractions to interface with various transport protocols and with different storage systems
  - Completely shields the user from the complexities of underlying storage systems, including tape archives such as HPSS
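The restart behavior can be illustrated with a small sketch. It assumes that restart markers can be read as a list of byte ranges already received, and computes which ranges still need to be sent after an interruption. The function name and the tuple representation are hypothetical and are not taken from the GridFTP implementation.

```python
def remaining_ranges(file_size, received):
    """Return the half-open byte ranges [start, end) that were not yet received.

    `received` is a list of (start, end) tuples, in any order, possibly overlapping.
    """
    missing = []
    cursor = 0  # first byte not yet known to be received
    for start, end in sorted(received):
        if start > cursor:
            missing.append((cursor, start))   # gap before this received range
        cursor = max(cursor, end)
    if cursor < file_size:
        missing.append((cursor, file_size))   # tail of the file
    return missing


if __name__ == "__main__":
    # Hypothetical 100-byte file with two received ranges.
    print(remaining_ranges(100, [(0, 30), (50, 70)]))
```

Only the missing ranges are retransmitted, so the cost of an interruption is proportional to the data still outstanding rather than to the whole file.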

Key GridFTP Features Used in the Challenge
Concurrency and pipelining
- Allows the client to maintain multiple outstanding, unacknowledged transfer commands simultaneously
- Greatly improves performance for transfers of many small files
[Figure: sequence of File Request, DATA, and ACK messages for three files, shown for traditional transfer and for pipelining]
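A rough timing model shows why pipelining helps with many small files. This is a simplified sketch, not a description of the GridFTP implementation: it assumes each traditional request costs one full round trip before data flows, that pipelined requests are sent back-to-back so only the first pays that latency, and that the link rate is constant. All parameter values are hypothetical.

```python
def traditional_time(n_files, file_bytes, rtt_s, rate_bps):
    """Each file waits one round trip for its request/acknowledgment, then is sent."""
    return n_files * (rtt_s + file_bytes * 8 / rate_bps)


def pipelined_time(n_files, file_bytes, rtt_s, rate_bps):
    """Requests are queued ahead of time, so only the first pays the round trip."""
    return rtt_s + n_files * file_bytes * 8 / rate_bps


if __name__ == "__main__":
    # Assumed workload: 1000 files of 1 MB, 50 ms round trip, 1 Gb/s link.
    args = (1000, 1_000_000, 0.05, 1e9)
    print(f"traditional: {traditional_time(*args):.2f} s")
    print(f"pipelined:   {pipelined_time(*args):.2f} s")
```

In this model the per-file round trip dominates when files are small, which matches the workload described for climate data: hundreds or thousands of modest files per transfer.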

GridFTP Clients and NetLogger
Three different GridFTP clients are used to move the 10 TB data set for the challenge:
- Globus.org: hosted data movement service
- BDM: Bulk Data Mover
- globus-url-copy
NetLogger is used to monitor transfers and troubleshoot problems:
- Distributed performance analysis and troubleshooting
- Standard log format and best practices
- Log collection tools
- Log parser
- Data analysis tools
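As an illustration of what a log parser for a standard log format does, the sketch below reads one line of name=value pairs and derives a throughput figure. The name=value layout reflects the general style of NetLogger logs as an assumption to be checked against the NetLogger documentation; the event name, host, and the `nbytes` and `dur` fields in the sample line are hypothetical.

```python
import shlex


def parse_log_line(line):
    """Split a line of name=value pairs into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not name=value
            fields[key] = value
    return fields


def throughput_gbps(fields):
    """Throughput in Gb/s from hypothetical 'nbytes' and 'dur' (seconds) fields."""
    return int(fields["nbytes"]) * 8 / float(fields["dur"]) / 1e9


if __name__ == "__main__":
    # Hypothetical log line; field names are assumptions for this sketch.
    sample = ('ts=2009-11-17T18:39:19.372375Z event=transfer.end level=Info '
              'host=dtn01.example.org nbytes=1048576000 dur=8.4 msg="transfer complete"')
    record = parse_log_line(sample)
    print(record["event"], f"{throughput_gbps(record):.2f} Gb/s")
```

A uniform line format is what lets the same parser and analysis tools work across logs collected from every site.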

What is the Globus.org Data Movement Service (a.k.a. DataKoa)?
A new Globus data movement service
- The same vision, but an updated implementation
- Hosted
- Domain-independent, multi-use
Enables scientists to focus on domain-specific work
- Manages technology failures
- Sends notifications of interesting events
Enables non-experts to move data easily and efficiently
- No operations overhead
- Minimal user-side software installation
- User interfaces require no special expertise
- Built-in data transport configuration expertise

Globus.org Data Movement Service
[Figure: a laptop client, Globus.org, GridFTP Server A, and GridFTP Server B]
- The client connects to Globus.org and submits requests. It can then disappear from the network.
- Globus.org orchestrates the transfer between the GridFTP servers.

What is BDM?
BDM: Bulk Data Mover
- Scalable data movement management tool that calls GridFTP file transfers
- Designed for the needs of the climate community (Earth System Grid)
  - Efficient and reliable transfer management from the user's point of view
  - Simple for a novice user to install and maintain
  - Scalable to large data volumes
  - Scalable to large numbers of files
  - Efficient handling of extreme variance in file sizes
- Scalable to future performance expectations
  - Network performance improvements: 100 Gbps and beyond
  - Storage performance improvements: distributed, parallel, SSD, etc.
- Multiple transfer protocol support
- Able to work with other applications with similar needs
Information:
Contact: Dean Williams

globus-url-copy
- Commonly used, scriptable command-line GridFTP client
- Supports various transfer optimizations, including parallel TCP streams and concurrent file transfers
New features
- Fault tolerance
  - Stores transfer state in a file
  - On restart, globus-url-copy transfers only the remaining data
- Associates multiple physical endpoints with a single logical endpoint
- Load balances across all the physical endpoints
9/15/09 Argonne National Laboratory
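A scripted invocation might look like the following sketch, which only builds and prints the command line rather than running it. The host names, paths, state-file name, and parameter values are hypothetical. The options shown (`-p` for parallel streams, `-cc` for concurrent files, `-pp` for pipelining, `-r` for recursive copy, `-vb` for progress output, `-rst` to retry failed operations, `-df` for a file recording untransferred URLs) are given as commonly documented; confirm them against `globus-url-copy -help` for the installed version.

```python
import shlex


def build_command(src_url, dst_url, parallel_streams=4, concurrent_files=8,
                  dump_file="transfer.state"):
    """Assemble a globus-url-copy argument list for a recursive, restartable copy."""
    return [
        "globus-url-copy",
        "-vb",                          # report progress and throughput
        "-r",                           # copy the directory recursively
        "-p", str(parallel_streams),    # parallel TCP streams per file
        "-cc", str(concurrent_files),   # files transferred concurrently
        "-pp",                          # pipeline transfer commands
        "-rst",                         # retry failed operations
        "-df", dump_file,               # record untransferred URLs for a later restart
        src_url,
        dst_url,
    ]


if __name__ == "__main__":
    # Hypothetical endpoints; replace with real GridFTP server URLs.
    cmd = build_command("gsiftp://source.example.org/data/cmip3/",
                        "gsiftp://dest.example.org/scratch/cmip3/")
    print(shlex.join(cmd))
```

Building the argument list as a Python list keeps quoting safe if the command is later passed to `subprocess.run`.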

NetLogger BWC Deployment
[Figure: GridFTP servers at ALCF, LLNL, NERSC, and LBNL send data to the SC09 show floor; logs flow to the NetLogger DB, which feeds plots on the web]

Data Direct Networks Silicon Storage Architecture (S2A)

ESnet Science Data Network
A good network is as important as having the right tools and applications.
- We needed a good network that would move these datasets at high speeds to the convention center.
- ESnet was the perfect fit to pull data from the national labs.
Science Data Network (SDN) and On-Demand Secure Circuit and Advance Reservation System (OSCARS)
- Guarantees a dedicated circuit on the network for the duration of the challenge
- No need to compete with anyone else for bandwidth

Data Analysis and Visualization
The data were analyzed using the Climate Data Analysis Tools (CDAT), developed by the Program for Climate Model Diagnosis and Intercomparison (PCMDI).
CDAT is a suite of interrelated diagnostic software tools.
- Flexible, portable, adaptable, efficient, easy to use, shareable, and free
- Capable of operating in a distributed environment
3D interface provided by the ViSUS plugin, developed at the SCI Institute at the University of Utah and at LLNL
- Streaming and progressive data flow
- Integrated analysis and illustration tools

Data Analysis and Visualization
Full video is available at /

Overarching Research Agenda
- The climate community expects to generate petabytes of simulated data for analysis and future climate predictions.
- In the next few years, climate researchers will be moving terabytes of data to collaborators across the globe for the IPCC Fifth Assessment Report (AR5), which will be published in
- Moving large amounts of data seamlessly, reliably, and quickly is required to make sense of the enormous AR5 climate data set.
  - This helps scientists understand climatic imbalances and the potential impacts of future climate change scenarios.

Overarching Research Agenda
- This demonstration highlights the tools and services that will help climate researchers transport their data quickly and reliably.
- We hope that the lessons learned in this experiment will help us do this better.
- We aim to improve the transport and monitoring tools further, helping not only climate researchers but also other researchers get their science done faster than before.