An Open Standards-based Scalable Heavy Lifting Data Transfer Service for e-Research David Meredith, Peter Turner, Alex Arana, Gerson Galang, David Wallom, Phil Kershaw, Weijing Fang, Ally Hume, Mario Antonioletti, Steve Crouch

Problem
- Moving data is a growing problem
- Data is increasing in size and is difficult to move about
  – Storage
  – Network
- Initiating data transfers across different protocols (data onto/off grids) from a range of clients
  – Remote user: desktop, portal
  – Grid + Web
- e.g. copy from a beam-line data resource to my home storage lab
- Can't do the transfer through clients: not scalable
- Need something lightweight for users

Users/Use Cases
- For users from e.g.:
  – Diamond Synchrotron, STFC
  – Australian Synchrotron Facility
- Use cases:
  – Hermes (e.g. Oxford Anatomy Institute of Biology: they do not want to deploy a whole other machine to do this for 100 GBs of data; they want a desktop client to do it)
  – NGS Portal
  – Any Commons VFS-style client
  – SAGA client?

High-level Requirements
- Properties:
  – Scalable
  – Durable/Reliable
  – Asynchronous
  – Supported protocols: ftp/sftp/http/https/gsiftp/SRB/iRODS/SRM
- Core requirement: third-party transfer needs to be cross-platform (e.g. SRB -> gsiftp)
- Construct XML that specifies the requirements and send it to a 3rd-party service for asynchronous transfer
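As a rough illustration of the "construct XML, send to a 3rd-party service" idea, the sketch below builds a small transfer request document. The element names (`transferRequest`, `source`, `target`, `overwrite`) and the example hosts are hypothetical; they are not taken from OGSA-DMI, JSDL, or any schema discussed in these slides.

```python
import xml.etree.ElementTree as ET


def build_transfer_request(source_uri, target_uri, overwrite=False):
    """Return an XML string describing one transfer (hypothetical schema)."""
    root = ET.Element("transferRequest")
    # Source and target are given as URIs, so the protocol is part of the request.
    ET.SubElement(root, "source").set("uri", source_uri)
    ET.SubElement(root, "target").set("uri", target_uri)
    ET.SubElement(root, "overwrite").text = "true" if overwrite else "false"
    return ET.tostring(root, encoding="unicode")


# Example: a cross-protocol request (SRB -> gsiftp), as in the slide.
request_xml = build_transfer_request(
    "srb://srb.example.org/home/user/scan.dat",
    "gsiftp://gridftp.example.org/data/scan.dat",
)
print(request_xml)
```

The client only describes what should be moved; the document could then be handed to a separate service that performs the copy later.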

Realisations
- Need to discuss at a high level and separate into particular layers
  – Top-level service, scheduling/movement
  – Interfaces to individual data protocols (i.e. through VFS)
- Could go to data service providers and ask them to support 3rd-party transfer
  – But the process could take too long
  – The tech is already out there
- Would this go into UMD (Unified Middleware Distribution)? They want all projects using EU-funded e-Infrastructure

Current Cross Protocol File Transfer
- Data is buffered through the client; this does not scale and is synchronous!
- Client provides a single interface to different remote file systems (SRB, GsiFTP, FTP, SFTP)
- Auth tokens are held only in memory on one server; self-contained
- Piping bytes via the client is a bottleneck, a single point of failure, and a source of concurrency issues
- Diagram labels:
  – VFS/SAGA client, e.g. Portal/Hermes
  – SRB / FTP / SFTP / GSIFTP
  – File operations (list, upload, download, delete, rename)
  – Bit pipe (byte IO stream)
  – Authentication tokens (un/pw, x509?)
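A minimal sketch of why client-side piping is a bottleneck: every chunk is read into the client's memory and then written back out, so the client handles every byte of the transfer. The in-memory streams are stand-ins for remote files; a real client would use protocol-specific streams (for example via Commons VFS), which are not shown here.

```python
import io


def copy_via_client(source, sink, chunk_size=64 * 1024):
    """Relay bytes from source to sink through the client, one chunk at a time."""
    relayed = 0
    while True:
        chunk = source.read(chunk_size)  # bytes arrive at the client
        if not chunk:
            break
        sink.write(chunk)                # and are sent out again by the client
        relayed += len(chunk)
    return relayed


source = io.BytesIO(b"x" * 200_000)  # stand-in for a file on an SRB server
sink = io.BytesIO()                  # stand-in for a file on a GridFTP server
relayed = copy_via_client(source, sink)
print(f"{relayed} bytes passed through the client")
```

The call blocks until the last byte is written, which is the synchronous behaviour the slide objects to.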

Required / Suggested Architecture
- Asynchronous, no concurrency issues, no data buffered via the client!
- Move file transfers to a different server (farm) to increase bandwidth and concurrency
- Passing auth tokens around in messages (strong security required)
- Development / testing
- Diagram labels:
  – VFS/SAGA client
  – JMS queue behind a WS-I interface
  – VFS workers
  – SRB / FTP / SFTP / GSIFTP
  – File operations (list, upload, download, delete, rename)
  – Bit pipe (byte IO stream)
  – Authentication tokens (un/pw, x509?)

Work to date
- Data transfer is currently done via e.g. the Hermes client
- Commons VFS provides ftp/sftp/HTTP/HTTPS/webdav/gsiftp
- Will always need clients via an interface (e.g. Portal, Hermes, VFS client), but with the transfer done via a scalable third-party service
  – Asynchronous, poll for progress
  – Architecture: the underlying VFS code exists, to be deployed in a service-oriented, scalable manner
- Standards-driven?
  – OGSA-DMI
  – JSDL
  – GridSAM: compute-focused
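The "asynchronous, poll for progress" pattern can be sketched as follows. `TransferService`, its state names, and the example URIs are hypothetical stand-ins for a remote service; the background thread only sleeps where a real service would copy data.

```python
import threading
import time
import uuid


class TransferService:
    """In-memory stand-in for a third-party transfer service (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._status = {}

    def submit(self, source_uri, target_uri):
        """Accept a transfer and return immediately with a job id."""
        job_id = str(uuid.uuid4())
        with self._lock:
            self._status[job_id] = "QUEUED"
        threading.Thread(target=self._run, args=(job_id,), daemon=True).start()
        return job_id

    def _run(self, job_id):
        with self._lock:
            self._status[job_id] = "RUNNING"
        time.sleep(0.05)  # placeholder for the actual copy
        with self._lock:
            self._status[job_id] = "DONE"

    def status(self, job_id):
        with self._lock:
            return self._status[job_id]


def wait_for(service, job_id, interval=0.01, timeout=5.0):
    """Poll the service until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = service.status(job_id)
        if state in ("DONE", "FAILED"):
            return state
        time.sleep(interval)
    raise TimeoutError(job_id)


service = TransferService()
job_id = service.submit("sftp://a.example.org/f", "gsiftp://b.example.org/f")
final_state = wait_for(service, job_id)
print(final_state)
```

The client holds only a job id between polls, so it can disconnect and check back later without any data passing through it.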

DataMINX DTS – Heavy Lifting Data Transfer Service
- This is just one possible implementation; GridSAM another?
- Under discussion for the last 4 days
- JMS-based scalability for moving data asynchronously/in parallel
  – DTS web service submits to the JMS queue
  – DTS worker nodes (VFS clients) pick up JMS transfer messages
  – Can specify in the JMS queue which machines should perform a transfer
- Within a J2EE environment
- Abstractions with target URIs
  – Through a shared connection pool per machine
  – One connection to the target URI
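The queue-and-workers layout described above can be sketched with Python's standard library. Here `queue.Queue` stands in for the JMS queue and threads stand in for DTS worker nodes; the worker names and URIs are made up, and the "transfer" just records what it was asked to copy. This is an illustration of the pattern, not the DataMINX implementation.

```python
import queue
import threading


def worker(name, jobs, results):
    """Take transfer messages off the shared queue until a stop marker arrives."""
    while True:
        job = jobs.get()
        if job is None:            # stop marker: no more work for this worker
            jobs.task_done()
            break
        source, target = job
        results.put((name, source, target))  # placeholder for the real copy
        jobs.task_done()


jobs = queue.Queue()      # stand-in for the JMS queue
results = queue.Queue()

workers = [
    threading.Thread(target=worker, args=(f"worker-{i}", jobs, results))
    for i in range(3)
]
for t in workers:
    t.start()

# The "web service" side: enqueue transfer messages and return.
transfers = [
    (f"srb://src.example.org/file{i}", f"gsiftp://dst.example.org/file{i}")
    for i in range(10)
]
for transfer in transfers:
    jobs.put(transfer)
for _ in workers:
    jobs.put(None)

jobs.join()
for t in workers:
    t.join()

done = [results.get() for _ in range(results.qsize())]
completed = sorted((source, target) for _, source, target in done)
print(len(completed), "transfers handled by", len({n for n, _, _ in done}), "worker(s)")
```

Adding capacity means starting more workers against the same queue; the submitting side does not change.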

Other Possible Solution Paths
- GridSAM does some but not all
- gLite File Transfer Service: does this on a large scale
- Stork
  – Supports ftp/http/gsiftp/nest/srb/srm/csrm/unitree
  – But not a web service. Suitable?
- Alan W: Vbrowser (Hermes-esque?)
- DW: Cloud-based (e.g. Amazon solution?)
- AH: Parallelisation in OGSA-DAI for compute; here is parallelisation for data
  – GridSAM's data transfer is not parallelised
  – Could have a job that just moves data, but cannot guarantee network availability on worker nodes, and not architecturally ok
- If one web service supports a single protocol, just extend it

Issues
- It's a big problem with a big suggested solution: lots of developer work
- Need to think about failure use cases
  – Worker node fails
  – JMS gives you isolation from service failure through tested, transaction-based durability
  – Need to discuss and uncover other failure cases
- Specs: do they cover all the use cases?
  – JSDL/HPC File Staging Profile, OGSA-DMI?
  – Interfaces limited?
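The worker-failure case can be illustrated with a toy queue that hands a message back for redelivery when processing fails, which is roughly the behaviour a transacted message consumer relies on. `DurableQueue` and `flaky_worker` are hypothetical names for this sketch; it keeps everything in memory and does not model a real JMS broker or persistence.

```python
from collections import deque


class DurableQueue:
    """Toy queue: a failed message is put back so another attempt can be made."""

    def __init__(self, messages):
        self._pending = deque(messages)
        self.delivery_counts = {}

    def receive(self):
        """Return the next message, or None when the queue is empty."""
        if not self._pending:
            return None
        msg = self._pending.popleft()
        self.delivery_counts[msg] = self.delivery_counts.get(msg, 0) + 1
        return msg

    def rollback(self, msg):
        """Processing failed: return the message to the back of the queue."""
        self._pending.append(msg)


def flaky_worker(msg, delivery_count):
    """Simulate a worker node that dies on the first attempt at one transfer."""
    if msg == "transfer-2" and delivery_count == 1:
        raise RuntimeError("worker node failed mid-transfer")
    return f"{msg}: done"


q = DurableQueue(["transfer-1", "transfer-2", "transfer-3"])
completed = []
msg = q.receive()
while msg is not None:
    try:
        completed.append(flaky_worker(msg, q.delivery_counts[msg]))
    except RuntimeError:
        q.rollback(msg)  # the message is not lost; it will be delivered again
    msg = q.receive()

print(completed)
```

A retried transfer may find a partial file at the target, which is one of the further failure cases the slide says still needs discussion.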

Next Steps (Within CW)
- Recommend a further session (Mario, Steve C, Ally, David M, Peter T, Alex A, Gerson G, David W, Weijian F):
  – Have others critique the design work over the last 4 days
  – Possible subdivision for detailed issues
  – High-level requirements discussion
  – Implementation/specification
- Go over issues with schema specs, possible ways forward
- Possible architectures that can assist the problem now
  – Stork!

Next Steps (Out of CW)
- Spec issues:
  – Schedule discussion within OGSA-DMI WG (Mario to organise)
  – HPC File Staging Profile/JSDL WGs (David M/Steve C to organise)
  – DW: attend the OGF PGI sessions; they will be observing & championing necessary changes to JSDL/HPC Profile (Steve C)