CEDPS Data Services Ann Chervenak USC Information Sciences Institute.


1 CEDPS Data Services Ann Chervenak USC Information Sciences Institute

2 Goals of CEDPS Data Area
- Assist DOE applications with petascale data management requirements
- Includes assisting with evaluation and deployment of existing services:
  - Globus GridFTP for secure, efficient data transfer
  - Replica Location Service for data registration and discovery
  - Data Replication Service
  - Condor NeST, etc.
- Development of new functionality:
  - Improvements to GridFTP for better resource management
  - Policy-driven data placement services

3 New Data Services in CEDPS
- Develop tools and techniques for reliable, high-performance, secure, and policy-driven placement of data within a distributed science environment
- Managed Object Placement Service (MOPS): an enhancement to today’s GridFTP that allows for management of:
  - Space
  - Bandwidth
  - Connections
  - Other resources needed at the endpoints of data transfers
- Data placement and distribution services that implement different data distribution and placement behaviors

4 Extending GridFTP: The Managed Object Placement Service (MOPS)
Functionality that will be added:
- Adding resource management to GridFTP:
  - Memory usage limitation
  - Enforce appropriate storage usage
  - Enforce appropriate bandwidth usage
  - Eliminates the potential to consume too many system resources
- Bandwidth and storage reservation
- Transfer scheduling
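One common way to enforce a bandwidth cap of the kind listed above is a token bucket. The slides do not say how MOPS enforces bandwidth usage, so the following is only a minimal sketch of that general technique; the class `TokenBucket` and its parameters are hypothetical and are not part of GridFTP or MOPS.

```python
class TokenBucket:
    """Hypothetical bandwidth cap (not the MOPS implementation).

    Tokens represent bytes. They refill at a fixed rate up to a burst limit,
    and a send is allowed only if enough tokens are available.
    """

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s      # sustained rate allowed
        self.capacity = burst_bytes       # largest burst allowed
        self.tokens = burst_bytes         # start with a full bucket
        self.last = 0.0                   # time of the last refill, in seconds

    def try_send(self, nbytes: int, now: float) -> bool:
        """Return True and consume tokens if nbytes may be sent at time `now`."""
        # Refill according to the time elapsed since the last call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait and retry later


bucket = TokenBucket(rate_bytes_per_s=100, burst_bytes=200)
first = bucket.try_send(150, now=0.0)    # fits in the initial burst
second = bucket.try_send(100, now=0.0)   # only 50 tokens left, so refused
third = bucket.try_send(100, now=0.5)    # 50 more tokens have accrued
```

The clock is passed in explicitly so the behaviour is easy to reason about; a real server would read a monotonic clock and delay the write instead of refusing it.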

5 MOPS
- Released under the CEDPS project
- MOPS 1.0 is available at http://www.cedps.net/wiki/index.php/Software
- Includes:
  - Optimization for transfers of many small files
  - Globus fork (Gfork): an inetd-like service that allows state to be maintained across connections
  - Gfork plugin for GridFTP: allows dynamic addition and removal of data movers and limits memory usage
  - Lotman: manages storage
  - GridFTP plugin to enforce storage usage policies using Lotman
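The storage-policy enforcement mentioned above can be pictured as quota accounting that is consulted before each write. The slides do not describe Lotman's interface, so this sketch assumes a simple model in which each owner holds one "lot" with a byte capacity; `LotManager` and all of its methods are hypothetical names, not Lotman's API.

```python
class LotManager:
    """Hypothetical per-owner storage quota tracker (not Lotman's API)."""

    def __init__(self):
        # owner -> [capacity_bytes, used_bytes]
        self._lots = {}

    def create_lot(self, owner: str, capacity_bytes: int) -> None:
        self._lots[owner] = [capacity_bytes, 0]

    def charge(self, owner: str, nbytes: int) -> bool:
        """Record a write of nbytes; refuse it if the lot would overflow."""
        lot = self._lots.get(owner)
        if lot is None:
            return False                  # no lot, no storage
        capacity, used = lot
        if used + nbytes > capacity:
            return False                  # policy violation: reject the write
        lot[1] = used + nbytes
        return True

    def release(self, owner: str, nbytes: int) -> None:
        """Give space back after a file is deleted."""
        lot = self._lots[owner]
        lot[1] = max(0, lot[1] - nbytes)

    def free_bytes(self, owner: str) -> int:
        capacity, used = self._lots[owner]
        return capacity - used


lots = LotManager()
lots.create_lot("alice", 1000)
accepted = lots.charge("alice", 800)     # fits in the lot
rejected = lots.charge("alice", 300)     # would exceed 1000 bytes
lots.release("alice", 500)
remaining = lots.free_bytes("alice")
```

A transfer server using such a check would call `charge` before accepting incoming data and `release` when files are removed.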

6 GridFTP - New Features
- GridFTP over UDT
  - Users can substitute UDT for TCP
  - UDT provides a reliable layer on top of UDP
  - 4-5 times performance improvement over TCP
- GridFTP over SSH
  - globus-url-copy (the GridFTP client) uses the standard ssh program to remotely start a GridFTP server as the user
  - stdin/stdout becomes the control channel
  - No data channel authentication
- GridFTP Where there’s FTP (GWFTP)
  - A proxy server that allows use of any FTP client to transfer data to/from a GridFTP server
- GFork
  - An inetd-like service that allows sharing of state between sessions
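As a rough illustration of how a client might select these transports, the sketch below only builds `globus-url-copy` argument lists and does not run them. It assumes the client's `-udt` option and the `sshftp://` URL scheme are available in the installed Globus Toolkit version; the host names, paths, and helper function names are made up for the example.

```python
def udt_copy_command(src_url: str, dst_url: str) -> list:
    """Argument list for a transfer that asks for UDT instead of TCP.

    Assumes the installed globus-url-copy supports the -udt option and that
    the server side has UDT enabled.
    """
    return ["globus-url-copy", "-udt", src_url, dst_url]


def ssh_copy_command(user: str, host: str, remote_path: str, local_path: str) -> list:
    """Argument list for a download over GridFTP-over-SSH.

    The sshftp:// scheme tells the client to start the server through ssh,
    with stdin/stdout acting as the control channel.
    """
    src = f"sshftp://{user}@{host}{remote_path}"
    dst = f"file://{local_path}"
    return ["globus-url-copy", src, dst]


# Hypothetical endpoints, for illustration only.
udt_cmd = udt_copy_command(
    "gsiftp://source.example.org/data/run1.dat",
    "gsiftp://dest.example.org/data/run1.dat",
)
ssh_cmd = ssh_copy_command("alice", "source.example.org", "/data/run1.dat", "/tmp/run1.dat")
```

The lists could be passed to `subprocess.run` on a machine where the Globus client is installed; check the local `globus-url-copy -help` output before relying on any option.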

7 Data Placement Services: Motivation
- Scientific applications often perform complex computational analyses that consume and produce large data sets
- Computational and storage resources are distributed in the wide area
- The placement of data onto storage systems can have a significant impact on:
  - performance of applications
  - reliability and availability of data sets
- We want to identify data placement policies that distribute data sets so that they can be:
  - staged into or out of computations efficiently
  - replicated to improve performance and reliability

8 Layered Data Placement Architecture
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on needs of the application and the Virtual Organization
- Effectively creates a placement workflow that is passed to the Reliable Distribution Service Layer for execution
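The split between a policy layer that decides placement and a lower layer that executes transfers can be sketched as a planner that emits a list of transfer tasks. The policy used here (send each object to the sites with the most free space) is an assumption chosen for illustration, not the policy of the CEDPS service, and `Site`, `Transfer`, and `plan_placement` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_bytes: int


@dataclass(frozen=True)
class Transfer:
    obj: str
    source: str
    dest: str


def plan_placement(objects: dict, sites: list, copies: int) -> list:
    """Build a placement workflow as an ordered list of Transfer tasks.

    objects maps an object name to (size_bytes, source_site_name).
    Illustrative policy: replicate each object to the `copies` sites with
    the most free space, skipping the source and any site that is too full.
    """
    workflow = []
    for name, (size, source) in sorted(objects.items()):
        candidates = [s for s in sites if s.name != source and s.free_bytes >= size]
        # Most free space first; the site name breaks ties deterministically.
        candidates.sort(key=lambda s: (-s.free_bytes, s.name))
        for site in candidates[:copies]:
            site.free_bytes -= size       # reserve the space for this replica
            workflow.append(Transfer(name, source, site.name))
    return workflow


sites = [Site("A", 100), Site("B", 50), Site("C", 10)]
objects = {"x": (40, "A"), "y": (20, "C")}
workflow = plan_placement(objects, sites, copies=1)
```

In a layered design, the resulting list would be handed to a separate component that performs and retries the transfers, so the policy can change without touching the transfer code.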

9 Higher-Level Data Placement Services
- Recently released first generation of data placement service
- Seeking application input on requirements for the placement services they need
- “Data Placement for Scientific Applications in Distributed Environments,” Ann Chervenak, Ewa Deelman, Miron Livny, Mei-Hui Su, Rob Schuler, Shishir Bharathi, Gaurang Mehta, Karan Vahi, in Proceedings of Grid 2007 Conference, Austin, TX, September 2007.

10 Summary of CEDPS Data Services
- Goal is to assist DOE applications with petascale data management requirements
- Help applications evaluate and deploy existing services (GridFTP, RLS, etc.)
- New development to meet additional application requirements:
  - Improvements to GridFTP for better resource management
  - Policy-driven data placement services
- Actively seeking DOE applications to use services and help define requirements

