Data Management Expert Panel
Globus-EDG Replica Location Service
- Joint design in the form of the Giggle architecture
- Reference implementation by the Globus team within GT2
  - Focus on performance and features
- Implementation by the EDG team in a Web Services framework
  - Focus on manageability and robustness
- No interoperability, due to differences in communication protocols and language bindings
- EDG implementation chosen to build the grid catalog for POOL (January 2003)
RLS then (Jan 2003): EDG & Globus implementations
- Globus RLS
  - C-based daemon-style technology
  - C language binding; Java through JNI
  - MySQL-only backend implementation
  - Supports LRCs and RLIs
  - Uses proprietary Globus Toolkit 2 (GT2) protocols for network communication
  - N:M logical-to-physical filename mapping
  - Schema is not designed to support GUIDs and aliasing
  - Evolution is very hard: the schema and SQL are hardcoded, so changes require code changes
- WP2 RLS
  - Java-based technology
  - Native C, C++, Java, Perl, Python bindings
  - MySQL and Oracle support; easy to extend to other DBMSs
  - LRC only at the moment; support planned for RLIs
  - Uses Web Service protocols for network communication; small client with no dependencies (on GT2 or others)
  - N:1:M logical-to-physical file mapping
  - Schema has natural support for GUIDs and aliasing
  - Evolution is easier: no code change necessary, since the schema and SQL live in a configuration file
RLS now (June 2003): EDG & Globus implementations
- Globus RLS
  - C-based daemon-style technology
  - Native C and Java bindings
  - MySQL and PostgreSQL backend implementations
  - Supports LRCs and RLIs
  - Uses proprietary Globus Toolkit 2 (GT2) protocols for network communication
  - N:M logical-to-physical filename mapping
  - Schema is not designed to support GUIDs and aliasing
  - Evolution is very hard: the schema and SQL are hardcoded, so changes require code changes
- WP2 RLS
  - Java-based technology
  - Native C, C++, Java, Perl, Python bindings
  - MySQL and Oracle support; easy to extend to other DBMSs
  - Supports LRCs and RLIs
  - Uses Web Service protocols for network communication; small client with no dependencies (on GT2 or others)
  - N:1:M logical-to-physical file mapping
  - Schema has natural support for GUIDs and aliasing
  - Evolution is easier: no code change necessary, since the schema and SQL live in a configuration file
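The N:1:M mapping contrasted above can be sketched in a few lines. This is an illustrative model only, not the actual EDG schema or API: many logical file name (LFN) aliases resolve to a single GUID, and that GUID maps to many physical file names (PFNs). All names below (`ReplicaCatalog`, the example LFNs and PFNs) are hypothetical.

```python
# Illustrative sketch of the N:1:M mapping (not the real EDG RLS schema):
# N LFN aliases -> 1 GUID -> M physical replicas.
class ReplicaCatalog:
    def __init__(self):
        self.lfn_to_guid = {}   # many aliases point at one GUID
        self.guid_to_pfns = {}  # one GUID points at many replicas

    def add_alias(self, lfn, guid):
        self.lfn_to_guid[lfn] = guid
        self.guid_to_pfns.setdefault(guid, set())

    def add_replica(self, guid, pfn):
        self.guid_to_pfns.setdefault(guid, set()).add(pfn)

    def lookup(self, lfn):
        """Resolve an LFN alias to all registered physical replicas."""
        guid = self.lfn_to_guid.get(lfn)
        return sorted(self.guid_to_pfns.get(guid, set()))

catalog = ReplicaCatalog()
catalog.add_alias("lfn:higgs-run1.root", "guid-0001")
catalog.add_alias("lfn:higgs-2003.root", "guid-0001")  # second alias, same GUID
catalog.add_replica("guid-0001", "srm://cern.ch/data/f1")
catalog.add_replica("guid-0001", "srm://ral.ac.uk/data/f1")
print(catalog.lookup("lfn:higgs-2003.root"))
```

Because aliasing lives in a separate LFN-to-GUID layer, renaming or adding an alias never touches the replica table, which is why the N:1:M schema supports GUIDs and aliasing naturally while a flat N:M schema does not.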
RLS: Which one for which user?
- Differences in functionality are growing smaller
  - Commitment on both sides to implement new functionality in an interoperable way, e.g. bulk upload and new query mechanisms
- WP2 / CERN IT-DB is not able to support external (non-EDG / LCG) customers
- Deployment model is still different
  - The major outstanding technical difference, which will be resolved with GT3
- The choice probably comes down to which components you already use
Interoperability
- Had a meeting with Globus after CHEP
  - Agreed that achieving it now would require lots of extra work: wrapper code to hide the differences in network protocols
  - GT3 and GGF standards will make this easier
- Second meeting scheduled for mid-July 2003
- Interoperability is a strategic goal
- Aim for full interoperability within the context of OGSA
EDG WP2 Work Schedule (1/3)
- April 2003: EDG 2.0 - first release of the new data management framework
- Services deployed:
  - Replica Location Service (Local Replica Catalog)
  - Replica Metadata Catalog
  - Replica Optimisation Service
  - Replica Manager
- Issues:
  - No security integration (authentication + authorization)
  - Single Local Replica Catalog / Replica Metadata Catalog per VO
EDG WP2 Work Schedule (2/3)
- July 2003: EDG 2.1 - focus on functionality missing from EDG 2.0
- July 11 - Replica Location Indices
  - LRC pushes updates to registered RLIs
  - EDG Replica Manager supports multiple LRCs and RLIs
- July 22 - Security
  - Integrate VOMS
  - Deployment of the EDG Trust Manager into Tomcat: authentication available for all Java-based services; R-GMA and SE also use this
  - Deployment of the EDG Authorization Manager, allowing services to make coarse-grained authorization decisions
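The LRC-to-RLI push above can be illustrated with a minimal sketch. All class and method names here are hypothetical, not the EDG interfaces: each Local Replica Catalog keeps a list of registered Replica Location Indices and forwards every new GUID mapping to them, so an RLI can answer "which LRCs know this GUID?" without holding the PFNs itself.

```python
# Hypothetical sketch of the push model (illustrative names, not the EDG API).
class RLI:
    """Replica Location Index: maps GUIDs to the LRCs that hold them."""
    def __init__(self):
        self.guid_to_lrcs = {}

    def receive_update(self, guid, lrc_name):
        self.guid_to_lrcs.setdefault(guid, set()).add(lrc_name)

class LRC:
    """Local Replica Catalog that pushes index updates to registered RLIs."""
    def __init__(self, name):
        self.name = name
        self.mappings = {}  # local GUID -> PFN store
        self.rlis = []      # indices registered for push updates

    def register_rli(self, rli):
        self.rlis.append(rli)

    def add_mapping(self, guid, pfn):
        self.mappings.setdefault(guid, set()).add(pfn)
        # Push only the GUID and our identity; the RLI never sees PFNs.
        for rli in self.rlis:
            rli.receive_update(guid, self.name)

index = RLI()
cern = LRC("lrc.cern.ch")
cern.register_rli(index)
cern.add_mapping("guid-0001", "srm://cern.ch/data/f1")
print(index.guid_to_lrcs["guid-0001"])
```

A client then queries the RLI to find the relevant sites and contacts only those LRCs for the actual PFNs, which keeps the index small and the update traffic one-way.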
EDG WP2 Work Schedule (3/3)
- October 2003: RLS Service Proxy - hides the interaction with:
  - the RLI and remote LRCs
  - the information service
- Removes complexity and duplicated code from:
  - EDG Replica Manager
  - POOL
  - Grid File Access Library
- The Grid then looks like two Local Replica Catalogs:
  - one is the LRC for the local site
  - one acts as a proxy for all other LRCs in the grid
Problems in the current architecture (1/2)
- RLS complexity
  - The RLS will have a large set of LRCs and RLIs running across many sites
  - Currently the client (e.g. EDG Replica Manager or POOL) has to manage these interactions itself
- Client failures
  - All failures are managed on the client side
  - If the client itself fails, there is no means of recovery: no state can be read back
- Scalability of transfer
  - If each client is allowed to issue GridFTP requests in a fabric, the network will be saturated
  - There is currently no way to find out whether a given file is already being replicated
Problems in the current architecture (2/2)
- Outbound connectivity
  - Worker nodes need direct outbound connectivity for the Replica Manager to work
  - This is not a given in all fabrics
- Failure upon unreachable remote service (RMC)
  - The RMC will be deployed at only one site
  - This means jobs can fail if the network between a site and the RMC breaks
Possible Solutions (1/2)
- RLS complexity
  - A service for the RLS that acts as a proxy to the RLI and all remote LRCs
- Client failures
  - Better client-side libraries that handle network retries
  - Hide/manage network- and service-related exceptions
- Scalability of transfer
  - A single service at each site could schedule replications and block requests for a file until it arrives
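The "better client-side libraries which handle network retries" idea can be sketched as a small wrapper. This is a hypothetical illustration (the function names and retry policy are assumptions, not an EDG library): transient connection failures are retried a fixed number of times before the error is surfaced to the caller.

```python
# Hypothetical sketch of a client-side retry wrapper for catalog calls.
import time

def with_retries(call, attempts=3, delay=0.0):
    """Retry a callable on ConnectionError, re-raising after the last attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            return call()
        except ConnectionError as err:
            last_error = err
            time.sleep(delay)  # back off before the next attempt
    raise last_error

# Simulated flaky catalog lookup: fails twice, then succeeds.
failures = {"left": 2}
def flaky_lookup():
    if failures["left"] > 0:
        failures["left"] -= 1
        raise ConnectionError("catalog unreachable")
    return ["srm://cern.ch/data/f1"]

print(with_retries(flaky_lookup))
```

Wrapping every network call this way hides transient failures from the Replica Manager or POOL, but it does not solve the harder problem on the slide: if the client process itself dies, no retry library helps, which is the motivation for moving state into a server-side proxy.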
Possible Solutions (2/2)
- Outbound connectivity
  - A SOAP proxy solves the problem for all services
  - Data transfer (GridFTP) is still a problem
- Failure upon unreachable remote service (RMC)
  - WAN-distributed database
  - Distributed messaging system to store actions at the worker-node site and handle retries
  - Use vendor-supplied solutions rather than re-inventing
Outstanding Components
- Collection management
  - Confined collections
    - Limited to PFNs all stored on the same LRC
    - Could be considered as directories
    - Allow a user to replicate a set of PFNs from one site to another
  - Free collections
    - PFNs that could be stored on any site
    - Are these needed? Use cases are not clear
- Replica Subscription Service
  - Provides the functionality of GDMP from EDG Release 1
  - Allows background third-party replication between SRMs
Grid File Access Library
- A solution to the "Grid open" problem
- POSIX API for opening a file on the Grid
- Hides the complexity of:
  - Replica Metadata Catalog
  - Replica Location Service
  - Storage Resource Manager
  - MSS backends
  - File access protocols: file, rfio, dcap, root, …
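The "Grid open" idea above can be sketched as a thin POSIX-style wrapper. This is a hypothetical illustration, not the real GFAL C API: `grid_open` and the one-dictionary "catalog" are invented names standing in for the full LFN-to-GUID-to-PFN resolution, SRM negotiation, and protocol selection that GFAL performs before handing back an ordinary file descriptor.

```python
# Hypothetical sketch of the "Grid open" idea (not the real GFAL interface):
# resolve a logical name through a catalog, then use plain POSIX open.
import os
import tempfile

def grid_open(name, catalog):
    """Open a grid or local file and return a POSIX file descriptor."""
    if name.startswith("lfn:"):
        # In real GFAL this step spans the RMC, RLS, SRM, and protocol
        # plugins; here a dictionary stands in for all of them.
        name = catalog[name]
    return os.open(name, os.O_RDONLY)

# Stage a local file to play the role of a resolved replica.
replica = tempfile.NamedTemporaryFile(delete=False)
replica.write(b"event data")
replica.close()

fd = grid_open("lfn:run1.root", {"lfn:run1.root": replica.name})
print(os.read(fd, 10))
os.close(fd)
```

The point of the design is exactly this shape: once the name is resolved, the application reads and writes through unchanged POSIX calls, so existing I/O code needs no grid-specific modifications.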
GFAL Overview
[Architecture diagram: a Physics Application issues POSIX I/O calls to the Grid File Access Library (GFAL), which bundles a Replica Catalog client, an SRM client, and local file I/O drivers (rfio, dCap, root, VFS). GFAL talks to the RC services, the SRM service, the rfio and dCap services, and the MSS service, covering both local disk and wide-area access.]