RLS Production Services
Maria Girone, PPARC-LCG, CERN LCG-POOL and IT-DB Physics Services
10th GridPP Meeting, CERN, 3rd June 2004


1 RLS Production Services
Maria Girone, PPARC-LCG, CERN LCG-POOL and IT-DB Physics Services
10th GridPP Meeting, CERN, 3rd June 2004
- What is the RLS
- RLS and POOL
- Service Overview
- Experience in Data Challenges
- Towards a Distributed RLS
- Summary

2 What is the RLS
(Database and Application Services, GridPP Meeting, 3rd June 2004, Maria Girone)
- The LCG Replica Location Service (LCG-RLS) is the central Grid File Catalog, responsible for maintaining a consistent list of accessible files (physical and logical names) together with their relevant file metadata attributes
- The RLS (and POOL) refers to files via a unique and immutable file identifier (FileID), generated at creation time
  - Stable inter-file reference
[Diagram labels: LFN1, LFN2, LFNn; PFN1, PFN2, PFNn; File metadata (jobid, owner, …)]
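The relationship on this slide (one immutable FileID, any number of logical and physical names, plus metadata) can be sketched as a small in-memory structure. This is only an illustration, not the RLS or POOL implementation: the class names, method names, example URLs, and the use of `uuid4` to generate the FileID are all assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file as seen by the catalog (hypothetical structure)."""
    lfns: set = field(default_factory=set)       # logical file names
    pfns: set = field(default_factory=set)       # physical replicas
    metadata: dict = field(default_factory=dict)  # e.g. jobid, owner


class FileCatalog:
    """Toy catalog keyed by an immutable FileID (assumed to be a UUID here)."""

    def __init__(self):
        self._entries = {}  # FileID -> CatalogEntry
        self._by_lfn = {}   # LFN -> FileID

    def register(self, pfn, lfn=None, **metadata):
        # The FileID is generated once, at creation time, and never changes.
        file_id = str(uuid.uuid4())
        self._entries[file_id] = CatalogEntry(pfns={pfn}, metadata=dict(metadata))
        if lfn is not None:
            self.add_lfn(file_id, lfn)
        return file_id

    def add_lfn(self, file_id, lfn):
        self._entries[file_id].lfns.add(lfn)
        self._by_lfn[lfn] = file_id

    def add_replica(self, file_id, pfn):
        self._entries[file_id].pfns.add(pfn)

    def replicas(self, file_id):
        return sorted(self._entries[file_id].pfns)

    def lookup_lfn(self, lfn):
        return self._by_lfn[lfn]
```

Because other files would refer to the FileID rather than to an LFN or PFN, names and replicas can be added later without breaking existing references.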

3 POOL and the LCG-RLS
- POOL is the LCG Persistency Framework
  - See talk from Radovan Chytracek
- The LCG-RLS is one of the three POOL File Catalog implementations
  - XML based local file catalog
  - MySQL based shared catalog
  - RLS based Grid-aware file catalog
- A complete production chain deploys several of these
  - Cascading changes from isolated worker nodes (XML catalog) up to the RLS service
  - DC04 used MySQL catalog at Tier1, RLS at Tier0
- RLS deployment at Tier1 sites
  - See talk from James Casey
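The cascade from a worker-node catalog up to the RLS can be pictured as repeated "publish" steps between catalogs. The sketch below uses plain dictionaries in place of the XML, MySQL and RLS back ends; the `Catalog` class and `publish` function are hypothetical names for illustration and are not POOL tools.

```python
class Catalog:
    """Stand-in for any of the three catalog flavours (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}  # FileID -> set of PFNs

    def register(self, file_id, pfn):
        self.entries.setdefault(file_id, set()).add(pfn)


def publish(source, destination):
    """Copy entries the destination lacks; return the number of new (FileID, PFN) pairs."""
    added = 0
    for file_id, pfns in source.entries.items():
        known = destination.entries.setdefault(file_id, set())
        new = pfns - known
        known |= new
        added += len(new)
    return added
```

A job would register its output in the local catalog, which is then published to the shared catalog and from there to the Grid-wide one; publishing the same entries twice adds nothing.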

4 RLS Service Goals
- RLS is a critical service for the correct operation of the Grid!
- Minimal downtime for both scheduled and unscheduled interruptions
  - Good level of availability at iAS and DB level
- Meet requirements of Data Challenges
  - In terms of performance (look-up / insert rate) and capacity (total number of GUID-PFN mappings and file-level metadata entries)
  - Currently, the performance is not limited by the service itself
- Prepare for future needs and increase reliability/manageability

5 RLS Service Overview
- Currently deploys LRC and RMC middleware components from EDG
  - Distributed Replica Location Index not deployed in LCG-2
  - For now, a central service deployed at CERN
- RLS uses Oracle Application Server (iAS) and Database (DB)
  - Dedicated farm node (iAS) per VO
  - Shared disk server (DB) for production VOs
  - Similar set-up is used for testing and software certification
[Diagram labels: RLS AppServers (production), RLS AppServers (certification), RLS AppServers (test); RLS DB (production), RLS DB (certification), RLS DB (test); ALICE, ATLAS, CMS, LHCb, DTEAM; spare]

6 Handling Interventions
- High level – ‘run like an experiment’:
  - On-call team; primary responsible and backup
  - Documented procedures, training for on-call personnel, daily meetings
  - List of experts to call in case standard actions do not work
- Planning of interventions
  - Most frequent: security patches
  - iAS: can transparently switch to new box using DNS alias change
    - Used for both scheduled and unscheduled interruptions
  - DB: short interruption to move to ‘stand-by’ DB
- Total up-time achieved: 99.91%
- Looking at standard Oracle solutions for High Availability:
  - iAS clusters and DB clusters
  - Data Guard (for data protection)
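To put the 99.91% figure in perspective, the corresponding downtime can be computed for a chosen period. The slide does not say over which period the up-time was measured, so the one-year and 30-day windows below are assumptions used only for illustration.

```python
def downtime_hours(uptime_percent, period_hours):
    """Hours of downtime implied by an up-time percentage over a given period."""
    return (1.0 - uptime_percent / 100.0) * period_hours


HOURS_PER_YEAR = 365 * 24     # assumed measurement window: one year
HOURS_PER_30_DAYS = 30 * 24   # assumed measurement window: 30 days

print(f"per year:    {downtime_hours(99.91, HOURS_PER_YEAR):.2f} h")
print(f"per 30 days: {downtime_hours(99.91, HOURS_PER_30_DAYS) * 60:.1f} min")
```

Under these assumptions, 99.91% corresponds to roughly 7.9 hours of downtime per year, or about 39 minutes in a 30-day month.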

7 Experience in Data Challenges
- The RLS was used for the first time in production during the CMS Data Challenge DC04 (3M PFNs and file metadata stored)
  - ATLAS and LHCb ramping up
- The service was stable throughout DC04
- Looking up file information by GUID seems sufficiently fast
- Clear problems with respect to the performance of the RLS
  - Partially due to the normal “learning curve” on all sides in using a new system
  - Bulk operations were missing in the deployed RLS version
  - Also, cross-catalog queries are not efficient by RLS design
- Several solutions produced ‘in flight’
  - EDG-based tools, POOL workarounds
- Support for bulk operations now addressed by IT-GD (in edg-rls v2.2.7). POOL will support it in the next release (POOL V1.7)
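Why missing bulk operations hurt can be seen from a simple round-trip count. The sketch below is not the edg-rls client API: `CountingCatalog`, `add_mapping`, `add_mappings_bulk` and `chunked` are hypothetical names, and each method call is assumed to cost one client-server round trip.

```python
class CountingCatalog:
    """Toy catalog that counts client-server round trips (hypothetical)."""

    def __init__(self):
        self.mappings = {}    # GUID -> set of PFNs
        self.round_trips = 0

    def add_mapping(self, guid, pfn):
        # One round trip per single insert.
        self.round_trips += 1
        self.mappings.setdefault(guid, set()).add(pfn)

    def add_mappings_bulk(self, pairs):
        # One round trip for the whole batch.
        self.round_trips += 1
        for guid, pfn in pairs:
            self.mappings.setdefault(guid, set()).add(pfn)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Registering 1000 mappings one at a time costs 1000 round trips in this model, while batches of 100 cost 10, so per-call latency stops dominating the insert rate.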

8 Towards a Distributed RLS
- RLS in LCG-2 still lacks consistent replication between multiple catalog servers
  - EDG RLI component has not been deployed as part of LCG
  - Central single catalog expected to result in scalability and availability problems
- Joint evaluation with CMS of Oracle asynchronous database replication as part of DC04 (in parallel to production)
  - Tested a minimal (two node) multi-master system between CERN and CNAF
  - Catalog inserts/updates propagated in both directions
- First results
  - RLS application could be deployed with only minor changes
  - No stability and performance problems observed so far
  - Network problems and temporary server unavailability were handled gracefully
  - Unfortunately, the setup could not be tested in full production mode in DC04 due to lack of time/resources
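The two-node multi-master idea can be illustrated with a toy model in which each node applies changes locally and ships them to its peer later. This is a conceptual sketch only and says nothing about how Oracle replication works internally; `MasterNode`, its methods, and the `link_up` flag are hypothetical.

```python
from collections import deque


class MasterNode:
    """Toy master: applies inserts locally, queues them for the peer (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.catalog = {}      # GUID -> set of PFNs
        self.outbox = deque()  # changes not yet shipped to the peer

    def insert(self, guid, pfn):
        self._apply(guid, pfn)
        self.outbox.append((guid, pfn))

    def _apply(self, guid, pfn):
        self.catalog.setdefault(guid, set()).add(pfn)

    def push_to(self, peer, link_up=True):
        """Ship queued changes to the peer; keep them queued if the link is down."""
        if not link_up:
            return 0
        shipped = 0
        while self.outbox:
            guid, pfn = self.outbox.popleft()
            peer._apply(guid, pfn)
            shipped += 1
        return shipped
```

Because changes stay in the outbox while the link is down, a network interruption delays propagation but loses nothing; once both nodes have pushed, their catalogs hold the same mappings.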

9 Next Generation RLS
- LCG Grid Deployment group is currently working with the experiments to gather requirements for the next generation RLS
  - Taking into account the experience from DC04
- Build on DC04 work: move to replicated rather than distributed catalogs?
- Still need to prove
  - Stability and performance with production access patterns
  - Scaling to a sufficient number of replicas (4-6 Tier1 sites?)
  - Automated resolution of catalog conflicts that may arise as a consequence of asynchronous replication
- Propose to continue evaluation, possibly using Oracle Streams, in the context of the Distributed Database Deployment activity in the LCG deployment area
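A conflict under asynchronous replication arises when two sites change the same catalog attribute before either change has propagated. One possible automated policy, shown here purely as an example and not as the approach chosen by LCG or provided by Oracle, is "last writer wins" with a deterministic tie-break. The `Update` record and both functions are hypothetical.

```python
from typing import NamedTuple


class Update(NamedTuple):
    """A metadata change made at one site (hypothetical record layout)."""
    guid: str
    attribute: str
    value: str
    timestamp: float
    site: str


def resolve(a, b):
    """Pick the winner of two conflicting updates: later timestamp, then site name."""
    return max(a, b, key=lambda u: (u.timestamp, u.site))


def merge(updates):
    """Reduce a stream of updates to one value per (GUID, attribute)."""
    winners = {}
    for update in updates:
        key = (update.guid, update.attribute)
        winners[key] = resolve(winners[key], update) if key in winners else update
    return {key: update.value for key, update in winners.items()}
```

The tie-break on site name makes every replica reach the same result regardless of the order in which updates arrive, which is the property an automated scheme needs; whether silently discarding the losing update is acceptable is a separate requirement question.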

10 Summary
- The Replica Location Service is a central part of the LCG infrastructure
  - Strong requirements in terms of reliability of the service
  - Significant contribution from GridPP funded people
- The LCG-RLS middleware and service have passed their first production test
  - Good service stability was achieved
- Experience in Data Challenges has proven to be essential for improving performance and scalability of the RLS middleware
- Oracle replication tests are expected to provide important input to define replicated RLS and handling of distributed metadata in general

11 The RLS Supported Configuration
- A “Local Replica Catalogue” (LRC)
  - Contains GUID-PFN mapping for all local files
- A “Replica Metadata Catalogue” (RMC)
  - Contains GUID-LFN mapping for all local files and all file metadata information
- A “Replica Location Index” (RLI) <-- Not deployed in LCG-2
  - Allows files at other sites to be found
  - All LRCs are configured to publish to all remote RLIs
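How the three components would cooperate in a fully distributed set-up can be sketched as follows. This is a simplified model under stated assumptions, not the EDG implementation: the `Site` class, its dictionary-based catalogs, and the `locate` function are hypothetical, and a real RLI would hold a compressed index rather than an exact GUID list.

```python
class Site:
    """One site with its LRC, RMC and RLI modelled as dictionaries (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.lrc = {}  # GUID -> set of PFNs for local files
        self.rmc = {}  # LFN -> GUID for local files
        self.rli = {}  # GUID -> names of remote sites whose LRC holds it

    def register(self, lfn, guid, pfn, all_sites):
        self.rmc[lfn] = guid
        self.lrc.setdefault(guid, set()).add(pfn)
        # Each LRC publishes to every remote RLI.
        for other in all_sites:
            if other is not self:
                other.rli.setdefault(guid, set()).add(self.name)


def locate(guid, site, sites_by_name):
    """Return all known PFNs for a GUID: local LRC first, then LRCs named by the RLI."""
    pfns = set(site.lrc.get(guid, ()))
    for remote_name in site.rli.get(guid, ()):
        pfns |= sites_by_name[remote_name].lrc.get(guid, set())
    return sorted(pfns)
```

With the RLI not deployed, as in LCG-2, the `rli` dictionary stays empty and only locally registered replicas are found, which is why a single central catalog is used instead.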

