
Slide 1: EDG Data Management Catalogs in LCG
James Casey, LCG Fellow, IT-DB Group, CERN
james.casey@cern.ch

Slide 2: Talk Outline
- Overview of Data Management components in EDG 2.0
- EDG catalogs
  - Architecture and features
  - Implementation details
  - Deployment choices for LCG
- POOL as a client of EDG
- Conclusions

Slide 3: Data Management: Basic Functionality
[Diagram: the Replica Manager coordinating the Replica Location Service, the Replica Metadata Catalog, and Storage Elements]
Files have replicas stored at many Grid sites on Storage Elements. Each file has a unique GUID. Locations corresponding to the GUID are kept in the Replica Location Service. Users may assign aliases to the GUIDs; these are kept in the Replica Metadata Catalog. The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents.
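
Aside: a minimal sketch, in Java, of how these mappings compose. Every class and method name below is a hypothetical illustration of the model (LFN alias -> GUID, GUID -> replica PFNs, atomic registration), not the EDG API; the example identifiers are taken from slide 25.

    import java.util.*;

    // Hypothetical in-memory model of the catalog mappings (illustration only).
    class CatalogModel {
        private final Map<String, String> rmcAliases = new HashMap<>();       // LFN alias -> GUID (RMC)
        private final Map<String, Set<String>> rlsReplicas = new HashMap<>(); // GUID -> PFNs (RLS)

        // In EDG, the Replica Manager wraps steps like these in one atomic
        // operation so SE contents and catalog entries never diverge.
        void registerReplica(String guid, String pfn) {
            rlsReplicas.computeIfAbsent(guid, g -> new HashSet<>()).add(pfn);
        }

        void addAlias(String lfn, String guid) {
            rmcAliases.put(lfn, guid);
        }

        // Resolving a user-visible name: LFN -> GUID, then GUID -> replicas.
        Set<String> locate(String lfn) {
            String guid = rmcAliases.get(lfn);
            if (guid == null) return Collections.emptySet();
            return rlsReplicas.getOrDefault(guid, Collections.emptySet());
        }

        public static void main(String[] args) {
            CatalogModel c = new CatalogModel();
            String guid = "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"; // example from slide 25
            c.registerReplica(guid, "gsiftp://pcrd24.cern.ch/flatfiles/cms/output10_1");
            c.addAlias("lfn:cms/20030203/run2/track1", guid);
            System.out.println(c.locate("lfn:cms/20030203/run2/track1"));
        }
    }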

Slide 4: Interactions with EDG 2.0 Components
[Diagram: the Replica Manager at the centre, connected to the Replica Location Service, Replica Optimization Service, Replica Metadata Catalog, SE Monitor, Network Monitor, Information Service, Resource Broker, User Interface or Worker Node, Storage Elements, and the Virtual Organization Membership Service]
Applications and users interface to data through the Replica Manager, either directly or through the Resource Broker.

Slide 5: EDG Grid Catalogs (1/2)
- Replica Location Service (RLS)
  - Local Replica Catalog (LRC)
    - Stores GUID to Physical File Name (PFN) mappings
    - Stores attributes on PFNs
    - Many Local Replica Catalogs in the Grid: normally one per Storage Element (per VO)
    - Tested to 1.5M entries
  - Replica Location Index (RLI)
    - Allows fast lookup of which sites store GUID -> PFN mappings for a given GUID (a toy two-step lookup is sketched after this slide)
    - Many Replica Location Indexes in the Grid: normally one per site (per VO), each indexing all LRCs in the Grid
    - Being deployed as part of EDG 2.1 in July; in the process of being integrated into the Replica Manager, POOL, and the EDG Job Scheduler
    - Tested to 10M entries in an RLI
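
Aside: a hypothetical Java sketch of the two-step RLS lookup, for illustration only. The point it encodes comes from the slide: the RLI knows only which LRCs hold mappings for a GUID; the PFNs themselves live in the per-site LRCs. Class and method names are made up, and the index update (in reality the LRCs publish their state to the RLIs) is simplified to a synchronous call.

    import java.util.*;

    // Hypothetical model of an RLI indexing several LRCs (illustration only).
    class RlsLookup {
        // RLI: GUID -> endpoints of the LRCs that hold mappings for it.
        private final Map<String, Set<String>> rliIndex = new HashMap<>();
        // One LRC per site: endpoint -> (GUID -> PFNs stored at that site).
        private final Map<String, Map<String, Set<String>>> lrcs = new HashMap<>();

        void addReplica(String lrcEndpoint, String guid, String pfn) {
            lrcs.computeIfAbsent(lrcEndpoint, e -> new HashMap<>())
                .computeIfAbsent(guid, g -> new HashSet<>()).add(pfn);
            rliIndex.computeIfAbsent(guid, g -> new HashSet<>()).add(lrcEndpoint);
        }

        // Step 1: ask the RLI which sites know this GUID.
        // Step 2: ask only those LRCs for the actual PFNs.
        Set<String> lookup(String guid) {
            Set<String> pfns = new HashSet<>();
            for (String endpoint : rliIndex.getOrDefault(guid, Collections.emptySet())) {
                pfns.addAll(lrcs.get(endpoint).getOrDefault(guid, Collections.emptySet()));
            }
            return pfns;
        }
    }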

Slide 6: EDG Grid Catalogs (2/2)
- Replica Metadata Catalog (RMC)
  - Stores Logical File Name (LFN) to GUID mappings: user-defined aliases
  - Stores attributes on LFNs and GUIDs (a toy model follows this slide)
  - One Replica Metadata Catalog in the Grid (per VO)
    - Single point of synchronization: the current assumption in the EDG model
    - A bottleneck? Could move to a replicated, distributed database
- No Application Metadata Catalog provided
  - But the Replica Metadata Catalog supports a small amount of application metadata: O(10) attributes
- RMC usage not as well understood as the Replica Location Service
  - Architectural changes likely
  - Use cases required
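
Aside: the RMC's distinguishing point is the small, O(10), attribute budget per entry. A hypothetical Java sketch (names made up, not the EDG API):

    import java.util.*;

    // Hypothetical RMC model: user aliases plus a handful of attributes per GUID.
    class ReplicaMetadataCatalog {
        private final Map<String, String> aliasToGuid = new HashMap<>();
        private final Map<String, Map<String, String>> guidAttributes = new HashMap<>();

        void addAlias(String lfn, String guid) { aliasToGuid.put(lfn, guid); }

        void setAttribute(String guid, String name, String value) {
            Map<String, String> attrs =
                guidAttributes.computeIfAbsent(guid, g -> new HashMap<>());
            // Not a full application metadata catalog: only ~10 attributes per entry.
            if (attrs.size() >= 10 && !attrs.containsKey(name))
                throw new IllegalStateException("RMC supports only O(10) attributes");
            attrs.put(name, value);
        }

        String resolve(String lfn) { return aliasToGuid.get(lfn); }
    }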

Slide 7: Typical Location of Services in LCG-1
[Diagram: each site shown (CNAF, RAL, CERN, IN2P3) runs a Replica Location Index, a Local Replica Catalog, and a Storage Element; a single Replica Metadata Catalog serves the whole Grid]

Slide 8: Catalog Implementation Details
- Catalogs implemented in Java as Web Services, hosted in a J2EE application server
  - Tomcat4 or Oracle 9iAS as the application server
  - Jakarta Axis as the Web Services container
  - Java and C++ client APIs currently provided, built on Jakarta Axis (Java) and gSOAP (C++)
- Catalog data stored in a relational database
  - Runs with either Oracle 9i or MySQL
- Catalog APIs exposed as a Web Service using WSDL
  - Easy to write a new client if we don't support your language right now (see the sketch after this slide)
- Vendor-neutral approach taken to allow different deployment options
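
Aside: since the catalog APIs are published as WSDL, a client in an unsupported language only needs a SOAP stack. Below is a minimal sketch using Jakarta Axis 1's dynamic Call API; the endpoint path, namespace, and operation name ("getPFNs") are hypothetical stand-ins, since the slides do not show the actual RLS WSDL. In practice one would generate typed stubs with Axis WSDL2Java (or gSOAP on the C++ side).

    import java.net.URL;
    import javax.xml.namespace.QName;
    import org.apache.axis.client.Call;
    import org.apache.axis.client.Service;

    // Hypothetical Axis 1 dynamic-invocation client (requires the Axis 1 jars).
    // Endpoint, namespace, and operation are illustrative, not the real interface.
    public class RlsClientSketch {
        public static void main(String[] args) throws Exception {
            Call call = (Call) new Service().createCall();
            call.setTargetEndpointAddress(
                new URL("http://rlstest.cern.ch/services/LocalReplicaCatalog")); // hypothetical path
            call.setOperationName(new QName("urn:example-lrc", "getPFNs"));      // hypothetical operation
            Object result = call.invoke(new Object[] {
                "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6" // GUID example from slide 25
            });
            System.out.println(result);
        }
    }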

Slide 9: Quality of Service
- Quality of Service depends on the server software and architecture as well as on the software components deployed on them
- Features required for a high Quality of Service:
  - High availability
  - Manageability
  - Monitoring
  - Backup and recovery with defined Service Level Agreements
- Approach:
  - Use vendor solutions for availability and manageability where available
  - Use common IT-DB solutions for monitoring and recovery
  - Components architected to allow easy deployment in a high-availability environment
- A variety of solutions with different characteristics is possible

Slide 10: Tradeoffs in Different Solutions
[Chart: four deployment options plotted against two axes, Manageability and Availability: single-instance MySQL/Tomcat, single-instance Oracle 9i/9iAS, clustered Oracle 9i/Tomcat, and clustered Oracle 9i/9iAS]

Slide 11: Current Deployment Plans
- EDG
  - All sites use the MySQL/Tomcat single-instance solution
- LCG-1
  - CERN deploys the LRC/RLI/RMC on an Oracle 9iAS/Oracle 9i single instance
  - Tier-1 sites invited to use either Oracle 9iAS/Oracle or Tomcat4/MySQL single instances for their LRCs/RLIs
- CERN IT-DB working on "easy-install" packaging of Oracle
  - Oracle sees ease of installation as a high priority for Oracle 10i (release date: Nov 2003)
  - Will allow deployment of an Oracle-based solution without requiring much Oracle expertise
- Testing of components for a high-availability solution in progress
  - Based on Oracle 9i
  - Planned to be available by year-end 2003

Slide 12: System Architecture – High Availability
- Standard n-tier architecture:
  - Front-end application-layer load balancer: Oracle 9iAS Web Cache
  - Cluster of stateless application servers: Oracle 9iAS J2EE container
  - Clustered database nodes: Oracle 9i/RAC
  - Shared SAN storage: Fibre Channel storage

Slide 13: POOL and the Grid
- Good architectural match for use of the EDG catalogs
  - A POOL FileID is a GUID in our architecture
  - Combines features of the Replica Metadata Catalog (LFN aliases, GUID attributes) and the Local Replica Catalog (GUID to PFN mappings)
- EDG supplies POOL with a C++ library providing all functions required to implement a POOL File Catalog (its shape is sketched after this slide)
- Catalogs deployed for POOL release 1.0 (May 2003)
  - rlstest.cern.ch
  - Pre-production quality service
- Catalogs deployed for LCG-1 (July 2003)
  - rlscms.cern.ch, rlsatlas.cern.ch, rlslhcb.cern.ch, rlsalice.cern.ch
  - Production quality service
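
Aside: the actual POOL File Catalog interface is C++. Purely to illustrate the "good match" the slide describes, here is a hypothetical Java rendering of the shape such an interface takes, with each operation backed by one of the EDG catalogs; the names are neither POOL's nor EDG's API.

    import java.util.List;

    // Hypothetical, illustrative catalog interface (illustration only).
    interface FileCatalog {
        String registerPFN(String pfn);              // new entry -> fresh FileID (a GUID), via the LRC
        void registerLFN(String fileId, String lfn); // user alias, via the RMC
        String lookupFileIdByLFN(String lfn);        // RMC: LFN -> GUID
        List<String> lookupPFNs(String fileId);      // RLS: GUID -> replica PFNs
    }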

Slide 14: Summary & Conclusions
- New data management architecture deployed as part of EDG 2.0
- Good match with the requirements of the POOL File Catalog
- Focus on manageability and scalability aspects
  - Can't wait for OGSA: production deployment by September 2003
  - The design will allow evolution into OGSA
- POOL and LCG acting as good "real" customers for EDG data management
  - Validates that our components can work outside an EDG context
- LCG-1 will provide hard targets to meet in terms of scalability and reliability
  - A good testing ground

Slide 15: Questions?

Slide 16: Oracle at Tier-1 Sites

Slide 17: WP2 Deployment
- A farm node running Red Hat Enterprise Linux and Oracle 9iAS
  - Runs the Java middleware for the LRC, RLI, RLS service, etc.
- A disk server running Red Hat Enterprise Linux and Oracle 9i
  - Stores GUID -> PFN mappings
  - Data volume for LCG-1 is small (~10^5 – 10^6 entries, each < 1 KB)
  - Query/lookup rate is low (~1 every 3 seconds)
    - Projection to 2008: 100 – 1000 Hz, 10^9 entries (see the sizing note after this slide)
- Site responsible for acquiring and installing hardware and RHEL
  - $349 for the 'basic edition': http://www.redhat.com/software/rhel/es/
- CERN will provide distribution kits of the Oracle software for RHEL, together with automatic installation scripts and documentation
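
Aside, a quick sizing check from the numbers on this slide: for LCG-1, 10^6 entries at under 1 KB each is under 10^6 × 1 KB ≈ 1 GB of raw catalog data, and even the 2008 projection of 10^9 entries stays on the order of 10^9 × 1 KB ≈ 1 TB.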

Slide 18: Support Issues
- CERN will recommend Oracle books and training
- technet.oracle.com is an excellent source of information
  - Free access, but registration required
- Support calls go to Oracle via 'metalink' (Web)
  - Read-only access can be provided to a small number of administrators (e.g. at Tier-1 sites)
    - Already very useful for solving problems
  - Further escalation is channeled through CERN IT-DB, as is done now for other CERN users

Slide 19: Monitoring & Backup
- CERN is moving to Oracle Enterprise Manager (OEM) for monitoring and Recovery Manager (RMAN) for backup
- The goal is a common database setup, monitoring, and backup strategy across all servers
- CERN can provide example scripts, guidelines, etc., but monitoring, backup, and recovery are clearly the responsibility of the local site

Slide 20: Oracle 10i
- Will be announced in September
- Many new features requested by CERN
  - Native floats and doubles, ULDB, greatly simplified installation (database cloning, no client install, etc.), machine-independent transportable tablespaces(!), etc.
- For clients, just copy two files to LD_LIBRARY_PATH
- For servers, database cloning should simplify and speed up deployment

Slide 21: Conclusion
- Oracle licensing for LCG is a solved issue
- Distribution kits are ready
- Documentation is ready
- We are ready to start working with Tier-1 sites on deploying the WP2 services on Oracle

Slide 22: Other Distribution Kits
- From Oracle 10i on: ship client libraries, e.g. with POOL?
  - Potential clients include the conditionsDB, POOL with an RDBMS backend, etc.
- Server kit for other applications, e.g. a local conditionsDB or a local copy of the COMPASS event metadata
- Full distribution
  - Will require experienced local DBAs

Slide 23: Misc.

Slide 24: The POOL Project
- POOL is the LCG Persistency Framework
  - A pool of persistent objects for LHC
  - Consists of several components:
    - Storage Service
    - File Catalog
    - Object-level Collections
    - Object Cache
  - POOL has had several File Catalog implementations, for different usage patterns and user requirements:
    - XML Catalog
    - Native MySQL Catalog
  - POOL wanted a Grid-aware catalog (December 2002)
    - Looked at the EDG catalogs as a possible solution

Slide 25: Catalogs and Naming
- Catalogs are part of the basic functionality of Grid Data Management
  - Used to store data file locations and replication metadata
  - Used by users directly
  - Used indirectly through other middleware components, such as the Job Scheduler and the Replica Manager
- Many different naming schemes are in use for different purposes (a classifier sketch follows this slide):
  - Logical File Name (LFN): an alias created by a user to refer to some accessible resource, e.g. "lfn:cms/20030203/run2/track1"
  - Physical File Name (PFN): the location of an actual piece of data on a storage system, e.g. "gsiftp://pcrd24.cern.ch/flatfiles/cms/output10_1"
  - Globally Unique Identifier (GUID): a non-human-readable unique identifier, e.g. "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
[Diagram: several Logical File Names (1..n) map to a single GUID, which maps to several Physical File Names (1..n)]
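
Aside: a minimal, hypothetical Java classifier for the three schemes, using exactly the example strings above; EDG's real parsing rules may differ.

    // Hypothetical scheme classifier (illustration only).
    public class NameSchemes {
        enum Kind { LFN, GUID, PFN, UNKNOWN }

        static Kind classify(String name) {
            if (name.startsWith("lfn:"))  return Kind.LFN;  // user-defined alias
            if (name.startsWith("guid:")) return Kind.GUID; // globally unique identifier
            if (name.contains("://"))     return Kind.PFN;  // e.g. a gsiftp:// URL on an SE
            return Kind.UNKNOWN;
        }

        public static void main(String[] args) {
            String[] examples = {
                "lfn:cms/20030203/run2/track1",
                "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
                "gsiftp://pcrd24.cern.ch/flatfiles/cms/output10_1"
            };
            for (String n : examples) System.out.println(classify(n) + "  " + n);
        }
    }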

