Replication and QoS in the management of grid-oriented databases. Barbara Martelli, INFN - CNAF.


1 Replication and QoS in the management of grid-oriented databases
Barbara Martelli, INFN - CNAF

2 Outline
- LCG 3D (Distributed Deployment of Databases) project status
- Oracle High Availability/Replication features
- MySQL High Availability/Replication features
- Databases in the GRID
- Oracle replication case study: LFC
- MySQL replication case study: VOMS

3 LCG 3D Service Architecture
[Diagram: tiered database service architecture]
- T0: autonomous reliable service
- Online DB: autonomous reliable service
- T1: database backbone; all data replicated; reliable service
- T2: local database cache; subset of data; local service only
- Distribution technologies shown: Oracle Streams; HTTP cache (SQUID); cross-DB copy and MySQL/SQLite files
- Read-only access at Tier 1/2 (at least initially)
- Legend: Successfully Implemented / Not Implemented
- Open question: is it possible or interesting to investigate Oracle Heterogeneous Connectivity for Tier-1 to Tier-2 replication?

4 Oracle Building Blocks
[Diagram labels: ASM, RAC]
- Each cloud has to guarantee high availability, scalability, and fault tolerance.
- At CNAF, high availability is achieved at different levels:
  - Storage hardware level: RAID, Storage Area Network
  - Storage logic level: logical volume manager, Automatic Storage Management (ASM)
  - Database level: Real Application Clusters (RAC). The database is shared among different servers. Load balancing, connection retries, and failover are implemented in the Oracle drivers (quasi-transparent to applications).
  - Disaster recovery: Recovery Manager (RMAN) backups
    - Retention policy on disk: 2 days
    - Retention policy on tape: 31 days
- Availability rate: 98.7% in 2007
- Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime), expressed as a percentage
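The availability formula on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and the downtime figure are assumptions chosen so that the result matches the reported 98.7%, not measured values from CNAF.

```python
def availability_percent(uptime_h, target_downtime_h, agent_downtime_h):
    """Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime)."""
    total = uptime_h + target_downtime_h + agent_downtime_h
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100.0 * uptime_h / total


HOURS_IN_2007 = 365 * 24  # 8760 hours, 2007 was not a leap year

# Hypothetical split: about 114 hours of target downtime, no agent downtime.
assumed_downtime_h = 114.0
result = availability_percent(HOURS_IN_2007 - assumed_downtime_h,
                              assumed_downtime_h, 0.0)
print(round(result, 1))
```

Read the other way round, an availability of 98.7% over one year corresponds to roughly 114 hours of accumulated downtime.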

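5 Oracle Streams Replication
[Diagram: on the master DB, changes to database objects are written to the redo log, read by the Capture process and placed in a queue as LCRs (Logical Change Records); Propagation moves the LCRs to a queue on the replica DB, where the Apply process applies them to the replica's database objects]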
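The capture, propagation and apply stages in the diagram can be mimicked with a toy in-memory pipeline. This is a conceptual sketch only, not Oracle's API: the `LCR` record layout, the tuple-based redo log and the function names are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LCR:
    """Toy logical change record (field layout is hypothetical)."""
    table: str
    op: str            # "INSERT", "UPDATE" or "DELETE"
    key: str
    value: object = None


def capture(redo_log, source_queue):
    """Turn redo log entries into LCRs and enqueue them on the master side."""
    for entry in redo_log:
        source_queue.append(LCR(*entry))


def propagate(source_queue, dest_queue):
    """Move every queued LCR from the master queue to the replica queue."""
    while source_queue:
        dest_queue.append(source_queue.popleft())


def apply_changes(dest_queue, replica):
    """Dequeue LCRs in order and apply them to the replica's tables."""
    while dest_queue:
        lcr = dest_queue.popleft()
        rows = replica.setdefault(lcr.table, {})
        if lcr.op == "DELETE":
            rows.pop(lcr.key, None)
        else:
            rows[lcr.key] = lcr.value


# Hypothetical redo log: (table, operation, key[, value]).
redo_log = [
    ("files", "INSERT", "guid-1", "pfn-a"),
    ("files", "UPDATE", "guid-1", "pfn-b"),
    ("files", "INSERT", "guid-2", "pfn-c"),
    ("files", "DELETE", "guid-2"),
]
source_queue, dest_queue, replica = deque(), deque(), {}
capture(redo_log, source_queue)
propagate(source_queue, dest_queue)
apply_changes(dest_queue, replica)
print(replica)
```

Because the three stages are decoupled by queues, the replica can lag behind the master without blocking it, which is why replication latency is a quantity worth measuring.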

6 MySQL High Availability and replication features
- Master-slave replication
  - Referred to as asynchronous replication
  - Available since 3.23; a stable and reliable feature
  - Some examples of it in GRID production deployments (VOMS)
  - The original databases are managed by the master.
  - The slave manages a copy of the original databases.
  - Update queries (UPDATE, DELETE and INSERT in SQL jargon) must be executed only on the master host.
  - SQL statements and update commands are replicated, not the changed data.
- Multi-master replication
  - Available since 5.0; a new and not fully tested feature
  - Possible only under particular conditions that allow for simple conflict resolution policies
- MySQL Cluster
  - Referred to as synchronous replication
  - It does not seem to be a stable feature; the MySQL 5.1 manual states: "This chapter represents a work in progress, and its contents are subject to revision as MySQL Cluster continues to evolve"
  - I know of no MySQL production systems currently deployed as a cluster
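The idea that the statements, rather than the changed rows, are shipped to the slave can be illustrated with a small simulation. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL; the `Master` and `Slave` classes and the in-memory statement log are assumptions for illustration and do not reproduce MySQL's binary log format.

```python
import sqlite3


class Master:
    """Executes writes and records each statement in an ordered log."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.statement_log = []  # stand-in for the master's replication log

    def execute(self, sql, params=()):
        self.db.execute(sql, params)
        self.db.commit()
        self.statement_log.append((sql, params))


class Slave:
    """Replays logged statements; it lags until catch_up() is called."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.position = 0  # index of the next statement to replay

    def catch_up(self, master):
        for sql, params in master.statement_log[self.position:]:
            self.db.execute(sql, params)
        self.db.commit()
        self.position = len(master.statement_log)


master, slave = Master(), Slave()
master.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
master.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
slave.catch_up(master)

# The master moves ahead; the slave only sees the change after replaying it.
master.execute("UPDATE users SET name = ? WHERE name = ?", ("bob", "alice"))
stale_rows = slave.db.execute("SELECT id, name FROM users").fetchall()
slave.catch_up(master)
rows = slave.db.execute("SELECT id, name FROM users").fetchall()
print(stale_rows, rows)
```

The window between the `UPDATE` on the master and the next `catch_up` is what makes this style of replication asynchronous: reads served by the slave during that window return the old value.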

7 Databases in GRID services
- Databases are key components of various GRID services (list not exhaustive):
  - FTS
    - Database used for data persistency
    - MySQL and Oracle backends supported, but Oracle is recommended
    - High availability through clusters
    - https://twiki.cern.ch/twiki/bin/view/EGEE/FTS
  - LFC
    - MySQL and Oracle backends supported
    - Both MySQL and Oracle replication supported
    - https://twiki.cern.ch/twiki/bin/view/LCG/LfcAdminGuide
  - VOMS
    - MySQL and Oracle backends supported
    - Both MySQL and Oracle replication supported
    - http://www.grid.auth.gr/guides/voms_replication/voms_replication.php

8 Oracle replication case study: LFC
- LFC: the LCG File Catalog is a high-performance file catalog which stores LFN to GUID to PFN mappings.
- Oracle one-way Streams replication is used in WLCG to balance the load of LFC read-only requests among catalogs residing in various Tier-1s.
- The LFC code has been slightly modified to prevent a user from accidentally writing into a read-only catalog. The only thing an administrator has to do is set the variable RUN_READONLY="yes" in the /etc/sysconfig/lfcdaemon configuration file.
- Database replication has to replicate all tables except CNS_USERINFO and CNS_GROUPINFO.
- In case of write attempts on the read-only LFC, you would get an error:
    $ lfc-mkdir /grid/dteam/hello
    cannot create /grid/dteam/hello: Read-only file system
- Replication speed requirements are not very strict:
  - Update frequency: ~1 Hz
  - Replication latency: < 10 min
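The read-only switch described on this slide amounts to a guard that rejects writes when a configuration flag is set. The sketch below shows the general pattern only; it is not the LFC daemon's code. The parser, the `check_writable` helper and the sample configuration text are hypothetical; only the `RUN_READONLY` variable name and the read-only error come from the slide.

```python
import errno
import os


def parse_sysconfig(text):
    """Parse simple KEY="value" lines, ignoring blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def check_writable(settings):
    """Raise a read-only error (EROFS) when RUN_READONLY is set to yes."""
    if settings.get("RUN_READONLY", "no").lower() == "yes":
        raise OSError(errno.EROFS, os.strerror(errno.EROFS))


# Hypothetical file content; the real file is /etc/sysconfig/lfcdaemon.
config = parse_sysconfig('# sample settings\nRUN_READONLY="yes"\n')

try:
    check_writable(config)
    write_allowed = True
    error_code = None
except OSError as exc:
    write_allowed = False
    error_code = exc.errno

print(write_allowed, error_code)
```

Rejecting writes at the service level, rather than relying on database permissions alone, gives clients an immediate and unambiguous error instead of letting a stray write diverge from the master.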

9 LHCb LFC Replication deployment CERN-CNAF
[Diagram: a 6-node cluster hosting the master Oracle DB at CERN replicates via Oracle Streams over the WAN to a 2-node cluster hosting the replica Oracle DB at CNAF; LFC R-W servers and LFC R-O servers in front of the databases serve r/w clients and read-only clients]
- Stress test: insertions at 900 Hz for 24 hours
- Max latency: 55 s
- Mean latency: 15 s
- Full consistency maintained

10 MySQL replication case study: VOMS
- The Virtual Organization Membership Service (VOMS) server manages authorization data:
  - It provides a database of users, groups, roles and capabilities that are grouped in Virtual Organizations (VOs).
  - Users query the VOMS server in order to get their VO grid credentials (proxy).
  - Read-only operations are originated by various commands such as voms-proxy-info. They could be balanced across read-only VOMS replicas.
  - Write operations are originated by the mk-gridmap and voms-proxy-init commands.
- Expected write rate on the VOMS server:
  - 1 Hz of voms-proxy-init
  - Peaks of 100 Hz of mk-gridmap (to be fixed)
- A MySQL master-slave replication deployment can be useful for load balancing and failover of read-only operations.
- VOMS supports MySQL one-way replication. http://wiki.egee-see.org/index.php/SEE-GRID_VOMS_Failover
- Some examples of VOMS on replicated MySQL:
  - LIP (Portugal)
  - Fermilab
  - CNAF – INFN Padova (CDF VOMS)
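Balancing read-only operations across replicas while sending every write to the master can be sketched as a small router. This is an illustrative pattern under stated assumptions, not VOMS code: the `ReplicaRouter` class and the host names are hypothetical.

```python
import itertools


class ReplicaRouter:
    """Send writes to the master and spread reads round-robin over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = list(replicas)
        # With no replicas configured, reads fall back to the master.
        self._cycle = itertools.cycle(self._replicas) if self._replicas else None

    def route(self, is_write):
        if is_write or self._cycle is None:
            return self.master
        return next(self._cycle)


# Hypothetical host names, used only for this example.
router = ReplicaRouter("voms-master.example.org",
                       ["voms-ro1.example.org", "voms-ro2.example.org"])

write_target = router.route(is_write=True)
read_targets = [router.route(is_write=False) for _ in range(4)]
print(write_target, read_targets)
```

With one-way replication the router must never send a write to a replica, since changes made there would not flow back to the master.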

11 VOMS replicated deployment
- The VOMS code has been adapted to MySQL replication: it provides a script which creates a slave MySQL replica, given a master MySQL and a consistent dump. http://glite.cvs.cern.ch:8180/cgi-bin/glite.cgi/org.glite.security.voms/src/replica/voms_install_replica.in?revision=1.3.4.1&pathrev=glite-security-voms_branch_1_8_0
- Concurrent writes
  - The VOMS server has a web component, running in a web container provided by Tomcat, which hosts the administration interface.
  - Problem: the administration interface running on a slave host will update the seqnumber and realtime tables of each VO database.
  - Solution: data from those tables must not be replicated to the slave hosts.
      replicate-ignore-table=VOMS_seqnumber
      replicate-ignore-table=VOMS_realtime
- Some stress tests performed by Fermilab: http://cd-docdb.fnal.gov/cgi-bin/ShowDocument?docid=2571
  - VOMS MySQL successfully queried at 125 Hz (10.8M/day)
  - System load: 0.2, CPU: 10% (dual-core machine)
  - Simulated failure of one VOMS server
    - Disabled network: new requests are not routed to the failed server
    - Re-enabled network: server added back to the pool for scheduling
    - Open connections during service failure are lost
    - Affected number of connections is very small (1-2)
  - Simulated failure of the MySQL server
    - After re-enabling the server, transaction logs are replayed automatically
- VOMS on Oracle replication is under test and will be available soon
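The failover behaviour observed in the stress tests (a failed server stops receiving new requests and is scheduled again once it recovers) can be sketched as a health-aware server pool. The code below is a minimal illustration, not the setup used at Fermilab: the `ServerPool` class and the server names are hypothetical, and health is set by hand rather than detected over the network.

```python
class ServerPool:
    """Round-robin scheduler that skips servers currently marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next % len(self._servers)]
            self._next += 1
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy server available")


pool = ServerPool(["voms-a", "voms-b"])  # hypothetical server names

pool.mark_down("voms-a")                 # simulated network failure
during_failure = [pool.pick() for _ in range(3)]

pool.mark_up("voms-a")                   # network re-enabled
after_recovery = {pool.pick() for _ in range(2)}
print(during_failure, after_recovery)
```

A pool like this only protects new requests. Connections already open to the failed server are lost, which matches the small number of affected connections reported on the slide.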

12 Conclusions
- Different high availability/redundancy techniques have been tested in the WLCG environment and allow for good availability of GRID database services.
- Both Oracle and MySQL replication solutions have been deployed in WLCG; they offer different solutions to address different kinds of load.
- The LCG 3D project has developed Tier-0 to Tier-1 replication but has left the Tier-1 to Tier-2 distribution issues to sites. Do we need to address them?

