Replication and QoS in the Management of Grid-Oriented Databases. Barbara Martelli, INFN - CNAF.

2 Outline
- LCG 3D (Distributed Deployment of Databases) project status
- Oracle high availability/replication features
- MySQL high availability/replication features
- Databases in the GRID
- Oracle replication case study: LFC
- MySQL replication case study: VOMS

3 LCG 3D Service Architecture
[Architecture diagram, reduced to its labels] Oracle Streams connects the Tier-0 to the Tier-1 database backbone; the Tier-2s are reached through an http cache (SQUID) and cross-DB copy to MySQL/SQLite files.
- T0: autonomous, reliable service (plus an online DB, also an autonomous reliable service)
- T1: database backbone; all data replicated; reliable service
- T2: local database cache; subset of the data; only local service
R/O access at Tier 1/2 (at least initially). The Tier-0 to Tier-1 path has been successfully implemented; the Tier-1 to Tier-2 path has not. Is it possible/interesting to investigate Oracle Heterogeneous Connectivity for Tier-1 to Tier-2 replication?

4 Oracle Building Blocks
Each cloud has to guarantee high availability, scalability and fault tolerance. At CNAF, high availability is achieved at different levels:
- Storage hardware level: RAID, Storage Area Network
- Storage logic level: logical volume manager (Automatic Storage Management, ASM)
- Database level: Real Application Clusters (RAC); the database is shared among different servers, with load balancing, connection retries and failover implemented in the Oracle drivers (quasi-transparent to applications)
- Disaster recovery: Recovery MANager (RMAN) backups
  - Retention policy on disk: 2 days
  - Retention policy on tape: 31 days
Availability rate: 98.7% in 2007, where
Availability (%) = Uptime / (Uptime + Target Downtime + Agent Downtime)
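For a sense of scale, a worked example of what that figure implies, assuming the downtime terms in the formula are simply summed over a full calendar year (8,760 hours):

    Availability = 0.987  ->  total downtime = (1 - 0.987) x 8,760 h ≈ 114 h over the year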

5 Oracle Streams Replication
[Diagram, reduced to its labels] On the master DB, a capture process reads changes from the redo log and enqueues them as Logical Change Records (LCRs); propagation moves the LCRs from the master queue to the replica queue; on the replica DB, an apply process dequeues them and applies the changes to the database objects.
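To make the capture/propagation/apply flow concrete, here is a minimal configuration sketch using Oracle's DBMS_STREAMS_ADM package. The LFC schema name, the strmadmin.streams_queue queue and the replica.example.com database link are illustrative placeholders, not the actual WLCG configuration:

    -- Run as the Streams administrator on the master DB (Oracle 10g style).
    -- Capture: stage changes to the LFC schema from the redo log into the queue.
    BEGIN
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name  => 'LFC',
        streams_type => 'capture',
        streams_name => 'capture_lfc',
        queue_name   => 'strmadmin.streams_queue',
        include_dml  => TRUE,
        include_ddl  => TRUE);

      -- Propagation: forward the captured LCRs to the queue on the replica.
      DBMS_STREAMS_ADM.ADD_SCHEMA_PROPAGATION_RULES(
        schema_name            => 'LFC',
        streams_name           => 'prop_lfc',
        source_queue_name      => 'strmadmin.streams_queue',
        destination_queue_name => 'strmadmin.streams_queue@replica.example.com',
        include_dml            => TRUE,
        include_ddl            => TRUE);
    END;
    /

    -- Run on the replica DB: apply incoming LCRs to the local objects.
    BEGIN
      DBMS_STREAMS_ADM.ADD_SCHEMA_RULES(
        schema_name  => 'LFC',
        streams_type => 'apply',
        streams_name => 'apply_lfc',
        queue_name   => 'strmadmin.streams_queue',
        include_dml  => TRUE,
        include_ddl  => TRUE);
    END;
    /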

6 MySQL High Availability and Replication Features
- Master-slave replication (a minimal setup sketch follows this list):
  - Referred to as asynchronous replication
  - Available since 3.23; a stable and reliable feature
  - Some examples of it in GRID production deployments (VOMS)
  - The original databases are managed by the master; the slave manages a copy of the original databases
  - Update queries (UPDATE, DELETE and INSERT in SQL jargon) must be executed only on the master host
  - The SQL update statements themselves are replicated, not the changed data
- Multi-master replication:
  - Available since 5.0; a new and not fully tested feature
  - Possible only under particular conditions which allow for simple conflict-resolution policies
- MySQL Cluster:
  - Referred to as synchronous replication
  - It doesn't seem to be a stable feature yet, as you can read in the MySQL 5.1 manual: "This chapter represents a work in progress, and its contents are subject to revision as MySQL Cluster continues to evolve"
  - I know of no MySQL production systems currently deployed as a cluster
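A minimal sketch of the master-slave setup described above, in the SQL of the MySQL 4.x/5.0 era; the host, credentials and binlog coordinates are placeholders, and both servers are assumed to already have unique server-id values (with log-bin enabled on the master):

    -- On the master: create an account the slave replicates through
    -- (GRANT ... IDENTIFIED BY works on both 4.x and 5.0).
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%' IDENTIFIED BY 'secret';

    -- On the slave: point it at the master's binary log and start replicating.
    CHANGE MASTER TO
      MASTER_HOST     = 'master.example.org',  -- placeholder host
      MASTER_USER     = 'repl',
      MASTER_PASSWORD = 'secret',
      MASTER_LOG_FILE = 'mysql-bin.000001',    -- from SHOW MASTER STATUS
      MASTER_LOG_POS  = 4;                     -- likewise
    START SLAVE;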

7 Databases in GRID Services
Databases are key components of various GRID services (list not exhaustive):
- FTS
  - Database used for data persistency
  - MySQL and Oracle backends supported, but Oracle is recommended
  - High availability through clusters
- LFC
  - MySQL and Oracle backends supported
  - Both MySQL and Oracle replication supported
- VOMS
  - MySQL and Oracle backends supported
  - Both MySQL and Oracle replication supported

8 Oracle Replication Case Study: LFC
- LFC (LCG File Catalog) is a high-performance file catalog which stores LFN to GUID to PFN mappings.
- Oracle one-way Streams replication is used in WLCG in order to balance the load of LFC read-only requests among different catalogs residing at various Tier-1s.
- The LFC code has been slightly modified in order to prevent a user from accidentally writing into a read-only catalog. The only thing an administrator has to do is set the variable RUN_READONLY="yes" in the /etc/sysconfig/lfcdaemon configuration file.
- Database replication has to replicate all tables except CNS_USERINFO and CNS_GROUPINFO (one way to express this exclusion in Streams is sketched below).
- In case of a write attempt on the read-only LFC, you get an error:
    $ lfc-mkdir /grid/dteam/hello
    cannot create /grid/dteam/hello: Read-only file system
- Replication speed requirements are not very strict:
  - Update frequency: ~1 Hz
  - Replication latency: < 10 min
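A hypothetical sketch of how the two excluded tables could be kept out of replication, using a negative rule on the capture process; this reuses the illustrative capture_lfc process and queue from the Streams sketch above and is not the actual WLCG configuration:

    -- inclusion_rule => FALSE creates a negative rule, so changes to these
    -- tables are discarded by the capture process instead of replicated.
    BEGIN
      DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name     => 'LFC.CNS_USERINFO',
        streams_type   => 'capture',
        streams_name   => 'capture_lfc',
        queue_name     => 'strmadmin.streams_queue',
        include_dml    => TRUE,
        include_ddl    => TRUE,
        inclusion_rule => FALSE);

      DBMS_STREAMS_ADM.ADD_TABLE_RULES(
        table_name     => 'LFC.CNS_GROUPINFO',
        streams_type   => 'capture',
        streams_name   => 'capture_lfc',
        queue_name     => 'strmadmin.streams_queue',
        include_dml    => TRUE,
        include_ddl    => TRUE,
        inclusion_rule => FALSE);
    END;
    /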

9 LHCb LFC Replication Deployment: CERN-CNAF
[Deployment diagram, reduced to its labels] At CERN, a 6-node cluster hosts the master Oracle DB, fronted by one LFC R-W server and LFC R-O servers; r/w and read-only clients connect locally. Oracle Streams replicates over the WAN to CNAF, where a 2-node cluster hosts the replica Oracle DB behind an LFC R-O server serving read-only clients.
Stress test: insertions at 900 Hz for 24 hours
- Max latency: 55 sec
- Mean latency: 15 sec
- Full consistency maintained
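For scale, the total volume of that stress test works out to:

    900 insertions/s x 86,400 s ≈ 7.8 x 10^7 insertions over the 24-hour run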

10 MySQL Replication Case Study: VOMS
- The Virtual Organization Membership Service (VOMS) server manages authorization data:
  - it provides a database of users, groups, roles and capabilities that are grouped in Virtual Organizations (VOs)
  - users query the VOMS server in order to get their VO grid credentials (proxy)
- Read-only operations originate from various commands such as voms-proxy-info; they could be balanced across read-only VOMS replicas
- Write operations originate from the mk-gridmap and voms-proxy-init commands
- The expected write rate on the VOMS server is:
  - 1 Hz of voms-proxy-init
  - peaks of 100 Hz of mk-gridmap (to be fixed)
- A MySQL master-slave replication deployment can be useful for load balancing and failover of read-only operations
- VOMS supports MySQL one-way replication
- Some examples of VOMS on replicated MySQL:
  - LIP (Portugal)
  - Fermilab
  - CNAF – INFN Padova (CDF VOMS)

11 VOMS Replicated Deployment
- The VOMS code has been adapted to MySQL replication: it provides a script which creates a slave MySQL replica, given a master MySQL and a consistent dump.
- Concurrent writes:
  - The VOMS server has a web component, running in a web container provided by Tomcat, that hosts the administration interface.
  - Problem: the administration interface running on a slave host will update the seqnumber and realtime tables of each VO database.
  - Solution: data from those tables must not be replicated to the slave hosts (a status check for this is sketched below):
    replicate-ignore-table=VOMS_seqnumber
    replicate-ignore-table=VOMS_realtime
- Some stress tests performed by Fermilab:
  - VOMS MySQL successfully queried at 125 Hz (10.8M queries/day)
  - System load: 0.2; CPU: 10% (dual-core machine)
  - Simulated failures of one of the VOMS servers:
    - Disabled network: new requests not routed to the failed server
    - Re-enabled network: server added back to the pool for scheduling
    - Open connections during the service failure are lost
    - The affected number of connections is very small (1-2)
  - Simulated failure of the MySQL server:
    - After re-enabling the server, transaction logs are replayed automatically
- VOMS on Oracle replication is under test and will be available soon
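A quick way to verify on each slave that the ignore rules are in place and that replication has caught up after such a failure is the standard MySQL 5.x status check (expected fields abbreviated below):

    -- Run on a slave; \G prints one field per line in the mysql client.
    SHOW SLAVE STATUS\G
    -- A healthy slave should show, among other fields:
    --   Slave_IO_Running:       Yes
    --   Slave_SQL_Running:      Yes
    --   Replicate_Ignore_Table: ...VOMS_seqnumber,...VOMS_realtime
    --   Seconds_Behind_Master:  0 (or close to it)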

12 Conclusions
- Different high-availability/redundancy techniques have been tested in the WLCG environment and allow for good availability of GRID database services
- Both Oracle and MySQL replication solutions have been deployed in WLCG and offer different options for addressing different kinds of load
- The LCG 3D project has developed Tier-0 to Tier-1 replication but has left the Tier-1 to Tier-2 distribution issues to the sites. Do we need to address them?