Marcin Bogusz CERN, PH-CMG WLCG Collaboration Workshop CMS online/offline replication Online/offline replication via Oracle Streams WLCG Collaboration.

Presentation transcript:

Marcin Bogusz CERN, PH-CMG WLCG Collaboration Workshop CMS online/offline replication Online/offline replication via Oracle Streams WLCG Collaboration Workshop January 2007

Agenda
Online/offline replication setup
Replication tests
Performance
Questions

Setup
O2O Transformation
Oracle Streams replication

STREAMS Architecture (diagram; slide by Eva Dafonte)
SOURCE DATABASE: user changes are logged to the REDO LOG; the CAPTURE PROCESS captures changes as LCRs into the SOURCE QUEUE
Propagation: events (LCRs) are propagated from the SOURCE QUEUE to the DESTINATION QUEUE
TARGET DATABASE (replica): the APPLY PROCESS applies changes from the DESTINATION QUEUE
Stages: capture, staging, consumption
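The three stages in the diagram can be sketched as a toy model. This is only an illustration of the data flow, not Oracle code: the function name `replicate`, the dictionary used to stand in for an LCR, and the in-memory queues are all assumptions made for the example.

```python
from collections import deque


def replicate(changes):
    """Toy model of the three Streams stages: capture, staging, consumption."""
    redo_log = list(changes)      # user changes are logged on the source
    source_queue = deque()
    destination_queue = deque()
    replica = []

    # Capture: turn each logged change into an LCR and enqueue it at the source.
    for change in redo_log:
        source_queue.append({"lcr": change})

    # Staging: propagate events from the source queue to the destination queue.
    while source_queue:
        destination_queue.append(source_queue.popleft())

    # Consumption: dequeue LCRs at the destination and apply them to the replica.
    while destination_queue:
        replica.append(destination_queue.popleft()["lcr"])

    return replica


print(replicate(["insert row 1", "insert row 2", "commit"]))
```

In the real system each stage runs concurrently and the queues live in the Streams pool; the sketch runs them one after another only to keep the flow visible.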

Test setup
Replication between two dual-node RACs
Capture process runs on instance #2 of d3r
Apply process runs on instance #2 of test1

Test environment
Test replicates ecalpedestals objects
7 objects have to be inserted in a single transaction
Each object is 1.63 MB in size; after object-to-relational mapping it is stored as st_ecalpedestals_item rows
Objects are generated by a PL/SQL procedure provided by Ricky Egeland
We have 428,400 row inserts in a single commit, which corresponds to the number of LCRs
LCR size is row-dependent
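The size of the test transaction follows from the figures on this slide. The short calculation below is a sketch under two assumptions that the slide does not state: the rows are split evenly across the 7 objects, and 1 MB is taken as 10**6 bytes.

```python
OBJECTS_PER_TRANSACTION = 7
OBJECT_SIZE_MB = 1.63
ROW_INSERTS_PER_COMMIT = 428_400   # equals the number of LCRs

# Total payload of one test transaction.
payload_mb = OBJECTS_PER_TRANSACTION * OBJECT_SIZE_MB

# Assumption: rows are distributed evenly across the objects.
rows_per_object = ROW_INSERTS_PER_COMMIT / OBJECTS_PER_TRANSACTION

# Assumption: 1 MB = 10**6 bytes. Average payload carried by one row insert.
payload_bytes_per_row = payload_mb * 1_000_000 / ROW_INSERTS_PER_COMMIT

print(f"payload per transaction: {payload_mb:.2f} MB")
print(f"rows per object:         {rows_per_object:.0f}")
print(f"payload per row:         {payload_bytes_per_row:.1f} bytes")
```

The small average payload per row is worth keeping in mind: the transaction is made of a very large number of very small changes.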

Streams monitoring
Tools
– strmmon
– Zbyszek's monitor
– AWR
– Logminer
– Lemonweb
– Oracle Enterprise Manager
Parameters
– LCRs flow
– Streams pool size
– Redo generated on source and destination databases
– Replication time

Performance
Goal: to reduce replication time, i.e. increase throughput
Replication time seemed to depend on the interval between test runs
The results showed a periodic time structure
Responsible for the differences: KJC wait events (AWR)
Solution: patch set

Performance
Replication time: 300 s
– With 11.4 MB payload: … MB/s
– 1428 row inserts per second
Average LCR apply rate: 2700/s
AWR report
– Both CPUs utilized at ~50%
– 120 seconds of "Streams AQ: enqueue blocked due to flow control" wait events
Lemonweb
– Very high I/O during test run (over 1 GB)
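The rates on this slide can be cross-checked against the 428,400 row inserts per commit quoted for the test transaction. The calculation below is a sketch: the payload throughput and the apply-time estimate are derived here from the slide's figures, and the apply-time estimate assumes the 2700/s rate was sustained for every LCR.

```python
REPLICATION_TIME_S = 300
PAYLOAD_MB = 11.4
ROW_INSERTS = 428_400          # one LCR per row insert
APPLY_RATE_LCR_PER_S = 2700

# End-to-end rates over the whole replication time.
rows_per_second = ROW_INSERTS / REPLICATION_TIME_S
payload_mb_per_second = PAYLOAD_MB / REPLICATION_TIME_S

# Assumption: the average apply rate holds for all LCRs in the transaction.
apply_time_s = ROW_INSERTS / APPLY_RATE_LCR_PER_S

print(f"row inserts per second: {rows_per_second:.0f}")
print(f"payload throughput:     {payload_mb_per_second:.3f} MB/s")
print(f"time spent applying:    {apply_time_s:.0f} s of {REPLICATION_TIME_S} s")
```

Under that assumption, applying accounts for roughly half of the replication time, which suggests the rest was spent elsewhere in the pipeline.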

Redo
High I/O turned out to be generated by the Log Writer (LGWR) process
Further investigation showed that the redo on the destination site was 12 times larger than on the source database
Log mining showed that the redo is generated by inserts into the STREAMS$_APPLY_SPILL_MSGS_PART table
A typical redo entry for the st_ecalpedestals_item table was 75 bytes vs. 590 bytes for the spill queue
Redo volume on the destination is ~700 MB; insert-related redo for both tables is ~300 MB, so 400 MB is still unaccounted for
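The ~300 MB figure is consistent with the per-entry sizes if each of the 428,400 LCRs produces one redo entry in the data table and one in the spill table. That one-entry-per-LCR mapping is an assumption made for this sketch, as is taking 1 MB as 10**6 bytes.

```python
LCRS = 428_400
DATA_TABLE_ENTRY_BYTES = 75      # st_ecalpedestals_item
SPILL_TABLE_ENTRY_BYTES = 590    # STREAMS$_APPLY_SPILL_MSGS_PART
DESTINATION_REDO_MB = 700
REDO_RATIO_DEST_TO_SOURCE = 12

# Assumption: exactly one redo entry per LCR in each table; 1 MB = 10**6 bytes.
data_redo_mb = LCRS * DATA_TABLE_ENTRY_BYTES / 1_000_000
spill_redo_mb = LCRS * SPILL_TABLE_ENTRY_BYTES / 1_000_000
insert_redo_mb = data_redo_mb + spill_redo_mb

unaccounted_mb = DESTINATION_REDO_MB - insert_redo_mb
implied_source_redo_mb = DESTINATION_REDO_MB / REDO_RATIO_DEST_TO_SOURCE

print(f"data table redo:     {data_redo_mb:.0f} MB")
print(f"spill table redo:    {spill_redo_mb:.0f} MB")
print(f"insert-related redo: {insert_redo_mb:.0f} MB")
print(f"unaccounted redo:    {unaccounted_mb:.0f} MB")
print(f"implied source redo: {implied_source_redo_mb:.0f} MB")
```

On these assumptions, spill-table inserts alone generate several times more redo than the replicated data itself.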

LCRs Flow

Redo destination

Spilling
Inserting data into the STREAMS$_APPLY_SPILL_MSGS_PART table is related to the architecture of the apply process
The messages are spilled from memory to disk
The apply process's user-modifiable TXN_LCR_SPILL_THRESHOLD parameter changes the spill threshold
Increasing the threshold above the number of LCRs in a transaction should disable apply spilling
However, increasing this parameter degrades performance
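The threshold rule can be illustrated with a deliberately simplified model. The function `apply_spills` is hypothetical and is not an Oracle API; it only encodes the rule stated on the slide, that spilling starts once a transaction holds more LCRs than TXN_LCR_SPILL_THRESHOLD allows. The 428,400 LCR count and the default of 10,000 are the values quoted elsewhere in this talk.

```python
def apply_spills(lcrs_in_transaction: int, txn_lcr_spill_threshold: int) -> bool:
    """Hypothetical helper: True if the apply process would spill this transaction.

    Simplified rule taken from the slide: spilling happens when the number of
    LCRs in the transaction exceeds the configured threshold.
    """
    return lcrs_in_transaction > txn_lcr_spill_threshold


TEST_TRANSACTION_LCRS = 428_400
DEFAULT_THRESHOLD = 10_000

# With the default threshold, the test transaction is far over the limit.
print(apply_spills(TEST_TRANSACTION_LCRS, DEFAULT_THRESHOLD))

# Raising the threshold above the LCR count should disable apply spilling.
print(apply_spills(TEST_TRANSACTION_LCRS, 500_000))
```

The model says nothing about cost: as the slide notes, raising the threshold removed the spilling but degraded performance in these tests.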

Queue spilling
Streams pool size is 1.3 GB
Allocated Streams pool size never exceeds 75%
Spilling occurs when messages have been enqueued for more than 5 minutes (not the case here)
Spilling occurs when there is no memory available (maybe the case, but 1.3 GB!)
Higher spill thresholds lead to "propagation gaps" and degrade performance; the default threshold is 10,000
With the spill threshold set to values over 200K messages, queue spilling occurs, which is distinct from apply spilling


Strmmon - destination
:55:50 || test12-> | NET 4K 0 | PR01 0 | Q | - A sec
:55:52 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:55:54 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:55:56 || test12-> | NET 821K 0 | PR | Q | - A sec
:55:58 || test12-> | NET 1M 0 | PR | Q | - A sec
:56:00 || test12-> | NET 1M 0 | PR | Q | - A sec
:56:02 || test12-> | NET 356K 0 | PR | Q | - A sec
:56:04 || test12-> | NET 2M 0 | PR | Q | - A sec
:56:06 || test12-> | NET 316K 0 | PR | Q | - A sec
:56:08 || test12-> | NET 1M 0 | PR | Q | - A sec
:56:10 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:12 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:14 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:16 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:18 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:20 || test12-> | NET 5K 0 | PR01 0 | Q | - A sec
:56:22 || test12-> | NET 203K 0 | PR | Q | - A sec
:56:25 || test12-> | NET 884K 0 | PR | Q | - A sec
:56:27 || test12-> | NET 884K 0 | PR | Q | - A sec
:56:29 || test12-> | NET 868K 0 | PR | Q | - A sec
:56:31 || test12-> | NET 884K 0 | PR | Q | - A sec
:56:33 || test12-> | NET 884K 0 | PR | Q | - A sec
:56:35 || test12-> | NET 772K 0 | PR | Q | - A sec
:56:37 || test12-> | NET 882K 0 | PR | Q | - A sec
:56:39 || test12-> | NET 996K 0 | PR | Q | - A sec
:56:41 || test12-> | NET 756K 0 | PR | Q | - A sec

2 x spilling, 130K threshold

2 x spilling, 130K threshold

LCR performance
Performance depends greatly on the number of LCRs in a transaction, rather than on LCR size
Test with BLOB objects instead of relational data
– The same payload, but as binary data
– Inserting 7 x 1.63 MB binary objects
– Such a transaction produces ~3600 LCRs
– Replication time ~30 seconds (a factor of 10 difference)
– No queue spilling
– Probably no apply spilling
– Redo ratio source/destination is 45/66 (MB)
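The two tests can be put side by side using only figures quoted in this talk: 428,400 LCRs replicated in 300 s with destination redo 12 times the source for the relational case, against ~3600 LCRs in ~30 s with 45 MB and 66 MB of redo for the BLOB case. The variable names below are illustrative, and the BLOB figures are approximate in the source.

```python
# Relational test: figures quoted earlier in the talk.
relational_lcrs = 428_400
relational_time_s = 300
relational_redo_ratio = 12            # destination redo / source redo

# BLOB test: approximate figures from this slide.
blob_lcrs = 3600
blob_time_s = 30
blob_redo_ratio = 66 / 45             # destination redo / source redo

lcr_count_ratio = relational_lcrs / blob_lcrs
time_ratio = relational_time_s / blob_time_s
relational_lcrs_per_s = relational_lcrs / relational_time_s
blob_lcrs_per_s = blob_lcrs / blob_time_s

print(f"LCR count ratio (relational/BLOB): {lcr_count_ratio:.0f}x")
print(f"replication time ratio:            {time_ratio:.0f}x")
print(f"LCRs per second:                   {relational_lcrs_per_s:.0f} vs {blob_lcrs_per_s:.0f}")
print(f"redo ratio destination/source:     {relational_redo_ratio} vs {blob_redo_ratio:.2f}")
```

For the same payload, the BLOB transaction carries about two orders of magnitude fewer LCRs, finishes ten times faster, and generates far less extra redo on the destination.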

Questions?

Thank you