CERN IT Department, Internet Services, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Streams Service Review and Outlook
Distributed Database Workshop, PIC, 20th April 2009
Eva Dafonte Pérez

Overview
– Announce Interventions
– Tier0 Responsibilities
– New Split and Merge procedures
– Tier1 Responsibilities
– Streams Resynchronization
– Recent Problems or Interventions
– Recommended Patches
– Summary

Announce Interventions
– Announce interventions:
  – schedule the new intervention using the 3D wiki
  – submit EGEE broadcasts
  – register outages in the CIC portal
  – long interventions: contact Tier0 to analyze whether the Streams setup needs to be split
– Unplanned downtime: keep Tier0 updated
  – problem description, progress and expected duration
  – report regularly
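One way to read the "long interventions" rule is as a three-way decision: a short outage needs nothing special, a longer one may justify splitting the site out of the shared setup, and an outage beyond the Streams recovery window ends in a resynchronization. The Python sketch below encodes that reading. Both thresholds (`split_threshold`, `recovery_window`) are hypothetical parameters that would be agreed with Tier0 per site; they are not official values.

```python
from datetime import timedelta
from enum import Enum


class Action(Enum):
    NONE = "no split: the site catches up once it is back"
    SPLIT = "ask Tier0 to split the site out of the main Streams setup"
    RESYNC = "outage exceeds the recovery window: plan a resynchronization"


def plan_intervention(downtime: timedelta,
                      split_threshold: timedelta,
                      recovery_window: timedelta) -> Action:
    """Classify a planned intervention (hypothetical thresholds, agreed with Tier0)."""
    if downtime > recovery_window:
        # The redo needed to catch up would no longer be available at the source.
        return Action.RESYNC
    if downtime > split_threshold:
        # Long enough that keeping the site in the shared setup could hold back the others.
        return Action.SPLIT
    return Action.NONE


# Example: a 6-day outage against a 5-day recovery window.
print(plan_intervention(timedelta(days=6), timedelta(hours=4), timedelta(days=5)))
```

Checking the recovery window first matters: an outage that is too long for a split-and-merge should be flagged as a resynchronization even though it also exceeds the split threshold.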

Tier0 Responsibilities
– Initial Streams setup
– Add new schemas to the Streams environment
– Split & Merge: new procedures in place
– Streams resynchronization
– Analyze and test new features and optimizations
– Validate upgrades and patches
– Monitoring

Split and Merge
– New automated procedures
  – ORA-600 [KWQBMCRCPTS101] error when dropping a propagation has been fixed by Oracle; before the fix, all the Streams components had to be re-created
  – “manual” intervention is avoided
– Scheduled downtime
  – a new Streams setup (queue, capture and propagation) is created in parallel to the main setup
– Unscheduled downtime
  – spilled LCRs are removed from the main queue
  – the resynchronization is executed once the site is up again
– The procedures have been extended to all the database administrators at Tier0

Tier1 Responsibilities
– Announce interventions
– Keep the 3D OEM operational
  – check agent status
  – configure targets
– After an intervention: check and re-enable the Streams processes
  – use the “STRMPROP_<SITE>” account to connect to the downstream database, e.g. STRMPROP_PIC for Tier1 PIC
  – enable the propagation job
  – enable the capture process when the site is split
– Streams resynchronization
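The post-intervention check can be scripted. The Python sketch below only builds a SQL*Plus script; it does not connect anywhere. `DBMS_PROPAGATION_ADM.START_PROPAGATION` and `DBMS_CAPTURE_ADM.START_CAPTURE` are standard Oracle Streams procedures, but the propagation and capture names are hypothetical placeholders, and it is an assumption that the STRMPROP_<SITE> account can query the `ALL_` views and run these calls.

```python
from typing import Optional


def reenable_script(site: str, propagation: str, capture: Optional[str] = None) -> str:
    """Build a SQL*Plus script to check and restart one Tier1 site's Streams processes.

    Hypothetical: process names are placeholders, and the STRMPROP_<SITE>
    account is assumed to hold the privileges these calls need.
    """
    account = f"STRMPROP_{site.upper()}"
    lines = [
        f"-- connect as {account} to the downstream database",
        "SELECT propagation_name, status, error_message FROM all_propagation;",
        "BEGIN",
        f"  DBMS_PROPAGATION_ADM.START_PROPAGATION(propagation_name => '{propagation}');",
        "END;",
        "/",
    ]
    if capture is not None:  # only needed while the site is split
        lines += [
            "SELECT capture_name, status, error_message FROM all_capture;",
            "BEGIN",
            f"  DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => '{capture}');",
            "END;",
            "/",
        ]
    return "\n".join(lines)


print(reenable_script("pic", "PROP_PIC"))
```

Generating the script rather than executing it keeps a human in the loop, which suits a step that follows an intervention and may reveal unexpected errors in the status queries.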

Streams Resynchronization
– How to resynchronize a Tier1 site that is out of the Streams recovery window?
– Idea:
  – use transportable tablespaces to move the data faster
    – tablespaces are copied from a “collaborative” Tier1
    – the “collaborative” Tier1 is temporarily unavailable
  – tablespaces need to be set to read-only while the files are copied
  – complete re-instantiation using Streams

Streams Resynchronization
– Steps at Tier0:
  – split the “collaborative” Tier1 site
  – create a temporary Streams setup for the Tier1 site to be resynchronized:
    – new dictionary
    – clone the capture process
    – include an apply rule set that prevents LCRs from being applied (only one “test” schema is replicated)
[Figure: diagram of the split and temporary Streams setups]

Streams Resynchronization
– Steps at Tier1:
  – coordination between the “collaborative” and “resynchronized” Tier1s
    – share connection strings
    – share the Streams administrator account (!)
    – create database links between the databases
    – create directories pointing to the datafiles and grant access
  – ask Tier0 to stop replication for both sites
  – “collaborative”: ensure the tablespaces are read-only
    alter tablespace … read only;
  – “resynchronized”: remove the tablespaces and datafiles
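The preparation statements can be generated per site role. The Python sketch below assumes that the “resynchronized” site pulls the files over the database link, so it owns the link, and that the “collaborative” site switches its tablespaces to read-only. All object names, paths and the `&strmadmin_pwd` SQL*Plus substitution variable are hypothetical placeholders.

```python
from typing import Dict, List


def directory_stmts(dir_name: str, path: str, grantee: str) -> List[str]:
    # Directory object pointing at the local datafile location, usable by the Streams admin.
    return [
        f"CREATE DIRECTORY {dir_name} AS '{path}'",
        f"GRANT READ, WRITE ON DIRECTORY {dir_name} TO {grantee}",
    ]


def tts_prep(strmadmin: str, link_name: str, collaborative_tns: str, dir_name: str,
             collab_path: str, resync_path: str,
             tablespaces: List[str]) -> Dict[str, List[str]]:
    """SQL for the preparation steps, keyed by the role of the site that runs it."""
    return {
        # Assumption: the resynchronized site pulls the datafiles, so it creates the link.
        "resynchronized": [
            f"CREATE DATABASE LINK {link_name} CONNECT TO {strmadmin} "
            f"IDENTIFIED BY &strmadmin_pwd USING '{collaborative_tns}'",
        ] + directory_stmts(dir_name, resync_path, strmadmin),
        # Run READ ONLY only after Tier0 has stopped replication for both sites.
        "collaborative": directory_stmts(dir_name, collab_path, strmadmin)
        + [f"ALTER TABLESPACE {ts} READ ONLY" for ts in tablespaces],
    }


for role, stmts in tts_prep("STRMADMIN", "COLLAB_LINK", "COLLAB_DB", "TTS_DIR",
                            "/oradata/collab", "/oradata/resync",
                            ["ATLAS_COND_DATA"]).items():
    print(role, stmts)
```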

Streams Resynchronization
– Steps at Tier1 (continued):
  – transfer the datafiles from “collaborative” to “resynchronized”
    – dbms_file_transfer
    – parallel sessions
  – “resynchronized”: import the tablespace metadata
  – change the tablespaces back to read-write
  – ask Tier0 to re-enable Streams
    – the apply rules must be dropped first!
    – Streams will recover the backlog produced during the operation automatically
  – merge of all the Streams setups, to be done by Tier0
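To use parallel sessions effectively, the datafiles should be spread so that each session moves roughly the same volume. The Python sketch below applies a longest-processing-time heuristic (largest file first, always to the least-loaded session) and emits one PL/SQL block per session calling `DBMS_FILE_TRANSFER.GET_FILE(source_directory_object, source_file_name, source_database, destination_directory_object, destination_file_name)`. File names, sizes, directory objects and the link name are hypothetical; the metadata import and the switch back to read-write are not shown.

```python
import heapq
from typing import Dict, List


def plan_sessions(files: Dict[str, int], sessions: int) -> List[List[str]]:
    """Assign datafiles to sessions: largest first, to the least-loaded session."""
    heap = [(0, i) for i in range(sessions)]  # (bytes assigned so far, session index)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(sessions)]
    for name, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        plan[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return plan


def get_file_block(names: List[str], src_dir: str, dst_dir: str, db_link: str) -> str:
    """PL/SQL block for one session, pulling each file over the database link."""
    calls = "\n".join(
        f"  DBMS_FILE_TRANSFER.GET_FILE('{src_dir}', '{n}', '{db_link}', '{dst_dir}', '{n}');"
        for n in names
    )
    return f"BEGIN\n{calls}\nEND;\n/"


sizes = {"cond01.dbf": 100, "cond02.dbf": 60, "pvss01.dbf": 50, "pvss02.dbf": 10}
for batch in plan_sessions(sizes, 2):
    print(get_file_block(batch, "SRC_DIR", "DST_DIR", "COLLAB_LINK"))
```

Each generated block would then be run from its own SQL*Plus session so that the transfers proceed concurrently.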

New Requests: Service Changes!
– MUON sites replication to CERN
  – master: 3 Tier2 sites (Rome, Munich, Michigan)
  – target: ATLAS offline
– AMI replication to CERN
  – master: Tier1 Lyon
  – target: ATLAS offline
– Resources:
  – currently 2 apply processes on ATLAS offline
  – 4 more to be added!!
– Service level:
  – problems must be directed to the master side

New Requests: Service Changes!
– Tier0 Responsibilities
  – Initial Streams setup
  – Add new schemas to the Streams environment
  – Split & Merge
  – Streams re-synchronization
  – Analyze and test new features and optimizations
  – Validate upgrades and patches
  – Monitoring
– Source database responsibilities

Recent Problems or Interventions
– ORA-01280: Fatal LogMiner Error + ORA-00600: [KRVRDCBMDDLSQL1]
  – caused by a rebuild index operation using the parallel option
  – ATLAS replication (conditions and PVSS)
  – the capture process cannot be restarted at the current SCN
  – workaround proposed by Oracle: recreate the capture process using a new dictionary after the index rebuild operation, which means data loss!!
  – complete re-instantiation of the whole system
– ORA-01372: Insufficient processes for specified LogMiner operation
  – one instance is down, so the number of parallel_max_servers is not enough
  – increase the parameter parallel_max_servers

Recent Problems or Interventions
– Apply aborts: user error encountered while applying
  – ORA-04043: object … does not exist
  – ATLAS replication (conditions and PVSS)
  – triggers, views, PL/SQL procedures and synonyms are not copied using transportable tablespaces
    – use Data Pump
    – “schema” triggers are still not copied: manual creation
  – GRANTs on views, PL/SQL procedures, functions and packages from the owner to other accounts do not get replicated
    – “undocumented feature”, it is not a bug!
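A checklist of what still has to be created after the transportable-tablespace copy can be derived from the schema's object list. The Python sketch below sorts objects into "copy with Data Pump" and "create manually", following the rules on this slide. The input rows are hypothetical and would come from something like `DBA_OBJECTS.OBJECT_TYPE` and `DBA_TRIGGERS.BASE_OBJECT_TYPE`; grants are not covered and still need to be re-issued separately.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Object types that, per the slide, a transportable tablespace copy does not bring along.
NOT_IN_TTS = {"TRIGGER", "VIEW", "PROCEDURE", "FUNCTION", "PACKAGE", "SYNONYM"}


@dataclass(frozen=True)
class SchemaObject:
    name: str
    object_type: str                         # e.g. DBA_OBJECTS.OBJECT_TYPE
    trigger_base_type: Optional[str] = None  # e.g. DBA_TRIGGERS.BASE_OBJECT_TYPE


def post_tts_actions(objs: Iterable[SchemaObject]) -> Dict[str, List[str]]:
    """Split objects into those Data Pump can recreate and those needing manual DDL."""
    actions: Dict[str, List[str]] = {"datapump": [], "manual": []}
    for o in objs:
        if o.object_type == "TRIGGER" and o.trigger_base_type == "SCHEMA":
            actions["manual"].append(o.name)    # "schema" triggers: still not copied
        elif o.object_type in NOT_IN_TTS:
            actions["datapump"].append(o.name)  # recreate with Data Pump
    return actions


print(post_tts_actions([
    SchemaObject("COND_TABLE", "TABLE"),
    SchemaObject("COND_VIEW", "VIEW"),
    SchemaObject("DDL_AUDIT", "TRIGGER", "SCHEMA"),
]))
```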

Recent Problems or Interventions
– ASGC (Taiwan)
  – January 09: system tablespace corruption
    – backups were missing archived logs
    – complete re-instantiation using PIC
  – February 09: fire incident in the ASGC data centre
    – database relocated after several weeks
    – service unavailable (listener problem?)
    – removed from the ATLAS Streams setup after 1 month
    – incomplete recovery due to control file corruption
  – data loss: impossible to restart Streams
  – complete re-instantiation will be needed; RAL volunteers as the “collaborative” Tier1 site

Recent Problems or Interventions
– Power cut at RAL
  – caused corruption of the control file on the ATLAS database
  – complete recovery allowed Streams to be re-enabled without resynchronization
– Downtime at CNAF
  – scheduled for more than 5 days, i.e. out of the Streams recovery window
  – archived log file retention increased, but not guaranteed in case of space pressure

Current SRs
– CREATE VIEW ON SCHEMA NOT IN STREAMS REPLICATED
  – the view references a table in a replicated schema
  – same for synonyms, grants, …
  – ATLAS and CMS replication
  – apply aborts if the schema does not exist on the apply side
  – the error might be safely ignored
– ORA-… REPORTED INTERMITTENTLY AND CAPTURE ABORTS + ORA-07445: [kghufree()+485]
  – related to change notification
  – CMS replication
  – the capture process aborts repeatedly
  – manually clean the dbms_aqadm_sys.register_driver jobs

Current SRs
– Connection problem to Gridka
  – propagation errors: ORA-12152: TNS:unable to send break message, ORA-03135: connection lost contact
  – apply side errors: ORA-… [knclprcols:chrlen1], [kngo_kadadupkl2:bad version AnyD], [OCIKCallPush: deprecated]
  – only affects LFC replication (even though the database is shared with LHCb)
  – the propagation job is disabled after 16 unsuccessful connection attempts and cannot be restarted
  – workaround: recreate the Gridka propagation
    – parallel Streams setup to avoid data loss
  – diagnostic patch installed; waiting until the problem reproduces
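The "disabled after 16 unsuccessful attempts" behaviour explains why an unstable link eventually stops propagation altogether. The Python sketch below is a simplified model of that counter. It assumes, as we understand Oracle AQ propagation schedules, that the count is of consecutive failures and resets on success; retry timing is ignored, and the Gridka-specific inability to restart the job is not modelled.

```python
MAX_CONSECUTIVE_FAILURES = 16  # from the slide: disabled after 16 unsuccessful attempts


class PropagationSchedule:
    """Simplified model of a propagation job's failure counter (no retry timing)."""

    def __init__(self) -> None:
        self.enabled = True
        self.consecutive_failures = 0

    def attempt(self, connected: bool) -> None:
        if not self.enabled:
            return  # a disabled job makes no further attempts until a DBA re-enables it
        if connected:
            self.consecutive_failures = 0  # assumption: success resets the counter
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                self.enabled = False


job = PropagationSchedule()
for _ in range(MAX_CONSECUTIVE_FAILURES):
    job.attempt(connected=False)
print("enabled:", job.enabled)
```

Under this model, intermittent errors separated by successful connections never disable the job, while a sustained outage does; this is why the propagation status has to be checked and re-enabled after connection problems.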

Looking for Help…
– 3D wiki: Streams operations manual
– Overview for Troubleshooting Streams Performance Issues (Metalink note)
– Streams monitoring
  – Streams health check report (Metalink note)
  – 3D OEM
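A minimal health summary can be built from the `STATUS` and `ERROR_MESSAGE` columns of the capture, propagation and apply dictionary views (for example `DBA_CAPTURE`, `DBA_PROPAGATION`, `DBA_APPLY`). The Python sketch below assumes the rows have already been fetched by some database client; the process names are hypothetical. It only flags processes that are not `ENABLED` and is not a replacement for the health check report or OEM.

```python
from typing import Iterable, List, NamedTuple


class StreamsProcess(NamedTuple):
    component: str      # "CAPTURE", "PROPAGATION" or "APPLY"
    name: str
    status: str         # STATUS column, e.g. ENABLED / DISABLED / ABORTED
    error_message: str  # ERROR_MESSAGE column ("" when empty)


def health_report(rows: Iterable[StreamsProcess]) -> List[str]:
    """Return one alert line per Streams process that is not enabled."""
    alerts = []
    for r in rows:
        if r.status != "ENABLED":
            detail = f": {r.error_message}" if r.error_message else ""
            alerts.append(f"{r.component} {r.name} is {r.status}{detail}")
    return alerts


for line in health_report([
    StreamsProcess("CAPTURE", "STRM_CAPTURE", "ENABLED", ""),
    StreamsProcess("PROPAGATION", "PROP_GRIDKA", "DISABLED",
                   "ORA-12152: TNS:unable to send break message"),
]):
    print(line)
```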

Recommended Patches
– Patch … addresses performance improvements for the capture process and LogMiner: merge label request on top of … for the following bugs:
  – Bug …: Capture aborts on LOB columns with ORA-… and ORA-…
  – Bug …: Streams capture slow processing direct path insert, high CPU for logmnr builder
  – Bug …: High latencies in Streams capture while capturing a primary workload with a lot of DDL activity, such as truncates of empty tables
  – Bug …: Capture reader process constantly writing messages to the trace file
  – Bug …: Restarting a LogMiner session can be slow if the session has fallen behind
  – Bug …: Parallel DDL (PDDL) transactions can cause LogMiner memory spill for Streams, or run slowly during ad hoc log mining
– Patch … in order to fix ORA-600 [KWQBMCRCPTS101] when dropping a propagation
– Propagation ORA-600 [KWQPCBK179], [1], [1369]
– Excessive memory usage for the LCR cache due to large freelists
– ORA-…: Malformed redo on capture of LONG
– ORA-…: No instantiation SCN provided when dropping a child table
– AQ propagation may fail after changing queue_to_queue=>true
– Apply process is slow after upgrading to …
– Apply aborting with ORA-600 [KNLQDQM2USR:4] after installing patchset …
– Metalink note

Summary
– Keep the monitoring operational
  – spot problems quickly, understand bottlenecks, ...
– Coordination with Tier0
  – complex Streams environments, where the activity at one point might impact the whole system
– Feedback!!!
  – and collaboration to improve the documentation and the service

Questions
