Common Database Problems, Common Database Solutions
PUG Challenge EMEA 2014, Dusseldorf, Germany

Presentation transcript:

1 PUG Challenge EU 2014
Common Database Problems, Common Database Solutions
PUG Challenge EMEA 2014, Dusseldorf, Germany
Presented by: Mike Furgal

2 Introductions
Mike Furgal
– Progress employee since 1989, with a short time at BravePoint from 2012 until 2014
– Progress OpenEdge Database Expert

3 Introduction - BravePoint
Managed Database Services
– Databases
– 50+ TB in DB space
– 75,000+ connected users
Pro2 Replication
– Real-time replication
– SQL target
– 600+ deployments

4 Agenda
Disaster Area
Performance Problems
Migrations and Upgrades

5 Disaster Area

6 Disaster Area
Case Study 1
– A large distribution center had a power failure. When the power came back on, the machine booted but the database did not start:
    P-7188 T I : (43) ** Cannot find or open file /agility/prod/prod_db/platte_11.d5, errno = 2.
    P-8609 T I : (451) prostrct list session begin for root on /dev/pts/0.
    P-8609 T I : (12475) Unable to get file status for extent /agility/prod/prod_db/platte_11.d5
    P-8609 T I : (334) prostrct list session end.

7 Specifics
Database was 80 GB
Last good backup was 1 week old
Not running After Imaging
Platform was Linux

8 WHAT WOULD YOU DO?

9 Approach
Made a copy of the existing database in case we made a mistake
Used PROSTRCT LIST to determine which files were missing
– We were lucky that the missing file was part of a storage area that only held indexes
Tools available:
– PROSTRCT UNLOCK
– PROSTRCT BUILDDB
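
PROSTRCT LIST does the real work in this step; conceptually the question is "does every extent named in the structure still exist on disk?". The Python sketch below illustrates only that comparison. It assumes you have already copied the extent paths into a list (for example from the structure description that PROSTRCT LIST produces); `find_missing_extents` is a hypothetical helper, not an OpenEdge utility, and it does not read any OpenEdge file format itself.

```python
import os
from typing import Callable, Iterable, List


def find_missing_extents(
    extent_paths: Iterable[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return the extent paths that are not present on disk.

    `exists` defaults to os.path.exists but can be replaced by a stub,
    which makes it easy to dry-run the check against a list of paths.
    """
    return [path for path in extent_paths if not exists(path)]


if __name__ == "__main__":
    # Hypothetical extent list: only the .d5 path appears in the case study log.
    extents = [
        "/agility/prod/prod_db/platte_11.d1",
        "/agility/prod/prod_db/platte_11.d5",
    ]
    for missing in find_missing_extents(extents):
        print("missing extent:", missing)
```

Knowing which extent is missing, and which storage area it belongs to, is what told the team that only an index area was affected.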

10 Solution
Restored the missing extent from the week-old backup and ran PROSTRCT UNLOCK
Rebuilt the indexes:
    # proutil db -C idxbuild all
BUT... the index rebuild failed because it found bad blocks in the storage area where the records were stored

11 NOW WHAT?

12 Back to the Beginning
Copied our saved copy of the database back to start over
– Good thing we had copied all the files in the first place
Added the missing extent
Truncated the BI and ran a DBRPR scan to:
– Fix bad blocks
– Fix bad records

13 Dump and Load
After all the corruption was removed, it was time to dump and load
Needed to do an ASCII dump to work around some bad records

14 Lessons Learned
This database was important to this customer; that is why they wanted it back when it got corrupted.
They need to treat the database better:
– Daily backups
– After Imaging
A good DR plan saves a lot of heartache

15 Next Steps
Implement a good Disaster Recovery plan that includes:
– Frequent backups
– After Imaging
Test the Disaster Recovery plan
– Annually
The Disaster Recovery plan needs to be on paper
– It can’t live only on the computer
– Need a backup plan in case the DR plan fails

16 Disaster Area

17 Disaster Area
Case Study 2
– A brand-name US bank had SAN corruption, which prevented crash recovery from completing. They had a hot standby machine and database using OpenEdge Replication.

18 Specifics
Had a local backup and local AI files, but the backup would not restore
Previous backup was not available
Replica database was up to date
Platform was Windows
Database size 50 GB

19 What’s the Problem?
Customer refused to fail over
– They had never tested running on the fail-over machine, so they had little confidence that the application would run in the fail-over environment.
– Customer worried about the time it takes to fail back once failed over.

20 Making Matters Worse
Copying the DR database to the production machine would take days
Options presented to management included FORCED ACCESS to the database

22 What Next?
Forced into the database
Index rebuild DOES NOT fix the database
Dump and load DOES NOT fix the database

23 Lessons Learned
Have confidence in your Disaster Recovery plan
– There is no point having one if you are never going to use it
Be careful of the “QUICK FIX”
– Non-technical people will ALWAYS choose the fastest approach to the solution without understanding the consequences

24 Next Steps
Worked with the customer to do a fail-over test
Made fail-over testing an annual event

25 Disaster Area

26 Disaster Area
Case Study 3
– A large school district needs to get report cards out to 30,000+ students. They discovered corruption in the database because backups had stopped working for about a week.

27 Specifics
OpenEdge 10.2B05, Windows 64-bit
Last good backup is 1 week old
All report card data for 30,000+ students was entered after that last good backup
After Imaging is turned on, but AI file retention was less than 1 week
Database is about 300 GB
They have the 1-week-old backup restored to a different location
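
The after-image files could not close this gap on their own: rolling forward from a backup needs every AI file written since that backup, and the retained AI files did not reach back that far. A small Python sketch of that coverage check follows. It uses timestamps as a stand-in for an unbroken AI sequence (a simplification), `roll_forward_gap` is a hypothetical name, and the 5-day retention in the example is an assumed value, since the slide only says "less than 1 week".

```python
from datetime import datetime, timedelta


def roll_forward_gap(last_good_backup: datetime, oldest_ai_start: datetime) -> timedelta:
    """Return how much time after the backup is NOT covered by retained AI files.

    A zero result means backup + AI files can, in principle, reach the present.
    Any positive result is a window of changes that cannot be rolled forward.
    """
    return max(oldest_ai_start - last_good_backup, timedelta(0))


if __name__ == "__main__":
    now = datetime(2014, 11, 1)              # hypothetical "today"
    backup = now - timedelta(weeks=1)        # last good backup: 1 week old
    oldest_ai = now - timedelta(days=5)      # assumed AI retention: 5 days
    print("uncovered window:", roll_forward_gap(backup, oldest_ai))
```

Any non-zero result means the report card data entered in that window has to come from somewhere other than backup plus AI, which is what forced the two-plan approach.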

28 WHAT WOULD YOU DO?

29 Approach
We had 2 plans:
Plan A: Get the corruption out of the live database
– Use any and all tools to remove the corruption
Plan B: Revert to the week-old database
– See if we can take all the report card data from the live database and import it into the week-old database.

30 Plan A
The database log (.lg) file showed the extents where the corruption was located.
Each storage area was a single variable-length extent
Corruption was in an 80 GB extent (Ugh!)
– Used DBRPR to scan and fix bad blocks
– This took hours to run on this large extent
– In the end this failed

31 Plan B
Worked with the vendor to find all the tables that made up the report card processing
– This was about 12 tables
Dumped these tables from the live database
– There was no corruption in these tables
Had to figure out how to get the table data into the week-old database

32 HMMMMM…

33 Plan B
Dumped the schema for the 12 tables
Went into the dictionary and renamed the tables
– Added _old to the end of each table name
Loaded the schema for the 12 tables
Loaded the data for the 12 tables
This is a very useful trick
– Didn’t need to recompile: the application worked

34 Plan A (revisited)
Dumped and loaded the Plan A database
There were 5 tables where the dump and load failed. Did a 4GL dump:
    FOR EACH … BY field. EXPORT …
    FOR EACH … BY field DESCENDING. EXPORT …
Didn’t trust the data, so we used the same table rename technique to get these tables from the week-old backup.
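
The two FOR EACH passes recover records from both ends of an index until each pass hits a damaged record. The Python toy model below mimics that idea; it is not ABL, and `CorruptRecord`, `salvage`, and `reader_with_bad_keys` are hypothetical names. "Reading" a key either succeeds or raises, standing in for the read error a corrupt record would cause.

```python
from typing import Callable, Iterable, List, Set


class CorruptRecord(Exception):
    """Stand-in for the read error raised when a damaged record is reached."""


def read_until_error(keys: Iterable[int], read: Callable[[int], object]) -> List[int]:
    """Read records in the given order and stop at the first unreadable one."""
    good: List[int] = []
    for key in keys:
        try:
            read(key)
        except CorruptRecord:
            break
        good.append(key)
    return good


def salvage(sorted_keys: List[int], read: Callable[[int], object]) -> List[int]:
    """Two passes, modelled on FOR EACH ... BY field and BY field DESCENDING."""
    ascending = read_until_error(sorted_keys, read)
    descending = read_until_error(reversed(sorted_keys), read)
    already = set(ascending)
    # Merge both passes back into ascending order without duplicates.
    return ascending + [k for k in reversed(descending) if k not in already]


def reader_with_bad_keys(bad: Set[int]) -> Callable[[int], int]:
    """Hypothetical demo reader that raises CorruptRecord for keys in `bad`."""
    def read(key: int) -> int:
        if key in bad:
            raise CorruptRecord(key)
        return key
    return read


if __name__ == "__main__":
    print(salvage(list(range(1, 11)), reader_with_bad_keys({5})))
```

With a single damaged record, only that record is lost. With two damaged records, everything between them is missed by both passes, so the technique recovers what it can rather than guaranteeing a complete table.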

36 But Wait, There’s More
A week later they found they also had corruption in a different database
– That was solved by restore and roll forward
– Needed to upgrade to 10.2B08 for roll forward to work properly

37 Next Steps
Implement a DR solution:
– OpenEdge Replication
– Rolling forward AI files
Restore the backup and roll forward on the same machine
– This verifies the backup is functional
– DB block corruption is not propagated by roll forward
License costs are associated with both approaches

38 Agenda
Disaster Area
Performance Problems
Migrations and Upgrades

40 Performance Problems
Case Study 4
– A customer recently purchased a new Linux machine to replace their old Linux machine. The new machine has more memory, faster disks, and twice as many CPUs.
– Their application runs slower on the new hardware.
Specifics
– CPUs are not busy
– System load is high (60 to 80)
– Application is sluggish
– Cannot identify a process causing the system load to go up

41 NUMA
Non-Uniform Memory Access

42 Cache Coherence
OpenEdge needs to gain exclusive access to a region of shared memory
– A test is performed to see if the region is locked.
– This test requires the CPUs to stop what they are doing and synchronize their cache lines, so that multiple processes cannot each believe they obtained the lock at the same time.
– As the machine scales out to more CPUs, the cache coherence problem gets worse

43 Far Memory Reads
The shared memory region spans nodes
– The cost of far memory access is expressed as the NUMA ratio
– This is how much longer it takes a CPU on node 1 to access memory on node n than to access memory on its own node. When the ratio is 1, you are on an SMP machine; when it is higher than 1, you are on a NUMA machine.
– Typical NUMA ratios are 3:1 or higher. This means it takes 3x longer to access memory on a remote node than it does to access memory on a local node.
– Because database systems such as OpenEdge, Oracle, and MSSQL access memory extensively, this 3x slowdown becomes a noticeable bottleneck.
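
The NUMA ratio translates into an average cost per memory access. A back-of-the-envelope Python sketch, assuming every access is either local (cost 1) or remote (cost = NUMA ratio) and ignoring CPU caches. The even-interleaving helper is an assumption about how shared memory is spread over the nodes, not a description of how OpenEdge or the OS actually places it.

```python
def average_access_cost(remote_fraction: float, numa_ratio: float) -> float:
    """Mean cost of a memory access, where a local access costs 1.0.

    Assumes each access is either local (cost 1) or remote (cost numa_ratio)
    and ignores CPU caches, so it is only a rough estimate.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    if numa_ratio < 1.0:
        raise ValueError("numa_ratio must be >= 1 (1 means an SMP machine)")
    return (1.0 - remote_fraction) * 1.0 + remote_fraction * numa_ratio


def interleaved_remote_fraction(nodes: int) -> float:
    """Remote share of accesses if memory is spread evenly over `nodes` (assumed)."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return (nodes - 1) / nodes


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        f = interleaved_remote_fraction(nodes)
        print(nodes, "nodes:", round(average_access_cost(f, 3.0), 2), "x local cost")
```

With 4 nodes and a 3:1 ratio the sketch gives 2.5x the local cost on average, which illustrates how a wider machine can run the same memory-heavy workload slower.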

44 Solutions
Change operating mode to client/server
– Fewer processes directly connected to shared memory
Disable CPUs
– Helps the cache coherence problem
Purchased a new machine

45 Agenda
Disaster Area
Performance Problems
Migrations and Upgrades

47 Migrations and Upgrades
Case Study 5
– A large warehouse distribution center is migrating from an older machine to a newer machine. They run 24x7x365, so downtime must be kept to a minimum.
Specifics
– Not changing platforms (AIX -> AIX)
– DB size is 500 GB; it takes 6 hours to copy it to the new machine

48 Constraints
Blocked out a 2-hour window for the data migration
Do you need any more constraints than this?

49 Approach
Build a test environment and TEST TEST TEST
Migrate:
– Application files
– System files
– Other files (in this case, custom terminfo)

50 TEST TEST TEST

51 Special Sauce
Use After Imaging to keep the soon-to-be production database in sync
During the cutover period, we only need to transfer the last AI file and apply it
This fits into the 2-hour downtime window with ease
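
The arithmetic behind the "special sauce" is simple: the full copy runs at roughly 500 GB / 6 h, about 83 GB per hour, which cannot fit a 2-hour window, while a single AI file easily can. A Python sketch of that check; the 2 GB AI file size is a hypothetical figure, the throughput is assumed constant, and the sketch ignores the time to apply the AI file and every other cutover step.

```python
FULL_COPY_GB = 500.0
FULL_COPY_HOURS = 6.0
GB_PER_HOUR = FULL_COPY_GB / FULL_COPY_HOURS   # about 83 GB/h, derived from the slide
WINDOW_HOURS = 2.0


def transfer_hours(size_gb: float, gb_per_hour: float = GB_PER_HOUR) -> float:
    """Time to move `size_gb` at a constant throughput."""
    return size_gb / gb_per_hour


def fits_window(size_gb: float, window_hours: float = WINDOW_HOURS) -> bool:
    """True if copying `size_gb` alone fits into the downtime window."""
    return transfer_hours(size_gb) <= window_hours


if __name__ == "__main__":
    print("full copy fits:", fits_window(FULL_COPY_GB))
    ai_file_gb = 2.0   # hypothetical size of the last AI file
    print("last AI file takes", round(transfer_hours(ai_file_gb) * 60, 1), "minutes")
```

At that rate roughly 167 GB could move within 2 hours, so for the AI approach the practical limits are the roll-forward time and the rest of the cutover checklist, not the file transfer.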

53 Application Upgrades
Case Study 6
– A customer has replication in place using AI files. They are an end user of an Application Partner who updates the application frequently. The Application Partner is aware of the DR machine and updates the application on the DR machine as well. Sometimes this causes problems.
Specifics
– Windows application

54 The Problem
The application support personnel doing the upgrade follow a script, which at times includes not only updating the application but also connecting to the database

55 The Problem
Once anything connects to the database, crash recovery is performed and no further AI files can be applied

56 So…
When we can’t apply any more AI files, we need to rebaseline the database.
The database is 80 GB and the replica is across a WAN, so rebaselining is a pain.

57 Solution #1
Use the oplock and opunlock commands:
    rfutil dbname -C roll forward oplock -a file.a1
    rfutil dbname -C roll forward opunlock

58 Solution #1
However, this prevented the support engineer from doing the upgrade properly, as they didn’t have control over when or how the upgrade script connected to the database

59 Solution #2
We provided a secondary database that is a backup of the replica database on the DR machine
– We used probkup with the -norecovery switch to make sure future AI files can still be applied

60 Solution #2
The support engineer was able to complete the upgrade using this database copy
The database changes from the upgrade are THROWAWAY
– These changes will migrate from the PRODUCTION database to the REPLICA database via replication
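
Slides 54 to 60 describe a small set of rules: AI files are applied in sequence, an ordinary connection runs crash recovery and ends roll forward, oplock blocks such connections, and a separate copy taken without recovery can be opened without harming the replica. The Python toy model below encodes only those rules as the slides state them; `ReplicaModel` and its methods are hypothetical names and do not reflect how rfutil or probkup work internally.

```python
from dataclasses import dataclass


class RollForwardError(Exception):
    """Raised when an AI file cannot be applied to the model."""


@dataclass
class ReplicaModel:
    next_ai_seq: int = 1      # sequence number of the next AI file expected
    recovered: bool = False   # True once a connection has run crash recovery
    oplocked: bool = False    # models "roll forward oplock" protection

    def apply_ai(self, seq: int) -> None:
        if self.recovered:
            raise RollForwardError("crash recovery already ran: rebaseline needed")
        if seq != self.next_ai_seq:
            raise RollForwardError(f"expected AI file {self.next_ai_seq}, got {seq}")
        self.next_ai_seq += 1

    def connect(self) -> None:
        """An ordinary connection runs crash recovery, unless oplock blocks it."""
        if self.oplocked:
            raise PermissionError("database is protected by roll forward oplock")
        self.recovered = True

    def opunlock(self) -> None:
        self.oplocked = False

    def copy_without_recovery(self) -> "ReplicaModel":
        """Independent copy, loosely like a -norecovery backup restored elsewhere.

        The source is left untouched, so it can keep accepting AI files.
        """
        return ReplicaModel(next_ai_seq=self.next_ai_seq, recovered=self.recovered)


if __name__ == "__main__":
    replica = ReplicaModel()
    replica.apply_ai(1)
    scratch = replica.copy_without_recovery()
    scratch.connect()      # the upgrade script connects to the copy only
    replica.apply_ai(2)    # the real replica keeps rolling forward
    print("replica expects AI file", replica.next_ai_seq)
```

In the model, the throwaway copy absorbs the connection the upgrade script needs while the replica's AI sequence stays intact; in the real setup, the schema and data changes reach the replica anyway through replication from production.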

61 Summary
These are examples of some real-world database problems
Don’t assume things can’t go wrong
Having a plan is not enough
– Testing the plan and having confidence in it is required
If all else fails, seek professional help

62 Questions
THANK YOU FOR YOUR TIME

63 Thank You!
Questions?