Common Database Problems Common Database Solutions Mike Furgal Managed Database Service EMEA PUG Challenge 2015, Copenhagen, Denmark 4 – 6 November, 2015.


© 2015 Progress Software Corporation.

Introduction

Mike Furgal
- Progress employee since 1989
- Developer of the OpenEdge database
- Joined BravePoint in 2012
- Heads up Database Services, including Managed Database Services

BravePoint
- Largest Progress/OpenEdge consulting firm
- Founded in 1987
- Purchased by Progress in October 2014
- Specializes in all things OpenEdge
  - Database Services
  - Programming
- Pro2SQL: real-time replication to a SQL target

A series of case studies of issues that the Progress BravePoint Managed Database Services team has encountered over the years.

The case of the Missing Files

A large distribution center had a power failure. When the power came back on, the machine booted but the database did not start.

(43) ** Cannot find or open file /agility/prod/prod_db/platte_11.d5, errno = 2.
(451) prostrct list session begin for root on /dev/pts/0.
(12475) Unable to get file status for extent /agility/prod/prod_db/platte_11.d5
(334) prostrct list session end.
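The two error messages above both name the extent that could not be found (errno 2 is the operating system's "No such file or directory"). As a minimal sketch, assuming the log text uses exactly the message wording quoted on the slide, the missing extents could be pulled out of a log like this. The function name `missing_extents` is made up for illustration and is not part of any OpenEdge tooling.

```python
import re

# Patterns follow the two message formats quoted on the slide above.
CANNOT_OPEN = re.compile(
    r"\(43\)\s+\*\*\s+Cannot find or open file (\S+), errno = (\d+)"
)
NO_STATUS = re.compile(
    r"\(12475\)\s+Unable to get file status for extent (\S+)"
)


def missing_extents(log_text):
    """Return the sorted, de-duplicated extent paths reported as missing."""
    paths = {m.group(1) for m in CANNOT_OPEN.finditer(log_text)}
    paths.update(m.group(1) for m in NO_STATUS.finditer(log_text))
    return sorted(paths)


if __name__ == "__main__":
    sample = (
        "(43) ** Cannot find or open file "
        "/agility/prod/prod_db/platte_11.d5, errno = 2.\n"
        "(451) prostrct list session begin for root on /dev/pts/0.\n"
        "(12475) Unable to get file status for extent "
        "/agility/prod/prod_db/platte_11.d5\n"
        "(334) prostrct list session end.\n"
    )
    for path in missing_extents(sample):
        print(path)
```

Both messages point at the same file here, so the set-based de-duplication reports the extent once.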

Specifics
- Database was 80 GB
- Last good backup was 1 week old
- Not running After Imaging
- Platform was Linux

WHAT WOULD YOU DO?

Approach
- Made a copy of the existing database in case we made a mistake
- Used PROSTRCT LIST to determine which files were missing
  - We were lucky that the missing file was part of a storage area that only held indexes
- Tools available
  - PROSTRCT UNLOCK
  - PROSTRCT BUILDDB

Solution
- Restored the missing extent from the week-old backup and ran PROSTRCT UNLOCK
- Rebuilt the indexes
  # proutil db -C idxbuild all

BUT...
- Index rebuild failed because it found bad blocks in the storage area where the records were stored

NOW WHAT?

Back to the Beginning
- Copied the backed-up database to start over
  - Since Index Rebuild failed, we needed to start over
  - Good thing we had copied all the files in the first place
- Added the missing extent
- Truncated the BI and did a DBRPR scan
  - Fix bad blocks
  - Fix bad records

Dump and Load
- After all the corruption was removed, it was time to dump and load
- Needed to do an ASCII dump to dump around some bad records

Lessons Learned
- This database was important to this customer, hence they wanted it back when it got corrupted.
- They need to treat the database better
  - Daily backups
  - After Imaging
- A good DR plan saves a lot of heartache

Next Steps
- Implement a good Disaster Recovery plan, which includes
  - Frequent backups
  - After Imaging implemented
- Test the Disaster Recovery plan annually
- The Disaster Recovery plan needs to be on paper
  - Can't be just on the computer
  - Need a backup plan in case the DR plan fails

The case of the Micro Manager

A brand name US bank had SAN corruption. This prevented Crash Recovery from completing. They had a Hot Standby machine and database using OE Replication.

Specifics
- Had a local backup and local AI files, but the backup would not restore
- Previous backup was not available
- Replica database was up to date
- Platform was Windows
- Database size 200 GB
- OpenEdge 10.1C04

What's the Problem
- Customer refused to fail over
  - They had never tested running on the fail-over machine.
  - Had little confidence that the application would run in the fail-over environment.
  - Customer worried about the time it takes to fail back once failed over.

Making Matters Worse
- The time to copy the DR database to the production machine is measured in days
- Options presented to management included FORCED ACCESS to the database


What Next
- Forced into the database
  - This skips Crash Recovery
- Index Rebuild DOES NOT fix the database
- Dump and Load DOES NOT fix the database

Lesson Learned
- Have confidence in your Disaster Recovery plan
  - There is no sense in having one if you are never going to use it
- Be careful of the “QUICK FIX”
  - Non-technical people will ALWAYS choose the fastest approach to the solution without understanding the consequences

Next Steps
- Worked with the customer to do a fail-over test.
- Made the fail-over testing an annual event

School's Out For Summer

A large school district needed to get its report cards out to 30,000+ students. They discovered corruption in the database because backups had been failing for about a week.

Specifics
- OpenEdge 10.2B05 on Windows 64-bit
- Last good backup is 1 week old
- All report card data for 30,000+ students entered since that last good backup
- After Imaging is turned on, but AI file retention was less than 1 week
- Database is about 300 GB
- They have the 1-week-old backup restored to a different location

WHAT WOULD YOU DO?

Approach
- We had 2 plans
- Plan A – Get the corruption out of the live database
  - Use any and all tools to remove the corruption
- Plan B – Revert back to the week-old database
  - See if we can take all the report card data from the live database and import it into the week-old database.

Plan A
- The database .lg file showed the extents where the corruption was located.
- Each storage area was a single variable-length extent
- Corruption was in an 80 GB extent (Ugh!)
  - Used DBRPR to scan and fix bad blocks
  - This took hours to run on this large extent
  - In the end this failed

Plan B
- Worked with the vendor to find all the tables that made up the report card processing
  - This was about 12 tables
- Dumped these tables from the live database
  - There was no corruption in these tables
- Had to figure out how to get the table data into the week-old database

HMMMMM…..

Plan B
- Dumped the schema for the 12 tables
- Went into the dictionary and renamed the tables
  - Added _old to the end of the table name
- Loaded the schema for the 12 tables
- Loaded the data for the 12 tables
- This is a very useful trick
  - Didn't need to recompile; the application worked

Plan A (revisited)
- Dumped and loaded the Plan A database
- There were 5 tables where the dump and load failed.
- Did a 4GL dump
  FOR EACH … BY field. EXPORT …
  FOR EACH … BY field DESCENDING. EXPORT …
- Didn't trust the data, so we used the same table rename technique to get these tables from the week-old backup.
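One plausible reading of the two 4GL passes above is that the ascending pass exports records until it hits corruption and the descending pass then exports from the other end until it hits corruption, so everything outside the damaged range is saved. The slide elides the actual 4GL, so the following Python is only an illustration of that idea under that assumption: `read_record` and `dump_around_bad_records` are hypothetical names, and an exception stands in for a read that fails on a bad record.

```python
def dump_around_bad_records(read_record, count):
    """Collect readable records by scanning an ordered set from both ends.

    read_record(i) returns record i, or raises an exception when the
    record cannot be read (a stand-in for hitting corruption).
    Returns (records, skipped_positions).
    """
    # Ascending pass: stop at the first unreadable record.
    forward = []
    i = 0
    while i < count:
        try:
            forward.append(read_record(i))
        except Exception:
            break
        i += 1
    first_bad = i  # equals count when nothing failed

    # Descending pass: start at the end and stop at the first failure,
    # never going back over what the ascending pass already collected.
    backward = []
    j = count - 1
    while j > first_bad:
        try:
            backward.append(read_record(j))
        except Exception:
            break
        j -= 1
    backward.reverse()

    # Positions between the two stopping points were never exported.
    skipped = list(range(first_bad, j + 1)) if first_bad < count else []
    return forward + backward, skipped
```

Note that any readable record sitting between two bad ones ends up in the skipped range, which fits the slide's remark that the dumped data was not trusted.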


But Wait – There's More
- A week later they found they also had corruption in a different database
  - That was solved by restore and roll forward
  - Needed to upgrade to 10.2B08 for Roll Forward to work properly
    - Windows 64-bit 10.2B06 has a roll forward bug that prevented it from working.

Next Steps
- Implement a DR solution
  - OpenEdge Replication
  - Rolling forward AI
- Restore the backup and roll forward on the same machine
  - This verifies the backup is functional
  - DB block corruption does not get replicated by roll forward
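The Specifics for this case show why restore and roll forward was not an option for the main database: the last good backup was a week old, but AI files were kept for less than a week. A minimal sketch of that check follows, assuming the retained AI files form an unbroken sequence from the oldest one onward. The function names and the dates in the demo are hypothetical.

```python
from datetime import datetime, timedelta


def can_roll_forward(last_good_backup, oldest_retained_ai):
    """True if the retained AI files reach back to the last good backup.

    Assumes the AI files are contiguous from oldest_retained_ai onward.
    """
    return oldest_retained_ai <= last_good_backup


def retention_gap(last_good_backup, oldest_retained_ai):
    """Length of the period that neither the backup nor the AI files cover."""
    gap = oldest_retained_ai - last_good_backup
    return gap if gap > timedelta(0) else timedelta(0)


if __name__ == "__main__":
    # Hypothetical dates: backup is 7 days old, AI files are kept for 5 days.
    now = datetime(2015, 6, 8)
    backup = now - timedelta(days=7)
    oldest_ai = now - timedelta(days=5)
    print(can_roll_forward(backup, oldest_ai))
    print(retention_gap(backup, oldest_ai))
```

Keeping AI files at least as long as the age of the oldest backup you might need to restore closes this gap.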

A case of the spins

A large medical center patched their software over the weekend. On Monday the performance of the system was unacceptable. The vendor says the patch was minor and could not be the cause of the issue. The customer says nothing else changed.

Specifics
- OpenEdge bit
- Windows bit
- Database is 321 GB
- Number of users is 3,000

Some Metrics – Month View

| Date | CPs | Users | DB Requests | DB Reads | Ratio | Rec Rds | Rec Cr | Rec Up | Rec Del | DB Writes | BI Writes | AI Writes | Latch TO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 05/16/15 (Sat) | 86 | 580 | 7,948,358,747 | 149,508,775 | 53 | 1,072,918,966 | 425,542 | 1,922,412 | 374,270 | 175,980 | 172,163 | 116,490 | 14,798 |
| 05/15/15 (Fri) | 261 | 2,792 | 12,987,557,626 | 114,936,… | … | …,465,462,910 | 1,520,296 | 3,227,568 | 1,626,692 | 1,886,768 | 449,449 | 295,333 | 115,003 |
| 05/14/15 (Thu) | 263 | 3,011 | 11,000,344,090 | 56,940,… | … | …,090,165,002 | 1,639,564 | 3,475,092 | 871,808 | 2,023,720 | 454,097 | 298,088 | 73,017 |
| 05/13/15 (Wed) | 323 | 3,126 | 10,371,051,213 | 55,142,… | … | …,879,551,662 | 2,250,168 | 3,423,760 | 1,099,930 | 2,378,306 | 525,070 | 374,006 | 885,294 |
| 05/12/15 (Tue) | 279 | 3,089 | 10,567,333,668 | 140,530,655 | 75 | 1,901,654,803 | 1,797,520 | 3,397,238 | 1,043,849 | 2,068,487 | 496,165 | 328,450 | 943,510 |
| 05/11/15 (Mon) | Restart | | | | | | | | | | | | |
| 05/10/15 (Sun) | … | … | …,806,473,996 | 206,617,341 | 52 | 2,307,235,660 | 307,377 | 1,804,694 | 368,589 | 100,764 | 150,257 | 102,579 | 244,087 |
| 05/09/15 (Sat) | 88 | 504 | 5,704,394,389 | 82,411,… | … | …,023,191 | 483,479 | 1,423,644 | 516,069 | 171,617 | 165,926 | 115,092 | 186,064 |
| 05/08/15 (Fri) | Restart | | | | | | | | | | | | |
| 05/07/15 (Thu) | 271 | 2,940 | 10,046,740,997 | 145,481,723 | 69 | 1,596,756,358 | 1,705,661 | 3,503,669 | 924,082 | 2,153,671 | 455,228 | 306,003 | 128,058 |
| 05/06/15 (Wed) | 338 | 2,989 | 9,830,327,570 | 153,056,406 | 64 | 1,442,212,561 | 2,247,914 | 3,525,546 | 1,225,826 | 2,453,942 | 557,639 | 374,309 | 129,160 |
| 05/05/15 (Tue) | 293 | 2,967 | 10,392,149,949 | 154,806,221 | 67 | 1,593,242,356 | 2,000,392 | 3,366,955 | 1,126,177 | 2,324,533 | 488,949 | 329,102 | 171,067 |
| 05/04/15 (Mon) | 488 | 2,971 | 10,483,718,093 | 162,479,487 | 65 | 1,547,975,267 | 2,311,179 | 3,733,307 | 1,363,409 | 2,678,057 | 712,951 | 528,518 | 212,059 |
| 05/03/15 (Sun) | … | … | …,161,696,099 | 217,504,812 | 51 | 1,884,717,953 | 331,006 | 1,783,868 | 1,243,395 | 270,902 | 222,981 | 156,128 | 23,917 |
| 05/02/15 (Sat) | 1,… | … | …,114,391,833 | 164,345,… | … | …,325,461 | 444,568 | 1,853,376 | 24,483,171 | 3,078,655 | 1,889,151 | 1,496,027 | 132,360 |
| 05/01/15 (Fri) | 374 | 2,735 | 11,611,724,202 | 126,877,164 | 92 | 1,815,943,455 | 2,450,987 | 3,063,577 | 1,458,166 | 2,046,184 | 590,195 | 411,705 | 3,268,221 |

Some Metrics – Month View (the same table, with the Rec Rds values for 05/13, 05/12, 05/05 and 05/04 highlighted)

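Digging Deeper

Bad Day – 15 minute samples

| Sample | CPs | Users | DB Requests | DB Reads | Ratio | Rec Rds | Rec Cr | Rec Up | Rec Del | DB Writes | BI Writes | AI Writes | Latch TO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 10:05:01 | 2 | 3,033 | 106,609,219 | 36,595,… | … | …,840,580 | 9,664 | 31,257 | 2,338 | 19,189 | 4,858 | 4,499 | 88,… |
| …:20:01 | 2 | 3,052 | 102,685,946 | 27,744,… | … | …,858,412 | 10,923 | 33,024 | 2,407 | 21,026 | 5,858 | 4,660 | 99,… |
| …:35:02 | 2 | 3,082 | 97,655,645 | 1,303,… | … | …,250,593 | 13,814 | 38,947 | 3,318 | 27,611 | 7,075 | 5,… | … |
| …:50:01 | 3 | 3,089 | 81,674,392 | 1,293,… | … | …,030,509 | 22,289 | 36,321 | 5,409 | 25,428 | 7,604 | 5,… | … |
| …:05:01 | 7 | 3,086 | 214,447,121 | 1,716,… | … | …,973,396 | 40,122 | 58,595 | 29,276 | 59,987 | 13,919 | 8,509 | 30,… |
| …:20:01 | 5 | 3,039 | 155,915,767 | 1,492,… | … | …,202,285 | 25,758 | 57,494 | 14,197 | 48,054 | 8,778 | 4,993 | 4,… |
| …:35:01 | 4 | 3,040 | 156,151,501 | 1,434,… | … | …,103,824 | 27,304 | 60,045 | 7,791 | 48,323 | 8,285 | 4,571 | 3,… |
| …:50:01 | 5 | 2,888 | 146,245,414 | 1,666,… | … | …,019,801 | 33,605 | 60,463 | 11,379 | 52,606 | 8,566 | 5,226 | 2,711 |

Good Day – 15 minute samples

| Sample | CPs | Users | DB Requests | DB Reads | Ratio | Rec Rds | Rec Cr | Rec Up | Rec Del | DB Writes | BI Writes | AI Writes | Latch TO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 10:05:01 | 4 | 2,848 | 153,746,951 | 2,343,… | … | …,340,783 | 30,489 | 60,805 | 7,666 | 55,818 | 7,757 | 4,774 | 5,… |
| …:20:01 | 5 | 2,812 | 145,441,871 | 1,755,… | … | …,387,070 | 26,490 | 59,676 | 13,279 | 53,195 | 6,642 | 4,962 | 3,… |
| …:35:01 | 4 | 2,877 | 151,783,516 | 1,876,… | … | …,653,297 | 30,446 | 61,754 | 11,262 | 54,906 | 7,899 | 5,192 | 6,… |
| …:50:01 | 7 | 2,894 | 143,780,080 | 1,877,… | … | …,215,543 | 46,234 | 63,980 | 19,392 | 66,429 | 11,820 | 6,774 | 7,… |
| …:05:02 | 4 | 2,912 | 158,495,087 | 1,808,… | … | …,191,428 | 34,806 | 63,070 | 12,040 | 59,041 | 9,215 | 5,284 | 10,… |
| …:20:01 | 6 | 2,897 | 155,845,110 | 2,259,… | … | …,841,346 | 34,727 | 60,416 | 12,770 | 59,998 | 8,929 | 5,497 | 7,… |
| …:35:01 | 8 | 2,938 | 150,662,822 | 2,195,… | … | …,976,744 | 70,239 | 83,419 | 20,193 | 82,777 | 12,552 | 8,542 | 6,… |
| …:50:01 | 4 | 2,914 | 138,147,804 | 1,774,… | … | …,570,981 | 31,236 | 59,286 | 12,957 | 57,756 | 7,696 | 4,971 | 2,731 |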


Some Metrics (the same month view, with Latch TO values highlighted)

Latch timeouts increased. CRUD operations decreased. Why? Nothing had changed.

- Further investigation revealed that the -spin setting was changed from 96,000 to 20,000.
  - This change was a move to best practices, where so-called “industry experts” have been saying not to set -spin higher than 20,000

- The change was made months back to the conmgr.properties file and was long forgotten.
- When the patch was applied, the database was bounced and the change finally took effect
- While no one remembered a configuration change, the change was there
- Setting -spin back up to 96,000 got them the performance back
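A change that sits in a configuration file for months and only takes effect at the next restart is easy to miss. One way to catch it, sketched here under the assumption that you can obtain both the values the database is currently running with and the values the configuration file would apply at the next start, is to diff the two. How those values are collected is outside this sketch, and the `spin` and `bibufs` keys are hypothetical labels, not the actual conmgr.properties key names.

```python
def pending_changes(running, configured):
    """Return {name: (running_value, configured_value)} for settings that differ.

    running    -- settings the database is using right now
    configured -- settings that will apply at the next startup
    A setting missing on one side is reported with None for that side.
    """
    names = sorted(set(running) | set(configured))
    return {
        name: (running.get(name), configured.get(name))
        for name in names
        if running.get(name) != configured.get(name)
    }


if __name__ == "__main__":
    # Values mirror the case above: running with 96,000, file says 20,000.
    running = {"spin": 96000, "bibufs": 64}
    configured = {"spin": 20000, "bibufs": 64}
    for name, (now, after_restart) in pending_changes(running, configured).items():
        print(name, now, "->", after_restart)
```

Running a check like this before a planned restart would have shown that "nothing has changed" was not quite true.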

Bad Day and Good Day 15 minute samples (the same data as on the Digging Deeper slide), annotated: Changed -spin online

But WAIT! There’s more

- A different customer added a few CPUs to their environment.
- When the users log in, the CPUs peg at 100% utilized
- Performance suffers
- WebSpeed launches additional agents
  - Because all agents are busy

Specifics
- Customer database is > 1 TB
- 430 WebSpeed agents
- AIX
- 10.1C 64-bit

5-minute samples (columns: Sample, CPs, Users, DB Requests, DB Reads, Ratio, Rec Rds, Rec Cr, Rec Up, Rec Del, DB Writes, BI Writes, AI Writes, Latch TO). Latch TO per sample, where legible: 16,…; 15,…; 21,…; …; 20,…; 48,…; 58,…; 93,…; …; 72,…; 64,…; 38,…; 28,853.

Unlike the previous example, we had no historical performance metrics to compare against from when things were good. We could only rely on instincts and experience.

A Different View
In a 5-minute sample, the highest latch timeout count should be no more than 3,000
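The rule of thumb above is easy to apply mechanically once the latch timeout count for each 5-minute sample is available. The sketch below assumes the samples arrive as (label, count) pairs; the function name and the sample labels are made up for illustration, and 28,853 is the one fully legible Latch TO value from this customer's samples.

```python
# Rule of thumb quoted on the slide: at most 3,000 latch timeouts
# in a 5-minute sample.
LATCH_TIMEOUT_LIMIT = 3000


def samples_over_limit(samples, limit=LATCH_TIMEOUT_LIMIT):
    """Return the (label, latch_timeouts) pairs whose count exceeds the limit."""
    return [(label, count) for label, count in samples if count > limit]


if __name__ == "__main__":
    # "sample B" uses the legible value from the slide; "sample A" is
    # an invented count added only for contrast.
    samples = [("sample A", 2500), ("sample B", 28853)]
    for label, count in samples_over_limit(samples):
        print(label, count)
```

The limit is a parameter because, as the cases here show, a threshold that suits one system may not suit another.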

Changed -spin from 60,000 to 20,000 and the problem went away

Lesson Learned
- There is no one setting that will work for every situation
  - Changing -spin from 20,000 to 96,000 helped one customer
  - Changing -spin from 60,000 to 20,000 helped another one
- Having historical data is key
- Don't assume nothing has changed just because they said so
  - Configuration changes usually only take effect at next startup

Summary
- These are examples of some real-world database problems
- Don't assume things can't go wrong
- Having a plan is not enough
  - Testing the plan and having confidence in it is required
- If all else fails, seek professional help

Answers