Schrödinger’s Backup: Will your recovery work?

1 Schrödinger’s Backup: Will your recovery work?
Patrick Flynn, SQL Saturday South Island, 8th April 2017

2 Thank you to our sponsors:
Gold Sponsors Silver Sponsors Bronze Sponsors

3 Please fill out your evaluation forms.
You have them in your A4 pack from registration. Please put them in the box at the front of the room. There are spot prizes for completed evaluation forms.
Patrick Flynn | Schrödinger’s Backup | SQL SATURDAY | #614 | South Island 2017

4 Who am I
Patrick Flynn
MCM – SQL Server 2008
MCSM – Data Platform
Production DBA for 10+ years.

5 Schrödinger’s cat
A thought experiment devised by Austrian physicist Erwin Schrödinger in 1935: a cat, a flask of poison, and a radioactive source are placed in a sealed box. If an internal monitor detects radioactivity, the flask is shattered, releasing the poison that kills the cat. While the box is closed, the cat can be thought of as both alive and dead. Only when the box is opened can its actual state be determined.

7 Schrödinger’s Backup
Not testing your recovery plan is unknowingly running a Schrödinger’s backup experiment. Unless tested, a backup can be either good or bad. Only by completing a restore can you be assured that your backup was valid. A failed Schrödinger’s backup experiment will often become an RGE (Resume Generating Event).
Example: GitLab.com (used by 100,000+ organisations), January.
A tired sysadmin, working late at night in the Netherlands, accidentally deleted a directory on the wrong server during a frustrating database replication process: he wiped a folder containing 300GB of live production data that was due to be replicated.
Problems encountered:
LVM snapshots are by default only taken once every 24 hours. YP happened to run one manually about 6 hours prior to the outage.
Regular backups seem to also only be taken once per 24 hours, though YP has not yet been able to figure out where they are stored. According to JN these don’t appear to be working, producing files only a few bytes in size.
Disk snapshots in Azure are enabled for the NFS server, but not for the DB servers.
The replication procedure is super fragile, prone to error, relies on a handful of random shell scripts, and is badly documented.
Our backups to S3 apparently don’t work either: the bucket is empty.
We don’t have solid alerting/paging for when backups fail; we are seeing this in the dev host too now.
So in other words, out of 5 backup/replication techniques deployed, none are working reliably or set up in the first place.
=> we're now restoring a backup from 6 hours ago that worked
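Several of the problems listed above (backup files only a few bytes in size, an empty bucket, no alerting) could in principle be caught by a simple automated sanity check. The following is a minimal sketch in Python, not anything shown in the talk: the `BackupFile` record, the `find_backup_problems` function, and the 24-hour and 1 MB thresholds are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class BackupFile:
    """Hypothetical record describing one backup file found in a backup location."""
    name: str
    size_bytes: int
    finished_at: datetime


def find_backup_problems(
    files: List[BackupFile],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),   # assumed threshold
    min_size_bytes: int = 1024 * 1024,          # assumed threshold (1 MB)
) -> List[str]:
    """Return human-readable problems; an empty list means nothing suspicious was found."""
    if not files:
        # Corresponds to "the bucket is empty".
        return ["no backup files found (empty backup location)"]

    problems = []
    newest = max(files, key=lambda f: f.finished_at)
    if now - newest.finished_at > max_age:
        problems.append(f"newest backup {newest.name} is older than {max_age}")

    for f in files:
        # Corresponds to "files only a few bytes in size".
        if f.size_bytes < min_size_bytes:
            problems.append(f"{f.name} is only {f.size_bytes} bytes")
    return problems
```

A scheduled job could call `find_backup_problems` and page someone whenever the returned list is not empty. Passing such a check still does not prove the backup is restorable; only a restore does.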

8 Restore Strategy
Requires:
A defined RPO* and RTO*
Regular and correct backups
A tested restore process (automated and manual)
Documented processes and procedures
* RPO: Recovery Point Objective (how much data loss is acceptable). RTO: Recovery Time Objective (how much downtime is acceptable).
Being alerted that the building is burning down needs to be at least as important as knowing that you could restore your data if it did burn down.

9 A Defined RPO and RTO
RPO: Recovery Point Objective. In terms of time, how much data are we willing to lose?
RTO: Recovery Time Objective. Our goal for how quickly we can restore the database.
These are a business decision!
You need to know how long a restore will take if everything but the backup files is gone. This includes:
Time to get the backup files from offsite / tape / Data Domain / NAS etc.
Time to copy them to the restore location
If you don’t have a server to restore to, the time to bring one up and configure it
The time the restore itself takes
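The components of restore time listed above can be added up and compared against the agreed RTO. The sketch below is only an illustration of that arithmetic: the names `transfer_minutes`, `RestoreEstimate`, and `meets_rto` are hypothetical, and real figures should come from timed restore tests rather than estimates.

```python
from dataclasses import dataclass


def transfer_minutes(size_gb: float, throughput_mb_per_s: float) -> float:
    """Minutes needed to move size_gb at a sustained throughput (1 GB = 1024 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1024 / throughput_mb_per_s / 60


@dataclass
class RestoreEstimate:
    """Hypothetical breakdown of a worst-case restore, all values in minutes."""
    retrieve_minutes: float   # get backup files from offsite / tape / NAS
    copy_minutes: float       # copy files to the restore location
    provision_minutes: float  # bring up and configure a server, if none exists
    restore_minutes: float    # run the restore itself

    def total_minutes(self) -> float:
        return (self.retrieve_minutes + self.copy_minutes
                + self.provision_minutes + self.restore_minutes)


def meets_rto(estimate: RestoreEstimate, rto_minutes: float) -> bool:
    """True when the estimated end-to-end restore time fits within the RTO."""
    return estimate.total_minutes() <= rto_minutes
```

For example, copying 300 GB at a sustained 100 MB/s takes `transfer_minutes(300, 100)`, which is 51.2 minutes, before any retrieval, provisioning, or restore time is counted.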

10 Regular and Correct Backups
Backup your databases
Use Checksum on backups
Restore Verify Only
How do you know when your backups aren’t successful?
Alerts on failure
Run reports to check for missing backups
Trace flag 3023
Backup reports: sp_Blitz, SQL Script, DBA Reports
Backup solutions: Maintenance Plans, Ola Hallengren, Minion Backup
Related session: Building a Centralised Database Maintenance and Monitoring Solution, Manohar Punna, 3:45pm
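A report that checks for missing backups boils down to comparing each database's most recent backup time with the RPO. The sketch below assumes the last backup times have already been collected into a dictionary (in SQL Server they could be read from the `database_name` and `backup_finish_date` columns of `msdb.dbo.backupset`); the function name `missing_backups` is hypothetical, and this is not the logic of any of the tools named on the slide.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def missing_backups(
    last_backup: Dict[str, Optional[datetime]],
    now: datetime,
    rpo: timedelta,
) -> List[str]:
    """Return databases (sorted by name) with no backup at all or none within the RPO."""
    late = []
    for database, finished_at in sorted(last_backup.items()):
        # None means the database has never been backed up.
        if finished_at is None or now - finished_at > rpo:
            late.append(database)
    return late
```

Databases that have never been backed up are reported as well, since they would not appear in a report that only looks at failed backup jobs.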

11 Regular and Correct Backups
3-2-1 Backup Rule
How much do you lose if even just one backup file goes bad?
The accepted rule for backup best practices is the three-two-one rule. It can be summarized as: if you’re backing something up, you should have at least three copies, in two different formats, with one of those copies off-site.
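The three-two-one rule is easy to check mechanically once the copies of a backup are inventoried. The following is a minimal sketch; the `BackupCopy` record and the `check_3_2_1` function are hypothetical names, and "format" is modelled simply as a media label such as "disk" or "tape".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackupCopy:
    """Hypothetical inventory entry for one copy of a backup."""
    location: str
    media: str      # e.g. "disk", "tape", "cloud"
    offsite: bool


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return the parts of the 3-2-1 rule that are violated (empty list means compliant)."""
    violations = []
    if len(copies) < 3:
        violations.append(f"only {len(copies)} copies, need at least 3")
    if len({c.media for c in copies}) < 2:
        violations.append("copies are not on at least 2 different formats")
    if not any(c.offsite for c in copies):
        violations.append("no copy is stored off-site")
    return violations
```

Returning the list of violations rather than a single true/false value makes the result usable directly in an alert or report.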

12 Demos of Backup Scripts
Maintenance Plans, Ola Hallengren, Minion Backup

13 Tested Restore Process
You must test your restores
Restore with Checksum
Automated restore testing
Restore to test environments
Manual testing (fire drills)
Regular CheckDB
Minion CheckDB
The NSA may be backing up your data, but nobody has seen a successful restore.
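The idea of an automated restore test can be illustrated end to end with a small stand-in. The sketch below uses Python's built-in `sqlite3` module, not SQL Server and not the dbatools commands demonstrated in the talk: it takes a backup, restores it into a fresh database, runs an integrity check (loosely analogous to CheckDB), and compares row counts. All function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Dict


def backup_database(source: sqlite3.Connection, backup_path: str) -> None:
    """Write a backup of the source database to backup_path."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()


def restore_and_verify(backup_path: str, expected_counts: Dict[str, int]) -> bool:
    """Restore the backup into a scratch database and verify it, returning True on success."""
    backup = sqlite3.connect(backup_path)
    restored = sqlite3.connect(":memory:")
    try:
        backup.backup(restored)  # the restore step
        status = restored.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            return False
        for table, expected in expected_counts.items():
            count = restored.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            if count != expected:
                return False
        return True
    except sqlite3.Error:
        # An unreadable or invalid backup file counts as a failed restore.
        return False
    finally:
        backup.close()
        restored.close()


def run_restore_drill() -> bool:
    """Create a small sample database, back it up, and test the restore."""
    with tempfile.TemporaryDirectory() as workdir:
        backup_path = os.path.join(workdir, "orders.bak")
        source = sqlite3.connect(":memory:")
        try:
            source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
            source.executemany("INSERT INTO orders (item) VALUES (?)",
                               [("a",), ("b",), ("c",)])
            source.commit()
            backup_database(source, backup_path)
        finally:
            source.close()
        return restore_and_verify(backup_path, {"orders": 3})
```

The important design point is that the verdict comes from the restored copy, never from the backup job's own exit status.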

14 Demo of DBATools Restore Database

15 Documented Processes and Procedures
The restore strategy must be documented:
How and where to restore the data
Currently used versions of relevant software
Contact information
It must be kept up to date
It must be tested
Having the right data selected for backup, and backups running regularly and correctly, is still only part of knowing that you have a good backup that can get you back up and running. You must document how and where to restore the data and/or application that you are backing up. Restore documentation shouldn’t just contain information about the restore. It should also include hard copies of currently used versions of relevant software, serial numbers for any software, contact numbers for support, and support contract reference numbers. Any documentation should carry the date when it was last updated and tested, because wrong documentation can be worse than no documentation at all. All documentation should have a glossary page explaining the acronyms, so that the person reading it understands what the person who wrote it was trying to say. The restore procedure should also be tested periodically: follow the documentation to the letter, note any changes it needs, and update it. Think of it like a smoke alarm: you are supposed to test your smoke alarms when you change your clocks for daylight saving time. Being alerted that the building is burning down needs to be at least as important as knowing that you could restore your data if it did burn down.

16 In Summary
Build a Restore Strategy. Test it! Document it.
Questions?
19 Resources
Backup Monitoring; Backup Solutions; Testing Restores

20 Resources
Scary DBA
SQL Skills:
The Accidental DBA (Day 6 of 30): Backups: Understanding RTO and RPO
The Accidental DBA (Day 7 of 30): Backups: Recovery Models and Backup Types
The Accidental DBA (Day 8 of 30): Backups: Planning a Recovery Strategy
The Accidental DBA (Day 9 of 30): Backups: Essential BACKUP Options
The Accidental DBA (Day 10 of 30): Backups: Backup Testing for Validation
The Accidental DBA (Day 11 of 30): Backups: Backup Storage and Retention

