Presentation on theme: "Best Practices for Backing Up Your System"— Presentation transcript:

1 Best Practices for Backing Up Your System
Luca Ravazzolo, Technology Architect

2 Types of backups
Cold file-level backup:
  Caché shutdown
  Server-level copy to disk/tape
  Caché restarted
Caché online backup:
  Caché’s backup tool copies data blocks from CACHE.DAT files to a disk file or tape.
  Full or various incremental backups

3 Types of backups
SAN or disk array backup:
  Backup I/O stays within the SAN or the array.
  Block-level copy from device to device (disk, tape, virtual tape).
  All vendors have some type of software to control backups.
  To back up a consistent image, a point-in-time snapshot or clone is made of the source device.

4 Types of backups: others
CDP: Continuous Data Protection (Near-CDP)
  Uses a separate appliance to journal changes out-of-band, allowing recovery to any point in time.
  Depending on the space available, can restore to almost any point in time.
SAN-based replication
  Provides a disk-to-disk copy within the SAN, perhaps over long distances.
  Destination can be archived to tape.

6 Caché online backup
Advantages:
  Caché stays up; users continue to work.
  Simple to implement; may not need third-party software.
Challenges:
  Only backs up the CACHE.DAT data; journals and other files must also be backed up.
  Restores typically take multiple steps:
    Create a Caché instance.
    Restore “*<date>.cbk” files from storage.
    Apply the most recent full backup, then cumulative and incremental backups.
    Apply journal files.

7 Disk array/SAN-based snapshot
Advantages:
  Point-in-time copy of all data (Caché and otherwise).
  Requires no downtime (when using Caché write daemon freeze and thaw).
Challenges:
  Requires snap/clone technology.
  Requires additional software to coordinate.
There are two main types of storage snapshot, called the copy-on-write (or low-capacity) snapshot and the split-mirror snapshot. Utilities are available that can automatically generate either type.
-A copy-on-write snapshot utility creates a snapshot of changes to stored data every time new data is entered or existing data is updated. This allows rapid recovery of data in case of a disk write error, corrupted file, or program malfunction. However, all previous snapshots must be available if complete archiving or recovery of all the data on a network or storage medium is needed.
-A split-mirror snapshot utility references all the data on a set of mirrored drives. Every time the utility is run, a snapshot is created of the entire volume, not only of the new or updated data. This makes it possible to access data offline, and simplifies the process of recovering, duplicating, or archiving all the data on a drive. However, this is a slower process, and it requires more storage space for each snapshot.
--from

8 CDP or replication
Advantages:
  CDP allows restore to nearly any point in time.
  Replication allows geographically separated backups.
Challenges:
  Non-Caché technologies require coordination with Caché; without it, you may end up with Caché in a crash-consistent state that requires recovery before use.
  Requires appliances and software.
CDP = Continuous Data Protection

9 External Backup: Coordinating with Caché

10 Freeze the write daemon(s)
For a consistent database image on your backup media (i.e. a CACHE.DAT without integrity errors), the write daemon’s cycle must be complete.
Use the Backup.General.ExternalFreeze() method. It:
  Keeps the write daemon from writing.
  Waits for the current write daemon cycle (if active) to finish.
  Switches the journal file.
  Logs information to the cconsole.log file.

11 Freezing the write daemon
ExternalFreeze command:
  # csession cache -U%SYS "##class(Backup.General).ExternalFreeze()"
  # echo $?
  %SYS>SET rc=##class(Backup.General).ExternalFreeze()
The OS command returns a code:
  5 - success
  3 - failure
While frozen, all updates are made as usual to the database cache.
Processes continue to run normally UNLESS:
  The number of available buffers in the database cache falls too low.
  The ExternalFreeze lasts longer than the default limit (600 seconds).
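Because the freeze command signals success with a nonzero exit status (5), a calling script cannot rely on the usual "zero means success" shell convention. The following is a minimal sketch of a wrapper that translates the status codes listed on this slide. The function name `freeze_cache` and the `CACHE_INSTANCE` variable are hypothetical; the instance name `cache` is taken from the slide's command line.

```shell
#!/bin/sh
# Sketch only: wrap the slide's freeze command and translate its exit status.
# CACHE_INSTANCE is a hypothetical variable; "cache" is the instance name
# used in the slide's example.
CACHE_INSTANCE="${CACHE_INSTANCE:-cache}"

freeze_cache() {
    rc=0
    # Success is reported as status 5, so capture the status with "||"
    # to keep the script alive when it runs under "set -e".
    csession "$CACHE_INSTANCE" -U%SYS "##class(Backup.General).ExternalFreeze()" || rc=$?
    case "$rc" in
        5) echo "freeze succeeded"; return 0 ;;
        3) echo "freeze failed" >&2; return 1 ;;
        *) echo "unexpected status $rc from csession" >&2; return 2 ;;
    esac
}
```

A backup script would call `freeze_cache` and start the snapshot only when it returns 0.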

12 Thaw the write daemon
Use Backup.General.ExternalThaw to allow the write daemon(s) to resume writing.
Thaw command:
  # csession cache -U%SYS "##class(Backup.General).ExternalThaw()"
  # echo $?
  %SYS>SET rc=##class(Backup.General).ExternalThaw()
The OS-level command returns one of these codes:
  5 - success
  3 - failure
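Since a thaw that silently fails leaves the database frozen, a script may want to retry it a few times before giving up. The sketch below shows one way to do that around the slide's thaw command. The retry policy is an assumption of this example, not something the slides prescribe, and `thaw_cache`, `CACHE_INSTANCE`, `THAW_ATTEMPTS` and `THAW_RETRY_DELAY` are hypothetical names.

```shell
#!/bin/sh
# Sketch only: retry the slide's thaw command until it reports status 5.
# All variable names and the retry policy are assumptions of this example.
CACHE_INSTANCE="${CACHE_INSTANCE:-cache}"
THAW_ATTEMPTS="${THAW_ATTEMPTS:-3}"
THAW_RETRY_DELAY="${THAW_RETRY_DELAY:-5}"   # seconds between attempts

thaw_cache() {
    attempt=1
    while [ "$attempt" -le "$THAW_ATTEMPTS" ]; do
        rc=0
        csession "$CACHE_INSTANCE" -U%SYS "##class(Backup.General).ExternalThaw()" || rc=$?
        if [ "$rc" -eq 5 ]; then
            echo "thaw succeeded on attempt $attempt"
            return 0
        fi
        echo "thaw attempt $attempt returned status $rc" >&2
        attempt=$((attempt + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -le "$THAW_ATTEMPTS" ]; then
            sleep "$THAW_RETRY_DELAY"
        fi
    done
    return 1
}
```

If `thaw_cache` returns 1, the script should alert an operator rather than exit quietly.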

13 Another useful method
Use Backup.General.ExternalSetHistory to log successful backups in the Backup History.
  log is the name of an externally created backup log
  desc is free text
  %SYS>SET log="/var/logs/backup.log",desc="Full Backup"
  %SYS>S rc=##class(Backup.General).ExternalSetHistory(log,desc)

14 Who runs the freeze/thaw?
The operating system user that executes the freeze/thaw command must have access to Caché.
In a normal install, the “backup” user must be a Caché user.
%Service_Terminal must allow OS-level authentication.
The Caché “backup” user needs RW on the %DB_CACHESYS resource as well as use of %Admin_Operate and %Service_Terminal.

15 Case Study: External Backup
Using snapshots, a de-duplication appliance and replication for an external backup of Caché

16 External backup 1: Caché & snaps
Backup software initiates the backup process from the media server.
Invoke script on server running Caché to FREEZE write daemon.
Backup software initiates clone or snapshot of all Caché arrays.
Invoke script on server running Caché to THAW write daemon.

17 External backup 2: Mount & copy
Backup software mounts the snapshot on the media server.
Backup software does a file-level copy from the snapshot to a disk-based backup appliance.
Backup software releases the snapshot via a command-line interface call to the disk controller.

18 Ext Backup 3: Replicate, verify & archive
Backup software initiates a backup copy to a secondary data center.
In the secondary data center, the replicated backup is restored and mounted in a Caché instance, and an integrity check is run to verify structural integrity.
Depending on space and policy, the backup is kept online and/or archived to tape for long-term storage.

19 Timings and best practices
Backup software initiates the backup process from the media server.
Backup software:
  Must be able to call the freeze/thaw script on the Caché server.
  Must be able to initiate the snapshot.
Most commercial backup software will work well, including EMC Networker, Symantec NetBackup, IBM Tivoli (TSM), etc.

20 Timings and best practices
Invoke script on server running Caché to FREEZE write daemon.
Sample scripts are available from the WRC.
Time to freeze and return depends on:
  Database activity
  Current write daemon phase (i.e. is it writing to disk?)
04/02-02:30:00 (1098) 0 ExternalFreeze: Suspending system
04/02-02:30:00 (1098) 0 ExternalFreeze: Description: Backup Performed by TSM at: :30:00
04/02-02:30:01 (1098) 0 ExternalFreeze: Start a journal restore for this backup with journal file: /jrn/
04/02-02:30:02 (1098) 0 ExternalFreeze: System suspended

21 Timings and best practices
Backup software initiates clone or snapshot of all Caché arrays.
Creating the clone or snap: this is the period during which the write daemon(s) are frozen.
Timing is based on array controller activity.
If it takes longer than a few minutes, there is a risk of running into the freeze timeout.
50 seconds frozen with an IBM DS5300 using FlashCopy on a few TB of data with active systems.
04/02-02:30:02 (1098) 0 ExternalFreeze: System suspended
04/02-02:30:52 (9109) 0 ExternalThaw: Resuming system
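The 50-second figure follows from the two log timestamps: 02:30:52 minus 02:30:02. To track how close each nightly freeze comes to the timeout, the same arithmetic can be applied to the log. The sketch below assumes log lines in the format of the excerpt on this slide, where the first field is MM/DD-HH:MM:SS; the function name `frozen_seconds` is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the number of seconds between each "System suspended"
# line and the following "ExternalThaw: Resuming system" line.
# Assumes the first field of every line looks like MM/DD-HH:MM:SS.
frozen_seconds() {
    # $1: path to a log file in the format of the slide's excerpt
    awk '
        # Convert the time part of a MM/DD-HH:MM:SS stamp to seconds.
        function secs(stamp,   parts, hms) {
            split(stamp, parts, "-")
            split(parts[2], hms, ":")
            return hms[1] * 3600 + hms[2] * 60 + hms[3]
        }
        /System suspended/ { start = secs($1); have_start = 1 }
        /ExternalThaw: Resuming system/ && have_start {
            d = secs($1) - start
            if (d < 0) d += 86400   # the freeze spanned midnight
            print d
            have_start = 0
        }
    ' "$1"
}
```

Run against the two lines above, `frozen_seconds` prints 50, well under the 600-second default limit.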

22 Timings and best practices
Invoke script on server running Caché to THAW write daemon.
Thawing the write daemon takes seconds at most.
Best practice is to be sure to thaw the database on any error along the way.
Perhaps have an independent job check the database status and thaw if frozen, so that a failed backup will never leave Caché frozen.
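One way to follow the "thaw on any error" advice is to structure the backup script so that the thaw runs no matter how the snapshot step ends. The sketch below is an illustration under stated assumptions, not the WRC sample script: `cache_backup_call`, `snapshot_with_freeze` and `CACHE_INSTANCE` are hypothetical names, and the snapshot command is whatever the array vendor provides, passed in as arguments.

```shell
#!/bin/sh
# Sketch only: freeze, run a caller-supplied snapshot command, always thaw.
# Function and variable names are assumptions of this example.
CACHE_INSTANCE="${CACHE_INSTANCE:-cache}"

cache_backup_call() {
    # $1 is ExternalFreeze or ExternalThaw; exit status 5 means success.
    rc=0
    csession "$CACHE_INSTANCE" -U%SYS "##class(Backup.General).$1()" || rc=$?
    [ "$rc" -eq 5 ]
}

# Usage: snapshot_with_freeze <snapshot command> [args...]
snapshot_with_freeze() {
    if ! cache_backup_call ExternalFreeze; then
        # Following the slide's advice, thaw even though the freeze failed;
        # the result of this extra thaw is deliberately ignored.
        echo "freeze failed, issuing a precautionary thaw" >&2
        cache_backup_call ExternalThaw || true
        return 1
    fi
    snap_rc=0
    "$@" || snap_rc=$?          # run the snapshot, remember its status
    if ! cache_backup_call ExternalThaw; then
        echo "THAW FAILED: the instance may still be frozen" >&2
        return 2
    fi
    return "$snap_rc"           # report the snapshot result to the caller
}
```

This covers errors in the snapshot step itself. It does not cover the script being killed mid-freeze, which is where the independent check job mentioned on this slide comes in.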

23 External backup 2: Mount & copy
Backup software mounts the snapshot on the media server.
Backup software does a file-level copy from the snapshot to a disk-based backup appliance.
Use of a de-duplication appliance as the file-level backup target speeds up the backup and saves space.
Timings vary a lot here, depending on the disk used, the de-dupe rate, etc.

24 Ext Backup 3: Replicate, verify & archive
Backup software initiates a backup copy to a secondary data center.
SAN-level replication or replication via a de-duplication appliance.
Timings vary a lot here, based on bandwidth and de-dupe rate if applicable.

25 Ext Backup 3: Replicate, verify & archive
In the secondary data center, the replicated backup is restored and mounted in a Caché instance, and an integrity check is run to verify structural integrity.
Integrity checks vary in timing.
Another option is to have the media server in the primary data center run the check.
Depending on space and policy, the backup is kept online and/or archived to tape for long-term storage.

26 Final points
Considering cost and effort, Caché online backup works well for small to medium-size databases (~100s of GB total) with generous RTOs.
Use InterSystems Mirroring in conjunction with your backup mechanism.
  Perhaps there will be no need to restore a backup.
  If needed, the mirror destination will have CACHE.DAT files and journal files.

27 Final points
Backup should have minimal impact on the live database.
  Using SAN/disk controller-based backups offloads the work to other appliances/servers.
SAN/disk-based backups meet the fastest RTOs.
Restore-from-backup RPOs are as good as the most recently available journal file.