Disaster Recovery Management

[Image captions: OOPS! No water. Poor recovery plan.]

Legal Aspects - the syllabus says:
Disaster recovery management
• Describe the various potential threats to information systems, e.g. physical security; document security; personnel security; hardware security; communications security; software security.
• Understand the concept of risk analysis.
• Understand the commercial need to ensure that an information system is protected from threat.
• Describe a range of contingency plans to recover from disasters and relate these to identified threats.
• Describe the criteria used to select a contingency plan appropriate to the scale of an organisation and installation.

The Corporate Consequences of System Failure
Any company that loses its computer data, even temporarily, will face serious financial losses. If company data is lost permanently then the chances of the company surviving are small. A well-tested contingency plan (disaster recovery plan) is needed to recover data quickly after a disaster.

Potential threats to an IS (CHPPPV)
• Communication breach: hacking and altering data
• Hardware failure: a disk head crash corrupts hardware, software and data
• Physical failure: fire, flood, earthquake, terrorist attack, coffee spilt on a stand-alone machine; corrupts hardware, software and data
• Personnel: accidental overwrite of data
• Unexpected invalid data causes a software program to crash or corrupt files
• Power surge or power loss corrupts hardware, software and data
• Virus: e.g. a Trojan horse corrupts software and data

Disaster Avoidance or Counter Measures (CHPPPV)
• Communication breach: allow three password attempts before disabling the user ID; use firewalls to prevent unauthorised external access; encrypt data (encryption does not prevent corruption, but it prevents viewing). All of these lessen the chance of data being seen or corrupted.
• Hardware failure: have a duplicate system or hot site set up so that backup discs can be transferred quickly, causing minimum down time.
• Physical failure: keep backups of data and software off site, in a fireproof safe, or above the flood line, causing minimum down time.
• Personnel: train staff in good ICT practice and procedures so they are less likely to make mistakes.
• Unexpected invalid data: catch it with validation/verification checks on data entry, and test with every type of data.
• Power surge/loss: have a power surge protector and a back-up generator in place so files can be saved before power is lost.
• Virus: run up-to-date anti-virus software on all machines to detect viruses before they can cause damage.

What is Risk Analysis?
• identify each element of an information system
• place a value to the business on that element
• identify any potential threats to that element
• consider the likelihood of the threat occurring
• calculate an overall risk figure based on the value and likelihood of the potential threat
• make a contingency/disaster recovery plan based on the various risk figure results
What is the risk of someone in the pictures hurting themselves, someone else or their equipment? Are some accidents worse than others? Why?
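The slides do not name the algorithm used to calculate the risk figure. A minimal sketch, assuming the simplest common choice (risk figure = value × likelihood, each scored from 1 to 5), might look like this. The function name `risk_figure`, the scoring scale and the three example elements are illustrative assumptions, not part of the syllabus.

```shell
#!/bin/sh
# risk_figure VALUE LIKELIHOOD
# Assumed scoring: both inputs are whole numbers from 1 (low) to 5 (high).
risk_figure() {
    echo $(( $1 * $2 ))
}

# Illustrative elements only: name, value to the business, likelihood of the threat.
for entry in "customer-database 5 2" "office-printer 1 4" "web-server 4 3"; do
    set -- $entry
    printf '%-20s risk figure = %d\n' "$1" "$(risk_figure "$2" "$3")"
done
```

Elements with the highest figures would be the first candidates for a contingency plan.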

Disaster Recovery Management
R. Risk analysis
• identify each element of a successful information system
• place a value to the business on that element
• identify any potential threats to that element
• the likelihood of the threat occurring
• use an algorithm to calculate an overall risk figure
• that will indicate a degree of severity
S. Security policy
• Prevention of misuse
• Physical security procedures
• Logical (software) security procedures
• Detection of misuse
• Investigation of misuse
• Staff responsibilities
• Disciplinary procedures
• Code of Practice
• Adherence/compliance with legislation
A. Auditing
• Network auditing
• Financial systems auditing
• Application systems auditing
• Impact of auditing
• Audit tools
• Audit trails
D. Disaster recovery
• Threats to systems, e.g. physical, document, personnel, hardware, communications (network), software
• Contingency plans, e.g. people involved, steps to be taken, types (RAID, cold site recovery, reciprocal agreements) etc.
• Criteria for selecting a contingency plan, e.g. scale, location, likelihood, recovery costs, type of systems etc.
(1 mark in total if listed, but 1 mark for each explained: the 3 bullets above)
• Why protect: commercial need
• Backup (must talk about a feature or reason to get the first mark, e.g. thinking about where to keep the backup, or frequency etc.)
• Recovery (ditto)

Creating a backup strategy
This is driven by many variables, such as:
• How long can you be offline before your organisation disappears?
• Do you have legal responsibilities?
• Levels of backup planned: daily, weekly, monthly, quarterly, semi-annually, annually
• How long must you keep the data?
• How do you restore the data?
• Does your restore need to be “bare metal” or just data?
Bare metal, fast restore, long-term storage = more $$

Incremental Image Backups
[Diagram: the application writes to the source disk; a block-level backup updates the backup image on the backup disk (image stored in a VHD file); shadow copy storage tracks changes. A block-level restore gives either a restored disk that is the same as the updated source disk, or an older restored disk based on a shadow copy.]

Linux Backup Tools
Open Source options:
• dd
• dump
• tar
• rsync (Apple’s Time Machine uses a similar hard-link approach)
• Amanda
• Bacula (heavily used, very popular)

dd
The lowest-level type of backup: a bit-for-bit copy. For example:
    dd if=/dev/ad0s1a of=/backup/root
• Exact copy, but not efficient: if you only use 100 MB on a 1 GB partition, you still end up with a backup of 1 GB
• Compression helps, but you still spend time copying unused space
• Best for doing system recovery, or for copying media (CD-ROMs, DVDs, etc.)
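To try the bit-for-bit behaviour safely, the sketch below uses ordinary files in a scratch directory instead of a real device. The file names are stand-ins for the partition and backup file in the example above, and the 1 MiB size is an arbitrary assumption.

```shell
#!/bin/sh
# Work in a scratch directory so no real device is touched.
workdir=$(mktemp -d)
src="$workdir/source.img"     # stand-in for a partition such as /dev/ad0s1a
copy="$workdir/backup.img"    # stand-in for /backup/root

# Create a 1 MiB "partition" that is almost entirely unused (all zero bytes).
dd if=/dev/zero of="$src" bs=1024 count=1024 2>/dev/null
printf 'only a few bytes of real data' | dd of="$src" conv=notrunc 2>/dev/null

# Bit-for-bit copy: the unused space is copied as well.
dd if="$src" of="$copy" bs=64k 2>/dev/null

# cmp exits 0 only when the two files are byte-for-byte identical.
cmp "$src" "$copy" && echo "copy is identical"

# Size of the copy in bytes: the whole image, not just the used part.
wc -c < "$copy"
```

The scratch directory in `$workdir` can be removed with `rm -rf` afterwards.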

Dump
The traditional UNIX® backup programs are dump and restore.
• Works at inode level
• Takes backups of entire file systems, but only the used space
• It is unable to back up only part of a file system
• Dump does not back up across mount points (a directory tree that spans more than one file system)
Note: If you use dump on your / partition, you will not back up /home, /usr or any other mounted FS. You must explicitly run dump for each FS.

Dump
Dump can back up to several media:
• local file
• remote file
• tape
Dump can take incremental dumps: only files that have changed are backed up.

Remote dump
It is possible to run dump over ssh for a secure transport:
    # /sbin/dump -0uan -f - /usr | gzip -2 | \
        ssh targetuser@targetmachine.example.com \
        dd of=/backups/dump-usr.gz
Anyone asking, “Where’s the if parameter for dd?” With no if= given, dd reads from standard input, which here is the compressed dump arriving over ssh.

Tar
tar(1) (Tape Archive) dates back to Version 6 of AT&T UNIX (circa 1975). tar operates in cooperation with the file system; tar writes files and directories to tape or to a file.
Just like with dump, one can use ssh to back up across the network:
    # tar -czf - / | ssh remote 'cat > /backups/backup-0425.tgz'

Examples using tar
Let's take a backup of /etc, where most configuration files reside, and place it in /home/backups:
    # mkdir /home/backups
    # tar -cvf /home/backups/etc.tar /etc
Note: The -c option to tar tells it to create an archive, -v specifies verbose output and -f specifies the file to be either written to or read from. You'll see quite a lot of output as tar creates the archive at this point.

Examples using tar
Now we check whether our archive has actually been created:
    # cd /home/backups
    # ls
This now shows us a new file in this directory: etc.tar
If we now want to view the contents of this backup, we can run:
    # tar -tvf etc.tar

Examples using tar
This will show you the contents of the etc directory as you backed it up. To actually restore and unpack the contents that were backed up previously:
    # cd /home/backups
    # tar -xvf etc.tar

Examples using tar
Notice that the restore actually creates a new directory etc where you are located, not in /etc! This is because tar by default removes the leading '/' from the directories it has backed up, in order not to overwrite the original files on your system when you choose to do a restore (a security consideration).
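The leading '/' behaviour can be checked without touching the real /etc. This is a minimal sketch that archives a made-up directory by its absolute path and restores it elsewhere; the directory layout and the file `demo.conf` are invented for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
mkdir -p "$workdir/source/etc" "$workdir/restore"
echo "example setting" > "$workdir/source/etc/demo.conf"

# Archive using an absolute path, as in the /etc example above.
# tar's notice about removing the leading '/' goes to stderr and is hidden here.
tar -cf "$workdir/etc.tar" "$workdir/source/etc" 2>/dev/null

# Member names are listed without the leading '/'.
tar -tf "$workdir/etc.tar"

# Extracting therefore recreates the tree below the current directory,
# not at the original absolute location.
(cd "$workdir/restore" && tar -xf "$workdir/etc.tar")
ls "$workdir/restore$workdir/source/etc"
```

To restore over the original location instead, one would change to / before extracting.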

Tar Exercise
Back up the /etc directory to /home/pacnog/backups/:
    cd
    mkdir backups
    tar -cvf /home/pacnog/backups/etc.tar /etc
    cd backups
See what was backed up:
    tar -tvf etc.tar

Rsync
Another very powerful tool is rsync: http://samba.anu.edu.au/rsync/
• rsync is very efficient: it only transfers files that have changed, and for those files, only the parts of the files that have changed
• This is very efficient for large trees with many files, some of them large
• Great for replicating a server off-site, or for doing quick backups for a migration

Rsync
Combined with the --link-dest option, rsync allows snapshot-like backups.
• --link-dest takes the newest backup, makes hard links (which take almost no extra space) to the files that have not changed, and copies those that have changed
• This allows for backup.0, backup.1, backup.2, backup.3, where each backup.X is a COMPLETE copy of the replicated source, but the disk space used is ONLY the difference

Rsync: example script
On remote backup host:
    # rm -rf /backups/etc.2
    # mv /backups/etc.1 /backups/etc.2
    # mv /backups/etc.0 /backups/etc.1
    # mv /backups/etc /backups/etc.0
On machine to be backed up:
    # rsync -avHS \
        --link-dest=/backups/etc.0 \
        /etc/ host:/backups/etc/
This will back up only changed files from /etc/ to host:/backups/etc/. Unchanged files are hard-linked from /backups/etc.0. (A relative --link-dest path is interpreted relative to the destination directory, so an absolute path is used here.)
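The same rotate-then-link idea can be tried on a single machine. The function below is a sketch under assumptions: the name `snapshot`, the two-generation layout (backup.0 and backup.1) and the requirement that the destination root is an absolute path are choices made for this illustration, not part of rsync itself.

```shell
#!/bin/sh
# snapshot SRC DEST_ROOT
# Keeps DEST_ROOT/backup.0 (newest) and DEST_ROOT/backup.1 (previous).
# DEST_ROOT should be an absolute path so that --link-dest is unambiguous.
snapshot() {
    src=$1
    root=$2
    mkdir -p "$root"

    # Rotate: drop the oldest copy, then age the newest one.
    rm -rf "$root/backup.1"
    if [ -d "$root/backup.0" ]; then
        mv "$root/backup.0" "$root/backup.1"
    fi

    if [ -d "$root/backup.1" ]; then
        # Unchanged files become hard links into backup.1; changed files are copied.
        rsync -a --link-dest="$root/backup.1" "$src/" "$root/backup.0/"
    else
        # First run: nothing to link against, so everything is copied.
        rsync -a "$src/" "$root/backup.0/"
    fi
}
```

Calling `snapshot /etc /backups/etc-snapshots` twice (the destination path is hypothetical) should leave two complete-looking trees in which unchanged files share the same inode.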

Other tools
• Rdiff-backup: http://www.nongnu.org/rdiff-backup/
• Unison: http://www.cis.upenn.edu/~bcpierce/unison/
• Rsnapshot: http://www.rsnapshot.org/

Other possible Backup methods
Disk duplication
Using the dd command mentioned earlier, it is possible to duplicate your entire disk block by block onto another disk. However, the destination disk must be the same size as the source, or bigger.
Another way of doing this is to use RAID1 mirroring and hot-swappable disks:
• make sure the RAID volume is rebuilt (OK)
• remove one of the two disks (call it “backup”)
• replace “backup” with a fresh disk, let the RAID rebuild
• take “backup” home
Remember: RAID or mirroring is not backup. An “rm -rf /” on your RAID set works very well!

Other possible Backup methods
Disk duplication (2)
• instead of mirroring the two disks, make two filesystems, and use rsync to copy every night from disk 1 to disk 2
• in case of user error (rm -rf), you can recover from disk 2, without having to pull the backup tapes out of the safe
NOTE: IT DOES NOT HELP IF THE SERVER IS STOLEN OR THERE IS A FIRE, IF BOTH DISKS ARE IN THE MACHINE!

Networked backup systems
There are a number of networked backup systems out there for backing up many servers to one or more backup servers, using tape drives or disk storage. In the Open Source world, two backup systems stand out:
• AMANDA: http://www.amanda.org/
• BACULA: http://www.bacula.org/

Amanda
Advanced Maryland Automatic Network Disk Archiver
• Has been around for many years
• Networked backup
• Supports incremental backups to disk and tape
• Can back up to a holding disk and flush to tape later
• Encrypted data flows and backup data
• Tape library / loader control and labelling
• Windows backup using a Windows client
• All source code for Amanda is open source

Bacula
• Written by the people who invented AutoCAD
• Extremely popular and well tested. Claims to be the most popular Open Source, enterprise-level backup package around.
• Impressive documentation (400 pages!), including a developer's guide and tutorial
• Supports incremental backups to disk and tape
• Complete SQL backend (MySQL, PgSQL, SQLite)
• Encrypted data flows using TLS (standard!)
• Tape library / loader control and labelling
• Native Windows client
• Well-documented scenarios for specific backup cases, including complete “bare metal” restore

Bacula: Supported OS’s
The “big three” are fully supported in Bacula:
• Windows
• Mac OS X
• UNIX/Linux
Additional, typical “enterprise” OS versions are supported as well (HP/UX, AIX, Solaris, etc.)

Reminder: Backup security
• Take the disks / tapes / CDs off site! Backups kept on site do not help if there is a fire or if the tapes are stolen.
• Consider encrypting the data on the disks / tapes / CDs. What happens if the tapes are stolen? What happens when you throw them out?

RAID
Redundant Array of Independent Disks, or Redundant Array of Inexpensive Disks
• RAID 0
• RAID 1
• RAID 3
• RAID 5
• RAID 6
• RAID 1+0 or 10

Fun Facts
• As drives grow to 1 TB and beyond, an unrecoverable read error during a RAID 5 rebuild becomes likely enough to matter
• RAID 6 or better is recommended to deal with this issue
Enterprise-class drives
• Built to reduce vibration
• Reduced vibration = more reliable
• Cost a bit more, but essential in critical environments
Prices in May 2010
• 1 TB, 3 Gb/s, 7200 RPM, 32 MB cache, 1.2 million hours MTBF SATA drives: around USD $150 each
• 2 TB, 6 Gb/s, 7200 RPM, 64 MB cache, 1.2 million hours MTBF SATA drives: around USD $250 each
http://www.baarf.com/ Dedicated to save us from RAID 5

Types of redundancy
There are different levels of redundancy:
• none: if a disk crashes, data is lost
• RAID1: 2 disks are mirrored, data is written to both disks at any time. One disk can be lost without losing data.
[Diagram: a single disk holding DATA (no redundancy) next to two mirrored disks, each holding the same DATA (RAID1)]

Types of redundancy
RAID3, RAID5: data is distributed across several disks. Parity data, used to rebuild a defective drive, is either placed on a dedicated drive (RAID3) or spread across all drives (RAID5).
[Diagram: RAID3 with three disks, two holding DATA and one dedicated to PARITY; RAID5 with three disks, each holding a mix of DATA and PARITY blocks]
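The slides do not say how parity rebuilds a drive. In RAID 3 and RAID 5 the parity block is the bitwise XOR of the data blocks, so any one missing block is the XOR of everything that survives. A minimal sketch, using small integers as stand-ins for disk blocks (the function names and the sample values are illustrative):

```shell
#!/bin/sh
# parity A B: XOR of two data blocks (small integers stand in for disk blocks).
parity() {
    echo $(( $1 ^ $2 ))
}

# rebuild SURVIVOR PARITY: recover the lost block from what is left.
rebuild() {
    echo $(( $1 ^ $2 ))
}

disk1=165    # illustrative block contents
disk2=60
p=$(parity "$disk1" "$disk2")

# Pretend disk2 failed and rebuild its block from disk1 and the parity block.
echo "rebuilt disk2 block: $(rebuild "$disk1" "$p")"
```

The two functions are deliberately identical: computing parity and rebuilding a block are the same XOR operation, which is why a single parity block can stand in for any one lost disk but not for two.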

RAID 0: Striping
• Not technically RAID (there is no redundancy), but a RAID card is used to implement it.
• Data is striped between disks.
• Improves I/O in most cases.

RAID 1: Mirroring
• Disks are mirrored; data is written to both disks at any time.
• One disk can be lost without losing data.

RAID 3: Striping + Dedicated Parity
• Data is written across multiple disks (striping).
• A dedicated disk is used for parity.
• A failed data disk is rebuilt from the remaining disks plus the parity disk.
• Losing the parity disk leaves the array without redundancy until it is rebuilt.
• Fast I/O.

RAID 5: Striping + Distributed Parity
• Data is written across multiple disks (striping).
• Parity is written across all disks.
• Most popular type of RAID after RAID 1.
• Can lose any one disk (minimum set of 3 disks).
• Has serious subtle issues!

RAID 6: Striping + Double Distributed Parity
• Data is written across multiple disks (striping).
• Two independent sets of parity are written across all disks.
• Fixes issues with RAID 5.
• Can lose any two disks (minimum set of 4 disks).
• Fixes the issue with 1 TB+ drives.

RAID 1+0 or “10”: Mirrored Sets in a Striped Set
• Data is mirrored in multiple sets, and the sets are striped.
• Provides performance and fault tolerance.
• Can lose multiple disks as long as no one mirror loses all its disks.
• Requires more disks for the same storage space.
• Referred to as “nested” or “hybrid” RAID.
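The trade-off between redundancy and storage space across the RAID levels above can be put into numbers. This is a rough sketch assuming equal-sized disks and, for RAID 10, two-way mirrors; the function name `raid_usable` and the 4 x 1000 GB example are illustrative.

```shell
#!/bin/sh
# raid_usable LEVEL DISKS SIZE_GB
# Prints the usable capacity in GB, assuming all disks are the same size.
raid_usable() {
    level=$1 disks=$2 size=$3
    case $level in
        0)   echo $(( disks * size )) ;;         # striping only, no redundancy
        1)   echo "$size" ;;                     # every disk holds the same data
        3|5) echo $(( (disks - 1) * size )) ;;   # one disk's worth of parity
        6)   echo $(( (disks - 2) * size )) ;;   # two disks' worth of parity
        10)  echo $(( disks / 2 * size )) ;;     # assumes two-way mirrors
        *)   echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}

for level in 0 5 6 10; do
    printf 'RAID %-2s with 4 x 1000 GB disks: %s GB usable\n' \
        "$level" "$(raid_usable "$level" 4 1000)"
done
```

With four disks, RAID 6 and RAID 10 give the same usable space; RAID 6 survives any two failures, while RAID 10 survives two only if they hit different mirrors.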

RAID Controller Failure
What do you do?
• Use a hot-spare RAID card.
• The card must be identical.
• The cards must support hot-spare in BIOS.
• Generally connected by an on-board data path, or via a cable between both cards.
• Otherwise, at a minimum, buy two RAID cards when building your array. If you have many arrays, buy extra cards.

Hardware or software?
In general, hardware RAID is more transparent to the user, and disk replacement is straightforward:
• remove the defective disk
• install the new disk
• the RAID controller detects this and starts rebuilding on the new disk
(Note: real hardware RAID controllers, NOT BIOS RAID such as Promise)

Hardware or software?
• RAID3 and RAID5 can be complex to implement in software (in the OS), so hardware might be a better choice
• But what happens if the RAID controller dies? How does one recover if one does not have a spare controller?
• Consider having a spare controller for RAID3/RAID5/RAID6/RAID1+0
(Note: we mean real hardware RAID controllers, not BIOS software RAID such as Promise)

Hardware or software?
• RAID1 is easy to recover from and easier to implement in software (within the OS): worst case, all one needs is to skip a header at the beginning of each disk
• FreeBSD and Linux have very good software RAID implementations nowadays
• In FreeBSD, there are at least 3 implementations: gmirror, ccd, and gvinum (also RAID5, but not recommended)
But you want to use ZFS…