Reliability and Recovery CS 537 - Introduction to Operating Systems.

Slides:



Advertisements
Similar presentations
Chapter 16: Recovery System
Advertisements

1 CSIS 7102 Spring 2004 Lecture 9: Recovery (approaches) Dr. King-Ip Lin.
IDA / ADIT Lecture 10: Database recovery Jose M. Peña
1 CPS216: Data-intensive Computing Systems Failure Recovery Shivnath Babu.
Database Recovery Unit 12 Database Recovery 12-1.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
Recovery CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
CSCI 3140 Module 8 – Database Recovery Theodore Chiasson Dalhousie University.
Chapter 19 Database Recovery Techniques
Chapter 11: File System Implementation
Crash Recovery.
Crash Recovery. Review: The ACID properties A A tomicity: All actions in the Xaction happen, or none happen. C C onsistency: If each Xaction is consistent,
Quick Review of May 1 material Concurrent Execution and Serializability –inconsistent concurrent schedules –transaction conflicts serializable == conflict.
CS 333 Introduction to Operating Systems Class 16 – Secondary Storage Management Jonathan Walpole Computer Science Portland State University.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts 3 rd Edition Chapter 17: Recovery System Failure Classification Storage Structure Recovery.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Remote Backup Systems.
Operating Systems COMP 4850/CISG 5550 Disks, Part II Dr. James Money.
Transactions and Recovery
TRANSACTIONS A sequence of SQL statements to be executed "together“ as a unit: A money transfer transaction: Reasons for Transactions : Concurrency control.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “DATABASE RECOVERY” (PART – 1) Academic Year 2014 Spring.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
Distributed Deadlocks and Transaction Recovery.
N-Tier Client/Server Architectures Chapter 4 Server - RAID Copyright 2002, Dr. Ken Hoganson All rights reserved. OS Kernel Concept RAID – Redundant Array.
Data Concurrency Control And Data Recovery
Switch off your Mobiles Phones or Change Profile to Silent Mode.
Mark A. Magumba Storage Management. What is storage An electronic place where computer may store data and instructions for retrieval The objective of.
Recovery System By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Copyright © Curt Hill, RAID What every server wants!
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
1 How can several users access and update the information at the same time? Real world results Model Database system Physical database Database management.
1 Recovery System 1. Failure classification 2. Storage structure 3. Data access 4.Recovery & atomicity 5. Log-based recovery 6. Shadow paging 7. Recovery.
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 15 Recovery. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.15-2 Topics in this Chapter Transactions Transaction Recovery System.
Free Space Management.
CSCI Recovery Control Techniques 1 RECOVERY CONTROL TECHNIQUES Dr. Awad Khalil Computer Science Department AUC.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
IT1001 – Personal Computer Hardware & system Operations Week7- Introduction to backup & restore tools Introduction to user account with access rights.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “DATABASE RECOVERY” (PART – 2) Academic Year 2014 Spring.
Chapter 17: Recovery System
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 17: Recovery System.
11.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles 11.5 Free-Space Management Bit vector (n blocks) … 012n-1 bit[i] =  1  block[i]
Transactions and Reliability Andy Wang Operating Systems COP 4610 / CGS 5765.
Recovery technique. Recovery concept Recovery from transactions failure mean data restored to the most recent consistent state just before the time of.
Transactional Recovery and Checkpoints. Difference How is this different from schedule recovery? It is the details to implementing schedule recovery –It.
CS399 New Beginnings Jonathan Walpole. Disk Technology & Secondary Storage Management.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Database Recovery Zheng (Godric) Gu. Transaction Concept Storage Structure Failure Classification Log-Based Recovery Deferred Database Modification Immediate.
CS Introduction to Operating Systems
File System Consistency

Database Recovery Techniques
Remote Backup Systems.
Database Recovery Techniques
Jonathan Walpole Computer Science Portland State University
Transactions and Reliability
What every server wants!
Backup and Recovery Techniques
Database Recovery Techniques
File Processing : Recovery
Overview Continuation from Monday (File system implementation)
Jonathan Walpole Computer Science Portland State University
Printed on Monday, December 31, 2018 at 2:03 PM.
Overview: File system implementation (cont)
Module 17: Recovery System
Recovery System.
Backup and Recovery Techniques
Database Recovery 1 Purpose of Database Recovery
CS333 Intro to Operating Systems
Remote Backup Systems.
Presentation transcript:

Reliability and Recovery CS Introduction to Operating Systems

System Failures All systems fail Fatal failures –disk bearings fail –controllers go bad –fire burns down entire system Limited failures –single block on disk goes bad –power goes out

Fatal Failures In a fatal failure, all data on system is lost To recover data another copy must be kept –tape drive –floppy drive –second hard drive Most systems are backed up to tape on a regular basis –to save space, only backup files that have changed since the last backup

Limited Failures Limited failures may destroy some data but not all –single bad block may ruin one file but not all Destroyed data may be restored in several ways –from backup –using error correcting codes (ECC) –redo operations that lead to existing system

ECC 100’s of millions of bits in memory 100’s of billions of bits on a disk Virtually impossible to make a memory or disk without bad bits Must find a way to deal with these inevitable bad bits

ECC When a given set of bits are stored, an ECC code is calculated –this code is stored with the data When data is read, the ECC is recalculated and compared with that stored If they don’t match, there is an error in the data These calculations and checks are usually performed by hardware –memory controller or disk controller

ECC Single error correcting, double error detecting –using the ECC code, it is possible to find and correct a single bad bit –it is possible to find up to two bad bits and inform the user of the problem With more complicated math, you can correct more bits and determine more errors

ECC do math here

Block Forwarding Set aside a number of disk blocks –under normal operation, these blocks are not used at all If a bad block is detected, the controller will re-map that block to one of the reserved blocks All future references to the original block are now forwarded to the new block

Block Forwarding Reserved blocks bad blocks remapping All references to blocks 4 and 9 are now forwarded to blocks 12 and

Block Forwarding This indirection keeps things working This indirection can hurt performance Disk scheduling algorithms don’t work as well any more –OS doesn’t know about the remapping –using the elevator algorithm could now jump all over the disk

Transaction A transaction is a group of operations that are to happen atomically –transactions should be synchronized with respect to one another –either all of the operation happens or none of it This can be difficult to do in the event of a system failure

Logging Keep a separate on disk log that tracks all operations Mark the beginning of transaction in log Mark the end of transaction in log On reboot from failure, check the log –any transactions that were started and not finished are undone –any transactions that were completed are redone this has to be done because of caching in memory

Logging Log block 27 3 block Transaction 1 block 41 7 block Transaction 2 23 begin undo 27,3 redo 27, 4 undo 32,19 redo 32, 23 commit begin undo 41, 7 redo 41, 4 system crash

Logging To recover from the above system crash –redo transaction 1 scan the log and perform the redo operations –undo the effects of transaction 2 scan the log in reverse and perform the undo operations To make this work, the log should only be written after a transaction has completed

Shadow Blocks Never modify a data block directly Make a copy of the data block (a shadow block) and modify that When finished modifying data, make the parent point to the shadow block instead of the original Of course, this requires making a copy of the parent to be modified This chain continues up to the root –when root is written the transaction is committed –original blocks are then freed

Shadow Blocks In the event of a crash –if the transaction did not complete garbage collect the shadow copies –if the transaction did complete garbage collect the original copies Garbage collection is easy –scan the data tree –any block not in the tree is garbage

Shadow Blocks root ABXY Modify the data in block 6 from Y to Z 30 Z root Once this is written, the transaction is complete