Section 13.4 - Disk Failures Kevin Grant 007512375.

Presentation transcript:

Section 13.4 - Disk Failures Kevin Grant

Disk Failures – Common Problems
Intermittent Failure: an attempt to read or write a sector fails, but the operation succeeds after repeated attempts.
Media Decay: bits of a particular sector become corrupted, and the sector cannot be read no matter how many times the read is retried.

Disk Failures – Common Problems
Write Failure: an attempt to write a sector fails, and the previously written contents of that sector cannot be retrieved either. One possible cause is a power outage during the write.
Disk Crash: the entire disk suddenly and permanently becomes unreadable.

Intermittent Failures
An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller. The controller usually retries the read up to a set limit, such as 100 tries.
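The retry policy can be sketched as a small loop. This is a minimal illustration, not a real controller interface: `read_sector` is a hypothetical callable that returns the data together with a good/bad status, and `make_flaky_sector` is a made-up simulator used only to exercise the loop.

```python
MAX_TRIES = 100  # retry limit; 100 is the example figure from the slide


def read_with_retries(read_sector, max_tries=MAX_TRIES):
    """Call read_sector() until it reports a good read or the limit is hit.

    read_sector is a hypothetical callable returning (data, ok).
    Returns (data, attempts) on success and raises IOError otherwise.
    """
    for attempt in range(1, max_tries + 1):
        data, ok = read_sector()
        if ok:
            return data, attempt
    raise IOError(f"sector unreadable after {max_tries} tries")


def make_flaky_sector(data, failures_before_success):
    """Simulated sector that fails a fixed number of times, then succeeds."""
    state = {"remaining": failures_before_success}

    def read_sector():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return None, False  # bad read: no usable data
        return data, True

    return read_sector
```

A sector that fails three times is read successfully on the fourth attempt, while one that keeps failing past the limit is reported as unreadable, which is how an intermittent failure is told apart from media decay.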

How to cut down on these problems?

Checksums
Each sector has additional bits called the checksum. These bits are set depending on the values of the data bits in the sector. If, on reading, the stored checksum differs from the checksum computed from the data bits, then an error occurred during reading. One form of checksum is based on the parity of the bits in the sector (example on the next slide).

Parity-based Checksum Examples
If the sector's bits contain an odd number of 1s, the parity bit is 1, and we append it to the original bits so that the total number of 1s is even.
If the sector's bits contain an even number of 1s, the parity bit is 0, and we append it to the original bits, leaving the total number of 1s even.
This method has a weakness: it uses only 1 bit for the checksum. A single parity bit catches any odd number of flipped bits but misses any even number, so if the errors are random there is about a 50% chance that they go undetected.
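A short sketch of the one-bit parity checksum follows. The bit strings are chosen for illustration and are not necessarily the ones on the original slide; the function names are hypothetical.

```python
def parity_bit(bits: str) -> str:
    """Return '1' if bits has an odd number of 1s, else '0'."""
    return "1" if bits.count("1") % 2 == 1 else "0"


def add_parity(bits: str) -> str:
    """Append the parity bit so the stored string has an even number of 1s."""
    return bits + parity_bit(bits)


def looks_valid(stored: str) -> bool:
    """A stored string passes the check when its count of 1s is even."""
    return stored.count("1") % 2 == 0


def flip(bits: str, i: int) -> str:
    """Return bits with the bit at position i inverted (simulated error)."""
    return bits[:i] + ("0" if bits[i] == "1" else "1") + bits[i + 1:]


stored = add_parity("01101000")      # three 1s, so the parity bit is 1
one_error = flip(stored, 0)          # a single flipped bit is detected
two_errors = flip(one_error, 1)      # a second flip hides the damage
print(stored, looks_valid(stored), looks_valid(one_error), looks_valid(two_errors))
```

The last case shows the weakness of a single parity bit: two flipped bits restore even parity, so the corrupted sector passes the check.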

Parity-based Checksum
Keeping several parity bits improves our chances of detecting an error. With 8 parity bits, each bit independently has a 50% chance of missing an error, so the probability that an error goes undetected is 0.5^8 = 1/256. In general, using N parity bits as the checksum gives a probability of 1/2^N that an error is not detected.
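One way to obtain 8 parity bits, assumed here for illustration, is to keep one parity bit per bit position of the sector's bytes, which amounts to XOR-ing all the bytes together. The sketch below also estimates the undetected-error rate by replacing the sector with random garbage; the function names, sector length, trial count, and seed are arbitrary choices.

```python
import random
from functools import reduce


def parity_byte(sector: bytes) -> int:
    """XOR of all bytes: bit i is the parity of bit i across the sector."""
    return reduce(lambda a, b: a ^ b, sector, 0)


def undetected_rate(trials: int = 50_000, sector_len: int = 8, seed: int = 0) -> float:
    """Fraction of random corruptions whose checksum still matches."""
    rng = random.Random(seed)
    original = bytes(rng.getrandbits(8) for _ in range(sector_len))
    checksum = parity_byte(original)
    corrupted = 0
    missed = 0
    for _ in range(trials):
        garbage = bytes(rng.getrandbits(8) for _ in range(sector_len))
        if garbage == original:
            continue  # not actually an error, so do not count it
        corrupted += 1
        if parity_byte(garbage) == checksum:
            missed += 1  # the error slipped past all 8 parity bits
    return missed / corrupted


print(undetected_rate(), 1 / 256)
```

Under these assumptions the measured rate should land close to 1/256, in line with the 1/2^N model for N = 8.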

Stable Storage
Stable storage prevents the problem that arises when an error occurs while overwriting a sector and both the old and the new data of that sector are lost. Sectors are paired: for each sector X there are two copies, XL and XR. The reading policy usually alternates between XL and XR, and assumes that if a read returns a good value, then that copy contains the true X.

Stable Storage - Operation
1. Write the value of X into XL.
2. Check that the parity check bits are correct in the written copy. If not, attempt the write again.
3. If the write is still unsuccessful after a set number of retries, then XL has a media failure, and we must allocate another sector for XL and perform these steps again.
4. Perform steps 1-3 for XR.
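The write steps and the alternating read policy can be sketched with a simulated disk. Everything here is an assumption made for illustration: the `Sector` and `StablePair` classes, the one-bit parity check, the retry limit of 3, and the `bad` flag that simulates a media failure are not part of any real disk interface.

```python
MAX_WRITE_TRIES = 3  # assumed retry limit; the slides do not give a number


def parity(data: bytes) -> int:
    """One-bit parity over all bits of data."""
    return sum(bin(b).count("1") for b in data) % 2


class Sector:
    """Simulated sector holding data plus a parity check bit."""

    def __init__(self):
        self.data = None
        self.check = None
        self.bad = False  # set to True to simulate a media failure

    def write(self, data: bytes) -> None:
        self.data = data
        # A failed medium stores a check bit that will not verify.
        self.check = parity(data) if not self.bad else 1 - parity(data)

    def is_good(self) -> bool:
        return self.data is not None and parity(self.data) == self.check


def write_copy(sector: Sector, data: bytes) -> Sector:
    """Steps 1-3 for one copy; returns the sector that finally holds data."""
    while True:
        for _ in range(MAX_WRITE_TRIES):
            sector.write(data)        # step 1: write the value
            if sector.is_good():      # step 2: verify the check bits
                return sector
        sector = Sector()             # step 3: allocate a replacement sector


class StablePair:
    """Two copies, XL and XR, of one logical sector X."""

    def __init__(self):
        self.left, self.right = Sector(), Sector()
        self._left_first = True

    def write(self, data: bytes) -> None:
        self.left = write_copy(self.left, data)    # XL first
        self.right = write_copy(self.right, data)  # step 4: then XR

    def read(self) -> bytes:
        order = (self.left, self.right) if self._left_first else (self.right, self.left)
        self._left_first = not self._left_first    # alternate sides
        for copy in order:
            if copy.is_good():
                return copy.data
        raise IOError("both copies are unreadable")
```

Because XL is fully written and verified before XR is touched, at least one copy passes its check at any moment in this model, so a failure partway through a write still leaves either the old or the new value of X readable.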