1 Oct. 2006 Error Correction Slide 1 Fault-Tolerant Computing Dealing with Mid-Level Impairments

2 Oct. 2006 Error Correction Slide 2 About This Presentation
Edition: First   Released: Oct. 2006   Revised: —
This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami, Professor of Electrical and Computer Engineering at University of California, Santa Barbara. The material contained herein can be used freely in classroom teaching or any other educational setting. Unauthorized uses are prohibited. © Behrooz Parhami

3 Oct. 2006 Error Correction Slide 3 Error Correction

4 Oct. 2006 Error Correction Slide 4

5 Oct. 2006 Error Correction Slide 5 Multilevel Model
[Diagram: multilevel model spanning the component, logic, information, system, service, and result levels; the legend shows entry and tolerance arcs, with annotations marking the levels covered last lecture and today.]

6 Oct. 2006 Error Correction Slide 6 High-Redundancy Codes
Triplication is a form of error coding: x is represented as xxx (200% redundancy). This corrects any error in one version and detects two nonsimultaneous errors. With a larger replication factor, more errors can be tolerated.
[Diagram: encoding produces three copies of x, three units compute f(x), and a voter V produces the output y.] If we triplicate the voter to obtain 3 results, we are essentially performing the operation f(x) on coded inputs, getting coded outputs.
Our challenge today is to come up with strong correction capabilities using much lower redundancy (perhaps an order of magnitude less). To correct all single-bit errors in an n-bit code, we must have 2^r > n, or 2^r > k + r, which leads to about log2(k) check bits, at least.
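To make the check-bit bound concrete, here is a small Python sketch (not part of the original slides) that finds the smallest r satisfying 2^r > k + r; it reproduces the Hamming code parameters listed later in the deck.

```python
# Smallest r with 2^r > k + r: enough syndrome values to name "no error"
# plus any one of the k + r bit positions.
def min_check_bits(k: int) -> int:
    r = 1
    while 2 ** r <= k + r:
        r += 1
    return r

for k in (4, 11, 26, 57, 120):
    print(k, min_check_bits(k))   # 4->3, 11->4, 26->5, 57->6, 120->7
```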

7 Oct. 2006 Error Correction Slide 7 Error-Correcting Codes: Idea
A conceptually simple error-correcting code: arrange the k data bits into a k^(1/2) × k^(1/2) square array and attach an even parity bit to each row and column of the array (row/column check bit = XOR of all data bits in that row/column).
Data space: all 2^k possible k-bit words. Redundancy: 2k^(1/2) + 1 check bits for k data bits.
Corrects all single-bit errors (they lead to distinct noncodewords). Detects all double-bit errors (but some triple-bit errors go undetected).
[Diagram: encoding maps data words in the data space to codewords in the code space; decoding maps codewords back to data words; errors push codewords into the error space of noncodewords, which is to be avoided at all cost.]
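A minimal Python sketch of this square-array scheme (illustrative only; it corrects a single error in the data part, using a 4 × 4 array as an example):

```python
# A single flipped data bit makes exactly one row parity and one column
# parity fail; their intersection locates the bit to flip back.
def encode(bits):                      # bits: list of k data bits (0/1)
    m = int(len(bits) ** 0.5)
    rows = [bits[i*m:(i+1)*m] for i in range(m)]
    row_par = [sum(r) % 2 for r in rows]
    col_par = [sum(rows[i][j] for i in range(m)) % 2 for j in range(m)]
    return rows, row_par, col_par

def correct(rows, row_par, col_par):
    m = len(rows)
    bad_r = [i for i in range(m) if sum(rows[i]) % 2 != row_par[i]]
    bad_c = [j for j in range(m)
             if sum(rows[i][j] for i in range(m)) % 2 != col_par[j]]
    if len(bad_r) == 1 and len(bad_c) == 1:        # single data-bit error
        rows[bad_r[0]][bad_c[0]] ^= 1
    return rows

data = [0,1,1,0, 0,1,1,0, 1,0,0,1, 1,0,0,1]
rows, rp, cp = encode(data)
rows[2][3] ^= 1                                    # inject a single error
assert correct(rows, rp, cp) == encode(data)[0]    # error located and fixed
```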

8 Oct. 2006 Error Correction Slide 8 Error-Correcting Codes: Evaluation
Redundancy: k data bits encoded in n = k + r bits (r redundant bits).
Encoding: complexity (cost / time) of forming the codeword from the data word.
Decoding: complexity (cost / time) of obtaining the data word from the codeword.
Capability: classes of error that can be corrected; greater correction capability generally involves more redundancy.
To correct c bit-errors, a minimum code distance of 2c + 1 is required.
Combined error correction/detection capability: to correct c errors and additionally detect d errors (d > c), a minimum code distance of c + d + 1 is required. Example: the Hamming SEC/DED code has a code distance of 4.
Examples of code correction capabilities: single, double, byte, b-bit burst, unidirectional, ... errors.

9 Oct. 2006 Error Correction Slide 9 Hamming Distance for Error Correction
The following visualization, though not completely accurate, is still useful. Red dots represent codewords. Yellow dots, noncodewords within distance 1 of a codeword, represent correctable errors. The blue dot, within distance 2 of three different codewords, represents a detectable error. Not all "double errors" are detectable, however, because there are points within distance 2 of some codewords that are also within distance 1 of another codeword; such errors are miscorrected rather than detected.

10 Oct. 2006 Error Correction Slide 10 A Hamming SEC Code
Uses multiple parity bits, each applied to a different subset of the data bits. Data bits: d3 d2 d1 d0; parity bits: p2 p1 p0.
Encoding: 3 XOR networks to form the parity bits. Checking: 3 XOR networks to verify the parities. Decoding: trivial (separable code).
Redundancy: 3 check bits for 4 data bits — unimpressive, but it gets better with more data bits: (7, 4); (15, 11); (31, 26); (63, 57); (127, 120).
Capability: corrects any single-bit error.
Syndrome equations:
s2 = d3 ⊕ d2 ⊕ d1 ⊕ p2
s1 = d3 ⊕ d1 ⊕ d0 ⊕ p1
s0 = d2 ⊕ d1 ⊕ d0 ⊕ p0
Syndrome table (s2 s1 s0 → bit in error):
0 0 0 → none;  0 0 1 → p0;  0 1 0 → p1;  0 1 1 → d0;  1 0 0 → p2;  1 0 1 → d2;  1 1 0 → d3;  1 1 1 → d1
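The slide's equations translate directly into code. Here is a small Python sketch (not from the slides) using the bit ordering d3 d2 d1 d0 p2 p1 p0 and the syndrome table above:

```python
# (7, 4) Hamming SEC code as defined on this slide.
def encode(d3, d2, d1, d0):
    p2 = d3 ^ d2 ^ d1
    p1 = d3 ^ d1 ^ d0
    p0 = d2 ^ d1 ^ d0
    return [d3, d2, d1, d0, p2, p1, p0]

# syndrome (s2, s1, s0) -> index of the erroneous bit in the list above
SYNDROME_TO_POS = {(0,0,1): 6, (0,1,0): 5, (0,1,1): 3,
                   (1,0,0): 4, (1,0,1): 1, (1,1,0): 0, (1,1,1): 2}

def correct(w):                       # w = [d3, d2, d1, d0, p2, p1, p0]
    d3, d2, d1, d0, p2, p1, p0 = w
    s2 = d3 ^ d2 ^ d1 ^ p2
    s1 = d3 ^ d1 ^ d0 ^ p1
    s0 = d2 ^ d1 ^ d0 ^ p0
    if (s2, s1, s0) != (0, 0, 0):
        w[SYNDROME_TO_POS[(s2, s1, s0)]] ^= 1
    return w

w = encode(1, 0, 1, 1)
w[2] ^= 1                             # flip d1; syndrome becomes 1 1 1
assert correct(w) == encode(1, 0, 1, 1)
```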

11 Oct. 2006 Error Correction Slide 11 Matrix Formulation of Hamming SEC Code
The received word (data bits d3 d2 d1 d0 followed by parity bits p2 p1 p0) is multiplied by the parity check matrix to form the syndrome s2 s1 s0; matrix-vector multiplication is done with AND/XOR, instead of ×/+.
Parity check matrix (columns ordered d3 d2 d1 d0 p2 p1 p0):
1 1 1 0 1 0 0
1 0 1 1 0 1 0
0 1 1 1 0 0 1
Example: for a single error in p2, the syndrome (1 0 0) matches the p2 column in the parity check matrix.
Syndrome table (s2 s1 s0 → bit in error):
0 0 0 → none;  0 0 1 → p0;  0 1 0 → p1;  0 1 1 → d0;  1 0 0 → p2;  1 0 1 → d2;  1 1 0 → d3;  1 1 1 → d1
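A quick way to see the "syndrome matches the column" property, sketched in Python with numpy (mod-2 arithmetic standing in for AND/XOR); the example codeword is illustrative:

```python
import numpy as np

# Parity check matrix from the slide; columns ordered d3 d2 d1 d0 p2 p1 p0.
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

codeword = np.array([1, 0, 1, 1, 0, 1, 0])   # d3 d2 d1 d0 = 1 0 1 1
assert not ((H @ codeword) % 2).any()        # valid codeword: zero syndrome

received = codeword.copy()
received[4] ^= 1                             # flip p2
syndrome = (H @ received) % 2                # "multiplication" via AND/XOR
print(syndrome)                              # [1 0 0] -- the p2 column of H
```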

12 Oct. 2006 Error Correction Slide 12 Matrix Rearrangement for Simpler Correction
Rearrange the data and parity bits into position order p0 p1 d0 p2 d2 d3 d1 (positions 1-7), so that column j of the parity check matrix is the binary representation of position j:
0 0 0 1 1 1 1
0 1 1 0 0 1 1
1 0 1 0 1 0 1
The syndrome, read as a binary number, then directly indicates the erroneous position (e.g., syndrome 1 0 0 indicates an error in position 4). A decoder converts the syndrome s2 s1 s0 into signals 0 and 1-7 that select which of the seven received bits to complement, yielding the corrected version.
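A Python sketch of correction with the rearranged matrix, where the syndrome read as a number is the error position (the codeword below is an illustrative example, not from the slide):

```python
import numpy as np

# Rearranged H: column j (1-based) is just the binary representation of j,
# so a nonzero syndrome directly names the erroneous position.
H = np.array([[(pos >> b) & 1 for pos in range(1, 8)] for b in (2, 1, 0)])

def correct(received):                       # received: 7 bits, positions 1..7
    s = tuple((H @ received) % 2)
    pos = s[0] * 4 + s[1] * 2 + s[2]         # syndrome read as a number
    if pos:
        received[pos - 1] ^= 1               # complement the named position
    return received

w = np.array([0, 1, 1, 0, 0, 1, 1])          # valid codeword, p0 p1 d0 p2 d2 d3 d1
w[3] ^= 1                                    # error at position 4 (p2)
assert list(correct(w)) == [0, 1, 1, 0, 0, 1, 1]
```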

13 Oct. 2006 Error Correction Slide 13 Hamming Generator Matrix
The codeword d3 d2 d1 d0 p2 p1 p0 is obtained by multiplying the generator matrix by the data word d3 d2 d1 d0 (recall that matrix-vector multiplication is done with AND/XOR, instead of ×/+):
Generator matrix (identity over the data bits, parity-forming rows below):
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 1 1 0
1 0 1 1
0 1 1 1
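The same encoding expressed with the generator matrix, as a short numpy sketch (the data word is an arbitrary example):

```python
import numpy as np

# Generator matrix from the slide; columns ordered d3 d2 d1 d0.
G = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 1, 1, 0],    # p2 = d3 + d2 + d1 (mod 2)
              [1, 0, 1, 1],    # p1 = d3 + d1 + d0 (mod 2)
              [0, 1, 1, 1]])   # p0 = d2 + d1 + d0 (mod 2)

d = np.array([1, 0, 1, 1])                   # data word d3 d2 d1 d0
codeword = (G @ d) % 2
print(codeword)                              # [1 0 1 1 0 1 0]
```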

14 Oct. 2006 Error Correction Slide 14 Generalization to Wider Hamming SEC Codes
Condition for a general Hamming SEC code: n = k + r = 2^r − 1.
The parity check matrix has r rows and 2^r − 1 columns; column j is the binary representation of position j (positions 1, 2, 3, ..., 2^r − 1, with bits ordered p0 p1 d0 p2 ...). The r-bit syndrome s(r−1) ... s1 s0 drives a decoder that selects which of the 2^r − 1 received bits to complement, yielding the corrected version.
 n        k = n − r
 7        4
 15       11
 31       26
 63       57
 127      120
 255      247
 511      502
 1023     1013
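A small Python sketch (not from the slides) that builds the general parity check matrix and reproduces the (n, k) table:

```python
# Construct the r x (2^r - 1) parity check matrix whose column j is the
# binary representation of position j, then list the resulting (n, k) pairs.
def hamming_check_matrix(r):
    return [[(pos >> b) & 1 for pos in range(1, 2 ** r)]
            for b in range(r - 1, -1, -1)]

for r in range(3, 11):
    n = 2 ** r - 1
    print(f"r = {r:2d}   n = {n:4d}   k = {n - r:4d}")
# Reproduces the table: (7, 4), (15, 11), (31, 26), ..., (1023, 1013)
```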

15 Oct. 2006 Error Correction Slide 15 A Hamming SEC/DED Code
Add an extra row of all 1s and a column with only one 1 to the parity check matrix of the SEC code. The all-1s row defines an overall parity bit p_r and produces the extra syndrome bit s_r; the added column, with its single 1 in the new row, corresponds to p_r itself. The decoder uses the syndrome s(r−1) ... s1 s0 together with s_r to correct any single error and additionally detect double errors.
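A minimal sketch of the resulting SEC/DED decision logic, under the usual interpretation of the extra parity row (the function name and interface are illustrative assumptions, not from the slides):

```python
# s_is_zero: whether the r-bit Hamming syndrome is all zero;
# s_r: the extra check bit recomputed from the all-1s (overall parity) row.
def classify(s_is_zero: bool, s_r: int) -> str:
    if s_is_zero and s_r == 0:
        return "no error"
    if s_r == 1:
        # Overall parity violated: odd number of flips; assume a single error
        # and correct it (if the syndrome is zero, the flipped bit is p_r itself).
        return "single error -- correct using the syndrome"
    # Overall parity holds but the syndrome is nonzero: an even number of
    # flips; report an uncorrectable double error.
    return "double error -- detect only"
```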

16 Oct. 2006 Error Correction Slide 16 Some Other Useful Codes
BCH codes: named in honor of Bose, Chaudhuri, and Hocquenghem.
Reed-Solomon codes: a special case of BCH codes. Example: a popular variant is RS(255, 223) with 8-bit symbols — 223 bytes of data, 32 check bytes, redundancy ≈ 14%; it can correct errors in up to 16 bytes anywhere in the 255-byte codeword. Used in CD players, digital audio tape, and digital television.
Hamming codes are examples of linear codes; linear codes may be defined in many other ways.
Reed-Muller codes: have a recursive construction, with smaller codes used to build larger ones.

17 Oct. 2006 Error Correction Slide 17 Arithmetic Error-Correcting Codes
Error syndromes for weight-1 arithmetic errors in the (7, 15) biresidue code:

Positive   Syndrome          Negative   Syndrome
error      mod 7   mod 15    error      mod 7   mod 15
     1       1       1            -1       6      14
     2       2       2            -2       5      13
     4       4       4            -4       3      11
     8       1       8            -8       6       7
    16       2       1           -16       5      14
    32       4       2           -32       3      13
    64       1       4           -64       6      11
   128       2       8          -128       5       7
   256       4       1          -256       3      14
   512       1       2          -512       6      13
  1024       2       4         -1024       5      11
  2048       4       8         -2048       3       7
--------------------------------------------------------
  4096       1       1         -4096       6      14
  8192       2       2         -8192       5      13
16,384       4       4       -16,384       3      11
32,768       1       8       -32,768       6       7

Because the syndromes for all of these error values (up to magnitude 2^11) are different, any weight-1 arithmetic error is correctable by the (mod 7, mod 15) biresidue code. (The entries below the rule show that the syndromes begin to repeat at 2^12, which is why the number of data bits is limited to a × b = 12.)
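The table can be regenerated with a few lines of Python (not from the slides), since the syndrome of a weight-1 error ±2^i is simply the error value reduced mod 7 and mod 15:

```python
# Recompute the syndrome table for weight-1 arithmetic errors +/- 2^i.
for i in range(16):
    e = 2 ** i
    print(f"+{e:6d}: ({e % 7}, {e % 15})    "
          f"-{e:6d}: ({-e % 7}, {-e % 15})")
# e.g. +8 -> (1, 8), -8 -> (6, 7); +2048 -> (4, 8), -2048 -> (3, 7)
```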

18 Oct. 2006 Error Correction Slide 18 Properties of Biresidue Codes
A biresidue code with relatively prime low-cost check moduli A = 2^a − 1 and B = 2^b − 1 supports a × b bits of data for weight-1 error correction.
Representational redundancy = (a + b)/(ab) = 1/a + 1/b.

Biresidue code parameters:
 a    b    n = k + a + b    k = ab
 3    4         19            12
 5    6         41            30
 7    8         71            56
11   12        143           120
15   16        271           240

Compare with the Hamming SEC code:
   n      k
   7      4
  15     11
  31     26
  63     57
 127    120
 255    247
 511    502
1023   1013
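A few lines of Python (illustrative, not from the slides) reproducing the biresidue parameter table and the redundancy formula:

```python
# Parameters of low-cost biresidue codes with check moduli 2^a - 1 and 2^b - 1
# (a and b relatively prime).
for a, b in [(3, 4), (5, 6), (7, 8), (11, 12), (15, 16)]:
    k = a * b
    n = k + a + b
    redundancy = (a + b) / (a * b)        # = 1/a + 1/b
    print(f"a={a:2d} b={b:2d}  n={n:3d}  k={k:3d}  redundancy={redundancy:.3f}")
```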

19 Oct. 2006 Error Correction Slide 19 Arithmetic on Biresidue-Coded Operands
Similar to residue-checked arithmetic for addition and multiplication, except that two residues are involved. Divide/square-root remains difficult.
[Figure: arithmetic processor with biresidue checking.]
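A minimal Python sketch of biresidue-checked addition for the (7, 15) code; the helper names encode and checked_add are hypothetical, used only to illustrate the idea of carrying and checking two residues:

```python
A, B = 7, 15                                 # low-cost check moduli 2^3-1, 2^4-1

def encode(x):
    return (x, x % A, x % B)                 # value plus its two residues

def checked_add(xa, ya):
    (x, xra, xrb), (y, yra, yrb) = xa, ya
    z = x + y                                # main adder (possibly faulty)
    zra = (xra + yra) % A                    # small mod-A adder on residues
    zrb = (xrb + yrb) % B                    # small mod-B adder on residues
    syndrome = ((z - zra) % A, (z - zrb) % B)
    return (z, zra, zrb), syndrome           # (0, 0) means no error detected

_, s = checked_add(encode(1234), encode(5678))
assert s == (0, 0)
```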

20 Oct. 2006 Error Correction Slide 20 Higher-Level Error Coding Methods
We have applied coding to data at the bit-string or word level. It is also possible to apply coding at higher levels:
Data structure level – robust data structures
Application level – algorithm-based error tolerance

21 Oct. 2006 Error Correction Slide 21 Preview of Algorithm-Based Error Tolerance
Error coding applied to data structures, rather than at the level of atomic data elements. Example: mod-8 checksums used for matrices.

Matrix M:      Row checksum matrix M_r:
2 1 6          2 1 6 1
5 3 4          5 3 4 4
3 2 7          3 2 7 4

Column checksum matrix M_c:   Full checksum matrix M_f:
2 1 6                         2 1 6 1
5 3 4                         5 3 4 4
3 2 7                         3 2 7 4
2 6 1                         2 6 1 1

If Z = X × Y, then Z_f = X_c × Y_r.
In M_f, any single error is correctable and any 3 errors are detectable; four errors may go undetected.
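A short numpy sketch (not from the slides) of the mod-8 checksum construction and of the property Z_f = X_c × Y_r; the second matrix Y is an arbitrary example, and all entries are treated as mod-8 values:

```python
import numpy as np

def row_checksum(M):                  # append a column of mod-8 row sums
    return np.hstack([M, M.sum(axis=1, keepdims=True) % 8])

def col_checksum(M):                  # append a row of mod-8 column sums
    return np.vstack([M, M.sum(axis=0, keepdims=True) % 8])

def full_checksum(M):                 # both checksums, plus the corner element
    return col_checksum(row_checksum(M))

M = np.array([[2, 1, 6],
              [5, 3, 4],
              [3, 2, 7]])
print(full_checksum(M))               # last row [2 6 1 1], last column [1 4 4 1]

# ABFT property: the checksums of the product Z = X x Y (mod 8) come out
# automatically from multiplying the column- and row-checksum matrices.
X = M
Y = np.array([[1, 0, 2],
              [3, 1, 1],
              [0, 2, 5]])
Zf = (col_checksum(X) @ row_checksum(Y)) % 8
assert np.array_equal(Zf, full_checksum((X @ Y) % 8))
```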

