Error Detection and Correction
Techniques to Increase Reliability
Copyright © Curt Hill 2005-2012
The Story Starts with Parity
Parity was the original error detection scheme. It was first used on Teletypes in the 1930s to transmit data over telephone and telegraph lines: one extra bit was added to check whether the rest of the data had arrived correctly. It was next used for tape drives in the 1950s. Nine-track tapes stored eight data bits and one parity bit; before those came seven- and eight-track tapes.
How Does Parity Work?
Parity could be set to either even or odd. The 1 bits in the data were counted, and under even parity the parity bit was chosen to make the total count even.
If the data is 0 1 1 0 1 0 1 0, set the parity bit to 0: the count of 1s (four) is already even.
If the data is 1 1 1 0 0 1 0 1, set the parity bit to 1 to make the count (five) even.
Example
Consider the following two data bytes, with the parity bit appended at the right:
Data        Even parity   Odd parity
01101100    011011000     011011001
01110110    011101101     011101100
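The parity rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the data is a string of '0'/'1' characters and that the parity bit is appended at the right, as in the example; `parity_bit` and `with_parity` are hypothetical helper names, not part of any library.

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Hypothetical helper: parity bit for a string of '0'/'1' characters.

    Even parity makes the total count of 1s (data + parity) even;
    odd parity makes that total count odd.
    """
    bit = data.count("1") % 2   # 1 if the data already has an odd number of 1s
    if not even:
        bit ^= 1                # odd parity is the complement of even parity
    return str(bit)


def with_parity(data: str, even: bool = True) -> str:
    """Hypothetical helper: append the parity bit at the right of the data."""
    return data + parity_bit(data, even)


for byte in ("01101100", "01110110"):
    print(byte, "even:", with_parity(byte), "odd:", with_parity(byte, even=False))
```

Running the loop prints the same even and odd words as the table.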
Options
How many bits were transmitted per character? How fast were they transmitted? Which parity was used? How many start and stop bits were there? The answers to all these questions constitute a protocol. The particular protocol did not matter, as long as the sender and receiver agreed on the same one.
Parity Checking
Parity is easy to generate and easy to check. It detects any one-bit error but gives no clue where the error is. More generally, it detects any odd number of bit errors, while any even number of bit errors goes undetected. The number of data bits is irrelevant: a single parity bit can cover 8 bits or 50. However, the more bits one parity bit covers, the more likely an undetectable multiple-bit error becomes, so the protection gets weaker.
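A short Python sketch of the checking side, assuming the same string representation (data bits followed by the parity bit). The helpers `parity_ok` and `flip` are hypothetical. It shows that the check only answers pass or fail, and that flipping two bits goes unnoticed.

```python
def parity_ok(word: str, even: bool = True) -> bool:
    """Hypothetical helper: check a received word (data bits + parity bit)."""
    ones = word.count("1")
    return ones % 2 == (0 if even else 1)


def flip(word: str, *positions: int) -> str:
    """Hypothetical helper: copy of word with the bits at the given indices inverted."""
    bits = list(word)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)


sent = "011011000"  # 01101100 followed by its even parity bit

print(parity_ok(sent))                 # True: no error
print(parity_ok(flip(sent, 2)))        # False: one-bit error detected
print(parity_ok(flip(sent, 2, 5)))     # True: two-bit error slips through
print(parity_ok(flip(sent, 0, 2, 5)))  # False: three-bit error detected
```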
Semiconductor Memory
Many machines also used parity to check their memory. The original IBM PC and many of its successors used 9 bits to store each 8-bit byte. If a parity error was detected, the machine halted with a parity error message. Even semiconductor memory of that generation was not as reliable as hoped.
Is This the Best We Can Do?
If we employ multiple check bits, we can detect multiple-bit errors and correct single-bit errors. How do we correct an error? Since each bit has only two states, 0 or 1, all we need to know is which bit is bad; we correct it by inverting it. So how do we find the location of the error?
Basic Scheme
Use four data bits (0-3) and three syndrome bits (A-C). Each parity bit protects three of the four data bits:
A protects data bits 0, 1 and 2 with parity
B protects data bits 1, 2 and 3 with parity
C protects data bits 0, 1 and 3 with parity
Each data bit is protected by two or three of the parity bits, and no two data bits share the same combination. So it is the set of parity bits that fail, not just how many fail, that indicates which bit is in error.
Four Data and Three Parity Bits
Bit in error     Parity checks that fail
Data bit 0       A, C
Data bit 1       A, B, C
Data bit 2       A, B
Data bit 3       B, C
Check bit A      A
Check bit B      B
Check bit C      C
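The four-data, three-parity scheme can be sketched in Python. This is a minimal illustration that assumes even parity and represents the data as a list of four 0/1 values; `COVERS`, `LOCATE`, `encode` and `correct` are hypothetical names. The key step is inverting the coverage table so that each set of failing checks maps to exactly one data bit.

```python
# Coverage from the "Basic Scheme" slide: A protects data bits 0-2,
# B protects bits 1-3, C protects bits 0, 1 and 3 (even parity assumed).
COVERS = {"A": (0, 1, 2), "B": (1, 2, 3), "C": (0, 1, 3)}

# Invert the coverage: each data bit has its own set of checks.
LOCATE = {
    frozenset(name for name, bits in COVERS.items() if d in bits): d
    for d in range(4)
}


def encode(data):
    """Hypothetical helper: the A, B, C parity bits for four data bits."""
    return {name: sum(data[i] for i in bits) % 2 for name, bits in COVERS.items()}


def correct(data, parity):
    """Hypothetical helper: recompute each check, flip the data bit the failures name.

    No failing check means no error; a single failing check means that parity
    bit itself was damaged, so the data is left alone.
    """
    failing = frozenset(
        name for name, bits in COVERS.items()
        if (sum(data[i] for i in bits) + parity[name]) % 2 != 0
    )
    fixed = list(data)
    if failing in LOCATE:
        fixed[LOCATE[failing]] ^= 1
    return fixed, failing


sent = [1, 0, 1, 1]
parity = encode(sent)
damaged = [1, 0, 0, 1]            # data bit 2 flipped in transit
fixed, failing = correct(damaged, parity)
print(sorted(failing), fixed)     # ['A', 'B'] [1, 0, 1, 1]
```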
More
This gives single-bit error correction: flip the bit that the failing checks indicate. Detecting two-bit errors takes one more bit; otherwise a two-bit error is mistaken for a one-bit error and "corrected" into the wrong value. The three check bits are collectively called the syndrome. Using three check bits for four data bits is somewhat wasteful.
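The slide does not say how the extra bit is used. One common approach, assumed in this sketch, is an overall parity bit P over all seven bits: if P's parity comes out odd, one bit flipped and can be corrected; if checks fail but P's parity is still even, two bits flipped and the word is only flagged. It reuses the four-data, three-parity coverage; every name here is hypothetical.

```python
# Same coverage as the four-data, three-parity scheme. The overall parity bit P
# is the "one more bit"; using it this way is an assumed, common approach.
COVERS = {"A": (0, 1, 2), "B": (1, 2, 3), "C": (0, 1, 3)}
LOCATE = {frozenset(n for n, bits in COVERS.items() if d in bits): d for d in range(4)}


def encode(data):
    """Hypothetical helper: four data bits -> [d0, d1, d2, d3, A, B, C, P]."""
    checks = [sum(data[i] for i in COVERS[n]) % 2 for n in "ABC"]
    word = list(data) + checks
    return word + [sum(word) % 2]   # P makes the whole 8-bit word even


def decode(word):
    """Hypothetical helper: (status, data), status 'ok', 'corrected' or 'double error'."""
    data, checks = list(word[:4]), dict(zip("ABC", word[4:7]))
    failing = frozenset(
        n for n in "ABC" if (sum(data[i] for i in COVERS[n]) + checks[n]) % 2
    )
    overall_odd = sum(word) % 2 == 1          # an odd number of bits flipped
    if not failing and not overall_odd:
        return "ok", data
    if overall_odd:                           # assume exactly one bit flipped
        if failing in LOCATE:                 # it was a data bit: flip it back
            data[LOCATE[failing]] ^= 1
        return "corrected", data              # otherwise a check bit or P was hit
    return "double error", data               # checks fail, yet overall parity is even


word = encode([1, 0, 1, 1])
print(decode(word))                       # ('ok', [1, 0, 1, 1])
print(decode(word[:2] + [0] + word[3:]))  # ('corrected', [1, 0, 1, 1])
two = [0, 0, 1, 0] + word[4:]             # data bits 0 and 3 both flipped
print(decode(two))                        # ('double error', [0, 0, 1, 0])
```

In the last call checks A and B fail, which without P would have been "corrected" as a bad data bit 2.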
Who Is Responsible?
Richard Hamming developed the first error correction codes, now known as Hamming codes. He worked at Bell Labs, where he was concerned with handling digital data reliably at high speed; the relay computers there could detect errors in their input but could only stop, not correct them. Parity is sufficient for relatively low-speed telegraph traffic, but not for high-speed digital data.
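Error Correction
To correct any single-bit error we need $2^K - 1 \ge M + K$, where M is the number of data bits and K is the number of check (error correction) bits. The K check bits can take $2^K$ values; one value must mean "no error", and each of the remaining $2^K - 1$ values must name one of the M + K bits that could be wrong. For four data bits, three check bits are enough, since $2^3 - 1 = 7 \ge 4 + 3 = 7$, while two are not, since $2^2 - 1 = 3 < 4 + 2 = 6$. This matches the four-data, three-parity scheme above.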
Observations
Notice the formula $2^K - 1 \ge M + K$. Exponentials grow much faster than sums, so adding one check bit roughly doubles the number of bits that can be protected. Large word sizes are therefore proportionally cheaper to protect than small ones.
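Rearranging the inequality gives the largest number of data bits that K check bits can protect, $M = 2^K - 1 - K$. This small Python sketch tabulates it to make the doubling visible; `max_data_bits` is a hypothetical name.

```python
def max_data_bits(k: int) -> int:
    """Hypothetical helper: largest M satisfying 2**K - 1 >= M + K."""
    return 2 ** k - 1 - k


for k in range(2, 9):
    m = max_data_bits(k)
    print(f"K={k}: up to {m} data bits, {100 * k / m:.1f}% overhead")
```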
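Summary
The number of check bits needed is summarized by this table (overhead = check bits / data bits):
Data bits   SEC bits   Overhead   SEC-DED bits   Overhead
8           4          50%        5              62.5%
16          5          31.3%      6              37.5%
32          6          18.8%      7              21.9%
64          7          10.9%      8              12.5%
SEC = single error correction; SEC-DED = single error correction, double error detection.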
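The table can be regenerated from the inequality $2^K - 1 \ge M + K$. This Python sketch assumes, as the table does, that SEC-DED needs exactly one more bit than SEC; `sec_check_bits` and `overhead_table` are hypothetical names.

```python
def sec_check_bits(m: int) -> int:
    """Hypothetical helper: smallest K with 2**K - 1 >= m + K (single error correction)."""
    k = 1
    while 2 ** k - 1 < m + k:
        k += 1
    return k


def overhead_table(sizes=(8, 16, 32, 64)):
    """Hypothetical helper: rows of (data bits, SEC bits, SEC %, SEC-DED bits, SEC-DED %)."""
    rows = []
    for m in sizes:
        sec = sec_check_bits(m)
        ded = sec + 1   # assumed: one extra parity bit adds double-error detection
        rows.append((m, sec, 100 * sec / m, ded, 100 * ded / m))
    return rows


for m, sec, sec_pct, ded, ded_pct in overhead_table():
    print(f"{m:>3} {sec:>2} {sec_pct:6.1f}% {ded:>2} {ded_pct:6.1f}%")
```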
Layout
Let's try eight data bits and four check bits. Four bits can represent 16 values, 0-15. We want the value zero to mean no error. If the value contains exactly one 1 bit, the error is in that check bit, so the data needs no correction. If the value contains more than one 1 bit, it should give the position of the bit that is wrong.
Layout Picture
Each check bit guards every position whose binary position number has a 1 in that check bit's place. For example, C8 checks every position with a 1 in the 8s place (positions 8 through 12).
Position   Binary   Bit
12         1100     M8
11         1011     M7
10         1010     M6
09         1001     M5
08         1000     C8
07         0111     M4
06         0110     M3
05         0101     M2
04         0100     C4
03         0011     M1
02         0010     C2
01         0001     C1
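The layout rule is mechanical enough to generate. This Python sketch assumes positions are numbered from 1, check bits go at the powers of two, and data bits fill the remaining positions in order; `layout` and `guarded_by` are hypothetical helpers.

```python
def layout(m: int) -> dict:
    """Hypothetical helper: map positions to names, C<p> at powers of two, M1..Mm elsewhere."""
    names, pos, data_index = {}, 1, 1
    while data_index <= m:
        if pos & (pos - 1) == 0:           # power of two: reserve for a check bit
            names[pos] = f"C{pos}"
        else:
            names[pos] = f"M{data_index}"
            data_index += 1
        pos += 1
    return names


def guarded_by(check: int, names: dict) -> list:
    """Hypothetical helper: data bits whose position has a 1 in the check bit's place."""
    return [names[p] for p in sorted(names) if p & check and p != check]


names = layout(8)
for p in sorted(names, reverse=True):
    print(f"{p:02d} {p:04b} {names[p]}")
print("C8 guards", guarded_by(8, names))   # C8 guards ['M5', 'M6', 'M7', 'M8']
```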
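Computing Check Bits
Each check bit is the even parity of four or five data bits. Reading the coverage from the layout, they are calculated as follows, where $\oplus$ is exclusive-or:
$C_1 = M_1 \oplus M_2 \oplus M_4 \oplus M_5 \oplus M_7$
$C_2 = M_1 \oplus M_3 \oplus M_4 \oplus M_6 \oplus M_7$
$C_4 = M_2 \oplus M_3 \oplus M_4 \oplus M_8$
$C_8 = M_5 \oplus M_6 \oplus M_7 \oplus M_8$
On receipt, the check bits are recomputed from the received data and compared with the received check bits. Adding the positions of the check bits that disagree gives the position of the bit in error.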
An Example
Consider the data bits M8 down to M1: 0 1 0 0 1 1 1 0. Place them in the layout (positions 12 down to 1), leaving c in the check positions, then compute each check bit with even parity:
Step           Check bit   Word (positions 12 to 1)
Locate data    -           0 1 0 0 c 1 1 1 c 0 c c
Compute C8     1           0 1 0 0 1 1 1 1 c 0 c c
Compute C4     1           0 1 0 0 1 1 1 1 1 0 c c
Compute C2     1           0 1 0 0 1 1 1 1 1 0 1 c
Compute C1     1           0 1 0 0 1 1 1 1 1 0 1 1
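The example can be reproduced with a small Python sketch. It assumes the positions from the layout slide (M1 through M8 at positions 3, 5, 6, 7, 9, 10, 11, 12 and check bits at 1, 2, 4, 8) and even parity; `encode` and `as_string` are hypothetical names.

```python
# Positions of M1..M8 and of the check bits, read off the layout slide.
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]
CHECK_POSITIONS = [8, 4, 2, 1]


def encode(data):
    """Hypothetical helper: data is [M1, ..., M8]; returns {position: bit} for 1..12."""
    word = dict(zip(DATA_POSITIONS, data))
    for c in CHECK_POSITIONS:
        # Even parity over every data position whose number has a 1 in place c.
        word[c] = sum(word[p] for p in DATA_POSITIONS if p & c) % 2
    return word


def as_string(word):
    """Hypothetical helper: show positions 12 down to 1, the order used on the slides."""
    return " ".join(str(word[p]) for p in range(12, 0, -1))


m8_to_m1 = "0 1 0 0 1 1 1 0"                  # data bits as written on the slide
data = [int(b) for b in reversed(m8_to_m1.split())]
print(as_string(encode(data)))                # 0 1 0 0 1 1 1 1 1 0 1 1
```

The printed word matches the last line of the worked example.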
Suppose an Error
The original word now comes back as: 0 1 0 1 1 1 1 1 1 0 1 1
Recompute the check bits from the received data bits: C8 = 0, C4 = 1, C2 = 1, C1 = 0.
Compared with the received check bits (all 1), C8 and C1 disagree. Adding their positions, 8 + 1 = 9, so the error is in position 9, which is data bit M5.
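The decoding step can be sketched in Python. Rather than recomputing each check bit separately, this sketch uses an equivalent shortcut for this layout: XOR together the position numbers of every bit that is 1. Each bit of the result is exactly one check-bit comparison, so the result is the position of the bad bit. It assumes at most one bit is wrong; `syndrome` and `diagnose` are hypothetical names.

```python
# Position names read off the layout slide.
NAMES = {1: "C1", 2: "C2", 3: "M1", 4: "C4", 5: "M2", 6: "M3",
         7: "M4", 8: "C8", 9: "M5", 10: "M6", 11: "M7", 12: "M8"}


def syndrome(word):
    """Hypothetical helper: XOR together the positions that hold a 1.

    Bit c of the result is the parity of every position with a 1 in place c,
    which equals (received Cc) XOR (Cc recomputed from the data).
    """
    s = 0
    for pos, bit in word.items():
        if bit:
            s ^= pos
    return s


def diagnose(word):
    """Hypothetical helper: interpret the syndrome, assuming at most one bad bit."""
    s = syndrome(word)
    if s == 0:
        return "no error", word
    if s not in word:                  # cannot happen with a single-bit error
        return "uncorrectable", word
    fixed = dict(word)
    fixed[s] ^= 1                      # flip the indicated bit
    if s & (s - 1) == 0:               # a single 1 bit: a check bit was hit
        return f"check bit {NAMES[s]} was wrong; data unaffected", fixed
    return f"position {s} ({NAMES[s]}) was wrong", fixed


received = "0 1 0 1 1 1 1 1 1 0 1 1"   # positions 12 down to 1
word = {12 - i: int(b) for i, b in enumerate(received.split())}
print(diagnose(word)[0])               # position 9 (M5) was wrong
```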
Some Examples of Use
IBM 30xx machines use an 8-bit SEC-DED code for each 64 bits, an overhead of 12.5%. The DEC VAX uses a 7-bit SEC-DED code for each 32 bits, an overhead of about 22%. Some versions of RAID also use this technique to guarantee recoverability in the case of disk errors.
Addendum
Recently these concepts have been applied to internet and wireless data packets. When a mobile phone drops a packet, it must ask for it to be sent again, which greatly reduces the perceived bandwidth. A study in Boston showed that 3% of packets for mobile phones were lost; packet loss on a fast-moving train is typically 5%.
http://www.technologyreview.com/news/429722/a-bandwidth-breakthrough/
Packets
The proposed solution is to put error correction codes for nearby packets into adjacent packets, so that a lost packet can be reconstructed instead of retransmitted. This is known as coded TCP. In one study, doing this at 2% packet loss boosted apparent bandwidth from 1 to 16 Mbit/s. If loss rates are low this does not help, but losses in wireless networks are always present.
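The slide does not describe the code that coded TCP actually uses, so this Python sketch swaps in the simplest possible version of the idea: one XOR parity packet per group of equal-length packets. This is the parity bit from the start of these slides applied across packets instead of bits, and it can rebuild one lost packet in the group without a retransmission. It is an illustration, not coded TCP; all names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_packet(packets):
    """Hypothetical sender side: XOR every packet in the group together."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Hypothetical receiver side: rebuild at most one lost packet (marked None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("one parity packet can rebuild only one lost packet")
    present = [p for p in received if p is not None]
    rebuilt = reduce(xor_bytes, present, parity)   # parity XOR all survivors
    out = list(received)
    out[missing[0]] = rebuilt
    return out


group = [b"abcd", b"efgh", b"ijkl"]              # assumed equal-length packets
extra = parity_packet(group)
print(recover([b"abcd", None, b"ijkl"], extra))  # [b'abcd', b'efgh', b'ijkl']
```

Like a single parity bit, one parity packet cannot cope with two losses in the same group; real schemes add more redundancy.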