Error correcting codes A practical problem of theoretical importance

Claude Shannon (1916-2001)
1937: MSc, MIT. "As a 21-year-old master's student at MIT, he wrote a thesis demonstrating that electrical applications of Boolean algebra could construct and resolve any logical, numerical relationship. It has been claimed that this was the most important master's thesis of all time."
1948: published the paper that invented the field of information theory.

The binary symmetric channel
Alice wants to send Bob a binary string. Each transmitted bit is flipped independently with probability p < 1/2. How many channel bits are needed to reliably transfer k bits of information? How much "information" is carried by each transmitted bit?
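
A quick sketch (not from the slides) of what BSC(p) does, together with the capacity formula from Shannon's theorem; H is the binary entropy function:

```python
import math, random

# Simulate BSC(p): each transmitted bit flips independently with probability p.
def bsc(bits, p):
    return [b ^ (random.random() < p) for b in bits]

# Binary entropy in bits; Shannon: about (1 - H(p)) * n information bits
# can be sent reliably over n uses of the channel.
def H(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.11
print(bsc([1, 0, 1, 1], p))   # a noisy copy of the input
print(1 - H(p))               # capacity: roughly 0.5 information bits per channel bit
```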

Pictorially
x ∈ {0,1}^k → y = E(x) ∈ {0,1}^n → noisy channel → y' = y'_1..y'_n → x' = D(y') ∈ {0,1}^k
The scheme is useful if Pr_x[x' ≠ x] ≤ ε.
Shannon: There exists a useful scheme (with exponentially small error) transmitting k = (1-H(p))n bits of information using n communication bits.

Richard Hamming (1915-1998)
"Hamming worked at Bell Labs in the 1940s on the Bell Model V computer, an electromechanical relay-based machine with cycle times in seconds. Input was fed in on punch cards, which would invariably have read errors. During weekdays, special code would find errors and flash lights so the operators could correct the problem. During after-hours periods and on weekends, when there were no operators, the machine simply moved on to the next job. Hamming worked on weekends, and grew increasingly frustrated with having to restart his programs from scratch due to the unreliability of the card reader. Over the next few years he worked on the problem of error-correction, developing an increasingly powerful array of algorithms. In 1950 he published what is now known as Hamming code, which remains in use today in applications such as ECC memory."
THE PURPOSE OF COMPUTING IS INSIGHT, NOT NUMBERS.

Hamming: Adversarial noise
x ∈ {0,1}^k → y = E(x) ∈ {0,1}^n → adversary: y' differs from y in at most δn bits → x' = D(y') ∈ {0,1}^k
The scheme is useful if ∀x and ∀ adversary, x' = x.

Rate and distance
Definition: E is an (n,k,d) code if E: Σ^k → Σ^n and ∀ x ≠ y, d(E(x),E(y)) ≥ d. Here n is the length, k the number of information symbols, and d the distance.
Relative rate = k/n. Relative distance = d/n.
Lemma: If E is an (n,k,d) code, then the encoding can detect d-1 errors and correct ⌊(d-1)/2⌋ errors.

Linear codes
Definition: Let F be a field. E is an [n,k,d]_F code if E is a linear operator from F^k → F^n and has distance d. Equivalently: C ⊆ F^n is an [n,k,d]_F code if it is a k-dimensional vector subspace of F^n and has distance d.
Fact: dist(C) = min over nonzero c ∈ C of weight(c).
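
A brute-force check of the fact above on a toy code (the [5,2] generator matrix here is my own example, not from the slides):

```python
from itertools import product

G = [[1, 0, 1, 1, 0],   # generator matrix of a small [5,2]_2 code
     [0, 1, 0, 1, 1]]

def encode(x):
    # codeword = x * G over F_2, coordinate by coordinate
    return tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G))

codewords = [encode(x) for x in product((0, 1), repeat=2)]
dist = min(sum(a != b for a, b in zip(c1, c2))
           for c1 in codewords for c2 in codewords if c1 != c2)
min_weight = min(sum(c) for c in codewords if any(c))
assert dist == min_weight == 3   # distance of a linear code = minimum weight
```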

Examples

Repetition code
- Linear code
- Length: t
- Dimension: 1
- Relative rate: 1/t
- Distance: t
- Relative distance: 1

One parity bit
Encoding: x_1 x_2 ... x_n → x_1 x_2 ... x_n (⊕_i x_i)
- Linear code
- Length: n+1
- Dimension: n
- Relative rate: 1-1/(n+1)
- Distance: 2
- Relative distance: 2/(n+1)

Hamming code [7,4,3]_2
Encoding: x → Gx
Decoding: compute the syndrome s = Hy. If s = (000), there is no error; otherwise s equals the column of H indexed by the corrupted bit, so flip that bit.
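
A runnable sketch of this encode/decode cycle. The systematic matrices below (G = [I | A], H = [Aᵀ | I]) are one standard choice for the [7,4,3] code; the slide's G and H may be ordered differently:

```python
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]        # 4 x 7
H = [[A[j][i] for j in range(4)] + [int(i == j) for j in range(3)]    # 3 x 7
     for i in range(3)]

def encode(x):
    return [sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G)]

def decode(y):
    s = [sum(h * b for h, b in zip(row, y)) % 2 for row in H]   # syndrome
    if any(s):   # nonzero syndrome: it equals the column of the flipped bit
        j = next(j for j in range(7) if [H[i][j] for i in range(3)] == s)
        y = y[:]
        y[j] ^= 1
    return y[:4]   # systematic code: the message is the first 4 bits

x = [1, 0, 1, 1]
y = encode(x)
y[5] ^= 1                  # one adversarial bit flip
assert decode(y) == x      # any single error is corrected
```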

Reed-Solomon code
Codewords: the evaluations (f(x))_{x ∈ F} of all polynomials f of degree < k over F.
- Linear code
- Length: n = |F|
- Dimension: k
- Relative rate: r = k/n
- Distance: n-k+1
- Relative distance: δ = 1-r+1/n
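
A minimal encoder matching these parameters over the prime field F_13 (a sketch; the message is the coefficient vector of a degree < k polynomial, the codeword its evaluations at every field element):

```python
p, k = 13, 3   # field size (= length n) and dimension

def rs_encode(msg):
    return [sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p
            for x in range(p)]

c1, c2 = rs_encode([1, 2, 3]), rs_encode([1, 2, 4])
# Distinct polynomials of degree < k agree on < k points,
# so any two codewords differ in at least n - k + 1 positions.
assert sum(a != b for a, b in zip(c1, c2)) >= p - k + 1
```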

The Singleton bound
Theorem: δ ≤ 1-r+1/n. Equivalently: d ≤ n-k+1.
Proof: Project all codewords onto the first k-1 coordinates. There are |Σ|^k codewords but only |Σ|^{k-1} possible projections, so by pigeonhole two codewords have the same projection; they can differ only on the remaining coordinates. Hence d ≤ n-(k-1).

Reed-Muller codes
C = { (f(x_1),...,f(x_N)) | f: F^m → F, deg(f) < d }, where F^m = {x_1,...,x_N}. Here we assume q = |F| > d > m.
- Linear code
- Length: q^m
- Dimension: (m+d choose m)
- Relative rate: ≈ (d/mq)^m ≈ 0
- Relative distance: 1-d/q

Hadamard (1865 – 1963)

Hadamard code
C = { (f·x_1,...,f·x_N) | f ∈ F_2^m }, where F_2^m = {x_1,...,x_N}.
- Linear code
- Length: 2^m
- Dimension: m
- Relative rate: m/2^m ≈ 0
- Relative distance: 1/2

How good can binary error correcting codes be?

The Gilbert-Varshamov bound
Claim: There exists a (possibly non-linear) code C with length n, distance d, and at least |Σ|^n / |B(0; d-1)| codewords.
Proof: On board (greedy: every chosen codeword excludes only the ball of radius d-1 around it).
Asymptotic behavior for |Σ| = 2: r ≥ 1 - H(δ). The same asymptotics hold for random linear codes.
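
The greedy proof is easy to run for tiny parameters (my own illustration; exponential time):

```python
from itertools import product

# Sweep all of {0,1}^n and keep any word at distance >= d from every word
# kept so far. Each kept word rules out only |B(0, d-1)| others, which is
# exactly the |Sigma|^n / |B(0; d-1)| guarantee from the claim.
def gv_code(n, d):
    code = []
    for w in product((0, 1), repeat=n):
        if all(sum(a != b for a, b in zip(w, c)) >= d for c in code):
            code.append(w)
    return code

ball = 1 + 7 + 21   # |B(0, 2)| for n = 7
print(len(gv_code(7, 3)), 2 ** 7 / ball)   # greedy meets (here beats) 128/29
```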

The quest for asymptotically good binary ECC
- Concatenation: RS + HAD (No)
- RS + RS + RS + OPT (Hmm...)
- Justesen: sophisticated concatenation (Yes)
- AG + HAD (Yes)
- Expander codes
Finding an explicit ECC (explicit encoding + decoding) approaching the GV bound is still a major open problem.

Can we do better than random?
There is a gap between the best upper bound (obtained by a linear programming argument) and the GV bound. For q = p^2 a prime power, q ≥ 49, there exists a better construction (even an explicit one). For a fixed δ and growing q:
GV: r = 1 - δ - O(1/log q)
AG: r ≥ 1 - δ - 1/(sqrt(q)-1)

An example
F = ℤ_13. Evaluation polynomials: Span{1, x, x^2, x^3, y, xy}.
Evaluation points: S = {(x,y) : y^2 - 2(x-1)x(x+1) = 0} = {(0,0), (1,0), (2,±5), (3,±3), (4,±4), (6,±2), (7,±3), (9,±6), (10,±2), (11,±1), (12,0)} — 19 points in all.
A linear [19,6,13]_13 code. Compare with RS: [19,6,14]_19.
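
The evaluation set is easy to verify by enumeration (a sketch):

```python
# Count the F_13-points of the curve y^2 = 2(x-1)x(x+1).
p = 13
S = [(x, y) for x in range(p) for y in range(p)
     if (y * y - 2 * (x - 1) * x * (x + 1)) % p == 0]
print(len(S), S)   # 19 points, matching the length of the [19,6,13] code
```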

Efficient testing?
Start with the generator matrix; find the parity-check matrix; compute the syndrome (it should be 0).

What about efficient decoding? Plenty of algorithms for specific codes. A beautiful algorithm for decoding RS. Elwyn Berlekamp (1940-), Madhu Sudan (1966-).

Decoding Reed-Solomon codes
Input: (x_1,y_1),...,(x_n,y_n).
Promise: there exists a polynomial p(x) of degree at most k such that p(x_i) = y_i for at least 2n/3 values of i.
Goal: Find p.
Algorithm: Find a non-zero low-degree bivariate Q: F^2 → F such that Q(x_i, y_i) = 0 for all i. Factor Q. Check all factors of the form y - f(x).
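
The factoring step above is beyond a short sketch, so here is the classical ancestor of this algorithm instead: the Berlekamp-Welch *unique* decoder, which handles up to (n-k)/2 errors rather than the n/3 of the list decoder. It interpolates N(x) = f(x)·E(x) and the error locator E(x) from the linear system N(x_i) - y_i·E(x_i) = 0, then recovers f = N/E. The field size, message, and error positions below are my own toy instance:

```python
p, k = 13, 3

def null_vector(M):
    # Return a nonzero nullspace vector of M over F_p (M has more cols than rows).
    M = [[x % p for x in row] for row in M]
    pivots, r = {}, 0
    for c in range(len(M[0])):          # Gauss-Jordan elimination mod p
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)
        M[r] = [x * inv % p for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots[c], r = r, r + 1
    free = next(c for c in range(len(M[0])) if c not in pivots)
    v = [0] * len(M[0])
    v[free] = 1                          # set one free variable to 1
    for c, row in pivots.items():
        v[c] = -M[row][free] % p
    return v

def bw_decode(points):
    n = len(points)
    e = (n - k) // 2
    # Unknowns: e+k coefficients of N, then e+1 coefficients of E.
    M = [[pow(x, j, p) for j in range(e + k)] +
         [-y * pow(x, j, p) for j in range(e + 1)] for x, y in points]
    v = null_vector(M)
    N, E = v[:e + k], v[e + k:]
    while E[-1] == 0:
        E.pop()                          # drop leading zero coefficients
    q = [0] * (len(N) - len(E) + 1)      # long division q = N / E (exact)
    inv = pow(E[-1], p - 2, p)
    for i in range(len(q) - 1, -1, -1):
        q[i] = N[i + len(E) - 1] * inv % p
        for j, c in enumerate(E):
            N[i + j] = (N[i + j] - q[i] * c) % p
    return q[:k]

msg = [5, 0, 1]                          # f(x) = 5 + x^2
word = [(x, (5 + x * x) % p) for x in range(p)]
word[1], word[4] = (1, 7), (4, 0)        # corrupt two evaluations
assert bw_decode(word) == msg
```

Sudan's list decoder can be read as replacing the rational function N/E by a general bivariate Q and a factoring step, which is what buys the higher error rate.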

Pictorially Input: 13 points in the real Euclidean plane. Algorithm: Find low degree Q, and factor it.

Is it used in practice?

NASA spaceships
"Deep-space telecommunications: NASA has used many different error correcting codes. For missions between 1969 and 1977 the Mariner spacecraft used a Reed-Muller code. The noise these spacecraft were subject to was well approximated by a "bell-curve" (normal distribution), so the Reed-Muller codes were well suited to the situation.

The Voyager 1 & Voyager 2 spacecraft transmitted color pictures of Jupiter and Saturn in 1979 and 1980. Color image transmission required 3 times the amount of data, so the Golay (24,12,8) code was used. This Golay code is only 3-error correcting, but it could be transmitted at a much higher data rate. Voyager 2 went on to Uranus and Neptune and the code was switched to a concatenated Reed-Solomon / convolutional code for its substantially more powerful error correcting capabilities.

Current DSN error correction is done with dedicated hardware. For some NASA deep space craft such as those in the Voyager program, Cassini-Huygens (Saturn), New Horizons (Pluto) and Deep Space 1, the use of hardware ECC may not be feasible for the full duration of the mission. The different kinds of deep space and orbital missions that are conducted suggest that trying to find a "one size fits all" error correction system will be an ongoing problem for some time to come."

Satellite communication
"Satellite broadcasting (DVB): The demand for satellite transponder bandwidth continues to grow, fueled by the desire to deliver television (including new channels and High Definition TV) and IP data. Transponder availability and bandwidth constraints have limited this growth, because transponder capacity is determined by the selected modulation scheme and Forward error correction (FEC) rate.

Overview: QPSK coupled with traditional Reed-Solomon and Viterbi codes have been used for nearly 20 years for the delivery of digital satellite TV. Higher order modulation schemes such as 8PSK, 16QAM and 32QAM have enabled the satellite industry to increase transponder efficiency by several orders of magnitude. This increase in the information rate in a transponder comes at the expense of an increase in the carrier power to meet the threshold requirement for existing antennas. Tests conducted using the latest chipsets demonstrate that the performance achieved by using Turbo Codes may be even lower than the 0.8 dB figure assumed in early designs."

Data storage (erasure codes, systematic codes)
"RAID 1 mirrors the contents of the disks, making a form of 1:1 ratio realtime backup. The contents of each disk in the array are identical to that of every other disk in the array. A RAID 1 array requires a minimum of two drives. RAID 1 mirrors, though they copy the data identically to both drives during the writing process, would not be suitable as a permanent backup solution, as RAID technology by design allows for certain failures to take place.

RAID 3/4 (striped disks with dedicated parity) combines three or more disks in a way that protects data against loss of any one disk. Fault tolerance is achieved by adding an extra disk to the array and dedicating it to storing parity information. The storage capacity of the array is reduced by one disk. A RAID 3 or 4 array requires a minimum of three drives: two to hold striped data, and a third drive to hold parity data.

RAID 5 (striped disks with distributed parity) combines three or more disks in a way that protects data against the loss of any one disk. It is similar to RAID 3 but the parity is not stored on one dedicated drive; instead, parity information is interspersed across the drive array. The storage capacity of the array is a function of the number of drives minus the space needed to store parity. The maximum number of drives that can fail in any RAID 5 configuration without losing data is one. Losing two drives in a RAID 5 array is referred to as a "double fault" and results in data loss.

RAID 6 (striped disks with dual parity) combines four or more disks in a way that protects data against loss of any two disks.

RAID 1+0 (or 10) is a mirrored data set (RAID 1) which is then striped (RAID 0), hence the "1+0" name. A RAID 1+0 array requires a minimum of four drives: two mirrored drives to hold half of the striped data, plus another two mirrored for the other half of the data. In Linux, MD RAID 10 is a non-nested RAID type like RAID 1 that only requires a minimum of two drives and may give read performance on the level of RAID 0."

Barcodes

And everywhere else “Reed–Solomon codes are used in a wide variety of commercial applications, most prominently in CDs, DVDs and Blu-ray Discs, in data transmission technologies such as DSL & WiMAX, in broadcast systems such as DVB and ATSC, and in computer applications such as RAID 6 systems.”

Are we done? Not at all.
Q1: Can we handle errors when the number of errors is close to the distance?
Q2: Can we decode a single bit more efficiently than decoding the whole string?
Q3: Are ECC useful for tasks other than error correction? (E.g., for propagating entropy?)

Johnson's bound
Observation: No unique decoding is possible when the number of errors is above half the distance.
Johnson's bound: Let C be a code with relative distance δ. Then for any α > 2·sqrt(1-δ) and ∀ w ∈ {0,1}^n, |{c ∈ C : Ag(w,c) ≥ αn}| ≤ O(1/α).

List decoding vs. stochastic noise
Def: A code C list-decodes p noise if ∀ w ∈ {0,1}^n, |{c ∈ C : Ag(w,c) ≥ (1-p)n}| ≤ poly(n).
Claim: A code C that can list-decode p noise has rate r < 1-H(p).
Claim: For any ε > 0, there exists a code C that can list-decode p noise with rate r > 1-H(p)-ε.

List decode RS?
The RS decoding algorithm we saw list-decodes RS close to Johnson's bound. An improvement by Guruswami-Sudan matches Johnson's bound: at rate r = k/n it needs sqrt(k/n) agreement, i.e., it decodes from noise rate p with r = (1-p)^2. It is not known whether one can list-decode RS any better.

PV - Bundled RS
F_q a field. E irreducible, deg(E) = n. F = F_q[X] mod E(X), an extension field. Given f ∈ F, compute f_i = f^(h^i) ∈ F, for i = 1..m. For every x ∈ F_q output (f_1(x),...,f_m(x)).
Farzad Parvaresh was born in Isfahan, Iran, in 1978. He received the B.S. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2001, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California, San Diego, in 2003 and 2007, respectively. He is currently a postdoctoral scholar at the Center for Mathematics of Information, California Institute of Technology, Pasadena, CA. His research interests include error-correcting codes, algebraic decoding algorithms, information theory, networks, and fun math problems. Dr. Parvaresh was a recipient of the best paper award at the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

GR - Folded RS
F_q a field, g a generator of F_q^*. Given f ∈ F, output a word in (F_q^c)^((q-1)/c) by chunking the evaluation sequence (f(g), f(g^2),...,f(g^(q-1))) into length-c blocks.
Rate vs. noise: for any ε > 0, decoding from a p fraction of errors is possible at rate r = 1-p-ε, with field size q = function(ε).

Local decoding
Up to now, decoding meant taking the noisy word y and recovering the whole message from it: x = D(y). Suppose we are only interested in one bit of x. Can we recover x_i with fewer queries to y? Note: we are still in the adversarial noise model.

2-local decoding of Hadamard
Setting: y = Had(x), y' = y ⊕ noise, with up to a 1/5 fraction of noise.
Goal: recover x_i from y', for i ∈ [n].
Algorithm: Choose z ∈ {0,1}^n at random. Output y'(z) ⊕ y'(z ⊕ e_i).
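
A simulation of this decoder (the parameters n = 10 and index 3 are my own choices):

```python
import random

n = 10
x = [random.randint(0, 1) for _ in range(n)]
# Hadamard codeword: y[z] = <x, z> mod 2, indexed by all z in {0,1}^n.
y = {z: sum(xi * ((z >> i) & 1) for i, xi in enumerate(x)) % 2
     for z in range(2 ** n)}
for z in random.sample(range(2 ** n), 2 ** n // 5 - 1):
    y[z] ^= 1   # adversary corrupts just under a 1/5 fraction of positions

def decode_bit(i):
    z = random.randrange(2 ** n)      # both queries are uniform, so each hits
    return y[z] ^ y[z ^ (1 << i)]     # noise w.p. < 1/5: correct w.p. > 3/5

votes = [decode_bit(3) for _ in range(101)]
print(sum(votes) > 50, x[3] == 1)     # majority vote matches x_3 w.h.p.
```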

Efremenko code
Lemma: All 2-locally decodable codes have exponential length.
Yekhanin gave the first sub-exponential 3-locally decodable code. Efremenko gave 3-locally decodable codes of length ≈ 2^(2^sqrt(log n)). Determining the true length of 3-locally decodable codes, even non-explicitly, is still wide open.

What's next
PCPs heavily use local testing. Derandomization heavily uses local list-decoding. Randomness extractors are codes with good list-recovery; many modern extractors are codes in disguise. There is an intimate connection between randomness extractors and pseudorandom generators. And much more...

Summary
A rich (and deep) theory. Practical applications. A very basic theoretical notion, intimately related to: randomness extractors, pseudo-randomness, derandomization, PCPs, and more.

Many open problems Is the GV bound tight for binary codes? Find efficient codes meeting the GV bound. Do asymptotically good locally testable codes exist? Is efficient local decoding with a constant number of queries possible?

A project
Read the paper "Nearly-linear size holographic proofs" by Polishchuk and Spielman, on low-degree testing of bivariate polynomials, and explain it to me.