# Coding Theory and its Applications 編碼及其應用 Hung-Lin Fu Dept. of Applied Mathematics National Chiao Tung University Hsin Chu, Taiwan.



Basic Ideas
- Message transmission
- Correctness and security
- Saving time and expense
- The study of security is the main job of cryptography.
- Coding theory deals not only with the correctness of transmission but also with its speed.

The Flow of Transmission
Message → Encode → Modulation → (through a noisy channel) → Demodulation → Decode → Original message

Examples
- Grades A, B, C, and D
- Use the digits 0 and 1 to encode them.
- A : 00, B : 01, C : 10, D : 11
- To send A, transmit 00.

Receiving
- After demodulation and decoding, we expect to receive the original message A.
- Unfortunately, errors are possible because of "noise".

Probability of Errors
- Let $p$ denote the probability of sending "0" and receiving "1".
- In a "symmetric channel", sending "1" and receiving "0" also has error probability $p$.
- If $t$ digits are transmitted, then the probability of making exactly $s$ errors is $C(t,s)\,p^s(1-p)^{t-s}$.
- The probability of making at least one error is $C(t,1)\,p\,(1-p)^{t-1} + C(t,2)\,p^2(1-p)^{t-2} + \dots + p^t = 1-(1-p)^t$.
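The binomial formulas above can be checked numerically. The following is a minimal Python sketch; the function names `prob_exact_errors` and `prob_any_error` are illustrative and do not come from the slides.

```python
from math import comb

def prob_exact_errors(t, s, p):
    """Probability of exactly s digit errors among t transmitted digits."""
    return comb(t, s) * p**s * (1 - p) ** (t - s)

def prob_any_error(t, p):
    """Probability of at least one error: the sum over s = 1, ..., t."""
    return sum(prob_exact_errors(t, s, p) for s in range(1, t + 1))

# Two-digit words over a channel with p = 0.01.
print(prob_any_error(2, 0.01))
# The same value through the closed form 1 - (1 - p)^t.
print(1 - (1 - 0.01) ** 2)
```

For two-digit words and $p = 0.01$, both expressions agree with 0.0199 up to floating-point rounding.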

[Figure: Symmetric channel. A transmitted 0 or 1 is received unchanged with probability $1-p$ and flipped with probability $p$.]

It Happens!
- Let $p = 0.01$.
- It looks small, but it is in fact very large for real-world transmission. If a million digits are transmitted in a minute, we get about 10,000 erroneous digits per minute.
- Therefore, if we use 00, 01, 10, and 11 for A, B, C, and D, errors in transmitted words do occur. The probability that a word is received in error is $2 \times 0.01 \times 0.99 + (0.01)^2 = 0.0199$.

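An Improvement
- Add a parity check digit: 00 → 000, 01 → 011, 10 → 101, 11 → 110.
- The probability of making errors "without noticing" is smaller!
- The probability of two or more errors in a word is $C(3,2) \times (0.01)^2 \times 0.99 + (0.01)^3 = 0.000298$. (A triple error changes the parity and is therefore noticed, so only the first term, 0.000297, goes undetected.)
- We can add more digits instead of just one.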
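The parity check code can be enumerated directly. This sketch, with the illustrative names `add_parity` and `undetected_error_prob`, counts an error as unnoticed exactly when the received word is a different codeword. Under that assumption only double errors go unnoticed, so the result is the $C(3,2)\,p^2(1-p)$ term of the slide's expression.

```python
from itertools import product

def add_parity(bits):
    """Append one digit so that the word contains an even number of 1s."""
    return bits + (sum(bits) % 2,)

# The four codewords 000, 011, 101, 110.
CODE = {add_parity(m) for m in product((0, 1), repeat=2)}

def undetected_error_prob(code, p):
    """Probability that the zero codeword is received as another codeword.

    The code is linear, so every sent codeword gives the same value.
    """
    n = len(next(iter(code)))
    total = 0.0
    for received in code:
        flips = sum(received)  # distance from the all-zero word
        if flips:
            total += p**flips * (1 - p) ** (n - flips)
    return total

print(sorted(CODE))
print(undetected_error_prob(CODE, 0.01))
```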

Error Correction
- When an error occurs, we may not know which digit is wrong, so we "ask for retransmission".
- Retransmission is not always possible.

The Idea of Correcting Errors
- 00 → 000000
- 01 → 010101
- 10 → 101010
- 11 → 111111
- Assume that 101110 is received. We conclude that the word sent was 101010!

Hamming Distance
- A message of $n$ digits can be expressed as an $n$-dimensional vector over the finite field GF(2).
- E.g. 010101 ↔ (0,1,0,1,0,1).
- Let $K$ = GF(2).
- $K^n$ is the set of all $2^n$ such vectors.

A New Metric
- Let $(a_1, a_2, \dots, a_n)$ and $(b_1, b_2, \dots, b_n)$ be two vectors of $K^n$. The Hamming distance between them is the number of indices $k \in \{1, 2, \dots, n\}$ such that $a_k - b_k \neq 0$.
- E.g. d(101010,101110) = 1, d(000000,101110) = 4, d(111111,101110) = 2, d(010101,101110) = 5.
- Hamming distance is a "metric"!
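The definition translates into a few lines of Python. This is a small sketch in which words are written as bit strings; the name `hamming_distance` is chosen here for illustration.

```python
def hamming_distance(a, b):
    """Number of positions in which two words of equal length differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

# The four examples from the slide, all measured against 101110.
for word in ("101010", "000000", "111111", "010101"):
    print(word, hamming_distance(word, "101110"))
```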

Distance and Decoding
- If two words $u$ and $v$ of length $n$ are at distance $d$, then the probability of sending $u$ and receiving $v$ is $p^d(1-p)^{n-d}$.
- Fact: If $d(w,u) > d(v,u)$ and $u$ is received, then $v$ is more likely than $w$ to have been sent (assuming $p < 1/2$).
- E.g. let 000000, 010101, 101010, and 111111 be the four possible sent words, and suppose 101110 is received. Then we choose 101010 as the word sent.

Maximum Likelihood Decoding
- Let $C$ be the code used for transmission and $u$ the word received through the channel.
- CMLD (Complete Maximum Likelihood Decoding): if $v$ minimizes $d(v,u)$ over all codewords of $C$, conclude that $v$ is the transmitted codeword, whether or not $v$ is unique.
- IMLD (Incomplete Maximum Likelihood Decoding): if $v$ (as above) is not unique, ask for retransmission.
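The two decoding rules can be sketched as nearest-codeword searches. The function names are illustrative. Returning the first nearest codeword on a tie (for CMLD) and `None` to signal a retransmission request (for IMLD) are assumptions made for this sketch, not something the slides specify.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_codewords(code, received):
    """All codewords at minimum Hamming distance from the received word."""
    best = min(hamming_distance(c, received) for c in code)
    return [c for c in code if hamming_distance(c, received) == best]

def decode_cmld(code, received):
    """Complete MLD: always answer with a nearest codeword (first one on a tie)."""
    return nearest_codewords(code, received)[0]

def decode_imld(code, received):
    """Incomplete MLD: answer only if the nearest codeword is unique, else None."""
    candidates = nearest_codewords(code, received)
    return candidates[0] if len(candidates) == 1 else None

CODE = ["000000", "010101", "101010", "111111"]
print(decode_cmld(CODE, "101110"))      # 101010
print(decode_imld(["00", "11"], "01"))  # None: both codewords are at distance 1
```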

Linear Codes
- A code of length $n$ is a subset of $K^n$.
- A linear code of length $n$ is a linear subspace of $K^n$. (Vectors are added coordinate by coordinate in $K$.)
- A linear $(n,k,d)$-code is a linear code of dimension $k$ and distance $d$, where $d$ is the minimum distance between two distinct codewords.

Weights of Codewords
- Each vector of a code is called a codeword.
- The weight of a codeword is the number of 1's in it.
- E.g. wt(101011) = 4.
- Proposition. The distance of a linear code is equal to the minimum weight of a nonzero codeword.
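The proposition can be illustrated on the four-word repetition-style code used earlier. This is a brute-force sketch with illustrative helper names. It checks one example and is not a proof.

```python
from itertools import combinations

def xor_words(a, b):
    """Coordinate-wise sum over GF(2) of two bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def is_linear(code):
    """A binary code is linear iff it is closed under coordinate-wise sums."""
    words = set(code)
    return all(xor_words(a, b) in words for a in words for b in words)

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(code, 2))

def min_nonzero_weight(code):
    """Minimum number of 1s over the nonzero codewords."""
    return min(w.count("1") for w in code if "1" in w)

CODE = ["000000", "010101", "101010", "111111"]
print(is_linear(CODE), min_distance(CODE), min_nonzero_weight(CODE))
```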

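Main Theorem

Theorem. A code with distance $d$ can detect $d-1$ errors and correct $\lfloor (d-1)/2 \rfloor$ errors.

Proof. Let $u$ and $w$ be two distinct codewords of the code $C$, so $d(u,w) \ge d$. If $u$ is sent and $v$ is received with $1 \le d(u,v) \le d-1$, then $v$ is not a codeword, so the error is detected. If $d(u,v) \le \lfloor (d-1)/2 \rfloor$, the triangle inequality gives $d(v,w) \ge d(u,w) - d(u,v) > d(u,v)$, so $u$ is the unique codeword closest to $v$.

[Figure: the words u, v, and w]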

Better Codes
- The length of a codeword determines the "time" of transmission.
- The dimension $k$ of a linear code gives the information rate $k/n$.
- The distance of a code tells you how many errors can be detected (or corrected).
- The bits that are not information bits are parity check bits; there are $n-k$ of them.
- $A(n,d)$ is the maximum number of words of length $n$ such that the distance between any two of them is at least $d$. A code $C$ is $(n,d)$-optimal if $C$ has $A(n,d)$ codewords. ($A[n,d]$ is the corresponding quantity for linear codes.)

The Most Important Problem in Coding Theory
- Given two positive integers $n$ and $d$ where $d < n$, determine $A(n,d)$ and $A[n,d]$.
- $A(7,3) \le 2^7 / (1+7) = 16$ (sphere packing bound).
- $A(7,3) = 16$ (by direct constructions).
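The sphere packing bound divides the $2^n$ words of $K^n$ by the number of words within distance $\lfloor (d-1)/2 \rfloor$ of a codeword. A minimal sketch for binary codes follows, with the illustrative function name `sphere_packing_bound`.

```python
from math import comb

def sphere_packing_bound(n, d):
    """Upper bound on A(n, d) for binary codes of length n and distance d."""
    radius = (d - 1) // 2
    # Number of words within Hamming distance `radius` of a fixed word.
    ball = sum(comb(n, i) for i in range(radius + 1))
    return 2**n // ball

print(sphere_packing_bound(7, 3))  # 16, i.e. 2^7 / (1 + 7)
```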

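Two Constructions
- Use a Steiner triple system of order 7: {1,2,4}, {2,3,5}, {3,4,6}, {4,5,7}, {5,6,1}, {6,7,2}, {7,1,3}.
- The 16 codewords are the incidence vectors of the seven triples, their complements, and the two words 0000000 and 1111111.

| Triple | Incidence vector | Complement |
|---|---|---|
| {1,2,4} | 1101000 | 0010111 |
| {2,3,5} | 0110100 | 1001011 |
| {3,4,6} | 0011010 | 1100101 |
| {4,5,7} | 0001101 | 1110010 |
| {5,6,1} | 1000110 | 0111001 |
| {6,7,2} | 0100011 | 1011100 |
| {7,1,3} | 1010001 | 0101110 |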
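The construction from the Steiner triple system can be reproduced in a few lines. In this sketch the helper names are illustrative. The code builds the incidence vector of each triple and its complement, adds the all-zero and all-one words, and checks the minimum distance by brute force.

```python
from itertools import combinations

TRIPLES = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7), (5, 6, 1), (6, 7, 2), (7, 1, 3)]

def incidence_word(block, n=7):
    """Bit string with a 1 in position i exactly when point i lies in the block."""
    return "".join("1" if i in block else "0" for i in range(1, n + 1))

def complement(word):
    """Flip every digit of a bit string."""
    return "".join("1" if c == "0" else "0" for c in word)

def sts_code(triples):
    """Incidence vectors, their complements, and the two constant words."""
    words = {"0000000", "1111111"}
    for block in triples:
        w = incidence_word(block)
        words.update((w, complement(w)))
    return words

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b)) for a, b in combinations(sorted(code), 2))

CODE = sts_code(TRIPLES)
print(len(CODE), min_distance(CODE))  # 16 codewords at minimum distance 3
```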

Parity Check Matrix
- The code we plan to construct is a linear code of dimension 4.
- Using a $7 \times 3$ matrix $H$ of rank 3, the set of vectors $v$ satisfying $vH = 0$ forms a linear subspace of $K^7$ of dimension 4.
- Let

$$H^t = \begin{pmatrix} 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 & 1 \end{pmatrix}$$
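The code defined by $vH = 0$ can be listed by brute force. This sketch rests on one assumption: the rows of $H$ are the seven distinct nonzero vectors of length 3, so the third row of $H^t$ is read as 1 0 0 1 1 0 1. With that reading, a single error in position $i$ produces a syndrome equal to row $i$ of $H$, which is what `correct_single_error` uses. All function names are illustrative.

```python
from itertools import product

# Rows of H (columns of H^t), assumed to be the seven nonzero vectors of length 3.
H = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

def syndrome(v, H):
    """The row vector v H computed over GF(2)."""
    return tuple(sum(v[i] * H[i][j] for i in range(len(H))) % 2 for j in range(len(H[0])))

# All words of K^7 whose syndrome is zero.
CODE = [v for v in product((0, 1), repeat=7) if not any(syndrome(v, H))]
MIN_WEIGHT = min(sum(v) for v in CODE if any(v))

def correct_single_error(v, H):
    """Flip the one position whose row of H equals the syndrome, if any."""
    s = syndrome(v, H)
    if not any(s):
        return tuple(v)
    i = H.index(s)  # every nonzero syndrome is exactly one row of H
    return tuple((bit ^ 1) if k == i else bit for k, bit in enumerate(v))

print(len(CODE), MIN_WEIGHT)  # 16 codewords, minimum weight 3
```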

BCH Codes
- BCH stands for Bose, Chaudhuri, and Hocquenghem.
- The code we just constructed is a 1-error-correcting BCH code.
- Since the rows of $H$ are nonzero and no two rows (vectors) are the same, a nonzero vector $v$ satisfying $vH = 0$ has weight at least 3. Hence the distance of the code is 3 (there are 3 rows which are dependent).
- The rows of $H$ can be considered as the set of all nonzero elements of GF($2^3$).

A Different Point of View
- $K^n$ can be viewed as the set of all polynomials of degree at most $n-1$ with coefficients in $K$.
- Let $R_n = K[x]/(x^n+1)$, so that $x^n = 1$. Then $R_n$ with polynomial addition and multiplication is a ring.
- If $f(x)$ is a divisor of $x^n+1$, then the set of all multiples of $f(x)$ is a linear (cyclic) code of dimension $n - \deg f(x)$.

Quiz
- Consider $R_7$.
- Is $x^7+1 = (1+x)(1+x+x^3)(1+x^2+x^3)$? (Hint: $1 = -1$ and $(1+x)^2 = 1+x^2$.)
- The set of all polynomials in $R_7$ which are multiples of $1+x+x^3$ forms a linear code with 16 codewords. This is "essentially the same" code as constructed above.
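The quiz can be checked by machine. In this sketch a polynomial over GF(2) is stored as a Python integer whose bit $i$ is the coefficient of $x^i$. That representation and the function names are choices made here, not part of the slides.

```python
def gf2_mul(a, b):
    """Product of two GF(2) polynomials stored as bit masks (carry-less multiply)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def reduce_mod(poly, n):
    """Reduce modulo x^n + 1, that is, replace x^n by 1."""
    mask = (1 << n) - 1
    while poly >> n:
        poly = (poly & mask) ^ (poly >> n)
    return poly

def to_word(poly, n):
    """Coefficient string, constant term first."""
    return "".join(str((poly >> i) & 1) for i in range(n))

F1, F2, F3 = 0b11, 0b1011, 0b1101  # 1+x, 1+x+x^3, 1+x^2+x^3
PRODUCT = gf2_mul(gf2_mul(F1, F2), F3)

# All multiples of 1+x+x^3 in R_7, written as words of length 7.
CYCLIC_CODE = {to_word(reduce_mod(gf2_mul(m, F2), 7), 7) for m in range(2**7)}

print(PRODUCT == (1 << 7) | 1)  # True: the product is x^7 + 1
print(len(CYCLIC_CODE))         # 16
```

The generator $1+x+x^3$ itself is the word 1101000, the incidence vector of the triple {1,2,4}.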

Reed-Solomon Codes
- Instead of using $K$ = GF(2), we use $K$ = GF($q$), where $q$ is a prime power. (It is well known that a finite field of order $q$ exists.) The codewords are then vectors with coordinates from GF($q$). The code used in CDs takes $q = 2^8$.
- An RS($2^r$, $d$)-code is a linear cyclic $(2^r-1,\ 2^r-d,\ d)$-code over GF($q$) generated by $(x+b^{m+1})(x+b^{m+2}) \cdots (x+b^{m+d-1})$, where $q = 2^r$, $m$ is a nonnegative integer, and $b$ is a primitive element of GF($q$).

Design of Compact Discs (Key Contributions)
- 1948: C.E. Shannon publishes "A Mathematical Theory of Communication."
- 1950: R.W. Hamming publishes his work on error detecting and error correcting codes.
- 1958: Invention of the laser.
- 1960: Experiments in computer music begin.

Story, Continued
- 1960: I.S. Reed and G. Solomon construct Reed-Solomon codes.
- 1969: Klaas Compaan, a Dutch physicist, comes up with the idea for the compact disc.
- 1970: Compaan completes a glass disc prototype and decides to use a laser.
- 1978: Philips releases the video disc player, and the type of laser for CD players is selected.
- 1980: CD standard proposed by Philips and Sony.
- 1982: Philips and Sony both have products ready to go.

Keep Going
- 1983: 30,000 CD players and 800,000 CDs sold in the U.S.
- 1984: Portable CD players (Sony Discman) go on sale.
- 1985: CD-ROM drives hit the computer market.
- 1990: 9.2 million players sold in the U.S. alone and about one billion CDs sold worldwide.
- 1997: DVD released. DVD players and movies hit the consumer market.
- Now, we cannot live without it.

A Brief Overview
- Data storage in CD format is not simple. Typically, a user pictures the "1's" and "0's" in the memory of the computer as being directly transferred to "pits" and "bumps" on the CD.
- To begin with, the incoming data is subjected to a series of coding operations. These operations add a number of parity bits to the data for error detection and correction. The data is also subjected to an interleaving process.

Concealment
- Interpolation: In this technique, an "average" is constructed from the valid data around an error and substituted for the erroneous data. Since most music (with the possible exception of heavy metal!) is continuous, this method works well for concealing relatively short errors.
- Muting: Muting is a last-ditch technique, as it effectively creates a brief period of silence in the audio stream. However, it is not effective to simply set all the binary digits to zero, as this produces exactly the click that we are trying to avoid! Instead, the volume is faded out and then back in again to conceal the error.

Error-Correcting Ability
- CD players use parity and interleaving techniques to minimize the effects of an error on the disc. Theoretically, the combination of parity and interleaving in a CD player can detect and correct a burst error of up to 4000 bad bits, or a physical defect 2.47 mm long. Interpolation can conceal errors of up to 13,700 bits, or physical defects up to 8.5 mm long. (Burst-error-correcting codes)

EFM Modulation
- EFM means Eight to Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0-to-1 and 1-to-0 transitions, thus avoiding small pits. In EFM, only those 14-bit combinations are used in which at least two and at most ten zeros appear between consecutive ones.
- E.g. 0000 1010 → (EFM) → 10010001000000.
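The run-length rule can be tested mechanically. The sketch below assumes the usual reading of the EFM constraint: at least two zeros between consecutive ones and no run of zeros longer than ten anywhere in the 14-bit word. It does not reproduce the actual EFM lookup table, only the constraint, and the function names are illustrative.

```python
from itertools import product

def zero_runs(word):
    """Lengths of the runs of zeros delimited by the ones in a bit string."""
    return [len(run) for run in word.split("1")]

def satisfies_run_lengths(word, min_zeros=2, max_zeros=10):
    """True if the word obeys the assumed EFM run-length constraint."""
    runs = zero_runs(word)
    interior = runs[1:-1]  # runs that sit between two ones
    return all(r >= min_zeros for r in interior) and all(r <= max_zeros for r in runs)

VALID = ["".join(bits) for bits in product("01", repeat=14)
         if satisfies_run_lengths("".join(bits))]

print(satisfies_run_lengths("10010001000000"))  # True: the slide's example
print(satisfies_run_lengths("00001010"))        # False: only one zero between the ones
print(len(VALID))                               # 267
```

Under this reading 267 of the $2^{14}$ possible words qualify, which is enough to give each of the 256 byte values its own 14-bit pattern.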

[Figures omitted: Figure 2; pits; Figure 4]

Encoding
- The original musical signal is a waveform in time. A sample of this waveform is taken and "digitized" into two 16-bit words, one for the left channel and one for the right channel.
- For example, a single sample of the musical signal might look like: L1 = 0111 0000 1010 1000, R1 = 1100 0111 1010 1000.
- Six samples (six of the left and six of the right, for a total of twelve words) are taken to form a frame such as L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6.

Sound Has $2^{16}$ Levels
- The frame is then encoded in the form of 8-bit words. Each 16-bit audio word turns into two 8-bit words, its left half and its right half: L1-left L1-right R1-left R1-right L2-left L2-right R2-left R2-right L3-left L3-right R3-left R3-right L4-left L4-right R4-left R4-right L5-left L5-right R5-left R5-right L6-left L6-right R6-left R6-right. This gives a grand total of 24 8-bit words. ((L,R) produces stereo effects, and one second has 44,100 samples.)
- The even words are then delayed by two blocks and the resulting "word" scrambled.
- This delay and scramble is the first part of the interleaving process.
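Splitting the samples into bytes is simple bit arithmetic. This sketch assumes that the "left" half of a 16-bit word is its most significant byte. The function names are illustrative, and the delay and scramble steps are not modeled.

```python
def split_sample(sample):
    """Split a 16-bit word into its left (high) and right (low) 8-bit halves."""
    return (sample >> 8) & 0xFF, sample & 0xFF

def frame_bytes(left, right):
    """Turn six left and six right 16-bit samples into the 24 bytes of a frame."""
    out = []
    for l_sample, r_sample in zip(left, right):
        out.extend(split_sample(l_sample))
        out.extend(split_sample(r_sample))
    return out

L1 = 0b0111_0000_1010_1000  # the example sample from the Encoding slide
R1 = 0b1100_0111_1010_1000
frame = frame_bytes([L1] * 6, [R1] * 6)  # six identical samples, for illustration only
print([format(b, "08b") for b in frame[:4]])
print(len(frame))
```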

RS Codes Show Up!
- Encoded by C(227):(28,24,5)-RS: The resulting 24-byte word (remember, it includes a two-block delay, so some symbols in this word come from two blocks behind) has 4 bytes of parity added. This particular parity is called "Q" parity. Parity errors found in this part of the algorithm are called C1 errors. More on the Q parity later.
- 4-frame delay interleaving: Now the resulting 28-byte word (24 data bytes plus 4 Q bytes) is interleaved. Each of the 28 bytes is delayed by a different period, and each period is an integral multiple of 4 blocks. So the first byte might be delayed by 4 blocks, the second by 8 blocks, the third by 12 blocks, and so on. The interleaving spreads the word over a total of 28 x 4 = 112 blocks.
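The delay interleaving can be pictured with a toy model. This sketch assumes that byte $i$ (counting from 1) of every word is delayed by exactly $4i$ blocks, following the "4, 8, 12, and so on" pattern in the text. The delay table of a real CD encoder may differ, and `delay_interleave` is an illustrative name.

```python
def delay_interleave(frames, step=4, fill=None):
    """Delay byte i (0-based) of every frame by step * (i + 1) frames."""
    width = len(frames[0])
    total = len(frames) + step * width
    out = [[fill] * width for _ in range(total)]
    for t, frame in enumerate(frames):
        for i, byte in enumerate(frame):
            out[t + step * (i + 1)][i] = byte
    return out

# Label each byte with (frame number, byte position) to follow where it goes.
frames = [[(t, i) for i in range(28)] for t in range(3)]
out = delay_interleave(frames)

# Output frames that carry a byte of input frame 0.
rows = [t for t, row in enumerate(out) if any(b is not None and b[0] == 0 for b in row)]
print(len(rows), rows[0], rows[-1])  # 28 different frames, from 4 to 112
```

Because the 28 bytes of one word end up in 28 different output frames, a burst that destroys a few consecutive frames damages at most a few bytes of any single word.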

Another RS Code
- Encoded by C(223):(32,28,5)-RS: The resulting 28-byte words are again subjected to a parity operation. This generates four more parity bytes, called P bytes, which are placed at the end of the 28-byte data word. The word is now a total of 28 + 4 = 32 bytes long. Parity errors found in this part of the algorithm are called C2 errors.
- Finally, another odd-even delay is performed, but this time by just a single block. Both the P and Q parity bits are inverted (turning the "1's" into "0's") to assist data readout during muting.

EFM
- A subcode of length 8 is then added to the front end of the word. The subcode specifies the total number of selections on the disc, their lengths, and so on.
- Next, the data words are converted to EFM format. EFM means Eight to Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0-to-1 and 1-to-0 transitions, thus avoiding small pits. In EFM, only those 14-bit combinations are used in which at least two and at most ten zeros appear between consecutive ones.

Encode the Sound

Each frame finally has a 24-bit synchronization word attached to the very front end (just for completeness, the word is 100000000001000000000010), and each group of 14 bits is then coupled to the next by three merging bits. So the final frame (which started at 6*16*2 = 192 data bits) now contains:

| Component | Count | Bits |
|---|---|---|
| Sync word | 1 | 24 |
| Subcode signal | 1 | 14 |
| Data bits (each 8-bit word becomes 14 bits) | 6*2*2*14 | 336 |
| Parity bits | 8*14 | 112 |
| Merge bits | 34*3 | 102 |
| Grand total | | 588 |
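The bit count of a frame is plain arithmetic and can be recomputed as follows. The constant and function names are illustrative. The 34 merge groups are read here as the sync word plus the 33 EFM symbols, which is an interpretation of the 34*3 entry.

```python
SYNC_BITS = 24
EFM_BITS_PER_SYMBOL = 14   # every 8-bit word becomes 14 channel bits
MERGE_BITS_PER_GROUP = 3

SUBCODE_SYMBOLS = 1
DATA_SYMBOLS = 6 * 2 * 2   # six samples, two channels, two bytes per 16-bit word
PARITY_SYMBOLS = 4 + 4     # four Q bytes and four P bytes

def frame_bits():
    """Total number of channel bits in one frame."""
    symbols = SUBCODE_SYMBOLS + DATA_SYMBOLS + PARITY_SYMBOLS  # 33 EFM symbols
    groups = symbols + 1                                       # plus the sync word
    return SYNC_BITS + symbols * EFM_BITS_PER_SYMBOL + groups * MERGE_BITS_PER_GROUP

print(frame_bits())                  # 588
# With 44,100 samples per second and six samples per frame:
print(44_100 // 6 * frame_bits())    # 4321800 channel bits per second
```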

Final Words
- Exercise more, and your body stays healthy!
- Study more mathematics, and your mind stays sharp!

You are lucky!
