1 ELEC692 VLSI Signal Processing Architecture Lecture 10 Viterbi Decoder Architecture
2 Outline
Convolutional Code Structure
–Encoder structure
–Finite state machine representation
–Trellis diagram
Decoding Algorithm
–Viterbi decoder
Viterbi Decoder VLSI Architecture
3 Convolutional Code
Coding adds redundancy to the original data bits for error checking or error correction.
–E.g. error checking – parity check code
–Error correction codes are either block codes or convolutional codes. The classification depends on the presence or absence of memory.
A block code has no memory: each output codeword of an (n,k) block code depends only on the current buffer.
–k is the # of original data bits and n is the # of encoded bits.
The encoder adds n-k redundant bits to the buffered bits. The added bits are algebraically related to the buffered bits. The encoded block contains n bits.
The ratio k/n is known as the code rate.
4 Convolutional Code
A convolutional coder may process one or more samples during an encoding cycle.
–It is described by 3 integers: n, k, and K.
–k/n = code rate (information/coded bits).
–But n does not define a block or codeword length.
–K is the constraint length and is a measure of the code redundancy.
–The encoder acts on the serial bit stream as it enters the transmitter.
–Convolutional codes have memory: the n-tuple emitted by the encoder is a function not only of an input k-tuple, but also of the previous K-1 input k-tuples.
5 Encoder Structure
Map k bits to n bits using the previous (K-1)k bits: a rate k/n code with constraint length K.
There are n generator polynomials, each a binary vector K bits long.
The following shows the case where k=1 (easily extendable).
Example: k=1, n=2, K=3, g1 = [101] = 1 + z^-2, g2 = [111] = 1 + z^-1 + z^-2.
(Figure: shift register with two mod-2 adders mapping input (b1, b2, …) to output (c1, c2, c3, c4, …).)
6 Basic Channel Coding for Wideband CDMA
The convolutional codes are rate 1/3 and rate 1/2, all with constraint length 9.
(Figure: convolutional codes and concatenated codes.)
7 Convolutional Encoding
Let m = m1, m2, …, mi, … denote the input message bits.
U = U1, U2, …, Ui, … denotes the codeword sequence, with Ui = u1i, u2i, …, uni = ith codeword and uji = jth binary code symbol of Ui.
Let Z = Z1, Z2, …, Zi, … denote the demodulated sequence, with Zi = z1i, z2i, …, zni; from Z the decoder forms an estimate of the input message bits.
8 Convolutional Encoding
Information source → Convolutional Encoder → Modulate → AWGN Channel → Demodulate → Convolutional Decoder → Information sink.
Input sequence m = m1, m2, …, mi, …; codeword sequence U = G(m) = U1, U2, …, Ui, …, where Ui = u1i, …, uji, …, uni; transmitted signal si(t).
Demodulated sequence Z = Z1, Z2, …, Zi, …, where Zi = z1i, …, zji, …, zni and zji is the jth demodulator output symbol of branch word Zi.
9 Convolutional Encoding
A general convolutional encoder with constraint length K and rate k/n consists of a kK-stage shift register and n mod-2 adders.
–K = number of k-bit shifts over which a single information bit can influence the output.
–At each unit of time: k bits are shifted into the 1st k stages of the register; all bits in the register are shifted k stages to the right; the outputs of the n adders are sequentially sampled to give the coded bits.
There are n coded bits for each input group of k information (message) bits. Hence R = k/n information bits/coded bit is the code rate (k < n).
10 Convolutional Encoder (with constraint length K and rate k/n)
The input sequence m = m1, m2, …, mi, … is shifted in k bits at a time into a kK-stage shift register feeding n modulo-2 adders.
Codeword sequence U = U1, U2, …, Ui, …, where Ui = u1i, …, uji, …, uni = ith codeword branch and uji = jth binary code symbol of branch word Ui.
Typically binary codes with k=1 are used. Hence, we will mainly consider rate 1/n codes.
11 Convolutional Code Representation
To describe a convolutional code, we must describe the encoding function G(m) that characterizes the relationship between the information sequence m and the output coded sequence U.
There are 4 popular methods of representation:
–Connection pictorials and connection polynomials
–State diagram
–Tree diagram
–Trellis diagram
12 Connection Representation
Specify n connection vectors gi (i = 1, …, n), one for each of the n mod-2 adders. Each vector has dimension K and describes the connection of the shift register to its mod-2 adder.
A 1 in the ith position of the connection vector implies that shift register stage is connected to the adder; a 0 in the ith position implies no connection exists.
13 Convolutional Encoder (K=3, Rate 1/2)
g1 = 1 1 1 and g2 = 1 0 1, or equivalently g1(X) = 1 + X + X^2 and g2(X) = 1 + X^2.
If the initial register content is 0 0 0 and the input sequence is 0 0 1, then the output (impulse response) sequence is 11 10 11.
U1 is the first code symbol of each branch and U2 the second.
14 Example (for the previous code)
Message m = 1 0 1: message bits are input at t1, t2, t3; (K-1) = 2 zeros are input at t4, t5 to flush the register, and another 0 is input at t6 to get 00.
t1: register 1 0 0, output u1u2 = 11
t2: register 0 1 0, output 10
t3: register 1 0 1, output 00
t4: register 0 1 0, output 10
t5: register 0 0 1, output 11
t6: register 0 0 0, output 00
Output sequence: 11 10 00 10 11
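The encoding procedure above is easy to mechanize. Below is a minimal Python sketch of this k=1, n=2, K=3 encoder (g1 = 111, g2 = 101); the function name and interface are illustrative, not from the lecture:

```python
def conv_encode(msg, flush=True):
    """Rate-1/2, K=3 convolutional encoder with g1 = 1+X+X^2, g2 = 1+X^2."""
    bits = list(msg) + ([0, 0] if flush else [])  # (K-1) zeros flush the register
    s1 = s2 = 0                                   # shift register: previous two inputs
    out = []
    for b in bits:
        out += [b ^ s1 ^ s2, b ^ s2]              # u1 from g1 = 111, u2 from g2 = 101
        s1, s2 = b, s1                            # shift right by one stage
    return out

print(conv_encode([1, 0, 1]))  # -> [1, 1, 1, 0, 0, 0, 1, 0, 1, 1], i.e. 11 10 00 10 11
```

This reproduces the output sequence 11 10 00 10 11 of the example (without the final idle 00 at t6).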
15 State Representation
The state of a rate 1/n code = contents of the rightmost K-1 stages. Knowledge of the state and the next input is necessary and sufficient to determine the next output.
Codes can be represented by a State Diagram whose states represent the possible contents of the rightmost K-1 stages of the shift register.
From each state there are only 2 transitions (to the next state), corresponding to the 2 possible input bits.
The transitions are represented by paths on which we write the output word associated with the state transition.
–A solid line path corresponds to an input bit 0.
–A dashed line path corresponds to an input bit 1.
16 State Diagram for our Code (K=3, Rate 1/2)
States: a=00, b=10, c=01, d=11. Each branch is labeled with its output branch word (00, 11, 10, 01, …); solid lines correspond to input bit 0 and dashed lines to input bit 1.
(Figure: state diagram.)
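The state diagram can also be tabulated mechanically. A small sketch (names are illustrative) for the same g1 = 111, g2 = 101 code, with the state holding the last two input bits, most recent first:

```python
def step(state, b):
    """One branch of the K=3, rate-1/2 state diagram (g1 = 111, g2 = 101).
    state = 2-bit integer (b_{t-1}, b_{t-2}); b = current input bit.
    Returns (next_state, (u1, u2))."""
    b1, b2 = (state >> 1) & 1, state & 1
    u1, u2 = b ^ b1 ^ b2, b ^ b2       # output branch word
    return ((b << 1) | b1, (u1, u2))   # next state = (b, b_{t-1})

# enumerate every branch of the state diagram
for s in range(4):
    for b in (0, 1):
        ns, (u1, u2) = step(s, b)
        print(f"state {s:02b} --input {b} / output {u1}{u2}--> state {ns:02b}")
```

For instance, from state b=10 with input 1 the branch word is 01 and the next state is d=11, matching the diagram.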
17 Example
Assume that m = 11011 is the input, followed by K-1 = 2 zeros to flush the register, and that the initial register contents are all zeros. Find the output sequence U.
Time  Input mi  State ti -> ti+1  Branch word u1 u2
t1    1         00 -> 10          1 1
t2    1         10 -> 11          0 1
t3    0         11 -> 01          0 1
t4    1         01 -> 10          0 0
t5    1         10 -> 11          0 1
t6    0         11 -> 01          0 1
t7    0         01 -> 00          1 1
Output sequence: U = 11 01 01 00 01 01 11
18 Tree Diagram Representation
The tree diagram is similar to the state diagram, except that it adds the dimension of time.
The code is represented by a tree where each tree branch describes an output word.
–If the input is 0, we move to the next rightmost branch in the upward direction.
–If the input is 1, we move to the next rightmost branch in the downward direction.
Using the tree diagram, one can dynamically describe the encoder as a function of a particular input sequence.
19 Tree Diagram for our Code
(Figure: tree over t1 … t5. The structure repeats itself after the 3rd branching, at t4. The heavy line represents m = 1 1 0 1 1, with output codeword U = 11 01 01 00 01.)
20 Trellis Diagram Representation
In general, the tree structure repeats itself after K branchings (K = constraint length).
Label each node in the tree by its corresponding state. Each transition from a node state produces 2 nodes (2 states).
Any 2 nodes having the same state label, at the same time, can be merged, since all succeeding paths will be indistinguishable.
The diagram we get by doing so is called the Trellis diagram.
21 Trellis Diagram for our Code
(Figure: trellis over t1 … t6 with states a=00, b=10, c=01, d=11; branches are labeled with codewords 00, 11, 10, 01; solid lines are input bit 0, dashed lines input bit 1. The trellis structure repeats itself after depth K = 3.)
22 Decoding of Convolutional Codes
Maximum Likelihood Decoding
Viterbi Algorithm
23 Maximum Likelihood Decoding
Let U(m) denote one of the possible (say, the mth) transmitted sequences and Z the received sequence.
The optimum decoder (which minimizes probability of error) is the one that maximizes P(Z|U(m)). I.e., the optimum decoder chooses the sequence U(j) if P(Z|U(j)) = max over all m of P(Z|U(m)).
This is known as the Maximum Likelihood Decoder.
24 Maximum Likelihood Metric
Assume a memoryless channel, i.e., the noise components are independent. Then, for a rate 1/n code,
P(Z|U(m)) = prod over i of P(Zi|Ui(m)) = prod over i, j of P(zji|uji(m)),
where Zi is the ith branch of Z. The problem is then to find a path (each path defines a codeword) through the trellis (or tree) that maximizes
log P(Z|U(m)) = sum over i, j of log P(zji|uji(m)).
25 Maximum Likelihood Metric
The function we need to maximize is known as the log-likelihood function or log-likelihood metric.
To find the optimum path, we could compare all possible paths in the tree or trellis and pick the path that maximizes the log-likelihood metric. This is known as the brute-force or exhaustive approach.
The brute-force approach is not practical, as the # of paths grows exponentially as the path length increases.
The optimum algorithm for solving this problem is the Viterbi Decoding Algorithm, or Viterbi Decoder.
26 Binary Symmetric Channel (BSC)
p = crossover probability, also called the channel symbol error probability or channel BER.
27 Log-Likelihood Metric
Assume that U(m) and Z are each L bits long and that they differ in dm positions, i.e., the Hamming distance between them is dm. Then
log P(Z|U(m)) = dm log p + (L - dm) log(1-p) = -A dm + B,
where A = log((1-p)/p) is a positive constant (as p < 0.5) and B = L log(1-p) does not depend on m.
28 Log-Likelihood Metric
Since A > 0 and B does not depend on m, maximizing the log-likelihood metric is equivalent to minimizing the Hamming distance dm.
Maximum Likelihood (ML) Decoder (Hard Decision Decoding):
–Choose, in the tree or trellis diagram, the path whose corresponding sequence is at the minimum Hamming distance from the received sequence Z.
–I.e., choose the minimum distance metric.
Hard-decision maximum likelihood decoding = minimum Hamming distance decoding.
29 Viterbi Decoding (R=1/2 & K=3)
The decoder tries to find the minimum-distance path; each trellis branch is labeled with its branch metric, the Hamming distance between the received symbols and the branch word.
(Figure: trellis over t1 … t6 with states a=00, b=10, c=01, d=11, annotated with the branch metrics for the input data sequence m, the transmitted codeword U, and the received sequence Z, which contains channel errors.)
30 Viterbi Decoder
Basic idea: if any 2 paths in the trellis merge to a single state, one of them can always be eliminated in the search.
–E.g., at time t5, 2 paths merge to (enter) state 00.
The cumulative Hamming path metric of a given path at ti = sum of the branch Hamming distance metrics along that path up to time ti.
–The upper path metric is 4 and the lower path metric is 1.
–The upper path thus cannot be part of the optimum path, since the lower path entering the same state has a lower metric.
–This is true because future output branches depend only on the current state and not on the previous states.
31 Path Metrics for 2 Merging Paths
(Figure: trellis t1 … t5 showing two paths merging into state a=00, with path metric 4 for the upper path and path metric 1 for the lower path.)
32 Viterbi Decoding
At time ti there are 2^(K-1) states in the trellis, where K is the constraint length. (NB: the # of states is an important complexity measure for Viterbi decoders.)
Each state can be entered from 2 states.
Viterbi decoding consists of computing the metrics for the 2 paths entering each state and eliminating one of them.
This is done for each of the 2^(K-1) nodes at time ti; the decoder then moves to time ti+1 and repeats the process.
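The whole procedure fits in a few lines of Python. This is a minimal hard-decision sketch for the K=3, rate-1/2 code used throughout (g1 = 111, g2 = 101); names are illustrative, and for simplicity each survivor keeps its full input history rather than using a separate path memory:

```python
def viterbi_decode(rx, n_msg):
    """Hard-decision Viterbi decoder for the K=3, rate-1/2 code (g1=111, g2=101).
    rx: received bits (2 per trellis stage); n_msg: # of message bits to return.
    State = 2-bit integer holding the last two input bits, newest bit first."""
    INF = float('inf')
    pm = [0, INF, INF, INF]              # start in the all-zeros state 00
    paths = [[], [], [], []]             # input history of each survivor
    for i in range(0, len(rx), 2):
        r1, r2 = rx[i], rx[i + 1]
        new_pm, new_paths = [INF] * 4, [None] * 4
        for s in range(4):
            if pm[s] == INF:
                continue
            b1, b2 = (s >> 1) & 1, s & 1
            for b in (0, 1):                        # extend survivor with input b
                u1, u2 = b ^ b1 ^ b2, b ^ b2        # branch word
                ns = (b << 1) | b1                  # next state
                bm = (u1 != r1) + (u2 != r2)        # Hamming branch metric
                if pm[s] + bm < new_pm[ns]:         # add-compare-select
                    new_pm[ns] = pm[s] + bm
                    new_paths[ns] = paths[s] + [b]
        pm, paths = new_pm, new_paths
    best = min(range(4), key=lambda s: pm[s])       # global most-likely path
    return paths[best][:n_msg]

# codeword of m = 11011 (with flush) from the earlier example, with one bit error
rx = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
print(viterbi_decode(rx, 5))  # -> [1, 1, 0, 1, 1]
```

Since df = 5 for this code, a single channel error is well within its t = 2 error-correcting capability, so the message is recovered.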
33 Viterbi Decoding Example
(Figure: four snapshots (a)–(d) of the trellis as decoding proceeds, with the surviving path metrics updated at each step, e.g. Γa=2, Γb=0 after the first stage and Γa=3, Γb=3, Γc=2, Γd=0 later.)
34 Viterbi Decoding Example (continued)
(Figure: snapshots (e)–(h) continuing the decoding, with path metrics such as Γa=1, Γb=1, Γc=3, Γd=2 and later Γa=2, Γb=2, Γc=2, Γd=1.)
35 Convolutional Code Distance Properties
The minimum distance between all pairs of possible codewords is quite important and is related to the error-correcting capability of the code. To compute it we can simply consider the all-zeros sequence (since the code is linear), assuming the all-zeros path is the correct one.
–An error event (or errors) occurs when there exists a path which starts and ends at the a=00 state at time ti (but does not return to the 00 state in between) with a metric smaller than that of the all-zeros path at ti. In this case, we say the correct path does not survive.
The minimum distance of such an error path can be found by an exhaustive search over all possible error events.
36 Trellis Labeled with Distances from the All-Zeros Path
(Figure: trellis t1 … t6 with states a=00, b=10, c=01, d=11, each branch labeled by its Hamming distance (0, 1, or 2) from the all-zeros path.)
37 Minimum Distance
In the previous example there are:
–1 path with distance 5 (merges at t4), corresponding to the input sequence 1 0 0.
–2 paths at distance 6 (one merges at t5 and the other at t6): 1 1 0 0 and 1 0 1 0 0.
df = minimum free distance = minimum distance over all arbitrarily long paths that diverge and remerge.
df = 5 in this case, and the code can correct t = 2 errors.
A code can correct any t channel errors where t = ⌊(df - 1)/2⌋ (this is an approximation).
38 Formalized Viterbi Algorithm
Use the maximum likelihood decoding procedure: find the closest sequence of symbols in the given trellis, using either the Euclidean distance or the Hamming distance as the distance measure. The resulting sequence is called the global most-likely sequence.
For a received N-state sequence v containing L symbols, v = {v(0), v(1), …, v(L-1)}, where the first symbol v(0) is received at time instance 0 and the last one v(L-1) at time instance L-1, the Viterbi decoder iteratively computes the survivor path entering each state at time instances 1, …, L-1.
The survivor path for a given state at time instance n is the sequence of symbols closest in distance to the received sequence up to time n.
39 Viterbi Algorithm
Path metric xi(n) – a metric assigned to each state, denoting the distance between the survivor path for state i and the received sequence up to time n.
Branch metric aij(n) – the distance between the current received symbol v(n) and the output symbol of the corresponding transition in the encoding trellis.
From time instance n to n+1, the Viterbi algorithm updates the survivor paths and the path metrics xj(n+1) from the survivor path metrics at time instance n and the branch metrics aij(n) in the given trellis:
xj(n+1) = min over all predecessor states i of [xi(n) + aij(n)].
The updating mechanism is based on an optimization technique called dynamic programming.
40 Viterbi Algorithm
Let PM(s0=a, sn=b) be the maximum path metric (sum of accumulated branch metrics BM) from s0=a to sn=b.
Then we can calculate PM(s0=a, s10=b) easily if we know PM(s0=a, s9=s) for all possible s, particularly those that have a branch to state b in the trellis:
PM(s0=a, s10=b) = max over s of [PM(s0=a, s9=s) + BM(s9=s, s10=b)].
At this point, we can eliminate one of the two paths entering each state.
41 Example
For this encoding trellis (g1(z) = 1 + z^-2, g2(z) = 1 + z^-1 + z^-2), assume that at time instance n the path metrics for the 4 states are x1(n)=2, x2(n)=0, x3(n)=1, x4(n)=2, and the received symbol is v(n)=11.
Using the Hamming distance as the measure of distance, we obtain the branch metrics for all transitions in the trellis.
(Figure: 4-state trellis S00, S01, S10, S11 from time n to time n+1, with branches labeled input/output: 0/00, 1/11, 0/11, 0/01, 1/10, 1/00, 0/10, 1/01.)
42 Example
The survivor path and its path metric for each state are updated from time n to n+1. Of the 2 possible paths entering each state, the one with the larger metric is discarded.
The update process is carried out iteratively from n = 1 to n = L.
43 Example
The global most-likely sequence is the survivor path of the state with the minimum path metric at time L, i.e. the state ind^-1(min over i of xi(L)), where ind^-1 means "take the index of the corresponding state".
Optimality is guaranteed because dynamic programming algorithms have the property that the optimum solution from the initial iteration to iteration n+m must consist of the optimum solution from the initial iteration to iteration n and from iteration n to iteration n+m.
44 Example Figure 1.11
45 Computation in the Viterbi Algorithm
Computing the branch metrics aij(n)
Updating the path metrics
–Requires an add, compare, and select (ACS) for every state at each time instance
Selecting the final state
Tracing back its survivor path
46 Design and Implementation of Viterbi Decoders
A real Viterbi decoder needs to consider the following practical problems:
–Arbitrarily long decoding delays cannot be tolerated. The decoder has to output decoded information bits before the entire encoded message has been retrieved.
–Incoming analog signals have to be quantized by an ADC.
–The decoder may be brought on line in the middle of a transmission and will thus not know where one n-bit block ends and the next begins. Block synchronization is needed.
47 Block Diagram of a practical Viterbi decoder
48 Quantization
There is a difference in performance between an un-quantized soft-decision decoder and a hard-decision decoder; B-bit quantization provides decoder performance in between the two.
B=3 (8-level) quantization introduces only a slight reduction in performance (~0.25 dB).
49 Block Synchronizer
Segments the received bit stream into n-bit blocks, each block corresponding to a stage in the trellis.
If the received bits are not properly divided up, the results are disastrous. We can use this disastrous nature to help draw the block boundary:
–If the boundary is correct, one or a few partial path metrics will be much lower than the others after a few constraint lengths of branch metric computations.
–If the alignment is wrong, the metrics tend to be random: all paths have similar partial path metrics and there is no dominant path.
–We can use this to detect "out-of-sync" and adjust the block boundary until it is fixed.
–A simple threshold suffices for this detection.
50 Branch Metric (BM) Computer
Typically based on a look-up table containing the various bit metrics: look up the n bit metrics associated with each branch and sum them to obtain the branch metric.
For a symmetric channel, the BM calculation is simpler: the second row of the bit metric table is simply a reversed image of the first row.
The same look-up function is performed n times per branch for each of the 2 MK branches per stage in the trellis.
–An extremely fast decoder may need n·2 MK look-up table circuits, or a simple decoder can use the same look-up table n·2 MK times.
The number of bits required for the BM can be reduced by simplification and approximation, e.g.:
M(r|y)  r = 0'  0  1  1'
y=0        5   4  2  0    (need 3 bits)
y=1        0   2  4  5
approximated as
M(r|y)  r = 0'  0  1  1'
y=0        3   2  1  0    (need only 2 bits)
y=1        0   1  2  3
51 Path Metric Updating and Storage
Basic trellis element of a rate 1/n convolutional code: states Sj,t and Sj+2^(M-1),t at time t feed states S2j,t+1 and S2j+1,t+1 at time t+1, with branch metrics Mj,2j(rt+1), Mj,2j+1(rt+1), Mj+2^(M-1),2j(rt+1), and Mj+2^(M-1),2j+1(rt+1).
A common circuit, the add-compare-select (ACS) unit, calculates this basic trellis element: adders sum V(Sj,t) and V(Sj+2^(M-1),t) with the corresponding branch metrics, a comparator picks the smaller sum, and a mux selects the survivor V(S2j,t+1) or V(S2j+1,t+1).
–Parallel or single ACS units can be used depending on the throughput requirement.
52 Information Sequence Updating and Storage
This unit is responsible for keeping track of the information bits associated with the surviving paths.
Two basic design approaches: register exchange and trace back.
–Both need a shift register associated with every trellis node throughout the decoding operations.
53 Decoding Depth (or Survivor Path Length)
The # of bits that a register must be capable of storing is a function of the decoding depth. At some point during decoding, the decoder can begin to output information bits.
The information bits associated with a survivor branch at time t can be released when the decoder begins operation on the branches at time t+δ, where the decoding depth δ is usually set to five to ten times the constraint length of the code.
The meaning of the survivor path length is that after tracing back that far, the survivor paths from all possible starting states should have merged, and the input corresponding to the transition from the state at time t is decoded.
One register of length δ is needed per state. Once a register is full (t = δ), the oldest bits in the register are output as new bits are entered; the registers are thus FIFOs of fixed length.
54 Example of a Trellis
(Figure: 4-state trellis S0 … S3 for a rate-1/3 encoder with input …x2, x1, x0 and outputs …y2(0), y1(0), y0(0); …y2(1), y1(1), y0(1); …y2(2), y1(2), y0(2); branches labeled input/output 0/000, 1/111, 0/110, 0/111, 0/001, 1/001, 1/000, 1/110.)
55 Register Exchange
The register for a given node at a given time contains the information bits associated with the surviving partial path that terminates at that node.
As the decoding operations proceed, the contents of the registers in the bank are updated and exchanged as dictated by the surviving branches.
Hardware intensive: each register must be able to send and receive strings of bits to and from two other registers. But simple to implement.
(Figure: register banks for S0 … S3 at t=0 … t=5; e.g. at t=5 the banks hold 10100, 11001, 11010, 10111.)
56 Trace Back
There is a register for each state, but the contents of the registers do not move back and forth. Each register contains the past history of the surviving branches entering that state.
Information bits are obtained by "tracing" back through the trellis as dictated by this connection history.
The states in the state diagram (or trellis) are associated with the encoder shift-register contents.
–E.g. state S2 corresponds to the encoder shift-register contents 01.
–In general, a state Sxy can be preceded only by state Sy0 or Sy1.
–A zero or one may thus be used to uniquely designate the surviving branch entering a given state.
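The trace-back rule above can be sketched in Python. This is an illustrative toy (names and the example history are constructed here, not from the lecture): the state is a 2-bit pair (x, y), the stored decision bit d designates the predecessor (y, d), and the decoded input at each step is the newest state bit x. The example history corresponds to the single surviving path of m = 11011 plus two flush zeros from the earlier example:

```python
def traceback(history, final_state):
    """history[t][s]: stored bit d for the surviving branch entering 2-bit state
    s = (x, y) at step t; the predecessor of s is state (y, d).
    The decoded input at step t is x, the newest bit of the state."""
    bits = []
    s = final_state
    for t in range(len(history) - 1, -1, -1):
        x, y = (s >> 1) & 1, s & 1
        bits.append(x)                    # input bit that produced state s
        s = (y << 1) | history[t][s]      # step back to the predecessor state
    return bits[::-1]

# survivor history for the path 00->10->11->01->10->11->01->00
# (m = 1 1 0 1 1 plus two flush zeros); only surviving branches are stored
history = [{2: 0}, {3: 0}, {1: 1}, {2: 1}, {3: 0}, {1: 1}, {0: 1}]
print(traceback(history, 0))  # -> [1, 1, 0, 1, 1, 0, 0]
```

Unlike register exchange, no bits are moved between state registers; only the one-bit predecessor pointers are read while walking backwards.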
57 Trace Back Register Contents
(Figure: register banks for S0 … S3 at t=0 … t=5 holding the surviving-branch history bits; e.g. at t=5 the banks hold 00011, 00110, 0100, 0101.)
58 Low-Power ACS Unit for IS-95
E.g. IS-95: rate 1/2, K=9 convolutional code with generator functions g0 = 753 (octal) and g1 = 561 (octal).
(Figure: encoder producing code bits c0 (from g0) and c1 (from g1) from the information bit input.)
59 The path metric coming into state j from state i at recursion n: PM(i,j)n = BM(i,j) + PMi,n-1.
The branch metric BM(i,j) is the squared distance between the received noisy symbol yn and the ideal noiseless output symbol of that transition.
60 Branch Metric Calculation
For IS-95 (K = 9, code rate = 1/2), there are 2 competing paths arriving at each state at each cycle. Branch metric calculation: BMi,j,t = (yt - xi,j)^2.
For IS-95, n=2, so there are only 4 different BMs.
Carefully examining the rate 1/2 convolutional code, we find that these few BMs are shared across the trellis, where m can be any one of the 512 possible branches. Consequently, there is no need for additional additions in the BMU.
61 Path Metric Calculation
Partial path metrics of the 2 competing paths (m1 and m2), from states s1 and s2 to state s at cycle i:
–PMi(m1) = PMi-1(s1) + BMi(s1,s)
–PMi(m2) = PMi-1(s2) + BMi(s2,s)
After the new partial path metrics are calculated, the following comparison is carried out:
–PMi(s) = min(PMi(m1), PMi(m2))
For IS-95 there are 256 states, so 512 add and 256 compare-and-select operations have to be done for every decoded bit.
Compared with the BMU and SMU, the number of ACS operations is significant, and hence reducing its power consumption is essential.
62 Conventional ACS Unit
One ACS operation requires reading two path metric values.
Butterfly operation: states Si1 and Si2 at time t-1 feed states So1 and So2 at time t.
The number of read accesses can be reduced if the ACS operations that calculate the survivor paths at So1 and So2 are done together.
63 Bit Width Requirement for Path Metrics
Re-normalization of the path metric values is required to avoid overflow, but it increases the number of unnecessary operations in the ACSU.
Modulo normalization [Shung 1990]: if the path metric memory can represent a range > 2·Dmax, where Dmax is the maximum possible difference between the path metrics, no explicit normalization is required.
For IS-95, the maximum number of bits required for the path metrics is 9 if the bit precision of the received symbol is 4.
64 ACSU: Modulo Normalization
All binary values are evenly distributed on a circle, and the PMs run clockwise around the circle, wrapping on overflow.
To compare two path metrics, compute the (n-1)th bit of a straightforward 2's-complement subtraction of the two 9-bit numbers: with m1 = (m1,8, …, m1,0), m2 = (m2,8, …, m2,0), and d = (d8, …, d0) = m1 - m2, the sign bit d8 decides the comparison.
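A sketch of this modulo comparison in Python for 9-bit metrics (the function name is illustrative); it is valid whenever the metric difference stays below 2^8, as modulo normalization guarantees:

```python
NBITS = 9
MASK = (1 << NBITS) - 1  # path metrics live in [0, 2^9 - 1] and wrap around

def mod_less(m1, m2):
    """True if m1 is the smaller metric on the modulo circle, assuming
    |m1 - m2| < 2^(NBITS-1). Decided purely by the MSB (sign bit) of the
    2's-complement difference, so no normalization is ever needed."""
    d = (m1 - m2) & MASK
    return (d >> (NBITS - 1)) & 1 == 1

print(mod_less(3, 10))   # True: 3 is the smaller metric
print(mod_less(510, 2))  # True: 510 is smaller; 2 has wrapped past 511
```

The second call shows the point of the scheme: even after metric 2 has overflowed past 511, the sign bit of the wrapped difference still orders the two metrics correctly.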
65 Architecture of the Conventional ACSU
For a butterfly operation, 4 additions of a 9-bit path metric with a 5-bit branch metric and 2 9-bit comparisons are needed.
(Figure: conventional butterfly: PMt-1(sa) and PMt-1(sb) are each added to BMt(sa,S0), BMt(sb,S0), BMt(sa,S1), BMt(sb,S1); two comparators produce PMt(S0) and PMt(S1).)
66 Re-arranging the ACS Calculation in the Butterfly
For calculating PMt(S0), instead of finding min(PMt-1(sa) + BMt(sa,S0), PMt-1(sb) + BMt(sb,S0)), we can compare the values PMt-1(sa) - PMt-1(sb) and BMt(sb,S0) - BMt(sa,S0) instead.
Similarly, for calculating PMt(S1) we compare PMt-1(sa) - PMt-1(sb) with BMt(sb,S1) - BMt(sa,S1).
Both computations share PMt-1(sa) - PMt-1(sb), so one computation can be saved.
For IS-95, the two values BMt(sb,S0) - BMt(sa,S0) and BMt(sb,S1) - BMt(sa,S1) can be precomputed and stored.
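The algebraic equivalence behind this rearrangement is easy to check; a minimal Python sketch with illustrative names:

```python
def acs_conventional(pm_a, pm_b, bm_a, bm_b):
    # direct form: two adds, then compare the two sums
    return min(pm_a + bm_a, pm_b + bm_b)

def acs_rearranged(pm_a, pm_b, bm_a, bm_b):
    # compare the shared PM difference against the (precomputable) BM
    # difference; only the winning add is then actually needed
    if pm_a - pm_b < bm_b - bm_a:
        return pm_a + bm_a
    return pm_b + bm_b

# exhaustive check over small metric ranges confirms the two forms agree
assert all(
    acs_conventional(pa, pb, ba, bb) == acs_rearranged(pa, pb, ba, bb)
    for pa in range(8) for pb in range(8) for ba in range(4) for bb in range(4)
)
```

Since pm_a - pm_b < bm_b - bm_a is just pm_a + bm_a < pm_b + bm_b rearranged, the same survivor is selected, and the single subtraction pm_a - pm_b is reused for both output states of the butterfly.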
67 The Proposed ACSU Architecture
For a butterfly operation, 1 9-bit subtraction, 2 additions of a 9-bit value with a 5-bit value, and 2 comparisons are needed.
(Figure: new butterfly: a subtractor forms PMt-1(sa) - PMt-1(sb); comparators compare it against the precomputed BMt(sb,S0) - BMt(sa,S0) and BMt(sb,S1) - BMt(sa,S1); adders then produce PMt(S0) and PMt(S1).)
68 Comparison of the Two Architectures
69 Pre-computation Architecture
The number of comparisons required during the ACS operation can be further reduced using a pre-computation concept.
The comparison is between a 9-bit datum and a 6-bit datum. Instead of doing a full 9-bit comparison, we use the 4 MSBs of the 9-bit datum and the sign bit of the 6-bit datum to pre-determine whether the magnitude of the 9-bit datum is larger; if not, a 5-bit comparator compares the magnitude of the 6-bit datum with the 5 LSBs of the 9-bit datum.
70 Pre-computation Architecture
(Figure: a subtractor forms Ni[8:0] = PMi-1(m) - PMi-1(m'); Di[5:0] = BMi(sb,S0) - BMi(sa,S0). Ni[7:5] and Di[5] feed the precomputation logic; gated 5-bit registers feed a 5-bit comparator testing Ni[4:0] >= Di[4:0]; the output is the select signal Sel_sa/Sel_sb.)
71 Pre-computation Architecture
A two-stage pipeline calculates the selection signal.
At the first stage, Ni[8:5] and Di[5] are used to pre-compute the condition for selecting sa or sb. When the condition is detected, the clock signal going to the 2 5-bit registers is gated to save the power of the 5-bit comparison.
72 Results
Both the conventional and the proposed ACSU were synthesized with Synopsys using the MOSIS 0.8 μm technology library.
Power consumption was estimated using a gate-level power simulator.
Simulation vectors were generated in compliance with the IS-95 and IS-98 standards.
73 Memory Organization for Path Metrics
For an M-state Viterbi decoder, we need to store M path metrics. Since the path metrics at time i+1 are computed from the path metrics at time i, it seems necessary to double-buffer the path metric memory, i.e. to provide 2·M memory locations.
One way to eliminate the double buffer is to use in-place computation:
–We need only the metrics of the M present states (ji, ji-1, …, ji-k+2, x), for the M choices of x, to compute the metrics for the M next-state hypotheses (y, ji, ji-1, …, ji-k+2), for the M choices of y.
–If the M metrics needed are read from memory, then M memory locations become available to store the M newly computed metrics, and no double buffering is required.
74 In-Place Computation
It is natural to treat the contents of the shift register, (ak, ak-1, …, a1), as a k-digit M-ary number and use this number as the address into the memory of path metrics. Such an addressing scheme, however, is inconsistent with writing new metrics over old metrics.
Consider the example of M = 2, k = 3:
–The decoder has eight hypotheses ending in 000=0, 001=1, 010=2, …, 111=7.
–The natural order would store stage-i metrics in table locations 0 through 7.
–But the two successors of, say, 000 and 001 are 000 and 100. This means we read metrics from locations 0 and 1 and write them (by definition of natural order) to locations 0 and 4. This is not in-place.
75 In-Place Computation
Suppose the path metrics are originally placed in natural order. After one, two, and three stages of decoding we see the evolution of the memory organization for the path metric pointers.
Computing in-place means writing the results of the current path metric update back into the locations that held the metrics used to compute them.
To guarantee in-place computation, we need an addressing scheme that changes after each decoding cycle. E.g. at the first cycle we read inputs 0 and 1, write outputs 0 and 4, and put them in the locations of 0 and 1. At the second cycle we again need inputs 0 and 1, but now 0 is stored in location 0 while 1 is stored in location 2. So the addressing of the input must change every cycle.
76 In-Place Computation
From the previous figure, we can see that if the path metric of the hypothesis with shift register contents (a,b,c) (i.e. the state contents) at time i is in memory location 4a+2b+c, then the path metric of the hypothesis with shift register contents (a,b,c) at time i+1 will be in location 4c+2a+b.
In general, the metrics accessed together are found by generating their natural addresses and rotating the bits of these addresses by i places before reading (or writing) the metrics from (or into) memory. A cyclic shift of i places is identical to a cyclic shift of i modulo k places.
77 Example
Left rotate by 1 bit: 000→000, 001→010, 010→100, 011→110, 100→001, 101→011, 110→101, 111→111.
Left rotate by 2 bits: 000→000, 001→100, 010→001, 011→101, 100→010, 101→110, 110→011, 111→111.
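The address rotation above is a one-liner; a Python sketch for k-bit addresses (the function name is illustrative):

```python
def rotl(addr, i, k=3):
    """Cyclically left-rotate a k-bit address by i places (i is taken mod k)."""
    i %= k
    return ((addr << i) | (addr >> (k - i))) & ((1 << k) - 1)

# reproduce the two rotation tables above
print([f"{a:03b}->{rotl(a, 1):03b}" for a in range(8)])
print([f"{a:03b}->{rotl(a, 2):03b}" for a in range(8)])
```

Taking i modulo k makes a rotation by k (one full decoding period) the identity, matching the observation that a cyclic shift of i places equals a shift of i modulo k places.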
78 Survivor path memory organization To “prune” the survivor paths, the hypothesis with the lowest path metric is identified and its oldest symbols become the decoder output. The oldest symbols may then be dropped from all survivor paths. A symbol may be pruned from the path memory once each decoding cycle, or p symbols may be pruned after every p decoding cycles.
79 Survivor path memory organization For minimum error rate, the length of the survivor-path memory field should be made as large as possible. A rule of thumb is that four or five constraint lengths is adequate. For M = 2, the constraint length is k + 1. A practical case has k = 6, so a survivor-path memory field of 35 bits is implied. It is inconvenient to handle such a long field all at once, although the operations needed are quite simple. To store the survivor path, we can use a pointer mechanism to avoid handling the entire field. Since each pointer can only point to M ancestors, the pointer can be abbreviated to an M-ary symbol. This M-ary symbol is identical to the M-ary symbol that is appended to the path. Thus no extra storage is needed for the pointers, as we can interpret the path-memory contents themselves as pointers.
80 Survivor path memory organization During each decoding cycle i, an M-ary choice is recorded in the i-th digit position of the survivor-path field for each of the M^k surviving hypotheses. –E.g. for the hypothesis with shift-register contents (a_i, a_{i-1}, …, a_{i-k+1}), the symbol stored is x. To find its predecessor we look in digit position i-1 of the memory word whose address is (a_{i-1}, …, a_{i-k+1}, x). –If we read a y there, we look in digit position i-2 of the memory word whose address is (a_{i-2}, …, a_{i-k+1}, x, y). –The procedure continues backward in the same manner.
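This pointer-chasing can be sketched in a few lines. The code below is my own illustration (function names, the toy encoder, and the bit conventions are assumptions): for M = 2 the stored pointer is the register bit shifted out when a state was entered, the newest bit of each visited state is the decoded input, and the predecessor is found by shifting the pointer bit back in.

```python
K = 3                       # number of state bits (shift-register length)
MASK = (1 << K) - 1

def traceback(pointers, end_state):
    """Recover the input bits by chasing stored pointer symbols.

    pointers[t][s] holds the symbol recorded for state s at cycle t:
    the register bit shifted out when s was entered (new bit at the MSB).
    """
    s = end_state
    bits = []
    for t in range(len(pointers) - 1, -1, -1):
        bits.append(s >> (K - 1))        # newest register bit = input bit
        x = pointers[t][s]               # pointer to the predecessor
        s = ((s << 1) | x) & MASK        # predecessor state: shift x back in
    bits.reverse()
    return bits

# Drive it with a toy encoder that records the shifted-out bit as the pointer.
msg = [1, 0, 1, 1, 0, 0, 1, 0]
s, pointers = 0, []
for b in msg:
    col = [0] * (1 << K)
    col[(b << (K - 1)) | (s >> 1)] = s & 1   # symbol stored for the new state
    s = (b << (K - 1)) | (s >> 1)
    pointers.append(col)

assert traceback(pointers, s) == msg
```

Because the stored symbol doubles as the pointer, the traceback needs no storage beyond the path memory itself, exactly as the slide argues.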
81 Example of the survivor path memory organization, M = 2, k = 3
82 Survivor path memory organization Whenever two survivor paths agree on k successive pointers, they must necessarily have converged, so we only need to trace back one such path to prune and decode. If the path-memory field is L M-ary symbols wide, we may decode after p decoding cycles, obtaining p decoded symbols, then overwrite new path symbols into the newly freed digit positions during the next p decoding cycles. A new symbol is stored in digit position (i mod L) of the path during decoding cycle i.
83 Survivor Sequence Memory Management supporting simultaneous updating and reading of the memory Here we discuss several different survivor-sequence memory-management schemes that support simultaneous updating and reading of the memory. The traceback memory is organized as a 2-dimensional structure, with rows and columns. –# of rows = # of states N = 2^v. –Each column stores the results of the N comparisons corresponding to one symbol interval, or one stage. 3 types of operations inside a trace-back decoder: –Traceback Read (TB) – reading a bit and interpreting it, in conjunction with the present state number, as a pointer that indicates the previous state number. Pointer values are not output as decoded values. The traceback runs to a predetermined depth T before being used to initiate the decode read operation.
84 Survivor Sequence Memory Management supporting simultaneous updating and reading of the memory 3 types of operations inside a trace-back decoder (cont.): –Decode Read (DC) – the same operation as TB, but on older data, with the state number of the first DC in a memory bank determined by the previously completed traceback. Pointer values are the decoded values and are sent to the bit-order-reversing circuit. One traceback read of T columns allows decode reads over multiple columns. –Write New Data (WR) – decisions made by the ACS are written into the locations corresponding to the states. Data are written into locations just freed by the DC operations. For every set of column-write operations (N bits wide), an average of one decode read must be performed. * ref: G. Feygin and P. G. Gulak, “Architectural Tradeoffs for Survivor Sequence Memory Management in Viterbi Decoders,” IEEE Transactions on Communications, pp. 425–429, March 1993.
85 K-pointer Even Algorithm K=3
86 K-pointer Even Algorithm The memory is divided into 2k2 memory banks, each of size T/(k2-1) columns. Each read pointer performs the traceback operation in k2-1 memory banks and the decode read in one memory bank. Every T stages, a new traceback front is started from the fixed state that has the best path metric. Since the traceback depth T must be reached before decoding can be performed, the k2-1 traceback banks must span at least T columns. Total memory required: 2k2 * (T/(k2-1)) columns. The decoded bits are generated in reverse order, so a scheme is required for reversing their ordering. –A simple two-stack LIFO performs the bit-order reversal. Each stack is T/(k2-1) deep. During decoding, decoded bits are pushed onto one stack while the bits stored on the other stack are popped. Upon completion of the decoding of a given memory bank, the stacks switch roles from pushing to popping and vice versa.
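The bank bookkeeping above reduces to simple arithmetic. The helper below is an illustrative sketch (the function name and return convention are my own) that just evaluates the counts stated on the slide:

```python
def k_pointer_even_memory(T, k):
    """Bank layout for the k-pointer even traceback algorithm.

    T : traceback depth in trellis stages (assumed divisible by k - 1)
    k : number of read pointers
    Returns (number_of_banks, columns_per_bank, total_columns).
    """
    assert T % (k - 1) == 0, "choose T a multiple of k - 1 for whole banks"
    cols = T // (k - 1)      # each bank holds T/(k-1) columns
    banks = 2 * k            # 2k banks in total
    return banks, cols, banks * cols

# e.g. a traceback depth of T = 64 served by k = 3 read pointers:
# 6 banks of 32 columns, 192 columns of memory in all.
```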
87 K-pointer Odd Algorithm
88 K-pointer Odd Algorithm There are 2k2-1 memory banks, each of length T/(k2-1) columns, so the total length is (2k2-1)T/(k2-1). A two-stack LIFO structure is also required to perform the bit-order reversal. The decode pointer and the write pointer always point to the same column in the memory, although the decode pointer reads only one memory location per stage, while the write pointer sequentially updates the locations corresponding to all states in a given trellis stage. It is necessary to perform decoding before new data can be written; otherwise memory still in use may be overwritten.
89 One-pointer algorithm Different from the k-pointer algorithms, which use k read pointers to perform the required k reads for every column write, a single read pointer with accelerated read operations is used: every time the write counter advances by one column, k column reads occur. The acceleration relies on the fact that, among writing new data, traceback read, and decode read, writing new data is the most time-consuming operation – 2^v bits are written every stage, compared with only k bits read per stage. There are k1+1 memory banks, each T/(k1-1) columns long. The single read pointer produces the decoded bits in bursts. –During the decode read operation in the k1-th memory bank, decoded bits are generated at a rate of k1 per stage. –A 2-stack structure can perform both bit-order reversal and burst elimination at the same time.
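For comparison with the k-pointer schemes, the one-pointer bank count can be evaluated the same way. This is an illustrative helper (names my own), encoding the sizes stated above:

```python
def one_pointer_memory(T, k1):
    """Bank layout for the one-pointer traceback algorithm.

    The single read pointer advances k1 columns for every column written,
    so k1 - 1 banks of T/(k1 - 1) columns cover the traceback depth T,
    and k1 + 1 banks are needed in total.
    Returns (number_of_banks, columns_per_bank, total_columns).
    """
    assert T % (k1 - 1) == 0, "choose T a multiple of k1 - 1 for whole banks"
    cols = T // (k1 - 1)
    return k1 + 1, cols, (k1 + 1) * cols

# For T = 64, k1 = 3: 4 banks of 32 columns = 128 columns, versus the
# 192 columns required by the 3-pointer even algorithm at the same depth.
```

The saving over the k-pointer even algorithm is the point of the scheme: fewer idle banks, at the cost of faster (k1-column-per-stage) reads.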
90 One-pointer algorithm
91 Hybrid algorithm Combines features of the k-pointer and one-pointer algorithms: k column reads per stage are performed using k2 read pointers, each advancing at a rate of k1 columns per stage (k = k1*k2 and k <= T+1).
92 Hybrid algorithm
93 Radix-4 Viterbi Decoder Radix-2 Trellis and ACS: radix-2 trellis, 2-way ACS, radix-2 ACS unit
94 Radix-4 ACS A 2^v-state trellis can be iterated from time index n-k to n by decomposing the trellis into 2^(v-k) sub-trellises, each consisting of k iterations of a 2^k-state trellis. Each 2^k-state subtrellis can be collapsed into an equivalent one-stage radix-2^k trellis by applying k levels of lookahead to the recursive ACS update. E.g. an 8-state radix-4 trellis.
95 Radix-4 ACS Parallel and serial implementations of the ACS unit: –Parallel – one ACS butterfly for each pair of states. –Serial – for large constraint lengths a fully parallel implementation may not be feasible, so a single ACS butterfly (or fewer butterflies than states) is time-shared. Throughput can be increased if the number of ACS iterations per stage is reduced; radix-4 ACS halves the number of ACS iterations. If the critical path of a radix-4 ACS is the same as that of a radix-2 ACS, a potential 2-fold speedup is achievable. Of course, this potential speedup comes with a complexity increase, since the radix-4 ACS is more complex; for this reason, ACS units of still higher radix are not very practical.
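The lookahead equivalence can be checked numerically. In the sketch below (my own illustration; the metrics and the mini-trellis wiring are arbitrary), one target state is reached through two intermediate states, and a single radix-4 select over the four two-step branch sums yields the same survivor metric as two cascaded radix-2 steps:

```python
def acs2(m0, b0, m1, b1):
    """2-way add-compare-select: returns (survivor metric, decision bit)."""
    s0, s1 = m0 + b0, m1 + b1
    return (s0, 0) if s0 <= s1 else (s1, 1)

def acs4(metrics, branches):
    """4-way ACS: one radix-4 step standing in for two radix-2 steps."""
    sums = [m + b for m, b in zip(metrics, branches)]
    best = min(range(4), key=lambda j: sums[j])
    return sums[best], best

m = [3.0, 1.0, 4.0, 1.5]   # path metrics of the four time-(n-2) ancestors
b = [0.5, 2.0, 1.0, 0.2]   # radix-2 branch metrics, stage 1
c = [0.3, 0.7]             # radix-2 branch metrics, stage 2

# Two radix-2 iterations through intermediate states u0 and u1.
mu0, _ = acs2(m[0], b[0], m[1], b[1])
mu1, _ = acs2(m[2], b[2], m[3], b[3])
two_step, _ = acs2(mu0, c[0], mu1, c[1])

# One radix-4 iteration: each two-step branch metric is the sum of the
# radix-2 branch metrics along the corresponding two-step path (lookahead).
lookahead = [b[0] + c[0], b[1] + c[0], b[2] + c[1], b[3] + c[1]]
one_step, _ = acs4(m, lookahead)

assert one_step == two_step
```

The hardware trade-off mirrors this: the radix-4 unit needs four adders and a 4-way compare per state instead of two adders and a 2-way compare, which is the complexity increase mentioned above.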
96 Radix-4 ACS (cont.): radix-4 trellis, 4-way ACS, radix-4 ACS unit
97 A 4-way ACS Block diagram