An Efficient FPGA Implementation of IEEE 802.16e LDPC Encoder Speaker: Chau-Yuan-Yu Advisor: Mong-Kai Ku.


Outline
- Introduction
  - Low-Density Parity-Check Codes
- Related work
  - General encoding for LDPC codes
  - Efficient encoding for dual-diagonal matrix
- Better encoding scheme
- LDPC encoder architecture
  - Parallel encoder
  - Serial encoder
- Results
- Conclusion


Low-Density Parity-Check Code
Benefits of LDPC codes:
- Performance approaches the Shannon limit
- Low error floor
- Adopted by various standards (e.g. DVB-S2, 802.11n, 802.16e)

Low-Density Parity-Check Code
- The parity-check matrix H is sparse: very few 1s in each row and column.
- The null space of H is the codeword space: c is a valid codeword if and only if H c^T = 0.
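As a small illustration of the null-space condition, the sketch below checks H c^T = 0 over GF(2). The (7,4) Hamming parity-check matrix and the helper names `syndrome` and `is_codeword` are stand-ins chosen for brevity, not taken from the slides; a real LDPC matrix would be far larger and sparser.

```python
# Toy parity-check matrix (a (7,4) Hamming code), used only as a stand-in
# for a real, much larger and sparser LDPC matrix.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(H, c):
    """Return H * c^T over GF(2) as a list of bits."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

def is_codeword(H, c):
    """A word is a valid codeword exactly when its syndrome is all zero."""
    return not any(syndrome(H, c))

valid = [1, 0, 1, 1, 0, 1, 0]
corrupted = [1, 0, 1, 1, 0, 1, 1]  # same word with the last bit flipped

print(is_codeword(H, valid), is_codeword(H, corrupted))
```

Flipping a single bit moves the word out of the null space, which is what a decoder detects through a non-zero syndrome.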

Low-Density Parity-Check Code
- In an (n, k) block code, k bits of information are encoded as an n-bit codeword.
- In a systematic block code, the information bits appear unchanged in the codeword, which consists of a systematic part and a parity part.

Low-Density Parity-Check Code
General encoding of systematic linear block codes:
- Find the generator matrix G from H.
- c = sG = [s | p]
Issues with LDPC codes:
- G is very large.
- G is generally not sparse.
- The encoding complexity is therefore very high.
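The generator-matrix route can be sketched on a toy code. The example assumes a parity-check matrix of the form H = [A | I], for which G = [I | A^T] is a matching systematic generator; the matrix `A` (a (7,4) Hamming code) and the function name `encode` are illustrative choices, not part of the presentation.

```python
# Left part A of a toy systematic code with H = [A | I]; an assumption for
# illustration (a (7,4) Hamming code), not a matrix from the standard.
A = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
k = len(A[0])

# For H = [A | I], a matching systematic generator is G = [I | A^T].
G = [
    [1 if i == j else 0 for j in range(k)] + [A[r][i] for r in range(len(A))]
    for i in range(k)
]

def encode(s, G):
    """c = s * G over GF(2)."""
    n = len(G[0])
    return [sum(s[i] * G[i][j] for i in range(len(s))) % 2 for j in range(n)]

s = [1, 0, 1, 1]
c = encode(s, G)  # systematic part is c[:k], parity part is c[k:]
print(c)
```

For an LDPC code, a G built this way is generally dense even though H is sparse, which is why the multiplication s G becomes expensive at practical code lengths.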

Structured LDPC Codes
Quasi-Cyclic (QC) LDPC codes
- In a QC-LDPC code, H is partitioned into square sub-blocks of size z x z.
- Each sub-block is either a z x z zero matrix or a cyclically shifted identity matrix.
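The practical value of this structure is that multiplying by a shifted identity sub-block is just a cyclic shift of the z-bit input block. The sketch below checks that equivalence; the function names and the shift direction (identity with columns shifted right) are assumptions made for the example.

```python
def shifted_identity(z, shift):
    """z x z identity matrix with its columns cyclically shifted right by `shift`."""
    return [[1 if (r + shift) % z == c else 0 for c in range(z)] for r in range(z)]

def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]

def rotate(v, shift):
    """Cyclic shift that matches multiplication by shifted_identity(z, shift)."""
    shift %= len(v)
    return v[shift:] + v[:shift]

z = 6
v = [1, 0, 0, 1, 1, 0]

# The full matrix product and the plain rotation agree for every shift value.
for shift in range(z):
    assert mat_vec(shifted_identity(z, shift), v) == rotate(v, shift)
```

In hardware this is the job of a barrel shifter, so no sub-block ever has to be stored as a full matrix.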

Structured LDPC Codes
QC codes with dual-diagonal structure
- In IEEE standards, QC-LDPC codes have a dual-diagonal parity structure.
- We take the 802.16e code rate 1/2 matrix as an example.
- In the matrix, an entry of 0 represents the identity matrix.

General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
- Partition the H matrix into several sub-matrices.
- In H, the part T is a lower triangular matrix.

General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
- Parity parts p0 and p1, each annotated with complexity O(n + g^2)

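Efficient Encoding for Dual-Diagonal LDPC Codes
- A valid codeword c = [s | p] must satisfy H c^T = 0, where H = [H_s | H_p] has an information-bit part H_s and a parity-bit part H_p.
- Replace H_p by the dual-diagonal matrix.
- Define the lambda value as λ = H_s s^T.
- From this equation we obtain H_p p^T = λ.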
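The block-row equations behind this slide can be sketched as follows. The sketch assumes the usual dual-diagonal layout: m_b block rows, a first parity block column with shift d in block rows 0 and m_b - 1 and zero shift in block row x, and identity sub-blocks on the two diagonals. The symbols m_b, d, x and the superscript notation for a cyclic shift are assumptions introduced here, and all additions are modulo 2.

```latex
\begin{align*}
H c^{T} &= \begin{bmatrix} H_s & H_p \end{bmatrix}
           \begin{bmatrix} s^{T} \\ p^{T} \end{bmatrix} = 0,
  \qquad \lambda = H_s s^{T} \\
\lambda_0 &= p_0^{(d)} + p_1 \\
\lambda_i &= p_i + p_{i+1}, \qquad 1 \le i \le m_b - 2,\ i \ne x \\
\lambda_x &= p_0 + p_x + p_{x+1} \\
\lambda_{m_b-1} &= p_0^{(d)} + p_{m_b-1} \\
\sum_{i=0}^{m_b-1} \lambda_i &= p_0
\end{align*}
```

In the sum, every p_i with i >= 1 appears in exactly two rows and the shifted term p_0^{(d)} appears twice, so only the unshifted p_0 from row x survives.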

Related Work (1): Sequential Encoding
Encoding scheme (one-way derivation):
Step 1: Compute the lambda values by the matrix operation λ = H_s s^T.
Step 2: Determine the parity vector P_0 by adding all the lambda values.
Step 3: Obtain the rest of the parity vectors by exploiting the dual-diagonal matrix T.
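Steps 2 and 3 can be sketched in software as below. This is a minimal model under stated assumptions, not the referenced implementation: the first parity block column is assumed to carry shift `d` in block rows 0 and mb-1 and shift 0 in block row `x` (with 1 <= x <= mb-2), and the lambda blocks from Step 1 are taken as given. The names `rot`, `xor`, `sequential_parity` and `parity_check_rows` are hypothetical.

```python
def rot(v, s):
    """Cyclic shift of a z-bit block (multiplication by a shifted identity)."""
    s %= len(v)
    return v[s:] + v[:s]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def sequential_parity(lam, d, x):
    """One-way derivation of parity blocks p[0..mb-1] from the lambda blocks.

    Assumed structure (d and x are hypothetical parameters): shift d in block
    rows 0 and mb-1 of the first parity column, shift 0 in block row x.
    """
    mb = len(lam)
    # Step 2: summing every block row cancels all terms except p0.
    p0 = lam[0]
    for l in lam[1:]:
        p0 = xor(p0, l)
    # Step 3: walk down the dual diagonal, one block row at a time.
    p = [p0, xor(lam[0], rot(p0, d))]
    for i in range(1, mb - 1):
        nxt = xor(lam[i], p[i])
        if i == x:
            nxt = xor(nxt, p0)
        p.append(nxt)
    return p

def parity_check_rows(p, d, x):
    """Recompute H_p * p block row by block row; the result should equal lambda."""
    mb = len(p)
    rows = []
    for i in range(mb):
        acc = p[i + 1] if i + 1 < mb else [0] * len(p[0])
        if i >= 1:
            acc = xor(acc, p[i])
        if i in (0, mb - 1):
            acc = xor(acc, rot(p[0], d))
        if i == x:
            acc = xor(acc, p[0])
        rows.append(acc)
    return rows

lam = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
p = sequential_parity(lam, d=1, x=2)
print(parity_check_rows(p, d=1, x=2) == lam)
```

Each parity block depends on the previous one, so the chain in Step 3 is strictly serial; this is the delay that a two-way derivation tries to shorten.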

Related Work (2): Arbitrary Bit-generation and Correction Encoding
- In [1], an alternative encoding for the standard matrix was presented.
- The matrix is modified: the weight-3 column set of the parity portion is replaced with zero cyclic shifts.
- H can be divided into three sub-matrices:
  - the information bit region A
  - the parity bit region Q, for the bit-flipping operation
  - the parity bit region U, with no bit-flipping
[1] C. Yoon, E. Choi, M. Cheong, and S.-K. Lee, "Arbitrary bit generation and correction technique for encoding QC-LDPC codes with dual-diagonal parity structure," IEEE Wireless Communications and Networking Conference (WCNC 2007), pp. 662-666, March 2007.

Related Work (2): Arbitrary Bit-generation and Correction Encoding
Encoding scheme (one-way derivation):
Step 1: Compute the lambda values by the matrix operation λ = A s^T.
Step 2: Set P_0 to arbitrary binary values and solve for the unknown parity bits.
Step 3: Compute the correction vector f from P_0.
Step 4: Add the correction vector to the parity bits in region Q to correct them.

Related Work (2): Arbitrary Bit-generation and Correction Encoding
Advantages:
- Low-complexity encoding
- Requires fewer additions than the RU scheme
Drawbacks:
- Not directly applicable to the standard codes
- Modifying the matrix degrades code performance

Better Encoding Scheme
Advantages of the encoding scheme proposed in [2]:
- Low-complexity encoding
- Directly applicable to the matrices defined in IEEE standards, without any modification
- Achieves a higher level of parallelism
[2] C.-Y. Lin, C.-C. Wei, and M.-K. Ku, "Efficient Encoding for Dual-Diagonal Structured LDPC Code Based on Parity Bits Prediction and Correction," IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 1648-1651, Dec. 2008.

Better Encoding Scheme
Step 1: Set P_0' to any binary vector.
Step 2: Compute the lambda values by the matrix operation H_s s^T.
Step 3: Forward derivation.
Step 4: Backward derivation.
Step 5: Compute P_0 by adding the prediction parity vectors.
Step 6: Compute the correction vector f, where f = (P_0)^(d).
Step 7: Correct the prediction parity vectors by adding f.

Better Encoding Scheme
Two-way derivation: the forward derivation (Step 3) and the backward derivation (Step 4) reduce the encoding delay compared with the one-way derivation of the related work.
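The seven steps can be modelled in software as a sketch, assuming the same parity structure as before: shift `d` in block rows 0 and mb-1 of the first parity column, shift 0 in block row `x` (1 <= x <= mb-2), and P_0' chosen as the all-zero vector. The lambda blocks of Step 2 are taken as given. All function and parameter names are hypothetical, and the code is a model of the idea rather than the authors' implementation.

```python
def rot(v, s):
    """Cyclic shift of a z-bit block (multiplication by a shifted identity)."""
    s %= len(v)
    return v[s:] + v[:s]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def predict_and_correct(lam, d, x):
    """Two-way derivation of parity blocks p[0..mb-1] under the assumed structure."""
    mb = len(lam)
    zero = [0] * len(lam[0])
    pred = [zero] * mb               # Step 1: predictions made with P0' = 0
    # Step 3, forward derivation: rows 0 .. x-1 give pred[1] .. pred[x].
    acc = zero
    for i in range(x):
        acc = xor(acc, lam[i])
        pred[i + 1] = acc
    fwd_mid = xor(acc, lam[x])       # forward prediction of p[x+1] through row x
    # Step 4, backward derivation: rows mb-1 .. x+1 give pred[mb-1] .. pred[x+1].
    acc = zero
    for i in range(mb - 1, x, -1):
        acc = xor(acc, lam[i])
        pred[i] = acc
    # Step 5: the two predictions of p[x+1] differ exactly by p0.
    p0 = xor(fwd_mid, pred[x + 1])
    # Steps 6 and 7: f is p0 shifted by d, added to every prediction.
    f = rot(p0, d)
    return [p0] + [xor(q, f) for q in pred[1:]]

def hp_times_p(p, d, x):
    """Recompute H_p * p block row by block row; the result should equal lambda."""
    mb = len(p)
    rows = []
    for i in range(mb):
        acc = p[i + 1] if i + 1 < mb else [0] * len(p[0])
        if i >= 1:
            acc = xor(acc, p[i])
        if i in (0, mb - 1):
            acc = xor(acc, rot(p[0], d))
        if i == x:
            acc = xor(acc, p[0])
        rows.append(acc)
    return rows

lam = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
p = predict_and_correct(lam, d=2, x=2)
print(hp_times_p(p, d=2, x=2) == lam)
```

The forward and backward loops do not depend on each other, so in hardware they can run at the same time, and the final correction is a single XOR per block.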

LDPC Encoder Architecture
Based on the encoding scheme described before, we design both a parallel and a serial architecture.
- Parallel architecture: achieves a higher level of parallelism; high speed
- Serial architecture

Parallel architecture
[Block diagram labels: Matrix; Input data register; lambda position; Accumulator; Correct; Prediction; Parity memory; Barrel shifter #1 ... Barrel shifter #6; divider]

Parallel architecture (Stage 1)
In this stage, the Matrix module selects the shift values and scales them by a factor that depends on the code length.
Benefits:
1. Encoding can start as soon as input data begins to arrive, without waiting for the whole input block.
2. The number of barrel shifters is reduced.

Shifter Value Computation
Equations for computing the shift value:
- Code rate 2/3 A code
- Normal code rates
Implementation results for the two ways of providing the shift values, with multiple rates and lengths:
Implementation | Slices | FFs | LUTs | CLK (MHz) | Total gate count
One matrix + calculation IP | 14,179 | 4,071 | 26,846 | 141.391 | 227,076
Matrices storing the shifter values | 41,409 | 12,078 | 76,977 | 165.591 | 635,691
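The two shift-value rules can be sketched as below. The formulas follow the shift-size derivation as it is commonly stated for IEEE 802.16e (proportional scaling with floor for the normal code rates, a modulo for the rate 2/3 A code); treat them as an assumption here, since the slide's own equations are not reproduced in this transcript. The constant `Z0 = 96`, the convention that entries <= 0 pass through unchanged (0 for the identity, negative for a zero block), and the function names are likewise assumptions.

```python
Z0 = 96  # expansion factor of the longest code (n = 2304 = 24 * 96)

def shift_scaled(p, zf, z0=Z0):
    """Assumed rule for the normal code rates: scale proportionally, round down."""
    return p if p <= 0 else (p * zf) // z0

def shift_modulo(p, zf):
    """Assumed rule for the rate 2/3 A code: reduce modulo the expansion factor."""
    return p if p <= 0 else p % zf

# Shift value 94 from the base matrix, mapped to the shortest code (z = 24).
print(shift_scaled(94, 24), shift_modulo(94, 24))
```

Computing the shifts from one stored base matrix is what the "One matrix + calculation IP" row trades against storing a separate matrix per code length.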

Parallel architecture (Stage 2)
- divider: divides the data from the Matrix module.
- Input data register: stores the input data, which is then used by the barrel shifters.

Parallel architecture (Stage 3)
- Barrel shifters: cyclically shift the input data by the shifter values.
- lambda position: records the row position of each shifter value (in the example shown: lambda position = 3, 8, 11).

Parallel architecture (Stage 4)
- The lambda values are computed by accumulating the shifted data; they are complete after K_b clock cycles.
- According to the lambda positions, in the clock cycle shown λ1, λ2, λ5, λ8, λ9, and λ11 need to be accumulated.
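Stages 3 and 4 together amount to a column-by-column accumulation, which can be modelled in software as a sketch. The data layout (`columns[j]` as a list of (lambda position, shift value) pairs for block column j) and the toy values are assumptions for illustration; they are not taken from the 802.16e matrices.

```python
def rot(v, s):
    """Cyclic shift of a z-bit block, i.e. what one barrel shifter does."""
    s %= len(v)
    return v[s:] + v[:s]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def accumulate_lambda(columns, blocks, mb):
    """Column-by-column accumulation: one information block per clock cycle.

    columns[j] lists the non-zero entries of block column j of H_s as
    (lambda position, shift value) pairs.
    """
    z = len(blocks[0])
    lam = [[0] * z for _ in range(mb)]
    for entries, s_j in zip(columns, blocks):      # one iteration = one cycle
        for row, shift in entries:                 # one barrel shifter each
            lam[row] = xor(lam[row], rot(s_j, shift))
    return lam

# Toy example: mb = 3 block rows, K_b = 2 block columns, z = 4.
columns = [[(0, 1), (2, 0)], [(1, 3), (2, 2)]]
blocks = [[1, 0, 0, 0], [0, 1, 1, 0]]
lam = accumulate_lambda(columns, blocks, mb=3)
print(lam)
```

Because each cycle needs only the block that has just arrived, the accumulation can begin before the whole input is available, and the number of shifters is bounded by the largest column weight rather than by the number of non-zero sub-blocks.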

Parallel architecture (Stage 5)
Compute the prediction vectors P_i' from the accumulated lambda values.

Parallel architecture (Stage 5)
    // P_0 .. P_5: running XOR starting from acc_out0
    P_0  <= acc_out0;
    P_1  <= acc_out0 ^ acc_out1;
    P_2  <= acc_out0 ^ acc_out1 ^ acc_out2;
    P_3  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
    P_4  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
    P_5  <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
    // P_6 .. P_11: running XOR starting from acc_out11
    P_6  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
    P_7  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
    P_8  <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
    P_9  <= acc_out11 ^ acc_out10 ^ acc_out9;
    P_10 <= acc_out11 ^ acc_out10;
    P_11 <= acc_out11;
To save hardware area, one architecture computes the prediction values for all four code rates.
- Code rate 1/2: P_0 to P_11 are the prediction vectors.
- Code rate 2/3: P_0 to P_3 and P_8 to P_11 are the prediction vectors.

Parallel architecture (Stage 5)
The same assignments of P_0 to P_11 also serve the higher code rates:
- Code rate 3/4: P_0 to P_2 and P_9 to P_11 are the prediction vectors.
- Code rate 5/6: P_0 to P_1 and P_10 to P_11 are the prediction vectors.

Parallel architecture (Stage 6)
Step 1: Compute P_0. For code rate 1/2, P_0 = P_5 ^ P_6.
Step 2: Correct the other P_i using the equation P_i = P_i' ^ P_0.
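Stages 5 and 6 for code rate 1/2 can be mirrored in a short software model, given here as a sketch. Each accumulator output is assumed to be a z-bit block packed into a Python integer, the function names are hypothetical, and the correction is modelled exactly as the slide states it (P_i = P_i' ^ P_0).

```python
from functools import reduce
from operator import xor

def predict(acc):
    """Software model of the twelve Stage 5 assignments.

    acc[i] stands for acc_out<i>, here a z-bit block packed into an int.
    """
    n = len(acc)                 # 12 for the fragment shown on the slide
    half = n // 2
    P = [0] * n
    for i in range(half):        # P_0 .. P_5: running XOR from the top
        P[i] = reduce(xor, acc[: i + 1])
    for i in range(half, n):     # P_6 .. P_11: running XOR from the bottom
        P[i] = reduce(xor, acc[i:])
    return P

def correct_rate_half(P):
    """Stage 6 for code rate 1/2: P_0 = P_5 ^ P_6, then P_i = P_i' ^ P_0."""
    p0 = P[5] ^ P[6]
    return p0, [q ^ p0 for q in P]

acc = list(range(1, 13))         # arbitrary accumulator contents
P = predict(acc)
p0, corrected = correct_rate_half(P)
print(p0 == reduce(xor, acc))
```

The check at the end shows why P_5 ^ P_6 works: the two middle predictions together cover every accumulator exactly once, so their XOR is the XOR of all lambda values.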

Serial architecture (Stage 1)
[Block diagram labels: Matrix; Input data register; Barrel shifter #1; Barrel shifter #2; Correct; Accumulator & Predict; memory; Input control; divider]
Same as Stage 1 of the parallel architecture.
During the first K_b clock cycles, the matrix is processed column by column, from the top toward the middle and from the bottom toward the middle.

Serial architecture (Stage 1)
Reasons:
1. Prepare the input data.
2. Reduce the slice count.
In the last clock cycles, the matrix is processed row by row, from left to right.

Serial architecture (Stage 2)
- Select the input value that corresponds to each barrel shifter (clock cycle #2 is taken as the example).
- Divide the data from the Matrix module.

Serial architecture (Stage 3)
Shift the input data according to the shifter value chosen from the mux.

Serial architecture (Stage 4)
This module has three tasks:
1. Compute λ_i
2. Compute P_i'
3. Compute P_0
Normally, this module accumulates the shifted data to compute λ_i. When the data is the last value in its row, it also computes P_i'.

Serial architecture (Stage 4)
When all P_i' have been computed, compute P_0 by XORing P_x' and P_{x+1}', the two middle prediction vectors in the matrix.

Serial architecture (Stage 5)
Correct the other P_i using the equation P_i = P_i' ^ P_0.

Implementation Results
- The proposed encoder for IEEE 802.16e LDPC codes supports code rates 1/2, 2/3, 3/4, and 5/6 and code lengths from 576 to 2304.
- The hardware was implemented and verified on Xilinx Virtex-4 and Altera Stratix field-programmable gate array (FPGA) devices.

Implementation Results
Parallel architecture
- Information throughput (IT) ranges from 2.262 to 10.441 Gbps.
- The encoder area is constant for any code rate or code length.
- For a given code rate, increasing the code length increases the throughput.
Resources (all configurations): 14,179 slices, 4,071 FFs, 26,846 LUTs, CLK 141.391 MHz
IT (Gbps):
Z | N | Rate 1/2 | Rate 2/3 | Rate 3/4 | Rate 5/6
24 | 576 | 2.262 | 2.468 | 2.545 | 2.61
40 | 960 | 3.77 | 4.113 | 4.241 | 4.35
60 | 1440 | 5.656 | 6.17 | 6.363 | 6.526
80 | 1920 | 7.541 | 8.226 | 8.483 | 8.701
96 | 2304 | 9.049 | 9.872 | 10.18 | 10.441
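The claim that throughput grows with code length can be checked against the reported numbers with a short sketch. It tests the hypothesis that, for a fixed code rate, throughput scales linearly with the expansion factor Z; that would be consistent with a cycle count that depends on the code rate but not on the code length, which is an interpretation added here and not stated in the slides. The values are copied from the table above; the names are hypothetical.

```python
# Information throughput (Gbps) at Z = 24 (N = 576) for rates 1/2, 2/3, 3/4, 5/6.
BASE_Z = 24
BASE_IT = [2.262, 2.468, 2.545, 2.61]

def scaled_throughput(z):
    """Estimate throughput at expansion factor z by linear scaling in z."""
    return [it * z / BASE_Z for it in BASE_IT]

# Reported throughput for the longer codes, keyed by Z.
reported = {
    40: [3.77, 4.113, 4.241, 4.35],
    60: [5.656, 6.17, 6.363, 6.526],
    80: [7.541, 8.226, 8.483, 8.701],
    96: [9.049, 9.872, 10.18, 10.441],
}

max_error = max(
    abs(est - rep)
    for z, row in reported.items()
    for est, rep in zip(scaled_throughput(z), row)
)
print(max_error)
```

The largest deviation stays at the level of rounding in the table, so the reported figures are compatible with purely linear scaling in Z.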

Implementation Results
Serial architecture
- Information throughput ranges from 0.867 to 4.019 Gbps.
- For a given code rate, increasing the code length increases the throughput.

Implementation Results
Parallel architecture using row-by-row processing
[Figure: area comparison]

Implementation Results
[Figures: IT comparison; IT/Area comparison]

Compare to Related Work
We compare our implementation with [3].
Table 4.5a: synthesis results of [3] at code rate 1/2
Code length | Area (LE) | Clk (MHz) | IT (Gbps) | IT/Total area (Mbps per LE)
576 | 3391 | 192.23 | 2.129 | 0.0612
960 | 5100 | 159.57 | 2.253 | 0.0648
1440 | 7012 | 164.83 | 2.697 | 0.0776
1920 | 8924 | 148.72 | 2.644 | 0.0761
2304 | 10339 | 148.41 | 2.758 | 0.0793
Total area | 34766 | | |
Proposed encoder at code rate 1/2 (area 20960 LE and clock 97.58 MHz for all code lengths)
Code length | IT (Gbps) | IT/Total area (Mbps per LE)
576 | 1.561 | 0.07447
960 | 2.602 | 0.12414
1440 | 3.903 | 0.18621
1920 | 5.204 | 0.24828
2304 | 6.245 | 0.29794
- Better throughput for longer code lengths
- Less area to implement multiple code lengths and code rates
- The clock cycle is shorter than in [3].
[3] S. Kopparthi and D. M. Gruenbacher, "Implementation of a flexible encoder for structured low-density parity-check codes," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PacRim 2007), pp. 438-441, Aug. 2007.

Compare to Related Work
Comparison of throughput
- The proposed encoder outperforms the work in [3] in terms of throughput when the code length is longer than 1200.
- The proposed encoder architecture provides better throughput for longer code lengths, while the work in [3] does not show this kind of speed-up.

Compare to Related Work
Comparison of throughput/area ratio
- The proposed encoder outperforms the work in [3] in terms of throughput/area ratio by a factor of 1.216 to 3.757.
- The proposed encoder utilizes hardware resources more efficiently.
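The quoted range can be reproduced from the IT/Total area columns of the rate 1/2 comparison with [3]. The sketch below copies those figures (Mbps per LE) and divides them pairwise; the variable names are hypothetical.

```python
# IT / total area (Mbps per LE) at code rate 1/2, per code length.
code_lengths = [576, 960, 1440, 1920, 2304]
ref3 = [0.0612, 0.0648, 0.0776, 0.0761, 0.0793]
proposed = [0.07447, 0.12414, 0.18621, 0.24828, 0.29794]

# Improvement factor of the proposed encoder over [3] for each code length.
ratios = [p / r for p, r in zip(proposed, ref3)]

for n, ratio in zip(code_lengths, ratios):
    print(n, round(ratio, 3))
```

The smallest factor occurs at code length 576 and the largest at 2304, so the advantage grows with the code length.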

Compare to Related Work We compare implementation with [2].

Compare to Related Work
Comparison of throughput
- The throughput of our proposed encoder is higher than that of [2] for all code rates and code lengths.
- The proposed encoder outperforms the work in [2] in terms of throughput by a factor of 1.237 to 1.963.

Compare to Related Work
Comparison of throughput/area
- The proposed encoder outperforms the work in [2] in terms of throughput/area ratio by a factor of 2.427 to 5.256.
- The result shows that our proposed encoder utilizes hardware resources efficiently.

Compare to Related Work (Serial)
We compare our implementation with [4].
Design | Slices | FFs | LUTs | Block RAMs | CLK | IT
[4] | 4,724 | 1,807 | 8,335 | 811863.34 (column boundaries of the last three values not recoverable)
Proposed | 12,567 | 3,885 | 22,050 | 0 | 123.502 | 4.626
- Our proposed encoder achieves a higher IT at a low clock frequency.
- In our proposed encoder, the matrix information is built in, so no additional block RAMs are needed.
- The IT/Area of our serial encoder is 0.3681 Mbps per slice, and the IT/Area of [4] is 0.1768.
[4] J. K. Kim, H. Yoo, and M. H. Lee, "Efficient Encoding Architecture for IEEE 802.16e LDPC Codes," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2008.


Conclusion
- An efficient encoding architecture for IEEE 802.16e LDPC codes with multiple code lengths and code rates is implemented.
- In our design, switching between code rates or code lengths only requires changing the type in the information data.
- The architecture is also suitable for the IEEE 802.11n standard.
- Our encoder achieves higher throughput and a better throughput/area ratio than the conventional encoding scheme when the code length is longer than 1200.

Thank you!!
