Design and Implementation of Turbo Decoder for 4G standards IEEE 802.16e and LTE Syed Z. Gilani.


Motivation
– Conventional serial decoding architectures can be a performance bottleneck
– Example: 6144-bit block, 8 iterations at 250 MHz, 1 bit processed per cycle => data rate < 6144 / (6144 * 8 * 4 ns) ≈ 31 Mbps
– Data rates for LTE can be 100 Mbps to 300 Mbps
– A parallel architecture is necessary to support high-throughput decoding
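The serial bound on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes, as the slide does, one bit per cycle and one full pass over the block per iteration; the function name is illustrative and not from the presentation.

```python
def serial_throughput_bps(clock_hz: float, iterations: int,
                          bits_per_cycle: float = 1.0) -> float:
    """Upper bound on decoded bits per second for a serial decoder.

    Each iteration sweeps the whole block once, so the block length N
    cancels: N / (N * iterations / (clock_hz * bits_per_cycle)).
    """
    return clock_hz * bits_per_cycle / iterations


if __name__ == "__main__":
    bound = serial_throughput_bps(250e6, 8)
    print(f"{bound / 1e6:.2f} Mbps")  # prints "31.25 Mbps"
```

Because the block length cancels, the bound depends only on the clock rate and the iteration count, which is why the slide turns to parallelism rather than shorter blocks.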

Turbo Decoder Overview
Maximum a posteriori (MAP) algorithm
– Alpha
– Beta
– Gamma
– LLR
(De)Interleaver
– LTE: P(i) = (f_1*i + f_2*i^2) mod N
– WiMAX: switch (i mod 4)
  case 0: P(i) = (P_0*i + 1) mod N
  case 1: P(i) = (P_0*i + 1 + N/2 + P_1) mod N
  case 2: P(i) = (P_0*i + 1 + P_2) mod N
  case 3: P(i) = (P_0*i + 1 + N/2 + P_3) mod N
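To make the alpha, beta, gamma and LLR terms concrete, here is a minimal sketch that uses the max-log approximation of MAP (a common hardware simplification, swapped in here; the slides do not say which variant the design uses). The 2-state toy trellis, the function names and the branch-metric definition are all assumptions for illustration and are not the LTE or WiMAX trellis.

```python
NEG_INF = float("-inf")


def max_log_map(transitions, gammas, num_states):
    """Max-log-MAP LLRs for one trellis block.

    transitions: list of (from_state, to_state, input_bit), same at every step.
    gammas[k][t]: log-domain branch metric of transition t at step k.
    """
    steps = len(gammas)
    # Alpha: forward state metrics; the encoder is assumed to start in state 0.
    alpha = [[NEG_INF] * num_states for _ in range(steps + 1)]
    alpha[0][0] = 0.0
    for k in range(steps):
        for t, (s0, s1, _) in enumerate(transitions):
            alpha[k + 1][s1] = max(alpha[k + 1][s1], alpha[k][s0] + gammas[k][t])
    # Beta: backward state metrics; all end states assumed equally likely.
    beta = [[NEG_INF] * num_states for _ in range(steps + 1)]
    beta[steps] = [0.0] * num_states
    for k in range(steps - 1, -1, -1):
        for t, (s0, s1, _) in enumerate(transitions):
            beta[k][s0] = max(beta[k][s0], gammas[k][t] + beta[k + 1][s1])
    # LLR: best path with bit 1 minus best path with bit 0 at each step.
    llrs = []
    for k in range(steps):
        best = [NEG_INF, NEG_INF]
        for t, (s0, s1, bit) in enumerate(transitions):
            metric = alpha[k][s0] + gammas[k][t] + beta[k + 1][s1]
            best[bit] = max(best[bit], metric)
        llrs.append(best[1] - best[0])
    return llrs


# Toy 2-state trellis: next state = state XOR bit (hypothetical example).
TOY_TRANSITIONS = [(0, 0, 0), (0, 1, 1), (1, 1, 0), (1, 0, 1)]


def toy_gammas(channel_llrs):
    """Gamma: +L/2 for a branch carrying bit 1, -L/2 for bit 0."""
    return [[(l / 2.0) if bit else (-l / 2.0) for (_, _, bit) in TOY_TRANSITIONS]
            for l in channel_llrs]
```

On this toy trellis every bit sequence is a valid path, so the output LLRs simply reproduce the channel LLRs; with a real constituent code the alpha and beta terms are what carry information between neighbouring bits.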

Optimizations
– Resource sharing
– Retiming
– Look-ahead transformation
– Variable and adaptive parallelism
– Multiplierless interleaver

Parallelization
[Figure: trellis processing divided among PE 1, PE 2, PE 3 and PE 4; axes: Time (cycles), States]

Variable Parallelization
[Figure: parallel interleaver with two memory banks (Bank 0, Bank 1) on each side; labels: Coded Bits, Decoded Bits]

Variable Parallelization
[Figure: parallel interleaver with four memory banks (Bank 0 to Bank 3) on each side; labels: Coded Bits, Decoded Bits]

Interleaver Optimization
Interleaving functions
– LTE: P(i) = (f_1*i + f_2*i^2) mod N
– WiMAX: switch (i mod 4)
  case 0: P(i) = (P_0*i + 1) mod N
  case 1: P(i) = (P_0*i + 1 + N/2 + P_1) mod N
  case 2: P(i) = (P_0*i + 1 + P_2) mod N
  case 3: P(i) = (P_0*i + 1 + N/2 + P_3) mod N
Unoptimized memory requirements
– Don’t want to use multipliers and dividers
– The unoptimized alternative stores every interleaved address in RAM
– LTE alone supports 40 different block lengths with different interleaving parameters
– Block lengths vary from 40 bits to 6144 bits
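As a reference point for the optimization, the two interleaving functions can be evaluated directly, and the cost of storing one address table can be estimated. This is a sketch only: the function names are illustrative, and the parameter sets used in the usage note (N = 40 with f_1 = 3, f_2 = 10; N = 24 with P_0 = 5, P_1 = P_2 = P_3 = 0) are assumed example values that should be checked against the standards' tables.

```python
def lte_qpp(i, n, f1, f2):
    """LTE interleaver: P(i) = (f1*i + f2*i^2) mod N (needs multipliers)."""
    return (f1 * i + f2 * i * i) % n


def wimax_ctc(i, n, p0, p1, p2, p3):
    """WiMAX interleaver: the additive offset depends on i mod 4."""
    offsets = (0, n // 2 + p1, p2, n // 2 + p3)
    return (p0 * i + 1 + offsets[i % 4]) % n


def table_bits(n):
    """Bits of RAM needed to store one full address table of length n."""
    address_width = (n - 1).bit_length()
    return n * address_width


if __name__ == "__main__":
    print([lte_qpp(i, 40, 3, 10) for i in range(8)])
    print([wimax_ctc(i, 24, 5, 0, 0, 0) for i in range(8)])
    print(table_bits(6144), "bits for one 6144-entry table")
```

A single 6144-entry table already needs 13 bits per address, and one such table would be required per supported block length, which is the storage cost that on-the-fly generation avoids.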

Interleaver Optimization
On-the-fly address generation
LTE interleaving function
– P(i) = (f_1*i + f_2*i^2) mod N
– P(i+1) = (f_1*(i+1) + f_2*(i+1)^2) mod N = (P(i) + f_1 + f_2 + 2*f_2*i) mod N
WiMAX interleaving function
– switch (i mod 4)
  case 0: P(i) = (P_0*i + 1) mod N
  case 1: P(i) = (P_0*i + 1 + N/2 + P_1) mod N
  case 2: P(i) = (P_0*i + 1 + P_2) mod N
  case 3: P(i) = (P_0*i + 1 + N/2 + P_3) mod N
– P(i+1) = (P_0*i + P_0 + constant factor) mod N, where the constant depends on (i+1) mod 4
Replace the sum by its residue (a single subtraction of N) whenever the sum reaches or exceeds N, which avoids a mod N operation
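The recursions above can be sketched with additions, comparisons and subtractions only. The helper names are illustrative, and the sketch assumes all parameters lie in [0, N) and that N is even for the WiMAX case; it is not taken from the presentation's hardware.

```python
def wrap(x, n):
    """Reduce x into [0, n) with one compare-and-subtract; valid for 0 <= x < 2n."""
    return x - n if x >= n else x


def lte_addresses(n, f1, f2):
    """Yield P(0), ..., P(n-1) for P(i) = (f1*i + f2*i^2) mod n.

    Uses P(i+1) = P(i) + g(i) with g(i) = (f1 + f2 + 2*f2*i) mod n,
    and g(i+1) = g(i) + 2*f2, so no multiplier or divider is needed.
    """
    p = 0
    g = wrap(f1 + f2, n)      # g(0)
    step = wrap(f2 + f2, n)   # constant increment of g
    for _ in range(n):
        yield p
        p = wrap(p + g, n)
        g = wrap(g + step, n)


def wimax_addresses(n, p0, p1, p2, p3):
    """Yield P(0), ..., P(n-1) for the four-case WiMAX function.

    Keeps a running base (p0*i + 1) mod n and adds the offset
    selected by i mod 4 (a 2-bit counter in hardware).
    """
    half = n // 2
    offsets = (0, wrap(half + p1, n), p2, wrap(half + p3, n))
    base = 1
    for i in range(n):
        yield wrap(base + offsets[i & 3], n)
        base = wrap(base + p0, n)
```

Every intermediate sum stays below 2N because both operands are already reduced, so one conditional subtraction is always enough to replace the mod N.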

Interleaver Optimization
[Two tables with columns: PE, i, P(i), Bank Add., Bit Add.; the table values are not present in the transcript]
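Since the table values did not survive, the sketch below shows one plausible way such a table could be produced. It assumes the block is split into equal windows of W = N / (number of PEs), that the bank address is P(i) divided by W, and that the "Bit Add." column is the offset P(i) mod W inside the bank; these mappings and all names are assumptions, not confirmed by the slides. Direct multiplication is used here only for clarity.

```python
def bank_schedule(n, f1, f2, num_pes):
    """Per cycle, list (pe, i, P(i), bank, offset) for every PE.

    PE number pe handles indices pe*w .. pe*w + w - 1, with w = n // num_pes.
    """
    if n % num_pes != 0:
        raise ValueError("block length must be divisible by the number of PEs")
    w = n // num_pes
    schedule = []
    for cycle in range(w):
        row = []
        for pe in range(num_pes):
            i = pe * w + cycle
            p = (f1 * i + f2 * i * i) % n   # LTE interleaving function
            row.append((pe, i, p, p // w, p % w))
        schedule.append(row)
    return schedule


def conflict_free(schedule):
    """True if no two PEs address the same bank in any cycle."""
    return all(len({bank for (_, _, _, bank, _) in row}) == len(row)
               for row in schedule)


if __name__ == "__main__":
    for row in bank_schedule(40, 3, 10, 4)[:2]:
        print(row)
    print("conflict free:", conflict_free(bank_schedule(40, 3, 10, 4)))
```

Under these assumptions the LTE function gives every PE the same in-bank offset in a given cycle and a distinct bank, so the PEs can read in parallel without stalls.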

Lookahead Transformation
[Figure: single trellis step from t_k to t_k+1 compared with a lookahead step from t_k to t_k+2]
– 16 comparisons required for the lookahead transformation in duo-binary WiMAX turbo codes
– Increases throughput by 2x
– Maximum clock rate decreases from 500 MHz to ~300 MHz, along with a significant increase in area
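A minimal sketch of the idea: two add-compare-select steps are merged into one step that jumps from t_k to t_k+2. With four branches per state (duo-binary symbols), each state then selects among 4 x 4 = 16 two-step candidates, which is one way to read the slide's count of 16 comparisons. The 8-state toy trellis below is hypothetical and is not the actual WiMAX trellis; integer metrics stand in for fixed-point hardware values.

```python
NEG_INF = float("-inf")
NUM_STATES = 8
NUM_INPUTS = 4  # duo-binary: two bits per trellis step


def next_state(s, u):
    """Toy shift-register style transition (assumption, not the WiMAX code)."""
    return ((s << 2) | u) % NUM_STATES


def acs_step(alpha, gamma):
    """One add-compare-select step: 4 candidates per destination state."""
    new = [NEG_INF] * NUM_STATES
    for s in range(NUM_STATES):
        for u in range(NUM_INPUTS):
            t = next_state(s, u)
            new[t] = max(new[t], alpha[s] + gamma[s][u])
    return new


def lookahead_step(alpha, gamma_a, gamma_b):
    """Merged two-step update; also counts candidates per destination state."""
    new = [NEG_INF] * NUM_STATES
    candidates = [0] * NUM_STATES
    for s in range(NUM_STATES):
        for u1 in range(NUM_INPUTS):
            mid = next_state(s, u1)
            for u2 in range(NUM_INPUTS):
                t = next_state(mid, u2)
                # Combined branch metric can be computed off the critical path.
                combined = gamma_a[s][u1] + gamma_b[mid][u2]
                new[t] = max(new[t], alpha[s] + combined)
                candidates[t] += 1
    return new, candidates
```

The merged step yields the same state metrics as two single steps while advancing two trellis stages per cycle; the wider 16-way selection is consistent with the lower clock rate and larger area reported on the slide.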

Results (clock frequency: 500 MHz)
No. of iterations   Number of PEs   Throughput       Serial throughput
2                   2               490 Mbps         243 Mbps
2                   4               909 Mbps         243 Mbps
(lost)              (lost)          (lost) Mbps      243 Mbps
4                   2               245 Mbps         122 Mbps
4                   4               455 Mbps         122 Mbps
4                   8               833 Mbps         122 Mbps
8                   2               (lost)           60 Mbps
8                   4               228 Mbps         60 Mbps
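The speedup over the serial decoder and the per-PE efficiency follow directly from the reported figures. The sketch below uses only the rows whose values are fully legible in the transcript; the names are illustrative.

```python
# (iterations, PEs, parallel throughput in Mbps, serial throughput in Mbps)
ROWS = [
    (2, 2, 490, 243),
    (2, 4, 909, 243),
    (4, 2, 245, 122),
    (4, 4, 455, 122),
    (4, 8, 833, 122),
    (8, 4, 228, 60),
]


def speedups(rows):
    """Return (iterations, PEs, speedup, efficiency) for each row."""
    result = []
    for iterations, pes, parallel, serial in rows:
        speedup = parallel / serial
        result.append((iterations, pes, speedup, speedup / pes))
    return result


if __name__ == "__main__":
    for iterations, pes, speedup, efficiency in speedups(ROWS):
        print(f"{iterations} iterations, {pes} PEs: "
              f"speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```

Speedup is close to linear for 2 PEs and drops somewhat at 4 and 8 PEs; efficiencies marginally above 1 may simply reflect rounding in the reported throughputs.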

Questions

Outline
– Motivation
– Turbo Encoding
– Turbo Decoding
– Optimizations
  – Look-ahead transformation
  – Variable and adaptive parallelism
  – Multiplierless interleaver
– Results
– Summary

Turbo Encoder
[Figures: LTE Turbo Encoding; WiMAX Turbo Encoding]

Parallelization Example
– 4-state trellis
– 1 decoded symbol per cycle
[Figure: trellis diagram; axes: Time (cycles), States]