Efficient VLSI architectures for baseband signal processing in wireless base-station receivers Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro,


1 Efficient VLSI architectures for baseband signal processing in wireless base-station receivers
Sridhar Rajagopal, Srikrishna Bhashyam, Joseph R. Cavallaro, and Behnaam Aazhang
This work is supported by Nokia, TI, TATP and NSF

2 Introduction
A real-time VLSI architecture for channel estimation
Channel estimation is usually neglected, but has high computational complexity
Current DSP solutions do not meet real-time requirements
Iterative fixed-point algorithm developed
Area-Time Tradeoffs discussed
–Area-Constrained (Pico-cells)
–Time-Constrained (Theoretical Data Rates)
–Area-Time efficient (Real-Time Solution)

3 Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
–Area-constrained, Time-constrained, Area-Time efficient
Comparisons with DSP solutions
Related Work and Conclusions

4 Evolution of mobile communications
First generation: Voice
Second/Current generation: Voice + Low-rate data (9.6 Kbps)
Third generation: Voice + High-rate data (2 Mbps/384 Kbps/128 Kbps) + multimedia

5 Channel estimation
[Figure: User 1 and User 2 transmit to the Base Station. Labels: Direct Path, Reflected Path, Noise + MAI.]

6 Need for channel estimation
To compensate for unknown fading amplitudes and asynchronous delays.
Detector performance depends on the accuracy of the channel estimator.

7 Computing channel estimates
Computed by sending a training sequence of known bits to the receiver.
When no training sequence is present, detected bits can be used to update the estimates in a decision-feedback mode for tracking.
The importance of channel estimation is usually neglected
Its complexity may exceed that of the detector

8 Baseband signal processing
[Figure: Base-Station Receiver block diagram. Labels: Multiple Users, Antenna, Channel estimation (Training, Tracking), Detection, Decoding, Detected Bits.]

9 Implementation complexity
Matrix inversions (size 32x32) per window
Unable to meet real-time requirements on DSPs [Asilomar’99]
VLSI fixed-point architectures for matrix inversions
–Precision problems
Typically, simpler single-user sliding correlator structures are used.

10 Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
–Area-constrained, Time-constrained, Area-Time efficient
Comparisons with DSP solutions
Related Work and Conclusions

11 Iterative scheme for channel estimation
Method of Gradient Descent
Stable convergence behavior
Same Performance
Simpler Bit-Streaming Hardware Implementation
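The idea of replacing a matrix inversion with gradient descent can be sketched as follows. This is a minimal floating-point illustration, not the authors' implementation. It assumes the estimate A solves R_bb A = R_br and that the update rule is A <- A - mu (R_bb A - R_br), which is consistent with the Mult, Subtract, >>, Subtract labels in the architecture slides. The function names, the step size `mu` and the iteration count are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def gradient_step(A, R_bb, R_br, mu):
    """One descent step (assumed rule): A <- A - mu * (R_bb A - R_br)."""
    G = matmul(R_bb, A)
    return [[a - mu * (g - r) for a, g, r in zip(row_a, row_g, row_r)]
            for row_a, row_g, row_r in zip(A, G, R_br)]


def estimate(R_bb, R_br, mu, iterations):
    """Iterate from an all-zero estimate; no matrix inversion is needed."""
    A = [[0j] * len(R_br[0]) for _ in R_br]
    for _ in range(iterations):
        A = gradient_step(A, R_bb, R_br, mu)
    return A


# Toy example: R_br was built as R_bb times [[1+1j], [2-1j]].
R_bb = [[2.0, 1.0], [1.0, 2.0]]
R_br = [[4 + 1j], [5 - 1j]]
A = estimate(R_bb, R_br, mu=0.25, iterations=200)
```

The iteration converges only when `mu` is small enough relative to the largest eigenvalue of R_bb; in the toy example the eigenvalues are 1 and 3, so 0.25 is safe.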

12 Simulations - Static multipath channel
SINR = 0, Paths = 3, Preamble L = 150, Spreading N = 31, Users K = 15

13 Fading channel with tracking
Doppler = 10 km/h

14 Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
–Area-constrained, Time-constrained, Area-Time efficient
Comparisons with DSP solutions
Related Work and Conclusions

15 Area-Time Tradeoffs
Design for K = 32 users and spreading code length N = 32
Target Data Rate = 128 Kbps
Low Power Issues ignored!
Area-Constrained Architecture
–Pico-cells; lower data rates
Time-Constrained Architecture
–Maximum achievable data rates
Area-Time Efficient Architecture
–Real-Time with minimum area overhead

16 Task Decomposition
[Figure: task timeline (TIME axis). Pilot bits b0 (2K,1) with r0 (N,8), or detected bits b (2K,1) with data r (N,8), are selected by MUXes. Tasks: Correlation Matrices (Per Bit), R_bb O(2K^2, 8) and R_br O(2KN, 8); Iterate, A O(4K^2 N, 8); Channel Estimate to Detector. L marks the Tracking Window.]
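To give a feel for the relative task sizes, the labels in the Task Decomposition figure can be evaluated for the design point K = 32, N = 32 stated on the Area-Time Tradeoffs slide. The sketch below assumes that a label such as O(4K^2 N, 8) means "on the order of 4K^2 N operations on 8-bit operands"; that reading, and the function name, are assumptions.

```python
def operation_counts(K, N):
    """Order-of-magnitude operation counts per task (assumed reading of the labels)."""
    return {
        "R_bb update": 2 * K * K,     # auto-correlation, O(2K^2)
        "R_br update": 2 * K * N,     # cross-correlation, O(2KN)
        "A iteration": 4 * K * K * N  # matrix-matrix multiply, O(4K^2 N)
    }


counts = operation_counts(K=32, N=32)
for task, ops in counts.items():
    print(f"{task}: {ops}")
```

Under this reading the iteration for A is 64 times larger than either correlation update, which matches the later remark that the matrix multiplication forms the bottleneck.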

17 Architecture Design: Auto-correlation
b = {+1,-1}
Multiplication is an XNOR operation
Entire matrix can be updated sequentially or in parallel using XNOR gates
Auto-correlation matrix implemented as UP/DOWN counter(s)
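A small software model may help show why the product of two ±1 bits is an XNOR and why an up/down counter is enough to accumulate it. The sketch assumes the bit encoding 1 for +1 and 0 for -1; the function names are hypothetical and the loops stand in for what the hardware would do with gates and counters.

```python
def xnor(x, y):
    """XNOR of two single bits: 1 when the bits agree, 0 otherwise."""
    return 1 - (x ^ y)


def update_autocorrelation(counters, bits):
    """Add b * b^T to the counters, with bits encoded as 1 -> +1, 0 -> -1.

    Equal bits give a product of +1 (count up); different bits give -1
    (count down). No multiplier is involved.
    """
    n = len(bits)
    for i in range(n):
        for j in range(n):
            counters[i][j] += 1 if xnor(bits[i], bits[j]) else -1
    return counters


# Accumulate two bit vectors into a 3x3 bank of counters.
counters = [[0] * 3 for _ in range(3)]
for bits in ([1, 0, 1], [0, 0, 1]):
    update_autocorrelation(counters, bits)
```

The diagonal entries always count up, since a bit always agrees with itself, and the matrix is symmetric, so hardware only needs counters for one triangle.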

18 Architecture Design: Cross-Correlation
b = {+1,-1}, r = 8-bit integer vector (complex)
Multiplications reduce to additions/subtractions
Entire matrix (complex) can be updated sequentially or in parallel using 8-bit adders
Cross-correlation matrix stored in RAM.
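The reduction of b(i)·r(j) to an add or a subtract can be sketched as below. The code assumes the encoding 1 for +1 and 0 for -1 and works on one integer component; for complex r the same routine would be applied to the real and imaginary parts separately. The function name is hypothetical, and 8-bit overflow is not modelled.

```python
def update_crosscorrelation(R, bits, r):
    """Add b * r^T to R using only additions and subtractions.

    bits[i] == 1 stands for b(i) = +1 (add r[j]);
    bits[i] == 0 stands for b(i) = -1 (subtract r[j]).
    """
    for i, b in enumerate(bits):
        row = R[i]
        for j, sample in enumerate(r):
            row[j] = row[j] + sample if b else row[j] - sample
    return R


# Two updates on a 2x2 matrix (2 bits, 2 received samples).
R = [[0, 0], [0, 0]]
update_crosscorrelation(R, [1, 0], [3, -5])
update_crosscorrelation(R, [0, 0], [1, 2])
```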

19 Architecture Design: Channel Estimate
A = 8-bit integer matrix (complex)
µ << 1 : Truncated Multiplication [Schulte’93]
Matrix-matrix (real-complex) multiplication of integers
Forms the bottleneck
Can be done sequentially with a single multiplier, fully in parallel, or partially in parallel
Concentrate on multiplication for area-time tradeoffs!
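An integer version of the update step might look like the sketch below. It assumes the rule A <- A - mu (R_bb A - R_br) with mu = 2^(-shift), so that the scaling by mu becomes an arithmetic right shift, as the Mult, Subtract, >>, Subtract labels in the architecture slides suggest. It handles one real component and does not model the truncated multiplier hardware of [Schulte’93] or 8-bit saturation; the function name and `shift` parameter are hypothetical.

```python
def fixed_point_step(A, R_bb, R_br, shift):
    """One integer update: A <- A - ((R_bb A - R_br) >> shift).

    Python's >> on negative integers is an arithmetic shift
    (it rounds toward minus infinity), like a sign-extending shifter.
    """
    inner = len(R_bb[0])
    A_new = []
    for i in range(len(A)):
        new_row = []
        for j in range(len(A[0])):
            acc = sum(R_bb[i][k] * A[k][j] for k in range(inner))  # Mult
            grad = acc - R_br[i][j]                                # Subtract
            new_row.append(A[i][j] - (grad >> shift))              # >>, Subtract
        A_new.append(new_row)
    return A_new


R_bb = [[4, 0], [0, 4]]
R_br = [[40], [-20]]
A1 = fixed_point_step([[0], [0]], R_bb, R_br, shift=3)
A2 = fixed_point_step(A1, R_bb, R_br, shift=3)
```

The inner sum is the matrix-matrix multiplication that the slide identifies as the bottleneck; how many of these sums run at once is the knob behind the area-time tradeoff.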

20 Area-Constrained Architecture
[Figure: architecture diagram; inputs b0 and b.]

21 Area-constrained Architecture: Hardware Requirements

22 Time-constrained Architecture
[Figure: datapath. Blocks: MUXes selecting b/b0 and r/r0, b*b^T and b0*b0^T, R_bb, R_br, A, Mult, Subtract, >>, Subtract, Channel Estimate. Bus-width labels: 2K*1, K(2K-1)*1, 2K^2*8, 2KN*16, 2KN*8, N*8.]

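23 Auto-correlation Update in Parallel
[Figure: an Array of XNORs forms the pairwise products of the bits of b (2K), shown for bits a, b, c, d as a·b, a·c, a·d, b·c, b·d, c·d, giving bb^T (K*{2K-1}*1). These drive the U/D# inputs of an Array of Counters holding R_bb (2K^2*8): counter Rbb(i,j) is steered by bb^T(i,j), and counter Rbb(i,i) has U/D# set to 1.]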

24 Cross-Correlation Update in Parallel
[Figure: b (2K*1) and r (N*8) update R_br (2KN*8). Each element Rbr(i,j) has an 8-bit Adder with input r(j), and b(i) (1 bit) drives its Add/Sub# control.]

25 Time-constrained Architecture: Hardware Requirements

26 Area-Time Efficient Architecture
[Figure: datapath. Blocks: MUXes selecting b/b0 and r/r0, b*b^T and b0*b0^T, R_br, Counters, Store, Load, R_bb, A, DEMUX, MUX, A new, Mult, Subtract, >>, Subtract, Adder. Bus-width labels: 2K*1, 2K*8, 1*16, 1*8, 1*1, N*8.]

27 Area-Time Efficient Architecture: Hardware Requirements

28 Outline
What is multiuser channel estimation?
Need for multiuser channel estimation
Implementation problems
Algorithm enhancements
VLSI architectures
–Area-constrained, Time-constrained, Area-Time efficient
Comparisons with DSP solutions
Related Work and Conclusions

29 DSP Comparisons
DSPs unable to exploit bit-level parallelism
Inefficient storage of bits
Replacing multiplications by additions/subtractions

30 Related Work: DSP Extensions
[Figure: (Cross-Correlation) a 64-bit Register D[i][j] feeds a +/- unit, steered by an 8-bit Control Register b[i], and the result is written to a 64-bit Register D[i][j]; data paths are labelled 8.]
For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j]

31 Related Work: Online Arithmetic
Multiuser Detection
–Need to compute only the Sign Bit (Most Significant Digit)
–No back-conversion to conventional representation
–Complex-number representation possible
–Integration with channel estimation also.

32 Related Work: DSP-FPGA solutions
Multiple DSP-FPGA task partitioning
Bit-level parallelism on FPGAs
Multiplications on DSPs.
Sundance Multi-DSP System
–2 TI C67 DSPs
–2 Xilinx Virtex FPGAs
–http://www.sundance.com

33 Conclusions
Real-Time VLSI architecture for multiuser channel estimation
Iterative fixed-point algorithm developed to avoid matrix inversions
Area-Time Tradeoffs discussed
–Area-Constrained (Pico-cells)
–Time-Constrained (Data Rates)
–Area-Time efficient (Real-Time)
VLSI architectures better exploit bit-level computations and parallelism to meet real-time constraints than DSPs.

