Presentation is loading. Please wait.

Presentation is loading. Please wait.

DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN,

Similar presentations


Presentation on theme: "DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN,"— Presentation transcript:

1 DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, LIANG.LIU@EIT.LTH.SE

2 The Evolving Wireless Scene Range Data Rate 1m 10m 100m 1km 10km 1Kb 10Kb 100Kb 1Mb 10Mb 100Mb Cellular (WAN) 3G Cellular 2.5 G Cellular 802.11 (LAN) 802.1a Bluetooth (PAN) Sensor networks More bit/sec More bit/($·nJ) Courtesy: Prof. Jan Rabaey, BWRC

3 Evolution of High-Speed Wireless CDMAOFDMMIMO Massive MIMO

4 Bandwidth Modulation MIMO Same in other communication systems

5 Algorithms beats Moore beats Chemists Algorithmic Complexity Processor Performance (~Moore’s Law) Battery Capacity 1G 2G 3G Courtesy: Ravi Subramanian (Morphics)

6 Design Considerations Flexible High speed Reliable Small area Low power Low price Time to market

7 Digital Hardware Platforms Efficiency Flexibility ASIC GPP DSP GPU FPGA ASIP

8 Optimizing DSP Implementation Optimization Algorithm Architecture Module Circuit

9 CDMA

10 CDMA: Code Division Multiple Access “A UMTS Baseband Receiver Chip for Infrastructure Applications”, Texas Instruments, S. Sriram et al.

11 Accumulator

12 UMTS filter in receiver system ant RF ADC Inband Outband Reconfig. UMTS filter De-scramble & despread demod data desired signal “Architectural Optimization for Low Power in a Reconfigurable UMTS Filter”, in Proceedings of Wireless Personal Multimedia Communications Symposium (WPMC), Deepak Dasalukunte et al. San Diego, USA 33dB

13 Adaptive UMTS Filter minimum 33dB stop band attenuation (3GPP specification) required filter length of 65 taps In a Bad Channel DDD D...but ONLY 5 is needed in a Good one!

14 OFDM

15 OFDM: Orthogonal Frequency Division Multiplexing N-point IDFT Parallel to serial CP OFDM

16 FFT/IFFT in OFDM Systems Large number of subcarriers  large FFT, O(Nlog 2 N) OFDM: DVB-2/4/8k FFT WLAN IEEE802.11a/g-64 FFT (48+4 subcarriers) LTE – Long Term Evolution: 2k FFT

17 Folding Pipeline Parallel +  FFT: VLSI

18 ButtflyFIFOMult Multi-path delay commutator 50% Signle-path delay feedback 50%100%50% FFT: VLSI

19 High-throughput Featurenumber Technology0.13-μm Area1.44mm2 Throughput1GS/s Power39.6mW@4 09MS/s “A 1-GS/s FFT/IFFT processor for UWB applications”, IEEE Journal of Solid-State Circuits. 2005, 40(8): 1726 - 1735 FFT: Multi-path delay feedback

20 FFT: Twiddle Factors 1 2 3 4 5 6 7 8

21 Shift-add for constant multiplier FFT: Twiddle Factors

22 Shift-add for constant multiplier (64-P FFT) FFT: Twiddle Factors “A 1-GS/s FFT/IFFT processor for UWB applications”, IEEE Journal of Solid-State Circuits. 2005, 40(8): 1726 - 1735

23 MIMO

24 Transmit Antennas Receive Antennas SISO The Radio Channel MISO Single Input Single Output Multiple Input Single Output (Transmit diversity) Receive Antennas Transmit Antennas MIMO The Radio Channel SIMO Single Input Multiple Output (Receive diversity) Multiple Input Multiple Output (Multiple data streams) Multiple Antenna System, MIMO

25 At least as many receivers as transmitted streams Spatial separation at both transmit and receive antennas Improve transmission throughput or reliability SISO A MISO A A Interference R L MIMO! R L R Interference! L Understanding MIMO via Audio

26 Tx Data Rx r = Hs + n S/P H = h 11 h 12 …….. h 1N h 21 h 22 …….. h 2N h M1 h M2 …….. h MN.. ……... N M h ij models fading gain between the j th transmit and i th receive antenna s r Transmitted vector Received vector MIMO System Model

27 Tx Data Rx high-complexity Signal processing r = Hs + n s ^ S/P Recover the transmitted signal s from the received signal r, which contains interference and noise. Interference Cancellation! R L MIMO Signal Processing - Receiver

28 Tx Data Rx Receiver Signal Processing Symbol Detection Channel Estimation Matrix Manipulation r = Hs + n H ^ H -1 ^ s = H -1 r ^ ^ S/P MIMO Signal Processing - Receiver Channel estimation: obtain the channel status by training signals Matrix manipulation: matrix inversion or decomposition depending on detection algorithm Symbol detection: estimate transmitted signal s given channel matrix H and received signal r

29 Linear Detection: zero-forcing detection Maximum Likelihood (ML) Detection: exhaustive search 1.6×10 7 points per vector detection for 64-QAM, 4×4 MIMO MIMO Signal Detection

30 WLAN 802.11n Example Modulation 256QAM; 4 Tx antennas; 108 sub-channels, 4  s per OFDM symbol ML detection 1.159 x 10 17 points/sec Current DSP technology is 1G inst/sec 10 8 processors! OR (“Moores Law”..... processor capability doubles every 18 months) MUST WAIT 40years ! Mike Faulkner 2005, Victoria Univ. Today… Intel i7 CPU: 10 11 inst/sec Tx Data Rx Receiver Signal Processing ML Detection Channel Estimation Matrix Manipulation H ^ s ^ S/P High Complexity ML Detection

31 ML Detection Sphere Detection Simplified 2D-case Limited search space  reduced complexity Sphere Decoding: Algorithm level optimization

32 Sphere Decoding: complexity reduced Chia-Hsiang Yang, University of California, Los Angeles, 2007 Detection PerformanceComputational Complexity Near optimal ML performance with significantly reduced computational complexity (# search points) 4  4 array 16-QAM Total # search points=65536 Near ML detection with 0.1% computational complexity

33 QR-Decomposition: R upper triangular matrix K-Best Detection, e.g., K=2 Sphere Decoding: tree-search

34 Design [1]Design [2]Design [3] Gate Counter50K91K491K Throuput136Mbps269Mbps1100Mbps Sphere Decoding: VLSI

35 QR-Decomposition: –R upper triangular matrix, Q unitary matrix Q => Q H Q = I –Complexity in the order of N 3 multiplications Tx Data Rx Receiver Signal Processing ML Detection Channel Estimation Matrix Manipulation H ^ s ^ S/P QRD (TCAS1-11)MIMOD (ISSCC-09) Gate Count111K114K Throughput (SC/s)12.5M28.125M Energy (nJ/SC/s)12.765.37 QR Decomposition

36 Q 4 Q 3 Q 2 Q 1 H HQ1HQ1H Q2Q1HQ2Q1H Q 3 Q 2 Q 1 H Rotate by multiplying with Q1: Move all energy of first column into first element Repeat until H becomes upper triangular. QRD Methods Given’s Rotation –Works with small 2x2 rotations matrices –Suitable for small MIMO systems Gram Schmidt –Operates on whole columns –Stability problems in correlated channels Householder Transform –Column-wise operation –Higher stability in correlated channels

37 Householder transformation matrix H is an orthogonal matrix Given vector we chose v such that Householder Transform

38 Householder Transform: Example

39 Q 4 Q 3 Q 2 Q 1 H HQ1HQ1H Q2Q1HQ2Q1H Q 3 Q 2 Q 1 H Col 1 op Col2 op Col3 op Col4 op H4... H3... H2.... H1 Input matrices Col 1 op Col2 op Col3 op Col4 op H4... H3... H2 Input matrices Col 1 op Col2 op Col3 op Col4 op H4... H3 Input matrices Col 1 op Col2 op Col3 op Col4 op H4 Input matrices Col 1 op Col2 op Col3 op Col4 op Input matrices Householder Transform: VLSI Systolic array –Suitable for pipelined data flow for example column wise matrix operations

40 Anything we can do at the transmitter?

41 If the receiver is not positioned directly between the speakers the received streams will be at different levels Equalizing at the receiver side? Like ZF, noise enhancement Balance at the transmitter side Require accurate channel information at the transmitter! R L L + N L, 0.5 R + N R R L L + N L, 2(0.5 R + N R ) 2R2R L L + N L, R + N R Pre-code MIMO Precoder: understanding via audio

42 Tx Data Rx r = Hs + n S/P Zero-forcing pre-coder Tx Data Rx r = Hx + n S/P pre-coding x s Low-Complexity Receiver Is this the best solution? MIMO Precoder

43 The corresponding processor for MIMO?

44 Digital Signal Processors for MIMO Flynn’s Taxonomy –Single instruction stream, single data stream (SISD) –Single instruction stream, multiple data streams (SIMD) –Multiple instruction streams, single data stream (MISD) –Multiple instruction streams, multiple data streams (MIMD)

45 VLIW+SIMD VLIW (very long instruction word) –Instruction-level parallelism, e.g., streaming of data SIMD (single instruction multiple data) –Data-level parallelism, e.g., vectors

46 MIMO Processor Examples: Phillips EVP Vector Processor for LTE (2007)

47 BBE64-128 Massive Parallel for LTE-A 4-issue VLIW 3 SIMD ALU, arithmetic, logic, shift 2 SIMD MAC, each 64 18-bit parallel opt. MIMO Processor Examples: Tensilica ConnX Baseband Engine (2011)

48 MIMO Processor Examples: EIT Vector Processor for LTE-A (2013)

49 What is Next?

50 Goes to ‘LARGE’ Dimension?

51 Goes to ‘LARGE’ Dimension Further Scaling Up? Size limitation in terminals Power consumption in portable devices CellularAntennaWLANAntenna HSPA+2×2802.11n4×4 LTE4×4802.11ac8×4 LTE-A8×8802.11?16×16

52 How many antennas can we have? 2 in phone 16 in laptop [1] 24 in 80mm×80mm×80mm [2] [1] J. B. Andersen, J. O. Nielsen, G. F. Pedersen, and G. Bauch, Channel characteristics of an indoor multiuser environment, COST 2100 TD(07) 009, 2007/Febr/26-28. [2] C-Y. Chiu, J-B. Yan, and R. D. Murch, 24-port and 36-port antenna cubes suitable for MIMO wireless communications, IEEE Trans. on Antennas and Propagation, vol. 56, no. 4, pp. 1170-1176, April 2008.

53 Massive MIMO or Very-Large MIMO We think of very-large MIMO (multi-user) system We mean M>>K>>1 We are looking for M>100 antennas! We serve 10-20 users concurrently F. Rusek, D. Persson, B. K. Lau, E.G. Larsson, T.L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays”, IEEE Signal Processing Magazine, Jan. 2013 Several orders of magnitude more transmitted- energy efficient and spectrum efficient!

54 Very-Large Antenna Array Figure from Erik G. Larsson “Massive MIMO: Fundamentals, Opportunities and Challenges” 128 antenna array in Lund

55 Simplified processing block diagram 1st ver.: LTE-like OFDM-based TRX

56 System architecture

57 System components - SDR USRP RIO 2953R (Universal Software Radio Peripheral 2 RF chains –Center frequency from 1.2 to 6 GHz –40MHz BW Xilinx Kintex-7 FPGA –410K logic cell –5Mb RAM ~800 MBps bidirectional data streaming

58 BS hardware setup: side-view Up to 800 MB/s/direction 2.8 GB/s bidirectional Time and Freq. synchonization network PXIe-8389 PXIe-8374 … PXIe-8389 PXIe-8374 … PXIe-8389 PXIe-8374 … PXIe-8389 PXIe-6674T PXIe-8384 PXIe-7976 PXIe-8153

59 LuMaMi – the Lund University Massive MIMO testbed World first 100 coherent RF chains Flexible architecture based on NI platform and Xilinx FPGA Supports 10 simultaneous single antenna users in the same time- frequency resource block Real time operation in the 3.7 GHz band, 20 MHz bandwidth 300kg and 2.5kW...

60 Benefits by Scaling up M “Simplified” signal processing –Linear processing is powerful enough when M>>K »MMSE, Zero-Forcing and Matching Filter 10 4 multiplications for M=100 and K=10!

61 Low-compleixty linear pre-coding

62

63 Possible Hardware Platform Design Challenge –Massive parallel processing; High data traffic; High memory bandwidth Interconnection: Bottleneck for real massive parallelism –Global interconnects can be millimeters long –Routing becomes challenging 3D Integration –Increase the number of “nearest neighbors” seen by each processor –Power and delay reduction by short vertical interconnect

64 Conclusions Digital signal processing is evolving at fast pace with new wireless technologies and applications “Optimal” DSP implementation is crucial to bring new DSP algorithm into practice “Best” design achieves balanced trade-offs depending on application requirements Optimization at earlier stages and do cross-layer (or co-) optimization Keep tracking new technologies for both algorithm and implementation

65 Thanks


Download ppt "DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN,"

Similar presentations


Ads by Google