Published by Pierce Hardy. Modified over 10 years ago.
1
DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, LIANG.LIU@EIT.LTH.SE
2
The Evolving Wireless Scene
[Figure: range (1 m to 10 km) versus data rate (1 Kb to 100 Mb). Technologies shown: cellular (WAN), 3G cellular, 2.5G cellular, 802.11 (LAN), 802.11a, Bluetooth (PAN), sensor networks. Trends: more bit/sec; more bit/($·nJ).]
Courtesy: Prof. Jan Rabaey, BWRC
3
Evolution of High-Speed Wireless: CDMA, OFDM, MIMO, Massive MIMO
4
Bandwidth, Modulation, MIMO
Same in other communication systems
5
Algorithms Beat Moore Beats Chemists
[Figure: growth over 1G, 2G, and 3G of algorithmic complexity, processor performance (~Moore's Law), and battery capacity.]
Courtesy: Ravi Subramanian (Morphics)
6
Design Considerations: flexible, high speed, reliable, small area, low power, low price, time to market
7
Digital Hardware Platforms
[Figure: efficiency versus flexibility for ASIC, GPP, DSP, GPU, FPGA, and ASIP.]
8
Optimizing DSP Implementation
Optimization levels: algorithm, architecture, module, circuit
9
CDMA
10
CDMA: Code Division Multiple Access “A UMTS Baseband Receiver Chip for Infrastructure Applications”, Texas Instruments, S. Sriram et al.
11
Accumulator
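A minimal sketch of why the accumulator matters in a CDMA receiver: despreading multiplies the received chips by the user's code and accumulates the products over one symbol. The length-4 orthogonal codes, the bit patterns, and the function names below are illustrative assumptions, not taken from the cited TI chip.

```python
# Two users share the channel using orthogonal spreading codes.
# Codes and bit patterns are illustrative assumptions.

def spread(bits, code):
    """Map bits {0, 1} to symbols {+1, -1} and multiply each by the chip code."""
    chips = []
    for b in bits:
        symbol = 1 - 2 * b
        chips.extend(symbol * c for c in code)
    return chips

def despread(chips, code):
    """Correlate with the code: multiply chip by chip, accumulate per symbol."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        acc = 0  # the accumulator is reset at every symbol boundary
        for k in range(n):
            acc += chips[start + k] * code[k]
        bits.append(0 if acc > 0 else 1)
    return bits

code_a = [1, 1, 1, 1]
code_b = [1, -1, 1, -1]
bits_a = [0, 1, 1, 0]
bits_b = [1, 1, 0, 0]

# Both users transmit at once; the channel adds their chip streams.
channel = [a + b for a, b in zip(spread(bits_a, code_a), spread(bits_b, code_b))]

recovered_a = despread(channel, code_a)
recovered_b = despread(channel, code_b)
```

Because the two codes are orthogonal, the other user's contribution sums to zero in each accumulator, so both bit streams are recovered from the shared signal.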
12
UMTS filter in receiver system
[Figure: receiver chain: antenna, RF, ADC, reconfigurable UMTS filter, de-scramble & despread, demodulation, data. The desired in-band signal is kept; the out-of-band signal is annotated with 33 dB.]
Deepak Dasalukunte et al., "Architectural Optimization for Low Power in a Reconfigurable UMTS Filter", in Proceedings of Wireless Personal Multimedia Communications Symposium (WPMC), San Diego, USA
13
Adaptive UMTS Filter
A minimum stop-band attenuation of 33 dB (3GPP specification) requires a filter length of 65 taps in a bad channel...
...but ONLY 5 taps are needed in a good one!
14
OFDM
15
OFDM: Orthogonal Frequency Division Multiplexing
[Block diagram: N-point IDFT, parallel-to-serial conversion, cyclic prefix (CP) insertion, OFDM signal.]
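The transmitter chain on this slide (N-point IDFT, parallel-to-serial, cyclic prefix) can be sketched as follows. This is a minimal illustration using a direct IDFT/DFT; the 8 subcarriers, QPSK symbols, CP length of 2, and function names are assumptions chosen for the example.

```python
import cmath

def idft(X):
    """Direct N-point inverse DFT (O(N^2); an FFT would be used in practice)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def dft(x):
    """Direct N-point DFT, the receiver-side counterpart."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def ofdm_modulate(symbols, cp_len):
    """IDFT over the subcarriers, then prepend the last cp_len samples as CP."""
    time_samples = idft(symbols)
    return time_samples[-cp_len:] + time_samples

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and return to the subcarrier domain."""
    return dft(samples[cp_len:])

# One QPSK symbol per subcarrier (illustrative data).
qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
tx = ofdm_modulate(qpsk, cp_len=2)
rx = ofdm_demodulate(tx, cp_len=2)
```

Over an ideal channel, `rx` matches `qpsk` up to floating-point error, and the first two samples of `tx` repeat its last two.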
16
FFT/IFFT in OFDM Systems
Large number of subcarriers means a large FFT, $O(N \log_2 N)$
OFDM: DVB, 2k/4k/8k FFT
WLAN IEEE 802.11a/g: 64-point FFT (48+4 subcarriers)
LTE (Long Term Evolution): 2k FFT
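The $O(N \log_2 N)$ cost comes from splitting an N-point transform into two N/2-point transforms plus N/2 butterflies. The following is a textbook recursive radix-2 sketch, checked against a direct DFT; it is not the hardware architecture discussed in the slides, and the test vector is arbitrary.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: one twiddle-factor multiplication, one add, one subtract.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

# 64 points, the FFT size listed for IEEE 802.11a/g.
samples = [complex(i % 5, -(i % 3)) for i in range(64)]
max_error = max(abs(a - b) for a, b in zip(fft(samples), dft(samples)))
```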
17
FFT: VLSI
Folding, pipeline, parallel
18
FFT: VLSI
Hardware utilization (columns: Butterfly, FIFO, Mult)
Multi-path delay commutator: 50%
Single-path delay feedback: 50%, 100%, 50%
19
FFT: Multi-path delay feedback (high-throughput)
Feature      Number
Technology   0.13 μm
Area         1.44 mm²
Throughput   1 GS/s
Power        39.6 mW @ 409 MS/s
"A 1-GS/s FFT/IFFT processor for UWB applications", IEEE Journal of Solid-State Circuits, 2005, 40(8): 1726-1735
20
FFT: Twiddle Factors
[Figure: twiddle factor positions labeled 1 to 8.]
21
FFT: Twiddle Factors
Shift-add for constant multiplier
22
FFT: Twiddle Factors
Shift-add for constant multiplier (64-point FFT)
"A 1-GS/s FFT/IFFT processor for UWB applications", IEEE Journal of Solid-State Circuits, 2005, 40(8): 1726-1735
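A sketch of the shift-add idea: when a twiddle factor is a known constant, the multiplier can be replaced by a few shifts and adds. The example below approximates multiplication by $\cos(\pi/4) \approx 0.7071$. The particular binary decomposition and the function name are illustrative assumptions and are not taken from the cited paper.

```python
def mul_cos_pi_4(x):
    """Multiply a non-negative integer by ~0.7071 using shifts and adds only.

    0.7071 ~= 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125
    (this decomposition is an illustrative choice).
    """
    return (x >> 1) + (x >> 3) + (x >> 4) + (x >> 6) + (x >> 8)

x = 1 << 12                 # 4096, a 13-bit fixed-point sample
approx = mul_cos_pi_4(x)    # shift-add result
exact = x * 0.5 ** 0.5      # floating-point reference
```

Five shifted copies and four additions replace a general multiplier here; the approximation error is set by how many terms of the binary expansion are kept.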
23
MIMO
24
Multiple Antenna System, MIMO
SISO: Single Input Single Output
MISO: Multiple Input Single Output (transmit diversity)
SIMO: Single Input Multiple Output (receive diversity)
MIMO: Multiple Input Multiple Output (multiple data streams)
[Figure: transmit antennas, the radio channel, and receive antennas for each configuration.]
25
Understanding MIMO via Audio
At least as many receivers as transmitted streams
Spatial separation at both transmit and receive antennas
Improve transmission throughput or reliability
[Figure: audio analogy with left (L) and right (R) channels for SISO, MISO, and MIMO, showing interference between the streams.]
26
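MIMO System Model
[Block diagram: Tx data, S/P, N transmit antennas, channel, M receive antennas, Rx.]
$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}$$
$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1N} \\ h_{21} & h_{22} & \cdots & h_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ h_{M1} & h_{M2} & \cdots & h_{MN} \end{bmatrix}$$
$h_{ij}$ models the fading gain between the $j$th transmit and $i$th receive antenna
$\mathbf{s}$: transmitted vector; $\mathbf{r}$: received vector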
27
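MIMO Signal Processing - Receiver
[Block diagram: Tx data, S/P, channel, Rx with high-complexity signal processing producing $\hat{\mathbf{s}}$.]
$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}$$
Recover the transmitted signal $\mathbf{s}$ from the received signal $\mathbf{r}$, which contains interference and noise.
Interference cancellation!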
28
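MIMO Signal Processing - Receiver
[Block diagram: Tx data, S/P, channel, Rx. Receiver signal processing: channel estimation ($\hat{\mathbf{H}}$), matrix manipulation ($\hat{\mathbf{H}}^{-1}$), symbol detection ($\hat{\mathbf{s}}$).]
$$\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}, \qquad \hat{\mathbf{s}} = \hat{\mathbf{H}}^{-1}\mathbf{r}$$
Channel estimation: obtain the channel status from training signals
Matrix manipulation: matrix inversion or decomposition, depending on the detection algorithm
Symbol detection: estimate the transmitted signal $\mathbf{s}$ given the channel matrix $\mathbf{H}$ and the received signal $\mathbf{r}$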
29
MIMO Signal Detection
Linear detection: zero-forcing detection
Maximum likelihood (ML) detection: exhaustive search
$64^4 = 16{,}777{,}216$ (about $1.7 \times 10^7$) points per vector detection for 64-QAM, 4×4 MIMO
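To make the two detector families concrete, here is a small sketch of zero-forcing and exhaustive-search ML detection for a 2×2 system with QPSK. The channel matrix, noise values, and helper names are illustrative assumptions; a 2×2 closed-form inverse keeps the example free of external libraries.

```python
from itertools import product

QPSK = [1+1j, 1-1j, -1+1j, -1-1j]

def matvec(H, v):
    """Matrix-vector product for nested lists of complex numbers."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def inverse_2x2(H):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def slice_qpsk(z):
    """Nearest QPSK point: take the sign of the real and imaginary parts."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)

def zf_detect(H, r):
    """Zero-forcing: invert the channel, then slice each stream separately."""
    return [slice_qpsk(z) for z in matvec(inverse_2x2(H), r)]

def ml_detect(H, r):
    """ML: try every candidate vector, keep the one closest to r."""
    def distance(s):
        return sum(abs(ri - yi) ** 2 for ri, yi in zip(r, matvec(H, s)))
    return list(min(product(QPSK, repeat=len(H[0])), key=distance))

# Illustrative channel, symbols, and noise (assumed values).
H = [[0.9+0.2j, 0.3-0.4j], [-0.2+0.5j, 1.1-0.1j]]
s = [1-1j, -1+1j]
noise = [0.05-0.02j, -0.03+0.04j]
r = [y + n for y, n in zip(matvec(H, s), noise)]

s_zf = zf_detect(H, r)
s_ml = ml_detect(H, r)
candidates_here = len(QPSK) ** 2   # 16 vectors for QPSK, 2 streams
candidates_slide = 64 ** 4         # 64-QAM, 4 streams
```

The ML search cost grows as (constellation size) to the power of the number of streams, which is why 16 candidates here become roughly 17 million in the 64-QAM, 4×4 case.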
30
High Complexity ML Detection
WLAN 802.11n example (Mike Faulkner, Victoria Univ., 2005)
Modulation 256-QAM; 4 Tx antennas; 108 sub-channels; 4 μs per OFDM symbol
ML detection: $1.159 \times 10^{17}$ points/sec
Current DSP technology is 1G inst/sec: $10^8$ processors!
OR ("Moore's Law": processor capability doubles every 18 months) MUST WAIT 40 years!
Today: Intel i7 CPU, $10^{11}$ inst/sec
[Block diagram: Tx data, S/P, channel, Rx. Receiver signal processing: channel estimation ($\hat{\mathbf{H}}$), matrix manipulation, ML detection ($\hat{\mathbf{s}}$).]
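The figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes a symbol duration of 4 μs (the value that reproduces the quoted $1.159 \times 10^{17}$ points/sec) and, as a simplification, one instruction per search point; variable names are illustrative.

```python
import math

constellation = 256      # 256-QAM
tx_antennas = 4
subchannels = 108
symbol_time = 4e-6       # seconds per OFDM symbol (assumed: 4 microseconds)

# Exhaustive ML search: every combination of symbols across the antennas.
points_per_vector = constellation ** tx_antennas
points_per_second = points_per_vector * subchannels / symbol_time

dsp_rate = 1e9           # 1G inst/sec, assuming one instruction per point
processors = points_per_second / dsp_rate

# Doubling every 18 months: doublings needed to close the gap.
doublings = math.log2(processors)
years = doublings * 1.5
```

This gives about $1.16 \times 10^{17}$ points/sec, on the order of $10^8$ processors, and roughly 40 years of doubling every 18 months.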
31
Sphere Decoding: Algorithm level optimization
ML detection versus sphere detection (simplified 2D case)
Limited search space, reduced complexity
32
Sphere Decoding: complexity reduced
4×4 array, 16-QAM: total # search points = 65536
Near-optimal ML performance with significantly reduced computational complexity (# search points)
Near-ML detection with 0.1% of the computational complexity
[Figure panels: detection performance; computational complexity.]
Chia-Hsiang Yang, University of California, Los Angeles, 2007
33
Sphere Decoding: tree-search
QR decomposition: R is an upper triangular matrix
K-best detection, e.g., K=2
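A minimal sketch of K-best tree search, assuming the QR decomposition has already been applied so that the detector sees an upper triangular $\mathbf{R}$ and $\mathbf{y} = \mathbf{Q}^H\mathbf{r}$. Detection starts at the last layer and keeps only the K best partial paths at each level. The matrix, symbols, noise, and function name are illustrative assumptions.

```python
def k_best_detect(R, y, constellation, k):
    """Breadth-first tree search over an upper triangular system y = R s + n."""
    n = len(y)
    # Each survivor: (accumulated distance, symbols decided for layers i+1..n-1).
    survivors = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        candidates = []
        for dist, tail in survivors:
            # Contribution of already-decided layers to row i.
            interference = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for sym in constellation:
                err = abs(y[i] - interference - R[i][i] * sym) ** 2
                candidates.append((dist + err, [sym] + tail))
        # Keep only the k partial paths with the smallest distance.
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:k]
    return survivors[0][1]

QPSK = [1+1j, 1-1j, -1+1j, -1-1j]
R = [[1.2, 0.4-0.3j, -0.2+0.1j],
     [0.0, 0.9, 0.5+0.2j],
     [0.0, 0.0, 0.8]]
s_true = [1+1j, -1-1j, 1-1j]
noise = [0.05j, -0.04, 0.03+0.02j]
y = [sum(R[i][j] * s_true[j] for j in range(3)) + noise[i] for i in range(3)]

s_hat = k_best_detect(R, y, QPSK, k=2)
```

With K=2 and three QPSK layers, the search evaluates 4 + 8 + 8 = 20 tree nodes instead of the 4^3 = 64 full vectors of an exhaustive search. Unlike ML, K-best can discard the true path when K is too small for the noise level.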
34
Sphere Decoding: VLSI
              Design [1]   Design [2]   Design [3]
Gate count    50K          91K          491K
Throughput    136 Mbps     269 Mbps     1100 Mbps
35
QR Decomposition
QR decomposition:
–R upper triangular matrix, Q unitary matrix: $\mathbf{Q}^H\mathbf{Q} = \mathbf{I}$
–Complexity in the order of $N^3$ multiplications
[Block diagram: Tx data, S/P, channel, Rx. Receiver signal processing: channel estimation ($\hat{\mathbf{H}}$), matrix manipulation, ML detection ($\hat{\mathbf{s}}$).]
                    QRD (TCAS1-11)   MIMOD (ISSCC-09)
Gate count          111K             114K
Throughput (SC/s)   12.5M            28.125M
Energy (nJ/SC/s)    12.76            5.37
36
QRD Methods
Rotate by multiplying with $Q_1$: move all energy of the first column into the first element. Repeat until H becomes upper triangular: $H$, $Q_1H$, $Q_2Q_1H$, $Q_3Q_2Q_1H$, $Q_4Q_3Q_2Q_1H$.
Givens rotation
–Works with small 2x2 rotation matrices
–Suitable for small MIMO systems
Gram-Schmidt
–Operates on whole columns
–Stability problems in correlated channels
Householder transform
–Column-wise operation
–Higher stability in correlated channels
37
Householder Transform
The Householder transformation matrix H is an orthogonal matrix
Given a vector, we choose v such that ...
38
Householder Transform: Example
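As a worked illustration, the sketch below triangularizes a small real matrix with Householder reflections in the common textbook form $\mathbf{H} = \mathbf{I} - 2\mathbf{v}\mathbf{v}^T/(\mathbf{v}^T\mathbf{v})$. The slide's own example and notation are not recoverable from this transcript, so the matrix, the sign convention, and the function name here are assumptions. The real-valued case is used for brevity; MIMO channels are complex.

```python
import math

def householder_r(A):
    """Return R of A = QR by applying one Householder reflection per column."""
    R = [row[:] for row in A]
    m, n = len(R), len(R[0])
    for k in range(min(m - 1, n)):
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(c * c for c in x))
        if norm_x == 0.0:
            continue  # column already zero below the diagonal
        # Reflect x onto alpha * e1; the sign choice avoids cancellation.
        alpha = -math.copysign(norm_x, x[0])
        v = x[:]
        v[0] -= alpha
        norm_v = math.sqrt(sum(c * c for c in v))
        v = [c / norm_v for c in v]
        # Apply (I - 2 v v^T) to the trailing rows, column by column.
        for j in range(k, n):
            dot = sum(v[i] * R[k + i][j] for i in range(len(v)))
            for i in range(len(v)):
                R[k + i][j] -= 2.0 * dot * v[i]
    return R

A = [[4.0, 1.0, 2.0],
     [2.0, 3.0, 1.0],
     [1.0, 2.0, 5.0]]
R = householder_r(A)
below_diagonal = [abs(R[i][j]) for i in range(3) for j in range(i)]
```

Each reflection moves the energy of one column into its diagonal element, so the magnitude of `R[0][0]` equals the norm of the first column of `A`, and the entries below the diagonal vanish up to rounding.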
39
Householder Transform: VLSI
Systolic array
–Suitable for pipelined data flow, for example column-wise matrix operations
[Figure: four pipeline stages (Col 1 op, Col 2 op, Col 3 op, Col 4 op) apply $Q_1$ to $Q_4$ in turn: $H$, $Q_1H$, $Q_2Q_1H$, $Q_3Q_2Q_1H$, $Q_4Q_3Q_2Q_1H$. Input matrices H1, H2, H3, H4 enter one after another, so each stage works on a different matrix.]
40
Anything we can do at the transmitter?
41
MIMO Precoder: understanding via audio
If the receiver is not positioned directly between the speakers, the received streams will be at different levels
Equalizing at the receiver side? Like ZF, this enhances the noise
Balance at the transmitter side: requires accurate channel information at the transmitter!
[Figure: unbalanced reception: $L + N_L$, $0.5R + N_R$. Receiver equalization: $L + N_L$, $2(0.5R + N_R)$. Pre-coding (transmit $2R$): $L + N_L$, $R + N_R$.]
42
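MIMO Precoder
Without pre-coding: $\mathbf{r} = \mathbf{H}\mathbf{s} + \mathbf{n}$
With pre-coding ($\mathbf{s}$ is pre-coded into $\mathbf{x}$ at the transmitter): $\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{n}$
Zero-forcing pre-coder
Low-complexity receiver
Is this the best solution?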
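A minimal sketch of zero-forcing pre-coding for a 2×2 channel, assuming the transmitter knows $\mathbf{H}$ exactly and taking $\mathbf{x} = \mathbf{H}^{-1}\mathbf{s}$, so that the noiseless received vector equals $\mathbf{s}$. The channel values and helper names are illustrative assumptions.

```python
def matvec(H, v):
    """Matrix-vector product for nested lists of complex numbers."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def inverse_2x2(H):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Illustrative channel and QPSK symbols (assumed values).
H = [[0.9+0.2j, 0.3-0.4j], [-0.2+0.5j, 1.1-0.1j]]
s = [1+1j, -1-1j]

x = matvec(inverse_2x2(H), s)   # pre-coded transmit vector
r = matvec(H, x)                # what the receiver sees without noise

# Transmit power relative to the un-coded symbols.
power_ratio = sum(abs(v) ** 2 for v in x) / sum(abs(v) ** 2 for v in s)
```

The receiver only needs to slice `r`, which is the low-complexity receiver the slide refers to. The transmit power of `x` generally differs from that of `s` and grows as the channel becomes ill-conditioned, which is one reason to question whether zero-forcing is the best choice.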
43
The corresponding processor for MIMO?
44
Digital Signal Processors for MIMO
Flynn's Taxonomy
–Single instruction stream, single data stream (SISD)
–Single instruction stream, multiple data streams (SIMD)
–Multiple instruction streams, single data stream (MISD)
–Multiple instruction streams, multiple data streams (MIMD)
45
VLIW+SIMD
VLIW (very long instruction word)
–Instruction-level parallelism, e.g., streaming of data
SIMD (single instruction multiple data)
–Data-level parallelism, e.g., vectors
46
MIMO Processor Examples: Philips EVP
Vector Processor for LTE (2007)
47
MIMO Processor Examples: Tensilica ConnX Baseband Engine (2011)
BBE64-128: massively parallel, for LTE-A
4-issue VLIW
3 SIMD ALUs: arithmetic, logic, shift
2 SIMD MACs, each 64 18-bit parallel ops
48
MIMO Processor Examples: EIT Vector Processor for LTE-A (2013)
49
What is Next?
50
Goes to ‘LARGE’ Dimension?
51
Goes to 'LARGE' Dimension
Further scaling up?
Cellular   Antenna   WLAN       Antenna
HSPA+      2×2       802.11n    4×4
LTE        4×4       802.11ac   8×4
LTE-A      8×8       802.11?    16×16
Size limitation in terminals
Power consumption in portable devices
52
How many antennas can we have?
2 in phone
16 in laptop [1]
24 in 80 mm × 80 mm × 80 mm [2]
[1] J. B. Andersen, J. O. Nielsen, G. F. Pedersen, and G. Bauch, "Channel characteristics of an indoor multiuser environment", COST 2100 TD(07) 009, Feb. 26-28, 2007.
[2] C.-Y. Chiu, J.-B. Yan, and R. D. Murch, "24-port and 36-port antenna cubes suitable for MIMO wireless communications", IEEE Trans. on Antennas and Propagation, vol. 56, no. 4, pp. 1170-1176, April 2008.
53
Massive MIMO or Very-Large MIMO
We think of a very-large (multi-user) MIMO system
We mean M>>K>>1
We are looking for M>100 antennas!
We serve 10-20 users concurrently
Several orders of magnitude more transmitted-energy efficient and spectrum efficient!
F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, "Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays", IEEE Signal Processing Magazine, Jan. 2013
54
Very-Large Antenna Array
[Figure: 128-antenna array in Lund.]
Figure from Erik G. Larsson, "Massive MIMO: Fundamentals, Opportunities and Challenges"
55
Simplified processing block diagram 1st ver.: LTE-like OFDM-based TRX
56
System architecture
57
System components - SDR
USRP RIO 2953R (Universal Software Radio Peripheral)
2 RF chains
–Center frequency from 1.2 to 6 GHz
–40 MHz BW
Xilinx Kintex-7 FPGA
–410K logic cells
–5 Mb RAM
~800 MBps bidirectional data streaming
58
BS hardware setup: side-view
Up to 800 MB/s/direction
2.8 GB/s bidirectional
Time and freq. synchronization network
[Figure: chassis modules: PXIe-8389, PXIe-8374, PXIe-6674T, PXIe-8384, PXIe-7976, PXIe-8153.]
59
LuMaMi – the Lund University Massive MIMO testbed
World's first 100 coherent RF chains
Flexible architecture based on NI platform and Xilinx FPGA
Supports 10 simultaneous single-antenna users in the same time-frequency resource block
Real-time operation in the 3.7 GHz band, 20 MHz bandwidth
300 kg and 2.5 kW...
60
Benefits by Scaling up M
"Simplified" signal processing
–Linear processing is powerful enough when M>>K
»MMSE, zero-forcing, and matched filter
$10^4$ multiplications for M=100 and K=10!
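The slide does not say which operation the $10^4$ multiplications refer to. One reading consistent with the number is forming the K×K Gram matrix $\mathbf{H}^H\mathbf{H}$ used by zero-forcing and MMSE, which takes $M \cdot K^2$ complex multiplications when computed naively. The sketch below counts them for a random channel; the channel statistics, seed, and function name are assumptions.

```python
import random

def gram_matrix(H):
    """Naively compute G = H^H H and count the complex multiplications."""
    m, k = len(H), len(H[0])
    count = 0
    G = [[0j] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            acc = 0j
            for i in range(m):
                acc += H[i][a].conjugate() * H[i][b]
                count += 1
            G[a][b] = acc
    return G, count

random.seed(0)
M, K = 100, 10   # base-station antennas, single-antenna users
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
     for _ in range(M)]

G, multiplications = gram_matrix(H)
```

Exploiting the Hermitian symmetry of `G` would roughly halve this count; the naive loop is kept here so the total matches $M \cdot K^2$ directly.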
61
Low-complexity linear pre-coding
63
Possible Hardware Platform
Design challenge
–Massive parallel processing; high data traffic; high memory bandwidth
Interconnection: bottleneck for real massive parallelism
–Global interconnects can be millimeters long
–Routing becomes challenging
3D integration
–Increases the number of "nearest neighbors" seen by each processor
–Power and delay reduction through short vertical interconnects
64
Conclusions
Digital signal processing is evolving at a fast pace with new wireless technologies and applications
"Optimal" DSP implementation is crucial to bring new DSP algorithms into practice
The "best" design achieves balanced trade-offs depending on application requirements
Optimize at earlier stages and do cross-layer (or co-) optimization
Keep tracking new technologies for both algorithm and implementation
65
Thanks