Software Defined Radio – A High Performance Embedded Challenge Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge, and 1 Krisztian.

1 Software Defined Radio – A High Performance Embedded Challenge
Hyunseok Lee, Yuan Lin, Yoav Harel, Mark Woh, Scott Mahlke, Trevor Mudge (University of Michigan), and Krisztian Flautner (ARM Ltd)

2 Advanced Computer Architecture Laboratory, University of Michigan
Contents
- Software defined radio
- Categories of wireless networks
- Core technologies for future networks
- Case study: W-CDMA network
  - Major algorithms
  - Workload characterization
  - Architectural implications

3 Software Defined Radio

4 Wireless Communication System
[Figure: application on top of the upper protocol layers (Transport: TCP/UDP, Network: IP, LINK: PPP, MAC) and the physical layer (PHY: baseband processing and analog front-end); labels: bits, packets, "air"]

5 Anatomy of Cellular Phone

6 Protocol on Wireless Platform
[Figure: mapping of protocol functions onto a wireless platform. Labels: source coding (audio: AMR/QCELP, video: MPEG); upper layers (Transport, Network, LINK, MAC); physical layer (PHY); implementation options: ASIC (hardware), GPP (software), DSP/accelerator; application processor and baseband processor]

7 Software Defined Radio (SDR)
- Use software routines instead of ASICs for the physical layer operations of a wireless communication system.
- [Figure: ASICs (PHY) replaced by software routines running on programmable hardware]
- Both the analog front-end and the digital baseband are within the scope of SDR.

8 Levels of SDR
Tier | Name | Description
Tier 0 | Hardware Radio (HR) | Implemented using hardware components. Cannot be modified.
Tier 1 | Software Controlled Radio (SCR) | Only control functions are implemented in software: inter-connects, power levels, etc.
Tier 2 | Software Defined Radio (SDR) | Software control of a variety of modulation techniques, wide-band or narrow-band operation, security functions, etc.
Tier 3 | Ideal Software Radio (ISR) | Programmability extends to the entire system, with analog conversion only at the antenna.
Tier 4 | Ultimate Software Radio (USR) | Defined for comparison purposes only.

9 Why do we need SDR?
Seamless wireless connection (end user)
- Wireless protocols differ widely
  - TDMA: GSM, AMPS
  - CDMA: IS-95, cdma2000, W-CDMA, IEEE 802.11b
  - OFDM: IEEE 802.11a/g/n, WiMAX
- A terminal must support multiple wireless protocols
Easy infrastructure upgrade (service provider)
- Wireless protocols evolve continuously, e.g. W-CDMA -> W-CDMA + HSDPA
Time to market (manufacturer)
- Reduce hardware development time and cost

10 Where can we use SDR?
Basestations
- Weak constraints on power and area
- Support several hundred subscribers
- Will be commercialized first
Wireless terminals
- Tight constraints on power and area
- Will be commercialized next

11 Why is SDR challenging?
Analog front-end
- Must be tunable across a range of carrier frequencies and bandwidths
Digital baseband
- Supercomputer-level computation power: > 50 Gops per subscriber
- Tight power budget: 200 ~ 300 mW (at the terminal)
- High level of programmability: a combination of heterogeneous signal processing algorithms

12 Our Strategy
Performance
- Exploit the parallelism in signal processing and forward error correction (FEC) algorithms
Power
- Limit the programmability to minimize power consumption
- Minimize both active and idle mode power consumption
There is a trade-off between power efficiency and programmability.

13 Categories of Wireless Networks

14 Categories of Wireless Networks

15 WWAN (Wireless Wide Area Network)

16 WLAN / WMAN
WMAN: Wireless Metropolitan Area Network
- For the last mile problem
- 802.16d: Fixed WiMAX
- 802.16e: Mobile WiMAX
WLAN: Wireless Local Area Network
- High data rate
- Poor mobility support

17 WPAN (Wireless Personal Area Network)
- Interconnecting personal devices

18 Core technologies of future networks

19 OFDM (Orthogonal Frequency Division Multiplexing)
- Transmits the signal over several sub-carriers.
- The frequency spectra of the sub-carriers overlap (high spectral efficiency).
- Highly susceptible to frequency error in the receiver.

20 Major Computation in OFDM System
FFT / IFFT
- N = 64: IEEE 802.11a
- N = 256 ~ 2048: IEEE 802.16 WiMAX
- Data precision: 12 ~ 16 bits
Amount of computation for OFDM operation
- ~ $10^8$ complex multiplications / sec
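A minimal sketch of the FFT/IFFT pair that dominates OFDM baseband processing, in plain Python. The function names, the QPSK test symbols, and the textbook radix-2 operation count of (N/2) * log2(N) complex multiplications per transform are illustrative assumptions, not taken from the slides.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def ifft(x):
    """Inverse FFT via the conjugate trick: ifft(x) = conj(fft(conj(x))) / N."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]


def radix2_complex_mults(n):
    """Twiddle multiplications in one radix-2 transform of size n (power of two)."""
    return (n // 2) * (n.bit_length() - 1)


# Hypothetical QPSK symbols on 64 sub-carriers (the 802.11a FFT size).
symbols = [complex(1 - 2 * (k & 1), 1 - 2 * ((k >> 1) & 1)) for k in range(64)]
time_samples = ifft(symbols)   # transmitter: sub-carriers -> time domain
recovered = fft(time_samples)  # receiver: time domain -> sub-carriers
```

The transmitter runs the IFFT once per OFDM symbol and the receiver runs the FFT, so the per-second multiplication count scales with the symbol rate times `radix2_complex_mults(N)`.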

21 MIMO (Multiple Input Multiple Output)
- Uses multiple antennas for signal transmission and reception
- In the ideal case, channel capacity increases linearly with the number of antennas
- Can effectively compensate for the multipath fading effect
- Significantly increases receiver complexity
Channel capacity, single antenna: $C = W \log_2(1 + \mathrm{SNR})$
Channel capacity, MIMO with n transmit and m receive antennas: $C = \min(n, m) \cdot W \log_2(1 + \mathrm{SNR})$
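The two capacity expressions can be evaluated directly. The sketch below assumes the SNR is given as a linear ratio (not in dB); the function names and the 20 MHz example bandwidth are illustrative choices.

```python
import math


def siso_capacity(bandwidth_hz, snr_linear):
    """Single-antenna capacity in bits/s: C = W * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1 + snr_linear)


def mimo_capacity(n_tx, n_rx, bandwidth_hz, snr_linear):
    """Ideal MIMO capacity in bits/s: C = min(n, m) * W * log2(1 + SNR)."""
    return min(n_tx, n_rx) * siso_capacity(bandwidth_hz, snr_linear)


# Example: 20 MHz channel, SNR = 15 (about 11.8 dB), 4 x 4 antennas.
single = siso_capacity(20e6, 15)
four_by_four = mimo_capacity(4, 4, 20e6, 15)
```

In this idealized model a 4 x 4 system carries four times the single-antenna capacity; a real channel would fall short of that bound.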

22 Computation in MIMO Receiver
Amount of computation in MIMO receiver [formula not captured in the transcript], expressed in terms of
- $M$: number of Tx/Rx antennas
- $L_T$: length of preamble
- $L_P$: length of payload
4 Tx/Rx antennas, 100 Mbps, 64-QAM, 1/2 coding rate
- ~ $6 \times 10^8$ computations / sec

23 LDPC Code
Low Density Parity Check (LDPC) code
- Turbo-code-like coding gain with lower implementation cost
Encoding
- Matrix multiplication, $c = xG$
- $G$ (generator matrix) is a large matrix (e.g. a 4K x 4K matrix)
Decoding
- Equivalent to finding the most probable vector $x$ such that $Hx \bmod 2 = 0$
- $H$ (parity check matrix) is a large sparse matrix
Implementation
- There is a trade-off between coding gain and implementation complexity
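To make the matrix view concrete, the sketch below encodes with c = xG and checks Hc mod 2 = 0. It deliberately uses the tiny (7,4) Hamming code as a stand-in: that code is not an LDPC code and its H is not sparse, but the encoding product and the parity check have the same form. All names are illustrative.

```python
# Systematic generator matrix G = [I | P] of the (7,4) Hamming code (stand-in).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
# Matching parity check matrix H = [P^T | I].
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(x, generator):
    """Codeword c = x * G over GF(2)."""
    cols = len(generator[0])
    return [sum(x[i] * generator[i][j] for i in range(len(x))) % 2
            for j in range(cols)]


def syndrome(parity_check, c):
    """Syndrome H * c mod 2; all zeros means every parity check is satisfied."""
    return [sum(h * b for h, b in zip(row, c)) % 2 for row in parity_check]


codeword = encode([1, 0, 1, 1], G)
corrupted = list(codeword)
corrupted[2] ^= 1  # flip one bit to emulate a channel error
```

A decoder searches for the most probable vector whose syndrome is all zeros; with a large sparse H this search is what makes LDPC decoding expensive.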

24 Hybrid ARQ
- Reuses erroneous frames when decoding the retransmitted frame
- Requires a huge buffer space

25 Case Study : W-CDMA system

26 Major Algorithms

27 Physical Layer of W-CDMA
[Figure: W-CDMA physical layer block diagram, with the following annotations]
- Error correction
- Overcome severe errors within a short time interval
- Assign the signal waveform optimal for data transmission
- Suppress the signal terms outside the pass band

28 Channel Encoder/Decoder
Encoder
- Adds systematic redundancy to the source data
Decoder
- Fixes errors in the received data using the redundancy information generated by the encoder
The W-CDMA system uses
- Convolutional code (for short voice and control messages)
- Turbo code (for video streams and high speed packet data)

29 Channel Encoder
- Consists of flip-flops and exclusive-OR gates
- Has negligible impact on workload
[Figure: convolutional encoder with input feeding eight delay elements (D); Output 0 uses $G_0$ = 561 (octal), Output 1 uses $G_1$ = 753 (octal)]
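A minimal sketch of the shift-register encoder in the figure, using the two generator polynomials from the slide. The bit-ordering convention (the most significant bit of each polynomial taps the current input bit) and the function name are assumptions.

```python
def conv_encode(bits, polys=(0o561, 0o753), k=9):
    """Rate-1/len(polys) convolutional encoder with constraint length k.

    `state` holds the k-1 previous input bits (most recent in the top bit);
    each output bit is the XOR (parity) of the register bits tapped by a polynomial.
    """
    state = 0
    out = []
    for b in bits:
        reg = (b << (k - 1)) | state          # current bit + 8 flip-flops
        for g in polys:
            out.append(bin(reg & g).count("1") & 1)
        state = reg >> 1                      # shift the register by one
    return out


# Impulse response: a single 1 followed by zeros reads the polynomials out bit by bit.
impulse = conv_encode([1] + [0] * 8)
output0 = impulse[0::2]  # bits produced by G0
output1 = impulse[1::2]  # bits produced by G1
```

The encoder is only shifts and XORs, which is why the slide treats its workload as negligible compared with the decoder.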

30 Channel Decoder
- Determine the most probable code sequence from the received sequence.
- Select the codeword C with the minimum distance to the received sequence r.
- One of the dominant workloads.
[Figure: codewords $C_1, C_2, \ldots, C_N$ at distances $d_1, d_2, \ldots, d_N$ from $r$; $\{c_i\}$: code set, $r$: received signal]

31 Channel Decoder – Viterbi Algorithm
Most popular decoding algorithm for convolutional codes. Consists of three steps:
- Branch metric calculation (BMC): abs(a-b), parallelizable
- Add compare select (ACS): min(a+b, c+d), parallelizable
- Trace back (TB): recursive pointer tracing, sequential
Amount of operation in W-CDMA
- 16 Kbps voice: ~2 Gops
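The three steps can be sketched as a small hard-decision Viterbi decoder. To stay readable, the example uses a constraint-length-3 code with polynomials 7 and 5 (octal) instead of the W-CDMA code; the helper names and bit ordering are assumptions.

```python
def parity(v):
    return bin(v).count("1") & 1


def encode(bits, polys, k):
    """Reference encoder used to build test input for the decoder."""
    state, out = 0, []
    for b in bits:
        reg = (b << (k - 1)) | state
        out.extend(parity(reg & g) for g in polys)
        state = reg >> 1
    return out


def viterbi_decode(received, polys, k):
    n_out, n_states = len(polys), 1 << (k - 1)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)   # encoder starts in state 0
    history = []                             # survivor (prev_state, bit) per step
    for t in range(0, len(received), n_out):
        r = received[t:t + n_out]
        new_metrics = [inf] * n_states
        choice = [None] * n_states
        for s in range(n_states):
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                reg = (b << (k - 1)) | s
                expected = [parity(reg & g) for g in polys]
                # BMC: Hamming distance between received and expected bits
                bm = sum(abs(x - y) for x, y in zip(r, expected))
                nxt = reg >> 1
                cand = metrics[s] + bm        # ACS: add
                if cand < new_metrics[nxt]:   # ACS: compare and select
                    new_metrics[nxt] = cand
                    choice[nxt] = (s, b)
        metrics = new_metrics
        history.append(choice)
    # TB: follow survivor pointers backwards from the best final state
    state = min(range(n_states), key=lambda s: metrics[s])
    decoded = []
    for choice in reversed(history):
        state, b = choice[state]
        decoded.append(b)
    return decoded[::-1]
```

The BMC and ACS loops over states are independent of each other within one time step, which is the vector parallelism the slide refers to; the trace back walks one pointer at a time and stays sequential.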

32 Channel Decoder – Turbo Decoder
Two algorithms are widely used
- SOVA (Soft Output Viterbi Algorithm)
  - Less computation intensive
  - Lower error correction performance
- Max-LogMAP algorithm
  - More computation required
  - Higher error correction performance
Amount of operation in W-CDMA
- For 128 Kbps streaming data: ~18 Gops

33 Turbo Decoder
- Based on multiple iterations of SOVA / Max-LogMAP blocks.
- More iterations give better error correction performance.

34 Block Interleaver/Deinterleaver
- Overcomes the severe signal attenuation within a short time interval that frequently occurs on wireless channels.
- Interleaver (at the transmitter): randomizes the sequence of the source data.
- Deinterleaver (at the receiver): recovers the original sequence by reordering.
- Amount of operation: < 10 Mops
Example: 123456789 -> (interleaving) -> 147258369 -> (deinterleaving) -> 123456789
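The 123456789 -> 147258369 example corresponds to writing a 3 x 3 block row by row and reading it column by column. A minimal sketch, with illustrative function names and no column permutation:

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols block, read column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the block exactly")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Inverse operation: write column by column, read row by row."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the block exactly")
    return [data[c * rows + r] for r in range(rows) for c in range(cols)]


sent = interleave([1, 2, 3, 4, 5, 6, 7, 8, 9], 3, 3)
restored = deinterleave(sent, 3, 3)
```

After interleaving, a burst that wipes out three adjacent transmitted symbols hits one symbol in each original row, so the channel decoder sees isolated errors instead of a run.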

35 Spreader/Despreader
- Allows the transmission of several signals at the same time (x[n] and y[n] in the diagram).
- Based on the orthogonality between spreading codes.

36 Spreader/Despreader
- The spreader/despreader also suppresses noise
- Amount of operation: ~4 Gops
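A small sketch of how orthogonal spreading codes let two signals share the channel. The two length-4 codes are Walsh-style sequences chosen for illustration; the function names and the ±1 symbol representation are assumptions.

```python
def spread(symbols, code):
    """Multiply every symbol by the whole spreading code (one chip per code element)."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Correlate each code-length window with the code and normalize."""
    n = len(code)
    return [sum(chips[i + j] * code[j] for j in range(n)) / n
            for i in range(0, len(chips), n)]


code_x = [1, -1, 1, -1]
code_y = [1, 1, -1, -1]   # orthogonal to code_x: their dot product is 0

x = [1, -1, 1]
y = [-1, -1, 1]
# Both spread signals are transmitted at the same time and add up on the channel.
channel = [a + b for a, b in zip(spread(x, code_x), spread(y, code_y))]
```

Despreading `channel` with `code_x` cancels the contribution of `y` because the codes are orthogonal, and averaging over the code length is also what attenuates uncorrelated noise.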

37 Scrambler/Descrambler
- Randomizes the output signal by multiplying it with a pseudo-random sequence, the so-called scrambling code.
- Allows multiple terminals to communicate at the same time.
- Amount of operation: ~3 Gops
[Figure: Terminal 1 with scrambling code n, Terminal 2 with scrambling code m]
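A minimal sketch of scrambling as chip-wise multiplication with a pseudo-random ±1 sequence. The 7-bit LFSR below is a hypothetical generator picked for brevity; it is not the scrambling code defined by the W-CDMA specification, and the names are illustrative.

```python
def lfsr_code(length, state=0b1000001):
    """Return `length` chips (+1 or -1) from a 7-bit LFSR (hypothetical taps)."""
    chips = []
    for _ in range(length):
        bit = ((state >> 6) ^ (state >> 5)) & 1   # feedback from the top two bits
        state = ((state << 1) | bit) & 0x7F
        chips.append(1 - 2 * bit)                 # map 0 -> +1, 1 -> -1
    return chips


def scramble(chips, code):
    """Chip-wise product; applying the same code twice restores the input."""
    return [c * s for c, s in zip(chips, code)]


data = [1, -1, -1, 1, 1, 1, -1, 1]
code_n = lfsr_code(len(data))                     # terminal 1
code_m = lfsr_code(len(data), state=0b0110011)    # terminal 2, different seed
scrambled = scramble(data, code_n)
descrambled = scramble(scrambled, code_n)
```

Because every chip is +1 or -1, multiplying by the same code a second time undoes the scrambling; a receiver using a different terminal's code sees a noise-like sequence instead.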

38 Low Pass Filter
- Suppresses the signal terms outside the pass band.
[Figure: filtering in the time and frequency domains; an impulse signal (band-unlimited) becomes a sinc function (band-limited signal)]

39 Low Pass Filter
- Uses a conventional FIR filter
- Number of filter taps (N) = 32 ~ 64
- Amount of operation: ~12 Gops
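A direct-form FIR filter is a multiply-accumulate over the last N input samples. The sketch below uses a 4-tap moving average purely as a placeholder; real W-CDMA pulse-shaping coefficients are not given in the slides.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:            # samples before the start are treated as zero
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Feeding an impulse returns the tap values themselves (the impulse response).
impulse_response = fir_filter([1, 0, 0, 0, 0, 0], [0.25] * 4)
```

Each output sample costs one multiplication and one addition per tap, so the workload grows linearly with both the tap count and the sample rate.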

40 Rake Receiver – Multipath Fading
- The rake receiver mitigates the multipath fading effect.
- Multipath fading is a major cause of unreliable wireless channel characteristics.
[Figure: transmitted signal x(t) and received signal y(t) for one, two, and three propagation paths]
One path: $y(t) = a_0 x(t)$
Two paths: $y(t) = a_0 x(t) + a_1 x(t - d_1)$
Three paths: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$

41 Rake Receiver – Functions
Ideally, the function of the rake receiver is to aggregate the signal terms with proper delay compensation.
Received signal: $y(t) = a_0 x(t) + a_1 x(t - d_1) + a_2 x(t - d_2)$
Rake receiver output: $r(t) = a_0 x(t - t_{delay}) + a_1 x(t - d_1 - d_{est1}) + a_2 x(t - d_2 - d_{est2}) = (a_0 + a_1 + a_2) \cdot x(t - t_{delay})$
We need to know the delay spread of the received signal, which varies randomly.

42 Rake Receiver – Detect Delay Spread
- Scan the received signal in the frame buffer while computing its correlation with the scrambling code sequence.
[Figure: correlation window sliding over the received signal; the correlation result shows peaks $a_0$, $a_1$, $a_2$ at delays 0, $d_1$, $d_2$]

43 Computation of Rake Receiver
Correlation computation: $L_W \times L_B \times F$
- $L_W$: correlation window = 320
- $L_B$: frame buffer size = 5120
- $F$: operation frequency = 50
- ~80 mega multiplications / sec
- Multiplications can be converted into subtractions
Amount of operation in W-CDMA: ~25 Gops
Most dominant workload
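The delay-spread search and its cost can be sketched as a sliding correlation. The example substitutes a 13-chip Barker sequence for the scrambling code because its autocorrelation sidelobes are small; the path gains, delays, buffer length, and function names are all illustrative assumptions.

```python
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]  # stand-in for the scrambling code


def multipath(code, paths, length):
    """Received buffer: sum of gain-scaled copies of `code`, each shifted by its delay."""
    buf = [0.0] * length
    for gain, delay in paths:
        for j, chip in enumerate(code):
            buf[delay + j] += gain * chip
    return buf


def correlate(buf, code):
    """Correlation of the code with every window position in the buffer."""
    n = len(code)
    return [sum(buf[k + j] * code[j] for j in range(n))
            for k in range(len(buf) - n + 1)]


def find_paths(buf, code, count):
    """Delays of the `count` strongest correlation peaks, in ascending order."""
    corr = correlate(buf, code)
    strongest = sorted(range(len(corr)), key=lambda k: abs(corr[k]), reverse=True)
    return sorted(strongest[:count])


def correlation_mults_per_sec(window, buffer_size, frequency):
    """Multiplication rate L_W * L_B * F from the slide."""
    return window * buffer_size * frequency


received = multipath(BARKER13, [(1.0, 0), (0.6, 5), (0.4, 9)], 32)
delays = find_paths(received, BARKER13, 3)
rate = correlation_mults_per_sec(320, 5120, 50)  # 81,920,000, i.e. roughly 80 M/sec
```

Every window position needs a full-length correlation, which is why the searcher cost is the product of window length, buffer size, and search frequency.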

44 Rake Receiver – Overall Architecture
[Figure: rake receiver block diagram, with the following annotations]
- Detects the delay spread
- Compensates the propagation delay
- Recombines the signal terms without delay

45 Power Control
- The receiver controls the transmission power of the transmitter in order to minimize the interference to other users.
- The required computation is negligible.
[Figure: terminal and basestation exchanging a pilot signal and power control commands (u = up, d = down) around a reference level]
- Strength of pilot signal is below the reference level: terminal sends UP command
- Strength of pilot signal is above the reference level: terminal sends DOWN command
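The up/down decision is a single comparison per control interval. In the sketch below, the 1 dB step size, the handling of a pilot exactly at the reference level (treated as DOWN), and the function names are assumptions that the slide does not specify.

```python
def power_control_commands(pilot_strengths, reference):
    """Return 'u' (UP) when the pilot is below the reference level, else 'd' (DOWN)."""
    return ["u" if p < reference else "d" for p in pilot_strengths]


def apply_commands(power_db, commands, step_db=1.0):
    """Transmitter side: raise or lower the power by one step per command."""
    for cmd in commands:
        power_db += step_db if cmd == "u" else -step_db
    return power_db


commands = power_control_commands([0.8, 1.2, 0.9], reference=1.0)
final_power = apply_commands(10.0, commands)
```

One comparison and one add per command is consistent with the slide's note that the computation is negligible next to the other algorithms.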

46 H/W Operation States
- Radio resource control states are defined in the W-CDMA specification.
- Operation states are defined according to H/W activity.
Idle
- For long idle periods between sessions
- Periodic wake-up for control message reception
- Minimum workload, but dominates terminal standby time
Control Hold
- For short idle periods between packet bursts
- Holds a narrow control channel for fast transition to Active
- Intermediate workload
Active
- For packet burst transmission periods
- Uses high speed packet channels up to 2 Mbps
- Most heavily loaded state

47 Workload Characterization

48 Workload Profile
- One operation is equivalent to one RISC instruction
- Searcher, turbo decoder, and LPF are the dominant workloads
- Workload profile varies according to operation state

49 Processing Time Requirement
Mixture of algorithms with various processing time requirements, classified into two categories:
- Heavy workload with long processing time (turbo decoder, searcher)
- Light workload with short processing time (scrambler, spreader, LPF, power control)

50 Parallelism
- Most heavy-workload algorithms have significant vector parallelism
- The data width of most operations is 8 bits

51 Memory Access Pattern
- A huge memory is not required
- Traffic between algorithms is not dominant
- The access rate of the scratch pad memory is very high

52 Instruction Breakdown
- ADD/SUB are the dominant instructions
- Multiplication is not dominant in heavy workloads

53 Frequent Computations
- Most multiplications are simplified into cheaper operations
- Multiplication in LPF-Rx cannot be simplified because both operands are 16-bit integers

54 Architectural Implications

55 Architectural Implications
SIMD because
- We can exploit vector parallelism in W-CDMA algorithms
- High power efficiency can be achieved by sharing control logic between datapath elements
Chip multiprocessor because
- There is substantial algorithm-level parallelism
- There are many tiny sequential algorithms
- Multiple SIMD + scalar
[Figure: several SIMD and scalar units connected by an interconnection network]

56 Architectural Implications
Memory structure
- Cache free: the memory access pattern exhibits very dense spatial locality
- Small data memory (<64K)
- Small instruction memory (<4K)
Simple interconnection network
- Inter-processor communication can be kept low by algorithm-level task mapping on each PE

57 Architectural Implications
Power management
- Workload varies widely with operation state and radio channel conditions
- Various power management schemes can be applied: DVS, DFS, clock gating
- Idle mode power must be minimized because it dominates terminal standby time

58 W-CDMA Benchmark Suite
- C-based implementation of W-CDMA physical layer operations
- Used for the workload characterization done in this paper
- Available at www.eecs.umich.edu/~sdrg

59 Conclusion
We discussed:
- what SDR is and why it is a challenging topic for embedded systems
- the evolution of wireless protocols and the core technologies of emerging protocols
We analyzed:
- the workload characteristics of the W-CDMA protocol and their architectural implications

60 Backup Slides

61 Viterbi Algorithm – Trellis Diagram
- The Viterbi algorithm is based on the trellis diagram.
- The trellis diagram represents all possible state transitions of the encoder.

62 Viterbi Algorithm – BMC
- The BMC (branch metric calculation) operation computes the difference between the received sequence r and the outputs of the trellis diagram.
- $BMC_{i,j} = \mathrm{distance}(r_{ij}, o_{ij}) = \mathrm{abs}(r_{ij} - o_{ij})$
  - $o_{ij}$: output of the state transition from i to j
  - $r_{ij}$: corresponding received sequence
- All BMC operations in a trellis diagram can be done in parallel.
- Example: distance between r (01) and $C_n$ (10) = 1 + 1 = 2
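The worked example (r = 01 against an output of 10) is a bit-wise absolute difference summed over the branch. A minimal sketch with an illustrative function name:

```python
def branch_metric(received, expected):
    """Sum of abs(r - o) over the bits of one trellis branch (Hamming distance for bits)."""
    return sum(abs(r - o) for r, o in zip(received, expected))


# r = 01 against a branch whose output is 10: both bits differ.
example = branch_metric([0, 1], [1, 0])
```

Each branch metric depends only on the received bits and that branch's fixed output, so all branches of a trellis stage can be evaluated independently.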

63 Viterbi Algorithm – ACS
- The ACS (add compare select) operation is: [formula not captured in the transcript]
- This procedure is equivalent to finding a locally optimal code sequence.
- If $C_1$ has the smallest ACS value at node state i, then the ACS values of $C_2$ and $C_3$ are always greater than that of $C_1$.
[Figure: add, compare, and select steps at a trellis node]

64 Viterbi Algorithm – TB
- Traces back the code sequence that is closest to the received sequence
- Sequential algorithm

65 Block Interleaver/Deinterleaver
Interleaver
- Write row by row sequentially
- Read column by column according to the predefined permutation pattern
Deinterleaver
- Write column by column according to the predefined permutation pattern
- Read row by row sequentially
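The write/read rules above can be sketched with an explicit column permutation. The pattern `[2, 0, 3, 1]` is a made-up example, not the permutation defined for W-CDMA, and the function names are illustrative.

```python
def interleave_permuted(data, rows, cols, perm):
    """Write row by row, then read the columns in the order given by `perm`."""
    if len(data) != rows * cols or sorted(perm) != list(range(cols)):
        raise ValueError("block size or permutation does not match")
    return [data[r * cols + c] for c in perm for r in range(rows)]


def deinterleave_permuted(data, rows, cols, perm):
    """Write column by column in `perm` order, then read row by row."""
    if len(data) != rows * cols or sorted(perm) != list(range(cols)):
        raise ValueError("block size or permutation does not match")
    block = [None] * (rows * cols)
    i = 0
    for c in perm:
        for r in range(rows):
            block[r * cols + c] = data[i]
            i += 1
    return block


pattern = [2, 0, 3, 1]  # hypothetical column permutation
mixed = interleave_permuted(list(range(12)), 3, 4, pattern)
original = deinterleave_permuted(mixed, 3, 4, pattern)
```

Both directions are pure index arithmetic with no data-dependent branching, which fits the small operation count reported for the interleaver.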

