A System Solution for High-Performance, Low Power SDR

1 A System Solution for High-Performance, Low Power SDR
Yuan Lin (1), Hyunseok Lee (1), Yoav Harel (1), Mark Woh (1), Scott Mahlke (1), Trevor Mudge (1), and Krisztian Flautner (2)
(1) Advanced Computer Architecture Laboratory, University of Michigan
(2) ARM, Ltd.

2 SDR Design Challenges
- Hardware design challenges
  - High computational throughput (~40 Gops)
  - Low power consumption (~200 mW)
  - Meet real-time requirements
- DSP programming support
  - System-level development: inter-algorithm communication
  - Algorithm-level development: efficient DSP representations
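
The two hardware targets above imply an energy budget per operation. The following is a minimal sketch of that arithmetic, using the approximate slide figures; the function and constant names are illustrative and do not come from the presentation.

```python
# Approximate targets quoted on the slide.
THROUGHPUT_OPS_PER_S = 40e9   # ~40 Gops
POWER_BUDGET_W = 0.200        # ~200 mW


def energy_per_op_pj(power_w, ops_per_s):
    """Average energy available per operation, in picojoules."""
    return power_w / ops_per_s * 1e12


budget = energy_per_op_pj(POWER_BUDGET_W, THROUGHPUT_OPS_PER_S)
print(f"{budget:.1f} pJ per operation")
```

With these inputs the budget works out to roughly 5 pJ per operation, averaged over every operation the system performs.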

3 SDR Benchmark Design & Analysis

4 W-CDMA Protocol: 2 Mbps

5 W-CDMA Characteristics
- Plenty of vector parallelism
- 8- and 16-bit DSP algorithms
- Multiplication is not dominant
- No floating-point operations
- Small instruction/data memory
- Has periodic real-time tasks

6 802.11a Protocol: 24 Mbps

7 802.11a Characteristics
- Similar to W-CDMA
  - Plenty of vector parallelism
  - No floating-point operations
  - Small instruction/data memory
- Different from W-CDMA
  - Mostly 16-bit DSP algorithms
  - Multiplication is more dominant
  - No periodic real-time tasks

8 SDR Processor Architecture Design

9 System Architecture Design Tradeoffs
- Amortized fetch
- Intra-processor communication
- Inter-processor communication

10 System Architecture Design Tradeoffs
Number of processing elements x SIMD width
For W-CDMA 2 Mbps (51.2 GOP/s), 90 nm, 1 V @ 400 MHz
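
The trade-off between the number of processing elements and the SIMD width can be explored with simple arithmetic. This sketch assumes that every SIMD lane retires one operation per cycle at full utilization, and it ignores scalar work and communication overhead. That is an idealization, not the authors' model, and the function name is hypothetical.

```python
import math

REQUIRED_OPS_PER_S = 51.2e9   # W-CDMA 2 Mbps workload quoted on the slide
CLOCK_HZ = 400e6              # 400 MHz operating point quoted on the slide


def lanes_per_pe(num_pes, ops_per_s=REQUIRED_OPS_PER_S, clock_hz=CLOCK_HZ):
    """Minimum SIMD lanes per PE, assuming one operation per lane per cycle."""
    ops_per_cycle = ops_per_s / clock_hz
    return math.ceil(ops_per_cycle / num_pes)


for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} PEs -> {lanes_per_pe(n):3d} lanes per PE")
```

Under this idealization the workload needs 128 operations per cycle in total, so four PEs would each need 32 lanes. That is consistent with the 4-PE, 32-wide SIMD configuration described on other slides.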

11 Our SDR System Architecture Design
- Scalable system design
  - Standardized SoC interface
  - System interface supports multiple, potentially heterogeneous, PEs and memories
- For W-CDMA & 802.11a
  - 4 homogeneous processing elements (PEs)
    - Dual pipelines: scalar pipeline & SIMD pipeline
    - Local scratchpad memory (no data cache)
  - Global scratchpad memory (64 KB)
  - Controller: ARM general-purpose processor

12 System Architecture Design

13 PE Design (Area < 1 mm², Power < 50 mW)

14 Mapping DSP Algorithms: Filters
[Figure: FIR filter diagrams: a 4-tap filter with taps b0 to b3 and z^-1 delays between In and Out, and its vector form with b and z^-1]
SIMD instruction sequence:
    spread V_in, S_in
    shift z, z, up
    mac z, V_in, S_in

15 Mapping DSP Algorithm: Filter
    spread V_in, S_in
    shift z, z, up
    mac z, V_in, S_in
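
The spread / shift / mac sequence can be emulated lane by lane in plain Python. This is a sketch of one plausible reading, a transposed-form FIR filter in which each lane holds a partial sum. The operand roles are assumptions, because the transcript garbles the slide's operand lists: here the coefficients are the multiplier in `mac`, and "up" moves data toward lane 0. All function names are illustrative.

```python
def spread(sample, width):
    """Broadcast one scalar input sample to every SIMD lane."""
    return [sample] * width


def shift_up(z):
    """Move each partial sum one lane toward lane 0; zero-fill the last lane."""
    return z[1:] + [0]


def mac(z, v_in, coeffs):
    """Lane-wise multiply-accumulate: z[i] + v_in[i] * coeffs[i]."""
    return [acc + x * b for acc, x, b in zip(z, v_in, coeffs)]


def fir_simd(samples, coeffs):
    """Transposed-form FIR: one spread, shift and mac per input sample."""
    width = len(coeffs)
    z = [0] * width
    out = []
    for s in samples:
        v_in = spread(s, width)
        z = shift_up(z)
        z = mac(z, v_in, coeffs)
        out.append(z[0])          # lane 0 holds the finished output sample
    return out


def fir_reference(samples, coeffs):
    """Direct-form FIR, used only to check the lane-parallel version."""
    return [
        sum(b * samples[n - k] for k, b in enumerate(coeffs) if n - k >= 0)
        for n in range(len(samples))
    ]


print(fir_simd([1, 0, 0, 0, 0], [1, 2, 3, 4]))
```

Feeding a unit impulse through the filter returns the coefficients in order, which is a quick check that the shift direction and the accumulation agree.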

16 Efficient Design
- Wide SIMD width
- Small register file with minimum ports
- Small memories
- Narrow system bus
- Data path optimized for 8 bits
- Vector shuffle reduces memory ports

17 Processing Element (PE) Design
- Scalar pipelines
  - 16-bit data path
- SIMD pipeline
  - 8-bit data path
  - 32x8 SIMD ALU
- Software-controlled local scratchpad memory
  - 4 KB scalar memory
  - 4 KB SIMD memory
- Inter-PE communication through DMA

20 802.11a PE Mapping

21 Power Results
Configuration:
- 4 PEs, 1 ARM (Cortex-M3) controller
- Global scratchpad memory (64 KB)
- 90 nm (1 V @ 400 MHz), synthesized conservatively

22 Area Results

23 SDR Programming Language Support

24 Software Development Flow

25 SPEX (Signal Processing EXtension)
- Implemented as a library extension to C
- System-level development
  - Supports concurrent DSP kernel function definitions
  - Channel variables for inter-kernel communication
- Algorithm-level development
  - Native vector & matrix variables
  - Explicit DSP variable attribute definitions
  - Native vector & matrix operations

26 SPEX Overview

27 SPEX Example Code: Viterbi ACS
Concurrent DSP kernel definitions
    void* acs(void*) {
        /* variable declaration */
        saturated char metrics1, metrics2;
        saturated char states;
        saturated char t1, t2;
        while (!viterbi.stop()) {
            /* receiving data from BMC */
            metrics1 = bmc_to_acs.receive();
            metrics2 = bmc_to_acs.receive();
            /* add */
            metrics1 += states;
            metrics2 += states;
            /* compare and select */
            t1 = (metrics1(0,2,62),metrics2(0,2,62));
            t2 = (metrics1(1,2,63),metrics2(1,2,63));
            states(t1<t2) = t1;
            states(t1>=t2) = t2;
            /* sending data to TB */
            acs_to_tb.send(states);
        }
    }
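
The idea of concurrently running kernels connected by channels can be approximated with Python threads and queues. This is only an analogy and not how SPEX is implemented. The `STOP` marker, the `run_pipeline` helper and the simplified per-lane minimum inside `acs_kernel` are hypothetical stand-ins. Only the kernel and channel names (`acs`, `bmc_to_acs`, `acs_to_tb`) are taken from the slide.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker; the slide uses viterbi.stop()


def bmc_kernel(blocks, bmc_to_acs):
    """Stand-in for the BMC stage: emits two metric vectors per block."""
    for m1, m2 in blocks:
        bmc_to_acs.put(m1)
        bmc_to_acs.put(m2)
    bmc_to_acs.put(STOP)


def acs_kernel(bmc_to_acs, acs_to_tb):
    """Stand-in for ACS: keeps the smaller of the two metrics in each lane."""
    while True:
        m1 = bmc_to_acs.get()
        if m1 is STOP:
            acs_to_tb.put(STOP)
            return
        m2 = bmc_to_acs.get()
        acs_to_tb.put([min(a, b) for a, b in zip(m1, m2)])


def tb_kernel(acs_to_tb, results):
    """Stand-in for the TB stage: collects whatever ACS sends."""
    while True:
        item = acs_to_tb.get()
        if item is STOP:
            return
        results.append(item)


def run_pipeline(blocks):
    """Run the three kernels concurrently and return what TB received."""
    bmc_to_acs, acs_to_tb, results = queue.Queue(), queue.Queue(), []
    kernels = [
        threading.Thread(target=bmc_kernel, args=(blocks, bmc_to_acs)),
        threading.Thread(target=acs_kernel, args=(bmc_to_acs, acs_to_tb)),
        threading.Thread(target=tb_kernel, args=(acs_to_tb, results)),
    ]
    for k in kernels:
        k.start()
    for k in kernels:
        k.join()
    return results


print(run_pipeline([([1, 5], [4, 2]), ([7, 0], [3, 3])]))
```

Each kernel is written as an ordinary loop over its input channel, which mirrors the `while (!viterbi.stop())` structure of the slide's kernel.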

28 SPEX Example Code: Viterbi ACS
Native SIMD variable definition with explicit attributes. SPEX variables support:
1. saturated/overflow
2. various variable bit-widths
3. vectors & matrices
    /* variable declaration */
    saturated char metrics1, metrics2;
    saturated char states;
    saturated char t1, t2;
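
The difference between a saturated variable and one that overflows can be shown with 8-bit additions. This sketch assumes that `saturated char` denotes a signed 8-bit value clamped to [-128, 127]. The slide does not state the signedness, and the helper names are illustrative.

```python
INT8_MIN, INT8_MAX = -128, 127


def saturating_add(a, b, lo=INT8_MIN, hi=INT8_MAX):
    """Add, then clamp the result to the representable range."""
    return max(lo, min(hi, a + b))


def wrapping_add(a, b, bits=8):
    """Two's-complement add that wraps around on overflow."""
    mask = (1 << bits) - 1
    total = (a + b) & mask
    return total - (1 << bits) if total >= 1 << (bits - 1) else total


def saturating_vadd(xs, ys):
    """Lane-wise saturating add over two equal-length vectors."""
    return [saturating_add(x, y) for x, y in zip(xs, ys)]


print(saturating_add(100, 100), wrapping_add(100, 100))
print(saturating_vadd([100, -100, 5], [100, -100, 6]))
```

Saturation keeps an overflowing path metric pinned at the extreme value instead of letting it flip sign, which is why it matters for the accumulations in the ACS kernel.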

29 SPEX Example Code: Viterbi ACS
Inter-kernel communication through channel operations. Channel types:
1. FIFO queue
2. Broadcast queue
3. Sync/control channel
4. Random-read FIFO queue
    /* receiving data from BMC */
    metrics1 = bmc_to_acs.receive();
    metrics2 = bmc_to_acs.receive();
    /* sending data to TB */
    acs_to_tb.send(states);
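
Two of the channel types listed above can be contrasted with a small single-threaded model. The semantics shown are assumptions based on the usual meaning of the terms: a FIFO delivers each item once, while a broadcast delivers each item to every receiver. The class names and the `subscribe` method are hypothetical. Only `send` and `receive` appear on the slide.

```python
from collections import deque


class FifoChannel:
    """Each item is delivered exactly once, in the order it was sent."""

    def __init__(self):
        self._items = deque()

    def send(self, item):
        self._items.append(item)

    def receive(self):
        # Non-blocking for simplicity: raises IndexError when empty.
        return self._items.popleft()


class BroadcastChannel:
    """Every subscribed receiver gets its own copy of the stream."""

    def __init__(self):
        self._ports = []

    def subscribe(self):
        port = FifoChannel()
        self._ports.append(port)
        return port

    def send(self, item):
        for port in self._ports:
            port.send(item)


fifo = FifoChannel()
fifo.send("a")
fifo.send("b")
first, second = fifo.receive(), fifo.receive()

bcast = BroadcastChannel()
rx1, rx2 = bcast.subscribe(), bcast.subscribe()
bcast.send("sync")
seen = (rx1.receive(), rx2.receive())
print(first, second, seen)
```

A real implementation would block on an empty channel and bound the queue depth; both are left out here to keep the contrast between the two delivery models visible.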

30 SPEX Example Code: Viterbi ACS
SPEX vector operations. Supported operations (Matlab-like C code):
1. SIMD arithmetic operations
2. SIMD permutation
3. SIMD predication
    /* add */
    metrics1 += states;
    metrics2 += states;
    /* compare and select */
    t1 = (metrics1(0,2,62),metrics2(0,2,62));
    t2 = (metrics1(1,2,63),metrics2(1,2,63));
    states(t1<t2) = t1;
    states(t1>=t2) = t2;
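
The three kinds of vector operation in the ACS kernel can be emulated on Python lists. This sketch assumes 64-element vectors and reads `v(start, step, stop)` as a Matlab-like strided selection with an inclusive stop. Both points are inferred from the indices on the slide and are not stated by it. Saturation is left out to keep the example short, and `strided` and `acs_step` are illustrative names.

```python
def strided(v, start, step, stop):
    """Emulate v(start, step, stop), read here with an inclusive stop index."""
    return v[start:stop + 1:step]


def acs_step(metrics1, metrics2, states):
    """One add-compare-select step over 64-lane vectors (no saturation)."""
    # SIMD arithmetic: add the current state metrics to both inputs.
    metrics1 = [m + s for m, s in zip(metrics1, states)]
    metrics2 = [m + s for m, s in zip(metrics2, states)]
    # SIMD permutation: gather even lanes into t1 and odd lanes into t2.
    t1 = strided(metrics1, 0, 2, 62) + strided(metrics2, 0, 2, 62)
    t2 = strided(metrics1, 1, 2, 63) + strided(metrics2, 1, 2, 63)
    # SIMD predication: states(t1<t2) = t1; states(t1>=t2) = t2.
    return [a if a < b else b for a, b in zip(t1, t2)]


new_states = acs_step(list(range(64)), list(range(64, 128)), [0] * 64)
print(len(new_states), new_states[:4])
```

The two predicated assignments together select the smaller of `t1` and `t2` in every lane, so the result still has 64 elements.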

31 Summary
- Hardware & software solutions for SDR
- Hardware
  - 4 dual-issue asymmetric SIMD processing elements
  - Consumes 200~300 mW at 90 nm
  - Meets the performance requirements for W-CDMA & 802.11a
- Software
  - SPEX provides efficient DSP algorithm and system implementation

32 Questions?

