A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian.

A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian Flautner 2 1 Advanced Computer Architecture Laboratory University of Michigan 2 ARM, Ltd.

Advanced Computer Architecture Laboratory University of Michigan 2 SDR Design Challenges: Hardware design challenges High computational throughput (~40 Gops) Low power consumption (~200mW) Meet real-time requirements DSP programming support System-level development Inter-algorithm communication Algorithm-level development Efficient DSP representations

SDR Benchmark Design & Analysis

Advanced Computer Architecture Laboratory University of Michigan 4 W-CDMA Protocol: 2Mbps

Advanced Computer Architecture Laboratory University of Michigan 5 W-CDMA Characteristics Plenty of vector parallelism 8 & 16-bit DSP algorithms Multiplication is not dominant No floating-point operation Small instruction/data memory Has periodic real-time tasks

Advanced Computer Architecture Laboratory University of Michigan 6 802.11a Protocol: 24Mbps

Advanced Computer Architecture Laboratory University of Michigan 7 802.11a Characteristics Similar to W-CDMA Plenty of vector parallelism No floating-point operation Small instruction/data memory Different from W-CDMA Mostly 16-bit DSP algorithms Multiplication is more dominant No periodic real-time tasks

SDR Processor Architecture Design

Advanced Computer Architecture Laboratory University of Michigan 9 System Architecture Design Tradeoffs Amortized fetch Intra-processor Communication Inter-processor Communication

Advanced Computer Architecture Laboratory University of Michigan 10 System Architecture Design Tradeoffs Number of Processing Elements x SIMD width For W-CDMA 2Mbps (51.2GOP/sec) 90nm 1V @400MHz

Advanced Computer Architecture Laboratory University of Michigan 11 Our SDR System Architecture Design Scalable system design Standardized SoC interface System interface supports multiple (potentially) heterogeneous PEs and memories For WCDMA & 802.11a 4 homogeneous processing elements (PEs) Dual pipelines: scalar pipeline & SIMD pipeline Local scratchpad memory (no data cache) Global scratchpad memory (64KB) Controller -- ARM general purpose processor

Advanced Computer Architecture Laboratory University of Michigan 12 System Architecture Design

Advanced Computer Architecture Laboratory University of Michigan 13 PE Design (Area < 1mm 2 Power<50mW)

Advanced Computer Architecture Laboratory University of Michigan 14 Mapping DSP Algorithms: Filters Z -1 b0b1b2b3 In Out In b z -1 spread V in, S in shift z, z, up mac z, V in, S in

Advanced Computer Architecture Laboratory University of Michigan 15 Mapping DSP Algorithm: Filter spread V in, S in shift z, z, up mac z, V in, S in

Advanced Computer Architecture Laboratory University of Michigan 16 Efficient Design Wide SIMD width Small register file with minimum ports Small memories Narrow system BUS Data-path optimized for 8bits Vector shuffle reduce memory ports

Advanced Computer Architecture Laboratory University of Michigan 17 Processing Element (PE) Design Scalar pipelines 16bit data path SIMD pipeline 8 bit data path 32x8 SIMD ALU Software controlled local scratchpad memory 4KB scalar memory 4KB SIMD memory Inter-PE communication through DMA

Advanced Computer Architecture Laboratory University of Michigan 18

Advanced Computer Architecture Laboratory University of Michigan 19

Advanced Computer Architecture Laboratory University of Michigan 20 802.11a PE Mapping

Advanced Computer Architecture Laboratory University of Michigan 21 Power Results Configuration 4 PEs, 1 ARM (Cortex M3) controller Global scratchpad memory (64Kb) 90nm (1V @ 400 MHZ), Synthesized conservatively

Advanced Computer Architecture Laboratory University of Michigan 22 Area Results

SDR Programming Language Support

Advanced Computer Architecture Laboratory University of Michigan 24 Software Development Flow

Advanced Computer Architecture Laboratory University of Michigan 25 SPEX (Signal Processing EXtension) Implemented as a library extension to C System-level development Support concurrent DSP kernel function definitions Channel variables for inter-kernel communications Algorithm-level development Native vector & matrix variables Explicit DSP variable attribute definition Native vector & matrix operations

Advanced Computer Architecture Laboratory University of Michigan 26 SPEX Overview

Advanced Computer Architecture Laboratory University of Michigan 27 SPEX Example Code: Viterbi ACS Concurrent DSP kernel definitions void* acs(void*) { /* variable declaration */ saturated char metrics1, metrics2; saturated char states; saturated char t1, t2; while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive(); /* add */ metrics1 += states; metrics2 += states; /* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2; /* sending data to TB */ acs_to_tb.send(states); }

Advanced Computer Architecture Laboratory University of Michigan 28 SPEX Example Code: Viterbi ACS Native SIMD variable definition with explicit attributes SPEX variable supports 1. saturated/overflow 2. various variable bit-width 3. vector & matrices void* acs(void*) { /* variable declaration */ saturated char metrics1, metrics2; saturated char states; saturated char t1, t2; while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive(); /* add */ metrics1 += states; metrics2 += states; /* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2; /* sending data to TB */ acs_to_tb.send(states); }

Advanced Computer Architecture Laboratory University of Michigan 29 SPEX Example Code: Viterbi ACS Inter-kernel communication through channel operations Channel types: 1. FIFO queue 2. Broadcast queue 3. Sync/control channel 4. Random-read FIFO queue void* acs(void*) { /* variable declaration */ saturated char metrics1, metrics2; saturated char states; saturated char t1, t2; while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive(); /* add */ metrics1 += states; metrics2 += states; /* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2; /* sending data to TB */ acs_to_tb.send(states); }

Advanced Computer Architecture Laboratory University of Michigan 30 SPEX Example Code: Viterbi ACS SPEX vector operations Supports (Matlab-like C code) 1.SIMD arithmetic operations 2. SIMD permutation 3. SIMD predication void* acs(void*) { /* variable declaration */ saturated char metrics1, metrics2; saturated char states; saturated char t1, t2; while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive(); /* add */ metrics1 += states; metrics2 += states; /* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2; /* sending data to TB */ acs_to_tb.send(states); }

Advanced Computer Architecture Laboratory University of Michigan 31 Summary - Hardware & software solutions for SDR - Hardware - 4 dual-issue asymmetric SIMD processing elements - Consumes 200~300mW for 90nm - Meets the performance requirements for WCDMA & 802.11a - Software - SPEX provides efficient DSP algorithm and system implementation

Advanced Computer Architecture Laboratory University of Michigan 32 Questions?

A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian.

Similar presentations

Presentation on theme: "A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian.

Similar presentations

Presentation on theme: "A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian."— Presentation transcript:

Similar presentations

About project

Feedback