
1 University of Michigan Electrical Engineering and Computer Science
From SODA to Scotch: The Evolution of a Wireless Baseband Processor
Mark Woh (University of Michigan - Ann Arbor)
Yuan Lin (University of Michigan - Ann Arbor)
Sangwon Seo (University of Michigan - Ann Arbor)
Scott Mahlke (University of Michigan - Ann Arbor)
Trevor Mudge (University of Michigan - Ann Arbor)
Chaitali Chakrabarti (Arizona State University)
Richard Bruce (ARM Ltd.)
Danny Kershaw (ARM Ltd.)
Alastair Reid (ARM Ltd.)
Mladen Wilder (ARM Ltd.)
Krisztian Flautner (ARM Ltd.)

2 From SODA to Scotch: What is this talk about?
Is a fully programmable 3G baseband processor commercially viable?
► The SODA processor was the first full research design [ISCA06]
► ARM R&D developed the Ardbeg SDR commercial prototype
What we will present
► A comparison study between SODA and Ardbeg
► Lessons learned in the evolution

3 Mobile Computing
In 2007, world-wide mobile telephone subscriptions reached 3.3 billion [1]
► About half of the world’s population
► Some countries have mobile penetration over 100%
► Largest consumer electronic device in terms of volume
Wireless multimedia anywhere, at any time
Cell phones are getting more complex; PCs are getting more mobile
[1] “Global cellphone penetration reaches 50 pct”, Reuters, Nov. 29th, 2007

4 Wireless Communication
[Figure: wireless standards across network scales (Personal Area Network, Local Area Network, Wide Area Network, Global Network). Standards shown: Bluetooth, UWB, 802.11g, 802.11n, GSM, W-CDMA, DVB, GPS]

5 Software Defined Radio
[Figure: handset block diagram. Blocks shown: GPS, Bluetooth, WCDMA, Analog Frontend, Baseband Processor, Application Processors, Camera, Keypad, Display, Speaker, Microphone]

6 Software Defined Radio
[Figure: the same handset block diagram, annotated with protocol layers (MAC, Link, Network, Transport, PHY) and implementation labels (GPP, DSP + ASICs)]

7 Software Defined Radio
[Figure: the same handset block diagram, with the baseband block labeled SDR Baseband Processor]

8 Advantages of Soft Radio
Design factor
► Protocol complexity
► Multi-mode operation
► Prototyping and bug fixes
Cost factor
► Time-to-market
► Silicon area
► Higher volume
► Longevity of platform
[Figure: a single SDR covering Bluetooth, UWB, 802.11g, 802.11n, GSM, W-CDMA, DVB, GPS]

9 Mobile SDR Design Challenges
SDR Design Objectives for 3G and WiFi
► Throughput requirements: 40+ Gops peak throughput
► Power budget: 100mW~500mW peak power
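The two objectives above imply an efficiency target. As a rough worked example, dividing the throughput figure by each end of the power budget gives the range a mobile SDR processor would have to reach. The constant and function names are illustrative and do not come from the talk, and 40 Gops is treated as a lower bound because the slide says "40+".

```python
# Efficiency implied by the slide's objectives: 40+ Gops peak within 100-500 mW.
# Names are illustrative; the two input figures are the only data from the talk.
PEAK_THROUGHPUT_GOPS = 40.0          # lower bound ("40+ Gops")
POWER_BUDGET_MW = (100.0, 500.0)     # both ends of the stated peak power budget


def gops_per_watt(throughput_gops: float, power_mw: float) -> float:
    """Convert a throughput in Gops and a power in mW to Gops per watt."""
    return throughput_gops * 1000.0 / power_mw


efficiency_range = [gops_per_watt(PEAK_THROUGHPUT_GOPS, p) for p in POWER_BUDGET_MW]
print(efficiency_range)  # Gops/W at the low and high ends of the power budget
```

Under these figures the target works out to at least 80 Gops/W at the 500 mW end and 400 Gops/W at the 100 mW end.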

10 First Generation SDR Processor: SODA
Our first attempt was the SODA processor
► Designed in 180nm technology
► Built with WCDMA and 802.11a in mind
► Sub-500mW operation estimated at 90nm

11 SODA
System
► Heterogeneous multi-core architecture
► Multi-level scratchpad memories
PE
► SIMD/Scalar/AGU LIW
► 32-lane 16-bit SIMD
► 16-bit scalar datapath
► Scalar-to-SIMD, SIMD-to-scalar
► Iterative Perfect Shuffle Network
[Figure: SODA PE block diagram. Numbered regions: 1. wide SIMD, 2. Scalar, 3. Local memory, 4. AGU, 5. DMA. Blocks shown: 512-bit SIMD Reg. File, 512-bit SIMD ALU + Mult, SIMD Shuffle Network (SSN), Pred. Regs, SIMD to Scalar (VtoS), Scalar ALU, Scalar RF, AGU ALU, AGU RF, L1 SIMD Data Memory, L1 Scalar Data Memory, L1 Program Memory, Controller, DMA to System Bus]
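The "Iterative Perfect Shuffle Network" listed for the SODA PE builds larger permutations by applying a fixed shuffle pattern repeatedly. The sketch below models only the textbook perfect shuffle permutation on a 32-lane vector; the slides do not give the actual wiring or control of SODA's SSN, so the function names and the choice to interleave lower half with upper half are assumptions.

```python
# Textbook perfect shuffle on a vector of lanes (illustrative model only;
# the slides do not specify how SODA's SSN is wired or controlled).

def perfect_shuffle(lanes):
    """Interleave the lower and upper halves: [a0, b0, a1, b1, ...]."""
    half = len(lanes) // 2
    out = []
    for lo, hi in zip(lanes[:half], lanes[half:]):
        out.extend((lo, hi))
    return out


def iterate_shuffle(lanes, passes):
    """Apply the perfect shuffle `passes` times, as an iterative network would."""
    for _ in range(passes):
        lanes = perfect_shuffle(lanes)
    return lanes


lanes = list(range(32))               # 32 lanes, matching the SODA SIMD width
once = perfect_shuffle(lanes)         # one pass through the network
restored = iterate_shuffle(lanes, 5)  # log2(32) = 5 passes
print(once[:4], restored == lanes)
```

Each pass rotates the bits of a lane index left by one, which is why five passes over 32 lanes restore the original order. It also suggests why an iterative network can need several passes for a permutation that a richer network performs in one.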

12 SODA Summary
[Figure: SODA compared against mobile SDR requirements. Design points shown: SODA 180nm, SODA 90nm, TI C6x 90nm, Picochip 130nm, Sandbridge 90nm, NXP EVP 90nm, ASICs. Axes not recoverable]

13 Ardbeg SDR Processor
[Figure: Ardbeg system diagram. Blocks shown: Control Processor, FEC Accelerator, two PEs (each with L1 Mem and Execution Unit), DMAC, Peripherals, L2 memory, 512-bit Bus, 64-bit AMBA 3 AXI Interconnect]
[Figure: Ardbeg PE block diagram. Numbered regions: 1. wide SIMD, 2. Scalar & AGU, 3. Memory. Blocks shown: 512-bit SIMD Reg. File, 512-bit SIMD ALU with shuffle, 512-bit SIMD Mult, 1024-bit SIMD ACC RF, SIMD Shuffle Network, SIMD Pred. RF, SIMD Pred. ALU, SIMD + Scalar Transf Unit, Scalar ALU + Mult, Scalar RF + ACC, AGU, AGU RF, L1 Data Memory, L1 Program Memory, L2 Memory, Controller, Interconnects]
Callouts
► Application Specific Hardware
► Block Floating Point
► Combined Scalar/Vector Memory
► 8, 16, 32 bit fixed point support
► 128-lane 8-bit Banyan Network
► 3 Read/2 Write RF for VLIW
► Sparse Connected VLIW
► Multiple Data Address Accesses
► Fused Permute ALU operations

14 Evolution to Ardbeg: Lessons Learned
Ardbeg achieved ~3x speedup overall at 30% lower power than SODA
These improvements came from lessons learned in a series of design studies
We will present a few of these studies
► 1) Benefit of Wide SIMD
► 2) VLIW on SIMD support
► 3) Support for Complex Shuffle Network
► 4) Application Specific Hardware

15 1) Benefiting from Wide SIMD
Increasing SIMD width is still a good idea for SDR
But area becomes a big concern
► 32-wide 16-bit SIMD at 90nm seems a good fit
[Figure: normalized energy-delay product and normalized area versus SIMD width (8, 16, 32, 64)]

16 2) VLIW Support for Wide SIMD
VLIW execution on top of the SIMD datapath
3 read ports, 2 write ports
► Shared between SIMD units
► 2-issue SIMD LIW
► Only supports the most frequently used SIMD op pairs
[Figure: PE datapath. Blocks shown: SIMD RF, 32-lane SIMD ALU, 128-lane SSN, SIMD scalar trans. unit, scalar RF, 16-bit ALU, AGU, Data MEM, Interconnects, with EX and WB pipeline stages]

17 2) VLIW on SIMD Support
A distinct set of instructions frequently executes at the same time
We want to take advantage of this to reduce the complexity of the VLIW

18 2) VLIW on SIMD Support
[Figure: energy-delay product for FIR, CFIR, FFT Radix-2, FFT Radix-4, Viterbi K7, Viterbi K9, and Average, across four register file configurations: 2 Read/2 Write (Single Issue), 3 Read/2 Write (Ardbeg), 4 Read/4 Write (Any two SIMD ops), 6 Read/5 Write (Any three SIMD ops)]
In most cases, 3 Read/2 Write provides the best overall design point

19 3) Support for Shuffle Network
7-stage single-cycle SSN
► Banyan network
► 128-lane 8-bit (64-lane 16-bit)
[Figure: 2 stage 16-lane Banyan network]
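The stage count on this slide is consistent with a network of 2x2 switches, since log2(128) = 7. The sketch below models a butterfly network, which is one member of the Banyan family; the slides do not give the real topology or control encoding of Ardbeg's SSN, so the butterfly wiring, the 2x2 switch assumption, and all names here are illustrative.

```python
# Illustrative butterfly network (one Banyan-class topology) built from 2x2
# switches. Assumption: this is NOT the documented Ardbeg SSN wiring.

def num_stages(lanes: int) -> int:
    """Stages needed by a 2x2-switch network on a power-of-two lane count."""
    return lanes.bit_length() - 1


def butterfly_route(data, controls):
    """Route data through all stages.

    controls[s][j] is True when switch j of stage s crosses its two inputs.
    Stage s pairs lane i with lane i + 2**s.
    """
    n = len(data)
    out = list(data)
    for s in range(num_stages(n)):
        stride = 1 << s
        switch = 0
        for i in range(n):
            if (i & stride) == 0:          # i is the lower lane of a pair
                if controls[s][switch]:
                    out[i], out[i | stride] = out[i | stride], out[i]
                switch += 1
    return out


n = 16
all_cross = [[True] * (n // 2) for _ in range(num_stages(n))]
all_straight = [[False] * (n // 2) for _ in range(num_stages(n))]
reversed_lanes = butterfly_route(list(range(n)), all_cross)
unchanged_lanes = butterfly_route(list(range(n)), all_straight)
print(num_stages(128), reversed_lanes)
```

With every switch crossed, each lane index has all of its bits flipped, so the vector comes out reversed in a single pass. A single-cycle multi-stage network of this kind can realize such a permutation without iterating.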

20 3) Support for Shuffle Network
[Figure: energy-delay product for 64pt FFT Radix-2, 2048pt FFT Radix-2, 64pt FFT Radix-4, 2048pt FFT Radix-4, and Viterbi K9, across four shuffle networks: 32 Wide Perfect, 64 Wide Perfect, 64 Wide Crossbar, 64 Wide Banyan]
The 64-wide Banyan network gives energy close to that of a simple iterative interconnect, with crossbar-like performance

21 4) Application Specific Optimizations
Application specific hardware
► Turbo coprocessor
► Block-floating point support
► Fused Permute-ALU operations
► Interleaving support
Trade off programmability for performance
► Less “soft” than SODA
► But more energy efficient for common operations
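Block floating point, one of the items listed above, stores a block of fixed-point values with a single shared exponent so the whole block can be rescaled together when values grow (for example, between FFT stages). The sketch below shows the general technique only; the slides do not describe how Ardbeg implements it, and the function name, the 16-bit word size, and the truncating shift are assumptions.

```python
# Generic block floating point sketch (not Ardbeg's documented hardware):
# one shared exponent per block, mantissas kept in signed fixed-width words.

def to_block_floating_point(values, word_bits=16):
    """Return (mantissas, shared_shift) so every mantissa fits in word_bits."""
    limit = (1 << (word_bits - 1)) - 1      # largest positive mantissa, 32767 for 16 bits
    peak = max(abs(v) for v in values)
    shift = 0
    while (peak >> shift) > limit:          # grow the shared exponent until the peak fits
        shift += 1
    mantissas = [v >> shift for v in values]  # arithmetic shift, truncates low bits
    return mantissas, shift


block = [40000, -20000, 12, 3]
mantissas, shift = to_block_floating_point(block)
approx = [m << shift for m in mantissas]    # values recovered with reduced precision
print(mantissas, shift, approx)
```

The trade-off is visible in the small entries: the value 3 loses its low bit because the whole block is scaled to fit the largest element.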

22 4) Application Specific Optimizations
Some kernels are common among many different protocols
► Many protocols use the same error correction algorithms
Turbo coprocessor is one of them
► Tradeoff between programmable and ASIC
An ASIC implementation is around 5x more efficient than a programmable implementation
► SODA PE: 2Mbps with 111mW in 90nm
► ASIC: 2Mbps with 21mW in 90nm
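The "around 5x" figure follows directly from the two data points on this slide. As a small worked example, energy per decoded bit is power divided by throughput; the function name is illustrative, and the only inputs are the slide's numbers.

```python
# Energy per bit for the two turbo decoder data points on the slide.
# mW / Mbps = (1e-3 J/s) / (1e6 bit/s) = 1e-9 J/bit, i.e. nanojoules per bit.

def energy_per_bit_nj(power_mw: float, throughput_mbps: float) -> float:
    """Energy per bit in nJ for a given power (mW) and throughput (Mbps)."""
    return power_mw / throughput_mbps


soda_nj = energy_per_bit_nj(111.0, 2.0)   # SODA PE: 2 Mbps at 111 mW
asic_nj = energy_per_bit_nj(21.0, 2.0)    # ASIC: 2 Mbps at 21 mW
ratio = soda_nj / asic_nj
print(soda_nj, asic_nj, round(ratio, 2))
```

At equal throughput the ratio reduces to 111/21, or about 5.3, which matches the slide's rounded claim.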

23 Overall Improvements
[Figure: Ardbeg speedup over SODA, stacked by contribution: Baseline SODA, SIMD ALU, SIMD Shuffle, VLIW, Compiler Optimization. One bar is annotated 7x. Kernels by group:
Filtering: FIR 16-taps, FIR 33-taps, FIR 65-taps, CFIR 16-taps, CFIR 33-taps, CFIR 65-taps, Average
Modulation: FFT Rx2 64pt, FFT Rx2 2048pt, FFT Rx4 64pt, FFT Rx4 2048pt, QAM4, QAM16, QAM64, Despreader, Descrambler, Combiner, Average
Synchronization: W-CDMA Searcher, 802.11a Interpolator, DVB-T Equalizer, DVB-T Chan. Est., Average
Error Correction: Viterbi K7, Viterbi K9, Bit Intlv 3, Bit Intlv 6, Interleaver, Average]
Ardbeg achieves between ~1.5x and 7x speedup over SODA for wireless algorithms

24 Summary of Ardbeg
[Figure: power vs throughput for protocols on different processors]

25 Summary of Ardbeg
Ardbeg is lower power at the same throughput
We are getting closer to ASICs

26 Conclusion
SODA → Ardbeg
► Overall ~1.5-7x improvement across multiple wireless algorithms
► 30% less power than SODA (with turbo also in software)
A fully programmable research design evolved into a commercial design that is “less soft”
It is feasible to design programmable solutions that start to approach ASIC efficiency
► ASICs are locally optimal for single kernels, but combined they create an inefficient system
Programmability allows time multiplexing of hardware: less hardware, same amount of work

27 Questions? Thanks!

