Enabling Protocol Coexistence: High-Level Hardware-Software Co-design of Flexible Modern Wireless Transceivers
Benjamin Drozdenko, Graduate Research Assistant

1 Enabling Protocol Coexistence: High-Level Hardware-Software Co-design of Flexible Modern Wireless Transceivers
Benjamin Drozdenko, Graduate Research Assistant & Ph.D. Candidate
Advisors: Prof. Leeser (RCL) & Prof. Chowdhury (GENESYS)
Northeastern University, Boston, MA
Northeastern ECE Ph.D. Student Seminar Series (NEPSSS), March 30, 2016

2 So What-Who Cares? Wireless Transceivers: Y’all Got ‘em!
• Surge in wireless devices
  - 10B devices today, 50B by 2050
  - $14 trillion business over next 10 years
• Challenges: Times are changing
  - C1: Adapt to changing protocols to handle contention
  - C2: Maintain/increase bit rates
  - C3: Decrease energy consumption and error rates
[Figure labels: LTE, Wi-Fi]

3 Another Challenge: Spectrum Scarcity
• C4: Change center frequency to use new bandwidths
  - 54-698 MHz: 802.11af TV Whitespace Reuse
  - 3.55-3.65 GHz: Military RADAR Reuse
  - 2.4, 5.8 GHz: 802.11a/b Designated ISM Bands

4 Modeling Environment Barriers: Why Making Such Wireless Transceivers Is Hard
Comms protocols evolve; transceiver HW/SW must evolve too!
• B1: HW-SW modeling environment
• B2: HW & SW must be reconfigurable
• B3: Each processing block (PB) must be the same on HW & SW
• B4: Map behaviors to HW or SW
[Figure labels: SW, HW, FPGA, Effective Bus; ProcBlk1, ProcBlk2, ProcBlk3; f_c = 5.8 GHz, f_s = 20 Msps; y=FFT(x) == module FFT(x,y) == function FFT(x,y)]

5 Top Down-Bottom Up Approach to Modeling Wireless Transceivers for Protocol Coexistence
System goals and challenges:
• (C1) Adapt to changing protocols, contention
• (C2) Time Synchronization
• (C3) Low Energy/Error Rate
• (C4) Spectrum scarcity: Changing bandwidths
Enabling Technologies (T1-T3):
• (B1) HW & SW Joint Modeling Environment
• (B2) SW Control of Radio Parameters
• (B2) Partially Reconfigurable HW (FPGA)
Fundamental Research (R1-R4):
• (B1) Signal Processing for Wireless Comms Understanding
• (B3) Implementation Portability: equivalent functionality on HW & SW
• (B1) Time & Energy Optimization Techniques
• (B4) Identify ideal mapping of wireless behaviors to HW & SW

6 T2: HW & SW Joint Modeling Environment: Testbed Requirements
What Does a HW-SW Modeling Environment Need?
1. Provide a HW-SW Prototyping Platform
   1. Modeling for Wireless Processing Blocks (PBs)
   2. Hardware (HW) Components
   3. Software (SW) Tools
2. Model a HW-SW Divide Point
3. Enact HW-SW Interfacing
4. Exhibit Reusability & Adaptability to Modern Standards

7 HW-SW Prototyping Platform: Modeling for Wireless Processing Blocks

PB  Transmitter (Tx)      Receiver (Rx)
1   Scrambling            Preamble Detection
2   Convolutional Coding  OFDM Demodulation
3   Block Interleaving    BPSK Demodulation
4   BPSK Modulation       Block De-interleaving
5   OFDM Modulation       Viterbi Decoding
6   Preamble Insertion    Descrambling

• Tx: Data Bits in, Samples out; Rx: Samples in, Data Bits out
• Simulink Model for Tx or Rx path
• Simulink: Design Synchronous Dataflow (SDF) Models
• Integrated Profiling: Look at Entire 802.11a PHY Layer Processing Chain
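
To make one of the processing blocks in the table concrete, here is a minimal Python sketch of Tx block 1 (scrambling) and Rx block 6 (descrambling). It assumes the frame-synchronous 802.11a scrambler with generator polynomial x^7 + x^4 + 1. The function name `scramble` and the all-ones default seed are illustrative choices, not taken from the slides or from the Simulink model.

```python
from typing import List


def scramble(bits: List[int], seed: int = 0b1111111) -> List[int]:
    """XOR bits with the output of a 7-bit LFSR, S(x) = x^7 + x^4 + 1.

    Assumption: this mirrors the 802.11a scrambler. The seed is a
    hypothetical nonzero 7-bit initial state (bit 6 holds the x^7 stage).
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a nonzero 7-bit value")
    state = seed
    out = []
    for b in bits:
        # Feedback is the XOR of the x^7 stage (bit 6) and the x^4 stage (bit 3).
        fb = ((state >> 6) ^ (state >> 3)) & 1
        # Shift by one stage, feed the new bit into x^1, keep 7 bits.
        state = ((state << 1) | fb) & 0x7F
        out.append((b & 1) ^ fb)
    return out


# Descrambling is the same operation run with the same seed.
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
assert scramble(scramble(data)) == data
```

Because the scrambler is a plain XOR with a seed-determined sequence, the same function serves as both the first Tx block and the last Rx block, which is one reason a block like this is easy to keep equivalent across HW and SW.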

8 HW-SW Prototyping Platform: Hardware Components: Xilinx Zynq
Zynq-Based Heterogeneous Computing System
• Zynq-7000 series System-on-Chip (SoC)
• Processing System: ARM Cortex-A9 CPU
• Programmable Logic: FPGA with DSPs & BRAM
• We prototype on 2 varieties: ZC706 & ZedBoard
[Figure labels: Zynq SoC, CPU, FPGA]

9 HW-SW Prototyping Platform: Hardware Components
• Zynq-Based Heterogeneous Computing System (Zynq SoC: CPU & FPGA)
• Radio Frequency (RF) Front End: ADI FMComms3 (AD9361, 2Tx 2Rx), FMC Slot
• Host Personal Computer (PC): Runs SW Tools
• Links: JTAG (to FPGA), Ethernet (to CPU)
[Figure: Transmit Path and Receive Path blocks 1-6 drawn across the CPU and FPGA]

10 HW-SW Prototyping Platform: Software Tools
Host PC: Runs SW Tools
• Embedded Coder: Generate C code for ARM Processor
• HDL Coder: Create HW Description Language (HDL) code
• Vivado: Synthesize, Implement, and Generate FPGA Bitstream
[Figure: MathWorks Simulink™ Model → Embedded Coder™ → C Code → ARM Executable, sent over Ethernet (to CPU); MathWorks Simulink™ Model → HDL Coder™ → HDL Code → Xilinx Vivado® → FPGA Bitstream, sent over JTAG (to FPGA)]

11 Modeling a HW-SW Divide Point
• V1: SW-only model
• V2: Adds Tx F6 & Rx F1 to HW
• V3: Adds Tx F5 & Rx F2 to HW
• V4: Adds Tx F4 & Rx F3 to HW
• V5: Adds Tx F3 & Rx F4 to HW
• V6: Adds Tx F2 & Rx F5 to HW
• V7: HW-only model
[Figure: SW/HW split of the Transmit Path and Receive Path for each of V1-V7 on the Zynq-Based Heterogeneous Computing System]
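
The variant list can be read as a single sliding divide point: each step moves one more block per path onto the FPGA, starting from the sample side of the chain. The sketch below encodes that reading in Python. The block names are those of the Tx/Rx processing chains listed earlier in the deck; the function name `partition` and the dictionary layout are hypothetical.

```python
from typing import Dict, List

# Processing blocks F1..F6 for each path, in chain order.
TX_CHAIN = ["Scrambling", "Convolutional Coding", "Block Interleaving",
            "BPSK Modulation", "OFDM Modulation", "Preamble Insertion"]
RX_CHAIN = ["Preamble Detection", "OFDM Demodulation", "BPSK Demodulation",
            "Block De-interleaving", "Viterbi Decoding", "Descrambling"]


def partition(variant: int) -> Dict[str, List[str]]:
    """Return the SW/HW split of both paths for model variant V1..V7."""
    if not 1 <= variant <= 7:
        raise ValueError("variant must be in 1..7")
    n_hw = variant - 1  # blocks per path placed on the FPGA
    split = len(TX_CHAIN) - n_hw
    return {
        # Tx: the last n_hw blocks (closest to the radio) run in HW.
        "tx_sw": TX_CHAIN[:split],
        "tx_hw": TX_CHAIN[split:],
        # Rx: the first n_hw blocks (closest to the radio) run in HW.
        "rx_hw": RX_CHAIN[:n_hw],
        "rx_sw": RX_CHAIN[n_hw:],
    }


for v in range(1, 8):
    p = partition(v)
    print(f"V{v}: Tx HW={p['tx_hw']} | Rx HW={p['rx_hw']}")
```

Under this reading, V2 places only Preamble Insertion and Preamble Detection in HW, and V7 moves the last remaining pair (Scrambling and Descrambling) onto the FPGA.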

12 HW-SW Interfacing: Bus Details
• Advanced eXtensible Interface (AXI): Bus to Connect CPU & FPGA
• Direct Memory Access (DMA): To Hold Data Sent between CPU & FPGA
• First-In First-Out (FIFO): Queue to Buffer Bits in Transit
Note: the data to transfer between CPU & FPGA has a different size and class for each model variant!
[Figure labels: AXI DMA Controller; FIFOpack, FIFOunpack, FIFOslice, FIFOconcat; RF Front End: ADI FMComms3 with AD9361 (DAC: I1,2, Q1,2; ADC: I1,2, Q1,2; 2Tx, 2Rx); Transmit Path and Receive Path on the Zynq-Based Heterogeneous Computing System]

13 HW-SW Interfacing: Data Transfer Types & Sizes

Variant  Data to Send  Data Type           Size of 1  #Elements
V1       Samples       Signed Fixed Point  16 bits    80
V2       Samples       Signed Fixed Point  16 bits    64
V3       Symbols       Signed Integer      1-8 bits   64
V4       Coded Bits    Boolean             1 bit      48
V5       Coded Bits    Boolean             1 bit      48
V6       Data Bits     Boolean             1 bit      24
V7       Data Bits     Boolean             1 bit      24

• Before sending data between CPU & FPGA, we translate to a 32-bit unsigned integer format for transfer on AXI interconnect
• We build a library of bundling blocks to facilitate this transfer
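
As a rough illustration of what bundling into 32-bit unsigned words involves, the Python sketch below packs Boolean bits and 16-bit signed samples into word lists and unpacks them again. The slides do not specify the bit order, the word layout, or the names of the bundling blocks, so the LSB-first order, the two-samples-per-word layout, and all function names here are assumptions.

```python
from typing import List


def pack_bits(bits: List[int]) -> List[int]:
    """Pack 0/1 values into 32-bit unsigned words, LSB first (assumed order)."""
    words = []
    for i in range(0, len(bits), 32):
        word = 0
        for j, b in enumerate(bits[i:i + 32]):
            word |= (b & 1) << j
        words.append(word)
    return words


def unpack_bits(words: List[int], n_bits: int) -> List[int]:
    """Recover the first n_bits values from words produced by pack_bits."""
    return [(words[k // 32] >> (k % 32)) & 1 for k in range(n_bits)]


def pack_samples(samples: List[int]) -> List[int]:
    """Pack signed 16-bit integers two per word, low half first (assumed)."""
    words = []
    for i in range(0, len(samples), 2):
        lo = samples[i] & 0xFFFF
        hi = (samples[i + 1] & 0xFFFF) if i + 1 < len(samples) else 0
        words.append((hi << 16) | lo)
    return words


def unpack_samples(words: List[int], n_samples: int) -> List[int]:
    """Recover signed 16-bit integers from words produced by pack_samples."""
    out = []
    for k in range(n_samples):
        raw = (words[k // 2] >> (16 * (k % 2))) & 0xFFFF
        # Reinterpret the raw 16-bit field as two's complement.
        out.append(raw - 0x10000 if raw & 0x8000 else raw)
    return out


coded = [1, 0] * 24                 # 48 coded bits, as in V4 and V5
print(len(pack_bits(coded)))        # number of 32-bit words needed
```

With this layout, the 48 coded bits of V4/V5 fit in two words and the 24 data bits of V6/V7 in one, which shows why the transfer size differs from variant to variant.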

14 Results: CPU Execution Time: Transmitter on Zynq
• Moving one processing block from SW to HW does not necessarily cause speedup
  - Increase in Tx frame time on ZC706 from V1 to V2 is proof
• V1 is SW-only, requires no AXI communication
  - Keeps all operations in SW
• V2 adds small component to HW
  - Time saved < time spent on CPU-FPGA data transfer
• Our modeling environment can identify location at which HW-SW interface is best placed
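
The V1-to-V2 observation follows from a simple accounting argument, sketched below in Python. This is a deliberately crude model, assuming the CPU no longer spends time on the offloaded block but does spend time on the CPU-FPGA transfer. The function names and the numbers in the usage lines are illustrative and are not measurements from the slides.

```python
def cpu_frame_time_after_offload(cpu_time: float,
                                 block_time: float,
                                 transfer_time: float) -> float:
    """CPU frame time once one block moves to HW (assumed additive model).

    cpu_time:      frame time with everything in SW
    block_time:    SW time of the block being moved to the FPGA
    transfer_time: CPU time spent moving data across the HW-SW interface
    """
    return cpu_time - block_time + transfer_time


def offload_helps(block_time: float, transfer_time: float) -> bool:
    """Offloading pays off only if the time saved exceeds the transfer cost."""
    return block_time > transfer_time


# Illustrative numbers only: a small block with a costly transfer.
print(cpu_frame_time_after_offload(100, 5, 12))  # frame time goes up
print(offload_helps(5, 12))                      # offload does not help
```

In this model a small block such as the one moved in V2 can increase frame time, while a large block can absorb the same transfer cost and still yield a net saving.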

15 Results: CPU Execution Time: Receiver on ZC706
• Rx maximum CPU frame time decreases as more blocks are moved onto the FPGA
• Preamble detection is revealed to be the biggest bottleneck in the Rx model
  - Moving it in V2 results in the largest drop in frame time
  - Also drops with FFT in V3 & Viterbi Decoder in V6
• Moving Descrambler in V7 does not show decrease, suggesting we can put it in SW

16 Results: FPGA Resource Utilization and Power Usage

PB  Tx    Rx
1   1.53  1.57
2   1.82  2.34
3   1.84  2.35
4   1.84  2.11
5   1.84  2.11
6   1.85  2.11
7   1.84  2.12

[Figure panels: Transmitter Res Util, Receiver Res Util, Power]

17 Variants of Processing Blocks: Preamble Detection

MF Variant            Default  HDL Long  HDL Training
Data Path Delay (ns)  500      314       132
% LUTs                8.9      38.2      15.8
% Registers           4.3      2.0       1.3
% DSPs                99.2     35.3      14.7
Total Power (W)       2.65     2.34      2.09

• Block uses a matched filter (MF) to correlate 2 frames with a fixed set of coefficients
• 1st MF manually assembled from adders & multipliers
  - Not ideal: uses 99% of DSPs
• 2nd MF correlates with full long preamble
  - But long preamble composed of repetitions of training seq
• 3rd MF correlates with only the training sequence
  - 2.38X reduction in path delay
  - 1.12X reduction in power
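
The two reduction factors quoted for the third matched filter can be checked against the table. The short Python sketch below recomputes them from the delay and power rows. It suggests the quoted 2.38X and 1.12X compare HDL Training against HDL Long; against the Default variant the ratios come out larger. The dictionary and function names are illustrative.

```python
# Values copied from the Data Path Delay and Total Power rows of the table.
delay_ns = {"Default": 500, "HDL Long": 314, "HDL Training": 132}
power_w = {"Default": 2.65, "HDL Long": 2.34, "HDL Training": 2.09}


def reduction(before: float, after: float) -> float:
    """Reduction factor: how many times smaller 'after' is than 'before'."""
    return before / after


for baseline in ("HDL Long", "Default"):
    d = reduction(delay_ns[baseline], delay_ns["HDL Training"])
    p = reduction(power_w[baseline], power_w["HDL Training"])
    print(f"{baseline} -> HDL Training: delay {d:.2f}X, power {p:.2f}X")
```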

18 Variants of Processing Blocks: Viterbi Decoder

VD Variant            Delay-Based  BRAM-Based
Data Path Delay (ns)  308          314
% LUTs                41.0         40.3
% Registers           4.2          3.2
BRAM Tiles            0            2
Total Power (W)       2.36 (single value on slide)
VD Power (W)          0.011        0.005

• Block reverses effects of Convolutional Encoder
  - Requires memory to hold intermediate state values
• 1st VD uses delay blocks to hold state memory
  - Exhibits lower path delay
• 2nd VD uses BRAM tiles to hold state memory
  - Uses fewer LUTs and registers
  - Slightly lower power
• Illustrates tradeoff between time and power
  - Can dynamically tune design to target either objective

19 Reusability & Adaptability to Modern Wireless Standards
Columns: Processing Block, 802.11a, Wi-Fi (802.11g), Mobile (LTE)
Entries per row, in slide order:
1. Scrambling: (1)
2. Convolutional Coding: (1)
3. PSK Modulation: (B), (DB), (Q)
4. Block Interleaving: (1)
5. OFDM: (1), (DL, 128-2048)
6. Preamble Insert/Detect: (1), (2)
Legend: (1) Equivalent, Reusable; (2) Not Yet Implemented, but a variant can be reused

20 Variants of Processing Blocks: OFDM IFFT

IFFT Size    64    128   256   512   1024
% LUTs       19.9  22.4  27.8  37.2  54.9
% Registers  12.3  14.5  19.1  27.7  44.1
% DSPs       6.4   7.7   9.1   10.5  11.8

Rows with fewer values than columns (values in slide order):
Data Path Delay (ns): 15.2, 16.8, 15.6, 18.0
Total Power (W): 1.84, 1.85, 1.87

• In LTE, OFDM modulation uses different IFFT sizes to spread symbols onto a larger number of subcarriers
• We vary the IFFT size to identify its impact on FPGA metrics
  - Delay, resources, and power rise for higher IFFT sizes
  - Limiting factors: #LUTs for multiple IFFTs on FPGA

21 Conclusions
• Introduces a method for modeling HW-SW co-designs for wireless transceivers
• Enables profiling of all processing blocks
  - Identifies bottlenecks such as preamble detection
• Explores various HW-SW divide points
  - Identifies which model variants are most desirable
• Details interfacing needed at divide point
  - Shows when variants use more power from data transfer
  - Shows added FPGA power is a fraction of CPU power
• Improves preamble detection by using fewer MF coefficients
• Customizes Viterbi decoder to use different resources

22 Future Work
• Perform live tests with online radio transmissions
  - Measure link latency and error rates
• Develop rules to automate HW-SW co-designs
  - Make decisions about HW-SW divide point
  - Automate bundling for data transfer between HW & SW
• Switch out platform to test newest HW
  - Altera Arria 10®
  - Xilinx UltraScale+ MPSoC
• Explore co-existence with modern protocols (802.11 & LTE)
  - OFDM IFFT study is first look at this

23 Publications & Acknowledgments
• Extended Abstracts & Posters:
  - BARC 2016, Boston, MA, January 29, 2016.
  - IEEE INFOCOM 2016, San Francisco, CA, April 11-14, 2016.
• Submitted, Pending:
  - IEEE Transactions on Emerging Topics in Computing, Special Issue on Next Generation Wireless Computing Systems. Submitted Mar 1, 2016.
  - IEEE Field Programmable Logic & Applications (FPL) 2016. Submitted Mar 27, 2016.
• Plans:
  - ACM Wireless Network Testbeds, Experimental evaluation, and Characterization (WiNTECH) 2016, October 3, 2016.
• Acknowledgments:


