Implementation of MAC Assisted CORDIC engine on FPGA EE382N-4 Abhik Bhattacharya Mrinal Deo Raghunandan K R Samir Dutt.


1 Implementation of MAC Assisted CORDIC engine on FPGA EE382N-4 Abhik Bhattacharya Mrinal Deo Raghunandan K R Samir Dutt

2 Motivation The ARM9-based Freescale i.MX21 System-on-Chip on the TLL 5000 platform does not have native floating-point support. Floating-point operations are emulated in software using libraries (e.g. libc). "Math-heavy" applications, e.g. MAC-based operations that require computing sine/cosine/arctan values, are therefore not well suited to this platform. Goal: hardware acceleration for trigonometric math operations.

3 Outline Select a basic mathematical building block, e.g. CORDIC (from OpenCores). Implement the CORDIC engine in hardware (FPGA). Implement higher-level primitives, e.g. the Discrete Fourier Transform, using CORDIC. Use these blocks in a C program instead of the … Offload the heavy number crunching to the hardware accelerator (FPGA), freeing up valuable CPU resources.

4 CORDIC engine The COordinate Rotation DIgital Computer (CORDIC) is a simple and efficient algorithm for calculating hyperbolic and trigonometric functions. We use it to calculate the sine and cosine of an angle given in radians or degrees. To determine the sine and cosine of an angle β, we need to find the coordinates (X, Y) of the point at angle β on the unit circle: X = cos β and Y = sin β.

5 CORDIC contd. CORDIC is an iterative algorithm that uses a table lookup. First step: rotate the vector 45° counterclockwise. If the remaining angle (β – α) != 0, where α is the angle rotated so far, iterate; else exit. Each successive iteration rotates the vector in one direction or the other, in steps of decreasing size. The rotation angle at step i is arctan(1/2^i), where i is the iteration index (i = 0 gives the initial 45°). Terminate after 16 steps (approximately 5 digits of precision).
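The iteration described on this slide can be sketched in Python as follows. This is a minimal floating-point illustration of rotation-mode CORDIC, not the OpenCores Verilog used in the project; the names `cordic_sin_cos`, `ANGLES`, and `GAIN` are assumptions made for the example, and hardware would use shifts on fixed-point values where the sketch multiplies by 2^-i.

```python
import math

ITERATIONS = 16  # the slides terminate after 16 steps

# Lookup table of elementary rotation angles arctan(2^-i), in radians.
# ANGLES[0] is pi/4, i.e. the initial 45 degree rotation.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i). GAIN is the
# accumulated stretch; starting from x = 1/GAIN cancels it out.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_sin_cos(beta):
    """Return (sin(beta), cos(beta)) for beta in [-pi/2, pi/2] radians."""
    x, y = 1.0 / GAIN, 0.0
    z = beta  # residual angle still to be rotated (beta - alpha)
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards the target angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x
```

After 16 steps the residual angle is at most arctan(2^-15), roughly 3e-5 rad, which is consistent with the "approximately 5 digits" quoted on the slide.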

6 Discrete Fourier Transform (DFT) The DFT can be implemented using CORDIC.
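The formula on this slide did not survive in the transcript. For reference, the standard N-point DFT definition is given below, split into the sine and cosine terms that a CORDIC engine can supply; the symbols x[n], X[k], and θ_{k,n} are the usual textbook notation, assumed here rather than taken from the slide.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}
     = \sum_{n=0}^{N-1} x[n] \left( \cos\theta_{k,n} - j \sin\theta_{k,n} \right),
\qquad \theta_{k,n} = \frac{2\pi k n}{N}, \quad k = 0, \dots, N-1
```

Each output X[k] is therefore a sum of N products of a sample with a sine or cosine value, which is a multiply-accumulate (MAC) pattern.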

7 Design of CORDIC The CORDIC Verilog from OpenCores can be operated in different modes: pipelined, iterative, or combinatorial. Pipelined mode is efficient from a performance perspective: it trades area for performance (it needs the largest number of LUTs) and outputs a result every clock after an initial latency. Resolution is limited to 5 bits of precision. The algorithm works in the first quadrant of the unit circle; logic was added to handle the polarity (sign) of the outputs.
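One possible way to handle the polarity for angles outside the first quadrant is sketched below in Python. The slides do not describe the logic actually used, so this folding scheme and the name `fold_to_first_quadrant` are assumptions made for illustration.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = math.pi / 2.0


def fold_to_first_quadrant(theta):
    """Map theta (radians) to (phi, sin_sign, cos_sign), phi in [0, pi/2].

    sin(theta) == sin_sign * sin(phi) and cos(theta) == cos_sign * cos(phi),
    so a first-quadrant engine only ever has to evaluate phi.
    """
    theta = theta % TWO_PI
    if theta >= TWO_PI:  # guard against rounding for tiny negative inputs
        theta = 0.0
    quadrant = int(theta // HALF_PI)  # 0, 1, 2 or 3
    if quadrant == 0:
        return theta, 1, 1
    if quadrant == 1:
        return math.pi - theta, 1, -1
    if quadrant == 2:
        return theta - math.pi, -1, -1
    return TWO_PI - theta, -1, 1
```

The signs returned here would be applied to the first-quadrant sine and cosine outputs before they reach the next stage.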

8 MAC Implementation The pipelined CORDIC delivers sine/cosine values every cycle, provided a steady inflow of inputs is maintained. A MAC (multiply-accumulate) engine can be built on this CORDIC functionality. This is useful in linear time-variant control systems, where the coefficients may be sine/cosine values that need to be computed and accumulated. Simple example: the Discrete Fourier Transform.

9 Design of DFT A 32-point DFT was implemented using the CORDIC-based MAC. Samples are sent to the board from the user application. One copy of the CORDIC-based MAC was instantiated. The design was pipelined to avoid bubbles, providing a new input (angle) to the CORDIC every cycle.
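A software model of a MAC-based DFT can be sketched in Python as follows. It is an assumption-laden illustration rather than the project's design: `sin_cos` is a hypothetical stand-in for the CORDIC engine (here it simply calls the math library), and `dft_mac` is an illustrative name. One angle is consumed per inner-loop step, mirroring the one-angle-per-cycle pipeline described on the slide.

```python
import math


def sin_cos(theta):
    """Hypothetical stand-in for the CORDIC engine: returns (sin, cos)."""
    return math.sin(theta), math.cos(theta)


def dft_mac(samples):
    """N-point DFT of real samples, one multiply-accumulate per (k, n)."""
    n_points = len(samples)
    result = []
    for k in range(n_points):
        acc_re, acc_im = 0.0, 0.0  # accumulators cleared per output bin
        for n, x in enumerate(samples):
            # Reduce k*n modulo N so the angle stays within [0, 2*pi).
            theta = 2.0 * math.pi * ((k * n) % n_points) / n_points
            s, c = sin_cos(theta)
            acc_re += x * c  # real part:      sum of x[n] * cos(theta)
            acc_im -= x * s  # imaginary part: minus sum of x[n] * sin(theta)
        result.append(complex(acc_re, acc_im))
    return result
```

As a sanity check, feeding 32 samples of a cosine with three cycles across the window yields a magnitude of 16 (N/2) in bins 3 and 29 and approximately zero elsewhere.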

10 Block Diagram of our System [Block diagram, DFT top level. Labels: Input Samples, θ, CORDIC, sin(θ), cos(θ), CORDIC Gain, MAC Engine, DFT.]

11 Operation of the System The user application writes the 32 data samples to the RAM, followed by a “compute_dft” instruction. Data is read from the RAM by the DFT encoder in a pipeline. Handshaking takes place between the two pipelined stages:
– The MAC operation begins after a delay of 16 clocks (the initial latency of the CORDIC pipeline).
– The first MAC output is generated N clocks after the initial latency, where N (= 32) is the length of the input sequence.
– After the MAC has generated N output samples, the result of the N-point DFT is written to the RAM module, followed by an interrupt.
– The user application reads the results from the RAM through the device driver on detecting this interrupt.
[Timing diagram, time running left to right: user application writes input to RAM; initial CORDIC latency; MAC operation begins; first MAC output sample; final (Nth) output from MAC.]
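The timing above implies a rough cycle count, sketched below in Python. The estimate assumes a single MAC, one angle accepted per clock with no bubbles, and N clocks per output sample; it ignores the RAM writes and reads, handshaking, and interrupt handling, and it is not a measured figure from the project. The function name `dft_cycle_estimate` is hypothetical.

```python
def dft_cycle_estimate(n_points=32, cordic_latency=16):
    """Return (clocks until first MAC output, clocks until last MAC output).

    Assumes one angle enters the CORDIC pipeline per clock and each of the
    N outputs needs N multiply-accumulate steps on a single MAC.
    """
    first_output = cordic_latency + n_points
    last_output = cordic_latency + n_points * n_points
    return first_output, last_output


print(dft_cycle_estimate())  # prints (48, 1040)
```

Under these assumptions the N*N accumulation steps dominate the fixed 16-clock CORDIC latency.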

12 Performance Measurements

13 Issues Faced Coding an aggressive pipeline (avoiding bubbles) is always a challenge. It is a time-consuming process that has to be done in two steps:
– Code and validate in ModelSim (signals are available for debug).
– Change the design to run on the FPGA.
Iterate for all modules. The design needs to be aware of memory timing issues (e.g. back-to-back writes from the FPGA to RAM are a problem). Calculating the correct polarity of the CORDIC output samples was another issue.

14 Future scope Extend to a 256-point DFT. The design cannot be extended further because the resolution of the CORDIC is low; the CORDIC resolution needs to be increased.

15 Lessons Learnt Debug on FPGA is interesting!!

16 Thank You!! No Questions!!! Please!! :x :p

