Implementation of MAC Assisted CORDIC engine on FPGA EE382N-4 Abhik Bhattacharya Mrinal Deo Raghunandan K R Samir Dutt.

Motivation
The TLL 5000 Freescale i.MX21 System-on-Chip ARM9-based processor does not have native support for floating point.
Floating-point operations are emulated in software using libraries, e.g. libc.
"Math-heavy" applications, e.g. MAC-based operations that require computing sine/cosine/arctan values, are thus not well suited to this platform.
Hardware acceleration for trigonometric math operations.

Outline
Select a basic mathematical building block, e.g. CORDIC (from OpenCores).
Implement the CORDIC engine in hardware (FPGA).
Implement higher-level primitives, e.g. the Discrete Fourier Transform, using CORDIC.
Use these blocks in a C program instead of the ...
Offload the heavy number crunching to the hardware accelerator (FPGA), freeing up valuable CPU resources.

CORDIC engine
CORDIC (COordinate Rotation DIgital Computer) is a simple and efficient algorithm for calculating hyperbolic and trigonometric functions.
We use it to calculate the sine and cosine of an angle given in radians or degrees.
To determine the sine and cosine of an angle β, we need to find the position (X, Y) on the unit circle: X is cos β and Y is sin β.

CORDIC contd.
CORDIC is an iterative algorithm that uses a table lookup.
First step: rotate the vector 45° counterclockwise.
If ((β – α) != 0) iterate, else exit, where α is the angle rotated so far.
Successive iterations rotate the vector in one direction or the other in steps of decreasing size.
The rotation at step i has tangent 1/2^i, i.e. the rotation angle is arctan(1/2^i), where i is the iteration step.
Terminate after 16 steps (approximately 5 digits of precision).
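The iteration described above can be sketched in software as follows. This is a minimal floating-point illustration, not the OpenCores Verilog: the names `cordic_sin_cos`, `ANGLES` and `GAIN` are made up for this example, the multiplications by 2^-i stand in for hardware shifts, and the loop always runs all 16 steps instead of exiting early when the residual angle reaches zero.

```python
import math

ITERATIONS = 16  # the slides terminate after 16 steps

# Lookup table of elementary rotation angles arctan(2^-i), in radians.
# The first entry is 45 degrees, matching the first step on the slide.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
# The product of these factors is the CORDIC gain.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_sin_cos(beta):
    """Return (sin(beta), cos(beta)) for beta in [-pi/2, pi/2] radians."""
    x, y = 1.0 / GAIN, 0.0  # pre-scaled so the result lands on the unit circle
    z = beta                # residual angle still to be rotated (beta - alpha)
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards the target angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x
```

After 16 steps the residual angle is bounded by arctan(2^-15), roughly 3e-5 rad, which is in line with the "approximately 5 digits" quoted on the slide.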

Discrete Fourier Transform (DFT)
The DFT can be implemented using CORDIC.
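For reference, the standard N-point DFT definition shows where the sine and cosine values come in. The symbols x[n], X[k] and θ are the usual textbook ones, not names taken from the slides, and the split into two real sums assumes real-valued input samples.

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}
     = \sum_{n=0}^{N-1} x[n] \left( \cos\theta_{k,n} - j \sin\theta_{k,n} \right),
\qquad \theta_{k,n} = \frac{2\pi k n}{N}

\operatorname{Re} X[k] = \sum_{n=0}^{N-1} x[n] \cos\theta_{k,n},
\qquad
\operatorname{Im} X[k] = -\sum_{n=0}^{N-1} x[n] \sin\theta_{k,n}
```

Each term is one multiply-accumulate step whose coefficient is a sine or cosine value, which is what a CORDIC block can supply.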

Design of CORDIC
The CORDIC Verilog from OpenCores can be operated in different modes:
– Pipelined
– Iterative
– Combinatorial
Pipelined mode: efficient from a performance perspective. We trade off area for performance (this mode needs the maximum number of LUTs).
– Outputs a result every clock after an initial latency.
Resolution: limited to 5 bits of precision.
The algorithm works in the 1st quadrant of the unit circle. Appropriate logic was added to take care of the polarity.
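One way the polarity logic could look is sketched below. This is an assumption about the approach, not the authors' actual logic: `first_quadrant_core` is a hypothetical stand-in (using the math library) for a core that is only valid between 0 and π/2, and `sin_cos_any_angle` folds an arbitrary angle into that range and then fixes the signs.

```python
import math

HALF_PI = math.pi / 2.0


def first_quadrant_core(theta):
    """Hypothetical stand-in for a core valid only for 0 <= theta <= pi/2."""
    assert 0.0 <= theta <= HALF_PI + 1e-12
    return math.sin(theta), math.cos(theta)


def sin_cos_any_angle(theta):
    """Return (sin, cos) of any angle in radians using the first-quadrant core."""
    # theta = q * pi/2 + r, with r inside the first quadrant.
    q, r = divmod(theta, HALF_PI)
    quadrant = int(q) % 4
    s, c = first_quadrant_core(r)
    if quadrant == 0:
        return s, c
    if quadrant == 1:   # sin(pi/2 + r) = cos r, cos(pi/2 + r) = -sin r
        return c, -s
    if quadrant == 2:   # both values change sign
        return -s, -c
    return -c, s        # sin(3pi/2 + r) = -cos r, cos(3pi/2 + r) = sin r
```

In hardware the same idea would typically reduce to inspecting the top two bits of a fixed-point angle, but that detail is not stated in the slides.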

MAC Implementation
The pipelined CORDIC delivers sine/cosine values every cycle, provided we maintain a steady inflow of inputs.
A MAC (multiply-accumulate) engine can be built on top of this CORDIC functionality.
This is useful in linear time-variant control systems, where the coefficients may be sine/cosine values that need to be computed and accumulated.
Simple example: the Discrete Fourier Transform.

Design of DFT
A 32-point DFT was implemented using the CORDIC-based MAC.
Samples are sent to the board from the user application.
One copy of the CORDIC-based MAC was instantiated.
The design was pipelined to avoid bubbles, providing a new input (angle) to the CORDIC every cycle.
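A software model of a DFT built from one multiply-accumulate unit might look like the sketch below. It is illustrative only: `sin_cos` is a hypothetical stand-in for the CORDIC output (here the math library), `dft_mac` is a made-up name, the input samples are assumed to be real, and the sequential loops model a single MAC fed one angle per step, not the actual pipeline timing.

```python
import math


def sin_cos(theta):
    """Hypothetical stand-in for the CORDIC block: returns (sin, cos)."""
    return math.sin(theta), math.cos(theta)


def dft_mac(samples):
    """N-point DFT of real samples using one accumulator pair per output."""
    n_points = len(samples)
    result = []
    for k in range(n_points):
        acc_re, acc_im = 0.0, 0.0  # MAC accumulators, cleared per output bin
        for n, x in enumerate(samples):
            # Angle fed to the CORDIC, wrapped to one turn of the unit circle.
            theta = 2.0 * math.pi * ((k * n) % n_points) / n_points
            s, c = sin_cos(theta)
            acc_re += x * c        # multiply-accumulate, real part
            acc_im -= x * s        # multiply-accumulate, imaginary part
        result.append(complex(acc_re, acc_im))
    return result
```

For example, feeding 32 samples of `cos(2*pi*3*n/32)` should put the energy in bins 3 and 29, each with magnitude 16.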

Block Diagram of our System
[Block diagram. Labels: DFT top level; CORDIC (input θ, outputs sin(θ) and cos(θ)); CORDIC gain; MAC engine; input samples.]

Operation of the System
The user application writes the 32 data samples to the RAM, followed by a "compute_dft" instruction.
Data is read from the RAM by the DFT encoder in a pipeline.
Handshaking takes place between the two pipelined stages:
– The MAC operation begins after a delay of 16 clocks (the initial latency of the CORDIC pipeline).
– The 1st MAC output is generated N clocks after the initial latency, where N = 32 is the length of the input sequence.
– After the MAC has generated N output samples, the result of the N-point DFT is written to the RAM module, followed by an interrupt.
– The user application reads the results from the RAM through the device driver on detection of this interrupt.
[Timing diagram, in time order: user application writes input to RAM; initial CORDIC latency; MAC operation begins; 1st MAC output sample; after N outputs, final output from the MAC.]
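The timing above can be turned into a rough cycle estimate. This is a back-of-the-envelope model, not a measured result: `dft_cycles` is a made-up helper, and it assumes a single MAC, one new angle per clock with no stalls, N clocks per output sample, and it ignores the RAM writes and the interrupt.

```python
def dft_cycles(n_points=32, cordic_latency=16):
    """Estimate clocks until the first and the last MAC output (assumed model)."""
    # First output: CORDIC pipeline fill plus N accumulate steps.
    first_output = cordic_latency + n_points
    # All N outputs: pipeline fill once, then N accumulate steps per output.
    all_outputs = cordic_latency + n_points * n_points
    return first_output, all_outputs
```

Under these assumptions, `dft_cycles()` gives 48 clocks to the first output and 1040 clocks for the full 32-point DFT.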

Performance Measurements

Issues Faced
Coding an aggressive pipeline (avoiding bubbles) is always a challenge.
It is a time-consuming process that needs to be done in 2 steps:
– Code and validate in ModelSim (signals available for debug).
– Change the design to run on the FPGA.
Iterate for all modules.
The design needs to be aware of memory timing issues (e.g. back-to-back writes from the FPGA to RAM are a problem).
Calculating the correct polarity of the CORDIC output samples.

Future scope
Extending to a 256-point DFT.
We cannot extend further because the resolution of the CORDIC is low; the CORDIC resolution needs to be increased.

Lessons Learnt
Debug on FPGA is interesting!!

Thank You!! No Questions!!! Please!! :x :p