IEEEI 2010 ISE for Computation on Complex Floating Point Numbers
Instruction Set Extensions for Computation on Complex Floating Point Numbers


Authors: Philipp Digeser, Marco Tubolino, Martin Klemm, Daniel Shapiro, Axel Sikora and Miodrag Bolic
{digeserp, tubolinm, klemmm, {dshap092,

Overview
– Prior Art
– Complex Floating Point Division
– Instruction Set Extensions (ISE)
– Instruction Hardware
– Software Interface
– Experiment
– Performance Evaluation
– Hardware Resource Utilization
– Future Work
– Conclusion

Prior Art
– In prior work, we described the possibility of accelerating scientific observation using ISEs instead of software libraries such as carith.
– In this work, we demonstrate that possibility.
– This extension of our prior work performs several operations (complex addition, subtraction, multiplication and division), which improves the chances of the ISE being widely applicable.

Complex Floating Point Computations
– Unlike real multiplication or division, mathematical operations on complex numbers are usually provided by slow software.
– Consider complex division, which is slow in software:
  – 3 additions/subtractions
  – 6 multiplications
  – 2 divisions
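The operation count above matches the textbook formula (a + bi) / (c + di) = ((ac + bd) + (bc − ad)i) / (c² + d²). The following is a minimal software sketch of that formula, written only to make the count visible; the `cfloat` type and the `cfloat_div` name are hypothetical and do not come from the slides or from any library mentioned in them.

```c
#include <assert.h>

/* Hypothetical helper type for this sketch (not from the slides). */
typedef struct {
    float re;
    float im;
} cfloat;

/* Textbook complex division x / y with x = a + bi and y = c + di.
 * Total cost: 6 multiplications, 3 additions/subtractions, 2 divisions. */
cfloat cfloat_div(cfloat x, cfloat y)
{
    float denom = y.re * y.re + y.im * y.im;      /* 2 mul, 1 add */
    cfloat q;

    /* Assumption of this sketch: the caller never divides by zero. */
    assert(denom != 0.0f);

    q.re = (x.re * y.re + x.im * y.im) / denom;   /* 2 mul, 1 add, 1 div */
    q.im = (x.im * y.re - x.re * y.im) / denom;   /* 2 mul, 1 sub, 1 div */
    return q;
}
```

For example, dividing 1 + 2i by 3 + 4i with this function gives approximately 0.44 + 0.08i.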

Complex Floating Point Computations
– Fast complex computations are necessary for:
  – Image and audio manipulation
  – Multi-antenna
  – Correlation
  – Others
– Example: STSDAS offers math libraries for image analysis, including stsdas.analysis.fourier.carith, which is used to multiply or divide two complex images [1].

Instruction Set Extension
– Instruction-set extension, as the name implies, involves adding custom instructions to a processor's instruction set.
– Figure: Generic custom instruction datapath [2]

Instruction Set Extension
– An ISE candidate has limited I/O access to the register file.
– We use multicycle reads/writes from/to the register bank in order to squeeze several operands through the two-input, one-output register file interface [4].
– The computations can be distributed to one adder, one multiplier and one divider.
– They can be pipelined.
– On divide by zero and on overflow, flags are set.
– Figure: Original custom logic block [3]

Instruction Hardware
– Figures: operation when n=0 and operation when n=1

Software Interface
– The hardware designed for complex division can be used easily from inline assembly or from C/C++ code, as shown below:
    ALT_CI_COMPLEX_CORE_INST(0, in_A, in_C);
    out_real = ALT_CI_COMPLEX_CORE_INST(1, in_B, in_D);
    out_imag = ALT_CI_COMPLEX_CORE_INST(0, 0, 0);
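The three-call sequence can be tried on a host machine with a software stand-in for the custom instruction. The sketch below is only a guess at the protocol: `mock_complex_core` and `complex_div_via_core` are hypothetical names, and the sketch assumes that in_A and in_B are the real and imaginary parts of the dividend, that in_C and in_D are those of the divisor, that a call with n=0 latches two operands and returns the imaginary result left over from the previous n=1 call, and that a call with n=1 supplies the remaining operands and returns the real result. It models division only and uses an assertion where the hardware would set a flag.

```c
#include <assert.h>

/* Hypothetical host-side stand-in for ALT_CI_COMPLEX_CORE_INST.
 * Assumed protocol (not confirmed by the slides):
 *   n == 0: latch (dataa, datab) and return the pending imaginary result
 *   n == 1: take the remaining operands, compute, return the real result */
float mock_complex_core(int n, float dataa, float datab)
{
    static float a, c;      /* operands latched by the last n == 0 call */
    static float imag;      /* imaginary result awaiting the next n == 0 call */

    if (n == 0) {
        float pending = imag;
        a = dataa;
        c = datab;
        return pending;
    }

    /* n == 1: second operand pair arrives, so the division can run. */
    float b = dataa;
    float d = datab;
    float denom = c * c + d * d;
    assert(denom != 0.0f);  /* the hardware sets a flag instead (see slides) */

    imag = (b * c - a * d) / denom;
    return (a * c + b * d) / denom;
}

/* Wraps the three-call sequence from the slide: (a + bi) / (c + di). */
void complex_div_via_core(float a, float b, float c, float d,
                          float *out_real, float *out_imag)
{
    (void)mock_complex_core(0, a, c);            /* load first operand pair */
    *out_real = mock_complex_core(1, b, d);      /* load second pair, get real */
    *out_imag = mock_complex_core(0, 0.0f, 0.0f); /* fetch imaginary part */
}
```

Wrapping the sequence in one function keeps the call order in a single place, which matters because the three calls share hidden state between them.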

Experiment
– h(u,v) is a blurred picture taken by a telescope, e.g. Hubble.
  – Motion blurring: caused by a long exposure time and movement of the camera.
– g(u,v) is the image to be recovered.
– f(u,v) is the distortion, called a point spread function; it can be calculated from the known movement of the target.
– Figure panels: h(u,v), g(u,v), f(u,v)

Experiment
– To restore the image, the images must be transformed into the frequency domain by applying an FFT, and the result transformed back using an IFFT.
– This transformation leads to complex arrays in the frequency domain that need to be divided:
    f(u,v) * g(u,v) = h(u,v)   (* denotes convolution)
    G(u,v) = H(u,v) / F(u,v)
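The division step G = H / F is applied element by element to the two frequency-domain arrays. A minimal sketch of that loop follows, assuming the spectra are stored as flat arrays of C99 `float complex` values; the `spectrum_divide` name and the array layout are illustrative choices, and the FFT and IFFT steps are left out.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Element-wise frequency-domain division: G[i] = H[i] / F[i].
 * H is the spectrum of the blurred image, F that of the point spread
 * function, and G receives the spectrum of the restored image. */
void spectrum_divide(const float complex *H, const float complex *F,
                     float complex *G, size_t count)
{
    assert(H != NULL && F != NULL && G != NULL);

    for (size_t i = 0; i < count; ++i) {
        /* One complex division per sample: the operation the ISE targets. */
        G[i] = H[i] / F[i];
    }
}
```

Samples where F is zero produce infinities or NaNs in this sketch, so a practical restoration would need to handle those bins separately.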

Performance Evaluation
Approach | Execution Time (seconds) | Loop Overhead (seconds) | Speedup
SW division | | |
ISE accelerated division | | |
SW multiplication | | |
ISE accelerated multiplication | | |
SW addition | | |
ISE accelerated addition | | |
SW subtraction | | |
ISE accelerated subtraction | | |

Hardware Resource Utilization
– Hardware resource utilization is considerable.
– The entire system requires 8864 Logic Elements and 27 9-bit DSP units.
– The complex core requires 2520 Logic Elements and 23 9-bit DSP units.
– Optimizing the ISE hardware to maximize reuse was essential to limiting the hardware size.

Future Work
– Adding FFT and IFFT
– Accelerating other embedded complex mathematics algorithms
– Correlation of pictures
  – Instead of doing a slow time-domain correlation
  – Heavy complex multiplication in the frequency domain

Conclusion
– The designed ISE can be used to accelerate embedded complex mathematics operations.
– Significant speedup (up to 12×)

Questions?

References
[1] Space Telescope Science Institute. (2010) carith. [Online]. Available:
[2] Altera Corporation. (2007) Nios II custom instruction user guide. [Online]. Available: nios2 multiprocessor tutorial.pdf
[3] P. Digeser, M. Tubolino, M. Klemm, D. Shapiro, and M. Bolic, "Instruction set extension in the NIOS II: A floating point divider for complex numbers," in CCECE,
[4] L. Pozzi and P. Ienne, "Exploiting pipelining to relax register-file port constraints of instruction-set extensions," in CASES '05: Proceedings of the 2005 International Conference on Compilers, Architectures and Synthesis for Embedded Systems. New York, NY, USA: ACM, 2005, pp. 2–10.