VLSI Algorithmic Design Automation Lab. 1 Integration of High-Performance ASICs into Reconfigurable Systems Providing Additional Multimedia Functionality.

Presentation transcript:

Slide 1: Integration of High-Performance ASICs into Reconfigurable Systems Providing Additional Multimedia Functionality

This material is based on the paper by H. Blume, H.-M. Blüthgen, C. Henning and P. Osterloh (2000).

Slide 2: Introduction

Approaches:
- CardBus-based coprocessor board
  - Integration of additional high-performance multimedia components into a computer system using a reconfigurable coprocessor board
- Reconfigurable computing
  - Adaptation to a range of different applications
  - Varying processing parameters (e.g. coefficients)
- Dedicated ASIC providing enough computational power
  - Acceptable response times

Constitution:
- EPLD (erasable programmable logic device) or FPGA
  - Allows in-system programmability
  - Controls functionality
- Memory device
- CardBus interface
  - Connected to the PCI bus, up to 132 Mbytes/s
  - Small and ideal for mobile computer systems
  - Hot plug-in: insertion into a running system
    - Dynamically reconfigurable
- Coprocessor
  - Mounted on a socket on the board
- Computational component such as a DSP
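
The 132 Mbytes/s figure quoted for the CardBus interface matches the peak rate of a 32-bit bus clocked at 33 MHz, the parameters CardBus shares with conventional PCI. A minimal sketch of that arithmetic follows; the helper name `peak_bandwidth_mbytes` is hypothetical, and the result is a theoretical peak rather than a sustained rate.

```python
def peak_bandwidth_mbytes(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in Mbytes/s, assuming one bus-width word per clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# 32-bit bus at 33 MHz: 4 bytes per cycle * 33 million cycles per second
cardbus_peak = peak_bandwidth_mbytes(32, 33)
print(cardbus_peak)  # 132.0
```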

Slide 3: CardBus-Based Evaluation System

First step for realization:
- CardBus interface
  - Control and data transmission
- EPLD
  - Controller on the coprocessor board, exchanging data between the CardBus and the other on-board components
- Configuration of the EPLD
  - Configuration flash memory
  - Via JTAG
- ASIC mounted on a socket

Next step:
- Dedicated coprocessor board
  - Socket removed and ASIC mounted directly
- Ball Grid Array
- Flash memory replaces SDRAM
- Hybrid reconfigurable platform
  - The ASIC can be used to relieve the DSPs
  - The EPLD controls the ASIC, the DSP, and the on-board data flow
    - Execution of basic application-specific tasks

Slide 4: ASICs - Highly Optimized Macros

Histogram processor:
- Scalable with respect to throughput rate and power consumption
  - By a suitable choice of stage number and stage sizes

Two-dimensional transversal filter:
- Parameterizable concerning sample and coefficient word length and window size
- High utilization by time-sharing
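
As a software reference for what these two macros compute, the sketch below gives a plain histogram and a two-dimensional transversal (FIR) filter. It says nothing about the ASIC's stage structure or time-sharing scheme; the function names and the integer-only, valid-region formulation are assumptions made for illustration.

```python
def histogram(pixels, levels=256):
    """Count how often each intensity level occurs."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def transversal_filter_2d(image, coeffs):
    """Slide the coefficient window over the image and sum the products.

    Only positions where the window fits entirely inside the image are
    computed, and the window is applied without flipping (correlation form).
    """
    win_h, win_w = len(coeffs), len(coeffs[0])
    img_h, img_w = len(image), len(image[0])
    output = []
    for r in range(img_h - win_h + 1):
        row = []
        for c in range(img_w - win_w + 1):
            acc = 0
            for i in range(win_h):
                for j in range(win_w):
                    acc += coeffs[i][j] * image[r + i][c + j]
            row.append(acc)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
window = [[1, 0],
          [0, 1]]  # 2x2 window: adds each sample to its lower-right neighbour
print(transversal_filter_2d(image, window))  # [[6, 8], [12, 14]]
print(histogram([0, 1, 1, 3], levels=4))     # [1, 2, 0, 1]
```

The window size and coefficient values are the parameters the slide describes as configurable; word length is not modelled here because Python integers are unbounded.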

Slide 5: Performance Analysis

Text processor ASIC:
- Text search
  - Classical edit distance computation
  - Handling of wildcards
  - Recoding of the text to handle special idiomatic properties
  - Integration of multi-token matching

Benchmark (software and hardware):
- 1 MByte text file, 8 search words of 8 characters each
  - General-purpose processor: UltraSPARC I, 167 MHz
  - VLIW signal processor: Philips TriMedia TM-1000, 100 MHz, ILP (instruction-level parallelism) of 3
  - Next-generation processor: TriMedia, 64 bit, 166 MHz, ILP of 5
  - PLD-based implementation of a system for searching DNA sequences in a genome database
  - Text processor ASIC
- ASIC: sufficient throughput and adequate flexibility
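
To make the text-search task concrete, the following is a minimal software sketch of approximate matching by edit distance with single-character wildcards. It uses the standard dynamic-programming formulation in which a match may start at any text position; it is not the ASIC's algorithm, and the function name, the `?` wildcard symbol and the `max_errors` parameter are assumptions.

```python
def approx_find(pattern, text, max_errors, wildcard="?"):
    """Return (end_index, distance) for every text position where the pattern
    matches a substring ending there with at most max_errors edits."""
    m = len(pattern)
    prev = list(range(m + 1))  # against an empty text, i pattern chars cost i edits
    hits = []
    for j, ch in enumerate(text):
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            p = pattern[i - 1]
            cost = 0 if (p == wildcard or p == ch) else 1
            cur.append(min(prev[i - 1] + cost,  # match or substitution
                           prev[i] + 1,         # extra character in the text
                           cur[i - 1] + 1))     # pattern character missing from the text
        if cur[m] <= max_errors:
            hits.append((j, cur[m]))
        prev = cur
    return hits


print(approx_find("cat", "a cat", 0))  # [(4, 0)]
print(approx_find("c?t", "cut", 0))    # [(2, 0)]
print(approx_find("cat", "cot", 1))    # [(2, 1)]
```

Each text character updates one column of the distance table, so the work per character is proportional to the pattern length, which is the kind of regular inner loop that maps well onto dedicated hardware.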

Slide 6: Partitioning Methodology for Dynamically Reconfigurable Embedded Systems

This material is based on the paper by J. Harkin, T.M. McGinnity and L.P. Maguire, presented in IEE Proceedings, 2000.

Slide 7: Introduction

Approaches to partitioning:
- Partitioning: allocation of the system resources
- Hard-wired ASICs to improve implementation efficiency
- Introduction of FPGAs to embedded systems
  - Higher levels of performance and flexibility
  - Increased computational power by customizing the reconfigurable platform

H/W and S/W partitioning issues:
- Automation of the approach
- Granularity level of partitioning
- Flexibility of implementing different types of operations
- Memory requirements, method of obtaining runtime execution values, profiling level
- Target hardware

Methodologies:
- Partitioning the application
- Estimating performance
- Resource-limited embedded systems

Slide 8: Related Work

Strategy:
- Column labeled “desire”
- Codesign stage when no H/W design has been performed
- Best speedup without increasing system resources: the use of RTR
- Runtime reconfiguration of a non-cached FPGA

Slide 9: Method - I

Assumptions:
- Target embedded system
  - One fixed processing device (166 MHz Pentium)
  - One reconfigurable device (Xilinx XC6216)
- The partitioning approach is only valid for H/W
- A single candidate can reside in hardware at a time
  - The benefits of global RTR
  - H/W parallelism within a candidate
- Concurrent H/W and S/W execution is not considered
- Preemptive scheduling is not dealt with
  - Non-reactive embedded system

Slide 10: Method - II

Detection of candidates and software runtime:
- Hardware candidates: identified at the high-level-language level (C++)
- Performance estimation at an abstract level
- C/C++ to Verilog or VHDL synthesis at a later date
- Detection process
  - Textually scanning for nested FOR and WHILE loops: a coarse approach
- Timer
  - Determination of the execution time in software

Memory analysis and cost evaluation:
- Three different memory locations
  - Access time overhead: main memory > local memory > stored (hard-wired) within the reconfigurable device
- Textual scanning: the number of memory accesses
  - Internal (data used exclusively within the candidate)
  - External (data accessed external to the candidate)
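
The detection and timing steps on this slide can be sketched roughly as follows. This is an illustrative guess at what a coarse textual scan might look like, not the authors' tool: it assumes at most one loop header per line and brace-delimited loop bodies, ignores comments, strings and do-while forms, and all names (`find_loop_candidates`, `time_software`, `SAMPLE`) are hypothetical.

```python
import re
import time

LOOP_RE = re.compile(r"\b(for|while)\b")

SAMPLE = """for (int i = 0; i < n; ++i) {
    while (j < m) {
        s += a[i][j];
        ++j;
    }
}
while (k > 0) {
    --k;
}
"""


def find_loop_candidates(source):
    """Return (line_number, keyword, nesting_level) for each loop header found."""
    candidates = []
    loop_stack = []   # brace depths at which enclosing loop bodies were opened
    depth = 0
    pending = False   # a loop header was seen and its body brace is still expected
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LOOP_RE.search(line)
        if match:
            candidates.append((lineno, match.group(1), len(loop_stack) + 1))
            pending = True
        for ch in line:
            if ch == "{":
                depth += 1
                if pending:
                    loop_stack.append(depth)
                    pending = False
            elif ch == "}":
                if loop_stack and loop_stack[-1] == depth:
                    loop_stack.pop()
                depth -= 1
    return candidates


def time_software(func, *args, repeats=5):
    """Best-of-N wall-clock time of func(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


print(find_loop_candidates(SAMPLE))
# [(1, 'for', 1), (2, 'while', 2), (7, 'while', 1)]
```

A nesting level above 1 marks a loop nest, which is the kind of region the slide treats as a hardware candidate; the timer would then be wrapped around the software version of that region.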

Slide 11: Method - III

Hardware execution and reconfiguration time:
- Estimation by modeling each line of code (instruction)
  - In terms of simple temporary parallel macros
  - 32-bit full adder on the XC6216
- Assumption
  - All arithmetic operations can be realized through adders and registers
    - CORDIC

Estimate of application speedup:
- Modified version of Amdahl’s speedup metric
  - Automatic local clock gating and 3 user-controlled idle modes
- The use of global RTR
  - Reduces the memory latency: Tm setup time
  - Potential speedup: value for Tr
    - Partial reconfiguration
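
The slide does not give the modified Amdahl metric itself, so the sketch below only illustrates one plausible shape for it: the software time of each candidate moved to hardware is replaced by its hardware execution time plus a memory setup time (Tm) and a reconfiguration time (Tr). The formula, the function name and all numbers are assumptions, not values from the paper.

```python
def estimated_speedup(total_sw_time, candidates):
    """Amdahl-style speedup estimate with per-candidate overheads.

    Each candidate is a dict with:
      t_sw: time of the code region when run in software
      t_hw: estimated time of the same region in hardware
      t_m:  memory setup time (Tm)
      t_r:  reconfiguration time (Tr)
    """
    moved_sw = sum(c["t_sw"] for c in candidates)
    hw_cost = sum(c["t_hw"] + c["t_m"] + c["t_r"] for c in candidates)
    return total_sw_time / (total_sw_time - moved_sw + hw_cost)


# Hypothetical numbers in arbitrary time units.
loop_nest = {"t_sw": 60.0, "t_hw": 5.0, "t_m": 3.0, "t_r": 2.0}
print(estimated_speedup(100.0, [loop_nest]))  # 2.0

# With both overheads driven to zero, the same candidate yields the upper bound.
ideal = dict(loop_nest, t_m=0.0, t_r=0.0)
print(estimated_speedup(100.0, [ideal]) > 2.0)  # True
```

Under this form, lowering Tm or Tr raises the estimate directly, which is consistent with the slide's emphasis on reducing memory and reconfiguration latency.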

Slide 12: Result

- The effect of global RTR
  - Improvement of speedup: commonality of memory data between adjacent candidates in the sequence reduces the latency
  - Best speedup: all candidates are partitioned to hardware
- Near-optimal speedup by selecting a sequence (not all candidates)
  - Large design cost in hardware
  - Exhaustive search of all possible combinations
    - Loosely coupled and tightly coupled
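
The trade-off described here, where moving every candidate to hardware gives the best speedup but a smaller selection can come close at lower design cost, can be illustrated with a small exhaustive search. The speedup model, the 5% tolerance, the candidate names and the timings are all hypothetical; the hardware time of each candidate is assumed to already include its memory and reconfiguration overheads.

```python
from itertools import combinations

# (name, software time, hardware time including overheads), arbitrary units
CANDIDATES = [("A", 50.0, 5.0), ("B", 30.0, 3.0), ("C", 2.0, 1.0)]


def speedup(total_sw, chosen):
    """Speedup when the chosen candidates run in hardware, the rest in software."""
    saved = sum(sw - hw for _, sw, hw in chosen)
    return total_sw / (total_sw - saved)


def exhaustive_search(total_sw, candidates):
    """Evaluate every subset of candidates; returns (speedup, names) pairs."""
    results = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            names = [name for name, _, _ in subset]
            results.append((speedup(total_sw, subset), names))
    return results


def smallest_near_optimal(results, tolerance=0.05):
    """Smallest subset whose speedup is within `tolerance` of the best one."""
    best = max(s for s, _ in results)
    near = [(len(names), -s, names) for s, names in results
            if s >= (1 - tolerance) * best]
    return min(near)[2]


results = exhaustive_search(100.0, CANDIDATES)
print(max(results)[1])                 # ['A', 'B', 'C']
print(smallest_near_optimal(results))  # ['A', 'B']
```

The number of subsets doubles with each added candidate, so this brute-force approach is only practical for a small candidate list.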

Slide 13: Result & Conclusion

Local RTR:
- Partial reconfiguration
- Upper limit of speedup
  - Reducing memory latency
  - Reducing configuration latency
    - Local RTR by exploiting the commonality among scheduled candidate FPGA circuit designs
  - If the overhead is reduced to zero, the upper-limit speedup is reached

Conclusion:
- Global RTR
  - Exploits the functional density of the limited hardware resources
- Local RTR
  - Further improvement in performance
  - Exploits the programming of hardware resources through local RTR