VLSI Algorithmic Design Automation Lab. 1 Integration of High-Performance ASICs into Reconfigurable Systems Providing Additional Multimedia Functionality.

Slide 1: Integration of High-Performance ASICs into Reconfigurable Systems Providing Additional Multimedia Functionality
This material is based on the 2000 paper by H. Blume, H.-M. Blüthgen, C. Henning and P. Osterloh.

Slide 2: Introduction
Approaches:
- CardBus-based coprocessor board
  - Integration of additional high-performance multimedia components into a computer system using a reconfigurable coprocessor board
- Reconfigurable computing
  - Adaptation to a range of different applications
  - Varying processing parameters (e.g. coefficients)
- Dedicated ASIC providing enough computational power
  - Acceptable response times
Constitution:
- EPLD (Embedded Programmable Logic) or FPGA
  - Allowing in-system programmability
  - For controlling functionality
- Memory device
- CardBus
  - Interface
    - Connected to PCI bus
    - Up to 132 Mbytes/s
  - Small and ideal for mobile computer systems
  - Hot plug-in: insertion into a running system
    - Dynamically reconfigurable
- Coprocessor
  - Mounted on a socket on the board
- Computational component like DSP
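The 132 Mbytes/s figure is consistent with the peak rate of a 32-bit bus clocked at 33 MHz. The bus width and clock are assumptions here, since the slide states only the resulting rate, and the value is a theoretical peak rather than a sustained throughput:

```latex
\[
  33 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}}
  \times 4\ \tfrac{\text{bytes}}{\text{transfer}}
  = 132 \times 10^{6}\ \tfrac{\text{bytes}}{\text{s}}
\]
```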

Slide 3: CardBus-based evaluation system
First step for realization:
- CardBus interface
  - Control and data transmission
- EPLD
  - Controller on the coprocessor board exchanging data between CardBus and other on-board components
- Configuration of EPLD
  - Configuration flash memory
  - Via JTAG
- ASIC mounted on a socket
Next step:
- Dedicated coprocessor board
  - Socket removed and ASIC mounted directly
- Ball Grid Array
- Flash memory replaces SDRAM
- Hybrid reconfigurable platform
  - ASIC can be used to relieve the DSPs
  - EPLD controls ASIC, DSP, and on-board data flow
    - Execution of basic application-specific tasks

Slide 4: ASICs - highly optimized macros
Histogram processor:
- Scalable with respect to throughput rate and power consumption
  - By a suitable choice of stage number and stage sizes
Two-dimensional transversal filter:
- Parameterizable concerning sample and coefficient word length, window size
- High utilization by time-sharing
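As a software reference for what a two-dimensional transversal (FIR) filter computes, the following is a minimal sketch, not the ASIC's architecture. The function name, the list-of-lists image format, and the choice to compute only fully covered window positions without flipping the coefficients are all assumptions made for illustration.

```python
def transversal_filter_2d(image, coeffs):
    """Weighted sum of each window of `image` with the coefficient window.

    Hypothetical reference model: output covers only positions where the
    coefficient window fits entirely inside the image.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(coeffs), len(coeffs[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            acc = 0
            # Accumulate coefficient * sample over the whole window.
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += coeffs[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

The window size is set by the dimensions of `coeffs`, which mirrors the "window size" parameter on the slide; word lengths are not modeled here.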

Slide 5: Performance Analysis
Text processor ASIC:
- Text search
  - Classical edit distance computation
  - Handling of wildcards
  - Recoding of the text to handle special idiomatic properties
  - Integration of multi-token matching
Benchmark, software and hardware:
- 1 MByte text file, 8 search words with 8 characters
  - General-purpose processor: UltraSPARC I, 167 MHz
  - VLIW signal processor: Philips TriMedia TM-1000, 100 MHz, ILP (instruction-level parallelism) of 3
  - Next-generation processor: TriMedia, 64 bit, 166 MHz, ILP of 5
  - PLD-based implementation of a system for searching DNA sequences in a genome database
  - Text processor ASIC
- ASIC: sufficient throughput and adequate flexibility
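The classical edit distance with a simple wildcard can be sketched in software as follows. This is an illustrative dynamic-programming version, not the ASIC's implementation; the function name and the use of `?` as a single-character wildcard are assumptions.

```python
def edit_distance(pattern, word, wildcard="?"):
    """Levenshtein distance where `wildcard` in `pattern` matches any one character."""
    # prev[j] holds the distance between the pattern prefix processed so far
    # and the first j characters of `word`.
    prev = list(range(len(word) + 1))
    for i, p in enumerate(pattern, start=1):
        curr = [i]
        for j, w in enumerate(word, start=1):
            cost = 0 if (p == w or p == wildcard) else 1
            curr.append(min(prev[j] + 1,          # delete from pattern
                            curr[j - 1] + 1,      # insert into pattern
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory grows with the word length rather than with the product of both lengths.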

Slide 6: Partitioning methodology for dynamically reconfigurable embedded systems
This material is based on the paper by J. Harkin, T.M. McGinnity and L.P. Maguire, presented in IEE Proceedings, 2000.

Slide 7: Introduction
Approaches to the partitioning:
- Partitioning: allocation of the system resources
- Hard-wired ASIC to improve implementation efficiency
- Introduction of FPGAs to embedded systems
  - Higher levels of performance and flexibility
  - Increase computational power by customizing the reconfigurable platform
H/W & S/W partitioning issues:
- Automation of approach
- Granularity level of partitioning
- Flexibility of implementing different types of operations
- Memory requirement, method of obtaining runtime execution value, profiling level
- Target hardware
Methodologies:
- Partitioning application
- Estimating performance
- Resource-limited embedded system

Slide 8: Related Work
Strategy:
- Column labeled "desire"
- Codesign stage when no H/W design has been performed
- Best speedup without increasing system resources: the use of RTR (run-time reconfiguration)
- Run-time reconfiguration of noncached FPGA

Slide 9: Method - I
Assumptions:
- Target embedded system
  - One fixed processing device (166 MHz Pentium)
  - One reconfigurable device (Xilinx XC6216)
- Partitioning approach is only valid for H/W
- A single candidate can reside
  - The benefits of global RTR
  - H/W parallelism within a candidate
- Concurrent H/W and S/W execution is not considered
- Does not deal with preemptive scheduling
  - Non-reactive embedded system

Slide 10: Method - II
Detection of candidates and software runtime:
- Hardware candidates: identified at the high-level language level, C++
- Performance estimation at abstract level
- C/C++ to Verilog or VHDL synthesis at a later date
- Detection process
  - Textually scanning for nested FOR and WHILE loops: coarse approach
- Timer
  - Determination of execution time in software
Memory analysis and cost evaluation:
- Three different memory locations
  - Access time overhead: main memory > local memory > stored (hard-wired) within the reconfigurable device
- Textual scanning: the number of memory accesses
  - Internal (data used exclusively within candidate)
  - External (data accessed external to candidate)
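The textual scan for nested loops could look roughly like the sketch below. It is deliberately coarse, in the spirit of the slide, and is not the authors' tool. It assumes every loop body is enclosed in braces, at most one loop keyword per line, no do-while loops, and no loop keywords inside comments or strings; the function name and return format are hypothetical.

```python
import re

LOOP_RE = re.compile(r"\b(for|while)\s*\(")

def find_nested_loops(source):
    """Return (line_number, keyword, nesting_level) for loops inside another loop."""
    results = []
    loop_stack = []   # brace depth at which each open loop body started
    depth = 0         # current brace depth
    pending = False   # a loop header was seen; its opening brace is still expected
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LOOP_RE.search(line)
        if match:
            if loop_stack:
                results.append((lineno, match.group(1), len(loop_stack) + 1))
            pending = True
        for ch in line:
            if ch == "{":
                depth += 1
                if pending:
                    loop_stack.append(depth)
                    pending = False
            elif ch == "}":
                if loop_stack and loop_stack[-1] == depth:
                    loop_stack.pop()
                depth -= 1
    return results
```

Each reported loop is a possible hardware candidate; a real flow would then time the same region in software, as the Timer bullet describes.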

Slide 11: Method - III
Hardware execution and reconfiguration time:
- Estimation by modeling each line of code (instruction)
  - In terms of simple temporary parallel macros
  - 32-bit full adder on XC6216
- Assumption
  - All arithmetic operations can be realized through adder and register
    - CORDIC
Estimate of application speedup:
- Modified version of Amdahl's speedup metric
  - Automatic local clock gating & 3 user-controlled idle modes
- The use of global RTR
  - Reduce the memory latency: Tm setup time
  - Potential speedup: value for Tr
    - Partial reconfiguration
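The slides do not give the modified metric itself, so the sketch below shows only the classical Amdahl form plus one plausible way overheads could enter. Treating Tm (memory setup time) and Tr (reconfiguration time) as additive penalties on the partitioned runtime is an assumption made here, not the paper's formula, and both function names are hypothetical.

```python
def amdahl_speedup(fraction, local_speedup):
    """Classical Amdahl speedup: `fraction` of the runtime is accelerated by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

def speedup_with_overheads(t_total_sw, t_candidate_sw, t_candidate_hw, t_m=0.0, t_r=0.0):
    """Assumed overhead-aware variant (not taken from the paper).

    t_total_sw      all-software runtime of the application
    t_candidate_sw  software runtime of the candidate moved to hardware
    t_candidate_hw  estimated hardware runtime of that candidate
    t_m, t_r        memory setup and reconfiguration overheads (additive by assumption)
    """
    t_partitioned = (t_total_sw - t_candidate_sw) + t_candidate_hw + t_m + t_r
    return t_total_sw / t_partitioned
```

With both overheads at zero the second function reduces to the classical form: a candidate taking 80 of 100 time units and running 8 times faster in hardware gives a speedup of about 3.33 either way, while adding 5 units each for Tm and Tr lowers it to 2.5.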

Slide 12: Result
- The effect of global RTR
  - Improvement of speedup: commonality of memory data between adjacent candidates in the sequence reduces the latency
  - Best speedup: all candidates are partitioned to hardware
- Near-optimal speedup by selecting a sequence (not all candidates)
  - Large design cost in hardware
  - Exhaustive search of all the possible combinations
    - Loosely coupled and tightly coupled
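An exhaustive search over candidate combinations can be sketched as below. The cost budget, the per-candidate tuple layout, and the function name are hypothetical, introduced only to show why selecting a subset can be preferable when hardware design cost is large; the slides do not describe the authors' actual search procedure.

```python
from itertools import combinations

def best_partition(total_sw_time, candidates, max_hw_cost):
    """Try every subset of candidates and return (subset, speedup) with the best speedup.

    candidates maps a name to (sw_time, hw_time_including_overheads, hw_cost).
    Subsets whose summed hw_cost exceeds max_hw_cost are skipped.
    """
    best = ((), 1.0)  # all-software baseline
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(candidates[n][2] for n in subset)
            if cost > max_hw_cost:
                continue
            # Time saved by moving every candidate in the subset to hardware.
            saved = sum(candidates[n][0] - candidates[n][1] for n in subset)
            speedup = total_sw_time / (total_sw_time - saved)
            if speedup > best[1]:
                best = (subset, speedup)
    return best
```

The number of subsets doubles with each added candidate, so this brute-force approach is only practical for a small candidate list.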

Slide 13: Result & Conclusion
Local RTR:
- Partial reconfiguration
- Upper limit of speedup
  - Reducing memory latency
  - Reducing configuration latency
    - Local RTR by exploiting the commonality among scheduled candidate FPGA circuit designs
  - If overhead is reduced to zero, the upper-limit speedup is reached
Conclusion:
- Global RTR
  - Exploit functional density of the limited hardware resources
- Local RTR
  - Further improvement in performance
  - Exploit the programming of hardware resources through local RTR

