
1 High-Performance SEE-Hardened Programmable DSP Array
Larry McMurchie, Carl Sechen
Students: Victor Tang, James Lan, Duncan Lam
Dept. of EE, Univ. of Washington
TangP187_MAPLD2004

2 Outline
The RADAR architecture
–Why coarse-grained programmable architectures
–Features of the RADAR architecture
–Examples of FIR filter
–Benchmarks
Radiation Hardening of RADAR
–SETs in combinational logic and pipeline registers
–Register filtering technique

3 Current Commercial FPGAs – “One Size Fits All”
Flexibility: they can implement any digital function
Commodities: not cheap ones, but not nearly as expensive as ASICs to design and fabricate
Fewer man-hours to design than ASICs
Reprogrammable in situ, allowing updates and bug fixes to be made easily

4 Downside of “One Size Fits All”
Power can be 10X that of an ASIC that performs the same function
Area/weight can be many times that of an equivalent ASIC
Performance may not meet requirements
Varying degrees of susceptibility to radiation effects
–Particularly as process feature sizes decrease

5 A Critical Observation!
An FPGA in a given system will generally be used only for a limited set of related functions
Example: an FPGA that performs high-throughput DSP applications, e.g. a FIR filter
- May be reprogrammed to perform a variant of the FIR, e.g. a different number of taps, or an IIR filter
- But not a totally different operation, e.g. the random logic required for a control block
- The result for this example is that all the fine-grained, bit-level flexibility in an FPGA is wasted

6 Is There a Better Way?
If we can identify the domain of applications that will be used in a given environment, then we can create a customized programmable device (CPD) that will:
–Approach ASIC performance in terms of power, area and throughput
–Retain sufficient programmability to enable all applications within the domain

7 ASIC/CPD/FPGA Comparison
[Figure: ASICs, the customized PD and FPGAs positioned along two axes, Flexibility and Area/Power]

8 RADAR is a Programmable Device Customized for DSP
Based upon Reconfigurable Pipelined Datapaths (RaPiD)
Linear bus-based datapath (as opposed to crossbar)
–Provides efficient local interconnect, which is dominant in DSP applications
Many registers (in the right places) to allow intensive pipelining
Combination of static and dynamic control
–Static to determine the particular application
–Dynamic to control multiple phases within the application
D. Cronquist, P. Franklin, C. Fisher, M. Figueroa and C. Ebeling, “Architecture Design of Reconfigurable Pipelined Datapaths,” 20th Anniversary Conf. on Advanced Research in VLSI, 1999.

9 Example of RADAR Datapath
4 cells, each containing local memory, multiply, ALU and register, plus input and output streams

10 Bus Multiplexor and Drivers

11 Bus Connectors

12 Example #1 – 4-Tap FIR Filter
Given a vector of coefficient weights
Compute the dot product of the coefficient weights and a vector of inputs
Easily maps to a linear pipeline
Following slides courtesy of Carl Ebeling, Dept. of CSE, UW
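
The dot-product view of the 4-tap FIR filter can be sketched in software. This is only an illustrative model, not the RADAR mapping itself: the function name `fir_filter`, the example weights and the zero-filled start-up state are all assumptions made for the sketch.

```python
from collections import deque


def fir_filter(samples, weights):
    """Yield one output per input sample: the dot product of the weights
    with the most recent len(weights) inputs (zeros before the stream starts)."""
    # Hypothetical delay line; one slot per tap, newest sample at index 0.
    taps = deque([0] * len(weights), maxlen=len(weights))
    for x in samples:
        taps.appendleft(x)  # newest sample enters, oldest one drops off the end
        yield sum(w * t for w, t in zip(weights, taps))


# Example: feeding a unit impulse reproduces the weights one by one.
weights = [1, 2, 3, 4]  # assumed coefficients, chosen only for illustration
outputs = list(fir_filter([1, 0, 0, 0, 5], weights))
print(outputs)
```

Each tap in the delay line corresponds loosely to one stage of a linear pipeline, which is why the slide describes the filter as mapping easily onto one.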

13 RADAR Datapath Programmed for 4-tap FIR filter

14 RADAR Performance Benchmarks
Assume 16 RaPiD cells, each containing a 16X16 multiplier, and 16-bit buses in the communication network
Applications: 8x8 DCT, motion estimation, FIR filter, matrix multiply, 2D convolution
Experiments so far in 0.18 micron CMOS show that 1 GHz is achievable, giving 16 GOPS
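
The 16 GOPS figure is consistent with a simple peak-rate estimate. The slide does not say how operations are counted, so the sketch below assumes one operation per cell per clock cycle:

```latex
\[
16\ \text{cells}
\times 1\ \frac{\text{operation}}{\text{cell}\cdot\text{cycle}}
\times 10^{9}\ \frac{\text{cycles}}{\text{s}}
= 16 \times 10^{9}\ \frac{\text{operations}}{\text{s}}
= 16\ \text{GOPS}
\]
```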

15 Common Techniques for SETs
TMR-in-Hardware for logic and memory
–3X in power/area
–Voting circuitry must be hardened
Using larger gate widths
–Increased current flow suppresses transients
–Also increases power/area
–Equivalent to using feature sizes of previous generation processes
Adding resistors and capacitors
–Low pass filtering of SETs
–Increases power/area
In general, circuit design techniques such as these increase area, delay and power, are difficult to design, and do not transfer well between processes!

16 TMR-in-Time and SETs
A single event transient in a pipelined computation may be filtered using TMR-in-time, a simple temporal voting scheme:
The same data is applied on successive clock cycles, resulting in three threads of computation followed by a majority function
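
The temporal voting scheme can be modeled behaviorally as below. This is a minimal sketch, not the slide's circuit: `tmr_in_time`, `upset_cycle` and `upset_mask` are hypothetical names, and a transient is modeled simply as a bit flip in the result of one of the three passes.

```python
def majority(a, b, c):
    """Bitwise majority vote of three integer words."""
    return (a & b) | (a & c) | (b & c)


def tmr_in_time(compute, x, upset_cycle=None, upset_mask=0):
    """Apply the same input on three successive cycles and vote on the results.

    upset_cycle / upset_mask are illustrative knobs that flip result bits
    in one pass, standing in for a single event transient.
    """
    results = []
    for cycle in range(3):
        y = compute(x)
        if cycle == upset_cycle:
            y ^= upset_mask  # corrupt this pass only
        results.append(y)
    return majority(*results)


def square16(v):
    """Example computation (assumed): 16-bit square."""
    return (v * v) & 0xFFFF


# A transient in the second pass is voted out by the other two passes.
print(tmr_in_time(square16, 7, upset_cycle=1, upset_mask=0x0010))
```

Because the vote is bitwise, an upset that corrupts only one of the three passes is masked no matter which bits it flips.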

17 TMR-in-Time
This simple scheme works, provided transients are no longer than the clock period
It suffers from a ~3X latency relative to the singlet (unhardened) circuit, but requires one third the hardware of the TMR-in-hardware approach
In the RADAR architecture (where throughput is determined by the number of clock cycles that critical functional units are busy), throughput is the same for TMR-in-time and TMR-in-hardware
TMR-in-time approaches the singlet (unhardened) case in energy consumption per computation
–Data switching activity occurs only during the first of three cycles!
–Of course, clock power is 3X that of the singlet

18 Filtering SETs at Registers
Sampling data at every register and applying the majority function yields an optimized form of TMR-in-time.
D. Mavis and P. Eaton, “Soft Error Rate Mitigation Techniques for Modern Microcircuits,” Proc. of the 40th Annual Int. Reliability Physics Symposium, 2002, pp. 216-225.

19 Filtering SETs at Registers (cont.)
If the duration of the transient is less than the clock separation time DT, at most one of the three registers will latch incorrect data and the majority function will filter it out
Note that SETs created in the majority function itself will be filtered out at the following register
By increasing DT, the circuit can be made immune to transients caused by radiation of increasing LET values
The means of generating clocks delayed by DT can be made a programmable feature in the architecture, i.e. the degree of radiation hardening is programmable!
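
The role of DT can be illustrated with a small timing model. It is a sketch under stated assumptions, not the circuit from the slides: time is in arbitrary integer units, the data is a single bit, the transient is an ideal rectangular pulse that inverts the bit, and all function and parameter names are hypothetical.

```python
def majority(a, b, c):
    """Majority vote of three bits."""
    return (a & b) | (a & c) | (b & c)


def sample_with_transient(value, t, pulse_start, pulse_width):
    """Return the bit seen at time t; it is inverted while the transient is active."""
    in_pulse = pulse_start <= t < pulse_start + pulse_width
    return value ^ 1 if in_pulse else value


def filtered_register(value, t_clk, dt, pulse_start, pulse_width):
    """Latch the data at t_clk, t_clk + dt and t_clk + 2*dt, then vote."""
    samples = [
        sample_with_transient(value, t_clk + k * dt, pulse_start, pulse_width)
        for k in range(3)
    ]
    return majority(*samples)


# Pulse shorter than DT: at most one sample is corrupted, so the vote is correct.
print(filtered_register(1, t_clk=0, dt=10, pulse_start=9, pulse_width=5))
# Pulse longer than DT: two samples can be corrupted and the error gets through.
print(filtered_register(1, t_clk=0, dt=10, pulse_start=9, pulse_width=15))
```

Raising `dt` in this model widens the range of pulse widths that are filtered, which mirrors the slide's point that the degree of hardening follows from the choice of DT.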

20 Power/Throughput Comparisons of Hardening Techniques
As applied to a fixed size RADAR array
Implementation assumes static CMOS
Throughput is measured in output data values / unit time

21 Application of Register Filtering to RADAR
Register filtering is well suited to the RADAR architecture
Better power/throughput characteristics than other methods
The degree of radiation hardening can be programmable through adjustment of DT

22 Summary
RADAR
–Programmable architecture customized for DSP applications
–Capable of 16 GOPS in 0.18 micron CMOS
Radiation hardening of combinational logic
–Using register filtering
–Achieves near-ideal power/throughput characteristics
–Degree of radiation hardness programmable

