Application-Specific Customization of FPGA Soft-core Processors. Journal Paper Presentation. Presented by: Ahmad Sghaier. Course Instructor: Dr. Shawki Areibi.

1 Application-Specific Customization of FPGA Soft-core Processors
Journal Paper Presentation
Presented by: Ahmad Sghaier
Course Instructor: Dr. Shawki Areibi
Course: ENGG 6090*6, Winter 2007
Date: Apr. 5th, 2007

2 Outline
- Introduction
- Parameterized soft-cores
- Micro-architectural trade-offs and ISA subsetting
- Fast application-specific customization
- Conclusion

3 Resources
- P. Yiannacouras, J. Steffan and J. Rose, “Exploration and Customization of FPGA-Based Soft Processors,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.
- D. Sheldon, R. Kumar, R. Lysecky, F. Vahid and D. Tullsen, “Application-Specific Customization of Parameterized FPGA Soft-Core Processors,” in IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD), Nov. 2006.

4 Soft-core vs. Hard-core
- A hard-core processor is laid out on the chip next to the FPGA's configurable logic fabric.
- A soft-core processor is synthesized onto the FPGA's fabric, just like any other circuit.
- Soft-core processor advantages:
  - Utilizing standard, mass-produced FPGA devices
  - Enabling a custom number of microprocessors
- Soft-core processor disadvantages (relative to hard-cores):
  - Reduced processor performance
  - Higher power consumption
  - Larger size

5 Commercial Soft-cores
- Xilinx MicroBlaze:
  - A 32-bit soft-core processor.
  - A single-issue, in-order processor.
  - Configurable with five optional components: multiplier, barrel shifter, divider, floating-point unit (FPU), and data cache.
- Altera Nios II has three mostly unparameterized variants:
  - Nios II/e, a small unpipelined processor at 6 cycles per instruction (CPI), with a serial shifter and software multiplication;
  - Nios II/s, a five-stage pipeline with a multiplier-based barrel shifter, hardware multiplication, and an instruction cache;
  - Nios II/f, a large six-stage pipeline with dynamic branch prediction and instruction and data caches.

6 Parameterized Soft-cores
- Configurable and application-specific, subject to size, performance, and power constraints.
- Configurable parameters:
  - Instantiating functional units (present or absent: 0/1)
  - Unit-specific parameters (e.g., cache type/size)
  - Instruction set architecture
  - Pipelining (depth)

7 Exploration and Customization of FPGA-Based Soft Processors
- Exploration of the micro-architectural trade-offs for soft processors.
- A set of customization techniques to improve the performance per area of a soft processor for a specific application:
  - Tuning the micro-architecture to the application
  - Subsetting the ISA
  - A hybrid of both approaches
- Implemented as a CAD tool.

8 Approach
- Develop a customization tool that generates the most customized soft-core: SPREE (Soft Processor Rapid Exploration Environment).
- Target functional-unit customization and ISA subsetting.

9 SPREE
- Input: textual description of the ISA and the datapath.
- Verification of the ISA and datapath.
- Datapath construction.
- Control generation.
- Output: synthesizable RTL (Verilog).

10 Framework
- Altera Stratix I FPGA.
- Comparison with the Nios II variants (e, s, and f).
- MIPS instruction set.
- Performance metrics:
  - Area in logic elements (LEs)
  - Performance in MIPS (millions of instructions per second)
  - Efficiency in MIPS/LE, which gives equal weight to performance and area
- Benchmarks: 20 varied applications (e.g., FIR, FFT, DES, CRC, QSORT, bubble sort).
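The MIPS/LE metric can be sketched in a few lines of Python. This is a minimal illustration that assumes performance is measured as instructions executed over wall-clock time. The function names and the numbers in the usage block are hypothetical; they are not results from the papers.

```python
def mips(instructions: int, seconds: float) -> float:
    """Performance: millions of instructions executed per second."""
    return instructions / seconds / 1e6


def mips_per_le(mips_value: float, area_le: float) -> float:
    """Efficiency: performance per unit area (MIPS per logic element)."""
    return mips_value / area_le


def percent_gain(baseline: float, customized: float) -> float:
    """Relative improvement of a customized design over a baseline, in %."""
    return (customized / baseline - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not results from the papers).
    base = mips_per_le(mips(100_000_000, 2.0), 1000)  # 50 MIPS on 1000 LEs
    smaller = mips_per_le(50.0, 800)                  # same speed, fewer LEs
    print(f"{percent_gain(base, smaller):.1f}% better performance per area")
```

The metric is a ratio, so speeding a design up by a factor of 1.1 improves MIPS/LE exactly as much as shrinking its area by a factor of 1.1. In this sense performance and area are weighted equally.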

11 SPREE vs. Nios

12 Micro-architecture Exploration (1)
- Functional units:
  - Shifter implementation (serial, or shared with the multiplier)
  - Multiplication (software, hardware)
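The two shifter options trade area for latency. The following is a minimal behavioral sketch that assumes 32-bit operands:

- A serial shifter moves one bit per cycle, so its latency grows with the shift amount.
- A shifter that reuses the multiplier computes x << n as x * 2^n.
- A logical right shift can be taken from the upper word of a 64-bit product. This is one common technique, assumed here; it is not necessarily SPREE's exact circuit.

The cycle count is a simplification, arithmetic right shifts are not modeled, and all names are hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def serial_shift_left(value: int, amount: int) -> Tuple[int, int]:
    """Serial shifter: moves one bit position per clock cycle.

    Returns (result, cycles); the cycle count equals the shift amount.
    """
    value &= MASK32
    amount &= 31  # MIPS-style: only the low 5 bits of the amount are used
    for _ in range(amount):
        value = (value << 1) & MASK32
    return value, amount


def mul_shift_left(value: int, amount: int) -> int:
    """Shift left using the multiplier: x << n == x * 2**n (mod 2**32)."""
    return (value * (1 << (amount & 31))) & MASK32


def mul_shift_right_logical(value: int, amount: int) -> int:
    """Logical right shift taken from the high word of a 64-bit product.

    (x * 2**(32 - n)) >> 32 == x >> n for 1 <= n <= 31.
    """
    amount &= 31
    if amount == 0:
        return value & MASK32
    return ((value & MASK32) * (1 << (32 - amount))) >> 32
```

Both shift directions need only one pass through a multiplier that already exists for hardware multiplication. This is why a shared-multiplier shifter is attractive when hardware multiplication is present.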

13 Micro-architecture Exploration (2)
- Pipelining:
  - Depth
  - Organization

14 Micro-architecture Customization
- Six micro-architectural axes.
- Exhaustive search over the generated configurations.
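Exhaustive search over micro-architectural axes enumerates every combination of options and keeps the one with the best MIPS/LE. The following is a minimal sketch that assumes a hypothetical `evaluate` callback standing in for synthesis plus simulation, which returns (MIPS, area in LEs). The axis names and options are illustrative only; the slides do not list all six axes.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Config = Dict[str, str]


def exhaustive_search(
    axes: Dict[str, Sequence[str]],
    evaluate: Callable[[Config], Tuple[float, float]],
) -> Tuple[Optional[Config], float, int]:
    """Try every combination of axis options and keep the best MIPS/LE.

    `evaluate` (hypothetical) returns (mips, area_in_les) for one config.
    Returns (best configuration, its MIPS/LE, number of evaluations).
    """
    names = list(axes)
    best_cfg: Optional[Config] = None
    best_eff = float("-inf")
    runs = 0
    for choice in product(*(axes[n] for n in names)):
        cfg = dict(zip(names, choice))
        mips, area = evaluate(cfg)
        runs += 1
        eff = mips / area
        if eff > best_eff:
            best_cfg, best_eff = cfg, eff
    return best_cfg, best_eff, runs
```

The number of evaluations is the product of the option counts on each axis. This is affordable for a handful of axes, but the count grows multiplicatively as axes are added.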

15 ISA Subsetting
- Eliminate unused instructions, which simplifies the control unit and reduces area.
- Less than 50% of the ISA is utilized.
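Finding which instructions can be removed amounts to collecting the set of mnemonics that the application actually contains. The following is a minimal sketch. It assumes a static scan of the disassembled, fully linked binary, so that library code is counted too. The function name, the input format, and the small ISA in the usage example are hypothetical.

```python
from typing import Iterable, Set, Tuple


def isa_subset(program: Iterable[str], full_isa: Set[str]) -> Tuple[Set[str], Set[str], float]:
    """Split the ISA into instructions a program uses and ones it never uses.

    `program` holds disassembled lines such as "addu $2, $3, $4"; only the
    mnemonic (first token) matters. Returns (used, unused, utilization).
    """
    used = {line.split()[0].lower() for line in program if line.strip()}
    unknown = used - full_isa
    if unknown:
        raise ValueError(f"not in the ISA: {sorted(unknown)}")
    unused = full_isa - used
    return used, unused, len(used) / len(full_isa)
```

The `unused` set is what a generator could drop from the control logic. The `utilization` value corresponds to the fraction of the ISA that the application uses.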

16 Impact of ISA Subsetting
- Impact on area
- Impact on performance

17 Results
- In a fine-grained customization environment, tuning the micro-architecture improved performance per area by 14.1% on average across all benchmarks.
- The combined approach (micro-architectural customization plus ISA subsetting) improved performance per area by 24.5% on average across all applications.

18 Application-Specific Customization of Parameterized FPGA Soft-Core Processors
- A methodology for fast application-specific customization of a parameterized FPGA soft core.
- Targets a runtime of 1-2 hours with near-optimal results, using two approaches:
  - Traditional CAD with a 0-1 knapsack algorithm
  - Synthesis-in-the-loop exploration

19 Framework
- Xilinx MicroBlaze (MB) on a Virtex-II Pro FPGA.
- Comparison with the base and full MicroBlaze configurations.
- Performance metrics:
  - Area in equivalent LUTs
  - Performance as application runtime (ms)
- Benchmarks: 11 applications from EEMBC.

20 Justification

21 Approach 1: Traditional CAD
- Formulated as a 0-1 knapsack problem:
  - Maximize performance
  - Subject to an area constraint
- Requires 6 synthesis/execution runs.
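The 0-1 knapsack formulation can be solved exactly by dynamic programming over integer area. In this sketch, each parameter is an item with an area cost (in LUTs) and a performance gain, and the area constraint is the knapsack capacity. It assumes each parameter's cost and gain were measured in isolation and add up, which is a simplification because parameters can interact. The parameter names and numbers in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple

Item = Tuple[str, int, float]  # (parameter name, area cost in LUTs, estimated gain)


def knapsack(items: Sequence[Item], area_budget: int) -> Tuple[float, List[str]]:
    """0-1 knapsack by dynamic programming over integer area.

    best[c] holds the largest total gain achievable with area <= c using the
    items processed so far; keep[i][c] records whether item i was taken there.
    """
    best = [0.0] * (area_budget + 1)
    keep = [[False] * (area_budget + 1) for _ in items]
    for i, (_, area, gain) in enumerate(items):
        # Iterate capacity downwards so each item is used at most once.
        for c in range(area_budget, area - 1, -1):
            if best[c - area] + gain > best[c]:
                best[c] = best[c - area] + gain
                keep[i][c] = True
    # Walk back through the decisions to recover the chosen parameters.
    chosen: List[str] = []
    c = area_budget
    for i in range(len(items) - 1, -1, -1):
        if keep[i][c]:
            name, area, _ = items[i]
            chosen.append(name)
            c -= area
    return best[area_budget], chosen[::-1]
```

For example, with a multiplier (300 LUTs, gain 5), a barrel shifter (200, 3), a divider (400, 4) and a budget of 600 LUTs, the sketch picks the multiplier and barrel shifter for a total gain of 8. The inputs are only the per-parameter characterizations, so the number of synthesis/execution runs stays small.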

22 Approach 2: Synthesis-in-the-Loop
- Pre-determine the impact each parameter individually has on the design metrics.
- Then search the parameters in sequence, ordered from highest impact to lowest.
- Two variants of the impact-ordered tree: a fixed order and an application-specific order.
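One reading of the impact-ordered search is a greedy descent. Parameters are visited from highest to lowest impact, and each is set to its best value while previously chosen parameters stay fixed. The following is a minimal sketch under that assumption. `evaluate` is a hypothetical stand-in for one synthesis/execution run. Passing impacts measured once for all applications models the fixed-order variant, and passing impacts measured per application models the application-specific variant. The names are hypothetical, not the authors' tool.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Config = Dict[str, Any]


def impact_ordered_search(
    params: Dict[str, Sequence[Any]],
    base: Config,
    impact: Dict[str, float],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float, int]:
    """Greedy search over parameters, highest-impact parameter first.

    `evaluate` returns a score to maximize (use float("-inf") for
    configurations that violate the area constraint).
    Returns (configuration, score, number of evaluation runs).
    """
    order: List[str] = sorted(params, key=lambda name: impact[name], reverse=True)
    current = dict(base)
    best_score = evaluate(current)
    runs = 1
    for name in order:
        best_value = current[name]
        for value in params[name]:
            if value == current[name]:
                continue  # already evaluated as part of `current`
            score = evaluate({**current, name: value})
            runs += 1
            if score > best_score:
                best_score, best_value = score, value
        current[name] = best_value  # fix this parameter before moving on
    return current, best_score, runs
```

Because each parameter is settled once, the number of runs grows with the sum of the option counts rather than their product. The price is that combinations skipped by the greedy order are never evaluated.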

23 Results
- Exhaustive search took 11 hours.
- The fixed impact-ordered tree approach had the fastest runtime, 108 minutes.
- The knapsack algorithm gave results similar to the fixed impact-ordered tree approach.
- Results were similar under a 50% constraint.
[Charts: no constraint; fixed 80% constraint; per-application 80% constraint]

24 Results
- Reimplementation on a Spartan-II FPGA:
  - 1.5 hours runtime for the fixed-order impact-ordered tree
  - 200 minutes for the application-specific impact-ordered tree

25 Scalability
- Increasing the number of parameters increases the runtime.
- The fixed-order impact-ordered tree and the knapsack approach scale well.
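A rough way to see why runtime grows with the number of parameters is to count evaluation runs. The following sketch assumes every evaluation costs one synthesis/execution run. It compares exhaustive search with a greedy tree that makes one baseline run and then evaluates each non-baseline value of each parameter once. The function names are hypothetical.

```python
from math import prod
from typing import Sequence


def exhaustive_runs(value_counts: Sequence[int]) -> int:
    """Every combination of parameter values is synthesized and executed."""
    return prod(value_counts)


def greedy_tree_runs(value_counts: Sequence[int]) -> int:
    """One baseline run, then one run per non-baseline value of each parameter."""
    return 1 + sum(k - 1 for k in value_counts)


for n in (5, 10, 20):
    counts = [2] * n  # n on/off (0-1) parameters
    print(n, exhaustive_runs(counts), greedy_tree_runs(counts))
```

In this model, the exhaustive count doubles with every added 0-1 parameter, while the greedy count grows by one. Multi-valued parameters widen the gap further.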

26 Conclusion
- Customization impacts both performance and area, with an emphasis on performance.
- Customizable parameters span the micro-architecture and the ISA.
- Near-optimal solutions save runtime.
- Finer customization is possible, but scalability has to be addressed; it might consider 0-1 parameters or multi-valued parameters.

27 THANK YOU Q&A

