Presentation is loading. Please wait.

Presentation is loading. Please wait.

Application-Specific Customization of Soft Processor Microarchitecture

Similar presentations


Presentation on theme: "Application-Specific Customization of Soft Processor Microarchitecture"— Presentation transcript:

1 Application-Specific Customization of Soft Processor Microarchitecture
Peter Yiannacouras J. Gregory Steffan Jonathan Rose University of Toronto Edward S. Rogers Sr. Department of Electrical and Computer Engineering

2 Processors and FPGA Systems
Processors lie at the “heart” of FPGA systems UART Custom Logic Soft Processor Memory Interface Ethernet Performs coordination and even computation Better processors => less hardware to design We seek improvement through customization

3 Motivating Application-Specific Customizations of Soft Processors
FPGA Configurability Can consider unlimited processor variants A soft processor might be used to run either: A single application A single class of applications Many applications, but can be reconfigured Applications differ in architectural requirements Can specialize architecture for each application Platform allows it (1), FPGA designs allow it (2), it can be beneficial (3) We want to evaluate effectiveness of specialization

4 Research Goals To investigate The potential for “Application-tuning”
Tune processor microarchitecture to favour an application Preserve general purpose functionality “Instruction-set Subsetting” Sacrifice general purpose functionality Eliminate hardware not required by application Combination of both methods We still build a complete processor, but pick and choose the components and pipeline to use which is best for the application. Call this Application-tuning Measure efficiency gained through real implementations

5 SPREE System (Soft Processor Rapid Exploration Environment)
Description ISA Datapath Input: Processor description Made of hand-coded components SPREE System Verify ISA against datapath SPREE Datapath Instantiation Control Generation Multi-cycle/variable-cycle FUs Multiplexer select signals Interlocking Branch handling Exports to synthesizable Verilog RTL RTL Output: Synthesizable Verilog

6 Back-End Infrastructure
RTL Benchmarks (MiBench, Dhrystone 2.1, RATES, XiRisc) Modelsim RTL Simulator Quartus II 4.2 CAD Software Stratix 1S40C5 Cycle Count 2. Resource Usage 3. Clock Frequency 4. Power We can measure area/performance/energy accurately

7 Comparison to Altera’s Nios II
Has three variations: Nios II/e – unpipelined, no HW multiplier Nios II/s – 5-stage, with HW multiplier Nios II/f – 6-stage, dynamic branch prediction

8 Architectural Parameters Used in SPREE
Multiplication Support Hardware FU or software routine Shifter implementation Flipflops, multiplier, or LUTs Pipelining Depth (2-7 stages) Organization Forwarding Ideally everything … What can we gain by customizing the actual core? We focus on core microarchitecture

9 SPREE vs Nios II faster smaller 3-stage pipe HW multiply
Multiply-based shifter EXCITEMENT HERE – this graph should be burned in people’s memory faster smaller

10 Exploration of Soft Processor Architectural Customizations
Architectural-tuning Instruction-set subsetting Combination (Arch-tuning + Subsetting)

11 1. Architectural Tuning Experiment
Vary the same parameters Multiplication Support Shifter implementation Pipelining Determine Best overall (general purpose) processor Best per application (application-tuned) Metric: Performance per Area (MIPS/LE) Basically inverse of Area-Delay product Using SPREE, we’ve varied the following

12 Performance per Area of All Processors
32% 14.1%

13 2. Instruction-set Subsetting
SPREE automatically removes Unused connections Unused components Reduce processor by reducing the ISA Can create application-specific processor Eliminate unused parts of the ISA

14 Instruction-set Usage of Benchmarks
Applications do not use complete ISA Dynaminc instruction Strong potential for hardware reduction

15 Area Reduction from Subsetting
23% Fraction of Area Give an example of what this can do for a processor Similar seen for energy Area reduced by 60% in some , 23% on average Similar reductions for energy, small impact on performance

16 3. Combining Application Tuning and Instruction-set Subsetting
Subsetting is effective on its own Can apply subsetting on top of tuning Compare different customization methods Tuning Subsetting Tuning + Subsetting

17 Combining Application Tuning and Instruction-set Subsetting
25% 16% 14% Tuning reduces the waste that subsetting eliminates

18 Summary of Presented Architectural Conclusions
Application tuning 14% average efficiency gain Will increase with more architectural axes Instruction-set Subsetting Up to 60% area & energy savings 16% average efficiency gain Combined Tuning & Subsetting 25% average efficiency gain

19 Future Work Consider other promising architectural axes
Branch prediction, aggressive forwarding ISA changes Datapaths (eg. VLIW) Caches and memory hierarchy Compiler assistance Can improve tuning & subsetting


Download ppt "Application-Specific Customization of Soft Processor Microarchitecture"

Similar presentations


Ads by Google