Presentation on theme: "Comparison of Altera NIOS II Processor with Analog Devices' TigerSHARC". Presentation transcript:
1 Comparison of Altera NIOS II Processor with Analog Devices' TigerSHARC
Intro: Traditionally we have a DSP that interacts with other modules, usually other ASICs. Then came SoCs, which integrate the other logic to improve latency. Now we have FPGAs, which add reconfigurability, and we want to integrate that too. The result is the SOPC: system on a programmable chip. This is what the NIOS II is supposed to do. What happens when we want to integrate a DSP into an SOPC system? (We also have a thing called a hard processor.)
2 Outline
- What is a “Soft” Processor?
- What is the NIOS II?
- Architecture of the NIOS II, and what are the implications?
- TigerSHARC vs. NIOS II
- Pipeline issues
- Issues related to FIR
- Hardware acceleration using FPGA logic
Yay, outline! Basically: the concept, then how it looks in software.
3 What is a “Soft” Processor?
- Processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware
- Can implement many parallel processors on one FPGA
- Can use additional FPGA resources on the same chip that are not part of the processor core
- NIOS II is a “Soft” Processor
Just as a Verilog circuit can be put on an FPGA to allow for high configurability, a soft processor is a processor implemented on an FPGA. This is different from a hard processor, which is a processor implemented in hardware. A soft processor is a logic schematic (software) that can be loaded onto any FPGA. So a soft processor isn't really a processor, just a schematic (or code, like software). This gives it all the advantages of software, such as delivering updates and improving the development cycle.
Well, why would you want to do this? Isn't an FPGA clocked slower, with higher power consumption…
4 Why “Soft” Processor?
- Higher level of design reuse
- Reduced obsolescence risk
- Simplified design update or change
- Increased design implementation options
- Lower latency between processor and FPGA components
No, it is not more power hungry, because it can be better customized for the application. Slower clocked doesn't mean slower; it means more has to be done per cycle, and an FPGA allows the developer to customize the design so that instructions finish in one cycle. Plus you get all the other advantages.
5 What is NIOS II?
- Software-defined processor
- The processor core is loaded onto the FPGA
- Programmed using ‘normal’ programming tools (C, asm), not hardware description languages
- Can use the rest of the FPGA hardware for accelerating parts of the code
It is a special schematic designed by Altera that interacts very well with other Altera IP mega blocks.
6 How Is NIOS II Implemented?
- The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II
- The Avalon Interface bus (common instruction/data bus) is implemented in Quartus II
- The architecture is generated in Quartus II and used for programming in the Eclipse IDE
Well, if the processor is in software, how do you write programs for it? Are you basically writing software for software? Doesn't this seem somewhat redundant? Yes, it does seem a bit redundant, but it is the current model for soft processors; perhaps there will be a better programming environment for it later. What you need to do is describe the processor (bus and FPGA logic) first using Quartus, make an emulation file, and use that to write your DSP program in Eclipse. (There is no hardware optimizer, like an assembler optimizer.)
7 Here is what it looks like in Quartus. You need to define the schematic. At the top you have your clock source, in the middle is your Avalon interface, and at the bottom is your FPGA logic.
8 NIOS II IDE
- Coding is done in Eclipse rather than VisualDSP.
Here is your NIOS II IDE environment. Now you take your emulated file and program for it as in VDSP. So if the processor is in software, does that mean you can only do simulation analysis, and not run on hardware like in the labs? No… you can run the generated processor on an FPGA and have the IDE connect to the FPGA when it runs.
9 The Different NIOS II Cores
There are 3 cores available from Altera:
- NIOS II/e: Economical Core
- NIOS II/s: Standard Core
- NIOS II/f: Fast Core
So what exactly does Altera give you as the basic architecture to customize? Three cores with different features. Here are the specs…
10 What's the Difference between the Cores?
- Notice it is very similar to the MIPS processor we learned about in other classes.
- An LE is equivalent to an 8-1 NAND gate + 1 D flip-flop
- An ALM is equivalent to 2 LEs
11 Comparison of TigerSHARC and NIOS II architecture
12 TigerSHARC Architecture
Print off sheet to list the architecture features.
13 NIOS II Architecture
Print off sheet to list the architecture features.
All the ports on the right actually share one bus, the Avalon architecture.
- thirty-two 32-bit general registers, six 32-bit control registers
- variable cache size, based on how much FPGA space you have
- ALU: 32-bit, two inputs to one output; does shifts, logic and arithmetic. The shifter is not separate as in the TigerSHARC.
14 Avalon Interface
- separate address, data and control lines; no need to decode data for address
- up to 1024-bit data width transfer; can be set to any width (not only powers of 2)
- synchronous operation
- dynamic bus sizing: no special design consideration is needed when addressing items that have different bus widths
- one transfer per clock cycle
The Avalon Interface is basically an interface that creates a common interface from the different interfaces of all the memory and peripheral components of the system.
Are there bus issues because it's one common interface? No… it's a special interface, with dedicated memory ports.
15 NIOS II/f pipeline
- Six stages
- One instruction can be dispatched and/or retired per cycle
- Dynamic branch prediction: 2-bit branch history table (no BTB like in TigerSHARC)
16 NIOS II/f pipeline
The pipeline stalls for:
- Multi-cycle instructions
- Cache misses
- Data dependencies (2 cycles between calculating and using result)
Mispredicted branch penalty: 3 cycles
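To make the branch prediction behaviour concrete, here is a minimal C sketch of a single 2-bit saturating counter applied to the backward branch of a loop. The names `predictor2`, `loop_mispredictions` and `branch_penalty_cycles` are hypothetical, and the initial state (strongly not taken) is an assumption chosen to match the FIR analysis in these slides, where the first two iterations are predicted not taken. It is an illustration of the general technique, not a description of the actual NIOS II/f branch history table.

```c
#include <stdbool.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken",
   states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} predictor2;

bool predict_taken(const predictor2 *p) {
    return p->state >= 2;
}

/* Move the counter one step towards the actual outcome, saturating at 0 and 3. */
void train(predictor2 *p, bool taken) {
    if (taken) {
        if (p->state < 3) p->state++;
    } else {
        if (p->state > 0) p->state--;
    }
}

/* Count mispredictions of the loop-closing branch for a loop that runs
   `iterations` times: the branch is taken on every iteration but the last.
   Assumption: the counter starts at 0 (strongly not taken). */
unsigned loop_mispredictions(unsigned iterations) {
    predictor2 p = { 0 };
    unsigned misses = 0;
    for (unsigned i = 0; i < iterations; i++) {
        bool taken = (i + 1 < iterations);
        if (predict_taken(&p) != taken) misses++;
        train(&p, taken);
    }
    return misses;
}

/* Cycles lost to mispredictions, using the 3-cycle penalty from the slide. */
unsigned branch_penalty_cycles(unsigned iterations) {
    const unsigned mispredict_penalty = 3;
    return mispredict_penalty * loop_mispredictions(iterations);
}
```

Under these assumptions any loop of three or more iterations loses 3 mispredictions (two while the counter warms up, one at loop exit), which is 9 cycles regardless of the loop length.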
18 Hardware multiply
Can use different options for the multiplier (chosen at the processor design stage):
- No h/w multiply (saves FPGA gates): speed depends on algorithm
- Use embedded multipliers (if the FPGA has those): 1-5 cycles (depends on FPGA)
- Implement multipliers in FPGA gates: 11 cycles
Division: 4-66 cycles in hardware
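As an illustration of why the speed "depends on algorithm" when there is no hardware multiplier, here is a minimal shift-and-add multiply in C. The function name `soft_mul32` is hypothetical, and this is only one possible software routine, not necessarily the one the NIOS II toolchain emits.

```c
#include <stdint.h>

/* Shift-and-add multiply: examine b one bit at a time and add the
   correspondingly shifted copy of a whenever that bit is set.
   Returns the low 32 bits of the product, like a 32-bit multiply. */
uint32_t soft_mul32(uint32_t a, uint32_t b) {
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {
            product += a;
        }
        a <<= 1;   /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return product;
}
```

The loop runs once per significant bit of `b`, so the running time varies with the operand, unlike the fixed cycle counts quoted for the hardware options.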
19 Compare to TigerSHARC
- No support for parallel instructions
- No support for SIMD operations
- Multicycle instructions stall the pipeline
All the above limitations can be overcome by using FPGA space unoccupied by the processor itself.
20 Comparison of NIOS II and TigerSHARC on an FIR Algorithm
22 Speed analysis
        movi r4,8               i = 8
1 Loop: ldw  r2,0(r6)           load data
2       ldw  r3,0(r7)           load coefficient
3       addi r4,r4,-1           i--
4       addi r6,r6,4            coeffPt++
5       mul  r2,r2,r3           data = data * coeff
6       addi r7,r7,-4           dataPt--
7       stall                   data stall: waiting for multiplication result
8       add  r5,r5,r2           output += data
9       bne  r4,zero,0x10002a0  will mispredict 2 times at the beginning and 1 time at the end of the loop (wastes 3 cycles each time)
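For reference, the loop in the listing corresponds roughly to the following C. This is a sketch, not the presenters' source: the function name `fir_sample` and the parameter names are hypothetical, and the arrays are deliberately called `up` and `down` (after the direction each pointer moves, r6 and r7 respectively) rather than "data" and "coefficient".

```c
#include <stdint.h>
#include <stddef.h>

/* One FIR output sample: multiply-accumulate over `taps` element pairs.
   `up` is read from index 0 upwards (r6 in the listing, +4 bytes per step),
   `down` is read from index taps-1 downwards (r7, -4 bytes per step),
   and the products are summed into a 32-bit accumulator (r5).
   Overflow of the 32-bit accumulator is not handled. */
int32_t fir_sample(const int32_t *up, const int32_t *down, size_t taps) {
    int32_t acc = 0;
    for (size_t i = 0; i < taps; i++) {
        acc += up[i] * down[taps - 1 - i];
    }
    return acc;
}
```

With `taps` set to 8 this matches the `movi r4,8` loop count in the listing.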
23 Speed analysis
- 9 cycles per iteration, except the first two (branch predicted not taken) and the last (branch predicted taken); those will be 9+3=12 cycles
- 1 data stall: can be removed by moving the instruction from line 4 to line 7, which gives 8 cycles per normal iteration and 8+3=11 cycles per mispredicted one
- Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles
- For a 1024-tap FIR: 8201 cycles
- Clock cycle is 3 times longer (200 MHz vs 600 MHz)
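The cycle estimate above can be written as a small C function. This is only a sketch of the slide's formula; the names `fir_loop_cycles` and the enum constants are hypothetical, and the model assumes N is at least 3, with exactly three mispredicted iterations.

```c
/* Constants taken from the slide: 8 cycles per iteration once the data
   stall is removed, a 3-cycle mispredict penalty, and 3 mispredicted
   iterations (the first two and the last). */
enum {
    CYCLES_PER_ITER    = 8,
    MISPREDICT_PENALTY = 3,
    MISPREDICTED_ITERS = 3
};

/* Estimated cycles for an n-iteration FIR loop: 8*(n-3) + 11*3.
   Only meaningful for n >= 3. */
unsigned long fir_loop_cycles(unsigned long n) {
    unsigned long normal = CYCLES_PER_ITER * (n - MISPREDICTED_ITERS);
    unsigned long slow   = (unsigned long)(CYCLES_PER_ITER + MISPREDICT_PENALTY)
                           * MISPREDICTED_ITERS;
    return normal + slow;
}
```

For n = 1024 this evaluates to 8*1021 + 33 = 8201, the figure quoted on the slide.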
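24 Speed comparison
- 8201 NIOS II cycles are equivalent to 24603 TigerSHARC cycles (8201 × 3, since the NIOS II clock cycle is 3 times longer)
- Lab 3 timing:
  - 56000 cycles in Debug mode
  - 13000 cycles for unoptimized ASM
  - 4000 cycles for optimized ASM
Worse than unoptimized assembly, but no hardware acceleration was used, so this is not that bad.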
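Comparing cycle counts across the two clocks is a simple rescaling, sketched below in C. The function names `equivalent_cycles` and `cycles_to_ns` are hypothetical; the 200 MHz and 600 MHz figures are the ones given in the slides.

```c
#include <stdint.h>

/* Number of cycles at dst_mhz that take the same wall-clock time as
   `cycles` cycles at src_mhz (integer division, rounds down). */
uint64_t equivalent_cycles(uint64_t cycles, uint32_t src_mhz, uint32_t dst_mhz) {
    return cycles * dst_mhz / src_mhz;
}

/* Wall-clock time of `cycles` cycles at `mhz`, in nanoseconds (rounds down). */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t mhz) {
    return cycles * 1000u / mhz;
}
```

With these figures, 8201 cycles at 200 MHz take as long as 24603 cycles at 600 MHz, or about 41 microseconds, against roughly 6.7 microseconds for a 4000-cycle run at 600 MHz.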
25 Hardware Acceleration
- The profiling tool in Eclipse can show how long each function takes
- If a function takes too long, it can be sped up by:
  - Custom instructions
  - Hardware acceleration
- Hardware acceleration means taking the function and transforming it into FPGA circuitry
26 Hardware Acceleration
- Can be done using the C2H compiler from Altera
- Trades off logic size for speed-up
Table 1. User Application Results Example
Algorithm                     Speed Increase (vs. Nios II CPU)   System fMAX (MHz)   System Resource Increase (1)
Autocorrelation               41.0x                              115                 124%
Bit Allocation                42.3x                              110                 152%
Convolution Encoder           13.3x                              95                  133%
Fast Fourier Transform (FFT)  15.0x                              85                  208%
High Pass Filter              42.9x                              (missing)           181%
Matrix Rotate                 73.6x                              (missing)           106%
RGB to CMYK                   41.5x                              120                 84%
RGB to YIQ                    39.9x                              (missing)           158%
27 Conclusion
- “Soft” processors such as the NIOS II offer another alternative in the embedded systems scene.
- The NIOS II offers the advantage of added configurability and customization that blur the line between FPGAs and DSPs.
- Cost vs. performance:
  - NIOS II: the package is $495 for a year + $150 for a Cyclone II FPGA; C2H is $3000/computer
  - TigerSHARC: VDSP is $3500/computer + $750 for a TigerSHARC evaluation board