Comparison of Altera NIOS II Processor with Analog Device’s TigerSHARC


1 Comparison of Altera NIOS II Processor with Analog Device’s TigerSHARC
Intro: Traditionally we have a DSP that interacts with other modules, usually other ASICs. Then came SoCs, which integrate other logic on the same chip to improve latency. Now we have FPGAs, which add reconfigurability, and we want to integrate that too. The result is the SOPC: system on a programmable chip. This is what the NIOS II is supposed to do. What happens when we want to integrate a DSP on an SOPC system? (We also have a thing called a hard processor.)

2 Outline
- What is a “Soft” Processor?
- What is the NIOS II?
- Architecture for NIOS II, and what are the implications
- TigerSHARC vs. NIOS II
- Pipeline issues
- Issues related to FIR
- Hardware acceleration, using FPGA logic
Yay outline! Basically: the concept, then how it looks in software.

3 What is a “Soft” Processor?
- Processor implemented in VHDL, Verilog, etc., and downloaded onto FPGA hardware
- Can implement many parallel processors on one FPGA
- Can use additional FPGA resources on the same chip that are not part of the processor core
- NIOS II is a “Soft” Processor
Just as a Verilog circuit can be put on an FPGA for high configurability, a soft processor is a processor implemented on an FPGA. This differs from a hard processor, which is implemented directly in hardware. A soft processor is a logical schematic (software) that can be loaded onto any FPGA. So a soft processor isn’t really a processor, just a schematic (code, like software). This gives it the advantages of software, such as delivering updates and shortening the development cycle. Well, why would you want to do this? Isn’t an FPGA slower clocked, with higher power consumption…

4 Why “Soft” Processor?
- Higher level of design reuse
- Reduced obsolescence risk
- Simplified design update or change
- Increased design implementation options
- Lower latency between processor and FPGA components
No, it is not necessarily more power hungry, because it can be better customized for the application. Slower clocked doesn’t mean slower; it means more has to be done in a cycle, and an FPGA lets the developer customize the design so that instructions finish in one cycle. Plus you get all the other advantages.

5 What is NIOS II?
- Software-defined processor
- The processor core is loaded onto the FPGA
- Programmed using ‘normal’ programming tools (C, asm), not hardware description languages
- Can use the rest of the FPGA hardware for accelerating parts of the code
It is a special schematic designed by Altera that interacts very well with other Altera IP mega blocks.

6 How Is NIOS II Implemented?
- The custom FPGA logic that interacts with the processor is implemented in Altera Quartus II
- The Avalon Interface bus (common instruction/data bus) is implemented in Quartus II
- The architecture is generated in Quartus II and used for programming in the Eclipse IDE
Well, if the processor is in software, how do you write programs for it? Are you basically writing software for software? Doesn’t this seem somewhat redundant? Yes, it does seem a bit redundant. But it is the current model for soft processors; perhaps there will be a better programming environment later. What you need to do is write the processor (bus and FPGA logic) in software first using Quartus, make an emulation file, and use that to write your DSP program in Eclipse. (There is no hardware optimizer comparable to an assembler optimizer.)

7 Here is what it looks like for Quartus
You need to define the schematic. At the top you have your clock source. The middle is your Avalon interface, and the bottom is your FPGA logic.

8 NIOS II IDE
Coding is implemented in Eclipse rather than VisualDSP.
Here is your NIOS II IDE environment. Now you take your emulated file and program for it as you would in VDSP. So if the processor is in software, does that mean you can only do simulation analysis, and not hardware like in the labs? No… you can run the generated processor on an FPGA and have the IDE connect to the FPGA when it runs.

9 The Different NIOS II Cores
There are 3 cores available from Altera:
- NIOSII/e: Economical Core
- NIOSII/s: Standard Core
- NIOSII/f: Fast Core
So what exactly does Altera give you as the basic architecture to customize? 3 cores with different features. Here are the specs…

10 What’s the Difference between the Cores?
Notice it is very similar to the MIPS processor we learned about in other classes.
- An LE is equivalent to an 8-1 NAND gate + 1 D flip-flop
- An ALM is equivalent to 2 LEs

11 Comparison of TigerSHARC and NIOS II architecture

12 TigerSHARC Architecture
Print off sheet to list the architecture features

13 NIOS II Architecture
Print sheet to list the architecture features. All the ports on the right actually share one bus, the Avalon architecture.
- Thirty-two 32-bit general registers, six 32-bit control registers
- Variable cache size, based on how much FPGA space you have
- ALU: 32-bit, two inputs to one output; does shifts, logic and arithmetic. The shifter is not separate as in TigerSHARC.

14 Avalon Interface
- Separate address, data and control lines. No need to decode data for address.
- Up to 1024-bit data width transfer; can be set to any width (not only powers of 2)
- Synchronous operation
- Dynamic bus sizing: no special design consideration is needed when addressing items that have different bus widths
- One transfer per clock cycle
The Avalon Interface is basically a layer that presents one common interface on top of the different interfaces of all the memory and peripheral components of the system. Are there bus issues because it’s one common interface? No… it’s a special interface, with dedicated memory ports.

15 NIOS II/f pipeline
- Six stages
- One instruction can be dispatched and/or retired per cycle
- Dynamic branch prediction: 2-bit branch history table (no BTB like in TigerSHARC)

16 NIOS II/f pipeline
The pipeline stalls for:
- Multi-cycle instructions
- Cache misses
- Data dependencies (2 cycles between calculating and using result)
Mispredicted branch penalty: 3 cycles

17

18 Hardware multiply
Can use different options for the multiplier (at the processor design stage):
- No h/w multiply (saves FPGA gates): speed depends on algorithm
- Use embedded multipliers (if the FPGA has those): 1-5 cycles (depends on FPGA)
- Implement multipliers on FPGA gates: 11 cycles
Division: 4-66 cycles in hardware

19 Compare to TigerSHARC
- No support for parallel instructions
- No support for SIMD operations
- Multicycle instructions stall the pipeline
All the above limitations can be overcome by using FPGA space unoccupied by the processor itself

20 Comparison of NIOS II and TigerSHARC on an FIR Algorithm

21 Integer FIR algorithm
    int coeff[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int data1[] = {1, 0, 0, 0, 0, 0, 0, 0};
    int output[8];
    int i = 0, j = 0, k = 0;

    /* clear the output buffer */
    for (k = 0; k < 8; k++)
        output[k] = 0;

    /* inner loop: 8-tap multiply-accumulate, the part analyzed for speed */
    for (j = 0; j < 8; j++) {
        for (i = 0; i < 8; i++)
            output[j] += data1[i] * coeff[7 - i];
    }

22 Speed analysis
          movi r4,8               i = 8
1  Loop:  ldw  r2,0(r6)           load data
2         ldw  r3,0(r7)           load coefficient
3         addi r4,r4,-1           i--
4         addi r6,r6,4            coeffPt++
5         mul  r2,r2,r3           data = data * coeff
6         addi r7,r7,-4           dataPt--
7         stall                   data stall, waiting for multiplication result
8         add  r5,r5,r2           output += data
9         bne  r4,zero,0x10002a0  loop branch
The branch will mispredict 2 times at the beginning and 1 time at the end of the loop (wasting 3 cycles each time).

23 Speed analysis
- 9 cycles per iteration, except the first two (branch predicted not taken) and the last (branch predicted taken): those will be 9+3=12 cycles
- 1 data stall: can be removed by moving the instruction from line 4 to line 7
- Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3)+33 cycles
- For 1024-tap FIR: 8201 cycles
- Clock cycle is 3 times longer (200MHz vs 600MHz)

24 Speed comparison
8201 NIOS II cycles, expressed in equivalent TigerSHARC cycles (the NIOS II clock cycle is 3 times longer), compared with the Lab3 timing:
- 56000 cycles: Debug mode
- 13000 cycles: unoptimized ASM
- 4000 cycles: optimized ASM
Worse than unoptimized assembly, but no hardware acceleration was used, so this is not that bad

25 Hardware Acceleration
- Profiling tool in Eclipse can show how long each function takes
- If a function takes too long, it can be sped up by:
  - Custom instructions
  - Hardware acceleration
- Hardware acceleration takes the function and transforms it into FPGA circuitry

26 Hardware Acceleration
- Can be done using the C2H compiler from Altera
- Trades off logic size for speedup

Table 1. User Application Results
Example Algorithm              Speed Increase (vs. Nios II CPU)  System fMAX (MHz)  System Resource Increase (1)
Autocorrelation                41.0x                             115                124%
Bit Allocation                 42.3x                             110                152%
Convolution Encoder            13.3x                             95                 133%
Fast Fourier Transform (FFT)   15.0x                             85                 208%
High Pass Filter               42.9x                             --                 181%
Matrix Rotate                  73.6x                             --                 106%
RGB to CMYK                    41.5x                             120                84%
RGB to YIQ                     39.9x                             --                 158%

27 Conclusion
- “Soft” processors such as the NIOSII offer another alternative in the embedded system scene.
- The NIOSII offers the advantage of added configurability and customization that blur the line between FPGAs and DSPs
- Cost vs. performance:
  - NIOSII package is $495 for a year + $150 for Cyclone II FPGA; C2H is $3000/computer
  - TigerSHARC VDSP is $3500/computer + $750 for TigerSHARC evaluation board

28 References
[1] Describes an FPGA-DSP project based on Altera Nios
[2] Official Nios II page
[3] DSP or FPGA? What is better when?
[4] Article from Xilinx about FPGA DSPs
[5] Community forum for NIOS
[6] NIOSII Processor Handbook, Altera Corporation
[7] Avalon Memory-Mapped Interface Specifications, Altera Corporation
[8] ADSP-TS201S 500/600 MHz TigerSHARC Processor with 24 Mbit on-chip embedded DRAM

