© 2010 Altera Corporation - Public Lutiac – Small Soft Processors for Small Programs David Galloway and David Lewis November 18, 2010.


1 © 2010 Altera Corporation - Public Lutiac – Small Soft Processors for Small Programs David Galloway and David Lewis November 18, 2010

2 Introduction
- Lutiac is an experimental soft processor
- Designed for very small programs
  - roughly 200 instructions
  - roughly 200 words of data
- Take a drastic step to reduce the size of the processor
- Measure its area and speed
- Compare to NIOS II
© 2010 Altera Corporation - Public. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S.

3 Typical Microprocessor
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Instruction Memory, Decoder, To Control Points]

4 Typical Microprocessor
- Typical Microprocessor consists of:
  - data path (registers, ALU, ...)
  - controller (PC, instruction memory, decoder)
- Data path has control inputs
  - register file read addresses
  - register file write address
  - register file write enable
  - instruction is add/subtract/and/or/copy/...

5 Control Inputs
- Control inputs are driven from the decoder
- Decoder driven from current instruction
- Current instruction determined by program counter
- If instruction memory never changes:
  - current instruction is a constant function of the program counter
  - so control inputs depend entirely on the value of the program counter

6 Control Inputs Are Function of PC
- If we have small programs (≤ 64 total instructions)
  - program counter only needs 6 bits
- Each control input is a function of 6 PC bits
  - could be replaced by a 6-LUT
- Entire decoder is a set of 6-LUTs
- Instruction memory isn’t needed at all, and can be removed
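The observation on this slide can be sketched in a few lines of Python: for a fixed program, each control signal is a 64-entry truth table indexed by the PC, which is exactly what a 6-LUT stores. The toy program and the signal names `write_enable` and `alu_sub` are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical control signals for a toy 4-instruction program; the
# signal names and the program are illustrative, not from the slides.
PROGRAM = [
    {"write_enable": 1, "alu_sub": 0},  # pc = 0: add
    {"write_enable": 1, "alu_sub": 1},  # pc = 1: subtract
    {"write_enable": 0, "alu_sub": 0},  # pc = 2: output, no register write
    {"write_enable": 1, "alu_sub": 0},  # pc = 3: add
]

PC_BITS = 6  # 6 PC bits address up to 64 instructions


def lut_mask(program, signal):
    """Return the 64-bit truth table of one control signal, indexed by pc."""
    assert len(program) <= 1 << PC_BITS
    mask = 0
    for pc, controls in enumerate(program):
        if controls[signal]:
            mask |= 1 << pc
    return mask


def lut_read(mask, pc):
    """Evaluate the LUT: select the truth-table bit addressed by pc."""
    return (mask >> pc) & 1


write_enable_lut = lut_mask(PROGRAM, "write_enable")
alu_sub_lut = lut_mask(PROGRAM, "alu_sub")
```

Each mask plays the role of one 6-LUT's contents; a decoder built this way needs one such mask per control-signal bit and no instruction memory.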

7 Drastic Step - Delete Instruction Memory
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Instruction Memory (marked X), Decoder, To Control Points]

8 Lutiac
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Decoder, To Control Points]

9 Another Way to Think About It
At the point in a normal soft processor where the instruction is read from the instruction memory:
    instruction = instruction_memory[pc];
    if(instruction is this) do this;
    if(instruction is that) do that;
    ...
Replace by a case statement based on the pc:
    case(pc)
    0: do this;
    1: do that;
    2: do the other thing;
    ...
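As a software analogy for this slide, the sketch below runs the same toy program two ways: once by fetching from an instruction list and decoding, and once with the program baked into a branch on the PC. The three-instruction program, the tuple layout, and the function names are hypothetical; the real Lutiac is hardware, so this only illustrates that the two forms compute the same thing.

```python
# Toy program as (operation, destination, source_a, source_b). Hypothetical.
PROGRAM = [
    ("add", 2, 0, 1),
    ("sub", 3, 2, 0),
    ("and", 4, 3, 1),
]

MASK16 = 0xFFFF  # the prototype described in the slides is 16 bits wide


def run_with_instruction_memory(regs):
    """Conventional style: fetch the instruction, then decode it."""
    regs = list(regs)
    for pc in range(len(PROGRAM)):
        op, dst, a, b = PROGRAM[pc]  # instruction = instruction_memory[pc]
        if op == "add":
            regs[dst] = (regs[a] + regs[b]) & MASK16
        if op == "sub":
            regs[dst] = (regs[a] - regs[b]) & MASK16
        if op == "and":
            regs[dst] = regs[a] & regs[b]
    return regs


def run_hard_wired(regs):
    """Hard-wired style: the program is a case statement on the pc."""
    regs = list(regs)
    for pc in range(3):
        if pc == 0:
            regs[2] = (regs[0] + regs[1]) & MASK16
        elif pc == 1:
            regs[3] = (regs[2] - regs[0]) & MASK16
        elif pc == 2:
            regs[4] = regs[3] & regs[1]
    return regs


initial = [5, 3, 0, 0, 0]
from_memory = run_with_instruction_memory(initial)
hard_wired = run_hard_wired(initial)
```

In the hard-wired version there is no `PROGRAM` lookup left at run time, which mirrors removing the instruction memory.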

10 Lutiac Implementation
- Built a very simple prototype 16-bit processor that uses hard-wired programs instead of an instruction memory
- 3 stage pipeline
  - decode: sets read addresses on register file
  - execute: computes results, sets up register file writes
  - write back: register file write
- One cycle per instruction

11 Lutiac Implementation
- No data memory, just registers
  - no fixed instruction format, so no hard limit on number of registers
- One input port from outside world, one output port
- Simple assembler converts my_program.s file into an equivalent Verilog processor description
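The slides do not show the assembler or its input syntax, so the following is only a guess at the general shape of such a tool: a Python sketch that turns lines of an invented `op rd, ra, rb` assembly syntax into a Verilog `case (pc)` block. The signal names (`alu_op`, `write_addr`, `read_addr_a`, `read_addr_b`) and the `OP_*` macros are hypothetical.

```python
def assemble(source):
    """Translate hypothetical 'op rd, ra, rb' lines into a Verilog case block."""
    instructions = []
    for line in source.splitlines():
        line = line.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        op, operands = line.split(None, 1)
        rd, ra, rb = (int(tok.strip().lstrip("r")) for tok in operands.split(","))
        instructions.append((op.upper(), rd, ra, rb))

    # Width of the pc literal: enough bits to number every instruction.
    pc_bits = max(1, (len(instructions) - 1).bit_length())

    out = ["always @(*) begin", "  case (pc)"]
    for pc, (op, rd, ra, rb) in enumerate(instructions):
        out.append(
            f"    {pc_bits}'d{pc}: begin alu_op = `OP_{op}; write_addr = {rd}; "
            f"read_addr_a = {ra}; read_addr_b = {rb}; end"
        )
    out.append(
        "    default: begin alu_op = `OP_NOP; write_addr = 0; "
        "read_addr_a = 0; read_addr_b = 0; end"
    )
    out += ["  endcase", "end"]
    return "\n".join(out)


verilog = assemble("add r2, r0, r1\nsub r3, r2, r0  # comment\n")
```

Because register numbers are emitted as plain constants rather than packed into an instruction word, nothing in this scheme caps the register count, which matches the "no fixed instruction format" point above.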

12 Experiments
- Measure size and speed of Lutiac, varying:
  - number of different kinds of instructions in the program
  - size of the program
  - number of registers used
- Used Quartus 8.0 (2 years ago now)
- Stratix IV chips of various sizes, fastest speed grade
  - Each Stratix IV LAB contains 20 FFs + roughly 10 6-LUTs
  - Some LABs can be re-configured as 640-bit RAMs known as “MLABs”
- Will compare to NIOS II at the end, but for now, remember that a medium sized NIOS II uses 58 LABs and 11 M9K RAMs

13 Lutiac Size vs. Instruction Mix
- Each program contains 64 random instructions, chosen from the allowed instruction types

14 Fmax vs. Instruction Mix

15 Effect of Program Size
- Size grows linearly as program size increases beyond 64 instructions, roughly 1 LAB for every 20 additional instructions
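The growth rule on this slide can be written as a small estimator. This is a rough sketch: the slide gives only the slope (about 1 LAB per 20 instructions beyond 64), so the size of a 64-instruction core is left as a parameter. The example call borrows 14.5 LABs from the per-core figure on the multi-core scalability slide, which is an assumption about what a single core costs.

```python
def estimated_labs(num_instructions, labs_at_64):
    """Estimate LAB count from the linear growth rule on the slide.

    labs_at_64 is the size of a 64-instruction core; the slide does not
    state it, so the caller must supply it.
    """
    extra_instructions = max(0, num_instructions - 64)
    return labs_at_64 + extra_instructions / 20


# Example: a 200-instruction program, assuming a 14.5-LAB core at 64 instructions.
labs_for_200 = estimated_labs(200, labs_at_64=14.5)
```

With those assumed inputs the estimate is about 21.3 LABs, which falls inside the 12-25 LAB range quoted in the conclusions.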

16 Effect of Number of Registers
- Very large Lutiac (512 random instructions) grows by the number of MLABs needed to hold additional registers
- Would save area if we used M9Ks instead of MLABs once we needed more than 96 16-bit registers

17 Scalability of Multiple Lutiac Cores
- Chained N identical 64 instruction Lutiac cores together
  - LABs grow by 14.5 per core
  - Fmax drops as Quartus placement worsens
  - Ran out of DSP blocks above 256 cores

18 Comparison to NIOS II
- Very inexact
  - NIOS II is 32 bits, Lutiac is 16 bits
  - NIOS II also has memory interfaces, caches, traps, ...
- Configure NIOS II systems with 4K bytes of RAM
  - allows up to 1K words of instructions or data
- Lutiac has no RAM, all instructions and data in MLABs
- Lutiac and NIOS II both use four 18x18 multipliers (Multiplier/Accumulate mode)

19 Comparison to NIOS II

20 Comparison to NIOS II
- Back of the envelope guess (± factor of 2x)
- Un-optimized 32-bit Lutiac is nearly twice the size of a 16-bit Lutiac (25 LABs); 0.75 the speed (177 MHz)
- 32-bit Lutiac/NIOS IIs speed ratio = (177 / 235)
- area ratio of Lutiac/NIOS IIs
  - (25 LABs + DSP) / (58 LABs + 11 M9K RAMs + DSP) = 0.3
- 32-bit Lutiac/NIOS IIs throughput/area
  - (177/235) / 0.3 = 2.5x
- 32-bit Lutiac/NIOS IIe throughput/area
  - NIOS IIe is smallest NIOS, but isn’t pipelined, so has 5 cycles/instruction
  - (177/368 * 5/1) / ((25 LABs + DSP) / (37 LABs + 6 M9K RAMs)) = 4.5x
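The arithmetic on this slide can be replayed directly. The sketch below uses only the clock rates, cycle count, and the 0.3 area ratio stated above. The slide does not say how M9K RAMs and DSP blocks were converted to LAB-equivalents, so the NIOS IIe area ratio is not recomputed here; it is backed out from the quoted 4.5x result, and that derived value is an inference rather than a figure from the slides.

```python
LUTIAC32_MHZ = 177
NIOS_IIS_MHZ = 235
NIOS_IIE_MHZ = 368
NIOS_IIE_CYCLES_PER_INSTRUCTION = 5
AREA_RATIO_VS_IIS = 0.3  # stated on the slide

# Lutiac vs. NIOS IIs: both are taken as one instruction per cycle.
speed_ratio_iis = LUTIAC32_MHZ / NIOS_IIS_MHZ
throughput_per_area_iis = speed_ratio_iis / AREA_RATIO_VS_IIS

# Lutiac vs. NIOS IIe: the IIe needs 5 cycles per instruction.
throughput_ratio_iie = (
    LUTIAC32_MHZ / NIOS_IIE_MHZ * NIOS_IIE_CYCLES_PER_INSTRUCTION
)

# Not stated on the slide: the area ratio implied by the quoted 4.5x figure.
implied_area_ratio_iie = throughput_ratio_iie / 4.5

print(f"speed ratio vs IIs:         {speed_ratio_iis:.2f}")
print(f"throughput/area vs IIs:     {throughput_per_area_iis:.1f}x")
print(f"throughput ratio vs IIe:    {throughput_ratio_iie:.2f}")
print(f"implied area ratio vs IIe:  {implied_area_ratio_iie:.2f}")
```

The first result reproduces the 2.5x on the slide; the implied IIe area ratio of roughly 0.53 is consistent with the IIe being the smaller of the two NIOS II variants.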

21 Lutiac Disadvantages
- Limited to very small programs (200 instructions or so)
- Must re-synthesize circuit every time program changes
  - instruction memory replaced by LUTs
  - would need good simulation tools
  - or a debug version of the processor that did have an instruction memory

22 Lutiac Advantages
- Circuit is smaller, less complex than standard soft processor
- One less stage in the pipeline
  - no instruction memory read required
- Program contents are exposed to logic synthesis
  - data path components that aren’t used will be removed by synthesis
  - circuit may be smaller and faster

23 Lutiac Advantages
- Flexible and powerful
  - wide range of useful instructions can be available
  - if not used by program, they will be synthesized away
  - easy to add specialized instructions if needed
- Not limited by a fixed instruction word width or encoding
  - can use as many registers as the program wants

24 Lutiac Advantages
- Processor self configures based on program
  - no “mega-wizard” needed
  - if multiplier/adder/etc. isn’t used, synthesis will leave it out
- Data path can adapt to the program
- Examples:
  - if program ever references a register immediately after writing to it, create a bypass register; else leave bypass register out of circuit
  - if multiplier and adder were used in parallel, create a separate copy of the register file for the multiplier; else have it share the adder’s register file
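The first example on this slide amounts to a static check over the program. A minimal sketch of such a check is below, assuming a straight-line program (no branches) and a hypothetical `(operation, destination, source_a, source_b)` tuple per instruction; the slides do not describe how the actual tool makes this decision.

```python
def needs_bypass(program):
    """Return True if any instruction reads a register written by the previous one.

    Assumes a straight-line program of (op, dst, src_a, src_b) tuples, so
    "immediately after" means the next entry in the list.
    """
    for previous, current in zip(program, program[1:]):
        _, previous_dst, _, _ = previous
        _, _, src_a, src_b = current
        if previous_dst in (src_a, src_b):
            return True
    return False


# r2 is written and then read by the very next instruction.
with_hazard = [("add", 2, 0, 1), ("sub", 3, 2, 0)]
# No instruction reads the register written just before it.
without_hazard = [("add", 2, 0, 1), ("sub", 3, 0, 1)]

bypass_needed = needs_bypass(with_hazard)
bypass_avoidable = not needs_bypass(without_hazard)
```

A generator built along these lines would emit the bypass register only when `needs_bypass` returns `True`, leaving it out of the circuit otherwise.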

25 Conclusions
- For small programs, it is possible to build 16-bit soft processors using only 12-25 LABs (plus multiplier)
  - smaller and faster than smallest 32-bit NIOS II (37 LABs, 6 M9K RAMs)
  - with instructions/second on the same order as the mid-size NIOS II (58 LABs, 11 M9K RAMs)
  - size advantage over NIOS II disappears as program size approaches 1000 instructions

