Published by Noah Horton. Modified over 9 years ago.
1
Lutiac – Small Soft Processors for Small Programs
David Galloway and David Lewis
November 18, 2010
© 2010 Altera Corporation - Public
2
Introduction
- Lutiac is an experimental soft processor
- Designed for very small programs
  - roughly 200 instructions
  - roughly 200 words of data
- Take a drastic step to reduce the size of the processor
- Measure its area and speed
- Compare to NIOS II
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S.
3
Typical Microprocessor
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Instruction Memory, Decoder, To Control Points]
4
Typical Microprocessor
- Typical Microprocessor consists of:
  - data path (registers, ALU,...)
  - controller (PC, instruction memory, decoder)
- Data path has control inputs
  - register file read addresses
  - register file write address
  - register file write enable
  - instruction is add/subtract/and/or/copy/...
5
Control Inputs
- Control inputs are driven from the decoder
- Decoder driven from current instruction
- Current instruction determined by program counter
- If instruction memory never changes:
  - current instruction is a constant function of the program counter
  - so control inputs depend entirely on the value of the program counter
6
Control Inputs Are Function of PC
- If we have small programs (≤ 64 total instructions)
  - program counter only needs 6 bits
- Each control input is a function of 6 PC bits
  - could be replaced by a 6-LUT
- Entire decoder is a set of 6-LUTs
- Instruction memory isn’t needed at all, and can be removed
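As a rough illustration of the point above, the Python sketch below takes a hard-wired program of at most 64 instructions and derives, for each control signal, the 64-entry truth table that a single 6-LUT would hold. The toy program, the signal names, and the `control_signals` decoder are hypothetical and are not taken from the Lutiac design.

```python
# Hypothetical toy program: one (opcode, dest, src_a, src_b) tuple per PC value.
PROGRAM = [
    ("add", 2, 0, 1),
    ("sub", 3, 2, 0),
    ("and", 1, 3, 2),
    ("add", 0, 1, 1),
]


def control_signals(instr):
    """Decode one instruction into named 1-bit control signals (illustrative only)."""
    opcode, dest, _src_a, _src_b = instr
    return {
        "alu_is_sub": int(opcode == "sub"),
        "alu_is_and": int(opcode == "and"),
        "dest_bit0": dest & 1,
        "dest_bit1": (dest >> 1) & 1,
    }


def lut_masks(program, pc_bits=6):
    """Return {signal: truth table as an int}; bit i is the signal value at PC == i."""
    assert len(program) <= 1 << pc_bits, "program does not fit the PC width"
    masks = {}
    for pc, instr in enumerate(program):
        for name, value in control_signals(instr).items():
            masks[name] = masks.get(name, 0) | (value << pc)
    return masks


masks = lut_masks(PROGRAM)
for name, mask in sorted(masks.items()):
    # 64 bits of LUT contents, PC == 0 in the rightmost position.
    print(f"{name}: {mask:064b}")
```

PC values beyond the end of the program are left at 0 here; a synthesis tool would be free to treat them as don't-cares.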
7
Drastic Step - Delete Instruction Memory
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Instruction Memory (crossed out with an X), Decoder, To Control Points]
8
Lutiac
[Block diagram. Labels: ALU, A registers, B registers, From Outside World, To Outside World, PC, +1, Decoder, To Control Points]
9
Another Way to Think About It
- At the point in a normal soft processor where the instruction is read from the instruction memory:
    instruction = instruction_memory[pc];
    if(instruction is this) do this;
    if(instruction is that) do that;
    ...
- Replace by a case statement based on the pc:
    case(pc)
      0: do this;
      1: do that;
      2: do the other thing;
      ...
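The two styles can be compared in a small Python model. This is only a behavioural sketch: the three-instruction program, the tuple encoding, and the function names are hypothetical, and the 16-bit wrap-around is borrowed from the prototype's stated word size.

```python
# Hypothetical 3-instruction program: (opcode, dest, src_a, src_b).
INSTRUCTION_MEMORY = [("add", 2, 0, 1), ("sub", 3, 2, 0), ("copy", 1, 3, 3)]


def step_with_memory(pc, regs):
    """Conventional style: fetch from memory, then decode the fetched word."""
    opcode, dest, a, b = INSTRUCTION_MEMORY[pc]
    if opcode == "add":
        regs[dest] = (regs[a] + regs[b]) & 0xFFFF
    elif opcode == "sub":
        regs[dest] = (regs[a] - regs[b]) & 0xFFFF
    elif opcode == "copy":
        regs[dest] = regs[a]
    return pc + 1


def step_hard_wired(pc, regs):
    """Hard-wired style: the same program written as a case on the PC."""
    if pc == 0:
        regs[2] = (regs[0] + regs[1]) & 0xFFFF
    elif pc == 1:
        regs[3] = (regs[2] - regs[0]) & 0xFFFF
    elif pc == 2:
        regs[1] = regs[3]
    return pc + 1


def run(step, initial_regs):
    """Execute the whole program with the given step function."""
    regs = list(initial_regs)
    pc = 0
    while pc < len(INSTRUCTION_MEMORY):
        pc = step(pc, regs)
    return regs


start = [5, 7, 0, 0]
result_memory = run(step_with_memory, start)
result_wired = run(step_hard_wired, start)
print(result_memory, result_wired)
```

Both versions compute the same register contents; the hard-wired one simply has no fetch and no generic decode, which is what lets synthesis see the program.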
10
Lutiac Implementation
- Built a very simple prototype
- 16-bit processor that uses hard-wired programs instead of an instruction memory
- 3 stage pipeline
  - decode: sets read addresses on register file
  - execute: computes results, sets up register file writes
  - write back: register file write
- One cycle per instruction
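To make the pipeline timing concrete, the sketch below lists which instruction occupies each of the three stages on every cycle. It assumes an ideal pipeline with no stalls or branches, which the slide does not state explicitly; the function name and data layout are illustrative only.

```python
STAGES = ("decode", "execute", "write back")


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction index (or None)."""
    total_cycles = n_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction i enters stage `depth` at cycle i + depth
            row[stage] = index if 0 <= index < n_instructions else None
        schedule.append(row)
    return schedule


schedule = pipeline_schedule(4)
for cycle, row in enumerate(schedule):
    print(cycle, row)
```

Under these assumptions a program of N instructions finishes in N + 2 cycles, so the steady-state rate is one instruction per cycle.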
11
Lutiac Implementation
- No data memory, just registers
  - no fixed instruction format, so no hard limit on number of registers
- One input port from outside world, one output port
- Simple assembler converts my_program.s file into an equivalent Verilog processor description
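The assembler itself is not shown in the slides. As a loose sketch of what such a translation step might look like, the Python below turns a made-up three-operand assembly syntax into a Verilog-style `case (pc)` fragment. The assembly syntax, the register naming, and the emitted fragment are all assumptions; a real generator would also have to emit the register file, pipeline registers, and I/O ports.

```python
# Hypothetical assembly syntax: "<op> <dest>, <src_a>, <src_b>", one per line.
SOURCE = """
add r2, r0, r1
sub r3, r2, r0
and r1, r3, r2
"""

# Illustrative mapping from mnemonic to a Verilog binary operator.
VERILOG_OPERATORS = {"add": "+", "sub": "-", "and": "&", "or": "|"}


def assemble(source):
    """Translate the toy assembly into a Verilog-style case statement on pc."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    body = []
    for pc, line in enumerate(lines):
        op, operands = line.split(None, 1)
        dest, a, b = [field.strip() for field in operands.split(",")]
        body.append(f"    {pc}: {dest} <= {a} {VERILOG_OPERATORS[op]} {b};")
    return "\n".join(["case (pc)"] + body + ["endcase"])


verilog = assemble(SOURCE)
print(verilog)
```

Because each instruction becomes ordinary logic text, nothing in this scheme fixes the number of registers or the instruction encoding ahead of time.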
12
Experiments
- Measure size and speed of Lutiac, varying:
  - number of different kinds of instructions in the program
  - size of the program
  - number of registers used
- Used Quartus 8.0 (2 years ago now)
- Stratix IV chips of various sizes, fastest speed grade
- Each Stratix IV LAB contains 20 FFs + roughly 10 6-LUTs
- Some LABs can be re-configured as 640-bit RAMs known as “MLABs”
- Will compare to NIOS II at the end, but for now, remember that a medium-sized NIOS II uses 58 LABs and 11 M9K RAMs
13
Lutiac Size vs. Instruction Mix
[Chart: Lutiac size vs. instruction mix]
- Each program contains 64 random instructions, chosen from the allowed instruction types
14
Fmax vs. Instruction Mix
[Chart: Fmax vs. instruction mix]
15
Effect of Program Size
- Size grows linearly as program size increases beyond 64 instructions, roughly 1 LAB for every 20 additional instructions
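The linear trend can be turned into a rough estimator. In the sketch below only the slope (1 LAB per 20 instructions beyond 64) comes from this slide; the base of 14.5 LABs is an assumption borrowed from the per-core figure quoted for 64-instruction cores elsewhere in the talk, and treating the size as flat below 64 instructions is also an assumption.

```python
def estimate_labs(n_instructions, base_labs=14.5, labs_per_instruction=1 / 20):
    """Rough LAB count for a Lutiac program of n_instructions.

    base_labs: assumed size of a 64-instruction core (not given on this slide).
    labs_per_instruction: slope quoted on the slide, applied beyond 64 instructions.
    """
    extra_instructions = max(0, n_instructions - 64)
    return base_labs + extra_instructions * labs_per_instruction


for n in (64, 200, 512, 1000):
    print(n, round(estimate_labs(n), 1))
```

With these assumed parameters a 1000-instruction program comes out at roughly 61 LABs, in the same range as the 58 LABs quoted for a medium-sized NIOS II.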
16
Effect of Number of Registers
- Very large Lutiac (512 random instructions) grows by the number of MLABs needed to hold additional registers
- Would save area if we used M9Ks instead of MLABs once we needed more than 96 16-bit registers
17
Scalability of Multiple Lutiac Cores
- Chained N identical 64-instruction Lutiac cores together
- LABs grow by 14.5 per core
- Fmax drops as Quartus placement worsens
- Ran out of DSP blocks above 256 cores
18
Comparison to NIOS II
- Very inexact
  - NIOS II is 32 bits, Lutiac is 16 bits
  - NIOS II also has memory interfaces, caches, traps,...
- Configure NIOS II systems with 4K bytes of RAM
  - allows up to 1K words of instructions or data
- Lutiac has no RAM, all instructions and data in MLABs
- Lutiac and NIOS II both use four 18x18 multipliers (Multiplier/Accumulate mode)
19
Comparison to NIOS II
20
Comparison to NIOS II
- Back of the envelope guess (± factor of 2x)
- Un-optimized 32-bit Lutiac is nearly twice the size of a 16-bit Lutiac (25 LABs); .75 the speed (177 MHz)
- 32-bit Lutiac/NIOS IIs speed ratio = (177 / 235)
- area ratio of Lutiac/NIOS IIs
  - (25 LABs + DSP) / (58 LABs + 11 M9K RAMs + DSP) = .3
- 32-bit Lutiac/NIOS IIs throughput/area
  - (177/235) / .3 = 2.5x
- 32-bit Lutiac/NIOS IIe throughput/area
  - NIOS IIe is smallest NIOS, but isn’t pipelined, so has 5 cycles/instruction
  - (177/368 * 5/1) / ((25 LABs + DSP) / (37 LABs + 6 M9K RAMs)) = 4.5x
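The arithmetic on this slide can be replayed in a few lines of Python. The clock rates, the 5 cycles/instruction figure, the 0.3 area ratio, and the 4.5x result are taken from the slide; treating NIOS IIs as one cycle per instruction is implied by the slide's formula rather than stated. The slide does not give a number for the Lutiac/NIOS IIe area ratio, so the sketch only backs out the value implied by the quoted 4.5x.

```python
LUTIAC_MHZ = 177.0    # un-optimized 32-bit Lutiac
NIOS_S_MHZ = 235.0    # NIOS IIs
NIOS_E_MHZ = 368.0    # NIOS IIe
NIOS_E_CPI = 5        # NIOS IIe cycles per instruction (Lutiac: 1)
AREA_RATIO_S = 0.3    # Lutiac / NIOS IIs area ratio, as stated on the slide
QUOTED_VS_E = 4.5     # Lutiac / NIOS IIe throughput-per-area, as stated


def throughput_ratio(lutiac_mhz, other_mhz, other_cpi=1, lutiac_cpi=1):
    """Instructions/second of Lutiac divided by instructions/second of the other core."""
    return (lutiac_mhz / lutiac_cpi) / (other_mhz / other_cpi)


speed_vs_s = throughput_ratio(LUTIAC_MHZ, NIOS_S_MHZ)
per_area_vs_s = speed_vs_s / AREA_RATIO_S

speed_vs_e = throughput_ratio(LUTIAC_MHZ, NIOS_E_MHZ, other_cpi=NIOS_E_CPI)
# Area ratio that would reproduce the quoted 4.5x (derived, not from the slide).
implied_area_ratio_e = speed_vs_e / QUOTED_VS_E

print(f"vs NIOS IIs: speed {speed_vs_s:.2f}, throughput/area {per_area_vs_s:.2f}")
print(f"vs NIOS IIe: speed {speed_vs_e:.2f}, implied area ratio {implied_area_ratio_e:.2f}")
```

The implied NIOS IIe area ratio depends on how M9K RAMs and the DSP block are converted into LAB-equivalents, which the slide leaves unspecified.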
21
Lutiac Disadvantages
- Limited to very small programs (200 instructions or so)
- Must re-synthesize circuit every time program changes
  - instruction memory replaced by LUTs
  - would need good simulation tools or a debug version of the processor that did have an instruction memory
22
Lutiac Advantages
- Circuit is smaller, less complex than standard soft processor
- One less stage in the pipeline
  - no instruction memory read required
- Program contents are exposed to logic synthesis
  - data path components that aren’t used will be removed by synthesis
  - circuit may be smaller and faster
23
Lutiac Advantages
- Flexible and powerful
  - wide range of useful instructions can be available
  - if not used by program, they will be synthesized away
  - easy to add specialized instructions if needed
- Not limited by a fixed instruction word width or encoding
  - can use as many registers as the program wants
24
Lutiac Advantages
- Processor self configures based on program
  - no “mega-wizard” needed
  - if multiplier/adder/etc. isn’t used, synthesis will leave it out
- Data path can adapt to the program
- Examples:
  - if program ever references a register immediately after writing to it, create a bypass register; else leave bypass register out of circuit
  - if multiplier and adder were used in parallel, create a separate copy of the register file for the multiplier; else have it share the adder’s register file
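The bypass example amounts to a simple check over the program text. The Python sketch below scans a straight-line program for an instruction that reads a register written by the instruction just before it, and also collects the set of operations actually used. The tuple encoding and function names are hypothetical, and branches are ignored, so this is only a first approximation of the analysis a generator would need.

```python
# Hypothetical program encoding: (opcode, dest, src_a, src_b) per instruction.
PROGRAM = [
    ("add", 2, 0, 1),
    ("mul", 3, 2, 0),  # reads r2 immediately after it was written
    ("sub", 1, 0, 0),
    ("add", 4, 3, 3),
]


def needs_bypass(program):
    """True if any instruction reads the register written by its predecessor."""
    for previous, current in zip(program, program[1:]):
        previous_dest = previous[1]
        current_sources = current[2:]
        if previous_dest in current_sources:
            return True
    return False


def used_operations(program):
    """Set of opcodes that appear; anything absent could be left out of the circuit."""
    return {opcode for opcode, *_ in program}


bypass_required = needs_bypass(PROGRAM)
operations = used_operations(PROGRAM)
print("bypass register needed:", bypass_required)
print("operations used:", sorted(operations))
```

A generator could use results like these to decide whether to emit the bypass register or a multiplier at all, rather than relying on a configuration wizard.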
25
Conclusions
- For small programs, it is possible to build 16-bit soft processors using only 12-25 LABs (plus multiplier)
  - smaller and faster than smallest 32-bit NIOS II (37 LABs, 6 M9K RAMs)
  - with instructions/second on the same order as the mid-size NIOS II (58 LABs, 11 M9K RAMs)
  - size advantage over NIOS II disappears as program size approaches 1000 instructions