Slide 1: ECE 506 Reconfigurable Computing
http://www. ece. arizona
Lecture 4: Reconfigurable Architectures
Ali Akoglu

Slide 2: FPGA
- Introduced in 1985 by Xilinx
- Similar to CPLDs
- A function to be implemented in an FPGA is partitioned into modules, each implemented in a logic block
- Logic blocks are connected by the programmable interconnect

Slide 3: FPGA Components
- Problem: how to handle sequential logic? Truth tables alone don't work, because a LUT holds no state
- Possible solution: add a flip-flop to the output of the LUT
- BLE: the basic logic element (LUT plus flip-flop)
- The circuit can now use the output from the LUT or from the FF
- Where does the select come from?

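The BLE described above can be sketched in a few lines of Python. This is a minimal behavioral model, not any vendor's circuit: the class name `BLE`, the `use_ff` flag, and the input ordering are assumptions made for illustration. It treats the mux select as a configuration bit that is fixed when the device is programmed, which is one common answer to the question on the slide.

```python
class BLE:
    """Basic logic element: a k-input LUT, one flip-flop, and a 2:1 output mux."""

    def __init__(self, truth_table, use_ff):
        # truth_table[i] is the LUT output for the input pattern that encodes i.
        self.truth_table = list(truth_table)
        # use_ff is the mux select; assumed here to be a configuration bit
        # that is set once at programming time.
        self.use_ff = use_ff
        self.ff = 0

    def lut(self, inputs):
        index = 0
        for bit in inputs:  # the first input is the most significant index bit
            index = (index << 1) | bit
        return self.truth_table[index]

    def output(self, inputs):
        # 2:1 mux: registered value or combinational LUT value.
        return self.ff if self.use_ff else self.lut(inputs)

    def clock(self, inputs):
        # Rising clock edge: the flip-flop captures the current LUT output.
        self.ff = self.lut(inputs)
```

With the XOR table `[0, 1, 1, 0]`, a BLE built with `use_ff=False` responds to its inputs immediately, while one built with `use_ff=True` only shows the new value after `clock()` is called.
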
Slide 5: FPGA Components
- Example: 8-bit register using 3-input, 2-output LUTs
- Input: x, Output: y
- What does the LUT need to do to implement a register?
[Figure: inputs x(7) ... x(0) feed four 3-in, 2-out LUTs; the LUT outputs feed eight flip-flops that drive y(7) ... y(0)]

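One way to answer the question on this slide: each LUT simply copies two of its inputs to its two outputs, and the flip-flops do the storing. The sketch below assumes that the third LUT input is unused and tied to 0, and that each LUT handles a neighboring pair of bits; both choices are illustrative, not taken from the slide.

```python
def passthrough_lut_tables():
    """Truth tables for a 3-input, 2-output LUT that copies inputs a and b."""
    out_hi, out_lo = [], []
    for index in range(8):  # index bits: a (MSB), b, unused (LSB)
        a = (index >> 2) & 1
        b = (index >> 1) & 1
        out_hi.append(a)
        out_lo.append(b)
    return out_hi, out_lo


def register_next_state(x_bits):
    """x_bits is [x(7), ..., x(0)]; returns what the eight FFs capture on a clock edge."""
    out_hi, out_lo = passthrough_lut_tables()
    y_bits = []
    for i in range(0, 8, 2):  # four LUTs, two bits each
        a, b = x_bits[i], x_bits[i + 1]
        index = (a << 2) | (b << 1)  # third input assumed tied to 0
        y_bits += [out_hi[index], out_lo[index]]
    return y_bits
```

The LUTs contribute no logic here, which is the waste the next slide points out.
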
Slide 8: FPGA Components
- Isn't it a waste to use LUTs for registers?
- Yes, when the LUT could be used for something else
- Registers are commonly used in pipelined circuits
- Example: pipelined adder
- Adder and output register are combined in the same LUT/FF pairs, not a separate LUT for each
[Figure: two adders followed by registers feed a third adder and register; the mapped version uses 3-in, 2-out LUTs with their flip-flops]

Slide 9: FPGA Components
- Configurable logic blocks (CLBs) usually contain more than one BLE
- Why?
  - Efficient way of handling common I/O between adjacent LUTs
  - Saves routing resources
[Figure: CLB containing two 3-in, 2-out LUTs, flip-flops, and 2x1 muxes]

Slide 10: FPGA Components
- Example: ripple-carry adder CLB
- Each LUT implements 1 full adder
- Use the efficient connections between LUTs for the carry signals
[Figure: CLB with two 3-in, 2-out LUTs, flip-flops, and 2x1 muxes; inputs A(0), B(0), Cin(0) and A(1), B(1), Cin(1); outputs S(0), Cout(0) and S(1), Cout(1)]

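The ripple-carry mapping can be sketched by building the full-adder truth tables once and then chaining LUT lookups. The function names and the LSB-first bit order are assumptions for this illustration.

```python
def full_adder_lut():
    """Tables of a 3-input, 2-output LUT configured as a full adder: (sum, carry-out)."""
    sum_table, carry_table = [], []
    for index in range(8):  # index bits: a (MSB), b, cin (LSB)
        a, b, cin = (index >> 2) & 1, (index >> 1) & 1, index & 1
        total = a + b + cin
        sum_table.append(total & 1)
        carry_table.append(total >> 1)
    return sum_table, carry_table


def ripple_carry_add(a_bits, b_bits, cin=0):
    """a_bits and b_bits are LSB-first bit lists of equal length."""
    sum_table, carry_table = full_adder_lut()
    result, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        index = (a << 2) | (b << 1) | carry
        result.append(sum_table[index])
        # The carry-out of one LUT becomes the carry-in of the next,
        # modeling the direct LUT-to-LUT connection inside the CLB.
        carry = carry_table[index]
    return result, carry
```

For example, `ripple_carry_add([1, 1], [1, 0])` adds 3 and 1 and returns `([0, 0], 1)`, that is, a 2-bit sum of 0 with a carry-out of 1.
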
Slide 11: FPGA Components
- On real FPGAs: a cluster of LUTs per switch matrix (e.g., eight LUTs and a switch matrix form a configurable logic block on Xilinx FPGAs)

Slide 12: Typical CLB
- The arithmetic logic provides an XOR gate and a fast carry chain, so faster adders can be built without wasting too many LUT resources

Slide 13: Xilinx CLBs
- The macro cells are CLBs
- The number of basic blocks in a CLB varies from device to device:
  - 4000, Virtex, Virtex-E, Spartan: 1 slice, 2 basic blocks
  - Spartan-3, Virtex-II, Virtex-II Pro, Virtex-4: 4 slices, 2 basic blocks per slice
  - Virtex-5: 2 slices, 4 basic blocks per slice

Slide 14: Xilinx CLBs
- Spartan-3, Virtex-II, Virtex-II Pro, Virtex-4: 4 slices, 2 basic blocks per slice
- Left-hand slices of a CLB (SLICEM): each BLE can be configured as combinatorial logic, SRAM, or a shift register
- SLICEL: can only be configured as combinatorial logic
- Each BLE has 4 inputs and 1 output

Slide 15: Xilinx CLBs
- Virtex-5: 2 slices, 4 basic blocks per slice
- Each LUT has 6 inputs and 2 outputs
- The LUT can be configured either as
  - one 6-input LUT, in which case only one output can be used, or
  - two 5-input LUTs, in which case both outputs are used

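The two operating modes of such a LUT can be sketched as two views of the same 64-bit table. This is a simplified model under stated assumptions: the class name `FracturableLUT6` is hypothetical, the two 5-input functions are assumed to share the same five inputs, and the lower and upper halves of the table are assumed to hold the two functions.

```python
class FracturableLUT6:
    """64 configuration bits viewed as one 6-LUT or as two 5-LUTs."""

    def __init__(self, init_bits):
        assert len(init_bits) == 64
        self.init = list(init_bits)

    @staticmethod
    def _index(inputs):
        index = 0
        for position, bit in enumerate(inputs):  # inputs[0] is the lowest address bit
            index |= bit << position
        return index

    def as_lut6(self, inputs):
        """One 6-input function, one output."""
        assert len(inputs) == 6
        return self.init[self._index(inputs)]

    def as_dual_lut5(self, inputs):
        """Two 5-input functions of the same five inputs, two outputs."""
        assert len(inputs) == 5
        index = self._index(inputs)
        # Assumed layout: lower 32 bits hold one function, upper 32 the other.
        return self.init[index], self.init[32 + index]


# Example contents: 5-input AND in the lower half, 5-input parity in the upper half.
and5 = [1 if i == 31 else 0 for i in range(32)]
parity5 = [bin(i).count("1") & 1 for i in range(32)]
example_lut = FracturableLUT6(and5 + parity5)
```

The storage is the same in both modes; only the way the address bits select an entry changes, which is why using both outputs costs one input.
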
Slide 16: Xilinx Virtex-6 CLB
- A CLB contains 2 identical slices on Virtex-6
- The 2 slices are split into two columns of 1 slice each
- 1 slice contains:
  - four 6-input LUTs
  - eight FFs for storing LUT results
  - muxes to feed each LUT output either to a FF or to the slice output
  - carry in and carry out to construct fast adders using neighboring CLBs

Slide 18: Altera FPGA Basic Blocks
- In Altera's FPGAs (Cyclone, FLEX), the basic unit of logic is the logic element (LE)
- Also LUT-based: a 4-LUT, a flip-flop, a multiplexer, and additional logic for the carry chain
- LEs can operate in different modes, each of which defines a different usage of the LUT inputs
- Altera LEs are grouped into logic array blocks (LABs):
  - FLEX 6000 LAB contains 10 LEs
  - FLEX 8000 LAB contains 8 LEs
  - Cyclone II LAB contains 16 LEs

Slide 19: Altera FPGA Basic Blocks
- Stratix II: the basic computing unit is called the adaptive logic module (ALM)
- Each LAB contains 8 ALMs
- An ALM can be used to implement functions with a variable number of inputs
  - Ensures backward compatibility with 4-input-based designs
  - Possible to implement a module with up to 8 inputs
- Additional modules include flip-flops, adders, and carry logic

Slide 20: FPGA Components
- CLBs often have specialized connections between adjacent CLBs
  - Further improves carry chains
  - Avoids using routing resources
- The basic building block is the CLB
  - Can implement combinational and sequential logic
  - All circuits consist of combinational and sequential logic
- So what else is needed?
  - FPGAs need some way of connecting CLBs together: reconfigurable interconnect
  - But we can only put fixed wires on a chip

Slide 21: FPGA Components
- Problem: if the FPGA doesn't know which CLBs will be connected, where does it put wires?
- Solution: put wires everywhere!
  - Referred to as channel wires, routing channels, routing tracks, and many other names
  - CLBs are typically arranged in a grid, with wires on all sides
[Figure: CLB surrounded by routing wires]

Slide 22: FPGA Components
- How to connect a CLB to wires?
- Solution: connection box
  - Device that allows the inputs and outputs of a CLB to connect to different wires
[Figure: connection box between two CLBs]

Slide 23: FPGA Components
- Connection box characteristics
- Topology: defines the specific wires each CLB I/O can connect to
- Examples: same flexibility, different topology
[Figure: two pairs of CLBs with connection boxes of equal flexibility but different topology]

Slide 24: FPGA Components
- Connection boxes allow CLBs to connect to routing wires
- But that only allows us to move signals along a single wire, which is not very useful
- How do FPGAs connect wires together?

Slide 25: FPGA Components
- Solution: switch boxes (switch matrices)
  - Connect horizontal and vertical routing channels
- But we can only put fixed wires on a chip
[Figure: switch box/matrix at the intersection of the channels between four CLBs]

Slide 26: FPGA Components
- Switch boxes
- Flexibility: defines how many wires a single wire can connect to
- Every possible connection?
  - Too big
  - Too slow

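A rough switch count shows why "every possible connection" is too big. The sketch below rests on assumptions that are not on the slide: a four-sided switch box with the same number of tracks on each side, bidirectional switches that each join two wire ends, and connections only between different sides. The function name `switch_counts` is hypothetical.

```python
def switch_counts(tracks_per_side, flexibility=3):
    """Return (fully connected, limited flexibility) switch counts for one switch box."""
    wire_ends = 4 * tracks_per_side
    # Fully connected: every wire end reaches every wire end on the other three sides.
    # Each switch joins two wire ends, hence the division by 2.
    full = wire_ends * 3 * tracks_per_side // 2
    # Limited flexibility: every wire end reaches only `flexibility` other wire ends.
    limited = wire_ends * flexibility // 2
    return full, limited
```

Under these assumptions the fully connected box grows quadratically with the channel width (384 switches for 8 tracks per side), while a flexibility of 3 grows linearly (48 switches for the same width).
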
Slide 27: FPGA Components
- Why do flexibility and topology matter?
- Routability: a measure of the number of circuits that can be routed
  - Higher flexibility = better routability
  - Wilton switch box topology = better routability
[Figure: two routing examples between a source CLB and a destination CLB; in one of them there is no possible route from source to destination]

Slide 28: FPGA Components
- Switch boxes: many topologies are possible
- Topology: defines which wires can be connected
- Fs = 3 is common
[Figure: disjoint, Wilton, and universal switch box topologies]

Slide 29: FPGA Components
- Disjoint: a wire entering the switch box can only connect to other wires with the same numerical designation
  - Potential source-to-destination routes in the FPGA are isolated into distinct routing domains, which limits routing flexibility
- Wilton: uses the same number of routing switches but overcomes the domain issue
  - Allows a change in domain assignment on connections that turn
  - The ability to change domains in at least one direction facilitates routing, because a greater diversity of routing paths from a net source to a destination is possible
Sources:
G. Lemieux and D. Lewis, "Circuit design of routing switches," in Proceedings: ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 19–28, February 2002.
G. Lemieux and D. Lewis, Design of Interconnection Networks for Programmable Logic. Boston, MA: Kluwer Academic Publishers, 2004.

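The domain issue can be illustrated with a toy model that only tracks which track number a signal occupies as it passes through switch boxes. This is a deliberately simplified sketch: the "+1 on every turn" rule is an assumed stand-in for a domain-changing topology and is not the published Wilton permutation, and the function name is hypothetical.

```python
def reachable_tracks(start_track, num_tracks, max_turns, shift_on_turn):
    """Track numbers a signal can occupy after at most max_turns turning connections."""
    tracks = {start_track}
    current = start_track
    for _ in range(max_turns):
        if shift_on_turn:
            # Assumed illustrative rule: a turn moves the signal to the next track.
            current = (current + 1) % num_tracks
        # In the disjoint case the track number never changes.
        tracks.add(current)
    return tracks
```

In the disjoint case a net that starts on track 0 can only ever use track 0, so it is blocked as soon as that track is taken somewhere along the way. With any turn rule that changes the track number, routes with different numbers of turns land on different tracks, which gives the router more candidate paths.
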
Slide 30: FPGA Components
- At each switch block:
  - some tracks end
  - some tracks pass right through

Slide 31: FPGA Components
- Switch boxes
- Short channels
  - Useful for connecting adjacent CLBs
- Long channels
  - Useful for connecting CLBs that are separated
  - Allow reduced routing delay for non-adjacent CLBs
[Figure: short, medium, and long channels]

Slide 32: FPGA Components
- The FPGA layout is called a "fabric"
- 2-dimensional array of CLBs and programmable interconnect
- Sometimes referred to as an "island style" architecture

Slide 33: FPGA and Data Storage
- Solution 1: use LUTs for logic or memory
  - LUTs are just SRAM
  - Xilinx refers to this as distributed RAM
- Solution 2: include dedicated RAM components in the FPGA fabric
  - Xilinx refers to this as Block RAM
  - Can be single- or dual-ported
  - Can be combined into arbitrary sizes
  - Can be used as a FIFO
  - Different clock speeds for reads and writes

Slide 34: FPGA Components
- Fabric with Block RAM
- Block RAM can be placed anywhere
- Typically placed in columns of the fabric
[Figure: fabric grid in which columns of Block RAM (BR) flank columns of CLBs]

Slide 35: FPGA Components
- FPGAs are commonly used for DSP applications
  - Makes sense to include custom DSP units instead of mapping onto LUTs
  - Custom unit = faster and smaller
- Example: Xilinx DSP48
  - Starting with the Virtex-4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
  - Essentially a multiply-accumulate core with many other features
  - Provides an efficient way of implementing:
    - Add/subtract/multiply
    - MAC (multiply-accumulate)
    - Barrel shifter
    - FIR filter
    - Square root

Slide 36: FPGA Components
- FPGAs are 2-dimensional arrays of CLBs, DSP blocks, Block RAM, and programmable interconnect
- The actual layout and placement differ between FPGAs
[Figure: fabric grid with a row of DSP blocks, columns of Block RAM (BR), and columns of CLBs]

Slide 42: Virtex-7 FPGA DSP48
[Figure: DSP48E1 tile (two DSP48E1 slices and interconnects)]

Slide 43: Virtex-7 FPGA DSP48
- Single-instruction-multiple-data (SIMD) arithmetic unit: dual 24-bit or quad 12-bit add/subtract/accumulate
- Cascading capability on both pipeline paths for larger multipliers and larger post-adders

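The SIMD modes can be pictured as one 48-bit adder whose carry chain is cut at the lane boundaries. The following is a behavioral sketch only: the function name `simd_add` is hypothetical, and the per-lane carry-outs that real hardware may expose are simply discarded here.

```python
def simd_add(a, b, lanes):
    """Add two 48-bit words as independent lanes: 2 x 24-bit or 4 x 12-bit."""
    assert lanes in (2, 4)
    width = 48 // lanes
    mask = (1 << width) - 1
    result = 0
    for lane in range(lanes):
        shift = lane * width
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        # Mask off the carry so it cannot spill into the neighboring lane.
        result |= (lane_sum & mask) << shift
    return result
```

With `a = 0x001FFF002003` and `b = 0x001001001001`, the quad 12-bit mode wraps the lane holding `0xFFF` around to `0x000` without disturbing its neighbor, while the dual 24-bit mode lets that carry propagate within the upper 24-bit lane.
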
Slide 48: Programming FPGAs
- Mapping a circuit onto the FPGA fabric
  - Known as technology mapping
  - The process of converting a circuit in one representation into a representation that corresponds to physical components
  - Gates to LUTs
  - Memory to Block RAMs
  - Multiplications to DSP48s
  - Etc.
- But we need some way of configuring each component to behave as desired
- Examples:
  - How to store truth tables in LUTs?
  - How to connect wires in switch boxes?

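The "gates to LUTs" step and the truth-table question can be illustrated together: any small network of gates collapses into the list of bits that a LUT stores. This sketch covers only that final enumeration step, not a full technology mapper, and the function name `lut_contents` is hypothetical.

```python
from itertools import product


def lut_contents(function, num_inputs):
    """Evaluate a Boolean function on every input pattern to get the LUT's stored bits."""
    # product() yields (0, 0, ..., 0) first, so the first argument acts as the
    # most significant address bit of the resulting table.
    return [int(bool(function(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def majority(a, b, c):
    """Three AND gates and one OR gate."""
    return (a and b) or (a and c) or (b and c)


MAJORITY_TABLE = lut_contents(majority, 3)
```

Here four gates become the eight bits of a single 3-input LUT; those eight bits are what the configuration has to deliver to the device.
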
Slide 49: Programming FPGAs
- FPGAs are programmed with a "bitfile"
  - File containing all the information needed to program the FPGA
  - Contains bits for each control FF
  - Also contains bits to fill the LUTs
- But how do you get the bitfile into the FPGA?
  - More than 10k LUTs
  - Small number of pins

Slide 50: Programming FPGAs
- Solution: shift registers
- General idea:
  - Make a huge shift register out of all programmable components (LUTs, control FFs)
  - Shift in the bitfile one bit at a time
[Figure: configuration bits enter at one point; the shift register moves them through the CLBs to the appropriate location in the FPGA]

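Slide 51: Programming FPGAs
- Example: program a CLB with a 3-input, 1-output LUT to implement the sum output of a full adder
[Figure: truth table with inputs A, B, Cin and output S; a CLB with LUT, flip-flops, and 2x1 muxes; an arrow marks the direction in which data is shifted; a second copy shows what the CLB should look like after programming]
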
Slide 52: Programming FPGAs
- Example, cont.: the bitfile is just a sequence of bits based on the order of the shift register
[Figure: the CLB during programming and after programming]

Slides 53-60: Programming FPGAs
[Figure: animation frames showing the CLB during programming and after programming, as the bitfile is shifted through the shift register one bit per step]

Slide 61: Programming FPGAs
- The CLB is programmed to implement a full adder!
- Easily extended to program the entire FPGA
[Figure: the CLB during programming and after programming]

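The shift-register programming walked through in this example can be sketched as follows. The chain layout is an assumption made for illustration (eight LUT bits followed by one mux select bit), as are the names `shift_in` and `bitfile_for`; the figures on the slides may order the bits differently.

```python
def shift_in(bitfile, chain_length):
    """Clock a bitfile into a configuration shift register, one bit per cycle."""
    chain = [0] * chain_length
    for bit in bitfile:
        # The new bit enters at position 0; every stored bit moves one place along.
        chain = [bit] + chain[:-1]
    return chain


def bitfile_for(target_chain):
    """Order the bits so that the chain ends up holding target_chain."""
    # The first bit shifted in travels farthest, so the contents are sent in reverse.
    return list(reversed(target_chain))


SUM_TABLE = [0, 1, 1, 0, 1, 0, 0, 1]  # S = A xor B xor Cin, indexed by (A, B, Cin)
MUX_SELECT = [0]  # assumed encoding: 0 routes the LUT output around the FF
TARGET = SUM_TABLE + MUX_SELECT
PROGRAMMED = shift_in(bitfile_for(TARGET), len(TARGET))
```

After `len(TARGET)` clock cycles the chain holds exactly `TARGET`, which is the sense in which the bitfile is just a sequence of bits determined by the order of the shift register.
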
Slide 62: Programming FPGAs
- Problem: reconfiguring the FPGA is slow
  - Shifting in 1 bit at a time is not efficient
  - Bitfiles can be greater than 1 MB
- Eliminates one of the main advantages of RC: partial reconfiguration
  - With shift registers, the entire FPGA has to be reconfigured

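A quick estimate makes the cost of bit-serial loading concrete. The numbers below are assumptions for illustration only: a bitfile of exactly 1,000,000 bytes and a hypothetical 10 MHz configuration clock that loads one bit per cycle.

```python
def serial_config_time_s(bitfile_bytes, clock_hz):
    """Seconds needed to shift in a bitfile at one bit per clock cycle."""
    return bitfile_bytes * 8 / clock_hz


# Assumed example values, not figures for any particular device.
EXAMPLE_TIME_S = serial_config_time_s(1_000_000, 10_000_000)
```

Under these assumptions a full reconfiguration takes on the order of a second, which is very long compared with the run time of many individual computations.
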
Slide 64: CPLD vs FPGA
CPLD:
- Simpler interconnect structure
- Timing performance is more predictable than for FPGAs
- Density is less than that of most FPGAs
- CPLDs feature logic resources with a wide number of inputs (AND planes)
- Performance is usually better than that of FPGAs
FPGA:
- A single FPGA can replace tens of normal PLDs
- Primitive FPGA "logic cells" are more complex than PLD cells
- The routing between FPGA logic cells can be programmed in addition to the logic cells themselves
- Many FPGAs now offer embedded memory blocks in addition to logic blocks, or other special features such as fast carry logic chains
- FPGAs offer a higher ratio of flip-flops to logic resources than CPLDs do