Presentation on theme: "Basic FPGA Architectures"— Presentation transcript:
1Basic FPGA Architectures This material exempt per Department of Commerce license exception TSU
2 Objectives
After completing this module, you will be able to:
  Describe the basic slice resources available in Spartan-6 FPGAs
  Identify the basic I/O resources available in Spartan-6 FPGAs
  List some of the dedicated hardware features of Spartan-6 FPGAs
  Differentiate the Virtex-6 family of devices from the Spartan-6 family
  Identify the latest members of the Virtex-7 device family
4 Overview
All Xilinx FPGAs contain the same basic resources:
  Logic resources
    Slices (grouped into CLBs), which contain combinatorial logic and register resources
    Memory
    Multipliers
  Interconnect resources
    Programmable interconnect
    IOBs, the interface between the FPGA and the outside world
  Other resources
    Global clock buffers
    Boundary scan logic
6 Spartan-6 Lowest Total Power
45 nm technology
Static power reductions
  Process & architectural innovations
Dynamic power reduction
  Lower node capacitance & architectural innovations
More hard IP functionality
  Integrated transceivers & other logic reduce power
  Hard IP uses less current & power than soft IP
Lower I/O power
Low-power option -1L reduces power even further
Fewer supply rails reduce power
Two families: LX and LXT
7 Spartan-6 LX / LXT FPGAs
These are the planned product offerings for the LX (base) and LXT (high-speed serial) platforms. Note that you do not lose I/O as you migrate to larger devices within the same package. The smaller device packages are 0.8 mm and the larger devices have 1 mm packages, which helps contain overall system cost as routing complexity and board layer count increase.
** All memory controllers support a x16 interface, except in the CS225 package, where only x8 is supported.
9 Spartan-6 FPGA CLB
CLB contains two slices
Connected to switch matrix for routing to other FPGA resources
Carry chain runs vertically through Slice0 only
[Figure: switch matrix connected to Slice0 and Slice1; carry chain enters at CIN and leaves at COUT]
10 Three Types of Slices in Spartan-6 FPGAs
SLICEM: Full slice
  LUT can be used for logic and memory/SRL
  Has wide multiplexers and carry chain
SLICEL: Logic and arithmetic only
  LUT can only be used for logic (not memory)
SLICEX: Logic only
  No wide multiplexers or carry chain
[Figure: each CLB pairs a SLICEX with either a SLICEM or a SLICEL]

In the Spartan-6 FPGA, ¼ of slices are SLICEM, ¼ are SLICEL, and ½ are SLICEX. One slice in each CLB is a SLICEX; the other alternates between SLICEL and SLICEM in adjacent columns. The carry chain exists in the SLICEM or SLICEL half of each CLB.
11 Spartan-6 CLB Logic Slices
SliceM (25%): LUT6, 8 registers, carry logic, wide function muxes, distributed RAM / SRL logic
SliceL (25%): LUT6, 8 registers, carry logic, wide function muxes
SliceX (50%): LUT6, optimized for logic, 8 registers

Each CLB has 2 side-by-side slices, for a total of 8 LUTs and 16 flip-flops
Each slice has 4 six-input LUTs and 8 flip-flops with common clock, CE, and S/R
Each LUT has 2 flip-flops, one of which can be configured as a latch
  Note: the latch option is seldom used
25% of slices provide carry logic and memory / SRL (SLICE_M)
An additional 25% of slices provide carry logic but no memory (SLICE_L)
The remaining 50% of slices provide neither memory nor carry (SLICE_X)
Eliminating carry in 50% of the slices saves area and thus cost
Carry is needed only for arithmetic, accumulators & counters
Slice mix chosen for the optimal balance of cost, power & performance
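The per-slice figures and the 25/25/50 slice mix above can be turned into a quick resource tally. The following is only a rough sketch: the function name `clb_resources` and the CLB count used in the example are illustrative assumptions, not figures for any real Spartan-6 device.

```python
from fractions import Fraction

# Figures taken from the slide: 2 slices per CLB, 4 LUT6s and 8 flip-flops per slice.
SLICES_PER_CLB = 2
LUTS_PER_SLICE = 4
FFS_PER_SLICE = 8

# Slice mix from the slide: 25% SLICEM, 25% SLICEL, 50% SLICEX.
SLICE_MIX = {
    "SLICEM": Fraction(1, 4),
    "SLICEL": Fraction(1, 4),
    "SLICEX": Fraction(1, 2),
}


def clb_resources(num_clbs):
    """Tally slice resources for a hypothetical array of num_clbs CLBs."""
    slices = num_clbs * SLICES_PER_CLB
    by_type = {kind: int(slices * share) for kind, share in SLICE_MIX.items()}
    return {
        "slices": slices,
        "luts": slices * LUTS_PER_SLICE,
        "flip_flops": slices * FFS_PER_SLICE,
        # Only SLICEM and SLICEL contain a carry chain.
        "carry_capable_slices": by_type["SLICEM"] + by_type["SLICEL"],
        # Only SLICEM LUTs can act as distributed RAM or SRL.
        "memory_capable_luts": by_type["SLICEM"] * LUTS_PER_SLICE,
        **by_type,
    }


if __name__ == "__main__":
    # 1000 CLBs is an arbitrary example size, not a real device.
    for name, value in clb_resources(1000).items():
        print(f"{name}: {value}")
```

For a single CLB the tally reproduces the slide's totals of 8 LUTs and 16 flip-flops.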
12 Spartan-6 FPGA SLICE
Four LUTs
Eight storage elements
  Four flip-flop/latches
  Four flip-flops
F7MUX and F8MUX
  Connect LUT outputs to create wide functions
  Output can drive the flip-flop/latches
Carry chain (Slice0 only)
  Connected to the LUTs and the four flip-flop/latches
[Figure: slice diagram with LUT/RAM/SRL blocks]

Each LUT also has some associated logic that includes carry logic. The carry chain enables the propagation of a carry signal between the corresponding bits when implementing arithmetic functions (such as accumulators, subtractors, or comparators). This enables high performance and efficient device utilization.
The dedicated multiplexers, called the F7 and F8 multiplexers, allow for the implementation of wider logic. If two LUT6s and the associated F7MUX are used, any arbitrary 7-input combinatorial function can be implemented.
Similarly, if all four LUTs, the F7MUX resources, and the F8MUX are used, an arbitrary 8-input combinatorial function can be implemented. These multiplexers can also be used to build larger multiplexers.
Because a 4-input multiplexer can be implemented in one LUT6 (4 data inputs and 2 control inputs), a 16-1 multiplexer can be implemented using all 4 LUTs and the F7MUX and F8MUX. Logic that uses these built-in multiplexers will be significantly faster than logic built using only LUTs.
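The 16-1 multiplexer construction described above can be sketched behaviorally. This is a plain Python model, not vendor code: the function names are made up for illustration, and the assignment of select bits (low two bits to the LUTs, bit 2 to the F7 muxes, bit 3 to the F8 mux) is an assumption about one reasonable mapping.

```python
def mux4_in_lut6(data, sel):
    """4:1 mux as one LUT6 would implement it: 4 data inputs + 2 select inputs."""
    return data[sel & 0b11]


def mux2(a, b, sel):
    """2:1 mux, standing in for a dedicated F7MUX or F8MUX."""
    return b if sel else a


def mux16(data, sel):
    """16:1 mux built from four LUT6 4:1 muxes, two F7 muxes, and one F8 mux."""
    assert len(data) == 16 and 0 <= sel < 16
    # Stage 1: each of the four LUTs selects among its own four inputs.
    lut_out = [mux4_in_lut6(data[4 * i:4 * i + 4], sel) for i in range(4)]
    # Stage 2: each F7 mux combines a pair of LUT outputs using select bit 2.
    f7_a = mux2(lut_out[0], lut_out[1], (sel >> 2) & 1)
    f7_b = mux2(lut_out[2], lut_out[3], (sel >> 2) & 1)
    # Stage 3: the F8 mux combines the two F7 outputs using select bit 3.
    return mux2(f7_a, f7_b, (sel >> 3) & 1)


if __name__ == "__main__":
    inputs = list(range(100, 116))
    for s in range(16):
        assert mux16(inputs, s) == inputs[s]
    print("16:1 mux model selects every input correctly")
```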
13 6-Input LUT with Dual Output
6-input LUT can be two 5-input LUTs with common inputs
  Minimal speed impact to a 6-input LUT
One or two outputs
Any function of six variables or two independent functions of five variables
[Figure: 6-LUT built from two 5-LUTs; inputs A1 to A6, outputs O6 and O5]

Each 6-input LUT can be configured as two 5-input LUTs with common inputs, so it can build any function of six variables or two independent functions of five variables. This gives the device a great deal of flexibility to build an efficient design.
A LUT can perform any combinatorial function, limited only by the number of inputs. The LUT is your primary combinatorial logic resource, and it is the industry standard.
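A LUT is a small truth table indexed by its inputs, which a short Python model makes concrete. The sketch below assumes a 64-bit initialization value in which the lower 32 entries feed O5 and, with A6 held high, the upper 32 entries feed O6. That split and all function names are assumptions of this model; consult the device user guide for the actual primitive behavior.

```python
def lut6_init(func):
    """Build a 64-entry truth table (as an int) from a function of bits a1..a6."""
    init = 0
    for addr in range(64):
        bits = [(addr >> i) & 1 for i in range(6)]  # bits[0] is A1, bits[5] is A6
        if func(*bits):
            init |= 1 << addr
    return init


def lut6_o6(init, addr):
    """O6 output: look up one of 64 entries using all six inputs."""
    return (init >> (addr & 0x3F)) & 1


def lut6_o5(init, addr):
    """O5 output: look up one of the lower 32 entries using A1..A5 only."""
    return (init >> (addr & 0x1F)) & 1


def dual_lut5_init(f_o6, f_o5):
    """Pack two 5-input functions: f_o5 in the lower half, f_o6 in the upper half."""
    init = 0
    for addr in range(32):
        bits = [(addr >> i) & 1 for i in range(5)]
        if f_o5(*bits):
            init |= 1 << addr
        if f_o6(*bits):
            init |= 1 << (addr + 32)
    return init


if __name__ == "__main__":
    # One function of six variables: majority of six bits (4 or more ones).
    majority = lut6_init(lambda *b: sum(b) >= 4)
    print(lut6_o6(majority, 0b111100), lut6_o6(majority, 0b000111))

    # Two independent functions of the same five inputs: AND on O6, parity on O5.
    init = dual_lut5_init(lambda *b: all(b), lambda *b: sum(b) % 2)
    a = 0b10110
    print(lut6_o6(init, a | 0x20), lut6_o5(init, a))  # A6 tied high for O6
```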
14 Slice Flip-Flop and Flip-Flop/Latch Control
All flip-flops and flip-flop/latches share the same CLK, SR, and CE signals
  This is referred to as the “control set” of the flip-flops
  CE and SR are active high
  CLK can be inverted at the slice boundary
Set/Reset (SR) signal can be configured as synchronous or asynchronous
  All four flip-flop/latches are configured the same
  All four flip-flops are configured the same
SR will cause the flip-flop to be set to the state specified by the SRINIT attribute
[Figure: storage elements AFF through DFF and AFF/LATCH through DFF/LATCH, each with D, CE, SR, CK inputs and Q output]

The four flip-flops in each slice are named AFF, BFF, CFF, and DFF. The four FF/LATCH elements are named AFF/LATCH, BFF/LATCH, CFF/LATCH, and DFF/LATCH.
The SRINIT of a flip-flop is set by the software depending on the reset state of the flip-flop. It will be set to SRLOW if the flip-flop is set to 0 during the reset condition, or SRHIGH if the flip-flop is set to 1.
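Because all eight storage elements in a slice share one control set, registers with different CLK, CE, or SR signals cannot be packed into the same slice. The sketch below estimates the resulting slice count; it is a deliberately simplified lower bound that ignores every other packing rule, and the function name and signal names are hypothetical.

```python
from collections import Counter
from math import ceil

FFS_PER_SLICE = 8  # four flip-flops plus four flip-flop/latches


def slices_needed(flops):
    """Estimate slices used by registers, given one (clk, ce, sr) tuple per flip-flop.

    Flip-flops can share a slice only if their control sets are identical,
    so each distinct control set is rounded up to whole slices on its own.
    """
    per_control_set = Counter(flops)
    return sum(ceil(count / FFS_PER_SLICE) for count in per_control_set.values())


if __name__ == "__main__":
    shared = [("clk", "en", "rst")] * 16
    unique_enables = [("clk", f"en_{i}", "rst") for i in range(16)]
    print(slices_needed(shared))          # 16 flip-flops, one control set
    print(slices_needed(unique_enables))  # 16 flip-flops, 16 control sets
```

In this model, sixteen registers that share one control set fit in two slices, while sixteen registers with sixteen different clock enables occupy sixteen slices.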
15 Configuring LUTs as a Shift Register (SRL)
[Figure: a chain of D flip-flops with a common CE and CLK, and the equivalent LUT with D, CE, address A[4:0], output Q, and cascade output Q31]

In the SLICEM slices, the LUT can also be configured as a dynamically addressable shift register, or SRL. This component basically acts as a programmable delay element.
The diagram seems to imply that each LUT has a number of registers as part of its construction, but this component only allows you to load data in serially and then make it available a few clock cycles later. As data is presented to be loaded, the previously loaded data is shifted down.
Also, there are no set or reset capabilities, it is not loadable, and data can only be read serially. So the SRL does not behave exactly the same as a shift register implemented with registers.
So what does this imply about inferring the shift register LUT? Its contents cannot all be read at once because there is no parallel read functionality. Remember that it is serial in/serial out. Likewise, if you coded for a shift register that was initialized, it could not be mapped to the SRL primitive.
Note that most synthesis tools require this coding style and the use of an attribute.
There is a maximum delay of 32 clock cycles per LUT. The SRLs can be cascaded to other LUTs or CLBs for longer shift registers. The shift register length can also be changed asynchronously by changing address A. This means that you can dynamically change the delay associated with an SRL.
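The SRL behavior described above (serial in, one addressable tap out, no reset, 32 stages) can be modeled in a few lines of Python. This is a behavioral sketch only: the class and method names are invented for illustration, and starting with all-zero contents is an assumption of the model.

```python
class SRL32:
    """Behavioral model of a 32-stage shift-register LUT with an addressable tap."""

    DEPTH = 32

    def __init__(self):
        # bits[0] holds the most recently shifted-in value. Starting at zero
        # is an assumption of this model.
        self.bits = [0] * self.DEPTH

    def clock(self, d, ce=1):
        """Rising clock edge: shift in d when the clock enable is high."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr):
        """Addressable output: tap addr (0..31) gives a delay of addr + 1 clocks."""
        return self.bits[addr & 0x1F]

    def q31(self):
        """Cascade output from the last stage, for chaining to another SRL."""
        return self.bits[self.DEPTH - 1]


def delay_stream(samples, delay):
    """Delay a bit stream by `delay` clocks (1..32) using one SRL32."""
    assert 1 <= delay <= SRL32.DEPTH
    srl = SRL32()
    out = []
    for sample in samples:
        out.append(srl.q(delay - 1))  # value visible before this clock edge
        srl.clock(sample)
    return out


if __name__ == "__main__":
    stream = [1, 0, 1, 1, 0, 0, 1, 0] * 5
    delayed = delay_stream(stream, 17)
    assert delayed[17:] == stream[:-17]
    print("output equals input delayed by 17 clocks")
```

Changing the `addr` argument between clocks changes the delay without touching the stored data, which is the dynamic addressing the notes describe.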
16 Shift Register LUT Example
Operation D - NOP must add 17 pipeline stages of 64 bits each
  1,088 flip-flops (hence 136 slices) or
  64 SRLs (hence 16 slices)
[Figure: a 64-bit bus feeds two paths that rejoin at a multiplexer. Path 1: Operation A (8 cycles) then Operation B (12 cycles), 20 cycles total. Path 2: Operation C (3 cycles) then Operation D - NOP (17 cycles), 20 cycles total. Paths are statically balanced.]

The SRL can be used as a programmable delay element (or No Operation, NOP).
In this example, a 64-bit bus is processed through operations A, B, and C. A has a delay of eight cycles, B has a delay of twelve cycles, and C has a delay of three cycles.
Because the processed data is recombined at a multiplexer, the datapaths must be synchronized so that the appropriate data is compared at the multiplexer. To do this, the SRL can be used to delay the output of the C operation by 17 clock cycles.
If you were to do this with registers, it would require 1,088 registers. If you use the SRL functionality instead, you only need 64 LUTs, each programmed for 17 clock cycles of delay.
So this example uses 64 LUTs to replace 1,088 flip-flops and the associated routing resources (pretty good justification for using the SRL, right?).
Because there are so many registers in FPGAs, pipelining is an effective way to increase design performance. And because pipelines can sometimes become unbalanced when too much logic must be generated, it is necessary to delay some of the signals. One of the best uses of the SRL is to add delay to balance pipelines.
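The numbers on this slide follow from simple arithmetic, reproduced here as a small Python check. The helper names are illustrative; the per-slice figures (8 flip-flops, 4 LUTs) come from the earlier slice slides, and the model assumes each SRL covers up to 32 cycles of delay for one bit.

```python
from math import ceil

FFS_PER_SLICE = 8
LUTS_PER_SLICE = 4
SRL_MAX_DELAY = 32


def balancing_delay(long_path_cycles, short_path_cycles):
    """Cycles of delay to add to the short path so both paths match."""
    return sum(long_path_cycles) - sum(short_path_cycles)


def flip_flop_cost(delay, bus_width):
    """Flip-flops and slices needed to build the delay from registers."""
    flops = delay * bus_width
    return flops, ceil(flops / FFS_PER_SLICE)


def srl_cost(delay, bus_width):
    """SRL LUTs and slices needed; each SRL delays one bit by up to 32 cycles."""
    luts = ceil(delay / SRL_MAX_DELAY) * bus_width
    return luts, ceil(luts / LUTS_PER_SLICE)


if __name__ == "__main__":
    delay = balancing_delay([8, 12], [3])  # A + B versus C
    print("delay to add:", delay)
    print("flip-flops, slices:", flip_flop_cost(delay, 64))
    print("SRLs, slices:", srl_cost(delay, 64))
```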
18 I/O Block Diagram
[Figure: electrical resources (P and N pads, LVDS, termination) and logical resources (master and slave IOLOGIC blocks, each with IOSERDES and IODELAY), with interconnect to the FPGA fabric]

The electrical resources include the I/O pads and buffers.
The logical resources include single data rate and double data rate register resources, a SERDES converter, and a programmable I/O delay line.
19 Spartan-6 FPGA Supports 40+ Standards
Each input can be 3.3 V compatible
LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
LVCMOS_JEDEC
LVPECL (3.3 V, 2.5 V)
PCI
I2C*
HSTL (1.8 V, 1.5 V; Classes I, II, III, IV)
  DIFF_HSTL_I, DIFF_HSTL_I_18
  DIFF_HSTL_II*
SSTL (2.5 V, 1.8 V; Classes I, II)
  DIFF_SSTL_I, DIFF_SSTL18_I
  DIFF_SSTL_II*
LVDS, Bus LVDS
RSDS_25 (point-to-point)
Easier and more flexible I/O design!
* Newly added standards

Use the I/O Planner to assign your I/O standards and ensure that your pinout follows the I/O banking rules (more on this in the next slide).
20 Spartan-6 FPGA I/O Bank Structure
All I/Os are on the edges of the chip
I/Os are grouped into banks
  30 to 83 I/Os per bank
  Eight clock pins per edge
  Common VCCO, VREF
    Restricts mixture of standards in one bank
The differential driver is only available in Bank0 and Bank2
  Differential receiver is available in all banks
On-chip termination is available in all banks
[Figure: chip views showing bank placement for LX45/T and smaller devices (banks 0 to 3) and for LX100/T and larger devices (banks numbered up to 5)]
21 I/O Logical Resources
Two IOLOGIC blocks per I/O pair
  Master and slave
  Can operate independently or concatenated
Each IOLOGIC contains
  IOSERDES
    Parallel-to-serial converter (serializer)
    Serial-to-parallel converter (deserializer)
  IODELAY
    Selectable fine-grained delay
  SDR and DDR resources
[Figure: master and slave IOLOGIC blocks, each with IOSERDES and IODELAY, with interconnect to the FPGA fabric]

The primary use of the IOBs is for registering data. Incoming and outgoing data can be registered using a simple single data rate (SDR) flip-flop or a double data rate (DDR) flip-flop.
High-speed incoming serial data can also be deserialized using the SERDES capability within the IOB. Similarly, outgoing parallel data can be serialized onto a single output pin.
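The serializer and deserializer roles described above amount to converting between parallel words and a bit stream. The sketch below shows only that data transformation, not clocking or bit alignment; the function names are invented, and sending the least significant bit first is an assumption made for the example.

```python
def serialize(words, width):
    """Flatten parallel words into a single bit stream, LSB first (assumed order)."""
    return [(word >> i) & 1 for word in words for i in range(width)]


def deserialize(bits, width):
    """Regroup a bit stream into parallel words of `width` bits, LSB first."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        chunk = bits[start:start + width]
        words.append(sum(bit << i for i, bit in enumerate(chunk)))
    return words


if __name__ == "__main__":
    parallel = [0xA, 0x3, 0xF]
    stream = serialize(parallel, 4)
    print(stream)
    assert deserialize(stream, 4) == parallel
```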
23 SLICEM Used as Distributed SelectRAM Memory
Uses the same storage that is used for the look-up table function
Synchronous write, asynchronous read
  Can be converted to synchronous read using the flip-flops available in the slice
Various configurations
  Single port
    One LUT6 = 64x1 or 32x2 RAM
    Cascadable up to 256x1 RAM
  Dual port (D)
    1 read / write port + 1 read-only port
  Simple dual port (SDP)
    1 write-only port + 1 read-only port
  Quad port (Q)
    1 read / write port + 3 read-only ports
Each port has independent address inputs

Available configurations:
  Single port: 32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
  Dual port: 32x2D, 32x4D, 64x1D, 64x2D, 128x1D
  Simple dual port: 32x6SDP, 64x3SDP
  Quad port: 32x2Q, 64x1Q

The look-up table functionality is essentially a small memory containing the desired output value for each combination of input values. These storage cells are programmed at configuration time, and the look-up itself is done by using the inputs as the control for a wide multiplexer.
By allowing these storage elements to be modified using FPGA fabric resources, the LUT can be used for the implementation of a small distributed memory.
Each LUT can be a single-ported 64-bit RAM with synchronous write and asynchronous read.
LUTs in slices can be combined to create small dual-port and multi-port RAMs.
In the Virtex-6 and Spartan-6 FPGAs, approximately one quarter of slices are SLICEMs in which the LUTs can be programmed as distributed RAMs (this varies with family).
Simple dual-port configurations can be used to implement LUT FIFOs and MicroBlaze™ processor register files.
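The "synchronous write, asynchronous read" behavior and the dual-port arrangement can be illustrated with a small behavioral model. The class below is a sketch of a 64x1 dual-port memory; the class and method names are made up for this example and do not correspond to a vendor primitive's port list.

```python
class DualPortDistRAM:
    """64x1 model: one read/write port plus one read-only port."""

    DEPTH = 64

    def __init__(self):
        self.mem = [0] * self.DEPTH

    def read_rw_port(self, addr):
        """Asynchronous read on the read/write port: follows the address immediately."""
        return self.mem[addr % self.DEPTH]

    def read_ro_port(self, addr):
        """Asynchronous read on the read-only port, with its own address input."""
        return self.mem[addr % self.DEPTH]

    def clock(self, addr, data, we):
        """Synchronous write: memory changes only on a clock edge with WE high."""
        if we:
            self.mem[addr % self.DEPTH] = data & 1


if __name__ == "__main__":
    ram = DualPortDistRAM()
    ram.clock(addr=5, data=1, we=1)
    # Both ports see the written bit, each through its own address.
    print(ram.read_rw_port(5), ram.read_ro_port(5), ram.read_ro_port(6))
```

Registering the read result in a slice flip-flop, as the slide notes, would turn the asynchronous read into a synchronous one at the cost of one clock of latency.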
24 Spartan-6 FPGA Block RAM Features
18 Kb size
  Can be split into two independent 9 Kb memories
Performance up to 300 MHz
Multiple configuration options
  True dual-port, simple dual-port, single-port
Two independent ports access common data
  Individual address, clock, write enable, clock enable
  Independent widths for each port
Byte-write enable
[Figure: 18 Kb dual-port block RAM]
25 Better, More BRAM
More block RAMs
  2x higher BRAM to logic cell ratio than the Spartan-3A platform
More port flexibility
  18K can be split into two 9K BRAM blocks that can be independently addressed
Improves buffering, caching & data storage
  Excellent for embedded processing, communication protocols
  Enables DSP blocks to provide more efficient video and surveillance algorithms
Lower static power

For Spartan-6, we just about doubled the BRAM to logic cell ratio compared to the large Spartan-3E/A devices. This is a significant increase of up to 4.8 Mb. And we made it even more efficient by adding the ability to break the 18K blocks into 9K blocks. This is a critical enhancement since it allows small memories to be implemented without exhausting the ports and wasting resources.
To lower static power, we redesigned the memory circuitry to reduce leakage. This was the goal of many of the newly designed features in Spartan-6.
26 Memory Controller
Only low-cost FPGA with a “hard” memory controller
Guaranteed memory interface performance, providing
  Reduced engineering & board design time
  DDR, DDR2, DDR3 & LP DDR support
  Up to 12.8 Gbps bandwidth for each memory controller
Automatic calibration features
Multiport structure for user interface
  Six 32-bit programmable ports from fabric
  Controller interface to 4-, 8-, or 16-bit memory devices
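The per-controller peak bandwidth follows from the per-pin data rate and the interface width. As a rough check, the sketch below multiplies the 800 Mbps peak rate quoted for DDR2 and DDR3 on the following slide by the widest supported 16-bit interface; the function name is illustrative, and the result is a theoretical peak, not sustained throughput.

```python
def peak_bandwidth_gbps(data_rate_mbps, bus_width_bits):
    """Peak bandwidth in Gbps = per-pin data rate (Mbps) x bus width / 1000."""
    return data_rate_mbps * bus_width_bits / 1000


if __name__ == "__main__":
    for width in (4, 8, 16):
        print(f"x{width} at 800 Mbps: {peak_bandwidth_gbps(800, width)} Gbps")
    # The transcript quotes 400 Mbps for DDR and low-power DDR.
    print(f"x16 at 400 Mbps: {peak_bandwidth_gbps(400, 16)} Gbps")
```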
27Spartan-6 Hard Memory Controller New Hard Block Memory ControllerUp to 4 controllers per deviceWhy a Hard Memory Block?Very common design componentMultiple customer benefitsCustomer RequestsSpartan-6 Hard Block Memory Controller BenefitsHigher performanceUp to 800 MbpsLower costSaves soft logic, smaller dieLower powerDedicated logicEasier designsTiming closure no longer an issueConfigurable MultiPort user interfaceCoreGen/MIG wizard & EDK supportTranscript:Okay, so let's look now a little bit at the features that we have. I talked about why hard block. Well, because you have to meet the minimum frequency, but there are a lot of benefits to hard block. It's dedicated logic, you consume less power. You have more fabric logic at your disposal to do other things. Also, you have to think about the cost, if you have to implement a soft controller in fabric, then you take up a lot of logic. So you may end up with a bigger device than you really need. So these things are really helpful. We have actually implemented multiple controllers. So if you look at -- I'll show you in a minute, in the mid-size devices you have two controllers, in the larger devices you have four. And you can interface on a 16-bit bus only, or 8-bit bus, depending if you use a x16 or x8. So then a question would come, well, okay so my customer wants to have more than the bandwidth you can give with the x16 device. Well in that case, you can use two controllers. Or in the more extreme cases with the larger parts, you can use four controllers. And you can have the soft logic to basically take data from two or four controllers and use that. So that's an option we thought about. And actually we're thinking to implement a reference design that actually would use, hookup two of these controllers, that's something to look at in the future. And of course everything will go through the same tool flow as the soft controllers. 
For the non-embedded applications, we have the MIG and for embedded you can use EDK and have the MPMC support. So I won't go over data rates, it's 800 for DDR3 and DDR2 and as fast as DDR and low-power DDR can go, which is 400 megabit.Author’s Original Notes:Yes, we are integrating a hard block memory controller into the Spartan-6 family. In fact, we will have up to 4 memory controller blocks per device. The block will support DDR3, DDR2, DDR, and LPDDR at the rates shown here on the right. And why did we choose to integrate a hard memory controller? Well, like most other hard blocks, we integrate them when we think they can be defined in a way that handles the vast majority of applications and when we know that a significant percentage of the customer base is using such a block in their designs. Memory controllers are a very common design component and in the Spartan space, a DRAM controller with the most common capabilities and features will address a big chunk of what customers need.Furthermore, we can hit much higher data rates (800 Mbps) with a hard solution to provide about 2X the memory bandwidth of prior generation soft solutions. The hard block also conserves FPGA resources for the “secret sauce” of the user design and potentially allows the user to get into a smaller device. And of course the hardened solution will save on power as well by only using the transistors necessary to do the job. Finally, the hard block is considerably easier to design with, because the block is tested to guarantee performance so there are no concerns about meeting timing. And the CoreGen MIG wizard, or alternatively the EDK IP configurator, guides the user through the complete design implementation.Expected Questions:What if a customer needs more interfaces or the MCB doesn’t support the interface / features needed, will we still do soft IP solutions for Spartan-6? 
Answer: we believe that the hard MCB blocks will be appropriate for most situations, and so all current development effort is directed at IP offerings based on the MCB. However, we are always willing to hear input about customer needs and will adjust our IP solution roadmaps when there is clear demand for additional offerings.
28 Spartan-6 FPGA DSP48A1 Slice
18x18 signed multiplier
48-bit add/subtract/accumulate
Pipeline registers for high speed
Cascade paths for wide functions
[Figure: DSP48A1 block diagram with A, B, C, and D inputs; dual B and D registers with pre-adder; 18 x 18 multiplier; 48-bit add/subtract stage producing P; OPMODE control bits; BCIN/BCOUT, PCIN/PCOUT, and carry cascade ports]

The DSP slice is designed for DSP applications and large arithmetic operations. DSP designers who traditionally use the FPGA fabric for arithmetic applications will find that much of their job is done for them internally to this block. All they need to do is configure the block by using OPMODE inputs, which control the flow of data in the block.
The DSP slice has an 18x18 two's complement multiplier with a pre-adder on one of the inputs. It also has a 2-input adder/subtractor following the multiplier, which can be used to create several different arithmetic operations.
Cascade pins are included to support complex functions with no speed penalty. This allows you to implement larger arithmetic operations by linking multiple slices together. This is especially useful for DSP applications.
There are optional pipeline registers at several points within the DSP slice to maximize performance.
Most designers target this resource with the CORE Generator™ software. Refer to the data sheet and user guides for more information about this resource.
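The dataflow through the DSP slice (optional pre-adder, 18x18 signed multiply, 48-bit add/subtract/accumulate) can be sketched as plain integer arithmetic. This model is a simplification under stated assumptions: the keyword arguments stand in for OPMODE settings and are not the real OPMODE encoding, only pre-addition (not pre-subtraction) is shown, and pipeline registers and cascade ports are omitted.

```python
def to_signed(value, bits):
    """Wrap an integer to a two's complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dsp_slice(a, b, d=0, c=0, p_prev=0,
              use_preadder=False, subtract=False, accumulate=False):
    """One evaluation of a simplified DSP slice.

    The keyword flags are stand-ins for OPMODE settings (an assumption of
    this sketch), not the actual control encoding.
    """
    # Optional pre-adder on the B input, wrapped to 18 bits.
    b_eff = to_signed(d + b, 18) if use_preadder else to_signed(b, 18)
    # 18x18 signed multiply.
    m = to_signed(a, 18) * b_eff
    # Second operand of the post-adder: previous P (accumulate) or the C input.
    z = p_prev if accumulate else c
    p = z - m if subtract else z + m
    # Result wrapped to the 48-bit output width.
    return to_signed(p, 48)


def mac(pairs):
    """Multiply-accumulate over (a, b) pairs using the accumulate path."""
    p = 0
    for a, b in pairs:
        p = dsp_slice(a, b, p_prev=p, accumulate=True)
    return p


if __name__ == "__main__":
    print(mac([(1, 4), (2, 5), (3, 6)]))              # dot product
    print(dsp_slice(3, 4, d=1, use_preadder=True))    # 3 * (1 + 4)
    print(dsp_slice(2, 3, c=10, subtract=True))       # 10 - 2 * 3
```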
30 Spartan-6 FPGA Global Clock Network
16 global clock buffers in the Spartan-6 FPGA allow clocks to be distributed to potentially every clocked element on the die
16 HCLK lines connect clock signals to logic resources in each row
HCLK lines can be driven by
  Global clock buffers
  DCM outputs
  PLL outputs

Each BUFG and HCLK row can only drive the clock and reset ports of each synchronous element (flip-flop or DSP slice, for example). This means that besides global clocks, only global resets are going to be routed on BUFGs. All secondary control signals (CE, Set, and Reset) will be routed on general interconnect.
The global clock network in Spartan-6 FPGAs is driven by 16 BUFGMUX resources located in the center of the device.
Clocks in each row of the FPGA are driven by 16 HCLK lines. These HCLK lines can be driven either by global clock buffers or by the PLL and DCM signals generated within the adjacent clock management tile (CMT).
31 Spartan-6 FPGA I/O Clock Network
Special clock network dedicated to I/O logical resources
  Independent of global clock resources
  Speeds up to 1 GHz
Multiple sources for clocking I/O logic
  BUFIO2: for high-speed dedicated I/O clock signals
  BUFPLL: for clocks driven by the PLL in the CMT
[Figure: P and N clock pins feeding BUFIO2, and the CMT PLL feeding BUFPLL, both driving the IOLOGIC of an I/O bank]

Each I/O bank has two I/O clock regions. There are four high-speed I/O clock networks (BUFIO2) in every I/O clock region, driven by four dedicated clock input pins.
32 Spartan-6 FPGA Clock Management Tile (CMT)
[Figure: CMT block diagram with two DCMs (dcm1_clkout<9:0>, dcm2_clkout<9:0>) and one PLL (CLKIN, CLKFB, pll_clkout<5:0>); inputs are clocks from BUFG, GCLK inputs, and feedback clocks from BUFIO2FB]

The Spartan-6 FPGA clock management tile includes two digital clock managers (DCMs) and one phase-locked loop (PLL). There are dedicated routing connections between components within the same tile, as well as connections to the global clock buffers and HCLK lines.
The DCM can remove clock insertion delay using the DLL feature, as well as perform digital phase shifting and frequency synthesis. The PLL can perform more complex frequency synthesis and can filter clock jitter.
34 Designers Eccentrics
Higher System Performance
  More design margin to simplify designs
  Higher integrated functionality
Lower System Cost
  Reduce BOM
  Implement design in a smaller device & lower speed-grade
Lower Power
  Help meet power budgets
  Eliminate heat sinks & fans
  Prevent thermal runaway
35 Architecture Alignment
Virtex-6 FPGAs: 760K logic cell device
Spartan-6 FPGAs: 150K logic cell device
Common Resources
  LUT-6 CLB
  BlockRAM
  DSP Slices
  High-performance Clocking
  FIFO Logic
  Parallel I/O
  Hardened Memory Controllers
  Tri-mode EMAC
  HSS Transceivers*
  3.3 Volt compatible I/O
  System Monitor
  PCIe® Interface*
* Optimized for target application in each family
Enables IP portability, protects design investments
36 Virtex-6 and Spartan-6 FPGA Sub-Families
Virtex-6 CXT FPGA: up to 3.75 Gbps serial connectivity and corresponding logic performance
Virtex-6 LXT FPGA: high logic density, high-speed serial connectivity
Virtex-6 SXT FPGA: high logic density, high-speed serial connectivity, enhanced DSP
Virtex-6 HXT FPGA: high logic density, ultra high-speed serial connectivity
Spartan-6 LX FPGA: lowest cost logic
Spartan-6 LXT FPGA: lowest cost logic, low-cost serial connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]

Note that a hard processor core is NOT available in any of the Spartan-6 or Virtex-6 devices.
The slide shows four Virtex-6 sub-families (CXT, LXT, SXT, and HXT) and two Spartan-6 sub-families.
The Spartan-6 traditional logic (LX) sub-family contains block RAM, memory controllers, and DSP slice resources. It is targeted for general logic applications.
The Spartan-6 LXT sub-family includes low-cost serial gigabit transceivers (GTP) and PCI Express® cores.
The Virtex-6 devices all contain high-performance serial transceivers (GTX) and PCI Express cores, as well as Tri-mode Ethernet MAC cores.
The SXT sub-family has more block RAM and DSP slices than other sub-families and is ideal for DSP applications.
The HXT sub-family has ultra-high-speed serial transceivers (GTH).
38 Virtex® Product & Process Evolution
Virtex-6: 40 nm
Virtex-5: 65 nm
Virtex-4: 90 nm
Virtex-II Pro: 130 nm
Virtex-II: 150 nm
Virtex-E: 180 nm
Virtex: 220 nm
Generations shown: 1st through 6th
Delivering balanced performance, power, and cost
39 Strong Focus on Power Reduction
Static power reduction
  Higher distribution of low-leakage transistors
Dynamic power reduction
  Reduced capacitance through device shrink
Reduced core voltage devices lower overall power
  VCCINT = 0.9 V option allows power / performance tradeoff
I/O power improvements
  Dynamic termination
System Monitor
  Allows sophisticated monitoring of temperature and voltage
Up to 50% power reduction vs. previous generation
40 Virtex-6 Logic Fabric
Virtex-6 Configurable Logic Block (CLB)
  Each CLB contains two slices
  Each slice contains four 6-input lookup tables (6LUT)
  Slices implement logic functions (slice_l)
  Slices for memories and shift registers (slice_m)
LUT6 implements
  All functions of up to 6 variables
  Two functions of up to 5 variables each
  Shift registers up to 32 stages long
  Memories of 64 bits
  Multiple configurations within a slice
Power consumption benefits: shift register mode greatly reduces power consumption over FF implementation
Performance benefits: increased ratio of slice_m, so memories are available closer to the source or target logic
Cost benefits: can pack logic and memory functions more efficiently
41 Higher DSP Performance
Most advanced DSP architecture
  New optional pre-adder for symmetric filters
  25x18 multiplier
    High-resolution filters
    Efficient floating point support
  ALU-like second stage enables mapping of advanced operations
    Programmable op-code
    SIMD support
    Addition / subtraction / logic functions
  Pattern detector
Lowest power consumption
Highest DSP slice capacity
  Up to 2K DSP slices
43 Power, Performance and Productivity Drive Market Trends
Lower Power
  Legislation and regulations
  Flat panel/TV, central office, server farms, portable medical, portable consumer
Higher Performance
  System capacity and performance
  Wired infrastructure, wireless, broadcast, 300G+ networks, aerospace and defense, high performance computing
Improved Productivity
  Reduce capital and operating expenses (OPEX, CAPEX)
  All market segments

Some applications see “performance per unit of power” as most critical. Others consider “cost per unit of power” most critical. Others see both, and have more universal needs for low power that can be summarized as “capability per unit of power”.
What customers are telling us:
  Power
    Simpler heat sinks and airflow
    Overall power reduction with fewer and lower cost power supplies
    Excessive system operating expenses
    Mandated Energy Star compliance
    Handheld/battery products require low static power
  System performance
    Systems continue to drive more bandwidth for chip-to-chip and box-to-box
    Need to interface with cutting-edge interface technology
    Increase the amount of parallel processing
  Productivity
    Lower development costs
    Lower BOM costs
    Integrated functionality allowing decreased device count
    Improved product reliability
    Need options for further cost reduction
  Cost
    Strained R&D budgets: do more with less time and less money
    Leverage prior investments
#1 customer problem: lower power enables better cost, performance, and capability
44 The Unified Architecture Advantage
Common elements enable easy IP reuse for quick design portability across all 7 series families
  Design scalability from low-cost to high-performance
  Expanded eco-system support
  Quickest TTM
Common elements across Artix™-7, Kintex™-7, and Virtex®-7 FPGAs:
  Logic fabric: LUT-6 CLB
  Precise, low-jitter clocking: MMCMs
  On-chip memory: 36 Kbit/18 Kbit block RAM
  Enhanced connectivity: PCIe® interface blocks
  DSP engines: DSP48E1 slices
  Hi-perf. parallel I/O connectivity: SelectIO™ technology
  Hi-performance serial I/O connectivity: transceiver technology

Simplified design reuse and migration
  Common building blocks minimize time for coding, simulation, and debug
  Common hard IP for familiarity and reliability
  Common, optimized interconnect for improved place and route
Quickly scale designs to address adjacent markets
Minimum design and deployment effort lowers development costs
Simplified design migration enables designers to carry forward today’s investment to future platforms
45 The Xilinx 7 Series FPGAs: Industry’s First Unified Architecture
Industry’s lowest power and first unified architecture
  Spanning low-cost to ultra high-end applications
Three new device families with breakthrough innovations in power efficiency, performance-capacity, and price-performance

Xilinx 7 series FPGAs comprise three unified FPGA families that offer a breakthrough 50% reduction in power and address the complete range of system requirements, ranging from low cost and small form factor packaging for cost-sensitive, high-volume applications to ultra-high-end connectivity bandwidth, logic capacity, and signal processing capability for the most demanding high-performance applications.
46 Virtex-7 Sub-Families
The Virtex-7 family has several sub-families
  Virtex-7: general logic
  Virtex-7 XT: rich DSP and block RAM
  Virtex-7 HT: highest serial bandwidth
Virtex-7 FPGA: high logic density, high-speed serial connectivity
Virtex-7 XT FPGA: high logic density, high-speed serial connectivity, enhanced DSP
Virtex-7 HT FPGA: high logic density, ultra high-speed serial connectivity
[Figure legend: Logic, Block RAM, DSP, Parallel I/O, Serial I/O]
48 Summary
The Spartan-6 FPGA slices contain four 6-input LUTs, eight registers, and carry logic
  LUTs can perform any combinatorial function of up to six inputs
  LUTs are connected with dedicated multiplexers and carry logic
  Some LUTs can be configured as shift registers or memories
The Spartan-6 FPGA IOBs contain DDR registers as well as SERDES resources
The SelectIO™ interfaces enable direct connection to multiple I/O standards
The Spartan-6 FPGA includes dedicated block RAM and DSP slice resources
The Spartan-6 FPGA includes dedicated DCMs, PLLs, and routing resources to improve your system clock performance and generation capability
The latest introduced families are architected for power efficiency
  They consist of Artix, Kintex, and Virtex devices