Redefining the FPGA The first fully programmable system solution designed specifically for intellectual property.


1 Redefining the FPGA The first fully programmable system solution designed specifically for intellectual property.

2 Agenda Technology Roadmap Redefining the FPGA Architecture Overview
The CLB Tile, Vector Based Interconnect, Internal Bus Support, SelectRAM+, Clocking & DLLs, SelectI/O, Thermal Management & The SelectMAP Interface Software & Cores Support Summary - A System Level Solution

3 1 Million+ System Gates with High Performance System Solution
Technology Roadmap 1995 1997 1998 1999 XC4000E 2LM - 0.5µm (XC4025E) XC4000EX (XC4036EX) XC4000XL 3LM µm (XC4085XL) XC4000XV 3LM µm (XC40250XV) 1996 Density/Performance Virtex 1 Million+ System Gates with High Performance System Solution 5LM µm (7LM µm)

4 "Virtex moves FPGAs from glue to system component"
Redefining the FPGA GTL+ High Speed System Backplane Low Voltage CPU LVTTL 133MHz SDRAM SSTL3 SRAM Cache (Mbytes) LVCMOS Chip 2 1x CLK 2x CLK "Virtex moves FPGAs from glue to system component" Chip 1 3 4 1 2

5 Redefining the FPGA System Memory System System Integration Timing
2 System Memory 1 System Integration 3 System Timing 4 System Interfaces Value Extends Beyond the Socket

6 Redefining the FPGA 50,000 to 1,000,000 System Gates High Performance
Extremely Dense 1,728 to 27,648 Logic Cells Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs System Integration 1 2ns Vector Based Interconnect Predictable Routing Delays Produce a Core Friendly Architecture With Fast Place & Route Times High Performance Routing

7 Redefining the FPGA Bytes Kilobytes Megabytes System Memory 2
200 MHz Distributed SelectRAM Bytes RAMB4_S4_S16 WEB ENB RSTB CLKB ADDRB[7:0] DIB[15:0] WEA ENA RSTA CLKA ADDRA[9:0] DIA[3:0] DOA[3:0] DOB[15:0] Kilobytes 200 MHz Block SelectRAM 200 MHz Access to External Memory Megabytes

8 Redefining the FPGA Multiplication, Division & Phase Generation
CLKDLL CLKIN CLKFB RST CLK0 CLK90 CLK180 CLK270 CLK2X CLKDV LOCKED Multiplication, Division & Phase Generation 45 MHz (Divide by 2) DLL 90 MHz 180 MHz (Multiply by 2) 3 System Timing DLL CLK Virtex ZeroDelay Clock Distribution & System Synchronization Route to Other Devices

9 Redefining the FPGA External Devices Backplanes System Interfaces 5.0V
PCI SSTL HSTL SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds Backplanes GTL GTL+ AGP Future Standards Can be Supported Without Having to Make Silicon Changes 4 System Interfaces

10 Redefining the FPGA System Integration System Memory System Timing
1 System Integration Intellectual Property is Critical for High Density Design & Must Drop in Easily Without Penalty Across an Entire Family System Memory Memory Bandwidth is Always Key Size & Depth Requirements Vary Depending on the Application System Timing Chip to Chip Performance Typically Limits System Speeds Clock Skew is an Important Factor in High Performance Systems System Interfaces Process Technology Leads to Mixed Voltage Systems High performance, Lower Power Signal Standards Have Emerged 2 3 4

11 Redefining the FPGA Virtex New Modules IP Modules Design Reuse
VHDL Design Environment Verilog Design CoreGen Designer #2 DSP FIFO Designer #1 New Modules 160 MHz I/O 133 MHz Memory 1 Million+ System Gates Virtex IP Modules 66 MHz PCI 133 MHz SDRAM Gigabit Ethernet AllianceCore LogiCore CPU Design Reuse

12 Redefining the FPGA Extremely Dense System Performance & Features
50,000 to 1,000,000 System Gates 1,728 to 27,648 Logic Cells System Performance & Features 160 MHz+ System Performance Multiple DLLs & Block SelectRAM Supports Multiple I/O Standards Internal Performance & Features 100 MHz+ at 3 to 4 Logic Levels TBUFs & Distributed SelectRAM Superior Intellectual Property Infrastructure - CoreGen & Web Proven Software Flows for High Density & Performance - M1.5 Segmented Routing 4-Input LUT Architecture Fast, Flexible I/Os System Building Blocks Software IP Leading Edge Process Technology The World’s First Fully Programmable System-Level Architecture

13 Architecture Overview
2ns Vector Based Interconnect 1 The CLB Tile RAMB4_S4_S16 WEB ENB RSTB CLKB ADDRB[7:0] DIB[15:0] WEA ENA RSTA CLKA ADDRA[9:0] DIA[3:0] DOA[3:0] DOB[15:0] 2 Block SelectRAM CLKDLL CLKIN CLKFB RST CLK0 CLK90 CLK180 CLK270 CLK2X CLKDV LOCKED 3 DLL 5.0V 3.3V 2.5V 1.8V PCI SSTL HSTL GTL GTL+ AGP 4 SelectI/O Distributed SelectRAM SelectMAP Configuration Thermal Management

14 The CLB Tile 50,000 to 1,000,000 System Gates High Performance Routing
Extremely Dense 1,728 to 27,648 Logic Cells Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs 1 System Integration 2ns Vector Based Interconnect Predictable Routing Delays Produce a Core Friendly Architecture With Much Faster Place & Route Times High Performance Routing

15 The CLB Tile CLB Tile is Composed of a Switch Matrix, Configurable Logic Block, and Associated General Routing Resources All CLB Inputs Have Access to Interconnect on All 4 Sides CLB is Divided into Two Identical Slices Wide Single CLB Functions Slices Have a Bit Pitch of 2 Fast Local Feedback Within the CLB & Direct Connects to Adjacent Horizontal Neighbors DIRECT CONNECT INTERNAL BUSSES

16 Simplified CLB Structure
Slice LUT Carry D Q CE PRE CLR 2 Slices in Each CLB Virtex Slice is Similar in Contents to the Current XC4000 CLB 2 BUFTs Associated with Each CLB, Accessible by All 8 CLB Outputs

17 Detailed Slice Structure
LUT/RAM/ROM/SHIFT A1 A2 A3 A4 WS DI O Write Strobe Logic F4 F3 F2 F1 D Q S R CE 1 GSR G1 G2 G3 G4 BY SR CLK BX F5 from other slice * Position of F5 tap on COUT CIN YB Y YQ XB X XQ * Controlled by the same pair of memory cells ** Implemented as extra inputs on the BX input mux *** CLK and SR inputs are common to both slices Data In Multiplex

18 Wide Single CLB Functions
Slice 1.1ns LUT CLB 0.3ns 2.5ns Implement 13-Input Functions in a Single CLB Builds on XC4000 Architecture 9-Input Function 2 Logic Levels and 1 Local Interconnect Yield a 2.5ns Max Delay

19 Slice Features Two 4-Input LUTs in Each Slice
Includes 2 Highly Flexible Sequential Elements Dedicated Logic for 4x1 & 8x1 Muxes Fast Look Ahead Carry Logic Dedicated Multiplier Fabric New SelectShift Feature Create Shift Registers up to 16 Cycles Deep in a Single 4-Input LUT 4-Input LUTs can be used as Distributed SelectRAM Same as XC4000 Synchronous Modes - Single & Dual Port

20 Flexible Sequential Elements
D CE PRE CLR Q FDCPE S R FDRSE LDCPE G Sequential Elements Can be Flip-flops or Latches 2 in Each Slice, 4 in Each CLB Can be Sourced from LUTs or an Independent CLB Input Separate Set & Reset Controls Controls Can be Synchronous or Asynchronous GSR Can be Used for Power On Set/Reset All Controls Can be Inverted Controls are Shared Within Each Slice

21 Fast Efficient Muxes Primary Use of XC4000 HMAP was to Implement a 2x1 Mux Dedicated Muxes are Faster & More Space Efficient Space Freed Up is Used for Muxes & Other Special Logic MUXF5 Can be Used to Combine the Two LUTs in a Slice to Create a 4x1 Mux or Any Function of 5 Inputs MUXF6 Can be Used to Combine the Two Slices in a CLB to Create an 8x1 Mux or Any Function of 6 Inputs CLB MUXF6 Slice LUT MUXF5
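The MUXF5 idea on this slide can be sketched in software. The following is a minimal behavioral model, not Xilinx code: a 4-input LUT is treated as a 16-entry truth table, and a fifth input selects between two such tables (a Shannon expansion), which is why two LUTs plus one mux can realize any 5-input function or a 4x1 mux. All function names here (`make_lut4`, `lut4`, `muxf5`, `split_5_input`, `mux4x1`) are hypothetical and exist only for illustration.

```python
from itertools import product


def make_lut4(func):
    """Build the 16-entry truth table of a 4-input boolean function."""
    return [func(*bits) & 1 for bits in product((0, 1), repeat=4)]


def lut4(table, a, b, c, d):
    """Look up one output bit; input a is the most significant index bit."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


def muxf5(lut_f, lut_g, sel, a, b, c, d):
    """Model of a dedicated mux choosing between two LUT outputs."""
    return lut4(lut_g, a, b, c, d) if sel else lut4(lut_f, a, b, c, d)


def split_5_input(func5):
    """Shannon expansion: f(s, a, b, c, d) = f(1, ...) if s else f(0, ...)."""
    lut_f = make_lut4(lambda a, b, c, d: func5(0, a, b, c, d))
    lut_g = make_lut4(lambda a, b, c, d: func5(1, a, b, c, d))
    return lut_f, lut_g


def mux4x1(d0, d1, d2, d3, s1, s0):
    """4x1 mux: each LUT acts as a 2x1 mux, the F5 mux picks between them."""
    lut_2x1 = make_lut4(lambda x0, x1, s, _unused: x1 if s else x0)
    low = lut4(lut_2x1, d0, d1, s0, 0)
    high = lut4(lut_2x1, d2, d3, s0, 0)
    return high if s1 else low
```

For example, 5-input parity can be split with `split_5_input(lambda s, a, b, c, d: s ^ a ^ b ^ c ^ d)` and evaluated through `muxf5`. The same construction repeated one level up, with a sixth select input, corresponds to the MUXF6 case described above.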

22 Fast Look Ahead Carry Logic
Simple, Fast & Complete Arithmetic Logic Vertical, Up Only Carry Direction Look Ahead Carry Implementation Yields 32-Bit Counters & Arithmetic Functions that Perform at 100MHz+ Discrete XOR Component for Single Level Sum Completion 2 Separate Carry Chains in CLB Allow for 3 Operand Functions

23 Dedicated Multiplier Fabric
CO DI CI S LUT CY_MUX CY_XOR MULT_AND A B A x B Highly Efficient ‘Shift & Add’ Implementation Logic Added for Implementation of Binary Tree Style Multipliers 30% Reduction in Area for a 16x16 Multiply & 1 Less Logic Level

24 SelectShift Dynamically Addressable Shift Registers - DASRs
Ultra-Efficient Programmable Clock Cycle Delay Serial In, Serial Out, Clock, Clock Enable, and Shift Depth Address Single LUT Maximum Cycle Delay of 16 Cascade DASRs for Cycle Delays Greater than 16 CLB Flip-Flops Can be Used for Other Functions or to Add to DASR Depth D Q CE LUT IN CLK DEPTH[3:0] OUT Slice CLB
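As a rough illustration of the dynamically addressable shift register described on this slide, here is a small behavioral model. It is a sketch under stated assumptions, not a description of the silicon: the class name `SelectShiftModel` and the helper `delayed` are hypothetical, and the model assumes that a depth address of 0 selects a one-cycle delay and an address of 15 selects the 16-cycle maximum.

```python
class SelectShiftModel:
    """Behavioral model of one LUT used as a 16-stage shift register."""

    STAGES = 16

    def __init__(self):
        self.stages = [0] * self.STAGES

    def clock(self, data_in, clock_enable=1):
        """Shift one bit in on a clock edge when the clock enable is high."""
        if clock_enable:
            self.stages = [data_in & 1] + self.stages[:-1]

    def out(self, depth):
        """Read the tap selected by the 4-bit depth address (0..15).

        Assumption: the selected delay is depth + 1 clock cycles.
        """
        if not 0 <= depth < self.STAGES:
            raise ValueError("depth address must be in 0..15")
        return self.stages[depth]


def delayed(samples, depth):
    """Feed a bit sequence through the model, sampling after each clock."""
    srl = SelectShiftModel()
    result = []
    for sample in samples:
        srl.clock(sample)
        result.append(srl.out(depth))
    return result
```

Changing the `depth` argument between clocks models the dynamic addressing: the stored bits are untouched and only the output tap moves.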

25 SelectShift 64 Operation A 4 Cycles 8 Cycles Operation B 3 Cycles Operation C 12 Cycles 9-Cycle Imbalance Register Rich FPGAs Allow for the Addition of Pipeline Stages to Increase Throughput Data Paths Must be Balanced to Maintain Desired Functionality

26 SelectShift 12 Cycles 64 Operation A 4 Cycles 8 Cycles Operation B 3 Cycles Operation C Paths Statically Balanced 9 Cycles Operation D - NOP SelectShift Feature of the 4-Input LUT Can be Used to Create NOPs Above Example Uses 64 LUTs to Replace 576 Flip-flops (64*9)
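The flip-flop count quoted on this slide follows from simple arithmetic, sketched below. The function name `balancing_cost` is hypothetical, and the sketch assumes the figures on the slide describe a 64-bit data path with a 12-cycle path and a 3-cycle path, and that one LUT covers up to 16 cycles of delay per bit.

```python
def balancing_cost(bus_width, long_path_cycles, short_path_cycles, max_lut_depth=16):
    """Compare the cost of balancing two pipelines with flip-flops versus LUTs.

    Returns (imbalance in cycles, flip-flops needed, shift-register LUTs needed).
    """
    imbalance = long_path_cycles - short_path_cycles
    flip_flops = bus_width * imbalance
    luts_per_bit = -(-imbalance // max_lut_depth)  # ceiling division
    luts = bus_width * luts_per_bit
    return imbalance, flip_flops, luts


# Figures from the slide: 64 bits wide, 12-cycle path versus 3-cycle path.
print(balancing_cost(64, 12, 3))
```

Under these assumptions the result is a 9-cycle imbalance, 576 flip-flops, and 64 LUTs, which matches the 64*9 figure given above.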

27 SelectShift (continued)
64 Operation A 4 Cycles 8 Cycles Operation B 3 Cycles Operation C 12 Cycles Paths Dynamically Balanced 1/10 Cycles Operation D - NOP # NOP Cycles SelectShift Depth Can be Dynamically Changed Above uses 64 LUTs to Replace 704 Flip-flops & 64 2x1 Muxes Paths Statically Balanced

28 Internal Bus Support One Pair of BUFTs Associated with Each CLB
Same ‘Pitch’ as Slice Carry Logic - 2 Bits/Slice Each BUFT has an Independent Control Input All CLB Outputs can Source Either BUFT Data Input Combine BUFTs to Create Wide Muxes Replace LUT Based Mux Logic to Increase Density Much Faster than Previous Architectures Approximately 10ns to Span Entire XCV Columns Ties Groups of 4 BUFTs with Bi-directional Look Ahead Scheme Similar to Slice Carry Logic

29 Internal Bus Support And-Or Implementation Replaces Three-State Drivers Simultaneously Driving BUFTs will not Cause Contention Capacitance of Entire Load Reduced Dramatically Slow, Power Hungry Pullups & Weak Keepers Unnecessary Output Flexibility Removal of Pullups Allows for Outputs to Span Rows Segments of 4 Columns Allow for Many Outputs Per Row

30 High Performance Routing
2ns Vector Based Interconnect CLB Array General Purpose Routing Routing Delay Depends on Radial Distance Routing Structure Designed to Handle High Fanout Nets 1000+ Loads - Sub 10ns Much More Predictable Predictability is Critical for Core Integration & Reuse Optimized for 5 Layer Metal

31 High Performance Routing
Segmented Routing Architecture Allows For Optimal Connection Delay, Power, Capacitance & Resource Utilization Combined With Timing Driven Place & Route Yields Superior Path Delays Increasing Device Utilization Does Not Decrease Design Performance Resource Mix Optimized for Large Devices - Optimized for 5 LM Algorithmically Friendly Structure Significant Compile Time Reduction Without Performance Penalty DIRECT CONNECT INTERNAL BUSSES

32 High Performance Routing
Advanced Local CLB Routing Massive Hierarchical General Routing Resources Designed For Speed 24 Singles, 72 Hexes, 12 Longs per Tile (4KXL: 8 Singles, 4 Doubles, 12 Quads, 12 Longs per Tile) Selective Connectivity Between Resource Types to Limit Loading Longs and Hexes Can be Used as Secondary Global Resources for Clocks and Controls With Sub 10ns Delays Special Backbone Routing in Top and Bottom I/O Edges to Connect Vertical Longs to Create Low Skew Resources Increased Switch Matrix Connectivity Higher Connectivity Eliminates Congestion

33 Advanced Local CLB Routing
Slice LUT Each LUT Output Can Connect to the Three Other LUTs 100ps to 300ps Maximum Delay Create 13-Input Functions Within the Same CLB - 2.5ns Total Delay Synthesis Tools Use FastConnects on Critical Paths IMUX Receives 96 Connections from General Routing Matrix (GRM) Highly Exhaustive Connection Matrix OMUX Equivalent to 8-bit 13x1 Mux All 8 Outputs Connect to the GRM 2 Outputs Can be Used to Connect Directly to the Horizontal Neighbors All Outputs Can Feed the 2 BUFTs

34 Massive Hierarchical Resources
Routing Needs Based On XCV1000 Loading of Resources Minimized While Connectivity Increased Both Long Lines & Hexes are Buffered To Reduce RC Delays Longs Have Access Every 6 Tiles Hexes Have Access at Ends & Middle Special Hexes Added to Top and Bottom to Create High Fanout Resources with Vertical Long Lines Horizontal Singles Connect Directly to Vertical Long Lines for Fast Control Signal Distribution

35 Increased Matrix Connectivity
Previous Families Use Planar Pipulation Allows for Routing Along Same Channel Restricts Connectivity of Dissimilar Resources Virtex Devices Use Non-Planar Pipulation Allows for Routing Across Resource Types Longs Drive Hexes, Hexes Drive Hexes and Singles, Singles Drive Singles and CLB IMUXs - Vertical Hexes Drive CLB Control Inputs As Well CLB OMUXs Drive All Types Switch Matrix Connectivity Determines Design Routability Increased Switch Matrix Connectivity Alleviates Congestion Planar Pipulation Non-Planar Pipulation

36 SelectRAM+ Bytes Kilobytes Megabytes System Memory 2
200 MHz Distributed SelectRAM Bytes RAMB4_S4_S16 WEB ENB RSTB CLKB ADDRB[7:0] DIB[15:0] WEA ENA RSTA CLKA ADDRA[9:0] DIA[3:0] DOA[3:0] DOB[15:0] Kilobytes 200 MHz Block SelectRAM 200 MHz Access to External Memory Megabytes

37 SelectRAM+ Hierarchy Distributed SelectRAM Block SelectRAM
Proven Synchronous RAM of the XC4000 Families 16x1 Implemented in a LUT - 4 in Each CLB 32x1 Implemented in a Slice - 2 in Each CLB Ideal for DSP Applications Block SelectRAM True Dual Port, Fully Synchronous RAM 4096-Bit Block Configurable in Widths From 1 to 16 Ideal for Data Buffers & FIFOs Fast Access to External RAM 133MHz Direct Interface to SSTL3, 3.3V Synchronous DRAM

38 Distributed SelectRAM
RAM16X1S O D WE WCLK A0 A1 A2 A3 LUT Builds on XC4000 Tradition Synchronous Write Asynchronous Read No Asynchronous Write Use a Single LUT to Create a RAM16X1S Use a Pair of LUTs to Create a RAM32X1S or RAM16X1D RAM16X1D Comes With One R/W Address & One Read Only Address Accompanying Flip-Flops Can Be Used to Register Read RAM32X1S O D WE WCLK A0 A1 A2 A3 A4 RAM16X1D SPO DPRA0 DPO DPRA1 DPRA2 DPRA3 Slice LUT

39 Block SelectRAM True Dual Port Synchronous RAM
2 R/W Ports with Independent Controls Synchronous Read & Write Block Count Increases With FPGA Size 8 Blocks in the XCV Kb 32 Blocks in the XCV Kb Located on Left & Right Sides with 1 Block Every 4 Rows Flexible 4096-Bit Block Variable Aspect Ratio Each Port can be a Different Width Synchronous Reset & INIT Values State Machines, Decodes, Etc Sub-10ns Cycle Time For All Widths RAMB4_S#_S# WEB ENB RSTB CLKB ADDRB[#:0] DIB[#:0] WEA ENA RSTA CLKA ADDRA[#:0] DIA[#:0] DOA[#:0] DOB[#:0] Allowed Widths

40 Block SelectRAM RAMB4_S4_S16 Library Name Specifies Port Configuration DOA[3:0] DOB[15:0] WEA ENA RSTA ADDRA[9:0] CLKA DIA[3:0] WEB ENB RSTB ADDRB[7:0] CLKB DIB[15:0] Port A Out 4-Bit Width Port B In 256-Bit Depth Port A In 1K-Bit Depth Port B Out 16-Bit Width Each Dual Port can be configured with a different width

41 Block SelectRAM The Dual Ports Access the Same 4096 Bits
4096-Bit Storage When Viewed by a Port Configured as 1kx4 The Dual Ports Access the Same 4096 Bits Combine Blocks For Additional Depth & Width The Depth/Width Ratio Determines How the Bits are Accessed For Example: A RAMB4_S4_S16 Has a 1kx4 Port & a 256x16 Port Provides Easy Data Width Conversion Without Any Additional Logic 4096-Bit Storage When Viewed by a Port Configured as 256x16
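The width conversion described on this slide can be illustrated with a small model in which two ports of different widths view the same 4096 bits. This is a sketch, not vendor code: the class name `BlockRamModel` is hypothetical, the set of allowed widths (1, 2, 4, 8, 16) is an assumption since the slides only say "from 1 to 16", and the bit ordering (a word at address `addr` occupies bits `addr * width` upward, least significant bit first) is likewise assumed for illustration.

```python
class BlockRamModel:
    """Two differently sized ports sharing one 4096-bit storage array."""

    TOTAL_BITS = 4096
    ALLOWED_WIDTHS = (1, 2, 4, 8, 16)  # assumed set of port widths

    def __init__(self):
        self.bits = [0] * self.TOTAL_BITS

    @classmethod
    def depth(cls, width):
        """Number of addressable words for a port of the given width."""
        return cls.TOTAL_BITS // width

    def _check(self, width, addr):
        if width not in self.ALLOWED_WIDTHS:
            raise ValueError("unsupported port width")
        if not 0 <= addr < self.depth(width):
            raise ValueError("address out of range for this width")

    def write(self, width, addr, value):
        """Write one word through a port of the given width."""
        self._check(width, addr)
        base = addr * width
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, addr):
        """Read one word through a port of the given width."""
        self._check(width, addr)
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = BlockRamModel()
ram.write(16, 3, 0xBEEF)                      # 256x16 port writes one word
nibbles = [ram.read(4, a) for a in range(12, 16)]  # 1kx4 port reads it back
```

In this model the 16-bit word written at address 3 comes back through the 4-bit port as four nibbles at addresses 12 to 15, low nibble first, with no extra logic between the ports.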

42 Block SelectRAM Build State Machines & PROM Based Address Decodes
4093 4094 4095 0000 0001 0002 FFFXXXXX FFEXXXXX FFDXXXXX 002XXXXX 001XXXXX 000XXXXX Subdivide 32-Bit Address Space into 4096 1MB Blocks Using a DLL, the Enable is Available Only 5.1ns After the Rising Edge of the External System Clock Enable 1 Clock A[31:20] N/C RAMB4_S1 WE EN RST CLK ADDR[11:0] DI[7:0] DO Build State Machines & PROM Based Address Decodes
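The address-decode use of a 4096x1 block can be sketched as a lookup table indexed by the top 12 address bits, A[31:20], so that each entry covers one 1 MB block of the 32-bit address space. The helper names `build_decode_table` and `chip_enable` and the example address range are hypothetical and chosen only to illustrate the idea.

```python
BLOCK_SHIFT = 20          # 1 MB = 2**20 bytes, so A[31:20] selects the block
NUM_BLOCKS = 1 << 12      # 4096 one-bit entries, one per 1 MB block


def build_decode_table(enabled_ranges):
    """Build the 4096x1 table contents from inclusive byte-address ranges."""
    table = [0] * NUM_BLOCKS
    for start, end in enabled_ranges:
        first = start >> BLOCK_SHIFT
        last = end >> BLOCK_SHIFT
        for block in range(first, last + 1):
            table[block] = 1
    return table


def chip_enable(table, address):
    """Return the enable bit for a 32-bit address using only A[31:20]."""
    return table[(address >> BLOCK_SHIFT) & 0xFFF]


# Hypothetical example: enable a peripheral mapped at 0x00100000-0x002FFFFF.
table = build_decode_table([(0x00100000, 0x002FFFFF)])
```

Because the decode is stored as memory contents, changing the address map means changing the table initialization rather than the logic.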

43 Clocking & DLLs Multiplication, Division & Phase Generation
CLKDLL CLKIN CLKFB RST CLK0 CLK90 CLK180 CLK270 CLK2X CLKDV LOCKED Multiplication, Division & Phase Generation 45 MHz (Divide by 2) DLL 90 MHz 180 MHz (Multiply by 2) 3 System Timing DLL CLK Virtex ZeroDelay Clock Distribution & System Synchronization Route to Other Devices

44 General Clock Support 4 Dedicated Global Low Skew Buffers
Dedicated Input Pin - Intended to Distribute Clocks Only 66 MHz PCI Performance With 500ps Maximum Skew 3ns TSetup /0ns THold - Input IOB Flip-flop with No Data Delay 6ns TClock2Out - Output IOB Flip-flop 24 Additional Shared Resources Intended to Distribute Low Skew/High Fanout Signals Distribute Control Signals Across the Device under 10ns additional clocks, clock enables, three-state controls & resets 4 Delay Lock Loops on Each Device 100% Digital Implementation 2 Global Buffers Associated with Each DLL Pair

45 DLLs Versus PLLs Both types are used to remove clock delay & provide additional clocking functionality Frequency synthesis, Phase adjustment & clock conditioning Both can be implemented using either analog or digital logic CLKIN CLKOUT Programmable Delay Line Control Logic CLKFB Clock Distribution Oscillator DLLs use Programmable Delay Line in Conjunction with Control Logic that Selects the Delay to Match the Distribution PLLs use Programmable Oscillators in Conjunction with Phase Detectors & Filters to Phase Adjust the Clock

46 DLLs Versus PLLs The Oscillator Used in a PLL Inherently Introduces Instability & Phase Error The DLL Architecture is Unconditionally Stable and Does Not Accumulate Phase Error It is Generally Accepted that DLLs are Better for Delay Compensation and Clock Conditioning PLLs Typically Have an Advantage When Performing Frequency Synthesis and Can Operate Over a Larger Input Clock Frequency

47 DLL Functions Virtex Clock Phase Synthesis
For Use Internally Or Externally Clock Mirror Zero-Delay Board Clock Buffer Virtex Speedup Tc2o Zero-Delay Internal Clock Buffer Clock Multiplication & Division

48 DLL Functions Speedup Tc2o by Eliminating Clock Distribution Delay
Generate Phase Shifted Clocks Perform Clock Multiplication & Division Cleanup Clocks with 50/50 Duty Cycle Correction Generate Clock Lock for Internal & External Use Can Require Configuration to Synchronize with DLL Lock DLL Feedback can be Connected Internally or Externally Can be Used to Create Clock Mirrors & Perform System Synchronization

49 DLL Tc2o Speedup Nullify Clock Delay - Fast Tc2o on XCV1000
D Q > Tc2q + Tout = Tc2o CLKext OUT CLKint Tclock = 0ns Nullify Clock Delay - Fast Tc2o on XCV1000 External CLKext pin and Internal CLKint pin are Aligned 2.5ns Setup/0.0ns Hold & 3.5ns Tc2o on All Devices Optional Duty Cycle Correction 50/50 Duty Cycle Correction Applied when Specified Not sensitive to clock input noise - use standard cans

50 DLL Phase Shift Coarse Phase Shifts Available 0°, 90°, 180°, and 270°
Available for Internal & External Use 50/50 Duty Cycle Correction Available 100MHz - 180° Phase Shift 100 MHz (0 Phase) 100 MHz (180° Shift) DLL

51 DLL Multiplication Generate 2x & 4x Clocks
Data Buffer IO 16 Internal Logic 32 CLK 2x x DLL Generate 2x & 4x Clocks Reduce Board EMI and Trace Concerns by Routing Low Frequency Clocks Externally and Multiplying Internally Cross Clock Domains Without Worry Multiplied & Divided Clocks Have Synchronized Edges No External Clock Drift & Minimal External Clock Skew - Eliminates Metastable Events

52 DLL Multiplication 2 DLLs on Top & Bottom
Use 1 DLL on an Edge for 2x Multiplication or Both for 4x Multiplication 180 MHz Maximum Output Frequency 66MHz - 2x Clock Multiplication 66 MHz 132 MHz (Multiply by 2) DLL

53 DLL Division Selectable Division Values 1.5, 2, 2.5, 3, 4, 5, 8, or 16
Input 180 2X DV2 Selectable Division Values 1.5, 2, 2.5, 3, 4, 5, 8, or 16 50/50 Duty Cycle Correction Available Use DLL Pair to Combine Functions 30 MHz - 180° Phase Shift 15 MHz (Divide by 2) DLL 30 MHz 180° Phase Shift - Clock Multiply & Clock Divide 30 MHz (180° Shift) 60 MHz (Multiply by 2) 30 MHz (180° Shift) Used for FB 30 MHz

54 Clock Mirrors Generate Clock Mirrors for Cascaded & Other Devices
Extremely Low Output Skew Rising Edge Skew -20ps* Falling Edge Skew +40ps* Input Output *Actual Device Measurements 100MHz - 100MHz Clock Mirror 100 MHz LVTTL DLL Feedback from External Trace

55 System Synchronization
Synchronize All Devices Eliminate Clock Skew Nullify Clock Input & Board Delay in Addition to Internal Distribution Delay Chip to Chip Race Conditions Removed Increase Chip to Chip Interface Speed - 160MHz DLL FPGA 3 FPGA N CLK FPGA 1 FPGA 2

56 DLL Modes Low Frequency High Frequency
Input Frequency Range - 25 MHz to 100 MHz Minimum High/Low Time ns All 6 Outputs Available for use Internally & Externally CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV High Frequency Input Frequency Range - 60 MHz to 200 MHz 3 Outputs Available for use Internally & Externally CLK0, CLK180 & CLKDV Both Modes Supported with Simple Design Primitives VHDL & Verilog Simulation Support Available
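The two DLL modes and their outputs can be summarized in a short calculation. This is a sketch built only from the figures on these slides (input ranges, the available outputs per mode, and the list of division values); the function name `dll_outputs` is hypothetical, phase-shifted outputs are represented only by their frequency, and the 180 MHz maximum output frequency mentioned earlier is not modeled.

```python
DIVIDE_VALUES = (1.5, 2, 2.5, 3, 4, 5, 8, 16)


def dll_outputs(f_in_mhz, divide=2, high_frequency=False):
    """Return the output frequencies (MHz) available from one DLL."""
    if divide not in DIVIDE_VALUES:
        raise ValueError("unsupported division value")
    low, high = (60, 200) if high_frequency else (25, 100)
    if not low <= f_in_mhz <= high:
        raise ValueError("input frequency outside the range for this mode")
    # Outputs available in both modes.
    outputs = {
        "CLK0": f_in_mhz,
        "CLK180": f_in_mhz,
        "CLKDV": f_in_mhz / divide,
    }
    # Outputs available only in low-frequency mode.
    if not high_frequency:
        outputs["CLK90"] = f_in_mhz
        outputs["CLK270"] = f_in_mhz
        outputs["CLK2X"] = 2 * f_in_mhz
    return outputs
```

Calling `dll_outputs(90)` reproduces the example used earlier in the deck: a 90 MHz input gives 45 MHz on CLKDV (divide by 2) and 180 MHz on CLK2X.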

57 DLL Software Support Use BUFGDLL Macro for Common Clock Usage
Build Complex Structures Using CLKDLL Primitive DLL FB IBUFG BUFG PAD To distributed clock network 0ns BUFGDLL Equivalent Structure CLKDLL CLKIN CLKFB RST CLK0 CLK90 CLK180 CLK270 CLK2X CLKDV LOCKED

58 SelectI/O External Devices Backplanes System Interfaces 5.0V 3.3V 2.5V
PCI SSTL HSTL SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds Backplanes GTL GTL+ AGP Future Standards Can be Supported Without Having to Make Silicon Changes 4 System Interfaces

59 Supply Voltage Migration
Lower cost Faster speed Higher density Lower power Feature Size (µm) 0.2 0.4 0.6 0.8 1990 1992 1994 1996 1998 2000 2002 5.0 3.3 2.5 1.8 1.3 Voltage Virtex FPGAs Ship 1.0 1.2 Process Technology Migration Leads to Mixed Voltage Systems

60 Supply Voltage Migration
Any 5 V device (XC4000E) Virtex & XC4000XV 2.5 V logic 3.3 V I/O 3.3 V (XC4000XL) 2.5 V I/O Supply Logic Meets TTL Levels Accepts 5 V levels Supply Voltage Sequencing Independent Virtex Supports Additional I/O Standards

61 SelectI/O Allows Connection & Use of a Wide Variety of Devices
Processors, Memory, Bus Specific Standards, Mixed Signal... Provides Industry Standard IEEE/JEDEC I/O Standards Maximizes Speed/Noise Tradeoff - Use Only What is Needed Can Connect to or Create High Performance Backplanes PCI, GTL+, HSTL DIY - Virtex Based Backplane Design in Progress Define I/O by Simply Placing Desired Input And/Or Output Buffers Into the Design Special IBUF and OBUF Components Provided in Schematic Based and HDL Based Design Flows For Example: SSTL3, Class I Output Buffer - OBUF_SSTL3_I

62 Simplified IOB Structure
Fast I/O Drivers Separate Registers for Input, Output & Three-State Control Asynchronous Set or Reset Available on Each Flip-flop Common Clock, Separate Clock Enables Programmable Slew Rate, Pullup, Input Delay, Etc Selectable I/O Standard Support Supported Standards List can be Updated After Testing D CE S/R Q DFF/LATCH PAD

63 How It Works SSTL3 Class1 Output Driver SSTL3 Class1 Input Receiver
Configuration Bits SelectI/O Output OBUF_SSTL3_I IBUF_SSTL3_I SelectI/O Input SSTL3 Class1 Input Receiver

64 How It Works Separate I/O & Core Supply Rails
Programmable Driver Strength P & N Drivers Individually Controlled 16 Different Settings for Each Variable I/O & Vref Voltages 8 Banks on Each Device Specific I/Os are Used as Reference Inputs Differential Inputs Supported nMOS for High Vref pMOS for Low Vref VCCO

65 Currently Supported Standards

66 I/O Performance I/O Standard
*DLLs Used to Eliminate Clock Distribution Delay

67 SelectI/O Banks BANK 0 BANK 1 BANK 6 BANK 7 BANK 3 BANK 2 BANK 5

68 SelectI/O Banks Each Device is Broken into 8 Banks Regardless of Size
2 Banks on Each Side of the Device Each Bank has Voltage Sources Shared Among Associated I/Os in that Bank All I/O Requiring a Voltage Source Must be of the Same Type Input Banking - Vref I/O Standards Which use a Differential Amplifier Require a Voltage Reference Input All Fixed Location/Dual Purpose Vref Inputs in a Bank Must be Used When Supplying a Voltage Reference Output Banking - Vcco Dedicated Pins provide drive source voltage for output pins

69 SelectI/O Input Banks 1 Voltage Reference can be Supplied in a Bank
Any Input Not Requiring a Vref Can be Placed in the Bank Flexible Use of Voltage Reference Inputs Pins Can be Used as General Purpose I/O If a Voltage Reference is Not Needed - All Must be Used to Supply a Voltage Reference Locations are Fixed for Each Device/Package Combination Any Single Output Buffer Type Can be Placed in the Bank Multiple Output Buffer Types Must Adhere to Output Bank Rules OBUFTs with Keeper Circuits Requiring a Voltage Reference are Treated as IOBUFs

70 SelectI/O Output Banks
Only One Vcc Output is Supplied to Each Bank Any Output Not Requiring Use of the Vcc Output can be Placed in the Bank Any Single Input Buffer Type Can be Placed in the Bank Multiple Input Buffer Types Must Adhere to Input Bank Rules Special Consideration Must be Given to Configuration I/O Configuration I/O is Located on the Right Side of the Device Serial PROM Downloads Require Vcco Set to 3.3V In Banks 2 & 3 Non-PROM Serial Downloads will generate warning (Even though Vcco Connection dependent on data source)

71 Thermal Management

72 Thermal Challenge Ambient Temp Data Demands Heat Sinking Vcc Tolerance Virtex XCV M Transistors* 100+ MHz Advanced Signal Processing Apps 20W+ Power Dissipation * Pentium II = 7.5 Million Transistors Today’s FPGA Density is Absorbing Large Percentages of Board Designs Because of its Highly Dynamic Nature, Power Can Only be Estimated Before Design Completion Even as Voltages Decrease, Power Consumption is a Major Concern How do I Know My Die Temp is Within Spec?

73 Thermal Solution Remote Die Sensor System Management is Now Possible
Maxim MAX1617 Virtex SMBCLK SMBDATA ALERT* DXP DXN 2-Pin SMBus Serial Interface Interrupt Remote Die Sensor Specially Designed to be Used With the Maxim MAX1617 Simple 2-Pin Interface with no Calibration Required Provides Two Channels FPGA Die Temp Reported from -40°C to +125°C at +/- 3°C Maxim Die Temp also at +/- 3°C Programmable Over-Temp & Under-Temp Alarms Same Technology as Pentium II System Management is Now Possible

74 SelectMAP

75 Advanced Configuration
Master/Slave Serial JTAG SelectMAP Virtex Simple Serial Interface System Integrated Serial High Performance Parallel Simplified Configuration Mode Set 50 Megabyte/Second Download Rate Using SelectMAP Dedicated JTAG Port - No Contention Issues No Master Parallel Support Direct, JTAG & SelectMAP Device Readback

76 Software & Cores Support

77 HDL Design Entry Focus Synthesis Support is Critical for Large Designs
Architecture Decisions Made Based on Synthesis Tool Tendencies Xilinx Relationships With Synthesis Vendors Initiated Direct 4-Input LUT & Carry Chain Synthesis - The Building Blocks of XL & Virtex Xilinx Will Continue to Drive Synthesis Vendors to Support Virtex Specific Features - Block SelectRAM, SelectShift & CLKDLLs Virtex Architecture Adds Additional Resources That Synthesis Vendors Easily Synthesize To Today Implementation Software Written With Synthesis Tool Flow Focus All Three Major Synthesis Vendors Supported Virtex for Beta Large Designs Also Require Team Based Design Must be able to Support Multiple Designers on the Same Device as Well as Core Integration

78 Implementation Software
Virtex Software is built on proven M1 technology Builds on Robust Integration with Third Party Design Entry Tools Emphasizes Constraint Driven Design Philosophy Vector Based Interconnect Yields More Predictable Routing Results Predictable Results Allow the Placement Algorithms to Make Better Routing Estimations in Much Less Time Architecture fully software tested before 1st silicon Virtex Implementation Software Was Available 18 Months Before Actual Silicon was Produced Used Proven Place & Route Software as a Gauge of the Architecture’s Ability to Meet Density & Performance Needs Early Software Allowed for Changes to be Made in the Finalization of the Architecture - Necessary Routing Mix, Special Features, etc

79 A System Level Solution
2 System Memory 1 System Integration 3 System Timing 4 System Interfaces Virtex is a True System Level Solution

80 A System Level Solution
1 Virtex Opens New System Level Applications to FPGAs Extremely Dense - 50,000 to 1,000,000 System Gates Flexible Architecture Efficient for Random Logic, Memory, DSP & Data Path Circuits Automatically Implemented by Today’s Leading Synthesis Vendors Vector Based Interconnect Much More Predictable Before Place & Route Enhances Synthesis Based Flows Excellent Platform for Core Integration Software Based on Proven M1 Timing Driven Place & Route Hierarchical Memory Support SelectRAM+ Can be Used to Create Bytes or KBytes of Internal Storage and Access MBytes of Fast External Memory 2

81 A System Level Solution
3 System Speedup & Synchronization Nullify Clock Distribution Delays MHz System Performance Synthesize Clocks for Internal and External Use Synchronize Systems - Create Clock Mirrors & Nullify Board Delay Flexible System Interface Controllable Current, Input Vref and Vcco Characteristics Connect Directly to Existing & Emerging I/O Standards SelectMAP Protocol Allows for Easy Interfacing to µControllers and µProcessors 400+ Mb/sec Configuration, Verify & Debug Using a Simple 8-Bit Interface SelectMAP Port Can Remain on After Configuration JTAG Can Also be Used to Configure 4

