2 Objectives After completing this module, you will be able to: Describe the clocking resources available in the 7 series FPGAsExplain the contents of the Clock Management Tile (CMT)Add these resources to your design
4 High-Performance Clocking Modern applications have complex clocking requirementsExtremely high-performance clock signalsSupport for multiple frequency domains across a wide frequency rangeDe-skewing of clocks relative to one anotherLow jitter and precise duty cycle to maintain the widest possible data valid windowLowest possible system powerXilinx FPGAs have a rich mixture of clocking resources to accommodate these requirementsThe perfect balance of resources at the right cost
5 Clock ManagementSystems usually require multiple clock frequencies from the same sourceMinimizing the number of oscillators lowers system costExternal clock sources can often be noisyFiltering jitter cleans up clocks widening the data valid windowMany circuits need to be clocked at the same time to ensure correct operationDe-skewing and aligning clocks eliminates hold-time issues and race conditionsClkInOut0110
6 7 Series FPGAs Deliver Powerful Clock Management Global clock buffersHigh fanout clock distribution bufferLow-skew clock distributionRegional clock routingClock regionsEach clock region is 50 CLBs high and spans half the deviceClock management tile (CMT)One Mixed-Mode Clock Managers (MMCMs) and one Phase Locked Loop (PLL) in each Clock Management (CMT)Performs frequency synthesis, clock de-skew, and jitter-filteringHigh input frequency rangeSimple design creation through the Clocking WizardClock BuffersMMCMPLLClock WizardAutomatic HDL code
8 Clock-Capable Inputs (CCIO) All synchronous designs need at least one external clock referenceMany designs require several clock sourcesThese sources need to be brought into the FPGA and connected to the internal FPGA clock resourcesEvery 7 series FPGA has four clock-capable inputs in each bankThese inputs are regular I/O pins with dedicated connections to internal clock resourcesWhen not used as clock inputs, they can be used as a regular I/O pinEach clock input can be used as a single-ended clock input, or can be paired with an adjacent pin to form a differential clock inputEach bank can therefore have four single-ended or four differential clock inputsTwo of the four are Multi-Region Clock Capable (MRCC) and the other two are Single Region Clock Capable (SRCC)Clock-capable pins will have the identifier “CC” in their pin name.An IBUFG or IBUFGDS primitive can be used in your HDL code to indicate that an input must be mapped to a clock-capable pin.
9 Clock Networks An FPGA is a regular array of resources Many of these resources require clocks for synchronous operationsSlice flip-flops, input/output flip-flops, block RAMs, DSP slicesIn order to implement synchronous designs, clocks must be distributed to these clocked elementsFor synchronous operation, clocks must arrive at the clocked elements withExtremely low clock skewEnsures minimal internal hold-time issuesLow clock jitterAllows for highest performanceDuty cycle preservationImportant for Double Data Rate (DDR) applicationsLow insertion latencyImportant for synchronous input and output interfaces
10 Clock Buffer & Routing Column BUFG in Center of Device Die ViewClock Buffer & Routing ColumnBUFHBUFIOHROWBUFRLeaf Cell ClocksThe Clock Buffer and Routing Column contains only the global clock buffers and routing.The MMCMs and PLLs are located in the CMT columns that are immediately adjacent to the IO columns.The global clock buffers (BUFG) are in the middle of the chip. These drive the vertical spines of the global clock network.The horizontal spines of the global clock network run through the center of each clock region. These horizontal spines are driven by BUFH buffers.There are also regional clock routing resources driven by BUFRs.Clocks are driven up and down from the center horizontal row (HROW) of each clock region.The BUFIO are placed within the I/O column, and drive the I/O clock network in that bank.The BUFMR are dedicated buffers that allow clock inputs to drive the BUFIOs and BUFRs of adjacent regions.Transceiver ColumnBUFMRBUFG in Center of DeviceIO ColumnsClock RegionCMT Columns
11 7 Series FPGA Clock Regions Larger clock region50 CLBs high, 50 I/Os highSame size as I/O bankHalf width of device2–24 regions per deviceResources per clock region12 global clock networksDriven by BUFH4 regional clock networksDriven by BUFR4 I/O clock networksDriven by BUFIOIOB25CLB25CLB25P L LThis is a close-up view of the 7 series FPGA clock region. Each clock region is 50 CLBs high and half the width of the device.As opposed to earlier technologies, there is one IOB column per clock region. This simplifies the clock networks in the device.Each clock region contains 12 BUFH buffers that drive the global clock network.Each clock region has four regional clock networks, driven by the BUFRs within the region.Each clock region has four I/O clock networks per I/O column.M M C MIOB25CLB25CLB25
12 Global Clocking – Vertical Spines Global clock networks allow clocks to be distributed to potentially every clocked element on the dieGlobal clocks are driven by BUFGCTRL located in the middle of the dieEach BUFGCTRL drives a vertical spine in the center of the die that runs from the bottom to the top of the dieMost 7 series devices have 32 vertical spinesThe smaller Artix™-7 FPGAs only have 16Vertical Spines of the Global Clocks
13 Global Clock Buffer (BUFGCTRL) 32 BUFGCTRLs (or BUFG) reside in the center of the deviceSmaller Artix-7 FPGAs only have 16BUFGCTRLs can be driven byClock-capable I/O (CCIO) in the same halfCMT outputs in the same halfGigabit transceiver clocks in the same halfOther BUFGInterconnectBUFRBUFGCTRL outputs drive the vertical global clock spineBUFGCTRL component implementsSimple clock buffer (BUFG)Clock buffer with clock switching (BUFGMUX or BUFGMUX_CTRL)Clock buffer with clock enable (BUFGCE)BUFGCTRLOS1S0IGNORE0IGNORE1CE0CE1I1I0The BUFGCTRLs straddle the center line of the device; half the BUFGCTRLs in the top half of the die and the others are in the bottom half.Each BUFGCTRL can be driven by the CCIO, CMT (PLL or MMCM), or gigabit transceivers in the same half of the device. The BUFGCTRL in the top half of the die can only be driven by the resources in the top half of the die.As with all clocking resources, buffers driven by local interconnect have no known phase relationship with their source.
14 BUFGCTRL for Clock Multiplexing SOBUFGMUXA BUFGCTRL can be configured as a clock multiplexerAllows for glitchless switching between two clock sourcesA select signal is generated by the user’s design, which selects between two different input clocksThe selected clock is driven onto the clock networkThe BUFGMUX primitive can be used to configure a BUFGCTRL as a global clock buffer with clock multiplexingTwo flavorsBUFGMUX: S input must be synchronous to current clockBUFGMUX_CTRL: S input can change any timeThe BUFG multiplexer utilizes the S port to implement a selection. The S signal must meet setup time prior to the clock edge of the currently selected clock. Violating the setup time can result in a glitch (check your FPGA user guide for more information).Assuming setup times are met, the functionality of the BUFGMUX macro is as follows:At time T1, S (Select) transitions from high to low.Because I1 is currently high, the multiplexer waits for I1 to transition high to low.After I1 is low, the multiplexer waits for I0 to transition high to low.When I0 transitions from high to low, the outputs are switched to I0.The BUFGMUX_CTRL is similar, but there is no setup/hold requirement on S to guarantee glitchless switching. This primitive replaces the BUFGMUX_VIRTEX4 of previous technologies.I1I0SOT1T2
15 BUFGCTRL for Gated Clock (BUFGCE) A BUFGCTRL can be configured as a clock buffer with clock enable functionalityAllows the clock to turn on and off dynamicallyEnables dynamic clock gating for power savings and the generation of decimated clocksThe Clock Enable (CE) signal is generated by the user’s designThe gating is glitch and runt free if the CE is generated synchronouslyThe clock is still driven onto the clock networkThe BUFGCE is a global buffer with a built-in CEBUFGCEIOCEDisabling the clock while I is high will not disable the clock until I transitions from high to low; likewise, enabling the clock while I is high will not enable the clock until after I transitions from high to low.The BUFGCE must be instantiated.A BUFGCTRL can also be used for other functions, such as an asynchronous multiplexer or a clock multiplexer with clock enable capability. These can be accomplished by instantiating the full BUFGCTRL and managing the control inputs in your design.ICEOHeld LowEnable Clock afterHigh-to-Low Transition on I
17 Summary Clock region height = 50 CLBs, 50 IOBs Four clock-capable inputs per bankThirty-two global clock buffersTwelve global clocks per region, delivered via BUFHFour BUFIOs per I/O bankTwo BUFRs per I/O bankTwo BUFMR per I/O bankTwo columns of Clock Management Tiles (CMTs) per deviceAdjacent to I/O banksThe CMT has one Mixed-Mode Clock Manager (MMCM) and one PLL
18 Where Can I Learn More? User Guides Xilinx Education Services courses 7 Series FPGAs Clocking Resources User GuideDescribes the complete clocking structuresXilinx Education Services coursesDesigning with the 7 Series Families courseXilinx tools and architecture coursesHardware description language coursesBasic FPGA architecture, Basic HDL Coding Techniques, and other Free Videos!
23 Global Clocking – Horizontal Spines Each clock region has 12 horizontal spines for carrying global clocksClock regions on left and right have 12 independent horizontal spinesThese horizontal spines can drive all clocked resources within the regionSlice flip-flops, distributed RAMs, SRLs, block RAMs, DSP slices, IOB flip-flops, CLKDIV of ISERDES and OSERDES ...Each horizontal spine is driven by a BUFHHorizontal Spines of the Global Clocks
24 Horizontal Spine Clock Buffer (BUFH) Each of the 12 horizontal spines in a clock region is driven by a BUFHDriven byCMTs in the same regionBUFGCTRL via vertical clock spineClock-capable inputs in the same horizontal rowCan be driven from bank on the left or rightInterconnectCan be a simple clock buffer (BUFH), or a clock buffer with clock enable (BUFHCE)BUFHCE is glitchless when CE_TYPE set to “SYNC”When a global clock (from a BUFGCTRL) is required in a clock region, a BUFH is automatically inferred by the toolsBUFHIOBUFH drives a horizontal clock row into each clock region. Each clock region has 12 BUFHs.There is also a BUFHCE primitive with a clock enable input, which is helpful for turning off a local clock into a region to help minimize power consumption.If you instantiate a BUFH or BUFHCE in your design, an area constraint may be required to keep all of the clock loads confined to a single clock region.BUFHCEIOCE
25 Regional Clock Networks Each clock region has four regional clock networksRegional clocks in the left and right half are separateThese networks are in addition to the 12 horizontal spines of the global clockThese regional networks can drive all clocked resources within the regionEach horizontal spine is driven by a BUFRThe regional clock networks reach the clock inputs of all the synchronous elements within the region.Regional Clock Networks
26 Regional Clock Buffer (BUFR) Four BUFR in each clock region – one for each regional clock networkDriven byClock-capable inputs in the regionInterconnectGigabit transceivers in the same regionMMCM high-performance clocks in the same regionBUFMR in the same region or region above or belowThe output of the BUFR can be a divided version of the input clock. Dividers of 1 to 8 are supportedCLR and CE can be used to control which input clock rising edge generates the output clock rising edgeBUFRCLRCEThe relationship between BUFRs and regional clock networks in 7 series FPGAs is simpler than in previous technologies. Because there are four regional clock networks and four BUFRs per region, the relationship is one to one.BUFRs cannot directly drive regional clock networks in adjacent regions. This is accomplished using the new BUFMR. The BUFMR has features for synchronizing multiple BUFRs.÷IO
27 Clocking CLB Resources In every region there are 12 horizontal spines of the global clock network and 4 regional clocksAll 16 clocks can clock resources within the regionOnly 12 of the 16 can enter the double column of CLBs in the top and bottom half of the clock regionThus, any 12 of the 16 clocks are available for the 50 CLBs in a pair of columns above the HROW and 12 of the 16 clocks are available for the 50 CLBs in the pair of columns belowThe CLBs are arranged as symmetrical pairs of columns. Each column is 50 CLBs high, with 25 above the HROW and 25 below the HROW.The two adjacent half columns share 12 routing tracks from the HROW clocking resources. Thus, for the 50 CLBs that are in two adjacent columns above the HROW, only 12 of the 16 clocks available in the HROW can be used. Similarly, an independent set of 12 of the 16 clocks can be used for the 50 CLBs in the two half columns below the HROW.A similar restriction exists for the IOBs in the I/O column. Any 6 of the 12 clocks on the horizontal spines of the global clock can be used for the 25 IOBs above the HROW, and any 6 of the 12 can be used below. In addition to the 6 global clocks, all 4 regional clocks can be used by all IOBs (as can all 4 of the I/O clocks, shown later).12 Horizontal Spines of the Global Clock12121212124 Regional Clock Networks
28 I/O Clock Networks Each clock region has four I/O clock networks These networks can drive only the clock ports of the ILOGIC/OLOGIC resources and the high-speed clock ports (CLK) of the ISERDES/OSERDESEach I/O clock network is driven by a BUFIOHighest quality clock availableThe I/O clock networks can drive the clock pins of the IOB resources only:DDR and SDR flip-flops in the ILOGIC/OLOGIC block.High-speed clock port (CLK) of the ISERDES and OSERDES resources.Because of the low fanout, they have extremely low skew and a very short insertion delay (ideal for source-synchronous interface applications).These networks add the least jitter of all clock networks, and have run at the highest speeds (up to 1066 MHz).I/O Clock Networks
29 I/O Clocking The I/O clock networks are driven by the BUFIO The BUFIO is a simple clock buffer with no additional functionalityDriven byClock-capable inputs in the same I/O bankMMCM high-performance clocks in the same regionThese are immediately adjacent to the I/O bankThe BUFMR in the same region or region above or belowInterconnectBUFIOIO
30 Source-Synchronous Interfaces I/O and regional clock networks combined with ISERDES/OSERDES provide powerful tools for creating source synchronous interfacesBUFR is set to ÷N if interface is SDR, or ÷(N/2) if DDRN can be 2 to 8 in SDR, and 2 to 14 in DDRThe output of the BUFIO and BUFR are guaranteed to be in phase, allowing proper clock crossing between CLK and CLKDIV in ISERDES/OSERDES.For high-speed output interfaces, the BUFIO and BUFR can be driven by a high-speed clock generated by a high-performance clock from the MMCM in the same region.FPGA FabricNDataISERDESIOCLKCLKDIVCLKBUFIOCCIO÷BUFR
31 Multi-Region Clock Buffer (BUFMR) Each bank has two BUFMRsAllows a single clock source to be driven to the BUFIO and/or BUFR in the region above and belowCan be used to create an interface with more than 50 pins (maximum 150)Driven byMulti-region clock capable I/O (MRCC) pinGigabit transceiver clockCan drive only the BUFIO and BUFR in the same region, region above and region belowThe BUFMRCE has an attribute that selects whether the CE is synchronous, allowing glitch-free enabling and disabling of the clock, or asynchronous.BUFMRIOCEBUFMRCEIO
32 Synchronizing BUFRs Driven by a BUFMR In order to clock a single interface that spans multiple banks, a BUFMR must be used to drive the BUFIO and BUFR in the different regionsThe dividers on each BUFR are independent; they must be synchronized in order to ensure proper operation of the interfaceUse a BUFMRCE to disable the clock feeding the BUFRsAssert the CLR on all the BUFRsThis resets the dividers in the BUFRsDeassert the CLR on all the BUFRsThis allows the dividers to start on the next rising edge the input clock (currently gated)Assert the CE on the BUFMRStarts the clocks to all BUFRsBUFRs are now in syncCE÷CLRBUFMRCEBUFR
33 High-Performance Clocks Four performance paths driving the I/O column clocked resourcesDriven byMMCM outputs O0-O3Can driveBUFIOBUFROSERDES clocks directlyCleanest path from MMCM to I/O columnsLower jitter than any other routingSoftware automatically places critical signals onto performance path routing4MMCMPerformance path routing enables fast I/O interfaces.If you drive a BUFIO and/or BUFR in a region with the O0–O3 outputs of the MMCM in the same clock region, then the software will automatically use the high-performance paths for this route.Note: The other outputs of the MMCM and the output of the PLL cannot use these paths.
34 Connections to the Center Column The BUFGCTRL and BUFH reside in the center column of the FPGAThe CCIO, BUFIO, and BUFR reside in the I/O columnThe MMCM and PLL reside in the CMT column, immediately adjacent to the I/O columnGTs source clocks in the GT columnDedicated routing paths exist in the HCLK row to bring clocks between these resourcesThe tools will use these routes when necessaryThere are 14 dedicated nets to bring clock inputs to the central clocking column. These can drive the BUFHs for the region or the BUFGCTRLs for the device.The 14 nets can be driven by:The 4 CCIO in the I/O columnThe MMCM and PLL outputs in the CMT columnThe gigabit transceiver clock outputs in the GT columnI/OCMTCLKGT
35 Stacked Silicon Interconnect Implications Larger 7 series devices are made up of Super Logic Region (SLR) blocksSLR boundaries are on clock region boundariesThe vertical spines of the global clock run through all SLRsEach SLR has 32 BUFGCTRLsAny 32 of the total number can be usedExpect increased clock skew between SLRs—the software tools will automatically account for thisBUFMR tracks cannot pass from one SLR to the nextIn devices that are made up of multiple SLRs, there are still 32 vertical spines of the global clock. These spines traverse all the SLRs.Each SLR has the composition of a full FPGA. There are multiple clock regions, I/O banks, configuration blocks, etc. Therefore, each SLR has 32 BUFGCTRLs that can drive the vertical clock spines. However, only one BUFGCTRL can drive any one clock spine, regardless of which SLR the BUFGCRL resides in. Thus, though there are Nx32 BUFGCTRLs in the complete device and only 32 of them can be used.Because each SLR is a different die, its fabrication process can vary more from SLR to SLR than can be expected from a single die. Therefore, it is possible to see more clock skew on the global clock tree between different SLRs. The ISE™ tools will automatically take this into account during placing and timing analysis. However, to obtain the best performance, it may be a good idea to perform floorplanning to restrict blocks to one SLR.The tracks that connect the BUFMR outputs to the BUFIO/BUFR in adjacent regions do not cross from one SLR to the next. Therefore, the BUFMR in one SLR cannot drive the BUFIO/BUFR in an adjacent SLR.
36 7 Series FPGA Clock Network Summary Clock regions are 50 CLBs tallClock regions span one half the dieClock regionsmatch I/O banks50 I/Os per I/O bank4 differential or single-ended clock capable inputs4 BUFIO and4 I/O clock networks4 BUFRs and4 regional clocks12 horizontal clock spinesand 12 BUFH
38 Summary Don’t start trying to control your design’s routing This presentation has noted when the tools will automatically choose the appropriate routing resourcesInstantiate the appropriate primitiveMake certain you connect your Serdes clocks appropriately using the proper routing resourcesConsider using area constraints to isolate logic into a single SLR, if possibleStacked Silicon devices still have 32 global clocks
39 Where Can I Learn More? User Guides Xilinx Education Services courses 7 Series FPGAs Clocking Resources User GuideDescribes the complete clocking structuresXilinx Education Services coursesDesigning with the 7 Series Families courseXilinx tools and architecture coursesHardware description language coursesBasic FPGA architecture, Basic HDL Coding Techniques, and other Free Videos!
44 MMCM and PLL Features Up to 24CMTs per device Two software primitives One MMCMs and one PLL per CMTTwo software primitives*_BASE has only the basic ports*_ADV provides access to all portsPLL is primarily intended for use with the I/O phaser for high speed memory controllersThe MMCM is the primary clock resource for user clocksCLKIN1CLKFBINCLKOUT<6:0>CLKOUTB<3:0>CLKFBOUTMMCME2_BASERSTLOCKEDPWRDWNMMCME2_ADVCLKIN2CLKINSELDRPPhase ShiftCLKINSTOPPEDCLKFBSTOPPEDCLKOUT<5:0>PLLE2_BASEPLLE2_ADVEach CMT has one MMCM and one PLL. The two resources share some of the same basic functionality, but the MMCM has some additional features for manipulating user clocks.While the PLL can also be used for user clocks, it is primarily intended to be used in conjunction with the I/O phaser and I/O FIFOs for high speed memory controllers.While there is only one MMCM per CMT (as opposed to two in the Virtex-6 FPGA), there are generally twice as many CMTs in a device, because there are two columns of CMTs (as opposed to one in the Virtex-6 FPGA). This allows the PLLs to be reserved for high-speed memory controllers.
45 PLL Architecture FOUT FIN FPFD FVCO Lock Detect Lock Monitor LOCKED CLKINSEL8-phasetapsCLKIN1CLKIN2DPFDCPLFVCOO0CLKOUT0O1CLKOUT1D: Input clock Divider.PFD: Phase Frequency Detector – Compares input and feedback, generating phase proportional voltage.CP: Charge pump – Integrates change information from PFD.LF: Filters CP signal – Limits frequency change rate of VCO.Programmable Loop Filter characteristic trades off bandwidth vs. locking range.VCO: Voltage Controlled Oscillator – Generates a clock based on the voltage output of the LF.Generates 8 phases: 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315°.O0-O5: Output counters – Independent dividers of the VCO frequency.Can take in any one of the 8 phases.M: Feedback counter (Multiplier) – Divides the VCO clock to “re-generate” the input clock.Closes the loop.Any one of the O0-O5 outputs can be used via the CLKFBIN path.FINFPFDFVCOO2CLKOUT2MO3CLKOUT3CLKFBINCLKFBOUTO4CLKOUT4FVCO = FIN * M / DFOUT = FVCO / O = FIN * M / D / OO5CLKOUT5
46 PLL Features Frequency lock between FPFD and CLKFB Phase lock between CLKIN and CLKFBFrequency synthesis for up to 6 different outputsFout = Fin * M / (D*O)One M and one D value per PLLEach PLL output can have its own O valueM: 1-64; D: 1-56; O: 1-128Static phase shifting increments of 1/8 of the VCO periodEach output has independent selection of the phaseDuty cycle generation in increments of 1/8 of the VCO periodReference clock switchingLock detectionPWRDWN signal to disable regulated power supplyThere is one M and D divider per PLL. These two dividers determine the frequency of operation of the VCO.FVCO = FCLKIN * M / DEach output has its own divider, O. The output frequency is FVCO/O.
47 Extra MMCM Features Fractional counters Dynamic phase shifting Ability to configure O0 and M as a counter with 1/8th granularity (for example, 2.125, 2.250, 2.375, etc.)Enables many more VCO frequencies and output frequencies to be synthesizedDynamic phase shiftingFine phase shiftingDivider cascading for wider divider rangeO4 divider uses resources from O6O0O1O2O3O4O5O6CLKFBOUT4590135180225270315VCOOutputs
48 Additional MMCM Signals Complement outputsO0-O3 of every MMCM have both true and complement outputsCLKOUT6 output (with O6 divider)Dynamic phase shift portPSCLKPSENPSINCDECPSDONEVCOLFCPPFDO0CLKFBCLKIN1CLKIN2RoutingClockSwitchDCLKINSTOPPEDLockCLKFBSTOPPEDStopDetect9O1O2O3O4O5O6MCLKFBOUTHOLD
49 MMCM and PLL InputsPLL and MMCM input clock ports (CLKIN1 and CLKIN2) can be source fromCCIO in the same clock regions (recommended)CCIO in a vertically adjacent clock regionsBUFGBUFRBUFHInterconnect (not recommended)Gigabit transceiver clockCLKFB can be sourced by the same sources except the gigabit transceiver clockPLL/MMCM outputs can drive BUFG and BUFH directlyMMCM high-performance clock outputs (O0 – O3) can drive BUFIO, BUFR, and CLK of OSERDES in the same region directly
50 Phase LockingThe control loop of the PLL/MMCM works to keep the CLKIN and CLKFBIN clocks in phaseRegardless of the delay between the CLKFBOUT and the CLKFBINThe compensation is dynamicEven if the magnitude of the delay changes in time, the PLL will adjust to keep the two clocks in phasePLL/MMCMDelayCLKINCLKFBOUTCLKFBINRSTLOCKED
51 Frequency GenerationThe same mechanism that keeps clocks in phase generates new frequenciesBy placing a divider of M between the VCO and the CLKFBOUT output, the VCO must run at M times the CLKIN frequency in order to achieve phase lockSimilarly, by placing a D divider between the CLKIN port and the PFD block, the VCO must now run at M/D times the frequency of CLKIN in order to achieve phase lockFinally, the output dividers divide the VCO frequency for the output portsVCOLFCPPFDO0CLKFBCLKIN1CLKIN2RoutingClockSwitchDCLKINSTOPPEDLockCLKFBSTOPPEDStopDetect9O1O2O3O4O5O6MCLKFBOUTHOLD
52 Interpolated Fine Phase Shift (IFPS) In addition to the static phase shifting capability available in the PLL and MMCM, the MMCM can also perform interpolated fine phase shiftingThe VCO output can be shifted in increments of 1/56 of the VCO periodThe shift amount can be set statically or can be dynamically controlledThe shifted VCO output can be selected as the input to any of the clock outputs or the CLKFBOUT outputThe shift amount is 0 to 360°The shift amount wraps aroundThe VCO generates eight phase shifted outputs, each shifted by 45°.The Interpolated Fine Phase Shift (IFPS) module generates a ninth output from the VCO, which can be shifted by any amount, in increments of 1/56 of the VCO period (therefore 6.43°).Any of the eight output dividers (including the M divider for the CLKFBOUT output) can select any eight of the static phase-shifted VCO outputs or the fine phase-shifted VCO output.Due to the phase locking at the PFD, shifting the CLKFBOUT output clock will result in the VCO phase advancing to maintain lock. Thus, a positive phase shift on the CLKFBOUT will result in all outputs displaying a negative shift.The value of the fine phase shift can be statically set or can be changed dynamically using the PSCLK, PSEN and PSINCDEC port; on any PSCK where PSEN is asserted, the phase will advance by 1/56 (PSINCDEC=1), or retard by 1/56 (PSINCDEC=0). Once initiated, another shift cannot be attempted until PSDONE is signaled (for one PSCLK).VCOO0O1MCLKOUT0CLKOUT1CLKFBOUTIFPSPSINCDECPSENPSDONE8
54 SummaryPLL and MMCMs offer a BASE (basic ports) and ADV (all ports) primitivesVCO is the voltage controlled oscillatorIdeally, the PFD should be as high as possible (within a valid range)If the frequency of VCO is high, you can generate more output frequencies and you generally get better jitter performanceHigher frequencies of VCO uses more powerThe MMCM also provides a fixed phase shift, interpolated phase shift, and active phase shift functionality
55 Where Can I Learn More? User Guides Xilinx Education Services courses 7 Series FPGAs Clocking Resources User GuideDescribes the complete clocking structuresXilinx Education Services coursesDesigning with the 7 Series Families courseXilinx tools and architecture coursesHardware description language coursesBasic FPGA architecture, Basic HDL Coding Techniques, and other Free Videos!
58 Objectives After completing this module, you will be able to: Detail the clocking resources available in the 7 series FPGAsSpecify the resources available in the Clock Management Tile (CMT)Use the clocking resources in your design
59 Lessons Overview Clock Networks and Buffers Clock Management Tile Usage ModelsUsing Clock ResourcesSummaryThe next several slides illustrate different methods for connecting the MMCM or PLL to manipulate clock signals.
60 Clock Insertion DelayBringing the clock in from an IBUFG, through a BUFG, and distributing it through the global clock tree introduces delay on the clock pathDelay is Process/Voltage/Temperature (PVT) dependentIncreases setup/hold window on input flip-flopsIncreases output valid uncertainty on output flip-flopsRestricts high-speed interface design using global clock resourcesThe clock insertion delay depends on the depth of the clock tree, which increases with FPGA size.IBUFGBUFGIBUFDQClock Insertion DelayClockRequired Data Valid(without clock insertion)(with clock delay)
61 Clock Insertion Delay Removal Example IBUFGBUFGPLL/MMCMCLKINCLKFBOUTTo logicCLKFBINRSTLOCKEDThe PLL or MMCM can add some additional delay to the feedback path to cancel out the delay of the IBUFG.This is done automatically by the tools by setting the COMPENSATION attribute to ZHOLD.This will guarantee a zero or negative hold time requirement for all IOBs driven by the global clock.Both PLLs and MMCMs need time after configuration to align the feedback clock to the input clock.For details about lock times, see the datasheet for your device.The PLL/MMCM keeps CLKIN and CLKFBIN in phase, cancelling out all delay between CLKFBOUT and CLKFBINIn this case, the delay is that of the BUFG and global clock networkThe delay is cancelled regardless of PVT variation over timeThus, the clock arriving at the logic driven by the global clock is in phase with the input clock; the clock insertion delay is removedLOCKED
62 System-Synchronous Clocking Internal de-skew allows for synchronous interfaces with no hold-time violationsThe tools will automatically select COMPENSATION=ZHOLDFPGA 1FPGA 2OBUFIBUFBy using the MMCM or PLL to remove the clock insertion delay, system components using a common clock can easily communicate by using simple synchronous interfaces.Hold-time requirements of the receiving flip-flops are guaranteed to be met.Communication can be done at moderate speeds over significant distances.The length of the board routing from the common clock to the IBUFG inputs in both FPGAs should be matched.Note that the same type of input and output buffers should be used to balance this system.DQQDIBUFOBUFQDDQIBUFGBUFGIBUFGBUFGPLL/MMCMPLL/MMCMINOUTINOUTFBINFBINCLK
63 Using Internal Feedback IBUFGPLL/MMCMBUFGCLKINCLKINCLKOUT1CLKA (To logic)CLKFBINCLKOUTFBRSTLOCKEDWhen using a PLL/MMCM for frequency synthesis or jitter reduction onlyPhase of CLKA has no requirements with respect to CLKINTools will select the internal feedback path, which requires no routing or logic outside the PLL/MMCMTools will automatically set COMPENSATION=INTERNALLOCKED
64 Using Multiple PLL/MMCM Clocks Internally IBUFGPLL/MMCMBUFGCLKINCLKOUT0CLKA (To logic)BUFGCLKFBINCLKOUTFBCLKB (To logic)RSTLOCKEDBecause all BUFGs are identical (running at the same process/temperature/voltage), and the global clock networks are fully buffered (hence, load independent), the frequency and phase relationships of PLL/MMCM outputs are preserved through the clock networks.The implementation and timing analysis tools understand these relationships and correctly analyze the paths between flip-flops on these different domains.For example, if CLKOUT0 is programmed to be twice the frequency of CLKFBOUT:The tools know that static paths between the two domains are constrained by the period of the faster clock.If the frequency is the same, but the phase is shifted (CLKOUT1 advanced by 90° from CLKFBOUT):Paths from CLKA to CLKB are constrained to ¾ of a clock period.Paths from CLKB to CLKA are constrained to ¼ of a clock period.Multiple outputs of the MMCM/PLL have known phase relationshipsFor example, CLKA may be the same frequency as CLKB, phase shifted 180°The delay through all BUFGs and clock networks are identicalSo CLKA and CLKB arrive at loads with the same phase relationshipCrossing from one clock domain to the other can be accomplished synchronouslyLOCKED
65 Clock Mirror ExampleOBUFODDRIBUFGBUFGPLL/MMCMCLKINCLKOUTOBUFODDRCLKFBINIBUFGRSTLOCKEDThe delay through the two ODDRs and OBUFs are matched.They have the same process, temperature and voltage and, hence, will have the same characteristics.The delay through the two IBUFGs is also matched.The delay of the BUFG, global clock network, ODDR flip flop, output buffer, and routing track back to the pin driving the CLKFBIN port will be cancelled.The tools will automatically select COMPENSATION=EXTERNALThe PLL or MMCM keeps the input and feedback clock in phaseThe feedback clock is not restricted to being the output of the global clock networkIn this example, two (or more) identical output clocks are generatedOne is used as the feedback clockThe others are mirrors; hence, in phase with each other and the input clockLOCKED
66 Design Example Requirement Solution 33.3-MHz external oscillator controls533.3-MHz data being generated by I/O logic (BUFIO)Large amount of logic at 66.6 MHz (BUFG)Small design at 54 MHz (BUFH)Phase relationship between input clock and output clock is irrelevantSolutionMMCM valuesM=32, D=1. FVCO =O0= uses fractional divider (1066/19.75 = 54 MHz))O1=2 (1066/2 = 533 MHz)O2=16 (1066/16 = 66 MHz)MMCMPerformancePathBUFIOBUFHBUFRCLKIN1CLKOUT0CLKOUT1CLKOUT2CLKFBINCLKFBOUTBUFG÷IBUFGThis example shows how a single MMCM can be used to generate the multiple clocks required by your system.This reduces overall system cost.
68 Inference Clock networks are represented by nets in your RTL design The mapping of an RTL net to a clock network is managed by using the appropriate clock buffer to generate that netCertain resources can be inferredA primary input net (with or without an IBUF instantiated) will be mapped to a global clock if it drives the clock inputs of clocked resourcesThe BUFG will be inferredBUFH drivers will be inferred whenever a global clock (driven by a BUFG) is required in a clock regionBUFHs for each region required will be inferredBUFIO, BUFR, and BUFMR cannot be inferredInstantiating these buffers tells the tools that you want to use the corresponding clock networksPLLs and MMCMs cannot be inferred
69 InstantiationAll clocking resources can be directly instantiated in your RTL codeSimulation models exist for all resourcesRefer to the Library Guide for HDL DesignsUse the Language Templates ( )PLLs and MMCMs have many inputs and outputs, as well as many attributesOptimal dividers for obtaining the desired characteristics may be hard to deriveThe Clocking Wizard via the CORE Generator™ interface can help
70 Clocking WizardThe Clocking Wizard walks you through the generation of complete clocking subsystemsSpecify the input frequency and desired output frequencies, phases and duty cyclesSpecify the desired buffersChooses the optimal PLL/MMCM settingsInstantiates all clock and input buffers
72 SummaryThe Architecture Wizard makes it easy to build and instantiate your clocking resourcesProvides instantiation templateGui for customization of the componentsIncludes a jitter calculator and calculates parameter valuesApplication specific instantiations (such as Clock Mirror, System Synchronous Clocking, and Feedback options)
73 Where Can I Learn More? User Guides Xilinx Education Services courses 7 Series FPGAs Clocking Resources User GuideDescribes the complete clocking structuresXilinx Education Services coursesDesigning with the 7 Series Families courseXilinx tools and architecture coursesHardware description language coursesBasic FPGA architecture, Basic HDL Coding Techniques, and other Free Videos!