Presentation on theme: "A Practical Guide to DDR2 Design with Spartan-3A DSP"— Presentation transcript:
1 A Practical Guide to DDR2 Design with Spartan-3A DSP
Featuring ISE 9.2 and the Xilinx Spartan-3A DSP 1800A Starter Platform
This is a 6.5 hour course. Typically start at 8:30 am and end by 4:30 pm, with a 1 hour lunch, and 30 minutes for welcome and introductions.
FAEs – please direct your questions to Bryan Fletcher
Audience
- Hardware designers interested in memory interfaces.
- Emphasis on stand-alone, MIG-based DDR2 designs.
- Since EDK is moving to MIG-based memory controllers, it also applies to processor people, although we won’t specifically cover MPMC.
Lab hardware
- Spartan-3A DSP 1800A Starter Platform with XC3SD1800A and 32Mx32 Micron DDR2
Software
- ISE 9.2 with SP3 and IP Update #2, which includes MIG 2.0 with support for Spartan-3A DSP
Daily Schedule:
- 8:30 Welcome, eating snacks, getting settled (15 minutes)
- 8:45 Intro to the course and instructor (15 minutes)
- 9:00 Lecture 1 (60 minutes)
- 10:00 Lab 1 (60 minutes)
- 11:00 Lecture 2 (60 minutes)
- 12:00 Lunch (60 minutes)
- 1:00 Lab 2 (90 minutes)
- 2:30 Lecture 3 (60 minutes)
- 3:30 Lecture 4 (30 minutes)
- 4:00 Lab 3 (30 minutes)
- 4:30 Finish
2 Course Objectives
By the end of the day, you will
- Build a functioning DDR2 controller in hardware
- Know what’s required to design your own board
Show how to get a DDR2 controller running on existing hardware, then explain how to duplicate the hardware.
Avnet designed and built the Spartan-3A DSP 1800A Starter Platform for Xilinx. We were successful in designing a 32Mx32 DDR2 interface that worked on our 1st prototype. On the hardware design side, we’d like to share with everyone what we did to have that success.
As a note to FAEs, the following are intentionally NOT covered in much detail:
- Coverage of SDR, DDR-1, DDR-3, QDR or any other xDR
  - However, many of the principles apply
  - We had to pick a specific case to cover
- Detailed discussion of the FPGA architecture
- Virtex-4 or Virtex-5
  - Again, the principles apply, but we won’t specifically discuss Virtex
- Detailed discussion on the inner workings of the controller
- Memory controllers in embedded processor design
- Designing for DIMMs
3 Morning Agenda
- Memory, FPGAs, and Memory Controllers
  - Memory trends
  - DDR2 signaling
  - Xilinx FPGA memory controllers
  - Memory Interface Generator (MIG)
- Lab 1 – Generate a DDR2 controller core
- Real-world Design with a MIG DDR2 Controller
  - Interface to the MIG controller
  - Logically simulate
  - Hardware debug
- Lunch Break
Just hit the main section titles here. Cover the section sub-topics right before each section.
4 Afternoon Agenda
- Lab 2 – Build and verify a DDR2 controller in hardware
- PCB Considerations
  - FPGA pinout
  - Factors impacting signal quality and crosstalk
  - PCB simulation example for DDR2
  - Trace requirements
  - Power
- Customizing and Verifying the MIG Results
  - Pinout rules
  - Pin-swapping
  - Verifying a new design
- Lab 3 – Analyze and Fix Customized MIG Controllers
Just hit the main section titles here. Cover the section sub-topics right before each section.
5A Practical Guide to DDR2 Design with Spartan-3A DSP Memory, FPGAs, and Memory Controllers
6Memory, FPGAs, and Memory Controllers Memory trendsDDR2 signalingXilinx FPGA memory controllersMemory Interface Generator (MIG)Lab 1 – Generate a DDR2 controller coreReal-world Design with a MIG DDR2 ControllerInterface to the MIG controllerLogically simulateHardware debugLunch BreakJust hit the main section titles here. Cover the section sub-topics right before each section.
7 The FPGA/Memory Interface
- Memory interface success in an FPGA depends on many things
  - FPGA fabric
  - Controller
  - Memory
  - Clock
  - PCB layout
  - Power
- We’ll cover all these topics today
[Diagram labels: Memory, Controller, FPGA, Termination, Power, Clock, PCB]
The diagram shows a simplified memory interface. We can see the major pieces – the memory, the FPGA, and the controller built inside the FPGA. Other critical pieces are also shown, including the termination, PCB traces, power circuitry, and the PCB itself.
All of these things play a part in the total memory interface solution. All of them must be accounted for during the design phase for the interface to work.
8DDR2 Interface Covered Today FPGASpartan-3A DSP XC3SD1800AMemoryMicron DDR2 MT47H32M16ControllerXilinx Memory Interface Generator (MIG)PCB/Power/ TerminationsAvnet-designed Spartan-3A DSP 1800A Starter PlatformThe complete FPGA/memory interface is covered today. We’re using a Xilinx DDR2 controller inside a Xilinx Spartan-3A DSP FPGA to interface with Micron DDR2 chips. All of this is encompassed on the Avnet-designed, Xilinx Spartan-3A DSP 1800A Starter Platform, which we’ll use to examine the various PCB factors of the interface.
9 Why DDR2?
- Compared to DDR-1
  - Less expensive
  - More readily available
  - Lower power
  - Larger variety
  - On-die termination (ODT); we’ll show details on this later
  - Differential strobes
- Compared to DDR-3
  - More mature
  - Easier to get
  - Better controller support
First of all, make sure everyone knows what DDR2 is. They should if they’re coming to this class. Specifically, we are talking about 2nd Generation Double-Data Rate Synchronous Dynamic Random Access Memory (DDR-2 SDRAM). Based on the data we’ve seen lately, DDR2 is clearly the memory of choice for most engineers designing new boards.
10 DRAM Market and Technology Trend
- DRAM Shipments by Memory Technology Type
- DDR2 is the prevalent architecture
- DDR is still widely used (low-end applications)
- DDR3 is the upcoming technology
[Chart: DRAM shipments in Units (Millions), axis 1000 to 8000, by Forecast Year 2002 to 2009. Series: FP / EDO, EDO, RDRAM, SDRAM, DDR, DDR2, DDR3. Data Source: iSuppli. Slide courtesy Xilinx. Note: The DDR3 forecast seems very optimistic.]
Transcript: What about the trends in terms of the DRAM market? Here is a forecast from iSuppli that basically shows that DDR2 will continue to be prevalent in the memory market for the next couple of years. DDR is still used, but mostly in low-end applications, and it is being replaced by DDR2 even in lower-end applications. DDR3 has barely come into the market; there are some memory vendors sampling DDR3. However, I think this forecast is quite optimistic. Usually, unless Intel adopts a new technology in the PC architecture, this technology does not proliferate in the overall market, so we'll see when DDR3 is adopted, and after that point you will see DDR3 become more prevalent in the other applications.
Author’s Original Notes: This graph shows the breakdown in DRAM usage based on architecture.
DDR, the first generation of double data rate SDRAMs, is still widely used today, but DDR2 has taken over, becoming the prevalent DRAM architecture in terms of volume usage. The forecast suggests this trend will continue for the next few years.
DDR is used mostly in low-cost, low-end applications, but we are seeing a growing trend toward making DDR2 the first choice in these applications. As the price of DDR2 memories continues to approach that of DDR devices, it has driven greater demand for DDR2 in all markets.
Thus, our focus today will be on DDR2.
There is an evolutionary DDR3 architecture on the horizon, but this forecast from iSuppli still predicts DDR2 to continue to be the most prevalent architecture for the next few years. Actually, it seems that the DDR3 forecast may be a little optimistic by 9 months or more.
However, we are evaluating the DDR3 architecture as DRAM vendors have started to sample these devices, so we’ll be talking about these DDR3 interfaces in more detail in a future webcast.
11 DDR SDRAM Component Comparison
Type | Voltage | Speed* | Density | On-Die Termination (ODT) | CAS Latency
DDR | 2.5V / 1.25V | Mbps | 128 Mb – 1 Gb | None | 2, 2.5, 3
DDR2 | 1.8V / 0.9V | Mbps | 256 Mb – 2 Gb | Data (Nominal) | 3, 4, 5
DDR3 | 1.5V / 0.75V | 600 Mbps – 1.6 Gbps | 512 Mb – 4 Gb | Data (Nominal & Dynamic), Address, Control on DIMMs | 5, 6, 7, 8, 9, 10
*Raw speed of memory device, NOT necessarily the speed the FPGA controller can run
All of these details were previously given, but it might help to see them all in one spot.
One key point in relation to an FPGA interface is that these memory devices have a MINIMUM operating frequency. For DDR2, 125 MHz is the minimum frequency within specification for the device.
12 Memory Organization
- DDR2 is organized as
  - Banks
  - Rows
  - Columns
- Each needs addressing
[Diagram: a DDR2 device holding Bank 0, Bank 1, Bank 2, and Bank 3, with Row and Column marked]
DDR2 SDRAM memory is organized into banks, rows, and columns. In order to access a particular piece of information, you must know which bank, which row, and which column. The controller is responsible for understanding and addressing this architecture.
13 Bank Management
- 4 or 8 banks per memory device
- Any 1 row per bank can be open
- Other devices can have different rows open
[Diagram labels: Latency to open a row, Latency to close a row]
You must understand this bank/row/column architecture to know how to make use of it efficiently. Each time a row inside a bank is opened or closed, there is a time penalty. However, rows in multiple banks can be open at once. You must understand how your controller manages these banks.
- Each DDR2 contains 4 or 8 banks internally.
- Each bank can have one row open.
- Multiple parts can each have different rows open.
- Time is lost opening a row (pulling it down to the working area).
- Time is lost closing a row (pushing it back to the memory cell array).
Slide courtesy Xilinx
14Bank InterleaveLeft side has bank/row conflicts – same row in bank -> conflict!Right side shows banks changing, but no conflictHigher throughput with bank interleaveConflicts (gaps for activate, precharge)No Conflicts (no gaps)Here’s an example of poor and then good bank management.Slide courtesy Xilinx
15 Row/Column Addressing
- Interface on S3ADSPSK is 32Mx32 (128 MB or 1 Gbit)
  - Two chips (each 32Mx16)
  - Each chip consists of 4 banks
  - Each bank has 8K rows and 1K columns
  - Each memory location stores 16 bits
  - 2 chips * 4 banks * 8K rows * 1K columns * 16 bits = 1 Gbit
- Linear addressing requires 25 address bits
- Our interface has 15 total address bits
  - 13 ADDRESS (A) and 2 BANK ADDRESS (BA)
  - BA[1:0] selects one of four banks
  - A[12:0] with RAS selects one of 8K rows in the bank
  - A[9:0] with CAS selects one of 1K columns in the row
Now that we understand that the DDR2 is accessed by banks, rows, and columns, let’s look at a specific example – the board we’ll be using in the lab today, the Spartan-3A DSP 1800A Starter Platform.
This board has a 32-Meg by 32-bit DDR2 interface to the FPGA.
Addressing the 32M locations (128 MB) linearly would require 25 address bits (2^25 = 32M). However, the DDR2 architecture allows us to reduce the total number of address pins required – in this case only 15. The DDR2 is able to accomplish this by sharing the same address pins between row and column addresses, with RAS and CAS indicating which one is being presented.
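The arithmetic on this slide can be checked with a short sketch. The Python below is only an illustration; the constant and function names are made up for this example, and the geometry values are the ones quoted on the slide.

```python
# Geometry of the lab board's DDR2 interface, as quoted on the slide.
CHIPS = 2                 # two 32Mx16 devices side by side
BANKS = 4                 # banks per chip
ROWS = 8 * 1024           # rows per bank
COLUMNS = 1024            # columns per row
BITS_PER_LOCATION = 16    # bits stored per location, per chip


def bits_needed(count: int) -> int:
    """Address bits needed to select one of `count` items (power of two only)."""
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a power of two")
    return count.bit_length() - 1


total_bits = CHIPS * BANKS * ROWS * COLUMNS * BITS_PER_LOCATION
locations = BANKS * ROWS * COLUMNS      # 32-bit locations (both chips in parallel)

bank_bits = bits_needed(BANKS)          # BA pins
row_bits = bits_needed(ROWS)            # A pins used with RAS
col_bits = bits_needed(COLUMNS)         # A pins used with CAS

print("total size in Gbit:", total_bits / 2**30)
print("total size in MB:", total_bits // 8 // 2**20)
print("linear address bits:", bits_needed(locations))
# Rows and columns share the A bus, so only the wider of the two counts.
print("address pins (BA + A):", bank_bits + max(row_bits, col_bits))
```

Because the row and column addresses are presented on the same pins at different times, the pin count is set by the wider of the two fields plus the bank address.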
16 Control Signals
- Combination of RAS, CAS, and WE determines the action
- RAS asserted = Open Row
  - Bank Address and Row Address latched in
- CAS and WE asserted = Write
  - Column address latched in
  - Write enabled
- CAS asserted = Read
- RAS and WE asserted = Close Row (PRECHARGE)
  - Row deactivated
- In the table, ‘1’ means asserted and ‘0’ means not asserted (the pins themselves are active low)
Action | RAS | CAS | WE
Open Row | 1 | 0 | 0
Write | 0 | 1 | 1
Read | 0 | 1 | 0
Close Row | 1 | 0 | 1
The main control signals that determine what the DDR2 does are RAS, CAS, and WE.
- RAS = Row Address Strobe
- CAS = Column Address Strobe
- WE = Write Enable
The combination of these signals that is asserted identifies one of several different memory actions, as shown in the table. RAS, CAS, and WE are all asserted by driving the signal low; in the table, ‘1’ means asserted.
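As a rough illustration of the table, the sketch below decodes the three strobes from their pin levels. It is Python written for this transcript, not part of any Xilinx or Micron deliverable; the function name and return strings are hypothetical, and only the four actions listed on the slide are covered.

```python
# Keys are (RAS asserted, CAS asserted, WE asserted), as in the slide's table.
COMMANDS = {
    (True, False, False): "Open Row",
    (False, True, True): "Write",
    (False, True, False): "Read",
    (True, False, True): "Close Row (PRECHARGE)",
}


def decode(ras_n: int, cas_n: int, we_n: int) -> str:
    """Decode pin levels (0 or 1). The pins are active low, so level 0 = asserted."""
    asserted = (ras_n == 0, cas_n == 0, we_n == 0)
    # Other combinations exist on a real device but are not in the slide's table.
    return COMMANDS.get(asserted, "not in the slide's table")


print(decode(0, 1, 1))  # only RAS driven low
print(decode(1, 0, 0))  # CAS and WE driven low
```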
17 Read Example
- RAS asserted opens row
  - Latches bank and row addresses
  - Bank 3
  - Row 0x000C
- CAS asserted by itself identifies the operation as read
  - Latches the column address
  - Column 0x0000
- With row open, multiple reads can be performed by re-asserting CAS
  - Column 0x0008
We haven’t yet talked about the data interface, but let’s take a moment to summarize the address and control by looking at a couple of examples. First, a read example. We focus in on what’s going on with RAS, CAS, and WE.
18Multiple Reads Five subsequent reads from the same row shown Burst length for this example is 8Each time CAS asserts, 8 words are read40 total words are read in this diagramClose row (Pre-charge) shown after readingRAS and WE simultaneously assertedBy zooming out further, we can see what happens with multiple transactions. In this case, 5 read transactions are issued, each transaction being a burst of 8.The transactions are concluded by closing the row.
19 Write Example
- RAS asserted opens row
  - Latches bank and row addresses
  - Bank 3
  - Row 0x000C
- CAS and WE asserted together identify the operation as write
  - Latches the column address
  - Column 0x0000
- With row open, multiple writes can be performed by re-asserting CAS and WE
  - Column 0x0008
Now for the Write example.
20 Multiple Writes
- Five subsequent writes to the same row shown
  - Burst length for this example is 8
  - Each time CAS/WE assert, 8 words are written
  - 40 total words are written in this diagram
- Close row (Pre-charge) shown after writing
  - RAS and WE simultaneously asserted
21 Data Interface
- One strobe (DQS) per 8 bits of data (DQ)
  - DQS is a local clock for each data byte
  - Can be differential
- One mask (DM) per 8 bits of data (DQ)
  - Selects which bytes are active during a write (byte enable)
- 32-bit interface has 32 DQ, 4 DQS, and 4 DM bits
- FPGA outputs DQS center aligned to the data for a write
- FPGA receives DQS edge aligned from the memory on a read
[Diagrams: DQS and DQ alignment for data driven from the FPGA to the DDR2, and for data driven from the DDR2 to the FPGA]
We’ve previously discussed the address and control to the DDR2. Now we’ll cover the data interface.
The data interface consists of data bits (DQ), data strobes (DQS), and data mask (DM).
VERY CRITICAL -- Showing the alignment of DQS to DQ on reads/writes will make it easier to understand later when we discuss shifting DQS/DQ.
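The one-per-byte rule for DQS and DM can be written down as a small helper. This is a Python sketch for illustration only; the function name is hypothetical, and it counts each DQS once (a differential strobe would add a second pin per byte).

```python
def data_interface_signals(dq_width: int) -> dict:
    """Count DQ, DQS, and DM signals for a data bus of `dq_width` bits."""
    if dq_width <= 0 or dq_width % 8:
        raise ValueError("data width must be a positive multiple of 8")
    byte_lanes = dq_width // 8
    # One strobe and one mask per 8 bits of data, as stated on the slide.
    return {"DQ": dq_width, "DQS": byte_lanes, "DM": byte_lanes}


print(data_interface_signals(32))
```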
22 Clock
- Differential – CK and CK#
- Address and control signals are registered at every positive edge of CK
- DQ and DQS outputs from DDR2 aligned with clock
- DDR2 uses an internal Delay Locked Loop (DLL)
  - DLL has both a minimum and maximum frequency
  - DDR2 specifications based on operating within this frequency range (125 MHz to 533 MHz)
Another critical signal in the DDR2 interface is the clock to the memory device – CK and CK#.
CK and CK# are differential clock inputs to the DDR2. All address and control input signals are sampled on the crossing of the positive edge of CK and negative edge of CK#. Output data (DQs and DQS/DQS#) is referenced to the crossings of CK and CK#.
Commands (address and control signals) are registered at every positive edge of CK. Input data is registered on both edges of DQS, and output data is referenced to both edges of DQS as well as to both edges of CK.
It is possible to disable the DLL and operate below 125 MHz for DDR2, but it is not recommended.
23On-Die Termination ODT = On-Die Termination Enables built in stub termination on DDR2’s data interfaceEliminates need for stub termination resistors on the DDR2 side for dataAdjustable: 50Ω, 75Ω, or 150ΩThe ODT is driven from the FPGA to the DDR2. It allows the FPGA to tell what, if any, on-die termination is required.
24 FPGA Interface
[Diagram: signals between the FPGA and the DDR2 SDRAM: ADDRESS, BANK ADDRESS, RAS, CAS, WE, DQ, DQS, DM, CLK, CLK_EN, CS, ODT, RST_DQS_DIV]
All the connections between FPGA and DDR2 are shown here. We can see all the address, control, data, and other signals just discussed. We also see a loopback signal called RST_DQS_DIV which will be discussed later.
Looking at this diagram, it may appear to be fairly simple. Some may ask, “Why do I need a controller?”
25 Why Do I Need a Controller?
- Easier to interface to a controller than directly to the memory
- Manages multiple operations
  - Initialization (see the DDR2 datasheet excerpt)
  - Calibration
    - Shift outgoing DQS by 90 degrees
    - Shift incoming DQS by 90 degrees
  - Refresh DRAMs
- Simplified interface
  - 4 potential commands instead of 15
  - Initialize command to MIG controller spawns 13 commands to DDR2
- Reduces the design effort
Some may wonder why we need a controller at all. Why not just connect the FPGA to the memory and write our own interface?
The reason is that the complexity of the memory would make this a major effort. A pre-designed controller makes it easier for us to use the memory, including the items shown in the 2nd bullet above.
As an example, let’s look at the Initialization instructions from the DDR2 Datasheet (open the Micron DDR2 datasheet provided with the presentation or click the hyperlink). Read the Initialization.
Some may want to know what the disadvantages of using a controller are. The controller may implement more features than you need, and thus take up more space than necessary in the FPGA. The controller may not implement the most efficient method of accessing the DDR2 data for your particular application.
From the Micron DDR2 datasheet:
“5. For a minimum of 200μs after stable power and clock (CK, CK#), apply NOP or DESELECT commands, then take CKE HIGH.
6. Wait a minimum of 400ns, then issue a PRECHARGE ALL command.
7. Issue a LOAD MODE command to the EMR(2). (To issue an EMR(2) command, provide LOW to BA0, and provide HIGH to BA1.) Set register E7 to “0” or “1;” all others must be “0.”
8. Issue a LOAD MODE command to the EMR(3). (To issue an EMR(3) command, provide HIGH to BA0 and BA1.) Set all registers to “0.”
9. Issue a LOAD MODE command to the EMR to enable DLL. To issue a DLL ENABLE command, provide LOW to BA1 and A0; provide HIGH to BA0. Bits E7, E8, and E9 can be set to “0” or “1;” Micron recommends setting them to “0.”
10. Issue a LOAD MODE command for DLL RESET. 200 cycles of clock input is required to lock the DLL. (To issue a DLL RESET, provide HIGH to A8 and provide LOW to BA1 and BA0.) CKE must be HIGH the entire time.
11. Issue PRECHARGE ALL command.
12. Issue two or more REFRESH commands.
13. Issue a LOAD MODE command with LOW to A8 to initialize device operation (i.e., to program operating parameters without resetting the DLL). To access the mode registers, BA1=1, BA0 = 0.
14. Issue a LOAD MODE command to the EMR to enable OCD default by setting bits E7, E8, and E9 to “1,” and then setting all other desired parameters. To access the extended mode register, BA1 = 0, BA0 = 1.
15. Issue a LOAD MODE command to the EMR to enable OCD exit by setting bits E7, E8, and E9 to “0,” and then setting all other desired parameters. To access the extended mode registers,
16. The DDR2 SDRAM is now initialized and ready for normal operation 200 clock cycles after the DLL RESET at Tf0. It is also suggested to include a single dummy WRITE command followed by tWR anytime after the REFRESH commands, but before the first true WRITE command to the DRAM.
26Memory Interface Generator (MIG) Free utility to create a custom FPGA/memory interfaceBased on real, working, tested hardwareDocumented in Xilinx Application Notes (XAPP)Customized outputs includeRTL source for the memory controller in Verilog or VHDLSimulation testbench and supportUser Constraint File (UCF)Pinout specific for chosen FPGA device/packageLogic block locationsFPGA timing constraintsBatch files for processingRun ISE tools in command line modeConvert to ISE Project Navigator ProjectTiming analysisDocumentationXilinx offers a variety of memory controllers. A tool for generating custom memory controllers is called the Xilinx Memory Interface Generator, or MIG.The MIG controllers are based on Xilinx Application Note reference designs. The XAPP Reference Designs are specific instances of working controllers. MIG takes those designs and gives the user a front end to be able to parameterize several things and create a custom design.The MIG outputs are listed in the 3rd bullet.Also note that the default method for implementing this design is through the command-line. This is not required, though. MIG provides a script for converting the project to ProjNav. The demo we’ll show later is based on Project Navigator.It’s also worth noting here that as a free utility, MIG is a great fit for those looking for a basic, general-purpose controller. User’s looking for advanced features or maximum performance should plan on modifying the MIG controller themselves or consider buying a 3rd-party controller.
27MIG v2.0 Component Controllers DDRDDR2RLDRAM-IIQDR-II SRAMDDR-II SRAMVirtex-5200 MHz333 MHz300 MHzVirtex-4175 MHz250 MHzSpartan-3A/ 3AN/3ADSP166 MHzSpartan-3ESpartan-3This table shows which controllers MIG has available for each FPGA. Speeds shown are for the fastest clock rate in the fastest FPGA speed grade for the family. As an example, we can see the extensive Virtex-4 support – 175 MHz for DDR, 300 MHz for DDR2, 250 MHz for RLDRAM-II, QDR-II SRAM, and DDR-II SRAM.Virtex-5 is a more advanced FPGA fabric, and you can see the supported controllers are all faster.Spartan-3 is a less expensive and lower performance FPGA. We can see that all the Spartan FPGAs support DDR at 166 MHz. For Sp3A and Sp3, DDR2 support is also 166 MHz, showing that there is no performance advantage in Spartan by moving to DDR2.(Compared to MIG v1.7.3, MIG 2.0 does not improve the fastest controller speed in the fastest speed grade. However, speeds in slower speed grades have been improved significantly for Virtex-5.)DDR2 for Spartan-3ESSTL1.8 Class II not supported in siliconMay work, but not officially supportedDDR3 and RLDRAM-II for V5 covered in a reference design.Fastest clock rate in fastest FPGA speed gradeSee
28MIG v2.0 DIMM Controllers DDR DDR2 Virtex-5 Virtex-4 200 MHz333 MHzVirtex-4175 MHz267 MHzSpartan-3A/ 3AN/3ADSP166 MHzSpartan-3ESpartan-3And, here we see the MIG 2.0 DIMM controllers for DDR and DDR2, up to 333 MHz performance in Virtex-5.Fastest clock rate in fastest FPGA speed gradeSee
29 Spartan-3/3A DDR2 Controller Performance
- Up to 166 MHz / 333 Mbps in -5 Speed grade device
  - 200 MHz specific implementation documented in XAPP458
- 133 MHz / 266 Mbps in -4 Speed grade device
- Spartan-3A only supports left and right sides
- Data Width
  - Based on total available pins
  - Component
    - Up to 72-bit in Spartan-3
    - Up to 64-bit in Spartan-3A/3AN/3ADSP
  - DIMM
    - 64- and 72-bit in Spartan-3
    - 64-bit in Spartan-3A/3AN/3ADSP
- DQ to DQS Ratio is 8:1
- No built-in bank management for Spartan controllers
  - Virtex-5 has 4-bank Least Recently Used option
Left/right side support in Spartan-3A is due to those being the only sides that can handle SSTL 1.8V Class II. You could realistically use Class I and then use top/bottom sides, but that’s not supported officially.
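The MHz and Mbps pairs on this slide follow from double data rate: each data pin transfers on both clock edges. The Python sketch below illustrates that relation; the function name is made up, and treating "166 MHz" as a rounded 166.67 MHz (500/3) is an assumption made here to reproduce the quoted 333 Mbps.

```python
def per_pin_mbps(clock_mhz: float) -> float:
    """Per-pin data rate in Mbps: two transfers per clock cycle."""
    return 2 * clock_mhz


print(per_pin_mbps(133))             # -4 speed grade figure
print(round(per_pin_mbps(500 / 3)))  # -5 speed grade, assuming 166.67 MHz nominal
```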
30Embedded Processor Controllers Interface DDR2 to a MicroBlaze processorEmbedded Development Kit (EDK) 9.2Includes the Multi-Port Memory Controller v3 (MPMC3)MIG used for the physical layerAll MIG rules and constraints applySee Answer Record 29221Still set XIL_ROUTE_ENABLE_DATA_CAPTUREUse script to include MIG UCF in MicroBlaze system UCFVerify design built correctly (see Lab 3)Xilinx also offers memory controllers in the EDK embedded tool suite that allows an embedded processor like MicroBlaze to interface to DDR2 memory.
31Where do I get MIG? MIG is included with ISE Foundation/WebPACK Part of CORE GeneratorGraphical User Interface (GUI) provides access toCore libraryDatasheets3rd party contact informationAvailable Xilinx Solution RecordsMust Install ISE IP updateMIG v2.0 in ISE 9.2 IP Update 2Get IP Updates atFind more information atWebPACKWebPACK is free!WebPACK supports XC3SD1800A on S3ADSPSK
32 MIG Documentation
- MIG User’s Guide (UG086)
- Xilinx Application Notes
  - XAPP768c (Spartan DDR)
  - XAPP454 (Spartan DDR2)
  - XAPP458 (Spartan-3A Starter 200 MHz DDR2)
  - XAPP858 (Virtex-5 DDR2)
  - XAPP701 & XAPP702 (Virtex-4 DDR2 Direct Clocking)
  - XAPP721 & XAPP723 (Virtex-4 DDR2 SERDES)
- Virtex-5 ML561 Memory Interfaces User’s Guide (UG199)
Documents highlighted in Blue are the most pertinent ones for today’s Spartan-3A DSP labs.
The philosophy I’m using for the labs and lecture today is to assume that the MIG controller works. We will not focus on how the controller works – we’re assuming it’s good. Our focus is understanding how that controller fits into the system and how to get the basic controller up and running.
The details of how the controller works are documented very well in the application notes. For those interested in going beyond what we cover today, including custom modifications to the MIG controller, it is highly recommended that the user read this documentation thoroughly.
33 MIG Design Flow With Project Navigator
[Flow diagram labels: Core Generator, MIG, MIG Outputs, create_ise.bat, Project Navigator, Integrate Design, Download Design to Hardware]
How do I access and use MIG?
Personally, I prefer Project Navigator, especially when working on new designs. Therefore, we’ll show you here a typical flow diagram when using Project Navigator to create and integrate a MIG design.
First, launch Core Generator and create a new project. From CoreGen, launch MIG. Use MIG to generate the controller.
<click>
Use the create_ise.bat batch file to generate a Project Navigator compilation of the MIG project. Once created, launch Project Navigator and open the new project.
Integrate the MIG controller with the rest of your design.
Download the design to hardware and see it work.
34A Practical Guide to DDR2 Design with Spartan-3A DSP Lab 1 – Generate a DDR2 Controller with MIG
35Download Design to Hardware Lab 1 OverviewRun COREGenRun MIGConfigure controllerGenerateConvert to Project NavigatorReview raw outputsHDLUCFBuild scriptsProject NavigatorIntegrate DesignCore GeneratorMIGDownload Design to HardwareMIG Outputs
36Lab 1 Review What are the benefits of using MIG? What is required to use Project Navigator with a MIG design?Other observations?Pinouts match our board?What else did you notice about the UCF?Properties match between ProjNav and command-line script?
37A Practical Guide to DDR2 Design with Spartan-3A DSP Real-world Design with a MIG DDR2 Controller
38Real-world Design with a MIG DDR2 Controller Memory, FPGAs, and Memory ControllersMemory trendsDDR2 signalingXilinx FPGA memory controllersMemory Interface Generator (MIG)Lab 1 – Generate a DDR2 controller coreReal-world Design with a MIG DDR2 ControllerInterface to the MIG controllerLogically simulateHardware debugLunch BreakJust hit the main section titles here. Cover the section sub-topics right before each section.
39MIG Output Block Diagram .MIG Output Block DiagramMemoryMIG OutputsUser LogicMemory ControllerClockClock ManagementCalibrationThis is a simplified representation of an FPGA/DDR2 interface, including what gets generated by MIG. Obviously, we have the FPGA, the Memory, and the Clock connected on our PCB. The MIG outputs are shown in purple. This includes the Controller itself, the Calibration block, a clock management block, and some custom logic.<click>MIG does in fact generate a user logic example that runs a simple test. MIG calls the user logic RTL “ Test_Bench” although MIG also generates a simulation testbench (sim_tb_top). This is the area we must initially understand so that we can connect our own logic to the controller. We will refer to this as the User or Back-end Interface, and it will be a major focus of this lecture and the next lab.Of course, the rest of the controller is also important. Being open source with exposed RTL, many users will want to customize and optimize the MIG-generated controller. However, for our purposes today, we are using the default controller.Note – although the clock is shown coming into the Clock Management block and going nowhere, it really goes to every single piece in the design, including the User Logic. In fact, multiple phases go to most modules in the design.FPGA
40 User Logic Operating Modes
- Initialize
  - User instructs controller to set up the DDR2 for operation
  - Controller programs DDR2 with operating parameters
  - Parameters established by user during MIG generation
- Write
  - User instructs controller to write data to memory
  - Controller writes the data to the DDR2
- Read
  - User instructs controller to read data from memory
  - Controller reads the data from the DDR2
- Refresh
  - Controller tells user a refresh is needed
  - User pauses while controller handles refresh
The User Logic can interact with the controller in four different modes. Three of these modes are initiated by the User – Initialize, Write, and Read. The fourth mode is initiated by the Controller – Refresh.
We’ll look in detail at each of these modes, including
- Which user interface signals are involved
- Which command is used
- What sequence of events needs to happen in the user logic to succeed
41 Clock Domains in the User Logic
- 90-degree phase of DDR2 clock
  - Used for all data-related signals
  - Generated by a DCM
  - Referred to as CLK90
- 180-degree phase of DDR2 clock
  - Used for all control-related signals
  - Generated by negative edge of 0-phase clock
  - Referred to as CLK180 or Falling Edge CLK0
- Why is this important?
  - User logic controls interaction between domains
  - User must manage multiple clocks and resets
Before looking at these different modes, we need to understand a little bit about the clock domains involved in the user interface. All signals in the User Interface are governed by either the 90-degree phase clock or the 180-degree phase clock.
It’s important to understand this so the example User Logic will make sense when you see the different clocks and resets at work.
42User Interface Signals Write DataWrite MaskAddressBurst DoneCommandCommand AcknowledgeControllerUser LogicRead DataData ValidThis diagram shows all of the connections between the controller and the User Logic. With this complete picture in mind, we’ll now take a look at what’s involved in each of the four modes of operation.Initialization CompleteAuto Refresh RequestAuto Refresh DoneClocks & ResetsClocks & Resets
43Initialization Complete InitializeCommandControllerUser LogicConnections involved in the Initialize Mode are shown.Initialization CompleteClocks & ResetsClocks & Resets
44 How to Initialize
- Wait for RST_90 and RST_180 to deassert
- Set USER_CMD to b’010 on CLK180 for one clock
- Wait for INIT_DONE to assert
  - Minimum of 200 µs
The steps to perform an Initialization from the User logic.
45Write Controller User Logic Write Data Write Mask Address Burst Done CommandCommand AcknowledgeControllerUser LogicWrite connectionsClocks & ResetsClocks & Resets
46 How to Write
- Set USER_CMD to b’100 and Address on CLK180
- Wait for USER_CMD_ACK
- Set the DATA and MASK on CLK90
  - A dataword is double the width of the memory interface
  - Provide BURST_LENGTH/2 datawords (BL = 8 means 4 words)
- Set the next address and data
- Assert BURST_DONE on CLK180 after the last address
- Deassert USER_CMD after BURST_DONE
Image shows 5 write transactions to a 32-bit DDR2 interface with Burst Length of 8. A total of 20 double-words are seen between the X and O markers. At two 32-bit words per double-word, that corresponds to 20 × 2 = 40 words of 32 bits written to the DDR2, starting at address 0x. How this address translates to Bank, Row, and Column will be explained later.
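The relation between burst length, user datawords, and memory words can be checked with a few lines. This Python sketch uses hypothetical helper names and the numbers from the slide (32-bit interface, burst length 8, 5 transactions).

```python
def user_dataword_bits(memory_width_bits: int) -> int:
    """A user dataword is double the width of the memory interface."""
    return 2 * memory_width_bits


def datawords_per_burst(burst_length: int) -> int:
    """User datawords to supply for one burst: BURST_LENGTH / 2."""
    if burst_length <= 0 or burst_length % 2:
        raise ValueError("burst length must be a positive even number")
    return burst_length // 2


MEMORY_WIDTH = 32   # bits on the DDR2 data bus
BURST_LENGTH = 8
TRANSACTIONS = 5

datawords = TRANSACTIONS * datawords_per_burst(BURST_LENGTH)
memory_words = TRANSACTIONS * BURST_LENGTH

print("user datawords:", datawords, "of", user_dataword_bits(MEMORY_WIDTH), "bits")
print("memory words:", memory_words, "of", MEMORY_WIDTH, "bits")
```

Both views move the same number of bits, which is why the user side needs only half as many words per burst.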
47Read Controller User Logic Address Burst Done Command Command AcknowledgeControllerUser LogicRead DataData ValidRead connections. Address, Burst Done, Command, and Command Acknowledge are the same as in the Write case.Clocks & ResetsClocks & Resets
48 How to Read
- Set USER_CMD to b’110 and Address on CLK180
- Wait for USER_CMD_ACK
- Set the next address
- Assert BURST_DONE on CLK180 after the last address
- Deassert USER_CMD after BURST_DONE
- Watch for Data Valid to indicate when data is good (CLK90)
The steps to perform a Read from the User logic.
49Refresh Controller User Logic Auto Refresh Request Auto Refresh Done Refresh connectionsAuto Refresh RequestAuto Refresh DoneClocks & ResetsClocks & Resets
50 How to Refresh
- At all times, check for auto refresh request (AR_REQ) in the CLK180 domain
- If AR_REQ, then do not start a new transaction
- Wait for AR_DONE (CLK180)
- Go back to what you were doing
The Refresh happens automatically – the User Logic simply needs to be aware.
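The refresh handshake amounts to a one-bit "hold off" state in the user logic. The Python below is a behavioral sketch only, not RTL and not MIG code; the class and method names are hypothetical, and giving AR_REQ priority when both flags appear in the same cycle is an assumption.

```python
class RefreshGate:
    """Tracks whether the user logic may start a new transaction."""

    def __init__(self) -> None:
        self.blocked = False

    def step(self, ar_req: bool, ar_done: bool) -> bool:
        """Advance one CLK180 cycle; return True if a new transaction may start."""
        if ar_req:
            self.blocked = True      # refresh requested: hold off new work
        elif ar_done:
            self.blocked = False     # refresh finished: resume
        return not self.blocked


gate = RefreshGate()
trace = [
    gate.step(ar_req=False, ar_done=False),  # idle
    gate.step(ar_req=True, ar_done=False),   # controller asks for a refresh
    gate.step(ar_req=False, ar_done=False),  # still waiting for AR_DONE
    gate.step(ar_req=False, ar_done=True),   # refresh complete
]
print(trace)
```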
51 User Logic Address
- Starting address for burst
  - DDR2 auto-increments address for burst
- Combines addresses for bank, row, and column
  - [ (row) : (column) : (bank address) ]
- 32M x 32 example
  - Address bus is 26 bits
  - User_A[25:13] is the Row Address
  - User_A[12:2] is the Column Address
  - User_A[1:0] is the Bank Address
- Why is Column Address 11 bits?
  - 1K columns per row only requires 10 bits
This slide explains how the User Logic address maps to the controller.
Why is bank address in the least-significant position?
Having Bank Address as the least-significant bits makes it convenient for interleaving. However, in Spartan, interleaving is not supported in MIG and must be added by the User.
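The [row : column : bank] layout for the 32M x 32 example can be expressed as simple shifts and masks. This is a Python illustration with made-up function names; the field widths (13, 11, and 2 bits) come from the slide.

```python
ROW_BITS, COL_BITS, BANK_BITS = 13, 11, 2   # User_A[25:13], [12:2], [1:0]


def pack_user_address(row: int, column: int, bank: int) -> int:
    """Build the 26-bit user address as [row : column : bank]."""
    for value, width, name in ((row, ROW_BITS, "row"),
                               (column, COL_BITS, "column"),
                               (bank, BANK_BITS, "bank")):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return (row << (COL_BITS + BANK_BITS)) | (column << BANK_BITS) | bank


def unpack_user_address(address: int) -> tuple:
    """Split a 26-bit user address back into (row, column, bank)."""
    bank = address & ((1 << BANK_BITS) - 1)
    column = (address >> BANK_BITS) & ((1 << COL_BITS) - 1)
    row = (address >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, column, bank


# Bank 3, row 0x000C, column 0x0008, the values used in the earlier read example.
addr = pack_user_address(row=0x000C, column=0x008, bank=3)
print(hex(addr), unpack_user_address(addr))
```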
52 Column Address A10. Column address bit A10 is special. PRECHARGE is the DDR2 “Close Row” command: it deactivates the current row and returns the bank to the idle state. Auto-PRECHARGE: the DDR2 automatically closes the row after the current operation. To set Auto-PRECHARGE, assert column address A10. MIG does not support auto-precharge, but still reserves A10. What if a column needs 11 or more address bits? Rare, but if so, A10 gets skipped. What if a column needs 9 or fewer address bits? Extra address bits are added up to A10. The 32Mx32 has 11 column address bits: A[9:0] for the address, and A[10] reserved for the user to create custom auto-precharge logic.
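The A10 handling can be illustrated with a small Python sketch. The function name is hypothetical, and the placement of any column bits beyond the first ten at A11 and above is an assumption; the slide only says that A10 gets skipped. The auto-precharge flag models user-added logic, since MIG itself does not support auto-precharge.

```python
A10_MASK = 1 << 10

def column_to_pins(column: int, column_bits: int, auto_precharge: bool = False) -> int:
    """Map a logical column number onto address pins, keeping A10 for auto-precharge."""
    if not 0 <= column < (1 << column_bits):
        raise ValueError("column out of range")
    low = column & 0x3FF            # logical bits 9..0 go to A[9:0]
    high = column >> 10             # any further bits skip A10 (assumed to start at A11)
    pins = low | (high << 11)
    if auto_precharge:
        pins |= A10_MASK            # A10 asserted requests Auto-PRECHARGE
    return pins

print(hex(column_to_pins(0x3FF, 10)))                      # top of a 1K-column row
print(hex(column_to_pins(5, 10, auto_precharge=True)))     # same column, A10 set
```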
53 User Interface Commands. Command = Description: 000 = NOP; 010 = Initialize Memory; 100 = Write Request; 110 = Read Request; Others = Reserved. Notes: This is the complete list of commands that the User Logic can exercise.
54 User Logic Interface -- 32Mx32 Example
Function | User Guide Name | Direction | Width
Data (Write) | cntrl0_user_input_data | To controller | 64
Mask | cntrl0_user_data_mask | To controller | 8
Address | cntrl0_user_input_address | To controller | 26
Command | cntrl0_user_command_register | To controller | 3
Burst Done | cntrl0_burst_done | To controller | 1
Command Acknowledge | cntrl0_user_cmd_ack | To user |
Data (Read) | cntrl0_user_output_data | To user |
Data Valid | cntrl0_user_data_valid | To user |
Initialization Complete | cntrl0_init_val | To user |
Auto Refresh Request | cntrl0_auto_ref_req | To user |
Auto Refresh Done | cntrl0_ar_done | To user |
Notes: This reference slide shows the user logic interface to the MIG controller for a 32Mx32 controller. See the official signal names, their direction, and their width. Notice that the two data busses and the mask bus are all twice as big as the interface between the FPGA and DDR2.
55 Spartan-3x Memory Interface Architecture. Diagram (slide courtesy Xilinx), block labels: Input_clock, DCM (FPGA Clock; clocks all modules in fabric), LUT delay select, LUT delay Calibration Monitor, Read Capture (FIFOs and LUT delays), Write Datapath, Controller (Address, Command, & Control), and User Interface, all inside the Spartan-3x FPGA. User-side signals: User_data_valid, User_output_data, User_data_mask, User_input_data, User_address, User_command, User_burst_done, User_cmd_ack. Memory-side signals: DQS, DQ, and DM to the DDR2 SDRAM. Notes: Hopefully this diagram now makes sense. This shows the MIG controller. This is what the User Interface must communicate with. All of the important pieces inside the controller are shown here. DCM: generates the clk0 and clk90 phases of the clock required for the interface design. The other yellow blocks, in particular the Read Capture and LUT Delay Calibration Monitor circuits, are explained in detail in XAPP768c. One signal we haven’t yet discussed is the reference clock for the system. On this diagram, it is shown as the Input_clock.
56 System Reference Clock. MIG assumes this to be differential (SYS_CLK and SYS_CLKb). MIG assumes it to be the controller frequency. The User must connect the real system clock: a single-ended clock is acceptable; differential has less jitter. Can a DCM synthesize the proper frequency? Yes, if you account for jitter in timing calculations. The User must edit several MIG RTL files (shown in Lab 2). Notes: The Input_clock from the previous page is an input to the FPGA from some source on the PCB. This is not one of the customizations that MIG allows, so it makes some assumptions.
57 MIG’s Two Output Designs. user_design: for the user who wants to instantiate the MIG controller; the top-level exposes all DDR2 external signals and the User Logic interface; no instantiation template is provided. example_design: adds a User Logic example; the top-level only exposes DDR2 external signals; adds wrapper layers to connect the controller, calibration, clock management, and User Logic; a more practical starting point. Notes: As we saw in Lab 1, MIG generates two different designs, user_design and example_design. What’s the difference between these two designs? The user_design has no user logic. All of the DDR2 and User Logic signals are brought out to the top-level. The user would then instantiate that top-level in their own design. The example_design includes example user logic. The top-level is a wrapper to connect things together. It’s a good starting point because it can be synthesized and built. A user can still implement their own user logic. Simply replace the example user logic with your own, preserving the top-level wrappers that pull everything together.
58 example_design File Hierarchy. Hierarchy labels: Memory controller, User Logic, Clock management, Calibration. Notes: Top-level, main, and infrastructure are essentially wrappers. Note that MIG calls the User Logic “test_bench.” We can build this design as is, or we can replace “test_bench” with our own code, preserving the wrappers at the top-level.
59 Logical Simulation. MIG generates logical simulation files: a VHDL or Verilog testbench; a ModelSIM “do” file; a Micron memory model (assuming the Micron license agreement is checked; Verilog only; newer versions are available directly from Micron).
60 Simulator Support. Table with columns Verilog and VHDL. ISE Simulator: Verilog = Yes (requires modifications to the HDL; no modifications required in 10.1); VHDL = No (scheduled to work in ISE 10.1). ModelSIM-XE Verilog: Verilog = Yes; VHDL = NA. ModelSIM-XE VHDL: No (needs a mixed-language simulator due to Micron’s Verilog model). ModelSIM-SE: last row, no cell values shown.
61 VHDL Options. Use a mixed-language simulator: ModelSIM SE, tested and supported by Xilinx. Get 3rd-party VHDL models for the memory: not tested or supported by Xilinx. Wait for ISE 10.1 to consider ISE Simulator.
62 Hardware Debug. External logic analyzer: consider adding Agilent Soft Touch connectorless probes; invaluable for performing full-speed measurements. External scope: probe directly at the memory or FPGA; leave break-out vias exposed for BGAs on prototypes; critical for measuring signal integrity. Embedded logic analyzer: extremely versatile and inexpensive option; ChipScope Pro.
63 Debug Logic Anywhere Within the FPGA. Diagram (slide courtesy Xilinx), labels: Memory Controller, Clock, Data, Address, Xilinx ILA (Clock, Trigger 0, Trigger 1, Trigger 2, Trigger 3, Trigger Out). Identify logic that you need to debug and verify. ChipScope Pro cores are placed directly within the logic and function as “virtual test headers”, provide access to any signal or node within the FPGA, and debug at the system clock rate.
64 ChipScope in Spartan. ChipScope cores take up FPGA resources: consider using a larger FPGA in prototypes. ChipScope logic must meet timing: limited to around 200 MHz for Spartan. ChipScope must run faster than DDR2 to see double data rate: the DDR2 lower limit is 125 MHz, and it is not practical to run ChipScope at 250 MHz. Consider violating the 125 MHz limit: run DDR2 at 50 MHz and run ChipScope at 100 MHz or 200 MHz (see this in Lab 2). High-speed measurements need to be taken with a high-speed logic analyzer.
66 A Practical Guide to DDR2 Design with Spartan-3A DSP. Lab 2 – Build and verify a DDR2 controller in hardware
67 Lab 2 Overview. Modify the example design: UCF to match the hardware; connect the correct system clock; integrate new user logic; add the ChipScope logic analyzer. Build, download, and verify hardware. Flow diagram labels: Core Generator (MIG), MIG Outputs, Project Navigator (Integrate Design), Download Design to Hardware.
68 User Test Logic. Initialize the memory. Write to fill the memory with an incrementing pattern. Read back the memory. Handle auto-refresh when necessary.
69 User Test Logic State Machine. Diagram states: Power On, Initialize Memory, Write, Read, Compare. Notes: After power on, the user logic issues a single Initialize Memory command to the controller, after which the controller takes care of all the gory details we previously read from the datasheet. After initialization, the testbench gets into a loop where it (1) writes data to the memory, (2) reads the data back from the memory, and (3) compares the read data with the original data pattern. If at any time an error is detected in the comparison, an error signal is asserted (and an LED lights on our demo design).
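The loop described in the notes can be modeled in software as a rough sketch. This is not the MIG test_bench: the memory is a plain Python dict standing in for the DDR2, refresh handling is left out, and all names (State, run_memory_test, StuckBitMemory) are hypothetical.

```python
from enum import Enum, auto

class State(Enum):
    POWER_ON = auto()
    INITIALIZE = auto()
    WRITE = auto()
    READ = auto()
    COMPARE = auto()

def run_memory_test(memory, depth, passes=1):
    """Return True if any compare fails (the condition that lights the error LED)."""
    state = State.POWER_ON
    expected = list(range(depth))            # incrementing pattern
    readback = []
    error = False
    completed = 0
    while completed < passes:
        if state is State.POWER_ON:
            state = State.INITIALIZE
        elif state is State.INITIALIZE:
            memory.clear()                   # stands in for the single Initialize Memory command
            state = State.WRITE
        elif state is State.WRITE:
            for address, value in enumerate(expected):
                memory[address] = value
            state = State.READ
        elif state is State.READ:
            readback = [memory[address] for address in range(depth)]
            state = State.COMPARE
        else:                                # State.COMPARE
            if readback != expected:
                error = True
            completed += 1
            state = State.WRITE              # loop back: write, read, compare again
    return error

class StuckBitMemory(dict):
    """Hypothetical faulty memory: bit 0 of every word reads back as 1."""
    def __getitem__(self, address):
        return super().__getitem__(address) | 1

print(run_memory_test({}, depth=16))
print(run_memory_test(StuckBitMemory(), depth=16))
```

The faulty-memory class is only there to show the error path; on real hardware the same flag would come from a genuine miscompare.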
70 Xilinx Spartan-3A DSP 1800A Starter Platform. Board photo labels: EXP, RS232, Texas Instruments Regulators, Spartan-3A DSP FPGA, National 10/100/1000 PHY, Intel Flash, Micron 32Mx32 DDR. Notes: Might be good to point out a few of the components: RS232 and DB15 Video on the edge; Mx16 DDR chips above the FPGA (total of 32Mx32 or 128MB); EXP slot; 8Mbit Parallel Platform Flash; power source.
71 www.em.avnet.com/exp. Expansion slot for custom development. Consists of two high-speed Samtec mezzanine connectors: two connectors = full expansion module; one connector = half expansion module. Each connector has 84 I/Os (mix of single-ended and differential), access to one or more FPGA clock input pins, and 2.5V and 3.3V source power. Use one from Avnet: EXP Prototype, EXP High-speed Analog, EXP Video, EXP Interface. Or develop your own: the specification is available from Avnet.
72 Lab 2 Review. Why did we do simulation in Verilog? What’s the penalty for using a DCM to synthesize the system clock? Other observations? Did your design pass timing? How was the new UCF different from the original? How does adding ChipScope affect a design?
73 A Practical Guide to DDR2 Design with Spartan-3A DSP. PCB Considerations
74 PCB Considerations. Agenda: Lab 2 – Build and verify a DDR2 controller in hardware. PCB Considerations: FPGA pinout; factors impacting signal quality and crosstalk; PCB simulation example for DDR2; trace requirements; power. Customizing and Verifying the MIG Results: pinout rules; pin-swapping; verifying a new design. Lab 3 – Analyze and Fix Customized MIG Controllers. Notes: Just hit the main section titles here. Cover the section sub-topics right before each section.
75 FPGA Pinout. A random pinout will not work: the pinout relationship of DQS to DQ is critical, and LUT delay line placement is critical. MIG follows all the pinout guidelines. To modify the MIG-generated pinout, you must understand the pinout rules. The pinout rules are covered later.
76 Simultaneous Switching Outputs (SSO). Limits the number of outputs on a bank. If violated, ground bounce can occur. MIG doesn’t check this for you. Read XAPP689. For DDR2, we use SSTL 1.8V Class I & II IOSTANDARDs. For XC3SD1800A-FG676 Bank 3 (Left): datasheet Table 26 shows 9 equivalent Vcco/GND pairs; SSTL18_I allows 15 SSO per Vcco/GND pair (135 total); SSTL18_II allows 3 SSO per Vcco/GND pair (27 total). On the Spartan-3A DSP 1800A Starter Platform, using Class II for all 32 data bits is a problem; use Class I instead.
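The SSO budget for Bank 3 is simple multiplication, sketched below in Python. The per-pair limits and the pair count are the figures quoted on the slide; the function names are hypothetical, and only the 32 data bits are counted, as on the slide, even though a real check would include every simultaneously switching output in the bank.

```python
# SSO allowed per equivalent Vcco/GND pair, as quoted on the slide.
SSO_PER_PAIR = {"SSTL18_I": 15, "SSTL18_II": 3}

def sso_limit(io_standard: str, vcco_gnd_pairs: int) -> int:
    """Maximum simultaneously switching outputs for one bank."""
    return SSO_PER_PAIR[io_standard] * vcco_gnd_pairs

def sso_ok(io_standard: str, vcco_gnd_pairs: int, outputs: int) -> bool:
    return outputs <= sso_limit(io_standard, vcco_gnd_pairs)

BANK3_PAIRS = 9      # XC3SD1800A-FG676 Bank 3
DATA_BITS = 32

for standard in ("SSTL18_I", "SSTL18_II"):
    print(standard, sso_limit(standard, BANK3_PAIRS), sso_ok(standard, BANK3_PAIRS, DATA_BITS))
```

With 9 pairs, Class II allows only 27 outputs, so 32 data bits exceed the budget, while Class I allows 135.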
77 Calibration Loopback. MIG calls this rst_dqs_div. Not a DDR2 signal! One output: rst_dqs_div_out. One input: rst_dqs_div_in. Used for Spartan-3x controllers to calibrate timing: write enable for readback from DDR2. Must be placed on two I/Os in the center of the DQ bus. Trace length equal to the sum of the average DQS and clock trace lengths: basically one round trip from FPGA to memory and back. NOT the same as the clock feedback required by the pre-EDK 9.2 DDR2 controller. If using EDK 9.1 or earlier, design a separate clock feedback. Both the MIG loopback and the EDK 9.1 clock feedback are on the Spartan-3A DSP 1800A Starter Platform. Notes: The rst_dqs_div signal is driven to an IOB as an output and is then taken as an input through the input buffer. This technique normalizes the IOB and trace delays between the rst_dqs_div and DQS Clock signals. The rst_dqs_div signal from the input PAD of the FPGA uses identical routing resources as the DQS before it enters the LUT delay circuit. The trace delay of the loop should be the sum of the trace delays of the clock forwarded to the memory and the DQS.
78 Factors Impacting Signal Quality. The 3 T’s. Technology: use the slowest possible driver switching speeds. Topology: select the optimal topology for signal integrity, timing, and EMC; shorten traces or stubs to their critical length or shorter. Termination: select the optimal termination for signal integrity, timing, and EMC; match the end of the line to Z0 using passive components.
79 Technology’s Influence on SI. Smaller dies mean faster edge rates. Faster edge rates mean reflections and signal quality problems, even when the package hasn’t changed and your clock speed hasn’t changed. A problem for legacy designs and redesigns. Plot (trace topology setup: length = 7 inches): signal integrity using an SSTL18 Class II driver compared to that using LVCMOS18 Fast for an unterminated microstrip trace. Overshoot using SSTL18 Class II driver (red) = 675 mV peak-peak. Overshoot using LVCMOS18 Fast driver (green) = 47 mV peak-peak.
80 Topology’s Influence on SI. Topology is critical: analyze prior to layout. Longer traces are more susceptible to reflections. Plot (trace topology): signal integrity using an SSTL18 Class II driver with an unterminated microstrip trace of lengths 0.25 inches and 2.5 inches. Overshoot for 2.5 inch trace (red) = 670 mV peak-peak. Overshoot for 0.25 inch trace (green) = 114 mV peak-peak.
81 Termination’s Influence on SI. Impedance discontinuities cause reflections: changes in trace width; BGA breakouts; stubs; vias; loads; connector transitions; improper terminations; no termination; large power plane discontinuities; changes in trace height above power planes; changing layers. Basic termination guidelines: source termination is useful in point-to-point, one-directional connections; far-end termination is useful in multi-point connections; distributed termination is useful with variable configurations, such as a plug-in system. Notes: Reflections occur at impedance discontinuities (i.e., Z2 ≠ Z1). Even with “unlike objects” (e.g., a trace and a connector), reflections will not occur as long as Z2 = Z1. We call this “Impedance Matching”. Another factor is the length over which the discontinuity occurs. A BGA breakout, for example, may have a longer length than a via, and therefore cause a bigger reflection. Decide the best method for your trace topology and design requirements. You can terminate the source, the far end, or both, or you can employ “distributed” terminations at several locations (in the case of multipoint connections). Select correct values for terminators: passive component values depend on physical board properties.
82 Termination Example. Plot: signal integrity using an SSTL18 Class II driver on a microstrip trace with and without termination. Setup: length = 7 inches for both the trace topology without termination and the trace topology with termination. Overshoot for unterminated net (red) = 674 mV peak-peak. Overshoot for series terminated net (green) = 90 mV peak-peak.
83 Factors Impacting Crosstalk. Crosstalk occurs when 2 or more neighboring traces couple together: sending a signal down one trace causes a signal to appear on the 2nd trace. The following affect crosstalk performance on a PCB: Stackup, Signal Integrity, Fast Edge Rates, and Trace Separation. Diagram (Net Topologies): net ClockA (Aggressor) inducing crosstalk on ClockB (Victim) across the coupled region. Notes: Crosstalk is caused by capacitive and inductive coupling. Inductive coupling causes signal currents to couple voltages into neighboring nets. Capacitive coupling causes signal voltages to couple currents into neighboring nets. Crosstalk is only induced when the voltage on the aggressor net is changing due to transition or ringing. Crosstalk is not important during the part of the cycle that is not within the setup/hold time of clocked nets. In this example, Clock A is steady (not active). ClockB is transitioning to logic high. Because of the close proximity of the two nets, and the electromagnetic field generated from ClockB, ClockA becomes affected. The edge rate of ClockB will determine (along with their proximity to each other) how much noise is actually generated on ClockA.
84 Stackup’s Impact on Crosstalk. Microstrip (surface layer) traces are more susceptible to crosstalk. Stripline (internal layer) traces are less susceptible to crosstalk. Multiple reference planes: reduce trace-to-trace coupling; consider extra layers for reference planes; a divided voltage plane is a poor reference. In the next slide we’ll look at crosstalk for microstrip versus stripline for terminated nets.
85 Microstrip vs. Stripline. Setup: length = 7 inches, spacing = 8 mils, TL1 and TL3 = Aggressor, TL2 = Victim. Crosstalk on TL2 for microstrip (green) = … mV peak-peak. Crosstalk on TL2 for stripline (blue) = 79 mV peak-peak.
86 Signal Integrity’s Impact on Crosstalk. Reflections lead to higher signal swings. Termination improves SI and crosstalk. Plots: unterminated nets and terminated nets. Setup: SSTL18 Class II buffers are used; 3 coupled microstrip traces; length = 7 inches, spacing = 8 mils; the middle trace (TL2) is the victim net. Crosstalk on TL2 when nets were unterminated (red) = 637 mV peak-peak. Crosstalk on TL2 when nets were terminated (green) = … mV peak-peak.
87 Edge Rates’ Impact on Crosstalk. Fast edge rates lead to increased coupling between traces, so crosstalk is higher. The slowest driver that meets timing requirements is recommended. Drive strength: large drive strength values and single-ended swing also increase the coupling between traces; lower the drive strength to the minimum that meets requirements. In the next slide we’ll look at crosstalk for stripline for unterminated nets using SSTL18 Class I drivers and then using SSTL18 Class II drivers.
88 SSTL18_I vs SSTL18_II. Setup: length = 7 inches, spacing = 8 mils, TL1 and TL3 = Aggressor, TL2 = Victim. Crosstalk on TL2 using SSTL Class II drivers (red) = 118 mV peak-peak. Crosstalk on TL2 using SSTL Class I drivers (green) = 89 mV peak-peak.
89 DDR2 Termination. DDR2 uses the SSTL standard: Stub Series Terminated Logic (JEDEC). Rules are simple. Stub termination to the 0.9V Vtt on all receiving nodes: resistor value equal to the board impedance. Series termination on all driving nodes: the sum of the termination and output driver impedance should equal the board impedance. What about bi-directional signals? Sometimes driving, sometimes receiving: series and stub terminations are required at both ends. Differential signals have 100-ohm termination at the load. EXCEPTION: Any or all terminations can be eliminated if proven during board-level simulation.
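The two resistor rules can be written out as a minimal Python sketch. The function names and the example impedances (a 50-ohm board and an 18-ohm driver) are assumptions for illustration only; real values come from the stackup and the IBIS model, and the slide's exception about board-level simulation still applies.

```python
def stub_resistor_ohm(board_impedance_ohm: float) -> float:
    """Stub termination to Vtt at a receiver: equal to the board impedance."""
    return board_impedance_ohm

def series_resistor_ohm(board_impedance_ohm: float, driver_impedance_ohm: float) -> float:
    """Series termination at a driver: makes driver plus resistor equal the board impedance."""
    # A driver that is already at or above the board impedance needs no added resistance.
    return max(board_impedance_ohm - driver_impedance_ohm, 0.0)

Z0 = 50.0        # assumed board impedance, ohms
DRIVER = 18.0    # assumed output driver impedance, ohms

print(series_resistor_ohm(Z0, DRIVER), stub_resistor_ohm(Z0))
```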
90 DDR2 On Die Termination (ODT). New feature in DDR2. Termination is added inside the DDR2 for DQ, DQS, and DM. Multiple termination values for different configurations: None, 50 Ohm, 75 Ohm, 150 Ohm. Not included for address or control; that comes with DDR3. Eliminates the need for stub terminations at the DDR2 for the data lines. Notes: The ODT feature is designed to improve signal integrity of the memory channel by allowing the DDR2 SDRAM controller to independently turn on/off ODT for any or all devices. RTT effective resistance values of 50Ω, 75Ω, and 150Ω are selectable and apply to each DQ, DQS/DQS#, RDQS/RDQS#, UDQS/UDQS#, LDQS/LDQS#, DM, and UDM/LDM signal.
92 When Can Terminations Be Relaxed? When simulating proves that it can! See Micron TN4614. Some examples: trace length < 2”; only 1 or 2 memory devices; relaxed timing allows weaker drivers (reduced DDR2 drive strength; use the SSTL 1.8V Class I FPGA I/O standard).
93 Board-level Simulation. A must for any serious, high-speed design. Investigate the following: differing layout topologies; trace lengths; resistor values; resistor placement. Examples: determined we could eliminate all terminations on the Spartan-3MB board (trace lengths < 1”); determined some stub terminations on the Spartan-3A DSP 1800A Starter Platform were not necessary.
94 Example Simulation Flow. Simulate after each step until acceptable results are achieved. Create the driver/receiver topology with no termination: can the trace be shortened? Is a reduced drive strength possible? Turn on ODT at the DDR2: experiment with all three options (50, 75, and 150 ohm). Add series termination at the driver: for bi-directional signals, experiment with moving the series termination to the other side. Add series termination on both sides. Add stub termination at the receiver(s).
95 DDR2 Signal Integrity Simulation Demo. Pre-Layout Analysis for Data Topology
96 DDR2 Crosstalk Simulation Demo. Pre-Layout Analysis for Data Topology
97 Recommended PCB Design Flow. Flow diagram stages: PRE-LAYOUT (System Design, Part Selection, and Schematic Entry), LAYOUT (Full Board Place-and-route), PROTOTYPING (Prototype Functional Testing & Debugging; EMI Testing & Debugging). Tool labels: Linesim and Boardsim (Mentor HyperLynx). Notes: The two simulations just demonstrated were performed using Linesim with the optional crosstalk package (approximately $8K).
98 For More Information. Talk to a Mentor Graphics Representative in your area. Get a HyperLynx evaluation: good tutorials; easy to use; no license required for eval. Notes: FAEs – Feel free to look up and add the name of a local reseller and even invite them to the session.
99 Other SI Resources. Terry Fox & Associates (Issaquah, WA, USA): signal integrity training; project consulting; disaster recovery. CircuitCraft (Calgary, Canada): schematic and layout design review; post-layout board simulation (using HyperLynx Boardsim); full electrical and physical design. Or check with your Mentor Graphics Rep for a local resource. $3-5K for basic simulation help. Notes: FAEs – these resources are both located in North America, but I believe they will work on international projects as well. If you have a local consultant that you would prefer to highlight, feel free to modify this slide for your area.
100 Simulation Models. Board-level simulation tools typically require IBIS models. Xilinx provides IBIS models for free: available from the Download Center (www.xilinx.com/download); make sure you are using the correct I/O standard; or generate an IBIS model directly from ISE. Micron provides IBIS models for free: see the product web page.
101 Trace Length Matching Requirements. Members of a differential pair matched to +/- 10 mil. DQ, DQS, DM, and CK matched to +/- 45 mil. Address/Control matched to +/- 100 mil of CK. RST_DQS_DIV and MB_FB_CLK matched to +/- 45 mil of the sum of average DQS and average CK. Notes: These are tight requirements, but if you meet them, your board will likely work. Trace Lengths: These rules indicate the maximum electrical delays between DDR/DDR2 SDRAM signals at 333 MHz: 1. ± 25 ps maximum electrical delay between any DQ and its associated DQS/DQS#. 2. ± 50 ps maximum electrical delay between any address and control signals and the corresponding CK/CK#. 3. ± 100 ps maximum electrical delay between any DQS/DQS# and CK/CK#.
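A rough Python checker for the mil-based rules above is sketched here. It rests on two assumptions that the slide does not state: the +/- 45 mil rule for DQ, DQS, and DM is measured against the CK length, and "average CK" is the mean of CK and CK_N. The function name, argument names, and example lengths are all hypothetical.

```python
from statistics import mean

def check_lengths(ck, ck_n, dqs, dq, dm, addr_ctrl, loopback):
    """Return the names of nets that violate the matching rules (lengths in mils)."""
    problems = []
    if abs(ck - ck_n) > 10:                          # differential pair members
        problems.append("CK pair")
    for name, group in (("DQS", dqs), ("DQ", dq), ("DM", dm)):
        for index, length in enumerate(group):
            if abs(length - ck) > 45:                # assumed reference: CK
                problems.append(f"{name}{index}")
    for index, length in enumerate(addr_ctrl):
        if abs(length - ck) > 100:                   # address/control versus CK
            problems.append(f"ADDR_CTRL{index}")
    target = mean(dqs) + mean([ck, ck_n])            # loopback: average DQS + average CK
    if abs(loopback - target) > 45:
        problems.append("loopback")
    return problems

# Hypothetical board lengths in mils.
print(check_lengths(ck=1500, ck_n=1505, dqs=[1490, 1510], dq=[1480, 1520, 1500],
                    dm=[1500], addr_ctrl=[1580, 1420], loopback=3000))
```

An empty list means every net in the example is inside its tolerance; a real review would run the same comparison against the layout tool's length report.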
102 Power. Three independent supplies are required: the DDR2 and FPGA I/O supply is 1.8V; a source/sink 0.9V termination supply (a resistor divider is possible; a regulator is recommended); a 0.9V reference supply.
103 Texas Instruments TPS51116. Used on the Spartan-3A DSP 1800A Starter Platform. Provides 1.8V up to 10A. Provides 0.9V Vtt up to 3A. Provides 0.9V Vref. Diagram labels: TPS51116, 1.8V, 0.9V Vtt, 0.9V Vref, FPGA, DDR2, DDR2.
105 Decoupling Capacitors. For the FPGA, follow Xilinx XAPP623. An example for the Spartan-3A DSP 1800A Starter Platform (XC3SD1800A-FG676) is shown in the table. For the DDR2, see Micron TN4602. Table (FPGA Decoupling; columns 1.8V, 1.2V, 0.9V), flattened: Total # Pwr/Gnd Pairs923Tantalum Capacitor 470uF14.7uF (0603)241.0uF (0402)375.01uF (0201)13
106 Stackup. Power planes on the S3ADSPSK are multi-rail: not good for a return path. Extra ground planes are used.
107 A Practical Guide to DDR2 Design with Spartan-3A DSP. Customizing and Verifying the MIG Results
108 Customizing and Verifying the MIG Results. Agenda: Lab 2 – Build and verify a DDR2 controller in hardware. PCB Considerations: FPGA pinout; factors impacting signal quality and crosstalk; PCB simulation example for DDR2; trace requirements; power. Customizing and Verifying the MIG Results: pinout rules; pin-swapping; verifying a new design. Lab 3 – Analyze and Fix Customized MIG Controllers. Notes: Just hit the main section titles here. Cover the section sub-topics right before each section.
109 Customizing the MIG Pinout. What if the MIG output doesn’t match an existing board? What if I’m designing a board and don’t like the pinout MIG gives me? What about pin-swapping during layout? Why does it matter? Calibrating strobes with data bits.
110 If You Want to Change the I/Os. Know the pinout rules. Use FPGA Editor to find suitable alternatives. Modify the UCF accordingly. Verify the implemented result.
111 Spartan-3x Pinout Rules. The IOBs for DQ bits must be placed five tiles above or six tiles below the IOB tile for the associated DQS bit: see XAPP768c and AR24935; unbonded IOBs count (can’t simply use the datasheet pinout). Loopback must be in the middle of the DQ bus. Keep even/odd DQs oriented in the same top/bottom tile pattern: one CLB column is dedicated for the odd-numbered bits and one is dedicated for the even-numbered bits. CK/CK_N, address, RAS_N, CAS_N, WE_N, CS_N, and ODT must be placed together in banks that are on the same side of the device.
112 What’s a Tile? IOBs are grouped together. Each grouping is called a tile. The example shows a 2-IOB tile (labeled “IOB Tile”). “Five tiles above” means 10 total IOBs above in this case.
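The tile arithmetic can be sketched in Python. This reads the “five tiles above or six tiles below” rule as a placement window around the DQS tile and assumes that tile indices increase upward; both are interpretations, not statements from the slides, and XAPP768c and AR24935 remain the authority. The function names are hypothetical.

```python
TILES_ABOVE = 5
TILES_BELOW = 6

def tiles_to_iobs(tiles: int, iobs_per_tile: int = 2) -> int:
    """Convert a tile distance to an IOB count (unbonded IOBs included)."""
    return tiles * iobs_per_tile

def dq_tile_allowed(dqs_tile: int, dq_tile: int) -> bool:
    """True if the DQ tile lies inside the assumed +5/-6 window around its DQS tile."""
    offset = dq_tile - dqs_tile          # positive means above (assumed index direction)
    return -TILES_BELOW <= offset <= TILES_ABOVE

print(tiles_to_iobs(TILES_ABOVE))        # IOBs spanned by "five tiles above" for 2-IOB tiles
print(dq_tile_allowed(dqs_tile=20, dq_tile=25), dq_tile_allowed(dqs_tile=20, dq_tile=26))
```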
113 Common Pin-swapping. A DQ byte can be swapped with another byte: the strobe and data are swapped together. DQ bits can swap within a byte: swap even-numbered bits with other even-numbered bits; swap odd-numbered bits with other odd-numbered bits. Control, address, data mask, and clock can be swapped at will, though spreading them too far apart may cause timing issues. Anything besides these requires checking against the pinout rules.
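The within-byte swap rule is easy to encode; the Python sketch below uses a hypothetical function name and assumes DQ bits are numbered so that bits 0 to 7 form byte 0, bits 8 to 15 form byte 1, and so on. A False result does not mean the swap is impossible, only that it is not one of the common swaps and needs checking against the pinout rules.

```python
def bit_swap_is_common(dq_a: int, dq_b: int) -> bool:
    """True if two DQ bits share a byte and have the same even/odd parity."""
    same_byte = dq_a // 8 == dq_b // 8
    same_parity = dq_a % 2 == dq_b % 2
    return same_byte and same_parity

print(bit_swap_is_common(0, 2))    # even with even, same byte
print(bit_swap_is_common(0, 1))    # even with odd
print(bit_swap_is_common(2, 10))   # same parity, different bytes
```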
114 Using FPGA Editor. Tool to view the internal device layout. Shows things that are hidden in other tools: unbonded I/Os; detailed routing. What will we use it for? Creating or customizing a pinout that follows the MIG rules; verifying a design was implemented properly. Where is it? Start > Programs > Xilinx ISE > Accessories > FPGA Editor; or, within Project Navigator, “View/Edit Placed Routed Design”.
115 Adjusting the UCF. Use FPGA Editor to find acceptable new pin locations. Pin location changes must be reflected in the UCF. These pin location changes will affect SLICE location constraints as well.
116 Verify the Result. Pass UCF timing constraints: MAXDELAY, FROM/TO, PERIOD. Examine data routing: compare routes in FPGA Editor with AR25245; analyze delays in FPGA Editor. Examine clock routing: inspect the Clock section in the PAR report.
117 Verify Data (DQ) Routing. Keep even/odd DQ bits in the same CLB columns. Keep even/odd DQs oriented in the same top/bottom I/O tile pattern. Diagram (as seen in FPGA Editor): an EVEN DQ CLB column and an ODD DQ CLB column; in the I/O tile pair, the even bit (DQ0) is on top and the odd bit (DQ1) is on the bottom. Once this pattern is established, it must be repeated for all DQ lines.
118 Data Skew and Delay. Use FPGA Editor: instructions are outlined in AR25245. Data net skew < 75 ps. Total delay range = … ps. In this example: delays range from 411 to 464 ps; skew = 53 ps; the data net skew is less than 75 ps; the total delay range is within the … ps range.
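The skew figure on this slide is the spread between the slowest and fastest data net, which the following Python sketch reproduces. Only the 411 ps and 464 ps endpoints and the 75 ps limit come from the slide; the intermediate delays and the names are made up for illustration.

```python
SKEW_LIMIT_PS = 75

def net_skew_ps(delays_ps):
    """Skew is the difference between the largest and smallest net delay."""
    return max(delays_ps) - min(delays_ps)

# Endpoints from the slide; the two middle values are hypothetical.
delays = [411, 430, 452, 464]
skew = net_skew_ps(delays)
print(skew, skew < SKEW_LIMIT_PS)
```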
119 Clock Routing. Clock routing must follow a specific pattern. Details are shown in AR25245. An example of proper routing is shown.
120 Clock Report. Inspect the Clock Report section in the PAR report file. Net Skew < 65 ps (this example = 64 ps). Max Delay ~ 400 ps (this example’s range is 465 to 491 ps). Report excerpt (“Generating Clock Report”): two rows, for nets main_00/top0/data_path0/dqs0_delayed_col… and …/dqs1_delayed_col…, both routed on Local resources with the value 11 in one column; the other columns are blank.
121 A Practical Guide to DDR2 Design with Spartan-3A DSP. Lab 3 – Analyze and Fix Customized MIG Controllers
122 Lab 3 Overview. Verify a “known-good” design (Lab 2): verify with FPGA Editor and by examining the PAR report; practice looking at something correct. Fix a “broken” design: open the “broken” design in FED; figure out what’s wrong; fix it in your UCF; re-implement; re-verify.
123 Lab 3 Review. What happens when you violate the +5/-6 tile rule? How did you determine the correct SLICE location after changing DQ/DQS? What are the pins to avoid when selecting new sites in FPGA Editor? How were those unsuitable pins highlighted? How were the new pin locations verified?
124 A Practical Guide to DDR2 Design with Spartan-3A DSP. Conclusion
125 To Proceed with a MIG Design. Get the Xilinx Spartan-3A DSP 1800A Starter Platform. Get ISE 9.2, IP Update #2, and ChipScope Pro. Evaluate options for logical simulation. Prepare for board-level simulation: get IBIS and HDL models. Read the documentation: Xilinx (previously listed); Micron (DDR2 datasheet; TN4602, TN4605, TN4606, TN4614, TN4720).
126 SpeedWay Kit Specials. Spartan-3A DSP Starter Kit, $285 (save $109): Xilinx Spartan-3A DSP Starter Kit and SpeedWay attendance; AES-SPEEDWAY-S3ADSP-SK. Spartan-3A Starter Kit, $200 (save $124): Xilinx Spartan-3A Starter Kit and SpeedWay attendance; AES-SPEEDWAY-S3A-SK. Virtex-5 LX50T PCIe Starter Kit, $995 (save $499): Avnet Virtex-5 LX50T PCIe board and SpeedWay attendance; AES-SPEEDWAY-LX50T-SK. EDK Software Bundle*, $200 (save $295): 12-month EDK software license; AES-SPEEDWAY-EDK. ISE Foundation Bundle*, $995 (save $1500): 12-month ISE software license; AES-SPEEDWAY-ISE. * Must purchase a “-SK” kit. Other specials are available: see the kit specials handout.
127 Course Objectives Review. You now have… Built a functioning DDR2 controller in hardware: generate the DDR2 controller IP; incorporate it into ISE Project Navigator; connect the design to custom logic; download and operate on the Spartan-3A DSP 1800A Starter Platform. Learned what’s required to design your own board: connect the FPGA to DDR2 components; power and decoupling; signal integrity and crosstalk; create a custom DDR2 pinout for the FPGA.
128 A Practical Guide to DDR2 Design with Spartan-3A DSP. Thank you!