
A Practical Guide to DDR2 Design with Spartan-3A DSP


1 A Practical Guide to DDR2 Design with Spartan-3A DSP
Featuring ISE 9.2 and the Xilinx Spartan-3A DSP 1800A Starter Platform

This is a 6.5-hour course. It typically starts at 8:30 am and ends by 4:30 pm, with a 1-hour lunch and 30 minutes for welcome and introductions. FAEs: please direct your questions to Bryan Fletcher.

Audience: hardware designers interested in memory interfaces. The emphasis is on stand-alone, MIG-based DDR2 designs. Since EDK is moving to MIG-based memory controllers, the material also applies to processor designers, although we won't specifically cover MPMC.

Lab hardware: Spartan-3A DSP 1800A Starter Platform with an XC3SD1800A and 32Mx32 Micron DDR2

Software: ISE 9.2 with SP3 and IP Update #2, which includes MIG 2.0 with support for Spartan-3A DSP

Daily schedule:
8:30 Welcome, eating snacks, getting settled (15 minutes)
8:45 Intro to the course and instructor (15 minutes)
9:00 Lecture 1 (60 minutes)
10:00 Lab 1 (60 minutes)
11:00 Lecture 2 (60 minutes)
12:00 Lunch (60 minutes)
1:00 Lab 2 (90 minutes)
2:30 Lecture 3 (60 minutes)
3:30 Lecture 4 (30 minutes)
4:00 Lab 3 (30 minutes)
4:30 Finish

2 Course Objectives
By the end of the day, you will:
- Build a functioning DDR2 controller in hardware
- Know what's required to design your own board

Show how to get a DDR2 controller running on existing hardware, then explain how to duplicate the hardware. Avnet designed and built the Spartan-3A DSP 1800A Starter Platform for Xilinx. We succeeded in designing a 32Mx32 DDR2 interface that worked on our first prototype. On the hardware design side, we'd like to share with everyone what we did to achieve that success.

As a note to FAEs, the following are intentionally NOT covered in much detail:
- SDR, DDR-1, DDR-3, QDR, or any other xDR (many of the principles apply, but we had to pick a specific case to cover)
- Detailed discussion of the FPGA architecture
- Virtex-4 or Virtex-5 (again, the principles apply, but we won't specifically discuss Virtex)
- Detailed discussion of the inner workings of the controller
- Memory controllers in embedded processor design
- Designing for DIMMs

3 Morning Agenda
Memory, FPGAs, and Memory Controllers
- Memory trends
- DDR2 signaling
- Xilinx FPGA memory controllers
- Memory Interface Generator (MIG)
Lab 1 – Generate a DDR2 controller core
Real-world Design with a MIG DDR2 Controller
- Interface to the MIG controller
- Logically simulate
- Hardware debug
Lunch Break

Just hit the main section titles here. Cover the section sub-topics right before each section.

4 Afternoon Agenda
Lab 2 – Build and verify a DDR2 controller in hardware
PCB Considerations
- FPGA pinout
- Factors impacting signal quality and crosstalk
- PCB simulation example for DDR2
- Trace requirements
- Power
Customizing and Verifying the MIG Results
- Pinout rules
- Pin-swapping
- Verifying a new design
Lab 3 – Analyze and Fix Customized MIG Controllers

Just hit the main section titles here. Cover the section sub-topics right before each section.

5 A Practical Guide to DDR2 Design with Spartan-3A DSP
Memory, FPGAs, and Memory Controllers

6 Memory, FPGAs, and Memory Controllers

7 The FPGA/Memory Interface
Memory interface success in an FPGA depends on many things:
- FPGA fabric
- Controller
- Memory
- Clock
- PCB layout
- Power
We'll cover all these topics today.

[Diagram: simplified memory interface showing the memory, the FPGA with the memory controller inside it, termination, power, clock, and the PCB]

The diagram shows a simplified memory interface. We can see the major pieces: the memory, the FPGA, and the controller built inside the FPGA. Other critical pieces are also shown, including the termination, PCB traces, power circuitry, and the PCB itself. All of these play a part in the total memory interface solution, and all of them must be accounted for during the design phase for the interface to work.

8 DDR2 Interface Covered Today
- FPGA: Spartan-3A DSP XC3SD1800A
- Memory: Micron DDR2 MT47H32M16
- Controller: Xilinx Memory Interface Generator (MIG)
- PCB/Power/Terminations: Avnet-designed Spartan-3A DSP 1800A Starter Platform
The complete FPGA/memory interface is covered today. We're using a Xilinx DDR2 controller inside a Xilinx Spartan-3A DSP FPGA to interface with Micron DDR2 chips. All of this is encompassed on the Avnet-designed Xilinx Spartan-3A DSP 1800A Starter Platform, which we'll use to examine the various PCB factors of the interface.

9 Why DDR2?
Compared to DDR-1:
- Less expensive
- More readily available
- Lower power
- Greater variety of devices
- On-die termination (ODT); we'll show details on this later
- Differential strobes
Compared to DDR-3:
- More mature
- Easier to get
- Better controller support
First of all, make sure everyone knows what DDR2 is. They had better, if they're coming to this class. Specifically, we are talking about second-generation Double-Data-Rate Synchronous Dynamic Random Access Memory (DDR2 SDRAM). Based on the data we've seen lately, DDR2 is clearly the memory of choice for most engineers designing new boards.

10 DRAM Market and Technology Trend
DRAM Shipments by Memory Technology Type
- DDR2 is the prevalent architecture
- DDR is still widely used (low-end applications)
- DDR3 is the upcoming technology

[Chart: DRAM shipments by memory technology type (FP/EDO, EDO, SDRAM, RDRAM, DDR, DDR2, DDR3), in millions of units, forecast years 2002 to 2009. Data source: iSuppli. Slide courtesy Xilinx. Note: the DDR3 forecast seems very optimistic.]

Transcript: What about the trends in the DRAM market? Here is a forecast from iSuppli showing that DDR2 will continue to be prevalent in the memory market for the next couple of years. DDR is still used, but mostly in low-end applications, and it is being replaced by DDR2 even there. DDR3 has barely come onto the market; some memory vendors are sampling DDR3, but I think this forecast is quite optimistic. Usually, unless Intel adopts a new technology in the PC architecture, that technology does not proliferate in the overall market, so we'll see when DDR3 is adopted; after that point, DDR3 will become more prevalent in other applications.

Author's Original Notes: This graph shows the breakdown in DRAM usage by architecture. DDR, the first generation of double-data-rate SDRAMs, is still widely used today, but DDR2 has taken over, becoming the prevalent DRAM architecture in terms of volume. The forecast suggests this trend will continue for the next few years. DDR is used mostly in low-cost, low-end applications, but we are seeing a growing trend toward making DDR2 the first choice in these applications as well. As the price of DDR2 memories continues to approach that of DDR devices, demand for DDR2 has grown in all markets. Thus, our focus today will be on DDR2. An evolutionary DDR3 architecture is on the horizon, but this forecast from iSuppli still predicts DDR2 will remain the most prevalent architecture for the next few years. In fact, the DDR3 forecast may be optimistic by 9 months or more. However, we are evaluating the DDR3 architecture as DRAM vendors have started to sample these devices, so we'll talk about DDR3 interfaces in more detail in a future webcast.

11 DDR SDRAM Component Comparison
Type | Voltage | Speed* | Density | On-Die Termination (ODT) | CAS Latency
DDR | 2.5 V / 1.25 V | … Mbps | 128 Mb – 1 Gb | None | 2, 2.5, 3
DDR2 | 1.8 V / 0.9 V | … Mbps | 256 Mb – 2 Gb | Data (Nominal) | 3, 4, 5
DDR3 | 1.5 V / 0.75 V | 600 Mbps – 1.6 Gbps | 512 Mb – 4 Gb | Data (Nominal & Dynamic); Address and Control on DIMMs | 5, 6, 7, 8, 9, 10
*Raw speed of the memory device, NOT necessarily the speed the FPGA controller can run

All of these details were given previously, but it might help to see them all in one spot. One key point in relation to an FPGA interface is that these memory devices have a MINIMUM operating frequency. For DDR2, 125 MHz is the minimum frequency within specification for the device.

12 Memory Organization
DDR2 is organized as:
- Banks
- Rows
- Columns
Each needs addressing.

[Diagram: a DDR2 device containing Bank 0 through Bank 3, each bank divided into rows and columns]

DDR2 SDRAM memory is organized into banks, rows, and columns. To access a particular piece of information, you must know which bank, which row, and which column. The controller is responsible for understanding and addressing this architecture.

13 Bank Management
- 4 or 8 banks per memory device
- Any 1 row per bank can be open
- Other devices can have different rows open
- Latency to open a row
- Latency to close a row
You must understand this bank/row/column architecture to make efficient use of it. Each time a row inside a bank is opened or closed, there is a time penalty. However, rows in multiple banks can be open at once, so you must understand how your controller manages these banks. Each DDR2 device contains 4 or 8 banks internally, and each bank can have one row open. Multiple parts can each have different rows open. Time is lost opening a row (pulling it down to the working area) and closing a row (pushing it back to the memory cell array). Slide courtesy Xilinx

14 Bank Interleave
- Left side has bank/row conflicts: a different row in the same bank -> conflict!
- Right side shows banks changing, but no conflict
- Higher throughput with bank interleave
[Diagram: left, "Conflicts" (gaps for activate, precharge); right, "No Conflicts" (no gaps)]
Here's an example of poor and then good bank management. Slide courtesy Xilinx
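To make the conflict/no-conflict distinction concrete, here is a minimal Python sketch that tracks one open row per bank and classifies each access as a row hit, an activate of an idle bank, or a conflict that needs a precharge first. The function name and the access lists are hypothetical, and the model ignores all DDR2 timing.

```python
def classify_accesses(accesses, num_banks=4):
    """Count row hits, row opens, and row conflicts for (bank, row) accesses.

    Hypothetical model: each bank holds at most one open row, and a row
    stays open until a different row in the same bank is needed.
    """
    open_rows = [None] * num_banks  # None means the bank is idle (no open row)
    hits = opens = conflicts = 0
    for bank, row in accesses:
        current = open_rows[bank]
        if current == row:
            hits += 1        # row already open: no activate or precharge gap
        elif current is None:
            opens += 1       # idle bank: activate only
        else:
            conflicts += 1   # another row is open: precharge, then activate
        open_rows[bank] = row
    return hits, opens, conflicts


# Two rows in the same bank versus the same two rows spread over two banks
same_bank = [(0, 1), (0, 2), (0, 1), (0, 2)]
interleaved = [(0, 1), (1, 2), (0, 1), (1, 2)]
print(classify_accesses(same_bank))    # (0, 1, 3)
print(classify_accesses(interleaved))  # (2, 2, 0)
```

Putting the two rows in different banks turns three conflicts into two row hits, which is the throughput gain the slide attributes to bank interleave.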

15 Row/Column Addressing
- Interface on S3ADSPSK is 32Mx32 (128 MB or 1 Gbit)
- Two chips (each 32Mx16)
- Each chip consists of 4 banks
- Each bank has 8K rows and 1K columns
- Each memory location stores 16 bits
- 2 chips * 4 banks * 8K rows * 1K columns * 16 bits = 1 Gbit
- Linear addressing requires 25 address bits
- Our interface has 15 total address bits: 13 ADDRESS (A) and 2 BANK ADDRESS (BA)
  - BA[1:0] selects one of four banks
  - A[12:0] with RAS selects one of 8K rows in the bank
  - A[9:0] with CAS selects one of 1K columns in the row
Now that we understand that the DDR2 is accessed by banks, rows, and columns, let's look at a specific example: the board we'll be using in the lab today, the Spartan-3A DSP 1800A Starter Platform. This board has a 32-Meg by 32-bit DDR2 interface to the FPGA. Addressing 128 MB linearly as 32-bit words would require 25 address bits (2^25 = 32M words). However, the DDR2 architecture reduces the total number of address pins required, in this case to only 15. The DDR2 accomplishes this by sending row and column addresses on the same address pins at different times.
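The arithmetic on this slide can be checked with a few lines of Python. The constant and helper names are ours; the values come from the slide.

```python
# Geometry of the S3ADSPSK 32Mx32 interface, taken from the slide
CHIPS = 2                # two 32Mx16 devices side by side
BANKS = 4                # banks per device
ROWS = 8 * 1024          # rows per bank
COLUMNS = 1 * 1024       # columns per row
BITS_PER_LOCATION = 16   # each device location is 16 bits wide
INTERFACE_WIDTH = 32     # two x16 devices form one 32-bit word


def log2_exact(n):
    """Return log2(n) for a power of two, using integer math only."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


total_bits = CHIPS * BANKS * ROWS * COLUMNS * BITS_PER_LOCATION
words = total_bits // INTERFACE_WIDTH   # 32-bit words in the interface
linear_bits = log2_exact(words)         # one pin per address bit, no multiplexing
row_bits = log2_exact(ROWS)             # sent on A[12:0] with RAS
column_bits = log2_exact(COLUMNS)       # sent on A[9:0] with CAS
bank_bits = log2_exact(BANKS)           # sent on BA[1:0]
multiplexed_pins = max(row_bits, column_bits) + bank_bits

print(total_bits == 2**30, words, linear_bits, multiplexed_pins)
# True 33554432 25 15
```

Because rows and columns share the A pins, only the wider of the two fields (13 row bits) costs pins, which is where the saving from 25 to 15 comes from.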

16 Control Signals
The combination of RAS, CAS, and WE determines the action:
- RAS asserted = Open Row: bank address and row address latched in
- CAS and WE asserted = Write: column address latched in, write enabled
- CAS asserted = Read
- RAS and WE asserted = Close Row (PRECHARGE): row deactivated
'1' means asserted (the signals are active low)

Action | RAS | CAS | WE
Open Row | 1 | 0 | 0
Write | 0 | 1 | 1
Read | 0 | 1 | 0
Close Row | 1 | 0 | 1

The main control signals that determine what the DDR2 does are RAS, CAS, and WE:
RAS = Row Address Strobe
CAS = Column Address Strobe
WE = Write Enable
The combination of asserted signals identifies one of several memory actions, as shown in the table. RAS, CAS, and WE are all active low, so in the table '1' means asserted (driven low).
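A small decoder makes the table executable. This sketch takes electrical pin levels (0 = driven low = asserted), assumes chip select is asserted, and only names the four actions on this slide plus NOP; every other combination is reported as "other" rather than guessed. The dictionary and function names are hypothetical.

```python
# Keys are pin levels (RAS#, CAS#, WE#): 0 = low = asserted, 1 = high.
# The slide's '1 = asserted' notation is the inverse of these levels.
ACTIONS = {
    (0, 1, 1): "OPEN ROW",               # RAS only
    (1, 0, 0): "WRITE",                  # CAS and WE
    (1, 0, 1): "READ",                   # CAS only
    (0, 1, 0): "CLOSE ROW (PRECHARGE)",  # RAS and WE
    (1, 1, 1): "NOP",                    # nothing asserted
}


def decode(ras_n, cas_n, we_n):
    """Return the action sampled on a rising CK edge, assuming CS# is asserted."""
    return ACTIONS.get((ras_n, cas_n, we_n), "OTHER (see the datasheet)")


def from_slide_notation(ras, cas, we):
    """Convert the slide's '1 = asserted' values to active-low pin levels."""
    return 1 - ras, 1 - cas, 1 - we


print(decode(*from_slide_notation(1, 0, 0)))  # OPEN ROW
print(decode(1, 0, 1))                        # READ
```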

17 Read Example
- RAS asserted opens the row
  - Latches bank and row addresses (Bank 3, Row 0x000C)
- CAS asserted by itself identifies the operation as a read
  - Latches the column address (Column 0x0000)
- With the row open, multiple reads can be performed by re-asserting CAS (Column 0x0008)
We haven't yet talked about the data interface, but let's take a moment to summarize the address and control signals by looking at a couple of examples. First, a read example. We focus on what's going on with RAS, CAS, and WE.

18 Multiple Reads
- Five subsequent reads from the same row shown
- Burst length for this example is 8
- Each time CAS asserts, 8 words are read
- 40 total words are read in this diagram
- Close row (PRECHARGE) shown after reading: RAS and WE asserted simultaneously
By zooming out further, we can see what happens with multiple transactions. In this case, 5 read transactions are issued, each transaction being a burst of 8. The transactions are concluded by closing the row.
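The read and write examples follow the same pattern: one ACTIVATE, several column commands spaced one burst apart, then a PRECHARGE. The hypothetical helper below generates that command list for either direction and counts the words transferred; it models command order only, not DDR2 timing.

```python
def burst_sequence(op, bank, row, start_column, bursts, burst_length=8):
    """Return (commands, words) for back-to-back bursts within one open row.

    op is "READ" or "WRITE". Columns advance by burst_length per command
    because the DDR2 transfers burst_length words per CAS.
    """
    if op not in ("READ", "WRITE"):
        raise ValueError("op must be READ or WRITE")
    commands = [("ACTIVATE", bank, row)]        # RAS: open the row
    column = start_column
    for _ in range(bursts):
        commands.append((op, bank, column))     # CAS (plus WE for a write)
        column += burst_length
    commands.append(("PRECHARGE", bank, None))  # RAS + WE: close the row
    return commands, bursts * burst_length


cmds, words = burst_sequence("READ", bank=3, row=0x000C, start_column=0x0000, bursts=5)
for cmd in cmds:
    print(cmd)
print(words)  # 40
```

The second READ lands on column 0x0008, matching the read example, and calling it with "WRITE" produces the equivalent write sequence.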

19 Write Example
- RAS asserted opens the row
  - Latches bank and row addresses (Bank 3, Row 0x000C)
- CAS and WE asserted together identify the operation as a write
  - Latches the column address (Column 0x0000)
- With the row open, multiple writes can be performed by re-asserting CAS and WE (Column 0x0008)
Now for the write example.

20 Multiple Writes
- Five subsequent writes to the same row shown
- Burst length for this example is 8
- Each time CAS/WE assert, 8 words are written
- 40 total words are written in this diagram
- Close row (PRECHARGE) shown after writing: RAS and WE asserted simultaneously

21 Data Interface
- One strobe (DQS) per 8 bits of data (DQ)
  - DQS is a local clock for each data byte
  - Can be differential
- One mask (DM) per 8 bits of data (DQ)
  - Selects which bytes are active during a write (byte enable)
- 32-bit interface has 32 DQ, 4 DQS, and 4 DM bits
- FPGA outputs DQS center-aligned to the data for a write
- FPGA receives DQS edge-aligned from the memory on a read
We've previously discussed the address and control to the DDR2. Now we'll cover the data interface, which consists of data bits (DQ), data strobes (DQS), and data mask (DM). VERY CRITICAL: showing the alignment of DQS to DQ on reads and writes will make it easier to understand later when we discuss shifting DQS/DQ.

[Diagram: DQS and DQ timing for a DATA WRITE (FPGA to DDR2) and a DATA READ (DDR2 to FPGA)]
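The one-strobe-and-one-mask-per-byte rule scales directly with interface width. This hypothetical helper counts data-interface signals for any width that is a multiple of 8, optionally counting DQS# as well when the strobes are differential.

```python
def data_interface_signals(width, differential_dqs=False):
    """Count DQ, DQS, and DM signals for a DDR2 data bus of `width` bits.

    One DQS and one DM per byte lane; a differential strobe adds DQS#.
    """
    if width <= 0 or width % 8:
        raise ValueError("width must be a positive multiple of 8")
    lanes = width // 8
    return {
        "DQ": width,
        "DQS": lanes * (2 if differential_dqs else 1),
        "DM": lanes,
    }


print(data_interface_signals(32))                         # {'DQ': 32, 'DQS': 4, 'DM': 4}
print(data_interface_signals(32, differential_dqs=True))  # {'DQ': 32, 'DQS': 8, 'DM': 4}
```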

22 Clock
- Differential: CK and CK#
- Address and control signals are registered at every positive edge of CK
- DQ and DQS outputs from the DDR2 are aligned with the clock
- DDR2 uses an internal Delay Locked Loop (DLL)
  - DLL has both a minimum and a maximum frequency
  - DDR2 specifications are based on operating within this frequency range (125 MHz to 533 MHz)
Another critical signal in the DDR2 interface is the clock to the memory device: CK and CK#. CK and CK# are differential clock inputs to the DDR2. All address and control input signals are sampled on the crossing of the positive edge of CK and the negative edge of CK#. Output data (DQ and DQS/DQS#) is referenced to the crossings of CK and CK#. Commands (address and control signals) are registered at every positive edge of CK. Input data is registered on both edges of DQS, and output data is referenced to both edges of DQS as well as to both edges of CK. It is possible to disable the DLL and operate below 125 MHz for DDR2, but it is not recommended.
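Because data moves on both clock edges, a bit lasts half a clock period, and a 90-degree phase shift (the amount used to move DQS between edge-aligned and center-aligned) is a quarter period. The sketch below, with a hypothetical function name, checks a clock against the 125 MHz to 533 MHz DLL range quoted on this slide and returns those three times.

```python
DLL_MIN_MHZ = 125.0   # from the slide
DLL_MAX_MHZ = 533.0   # from the slide


def ddr2_clock_timing(clock_mhz):
    """Return (period_ns, bit_time_ns, quarter_period_ns) for a DDR2 clock."""
    if not DLL_MIN_MHZ <= clock_mhz <= DLL_MAX_MHZ:
        raise ValueError(f"{clock_mhz} MHz is outside the DLL range")
    period_ns = 1000.0 / clock_mhz
    bit_time_ns = period_ns / 2   # one bit per clock edge
    quarter_ns = period_ns / 4    # a 90-degree phase shift
    return period_ns, bit_time_ns, quarter_ns


print(ddr2_clock_timing(125.0))  # (8.0, 4.0, 2.0)
```

The quarter period equals half a bit time, which is why a 90-degree shift places a strobe edge in the middle of the data eye.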

23 On-Die Termination
- ODT = On-Die Termination
- Enables built-in stub termination on the DDR2's data interface
- Eliminates the need for stub termination resistors on the DDR2 side for data
- Adjustable: 50Ω, 75Ω, or 150Ω
The ODT signal is driven from the FPGA to the DDR2. It allows the FPGA to tell the DDR2 what, if any, on-die termination is required.

[Diagram: FPGA-to-DDR2 connections, including DM, CLK, CLK_EN, CS, ODT, and the RST_DQS_DIV loopback]
All the connections between the FPGA and the DDR2 are shown here. We can see all the address, control, data, and other signals just discussed. We also see a loopback signal called RST_DQS_DIV, which will be discussed later. Looking at this diagram, the interface may appear fairly simple. Some may ask, "Why do I need a controller?"

25 Why Do I Need a Controller?
- Easier to interface to a controller than directly to the memory
- Manages multiple operations
  - Initialization (see the DDR2 datasheet excerpt)
  - Calibration: shift outgoing DQS by 90 degrees; shift incoming DQS by 90 degrees
  - Refresh DRAMs
- Simplified interface
  - 4 potential commands instead of 15
  - Initialize command to MIG controller spawns 13 commands to DDR2
- Reduces the design effort
Some may wonder why we need a controller at all. Why not just connect the FPGA to the memory and write our own interface? The reason is that the complexity of the memory would make this a major effort. A pre-designed controller makes it easier for us to use the memory, including the items shown in the 2nd bullet above. As an example, let's look at the Initialization instructions from the DDR2 datasheet (open the Micron DDR2 datasheet provided with the presentation or click the hyperlink). Read the Initialization section.

Some may want to know the disadvantages of using a controller. The controller may implement more features than you need, and thus take up more space than necessary in the FPGA. The controller may also not implement the most efficient method of accessing the DDR2 data for your particular application.

From the Micron DDR2 datasheet:
“5. For a minimum of 200μs after stable power and clock (CK, CK#), apply NOP or DESELECT commands, then take CKE HIGH.
6. Wait a minimum of 400ns, then issue a PRECHARGE ALL command.
7. Issue a LOAD MODE command to the EMR(2). (To issue an EMR(2) command, provide LOW to BA0, and provide HIGH to BA1.) Set register E7 to “0” or “1;” all others must be “0.”
8. Issue a LOAD MODE command to the EMR(3). (To issue an EMR(3) command, provide HIGH to BA0 and BA1.) Set all registers to “0.”
9. Issue a LOAD MODE command to the EMR to enable DLL. To issue a DLL ENABLE command, provide LOW to BA1 and A0; provide HIGH to BA0. Bits E7, E8, and E9 can be set to “0” or “1;” Micron recommends setting them to “0.”
10. Issue a LOAD MODE command for DLL RESET. 200 cycles of clock input is required to lock the DLL. (To issue a DLL RESET, provide HIGH to A8 and provide LOW to BA1 and BA0.) CKE must be HIGH the entire time.
11. Issue PRECHARGE ALL command.
12. Issue two or more REFRESH commands.
13. Issue a LOAD MODE command with LOW to A8 to initialize device operation (i.e., to program operating parameters without resetting the DLL). To access the mode registers, BA1 =1, BA0 = 0.
14. Issue a LOAD MODE command to the EMR to enable OCD default by setting bits E7, E8, and E9 to “1,” and then setting all other desired parameters. To access the extended mode register, BA1 = 0, BA0 = 1.
15. Issue a LOAD MODE command to the EMR to enable OCD exit by setting bits E7, E8, and E9 to “0,” and then setting all other desired parameters. To access the extended mode registers,
16. The DDR2 SDRAM is now initialized and ready for normal operation 200 clock cycles after the DLL RESET at Tf0. It is also suggested to include a single dummy WRITE command followed by tWR anytime after the REFRESH commands, but before the first true WRITE command to the DRAM.”
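Steps 5 through 15 of the excerpt can be written down as an ordered command list, which shows how much a single Initialize request hides. This is a hypothetical sketch: the register-to-bank-address mapping is taken from steps 7 to 10, NOP/DESELECT is listed once even though it repeats for at least 200 μs, and no timing is modeled.

```python
# Bank address (BA1, BA0) that selects each mode register, per steps 7 to 10
MODE_REGISTER_BA = {"MR": (0, 0), "EMR": (0, 1), "EMR(2)": (1, 0), "EMR(3)": (1, 1)}


def init_commands(refreshes=2):
    """Return the DDR2 initialization commands as (command, register, note)."""
    if refreshes < 2:
        raise ValueError("the datasheet asks for two or more REFRESH commands")
    cmds = [
        ("NOP/DESELECT", None, "for >= 200 us, then take CKE HIGH"),
        ("PRECHARGE ALL", None, "after >= 400 ns"),
        ("LOAD MODE", "EMR(2)", ""),
        ("LOAD MODE", "EMR(3)", "all registers 0"),
        ("LOAD MODE", "EMR", "enable DLL"),
        ("LOAD MODE", "MR", "DLL RESET (A8 HIGH)"),
        ("PRECHARGE ALL", None, ""),
    ]
    cmds += [("REFRESH", None, "")] * refreshes
    cmds += [
        ("LOAD MODE", "MR", "A8 LOW: program operating parameters"),
        ("LOAD MODE", "EMR", "OCD default (E7-E9 = 1)"),
        ("LOAD MODE", "EMR", "OCD exit (E7-E9 = 0)"),
    ]
    return cmds


for cmd, reg, note in init_commands():
    ba = MODE_REGISTER_BA.get(reg)
    print(cmd, reg or "", f"BA1,BA0={ba}" if ba else "", note)
```

A controller's Initialize command walks through a list like this on the user's behalf, including the waits that the sketch only records as notes.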

26 Memory Interface Generator (MIG)
- Free utility to create a custom FPGA/memory interface
- Based on real, working, tested hardware
  - Documented in Xilinx Application Notes (XAPP)
- Customized outputs include
  - RTL source for the memory controller in Verilog or VHDL
  - Simulation testbench and support
  - User Constraint File (UCF): pinout specific to the chosen FPGA device/package, logic block locations, FPGA timing constraints
  - Batch files for processing: run ISE tools in command-line mode, convert to an ISE Project Navigator project, timing analysis
  - Documentation
Xilinx offers a variety of memory controllers. A tool for generating custom memory controllers is the Xilinx Memory Interface Generator, or MIG. The MIG controllers are based on Xilinx Application Note reference designs. The XAPP reference designs are specific instances of working controllers; MIG takes those designs and gives the user a front end for parameterizing several settings and creating a custom design. The MIG outputs are listed in the 3rd bullet. Also note that the default method for implementing this design is through the command line. This is not required, though: MIG provides a script for converting the project to ProjNav, and the demo we'll show later is based on Project Navigator. It's also worth noting that, as a free utility, MIG is a great fit for those looking for a basic, general-purpose controller. Users looking for advanced features or maximum performance should plan on modifying the MIG controller themselves or consider buying a 3rd-party controller.

27 MIG v2.0 Component Controllers
Fastest clock rate in the fastest FPGA speed grade:
FPGA | DDR | DDR2 | RLDRAM-II, QDR-II SRAM, DDR-II SRAM
Virtex-5 | 200 MHz | 333 MHz | 300 MHz
Virtex-4 | 175 MHz | 300 MHz | 250 MHz
Spartan-3A/3AN/3ADSP | 166 MHz | 166 MHz | -
Spartan-3E | 166 MHz | Not officially supported (see note) | -
Spartan-3 | 166 MHz | 166 MHz | -
This table shows which controllers MIG has available for each FPGA. Speeds shown are for the fastest clock rate in the fastest FPGA speed grade for the family. As an example, we can see the extensive Virtex-4 support: 175 MHz for DDR, 300 MHz for DDR2, and 250 MHz for RLDRAM-II, QDR-II SRAM, and DDR-II SRAM. Virtex-5 is a more advanced FPGA fabric, and you can see the supported controllers are all faster. Spartan-3 is a less expensive, lower-performance FPGA. We can see that all the Spartan FPGAs support DDR at 166 MHz. For Sp3A and Sp3, DDR2 support is also 166 MHz, showing that there is no performance advantage in Spartan from moving to DDR2. (Compared to MIG v1.7.3, MIG 2.0 does not improve the fastest controller speed in the fastest speed grade. However, speeds in slower speed grades have been improved significantly for Virtex-5.)
Notes:
- DDR2 for Spartan-3E: SSTL1.8 Class II is not supported in silicon; it may work, but it is not officially supported.
- DDR3 and RLDRAM-II for V5 are covered in a reference design. See

28 MIG v2.0 DIMM Controllers
Fastest clock rate in the fastest FPGA speed grade:
FPGA | DDR | DDR2
Virtex-5 | 200 MHz | 333 MHz
Virtex-4 | 175 MHz | 267 MHz
Spartan-3A/3AN/3ADSP, Spartan-3E, Spartan-3 | 166 MHz
And here we see the MIG 2.0 DIMM controllers for DDR and DDR2, with up to 333 MHz performance in Virtex-5. See

29 Spartan-3/3A DDR2 Controller
- Performance
  - Up to 166 MHz / 333 Mbps in -5 speed grade devices
  - 200 MHz specific implementation documented in XAPP458
  - 133 MHz / 266 Mbps in -4 speed grade devices
  - Spartan-3A only supports the left and right sides
- Data width based on total available pins
  - Component: up to 72-bit in Spartan-3; up to 64-bit in Spartan-3A/3AN/3ADSP
  - DIMM: 64- and 72-bit in Spartan-3; 64-bit in Spartan-3A/3AN/3ADSP
- DQ to DQS ratio is 8:1
- No built-in bank management for Spartan controllers
  - Virtex-5 has a 4-bank Least Recently Used option
Left/right side support in Spartan-3A is due to those being the only sides that can handle SSTL 1.8V Class II. You could realistically use Class I and then use the top/bottom sides, but that's not officially supported.

30 Embedded Processor Controllers
- Interface DDR2 to a MicroBlaze processor
- Embedded Development Kit (EDK) 9.2 includes the Multi-Port Memory Controller v3 (MPMC3)
- MIG used for the physical layer
  - All MIG rules and constraints apply
  - See Answer Record 29221
  - Still set XIL_ROUTE_ENABLE_DATA_CAPTURE
  - Use a script to include the MIG UCF in the MicroBlaze system UCF
  - Verify the design built correctly (see Lab 3)
Xilinx also offers memory controllers in the EDK embedded tool suite that allow an embedded processor like MicroBlaze to interface to DDR2 memory.

31 Where do I get MIG?
- MIG is included with ISE Foundation/WebPACK
  - Part of CORE Generator
  - Graphical User Interface (GUI) provides access to the core library, datasheets, 3rd-party contact information, and available Xilinx Solution Records
- Must install ISE IP update
  - MIG v2.0 in ISE 9.2 IP Update 2
  - Get IP Updates at
  - Find more information at
- WebPACK
  - WebPACK is free!
  - WebPACK supports the XC3SD1800A on the S3ADSPSK

32 MIG Documentation
- MIG User's Guide (UG086)
- Xilinx Application Notes
  - XAPP768c (Spartan DDR)
  - XAPP454 (Spartan DDR2)
  - XAPP458 (Spartan-3A Starter 200 MHz DDR2)
  - XAPP858 (Virtex-5 DDR2)
  - XAPP701 & XAPP702 (Virtex-4 DDR2 Direct Clocking)
  - XAPP721 & XAPP723 (Virtex-4 DDR2 SERDES)
- Virtex-5 ML561 Memory Interfaces User's Guide (UG199)
Documents highlighted in blue are the most pertinent ones for today's Spartan-3A DSP labs. The philosophy I'm using for the labs and lecture today is to assume that the MIG controller works. We will not focus on how the controller works; we're assuming it's good. Our focus is understanding how that controller fits into the system and how to get the basic controller up and running. The details of how the controller works are documented very well in the application notes. For those interested in going beyond what we cover today, including custom modifications to the MIG controller, we highly recommend reading this documentation thoroughly.

33 MIG Design Flow With Project Navigator
[Diagram: Core Generator, then MIG, then MIG Outputs, then create_ise.bat, then Project Navigator (Integrate Design), then Download Design to Hardware]
How do I access and use MIG? Personally, I prefer Project Navigator, especially when working on new designs. Therefore, we'll show you a typical flow when using Project Navigator to create and integrate a MIG design. First, launch Core Generator and create a new project. From CoreGen, launch MIG. Use MIG to generate the controller. <click> Use the create_ise.bat batch file to generate a Project Navigator compilation of the MIG project. Once it is created, launch Project Navigator and open the new project. Integrate the MIG controller with the rest of your design. Download the design to hardware and see it work.

34 A Practical Guide to DDR2 Design with Spartan-3A DSP
Lab 1 – Generate a DDR2 Controller with MIG

35 Lab 1 Overview
- Run COREGen
- Run MIG
  - Configure controller
  - Generate
- Convert to Project Navigator
- Review raw outputs
  - HDL
  - UCF
  - Build scripts
[Diagram: MIG design flow (Core Generator, MIG, MIG Outputs, Project Navigator, Integrate Design, Download Design to Hardware)]

36 Lab 1 Review
- What are the benefits of using MIG?
- What is required to use Project Navigator with a MIG design?
- Other observations?
  - Do the pinouts match our board?
  - What else did you notice about the UCF?
  - Do the properties match between ProjNav and the command-line script?

37 A Practical Guide to DDR2 Design with Spartan-3A DSP
Real-world Design with a MIG DDR2 Controller

38 Real-world Design with a MIG DDR2 Controller

39 MIG Output Block Diagram
[Diagram: FPGA containing the MIG outputs (User Logic, Memory Controller, Calibration, Clock Management), connected to the Memory and the Clock]
This is a simplified representation of an FPGA/DDR2 interface, including what gets generated by MIG. We have the FPGA, the memory, and the clock connected on our PCB. The MIG outputs are shown in purple. They include the controller itself, the calibration block, a clock management block, and some custom logic. <click> MIG does in fact generate a user logic example that runs a simple test. MIG calls the user logic RTL "Test_Bench", although MIG also generates a simulation testbench (sim_tb_top). This is the area we must understand first so that we can connect our own logic to the controller. We will refer to it as the User or Back-end Interface, and it will be a major focus of this lecture and the next lab. Of course, the rest of the controller is also important. Because the controller is open source with exposed RTL, many users will want to customize and optimize the MIG-generated controller. However, for our purposes today, we are using the default controller. Note: although the clock is shown coming into the Clock Management block and going nowhere, it really goes to every piece of the design, including the User Logic. In fact, multiple phases go to most modules in the design.

40 User Logic Operating Modes
- Initialize
  - User instructs controller to set up the DDR2 for operation
  - Controller programs DDR2 with operating parameters
  - Parameters established by user during MIG generation
- Write
  - User instructs controller to write data to memory
  - Controller writes the data to the DDR2
- Read
  - User instructs controller to read data from memory
  - Controller reads the data from the DDR2
- Refresh
  - Controller tells user a refresh is needed
  - User pauses while controller handles refresh
The User Logic can interact with the controller in four different modes. Three of these modes are initiated by the user: Initialize, Write, and Read. The fourth mode, Refresh, is initiated by the controller. We'll look in detail at each of these modes, including:
- Which user interface signals are involved
- Which command is used
- What sequence of events needs to happen in the user logic to succeed

41 Clock Domains in the User Logic
- 90-degree phase of DDR2 clock
  - Used for all data-related signals
  - Generated by a DCM
  - Referred to as CLK90
- 180-degree phase of DDR2 clock
  - Used for all control-related signals
  - Generated by the negative edge of the 0-phase clock
  - Referred to as CLK180 or Falling Edge CLK0
- Why is this important?
  - User logic controls interaction between domains
  - User must manage multiple clocks and resets
Before looking at these different modes, we need to understand a little about the clock domains involved in the user interface. All signals in the User Interface are governed by either the 90-degree phase clock or the 180-degree phase clock. It's important to understand this so the example User Logic will make sense when you see the different clocks and resets at work.

42 User Interface Signals
[Diagram: all connections between the Controller and the User Logic]
- User Logic to Controller: Write Data, Write Mask, Address, Burst Done, Command
- Controller to User Logic: Command Acknowledge, Read Data, Data Valid, Initialization Complete, Auto Refresh Request, Auto Refresh Done
- Clocks & Resets to both blocks
This diagram shows all of the connections between the controller and the User Logic. With this complete picture in mind, we'll now look at what's involved in each of the four modes of operation.

43 Initialization Complete
[Diagram: Initialize Mode connections between the Controller and the User Logic: Initialize Command, Initialization Complete, and Clocks & Resets]
Connections involved in the Initialize Mode are shown.

44 How to Initialize
- Wait for RST_90 and RST_180 to deassert
- Set USER_CMD to b'010 on CLK180 for one clock
- Wait for INIT_DONE to assert
  - Minimum of 200 μs
The steps to perform an initialization from the User Logic
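The initialize handshake maps naturally onto a tiny state machine clocked on CLK180. The Python model below is hypothetical (the state names and the step() interface are ours, not MIG's); it only illustrates the ordering on this slide: wait for both resets to deassert, pulse USER_CMD = 010 for exactly one cycle, then wait for INIT_DONE.

```python
USER_CMD_NOP = 0b000
USER_CMD_INIT = 0b010


class InitSequencer:
    """Behavioral model of user-logic initialization, stepped once per CLK180."""

    def __init__(self):
        self.state = "WAIT_RESET"

    def step(self, rst_90, rst_180, init_done):
        """Advance one CLK180 cycle and return USER_CMD for that cycle."""
        if self.state == "WAIT_RESET":
            if not rst_90 and not rst_180:
                self.state = "ISSUE_INIT"
            return USER_CMD_NOP
        if self.state == "ISSUE_INIT":
            self.state = "WAIT_DONE"
            return USER_CMD_INIT        # driven for this one cycle only
        if self.state == "WAIT_DONE" and init_done:
            self.state = "READY"        # in hardware, at least 200 us later
        return USER_CMD_NOP


seq = InitSequencer()
inputs = [(1, 1, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1)]
print([seq.step(*i) for i in inputs], seq.state)  # [0, 0, 2, 0, 0] READY
```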

45 Write
[Diagram: Write connections between the Controller and the User Logic: Write Data, Write Mask, Address, Burst Done, Command, Command Acknowledge, and Clocks & Resets]
Write connections

46 How to Write
- Set USER_CMD to b'100 and Address on CLK180
- Wait for USER_CMD_ACK
- Set the DATA and MASK on CLK90
  - A dataword is double the width of the memory interface
  - Provide BURST_LENGTH/2 datawords (BL=8 → 4 words)
- Set the next address and data
- Assert BURST_DONE on CLK180 after the last address
- Deassert USER_CMD after BURST_DONE
The image shows 5 write transactions to a 32-bit DDR2 interface with a burst length of 8. A total of 20 double-words are seen between the X and O markers. This is a total of 40 32-bit words written to the DDR2, starting at address 0x… How this address translates to bank, row, and column will be explained later.
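The dataword arithmetic on this slide generalizes to any width and burst length. In this hypothetical helper, the user-side data bus is twice the memory width because the user interface runs at single data rate while the memory transfers on both clock edges, and the mask carries one bit per byte of dataword.

```python
def write_burst_plan(memory_width, burst_length, transactions):
    """Return user-interface sizes for `transactions` write bursts."""
    if burst_length % 2:
        raise ValueError("DDR2 burst lengths are even (4 or 8)")
    dataword_bits = 2 * memory_width          # both edges of one clock
    mask_bits = dataword_bits // 8            # one mask bit per byte
    datawords_per_burst = burst_length // 2   # BL=8 -> 4 datawords
    total_datawords = datawords_per_burst * transactions
    memory_words = 2 * total_datawords        # words seen on the DDR2 bus
    return dataword_bits, mask_bits, datawords_per_burst, total_datawords, memory_words


print(write_burst_plan(32, 8, 5))  # (64, 8, 4, 20, 40)
```

For the captured example, this reproduces the 20 double-words and 40 memory words, and the 64-bit data and 8-bit mask widths match the user interface table for the 32Mx32 controller.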

47 Read
[Diagram: Read connections between the Controller and the User Logic: Address, Burst Done, Command, Command Acknowledge, Read Data, Data Valid, and Clocks & Resets]
Read connections. Address, Burst Done, Command, and Command Acknowledge are the same as in the Write case.

48 How to Read
- Set USER_CMD to b'110 and Address on CLK180
- Wait for USER_CMD_ACK
- Set the next address
- Assert BURST_DONE on CLK180 after the last address
- Deassert USER_CMD after BURST_DONE
- Watch for Data Valid to indicate when data is good (CLK90)
The steps to perform a read from the User Logic
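The write and read procedures share the same command handshake, so one hypothetical helper can list the user-side actions in order, tagged with the clock domain each belongs to. It is an ordering sketch of the slide's rules, not a cycle-accurate model of the MIG controller; the function name and step strings are ours. The example addresses assume the next burst starts 8 columns later, which is 0x20 once the column field is shifted past the two bank bits.

```python
USER_CMD_WRITE = 0b100
USER_CMD_READ = 0b110


def user_transaction_steps(op, addresses, burst_length=8):
    """List (clock domain, action) pairs for one write or read request."""
    cmd = {"write": USER_CMD_WRITE, "read": USER_CMD_READ}[op]
    if not addresses:
        raise ValueError("need at least one burst address")
    steps = [
        ("CLK180", f"USER_CMD <= {cmd:03b}, ADDRESS <= {addresses[0]:#x}"),
        ("CLK180", "wait for USER_CMD_ACK"),
    ]
    for i, address in enumerate(addresses):
        if i > 0:
            steps.append(("CLK180", f"ADDRESS <= {address:#x}"))
        if op == "write":
            for _ in range(burst_length // 2):
                steps.append(("CLK90", "DATA, MASK <= next dataword"))
    steps.append(("CLK180", "assert BURST_DONE"))
    steps.append(("CLK180", "deassert USER_CMD"))
    if op == "read":
        steps.append(("CLK90", "capture DATA while DATA_VALID is high"))
    return steps


for domain, action in user_transaction_steps("read", [0x00, 0x20]):
    print(domain, action)
```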

49 Refresh
[Diagram: Refresh connections between the Controller and the User Logic: Auto Refresh Request, Auto Refresh Done, and Clocks & Resets]
Refresh connections

50 How to Refresh
- At all times, check for the auto refresh request (AR_REQ) in the CLK180 domain
- If AR_REQ is asserted, do not start a new transaction
- Wait for AR_DONE (CLK180)
- Go back to what you were doing
The refresh happens automatically; the User Logic simply needs to be aware of it.
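The refresh rule amounts to a gate in front of whatever starts new transactions. This hypothetical Python model, sampled once per CLK180 cycle, blocks new transactions from the cycle AR_REQ is seen until AR_DONE arrives; a burst already in progress is assumed to be completed by the surrounding user logic.

```python
class RefreshGate:
    """Decide each CLK180 cycle whether user logic may start a new transaction."""

    def __init__(self):
        self.refresh_pending = False

    def may_start(self, ar_req, ar_done):
        if ar_req:
            self.refresh_pending = True   # controller needs the memory
        if ar_done:
            self.refresh_pending = False  # refresh finished: resume
        return not self.refresh_pending


gate = RefreshGate()
trace = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
print([gate.may_start(req, done) for req, done in trace])
# [True, False, False, True, True]
```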

51 User Logic Address Starting address for burst
DDR2 auto-increments address for burst Combines addresses for bank, row, and column [ (row) : (column) : (bank address) ] 32M x 32 example Address bus is 26 bits User_A[25:13] is the Row Address User_A[12:2] is the Column Address User_A[1:0] is the Bank Address Why is Column Address 11 bits? 1K columns per row only requires 10 bits This slide explains how the User Logic address maps to the controller. Why is bank address in the least-significant position? Having the Bank Address in the least-significant bits makes it convenient for interleaving. However, in Spartan, interleaving is not supported in MIG and must be added by the User.
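The 32Mx32 mapping can be expressed as a pair of bit-field helpers. A minimal sketch using the field widths from this slide (13 row, 11 column, 2 bank); the function names are hypothetical and this is not MIG code.

```python
ROW_BITS, COL_BITS, BANK_BITS = 13, 11, 2   # 32Mx32 example: 26 bits total


def encode_user_address(row, col, bank):
    """Pack row, column, and bank into the MIG user address."""
    fields = (("row", row, ROW_BITS), ("col", col, COL_BITS), ("bank", bank, BANK_BITS))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} out of range for {bits} bits")
    return (row << (COL_BITS + BANK_BITS)) | (col << BANK_BITS) | bank


def decode_user_address(addr):
    """Split a user address back into (row, column, bank)."""
    bank = addr & ((1 << BANK_BITS) - 1)
    col = (addr >> BANK_BITS) & ((1 << COL_BITS) - 1)
    row = (addr >> (COL_BITS + BANK_BITS)) & ((1 << ROW_BITS) - 1)
    return row, col, bank


print([decode_user_address(a) for a in range(4)])
# [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3)]
```

Because the bank bits sit at the bottom, user addresses 0 through 3 land in banks 0 through 3 at the same row and column, which is what makes interleaving convenient. Assuming each column location holds one 32-bit word and only 10 of the 11 column bits select a column, the capacity works out to 2^13 rows × 2^10 columns × 4 banks × 4 bytes = 128 MB.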

52 Column Address A10 Column address bit A10 is special
PRECHARGE is the DDR2 “Close Row” command Deactivates current row Returns bank to the idle state Auto-PRECHARGE DDR2 automatically closes the row after the current operation MIG does not support auto-precharge, but still reserves A10 To set Auto-PRECHARGE, assert column address A10 What if a column needs 11 or more address bits? Rare, but if so, A10 gets skipped What if a column needs 9 or fewer address bits? Extra address bits added up to A10 32Mx32 has 11 column address bits A[9:0] for the address A[10] reserved for user to create custom, auto-precharge logic
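The A10 rule can be sketched as a mapping from a logical column to the DDR2 A pins. This is a hypothetical helper that follows the slide's description (column bits above 9 skip A10; A10 signals auto-precharge); MIG's own RTL may arrange this differently.

```python
def column_to_a_pins(col, auto_precharge=False):
    """Map a logical column to A-pin bits, skipping A10.

    Column bits 9..0 drive A[9:0]; any higher column bits move up to A11
    and beyond. A10 is set only when auto-precharge is requested.
    """
    low = col & 0x3FF           # column bits 9..0
    high = col >> 10            # column bits 10 and up
    a_pins = low | (high << 11)
    if auto_precharge:
        a_pins |= 1 << 10
    return a_pins


print(hex(column_to_a_pins(0x3FF)))        # 0x3ff: 10-bit column fits in A[9:0]
print(hex(column_to_a_pins(0x005, True)))  # 0x405: A10 set for auto-precharge
print(hex(column_to_a_pins(0x400)))        # 0x800: 11th column bit skips to A11
```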

53 User Interface Commands
Command | Description
000 | NOP
010 | Initialize Memory
100 | Write Request
110 | Read Request
Others | Reserved
This is the complete list of commands that the User Logic can exercise.

54 User Logic Interface -- 32Mx32 Example
Function | User Guide Name | Direction | Width
Data (Write) | cntrl0_user_input_data | To controller | 64
Mask | cntrl0_user_data_mask | To controller | 8
Address | cntrl0_user_input_address | To controller | 26
Command | cntrl0_user_command_register | To controller | 3
Burst Done | cntrl0_burst_done | To controller | 1
Command Acknowledge | cntrl0_user_cmd_ack | To user | 1
Data (Read) | cntrl0_user_output_data | To user | 64
Data Valid | cntrl0_user_data_valid | To user | 1
Initialization Complete | cntrl0_init_val | To user | 1
Auto Refresh Request | cntrl0_auto_ref_req | To user | 1
Auto Refresh Done | cntrl0_ar_done | To user | 1
This reference slide shows the user logic interface to the MIG controller for a 32Mx32 controller. See the official signal names, the direction, and their width. Notice the two data busses and mask bus are all twice as big as the interface between the FPGA and DDR2.
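The "twice as big" rule can be turned into a quick width calculator. A sketch, assuming one DM bit per 8 DQ bits as on this board; the function and dictionary keys are hypothetical shorthand for the cntrl0_ signal names above.

```python
def user_bus_widths(dq_width, dq_per_dm=8):
    """Widths of the user-side data and mask busses for a given DQ width."""
    dm_width = dq_width // dq_per_dm
    return {
        "user_input_data": 2 * dq_width,   # write data: both edges per clock
        "user_output_data": 2 * dq_width,  # read data
        "user_data_mask": 2 * dm_width,    # one mask bit per byte per edge
    }


print(user_bus_widths(32))
# {'user_input_data': 64, 'user_output_data': 64, 'user_data_mask': 8}
```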

55 Spartan-3x Memory Interface Architecture
[Figure: Spartan-3x memory interface architecture. Input_clock feeds a DCM that clocks all modules in the fabric. Inside the Spartan-3x FPGA: User Interface (Address, Command, & Control), Controller, Write Datapath, Read Capture FIFO, LUT delay circuits, and the LUT Delay Calibration Monitor with LUT delay select. User-side signals: User_address, User_command, User_burst_done, User_cmd_ack, User_input_data, User_data_mask, User_output_data, User_data_valid. Memory-side signals to the DDR2 SDRAM: DQS, DQ, DM. Slide courtesy Xilinx]
Hopefully this diagram now makes sense. This shows the MIG controller. This is what the User Interface must communicate with. All of the important pieces inside the controller are shown here. DCM – Generates the clk0 and clk90 phases of the clock required for the interface design. The other yellow blocks, in particular the Read Capture and LUT Delay Calibration Monitor circuits, are explained in detail in XAPP768c. One signal we haven’t yet discussed is the reference clock for the system. On this diagram, it is shown as the Input_clock.

56 System Reference Clock
MIG assumes this to be differential SYS_CLK and SYS_CLKb MIG assumes it to be the controller frequency User must connect the real system clock Single-ended clock is acceptable Differential has less jitter Can a DCM synthesize the proper frequency? Yes, if you account for jitter in timing calculations User must edit several MIG RTL files Shown in Lab 2 The Input_clock from the previous page is an input to the FPGA from some source on the PCB. This is not one of the customizations that MIG allows, so it makes some assumptions.
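If a DCM synthesizes the controller clock from a different board oscillator, its CLKFX output is CLKIN × CLKFX_MULTIPLY / CLKFX_DIVIDE. The sketch below brute-forces the closest ratio, assuming the Spartan-3 generation DCM ranges of 2 to 32 for the multiplier and 1 to 32 for the divider. It ignores the DCM's input/output frequency limits and does not model the extra CLKFX jitter that, per the slide, must go into the timing calculations. The 100 MHz oscillator in the usage line is hypothetical.

```python
def best_clkfx(f_in_mhz, f_target_mhz, m_range=range(2, 33), d_range=range(1, 33)):
    """Return (multiply, divide, f_out) closest to the target frequency."""
    best = None
    for m in m_range:
        for d in d_range:
            f_out = f_in_mhz * m / d
            err = abs(f_out - f_target_mhz)
            if best is None or err < best[0]:   # keep the first (smallest M) exact hit
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, f_out


print(best_clkfx(100, 125))  # (5, 4, 125.0) with a hypothetical 100 MHz oscillator
```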

57 MIG’s Two Output Designs
user_design For the user who wants to instantiate the MIG controller Top-level exposes all DDR2 external signals and User Logic interface No instantiation template provided example_design Adds a User Logic example Top-level only exposes DDR2 external signals Adds wrapper layers to connect controller, calibration, clock management, and User Logic More practical starting point As we saw in Lab 1, MIG generates two different designs – user_design and example_design. What’s the difference between these two designs? The user_design has no user logic. All of the DDR2 and User Logic signals are brought out to the top-level. The user would then instantiate that top-level in their own design. The example_design includes example user logic. The top-level is a wrapper to connect things together. It’s a good starting point because it can be synthesized and built. A user can still implement their own user logic. Simply replace the example user logic with your own, preserving the top-level wrappers that pull everything together.

58 example_design File Hierarchy
Memory controller User Logic Clock management Calibration Top-level, main, and infrastructure are essentially wrappers. Note that MIG calls the User Logic “test_bench.” We can build this design as is, or we can replace “test_bench” with our own code, preserving the wrappers at the top-level.

59 Logical Simulation MIG generates logical simulation files
VHDL or Verilog testbench ModelSIM “do” file Micron memory model Assuming the Micron license agreement is checked Verilog only Newer versions available directly from Micron

60 Simulator Support
Simulator | Verilog | VHDL
ISE Simulator | Yes – Requires modifications to the HDL. No modifications required in 10.1 | No – Scheduled to work in ISE 10.1
ModelSIM-XE Verilog | Yes | NA
ModelSIM-XE VHDL | | No – Need mixed language simulator due to Micron’s Verilog model
ModelSIM-SE | |

61 VHDL Options Use a mixed-language simulator
ModelSIM SE tested and supported by Xilinx Get 3rd party VHDL models for the memory Not tested or supported by Xilinx Wait for ISE 10.1 to consider ISE Simulator

62 Hardware Debug External logic analyzer External scope
Consider adding Agilent Soft Touch connectorless probes Invaluable for performing full-speed measurements External scope Probe directly at the memory or FPGA Leave break-out vias exposed for BGAs on prototypes Critical for measuring signal integrity Embedded logic analyzer Extremely versatile and inexpensive option ChipScope Pro

63 Debug Logic Anywhere Within the FPGA
[Figure: Xilinx ILA core inside the FPGA, attached to the Memory Controller and probing its Clock, Data, and Address, with Trigger 0 through Trigger 3 inputs and a Trigger Out]
Identify logic that you need to debug and verify ChipScope Pro cores are placed directly within the logic and … Function as “virtual test headers” Provide access to any signal or node within the FPGA Debug at the system clock rate Slide courtesy Xilinx

64 ChipScope in Spartan ChipScope cores take up FPGA resources
Consider using a larger FPGA in prototypes ChipScope logic must meet timing Limited to around 200 MHz for Spartan ChipScope must run faster than DDR2 to see double data rate DDR2 lower limit is 125 MHz Not practical to run ChipScope at 250 MHz Consider violating 125 MHz limit Run DDR2 at 50 MHz Run ChipScope at 100 MHz or 200 MHz See this in Lab 2 High-speed measurements need to be taken with a high-speed logic analyzer
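The clock-planning trade-off on this slide fits in a few comparisons. A sketch using only the numbers quoted here (about 200 MHz for ChipScope on Spartan, 125 MHz DDR2 lower limit, and at least 2× the DDR2 clock to see both edges); the function name and messages are hypothetical.

```python
CHIPSCOPE_FMAX_MHZ = 200.0  # "around 200 MHz for Spartan", per the slide
DDR2_MIN_MHZ = 125.0        # DDR2 lower limit quoted on the slide


def chipscope_concerns(ddr2_mhz, chipscope_mhz):
    """List the issues the slide raises for a given clock plan."""
    concerns = []
    if chipscope_mhz < 2 * ddr2_mhz:
        concerns.append("cannot see both edges of the double-data-rate bus")
    if chipscope_mhz > CHIPSCOPE_FMAX_MHZ:
        concerns.append("ChipScope logic unlikely to meet timing")
    if ddr2_mhz < DDR2_MIN_MHZ:
        concerns.append("DDR2 run below its lower limit (lab-only compromise)")
    return concerns


print(chipscope_concerns(125, 250))  # ['ChipScope logic unlikely to meet timing']
print(chipscope_concerns(50, 100))   # ['DDR2 run below its lower limit (lab-only compromise)']
```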


66 A Practical Guide to DDR2 Design with Spartan-3A DSP
Lab 2 – Build and verify a DDR2 controller in hardware

67 Lab 2 Overview
Modify the example design UCF to match the hardware Connect correct system clock Integrate new user logic Add ChipScope logic analyzer Build, download, and verify hardware
[Figure: flow from Core Generator (MIG) to the MIG Outputs, then Integrate Design in Project Navigator, then Download Design to Hardware]

68 User Test Logic Initialize the memory
Write to fill the memory with incrementing pattern Read back the memory Handle auto-refresh when necessary

69 User Test Logic State Machine
[Figure: state diagram: Power On → Initialize Memory → Write → Read → Compare, with Compare looping back to Write]
After power on, the user logic issues a single Initialize Memory command to the controller, after which the controller takes care of all the gory details we previously read from the datasheet. After initialization, the testbench gets into a loop where it (1) writes data to the memory, (2) reads the data back from the memory, (3) compares the read data with the original data pattern. If at any time an error is detected in the comparison, an error signal is asserted (and an LED lights on our demo design).
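A software model of the test loop, as a sketch only: read_fn stands in for the DDR2 read path, the state names mirror the diagram, and auto-refresh handling is deliberately left out. The hardware loops forever; the model stops after a given number of passes so it can be checked. Everything here is hypothetical and is not MIG's test_bench code.

```python
from enum import Enum, auto


class State(Enum):
    POWER_ON = auto()
    INIT = auto()
    WRITE = auto()
    READ = auto()
    COMPARE = auto()


def run_memory_test(depth, read_fn, passes=1):
    """Return (error_flag, states_visited) for a model of the test loop."""
    state = State.POWER_ON
    pattern, readback = [], []
    error = False
    completed = 0
    visited = []
    while completed < passes:
        visited.append(state)
        if state is State.POWER_ON:
            state = State.INIT            # issue a single Initialize Memory command
        elif state is State.INIT:
            state = State.WRITE           # controller handles the init details
        elif state is State.WRITE:
            pattern = list(range(depth))  # incrementing pattern
            state = State.READ
        elif state is State.READ:
            readback = [read_fn(a, pattern[a]) for a in range(depth)]
            state = State.COMPARE
        else:                             # State.COMPARE
            error = error or readback != pattern  # latch the error (LED on the board)
            completed += 1
            state = State.WRITE           # loop back for the next pass
    return error, visited


good, _ = run_memory_test(16, lambda addr, value: value)
bad, _ = run_memory_test(16, lambda addr, value: value ^ 1 if addr == 3 else value)
print(good, bad)  # False True
```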

70 Xilinx Spartan-3A DSP 1800A Starter Platform
[Figure: photo of the board with callouts for EXP, RS232, Texas Instruments Regulators, Spartan-3A DSP FPGA, National 10/100/1000 PHY, Intel Flash, and Micron 32Mx32 DDR]
Might be good to point out a few of the components -- RS232 and DB15 Video on the edge -- Mx16 DDR chips above FPGA (total of 32Mx32 or 128MB) -- EXP slot -- 8Mbit Parallel Platform Flash -- Power source

71 Expansion slot for custom development
Consists of two high-speed Samtec mezzanine connectors Two connectors = full expansion module One connector = half expansion module Each connector has 84 I/Os (mix of single-ended and differential) Access to one or more FPGA clock input pins 2.5V and 3.3V source power Use one from Avnet EXP Prototype EXP High-speed Analog EXP Video EXP Interface Develop your own Specification available from Avnet

72 Lab 2 Review Why did we do simulation in Verilog?
What’s the penalty for using a DCM to synthesize the system clock? Other observations? Did your design pass timing? How was the new UCF different from the original? How does adding ChipScope affect a design?

73 A Practical Guide to DDR2 Design with Spartan-3A DSP
PCB Considerations

74 PCB Considerations Lab 2 – Build and verify a DDR2 controller in hardware PCB Considerations FPGA pinout Factors impacting signal quality and crosstalk PCB simulation example for DDR2 Trace requirements Power Customizing and Verifying the MIG Results Pinout rules Pin-swapping Verifying a new design Lab 3 – Analyze and Fix Customized MIG Controllers Just hit the main section titles here. Cover the section sub-topics right before each section.

75 FPGA Pinout A random pinout will not work
Pinout relationship of DQS to DQ is critical LUT delay line placement is critical MIG follows all the pinout guidelines To modify the MIG-generated pinout, you must understand the pinout rules The pinout rules are covered later

76 Simultaneous Switching Outputs (SSO)
Limits the number of outputs on a bank If violated, ground bounce can occur MIG doesn’t check this for you Read XAPP689 For DDR2, we use SSTL 1.8V Class I & II IOSTANDARDs For XC3SD1800A-FG676 Bank 3 (Left) Datasheet Table 26 shows 9 equivalent Vcco/GND pairs SSTL18_I allows 15 SSO per Vcco/GND pair (135 total) SSTL18_II allows 3 SSO per Vcco/GND pair (27 total) Spartan-3A DSP 1800A Starter Platform Using Class II for all 32 data bits is a problem Use Class I instead
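The bank-level arithmetic on this slide as a simplified check that treats the per-pair limits as a flat budget. A real SSO analysis per XAPP689 weighs more factors; the function name is hypothetical, and the per-pair values are the ones quoted above for XC3SD1800A-FG676 Bank 3.

```python
# Per-pair limits for XC3SD1800A-FG676 bank 3, as quoted on this slide.
SSO_PER_PAIR = {"SSTL18_I": 15, "SSTL18_II": 3}


def sso_check(io_standard, vcco_gnd_pairs, switching_outputs):
    """Return (fits, limit) for a simple flat SSO budget."""
    limit = SSO_PER_PAIR[io_standard] * vcco_gnd_pairs
    return switching_outputs <= limit, limit


for std in ("SSTL18_II", "SSTL18_I"):
    print(std, sso_check(std, 9, 32))
# SSTL18_II (False, 27)
# SSTL18_I (True, 135)
```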

77 Calibration Loopback MIG calls this rst_dqs_div Not a DDR2 signal!
One output: rst_dqs_div_out One input: rst_dqs_div_in Not a DDR2 signal! Used for Spartan-3x controllers to calibrate timing Write enable for readback from DDR2 Must be placed on two I/Os in the center of the DQ bus Trace length equal to sum of average DQS and clock trace lengths Basically one roundtrip from FPGA to memory and back NOT the same as clock feedback required by pre-EDK 9.2 DDR2 controller If using EDK 9.1 or earlier, design a separate clock feedback Both MIG loopback and EDK 9.1 clock feedback are on Spartan-3A DSP 1800A Starter Platform The rst_dqs_div signal is driven to an IOB as an output and is then taken as an input through the input buffer. This technique normalizes the IOB and trace delays between the rst_dqs_div and DQS Clock signals. The rst_dqs_div signal from the input PAD of the FPGA uses identical routing resources as the DQS before it enters the LUT delay circuit. The trace delay of the loop should be the sum of the trace delays of the clock forwarded to the memory and the DQS.
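The loopback length rule can be sketched as a target-and-tolerance check. The ±45 mil tolerance is taken from the Trace Length Matching Requirements slide in this section; the routed lengths below are hypothetical, and the function names are illustrative.

```python
from statistics import mean


def loopback_target_mils(dqs_lengths, ck_lengths):
    """Target rst_dqs_div loop length: average DQS plus average CK length."""
    return mean(dqs_lengths) + mean(ck_lengths)


def loopback_ok(loop_length, dqs_lengths, ck_lengths, tol_mils=45):
    """True if the loop is within tolerance of the target length."""
    return abs(loop_length - loopback_target_mils(dqs_lengths, ck_lengths)) <= tol_mils


dqs = [1000, 1100, 1050, 1050]   # hypothetical DQS trace lengths, mils
ck = [1200, 1200]                # hypothetical CK/CK_N trace lengths, mils
print(loopback_target_mils(dqs, ck))                           # 2250
print(loopback_ok(2280, dqs, ck), loopback_ok(2300, dqs, ck))  # True False
```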

78 Factors Impacting Signal Quality
The 3 T’s Technology Use slowest possible driver switching speeds Topology Select optimal topology for signal integrity, timing, and EMC Shorten traces or stubs to their critical length or shorter Termination Select optimal termination for signal integrity, timing and EMC Match end of line to Z0 using passive components

79 Technology’s Influence on SI
Smaller dies mean faster edge rates Faster edge rates mean reflections and signal-quality problems Even when the package hasn’t changed and your clock speed hasn’t changed A problem for legacy designs and redesigns Overshoot using SSTL18 Class II driver (red) = 675 mV peak-peak Overshoot using LVCMOS18 Fast driver (green) = 47 mV peak-peak Plot shows signal integrity using SSTL18 Class II driver compared to that using LVCMOS18 Fast for an unterminated microstrip trace. Trace Topology Setup: length = 7 inches

80 Topology’s Influence on SI
Topology is critical Analyze prior to layout Longer traces more susceptible to reflections Overshoot for 2.5 inch trace (red) = 670 mV peak-peak Overshoot for 0.25 inch trace (green) = 114 mV peak-peak Plot shows signal integrity using SSTL18 Class II driver with an unterminated microstrip trace of lengths 0.25 inches and 2.5 inches. Trace Topology

81 Termination’s Influence on SI
Impedance discontinuities cause reflections Changes in trace width BGA breakouts Stubs Vias Loads Connector transitions Basic Termination Guidelines Source termination is useful in point-to-point Far-end termination is useful in multi-point connections Distributed termination is useful with variable configurations Improper terminations No termination Large power plane discontinuities Changes in trace height above power planes Changing layers Reflections occur at impedance discontinuities (i.e., Z2 ≠ Z1) Even with “unlike objects” (e.g., a trace and a connector), reflections will not occur as long as Z2 = Z1 We call this “Impedance Matching” Another factor is the length over which the discontinuity occurs A BGA breakout, for example, may have a longer length than a via, and therefore cause a bigger reflection Decide the best method for your trace topology and design requirements You can terminate the source, the far end, both, or you can employ “distributed” terminations at several locations (in the case of multipoint connections) Some basic guidelines: Source termination is useful in point-to-point/one-directional connections Far-end termination is useful in multi-point connections Distributed termination can be helpful if you have a plug-in system with variable configuration Select correct values for terminators Passive component values depend on physical board properties
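Why Z2 = Z1 means no reflection can be seen from the standard transmission-line reflection coefficient, Γ = (Z2 − Z1) / (Z2 + Z1), for a wave crossing from impedance Z1 into Z2. A minimal sketch; the impedance values are illustrative, and real values come from the board stackup.

```python
def reflection_coefficient(z1_ohms, z2_ohms):
    """Fraction of the incident voltage reflected where impedance steps from z1 to z2."""
    return (z2_ohms - z1_ohms) / (z2_ohms + z1_ohms)


print(reflection_coefficient(50, 50))            # 0.0: matched, no reflection
print(reflection_coefficient(50, 75))            # 0.2: step up, positive reflection
print(round(reflection_coefficient(50, 25), 3))  # -0.333: step down, inverted reflection
```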

82 Termination Example Setup: length = 7 inches
Overshoot for unterminated net (red) = 674 mV peak-peak Overshoot for series terminated net (green) = 90 mV peak-peak Below we look at signal integrity using SSTL18 Class II driver with a microstrip trace with and without termination. Trace Topology without Termination Setup: length = 7 inches Trace Topology with Termination

83 Factors Impacting Crosstalk
Crosstalk occurs when 2 or more neighboring traces couple together The following affect crosstalk performance on a PCB Stackup, Signal Integrity, Fast Edge Rates, and Trace Separation ClockA ClockB ClockA (Aggressor) Coupled Region Sending a signal down one trace causes a signal to appear on the 2nd trace ClockB (Victim) Crosstalk is caused by capacitive and inductive coupling. Inductive coupling causes signal currents to couple voltages into neighboring nets. Capacitive coupling causes signal voltages to couple currents into neighboring nets. Crosstalk is only induced when voltage on the aggressor net is changing due to transition or ringing. Crosstalk is not important during the part of the cycle that is not within the setup/hold time of clocked nets. In this example, Clock A is steady (not active). ClockB is transitioning to logic high. Because of the close proximity of the two nets, and the electro-magnetic field generated from ClockB, ClockA becomes affected. The edge rate of ClockB will determine (along with their proximity to each other) how much noise is actually generated on ClockA. Net ClockA inducing crosstalk on ClockB Net Topologies

84 Stackup’s Impact on Crosstalk
Microstrip (surface layer) traces are more susceptible to crosstalk Stripline (internal layer) traces are less susceptible to crosstalk Multiple reference planes Reduces trace-to-trace coupling Consider extra layers for reference planes A divided voltage plane is a poor reference In the next slide we’ll look at crosstalk for microstrip versus stripline for terminated nets

85 Microstrip vs. Stripline
Setup: length = 7 inches, spacing = 8 mils, TL1 and TL3 Aggressor, TL2 = Victim Crosstalk on TL2 for microstrip (green) = mV peak-peak Crosstalk on TL2 for stripline (blue) = 79 mV peak-peak

86 Signal Integrity’s Impact on Crosstalk
Reflections lead to higher signal swings Termination improves SI and crosstalk. Unterminated nets Setup: SSTL18 Class II buffers, 3 coupled microstrip traces, length = 7 inches, spacing = 8 mils, middle trace (TL2) is the victim net Terminated nets Crosstalk on TL2 when nets were unterminated (red) = 637 mV peak-peak Crosstalk on TL2 when nets were terminated (green) = mV peak-peak

87 Edge Rates’ Impact on Crosstalk
Fast edge rates lead to increased coupling between traces Crosstalk is higher Slowest driver that meets timing requirements is recommended Drive strength Large drive strength values and single-ended swing also increase the coupling between traces Lower drive strength to minimum that meets requirements In the next slide we’ll look at crosstalk for stripline for unterminated nets using SSTL18 Class I drivers and then using SSTL18 Class II drivers

88 SSTL18_I vs SSTL18_II Setup: length = 7 inches, spacing = 8 mils, TL1 and TL3 Aggressor, TL2 = Victim Crosstalk on TL2 using SSTL Class II drivers (red) = 118 mV peak-peak Crosstalk on TL2 using SSTL Class I drivers (green) = 89 mV peak-peak

89 DDR2 Termination DDR2 uses the SSTL standard Rules are simple
Stub Series Termination Logic JEDEC Rules are simple Stub termination to 0.9Vtt on all receiving nodes Resistor value equal to board impedance Series termination on all driving nodes Sum of termination and output driver impedance should equal board impedance What about bi-directional signals? Sometimes driving, sometimes receiving Series and stub terminations required at both ends Differential signals have 100-ohm termination at load EXCEPTION: Any or all terminations can be eliminated if proven during board-level simulation
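The SSTL rules above reduce to simple arithmetic. A sketch, using a hypothetical 50-ohm board and an 18-ohm driver impedance chosen only for illustration; real driver impedance comes from the IBIS model, and board-level simulation may show some terminations can be dropped.

```python
def sstl_terminations(z0_ohms, driver_ohms):
    """Return resistor values implied by the SSTL rules on this slide."""
    series = z0_ohms - driver_ohms          # series R + driver impedance = board impedance
    if series < 0:
        raise ValueError("driver impedance already exceeds the board impedance")
    return {
        "series_at_driver": series,
        "stub_to_vtt": z0_ohms,             # stub R equals board impedance, tied to 0.9V Vtt
        "differential_at_load": 100,        # differential pairs: 100-ohm termination at load
    }


print(sstl_terminations(50, 18))
# {'series_at_driver': 32, 'stub_to_vtt': 50, 'differential_at_load': 100}
```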

90 DDR2 On Die Termination (ODT)
New feature in DDR2 Termination added inside DDR2 DQ, DQS, DM Multiple termination values for different configurations None, 50 Ohm, 75 Ohm, 150 Ohm Not included for address or control That comes with DDR3 Eliminates the need for stub terminations at the DDR2 for the data lines The ODT feature is designed to improve signal integrity of the memory channel by allowing the DDR2 SDRAM controller to independently turn on/off ODT for any or all devices. RTT effective resistance values of 50Ω, 75Ω, and 150Ω are selectable and apply to each DQ, DQS/DQS#, RDQS/RDQS#, UDQS/UDQS#, LDQS/ LDQS#, DM, and UDM/LDM signals.

91 Spartan-3A DSP Starter Termination

92 When Can Terminations Be Relaxed?
When simulation proves that it can! See Micron TN4614 Some examples Trace length < 2” Only 1 or 2 memory devices Relaxed timing allows weaker drivers Reduced DDR2 drive strength Use SSTL 1.8V Class I FPGA I/O Standard

93 Board-level Simulation
A must for any serious, high-speed design Investigate the following Differing layout topologies Trace lengths Resistor values Resistor placement Examples Determined we could eliminate all terminations on Spartan-3MB board Trace lengths < 1” Determined some stub terminations on Spartan-3A DSP 1800A Starter Platform were not necessary

94 Example Simulation Flow
Simulate after each step until acceptable results are achieved Create driver/receiver topology with no termination Can the trace be shortened? Is a reduced drive strength possible? Turn on ODT at the DDR2 Experiment with all three options (50, 75, and 150ohm) Add series termination at driver For bi-directional signals, experiment moving series termination to other side Add series termination on both sides Add stub termination at receiver(s)

95 DDR2 Signal Integrity Simulation Demo
Pre-Layout Analysis for Data Topology

96 DDR2 Crosstalk Simulation Demo
Pre-Layout Analysis for Data Topology

97 Recommended PCB Design Flow
[Figure: three-phase flow. PRE-LAYOUT: System Design, Part Selection, and Schematic Entry. LAYOUT: Full Board Place-and-route. PROTOTYPING: Prototype Functional Testing & Debugging; EMI Testing & Debugging. Tools shown: Mentor HyperLynx Linesim and Boardsim]
The two simulations just demonstrated were performed using Linesim with the optional crosstalk package (approximately $8K).

98 For More Information Talk to a Mentor Graphics Representative in your area Get a HyperLynx evaluation Good tutorials Easy to use No license required for eval FAEs – Feel free to look up and add name of local reseller and even invite them to the session.

99 Other SI Resources Terry Fox & Associates (Issaquah, WA, USA)
Signal integrity training Project consulting Disaster recovery CircuitCraft (Calgary, Canada) Schematic and layout design review Post-layout board simulation (using HyperLynx Boardsim) Full electrical and physical design Or check with your Mentor Graphics Rep for a local resource $3-5K for basic simulation help FAEs – these resources are both located in North America, but I believe they will work on international projects as well. If you have a local consultant that you would prefer to highlight, feel free to modify this slide for your area.

100 Simulation Models Board-level simulation tools typically require IBIS models Xilinx provides IBIS models for free Available from the Download Center Make sure you are using the correct I/O standard Or, generate an IBIS model directly from ISE Micron provides IBIS models for free See product web page

101 Trace Length Matching Requirements
Members of a differential pair matched to +/-10mil DQ, DQS, DM and CK matched to +/- 45mil Address/Control matched to +/- 100mil of CK RST_DQS_DIV and MB_FB_CLK matched to +/- 45mil of sum of average DQS and average CK These are tight requirements, but if you make them, your board will likely work. Trace Lengths These rules indicate the maximum electrical delays between DDR/DDR2 SDRAM signals at 333 MHz: 1. ± 25 ps maximum electrical delay between any DQ and its associated DQS/DQS# 2. ± 50 ps maximum electrical delay between any address and control signals and the corresponding CK/CK# 3. ± 100 ps maximum electrical delay between any DQS/DQS# and CK/CK#.
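A sketch of how the matching rules might be checked against routed lengths exported from layout. The slide does not say what the ±45 mil data-group tolerance is measured against; this sketch assumes every DQ, DQS, and DM is compared to CK, which is an interpretation. All net lengths are hypothetical.

```python
def pair_matched(p_len, n_len, tol_mils=10):
    """Members of a differential pair within +/-10 mil of each other."""
    return abs(p_len - n_len) <= tol_mils


def length_violations(ck_len, data_lens, addr_ctrl_lens, data_tol=45, addr_tol=100):
    """Names of nets outside their tolerance, with CK assumed as the reference."""
    bad = [name for name, length in data_lens.items() if abs(length - ck_len) > data_tol]
    bad += [name for name, length in addr_ctrl_lens.items() if abs(length - ck_len) > addr_tol]
    return bad


# Hypothetical routed lengths in mils.
data = {"DQ0": 1520, "DQ1": 1560, "DQS0": 1490, "DM0": 1510}
addr_ctrl = {"A0": 1590, "RAS_N": 1620}
print(length_violations(1500, data, addr_ctrl))            # ['DQ1', 'RAS_N']
print(pair_matched(1500, 1508), pair_matched(1500, 1511))  # True False
```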

102 Power Three independent supplies required
DDR2 and FPGA I/O supply is 1.8V Source/sink 0.9V termination supply Resistor divider is possible Regulator is recommended 0.9V reference supply

103 Texas Instruments TPS51116 Used on Spartan-3A DSP 1800A Starter Platform Provides 1.8V up to 10A Provides 0.9V Vtt up to 3A Provides 0.9V Vref
[Figure: TPS51116 supplying the 1.8V, 0.9Vtt, and 0.9Vref rails to the DDR2 and FPGA]

104 National DDR2 Termination
[Figure: LM283X buck converter (SOT-23; 1, 1.5, or 2A; 3.0 – 5.5V) generating 1.8V for the DDR2 memory system, and LP2997 DDR2 termination regulator (PSOP-8) generating VTT = 0.9V and Vref = 0.9V]
This simple solution is low in external component count.

105 Decoupling Capacitors
For the FPGA, follow Xilinx XAPP623 Example for the Spartan-3A DSP 1800A Starter Platform (XC3SD1800A-FG676) shown in table For the DDR2, see Micron TN4602 FPGA Decoupling 1.8V 1.2V 0.9V Total # Pwr/Gnd Pairs 9 23 Tantalum Capacitor 470uF 1 4.7uF (0603) 2 4 1.0uF (0402) 3 7 5 .01uF (0201) 13

106 Stackup Power planes on S3ADSPSK are multi-rail
Not good for a return path Extra ground planes used

107 A Practical Guide to DDR2 Design with Spartan-3A DSP
Customizing and Verifying the MIG Results

108 Customizing and Verifying the MIG Results
Lab 2 – Build and verify a DDR2 controller in hardware PCB Considerations FPGA pinout Factors impacting signal quality and crosstalk PCB simulation example for DDR2 Trace requirements Power Customizing and Verifying the MIG Results Pinout rules Pin-swapping Verifying a new design Lab 3 – Analyze and Fix Customized MIG Controllers Just hit the main section titles here. Cover the section sub-topics right before each section.

109 Customizing the MIG Pinout
What if the MIG output doesn’t match an existing board? What if I’m designing a board and don’t like the pinout MIG gives me? What about pin-swapping during layout? Why does it matter? Calibrating strobes with data bits

110 If You Want to Change the I/Os
Know the pinout rules Use FPGA Editor to find suitable alternatives Modify the UCF accordingly Verify the implemented result

111 Spartan-3x Pinout Rules
The IOBs for DQ bits must be placed five tiles above or six tiles below the IOB tile for the associated DQS bit See XAPP768c See AR24935 Unbonded IOBs count (can’t simply use datasheet pinout) Loopback must be in the middle of the DQ bus Keep even/odd DQs oriented in the same top/bottom tile pattern One CLB column is dedicated for the odd numbered bits and one is dedicated for the even numbered bits CK/CK_N, address, RAS_N, CAS_N, WE_N, CS_N, and ODT must be placed together in banks that are on the same side of the device

112 What’s a Tile? IOBs are grouped together
Each grouping is called a tile Example shows a 2 IOB tile “Five tiles above” means 10 total IOBs above in this case IOB Tile
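The +5/-6 tile rule can be sketched as an offset check between tile numbers. This assumes IOB indices are counted along the bank edge with unbonded IOBs included (as the pinout rules require), that indices increase upward, and that each tile holds 2 IOBs as in this slide's example; other devices may differ. The helper names are hypothetical.

```python
IOBS_PER_TILE = 2  # the example tile on this slide; check other devices


def tile_of(iob_index):
    """Tile number for an IOB index counted along the bank edge, unbonded IOBs included."""
    return iob_index // IOBS_PER_TILE


def dq_placement_ok(dq_iob, dqs_iob, tiles_above=5, tiles_below=6):
    """+5/-6 rule: DQ tile at most 5 tiles above or 6 tiles below its DQS tile."""
    offset = tile_of(dq_iob) - tile_of(dqs_iob)  # positive means above the DQS
    return -tiles_below <= offset <= tiles_above


print(dq_placement_ok(20, 10), dq_placement_ok(22, 10))  # True False
print(dq_placement_ok(0, 12), dq_placement_ok(0, 14))    # True False
```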

113 Common Pin-swapping A DQ byte can be swapped with another byte
Strobe and data swapped together DQ bits can swap within a byte Swap even-numbered bits with other even-numbered bits Swap odd-numbered bits with other odd-numbered bits Control, address, data mask, and clock can be swapped at will Spreading too far apart may cause timing issues though Anything besides these requires checking against the pinout rules
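The within-byte swap rules can be checked mechanically before touching the UCF. A sketch, with DQ bits given as global indices (DQ0 to DQ31); it covers only bit swaps inside a byte, not byte-lane swaps that move a whole byte together with its strobe. The function names are hypothetical.

```python
def bit_swap_allowed(a, b):
    """DQ bits may swap only within the same byte and with the same parity."""
    return a // 8 == b // 8 and a % 2 == b % 2


def valid_dq_remap(mapping):
    """mapping: logical DQ index -> physical DQ index; one-to-one and rule-abiding."""
    one_to_one = len(set(mapping.values())) == len(mapping)
    return one_to_one and all(bit_swap_allowed(src, dst) for src, dst in mapping.items())


print(bit_swap_allowed(0, 2), bit_swap_allowed(0, 1), bit_swap_allowed(6, 8))  # True False False
print(valid_dq_remap({0: 2, 2: 0, 1: 1}))  # True
```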

114 Using FPGA Editor Tool to view internal device layout
Shows things that are hidden in other tools Unbonded I/Os Detailed routing What will we use it for? Creating or customizing a pinout that follows the MIG rules Verifying a design was implemented properly Where is it? Start → Programs → Xilinx ISE → Accessories → FPGA Editor Within Project Navigator, “View/Edit Placed Routed Design”

115 Adjusting the UCF Use FPGA Editor to find acceptable new pin locations
Pin location changes must be reflected in the UCF These pin location changes will affect SLICE location constraints as well

116 Verify the Result Pass UCF timing constraints Examine data routing
MAXDELAY FROM/TO PERIOD Examine data routing Compare routes in FPGA Editor with AR25245 Analyze delays in FPGA Editor Examine clock routing Inspect Clock Section in PAR report

117 Verify Data (DQ) Routing
Keep even/odd DQ bits in the same CLB Columns Keep even/odd DQs oriented in same top/bottom I/O tile pattern EVEN DQ CLB COLUMN ODD DQ CLB COLUMN Even on top of I/O Tile Pair. Odd on bottom DQ0 DQ1 Once this pattern is established, it must be repeated for all DQ lines (as seen in FPGA Editor)

118 Data Skew and Delay Use FPGA Editor Data net skew < 75 ps
Instructions outlined in AR25245 Data net skew < 75 ps Total delay range = ps In this example Delays range from 411 to 464 ps Skew = 53 ps Data net skew is less than 75 ps Total delay range is within ps range
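The skew figure in this example is simply the spread between the slowest and fastest data-net delays read from FPGA Editor. A minimal sketch using the end points of the example's range (411 and 464 ps); the function names are hypothetical.

```python
def data_skew_ps(delays_ps):
    """Skew is the spread between the slowest and fastest data-net delay."""
    return max(delays_ps) - min(delays_ps)


def skew_ok(delays_ps, limit_ps=75):
    """True if the data-net skew is under the limit from this slide."""
    return data_skew_ps(delays_ps) < limit_ps


delays = [411, 464]  # end points of the example's delay range, in ps
print(data_skew_ps(delays), skew_ok(delays))  # 53 True
```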

119 Clock Routing Clock routing must follow a specific pattern
Details are shown in AR25245 Example of proper routing shown

120 Clock Report Inspect Clock Report section in PAR report file
Net Skew < 65 ps (this example = 64 ps) Max Delay ~ 400 ps (this example’s range is 465 to 491 ps)
[Clock Report excerpt from the PAR file: clock nets main_00/top0/data_path0/dqs0_delayed_col and the corresponding dqs1_delayed_col net, each listed as Local, 11; the remaining columns are not recoverable]

121 A Practical Guide to DDR2 Design with Spartan-3A DSP
Lab 3 – Analyze and Fix Customized MIG Controllers

122 Verify a “known-good” design (Lab 2)
Lab 3 Overview Verify a “known-good” design (Lab 2) Verify with FPGA Editor and examining PAR report Practice looking at something correct Fix a “broken” design Open “broken” design in FED Figure out what’s wrong Fix it in your UCF Re-implement Re-verify

123 Lab 3 Review What happens when you violate the +5/-6 tile rule?
How did you determine the correct SLICE location after changing DQ/DQS? What are the pins to avoid when selecting new sites in FPGA Editor? How were those unsuitable pins highlighted? How were the new pin locations verified?

124 A Practical Guide to DDR2 Design with Spartan-3A DSP

125 To Proceed with a MIG Design
Get the Xilinx Spartan-3A DSP 1800A Starter Platform Get ISE 9.2, IP Update #2, and ChipScope Pro Evaluate options for logical simulation Prepare for board-level simulation Get IBIS and HDL models Read the documentation Xilinx Previously listed Micron DDR2 datasheet TN4602, TN4605, TN4606, TN4614, TN4720

126 SpeedWay Kit Specials Spartan-3A DSP Starter Kit $285 (save $109)
Xilinx Spartan-3A DSP Starter Kit and SpeedWay attendance AES-SPEEDWAY-S3ADSP-SK Spartan-3A Starter Kit $200 (save $124) Xilinx Spartan-3A Starter Kit and SpeedWay attendance AES-SPEEDWAY-S3A-SK Virtex-5 LX50T PCIe Starter Kit $995 (save $499) Avnet Virtex-5 LX50T PCIe board and SpeedWay attendance AES-SPEEDWAY-LX50T-SK EDK Software Bundle* $200 (save $295) 12-month EDK software license AES-SPEEDWAY-EDK ISE Foundation Bundle* $995 (save $1500) 12-month ISE software license AES-SPEEDWAY-ISE * Must purchase a “-SK” kit Other specials available – see the kit specials handout

127 Course Objectives Review
You now have… Built a functioning DDR2 controller in hardware Generate the DDR2 controller IP Incorporate into ISE Project Navigator Connect the design to custom logic Download and operate on the Spartan-3A DSP 1800A Starter Platform Learned what’s required to design your own board Connect the FPGA to DDR2 components Power and decoupling Signal integrity and crosstalk Create a custom DDR2 pinout for the FPGA

128 A Practical Guide to DDR2 Design with Spartan-3A DSP
Thank you!
