Enabling High System Performance with Advanced Silicon and Memory IP


1 Enabling High System Performance with Advanced Silicon and Memory IP
2010 Technology Roadshow

2 Agenda
Dynamic RAM and static RAM
Altera external memory solutions: UniPHY
Altera external memory solutions: High Performance Controller II
Design flow for DDR and QDR external memory controllers
Demo
Summary

3 Static RAM vs. Dynamic RAM
Static RAM (SRAM) is a type of semiconductor memory.
Pros: Retains all information as long as power is maintained. Reads stored data faster because it accepts all address bits at the same time (DRAM multiplexes the address, high-order bits first, then low-order bits).
Cons: Each cell needs six transistors: four forming two cross-coupled inverters, plus two access transistors that control access to the storage cell during read and write operations.
[Figure: a six-transistor CMOS SRAM cell.]

4 Static RAM vs. Dynamic RAM
Dynamic RAM (DRAM) stores each bit of memory in a separate capacitor.
Pros: Only one transistor and one capacitor are required per bit. Allows RAM to reach very high density. Lowest cost per bit.
Cons: Capacitors leak charge, so information is lost unless the charge is refreshed periodically. Latency is not deterministic.

5 Types of external RAM
DDR/DDR2/DDR3 SDRAM: Very low cost external RAM. Higher latency: multiplexed address bus. Lower efficiency: needs periodic refresh. Needs a complex controller.
RLDRAM I/II: Reduced-latency DRAM. Partitioned into 8 banks to reduce parasitic capacitance of address and data lines. Non-multiplexed address to save bus cycles.
QDR/QDR II/QDR II+ SRAM: SRAM based, so density is not high. Low latency: 1 clock cycle. True dual port: independent read and write buses running with a DDR clock. Simple controller.

6 Market-Specific Requirements
Over 70% of FPGA applications have some form of external memory.
Markets: Computer/Storage, Wireline, Wireless, Broadcast, Military.
Applications: 40G/100G; basestations; video processing; disk arrays, servers, accelerators; low-power / portable devices.
Needs: high performance, low latency, increased efficiency, more functions, low power.
Memory standards: DDR3, RLDRAM II, QDR II/+; DDR2/3; multi-port efficiency, 533 MHz; RDIMM, ONFI, Flash, QDR II/+; LPDDR, mobile DDR.
Different Solutions Fit Market Applications

7 Dynamic RAM dominates the market
Highest-density, cheapest memory solution. Widely used in PC and server applications. Also widely used in many other market segments. Wireless: baseband processing, remote radio head. Wireline: packet processing, traffic management. Video processing. Security applications.

8 External Memory Roadmap
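[Figure: external memory roadmap, performance versus logic density. Labels, from highest to lowest performance: 800-MHz DDR3; 800-MHz DDR3 (DDR2/3, QDR II/+, RLDRAM II); 533-MHz DDR3 (DDR1/2/3, QDR II/+, RLDRAM II); 400-MHz DDR3 (DDR1/2/3, QDR II/+); 200-MHz DDR2 (DDR1/2).]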

9 External RAM Design Challenges
Board design: Clock speeds reaching 800 MHz. Parallel buses reaching the speeds of serial technology: 1.6 Gbps. Crosstalk, impedance, EMI, and jitter issues. Noise susceptibility.
Controller design and verification: Tighter timing margins require calibration and bus training for the DRAM, controller, and analyzer capture.
Debug is difficult: High data rate. The memory interface is double data rate and, with its additional control signals, not easy to understand.

10 Altera helps you with Complete Memory Solutions
External Memory IP MegaCores: Support for common memory standards (DDR 1/2/3, QDR, RLDRAM). Low latency, high performance. Included in the free IP Base Suite.
Advanced FPGA Architecture: Dedicated circuits to enable higher performance. Best-in-class signal integrity.
Software Support: Automatically generated constraints. System-level timing analysis. SPICE and IBIS simulation models.
Support Collateral: Reference designs. Board design guidelines. Development kits and hardware reference platforms. Device handbook, application notes.

11 External Memory IP MegaCore
[Figure: block diagram of the memory IP: Multi-Port Front End (MPFE, reference design), memory controller (HPMCII), and memory PHY (UniPHY), with Avalon-MM and AFI interfaces, connected to external memory (QDR II, QDR II+, RLDRAM II/III, DDR2/3).]
High Performance Controller II (HPMCII): Higher bus efficiency with advanced bank management.
UniPHY: Lower latency.
Multi-Port Front End is available as a reference design: Intelligent multi-class arbitration. Effectively shares bandwidth between several masters.

12 UNIPHY Architecture

13 External Memory – What’s New
Result of re-architected memory PHY and controller:
HPMC II = 2x the controller efficiency
UniPHY = ½ the ALTMEMPHY latency
Memory Type | Controller | UniPHY
DDR 2/3 | QII 9.1 | QII 10.0
QDR II/II+ | QII 9.1 (1) |
RLDRAM II | |
(1) Controller designed to support UniPHY. This is not HPMCII.

14 Altera Memory PHY Solutions
Feature | UniPHY | ALTMEMPHY
Available as a MegaCore
Support for DDR2/3
Support for QDR II/II+ and RLDRAM II | | X
PLL/DLL sharing
Smart calibration algorithms
Latency | 0.5 | 1.0
Let’s now look at the two PHY offerings from Altera. Both are excellent, but we have added some new key features to UniPHY to support the needs of high-performance applications. Those features include PLL and DLL sharing, support for QDR II/II+ and RLDRAM, and a new smart calibration algorithm. We’ll have more details about this in the next few minutes. I am then going to describe all the line items, read my tag line at the bottom, then transition into the next slide, which is a side-by-side comparison of the two PHYs.
UniPHY Provides Higher Flexibility With Half the Latency

15 PHY Architectures (UniPHY and ALTMEMPHY)
[Figure: side-by-side block diagrams of the ALTMEMPHY and UniPHY architectures. Blocks shown: I/O structure, PLL, DLL, PLL reconfiguration, auto-calibration / calibration sequencer, clock generation, mimic path, DQS path and DQS I/O block, write path and read path with DQ I/O blocks and FIFO, address/command path and I/O block, memory controller, external memory.]
In the new UniPHY architecture we’ve enhanced the PHY by adding several new features that improve function and reduce latency. One of the functional enhancements is PLL sharing: UniPHY uses the PLL as a shared resource, so many PHYs across multiple interfaces can share the PLL resources.
For the new Stratix V FPGA, significant circuit enhancements, along with the PHY enhancements, have been put in place to achieve higher performance on memory interfaces. All the critical circuits in the read/write paths have been hardened to guarantee timing closure at higher frequencies. The hard FIFO in the I/O blocks enables the new UniPHY to halve the PHY latency compared to Stratix IV FPGAs. For example, at 400 MHz, the ALTMEMPHY read latency is 23 cycles, while the UniPHY Stratix V FPGA PHY latency is expected to be 11 cycles. Other features such as duty cycle correction, advanced calibration algorithms, and VT-compensated deskew delays increase the operating margin for high data rates and high system reliability.
Our goal with Stratix V FPGAs was not only to increase memory interface performance but also to make it easier to implement. A common pain point for many customers was the inability to easily share PLLs and DLLs. With the new UniPHY, Stratix V easily allows the sharing of PLLs and DLLs across multiple interfaces. Additionally, moving forward, UniPHY will be made available to customers as cleartext, providing easier debug and customization capabilities.
Hard: Hard read/write path. Guaranteed timing closure at 800 MHz.
Soft: Soft I/O grouping and sequencer. Flexibility for supporting multiple configurations.
Re-architected UniPHY: Lower latency, shared resources (PLL, DLL) for multiple interfaces.

16 UniPHY Benefits Universal PHY applicable to all families
Seamless replacement for ALTMEMPHY at the PHY interface (AFI). UniPHY available starting QII 9.1 for QDR and RLDRAM, QII 10.0 for DDRx.
Enhanced features: Lower read latency, about half that of ALTMEMPHY. PLL/DLL instantiated at top level to support sharing across multiple interfaces. More DIMM and rank support. Auto calibration.
Improved ease of use: UniPHY available as cleartext (unencrypted). Nios-based calibration sequencer for easier debug. Convenient application of timing and pin constraints. Verilog testbenches for understanding the core. Flexible timing models that provide transparency and higher accuracy.

17 De-skew Calibration
Utilizes Stratix III/IV FPGA dynamic trace compensation (programmable delay chains) to de-skew the DQ data bus. Provides extra margin at the capture stage. Tracks VT to maintain the maximum data valid window (DVW).
At this point, given all the Stratix III devices taped out, is the algorithm still under evaluation?

18 PHY Calibration

19 PLL/DLL Sharing
Building multiple memory interfaces can put a strain on scarce resources (e.g. PLL, DLL, global clocks). UniPHY supports convenient sharing of PLL and DLL. The PLL and DLL are instantiated at the top level of the PHY and controller. You can choose “PLL Master” or “PLL Slave” mode. Interfaces must use the same configuration.
[Figure: PLL Master and PLL Slave configurations.]

20 UniPHY: Industry-leading Latency
Latency* (measured in full-rate clock cycles)
Column headings: Protocol; Half/Full Rate; Controller (Addr/Cmd); PHY; Memory (Max Read); (Read Return); Round Trip; (less memory)
RLDRAM II (x36), Full: 2†, 2, 8, 5; round trip 17; less memory 9
RLDRAM II (x36), Half: 4†, 3, 7; round trip 22; less memory 14 (7 HR)
QDR II+ (x18): 1, 2.5, 5.5; round trip 11; less memory 8.5
QDR II+ (x18): 4, 7.5; round trip 16; less memory 13.5 (7 HR)
DDR 2/3 (QII 10.0 estimate): 5†; memory DDR2: 5, DDR3: 11; round trip DDR2: 17, DDR3: 23; less memory 12
DDR 2/3 (QII 10.0 estimate): 10†; round trip DDR2: 25, DDR3: 31; less memory 20 (10 HR)
ALTMEMPHY (~DDR2): 3.5; memory DDR2: 5; 10; round trip DDR2: 23.5; less memory 18.5
ALTMEMPHY (~DDR2): 18; round trip DDR2: 41; less memory 36 (18 HR)
† Best case shown; latency may be higher due to protocol requirements (tRC, bus turnaround, open/precharge).
Memory read latency options (by device/config): DDR3: 5-11; DDR2: 3, 4, 5; DDR: 2, 2.5, 3; RLDRAM: 3, 4, 5, 6, 8; QDR II: 1.5; QDR II+: 2, 2.5
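As a sanity check on the latency figures, the round-trip number in a row appears to be the sum of its stages. The sketch below uses only the two RLDRAM II (x36) rows. The field names are illustrative labels for the table columns, and the half-rate memory value of 8 is an assumption inferred from the row's round trip (22) minus its "less memory" figure (14); it is not printed in that row of the transcript.

```python
# Latency stages in full-rate clock cycles for the RLDRAM II (x36) rows.
# Field names are hypothetical labels; the half-rate "memory_read" value
# is inferred (22 - 14 = 8), not read directly from the slide.
ROWS = {
    "full": {"controller": 2, "phy_addr_cmd": 2, "memory_read": 8, "phy_read_return": 5},
    "half": {"controller": 4, "phy_addr_cmd": 3, "memory_read": 8, "phy_read_return": 7},
}

def round_trip(stages):
    """Total read latency: every stage added together."""
    return sum(stages.values())

def round_trip_less_memory(stages):
    """Latency contributed by the FPGA side only (controller + PHY)."""
    return round_trip(stages) - stages["memory_read"]

for rate, stages in ROWS.items():
    print(rate, round_trip(stages), round_trip_less_memory(stages))
```

Under these assumptions the full-rate row gives 17 and 9, and the half-rate row gives 22 and 14, matching the figures on the slide.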

21 Sequencer (Calibration) Architecture
Built as an SOPC system with Avalon components. Processor + H/W accelerators. Nios performs the algorithmic side of calibration. Faster development and better debug.

22 IP That Calibrates, De-Skews, and Tracks to Eliminate PVT Variation
Calibration removes process variation from FPGA and memory: Sweep all resync phases for all DQ pins. Build a map on a pin-by-pin basis. Select the best resync phase (the phase set depends on frequency).
Ideal resync phase: maximum setup and hold margin.
[Figure: a reconfigurable PLL sweeps the resynchronization phase; DQ data captured with DQS is resynchronized and compared against a known training pattern.]
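The sweep-and-select step can be pictured with a small model. This is a minimal sketch, not Altera's sequencer code: it assumes the per-pin map is a list of pass/fail results per phase step, treats the sweep as linear (no wrap-around at 360°), and picks the centre of the longest window in which every pin passes. The names `best_resync_phase` and `pin_map` are hypothetical.

```python
def best_resync_phase(pin_map):
    """Pick the phase step at the centre of the longest window where all pins pass.

    pin_map maps a pin name to a list of booleans, one per swept phase step,
    True when the captured data matched the training pattern at that step.
    """
    # A phase step is usable only if every DQ pin passed at that step.
    common = [all(column) for column in zip(*pin_map.values())]

    best_start, best_len, start = None, 0, None
    # The trailing False is a sentinel that closes a window reaching the end.
    for i, ok in enumerate(common + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_start is None:
        raise ValueError("no phase step passes on all pins")
    # Centre of the window gives roughly equal setup and hold margin.
    return best_start + (best_len - 1) // 2

pin_map = {
    "DQ0": [False, True, True, True, True, True, False, False],
    "DQ1": [False, False, True, True, True, True, True, False],
}
print(best_resync_phase(pin_map))
```

Choosing the window centre is one simple way to express "maximum setup and hold margin"; a real calibration routine may weight the two margins differently.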

23 VT Tracking and Compensation
Create one additional reference map (mimic path). Periodically keep sweeping this mimic path. If the mimic path map has moved compared to the reference, adjust the resync phase for the DQ read path.
Static timing analysis: small safe window. Dynamic tracking: large window.
[Figure: I/O structure with PLL, PLL reconfiguration, clock generation, auto-calibration, and mimic path, connected to memory.]
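The tracking rule can be sketched as follows, assuming that drift measured on the mimic path is applied one-for-one to the DQ resync phase and that phase steps wrap around after a full clock period. Both assumptions, and the function name `track_resync_phase`, are illustrative rather than taken from the IP.

```python
def track_resync_phase(resync_phase, reference_centre, mimic_centre, num_phases):
    """Shift the DQ resync phase by the drift observed on the mimic path.

    reference_centre: mimic-path window centre recorded at calibration time.
    mimic_centre:     mimic-path window centre from the latest periodic sweep.
    num_phases:       number of PLL phase steps in one clock period.
    """
    drift = mimic_centre - reference_centre
    # Phase is circular, so wrap the adjusted value into the valid range.
    return (resync_phase + drift) % num_phases

# Mimic path moved two steps later since calibration: follow it.
print(track_resync_phase(10, reference_centre=6, mimic_centre=8, num_phases=16))
```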

24 Notes on Reconfigurable PLL
Used during the calibration stage and adjusted for voltage and temperature tracking. No interruption of external memory interface operation when the PLL is reconfigured. One PLL drives all clock signals required for the interface: Stratix III/IV PLLs have 7 to 10 outputs; DDR uses 3 to 7 clocks; QDR uses 4 to 5 clocks. Only one PLL is required per interface (two are required for >200 MHz in Stratix II FPGAs).

25 Calibrated Dynamic OCT available in Stratix and Arria FPGA
Cyclone has on-chip series termination.
Provides proper line termination and power savings: Mixed termination values in the same bank. Dynamically turns parallel termination ON and OFF. Saves significant power: 1.6 watts over a 72-bit DDR2 bus. Properly terminates the line for bidirectional buses.
Reduces costs: Eases routing congestion. Puts the memories closer. Saves external component cost.
[Figure: dynamic OCT between FPGA and memory during read and write.]
Single-ended termination (1)
Function | Value (2) | Comment
Serial (Rs) | 25 / 50 default (20 to 60 with external resistor) | All banks
Parallel (Rt) | 50 | +/- 5%
Dynamic | Turn Rt off during writes | Saves power (also off during bus idle)
Calibration | Rs and Rt | PVT compensation (requires external resistor)
(1) Stratix IV FPGAs also support on-chip differential termination (covered earlier).
(2) Final values and tolerances pending characterization.

26 Programmable Control Per I/O
Controllable slew rate: Four settings to match the desired I/O standard and to control noise and overshoot.
Programmable output drive strength: Match the desired I/O standard. Settings depend on the standard; SSTL18 example shown.
I/O standard | mA
SSTL18 Class I | 4, 6, 8, 10, 12
SSTL18 Class II | 16
Adjustable output buffer delay: Separate from the main I/O delay. Deliberately add skew to shift adjacent edges and reduce the total number of simultaneously switching outputs (SSO). Independently control rise and fall times (i.e. adjust duty cycle).
Delay parameter: rising edge delay; falling edge delay; both rising and falling edge delay.
Setting (ps): no delay, 50, 100, 150.

27 Variable Input and Output Delay for De-Skew
[Figure: I/O element with delay chains T1, T2, T3, T9, and T10 around the input and output registers.]
Hold-time requirements: 400 ps stepping, 7 settings, 0.4 ns to 2.8 ns. Set at compile time.
Read calibration: 50 ps stepping, 16 settings, intrinsic delay ~750 ps.
Fine tuning of DQ path delay: 50 ps stepping, 8 settings, intrinsic delay ~350 ps.
Write calibration: reduce SSN and fine tune for write leveling; 50 ps stepping, 7 settings, intrinsic delay ~300 ps.
Example: automatic de-skew algorithm in DDR3 IP for centering data around DQS.
Path | Run-time configurable | Step size | Set at compile | Step size | Output buffer | Total
Input | 1,100 ps | 50 ps | 2,800 ps | 400 ps | | 3,900 ps
Output | 1,050 ps | | | | 150 ps | 1,200 ps
Resolution and absolute value pending characterization.
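The totals in the delay table are simple sums, which the sketch below reproduces. The numbers are copied from the slide; treating blank cells as zero and the dictionary keys are assumptions made for illustration.

```python
# Delay budget in picoseconds, copied from the table on this slide.
# Blank table cells are assumed to mean "no contribution" (0 ps).
INPUT_PATH = {"run_time": 1100, "compile_time": 2800, "output_buffer": 0}
OUTPUT_PATH = {"run_time": 1050, "compile_time": 0, "output_buffer": 150}

def total_delay_ps(path):
    """Total adjustable delay available on a path."""
    return sum(path.values())

def chain_max_ps(settings, step_ps):
    """Largest delay of a chain whose settings run from 1 x step to N x step."""
    return settings * step_ps

print(total_delay_ps(INPUT_PATH), total_delay_ps(OUTPUT_PATH))
print(chain_max_ps(7, 400))  # hold-time chain: 7 settings of 400 ps
```

The hold-time chain's 7 settings of 400 ps reach 2,800 ps, which agrees with both the "2.8 ns" annotation and the compile-time entry for the input path.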

28 Read Leveling Built Into I/O Bank for DDR3
[Figure: DDR3 DIMM with fly-by topology used for clk, and the resync clk. Fastest data back: most delay required (max phase delay). Mid phase delay. Slowest data back: least delay required (min phase delay).]
These are not I/O delays, i.e. they do not appear directly in the data path. They are PVT-compensated phase shifts of the resync clock, which effectively block-delay the DQ data in a given DQS group. Each DQS group has its own phase shift, so all output data across the bus can be aligned. Individual DQ signals within a DQS group can be aligned with I/O delay elements.

29 Write Leveling Built Into I/O Bank for DDR3
DQS groups are launched at separate times to coincide with the clock arriving at the devices on the DIMM.
[Figure: write clk, phase delay 0 and phase delay 1, DQS group 0 and DQS group 1 (8 bits each).]

30 Stratix IV FPGA DDR3 at 1,067 Mbps
DDR3 across PVT available for 1,067 Mbps. DDR3 speeds in the lab are now at 620 MHz (1,240 Mbps): a demonstration of capability and margin, not a commitment to productize. Stratix V will support DDR3 at 1,600 Mbps.
[Figure: Stratix IV FPGA DDR3 memory interface eye at 1,067 Mbps (533 MHz).]
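The clock and data-rate figures on this slide are related by the double-data-rate factor of two. A minimal sketch of that conversion (the function name is illustrative):

```python
def ddr_data_rate_mbps(clock_mhz):
    """Per-pin data rate of a double-data-rate interface: two transfers per clock."""
    return 2 * clock_mhz

for clock_mhz in (533, 620, 800):
    print(clock_mhz, "MHz ->", ddr_data_rate_mbps(clock_mhz), "Mbps per pin")
```

A nominal 533 MHz clock is really 533.33 MHz, which is why the slide quotes 1,067 Mbps rather than 1,066.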

31 High Performance Memory Controller II (HPMCII)
[Figure: HPMCII architecture.]

32 External Memory IP MegaCore

33 Altera Memory Controller Solutions
Features | HP Memory Controller II | HP Memory Controller
ECC with sub-word write
Power management
5-cycle controller latency (6 w/ ECC)
Support 800-MHz DDR3 memory | | X
Advanced bank management w/ command look-ahead
Flexible system interface
Run-time programmable
Multi-cast writes
Now let’s look at the second part of the memory solution, the controller. We have launched a new controller with new features that enable our devices to achieve greater efficiency when processing data to and from memory, thus increasing the overall performance of the interface. Then I will do the following: discuss each item in the table, and state the blue text, as it is the value prop.
New Features Enable Better Controller Efficiency and Performance

34 RLDRAM-II & QDR-II/II+ Controllers w/ UniPHY
RLDRAM II Controller MegaCore
Feature | Description
Performance | Half rate: up to 400 MHz. Full rate: up to 300 MHz
Device interfaces | x9, x18, x36 devices
Burst lengths | Full rate: 2, 4, and 8. Half rate: 4 and 8
Other features | Common I/O (CIO); non-multiplexed addressing; Avalon® Memory-Mapped (Avalon-MM) local interface
QDR II/II+ Controller MegaCore
Feature | Description
Performance | Half rate: up to 400 MHz for QDR II+ and 350 MHz for QDR II. Full rate: up to 300 MHz
Device interfaces | x9, x18, x36 devices
Burst lengths | Full rate: 2 and 4. Half rate: 4
Other features | Avalon® Memory-Mapped (Avalon-MM) local interface

35 Memory Controller Technology Roadmap
[Figure: memory controller technology roadmap (performance axis), from HPMC I to HPMC II to a next-generation memory controller architecture. Labels: DDR1/2/3; DDR1/2/3; DDR2/3, LPDDR/2; soft IP; soft IP; hard/soft IP; in-order commands, read/write; run-time reconfiguration; advanced bank management; multi-cast; priority bypass; 1T/2T address/command; data reordering; multi-port controller; MPFE reference design; higher bandwidth.]

36 Memory Interface Enhancements
[Figure: memory fMAX staircase from 0 MHz. 400 MHz: auto calibration. 533 MHz: deskew with 50 ps resolution. 800 MHz: hard read/write paths, high-resolution VT-compensated delays, duty cycle correction, complete path tracking (memory + board + FPGA), advanced calibration algorithms, on-die and on-package decoupling.]
To achieve the high 800-MHz DDR3 performance, the IP and hardware were designed to work together. First we start with the auto calibration feature in the IP. That’s a good start, but alone it does not get us to 800 MHz. Next we added deskew with up to 50 ps of resolution. Again, that keeps us moving toward the goal, but it is not quite enough. So, finally, we add things like hard read/write paths in the hardware, high-resolution VT-compensated delays, duty cycle correction, complete path tracking (memory + board + FPGA), advanced calibration algorithms, and on-die, on-package decoupling. All of these items, working together, get Stratix V to the outstanding 800-MHz DDR3 performance level. The 800-MHz DDR3 performance is only possible because advanced silicon works seamlessly with advanced memory IP.
Advanced Silicon and IP Features Enabling Higher Performance!

37 Stratix V Memory Performance
Memory Standard | fMAX (MHz)
DDR3 SDRAM | 800
DDR2 SDRAM | 533
RLDRAM II |
QDR II+ SRAM | 550
QDR II SRAM | 350
DDR2+ SRAM |
LPDDR2 SDRAM |
In the previous slide, we discussed how Stratix V, working with the memory IP, achieves 800-MHz DDR3. This approach is not limited to DDR3; it’s also used for all the other memory standards that Stratix V supports. I will then read the performance numbers from the slide. These performances are available on all sides of the Stratix V die and over all PVT conditions.
Available on All Sides - Over PVT Conditions

38 Altera High Performance Memory Controller II
DDR1/2/3 SDRAM High Performance Memory Controller II (HPMC II): 2x the controller efficiency. More features, increased throughput.
Features | HP Memory Controller II | HP Memory Controller
MegaCore in QII 9.1
ECC with sub-word write
Power management
Advanced bank management w/ command look-ahead
Flexible system interface
Run-time programmable
Multicast writes

39 Architecture Block Diagram High Performance Memory Controller II
Advanced Bank Management Flexible System Interface Run Time Programmability & Power Management

40 Advanced Bank Management
Look-ahead bank management: Efficient bank interleaving support. Issues activate and precharge commands early. Uses auto-precharge where possible. In-order reads/writes (no re-ordering).
Per-access open or close page policy: Read/write accesses with auto-precharge. Automatic cancellation of auto-precharge on page hits.

41 Flexible System Interface
Avalon Memory-Mapped interface: Adaptor for native interface. Avalon slave interface for access to CSR.
Burst size adaptation for efficient DRAM accesses: Built-in burst adapter. Combines short local transactions into memory bursts. Splits long local transactions into memory bursts.
Integrated low-latency half-rate system interface: Supports an optional half-speed system interface. Keeps the controller in the faster clock domain to reduce latency.
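The combine-and-split behaviour of a burst adapter can be modelled in a few lines. This is a simplified sketch, not the controller's implementation: it assumes in-order requests of a single type, ignores byte enables and read/write distinctions, and merges a request only with the immediately preceding memory burst. The name `adapt` and the request format are hypothetical.

```python
def adapt(requests, burst_len):
    """Map local transactions onto aligned memory bursts.

    requests:  list of (start_address, length_in_beats), in arrival order.
    burst_len: memory burst length in beats.
    Returns the aligned start address of each memory burst issued.
    """
    bursts = []
    for start, length in requests:
        first = start - start % burst_len  # align down to a burst boundary
        # Splitting: a long transaction covers several aligned bursts.
        for base in range(first, start + length, burst_len):
            # Combining: skip a burst that the previous request already covers.
            if not bursts or bursts[-1] != base:
                bursts.append(base)
    return bursts

# Four short requests to incrementing addresses collapse into one burst of 8.
print(adapt([(0, 2), (2, 2), (4, 2), (6, 2)], burst_len=8))
# One long request is split into three bursts of 8.
print(adapt([(0, 20)], burst_len=8))
```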

42 Other Advanced Features
Run-time programmable: Timing parameters, configurations (row, col, bank, cs), and mode register settings.
ECC with sub-word writes: 32+8 and 64+8 bits.
Multicast write to mitigate the effects of tRC: Write to multiple ranks, read from any open rank.
Refresh timing control: Programmable periodic refresh. User-requested auto-refresh.
Power management: User-requested self-refresh. Automatic entry to and exit from power-down mode.

43 Multi-Port Front-End (MPFE) Reference Design
Multi-class arbitration, sharing bandwidth between masters Time-critical accesses, Sharing bandwidth

44 Efficiency Improvement Techniques
Look-ahead Bank Management Look-ahead Auto-Precharge Transaction Combining

45 Existing DDR HP: No Look-ahead
Not efficient: idle command bus.
[Figure: command timeline with a table (Command, Address, Condition). Entries: Read; Bank 0; Activate required; Bank 1; Precharge required; Bank 2.]
Behavior of the existing DDR HP controller: Commands are fetched and processed sequentially. Neither the command bus nor the DQ bus is fully utilized.

46 v9.1 Controller: Look-ahead Bank Management
Use of idle cycles for bank management.
[Figure: command timeline with a table (Command, Address, Condition). Entries: Read; Bank 0; Activate required; Bank 1; Precharge required; Bank 2.]
Behavior of the v9.1 controller: While waiting for tRCD (activate to read) to expire, bank management commands are issued to banks for read/write commands in the queue. When each command reaches the front of the queue, the bank should be ready. Look-ahead bank management happens during idle command cycles.
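One way to picture look-ahead bank management is as a pass over the command queue that works out which activate (ACT) and precharge (PRE) commands each queued access will need, and which of them can be hoisted into idle command cycles. The model below is a hedged illustration only: the bank state, the queue contents, and the rule that preparation can be hoisted when no earlier queued access uses the same bank are all assumptions, and DRAM timing parameters are ignored.

```python
def plan_commands(open_rows, queue):
    """Work out bank-preparation commands for each queued access.

    open_rows: dict bank -> currently open row (bank absent = precharged).
    queue:     list of (bank, row) accesses in issue order.
    Returns a list of (bank, row, prep_commands, can_issue_early).
    """
    rows = dict(open_rows)
    seen_banks = set()
    plan = []
    for i, (bank, row) in enumerate(queue):
        current = rows.get(bank)
        if current == row:
            prep = []              # page hit: bank is ready
        elif current is None:
            prep = ["ACT"]         # bank closed: activate required
        else:
            prep = ["PRE", "ACT"]  # other row open: precharge, then activate
        # Preparation can be hoisted into idle cycles only for accesses behind
        # the queue head whose bank is not needed by an earlier access.
        early = bool(prep) and i > 0 and bank not in seen_banks
        plan.append((bank, row, prep, early))
        rows[bank] = row
        seen_banks.add(bank)
    return plan

# Hypothetical state: bank 0 has row 5 open, bank 1 is closed, bank 2 has row 7 open.
for entry in plan_commands({0: 5, 2: 7}, [(0, 5), (1, 3), (2, 9)]):
    print(entry)
```

In this example the head-of-queue access is a page hit, so while it is serviced the controller could already activate bank 1 and precharge then activate bank 2.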

47 Existing DDR HP: User Controls Auto-Precharge
[Figure: command timeline showing a precharge (PCH) before the next write (WR), with a table (Command, Bank, Row, Condition). Entries: Write; Bank 0, Row 0; Bank 1, Activate required; Bank 2; Row 1, Precharge required.]
Behavior of the existing DDR HP controller: No look-ahead auto-precharge support. On every row change, the controller closes and then opens the row before the write or read burst. Users can control auto-precharge on a per-burst basis, but this is harder.

48 v9.1 Controller: Look-ahead Auto-Precharge
[Figure: command timeline showing a write with auto-precharge (WR with AP), issued knowing that the next WR to bank 0 is to a different row, with a table (Command, Bank, Row, Condition). Entries: Write; Bank 0, Row 0; Bank 1, Activate required; Bank 2; Row 1, Precharge required. Annotation: writes to the same bank, different row.]
Look-ahead auto-precharge: The controller decides whether to issue an auto-precharge read/write by looking ahead. While writing to bank 0, row 0, the controller issues an auto-precharge write. Subsequent reads or writes to the same bank, different row, then require only an activate. This frees up valuable command bandwidth. The user can still control auto-precharge as in the existing controller.
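The look-ahead decision can be sketched as a scan of the queue: an access is issued with auto-precharge when the next queued access to the same bank targets a different row. This is an illustrative model with an unbounded look-ahead window and a made-up queue; the real controller's look-ahead depth and policy are not described on the slide.

```python
def auto_precharge_flags(queue):
    """For each queued access, decide whether to issue it with auto-precharge.

    queue: list of (op, bank, row) tuples in issue order.
    Returns one boolean per access: True when the next access to the same
    bank goes to a different row, so the row can be closed right away.
    """
    flags = []
    for i, (_, bank, row) in enumerate(queue):
        # Row of the next queued access to this bank, if there is one.
        next_row = next((r for (_, b, r) in queue[i + 1:] if b == bank), None)
        flags.append(next_row is not None and next_row != row)
    return flags

# Hypothetical queue: the last write returns to bank 0 but on a different row.
queue = [("WR", 0, 0), ("WR", 1, 0), ("WR", 2, 0), ("WR", 0, 1)]
print(auto_precharge_flags(queue))
```

Only the first write is flagged, so the later write to bank 0, row 1 needs just an activate rather than a separate precharge followed by an activate.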

49 Existing DDR HP: Multiple Single Transactions
[Table: four Write commands, each with burst length 2, to Bank 0, Row 0, at columns 0, 2, 4, and 6. Figure annotation: wasted bandwidth.]
Behavior of the existing DDR HP controller: Four local requests to incrementing addresses (full-rate). The gap between writes is limited by tCCD, but only one quarter of the DQ bandwidth is actually used.

50 Efficiency Results: Read & Write
DDRx is ~60% more efficient than HP controller

51 High Performance Memory Controller Design with Altera MegaCore: DDR2

52 DDR2 High Performance Controller Design Flow
You can implement the DDR or DDR2 SDRAM High-Performance Controller MegaCore function using either of the following flows:
■ SOPC Builder flow
■ MegaWizard Plug-In Manager flow
You can instantiate the ALTMEMPHY megafunction only by using the MegaWizard Plug-In Manager flow.

53 DDR2 SDRAM High Performance Controller—Memory Settings
In the Memory Settings tab, you can select a particular memory device for your system and choose the frequency of operation for the device. Under General Settings, you can choose the device family, speed grade, and clock information. In the middle of the page (left side), you can filter the available memory devices listed on the right side of the Memory Presets dialog box (refer to Figure 3–1). If you cannot find the exact device that you are using, choose a device that has the closest specifications, then manually modify the parameters to match your actual device by clicking Modify parameters, next to the Selected memory preset field.
Device family: Targets device family (for example, Stratix III). The device family selected here must match the device family selected on MegaWizard page 2a.
Speed grade: Selects a particular speed grade of the device (for example, 2, 3, or 4 for the Stratix III device family).
PLL reference clock frequency: Determines the clock frequency of the external input clock to the PLL. If the frequency is not a round number, enter it to three decimal places to avoid a functional simulation or PLL locking problem (a round value such as 100 MHz can be entered as is).
Memory clock frequency: Determines the memory interface clock frequency. If you are operating a memory device below its maximum achievable frequency, enter the actual frequency of operation rather than the maximum frequency achievable by the memory device. Again, if the frequency is not a round number, enter it to three decimal places to avoid a functional simulation or PLL locking issue (a round value such as 400 MHz can be entered as is).
Controller data rate: Selects the data rate for the memory controller. Sets the frequency of the controller equal to either the memory interface frequency (full-rate) or half of the memory interface frequency (half-rate).
Enable half rate bridge: This option is only available for HPC II. Turn on to keep the controller in the memory full clock domain while allowing the local side to run at half the memory clock speed, so that latency can be reduced.
Local interface clock frequency: Value that depends on the memory clock frequency and controller data rate, and whether or not you turn on the Enable Half Rate Bridge option.
Local interface width: Value that depends on the memory clock frequency and controller data rate, and whether or not
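The relationship between the memory clock, the controller data rate, and the half-rate bridge can be sketched as below. This is an illustration of the descriptions above, not a formula from the MegaWizard: the function name is hypothetical, and the bridge option is assumed to matter only for a full-rate controller.

```python
def clock_domains_mhz(mem_clk_mhz, half_rate, half_rate_bridge=False):
    """Return (controller_clock, local_interface_clock) in MHz.

    half_rate:        controller runs at half the memory clock.
    half_rate_bridge: controller stays at the full memory clock while the
                      local side runs at half of it (assumed to apply only
                      when half_rate is False).
    """
    if half_rate:
        controller = local = mem_clk_mhz / 2
    elif half_rate_bridge:
        controller, local = mem_clk_mhz, mem_clk_mhz / 2
    else:
        controller = local = mem_clk_mhz
    return controller, local

print(clock_domains_mhz(400, half_rate=False))                         # full rate
print(clock_domains_mhz(400, half_rate=True))                          # half rate
print(clock_domains_mhz(400, half_rate=False, half_rate_bridge=True))  # bridge on
```

With the bridge enabled, the local side sees the same 200 MHz clock as a half-rate design while the controller itself stays in the 400 MHz domain, which is where the latency saving described above comes from.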

54 DDR2 SDRAM High Performance Controller—PHY Settings
Use dedicated PLL outputs to drive memory clocks (HardCopy II and Stratix II, prototyping for HardCopy II): Turn on to use dedicated PLL outputs to generate the external memory clocks, which is required for HardCopy II ASICs and their Stratix II FPGA prototypes. When turned off, the DDIO output registers generate the clock outputs. When you use the DDIO output registers for the memory clock, the memory clock and the DQS signals are well aligned and easily meet the tDQSS specification. However, when the dedicated clock outputs are used for the memory clock, the memory clock and the DQS signals are not aligned properly and require a positive phase offset from the PLL to align them.
Dedicated memory clock phase: The required phase shift to align the CK/CK# signals with the DQS/DQS# signals when using dedicated PLL outputs to drive memory clocks.
Use differential DQS (Arria II GX, Stratix III, and Stratix IV): Enable this feature for better signal integrity. Recommended for operation at 333 MHz or higher. An option for DDR2 SDRAM only, as DDR SDRAM does not support differential DQS.
Enable external access to reconfigure PLL prior to calibration: When you enable this option for Stratix II and HardCopy II devices, the inputs to the ALTPLL_RECONFIG megafunction are brought to the top level for debugging purposes. This option allows you to reconfigure the PLL before calibration to adjust, if necessary, the phase of the memory clock (mem_clk_2x) before the start of the calibration of the resynchronization clock on the read side. The calibration of the resynchronization clock on the read side depends on the phase of the memory clock on the write side.
Instantiate DLL externally (all supported device families, except for Cyclone III devices): Use this option with Stratix III, Stratix IV, HardCopy III, or HardCopy IV devices if you want to apply a non-standard phase shift to the DQS capture clock. The ALTMEMPHY DLL offsetting I/O can then be connected to the external DLL and the Offset Control Block. As Cyclone III devices do not have DLLs, this feature is not supported for them.
Enable dynamic parallel on-chip termination (Stratix III and Stratix IV): This option provides I/O impedance matching and termination capabilities. With this option checked, the ALTMEMPHY megafunction enables parallel termination during reads and series termination during writes. Only applicable for DDR and DDR2 SDRAM interfaces where DQ and DQS are bidirectional. Using dynamic termination requires the OCT calibration block, which may restrict your DQS/DQ pin placements depending on your RUP/RDN pin locations. Although DDR SDRAM does not support ODT, dynamic OCT is still supported in Altera FPGAs. For more information, refer to either the External Memory Interfaces in Stratix III Devices chapter in volume 1 of the Stratix III Device Handbook or the External Memory Interfaces in Stratix IV Devices chapter in volume 1 of the Stratix IV Device Handbook.
Clock phase (Arria II GX, Arria GX, Cyclone III, HardCopy II, Stratix II, and Stratix II GX): Adjusting the address and command phase can improve the address and command setup and hold margins at the memory device, to compensate for propagation delays that vary with different loadings. You have a choice of 0°, 90°, 180°, and 270°, based on the rising and falling edges of the phy_clk and write_clk signals. In Stratix IV and Stratix III devices, the clock phase is set to dedicated.
Dedicated clock phase (Stratix III and Stratix IV): When you use a dedicated PLL output for address and command, you can choose any legal PLL phase shift to improve setup and hold for the address and command signals. You can set this value to between 180° and 359°; the default is 240°. However, PHY timing generally requires a value greater than 240° for half-rate designs and 270° for full-rate designs.
Board skew (all supported device families except Arria II GX and Stratix IV devices): Maximum skew across any two memory interface signals for the whole interface from the FPGA to the memory (either a discrete memory device or a DIMM). This parameter includes all types of signals (data, strobe, clock, address, and command signals). Enter the worst-case skew, whether it is within a DQS/DQ group, across all groups, or across the address, command, and clock signals. This parameter generates the timing constraints in the .sdc file.
Autocalibration simulation options (all supported device families): Choose between Full Calibration (long simulation time), Quick Calibration, or Skip Calibration. For more information, refer to the Simulation section in volume 4 of the External Memory Interface Handbook.

55 DDR2 SDRAM High Performance Controller—Board Settings

56 DDR2 SDRAM High Performance Controller—Controller Settings

57 High Performance Memory Controller Design with Altera Megacore: QDRII

58 QDRII SRAM High Performance Controller—General Settings
Define the external RAM clock rate
PHY interface type: full/half rate
Address/command line clock phase
I/O standard
Burst length
In the Memory Settings tab, you can select a particular memory device for your system and choose the frequency of operation for the device. Under General Settings, you can choose the device family, speed grade, and clock information. In the middle of the page (left side), you can filter the available memory devices listed on the right side of the Memory Presets dialog box (refer to Figure 3–1). If you cannot find the exact device that you are using, choose a device that has the closest specifications, then manually modify the parameters to match your actual device by clicking Modify parameters, next to the Selected memory preset field.
Device family: Targets device family (for example, Stratix III). The device family selected here must match the device family selected on MegaWizard page 2a.
Speed grade: Selects a particular speed grade of the device (for example, 2, 3, or 4 for the Stratix III device family).
PLL reference clock frequency: Determines the clock frequency of the external input clock to the PLL. Ensure that you use three decimal places if the frequency is not a round number (for example, MHz or 100 MHz) to avoid a functional simulation or a PLL locking problem.
Memory clock frequency: Determines the memory interface clock frequency. If you are operating a memory device below its maximum achievable frequency, ensure that you enter the actual frequency of operation rather than the maximum frequency achievable by the memory device. Also, ensure that you use three decimal places if the frequency is not a round number (for example, MHz or 400 MHz) to avoid a functional simulation or a PLL locking issue.
Controller data rate: Selects the data rate for the memory controller. Sets the frequency of the controller equal to either the memory interface frequency (full-rate) or half of the memory interface frequency (half-rate).
Enable half rate bridge: This option is only available for HPC II. Turn on to keep the controller in the memory full clock domain while allowing the local side to run at half the memory clock speed, so that latency can be reduced.
Local interface clock frequency: Value that depends on the memory clock frequency and controller data rate, and whether or not you turn on the Enable Half Rate Bridge option.
Local interface width: Value that depends on the memory clock frequency and controller data rate, and whether or not
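The relationship between the memory clock, the controller data rate, and the local interface clock described in these settings can be sketched in a few lines of Python. This is only an illustration of the rules stated in the notes, not Altera tooling: the function names `format_mhz` and `local_interface_clock_mhz` and the sample frequencies are assumptions made for this example.

```python
def format_mhz(freq_mhz: float) -> str:
    """Render a frequency with exactly three decimal places, as the notes
    recommend for frequencies that are not round numbers."""
    return f"{freq_mhz:.3f}"


def local_interface_clock_mhz(mem_clock_mhz: float,
                              half_rate: bool,
                              half_rate_bridge: bool = False) -> float:
    """Hypothetical helper: derive the local-side clock from the memory clock.

    Full-rate: the controller runs at the memory interface frequency.
    Half-rate: the controller runs at half the memory interface frequency.
    Half rate bridge: the controller stays in the full-rate domain, but the
    local side runs at half the memory clock speed.
    """
    if half_rate or half_rate_bridge:
        return mem_clock_mhz / 2
    return mem_clock_mhz


if __name__ == "__main__":
    # Sample values chosen only for illustration.
    print(format_mhz(133.3333))
    print(local_interface_clock_mhz(400.0, half_rate=True))
```

Entering the memory clock as the actual operating frequency, rather than the device maximum, matters here because the local interface clock is derived from it.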

59 QDRII SRAM High Performance Controller- Memory Parameters
Define memory interface bus width

60 QDRII SRAM High Performance Controller- Memory Timing
Define external memory timing parameters according to the external memory spec.

61 QDRII SRAM High Performance Controller- Board Timing
Input the board timing information
The software will generate proper timing constraints based on the memory timing and board timing

62 Functional Verification : Example Design TB
Memory IP (PHY & Controller): cleartext RTL; standardized interfaces (AFI, Avalon); convenient timing and pin constraints
Example Design: generated with the IP; matches user parameterization; includes PHY, controller, and driver
Example Driver (Traffic Generator): issues parameterizable read and write traffic; synthesizable for boards and simulation
Example Testbench: integrates memory model with example design; provides base level of functional verification
[Figure labels, block diagram: Example Testbench; Example Design; Driver (Traffic Generator); Avalon; Controller; AFI; PHY; Memory Model; Pass/Fail]
[Figure labels, driver stages: Initialize; Individual reads/writes; Block reads/writes; Sequential addresses; Random; Sequential/Random; User defined stages (optional); Pass; Fail]
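The idea of an example driver that issues read and write traffic against a memory model and reports pass or fail can be illustrated with a small software sketch. This is not the generated example design or its RTL: `MemoryModel`, `check_block`, `run_driver`, the data patterns, and the stage order are all hypothetical names and choices made for this illustration.

```python
import random


class MemoryModel:
    """Hypothetical stand-in for the memory model behind the controller and PHY."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        self.cells[addr] = data

    def read(self, addr):
        return self.cells.get(addr, 0)


def check_block(mem, addrs, pattern):
    """Write the whole block first, then read it back and compare."""
    for addr in addrs:
        mem.write(addr, pattern(addr))
    return all(mem.read(addr) == pattern(addr) for addr in addrs)


def run_driver(mem, num_words=16, seed=0):
    """Run the traffic stages in order and return "Pass" or "Fail"."""
    rng = random.Random(seed)

    def pattern(addr):
        return addr ^ 0xA5A5A5A5

    def inverted(addr):
        return pattern(addr) ^ 0xFFFFFFFF

    # Stage 1: individual writes, each followed immediately by a read.
    for addr in range(num_words):
        mem.write(addr, pattern(addr))
        if mem.read(addr) != pattern(addr):
            return "Fail"

    # Stage 2: block write then block read over sequential addresses.
    if not check_block(mem, list(range(num_words)), inverted):
        return "Fail"

    # Stage 3: block write then block read over a random address order.
    shuffled = rng.sample(range(num_words), num_words)
    if not check_block(mem, shuffled, pattern):
        return "Fail"

    return "Pass"


if __name__ == "__main__":
    print(run_driver(MemoryModel()))
```

A faulty model, for example one whose reads always return zero, makes the first stage report a failure, which is the base level of functional checking this kind of driver provides.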

63 DDR Controller Demo with Stratix III Development Board

64 Summary
Altera provides a complete external memory controller solution to help you design fast, robust DDR, QDR, and RLDRAM memory interfaces
UniPHY and High Performance Controller II deliver low latency and high efficiency, which greatly improve DDR2/3 DRAM interface performance
The MegaCore generator generates complete design files for easier use: RTL netlist, simulation testbench, and timing constraints
For more information, please refer to

65 Thank You! For more information visit: www.altera.com
Thank you for viewing today’s program, Introducing 28-nm Stratix V FPGAs and HardCopy V ASICs: Built for Bandwidth, brought to you by Altera Corporation. If you'd like to learn more about this topic, you can click on the links located under the “Additional Resources” section of your player. If you still have questions, please enter them in the question field located on your player and you will receive a response within 3 business days. I'm Bernhard Friebe and, on behalf of Altera Corporation, thank you for joining us today.

