1
University Workshop: Introduction to Internal & External FPGA Memory
January 2019
2
Objectives
- Understand the basics of RAM memory: SRAM vs. DRAM vs. SDRAM; SDRAM evolution
- Understand the basics of FPGA on-chip RAM memory: MLAB, M9K, M20K, and eSRAM
- Understand the basics of SDRAM memory: how is the memory organized, and how does it operate?
- Understand the memory interface components: what are the PHY, controller, and front-end? What IPs are offered?
3
Memories and Storage: Major Components in a Computer
4
Computer System Memory Hierarchy
Approximate cost per level: $10 / MByte, $10 / GByte, $100 / TByte
5
Static Random-Access Memory (SRAM) vs. Dynamic Random-Access Memory (DRAM)
DRAM:
- Built from 1T (one transistor plus a capacitor per bit)
- Less expensive and higher in density
- Bits stored as charge on node capacitance
- Bit cells lose charge over time and when read
- Must be periodically refreshed to retain charge
- Typically used as mass main/system memory
SRAM:
- Built from 6T or 8T (transistors)
- More expensive and lower in density
- Bits stored by an inverter pair; bit lines driven by transistors
- Faster response
- Typically used as local memory (cache)
- Stores lookup tables for applications due to its faster access times
6
FPGA On-Chip Memory Basics
7
Stratix 10 FPGA Memory Hierarchy Building Blocks
Building blocks span three locations: on-chip, in-package, and on-board.
- On-chip: CRAM/FF, MLAB, M20K, eSRAM (distributed and specialized storage) for fast local storage; local CC/MC FIFOs (variable width and depth); variable-sized buffers; fast-path/low-latency control; memory management; wide/deep FIFOs and video line buffers; fixed program/data storage
- In-package: HBM for medium-capacity, high-bandwidth storage; 200G – 2 Tbit wireline packet buffering; processor code and data storage; video frame storage
- On-board: DDRx for high-capacity storage; 1G – 200G wireline packet buffering; processor code and data storage; video frame storage. QDR/RLDRAM for fast-path/low-latency storage; memory management; statistics
8
Multi-ported SRAM Memory
Single-port, dual-port, n-port: the number of ports specifies the number of address ports; associated with each port are its read and/or write data ports (<address ports><read ports><write ports>).
General Term | ASIC Terminology | Intel FPGA Terminology | Comments
Single Port | 1RW | — | —
Dual Port | 1R1W | Simple Dual Port | Used for FIFOs
Dual Port | 2RW | True Dual Port | Shared memory
Triple Port | 2R1W | Not Available | Network-type applications
ROM | 1R | — | Read only
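To make the port terminology concrete, here is a minimal behavioral sketch in Python (hypothetical models, not vendor RTL): a simple dual-port RAM has one dedicated write port and one dedicated read port (1R1W, the FIFO case), while a true dual-port RAM has two ports that can each read or write (2RW).

```python
class SimpleDualPortRAM:
    """1R1W: one dedicated write port and one dedicated read port."""
    def __init__(self, depth):
        self.mem = [0] * depth

    def write(self, addr, data):   # write port (e.g. FIFO producer side)
        self.mem[addr] = data

    def read(self, addr):          # read port (e.g. FIFO consumer side)
        return self.mem[addr]


class TrueDualPortRAM:
    """2RW: two independent ports, each able to read or write."""
    def __init__(self, depth):
        self.mem = [0] * depth

    def access(self, port, addr, data=None, write=False):
        assert port in ("A", "B")
        if write:
            self.mem[addr] = data
            return None
        return self.mem[addr]


# Usage: a FIFO producer writes on one port while a consumer reads on the other.
ram = SimpleDualPortRAM(depth=512)
ram.write(0, 0xAB)
print(hex(ram.read(0)))   # 0xab
```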
9
Intel FPGA RAM Structures: Native Block Sizes
- MAX 10 (smallest, low-cost parts): M9K blocks (9×1024 bits total)
- Stratix V, Arria 10, Stratix 10 (highest level of integration): M20K blocks (20×1024 bits)
- MLAB: memories built from lookup tables
- The Quartus fitter groups multiple blocks to create larger memories; the FPGA fabric wrapper can build deeper and wider memories by grouping blocks (see the capacity sketch below)
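As a rough illustration of block grouping (a sketch only; the real Quartus fitter also respects each block's legal depth/width configurations), the minimum block count by raw capacity can be estimated from the native bit totals named above (M9K = 9×1024 bits, M20K = 20×1024 bits):

```python
import math

def blocks_needed(depth, width, block_bits):
    """Lower bound on the number of native RAM blocks by raw bit capacity."""
    total_bits = depth * width
    return math.ceil(total_bits / block_bits)

# Example: a 16K-deep x 32-bit buffer built from M20K blocks.
print(blocks_needed(16 * 1024, 32, 20 * 1024))   # 26 blocks by capacity alone
```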
10
Example: Dual Port RAM Megawizard
11
Example: Byte Enable Functional Waveform
Write data with byte enables (active high), then read the data back from memory.
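The waveform behavior can be expressed as a small Python sketch: during a write, only bytes whose active-high byte-enable bit is set are updated, and the untouched bytes are returned unchanged on the subsequent read. The function and values below are illustrative.

```python
def byte_enable_write(old_word, new_word, byteena, num_bytes=4):
    """Merge new_word into old_word under an active-high byte-enable mask."""
    result = 0
    for i in range(num_bytes):
        byte_mask = 0xFF << (8 * i)
        src = new_word if (byteena >> i) & 1 else old_word
        result |= src & byte_mask
    return result

# Write 0xAABBCCDD over 0x11223344 with byteena = 0b0101:
# only bytes 0 and 2 are updated.
print(hex(byte_enable_write(0x11223344, 0xAABBCCDD, 0b0101)))  # 0x11bb33dd
```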
12
Quartus Chip Planner – RAM blocks
13
SDRAM Memory Basics (DDR3 as an example)
14
SDRAM vs. DDR SDRAM
SDRAM = Synchronous Dynamic Random-Access Memory
- Synchronized with the system bus, which can run at much higher clock speeds
- Pipelines instructions for better efficiency
DDR SDRAM = Double Data Rate SDRAM
- Data is captured on both rising and falling clock edges
(Waveforms: Single Data Rate SDRAM vs. Double Data Rate SDRAM)
15
SDRAM Evolution
Type | Name | Clock Rate | Bandwidth | I/O Standard (Volts) | Benefits
SDRAM | Synchronous Dynamic RAM | 133 MHz | 1.1 GBps | LVTTL (3.3 V) | Synchronized to the system clock
DDR1 SDRAM | Double Data Rate 1 SDRAM | 266 MHz (×2) | 4.2 GBps | SSTL_2 (2.5 V) | Greater bandwidth (transfers data on both rising and falling clock edges)
DDR2 SDRAM | Double Data Rate 2 SDRAM | 533 MHz (×2) | 8.5 GBps | SSTL_18 (1.8 V) | 2× faster vs. DDR; improved I/O bus signaling
DDR3 SDRAM | Double Data Rate 3 SDRAM | 800 MHz (×2) | 12.8 GBps | SSTL_15 (1.5 V) | 40% less power vs. DDR2
DDR4 SDRAM | Double Data Rate 4 SDRAM | 1600 MHz (×2) | 25.6 GBps | SSTL_12 / POD (1.2 V) | Better efficiency through 4 bank groups; each bank group can operate independently, so 4 data accesses can be processed within a clock cycle
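The bandwidth column follows from the clock rate, the data rate, and the module data-bus width. Here is a sketch assuming the common 64-bit (8-byte) module bus, where the "(×2)" rows transfer data on both clock edges; the results match the table to within rounding.

```python
def peak_bandwidth_gbps(clock_mhz, double_data_rate, bus_bytes=8):
    """Peak bandwidth in GB/s for a 64-bit (8-byte) memory data bus."""
    transfers_per_sec = clock_mhz * 1e6 * (2 if double_data_rate else 1)
    return transfers_per_sec * bus_bytes / 1e9

for name, clk, ddr in [("SDRAM", 133, False), ("DDR1", 266, True),
                       ("DDR2", 533, True), ("DDR3", 800, True),
                       ("DDR4", 1600, True)]:
    print(f"{name}: {peak_bandwidth_gbps(clk, ddr):.1f} GBps")
# Prints roughly: SDRAM 1.1, DDR1 4.3, DDR2 8.5, DDR3 12.8, DDR4 25.6 GBps
```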
16
External Memory Terminology
Memory | Description | Use | Vendors
DDR3, DDR4 | Double Data Rate DRAM | Main system memory | Samsung, Micron, SK hynix
Hybrid Memory Cube (HMC) | Serial DRAM | — | Micron
High Bandwidth Memory (HBM) | In-package (2.5D) DRAM | — | Samsung, SK hynix
QDR II, QDR IV | Quad Data Rate SRAM | Networking control-plane memory | Cypress, GSI, ISSI
RLDRAM3 | Reduced Latency DRAM | Networking control-plane table lookups | Micron, Renesas
Non-volatile Flash (NAND) | Higher capacity, sequential access | Storage | Samsung, Micron, SK hynix, Toshiba, etc.
Non-volatile Flash (NOR) | Faster, random access | FPGA configuration | Cypress, Samsung, Micron, etc.
Non-volatile 3D XPoint | Emerging | Storage-class memory | Intel, Micron
Note: This section focuses on DDR3 as an example; the other protocols are not discussed.
17
External Memory Interface (EMIF) Subsystem
FPGA, CPU, or SOC
18
DRAM Modules – Overview
DRAM chips have narrow data widths; typical chip data widths are x4, x8, and x16. DRAM modules are a collection of DRAM chips cascaded to form wider data widths.
- Typically referred to as a Dual In-line Memory Module (DIMM)
- The chips share command, control, and address lines, but not the data strobes and data
- Modules have notches in different positions along the edge fingers to differentiate DRAM types
- Each module contains a Serial Presence Detect (SPD) EEPROM that stores information about the module type so the memory controller can configure the memory correctly
- Example: 8 DRAM chips of x8 form a 64-bit DIMM
- Pros: provides high capacity with a wide data width
- Cons: all accesses must use the full data width provided (i.e., loss of lower-granularity accesses)
19
DDR3 Memory Organization
- Each column stores one data word
- Each read/write transfer consists of 8 adjacent words
- Each row consists of multiple columns; the active row is called a page
- Each bank consists of multiple rows
- Each component consists of multiple banks
(Diagram: banks 0 … z, each containing rows 0 … y of columns 0 … x; a burst covers adjacent columns n … n+7.)
While the I/O timing parameters of SDRAM memories are similar to those of other memories such as SRAM, the memory organization is very different. Unlike SRAMs, which use six transistors to store a single bit of information, SDRAM devices such as DDR3 use a single transistor plus a capacitor to store the same bit. The benefit of this implementation is density and cost, and these factors have driven increased interest in DRAMs among our customers. These density/cost benefits are achieved at the expense of memory controller complexity, so let us examine the organization of SDRAM to understand the controller requirements. In DDR3, each burst contains 8 beats of data; in other words, typical read/write transactions to DDR3 have a burst length of 8.
20
DDR3 Memory Operation
To write/read a specific row and column address in a bank:
- Issue an activate command to "open" the desired row address
- Issue a write/read command to the desired column address
- Issue a precharge command to "close" the opened row (precharging the sense amps so they are ready for the next row)
Activate and precharge are also referred to as row commands; write and read are also referred to as column commands. Each bank can be accessed independently.
To preserve memory contents, issue refresh commands every 7.8 µs on average.
A DDR SDRAM device can have a number of banks open at once, and each bank has a currently selected row. Changing the column within the selected row of an open bank requires no additional bank-management commands. Changing the row in an active bank, or changing the bank, incurs a protocol penalty: the precharge (PCH) command closes the active row or bank, and the activate (ACT) command then opens the new row or bank combination. The duration of this penalty is a function of the controller clock frequency, the memory clock frequency, and the memory device characteristics. Calculating the impact of a change of memory and controller configuration on a given system is not a trivial task, as it depends on the nature of the accesses that are performed. (The bank-management rule is sketched in code below.)
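The rule can be expressed as a toy command generator (illustrative Python, not the Intel controller): an access to a bank whose desired row is already open needs only the column command; otherwise the controller must precharge the open row (if any) and activate the new one first.

```python
def commands_for_access(open_rows, bank, row, col, is_write):
    """open_rows maps bank -> currently open row (absent if the bank is idle)."""
    cmds = []
    if open_rows.get(bank) != row:
        if bank in open_rows:
            cmds.append(("PRE", bank))         # close the currently open row
        cmds.append(("ACT", bank, row))        # open the requested row
        open_rows[bank] = row
    cmds.append(("WR" if is_write else "RD", bank, col))
    return cmds

open_rows = {}
print(commands_for_access(open_rows, bank=0, row=5, col=8, is_write=False))
# [('ACT', 0, 5), ('RD', 0, 8)]
print(commands_for_access(open_rows, bank=0, row=5, col=16, is_write=False))
# [('RD', 0, 16)]  -- same open row: no extra bank-management commands
print(commands_for_access(open_rows, bank=0, row=9, col=0, is_write=True))
# [('PRE', 0), ('ACT', 0, 9), ('WR', 0, 0)]
```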
21
Example: Reading from DDR3
Read operation sequence:
- Activate the row (page) containing the data
- Issue the read command after tRCD
- Data is available tCL clock cycles later
A single read requires 18 clock cycles. Consider a 533-MHz device with CAS latency = 7 cycles and tRCD = 7 cycles; 4 memory clock cycles complete a burst-length-8 transfer. There is no additional delay on back-to-back read commands to the same row.
22
Efficiency
Efficiency measures data bus utilization. From the previous example:
- Efficiency of a single read = 4 cycles of data / 18 cycles = 22%
- Efficiency of two reads to the same page = 8 / 22 = 36%
- Efficiency of reading a full page = (128×4) / ((128×4) + (18−4)) = 512 / 526 = 97% (note: 128 columns per page/row)
Summary: efficiency depends on the user's traffic patterns.
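These percentages can be reproduced directly from the example timing (tRCD = 7, CAS latency = 7, 4 memory clocks per burst-of-8 transfer, so a single read occupies 7 + 7 + 4 = 18 cycles):

```python
T_RCD, T_CL, BURST_CYCLES = 7, 7, 4
OVERHEAD = T_RCD + T_CL                     # 14 cycles before data flows

def efficiency(num_bursts):
    """Bus utilization for back-to-back bursts to the same open row."""
    data_cycles = num_bursts * BURST_CYCLES
    total_cycles = OVERHEAD + data_cycles
    return data_cycles / total_cycles

print(f"single read       : {efficiency(1):.0%}")    # 4/18    = 22%
print(f"two reads, 1 page : {efficiency(2):.0%}")    # 8/22    = 36%
print(f"full page (128)   : {efficiency(128):.0%}")  # 512/526 = 97%
```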
23
Addressing from Programmer View to SDRAM
Programmer's view: a 32-bit integer address (bits 31 down to 0) decodes the SDRAM space and is mapped through the bus protocol.
SDRAM controller's view: the address is split into bank bits (B1, B0), a row address, and a column address; the controller issues commands (activate, etc.), the row address, then column address 1, column address 2, and so on.
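A sketch of the address split in Python; the field widths (10 column bits, 3 bank bits) are illustrative assumptions, not the mapping of any particular device or controller:

```python
COL_BITS, BANK_BITS = 10, 3   # illustrative field widths

def split_address(addr):
    """Split a flat address into (row, bank, column) coordinates."""
    col  = addr & ((1 << COL_BITS) - 1)
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)
    row  = addr >> (COL_BITS + BANK_BITS)
    return row, bank, col

row, bank, col = split_address(0x0012_3456)
print(f"row={row:#x} bank={bank} col={col:#x}")
# The controller would then issue ACT(bank, row) followed by RD/WR(bank, col).
```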
24
Comparing Efficiency of On-Chip FPGA RAM to SDRAM
Differences in cost and efficiency drive the use of a memory hierarchy.
25
External Memory Interface IP Solution
26
Memory Interface Layers
(Block diagram) The memory interface inside the FPGA is layered:
- Front End: Avalon-ST/MM or AXI input adaptors, command queues, and an Avalon arbiter
- Controller: input adaptor, command queues, command ordering logic, TBP command pool, scheduler/arbitrator, and DDR command generator (burst adaptor)
- PHY: clocking, address/command path, calibration sequencer, DDIO data path, FIFOs, and I/O buffers
The controller connects to the PHY over the AFI interface, and the PHY drives the memory device across the PCB.
27
Memory Interface Layers (cont.)
PHY
- Physical interface between the FPGA and the DDR3 devices
- Handles the I/O timing requirements imposed by the memory device
- Implemented in the FPGA periphery using dedicated circuits: IOE registers, DQS clock trees, DLL, PLL, OCT, delay chains, etc.
Controller
- Interfaces between the PHY and user logic using Avalon-MM
- Handles DRAM bank management and command sequencing
- Implemented in the FPGA core fabric
Front-end
- Shares memory interface bandwidth between multiple masters
- Arbitration handled by an SOPC/Qsys or custom MPFE component
28
DDR3 (HPC II) Controller with UniPHY
Fully parameterizable IP:
- Specify memory and board parameters
- Parameterize PHY and controller settings
- Generate
Comprehensive IP solution:
- Clear-text RTL
- SDC timing constraints
- I/O logic assignments
- Example design with traffic generator
- Simulation testbench, BFM
29
IP Generation Output
(Block diagram) The generated example design connects a driver (traffic generator) or user logic through an Avalon interface to the memory IP (HPC II controller plus UniPHY, joined by the AFI interface), which in turn drives the memory. The example testbench adds a memory model and reports pass/fail.
30
Avalon-MM Slave Interface
Write flow: an Avalon write request arrives, the controller activates the row and issues the write over AFI, and the PHY sends the write command and write data to the memory.
- Maximum burst count = 64
- The Avalon data bus is 2× or 4× the memory data bus width, for full-rate and half-rate controllers respectively
Local address mapping:
- For multiple chip selects: width = chip bits + row bits + bank bits + column bits − N
- For a single chip select: width = row bits + bank bits + column bits − N
- where N = 1 for a full-rate controller and 2 for a half-rate controller (see the sketch below)
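The local address width formula can be captured in a few lines; the row/bank/column widths in the example are illustrative rather than tied to a specific device. N reflects the Avalon data bus being 2× (full-rate) or 4× (half-rate) the memory data bus, so fewer local address bits are needed.

```python
def local_addr_width(row_bits, bank_bits, col_bits, half_rate, chip_bits=0):
    """Avalon-MM local address width per the formula on this slide."""
    n = 2 if half_rate else 1
    return chip_bits + row_bits + bank_bits + col_bits - n

# Example: single chip select, 15 row / 3 bank / 10 column bits, half-rate.
print(local_addr_width(row_bits=15, bank_bits=3, col_bits=10, half_rate=True))
# 26 local address bits
```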
31
Advanced Features: Refresh, Power
Refresh timing control
- Automatic periodic refresh with a user-programmable refresh interval
- User-controlled refresh: refresh is fully controlled by the user
Power management
- Self-refresh mode: the memory does not need a clock input; lowest-power mode; the SDRAM DLL is disabled, so wake-up takes longer
- Power-down mode: the memory clock must be maintained; lower power, but faster wake-up; automatic entry on idle and exit on activity
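As a sanity check on the 7.8 µs average refresh interval mentioned earlier: DDR3 devices must refresh every row within a 64 ms retention window using 8192 refresh commands, so the average interval is 64 ms / 8192.

```python
RETENTION_MS = 64          # cell retention window
REFRESH_COMMANDS = 8192    # refresh commands issued per window

t_refi_us = RETENTION_MS * 1000 / REFRESH_COMMANDS
print(f"average refresh interval = {t_refi_us:.2f} us")   # 7.81 us
```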
32
Advanced Features: ECC
Error detection and correction
- 32+8 and 64+8 data bus widths
- Data is encoded during writes; read data is decoded and checked
- Single-bit errors are detected and corrected; double-bit errors are detected
- Error and interrupt signals are generated
Configuration / status register
- Stores error counts and the most recent error address
- Can mask errors and inject errors
Partial writes use read-modify-write
- Memory contents are read from the partial-write address, decoded, and checked
- Data is merged with the partial write, encoded, and written back to memory (see the sketch below)
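The partial-write flow can be sketched as follows. The encode/check functions here are a trivial parity stand-in rather than the controller's real 64+8 SECDED code; the point is the read, decode/check, merge, encode, write-back sequence.

```python
def encode(word):
    """Stand-in encoder: return (data word, XOR parity byte)."""
    parity = 0
    for i in range(8):
        parity ^= (word >> (8 * i)) & 0xFF
    return word, parity

def check(word, parity):
    """Stand-in checker for the stored codeword."""
    return encode(word)[1] == parity

def byte_enable_merge(old, new, byteena, num_bytes=8):
    """Merge new data into the old word under an active-high byte-enable mask."""
    out = 0
    for i in range(num_bytes):
        src = new if (byteena >> i) & 1 else old
        out |= src & (0xFF << (8 * i))
    return out

def partial_write(memory, addr, new_data, byteena):
    word, parity = memory[addr]                       # 1. read stored codeword
    assert check(word, parity), "error detected"      # 2. decode and check
    merged = byte_enable_merge(word, new_data, byteena)  # 3. merge partial data
    memory[addr] = encode(merged)                     # 4. re-encode and write back

memory = {0x40: encode(0x1122334455667788)}
partial_write(memory, 0x40, 0xAAAAAAAAAAAAAAAA, byteena=0b00001111)
print(hex(memory[0x40][0]))   # 0x11223344aaaaaaaa
```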
33
Thank you