Register Files and Memories

1 Register Files and Memories
ECE 554 Digital Engineering Laboratory
C. R. Kime, M. J. Schulte, and M. Lipasti
Modified (2/12): Kewal K. Saluja, 10/2/2007

2 Register Files and Memories
- Issues and Objectives
- Register File Concepts
- Implementation of Register Files
- Workarounds for Xilinx FPGAs
- Bottom Line
- Memories: Timing Issues, Width Expansion
ECE Digital Engineering Laboratory

3 Issues and Objectives
Issues:
- ECE 554 projects require a broad range of register file and memory configurations.
- ECE 554 lab boards provide very limited structures for implementing register files and memories.
Objective: develop techniques for implementing a broad range of register file and memory configurations using the structures available on the lab boards.

4 Register File Concepts
- Register file ports: address ports, data ports, control ports
- Register file environments: non-pipelined, pipelined
- Register file implementations

5 Register File Ports
- Address: read, write, shared
- Data: input, output, bidirectional
- Control: Write Enable, Read/Write, Enable, Read, Write, CLK

6 Environment - Non-Pipelined
[Figure: register file with read ports RAddr A/Rdata A and RAddr B/Rdata B feeding an ALU, and write port WAddr C/Wdata C with WEn; clocked by CLK]
- Input Wdata C is not registered outside the register file.
- The clock controls when Wdata C is written.
- Inputs WEn and WAddr C may or may not be registered.
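To make the non-pipelined behavior concrete, here is a minimal cycle-level Python sketch (a behavioral model, not HDL). The class and method names (RegisterFile, read, clock_edge) and the 8 x 16 size are hypothetical; the port names follow the figure. Reads are modeled as combinational and the write takes effect only at the clock edge, which is what allows the ALU result to return as Wdata C within one cycle.

```python
class RegisterFile:
    """Cycle-level sketch of a 2-read/1-write register file.

    Reads are combinational: Rdata A/B follow RAddr A/B immediately.
    Wdata C is written to WAddr C only at a CLK edge, and only when WEn = 1.
    """

    def __init__(self, depth=8, width=16):
        self.mask = (1 << width) - 1
        self.regs = [0] * depth

    def read(self, raddr_a, raddr_b):
        # Read ports A and B: no clock involved
        return self.regs[raddr_a], self.regs[raddr_b]

    def clock_edge(self, wen, waddr_c, wdata_c):
        # Write port C: the stored state changes only here
        if wen:
            self.regs[waddr_c] = wdata_c & self.mask


rf = RegisterFile()
rf.clock_edge(wen=1, waddr_c=1, wdata_c=5)
rf.clock_edge(wen=1, waddr_c=2, wdata_c=7)
a, b = rf.read(1, 2)          # Rdata A, Rdata B
alu_out = a + b               # Wdata C: not registered outside the file
rf.clock_edge(wen=1, waddr_c=3, wdata_c=alu_out)
print(rf.read(3, 0))          # (12, 0)
```

Whether WEn and WAddr C come from registers does not change this model; only the moment the write is applied matters.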

7 Environment - Pipelined 1
- The register file is part of the pipeline platform.
- Inputs may or may not be registered.
- The clock controls when Wdata C is written.
[Figure: register file (RAddr A/B, Rdata A/B, WAddr C, Wdata C, WEn) and ALU within a pipeline; clocked by CLK]

8 Environment - Pipelined 2
[Figure: register file (Raddr A/B, Rdata A/B, Waddr C, Wdata C, WEn) between pipeline stages; clocked by CLK]
- The register file sits between pipeline stages and is not clocked; WEn controls the storage latches, so the register file behaves as an SRAM.
- Inputs may or may not be registered, but a register must lie between Rdata A/Rdata B and Wdata C.

9 Latch-Based
- One latch per bit of the file.
- Latch control can be the write enable and addresses, or some combination of other signals and addresses.
[Figure: latch array with write logic (WEn, Waddr, Wdata) and read logic (Raddr, Rdata)]

10 Latch-Based
- Level-sensitive write (assume an active-high level).
- Setup time on the write address is relative to the leading edge of WEn.
- Hold time on the write address is relative to the trailing edge of WEn.
- Setup and hold times on the write data are relative to the trailing edge of WEn.
- Latches cannot be in a closed loop without either:
  - an additional latch on a different clock in the loop, or
  - a flip-flop in the loop.
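The closed-loop restriction can be seen in a toy Python model. It assumes a hypothetical 4-bit incrementer as the logic in the loop: while the latch is transparent, every pass around the loop changes Q again, so the value never settles; with a flip-flop in the loop there is exactly one update per clock edge. The function names are illustrative only.

```python
MASK = 0xF  # hypothetical 4-bit datapath


def incrementer(q):
    """Combinational logic in the loop: D = Q + 1 (mod 16)."""
    return (q + 1) & MASK


def transparent_latch_loop(q, max_passes=100):
    """Latch enable held high: Q follows D = incrementer(Q).
    Propagate around the loop until it is stable or max_passes is reached."""
    for _ in range(max_passes):
        d = incrementer(q)
        if d == q:
            return True, q        # settled
        q = d                     # transparent latch passes the new D straight to Q
    return False, q               # never settled: the value races around the loop


def flip_flop_loop(q, edges):
    """The same loop broken by a flip-flop: one update per clock edge."""
    for _ in range(edges):
        q = incrementer(q)
    return q


settled, _ = transparent_latch_loop(0)
print(settled)                # False
print(flip_flop_loop(0, 3))   # 3
```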

11 Flip-flop (Latch Pair)-Based
- One flip-flop per bit of the file.
- Each flip-flop is clocked by CLK (or some combination of CLK and another signal) and enabled by the addressing logic combined with other signals.
[Figure: flip-flop array with write logic (WEn, Waddr, Wdata, CLK) and read logic (Raddr, Rdata)]

12 Flip-flop (Latch Pair)-Based
- The write logic adds setup time to that of the flip-flops.
- The read logic adds propagation delay to that of the flip-flops.
- The result acts as an edge-triggered flip-flop register file with these delays added.
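As a rough illustration of where these added delays land, consider the non-pipelined environment, in which Rdata A and Rdata B pass through the ALU and return as Wdata C in the same cycle. Assuming zero clock skew, and using symbol names introduced here for illustration, the clock period must cover the whole loop:

```latex
T_{CLK} \ge t_{CQ} + t_{read} + t_{ALU} + t_{write} + t_{setup}
```

Here t_{CQ} is the flip-flop clock-to-output delay, t_{read} the read-logic delay, t_{ALU} the ALU delay, t_{write} the extra setup time contributed by the write logic, and t_{setup} the flip-flop setup time.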

13 Flip-flop (Shared-Slave)-Based
- One latch per bit of the file, plus one latch per bit of the output.
- The master latches (the file) are clocked by CLK (or some combination of CLK and another signal) and enabled by the addressing logic combined with other signals; the shared slave latches on the output are clocked by CLK.
[Figure: master latch array with write logic (WEn, Waddr, Wdata, CLK) and read logic (Raddr) feeding a rank of output slave latches clocked by CLK that drive Rdata]

14 Flip-flop (Shared-Master)-Based
- One latch per bit of the file, plus one latch per bit of the input.
- The shared master latches on the input are clocked by CLK; the slave latches (the file) are clocked by CLK (or some combination of CLK and another signal) and enabled by the addressing logic combined with other signals.
[Figure: rank of input master latches clocked by CLK feeding the file latches through write logic (WEn, Waddr, Wdata); read logic (Raddr) drives Rdata]

15 Implementation of Register Files
- Custom VLSI SRAM
- Classic SRAM
- Xilinx Virtex-II Pro SRAM: specifications, shortcomings

16 Custom VLSI SRAM
- The most flexible of all implementation techniques.
- Can implement any combination of the variants discussed.
- Latch-based designs are straightforward; a flip-flop-based design needs an additional rank of latches.
- Apart from performance issues due to capacitance, any port configuration can be implemented in a single storage-element array.

17 Classic SRAM
- Has a single RWaddr port, a single Wdata port, and a single Rdata port, and is latch-based.
- Because of the single address port, it can handle only one READ or WRITE access per clock cycle.
- Expansion to n READ address/data ports: place n SRAMs in parallel, with the write accomplished by:
  - applying the same address to all RWaddr ports, and
  - wiring together all Wdata ports.
- Expansion to m WRITE address/data ports:
  - add an m-way multiplexer to the address port, and
  - use a clock that is m times CLK and multiplex the writes over m clocks.
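The two expansions can be sketched behaviorally in Python. The class names (ClassicSRAM, ExpandedRegisterFile) are hypothetical; reads and writes are kept in separate calls, matching the two separate expansions described here. The write loop stands in for the m fast-clock subcycles selected by the m-way address multiplexer, and every write is broadcast to all n copies so they stay identical.

```python
class ClassicSRAM:
    """Single-port SRAM: one RWaddr port, so one READ or one WRITE per access."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def access(self, rwaddr, we, wdata=0):
        if we:
            self.mem[rwaddr] = wdata
            return None
        return self.mem[rwaddr]


class ExpandedRegisterFile:
    """Sketch: n READ ports from n SRAM copies; m WRITE ports by
    time-multiplexing the writes over m fast-clock subcycles per CLK cycle."""

    def __init__(self, depth, n_read, m_write):
        self.copies = [ClassicSRAM(depth) for _ in range(n_read)]
        self.m_write = m_write

    def read(self, raddrs):
        # READ port i is served by copy i, each with its own address.
        return [sram.access(a, we=False) for sram, a in zip(self.copies, raddrs)]

    def write_cycle(self, writes):
        """writes: up to m (waddr, wdata) pairs for one CLK cycle."""
        if len(writes) > self.m_write:
            raise ValueError("more writes than write ports")
        for waddr, wdata in writes:      # one fast-clock subcycle per write
            for sram in self.copies:     # same address and data to every copy
                sram.access(waddr, we=True, wdata=wdata)


rf = ExpandedRegisterFile(depth=8, n_read=3, m_write=2)
rf.write_cycle([(1, 10), (2, 20)])
print(rf.read([1, 2, 1]))   # [10, 20, 10]
```

Because the writes are serialized, a later write to the same address in the same CLK cycle overwrites an earlier one.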

18 Classic SRAM (Continued)
- Addresses must be switched on the positive clock edge.
- WEn must be generated from both the negative and the positive clock edges.
- Expansion to m WRITE address/data ports and n READ address/data ports: do both expansions above, using an (m + 1)-way multiplexer and a clock that is (m + 1) times CLK.
- Virtex Distributed SelectRAM:
  - the SRAM capability provided in the CLBs;
  - can be used with the expansion methods here, in classic asynchronous SRAM mode or in some synchronous modes.
- Getting reliable timing is tricky and may require more complex clocking!
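One way to picture the (m + 1)-subcycle arrangement is as a schedule of accesses within one CLK cycle. This Python sketch assumes that subcycle 0 carries all n reads (one per SRAM copy) and that subcycles 1 to m each broadcast one write. The slides do not fix this order; reads first is one choice, and it gives WRITE-after-READ behavior. The function names are hypothetical.

```python
def subcycle_schedule(read_addrs, writes):
    """One CLK cycle split into (m + 1) fast-clock subcycles.
    Subcycle 0: every copy performs its own READ (n reads in parallel).
    Subcycles 1..m: the (m + 1)-way address mux selects write k, which is
    broadcast to all n copies (same address, wired-together Wdata)."""
    n = len(read_addrs)
    schedule = [[("R", addr) for addr in read_addrs]]
    for waddr, wdata in writes:
        schedule.append([("W", waddr, wdata)] * n)
    return schedule


def run_cycle(copies, read_addrs, writes):
    """Apply one CLK cycle's schedule to the n memory copies (plain lists)."""
    rdata = []
    for subcycle in subcycle_schedule(read_addrs, writes):
        for mem, op in zip(copies, subcycle):
            if op[0] == "R":
                rdata.append(mem[op[1]])
            else:
                mem[op[1]] = op[2]
    return rdata


copies = [[0] * 4 for _ in range(2)]   # n = 2 copies of a 4-word SRAM
run_cycle(copies, [0, 0], [(1, 5)])
print(run_cycle(copies, [1, 1], [(1, 9)]))  # [5, 5]: reads precede the write
```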

19 Distributed SelectRAM Resources
- Uses a LUT in a slice as memory.
- Synchronous write; asynchronous read.
- The accompanying flip-flops can be used to create a synchronous read.
- RAM and ROM are initialized during configuration; data can be written to RAM after configuration.
- Emulated dual-port RAM: one read/write port and one read-only port.
[Figure: primitives RAM16X1S (D, WE, WCLK, A0-A3, O), RAM32X1S (adds A4), and RAM16X1D (D, WE, WCLK, A0-A3, SPO; DPRA0-DPRA3, DPO) mapped onto the LUTs of a slice]
[Table: number of LUTs required for 16 x 1S, 16 x 1D, 32 x 1S, 32 x 1D, 64 x 1S, 64 x 1D, and 128 x 1S RAMs (S = single-port RAM, D = dual-port RAM); counts not recoverable]
Memories deeper than 32 words require additional logic for bank selection and output multiplexing.
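The RAM16X1D behavior described above (synchronous write, asynchronous reads on both ports) can be modeled in a few lines of Python. RAM16X1D is the primitive named on the slide, but this class is only a behavioral sketch: its method names are hypothetical, and `init` stands in for the primitive's initial contents, assumed here to be a 16-bit value whose bit i holds address i.

```python
class RAM16X1D:
    """Behavioral sketch of an emulated dual-port 16 x 1 distributed RAM.
    Port names follow the slide: D, WE, WCLK, A[3:0] -> SPO and
    DPRA[3:0] -> DPO. Writes are synchronous; both reads are asynchronous."""

    def __init__(self, init=0):
        # Contents loaded at configuration time (bit i = address i, assumed)
        self.bits = [(init >> i) & 1 for i in range(16)]

    def spo(self, a):
        """Read/write port: asynchronous read of address A."""
        return self.bits[a & 0xF]

    def dpo(self, dpra):
        """Read-only port: asynchronous read of address DPRA."""
        return self.bits[dpra & 0xF]

    def wclk_rising_edge(self, we, a, d):
        """Synchronous write: D is stored at A on the WCLK edge when WE = 1."""
        if we:
            self.bits[a & 0xF] = d & 1


ram = RAM16X1D(init=0x0001)
ram.wclk_rising_edge(we=1, a=5, d=1)
print(ram.spo(5), ram.dpo(5), ram.dpo(0))   # 1 1 1
```

Only one port can write, which is why the slide calls this an emulated rather than a true dual-port RAM.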

20 Block SelectRAM Resources
- Up to 3.5 Mb of RAM in 18-kb blocks.
- Synchronous read and write.
- True dual-port memory: each port has synchronous read and write capability, and each port can have a different clock.
- Supports initial values.
- Synchronous reset on the output latches.
- Supports parity bits: one parity bit per eight data bits.
[Figure: 18-kb block SelectRAM memory with port A (DIA, DIPA, ADDRA, WEA, ENA, SSRA, CLKA, DOA, DOPA) and port B (DIB, DIPB, ADDRB, WEB, ENB, SSRB, CLKB, DOB, DOPB)]
Block SelectRAM™ resources are dedicated resources on the silicon, and the RAMs can be given an initial value. Several initialization attributes are associated with the block SelectRAM resources:
- INIT_xx: numbered attributes (00 - 3F) that specify the initial memory data contents. Each INIT_xx attribute is a 64-digit hex number.
- INITP_xx: numbered attributes that specify the initial memory parity contents. Each INITP_xx attribute is a 64-digit hex number.
- INIT_A/INIT_B: the initial value of the RAM output latches after configuration.
- SRVAL_A/SRVAL_B: the value of the RAM output latches after SSRA/SSRB is asserted.
The INIT_A/INIT_B and SRVAL_A/SRVAL_B attributes are specified as hex numbers. For more information on RAM initialization, refer to the data sheet.
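Each INIT_xx string holds 64 hex digits, or 256 bits, so the 64 attributes INIT_00 through INIT_3F cover 64 x 256 = 16,384 data bits. The hypothetical Python helper below packs a list of words into those strings. It assumes the usual Xilinx convention that the lowest address occupies the least-significant (rightmost) bits of INIT_00; confirm that ordering against the data sheet before relying on it. Parity (INITP_xx) is not handled.

```python
def init_attributes(words, width=16):
    """Pack initial data words into INIT_00..INIT_3F hex strings (sketch).
    Assumption: address 0 is in the rightmost bits of INIT_00."""
    if width not in (1, 2, 4, 8, 16, 32):
        raise ValueError("data width (no parity) must be 1, 2, 4, 8, 16 or 32")
    per_attr = 256 // width                  # words held by one INIT_xx
    if len(words) > 64 * per_attr:
        raise ValueError("more words than the 16-kb data space holds")
    attrs = {}
    for idx in range(64):
        chunk = words[idx * per_attr:(idx + 1) * per_attr]
        value = 0
        for offset, word in enumerate(chunk):
            value |= (word & ((1 << width) - 1)) << (offset * width)
        attrs[f"INIT_{idx:02X}"] = f"{value:064X}"
    return attrs


attrs = init_attributes([0x1234, 0xABCD])
print(attrs["INIT_00"][-8:])   # ABCD1234
```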

21 Dual-Port Block RAM Configurations
- Configurations available on each port.
- Independent configurations on ports A and B.
- Supports data-width conversion, including parity bits.

Configuration | Depth | Data bits | Parity bits
16K x 1       | 16K   | 1         | 0
8K x 2        | 8K    | 2         | 0
4K x 4        | 4K    | 4         | 0
2K x 9        | 2K    | 8         | 1
1K x 18       | 1K    | 16        | 2
512 x 36      | 512   | 32        | 4

[Figure: 8-bit data written on port A and read as 32-bit data on port B]
Parity bits are stored in a separate address space that can only be accessed in the 2K x 9, 1K x 18, and 512 x 36 configurations.
Example: if port A is configured to write 9-bit data and port B is configured to read serial (1-bit) data, the overall functionality is a parallel-to-serial conversion that strips the parity bits from the data.
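Data-width conversion follows from both ports viewing the same 16,384 data bits through different word widths. This Python sketch assumes that a word of width w at address a occupies bits a*w to a*w + w - 1, so lower narrow-port addresses land in the less-significant part of a wide-port word; check the data sheet for the exact mapping. The class name is hypothetical and the separate parity space is not modeled.

```python
class DualPortBlockRAM:
    """Width-conversion sketch: one shared 16-kb data array, any port width."""
    DATA_BITS = 16 * 1024

    def __init__(self):
        self.bits = [0] * self.DATA_BITS

    def write(self, width, addr, data):
        base = addr * width
        for i in range(width):
            self.bits[base + i] = (data >> i) & 1

    def read(self, width, addr):
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualPortBlockRAM()
for a, byte in enumerate([0x11, 0x22, 0x33, 0x44]):
    ram.write(8, a, byte)                       # port A: 8-bit data view
print(hex(ram.read(32, 0)))                     # port B, 32-bit view: 0x44332211
print([ram.read(1, k) for k in range(8)])       # serial view of 0x11
```

The last line is the serial (1-bit) view used in the parallel-to-serial example.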

22 Virtex Block SRAM Specifications
Functionality:
- A WRITE of data DI to address ADDR occurs for WE = 1, EN = 1, RST = 0 and a positive edge on CLK. The written data DI also appears on DO after a delay.
- A READ from address ADDR occurs for WE = 0, EN = 1, RST = 0 and a positive edge on CLK.
- A SET/RESET operation occurs on the DO latches only, for EN = 1, SSR = 1 and a positive edge on CLK.
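These rules translate directly into a small behavioral Python model of one port. The class name, the `posedge` method, and the `srval`/`init_out` parameters (standing in for SRVAL and INIT) are hypothetical. The model follows the slide literally, so a WRITE is performed only when RST = 0, and DO changes only at a clock edge.

```python
class BlockRAMPort:
    """Sketch of one Virtex block SRAM port; all inputs are sampled at the
    positive CLK edge and DO is held in output latches between edges."""

    def __init__(self, depth, srval=0, init_out=0):
        self.mem = [0] * depth
        self.srval = srval        # DO value after SET/RESET (like SRVAL)
        self.do = init_out        # DO value after configuration (like INIT)

    def posedge(self, en, we, rst, addr, di=0):
        if not en:
            return self.do                 # disabled: nothing happens
        if rst:
            self.do = self.srval           # SET/RESET acts on the DO latches only
        elif we:
            self.mem[addr] = di            # WRITE ...
            self.do = di                   # ... and DI also appears on DO
        else:
            self.do = self.mem[addr]       # READ
        return self.do


port = BlockRAMPort(depth=16, srval=0xFF)
port.posedge(en=1, we=1, rst=0, addr=3, di=42)    # WRITE; DO shows 42
print(port.posedge(en=1, we=0, rst=0, addr=3))    # 42 (READ)
print(port.posedge(en=1, we=0, rst=1, addr=3))    # 255 (SET/RESET of DO)
```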

23 Virtex Block SRAM Specifications
Functionality:
- CLK, EN, WE, and RST can also be programmed to be active low.
Conflicts for dual-port SRAM:
- Simultaneous WRITEs to the same location give invalid data.
- A simultaneous READ on the alternate port of a location being written gives invalid READ data.
- A READ on the alternate port of a location being written may not be performed until after a clock-to-clock setup window.

24 Virtex Block SRAM Specifications
Functionality - Timing:
- EN, WE, RST, ADDR, and DI are captured in registers on the positive edge of CLK.
- WRITEs into the SRAM latch array occur later, under control of internal timing logic.
- READs (including those associated with WRITEs) also occur later, under control of internal timing logic.

25 Virtex Block SRAM Shortcomings
- Positive-edge-triggered storage of the SRAM inputs places an implicit register in front of the SRAM. As a result:
  - combinational READs with the address changing, for example, on both the leading and trailing edges of the clock (i.e., dual-porting with a single clock) are impossible;
  - feeding the SRAM array directly from combinational logic is impossible.
- The outputs are latched, so combinational READs are impossible.

26 Workarounds for Virtex FPGAs
- READ-after-alternate-port-WRITE
- READ port expansion
- Inter-operation address dependency removal
- WRITE port expansion
- Absorbing output latches

27 Virtex Block SRAM Shortcomings
Using the dual-port Virtex block SRAM, with custom VLSI SRAM as the standard for comparison, on a single clock cycle:
- At most two independent READ or WRITE operations are possible.
- At most two READbacks of a written value, each from a WRITE on the same port, are possible.
- READback of a written value from a WRITE on the alternate port is not possible.

28 READ-after-alternate-port-WRITE
Add bypass logic outside the Virtex block SRAM:
[Figure: RAMB4_S#_S# dual-port block RAM (WEA, ENA, RSTA, CLKA, ADDRA, DIA, DOA; WEB, ENB, RSTB, CLKB, ADDRB, DIB, DOB) with both ports clocked by CLK; bypass logic made of an equality comparator (=), signals Select and P clocked by CLK, and a multiplexer on the read output]
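A cycle-level Python sketch of one plausible reading of this bypass: port A writes and port B reads on the same CLK edge. Because a same-address READ on the alternate port returns invalid data, the logic registers Select = (WEA and ADDRA == ADDRB) and P = DIA, and the output multiplexer chooses P instead of DOB when Select is set. The class and signal roles are assumptions inferred from the figure labels; the invalid DOB is modeled as None so an unbypassed use of it stands out.

```python
class BypassedDualPortRAM:
    """Sketch of READ-after-alternate-port-WRITE bypass logic."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.dob = 0          # port B output latches
        self.select = False   # registered comparator result
        self.p = 0            # registered copy of the write data

    def posedge(self, wea, addra, dia, addrb):
        collision = bool(wea) and addra == addrb
        # A READ of the location being written on port A gives invalid data
        self.dob = None if collision else self.mem[addrb]
        if wea:
            self.mem[addra] = dia
        self.select, self.p = collision, dia

    def rdata_b(self):
        # Output multiplexer: bypass value P on a collision, else DOB
        return self.p if self.select else self.dob


ram = BypassedDualPortRAM(depth=8)
ram.posedge(wea=1, addra=2, dia=55, addrb=2)   # same-address collision
print(ram.dob, ram.rdata_b())                  # None 55
```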

29 Read Port Expansion
Expansion to n READ address/data ports: place ceiling(n/2) SRAMs in parallel, with the two writes accomplished by:
- applying the same address to all ADDRA ports and the same address to all ADDRB ports, and
- wiring together all DIA ports and all DIB ports.

30 Read Port Expansion
Example for n = 4:
[Figure: two RAMB4_S#_S# blocks; on each port a multiplexer selects the address (WADDRA or RADDRA1/RADDRA2 on port A, WADDRB or RADDRB1/RADDRB2 on port B); ENA1/ENA2 and ENB1/ENB2 enable the individual ports; DIA and DIB are shared]
- The select for all A multiplexers is WEA, and the select for all B multiplexers is WEB.
- All other like-named signals are connected together.
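The address-multiplexer arrangement can be modeled in Python as ceil(n/2) memory copies, each contributing one READ on port A and one on port B. When WEA (or WEB) is asserted, that port's multiplexer selects the shared write address on every copy, so all copies perform the same write and the port gives no read result that cycle. The class name and argument layout are hypothetical, reads are applied before writes, and same-address collisions between ports are not modeled.

```python
import math


class ReadExpandedRAM:
    """Sketch of READ port expansion with ceil(n/2) dual-port block RAMs."""

    def __init__(self, depth, n_reads):
        self.copies = [[0] * depth for _ in range(math.ceil(n_reads / 2))]

    def cycle(self, raddrs_a, raddrs_b, wea=0, waddra=0, dia=0,
              web=0, waddrb=0, dib=0):
        """raddrs_a/raddrs_b: one read address per copy for ports A and B.
        Returns (douts_a, douts_b); a port that is writing returns None."""
        douts_a, douts_b = [], []
        for mem, ra, rb in zip(self.copies, raddrs_a, raddrs_b):
            douts_a.append(None if wea else mem[ra])
            douts_b.append(None if web else mem[rb])
        for mem in self.copies:     # mux selects WADDRA/WADDRB: same write everywhere
            if wea:
                mem[waddra] = dia
            if web:
                mem[waddrb] = dib
        return douts_a, douts_b


r = ReadExpandedRAM(depth=8, n_reads=4)            # 2 block RAMs
r.cycle([0, 0], [0, 0], wea=1, waddra=3, dia=30, web=1, waddrb=4, dib=40)
print(r.cycle([3, 4], [4, 3]))                     # ([30, 40], [40, 30])
```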

31 Inter-operation Address Dependency
READ-after-WRITE:
- Can be done for one WRITE and two READs using two parallel dual-port block SRAMs, with READ-after-alternate-port-WRITE logic added to the READ side of both:
  - parallel WRITE on the A ports;
  - independent parallel READs on the B ports.
- Each additional parallel dual-port block SRAM adds one more READ port.
- WRITE-after-READ cannot be accomplished this way.
- Cannot be done for more than one active WRITE port without using WRITE port expansion.

32 Write Port Expansion
- Requires “super-clocking,” in which a clock whose frequency is a multiple of the fundamental operating clock is used to serialize block SRAM operations.
- Requires additional registers to enter into and return from the serialized operations locally.
- Requires multiplexers switched by a flip-flop driven by the faster clock.

33 Write Port Expansion
Example: non-pipelined, maximum of 4 WRITE ports
[Figure: RAMB4_S#_S# with both ports (CLKA, CLKB) clocked by 2CLK; multiplexers switched at the 2CLK rate route write ports Pi1, Pi2, Pj, ... to port A and port B; registers clocked by CLK]
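The serialization can be sketched in Python. One CLK cycle is split into `factor` subcycles of the faster clock (2CLK when factor = 2); in each subcycle the dual-port RAM accepts one write on port A and one on port B, so up to 2 x factor writes fit in one CLK cycle, which gives 4 for this example. The function name and the in-order port assignment are assumptions: writes are issued in list order, so a later write to the same address wins, and two ports hitting the same address in one subcycle is rejected as invalid.

```python
def serialize_writes(mem, writes, factor=2):
    """Apply up to 2 * factor (addr, data) writes over `factor` subcycles."""
    if len(writes) > 2 * factor:
        raise ValueError("too many write ports for this super-clock factor")
    for sub in range(factor):
        pair = writes[2 * sub:2 * sub + 2]      # port A, then port B (if present)
        if len(pair) == 2 and pair[0][0] == pair[1][0]:
            raise ValueError("simultaneous WRITEs to the same location are invalid")
        for addr, data in pair:
            mem[addr] = data
    return mem


mem = [0] * 8
serialize_writes(mem, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(mem[:4])   # [1, 2, 3, 4]
```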

34 Absorbing Output Latches
- The output latch is part of the attempt to give the SRAM operation a “flip-flop” appearance.
- As such, there appears to be no way to work around it explicitly.
- The other workarounds handle its effects.

35 The Bottom Line
Overall, the best approach appears to be:
- Use a non-pipelined or Pipelined 1 structure.
- Use the inter-operation dependency solution to achieve multiple dependency-free READs.
- Use WRITE port expansion for multiple WRITEs.
- Use READ-after-alternate-port-WRITE to get READ-after-WRITE capability.
- Use WRITE port expansion with READs on early subcycles to get WRITE-after-READ capability.
- Be aware of the substantial setup times and delays of the synchronous operations.
- Feel free to experiment with other approaches and to apply these ideas to other Virtex block SRAM uses.

36 Memories
- Timing Issues
- Width Expansion

37 Timing Issues
- The off-chip SRAMs are asynchronous and have typical signal timing requirements; see the AS7C4096 data sheet for timing parameters.
- Address-controlled READ is easy.
- WE-controlled WRITE has zero setup and hold times, which look easy, but read on.
- Because FPGA timing is unpredictable, the timing of the memory signals, particularly for WRITE, should be verified.
- In the worst case, “super clocking” may be needed to get reliable timing.

38 Width Expansion
- Width expansion can be achieved with “super clocking,” using an implementation similar to that for register-file WRITE port expansion.
- Expanding a 16-bit word to a 16n-bit word requires super clocking at n times the fundamental rate.
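The width-expansion idea can be sketched in Python: one 16n-bit word is moved through a 16-bit memory as n 16-bit slices, one per super-clock subcycle. The address mapping is an illustrative assumption (logical address a occupies physical addresses a*n to a*n + n - 1, least-significant slice first), and the function names are hypothetical.

```python
def write_wide(mem16, addr, value, n):
    """Store one 16n-bit word over n super-clock subcycles (sketch)."""
    for k in range(n):                        # one subcycle per 16-bit slice
        mem16[addr * n + k] = (value >> (16 * k)) & 0xFFFF


def read_wide(mem16, addr, n):
    """Reassemble the 16n-bit word from n sequential 16-bit reads."""
    value = 0
    for k in range(n):
        value |= mem16[addr * n + k] << (16 * k)
    return value


mem = [0] * 64
write_wide(mem, 3, 0xDEADBEEF, n=2)           # 32-bit word: super clock at 2x
print(hex(read_wide(mem, 3, n=2)))            # 0xdeadbeef
```

A 48-bit word would use n = 3, so the super clock would run at three times the fundamental rate.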

39 Width Expansion Implementation
- For address-controlled READs, it is straightforward.
- WRITEs are challenging: there must be a trailing edge on, for example, WE in each of the super-clock cycles.
- This requires changes on negative as well as positive super-clock edges.
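Why both edges are needed can be shown by sampling WE every half period of the super clock. In this Python sketch (active-high WE assumed for readability; the function names are hypothetical), WE is asserted at each positive super-clock edge and deasserted at the following negative edge, which yields one trailing edge per subcycle. A WE that changes only on positive edges and stays asserted across consecutive writes has no trailing edge between them.

```python
def we_waveform(n):
    """WE sampled each half period over n super-clock cycles."""
    wave = []
    for _ in range(n):
        wave += [1, 0]     # positive edge: assert; negative edge: deassert
    return wave


def trailing_edges(wave):
    """Count 1 -> 0 transitions (the WE edges that complete a WRITE)."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 1 and b == 0)


w = we_waveform(3)
print(w)                  # [1, 0, 1, 0, 1, 0]
print(trailing_edges(w))  # 3
```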

40 Postscript
The workarounds do not consider:
- using multiple clock edges instead of super-clocking;
- different clock edges on the two ports of a dual-port SelectRAM.
These techniques can potentially be beneficial to the degree that the resulting constructs are synthesizable and do not adversely affect performance.

41 Clocking Issues
- Limits of the hardware and synthesis tools constrain the use of clock and reset signals.
- Known problems and their workarounds:
  - generating reset signals;
  - limitations of the global clock routing;
  - using enable signals instead of clocks.

42 Reset Signals
- Generate reset signals only once, in the top-level module.
- Pass the signal down to lower-level modules, but do not change it.
- Both synchronous and asynchronous reset are OK, but be consistent.

43 Reset Signals (cont.)
[Figure: reset generated in the top-level module and passed unchanged to the internal modules, each of which uses it to enter its reset state]

44 Clocking Limitations
- Signals on the global clock routing cannot be used as inputs to combinational logic.
- Using a clock signal as a combinational input pulls that clock signal off the global routing, severely impacting performance.

45 Clocking Examples
BAD:
    assign signal_1  = signal_2 & clk;   // clk used as a logic input
    assign gated_clk = signal & clk;     // gated clock
Neither clk nor gated_clk will be on the global clock routing.
GOOD:
- Use enable signals instead of clocks.
- Don't use clock gating.

46 Enable Vs. Clock
Use enables instead of clocks:
    reg enable = 1'b0;
    always @(posedge clk)
      enable <= ~enable;    // same waveform as a divide-by-2 clock

    always @(posedge clk)
      if (enable) q <= d;   // q, d: any register and its next value
The enable has the same waveform as a divided clock, but clk is used only as the clock input of flip-flops, so it is not pulled off the global clock routing.
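The claim that an enable reproduces the behavior of a divided clock can be checked with a small cycle-level Python model (a hypothetical sketch, not HDL). One counter advances on rising edges of a derived clk/2; the other is clocked by clk and advances only when the toggling enable is high. The enabled version starts with enable = 1 so that both counters step on the same clk edges.

```python
def divided_clock_counter(clk_cycles):
    """Counter clocked by a derived clock clk/2 (the style to avoid)."""
    clk_div, count = 0, 0
    for _ in range(clk_cycles):
        clk_div ^= 1                  # divider flip-flop toggles on each clk edge
        if clk_div == 1:              # rising edge of the derived clock
            count += 1
    return count


def enabled_counter(clk_cycles):
    """Same behavior with a clock enable: every flip-flop sees only clk."""
    enable, count = 1, 0              # start high to align with the divided clock
    for _ in range(clk_cycles):
        if enable:                    # enable sampled at this clk edge
            count += 1
        enable ^= 1                   # enable <= ~enable
    return count


print(divided_clock_counter(10), enabled_counter(10))   # 5 5
```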

