1 Design and Analysis of a Robust Pipelined Memory System
Hao Wang †, Haiquan (Chuck) Zhao *, Bill Lin †, and Jun (Jim) Xu *
† University of California, San Diego
* Georgia Institute of Technology
Infocom 2010, San Diego

2 Memory Wall
Modern Internet routers need to manage large amounts of packet- and flow-level data at line rates, e.g., they need to maintain per-flow records during a monitoring period, but:
– Core routers have millions of flows, translating to 100s of megabytes of storage
– On a 40 Gb/s OC-768 link, a new packet can arrive every 8 ns

3 Memory Wall
The SRAM/DRAM dilemma:
SRAM access latency is typically 5–15 ns (fast enough for the 8 ns line rate)
But SRAM capacity is substantially inadequate in many cases: 4 MB is a typical maximum (much less than the 100s of MBs needed)

4 Memory Wall
DRAM provides inexpensive bulk storage
But DRAM random access latency is typically 50–100 ns (much slower than the 8 ns needed for a 40 Gb/s line rate)
Conventional wisdom holds that DRAMs are not fast enough to keep up with ever-increasing line rates

5 Memory Design Wish List
Line-rate memory bandwidth (like SRAM)
Inexpensive bulk storage (like DRAM)
Predictable performance
Robustness to adversarial access patterns

6 Main Observation
Modern DRAMs can be fast and cheap!
– Driven by graphics, video games, and HDTV
– At commodity pricing: just $0.01/MB currently, $20 for 2 GB!

7 Example: Rambus XDR Memory
16 internal banks

8 Memory Interleaving
Performance is achieved through memory interleaving:
– e.g., suppose we have B = 6 DRAM banks and the access pattern is sequential (1, 2, 3, …): bank 1 holds addresses 1, 7, 13, …; bank 2 holds 2, 8, 14, …; and so on
– Effective memory bandwidth is then B times faster
[Figure: sequential accesses 1–12 striped round-robin across the six banks]
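The interleaving scheme above can be sketched in a few lines (a minimal illustration, not the paper's implementation): with B banks, address a is served by bank a mod B, so a sequential pattern uses all banks while a stride-B pattern hammers one.

```python
B = 6  # number of DRAM banks

def bank(addr: int) -> int:
    """Map an address to its bank under simple interleaving."""
    return addr % B

sequential = list(range(1, 13))             # 1, 2, ..., 12
stride_b = [1 + B * i for i in range(5)]    # 1, 7, 13, 19, 25

# Sequential accesses spread across all B banks ...
assert len({bank(a) for a in sequential}) == B
# ... but the stride-B pattern concentrates on a single bank.
assert len({bank(a) for a in stride_b}) == 1
```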

9 Memory Interleaving
But suppose the access pattern is 1, 7, 13, 19, 25, …: every access hits the same bank, and memory bandwidth degrades to the worst-case DRAM latency
[Figure: the stride-6 accesses 1, 7, 13, 19, 25 all landing on the same one of the six banks]

10 Memory Interleaving
One solution is to apply pseudo-randomization of memory locations
[Figure: the six banks with addresses assigned by a pseudo-random permutation]
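Pseudo-randomized bank selection can be sketched as follows. This is an illustrative stand-in (a hash of the address, not the paper's actual permutation function): it spreads strided patterns across banks, but repeated references to one address still map to one bank, which is exactly the weakness the next slide exploits.

```python
import hashlib

B = 6  # number of DRAM banks

def bank(addr: int) -> int:
    """Pseudo-random bank assignment via a hash of the address
    (illustrative stand-in for a random permutation)."""
    h = hashlib.sha256(addr.to_bytes(8, "little")).digest()
    return int.from_bytes(h[:4], "little") % B

# The stride-B pattern that defeated plain interleaving is now spread out ...
stride = [1 + B * i for i in range(20)]
assert len({bank(a) for a in stride}) >= 2

# ... but repeated references to a single address still hit a single bank,
# so randomization alone cannot stop an attacker replaying one location.
assert len({bank(7) for _ in range(10)}) == 1
```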

11 Adversarial Access Patterns
However, memory bandwidth can still degrade to the worst-case DRAM latency even with randomization:
1. Lookups to the same global variable will trigger accesses to the same memory bank
2. An attacker can flood packets with the same TCP/IP header, triggering updates to the same memory location, and hence the same memory bank, regardless of the randomization function

12 Outline
Problem and Background
→ Proposed Design
Theoretical Analysis
Evaluation

13 Pipelined Memory Abstraction
Emulates SRAM with a fixed delay: a true SRAM returns each result in the cycle the operation is issued, while the emulation returns it exactly Δ cycles later
[Figure: timing diagrams of an operation sequence (W a, R, W b, W c, R, R) on a real SRAM and on the SRAM emulation, with every result shifted by the fixed delay Δ]

14 Implications of Emulation
Fixed pipeline delay: if a read operation is issued at time t to an emulated SRAM, the data is available from the memory controller at exactly t + Δ (instead of in the same cycle).
Coherency: read operations return the same results as an ideal SRAM system.

15 Proposed Solution: Basic Idea
Keep an SRAM reservation table of the memory operations and data from the last C cycles
Avoid issuing a new DRAM operation for memory references to the same location within C cycles
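The basic idea can be sketched as follows (assumptions mine, not the paper's exact controller): a reservation table holds the last C cycles of operations, and a reference to an address already present in the table is absorbed instead of generating a new DRAM operation.

```python
from collections import deque

C = 4  # reservation-table span in cycles (tiny, for illustration)

class ReservationTable:
    def __init__(self):
        self.table = deque()  # entries: (cycle, op, addr, data)

    def access(self, cycle, op, addr, data=None):
        """Return "DRAM" if a new DRAM operation must be issued,
        or "merged" if the reference is absorbed by the table."""
        # Age out entries older than C cycles.
        while self.table and self.table[0][0] <= cycle - C:
            self.table.popleft()
        hit = any(entry[2] == addr for entry in self.table)
        self.table.append((cycle, op, addr, data))
        return "merged" if hit else "DRAM"

rt = ReservationTable()
assert rt.access(0, "W", 5, 1) == "DRAM"    # first reference goes to DRAM
assert rt.access(1, "R", 5) == "merged"     # same address within C cycles
assert rt.access(6, "R", 5) == "DRAM"       # entry aged out: new DRAM op
```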

16 Details of Memory Architecture
[Figure: block diagram — input operations pass through a random address permutation and into B request buffers, one per DRAM bank; a reservation table of C entries (op, addr, data), linked to MRI and MRW tables (CAMs) of size C, tracks recent reads and writes and drives the data out]

17 Merging of Operations
Requests arrive from right to left; two operations to the same address within C cycles are handled as follows:
1. WRITE then READ: the read copies its data from the write
2. WRITE then WRITE: the 2nd write overwrites the 1st write
3. READ then READ: the 2nd read copies its data from the 1st read
4. READ then WRITE: both operations are kept
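The four cases above can be written out as a small decision table (my reading of the slide, for illustration only):

```python
def merge(first: str, second: str) -> str:
    """Outcome when `second` arrives within C cycles of `first`,
    both operations targeting the same address."""
    rules = {
        ("WRITE", "READ"):  "read copies its data from the write",
        ("WRITE", "WRITE"): "2nd write overwrites the 1st write",
        ("READ",  "READ"):  "2nd read copies its data from the 1st read",
        ("READ",  "WRITE"): "both operations are kept",
    }
    return rules[(first, second)]

# Only the READ-then-WRITE case leaves two operations in the buffer,
# matching the robustness bound on slide 20.
assert merge("WRITE", "READ") == "read copies its data from the write"
assert merge("READ", "WRITE") == "both operations are kept"
```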

18 Proposed Solution
Rigorously prove that, with merging, the worst-case delay for a memory operation is bounded by some fixed Δ w.h.p.
Provide a pipelined memory abstraction in which an operation issued at time t completes at exactly t + Δ (instead of in the same cycle).
The reservation table, with C > Δ, is also used to implement the pipeline delay, as well as serving as a "cache".

19 Outline
Problem and Background
Proposed Design
→ Theoretical Analysis
Evaluation

20 Robustness
For a particular memory address, in any window of C cycles:
– at most one write operation enters a request buffer
– at most one read operation enters a request buffer
– at most one read operation followed by one write operation enters a request buffer

21 Theoretical Analysis
Worst-case analysis using convex ordering and large deviation theory
Prove: with a cache of size C, the best an attacker can do is to send repetitive requests every C + 1 cycles

22 Bound on Overflow Probability
Want to bound the probability that a request buffer overflows in n cycles:
P[overflow in n cycles] ≤ Σ_{0 ≤ s < t ≤ n} P[X_{s,t} ≥ μ(t − s) + K]
where X_{s,t} is the number of updates to a bank during cycles [s, t], μ is the bank's service rate, and K is the length of a request queue. For the total overflow probability bound, multiply by B (a union bound over the banks).

23 Chernoff Inequality
For any θ > 0: P[X_{s,t} ≥ μ(t − s) + K] ≤ E[e^{θX_{s,t}}] / e^{θ(μ(t − s) + K)}
Since this is true for all θ > 0, we can take the tightest such bound over θ.
We want to find the update sequence that maximizes E[e^{θX_{s,t}}].
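The Chernoff step can be written out as follows (a generic reconstruction from the slide's fragments; the paper's exact notation may differ):

```latex
% Markov's inequality applied to e^{\theta X_{s,t}}, for any \theta > 0:
\Pr\!\left[X_{s,t} \ge \mu(t-s) + K\right]
  = \Pr\!\left[e^{\theta X_{s,t}} \ge e^{\theta(\mu(t-s)+K)}\right]
  \le \frac{\mathbb{E}\!\left[e^{\theta X_{s,t}}\right]}{e^{\theta(\mu(t-s)+K)}}
% Since this holds for every \theta > 0, take the infimum over \theta;
% the adversary, in turn, picks the update sequence that maximizes
% \mathbb{E}[e^{\theta X_{s,t}}].
```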

24 Worst-Case Request Patterns
q₁ + q₂ + 1 requests for distinct counters:
– q₁ requests repeat 2^T times each
– q₂ requests repeat 2^(T−1) times each
– 1 request repeats r times

25 Outline Problem and Background Proposed Design Theoretical Analysis →Evaluation 25

26 Evaluation
Overflow probability for 16 million addresses, μ = 1/10, and B = 32.
SRAM 156 KB, CAM 24 KB

27 Evaluation
Overflow probability for 16 million addresses, μ = 1/10, and C = 8000.

28 Conclusion
Proposed a robust memory architecture that provides the throughput of SRAM with the density of DRAM.
Unlike conventional caching, which has unpredictable hit/miss performance, our design guarantees w.h.p. a pipelined memory abstraction that can accept a new memory operation every cycle with a fixed pipeline delay.
Used convex ordering and large deviation theory to rigorously prove robustness under adversarial accesses.

29 Thank You

