1 Virtually Pipelined Network Memory
Banit Agrawal and Tim Sherwood, UC Santa Barbara

2 Memory Design is Hard
Increasing functionality, increasing size of data structures, increasing line rates. Throughput must hold in the worst case: the system needs to service traffic at the advertised rate.
IPv4 routing table size:       100k → 200k → 360k
Packet classification rules:   2000 → 5000 → 10000
Line rate:                     10 Gbps → 40 Gbps → 160 Gbps

3 What do programmers think?
Network programmers expect memory (DRAM) to offer low cost, low power, high capacity, and high bandwidth. In practice, DRAM delivers high bandwidth only for some access patterns. What is the problem?

4 DRAM Bank Conflicts
Bank conflicts cause variable latency and variable throughput. (Figure: four DRAM banks, each with its own row decoder, sense amplifiers, and column decoder, sharing one address bus and one data bus; two banks are busy.) Bank interleaving hides the DRAM macro latency only when accesses spread across banks; accesses that conflict on a busy bank reduce efficiency. Worst case: every access conflicts.
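To make the variability concrete, here is a minimal C++ sketch (mine, not from the talk) of a worst-case trace in which every address maps to the same bank; the bank count, the 15-cycle busy time, and the modulo interleaving are all illustrative assumptions:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy model: each bank is busy for LATENCY cycles after an access, so an
// access to a busy bank must wait, and observed latency becomes variable.
constexpr int NUM_BANKS = 4;
constexpr int LATENCY   = 15;  // cycles a bank stays busy (assumed)

int main() {
    std::vector<int64_t> bank_free(NUM_BANKS, 0);  // cycle each bank frees up
    // Worst case: every address maps to bank 0 (addr % NUM_BANKS == 0).
    const uint64_t addrs[] = {0, 4, 8, 12, 16};
    int64_t cycle = 0;
    for (uint64_t a : addrs) {
        int bank = static_cast<int>(a % NUM_BANKS);  // simple interleaving
        int64_t start = std::max(cycle, bank_free[bank]);  // wait while busy
        bank_free[bank] = start + LATENCY;
        printf("addr %2llu -> bank %d: issued %lld, data ready %lld\n",
               (unsigned long long)a, bank, (long long)start,
               (long long)(start + LATENCY));
        ++cycle;  // one new request arrives per cycle
    }
    return 0;
}
```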

5 Prior Work
Reducing bank conflicts in common access patterns:
- Prefetching and memory-aware layout [Lin-HPCA'01, Mathew-HPCA'00]
- Reordering of requests [Hong-HPCA'99, Rixner-ISCA'00]
- Vector processing domain [Espasa-Micro'97]
- Good for desktop computing, but no guarantees for the worst case
Reducing bank conflicts for special access patterns:
- Packet buffering: data is written once and read once
- Low bank conflicts: optimizations including row locality and scheduling [Hasan-ISCA'03, Nikologiannis-ICC'01]
- No bank conflicts: reordering and clever memory-management algorithms [Garcia-Micro'03, Iyer-StanTechReport'02]
- Not applicable to arbitrary access patterns

6 Where do network systems stand?
(Spectrum: at one extreme, systems that require full determinism, i.e. 0% deadline failures; at the other, best-effort co-operative systems built from common-case-optimized parts. Network systems sit in between: they need no exploitable deadline failures.)

7 Virtually Pipelined Memory
Normalize the overall latency using randomization and buffering: every access sees a deterministic latency, entering the memory controller at time t and returning data at time t + D. The result is trillions of accesses without a bank-conflict stall, even under adversarial access patterns.
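A minimal C++ sketch of the normalization idea, assuming the L = 15 and D = 30 values used in the later timing slides: the controller releases data exactly D cycles after each request, even when the bank finished earlier.

```cpp
#include <cstdio>
#include <queue>

// Each response is held until exactly D cycles after its request, so
// latency looks constant regardless of when the bank actually finished.
constexpr int D = 30;  // normalized delay (taken from later slides)
constexpr int L = 15;  // raw bank latency (taken from later slides)

struct Request { int issue_time; char tag; };

int main() {
    std::queue<Request> pending;
    // Requests A, B, C at cycles 0, 1, 2, each hitting an idle bank.
    pending.push({0, 'A'});
    pending.push({1, 'B'});
    pending.push({2, 'C'});
    while (!pending.empty()) {
        Request r = pending.front(); pending.pop();
        int bank_done = r.issue_time + L;  // bank finishes early...
        int released  = r.issue_time + D;  // ...but data leaves at t + D
        printf("%c: bank done at cycle %d, data released at cycle %d\n",
               r.tag, bank_done, released);
    }
    return 0;
}
```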

8 Outline
- Memory for networking systems
- Memory controller
- Design analysis
- Hardware design
- How we compare
- Conclusion

9 Memory Controller
(Figure: a keyed universal hash HU maps each incoming address to a bank and row, e.g. 5 → 2,A; 6 → 0,F; 7 → 2,B; 8 → 3,A. Each DRAM bank has its own bank controller, and a bus scheduler arbitrates the shared address and data buses. A request arriving at time t returns its data at time t + D.)
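The slides do not spell out the hash itself; below is a sketch of one plausible choice, a multiply-shift universal-style hash that maps addresses to (bank, row) pairs. The multiplier constant and the row computation are assumptions, not the paper's design.

```cpp
#include <cstdint>
#include <cstdio>

// One plausible address -> (bank, row) mapping: a keyed multiply-shift
// hash picks the bank, so fixed access patterns spread pseudo-randomly
// across banks. The constant below is an assumption for illustration.
constexpr int NUM_BANKS = 4;
constexpr uint64_t KEY = 0x9E3779B97F4A7C15ULL;  // odd multiplier (assumed)

struct Mapping { int bank; uint64_t row; };

Mapping map_address(uint64_t addr) {
    uint64_t h = addr * KEY;                      // multiply-shift hash
    int bank = static_cast<int>(h >> 62);         // top bits pick the bank
    uint64_t row = addr / NUM_BANKS;              // row index within bank
    return {bank, row};
}

int main() {
    for (uint64_t addr = 5; addr <= 8; ++addr) {
        Mapping m = map_address(addr);
        printf("address %llu -> bank %d, row %llu\n",
               (unsigned long long)addr, m.bank, (unsigned long long)m.row);
    }
    return 0;
}
```

Because the hash is keyed, an adversary cannot construct an address sequence that repeatedly collides in one bank, which is what makes the worst case "impossible to exploit" later in the talk.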

10 Non-conflicting Accesses
Bank latency (L) = 15 cycles; normalized delay (D) = 30 cycles. (Timing diagram: requests A, B, C go to different banks; each returns its data exactly D cycles after it was issued, so all deadlines are met.)

11 Redundant Accesses
Bank latency (L) = 15 cycles; normalized delay (D) = 30 cycles. (Timing diagram: repeated requests A, B, A, A, B; conflicting requests to the same address are merged, so each still returns its data exactly D cycles after it was issued.)

12 Conflicting Accesses
Bank latency (L) = 15 cycles; normalized delay (D) = 30 cycles. (Timing diagram: requests A, B, C, D, E all conflict on the same bank; the bank completes only one access per L cycles, falls behind the arrival rate, and a stall occurs at E.)
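The two conflicting cases can be replayed with a small C++ sketch (my own simplification; the 10-cycle request spacing is read off the diagrams' tick marks, and a never-evicting reuse rule stands in for the delay storage buffer):

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

constexpr int L = 15;  // bank latency (from the slides)
constexpr int D = 30;  // normalized delay (from the slides)

// Replay a trace of (cycle, address) requests against one single-ported
// bank and report whether each request meets its t + D deadline.
// Redundant requests reuse already-fetched data (simplified: no eviction).
void replay(const char* name, const std::vector<std::pair<int, char>>& trace) {
    std::map<char, int> fetched;  // address -> cycle its data became ready
    int bank_free = 0;
    printf("%s\n", name);
    for (auto [t, addr] : trace) {
        int ready;
        if (fetched.count(addr)) {
            ready = fetched[addr];               // merged: no new DRAM access
        } else {
            int start = std::max(t, bank_free);  // wait while the bank is busy
            bank_free = ready = start + L;
            fetched[addr] = ready;
        }
        printf("  %c at t=%2d: data at %2d, deadline %2d -> %s\n",
               addr, t, ready, t + D, ready <= t + D ? "ok" : "STALL");
    }
}

int main() {
    replay("Redundant (A B A A B):",
           {{0, 'A'}, {10, 'B'}, {20, 'A'}, {30, 'A'}, {40, 'B'}});
    replay("Conflicting (A B C D E):",
           {{0, 'A'}, {10, 'B'}, {20, 'C'}, {30, 'D'}, {40, 'E'}});
    return 0;
}
```

Under these assumptions the redundant trace meets every deadline, while in the conflicting trace request E's data is ready at cycle 75, past its cycle-70 deadline, matching the stall the slide shows.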

13 Implementing Virtual Pipelined Banks
(Figure: per-bank controller datapath. A two-set delay storage buffer holds, per entry, a valid bit, address, an incr/decr reference counter, and data words, with a first-zero selector for allocation. A bank access queue holds scheduled accesses: r/w flag, row id, scheduled-access address and data. A write buffer (FIFO) and a circular delay buffer with in/out pointers over entries access[t-d] .. access[t] feed the control logic, which connects the interface address and data buses to the memory.)
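As a sketch of just the circular delay buffer (entry contents and sizing are assumptions): a ring of D slots indexed by cycle mod D releases whatever was enqueued exactly D cycles earlier, giving the constant-delay behavior for free.

```cpp
#include <array>
#include <cstdio>
#include <optional>

constexpr int D = 30;  // normalized delay in cycles (from the slides)

// Circular delay buffer: the slot written at cycle t is read again at
// cycle t + D, so every enqueued item is released at a constant delay.
struct CircularDelayBuffer {
    std::array<std::optional<char>, D> slots{};  // one tag per cycle slot

    void tick(int cycle, std::optional<char> incoming) {
        int idx = cycle % D;
        if (slots[idx])  // written D cycles ago: due for release now
            printf("cycle %2d: release %c (requested at cycle %d)\n",
                   cycle, *slots[idx], cycle - D);
        slots[idx] = incoming;  // schedule release at cycle + D
    }
};

int main() {
    CircularDelayBuffer buf;
    for (int cycle = 0; cycle < 2 * D; ++cycle) {
        std::optional<char> req;
        if (cycle == 0) req = 'A';
        if (cycle == 1) req = 'B';
        buf.tick(cycle, req);
    }
    return 0;
}
```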

15 Delay Storage Buffer Stall
Mean time to stall (MTS): with B banks, 1/B is the probability that a given request hits a particular bank. A stall happens when more than K accesses to the same bank fall within one normalized-delay interval D. Illustration parameters: normalized latency D cycles; number of entries in the delay storage buffer K = 3.

16 Delay Storage Buffer Stall
(Timing diagram: requests A through F; the first K conflicting requests within a window of D cycles are absorbed by the buffer, and the next one stalls.) Expected mean time to stall:
$$\mathrm{MTS} = \frac{\log(1/2)}{\log\!\left(1 - \binom{D-1}{K-1}\left(\frac{1}{B}\right)^{K-1}\right)} + D$$
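Plugging illustrative numbers into the formula (the B and K values below are assumptions for the sake of the sketch, not the paper's design point):

```cpp
#include <cmath>
#include <cstdio>

// Evaluate MTS = log(1/2) / log(1 - C(D-1, K-1) * (1/B)^(K-1)) + D.
double binom(int n, int k) {
    double r = 1.0;
    for (int i = 1; i <= k; ++i) r *= static_cast<double>(n - k + i) / i;
    return r;
}

int main() {
    const int B = 64;  // number of banks (assumed)
    const int D = 30;  // normalized delay in cycles (from the slides)
    const int K = 8;   // delay storage buffer entries (assumed)
    double p = binom(D - 1, K - 1) * std::pow(1.0 / B, K - 1);
    double mts = std::log(0.5) / std::log(1.0 - p) + D;
    printf("per-cycle stall probability ~ %g\n", p);
    printf("mean time to stall ~ %.0f cycles\n", mts);
    return 0;
}
```

As the formula suggests, MTS grows very quickly with K (each extra buffer entry multiplies in another 1/B factor), which is why modest buffers suffice for enormous stall-free runs.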

17 Markovian Analysis
State-based analysis of bank access queue stalls. With B banks, 1/B is the probability that a given cycle's access goes to a particular bank. The state tracks how many cycles of work the bank has queued; if more than D cycles of work accumulate, a stall occurs. Example: bank access latency L = 3, normalized delay D = 6. (Figure: a Markov chain over states 1..6 of pending work, plus idle and stall states; an arrival, with probability 1/B, adds work, otherwise, with probability 1 - 1/B, work drains.) MTS is the time at which the probability of being in the stall state reaches 0.5.

18 Markovian Analysis
$P = I \cdot M^n$: starting from the initial state distribution I and iterating the transition matrix M, find the smallest n such that the stall-state probability in P reaches 50%; that n is the MTS.
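A small C++ sketch of that computation for the L = 3, D = 6 example. The exact transition structure is my reconstruction of the chain the slide draws (one unit of work drains per cycle; an arrival adds L), and B is assumed:

```cpp
#include <cstdio>
#include <vector>

// Markov-chain MTS: states 0..D are cycles of pending bank work, state
// D+1 is an absorbing stall state. Each cycle one unit of work drains;
// with probability 1/B a new access adds L more cycles of work.
int main() {
    const int B = 32, L = 3, D = 6;  // B assumed; L and D from the slide
    const int S = D + 2;             // states 0..D plus the stall state
    std::vector<std::vector<double>> M(S, std::vector<double>(S, 0.0));
    for (int s = 0; s <= D; ++s) {
        int drain = (s > 0) ? s - 1 : 0;
        int burst = drain + L;
        M[s][drain] += 1.0 - 1.0 / B;                 // no arrival this cycle
        M[s][burst > D ? D + 1 : burst] += 1.0 / B;   // arrival adds L work
    }
    M[D + 1][D + 1] = 1.0;           // stall is absorbing

    std::vector<double> P(S, 0.0);
    P[0] = 1.0;                      // initial distribution I: start idle
    long n = 0;
    while (P[D + 1] < 0.5) {         // iterate P <- P * M until P(stall) >= 0.5
        std::vector<double> next(S, 0.0);
        for (int i = 0; i < S; ++i)
            for (int j = 0; j < S; ++j)
                next[j] += P[i] * M[i][j];
        P = next;
        ++n;
    }
    printf("MTS ~ %ld cycles (stall probability reached 0.5)\n", n);
    return 0;
}
```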
Banit Agrawal /14/2018

19 Hardware Design and Overhead
Verilog implementation: verified with ModelSim and a C++ simulation model, and synthesized with Synopsys Design Compiler. A hardware-overhead tool built on Cacti parameters, validated against the synthesized design, explores optimal design parameters:
- 45.7 seconds MTS with an area overhead of 34.1 mm² at 77% efficiency
- 10 hours MTS with an area overhead of 34 mm² at 71.4% efficiency

20 How does VPNM perform?
Packet buffering: only the head and tail pointers of each queue need to be stored, so VPNM can support an arbitrarily large number of logical queues.
Packet reassembly:

Scheme         Line rate (Gbps)   Area (mm²)   Total delay (ns)   Supported interfaces
RADS [17]      40                 10           53                 130
CFDS [12]      160                60           10000              850
Our approach   160                41.9         960                4096

35% less area, 10x less latency, 5x more queues.

21 Conclusion
VPNM provides:
- Deterministic latency, through randomization and latency normalization
- Higher throughput, with a worst case that is impossible to exploit
- Handling of any access pattern
- Ease of programmability and mapping: packet buffering, packet reassembly
(Figure: a request enters the memory controller at time t; the DRAM returns its data at t + D.)

22 Thanks for your attention.
Questions?

