
1 Exascale Node Model
Following up the May 20th DMD discussion
Updated June 13th
Sébastien Rumley, Robert Hendry, Dave Resnick, Anthony Lentine

2 Exascale compute node architecture
- Node organized around a single computing chip
  - Interconnecting multiple sockets to inflate node computing power might be counterproductive: too much inefficient data movement between the sockets
  - By 2018, transistor scaling will allow many CPUs to be grouped on one chip: 100 CPUs, each capable of 100 Gflop/s, seems realistic
  - This translates into a 10 Tflop/s node, so roughly 100,000 nodes are required for an exaflop (similar to Sequoia; see the sizing sketch below)
- The computing chip integrates a massive NoC interconnecting the CPUs, the memory interface(s), the network interface(s), and a few other components
[Figure: computing chip containing CPUs and a NoC, with IOs to other nodes and a MEM interface]
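A quick back-of-envelope check of these numbers (a minimal Python sketch; the per-chip CPU count, per-CPU rate, and exaflop target are the slide's assumptions, not measurements):

```python
# Node and system sizing from the slide's assumptions.
CPUS_PER_CHIP = 100        # assumed CPUs per computing chip
GFLOPS_PER_CPU = 100       # assumed per-CPU rate, Gflop/s

node_tflops = CPUS_PER_CHIP * GFLOPS_PER_CPU / 1e3   # Gflop/s -> Tflop/s
print(f"Node peak: {node_tflops:.0f} Tflop/s")       # -> 10 Tflop/s

TARGET_TFLOPS = 1e6        # 1 Eflop/s expressed in Tflop/s
nodes_needed = TARGET_TFLOPS / node_tflops
print(f"Nodes required: {nodes_needed:,.0f}")        # -> 100,000 (Sequoia-like)
```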

3 Compute node "off-chip" architecture
- Desirable memory bandwidth: 0.5 byte per flop (can live with 0.25 byte per flop at first)
  - Memory bandwidth: 2.5 TB/s to 5 TB/s (20 Tb/s to 40 Tb/s); see the arithmetic sketch below
- Desirable memory capacity: 0.5 bytes/flop of DRAM, plus 2 to 5 bytes/flop of non-volatile (NV) memory
  - DRAM size: 5 TB, i.e. 320 HMCs of 16 GB (or 40 HMCs of 128 GB*)
  - NV size: 20 to 50 TB
- Memory channels likely to show high utilization (~80%), with only several bytes per transaction
- A cache sits between the computing chip and the memory system; its size depends on the memory parameters
[Figure: computing chip with cache ($), linked at 20 Tb/s to 40 Tb/s to a memory system of HMCs plus NV storage totalling 25 TB (5 + 20) to 55 TB (5 + 50)]
* 128 GB HMCs are a possible evolution of the technology
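The bandwidth and capacity targets above follow directly from the byte-per-flop ratios; a small sketch of the arithmetic (the ratios and HMC capacities are taken from the slide):

```python
# Off-chip memory sizing from byte-per-flop ratios (slide's targets).
NODE_TFLOPS = 10

for bw_ratio in (0.25, 0.5):                       # bytes per flop
    tb_per_s = bw_ratio * NODE_TFLOPS
    print(f"{bw_ratio} B/flop -> {tb_per_s:.1f} TB/s = {tb_per_s * 8:.0f} Tb/s")

dram_tb = 0.5 * NODE_TFLOPS                        # 0.5 bytes/flop -> 5 TB
for hmc_gb in (16, 128):                           # per-HMC capacity options
    count = dram_tb * 1024 / hmc_gb                # TB -> GB, then per device
    print(f"{dram_tb:.0f} TB DRAM needs {count:.0f} HMCs of {hmc_gb} GB")

nv_min, nv_max = 2 * NODE_TFLOPS, 5 * NODE_TFLOPS  # 2-5 bytes/flop NV
print(f"NV capacity: {nv_min}-{nv_max} TB (total 25-55 TB with DRAM)")
```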

4 HMC-based memory system
- HMCs can be chained, forming a network
- Does the computing chip / HMC link support 20 Tb/s?
  - If yes, the maximal RAM capacity is determined by the maximal chain depth
  - Currently a depth of 8 gives 128 GB (or 1 TB with bigger HMCs): insufficient! Multiple "access lanes" are therefore required (see the lane sketch below)
- Fan-out is limited by the pin count (for electrical links)
- Fan-out is also limited by the internal NoC and computing node architecture (too many interfaces consume power and occupy area)
[Figure: a deep HMC chain behind one computing chip port (maximum depth? the first segment carries the heaviest load) versus many HMCs attached directly to the computing chip (maximum fan-out?)]
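With chain depth capped at 8, the number of parallel access lanes needed to reach the 5 TB DRAM target follows from depth times per-HMC capacity; a sketch (the lane arithmetic is an illustration, not a design):

```python
import math

# Parallel HMC chains ("access lanes") needed to reach the DRAM target,
# given the maximum chain depth (slide: currently 8).
TARGET_TB = 5
MAX_DEPTH = 8

for hmc_gb in (16, 128):
    chain_gb = MAX_DEPTH * hmc_gb                   # capacity of one full chain
    lanes = math.ceil(TARGET_TB * 1024 / chain_gb)  # lanes to reach the target
    print(f"{hmc_gb} GB HMCs: one chain = {chain_gb} GB, "
          f"{lanes} lanes for {TARGET_TB} TB")      # 40 lanes, or 5 lanes
```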

5 Optical links between computing chip and HMCs
- Where to deploy optics? Four options, shown in the figure:
  - First segment only
  - First segments (with P2P links)
  - First segments (bus type)
  - All segments
- Links can be dedicated or hybrid
- The choice will generally depend on the cost of using multiple interfaces within the computing chip (2 to 5 is probably okay, but 20 to 30 is perhaps too much), on the traffic, and on the bandwidth. A nice space to explore… (a toy starting point below)
[Figure: four computing chip / HMC topologies, one per option above]
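As a starting point for that exploration, a toy model counting chip-side interfaces per placement option (the topology names come from the slide; the counting rules and the 40-lane figure are illustrative assumptions, not the authors' model):

```python
# Toy count of computing-chip interfaces for the four optical placements.
# Assumed counting rules: P2P variants need one chip interface per lane,
# a shared bus needs only one.
LANES = 40  # parallel HMC chains, from the 16 GB HMC case above

topologies = {
    "first segment only (P2P)": LANES,
    "first segments (P2P)": LANES,
    "first segments (bus type)": 1,     # chains share one optical bus
    "all segments (P2P)": LANES,        # optics continue down each chain
}

for name, interfaces in topologies.items():
    verdict = "probably okay" if interfaces <= 5 else "perhaps too much"
    print(f"{name}: {interfaces} chip interfaces ({verdict})")
```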

