Cray Inc. Hot Interconnects 1 Bob Alverson, Duncan Roweth, Larry Kaplan Cray Inc.


1 Cray Inc. Hot Interconnects 1 Bob Alverson, Duncan Roweth, Larry Kaplan Cray Inc.

2 Overview: Network Interface; Router; Reliability, Availability, and Serviceability Features; Software Stack; Performance.

3 Integrated NIC and Router. External HSS Monitoring. Supports 2 Nodes per ASIC. Advanced Resiliency Features. Hardware Global Address Support. Advanced NIC designed to efficiently support MPI, one-sided MPI, SHMEM, UPC, and Coarray Fortran.

4 [Figure; axis labels X, Y, Z]

5 Fast Memory Access (FMA) – fine grain remote PUT/GET; Block Transfer Engine (BTE) – offload for long transfers; Completion Queue (CQ) – client notification; Atomic Memory Op (AMO) – fetch&add, etc.

6 Single-sided: processor stores become remote PUT or GET. FMA descriptors hold state to help determine destination node and memory location. FMA PUT for short messages: uncached processor store to Gemini window translated directly to network packet. FMA GET allows reverse direction data transfer of 1 to 64 bytes.

7 Driver managed. BTE PUT for long messages: DMA transfer to offload data movement from processor. BTE SEND for IP traffic, etc.: send message to remote node; single receive queue for all sources; upper level protocol covers lost messages. BTE GET support for simplified data transfers, in lieu of involving remote side for PUT.

8 Hardware remote atomic memory operations in the NIC: Add, Compare & Swap, Logical Operations; executed at the node with the memory. AMO cache for hot locations: up to 64 locations with AMOs in process. Global operations support: barriers, counters, collectives (reductions, global sum).
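The semantics of the atomic operations named on this slide can be illustrated with a small software model. This is only a sketch of what fetch-and-add and compare-and-swap return and store; it is not the NIC implementation. The class name `AtomicWord`, the 64-bit word width, and the use of a lock to stand in for hardware atomicity are all assumptions made for illustration.

```python
import threading


class AtomicWord:
    """Software model of one memory word that supports atomic operations.

    Hypothetical illustration: the 64-bit width and the lock are assumptions,
    standing in for atomicity that the hardware provides at the memory's node.
    """

    MASK = (1 << 64) - 1

    def __init__(self, value=0):
        self._value = value & self.MASK
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def fetch_add(self, delta):
        """Add delta to the word and return the value it held before."""
        with self._lock:
            old = self._value
            self._value = (old + delta) & self.MASK
            return old

    def compare_swap(self, expected, new):
        """Store new only if the word equals expected; always return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new & self.MASK
            return old


counter = AtomicWord(5)
before = counter.fetch_add(3)        # returns the pre-update value
swapped = counter.compare_swap(8, 1)  # succeeds because the word now holds 8
failed = counter.compare_swap(7, 9)   # fails; the word is left unchanged
```

Returning the old value from both operations is what lets a remote caller tell whether a compare-and-swap took effect without a second round trip.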

9 6x8 tile matrix: input queue to one of 6 subswitches; route to one of 8 output buffers. Hashed routing preserves order to cachelines. Adaptive routing.
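One way to see why hashed routing can preserve order to a cacheline: if the path is chosen as a function of the cacheline address alone, every packet aimed at the same cacheline takes the same path and so cannot overtake its predecessors. The sketch below assumes a 64-byte cacheline and uses a generic multiplicative hash; the slides do not describe the actual hardware hash, so `select_path` and its constants are hypothetical.

```python
CACHELINE_BYTES = 64  # assumption: cacheline size is not stated on the slide


def select_path(address, num_paths):
    """Pick a path index from the cacheline number of the target address.

    Hypothetical hash (64-bit multiplicative); the real Gemini hash is not
    given in the source. The key property is that it depends only on the
    cacheline, so all traffic to one cacheline shares one path.
    """
    line = address // CACHELINE_BYTES
    hashed = (line * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    return (hashed >> 32) % num_paths


# Two addresses inside the same cacheline map to the same path.
same_line = select_path(0x1000, 8) == select_path(0x103F, 8)
```

Different cachelines may land on different paths, which spreads load while keeping per-cacheline ordering.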

10 Route around stalled or down links. If a link goes down, adaptive routing mask updated in hardware to exclude it. OS traffic uses adaptive routing only, recovers from finite loss of packets. Quiesce and re-route to repair deterministic routes. Congestion feedback to allow routing around bottlenecks: potential for improved performance on difficult traffic patterns such as transpose. Packets reordered in receive buffer (DRAM): separate notification (completion event) when all stored.
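The mask-based exclusion and congestion feedback described here can be sketched as follows. This is a simplified model under stated assumptions: the mask is treated as one bit per candidate link, and "congestion" is modeled as a per-link queue depth. Both function names and the least-loaded selection policy are hypothetical, not taken from the Gemini design.

```python
def mark_link_down(mask, link):
    """Clear the bit for a failed link so adaptive routing stops using it."""
    return mask & ~(1 << link)


def choose_link(mask, queue_depth):
    """Pick the least-congested link among those still enabled in the mask.

    queue_depth[i] is a hypothetical congestion estimate for link i.
    """
    candidates = [i for i in range(len(queue_depth)) if (mask >> i) & 1]
    if not candidates:
        raise RuntimeError("no usable links remain in the adaptive mask")
    return min(candidates, key=lambda i: queue_depth[i])


mask = 0b1111                # four candidate links, all up
depth = [3, 0, 2, 5]         # link 1 is currently the least loaded
first_choice = choose_link(mask, depth)
mask = mark_link_down(mask, 1)
second_choice = choose_link(mask, depth)  # traffic shifts to the next best link
```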

11 24-bit flit. Maximum size packet is 7+24+1 = 32 flits (Put request of 64 bytes). Minimum is 2 flits (Put response).
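The flit counts on this slide translate directly into bytes on the wire. The short calculation below uses only the slide's numbers (24-bit flits, 32-flit maximum packet, 64-byte Put payload, 2-flit minimum packet); the function names are illustrative, and the efficiency figure is a back-of-envelope ratio that ignores any link-level encoding overhead.

```python
FLIT_BITS = 24
MAX_PACKET_FLITS = 7 + 24 + 1   # as given on the slide
MIN_PACKET_FLITS = 2
PUT_PAYLOAD_BYTES = 64


def packet_bytes(flits):
    """Total bytes occupied by a packet of the given flit count."""
    return flits * FLIT_BITS // 8


def payload_efficiency(payload_bytes, flits):
    """Fraction of the packet's bytes that are user payload."""
    return payload_bytes / packet_bytes(flits)


max_bytes = packet_bytes(MAX_PACKET_FLITS)     # 96 bytes for a 64-byte Put
min_bytes = packet_bytes(MIN_PACKET_FLITS)     # 6 bytes for a Put response
efficiency = payload_efficiency(PUT_PAYLOAD_BYTES, MAX_PACKET_FLITS)  # 2/3
```

So a 64-byte Put request occupies 96 bytes of flits, about two thirds of which is payload.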

12 Automatic link-level retries. HT3 support including automatic retries and improved CRC. Most internal data structures are at least parity protected: the longer the occupancy of data at a location, the stronger the protection. Errors reported as precisely as possible: payload errors reported directly to user; control errors often cannot be associated with a particular transaction; in all cases OS or HSS can be notified of the error. Router errors included: reported at the point of error; endpoint(s) (user) see a timeout.

13 [Software stack diagram; labels:] MPICH; MPICH2; SHMEM; PGAS; DMAPP; User level Gemini Network Interface (uGNI); Direct Access; IOCTL or System Call; GNI Core; Kernel level GNI (kGNI); Lustre Network Driver (LND); IP over Gemini Fabric (IPoGIF); Linux Core; GART Resource Management (GRM); Cray COW solution; MRT-size page support; Registration Cache support; Gemini Hardware Abstraction Layer (GHAL).

14 Latency. Bandwidth. Atomic operations.

15 Gemini expanded to HT3 at up to 5.2 GT/s: expect to sustain greater than 6 GB/s user data injection. Network bandwidth is limited by XT packaging: link speed from 3.125 to 6.25 Gbit/sec; in some cases, double wide X & Z links also offer increased bandwidth. Gemini relies on user level threads: MPI processing limits throughput to 2M messages/sec per thread; scales beyond 10M msg/sec per NIC.
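Two back-of-envelope figures follow from the rates on this slide: how many threads are needed before the per-NIC message rate rather than per-thread MPI processing becomes the limit, and the message size at which injection bandwidth rather than message rate becomes the limit. The sketch treats the slide's bounds ("greater than 6 GB/s", "beyond 10M msg/sec") as nominal values, which is an assumption; the function names are illustrative.

```python
import math

PER_THREAD_MSG_RATE = 2_000_000        # messages/sec per thread (slide)
PER_NIC_MSG_RATE = 10_000_000          # messages/sec per NIC, nominal (slide says "beyond")
INJECTION_BW_BYTES = 6_000_000_000     # bytes/sec, nominal (slide says "greater than")


def threads_to_reach(nic_rate, thread_rate):
    """Threads needed for the aggregate per-thread rate to reach the NIC rate."""
    return math.ceil(nic_rate / thread_rate)


def crossover_message_bytes(bandwidth, msg_rate):
    """Message size above which bandwidth, not message rate, is the limit."""
    return bandwidth / msg_rate


threads = threads_to_reach(PER_NIC_MSG_RATE, PER_THREAD_MSG_RATE)            # 5
crossover = crossover_message_bytes(INJECTION_BW_BYTES, PER_NIC_MSG_RATE)    # 600.0 bytes
```

Under these nominal numbers, about five threads are needed to approach the NIC message rate, and messages larger than roughly 600 bytes would be bandwidth-bound.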

16 One way PUT in 750 ns. Waiting for Ack in only 1.1 us. Remote GET increases to 1.4 us.

17 Peak bandwidth reached with small transfers. Multiple threads reach peak with still smaller transfers.

18 Hot location reaches 100 Mupdates/sec. Random locations (GUPS) still over 45 Mupdates/sec.

19 Gemini provides low latency and performance for fine grain operations. Gemini has features to scale in performance and reliability to large system size. Questions?

