Networked exascale Supercomputing ASREN and the HPC community can make it happen, Yves Poppe A*STAR Computational Resource Centre Singapore ASREN December.


1 Networked exascale supercomputing: ASREN and the HPC community can make it happen. Yves Poppe, A*STAR Computational Resource Centre, Singapore. ASREN, December 1-11 th 2014, Muscat, Oman

2 Please, maximize my effective throughput: connect HPC resources at Fusionopolis with the storage and genomics pipeline in the Biopolis Matrix Building

4 Tests started with the Mellanox MetroX in early 2013 and were followed up with trials using the Obsidian Strategics Longbow C400. Today the sites are connected by 2 x 40 Gbps links running native InfiniBand and reaching approximately 98.4% of the theoretical maximum throughput.
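
As a rough illustration of what the figures on this slide imply, the short sketch below turns the quoted 98.4% into an aggregate rate. It assumes the percentage is measured against the nominal 40 Gbps per link; the slide does not say whether the baseline is the signalling rate or the usable data rate, so the result is indicative only. The variable names are illustrative.

```python
# Figures quoted on the slide.
LINKS = 2                 # two parallel InfiniBand connections
LINE_RATE_GBPS = 40.0     # nominal rate per link (assumed baseline for the 98.4%)
EFFICIENCY = 0.984        # fraction of theoretical maximum reported

aggregate_gbps = LINKS * LINE_RATE_GBPS
effective_gbps = aggregate_gbps * EFFICIENCY
shortfall_gbps = aggregate_gbps - effective_gbps

print(f"aggregate nominal rate : {aggregate_gbps:.2f} Gbps")
print(f"effective throughput   : {effective_gbps:.2f} Gbps")
print(f"lost to overhead       : {shortfall_gbps:.2f} Gbps")
```

Under that assumption the two links together deliver a little under 79 Gbps out of a nominal 80 Gbps.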

5 The big picture: NSCC, Singapore’s National SuperComputer Centre
Joint A*STAR, NUS, NTU, SUTD and NRF project; RFS Q
NSCC
– Calls for a new 1-2+ PetaFLOP supercomputer
– Recurrent investment every 3 to 5 years
– Pooling of high-tier compute resources at A*STAR and IHLs
– Co-investment from primary stakeholders
Science, Technology and Research Network (STAR-N)
– High-bandwidth network to connect distributed compute resources
– Provides high-speed access to users, both public and private, anywhere
– Supports transfer of large data sets both locally and internationally

6 The quest to maximize effective throughput. TCP/IP’s curse: CPU overhead. Source: IBTA, the InfiniBand Trade Association

7 InfiniBand’s magic potion: RDMA. Source: IBTA, the InfiniBand Trade Association

8 The undeniable virtues of RDMA: 47% system CPU overhead and idle time in a TCP/IP environment versus 12% in an RDMA environment. In other words, 88% CPU efficiency in user space with RDMA versus 53% with TCP/IP. Source: Mellanox
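
The two pairs of percentages on this slide are complements of each other, which the small sketch below makes explicit. It assumes that "CPU efficiency" simply means the share of CPU time left after overhead and idle time; the relative gain it prints is derived here and is not a figure from the Mellanox source.

```python
# Percent of CPU time spent on system overhead and idle, as quoted on the slide.
overhead_pct = {"TCP/IP": 47, "RDMA": 12}

# Assumption: whatever is not overhead/idle is available to user-space work.
user_pct = {name: 100 - pct for name, pct in overhead_pct.items()}

# Derived ratio of user-space CPU share, RDMA relative to TCP/IP.
relative_gain = user_pct["RDMA"] / user_pct["TCP/IP"]

for name in overhead_pct:
    print(f"{name:6s} overhead={overhead_pct[name]}%  user space={user_pct[name]}%")
print(f"RDMA leaves about {relative_gain:.2f}x the user-space CPU share of TCP/IP")
```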

9 HPC’s road to InfiniBand
1999: Intel, IBM, Sun, HP, Microsoft, Compaq and Dell agree on the original InfiniBand standard to solve a looming PCI (Peripheral Component Interconnect) bottleneck.
2003: Virginia Tech builds an InfiniBand cluster ranked number three on the SC Top500 at the time. IB becomes increasingly popular for cluster interconnects as it beats Ethernet on both price and latency.
November 2014: 225 of the Top 500 use InfiniBand, up 8.7% YoY. The Ethernet camp tries to counter with RoCE (RDMA over Converged Ethernet) and now RoCEv2 for the data centre space.

10 HPC’s choice: InfiniBand link layer

11 The Ultimate InfiniBand Jailbreak
HPCs and InfiniBand were suffocating within the data centre walls.
Range extenders like the Mellanox MetroX gave InfiniBand, and consequently HPCs and the data centres themselves, more breathing room and ways to expand at metro level.
Obsidian Strategics took the final step: it took the data centre walls away completely. InfiniBand connections can cross continents and circle the globe.
The ultimate step: BGFC makes InfiniBand routable and opens the possibility of permeating the globe, giving rise to an Infininet.
Internet gave us classrooms without walls. Infininet will give us supercomputing without walls.

12 …proved itself in a spectacular way

13 TGN-IA and TGN-P 100 Gbps

14 Galaxy14 Network Topology SC14: 100 Gbps linking A*STAR in Singapore to the A*STAR booth at SC14 in New Orleans via Singaren, the Tata Communications transpacific cables TGN-IA and TGN-P to Seattle, Century Link to New Orleans and Scinet on the SC14 conference grounds.

15 A*STAR’s vision: Infinicortex, a Supercomputer of Supercomputers. Professor Tan Tin Wee and Dr. Marek Michalewicz proposed to demonstrate something totally new, never done before: very high speed transcontinental transmission of native long-distance InfiniBand between High Performance Computing (HPC) centres continents apart, having them operate as one, tackling the biggest computational challenges and opening a possible avenue to exascale supercomputing, where the most vexing problem is power and heat generation. This is not cloud computing, this is not Grid

16 The four elements that made this possible
Very high speed transmission as made possible by ACA100
– Asia connects America at 100 Gbps, a challenge issued by Yves Poppe, then at Tata Communications, at APAN 37 in Bandung, Indonesia.
InfiniBand over trans-Pacific distances
– Made possible with Obsidian Strategics InfiniBand range extenders.
Galaxy of Supercomputers
– Supercomputer interconnect topology and graph theory work by Y. Deng, M. Michalewicz and L. Orlowski.
– InfiniBand subnetting using the BGFC protocol and the new Obsidian Crossbow InfiniBand router.
Application layer
– File transfer optimization based on the development of Dsync+ for simple file transfers, all the way to complex workflows with ADIOS (Adaptable I/O System) developed by Dr. Scott Klasky and his team at Oak Ridge National Laboratory.

17 InfiniBand range extender and router. Longbow device and Crossbow device, developed by Obsidian Strategics, based in Edmonton, Canada. Crossbow plus Longbows give rise to Galaxy and open the door to an Infininet.

19 Galaxy of Supercomputers
Supercomputers located at different geolocations are connected into a Super-Graph or ‘Nodes of Super-Network’.
Supercomputers may have arbitrary interconnect topologies.
Galaxy is based on a topology with small diameter and the lowest possible number of links.
In terms of graph representation, it is an embedding of the graphs representing the supercomputers’ topologies into a graph representing the Galaxy topology.
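
The trade-off between a small diameter and a low link count can be made concrete with a small sketch. The code below is not the Galaxy construction itself, which the slides do not specify; it only computes the two metrics for three textbook topologies (ring, star, complete graph) over eight hypothetical sites, using breadth-first search for the diameter.

```python
from collections import deque

def diameter(adj):
    """Longest shortest-path length (in hops) over all node pairs, via BFS."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        worst = max(worst, max(dist.values()))
    return worst

def link_count(adj):
    """Number of undirected links (each appears in two adjacency sets)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def star(n):
    adj = {i: {0} for i in range(1, n)}   # leaves connect only to hub 0
    adj[0] = set(range(1, n))
    return adj

def complete(n):
    return {i: set(range(n)) - {i} for i in range(n)}

SITES = 8  # hypothetical number of supercomputer sites
topologies = {"ring": ring(SITES), "star": star(SITES), "complete": complete(SITES)}
summary = {name: (link_count(adj), diameter(adj)) for name, adj in topologies.items()}

for name, (links, diam) in summary.items():
    print(f"{name:9s} links={links:2d} diameter={diam}")
```

A complete graph minimizes the diameter at the cost of many long-haul links, while a star uses the fewest links but depends on a single hub; a topology of the kind described on the slide would sit between such extremes.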

21 Making the vision a reality
Testing within Singapore completed using dark fibre between two A*CRC sites, and also with the National University of Singapore using Singaren’s new SLIX over 80 km.
Convincing results led us to deploy two 40 Gbps InfiniBand connections between our Biopolis and Fusionopolis sites.
InfiniBand over Ethernet testing with Tokyo Institute of Technology’s Tsubame-KFC successfully completed using Singaren, APAN and JGN-X.
InfiniBand over IP testing completed with the NCI (National Computational Infrastructure) at the Australian National University in Canberra using existing Singaren, APAN and AARnet infrastructure.
10 Gbps dedicated link between Singapore and the USA for layer 2 ‘native’ InfiniBand testing with ORNL and others, starting at the end of October.
Rather spectacular results of the 100 Gbps trial and demos between Singapore and New Orleans at SC14.

22 Proving the point. Extract from the presentation prepared by Jakub Chrzeszczyk and Andrew Howard, NCI, Australia

27 Long Distance InfiniBand: a potential R&E networking game changer
So far, the global HPC needs have been presented at TERENA, APAN and GLIF, and they resonate with the visions of the global R&E networking community.
Adoption of native InfiniBand as a commonly used layer 2 transmission protocol would give NRENs a rare opportunity to regain the lead in innovation and clearly differentiate themselves from commercial networks.
The HPC community faces continuing exponential growth in the data it generates, and current NREN internetworking capacity is already insufficient for the needs of genomics data interchange alone.
To reach exascale computing, a distributed approach is probably required, if only to cope with power requirements and disaster recovery.

28 The HPC community’s call to ASREN
HPCs need NRENs and NRENs need HPCs
– The majority of global R&E traffic originates from the HPC community.
– Supercomputing is essential to economic development in all advanced industrial sectors, as well as to academic research and education.
– The HPC community is by far the most demanding constituency globally, as it relentlessly pushes the bandwidth and switching capacity envelopes on all scales, for the simple reason that the incredible ‘big data’ growth, with its associated hunger for computing power, storage, electrical power and cooling, will continue unabated.
– Reaching the exaflop scale in supercomputing will very likely require a distributed approach to be sustainable, and include the Infinicortex concept.
Let us lead the world in building an Infininet! Let us lead the world towards exascale computing.

29 Circle the globe at 100 Gbps with ACE-100? Prof. Tan Tin Wee, Chairman of A*STAR Computational Resource Centre, pointed out that with ANA-100 now a reality and ACA-100 coming, the only missing piece to circle the globe would be ACE-100: Asia connects Europe. I had a vision of bits racing around the world, 100,000,000,000 of them every second, 100 Gbps, as fast as light can travel through fibre, transmitting a continuous stream of copies of Jules Verne’s ‘Around the World in Eighty Days’. SC15: the Phileas Fogg challenge
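
To put rough numbers on that image, the sketch below estimates how long one lap of the globe takes for light in fibre and how many bits are in flight at 100 Gbps. It is a back-of-the-envelope calculation under stated assumptions: an idealized path equal to the equatorial circumference (real cable routes are longer), a fibre refractive index of about 1.47, and no equipment delay. None of these figures come from the slides except the 100 Gbps line rate and the eighty days.

```python
C_VACUUM_KM_S = 299_792.458       # speed of light in vacuum (exact by definition)
FIBRE_INDEX = 1.47                # assumed refractive index of the fibre core
EARTH_CIRCUMFERENCE_KM = 40_075   # equatorial circumference; idealized path length
LINE_RATE_BPS = 100e9             # 100 Gbps, as on the slide
EIGHTY_DAYS_S = 80 * 24 * 3600    # Phileas Fogg's deadline in seconds

# Light travels more slowly in glass than in vacuum.
speed_in_fibre_km_s = C_VACUUM_KM_S / FIBRE_INDEX

# One-way propagation time for a full lap of the idealized path.
lap_time_s = EARTH_CIRCUMFERENCE_KM / speed_in_fibre_km_s

# Bandwidth-delay product: bits "on the wire" during one lap.
bits_in_flight = LINE_RATE_BPS * lap_time_s

# How many laps a bit could complete in eighty days.
laps_in_eighty_days = EIGHTY_DAYS_S / lap_time_s

print(f"one lap          : {lap_time_s * 1000:.1f} ms")
print(f"bits in flight   : {bits_in_flight:.3g}")
print(f"laps in 80 days  : {laps_in_eighty_days:.3g}")
```

Under these assumptions a lap takes roughly 0.2 seconds, so about 2.5 GB of data is in flight around the globe at any instant.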

30 We hope to see you in Singapore at an international conference on supercomputing, exascale and beyond in Singapore and Asia, organised by A*STAR Computational Resource Centre (A*CRC), March 17-20, 2015, Singapore

31 Thank You. “Creativity requires the courage to let go of certainties.” Erich Fromm

