Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P
Tianhe-2 (MilkyWay-2) – cont.
Site: National Supercomputer Center in Guangzhou
Manufacturer: NUDT
Cores: 3,120,000
Linpack Performance (Rmax): 33,862.7 TFlop/s
Theoretical Peak (Rpeak): 54,902.4 TFlop/s
Power: 17,808 kW
Memory: 1,024,000 GB
Interconnect: TH Express-2
Operating System: Kylin Linux
Compiler: icc
Math Library: Intel MKL
MPI: MPICH2 with a customized GLEX channel
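A quick figure derived from the spec sheet above: Linpack efficiency, i.e. Rmax as a fraction of Rpeak. A minimal sketch using only the numbers on this slide:

```python
# Linpack efficiency implied by the spec sheet: how much of the theoretical
# peak the machine actually sustained on the Linpack benchmark.

rmax_tflops = 33_862.7   # measured Linpack performance
rpeak_tflops = 54_902.4  # theoretical peak

efficiency = rmax_tflops / rpeak_tflops
print(f"Linpack efficiency: {efficiency:.1%}")  # Linpack efficiency: 61.7%
```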
Tianhe-2 (MilkyWay-2) – cont.

List     Rank  System                                                                                   Vendor  Total Cores  Rmax (TFlops)  Rpeak (TFlops)  Power (kW)
11/2013  1     TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P  NUDT    3,120,000    33,862.7       54,902.4        17,808
06/2013  1     TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P  NUDT    3,120,000    33,862.7       54,902.4        17,808
Xeon Phi/Many Integrated Core (MIC) hardware
From Larrabee to Knights Ferry
*http://www.extremetech.com/extreme/intels-64-core-champion-in-depth-on-xeon-phi
The Knights Ferry die, Aubrey Isle. Die size on KNF, at 45nm, was rumored to be roughly 700mm². Xeon Phi ramps up from Knights Ferry; Intel isn't giving many details yet, but we know the architecture will pack 50 or more cores and at least 8GB of RAM. In this space, total available memory is an important feature. Knights Ferry, with its 32 cores and a max of 2GB of RAM, could only offer 64MB of RAM per core; a 50-core Xeon Phi with 8-16GB of RAM would offer between 160MB and 320MB per core.
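The per-core memory figures in this paragraph are simple division; a small sketch (using binary megabytes, so the 8-16GB numbers come out slightly above the article's rounded 160-320MB range):

```python
# Memory-per-core comparison: Knights Ferry vs. a hypothetical 50-core
# Xeon Phi, using the core counts and RAM sizes from the paragraph above.

def mb_per_core(ram_gb, cores):
    """RAM per core in (binary) MB."""
    return ram_gb * 1024 / cores

knf    = mb_per_core(2, 32)   # Knights Ferry: 2GB across 32 cores
phi_lo = mb_per_core(8, 50)   # 50-core Xeon Phi with 8GB
phi_hi = mb_per_core(16, 50)  # 50-core Xeon Phi with 16GB

print(f"KNF:      {knf:.0f} MB/core")                  # KNF:      64 MB/core
print(f"Xeon Phi: {phi_lo:.0f}-{phi_hi:.0f} MB/core")  # Xeon Phi: 164-328 MB/core
```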
Beyond the glut of x86 compute capacity, though, Tianhe-2 is notable for another reason: Except for the CPUs, almost all of the other components were made in China. The front-end system, which manages the actual operation of all the compute nodes, consists of Galaxy FT processors — 16-core SPARC chips designed and built by China's National University of Defense Technology (NUDT). The interconnect (pictured below), also designed and constructed by NUDT, consists of optoelectronic switches that connect each of the compute nodes via a fat tree topology. The operating system, Kylin Linux, was also developed by NUDT.
Tianhe-2 is currently located at the NUDT while it undergoes testing, but will be fully deployed at the National Supercomputer Center in Guangzhou (NSCC-GZ) by the end of 2013. The peak power consumption for the processors, memory, and interconnect is 17.6 megawatts, with the water cooling system bringing that up to 24MW, which puts it slightly below the gigaflops-per-watt efficiency record set by the DoE/ORNL/Cray Titan supercomputer. When Tianhe-2 is complete, its primary purpose will be as an open research platform for researchers in southern China.
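The efficiency comparison above can be checked with simple division. Tianhe-2's Rmax and power draw come from these slides; Titan's Rmax (17,590 TFlop/s) and power (8,209 kW) are the published TOP500 figures, included here only for comparison:

```python
# Gigaflops-per-watt comparison between Tianhe-2 and Titan.

def gflops_per_watt(rmax_tflops, power_kw):
    # TFlop/s -> GFlop/s is x1000 and kW -> W is x1000, so the factors cancel.
    return rmax_tflops / power_kw

tianhe2 = gflops_per_watt(33_862.7, 17_808)  # processors, memory, interconnect
titan   = gflops_per_watt(17_590.0, 8_209)

print(f"Tianhe-2: {tianhe2:.2f} GFlops/W")  # Tianhe-2: 1.90 GFlops/W
print(f"Titan:    {titan:.2f} GFlops/W")    # Titan:    2.14 GFlops/W
```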
With Tianhe-2, two Arch-2 network interface chips and two "Ivy Bridge-EP" Xeon E5 compute nodes (each with two processor sockets) are on a single circuit board (even though they are logically distinct). One compute node plus one Xeon Phi coprocessor share the left half of the board, and five Xeon Phis share the right side. The two sides can be electrically separated and pulled out separately for maintenance. The Arch-2 NICs link to the Xeon E5 chipset through PCI-Express 2.0 ports on the NIC, which is unfortunate given the doubling of bandwidth with the move to PCI-Express 3.0 slots. (Maybe that is coming with the Arch-3 interconnect, if there is one on the whiteboard at NUDT?) There's one Arch-2 NIC per compute node; the three Xeon Phi coprocessors for each node link over three PCI-Express 3.0 x16 ports to the CPUs. Yup, the Xeon Phis can talk faster to the CPU than the CPU can talk to the Arch-2 interface. It is unknown how this imbalance might affect the performance of Tianhe-2. *
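The bandwidth imbalance described above follows from the per-lane rates of the two PCIe generations. The Arch-2 NIC's lane count is not given in the text, so x16 is assumed here purely for illustration:

```python
# Per-direction usable bandwidth of the PCIe links described above.
# PCIe 2.0 runs at 5 GT/s with 8b/10b encoding; PCIe 3.0 runs at 8 GT/s
# with the much leaner 128b/130b encoding.

def pcie_bw_gbps(gt_per_s, payload_bits, total_bits, lanes):
    """Usable bandwidth in GB/s, one direction."""
    return gt_per_s * (payload_bits / total_bits) * lanes / 8

nic_link = pcie_bw_gbps(5, 8, 10, 16)     # Arch-2 NIC: PCIe 2.0, x16 assumed
phi_link = pcie_bw_gbps(8, 128, 130, 16)  # Xeon Phi:   PCIe 3.0 x16

print(f"PCIe 2.0 x16: {nic_link:.2f} GB/s")  # PCIe 2.0 x16: 8.00 GB/s
print(f"PCIe 3.0 x16: {phi_link:.2f} GB/s")  # PCIe 3.0 x16: 15.75 GB/s
```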
The RSW switch blade for Tianhe-2
One set of RSW switches is rotated 90 degrees in parts of the system for reasons that don't make sense to me – yet. But here is how the components plug together:
How the compute nodes, switch, and backplane come together in Tianhe-2
That special version of the Xeon Phi that NUDT is getting is called the 31S1P, and the P is short for passive cooling. Dongarra said that this Xeon Phi card had 57 cores and was rated at 1.003 teraflops at double-precision floating point, and that is precisely the same feeds and speeds as the 3120A launched back in November with active cooling (meaning a fan). That 3120A had only 6GB of GDDR5 graphics memory, and the 31S1P that NUDT is getting has 8GB like the Xeon Phi 5110P card, which has 60 cores activated, which runs at a slightly slower clock speed, and which burns less juice and creates less heat. It is also 33 per cent more expensive at $2,649 in single-unit quantities. Anyway, with 48,000 of these bad boys, the Xeon Phi part of Tianhe-2 has 2.74 million cores and delivers a peak DP performance of 48.14 petaflops. Add 'em up, and you get 54.9 petaflops peak.
*http://www.theregister.co.uk/Print/2013/06/10/inside_chinas_tianhe2_massive_hybrid_supercomputer/
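A sanity check of the totals quoted above, assuming each 31S1P matches the 3120A's published 1.003 TFlop/s double-precision rating:

```python
# Xeon Phi contribution to Tianhe-2: 48,000 cards, 57 cores each,
# ~1.003 TFlop/s DP per card.

cards = 48_000
cores_per_card = 57
tflops_per_card = 1.003

total_cores = cards * cores_per_card
peak_pflops = cards * tflops_per_card / 1000

print(f"{total_cores:,} Xeon Phi cores")    # 2,736,000 Xeon Phi cores
print(f"{peak_pflops:.2f} petaflops peak")  # 48.14 petaflops peak
```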
The Tianhe-2, like its predecessor, is also front-ended by a homegrown Sparc-based cluster. NUDT has created its own variant of the Sparc chip, called the Galaxy FT-1500, which has sixteen cores, runs at 1.8GHz, is etched in a 40 nanometer process, burns around 65 watts, and delivers around 144 gigaflops of double-precision performance. The front-end processor for the Tianhe-2 machine has 4,096 of these processors in its nodes, which gives another 590 teraflops.
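The front-end peak quoted above is again simple multiplication:

```python
# Front-end cluster peak: 4,096 Galaxy FT-1500 chips at ~144 GFlop/s
# double precision each, per the paragraph above.

chips = 4_096
gflops_per_chip = 144

peak_tflops = chips * gflops_per_chip / 1000
print(f"{peak_tflops:.0f} teraflops")  # 590 teraflops
```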
The TH Express-2 Arch interconnect created by NUDT
The most interesting part of the Tianhe-2 system is probably the Arch interconnect, which is also known as TH Express-2. The heart of the Arch interconnect is a high-radix router, just like Cray's "Gemini" and "Aries" interconnects (the latter now owned by Intel), and like Aries it also uses a combination of electrical cables for short-haul jumps and optical cables for long-haul jumps. And like InfiniBand networks, Arch uses a fat tree topology, which is why many believe that Arch is, in fact, a goosed-up version of InfiniBand, but NUDT is claiming it is a proprietary protocol and frankly, we are in no position to argue.
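As a generic illustration of why fat trees pair well with high-radix switches (these are not NUDT's figures): a fat tree built from radix-r switches splits each switch's ports evenly between uplinks and downlinks, so an l-level tree can connect 2·(r/2)^l endpoints at full bisection bandwidth. A minimal sketch:

```python
# Generic fat-tree sizing: each radix-r switch dedicates r/2 ports to
# uplinks and r/2 to downlinks, giving 2 * (r/2)**levels endpoints at
# full bisection bandwidth. Higher radix -> far more endpoints per level.

def fat_tree_endpoints(radix, levels=2):
    """Max endpoints of a fat tree at full bisection bandwidth."""
    return 2 * (radix // 2) ** levels

print(fat_tree_endpoints(36))     # 36-port switches, 2 levels -> 648
print(fat_tree_endpoints(4, 3))   # classic 3-level 4-port fat tree -> 16
```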
Links
CRAY T3E: https://cug.org/5-publications/proceedings_attendee_lists/1999CD/S99_Proceedings/S99_Papers/Frese/frese.html