
Jacquard: Architecture and Application Performance Overview NERSC Users’ Group October 2005.


1 Jacquard: Architecture and Application Performance Overview NERSC Users’ Group October 2005

2 Outline
An engineering-level overview of the HW and SW that make up Jacquard:
1) CPUs
2) Memory
3) OS
4) Interconnect
Will use Seaborg as a point of reference.

3 seaborg.nersc.gov (review?)

Resource        Speed    Bytes
Registers       3 ns     256 B
L1 Cache        5 ns     32 KB
L2 Cache        45 ns    8 MB
Main Memory     300 ns   16 GB
Remote Memory   19 us    7 TB
GPFS            10 ms    50 TB
HPSS            5 s      9 PB

6080 dedicated CPUs, 96 shared login CPUs
Hierarchy of caching, speeds
Bottleneck determined by first depleted resource
[Diagram: 380 x 16 way SMP NHII Node; Colony Switch (CSS0, CSS1); GPFS; HPSS. Seaborg: crossbar, main memory, GPFS, MPI]

4 jacquard.nersc.gov basics

Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB

640 dedicated CPUs, 8 shared login CPUs
Smaller caches, HT, Really Fast
SMP? NUMA? SUMO.
[Diagram: 320 x 2 way Opteron node; Infiniband Switch (IB); GPFS; HPSS. Jacquard: Main Memory, GPFS, MPI, HT]
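The "bottleneck determined by first depleted resource" idea can be sketched by walking each hierarchy until a working set fits. This is only an illustration: the numbers are copied from the two tables on slides 3 and 4, decimal prefixes are assumed for the sizes, the best-case 70 ns is used for Jacquard main memory, and the function name `first_level_that_fits` is hypothetical.

```python
# Latency (seconds) and capacity (bytes) per level, copied from slides 3 and 4.
# Assumption: decimal prefixes (1 KB = 1000 B) are close enough for a sketch.
NS, US, MS = 1e-9, 1e-6, 1e-3
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15

SEABORG = [
    ("Registers", 3 * NS, 256),
    ("L1 Cache", 5 * NS, 32 * KB),
    ("L2 Cache", 45 * NS, 8 * MB),
    ("Main Memory", 300 * NS, 16 * GB),
    ("Remote Memory", 19 * US, 7 * TB),
    ("GPFS", 10 * MS, 50 * TB),
    ("HPSS", 5.0, 9 * PB),
]

JACQUARD = [
    ("Registers", 0.5 * NS, 2 * KB),
    ("L1 Cache", 1.5 * NS, 64 * KB),
    ("L2 Cache", 45 * NS, 1 * MB),
    ("Main Memory", 70 * NS, 6 * GB),  # slide gives 70-117 ns; best case used
    ("Remote Memory", 5 * US, 2 * TB),
    ("GPFS", 10 * MS, 15 * TB),
    ("HPSS", 5.0, 9 * PB),
]


def first_level_that_fits(hierarchy, working_set_bytes):
    """Return (name, latency) of the fastest level large enough for the data."""
    for name, latency, capacity in hierarchy:
        if working_set_bytes <= capacity:
            return name, latency
    raise ValueError("working set exceeds every level in the hierarchy")


if __name__ == "__main__":
    for label, machine in (("seaborg", SEABORG), ("jacquard", JACQUARD)):
        name, latency = first_level_that_fits(machine, 4 * MB)
        print(f"{label}: 4 MB working set lives in {name} ({latency * 1e9:.1f} ns)")
```

With these figures, a 4 MB working set stays in L2 on Seaborg but spills to main memory on Jacquard, which is one way to read the "smaller caches" remark.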

5 Opteron Block Diagram: Not strictly SMP
1 TLB per CPU: 1K entries x 4K pages -> 4 MB coverage
[Diagram labels: SDRAM, Switch, I/O]
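The TLB coverage figure on this slide is just entries times page size. A minimal sketch of that arithmetic, assuming "1K" means 1024 entries and "4K" means 4096-byte pages (the function names are illustrative only):

```python
def tlb_coverage_bytes(entries, page_bytes):
    """Memory reachable without a TLB miss: one page per TLB entry."""
    return entries * page_bytes


def pages_touched(n_bytes, page_bytes):
    """Number of pages spanned by a contiguous buffer (ceiling division)."""
    return -(-n_bytes // page_bytes)


TLB_ENTRIES = 1024       # "1K entries" from the slide
PAGE_BYTES = 4 * 1024    # "4K pages" from the slide

if __name__ == "__main__":
    coverage = tlb_coverage_bytes(TLB_ENTRIES, PAGE_BYTES)
    print(coverage // 2**20, "MB covered by the TLB")
    # A contiguous array of 8-byte doubles that exactly fills the coverage:
    print(coverage // 8, "doubles fit inside the covered range")
```

A contiguous buffer larger than this coverage spans more pages than the TLB has entries, so sweeping it repeatedly can incur TLB misses in addition to cache misses.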

6 HyperTransport: Good Stuff
Little conflict between data movement and computation

7 SMP size and memory contention
Jacquard’s numbers:
1 task:  100%
2 tasks: 98%
Why is Jacquard 2 way SMP?

8 Flops @ 2.2 GHz
Peak Theoretical Flops
- Double (64 bit) floats: 1 add + 1 mult = 2.2 GFlop/s
- Single (32 bit) floats: 2 add + 2 mult = 4.4 GFlop/s
Peak Realized Flops
- Double (64 bit) floats: 1.9 GFlop/s
- Single (32 bit) floats: 3.4 GFlop/s
Your Flops?
- Walltime is more important than flops
- For a known algorithm, flops are a sanity check
Memory BW: 4 GB/sec per CPU
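The "flops as a sanity check" point can be sketched with a little arithmetic. The peak, realized, and memory bandwidth figures below are taken as stated on the slide; the dense matrix multiply with its standard 2n^3 operation count is an assumed example of a "known algorithm", and all function names are hypothetical.

```python
# Figures as stated on the slide (GFlop/s per CPU and GB/s per CPU).
PEAK_GFLOPS = {"double": 2.2, "single": 4.4}
REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}
MEMORY_BW_GB_PER_S = 4.0


def fraction_of_peak(precision):
    """Realized peak divided by theoretical peak for the given precision."""
    return REALIZED_GFLOPS[precision] / PEAK_GFLOPS[precision]


def bytes_per_peak_flop(precision):
    """Memory bytes deliverable per theoretical flop (machine balance)."""
    return MEMORY_BW_GB_PER_S / PEAK_GFLOPS[precision]


def matmul_gflops(n, walltime_s):
    """Achieved GFlop/s for an n x n dense matrix multiply (2*n^3 flops)."""
    return 2.0 * n**3 / walltime_s / 1e9


if __name__ == "__main__":
    for precision in ("double", "single"):
        print(f"{precision}: {fraction_of_peak(precision):.0%} of peak, "
              f"{bytes_per_peak_flop(precision):.2f} bytes/flop")
    # Hypothetical measurement: a 1000 x 1000 multiply timed at 1.0 s.
    print(f"matmul: {matmul_gflops(1000, 1.0):.1f} GFlop/s")
```

If a timed kernel reports more than the slide's peak figure, the operation count or the timer is probably wrong; if it reports far less, memory bandwidth is a likely suspect.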

9 MPI Bandwidth: seaborg

10 MPI Bandwidth: Jacquard

11 Linux for AIX Users
- Linux and AIX are more similar than different
- Linux is not as good as AIX at keeping processes scheduled on the same CPU -> processor affinity work
- Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
- AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules
- Linux doesn’t need -bmaxdata!
- Little vs. Big Endian
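Two of the points above lend themselves to a short sketch: reading /proc/cpuinfo, and the byte-order difference between big-endian POWER (Seaborg) and little-endian Opteron (Jacquard). The helper names below are hypothetical, and the raw file of 8-byte doubles is an assumed example of data written on the AIX side.

```python
import os
import struct
import sys


def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical CPU."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the record for one CPU.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def read_big_endian_doubles(raw):
    """Decode raw bytes written as 8-byte doubles on a big-endian machine."""
    count = len(raw) // 8
    return list(struct.unpack(">%dd" % count, raw[:count * 8]))


if __name__ == "__main__":
    print("native byte order:", sys.byteorder)
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            print("logical CPUs:", len(parse_cpuinfo(handle.read())))
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process may currently be scheduled on (Linux only).
        print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```

Reading the same bytes with the native little-endian layout would silently yield wrong values rather than an error, which is why unformatted binary files carried over from Seaborg deserve an explicit byte-order check.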

12 Conclusions
The underlying HW technologies (HT, IB, etc.) are quite promising.
Opteron systems are delivering great price/performance.
Still working some SDRAM, OS, and SW issues.
What’s useful to you? Let us know.

