Presentation on theme: "Dell Research In HPC, GridPP7, 1st July 2003, Steve Smith, HPC Business Manager, Dell EMEA" — Presentation transcript:

1 Dell Research In HPC. GridPP7, 1st July 2003. Steve Smith, HPC Business Manager, Dell EMEA. steve_l_smith@dell.com

2 The Changing Models of High Performance Computing
- Traditional HPC architecture: proprietary RISC, vector and custom systems.
- Current HPC architecture: standards-based clusters and SMP.
- Future HPC architecture: a grid of shared resources and rich clients, with distributed applications running on clusters, SMPs and blades over a hardware, OS and middleware stack.
© Copyright 2002-2003 Intel Corporation

3 HPCC Building Blocks
- Platform: PowerEdge and Precision (IA32 and IA64)
- Interconnect: Fast Ethernet, Gigabit Ethernet, Myrinet, Quadrics Elan
- Protocol: TCP, VIA, GM
- OS: Linux, Windows
- Middleware: MPI/Pro, MPICH, MVICH, PVM
- Benchmarks: parallel benchmarks (NAS, HINT, Linpack…) and parallel applications

4 HPCC Components and Research Topics
- Vertical solutions: application prototyping and sizing for Energy/Petroleum, Life Science, and Automotive manufacturing and design; custom application benchmarks; standard benchmarks; performance studies.
- Monitoring and management: resource monitoring and management, dynamic resource allocation, checkpoint/restart and job redistribution, distributed system performance monitoring, load/workload analysis and balancing, remote access, web-based GUI.
- Development tools: compilers and math libraries; performance tools (MPI analyzer/profiler, debugger, performance analyzer and optimizer).
- Middleware / API and job scheduling: MPI 2.0 / fault-tolerant MPI; MPICH, MPICH-GM, MPI/LAM, PVM.
- Interconnect technologies: FE, GbE, 10GE… (RDMA); Myrinet, Quadrics, Scali; InfiniBand.
- Cluster file systems: reliable PVFS; GFS, GPFS…; storage cluster solutions.
- Platform hardware: IA-32 vs. IA-64 processor/platform comparison; standard rack-mounted, blade and brick servers and workstations.
- Cluster installation: remote installation/configuration, PXE support, SystemImager, LinuxBIOS.

5 128-node Configuration with Myrinet

6 HPCC Technology Roadmap (Q3 FY03 through Q3 FY05)
- Platform baselining: Big Bend (2P 1U), Yukon (2P 2U), Everglades (2P 1U).
- Interconnects: Myrinet 2000, Quadrics, Scali, Myrinet hybrid switch, 10GbE, InfiniBand prototyping, iSCSI.
- Cluster monitoring: Ganglia, Clumon (NCSA).
- Middleware: cycle stealing, MPICH-G2, Condor-G, Grid Engine (GE), Qluster.
- File systems: NFS, PVFS2, Lustre File System 1.0 and 2.0, Global File System, ADIC.
- Grid computing: Globus Toolkit, Platform Computing, Data Grid.
- Vertical solutions: Manufacturing (Fluent, LS-DYNA, Nastran), Life Science (BLAST), Energy (Eclipse, LandMark VIP), Financial (MATLAB).
- TOP500 submissions: November 2002, June 2003, November 2003, June 2004.

7 In-the-box scalability of Xeon servers: 71% scalability in the box.

8 In-the-box Xeon (533 MHz FSB) scaling: 32% performance improvement. Goto's library: http://www.cs.utexas.edu/users/flame/goto/

9 Goto Comparison on Myrinet: 37% improvement with Goto's library.

10 Goto Comparison on Gigabit Ethernet: 25% improvement with Goto's library (64 nodes / 128 processors).
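The Goto comparisons above come down to the speed of one routine: HPL spends most of its run time in DGEMM, so swapping the linked BLAS changes the whole benchmark. The following is a minimal sketch, not from the slides, of the kind of DGEMM call a tuned library such as Goto's accelerates; the build lines in the comment are only examples and depend on how the BLAS is installed.

```c
/*
 * Minimal DGEMM timing sketch (not from the slides).  HPL's run time is
 * dominated by calls like this one, so linking a tuned BLAS such as
 * Goto's library speeds up the benchmark without touching HPL itself.
 * Example build lines (installation-dependent):
 *   cc dgemm_demo.c -o dgemm_demo -lcblas -lblas     (reference BLAS)
 *   cc dgemm_demo.c -o dgemm_demo -lgoto -lpthread   (Goto's BLAS)
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <cblas.h>

static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    const int n = 1000;                       /* modest size for a quick run */
    double *a = malloc((size_t)n * n * sizeof *a);
    double *b = malloc((size_t)n * n * sizeof *b);
    double *c = malloc((size_t)n * n * sizeof *c);
    if (!a || !b || !c) return 1;

    for (int i = 0; i < n * n; i++) {         /* arbitrary input data */
        a[i] = 1.0 / (i % 97 + 1);
        b[i] = 1.0 / (i % 89 + 1);
        c[i] = 0.0;
    }

    double t0 = wall_seconds();
    /* C = 1.0 * A * B + 0.0 * C, row-major, square n x n matrices */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);
    double secs = wall_seconds() - t0;

    /* a dense n x n multiply performs 2*n^3 floating-point operations */
    printf("DGEMM %d x %d: %.3f s, %.2f GFLOP/s\n",
           n, n, secs, 2.0 * n * n * n / secs / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```

Running the same binary linked against the reference BLAS and against a tuned BLAS gives a rough feel for the library-level gap the slides report at cluster scale.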

11 Process-to-Processor Mapping. Two ways of placing four processes on two dual-CPU nodes connected by a switch:
- Round Robin (default): Process 1 on Node 1, Process 2 on Node 2, Process 3 on Node 1, Process 4 on Node 2.
- Process-mapped: Process 1 and Process 2 on Node 1, Process 3 and Process 4 on Node 2.
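To check which of these two placements a given launcher and machinefile actually produce, each rank can simply report the node it runs on. The sketch below is not from the slides; it assumes a standard MPI implementation such as MPICH and compilation with mpicc.

```c
/*
 * Minimal sketch (not from the slides): each MPI rank prints the node it
 * was placed on, so a round-robin layout can be told apart from a
 * size-major (block) layout for a given machinefile.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);

    /* With size-major placement, consecutive ranks share a node, so
     * nearest-neighbour messages stay inside the box; with round-robin
     * placement they cross the switch. */
    printf("rank %d of %d running on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

Launching it with, for example, mpirun -np 4 and comparing the reported hostnames against the two layouts above shows whether ranks were placed round robin or size-major.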

12 Message Count for an HPL 16-process Run

13 Message Lengths for an HPL 16-process Run

14 HPL Results on the Xeon Cluster over Fast Ethernet, Gigabit Ethernet and Myrinet. Size-major mapping is 35% better than round robin in one case and 7% better in another. The balanced system is designed for HPL-type applications.

15 Reservoir Simulation – Message Statistics

16 Reservoir Simulation – Process Mapping – Gigabit Ethernet: 11% improvement with GigE.

17 How Hyper-Threading Technology Works. Without Hyper-Threading, the first and second thread/task use the execution resources one after the other; with Hyper-Threading, both threads/tasks share the execution resources and complete sooner, saving up to 30% of the time. Greater resource utilization equals greater performance. © Copyright 2002-2003 Intel Corporation

18 HPL Performance Comparison. Linpack results on a 16-node dual-Xeon 2.4 GHz cluster (GFLOPS versus problem size from 2,000 to 56,000), comparing 16x4 processes with HT on against 16x2 without HT, and 16x2 with HT on against 16x1 without HT. Hyper-Threading provides roughly a 6% improvement on the 16-node, 32-processor cluster.

19 NPB-FT (Fast Fourier Transform). Mop/s with and without HT across configurations from 1x2 (1x4 with HT) to 32x2 (32x4 with HT), where a configuration is number of nodes x number of processors. L2 cache misses increase from 68% without HT to 76% with HT.

20 NPB-EP (Embarrassingly Parallel). Mop/s with and without HT across the same configurations (number of nodes x number of processors, 1x2 to 32x2, or 1x4 to 32x4 with HT). EP requires almost no communication; SSE and x87 utilization increases from 94% without HT to 99% with HT.

21 Observations
- Computationally intensive applications with fine-tuned floating-point code are less likely to gain from Hyper-Threading, because the CPU resources may already be highly utilized.
- Cache-friendly applications might suffer with Hyper-Threading enabled, because processes running on the logical processors compete for the shared cache, which can degrade performance.
- Communication-bound or I/O-bound parallel applications may benefit from Hyper-Threading if communication and computation can be interleaved between processes.
- Current Linux support for Hyper-Threading is limited, which can cause significant performance degradation if Hyper-Threading is not applied properly. To the OS, the logical CPUs are almost indistinguishable from physical CPUs.
- The current Linux scheduler treats each logical CPU as a separate physical CPU, which does not maximize multiprocessing performance. A patch for better HT support is available (source: "fully HT-aware scheduler" support, 2.5.31-BK-curr, by Ingo Molnar).
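One practical workaround for the scheduler limitation in the last bullet, until an HT-aware scheduler is in place, is to pin each compute process to its own CPU so that two ranks do not share the two logical CPUs of one physical package. The sketch below is an illustration rather than anything from the slides: it assumes a Linux kernel and glibc that expose sched_setaffinity(), it assumes the chosen CPU numbers belong to different physical packages (which must be verified against the "physical id" fields in /proc/cpuinfo), and the local-rank argument is a hypothetical convention.

```c
/*
 * Sketch of CPU pinning as a workaround for a scheduler that treats
 * logical CPUs like physical ones (illustrative, not from the slides).
 * Assumes Linux with sched_setaffinity() available; whether CPU IDs
 * 0 and 1 really sit on different physical packages must be checked
 * against the "physical id" fields in /proc/cpuinfo.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>

int main(int argc, char **argv)
{
    /* hypothetical convention: the launcher passes the process's local
     * rank on the node (0 or 1 on a dual-Xeon box) as argv[1] */
    int local_rank = (argc > 1) ? atoi(argv[1]) : 0;
    int target_cpu = local_rank;   /* one compute process per physical CPU */

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(target_cpu, &mask);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("pid %d pinned to CPU %d\n", (int)getpid(), target_cpu);

    /* ...the real compute work (or an exec of it) would follow here... */
    return 0;
}
```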

22 Thank You. Steve Smith, HPC Business Manager, Dell EMEA. steve_l_smith@dell.com

