InfiniBand – Experiences at Forschungszentrum Karlsruhe
A. Heiss, U. Schwickerath
DESY Zeuthen, 26th May 2005
Forschungszentrum Karlsruhe in der Helmholtz-Gemeinschaft
Credits: Inge Bischoff-Gauss, Marc García Martí, Bruno Hoeft, Carsten Urbach
Outline:
InfiniBand overview
Hardware setup at IWR
HPC applications: MPI performance, lattice QCD, LM
HTC applications: RFIO, xrootd

InfiniBand – Overview
Channel-based, serial, switched fabric providing 2.5, 10 or 30 Gb/s bidirectional bandwidth.
1, 4 or 12 wire pairs per direction carrying differential voltage signals (1X, 4X, 12X).
Usable bandwidth is 80% of the signal rate: 250 MB/s, 1 GB/s, 3 GB/s (soon: DDR).
Copper cables (up to 15 m) or fibre optics.
Host Channel Adapters (HCAs) provide up to two ports each: redundant connections possible.
HCAs for PCI-X (64 bit, 133 MHz) and PCI-Express. Onboard chips expected soon.
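
The 80% figure reflects the 8b/10b line encoding used on InfiniBand links; as a quick check of the quoted numbers for a 4X link:

```latex
4X:\quad 4 \times 2.5\,\mathrm{Gb/s} = 10\,\mathrm{Gb/s}\ \text{(signal rate)},\qquad
0.8 \times 10\,\mathrm{Gb/s} = 8\,\mathrm{Gb/s} = 1\,\mathrm{GB/s}\ \text{(usable)}
```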

Software overview
Kernel-space drivers now ship with the 2.6 kernel.
The Verbs API implementation can be vendor specific.
RFIO and xrootd prototypes by IWR.
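
A minimal sketch of the common interface that hides the vendor-specific parts: listing and querying HCAs through the verbs library. This assumes the OpenIB libibverbs API (link with -libverbs); vendor stacks of the time, such as Mellanox VAPI, exposed different calls.

```c
/*
 * Minimal "hello world" for the verbs layer: list the HCAs the kernel
 * drivers expose and query a few basic attributes.
 */
#include <infiniband/verbs.h>
#include <stdio.h>

int main(void)
{
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        fprintf(stderr, "no InfiniBand devices found\n");
        return 1;
    }

    for (int i = 0; i < num; i++) {
        struct ibv_context *ctx = ibv_open_device(list[i]);
        if (!ctx)
            continue;

        struct ibv_device_attr attr;
        if (ibv_query_device(ctx, &attr) == 0)
            printf("%s: %u port(s), max_qp=%d\n",
                   ibv_get_device_name(list[i]),
                   (unsigned)attr.phys_port_cnt, attr.max_qp);

        ibv_close_device(ctx);
    }

    ibv_free_device_list(list);
    return 0;
}
```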

InfiniBand test equipment at IWR: Opteron cluster
16 V20z dual Opteron, 2.2 GHz, 4 GB RAM, InfiniCon IBA drivers, SL303/304, Kernel, PBS (for production purposes)
13 V20z dual Opteron, 2.2 GHz, 4 GB RAM, Mellanox GOLD, SL303/304, Kernel, LoadL+PBS, AFS
InfiniCon InfinIO 4x InfiniBand switch
Mounted in a fully water-cooled rack
Installed and managed with the QUATTOR toolkit
Opteron cluster HPL result: 171.4 GFlops with 26 nodes (52 CPUs), 75% of theoretical peak performance
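
As a quick consistency check of the quoted efficiency, assuming the usual figure of 2 double-precision floating-point operations per cycle for these Opterons:

```latex
R_\mathrm{peak} = 52 \times 2.2\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{flop}}{\mathrm{cycle}} = 228.8\ \mathrm{GFlop/s},
\qquad \frac{171.4}{228.8} \approx 0.75
```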

InfiniBand test equipment at IWR: Xeon cluster and blade center
12 dual Xeon, 2.4 GHz, 4x-InfiniBand, RH7.3, Kernel, Mellanox driver suite
16-port 4x Mellanox switch (reference design)
Rack mounted, air cooled
Temporary equipment used for tests:
HP Xeon64, dual CPU, 3.4 GHz, 4 GB RAM, with 4x PCI-Express and 133 MHz PCI-X
NEC quad Opteron, 16 GB RAM, 133 MHz PCI-X
IBM JS20 PPC64 blades with 4x-InfiniBand daughter card at 100 MHz speed; not an official IBM product but a technology prototype, kindly provided by IBM Böblingen
2 IBM Xeon (2.6 GHz) nodes with Intel 10 GE Ethernet cards

MPI raw performance (64 bit), measured with the OSU benchmarks
Notes:
Best latency with PCI-Express: 4 μs
Best throughput with PCI-Express: 968 MB/s
Bidirectional bandwidth with PCI-Express: up to 1850 MB/s
JS20 throughput matches the experience with Xeon nodes at 100 MHz PCI-X speed, but note the better floating-point performance of the PPC970FX CPU.
Disclaimer on PPC64: not an official IBM product, technology prototype (see also slide 5).
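
The latency figures come from ping-pong style measurements; below is a minimal sketch in the spirit of the OSU latency test (a simplified stand-in, not the actual benchmark code), run with exactly two ranks.

```c
/*
 * Minimal MPI ping-pong latency measurement.
 * Build with mpicc, run e.g.:  mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>

#define NITER   1000
#define MSGSIZE 1          /* bytes; the OSU suite sweeps this upwards */

int main(int argc, char **argv)
{
    char buf[MSGSIZE] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < NITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)   /* one-way latency is half the average round trip */
        printf("latency: %.2f us\n", (t1 - t0) * 1e6 / (2.0 * NITER));

    MPI_Finalize();
    return 0;
}
```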

HPCC benchmark suite (0.8 beta)
Comparison of GE with IBA; GE not tuned, on-board
Same benchmark parameters, same nodes: 8 nodes, 16 CPUs
HPL: P x Q = 4 x 4, N = 31208, NB = 40, 64, 80, 96
HPL result: 79.5% of peak
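
For reference, the theoretical peak of this 16-CPU configuration under the same assumptions as above (2.2 GHz Opterons, 2 flop/cycle):

```latex
R_\mathrm{peak} = 16 \times 2.2\,\mathrm{GHz} \times 2\,\tfrac{\mathrm{flop}}{\mathrm{cycle}} = 70.4\ \mathrm{GFlop/s}
```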

Lattice QCD benchmark: GE vs. InfiniBand
Memory- and communication-intensive application
Benchmark by C. Urbach; see also the CHEP04 talk given by A. Heiss
Significant speedup by using InfiniBand
Thanks to Carsten Urbach, FU Berlin and DESY Zeuthen
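
Lattice QCD is communication-intensive because every iteration exchanges boundary (halo) sites with nearest neighbours; here is a minimal sketch of that pattern, reduced to a 1-D decomposition with a hypothetical buffer size (the actual benchmark by C. Urbach is far more elaborate).

```c
/* Nearest-neighbour halo exchange, the communication pattern that
 * dominates lattice QCD codes. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int halo = 4096;                     /* hypothetical boundary volume */
    double *send_up   = calloc(halo, sizeof *send_up);
    double *recv_down = calloc(halo, sizeof *recv_down);

    int up   = (rank + 1) % size;              /* periodic boundary conditions */
    int down = (rank - 1 + size) % size;

    /* Each iteration every rank ships its boundary layer to the neighbour
     * "above" and receives the matching layer from the neighbour "below";
     * with small local lattices this traffic is latency-dominated, which is
     * where InfiniBand gains over Gigabit Ethernet. */
    MPI_Sendrecv(send_up,   halo, MPI_DOUBLE, up,   0,
                 recv_down, halo, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(send_up);
    free(recv_down);
    MPI_Finalize();
    return 0;
}
```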

Lattice QCD benchmark: Xeon vs. Opteron
Comparison of Xeon with Opteron using one or two CPUs per node
Opteron: communication over the network is as good as SMP
Speedup drops on Xeons when both CPUs are used; the effect is not visible on Opterons
Possible reason: memory bottleneck at the Northbridge on the Xeon
All measurements done at IWR
Thanks to Carsten Urbach, FU Berlin and DESY Zeuthen
(Plots: Dual-Xeon and Dual-Opteron)

The Local Model (LM) of Deutscher Wetterdienst
Surface wind simulation: Chile, grid size 2.8 km, x 261 grid points, 1 h simulation time
Plot: dashed lines show real time used, solid lines total CPU time
InfiniBand setup: Sun V20z, NCSA MPI, Mellanox Gold
Systems compared (plot legend): Power4/AIX, Sun V20z with IBA, VPP 5000, SX5
Measurement done by Dr. I. Bischoff-Gauss

The Local Model (LM): 1-day simulation result

RFIO/IB point-to-point file transfers (64 bit)
RFIO/IB: see ACAT03, NIM A 534 (2004)
Notes:
PCI-X and PCI-Express throughput; solid lines: file transfers cache -> /dev/null; dashed lines: network + protocol only
Best results with PCI-Express: > 800 MB/s raw transfer speed, > 400 MB/s file transfer speed
Disclaimer on PPC64: not an official IBM product, technology prototype (see also slides 5 and 6).
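
For context, an RFIO client looks like an ordinary POSIX-style reader; a minimal sketch assuming the standard CASTOR-style rfio_* client API (header name, server name and path below are illustrative). The IWR prototype keeps this interface and replaces the transport with InfiniBand.

```c
/* Client-side RFIO read loop: read a remote file and discard the data,
 * the same kind of transfer as the "cache -> /dev/null" measurement. */
#include <fcntl.h>
#include <stdio.h>
#include <shift.h>      /* RFIO prototypes; name may vary between releases */

int main(void)
{
    static char buf[1 << 20];                 /* read in 1 MB chunks */
    long long total = 0;
    int n;

    int fd = rfio_open("server:/data/testfile", O_RDONLY, 0);
    if (fd < 0) {
        fprintf(stderr, "rfio_open failed\n");
        return 1;
    }

    while ((n = rfio_read(fd, buf, sizeof buf)) > 0)
        total += n;                           /* count bytes, discard data */

    rfio_close(fd);
    printf("read %lld bytes\n", total);
    return 0;
}
```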

RFIO/IB throughput (mixed setup)
Notes:
Server: NEC quad Opteron, 2.2 GHz, 16 GB RAM, SuSE SLES9, Kernel
Test file: 1024 MB of random data
Readers: 12 dual Xeon 2.4 GHz, RH7.3 based, Kernel
All readers read the same file at the same time (to /dev/null)
See also the CHEP04 talk by A. Heiss

The Extended Root Daemon (xrootd)
What is the xrootd package?
Toolkit developed by SLAC and INFN (Padova) for easy data access for the BaBar experiment
File-based data access
Simple, fault-tolerant, flexible security
Standalone suite with client and server packages
Fully (and heavily) multithreaded
Release version now distributed with the ROOT package
Here: focus on raw data throughput, using a simple file copy method (xrdcp)

Xrootd and InfiniBand: xrootd on native InfiniBand
Challenges to be addressed:
Queue Pairs instead of sockets
Memory management: use of RDMA requires the buffers to be known to the sender in advance; the send method requires preposted receive requests (see the sketch below)
xrdcp does not destroy its physical connections before exit
Features and status of the prototype:
Makes use of the IB_SEND method instead of RDMA
Allocates private send and receive buffers associated with each QP
Last connection times out at the end
ROOT interface not yet implemented
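
A sketch of the "preposted receive" discipline required by the IB_SEND path, written against the OpenIB libibverbs API for illustration (the actual prototype used Mellanox VAPI). This is a fragment only: the queue pair and the registered memory region are assumed to be created and connected elsewhere.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post one receive buffer on a connected queue pair. */
int post_recv_buffer(struct ibv_qp *qp, struct ibv_mr *mr,
                     void *buf, uint32_t len)
{
    struct ibv_sge sge;
    struct ibv_recv_wr wr, *bad_wr = NULL;

    memset(&sge, 0, sizeof sge);
    sge.addr   = (uintptr_t)buf;      /* registered receive buffer */
    sge.length = len;
    sge.lkey   = mr->lkey;            /* local key from ibv_reg_mr() */

    memset(&wr, 0, sizeof wr);
    wr.wr_id   = (uintptr_t)buf;      /* lets the completion identify the buffer */
    wr.sg_list = &sge;
    wr.num_sge = 1;

    /* The receive must be posted before the peer issues its send;
     * otherwise the incoming message has nowhere to land. */
    return ibv_post_recv(qp, &wr, &bad_wr);
}
```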

Xrootd and InfiniBand: first preliminary results
Notes:
IPoIB: dual Opteron V20z, Mellanox Gold drivers, subnet manager on InfiniCon 9100, same nodes as for GE
Native IB: proof-of-concept version based on Mellanox VAPI, using IB_SEND with dedicated send/receive buffers, same nodes as above
10 GE: IBM xSeries 345 nodes, 32-bit Xeon, single CPU, 2.66 GHz clock speed, 1 and 2 GB RAM, Intel PRO/10GbE LR cards, used for long-distance tests

Xrootd and InfiniBand: outlook / next steps
Fix known problems: memory management, client/xrdcp resource cleanup, fast connection teardown
Implement missing parts: integration into the ROOT toolkit
Performance enhancements: get rid of local buffers, maybe implement a buffer-recycling mechanism, allow use of RDMA-based transfers
Requires discussion/interaction with the developers

Summary & outlook
InfiniBand offers good performance at a low price.
It is usable for both HPC and high-throughput applications at the same time.
The technology is developing and prices keep falling.
Software and drivers are freely available.
See also: