
PC Cluster DESY
Peter Wegner

1. Motivation, History
2. Myrinet Communication
4. Cluster Hardware
5. Cluster Software
6. Future …

PC Cluster “Definition 1” – Idea: Herbert Cornelius (Intel Munich)

PC Cluster “Definition 2”

PC Cluster: PC – HPC related components
[Schematic: two CPUs with caches on the front-side bus, connected through the chipset to memory; external I/O: PCI (64-bit/66 MHz), SCSI, EIDE, USB, audio, LAN, etc.; internal I/O: AGP, etc.]

Motivation for PC Cluster
Motivation: LQCD, Stream benchmark, Myrinet bandwidth

32/64-bit Dirac kernel, LQCD (Martin Lüscher, CERN):
P4, 1.4 GHz, 256 MB Rambus, using SSE1(2) instructions incl. cache pre-fetch
Time per lattice point: … µs (1503 Mflops, 32-bit arithmetic), … µs (814 Mflops, 64-bit arithmetic)

Stream benchmark, memory bandwidth:
P4 (1.4 GHz, PC800 Rambus): 1.4 … 2.0 GB/s
PIII (800 MHz, PC133 SDRAM): 400 MB/s
PIII (400 MHz, PC133 SDRAM): 340 MB/s

Myrinet, external bandwidth: … Gb/s optical connection, bidirectional, ~240 MB/s sustained
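The Stream figures above are sustained-memory-bandwidth numbers of the kind the STREAM benchmark reports. As an illustration only, the following minimal C sketch times a STREAM-style triad loop and prints the resulting bandwidth; the array size, repetition count and use of clock() are arbitrary assumptions made for this example, not the code behind the numbers on the slide.

/* Minimal STREAM-style triad sketch (illustrative, not the official STREAM code):
 * reports sustained memory bandwidth for a[i] = b[i] + s*c[i]. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (8 * 1000 * 1000)   /* large enough to defeat the caches */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c)
        return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t t0 = clock();
    for (int r = 0; r < REPS; r++)
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* the triad touches 3 doubles (24 bytes) per loop iteration */
    double mbytes = 24.0 * (double)N * REPS / 1e6;
    printf("triad bandwidth: %.0f MB/s\n", mbytes / secs);

    return (a[0] == 7.0) ? 0 : 1;   /* keeps the compiler from dropping the loop */
}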

Motivation for PC Cluster, History
- March 2001: Pentium 4 systems (SSE instructions, Rambus memory, 66 MHz 64-bit PCI) available; dual Pentium 4 systems with XEON expected for May 2001; first systems at CeBIT (under non-disclosure)
- End of May 2001: official announcement of the Intel XEON processor, i860 chipset, Supermicro motherboard P4DC6 – the only combination available
- July 2001: BA (call for tenders); first information about the i860 problem; dual XEON test system delivered August 2001
- End of August 2001: final decision (Lattice 2001 in Berlin)
- Installation: December 2001 in Zeuthen, January 2002 in Hamburg

PC cluster interconnect – Myrinet Network Card (Myricom, USA)
Myrinet2000 M3F-PCI64B PCI card with optical connector
Technical details:
- 200 MHz RISC processor
- 2 MByte memory
- 66 MHz/64-bit PCI connection
- … Gb/s optical connection, bidirectional
- Sustained bandwidth: … MByte/s

PC cluster interconnect – Myrinet Switch

PC Cluster interconnect, performance
[Plot: Myrinet performance]

PC Cluster interconnect, performance
[Plot: QsNet performance (Quadrics Supercomputer World)]
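Sustained-bandwidth figures like the ones plotted for Myrinet and QsNet are usually obtained from a point-to-point ping-pong test between two nodes. The following MPI sketch shows the shape of such a measurement; the message size, repetition count and the program as a whole are assumptions made for illustration, not the benchmark used to produce these plots.

/* Illustrative MPI ping-pong bandwidth sketch (assumed setup, not the benchmark
 * behind the plots): rank 0 and rank 1 bounce a 1 MB message back and forth,
 * and rank 0 reports the resulting sustained bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBYTES (1 << 20)   /* 1 MB per message */
#define REPS   100

int main(int argc, char **argv)
{
    int rank, size;
    char *buf = malloc(NBYTES);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2 || buf == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);
    memset(buf, 0, NBYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0) {
        /* two NBYTES messages cross the link per repetition */
        double mbytes = 2.0 * NBYTES * REPS / 1e6;
        printf("ping-pong bandwidth: %.1f MB/s\n", mbytes / dt);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}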

PC Cluster, the i860 chipset problem
[Block diagram of the Intel i860 chipset: two Xeon processors on a 400 MHz system bus (3.2 GB/s) attached to the MCH; dual-channel RDRAM, up to 4 GB via MRH; AGP 4X graphics; two P64H bridges for 64-bit/66 MHz PCI slots (800 MB/s each, >1 GB/s links); ICH2 with 32-bit/33 MHz PCI slots (133 MB/s), ATA 100 (dual IDE channels), 4 USB ports, 10/100 Ethernet LAN, 6-channel audio; Intel Hub Architecture link at 266 MB/s.]
Measured PCI bandwidth: bus_read (send) = 227 MByte/s, bus_write (recv) = 315 MByte/s of max. 528 MByte/s.
External Myrinet bandwidth: 160 MByte/s.
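For orientation (this reading is an interpretation, not spelled out on the slide): 528 MByte/s is the theoretical peak of a 64-bit/66 MHz PCI bus, 66 × 10^6 transfers/s × 8 bytes ≈ 528 MByte/s. The measured 227 MByte/s (read) and 315 MByte/s (write) thus stay well below the nominal PCI limit, consistent with the reduced external Myrinet bandwidth of about 160 MByte/s quoted above.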

PC Cluster Hardware
Nodes:
- Mainboard Supermicro P4DC6
- 2 (1) x XEON P4, 1.7 GHz, 256 kByte cache
- 1 GByte (4 x 256 MByte) RDRAM
- IBM 18.3 GB DDYS-T18350 U… ” SCSI disk
- Myrinet 2000 M3F-PCI64B-2 interface
Network:
- Fast Ethernet switch Gigaline 2024M, 48 x 100BaseTX ports + GIGAline …BaseSX-SC
- Myrinet fast interconnect: M3-E32 5-slot chassis, 2 x M3-SW16 line cards
Installation:
- Zeuthen: 16 dual-CPU nodes
- Hamburg: 32 single-CPU nodes

PC Cluster Zeuthen schematic
[Schematic: 16 nodes (node1 … node16) connected to a Myrinet switch and to an Ethernet switch; a host PC links the cluster's Gigabit Ethernet (“private” network) to the DESY Zeuthen network.]

PC Cluster Software
- Operating system: Linux (e.g. SuSE 7.2)
- Cluster tools: Clustware (Megware) – monitoring of temperature, fan rpm, CPU usage, …
- Communication software: MPI (Message Passing Interface) based on GM (Myricom low-level communication library)
- Compilers: GNU, Portland Group, KAI, Intel
- Batch system: PBS (OpenPBS)
- Cluster management: Clustware, SCORE

PC Cluster Software, Monitoring Tools
[Screenshot: monitoring example, CPU utilization, DESY Hamburg; Clustware from Megware]

PC Cluster Software, Monitoring Tools
[Screenshot: monitoring example, CPU utilization, temperature, fan speed, DESY Zeuthen]

PC Cluster Software: MPI

...
/* pass a message around the ring of MPI processes */
if (myid == numprocs - 1)
    next = 0;
else
    next = myid + 1;

if (myid == 0) {
    printf("%d sending '%s' \n", myid, buffer);
    MPI_Send(buffer, strlen(buffer) + 1, MPI_CHAR, next, 99, MPI_COMM_WORLD);
    printf("%d receiving \n", myid);
    MPI_Recv(buffer, BUFLEN, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf("%d received '%s' \n", myid, buffer);
    /* mpdprintf(001,"%d receiving \n",myid); */
} else {
    printf("%d receiving \n", myid);
    MPI_Recv(buffer, BUFLEN, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf("%d received '%s' \n", myid, buffer);
    /* mpdprintf(001,"%d receiving \n",myid); */
    MPI_Send(buffer, strlen(buffer) + 1, MPI_CHAR, next, 99, MPI_COMM_WORLD);
    printf("%d sent '%s' \n", myid, buffer);
}
...
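As a usage note (an assumption about the local MPICH-GM setup, not something shown on the slides): such a ring program would typically be compiled with the mpicc wrapper and started across the nodes with something like "mpirun -np 16 ./ring", where the program name is only a placeholder and GM provides the Myrinet transport underneath MPI.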

PC Cluster Operating
- DESY Zeuthen: Dan Pop (DV), Peter Wegner (DV)
- DESY Hamburg: Hartmut Wittig (Theory), Andreas Gellrich (DV)
- Maintenance contract with MEGWARE – Software: Linux system, compilers, MPI/GM, (SCORE); Hardware: 1 reserve node + various components
- MTBF: O(weeks)
- Uptime of the nodes (…): Zeuthen – 38 days (node8 … node16: 4-day break for line card replacement); Hamburg – 42 days
- Problems: hardware failures of Ethernet switch, node, SCSI disks, Myrinet card; all components were replaced relatively soon. KAI compiler not running together with MPI/GM (RedHat–SuSE Linux problem)

PC Clusters world wide: Examples
[Survey: Martyn F. Guest, Computational Science and Engineering Department, CCLRC Daresbury Laboratory]

PC Cluster: Ongoing / Future
- CPUs: XEON 2.4 GHz …, AMD Athlon™ XP processor
- Chipsets: Intel E7500, ServerWorks GC…, AMD-760™ MPX chipset – full PCI bandwidth
- Mainboards: Supermicro P4DP6
- I/O interfaces: PCI-X, PCI Express
- Fast network: Myrinet, QsNet, InfiniBand (?), …

- Dual Intel® Xeon™ 2.4 GHz processor
- 512 KB L2 cache on-die
- Hyper-Threading enabled
- 400 MHz bus (3.2 GB/s)
- Dual-channel DDR memory (16 GB)
- 3.2 GB/s memory bandwidth
- 3.2 GB/s I/O bandwidth
- 64-bit PCI/PCI-X I/O support
- Optional SCSI and RAID support
- GbE support
- 1U and 2U dense packaging

PC Cluster – new chipset Intel E7500

PC Cluster: Future interconnect – InfiniBand Concept
- Link: high-speed serial, 1x, 4x and 12x widths, up to 6 GB/s bidirectional
- Host Channel Adapter (HCA): protocol engine, moves data via messages queued in memory
- Switch: simple, low-cost, multistage network
- Target Channel Adapter (TCA): interface to I/O controller (SCSI, FC-AL, GbE, …)
[Diagram: CPU – host bus – memory controller – system memory with HCA on each host, connected via links and a switch to TCAs and I/O controllers; no PCI bottleneck]

PC Cluster: Future interconnect – InfiniBand Concept (IBM)

PC Cluster: Future interconnect (?) – InfiniBand Cluster
[Diagram: nodes connected through an IB fabric …]
- 1st generation – up to 32 nodes (2002)
- 2nd generation – 1000s of nodes (2003?)