HELICS Petteri Johansson & Ilkka Uuhiniemi

HELICS COW (cluster of workstations)
– AMD Athlon MP 1.4 GHz
– 512 processors (2 per computing node)
– Rank 35 at top500.org
– Linpack benchmark: 825 GFlops
– COTS (commodity off-the-shelf) hardware -> 1.3M euros

HELICS
– 256 GBytes ECC RAM
– 10 TB local disks
– Myrinet 2000 (fiber), 6 switches (128-port)
– Ethernet
– Peak performance: 512 * 2.8 GFlops = 1.43 TFlops (sanity check below)
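
The quoted peak follows directly from the clock rate: the Athlon MP FPU can retire two floating-point operations per cycle, so 1.4 GHz gives 2.8 GFlops per processor. A quick check of the arithmetic in plain Python, using only the numbers quoted on the slides:

```python
# Peak-performance sanity check using the figures from the slides above.
processors = 512        # 2 Athlon MP CPUs in each of 256 dual nodes
clock_ghz = 1.4         # Athlon MP clock rate
flops_per_cycle = 2     # the Athlon FPU can retire 2 floating-point ops per cycle

peak_gflops = processors * clock_ghz * flops_per_cycle
print(f"peak: {peak_gflops:.0f} GFlops = {peak_gflops / 1000:.2f} TFlops")  # ~1434 GFlops
print(f"Linpack efficiency: {825 / peak_gflops:.1%}")                       # 825 GFlops -> ~57.5%
```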

Interconnections
– Myrinet 2000
– 10 ns latency (one way)
– 2+2 Gb/s full-duplex bandwidth
– Bisection bandwidth: 128 x (2+2) Gb/s (see the check below)
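
The bisection figure is simply the number of links crossing the bisection times the per-link rate; a one-line check with the slide's numbers:

```python
# Bisection-bandwidth check using the figures from the interconnect slide.
links_across_bisection = 128   # links crossing the bisection
gbps_per_link = 2 + 2          # 2 Gb/s in each direction, full duplex

print(links_across_bisection * gbps_per_link, "Gb/s aggregate bisection bandwidth")  # 512 Gb/s
```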

Additional equipment
– 32 dual-processor nodes in a Myrinet cluster for interactive development
– 2 front-end PCs as access, compilation, and job-distribution hosts
– 1 administration server
– 1 file server (Sun Fire 880) + 2 TByte RAID 5 disk array
– 10 TByte tape backup
– Remote power control device

Problems
– Hardware failures: 3 power supplies, 3 hard disks, 2 motherboards, 8 Myrinet network cards
– Software: kernel stable; 2 node crashes caused by daemon crashes

Clustering: what is needed?
– Booting concept: network boot (DHCP)
– Cluster installation: installation via network
– Power control: remote access to power supplies, sequential power off/on, reset
– BIOS control: update and settings via network, direct access via serial link
– Health control of nodes: fan speed, CPU temperature and disk status gathered via network (see the sketch after this list)
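
The health-control item above boils down to polling every node for sensor and disk data over the network. A minimal sketch of that idea, assuming passwordless SSH to the nodes and that lm-sensors (`sensors`) and smartmontools (`smartctl`) are installed on them; the node names and disk device are made up for illustration:

```python
#!/usr/bin/env python3
"""Poll cluster nodes for fan speed, CPU temperature and disk status.

Sketch only: assumes passwordless SSH and that `sensors` (lm-sensors) and
`smartctl` (smartmontools) exist on every node. Node names and the disk
device are hypothetical examples.
"""
import subprocess

NODES = [f"node{i:03d}" for i in range(1, 257)]   # hypothetical naming scheme

def run_on_node(node: str, command: str) -> str:
    """Run a command on a node via SSH and return its output (empty on failure)."""
    try:
        return subprocess.run(
            ["ssh", "-o", "ConnectTimeout=5", node, command],
            capture_output=True, text=True, timeout=30,
        ).stdout
    except subprocess.TimeoutExpired:
        return ""

def check_node(node: str) -> None:
    temps_and_fans = run_on_node(node, "sensors")             # CPU temps and fan speeds
    disk_health = run_on_node(node, "smartctl -H /dev/hda")   # overall SMART disk status
    if not temps_and_fans:
        print(f"{node}: NOT RESPONDING")
        return
    status = "PASSED" if "PASSED" in disk_health else "CHECK DISK"
    print(f"{node}: sensors OK, disk {status}")

if __name__ == "__main__":
    for node in NODES:
        check_node(node)
```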

Clustering
– Reliability of resources: spare hosts, redundant servers
– Availability monitoring & accounting: gathering system and job status plus accounting info via network (a minimal sketch follows below)
– Batching concepts: SCore cluster software
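
Availability monitoring can start as simply as sampling whether each node still answers on the network and keeping a per-node tally for accounting. A minimal sketch under the assumption that a node counts as up when its SSH port accepts a TCP connection; the node names and sampling interval are illustrative:

```python
#!/usr/bin/env python3
"""Tiny availability monitor: counts up/down samples per node for accounting.

Sketch only: a node is considered "up" if its SSH port (22) accepts a TCP
connection. Node names and the sampling interval are illustrative.
"""
import socket
import time
from collections import Counter

NODES = [f"node{i:03d}" for i in range(1, 257)]   # hypothetical naming scheme
INTERVAL_S = 300                                   # sample every 5 minutes

up_samples: Counter = Counter()
total_samples: Counter = Counter()

def node_is_up(node: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if the node accepts a TCP connection on the given port."""
    try:
        with socket.create_connection((node, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    for node in NODES:
        total_samples[node] += 1
        if node_is_up(node):
            up_samples[node] += 1
        else:
            print(f"{time.ctime()}: {node} is down")
    # Per-node availability so far, usable as a crude accounting figure.
    worst = min(NODES, key=lambda n: up_samples[n] / total_samples[n])
    print(f"lowest availability: {worst} "
          f"({up_samples[worst] / total_samples[worst]:.1%})")
    time.sleep(INTERVAL_S)
```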

Clustering
– Application optimization: tracing + profiling tools (Vampir, Paraver)
– Debugging of parallel applications: TotalView, P2D2, PGI debuggers

Software
SCore Cluster System Software is a high-performance parallel programming environment for workstation and PC clusters.

SCore
– Heterogeneous programming language
– Multiple programming paradigms
– Parallel programming support:
  – Real-time process activity monitor
  – Deadlock detection
  – Automatic debugger attachment

SCore
– Fault tolerance:
  – Preemptive checkpoint
  – Parallel process migration
– Flexible job scheduling:
  – Gang scheduling
  – Batch scheduling

Usage
– Reactive flows
– Optimization problems
– Technical simulations
– Image processing
– Bio-computing / bioinformatics