Cluster Building and Design
Vikas Singhal, VECC, Kolkata, India
February 9, 2006

Outline

- General view of HPC
- Clustering concept
- Requirement for clustering
- Quattor description
- Working of Condor
- Glimpse of Ganglia
- Current status of our cluster

High Performance Computing

The branch of computing that deals with extremely powerful computers and the applications that use them. High computing power is required for data-intensive or compute-intensive applications, as the requirement dictates. A supercomputer is one answer for HPC: it is characterized by very high speed and very large memory, with speed measured in numbers of flops. The fastest computer in the world is BlueGene/L (made by IBM), at 280 Tflops.

Technologies for HPC

Traditional: build faster CPUs
- Special electronic technology for increasing clock speed
- Advanced CPU architecture (pipelining, vector processing, multiple functional units, etc.)
- E.g. CRAY: very high clock speed, hence very high heat dissipation; advanced cooling techniques (liquid Freon / liquid nitrogen) required
- Expensive, but easy for the user: no special programming required

Parallel processing: harness a large number of ordinary CPUs and divide the job between them
- Large number of conventional CPUs, interconnected through a network
- Cost effective
- Program writing is difficult: the job has to be split into independently executable units

Why Clustering

For high-performance and high-availability computing, making a cluster of computers is one of the best solutions:
- Lower-cost technology than a supercomputer
- Faster than a supercomputer of the same hardware cost
- No technical or technological limitations
- Scalable and simple

High Computing Power by Clustering of Computers

Application: compute-intensive tasks
- Main aim is High Performance Computing (HPC). Most of the TOP500 computers are built by clustering; BlueGene/L has approximately 131,000 processors.
- Single user and a single number-crunching problem.
- Communication between nodes should be much faster, so a high-end (costly) network card is required.
- Programs should be written in a parallel language or parallel environment:
  - Parallel languages: LINDA, OCCAM, etc.
  - Parallel extensions to serial languages: High Performance Fortran (HPF)
  - Parallel APIs: OpenMP, MPI
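The divide-the-job idea behind these parallel APIs can be sketched without a real cluster using Python's standard multiprocessing module (a stand-in for MPI here; the worker function and the summation job are illustrative, not from the slides):

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Each worker independently sums its own slice of the range."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 1_000_000, 4
    step = n // workers
    # Split the job into independently executable units, one per CPU.
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # equals sum(range(1_000_000)) = 499999500000
```

The hard part the slide alludes to is exactly this decomposition step: the chunks must be computable independently, with results combined only at the end.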

High Computing Power by Clustering of Computers

Application: data-intensive tasks
- Main aim is not High Performance Computing (HPC) but High Availability.
- Multi-user, multi-job system.
- It is part of a global Grid such as the EDG, so security is a main concern.
- 7 collaborating institutes, more than 100 users (see Mr. S. K. Pal's talk).
- High-bandwidth Internet connectivity is required; we have installed a 4-Mbps leased line (1:4).

How to Build a Cluster to Our Requirement

Hardware (purchase according to requirement and budget):
- Processors
- Memory (RAM)
- Storage
- No need to purchase a high-end network card

Software (chosen according to requirement; the software area is very big, and open-source options are available):
- Cluster building software
- Cluster monitoring software
- Job scheduling software
- User management software

Procurement of Hardware and Software

Procurement of the full cluster is not done at once; it is a step-by-step process. Different hardware supports different software, so both hardware and software are procured against our specific requirements.

Present Status of the Tier2-Kol Cluster, Based on High Availability (diagram): a DMZ behind a gigabit switch; management nodes (HP ProLiant DL360 G3, dual Xeon 2.4 GHz CPUs, x.x on standby); computing nodes; 4-Mbps (1:4) link.

High Availability

Data-intensive and real-time task-critical systems require high availability, and high availability means redundancy: eliminating every single point of failure. Each server has two NICs (eth0 and eth1) connected to two gigabit switches, based on the Linux bonding concept.
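On Red Hat style systems of this era, channel bonding of eth0 and eth1 is typically configured along these lines (a hedged sketch; the bond mode, address, and file locations are illustrative assumptions, not taken from the slides):

```
# /etc/modprobe.conf (or /etc/modules.conf on older kernels)
alias bond0 bonding
options bond0 mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is analogous)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

With active-backup mode, traffic fails over to the second NIC (and hence the second switch) when the link monitor (miimon) detects a failure, which is exactly the single-point-of-failure elimination the slide describes.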

Redundancy (continued)

Two hard disks, each a mirror of the other and both hot-swappable, implemented with a hardware RAID mirroring (RAID-1) technique and kept synchronized at millisecond granularity. We are also trying to make a mirror of the management node itself, using rsync.

Software Requirements for Making a Cluster

Software options for cluster building:
- OSCAR: free, but harnessing of client nodes is limited
- SCALI: not free software; paid for along with network cards (as at IMSc)
- Red Hat Cluster Suite: not very suitable
- CPM (Central Processor Manager): IBM proprietary
- Rocks: not free software
- Quattor: free and best suited

To decide which one is truly "best" for a given requirement, one has to gain experience with all of them.

Installing a Quattor Server and Clients

Quattor is a large-scale management system for medium to very large (>1000-node) clusters, and an administration toolkit for optimizing resources. No specific hardware or software is required to build a Quattor cluster.

Requirements:
- OS: it supports SLC or Red Hat Linux 7.3
- Disk: 6.5 GB for the server, 2.5 GB per client OS

Three sets of Quattor RPMs are available:
1. i386: for any Pentium or Xeon processor, or anything with the IA-32 instruction set
2. ia64: for 64-bit machines, i.e. Intel Itanium
3. x86_64: for 64-bit machines that also support the x86 instruction set, such as the AMD Opteron

Quattor Components

- CDB (Configuration Database): a hierarchical, template-based structure; makes one common structure for different databases; contains cluster descriptions, networking parameters, etc.
- NCM (Node Configuration Manager): for system configuration; a framework in which service-specific plug-ins (components) make the necessary system changes.
- SPMA (Software Package Manager Agent): for software deployment; manages installation of the different software packages, handles multiple package formats, and manages the Software Repository (SWRep).
- AII (Automated Installation Infrastructure): works on top of the native RH/SL installer (Anaconda/Kickstart) using PXE, with a DHCP server (IP address + kernel location), a TFTP server (boot kernel), and an HTTP server (OS images + packages).
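CDB configurations are written in Quattor's pan template language; a minimal sketch of a node profile might look like this (the template name, paths, and values are illustrative assumptions, not taken from the slides):

```
object template node001;

# basic node description stored in the CDB
"/system/network/hostname" = "node001";
"/system/network/interfaces/eth0/ip" = "192.168.1.101";

# pull in a shared, cluster-wide template
include common_cluster_packages;
```

The point of the template hierarchy is that per-node profiles stay tiny while everything common lives in shared templates, which NCM components and the SPMA then turn into actual system state.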

Basic Requirements for Installing a Cluster Site

- Cluster building: Quattor
- Job scheduling: Condor

Some basic steps after the Quattor installation:
- C3 commands and the bonding package, for high availability (if dual NICs)
- LDAP (Lightweight Directory Access Protocol) software
- Firewall (define firewall rules)

Condor is a specialized workload management system. It provides a job queuing mechanism, scheduling policy, resource monitoring, and resource management, and it can checkpoint a job and migrate it to a different machine.
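On these Red Hat style nodes the firewall rules would typically live in /etc/sysconfig/iptables; a hedged sketch (the allowed subnet and port choices are illustrative assumptions, not from the slides):

```
# /etc/sysconfig/iptables (applied by "service iptables restart")
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# allow loopback and already-established connections
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# allow SSH and Condor traffic from the cluster's private subnet only
-A INPUT -s 192.168.1.0/24 -p tcp --dport 22 -j ACCEPT
-A INPUT -s 192.168.1.0/24 -p tcp --dport 9614:9618 -j ACCEPT
COMMIT
```

A default-deny INPUT policy with explicit per-subnet exceptions matches the slide's point that security is a main concern once the cluster joins a wider Grid.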

Condor Daemons (diagram on the original slide)

Job Submission Steps (diagram on the original slide)

Condor Commands

- condor_compile: re-links source or object files with the Condor libraries, which provide checkpointing, migration, and remote system calls.
- condor_submit: takes a submit description file as input and produces a job ClassAd for further processing by the central manager.
- condor_status: view the various machines in the Condor pool.
- condor_q: view job status.

Submit Description Files

A submit description file directs the queuing of jobs. It contains:
- the executable location
- command-line arguments to the job
- stdin, stderr, stdout
- the initial working directory
- should_transfer_files = NO disables the Condor file transfer mechanism
- when_to_transfer_output =
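Putting those pieces together, a minimal submit description file might look like this (the job name, file names, and the choice of the vanilla universe are illustrative assumptions):

```
# submit with: condor_submit myprog.sub
universe   = vanilla
executable = myprog
arguments  = -n 100
input      = in.dat
output     = myprog.out
error      = myprog.err
log        = myprog.log
should_transfer_files = NO
queue
```

The final "queue" line actually places the job in the queue; repeating it (or "queue N") submits multiple instances from one file.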

Cluster Monitoring & Job Throwing: Ganglia

Ganglia is a scalable distributed monitoring system for high-performance computing systems. It relies on a multicast-based listen/announce protocol to monitor state, with very low per-node overhead and high concurrency. It uses XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

Cluster Monitoring & Job Throwing: Ganglia (continued)

- Ganglia Monitoring Daemon (gmond): a multi-threaded daemon that runs on each cluster node we want to monitor.
- Ganglia Meta Daemon (gmetad): started on the management node only.
- Ganglia PHP web front-end: displays the Ganglia data in a meaningful way.

A new era of Internet use has started. We had used the Internet/Web as an information and knowledge base; now we can use HTTP for computing as well: open a page, select an executable file, and submit it, and the file will execute on a cluster client node.
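For reference, a gmond configuration in the Ganglia 3.x style looks roughly like this (the cluster name is an assumption, and the multicast group shown is Ganglia's default; older 2.x releases used a flat key = value format instead):

```
/* /etc/gmond.conf (Ganglia 3.x syntax) */
cluster {
  name  = "Tier2-Kol"       /* assumed cluster name */
  owner = "VECC"
}
udp_send_channel {
  mcast_join = 239.2.11.71  /* default multicast group */
  port       = 8649
}
udp_recv_channel {
  mcast_join = 239.2.11.71
  port       = 8649
}
```

Because every gmond both sends and listens on the same multicast channel, each node holds the whole cluster's state, and gmetad only needs to poll one of them.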

Cluster to Grid

To become part of a global Grid, with EDG Grid connectivity: AliEn, EGEE, gLite, LCG-2. Global monitoring: MonALISA, Lemon.

VECC Cluster Machine Status

Interactive node: at this time we have only one interactive node (log in with: ssh interactive001); we will procure more in the near future.

Computing nodes: six computing nodes (node001 to node006). One cannot log in to these nodes, only compute jobs on them: they are used in batch mode for computing, not in interactive mode.

Where We Land Up Now

PC – Post Card
PC – Personal Computer
PC – Packed Cluster

Future Work

C++ and MPI (Message Passing Interface) will be the future for clusters. For optimum use of the cluster, users have to learn MPI.

Questions??