Cluster Building and Design. Vikas Singhal, VECC, Kolkata, India. February 9, 2006.

1 Cluster Building and Design. Vikas Singhal, VECC, Kolkata, India.

2 Cluster Building and Design
- General view of HPC
- Clustering concept
- Requirements for clustering
- Quattor description
- Working of Condor
- Glimpse of Ganglia
- Current status of our cluster

3 High Performance Computing
- A branch of computing that deals with extremely powerful computers and the applications that use them.
- High computing power is required for data-intensive or compute-intensive applications, depending on the requirement.
- E.g. a supercomputer is one answer for HPC. A supercomputer is characterized by very high speed and very large memory.
- Speed is measured in flops (floating-point operations per second).
- The fastest computer in the world is BlueGene/L (built by IBM), at 280 Tflops.
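
As a rough illustration of measuring speed in flops: theoretical peak performance is commonly estimated as nodes × CPUs per node × clock rate × floating-point operations per cycle. The Python sketch below shows that arithmetic. The function name and the sample values (6 nodes, 2 CPUs each, 2.4 GHz, 2 flops per cycle) are illustrative assumptions, not measured figures for any machine in this talk.

```python
def peak_gflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in Gflops; sustained performance is normally lower."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle


# Illustrative values only (assumed, not measured).
estimate = peak_gflops(nodes=6, cpus_per_node=2, clock_ghz=2.4, flops_per_cycle=2)
print(f"Estimated peak: {estimate:.1f} Gflops")
```

A figure like this is only an upper bound; benchmark results are the usual basis for comparing machines.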

4 Technologies for HPC
Traditional: build faster CPUs
- Special electronic technology for increasing clock speed
- Advanced CPU architecture (pipelining, vector processing, multiple functional units, etc.)
- E.g. CRAY
- Very high clock speed
- Very high heat dissipation
- Advanced cooling techniques required (liquid Freon / liquid nitrogen)
- Expensive
- But easy for the user: no special programming required
Parallel processing: harness a large number of ordinary CPUs and divide the job between them
- Large number of conventional CPUs interconnected through a network
- Cost effective
- Program writing is difficult: the job has to be split into independently executable units
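
To make "split into independently executable units" concrete, here is a minimal Python sketch. It divides a range into chunks, computes each chunk separately, and combines the partial results. The function names are hypothetical, and threads merely stand in for cluster nodes; on a real cluster each unit would typically be a separate batch job or MPI process.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, independent sub-ranges."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        # The first `extra` chunks take one additional element each.
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def sum_of_squares(bounds):
    """One independent work unit: it needs no data from any other unit."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


def parallel_sum_of_squares(start, stop, workers=4):
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = pool.map(sum_of_squares, split_range(start, stop, workers))
        return sum(partial_sums)
```

The key property is that `sum_of_squares` depends only on its own bounds, so the units can run anywhere and in any order.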

5 Why Clustering
For high-performance and high-availability computing, building a cluster of computers is one of the best solutions.
- Lower-cost technology than a supercomputer.
- Faster than a supercomputer of the same hardware cost.
- No technical or technological limitations.
- Scalable and simple.

6 Clustering of Computers for High Computing Power. Application: Compute-Intensive Task
- The main aim is High Performance Computing (HPC). Most of the TOP500 computers are built by clustering; BlueGene/L has approximately 131,000 processors.
- Single user and a single number-crunching problem.
- Communication between nodes must be much faster, so a high-speed (and costly) network card is required.
- Programs should be written in a parallel language or in a parallel environment.
- Parallel languages: Linda, occam, etc.
- Parallel extensions to serial languages: High Performance Fortran (HPF)
- Parallel APIs: OpenMP, MPI

7 Clustering of Computers for High Computing Power. Application: Data-Intensive Task
- The main aim is not High Performance Computing (HPC) but high availability.
- Multi-user, multi-job system.
- It is part of a global Grid such as EDG.
- Security is the main concern.
- 7 collaborating institutes.
- More than 100 users (see Mr. S. K. Pal's talk).
- High-bandwidth Internet connectivity is required. We have installed a 4 Mbps leased line (1:4).

8 How to Build a Cluster for Our Requirement
Hardware: purchase according to requirement and budget.
- Processors
- Memory (RAM)
- Storage
- No need to purchase a high-speed network card
Software: choose according to requirement and open-source availability. The software area is very big.
- Cluster building S/W
- Cluster monitoring S/W
- Job scheduling S/W
- User management S/W

9 Procurement of HARDWARE and Procurement of SOFTWARE
- The full cluster is not procured at once; it is a step-by-step process.
- Different H/W supports different S/W.
- Our specific requirement.

10 Present status of Tier2-Kol Cluster, based on High Availability
[Diagram labels: 4 Mbps (1:4) link; DMZ; Gigabit switch; management nodes (HP ProLiant DL360 G3, dual-CPU Xeon 2.4 GHz), one on stand-by; 125.20.3.11; 192.168.x.x; computing nodes.]

11 High Availability
- Data-intensive and real-time critical systems require high availability.
- High availability → redundancy (eliminate single points of failure).
- Each server has 2 NICs (eth0, eth1).
- 2 Gigabit switches.
- Based on the bonding concept.

12 Redundancy (cont.)
- 2 hard disks; each is a mirror of the other.
- Both are hot-swappable.
- Implemented with hardware RAID-1 (mirroring).
- Both are synchronized every millisecond.
- We are trying to mirror the management node using rsync.
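
As a sketch of how the rsync-based mirroring of the management node might be driven, the Python helper below only builds the command line. The host name `mgmt-standby` and the directory are hypothetical. The flags `-a` (archive mode), `--delete` (remove files that no longer exist on the source) and `--dry-run` are standard rsync options; the talk does not say which options were actually used.

```python
import shlex


def rsync_mirror_command(source_dir, standby_host, dest_dir, dry_run=True):
    """Build an rsync command that mirrors source_dir to the standby host."""
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Report what would change without copying anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents.
    cmd += [source_dir.rstrip("/") + "/", f"{standby_host}:{dest_dir}"]
    return cmd


# Hypothetical host and path, for illustration only.
print(shlex.join(rsync_mirror_command("/etc", "mgmt-standby", "/etc")))
```

Keeping `dry_run=True` as the default makes it harder to delete files on the standby by accident; the resulting list can be passed to `subprocess.run` once the output looks right.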

13 Software Requirement for Making a Cluster
Open Source Software for cluster building:
- OSCAR: free, but harnessing of client nodes is limited
- SCALI: not free S/W; paid, with network cards (as at IMSc)
- Red Hat Cluster Suite: not very suitable
- CPM (Central Processor Manager): IBM proprietary
- Rocks: not free software
- Quattor: free and best suited
To decide which one is "best" for our requirement, one has to gain experience with all of them.

14 Installing a Quattor Server and Client
- Quattor is a large-scale management system for managing medium to very large (>1000 node) clusters.
- Quattor is an administration toolkit for optimizing resources.
- No specific hardware or software is required for building a Quattor cluster.
Requirements:
- It supports SLC or RH Linux 7.3
- Disk: 6.5 GB for the server, 2.5 GB per client OS
- Site address: http://quattor.org
- Package RPMs: http://quattorsw.web.cern.ch/quattorsw/software/quatttor
3 sets of Quattor RPMs are available:
1. i386: for all Pentium or Xeon processors, or any processor with the IA-32 instruction set
2. IA64: for 64-bit machines, meaning Intel Itanium
3. i86x64: for 64-bit machines that also support the x86 instruction set, such as AMD Opteron

15 SPMA: Software Package Manager Agent, for software deployment
- Manages the installation of the different software packages
- Handles multiple package formats
- Manages the Software Repository (SWRep)
CDB: Configuration Data Base
- Hierarchical, template-based structure
- Makes one common structure for different databases
- Contains cluster descriptions, networking parameters, etc.
NCM: Node Configuration Manager, for system configuration
- Framework in which service-specific plug-ins (components) make the necessary system changes
AII: Automated Installation Infrastructure
- Works on top of the native RH/SL installer (Anaconda / KickStart) using PXE
- DHCP server (IP address + kernel location)
- TFTP server (boot kernel)
- HTTP server (OS images + packages)

16 Basic Requirements for Installing a Cluster Site
- Cluster building: Quattor
- Job scheduling: Condor
Some basic steps after Quattor installation:
- C3 commands
- For high availability (if dual NIC): bonding package
- LDAP (Lightweight Directory Access Protocol)
- S/W firewall (make firewall rules)
Condor:
- Specialized workload management system.
- Provides a job queuing mechanism, scheduling policy, resource monitoring, and resource management.
- Can checkpoint a job and migrate it to a different machine.

17 Condor Daemons

18 Job Submission Steps

19 Condor Commands
- condor_compile: re-links source or object files with the Condor libraries. The Condor library provides checkpointing, migration, and remote system calls.
- condor_submit: takes a submit description file as input and produces a job ClassAd for further processing by the central manager.
- condor_status: shows information about the machines in the Condor pool.
- condor_q: shows job status.

20 Submit Description Files
- Direct the queuing of jobs
- Contain:
- Executable location
- Command-line arguments to the job
- stdin, stderr, stdout
- Initial working directory
- should_transfer_files: NO disables the Condor file transfer mechanism
- when_to_transfer_output
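
The items listed on this slide map onto submit-file commands such as `executable`, `arguments`, `input`, `output`, `error`, `initialdir`, `should_transfer_files` and `when_to_transfer_output`. The Python sketch below renders a minimal submit description file from them. The function name, the default file names, and the choice of `YES` / `ON_EXIT` when transfer is enabled are assumptions made for illustration, not settings taken from the talk.

```python
def render_submit_description(executable, arguments="", initialdir=".",
                              stdin=None, stdout="job.out", stderr="job.err",
                              transfer_files=False):
    """Return the text of a minimal Condor submit description file."""
    lines = [f"executable = {executable}"]
    if arguments:
        lines.append(f"arguments = {arguments}")
    if stdin:
        lines.append(f"input = {stdin}")
    lines += [
        f"output = {stdout}",
        f"error = {stderr}",
        f"initialdir = {initialdir}",
    ]
    if transfer_files:
        # Assumed values: transfer files and bring output back when the job exits.
        lines += ["should_transfer_files = YES",
                  "when_to_transfer_output = ON_EXIT"]
    else:
        # NO disables Condor's file transfer mechanism (e.g. shared filesystem).
        lines.append("should_transfer_files = NO")
    lines.append("queue")
    return "\n".join(lines) + "\n"


# Hypothetical executable path, for illustration only.
print(render_submit_description("/home/user/a.out", arguments="10"))
```

The resulting text would be saved to a file and handed to `condor_submit`.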

21 Cluster Monitoring & Job Submission: Ganglia
- Ganglia is a scalable distributed monitoring system for high-performance computing systems.
- It relies on a multicast-based listen/announce protocol to monitor state.
- Very low per-node overheads and high concurrency.
- It uses XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
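
Because Ganglia represents its data as XML, monitoring values can be read with any XML parser. The Python sketch below extracts one metric per host from a small hand-written sample. The sample follows the GANGLIA_XML / CLUSTER / HOST / METRIC layout used by gmond as far as I know it, but the cluster name, host names, addresses and values are invented, and real output carries many more attributes.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample; not captured from a real cluster.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node001" IP="192.168.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node002" IP="192.168.0.2">
      <METRIC NAME="load_one" VAL="1.10" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metric_by_host(xml_text, metric_name):
    """Map host name -> metric value (as a string) for one metric name."""
    root = ET.fromstring(xml_text)
    values = {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == metric_name:
                values[host.get("NAME")] = metric.get("VAL")
    return values


print(metric_by_host(SAMPLE_XML, "load_one"))
```

Values are left as strings because the XML carries the type separately in the `TYPE` attribute.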

22 Cluster Monitoring & Job Submission: Ganglia
Ganglia Monitoring Daemon (gmond)
- gmond is a multi-threaded daemon.
- It runs on each cluster node that we want to monitor.
Ganglia Meta Daemon (gmetad)
- Start it only on the management node.
Ganglia PHP Web Front-end
- Displays Ganglia data in a meaningful way.
A new era of Internet use has started
- So far we have used the Internet / Web as an information / knowledge base.
- Now we can use HTTP for computing as well.
- Open the page, select an executable file, and submit it. The file will execute on a cluster client node.

23 Cluster → Grid
- With EDG Grid connectivity: AliEn, EGEE, gLite, LCG-2 ???
- To become a part of global monitoring: MonALISA, Lemon.

24 VECC Cluster Machine Status
One interactive node: at this time we have only one interactive node; we will procure more in the near future.
#ssh interactive001
Computing nodes: there are 6 computing nodes (node001 to node006). Users cannot log in to these nodes, only run compute jobs on them. They can be used in batch mode, not in interactive mode.

25 Where We Have Ended Up Now
- PC: Post Card
- PC: Personal Computer
- PC: Packed Cluster

26 Future Work
- C++ and MPI (Message Passing Interface) will be the future for clusters.
- For optimum use of the cluster, users have to learn MPI.
Questions?

