
1 ROCKS & The CASCI Cluster By Rick Bohn

2 What's a Cluster? "Cluster" is a widely used term for independent computers combined into a unified system through software and networking. At the most fundamental level, when two or more computers are used together to solve a problem, they are considered a cluster.

3 Beowulf Cluster? Beowulf clusters are scalable performance clusters built from commodity hardware on a private system network, with an open-source software (Linux) infrastructure. The designer can improve performance proportionally by adding machines. The commodity hardware can be any of a number of mass-market, stand-alone compute nodes: a system as simple as two networked computers, each running Linux and sharing a file system, or as complex as 1024 nodes connected by a high-speed, low-latency network.

4 High Performance or High Throughput? The key questions are granularity and the degree of parallelism: have you got one big problem or a bunch of little ones? To what extent can the problem be decomposed into more-or-less independent parts (grains) that can all be processed in parallel? –Fine-grained parallelism: the independent pieces are small and need to exchange information and synchronize often. –Coarse-grained parallelism: the problem can be decomposed into large chunks that can be processed independently.

5 HPC versus HTC Fine-grained problems need a high-performance (HPC) system –that enables rapid synchronization between the pieces processed in parallel –and runs the pieces that are difficult to parallelize as fast as possible. Coarse-grained problems can use a high-throughput (HTC) system –which maximizes the number of parts processed per minute. HPC systems use a smaller number of more expensive processors, expensively interconnected and highly reliable. HTC systems use a large number of inexpensive processors, inexpensively interconnected.

6 Other Types of Clusters 1. Highly Available (HA) clusters – generally a small number of nodes, redundant components, multiple communication paths. 2. Visualization clusters – each node drives a display; OpenGL machines.

7 Cluster Architecture [Diagram: the frontend node bridges the public Ethernet and the private Ethernet; the compute nodes sit on the private Ethernet, with an optional application network linking the nodes.]

8 So What's a Grid? The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid. Today there are many definitions of Grid computing. IBM defines Grid computing as "the ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet." Another common definition: "A Grid is a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across 'multiple' administrative domains based on their (the resources') availability, capacity, performance, cost and users' quality-of-service requirements." Grids can be categorized with a three-stage model of departmental Grids, enterprise Grids and global Grids.

9 NYSGrid Status

10 Things to Consider Clusters are phenomenal price/performance computational engines. However: they can be hard to manage without experience; high-performance I/O is still evolving; the effort to find where something has failed grows at least linearly with cluster size; they are not cost-effective if every cluster "burns" a person just for care and feeding; the programming environment could be vastly improved; and the technology is changing very rapidly, with scaling up becoming commonplace.

11 CASCI Cluster Center for Advancing the Study of Cyberinfrastructure (CASCI) Guy Johnson, Director

12 CASCI Cluster Hardware Head Node (1): IBM xSeries 345, 1 GB RAM, 2× Pentium 4 2.0 GHz, 6× 36 GB hard drives (internal RAID 5), 2 Gigabit Ethernet ports. Compute Nodes (47): IBM xSeries 330, 512 MB RAM, 2× Pentium 3 1.4 GHz, 1× 36 GB hard drive, 1 Gigabit Ethernet port.

13 NYSGrid Cluster Hardware Head Node (1): IBM xSeries 330, 768 MB RAM, 2× Pentium 3 1.4 GHz, 1× 36 GB hard drive, 2 Fast Ethernet ports. Compute Nodes (4): IBM xSeries 330, 512 MB RAM, 2× Pentium 3 1.4 GHz, 1× 36 GB hard drive, 1 Fast Ethernet port. An experimental global grid cluster connected to other universities within New York State.

14 CASCI Cluster Network The local network (eth0) is Gigabit Ethernet using an Extreme Networks 6808 gigabit switch.

15 CASCI Cluster Images

16 The Great Wall of Cluster! Cluster courtesy of Paul Mezzanini; located behind the CASCI Cluster racks.

17 ROCKS Clustering Software

18 ROCKS Collaborators San Diego Supercomputer Center, UCSD; Scalable Systems Pte Ltd, Singapore; High Performance Computing Group, University of Tromso; The Open Scalable Cluster Environment, Kasetsart University, Thailand; Flow Physics and Computation Division, Stanford University; Sun Microsystems; Advanced Micro Devices.

19 ROCKS Cluster Software Goal: Make Clusters Easy! 1. Easy to deploy, manage, upgrade and scale. 2. Help deliver the computational power of clusters to a wide range of scientific users. Making stable and manageable parallel computing platforms available to a wide range of scientists will aid immensely in improving the state of the art in parallel tools.

20 Supported Platforms ROCKS is built on top of RedHat Linux releases (CentOS) and supports all the hardware components that RedHat supports, but only the x86, x86_64 and IA-64 architectures. Processors: x86 (ia32, AMD Athlon, etc.), x86_64 (AMD Opteron and EM64T), IA-64 (Itanium). Networks: Ethernet (all flavors that RedHat supports, including Intel Gigabit Ethernet), Myrinet (provided by Myricom), Infiniband (provided by Voltaire).

21 Minimum Hardware Requirements Frontend Node: disk capacity 20 GB; memory capacity 512 MB (i386) or 1 GB (x86_64); Ethernet: 2 physical ports (e.g., "eth0" and "eth1"). Compute Node: disk capacity 20 GB; memory capacity 512 MB; Ethernet: 1 physical port (e.g., "eth0").

22 ROCKS Distribution The ROCKS software is bundled into various packages called “Rolls” and put on CDs. Rolls are specially compiled to fit into the ROCKS installation methodology. Rolls are classified as either mandatory or optional. Rolls cannot be installed after the initial installation.

23 ROCKS Base Rolls The minimum requirement to bring up a frontend is the following Rolls: Kernel/Boot Roll; Core Roll (Base, HPC, Web-server) OR the separate Base, HPC & Web-server Rolls; Service Pack Roll; OS Roll - Disk 1; OS Roll - Disk 2.

24 ROCKS Optional Rolls The optional Rolls are: –Core Roll: Area 51 (chkrootkit and tripwire), Ganglia (system monitoring software), Grid (software for connecting clusters), Java (Sun Java SDK and JVM), SGE (Sun Grid Engine scheduler) –Bio (bioinformatics utilities, release 4.2) –Condor (high-throughput computing tools) –PBS (portable batch scheduling software) –PVFS2 (parallel virtual file system version 2) –VIZ (visualization software) –Voltaire (InfiniBand support for Voltaire IB hardware)

25 ROCKS Software Stack

26 The Head Node Users log in, submit jobs, compile code, etc. Uses two Ethernet interfaces – one public, one private for the compute nodes. Normally has lots of disk space (system partitions < 14 GB). Provides many system services –NFS, DHCP, DNS, MySQL, HTTP, 411, firewall, etc. –and holds the cluster configuration.
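As a quick illustration (not from the slides), a few commands that can confirm these services on a RedHat/CentOS-style frontend; run as root, and note that exact service names can vary by release:

    showmount -e localhost                               # NFS exports (home directories)
    chkconfig --list | egrep 'dhcpd|named|httpd|mysqld'  # DHCP, DNS, web server, MySQL
    iptables -L -n | head                                # installed firewall rules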

27 Compute Nodes Basic compute workhorse; lots of memory (if lucky); minimal storage requirements; a single Ethernet connection to the private LAN; a disposable OS easily re-installed from the head node. Nodes can be heterogeneous.

28 NFS in ROCKS User accounts are served over NFS –Works for small clusters (< 128 nodes) –Will not work for large clusters (>1024) –NAS tends to work better Applications are not served over NFS –/usr/local does not exist –All software is installed locally (/opt)
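A small sketch of what this looks like from a compute node's shell (an illustration, not from the slides; exact mount paths vary by configuration):

    mount -t nfs     # home directories are NFS mounts served by the frontend
    df -h /opt       # application software sits on the node's local disk under /opt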

29 411 Secure Information Service Provides NIS-like functionality Securely distributes password files, user and group configuration files and the like using Public Key Cryptography to protect file content. Uses HTTP to distribute the files Scalable, secure and low latency

30 411 Architecture 1.Client nodes listen on the IP broadcast address for “411 alert” messages from the head node. 2.Nodes then pull the file from the head node via HTTP after some delay to avoid flooding the master with requests.

31 As Simple as 411 To make changes to the 411 system you simply use "make" and the 411 "Makefile", similar to NIS. To publish 411 changes, run 411put on the head node. To retrieve 411 changes, run 411get on a compute node, or run cluster-fork 411get --all from the head node.
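Put together, the workflow might look like the following sketch (the /var/411 Makefile location and the example file name are assumptions based on a default Rocks install):

    # On the head node: after changing a managed file (e.g. adding a user),
    # rebuild and publish the 411 files via the 411 Makefile
    cd /var/411 && make

    # or push one file explicitly
    411put /etc/passwd

    # On a compute node: pull all registered 411 files immediately
    411get --all

    # Or, from the head node, make every compute node pull at once
    cluster-fork "411get --all"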

32 Ganglia Monitoring Ganglia is a scalable distributed monitoring system for high- performance computing systems such as clusters and grids It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. Provides a heartbeat to determine compute node availability.
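For a quick look without the web front end, the gmond daemon's raw XML can be read over TCP; a sketch assuming the default gmond XML port (8649) and the usual Rocks node naming (compute-0-0):

    nc compute-0-0 8649 | grep -c "<HOST "   # count the hosts gmond currently reports
    nc compute-0-0 8649 > ganglia_state.xml  # or save the full XML snapshot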

33 Cluster Status with Ganglia

34 Security Tools Tripwire runs every day and emails the results. Chkrootkit is available and is run manually. Iptables is used as the firewall; only trusted networks are allowed access.
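As a purely hypothetical illustration of the "trusted networks only" idea (Rocks generates its own iptables rules; the eth1 public interface and the 192.0.2.0/24 subnet below are stand-ins, not the cluster's real configuration):

    # allow SSH from the trusted campus subnet on the public interface, drop everyone else
    iptables -A INPUT -i eth1 -s 192.0.2.0/24 -p tcp --dport 22 -j ACCEPT
    iptables -A INPUT -i eth1 -p tcp --dport 22 -j DROP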

35 Job Management It is not recommended to run jobs directly! –Jobs can hog the cluster/nodes –There is no accountability Use the installed job scheduler instead: –You can submit multiple jobs and have them queued (and go home!) –Fair share: allow other people to use the cluster too –Accountability [Screenshot: CASCI Cluster users without job management]

36 Scheduling Systems Sun Grid Engine (default scheduler) –Rapidly becoming the new standard –Integrated into Rocks by Scalable Systems –Now the default scheduler for Rocks –Robust, dynamic and heterogeneous –Currently using 6.0 Portable Batch System (Torque) and Maui –Long-time standard for HPC queuing systems –Maui provides backfilling for high throughput –The PBS/Maui system can be fragile and unstable –Multiple code bases: PBS, OpenPBS, etc. Condor – high-throughput computing (currently under evaluation)

37 Sun Grid Engine (SGE) SGE is resource management software –It accepts jobs submitted by users –It schedules them for execution on appropriate systems based on resource management policies –You can submit hundreds of jobs without worrying about where they will run –It supports serial as well as parallel jobs
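A minimal sketch of what job submission looks like under SGE (the script below is a made-up example; queue and parallel-environment setup are site-specific):

    #!/bin/bash
    # sleep_test.sh -- trivial example batch job
    #$ -N sleep_test     # job name
    #$ -cwd              # run in the directory the job was submitted from
    #$ -j y              # merge stdout and stderr into one output file
    echo "Running on $(hostname)"
    sleep 60

It would be submitted with qsub sleep_test.sh, monitored with qstat, and removed with qdel followed by the job id.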

38 Sun Grid Engine Versions SGE Standard Edition –for a single Linux cluster SGE Enterprise Edition –when you want to aggregate a few clusters together and manage them as one resource –when you want sophisticated policy management: user/project shares, deadlines, and user-, department- and project-level policies Rocks comes standard with SGE Enterprise 6.0

39 Cluster Web Site (http://cluster.rit.edu)

40 Requesting an Account

41 Accessing the Cluster Access the cluster via an SSH client: PuTTY, SSH Secure Shell, X-Win32 or F-Secure. To transfer data to the cluster, use either scp or sftp. Windows users can download and use WinSCP (http://winscp.net)
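From a Linux or Mac command line this boils down to the following (the account name abc1234 is a placeholder, and the login host is assumed to be the same cluster.rit.edu address as the cluster web site):

    ssh abc1234@cluster.rit.edu                       # interactive login
    scp mydata.tar.gz abc1234@cluster.rit.edu:~/      # copy a file to your home directory
    sftp abc1234@cluster.rit.edu                      # interactive file transfer session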

42 Available Applications BLAST (basic local alignment search tool for bio research) ENVI / IDL Data Visualization Software GCC (C, C++, Fortran programming) Mathematica (licensing limitations) Matlab (licensing limitations) mpiBLAST (parallel version of BLAST) MPICH (MPI parallel programming)
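For example, an MPI program would typically be built and test-run with the MPICH wrappers shown below (hello_mpi.c is a hypothetical source file; production runs should go through the SGE scheduler rather than a bare mpirun):

    mpicc -O2 -o hello_mpi hello_mpi.c   # compile with MPICH's compiler wrapper
    mpirun -np 4 ./hello_mpi             # quick test on 4 processes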

43 Other Alternatives to ROCKS Clustering software: Perceus / Warewulf (www.warewulf-cluster.org), openMosix Project (openmosix.sourceforge.net), Score Cluster System (www.pcluster.org), OSCAR (oscar.openclustergroup.org). System imaging / configuration software: System Imager (wiki.systemimager.org), Cfengine (www.cfengine.org), LCFG (www.lcfg.org).

44 THANK YOU A Bad to the Bohn Production

