2015-06-02 Jaeyoung Choi School of Computing, Soongsil University 1-1, Sangdo-Dong, Dongjak-Ku Seoul 156-743, Korea {heaven, psiver,

1 2015-06-02
Jiyeon Kim, Yongkwan Park, Sungjoo Kwon, Jaeyoung Choi
School of Computing, Soongsil University
1-1, Sangdo-Dong, Dongjak-Ku, Seoul 156-743, Korea
{heaven, psiver, lithmmon}@ss.ssu.ac.kr, choi@comp.ssu.ac.kr

2 2015-06-02 Soongsil University
Motivation
- Linux cluster systems are widely used for high-performance computing.
  - They emphasize the use of commodity hardware and open source software.
  - They deliver very high performance at extremely low cost.
- System management is a challenging task. It calls for:
  - Automatic and convenient installation of the OS and application software packages
  - An effective way to navigate and interact with cluster components
  - Mechanisms and tools to perform collective commands
  - Services such as monitoring, fault detection, and recovery

3 What is CATS-i?
- Cluster Administration ToolS on the Internet
- A collection of system management tools
  - Provides automatic and convenient installation of the OS and application software packages.
  - Provides efficient monitoring and management of cluster nodes through simple operations over the Internet.
  - Provides an easy-to-use GUI for PBS.
- Distributed as an easy-to-install CATS-i RPM package

4 CATS-i System Architecture
[Diagram components: setup tool, management tool, server daemon, repository, and three client daemons]
- Client daemon: gets system information from the local OS on each node.
- Server daemon: runs on the server node and collects information from the client daemons.
- Setup tool: implemented in Java.
- Management tool: implemented in Java; supports access over the Internet.
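The client-daemon and server-daemon split can be pictured with the following minimal Python sketch. It is not CATS-i code (the slides say the tools are written in Java and do not show the daemons). It assumes the client reads memory figures in the Linux `/proc/meminfo` text format, and every name in it (`parse_meminfo`, `client_report`, `ServerDaemon`) is hypothetical.

```python
"""Illustrative only: a client daemon reports node data, a server daemon collects it."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: kilobytes}."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def client_report(node_name, meminfo_text):
    """Build the report a client daemon might send (hypothetical format)."""
    mem = parse_meminfo(meminfo_text)
    total = mem.get("MemTotal", 0)
    free = mem.get("MemFree", 0)
    return {"node": node_name, "mem_total_kb": total, "mem_used_kb": total - free}


class ServerDaemon:
    """Keeps the latest report per node; a dict stands in for the repository."""

    def __init__(self):
        self.latest = {}

    def collect(self, report):
        self.latest[report["node"]] = report

    def total_used_kb(self):
        return sum(r["mem_used_kb"] for r in self.latest.values())


# Sample text in /proc/meminfo format, so the sketch also runs off Linux.
SAMPLE = "MemTotal:  1024000 kB\nMemFree:    256000 kB\n"

server = ServerDaemon()
server.collect(client_report("node1", SAMPLE))
server.collect(client_report("node2", SAMPLE))
print(server.total_used_kb())
```

On a real node the client would read the text from `/proc/meminfo` and send the report over the network; both steps are left out here.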

5 Difference with CATS-i
- NodeCloner (CACR at Caltech)
  - Makes all nodes identical using BOOTP and NFS.
  - Does not provide a GUI; the user must edit the setup files related to NodeCloner.
- Beoboot (Rembo Technology SaRL, Switzerland)
  - Boot-ROM booting using DHCP.
  - Uses a batch file interpreter.
  - Drawback: the user must write the batch file, and the interface is difficult.

6 Difference with CATS-i
- LUI (Linux Utility for cluster Installation), IBM
  - Supports the BOOTP protocol and uses DHCP and PXE.
  - GUI interface.
  - Supports heterogeneous clusters.
  - Resource objects must be defined.
  - Uses TFTP, so the I/O load increases as the number of nodes increases.

7 Installation using IP Multicasting
- Provides the same installation speed and reduces I/O load.
- A multicast client module is distributed automatically through NFS.
- The server sends the slave node disk image to a class D IP address.
- Timeout and retransmission make up for the unreliability of UDP.
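The slides do not describe the retransmission protocol itself, so the Python sketch below only illustrates the general idea under stated assumptions. The image is split into numbered blocks, and after each send round (standing in for a timeout) the sender re-sends any block that some receiver still lacks. The network is replaced by an in-memory `lossy_send` callback so the example runs without sockets. `Receiver`, `multicast_with_retransmit`, and `drop_first_round` are hypothetical names.

```python
def split_blocks(image, block_size):
    """Cut the disk image into fixed-size blocks; the list index is the sequence number."""
    return [image[i:i + block_size] for i in range(0, len(image), block_size)]


class Receiver:
    """Stores blocks by sequence number and reports which are still missing."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks

    def deliver(self, seq, payload):
        self.blocks[seq] = payload

    def missing(self):
        return [i for i, b in enumerate(self.blocks) if b is None]

    def image(self):
        return b"".join(self.blocks)


def multicast_with_retransmit(image, block_size, receivers, lossy_send, max_rounds=5):
    """Send all blocks, then re-send whatever any receiver reports missing."""
    blocks = split_blocks(image, block_size)
    pending = set(range(len(blocks)))
    rounds = 0
    while pending and rounds < max_rounds:
        rounds += 1
        for seq in sorted(pending):
            lossy_send(seq, blocks[seq], receivers, rounds)
        # Stand-in for the timeout: collect the blocks still missing anywhere.
        pending = set()
        for receiver in receivers:
            pending.update(receiver.missing())
    return rounds, not pending


def drop_first_round(seq, payload, receivers, round_no):
    """Simulated loss: in round 1, receiver i never gets block i."""
    for idx, receiver in enumerate(receivers):
        if round_no == 1 and seq == idx:
            continue
        receiver.deliver(seq, payload)


image = bytes(range(10))                      # 10 bytes in blocks of 3 gives 4 blocks
receivers = [Receiver(4) for _ in range(3)]
rounds, complete = multicast_with_retransmit(image, 3, receivers, drop_first_round)
print(rounds, complete)
```

Because each block goes out once per round regardless of how many nodes listen, the server's I/O load does not grow with the node count the way per-node unicast transfers would.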

8 Setup tools with IP multicasting
[Diagram: the master node (GUI, multicast server module, error/flow control, node DB, network configuration info) sends over UDP to a class D IP address (224.0.0.0 to 239.255.255.255), reaching Node 1, Node 2, Node 3, ..., Node N]
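The class D range quoted on this slide is the IPv4 block 224.0.0.0/4. As a small aid, the following Python sketch uses the standard `ipaddress` module to check whether an address falls in that range; the helper name `is_class_d` is made up for this example.

```python
import ipaddress

# Class D (IPv4 multicast) is the block 224.0.0.0/4.
CLASS_D = ipaddress.ip_network("224.0.0.0/4")


def is_class_d(addr):
    """Return True if addr is an IPv4 address inside the class D range."""
    return ipaddress.IPv4Address(addr) in CLASS_D


print(CLASS_D.network_address, CLASS_D.broadcast_address)
for candidate in ("223.255.255.255", "224.0.0.0", "239.255.255.255", "240.0.0.0"):
    print(candidate, is_class_d(candidate))
```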

9 Setup tool in CATS-i
- Disk cloning using NFS
  - A slave node must be booted with a DHCP- and NFS-enabled kernel.
  - It boots the same way as a diskless terminal using DHCP.
  - The tool makes a disk image of the slave node, including hard disk information.
  - The slave node disk image is stored on the server disk.

10 OS Setup tools Architecture - Disk cloning
[Diagram between the master node and the slave node. Components shown: server interface, hard disk, image file, low disk input, disk config, boot disk management, DHCP server daemon, backup wizard, boot kernel image, init program, NFS client, client program, lock management. Numbered arrows 1 to 15 connect them.]
- Labeled steps: 1. Start; 2. Command mode change; 3. Booting; 4. IP info; 5. Query; 7. Mode; 8. Operation; 11. Partition info; 15. Result
- Disk cloning preparation: steps 1, 2, 3
- Command operation: steps 4, 5, 6, 7, 8
- Make disk image: steps 9, 10, 11, 12, 13
- Save disk image: steps 14, 15

11 OS Setup tools Architecture - Installation
[Diagram between the master node and the slave node. Components shown: server interface, hard disk, image file, low disk output, disk config, format, boot disk management, DHCP server daemon, restore wizard, boot kernel image, init program, multicast client, sender, client program, lock management. Numbered arrows 1 to 15 connect them.]
- Labeled steps: 1. Start; 2. Command mode change; 3. Booting; 4. IP info; 5. Query; 7. Mode; 8. Operation; 8. Connect; 9. Start command; 15. Result
- Installation preparation: steps 1, 2, 3
- Command operation: steps 4, 5, 6, 7, 8, 9
- Installation: steps 10, 11, 12, 13, 14

12 OS Setup tools
[Figure labels: Slave Node, Master Node]

13 Related works for CMS - VACM
- VA Linux Systems
- A cluster administration tool that runs on VA Linux.
- Reports real-time hardware sensor data such as temperature, fan speed, and voltage.

14 Related works for CMS - MAT
- Ryerson University, Canada
- Implemented with Tcl/Tk.
- Incurs a lot of overhead when displaying rapidly changing data.
- Manages each node individually.
- Mainly monitors system files.

15 Related works for CMS - SCMS
- Kasetsart University
- Consists of a real-time monitoring system, parallel Unix commands, and numerous system administration utilities.
- Supports a Java applet that reports real-time system information.
- Supports a 3D interface using VRML.

16 Related works for CMS – M3C
- Oak Ridge National Lab
- Implemented in Java.
- Users can manage multiple cluster groups in one interface.
- Supports job scheduling and software installation.

17 Management tools in CATS-i
- The management tool offers maintenance of cluster nodes.
- Characteristics of the management tool
  - Many nodes can be bound into one cluster group, and multiple cluster groups can be managed in one place.
  - The same operation can be applied efficiently to all or selected nodes.
  - It offers users real-time monitoring of resource information such as CPU and memory.
  - The console, implemented in Java, is interactive and easy to use.
  - Job scheduling is available through JPBS over the Internet.
- CATS-i offers many resource-related functions.
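Applying one operation to all or selected nodes of a cluster group can be sketched as below. This is an illustration in Python, not the CATS-i implementation. The actual transport to each node is abstracted behind a caller-supplied `runner` function, and `select_nodes`, `run_on_nodes`, and `fake_runner` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def select_nodes(groups, group, only=None):
    """Return the nodes of one cluster group, optionally narrowed to a chosen subset."""
    nodes = list(groups[group])
    if only is not None:
        nodes = [n for n in nodes if n in only]
    return nodes


def _safe_run(runner, node, command):
    """Run the command on one node and capture failures instead of raising."""
    try:
        return (True, runner(node, command))
    except Exception as exc:  # one unreachable node must not abort the rest
        return (False, str(exc))


def run_on_nodes(nodes, command, runner, max_workers=8):
    """Apply the same command to every node concurrently; returns {node: (ok, output)}."""
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda node: _safe_run(runner, node, command), nodes)
        return dict(zip(nodes, results))


def fake_runner(node, command):
    """Stand-in for a real remote call; node3 is treated as down."""
    if node == "node3":
        raise ConnectionError("unreachable")
    return f"{node}: ran {command}"


groups = {"groupA": ["node1", "node2", "node3"], "groupB": ["node4"]}
selected = select_nodes(groups, "groupA", only={"node1", "node3"})
results = run_on_nodes(selected, "uptime", fake_runner)
print(results)
```

Collecting a per-node success flag rather than stopping at the first error lets an administrator see which nodes of the group need attention.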

18 CATS-i functions
- Node status: CPU, memory, process, user list, account
- Disk space
- File management
- Alarm
- System log
- Shutdown/Reboot
- Package management
- JPBS

19 Management tools – Node status total view
- Shows node information for each group.
- Shows real-time information about CPU and memory.

20 Management tools – Node status basic info
- Enables users to monitor resource information of cluster nodes, such as CPU, memory, account, and user, with real-time CPU and memory monitoring and process monitoring and management.
[Figure label: Performance]

21 [Figure labels: User List, Disk, Account, Process]

22 Management tools – file management
- Provides file management functions for a cluster group.
- Very easy to use: to perform file-related jobs, users just right-click to show a pop-up menu.
[Figure label: File Management]

23 Management tools – alarm function
- Monitors important system parameters: processor utilization, memory usage, etc.
- Notification is sent through the system's e-mail function.
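A threshold-based alarm with e-mail notification could look roughly like the Python sketch below. The slides do not say how CATS-i defines or evaluates alarm conditions, so the metric names, the thresholds, and the addresses (which use the reserved `example.com` domain) are all assumptions for illustration. The message is only built here, not sent.

```python
from email.message import EmailMessage


def check_thresholds(node, metrics, thresholds):
    """Return one alert line per metric whose value exceeds its limit."""
    return [
        f"{node}: {name} = {metrics[name]} exceeds {limit}"
        for name, limit in sorted(thresholds.items())
        if name in metrics and metrics[name] > limit
    ]


def build_alarm_mail(alerts, sender, recipient):
    """Wrap alert lines in an e-mail message (delivery is out of scope here)."""
    msg = EmailMessage()
    msg["Subject"] = f"Cluster alarm: {len(alerts)} condition(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(alerts))
    return msg


# Hypothetical readings and limits, in percent.
metrics = {"cpu_percent": 97.0, "mem_percent": 40.0}
thresholds = {"cpu_percent": 90.0, "mem_percent": 80.0}

alerts = check_thresholds("node1", metrics, thresholds)
mail = build_alarm_mail(alerts, "cats-i@example.com", "admin@example.com")
print(mail["Subject"])
print(mail.get_content())
```

Sending the message would typically go through `smtplib.SMTP.send_message` against the site's mail server, which is omitted because no server details appear in the slides.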

24 Management tools – system log
- Log information is very useful in various situations.
- The server daemon collects log information from each node.
[Figure label: Log Tree]
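The "Log Tree" view suggests that collected log lines are organized hierarchically. The slides do not give the actual structure, so the Python sketch below assumes a simple two-level tree, node then log source, purely for illustration. `build_log_tree` and the sample entries are hypothetical.

```python
from collections import defaultdict


def build_log_tree(entries):
    """Group (node, source, line) tuples into {node: {source: [lines]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for node, source, line in entries:
        tree[node][source].append(line)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {node: dict(sources) for node, sources in tree.items()}


# Made-up log lines as the server daemon might receive them from two nodes.
entries = [
    ("node1", "messages", "kernel: eth0 link up"),
    ("node2", "messages", "kernel: eth0 link up"),
    ("node1", "secure", "sshd: session opened"),
    ("node1", "messages", "cron: job started"),
]

tree = build_log_tree(entries)
for node in sorted(tree):
    for source in sorted(tree[node]):
        print(node, source, len(tree[node][source]))
```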

25 Management tools – RPM package
- Users can install, remove, and upgrade application packages with the management tool, and query installed RPMs.
- Supports Red Hat Linux.
- Implemented with a thread library.
[Figure label: Option Dialog]
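The four package operations listed here correspond to the standard `rpm` command-line modes `-i` (install), `-e` (erase), `-U` (upgrade), and `-q` (query). How CATS-i invokes RPM internally is not shown in the slides, so the Python sketch below only builds the argument lists that a tool of this kind might run on each node. `rpm_command`, `plan_for_nodes`, and the package names are hypothetical.

```python
# Map each management action to the rpm mode flag that performs it.
RPM_FLAGS = {"install": "-i", "upgrade": "-U", "remove": "-e", "query": "-q"}


def rpm_command(action, target):
    """Build the rpm argv for one action.

    install/upgrade expect a package file; remove/query expect a package name.
    """
    if action not in RPM_FLAGS:
        raise ValueError(f"unsupported action: {action}")
    return ["rpm", RPM_FLAGS[action], target]


def plan_for_nodes(nodes, action, target):
    """Pair every node with the same command, ready to be dispatched in parallel."""
    command = rpm_command(action, target)
    return {node: list(command) for node in nodes}


print(rpm_command("query", "openssh"))
print(plan_for_nodes(["node1", "node2"], "upgrade", "example-1.0-1.i386.rpm"))
```

Building the commands as lists rather than shell strings avoids quoting problems if they are later passed to `subprocess.run`.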

26 Management tools – PBS Interface
- Enables users to use a general PBS through the same CATS-i interface.
[Figure labels: main screen, JPBS job Submission Dialog]

27 Conclusion & Future works
CATS-i will offer more functions, such as:
- Status of CPU temperature, voltage, and speed
- Extended aggregation of services
  - Statistical memory and CPU information for each user
  - Statistical information can be displayed graphically
- Network monitoring using SNMP and network analysis
  - Detect network bottlenecks of clusters
- Enhanced alarm services
  - The administrator can specify the condition that triggers an alarm and the action to be taken
  - In an emergency, CATS-i can shut down or reboot cluster nodes

