1 Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp Parallel Research Group, Computer and Network System Research Laboratory Department of Computer.

1 Putchong Uthayopas, Thara Angsakul, Jullawadee Maneesilp
Parallel Research Group, Computer and Network System Research Laboratory
Department of Computer Engineering, Faculty of Engineering
Kasetsart University, Bangkok, Thailand
Phone: (662) 942 8555 Ext. 1416
Fax: (662) 5614621
Email: pu@smile.cpe.ku.ac.th

2 Motivation
Beowulf clusters have become one of the most widely used platforms for high performance computing
Very large and complex Beowulf clusters are starting to appear
System management is still a challenging task. There is a need for
–An effective way to navigate and interact with cluster components
–Mechanisms and tools to perform collective commands
–Services such as monitoring, fault detection, and recovery
–Special software tools that recognize the special characteristics and needs of the cluster administration task

3 SCMS: An Extensible Cluster Management Tool for Beowulf Cluster
A collection of system management tools for Beowulf clusters
The package includes
–Portable real-time monitoring
–Parallel Unix commands
–Alarm system
–A large collection of graphical user interface tools for users and system administrators: checking user status, remote software installation, system disk space and process space status, booting up and shutting down nodes, changing node configuration remotely
–Web/VRML interface
The current version, 1.1, supports only Red Hat Linux

4 Portable Real-time Monitoring
Provides global access to node information
–Interfaces with the local OS to get node information
–Collects the information at a single point
–Provides heartbeat and node health diagnostics
–Provides an API for applications to access the information. The API is available in C, Java, and TCL/TK.
System architecture
–Client/server
–Layered architecture
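The heartbeat and node health idea on this slide can be illustrated with a minimal Python sketch. It is not the SCMS implementation: the `HeartbeatTable` class, its method names, and the timeout rule (a node is "down" once no heartbeat has arrived within `timeout` seconds) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTable:
    """Hypothetical single collection point for node heartbeats."""
    timeout: float  # seconds of silence after which a node counts as down
    last_seen: dict = field(default_factory=dict)

    def beat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node] = now

    def status(self, now: float) -> dict:
        # Classify every known node by the age of its last heartbeat.
        return {
            node: "up" if now - seen <= self.timeout else "down"
            for node, seen in self.last_seen.items()
        }


table = HeartbeatTable(timeout=5.0)
table.beat("node01", now=0.0)
table.beat("node02", now=8.0)
print(table.status(now=10.0))
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real monitor would presumably read a clock instead.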

5 System Architecture
CMA - Control and Monitoring Agent
–Gets system information from the local operating system on each node
–Portability is achieved using the HAL (Hardware Abstraction Layer)
SMA - System Management Agent
–Runs on the management node to collect information from the CMAs
RMI - Resource Management Interface
–Library that provides an interface to the functionality of the SMA
[Figure labels: Local OS (Linux), HAL, HAL API, CMA, SMA, System Information Repository, Resource Management API (C, TCL, Java), Configuration Management, Task Scheduling, Performance Monitoring, Parallel Unix Command]
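To make the layering concrete, here is a small in-process Python sketch of the three roles. Everything in it is hypothetical: the class and method names, the two metrics, and the `least_loaded_node` query are illustrative assumptions, and the real agents communicate over the network rather than through direct calls.

```python
from abc import ABC, abstractmethod


class HAL(ABC):
    """Hypothetical hardware abstraction layer: one subclass per platform."""

    @abstractmethod
    def load_average(self) -> float: ...

    @abstractmethod
    def free_memory_kb(self) -> int: ...


class StaticHAL(HAL):
    """Stand-in HAL with fixed values; a Linux HAL would query the local OS."""

    def __init__(self, load: float, free_kb: int):
        self._load, self._free_kb = load, free_kb

    def load_average(self) -> float:
        return self._load

    def free_memory_kb(self) -> int:
        return self._free_kb


class CMA:
    """Per-node agent: reads the local system only through the HAL."""

    def __init__(self, node: str, hal: HAL):
        self.node, self.hal = node, hal

    def report(self) -> dict:
        return {"node": self.node,
                "load": self.hal.load_average(),
                "free_kb": self.hal.free_memory_kb()}


class SMA:
    """Management-node agent: gathers CMA reports into one repository."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.repository = {}

    def collect(self) -> None:
        for agent in self.agents:
            record = agent.report()
            self.repository[record["node"]] = record


def least_loaded_node(sma: SMA) -> str:
    """Example of a library-style query built on the repository (assumed)."""
    return min(sma.repository.values(), key=lambda r: r["load"])["node"]


sma = SMA([CMA("node01", StaticHAL(1.5, 64000)),
           CMA("node02", StaticHAL(0.2, 32000))])
sma.collect()
print(least_loaded_node(sma))
```

Because the CMA depends only on the abstract `HAL`, porting to another operating system in this sketch means writing one new subclass and leaving the agents untouched.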

6 Parallel Unix Command
Parallel versions of commonly used Unix commands such as pps, pls, prm
Follows the scalable Unix tool model (Lusk and Gropp 1994)
Graphical user interface for these commands
–Ease of use
–Filtering of output data
[Figure labels: pps -aux, ps -aux, command, data]
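A rough Python sketch of the fan-out pattern behind a command such as pps follows. It is an illustration under assumptions, not how SCMS implements its tools: `parallel_command`, `ssh_runner`, and the `keep` filter are hypothetical names, and `ssh_runner` assumes passwordless ssh access to every node.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(node, command):
    # Assumption: passwordless ssh to each node is configured.
    result = subprocess.run(["ssh", node] + list(command),
                            capture_output=True, text=True)
    return result.stdout


def parallel_command(nodes, command, runner=ssh_runner, keep=lambda line: True):
    """Run one command on all nodes concurrently and gather output per node.

    `keep` is a line predicate, mirroring the output filtering on the slide.
    """
    nodes = list(nodes)
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        outputs = list(pool.map(lambda node: runner(node, command), nodes))
    return {node: [line for line in out.splitlines() if keep(line)]
            for node, out in zip(nodes, outputs)}


# Demonstration with a fake runner, so no cluster is needed.
def fake_runner(node, command):
    return f"root init on {node}\nalice mpirun on {node}\n"


print(parallel_command(["node01", "node02"], ["ps", "-aux"],
                       runner=fake_runner,
                       keep=lambda line: line.startswith("alice")))
```

Passing the runner as a parameter keeps the fan-out logic testable without a cluster and leaves the transport (ssh, rsh, or an agent protocol) as a separate choice.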

7 Alarm System
Set of daemons that monitor important system parameters
–Processor utilization, memory usage, main board temperature, and more
Users can specify the alarm condition and the action to be taken
Issues the alarm and shuts down part of the system if needed
Notification is sent by email. Future releases will include pager, ICQ, and speech synthesis
[Figure labels: Detector, Alarm Manager, Config, Notification/action]
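The condition-and-action model described on this slide could look roughly like the Python sketch below. The `Rule` and `AlarmManager` names, the "value exceeds threshold" condition, and the parameter names are assumptions for illustration; the actual SCMS configuration format is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical user-configured rule: a condition plus an action."""
    parameter: str                               # e.g. "board_temp_c" (assumed name)
    threshold: float                             # alarm when the reading exceeds this
    action: Callable[[str, str, float], None]    # called as action(node, parameter, value)


class AlarmManager:
    def __init__(self, rules):
        self.rules = list(rules)

    def check(self, node: str, readings: dict) -> list:
        """Run the matching action for every violated rule; return what fired."""
        fired = []
        for rule in self.rules:
            value = readings.get(rule.parameter)
            if value is not None and value > rule.threshold:
                rule.action(node, rule.parameter, value)
                fired.append(rule.parameter)
        return fired


def notify(node, parameter, value):
    # Stand-in for sending email; a real action might also shut the node down.
    print(f"ALARM {node}: {parameter}={value}")


manager = AlarmManager([Rule("cpu_percent", 95.0, notify),
                        Rule("board_temp_c", 70.0, notify)])
manager.check("node03", {"cpu_percent": 50.0, "board_temp_c": 82.5})
```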

8 SCMS Utilities
SCMS comes with many GUI utilities
–Node status
–Control panel
–Disk space
–Process status
–Shutdown/reboot
–Remote login
–User status
–Package installation

9 SCMS Screen Shot

10 KCAP: Web and VRML Based Interface for SCMS
Two versions of the web interface are available
–KCAP: normal web interface
–KCAP-VR: VRML interface that allows you to walk through and interact with your cluster
A Java applet is used to report real-time system information
[Figure labels: Web Generator, VRML World Generator, VRML World, Web Tree, Web Server, External Network, Real-time Monitoring, System Config]

11 KCAP and KCAP-VR Screen Shot

12 Future Work
KSIX (Kasetsart System Interconnect eXecutive): a framework to support parallel tools and applications
Offers features such as
–Process control, signal delivery
–Naming services
–Event-based communication
[Figure labels: Application, MPI, KSIX (Kasetsart System Interconnect eXecutive), Node OS, Node Hardware (four nodes), Interconnection Network]

13 SQMS: SMILE Queuing Management System
Batch scheduler for sequential and parallel tasks
Static and dynamic load balancing
Reconfigurable scheduling policy
Auto docking between clusters
[Figure labels: Submitter, Task Queue, Node Allocator, Scheduler, Cluster Nodes, Remote Queue]
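A reconfigurable scheduling policy can be sketched in Python as a function that is swapped in without touching the rest of the scheduler. This is only an illustration: the `Scheduler` class, the two policies, and the allocation rule are assumptions, not the SQMS design.

```python
def fifo(queue):
    """Policy: always pick the oldest task."""
    return 0


def smallest_first(queue):
    """Policy: pick the task that needs the fewest nodes (ties: oldest)."""
    return min(range(len(queue)), key=lambda i: queue[i][1])


class Scheduler:
    """Hypothetical batch scheduler with a task queue and a node allocator."""

    def __init__(self, nodes, policy=fifo):
        self.free = list(nodes)   # nodes not yet allocated
        self.queue = []           # pending (task_name, nodes_needed) pairs
        self.policy = policy      # swap this function to change the policy

    def submit(self, name, nodes_needed=1):
        self.queue.append((name, nodes_needed))

    def dispatch(self):
        """Start tasks chosen by the policy until the next one does not fit."""
        started = {}
        while self.queue:
            index = self.policy(self.queue)
            name, needed = self.queue[index]
            if needed > len(self.free):
                break
            self.queue.pop(index)
            started[name] = [self.free.pop(0) for _ in range(needed)]
        return started


scheduler = Scheduler(["node01", "node02", "node03"], policy=smallest_first)
scheduler.submit("sim", 2)
scheduler.submit("render", 2)
scheduler.submit("report", 1)
print(scheduler.dispatch())
```

A sequential task is simply one that requests a single node; releasing nodes when tasks finish and forwarding work to a remote queue are left out to keep the sketch short.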

14 Beowulf Computing Environment at Kasetsart University, Thailand
SMILE Beowulf Cluster
–16-node Pentium II/III cluster
Test bed for cluster technology and support of HPC research activities
PIRUN Beowulf Cluster (Pile of Redundant Universal Nodes)
72-node Beowulf system
–PII 500 MHz, 128 MB RAM
Largest computing system in Thailand
Installation will be completed in December 1999

