Gilbert Thomas Grid Computing & Sun Grid Engine “Basic Concepts”
Agenda ● Introduction ● Grid Computing ● Sun Grid Engine (SGE)
The Productivity Challenge
● Problem: Scientists and engineers are not being used efficiently.
● Solution: A Grid makes it easy for engineers to submit jobs. They run more tests, and the product design cycle improves.
● Benefits: Increased productivity, which leads to shorter time to market, higher quality, and lower costs.
Grid Computing: A New Computing Utility Model
Problem-solving through resource pooling in virtual systems:
● Virtualization of resources into a dynamic, single compute resource
● Transparent scalability of CPU cycles and storage
● Access that is dependable, consistent, pervasive, and inexpensive
Stages of Sun Grid Computing
● Cluster Grid (Departmental Computing)
- Simplest Grid deployment
- Maximum utilization of departmental resources
- Resources allocated based on priorities
● Campus Grid (Enterprise Computing)
- Resources shared within the enterprise
- Policies ensure computing on demand
- Gives multiple groups seamless access to enterprise resources
● Global Grid (Internet Computing)
- Resources shared over the Internet
- Global view of distributed datasets
- Growth path for enterprise Campus Grids
Grid Computing Model: Cluster Grids
● Usage
- Simplest Grid deployment
- Single team: project or department
- Single site, behind one firewall
● Benefit
- Optimal alignment of resources, tasks, and budgets
● Industry Examples
- Automotive: more simulations for safer cars
- Entertainment: faster image-frame rendering
- Life Sciences: pattern matching against huge datasets
- EDA: increased design iterations create more powerful devices
Grid Computing Model: Campus Grids
● Usage
- Multiple teams in an organization share one or more Cluster Grids
- Single site to enterprise-wide
● Benefit
- Maximum ROI and utility
● Industry Examples
- Manufacturing: collaborative engineering projects
- Oil and Gas: mining distributed databases
- Finance: more Monte Carlo simulations for uncovering new business
Grid Computing Model: Global Grids
● Usage
- Linked Cluster and Campus Grid models across many organizations
- Typically used for research
● Benefit
- Creates a large virtual system
- Facilitates collaboration between organizations
● Industry Examples
- Medicine: provides expert teams access to medical instruments and distributed computing resources
- Academia: facilitates collaboration between geographically dispersed groups
- Research: enables compute-intensive projects beyond the firewall
Grid Computing Adoption Trends
● Cluster Grids: single team, single organization
● Campus Grids: multiple teams, single organization
● Global Grids: multiple teams, multiple organizations
Key Software Technologies for the Grid
● Cluster Grid: Sun Grid Engine
● Campus Grid: Sun Grid Engine, Enterprise Edition
● Global Grid: Globus, Avaki
(The slide's legend marks which of these are Sun Grid Computing software.)
How it Works: Grid Hardware and Software Components
● Resource management services sit above the OS layer to integrate systems
● Hardware/OS systems are unchanged
● Minimal management software/tool costs
● Connecting people, departments, organizations, communities
Cluster Grid Solution: Sun™ Grid Engine
● Maximize resources for single projects, teams, departments
● Prioritize jobs
● Manage jobs from start to finish
● Free download for Solaris and Linux Operating Environments
Sun Grid Engine Free Downloads: First Year
Fast becoming the most-used Distributed Resource Manager (DRM) tool
● 3016 unique sites
● 118,000 CPUs worldwide run Sun Grid Engine
● 1 new CPU every 5 minutes
● Over 90 countries
● 60% never used Grid software before
● 92% rated Sun Grid Engine as Good, Very Good, or Excellent
Existing Problem In Clusters Bottleneck Idle Overloaded
Solution: Sun Grid Engine
Load Balancing
- Ensures no single compute resource is overloaded
- SGE automatically finds the resource with the least load for every new job
- If no free resource is found, the job is queued until a free resource is available
- Implication: jobs run and finish faster!
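The load-balancing rule on this slide can be illustrated with a small Python sketch. This is not SGE's actual scheduler: the host names, the load units, and the `max_load` threshold are all assumptions made for illustration. It only shows the idea of sending each new job to the least-loaded resource and queuing it when nothing is free.

```python
from collections import deque


def dispatch(jobs, hosts, max_load=1.0):
    """Assign each job to the least-loaded host; queue it if no host is free.

    jobs  -- list of (job_name, cost) pairs; cost is a hypothetical load increment
    hosts -- dict mapping host name to current load (0.0 means idle)
    A host counts as free while its load is below max_load (an assumed threshold).
    """
    load = dict(hosts)            # work on a copy, leave the caller's data alone
    placed = {}
    pending = deque()
    for job, cost in jobs:
        host = min(load, key=load.get)    # host with the least load right now
        if load[host] < max_load:
            load[host] += cost
            placed[job] = host
        else:
            pending.append(job)           # no free host: the job waits in the queue
    return placed, list(pending), load


if __name__ == "__main__":
    placed, pending, load = dispatch(
        [("j1", 0.6), ("j2", 0.6), ("j3", 0.6), ("j4", 0.6)],
        {"hostA": 0.0, "hostB": 0.5},
    )
    print("placed:", placed)
    print("pending:", pending)
```

In a real cluster the pending jobs would be retried as running jobs finish and load drops; the sketch stops at the first pass to keep the idea visible.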
Job Types
Job types, a mixture of:
- Batch
- Interactive (qsh, qrsh, qlogin)
- Parallel (MPI, PVM, ...)
- Checkpointing
- Array jobs (unlimited size, massive scalability)
Jobs are dynamically changeable while pending (prior to execution).
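As an illustration of array jobs, here is a minimal Python sketch of how one task of an array job might pick its share of the work. SGE sets the `SGE_TASK_ID` environment variable for each task of an array job; everything else here (the function names, the item counts, the fallback to task 1 outside an array job) is an assumption for the example.

```python
import os


def task_slice(task_id, n_tasks, n_items):
    """Return the range of item indices handled by a 1-based task number.

    Items are split as evenly as possible; the first (n_items % n_tasks)
    tasks each take one extra item.
    """
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    base, extra = divmod(n_items, n_tasks)
    start = (task_id - 1) * base + min(task_id - 1, extra)
    stop = start + base + (1 if task_id <= extra else 0)
    return range(start, stop)


def current_task_id(environ=os.environ):
    """Read SGE_TASK_ID; fall back to 1 when it is missing or not a number."""
    raw = environ.get("SGE_TASK_ID", "1")
    return int(raw) if raw.isdigit() else 1


if __name__ == "__main__":
    # Hypothetical workload: 10 input items split over 3 array tasks.
    for index in task_slice(current_task_id(), n_tasks=3, n_items=10):
        print("processing item", index)
```

Each task runs the same script and only the task number differs, which is what lets a single submission scale to many tasks.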
Monitoring ● Qmon ● Mail notification ● Qstat
Qmon: SGE’s GUI
Configuring Queues
Checking Queue Status
Submitting Jobs
Checking Job Status
Qstat
Display all info about queues:
> qstat -f
State column:
- r = running
- s = suspended
- q = queued
- w = waiting
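The state letters can be combined in the state column (for example, a pending job typically shows both q and w). A small Python sketch of decoding such a code, using only the four letters listed on this slide; the function name and the handling of unlisted letters are assumptions.

```python
# Only the letters listed on the slide; other letters are reported as unknown.
STATE_NAMES = {
    "r": "running",
    "s": "suspended",
    "q": "queued",
    "w": "waiting",
}


def describe_state(code):
    """Translate a state code such as 'qw' into a list of readable names."""
    return [STATE_NAMES.get(ch, "unknown (" + ch + ")") for ch in code]


if __name__ == "__main__":
    for code in ("r", "qw", "s"):
        print(code, "->", ", ".join(describe_state(code)))
```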
Qmod
Control the status of the queues in your cluster:
- qmod -d    Disable a queue
- qmod -e    Enable a queue
- qmod -s    Suspend a queue
- qmod -us   Resume a suspended queue
- qmod -c    Clear the error states of a queue
- qstat -alarm   Show the alarm state of a queue
Complexes
Set host-specific attributes:
- Number of slots
- Maximum amount of memory that can be used
- Maximum number of disk blocks that can be used
- Maximum load for that host
Set requestable values on a queue:
- Software licences
- Available memory
- Available disk space
- Specific data sets
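To show how requestable attributes narrow down where a job can run, here is a minimal Python sketch of matching a job's request against host attributes. It is an illustration of the idea only, not SGE's matching logic; the attribute names (`slots`, `mem_gb`, `licences`) and host names are hypothetical.

```python
def eligible_hosts(request, hosts):
    """Return the names of hosts that offer at least every requested amount.

    request -- dict of attribute name to required amount
    hosts   -- dict of host name to a dict of attribute name to available amount
    An attribute a host does not define is treated as 0 available.
    """
    return [
        name
        for name, attrs in hosts.items()
        if all(attrs.get(key, 0) >= amount for key, amount in request.items())
    ]


if __name__ == "__main__":
    # Hypothetical cluster description.
    hosts = {
        "node1": {"slots": 4, "mem_gb": 16, "licences": 0},
        "node2": {"slots": 2, "mem_gb": 64, "licences": 1},
    }
    print(eligible_hosts({"mem_gb": 32, "licences": 1}, hosts))
```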
Parallel Environments
● Parallel Virtual Machine (PVM)
● Message Passing Interface (MPI)
● A parallel environment allows execution of shared-memory and distributed-memory applications.
Parallel Environments
Advantages of tight integration with SGE:
- Correct accounting
- Full job control, e.g. suspending tasks
- Resource limits
- Cleaning up/killing all tasks
Thank you!
Gilbert Thomas, Associate Engineer
For further enquiries,