Published by Trevor Beeney. Modified over 9 years ago.
1
Sun's Infrastructure Solution for Grid Engine
Jongjun Son, Sun Microsystems, Korea
Version: 03/2005
http://sun.com/grid
We make the net affordable. We make the net available. We make the net reliable. We make the net work.
2
Agenda
● Sun's Grid Strategy
● Sun's Grid Software
● N1 Grid Engine 6 Technical Overview
3
Sun's Grid Strategy
4
Sun's Grid Computing Approach
● A flexible and scalable architecture
– Pools computing resources to solve important problems
– Collects unused capacity for better utilization
– Architecture for seamless addition of resources
– Up to hundreds or thousands of processors and systems
● Multi-platform, Multi-OS
● Distributed resource management (DRM)
● Distributed system and software management
A well-designed Grid Computing infrastructure is accessed, used, and managed as a single, unified resource.
5
Supported Platforms
N1 Grid Engine: download and try it out free at http://gridengine.sunsource.net
6
Compute Elements: Sun's End-to-end Product Line
● Access systems
– Thin clients, workstations
● Compute nodes
– Linux and Solaris Operating Systems
– Compact 1U and 2U servers
– Blade servers
– Larger symmetric multiprocessing (SMP) systems
– Sun Fire Superclusters
● Pre-configured Grid Computing rack systems
[Figure: Sun Fire Compute Grid rack system]
7
Sun Fire Compute Grid: Engineered, Tested, Integrated, Supported
● Up to 32 Sun Fire V20z, or up to 10 V40z
● Sun Control Station
● Sun N1 Grid Engine Software
● Up to 2 x 24-port Gigabit Ethernet switches
● 48-port Terminal Server
● Keyboard/Video/Mouse shelf unit
● Sun Rack 1000-38
8
Sun's Grid Software
9
Software Elements: Small to Large Grid Computing Solutions
[Diagram labels, grouped]
● Infrastructure levels: Cluster Grid Infrastructure, Enterprise Grid Infrastructure, Global Grid Infrastructure
● Functions: Service Discovery, Authentication/Authorization, Data Management, Policy Management, Resource Management, System Management, Data Access
● Sun software: Sun QFS/SamFS, Solaris CacheFS, N1 Grid Engine, Solaris™ Resource Manager, Sun Management Center, Sun Control Station
● Industry standards and partner technologies: OGSA, Globus Toolkit, Avaki
10
N1 Grid Engine 6: Distributed Resource Manager, Job Scheduling
● Policy management
– Owners negotiate usage
– 4 different, customizable policy schemes
– Exceptions for specific needs
● Benefits
– Equitable, enforceable sharing between groups
– Alignment of resources with business goals
11
Sun Cluster Grid Manager: Unified Remote System and Grid Management
● Sun Control Station software
– System health and performance monitoring
– Pull, push, and automatic provisioning
– Deploys both Linux and Solaris x86 images
● Integrated grid management module
– Manages Sun Grid Engine or Sun Grid Engine, Enterprise Edition
● Aggregated management
– Address hundreds of systems individually or in groups
– Combined system, software, and grid management
12
Sun Cluster Grid Manager
13
Grid Engine Portal
14
A Complete Solution: Proven and Repeatable Reference Architectures
● Servers
● Workstations
● Control Network (Gigabit Ethernet)
● Data Network (Gigabit Ethernet)
● Sun StorEdge storage solutions (Direct-attached, NAS, HA-NFS, HPTC SAN)
● Sun ONE Grid Engine
● Sun Compute Grid rack systems
● Sun Cluster Grid Manager
15
Grid Scalability from Local to Global
Cluster, Enterprise, and Global Grids
[Diagram labels: Cluster Grid, Enterprise Grid, Global Grid, Internet]
16
N1 Grid Engine 6 Technical Overview
17
Agenda
● N1 Grid Engine Overview
● Architecture
● Resource, data access
● Application Integration
● N1GE 6 New Features
● Accounting & Reporting
18
N1 Grid Engine Overview: Resource Management
[Diagram: a BLAST job script (blastall -p blastn -i /nfs/data) submitted to Grid Engine]
● Selection of jobs
– Simple policies: FIFO, equal share, rank
– Sophisticated policies: sharing, urgency, priority, deadline, resource-based, etc.
● Selection of resources
– System characteristics: CPU, memory, OS, patches, etc.
– Status of systems: available memory, load, free disk space, etc.
– Status of other resources: licenses, shared storage, other software, etc.
19
N1 Grid Engine Overview: Resource Control
[Diagram: a BLAST job script (blastall -p blastn -i /nfs/data) submitted to Grid Engine]
● Control of jobs
– Suspend, Resume, Kill, Migrate, Restart
– Customizable action methods
– Manual or automated via policies
● Control of resources
– Regulate load from Grid jobs based upon resource value thresholds
– Control access via permissions, time/date, job type
– Allocate systems to jobs based on total resource consumption (e.g., memory, CPUs, disk)
20
N1 Grid Engine Overview: Resource Accounting
[Diagram: a BLAST job script (blastall -p blastn -i /nfs/data) submitted to Grid Engine]
● Accounting of jobs
– Current resource consumption always monitored
– Total detailed consumption recorded at end of job
– Includes record of user, department, project, etc.
● Accounting of resources
– Current usage of resources on hosts always monitored
– Information recorded over time: resource utilization of hosts and of the grid; grid configuration changes
21
Grid Engine 6 Architecture
[Diagram: SGE daemons communicating over TCP/IP across three tiers]
● Access Tier: Submit Host, Admin Host
● Management Tier: Master Host (Qmaster, Schedd); Shadow Host?
● Compute Tier: Exec Host (execd)
22
Resources: THE HEART OF GRID ENGINE MANAGEMENT
● Per host
– load_avg
– mem_free
– OS/patch-level
● Global
– floating licenses
– shared storage
● Resources are used for
– Job resource requests: job A needs 1 license and 1GB
– Load/suspend thresholds: suspend jobs if load_avg > 1.5
– Load formulas: send jobs to hosts with least load; out of those, choose hosts with most free memory
● Built-in and custom resources
– Static resources: strings, numbers, boolean
– Countable resources: e.g., licenses, MB of memory/disk
– Measured resources: value provided through Load Sensor
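The load formula and resource request on this slide can be illustrated with a minimal sketch. This is not Grid Engine code: the `Host` class, the `pick_host` function, and the default threshold of 1.5 (borrowed from the slide's suspend example) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    load_avg: float      # measured resource (per host)
    mem_free_mb: int     # measured resource (per host)


def pick_host(hosts: List[Host], mem_needed_mb: int, licenses_needed: int,
              licenses_free: int, load_threshold: float = 1.5) -> Optional[Host]:
    """Return the host a job would be sent to, or None if it must wait.

    Hypothetical rule set mirroring the slide:
    - the global license pool must cover the request,
    - hosts above the load threshold receive no new jobs,
    - among the rest, prefer least load, then most free memory.
    """
    if licenses_needed > licenses_free:
        return None
    candidates = [h for h in hosts
                  if h.load_avg <= load_threshold and h.mem_free_mb >= mem_needed_mb]
    if not candidates:
        return None
    # Sort key: lowest load first; negative memory so that more free memory wins ties.
    return min(candidates, key=lambda h: (h.load_avg, -h.mem_free_mb))
```

For the slide's example (job A needs 1 license and 1GB), `pick_host(hosts, 1024, 1, licenses_free)` returns the least-loaded host with at least 1024 MB free, or `None` when no license is left.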
23
Parallel and Checkpointing Environments
● Environment: a set of hosts that is used to support parallel or checkpointing applications
● Applications must inherently support parallel/checkpointing execution
[Diagram: hosts H1, H2, H3, H4, H5, H6, H7]
24
Data Access
● Exec hosts, app binaries, job data: CONFIGURED INDEPENDENTLY
● NFS sharing
● File staging
● Data Grid
25
Application Integration Methods
[Diagram: methods invoked between job START and job END]
● General methods: queue/host prolog, starter method, suspend method, resume method, terminate method, queue/host epilog
● Parallel methods: parallel start, parallel stop
● Checkpointing methods: checkpoint command (run at specified intervals), migration command, clean command, requeue job
26
Integrating applications with Grid Engine
1) Unmodified/legacy application binaries: integrate using a wrapper script
2) Interactive applications: use pluggable remote mechanisms, e.g., ssh, rsh, telnet
(Items 1 and 2 are the two most common approaches.)
3) Grid-ready applications: modify code to use DRM APIs (API recently standardized)
4) Java applications: JGrid package for low-level coupling (object/method distribution); currently provided separately
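Approach 1 can be sketched as a small helper that wraps an unmodified binary in a `qsub` command line. The helper name `build_qsub_command` and the job name are hypothetical; the flags used (`-N`, `-cwd`, `-b y`, `-l`) are standard `qsub` options as far as I know, but check them against the man page of the installed version. The sketch only builds the argument list and does not submit anything.

```python
import shlex
from typing import Dict, List, Optional, Sequence


def build_qsub_command(job_name: str, binary: str, args: Sequence[str],
                       resources: Optional[Dict[str, str]] = None) -> List[str]:
    """Build (but do not run) a qsub argument list for a legacy binary.

    -N sets the job name, -cwd runs the job in the current directory,
    -b y tells qsub the command is a binary rather than a script,
    and each -l adds one resource request such as mem_free=1G.
    """
    cmd = ["qsub", "-N", job_name, "-cwd", "-b", "y"]
    for name, value in (resources or {}).items():
        cmd += ["-l", f"{name}={value}"]
    cmd.append(binary)
    cmd += list(args)
    return cmd


if __name__ == "__main__":
    # Illustrative job only; the resource value is an assumption.
    argv = build_qsub_command("blast", "blastall", ["-p", "blastn"],
                              {"mem_free": "1G"})
    print(shlex.join(argv))
```

Returning a list instead of a shell string keeps quoting issues out of the wrapper; the list can be passed to `subprocess.run` on a submit host.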
27
N1GE 6 New Features: Architecture
● Berkeley DB spooling
● Multi-threaded Master Daemon
● New communication system
● Scalability goals for N1GE 6, per master:
– Up to 10,000 unique hosts
– Up to 500,000 unique jobs*
* Array jobs are counted as a single job
28
N1GE 6 Supported Platforms (end of CY2004)
29
N1GE 6 New Features: Scheduler Functionality
– Advanced planning capabilities
● Resource Reservation with Backfilling
● Can reserve any resource, e.g., memory, CPU, license
– More sophisticated scheduling algorithms
● Management policies matched with business priorities:
– Priority, urgency, share tree, category, deadline, etc.
30
Job Resource Reservation
31
Simple, priority-based scheduling
[Chart: resource use over time; rows for CPU and Mem. on Host 1 and Host 2 and a global Lic.; Jobs 1 to 6 placed; a region is marked "Wasted resources"]
32
Scheduling with Resource Reservation
[Chart: resource use over time; rows for CPU and Mem. on Host 1 and Host 2 and a global Lic.; Jobs 1 to 6 placed]
33
Resource Reservation with backfilling
[Chart: resource use over time; rows for CPU and Mem. on Host 1 and Host 2 and a global Lic.; Jobs 1 to 6 placed]
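The idea behind reservation with backfilling can be sketched for a single pool of slots. This is a simplified illustration, not the N1GE 6 scheduler: the `Job` class, the `plan` function, the single-resource model, and the reliance on runtime estimates are all assumptions. The highest-priority job that does not fit gets a reservation at the earliest time enough slots free up, and lower-priority jobs may start now only if they finish before that reservation begins.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    slots: int
    runtime: int  # estimated runtime, same time unit as end times


def plan(total_slots: int, running: List[Tuple[int, int]], pending: List[Job],
         now: int = 0) -> Tuple[List[str], Optional[Tuple[str, int]]]:
    """running: (end_time, slots) pairs; pending: jobs in priority order.

    Returns the names of jobs started now and at most one reservation
    as (job_name, start_time).
    """
    running = list(running)
    free = total_slots - sum(slots for _, slots in running)
    started: List[str] = []
    reservation: Optional[Tuple[str, int]] = None
    for job in pending:
        if reservation is None:
            if job.slots <= free:
                started.append(job.name)
                free -= job.slots
                running.append((now + job.runtime, job.slots))
            else:
                # Reserve: walk through job end times until enough slots are free.
                avail = free
                for end, slots in sorted(running):
                    avail += slots
                    if avail >= job.slots:
                        reservation = (job.name, end)
                        break
        elif job.slots <= free and now + job.runtime <= reservation[1]:
            # Backfill: fits now and is done before the reserved start time.
            started.append(job.name)
            free -= job.slots
    return started, reservation
```

With 4 slots, one running job holding 2 slots until time 10, and pending jobs `big` (4 slots), `short` (1 slot, runtime 8), and `long` (1 slot, runtime 20), `big` is reserved for time 10, `short` is backfilled, and `long` waits because it would delay the reservation.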
34
Resource Management Policies
Resource allocation based upon business priorities
● Policy basis includes: cumulative utilization, category priority, time-based priority, resource value, etc.
● Powerful, flexible, tunable, easy to configure
[Diagram: share tree for all jobs]
● High Priority: Dept A: 70, Dept B: 30 (more rights to high priority jobs)
● Normal Priority: Dept A: 50, Dept B: 50
● Low Priority: Dept A: 50, Dept B: 50
● Group X: temporary boost
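The share numbers in the diagram can be read as proportional entitlements. The sketch below splits a number of slots by share using largest-remainder rounding; it is a deliberate simplification, since it ignores past usage and the ticket mechanics of the real policies, and `split_by_shares` is a hypothetical helper name.

```python
from typing import Dict


def split_by_shares(shares: Dict[str, int], total_slots: int) -> Dict[str, int]:
    """Divide total_slots among groups in proportion to their shares."""
    total_shares = sum(shares.values())
    exact = {k: total_slots * v / total_shares for k, v in shares.items()}
    alloc = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = total_slots - sum(alloc.values())
    # Hand remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc
```

For the high-priority branch of the diagram, `split_by_shares({"Dept A": 70, "Dept B": 30}, 10)` gives Dept A 7 slots and Dept B 3.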
35
Policies for Job Prioritization
● Priority determines which pending jobs get dispatched
● Job priority is calculated from three sub-policies, each normalized to 0.0 < N < 1.0:
$$\mathrm{prio} = W_{urg} N_{urg} + W_{tix} N_{tix} + W_{psx} N_{psx}$$
– $N_{urg}$ = normalized urgency
– $N_{tix}$ = normalized tickets
– $N_{psx}$ = normalized POSIX priority
– $W$ = weighting factors
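A worked example of the weighted sum, as a minimal sketch. The weight values are illustrative assumptions rather than values taken from the slides, and `job_priority` and `dispatch_order` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def job_priority(n_urg: float, n_tix: float, n_psx: float,
                 w_urg: float = 0.1, w_tix: float = 0.01,
                 w_psx: float = 1.0) -> float:
    """prio = W_urg * N_urg + W_tix * N_tix + W_psx * N_psx.

    Inputs are the already-normalized sub-policy values; the default
    weights are illustrative only.
    """
    return w_urg * n_urg + w_tix * n_tix + w_psx * n_psx


def dispatch_order(jobs: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Return job names sorted from highest to lowest priority."""
    return sorted(jobs, key=lambda name: job_priority(*jobs[name]), reverse=True)
```

With these weights the POSIX term dominates: a job with normalized values (0.5, 0.2, 0.8) gets 0.1 * 0.5 + 0.01 * 0.2 + 1.0 * 0.8 = 0.852, so it is dispatched before a job with (0.9, 0.9, 0.3), which scores 0.399.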
36
Cluster Queue
[Diagram: in 5.x, a separate queue on each host (A, B, C, D, ...); in 6.x, one cluster queue spanning hosts A, B, C, D, ...]
37
N1GE 6 New Features: Analysis / Monitoring / Accounting
● Value-add module for analysis, monitoring, accounting reports, etc.
– Fine-grained resource recording
– Stored in an RDBMS in a well-defined schema
– Provides built-in capability for reporting, chargeback, etc.
– Web-based console tool provided for generating reports, queries, etc.
38
Why a second, separate DB?
● Different access considerations
– Standardized access (SQL, ODBC, JDBC)
– More powerful database structure
● Independent of core system data
– Historical data
– Derived data (sums, averages...)
– Queries won't affect system performance
– Lower requirements on availability
39
Architecture
● Reporting-Writer: Java application
● Loosely coupled to the SGE system via a qmaster-generated reporting file
● Stores raw data and pre-processed data to the SQL DB via JDBC
[Diagram: Qmaster → Reporting File (raw data) → Reporting-Writer (builds derived values) → Reporting-DB]
40
Stored Data
● Job related information: times, user, project, exit status...
● Host and queue related information: load information, consumables...
● Sharetree: configured shares, actual shares...
● Precomputed, derived values: sums, averages per host, queue, user, project...
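The kind of derived value mentioned above (sums and averages per user) can be sketched with an in-memory SQLite database. The table `job_usage` and its columns are hypothetical and are not the real reporting schema; the real reporting writer is a Java application using JDBC, and Python's `sqlite3` stands in here only to keep the example self-contained.

```python
import sqlite3
from typing import Iterable, List, Tuple


def create_reporting_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical raw-data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_usage ("
        " job_id INTEGER, owner TEXT, project TEXT,"
        " cpu_seconds REAL, exit_status INTEGER)"
    )
    return conn


def record_jobs(conn: sqlite3.Connection,
                rows: Iterable[Tuple[int, str, str, float, int]]) -> None:
    """Store raw per-job records, as written at the end of each job."""
    conn.executemany("INSERT INTO job_usage VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def usage_per_owner(conn: sqlite3.Connection) -> List[Tuple[str, int, float, float]]:
    """Derived values: job count, total and average CPU seconds per owner."""
    cur = conn.execute(
        "SELECT owner, COUNT(*), SUM(cpu_seconds), AVG(cpu_seconds)"
        " FROM job_usage GROUP BY owner ORDER BY owner"
    )
    return cur.fetchall()
```

Because the aggregation runs against the separate reporting database, a query like `usage_per_owner` does not touch the data the master daemon depends on.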
41
ARCo: Accounting and Reporting Console
● Web-based tool for displaying data in the reporting DB
● Based on Sun Web Console
● Ability to create simple and advanced (SQL-based) queries
● Generates tables and graphs, exportable as CSV or PDF
● Also, command-line report generation
42
Selecting a query
43
Query Results
44
Defining a new query
45
jongjun.son@sun.com
http://sun.com/grid