1 Performance-responsive Scheduling for Grid Computing
Dr Stephen Jarvis
High Performance Systems Group
University of Warwick, UK

2 Context
Funded by / collaborating with
–UK e-Science Core Programme
–IBM (Watson, Hursley)
–NASA (Ames)
–NEC Europe
–Los Alamos National Laboratory
Integrate established performance tools into emerging grid middleware

3 What do we mean by ‘scheduling’?
User’s view
–Jobs run somewhere on the Grid
–Notion of deadline
–Execution is single domain (includes pre-staging)
Resource provider’s view
–Don’t mind which jobs are run where
–As long as resources are well/evenly used
–Maintaining customers’ deadlines is important
System view
–Jobs can run anywhere
–Resources are heterogeneous
–Throughput is important, as are scheduling overheads

4 Managing through Middleware

5 Managing through Middleware
Determine what resources are required (predict)
Determine what resources are available (discover)
Map requirements to available resources (schedule)
Maintain contract of performance (QoS)
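The four middleware steps above (predict, discover, schedule, QoS) can be sketched as a small pipeline. This is an illustration only: the `Resource` class, the `speed` factor, and the "runtime = work / speed" model are assumptions made for the example and are not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resource:
    name: str
    free_cpus: int
    speed: float  # hypothetical relative speed (1.0 = reference machine)


def predict_runtime(work_units: float, resource: Resource) -> float:
    """Predict: toy model, runtime = work / speed (an assumption)."""
    return work_units / resource.speed


def discover(resources: List[Resource], cpus_needed: int) -> List[Resource]:
    """Discover: keep only resources with enough free CPUs."""
    return [r for r in resources if r.free_cpus >= cpus_needed]


def schedule(work_units: float, cpus_needed: int,
             resources: List[Resource]) -> Optional[Tuple[Resource, float]]:
    """Schedule: map the task to the candidate with the shortest prediction."""
    candidates = discover(resources, cpus_needed)
    if not candidates:
        return None
    best = min(candidates, key=lambda r: predict_runtime(work_units, r))
    return best, predict_runtime(work_units, best)


def meets_contract(predicted: float, deadline: float) -> bool:
    """QoS: the placement is acceptable only if the prediction fits the deadline."""
    return predicted <= deadline
```

A caller would run `schedule(...)` first and then check `meets_contract(...)` before committing the task to the chosen resource.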

6 Performance Services
Intra-domain
–Lab- / department-based
–Shared resources under local administration
Multi-domain
–Campus- / country-based
–Wide-area resource and task management
–Cross domain

9 Performance Prediction
Performance prediction tools aim to predict
–Execution time
–Communication usage
–Data and resource requirements
Provides best guess as to how an application will execute on a given resource

10 PACE
[Diagram labels: User, Application, Resource]

11 PACE
[Diagram labels: User, Application, Resource, Application Model, Resource Model]

12 PACE
[Diagram labels: User, Application Model, Resource Model, Evaluation Engine, Model parameters, Resource config.]
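The diagram labels suggest that an evaluation engine combines an application model with a resource model to produce a prediction. A minimal sketch of that idea follows. It does not use PACE's actual modelling language or API: the dictionary layout, the operation names, and the "count times cost" formula are all assumptions made for illustration.

```python
def evaluate(application_model: dict, resource_model: dict) -> float:
    """Hypothetical evaluation step (not the PACE engine).

    application_model: operation kind -> how many times the application does it
    resource_model:    operation kind -> cost of one such operation on this
                       resource, in whatever time unit the caller chooses
    Returns the predicted total time in that same unit.
    """
    return sum(count * resource_model[op]
               for op, count in application_model.items())
```

Keeping the two models separate means one application model can be evaluated against several resource models, so candidate machines can be compared without running the application on any of them.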

14 Why is prediction useful?
Scaling properties
Compare runtime options with
–deadline
–available resources
–priority / other jobs
–etc.
Allows runtime scenarios to be explored before deployment
[Figure axis label: Run-time]
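Exploring runtime scenarios before deployment can be sketched as a search over processor counts. The scaling model below (a serial part, a parallel part divided by the processor count, and a communication term that grows with it) and its default constants are assumptions for illustration; they are not from the presentation.

```python
from typing import Callable, Optional


def predicted_time(p: int, serial: float = 10.0,
                   parallel: float = 240.0, comm: float = 0.5) -> float:
    """Hypothetical scaling model: T(p) = serial + parallel / p + comm * p."""
    return serial + parallel / p + comm * p


def smallest_allocation(deadline: float, available: int,
                        predict: Callable[[int], float] = predicted_time
                        ) -> Optional[int]:
    """Return the fewest processors whose predicted time meets the deadline.

    Returns None when no allocation within the available processors is
    predicted to finish in time.
    """
    for p in range(1, available + 1):
        if predict(p) <= deadline:
            return p
    return None
```

With these toy constants, a deadline of 60 time units needs 6 processors, while the same deadline cannot be met if only 4 are available.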

15 1. Intra-Domain Co-Scheduling
Augment Condor scheduler with additional performance information
Scheduler driver, or co-scheduler (called Titan)
Use predictive data for system improvement
–Time to complete tasks / utilisation of resources
–QoS – ability to meet deadlines
Handle predictive and non-predictive tasks
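To illustrate why predicted runtimes can shorten the time to complete a batch of tasks, the sketch below compares placing tasks in arrival order with a prediction-aware placement. The prediction-aware version uses a simple longest-predicted-first greedy heuristic as a stand-in; it is not the genetic algorithm shown in the Titan diagram, and the function names are hypothetical.

```python
import heapq
from typing import List


def makespan_round_robin(runtimes: List[float], hosts: int) -> float:
    """Place tasks in arrival order, ignoring any runtime prediction."""
    loads = [0.0] * hosts
    for i, t in enumerate(runtimes):
        loads[i % hosts] += t
    return max(loads)


def makespan_lpt(predicted: List[float], hosts: int) -> float:
    """Place the longest predicted task first on the least-loaded host."""
    loads = [0.0] * hosts
    heapq.heapify(loads)
    for t in sorted(predicted, reverse=True):
        lightest = heapq.heappop(loads)      # current least-loaded host
        heapq.heappush(loads, lightest + t)  # assign the task to it
    return max(loads)
```

The heuristic assumes the predictions are accurate; tasks without prediction data would need a separate path, as the slide notes.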

16 Intra-Domain Co-Scheduling
[Diagram labels: Non-predictive tasks; PORTAL, PRE-EXECUTION ENGINE, MATCHMAKER, SCHEDULE QUEUE, PACE, GA, CLUSTER CONNECTOR, CONDOR; REQUESTS FROM USERS OR OTHER DOMAIN SCHEDULERS; RESOURCES; CLASSADS; Titan]

18 Intra-Domain Co-Scheduling
[Diagram labels: Non-predictive tasks; Tasks with prediction data; PORTAL, PRE-EXECUTION ENGINE, MATCHMAKER, SCHEDULE QUEUE, PACE, GA, CLUSTER CONNECTOR, CONDOR; REQUESTS FROM USERS OR OTHER DOMAIN SCHEDULERS; RESOURCES; CLASSADS; Titan]

23 Intra-Domain Deployment
Without co-scheduler: time to complete = 70.08m
With co-scheduler: time to complete = 35.19m

24 2. Multi-Domain Management
Publish intra-domain performance data through global information services (MDS)
Augment service with agent system
–One agent per domain / VO
When a task is submitted
–Agents query the information services, and negotiate to discover the best domain to run the task
Scheme is tested on a 256-node experimental Grid
–16 resource domains; 6 architecture types
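The agent scheme above can be sketched as each domain agent making an offer for a task and the task going to the best offer. This is a heavily simplified illustration: the `DomainAgent` fields, the offer formula, and the single-round "pick the minimum" negotiation are assumptions and do not describe the actual agent protocol or any MDS interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DomainAgent:
    domain: str
    queue_wait: float  # hypothetical published value: time until resources free up
    speed: float       # hypothetical relative speed of the domain's resources

    def offer(self, work_units: float) -> float:
        """Predicted completion time if the task ran in this domain."""
        return self.queue_wait + work_units / self.speed


def negotiate(agents: List[DomainAgent], work_units: float) -> Tuple[str, float]:
    """Pick the domain whose agent offers the earliest predicted completion."""
    best = min(agents, key=lambda a: a.offer(work_units))
    return best.domain, best.offer(work_units)
```

In this toy model a fast but busy domain can lose to a slower idle one for small tasks, which is the kind of trade-off the published performance data lets the agents weigh.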

25 Multi-Domain Management
[Figure; axis label: time]

28 Multi-Domain Management
Time to complete = 2752s

29 Multi-Domain Management
Time to complete = 467s; an improvement of 83%
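The quoted improvement can be checked as a relative reduction in time to complete. The helper names below are illustrative; the inputs are the figures reported on the slides (2752s and 467s here, 70.08m and 35.19m for the intra-domain deployment).

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage reduction in time to complete: (before - after) / before * 100."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the improved run is."""
    return before / after


multi_domain = relative_improvement(2752, 467)      # rounds to 83, as on the slide
intra_domain = relative_improvement(70.08, 35.19)   # just under 50
```

The intra-domain figures therefore correspond to roughly halving the time to complete, while the multi-domain figures correspond to a speedup of a little under six.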

31 QoS: Ability to Meet Deadline
[Figure legend: active, inactive]

32 Resource usage
[Figure legend: active, inactive]

33 Other work
OGSA compatibility
Prediction
–Accuracy
–Other prediction techniques
Workflow (CCGrid 2003)
Reservation
V. 1.1, Condor/GT2-based
–www.dcs.warwick.ac.uk/~hpsg
–Documented at HPDC-12/GGF-8, FGCS

