1 EPCC Sun Data and Compute Grids Project
Using Sun Grid Engine and Globus to Schedule Jobs Across a Combination of Local and Remote Machines
Terry Sloan, Edinburgh Parallel Computing Centre (EPCC)
Telephone: +44 131 650 5155
Email: t.sloan@epcc.ed.ac.uk
http://www.epcc.ed.ac.uk/sungrid

2 Overview
• The Project
• Why do it?
• Project Scenario
• Project Goal
• How?
• Project Achievements
• The Compute Scheduler
• The Compute & Data Scheduler

3 The Project

4 The Project
• Develop a Globus-enabled compute and data scheduler
• Based on Grid Engine, Globus and a variety of data technologies

5 The Project (cont)
• Partners
  – Sun Microsystems
  – National e-Science Centre, represented by EPCC
• Timescales
  – 23 months
  – Start: Feb 2002
  – End: Dec 2003
  – Feb 2003 = Project Month 13 (PM13)

6 Why do it?

7 Why do it?
• Grid Engine: over 20,000 downloads (Nov 2002)
  – Distributed Resource Management tool
  – Schedules activities across networked resources
• Sun classifies 3 levels of Grid
  – Cluster Grid: a single team or project and their associated resources
  – Enterprise Grid: multiple teams and projects within a single organisation, facilitating collaboration across the enterprise
  – Global Grid: linked Cluster and Enterprise Grids, providing collaboration amongst organisations
• Grid Engine meets the first two levels but by itself does not meet the third

8 Why do it? (cont)
• Globus Toolkit
  – A Grid API for connecting distributed compute and instrument resources
• Integration with Globus allows Grid Engine to meet level 3
  – Collaboration amongst enterprises
  – Most integration efforts use Globus to submit work to Grid Engine
• This project tackles the opposite problem: engineering Grid Engine on top of Globus

9 Why do it? (cont)
• Grid Engine is concerned with compute resources
  – Extend it to work with popular data and service access protocols (e.g. OGSA-DAI)

10 Project Scenario

11 Project Scenario
• Two collaborating enterprises, A and B, both have some machines
  – Both enterprises run Grid Engine to schedule jobs
  – Local demand for machines is variable
    – Sometimes it exceeds supply
    – Other times machines lie idle
[Figure: enterprise A runs Grid Engine over machines a, b, c, d for Users (A); enterprise B runs Grid Engine over machines e, f, g, h for Users (B)]

12 Project Scenario (cont)
• Ideal situation
  – If enterprises A and B could expose some of their machines to each other across the internet through Grid Engine…
    – Both A and B could enjoy throughput efficiency improvements
    – Large gains when one enterprise is busy while the other is idle
[Figure: the Grid Engine at A now shows machines a, b, c, d and e, f, g, h; the Grid Engine at B shows machines e, f, g, h and a, b, c, d; Users (A) and Users (B) as before]

13 The Project Goal

14 Project Goal
• Final goal
  – Develop a scheduler based on Grid Engine to schedule jobs across a combination of local and remote machines
  – Enable jobs to access necessary data sources
  – Use Globus as the Grid API to provide secure communications and transfer
• Development criteria
  – Industrial strength
  – Application of software engineering techniques
  – Use of industry-standard design and analysis tools
  – Migration to OGSA-compliant Globus 3

15 How?

16 Workpackages
• WP 1: Analysis of existing Grid components
  – WP 1.1: UML analysis of core Globus 2.0
  – WP 1.2: UML analysis of Grid Engine
  – WP 1.3: UML analysis of other Globus 2.0
  – WP 1.4: UML analysis of Globus 3.0
  – WP 1.5: Exploration of data technologies
• WP 2: Requirements Capture & Analysis
• WP 3: Prototype Compute Scheduler
• WP 4: Compute/Data Scheduler Design
• WP 5: Compute/Data Scheduler Development

17 The Project Team
• Project Personnel
  – Terry Sloan: Project leader
  – Geoff Cawood: Project architect
  – Ratna Abrol: Engineering
  – Thomas Seed: Engineering
  – Ali Anjomshoaa: Globus 2 Analysis
  – Paul Graham: Requirements Capture and Analysis
  – Amy Krause: Technical reviewer
• Project Review Board
  – Fritz Ferstl (Sun Microsystems GmbH)
  – John Barr (Sun Microsystems Ltd)
  – Steven Newhouse (London e-Science Centre)
  – Neil Chue Hong (EPCC)

18 Achievements

19 Achievements
• Publications
  – D1.1 Analysis of Globus Toolkit V2.0
  – D1.2 Grid Engine UML Analysis
  – D2.1 Use cases and requirements
  – D2.2 Questionnaire Report
  – D3.1 Prototype Development: Requirements
• Software
  – Transfer-queue Over Globus (TOG)

20 Transfer-queue Over Globus (TOG) - A Compute Scheduler

21 Transfer-queue Over Globus (TOG)
[Figure: sites A and B, each running Grid Engine and linked through Globus 2. A shows machines a, b, c, d plus e; B shows machines e, f, g, h plus d. Labels: User A, User B.]
• Integrates Grid Engine and Globus 2 to access remote resources
• GE execution methods provide job submission and control
• GE job context stores job-specific information, e.g. the job handle
• Globus GSI for security
• Globus GRAM enables interaction with the remote resource
• GASS for small data transfers, GridFTP for large datasets
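The last two points on this slide (keeping a job handle in a per-job context, and choosing GASS or GridFTP by data size) can be illustrated with a small sketch. This is not TOG code: the size cut-off, the `JobContext` class, and the function names are all hypothetical, and no Globus or Grid Engine API is called. The slide only distinguishes "small" from "large" without giving a threshold.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical cut-off: the slide only distinguishes "small" from "large".
SMALL_TRANSFER_LIMIT_BYTES = 10 * 1024 * 1024


def choose_transfer(size_bytes: int, limit: int = SMALL_TRANSFER_LIMIT_BYTES) -> str:
    """Name the transfer mechanism for a file of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "GASS" if size_bytes <= limit else "GridFTP"


@dataclass
class JobContext:
    """Hypothetical stand-in for a per-job key/value store such as the GE job context."""
    values: Dict[str, str] = field(default_factory=dict)

    def remember_handle(self, handle: str) -> None:
        # Keep the remote job handle so later control requests can find the job.
        self.values["job_handle"] = handle

    def handle(self) -> str:
        return self.values["job_handle"]
```

For example, `choose_transfer(4096)` selects GASS, while anything above the assumed limit selects GridFTP.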

22 TOG (cont)
• Current status
  – Secure job submission functionality implemented and tested
    – Staging of input data and executables, and transfer of output
  – Secure job control functionality implemented and tested
    – Suspend, Resume, Terminate
  – Basic scheduling functionality implemented and tested
    – Schedules jobs to remote resources when local resources are full
  – Testing
    – Integrated successfully within the Grid Engine test suite
    – Tested through firewalls
• TOG software available upon request
  – Contact sungrid@epcc.ed.ac.uk
• Generally available via web site soon
  – www.epcc.ed.ac.uk/sungrid
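The basic scheduling rule stated on this slide, sending jobs to remote resources only when local resources are full, might be sketched as follows. The `Queue` class, its slot counts, and the `submit` function are assumptions made for illustration. The slide does not describe how TOG or Grid Engine represent queues internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    """Hypothetical queue model: a name, a slot count, and a remote flag."""
    name: str
    slots: int
    used: int = 0
    remote: bool = False

    def has_free_slot(self) -> bool:
        return self.used < self.slots


def pick_queue(queues: List[Queue]) -> Optional[Queue]:
    """Prefer a local queue with a free slot; fall back to remote queues."""
    for want_remote in (False, True):
        for queue in queues:
            if queue.remote == want_remote and queue.has_free_slot():
                return queue
    return None


def submit(queues: List[Queue]) -> Optional[str]:
    """Place one job and return the chosen queue name, or None if all are full."""
    queue = pick_queue(queues)
    if queue is None:
        return None
    queue.used += 1
    return queue.name
```

With one local slot and two remote slots, the first job lands locally, the next two go remote, and a fourth is left waiting.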

23 TOG (cont)
Pros
• Simple approach
• Usability: existing Grid Engine interface; users only need to be aware of Globus certificates
• Remote administrators still have full control of their resources

24 TOG (cont)
Cons
• Low quality scheduling decisions (?)
  – May be a time-lag in getting query results back from remote resource
  – Incorporating data transfer costs into scheduling
• Mirror queues for remote resources
  – Possible set-up overhead
• Globus 2 vs. Globus 3
• Grid Engine specific solution
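To make the "data transfer costs" concern concrete, here is a rough sketch of how a scheduler could compare a busy local queue against an idle remote one once staging time is counted. The cost model (queue wait plus transfer time plus run time) and all numbers are assumptions for illustration, not something taken from the project.

```python
from typing import Optional


def estimated_completion_s(
    queue_wait_s: float,
    run_s: float,
    data_bytes: int = 0,
    bandwidth_bytes_per_s: Optional[float] = None,
) -> float:
    """Estimate seconds until a job finishes, including any data staging."""
    transfer_s = 0.0
    if data_bytes:
        if not bandwidth_bytes_per_s or bandwidth_bytes_per_s <= 0:
            raise ValueError("a positive bandwidth is needed when data must move")
        transfer_s = data_bytes / bandwidth_bytes_per_s
    return queue_wait_s + transfer_s + run_s


# Local site: 10 minute queue, no staging needed.
local = estimated_completion_s(queue_wait_s=600, run_s=300)

# Remote site: idle, but 1 GB must first cross a 1 MB/s link.
remote = estimated_completion_s(
    queue_wait_s=0, run_s=300,
    data_bytes=1_000_000_000, bandwidth_bytes_per_s=1_000_000,
)

best = "local" if local <= remote else "remote"
```

Under these assumed figures the idle remote site still loses, because staging takes longer than the local queue wait. If the remote queue-wait figure is stale (the time-lag noted above), the estimate is correspondingly less reliable.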

25 The Compute & Data Scheduler

26 Current status
Considering two possible routes
• Extend TOG
  – Migrate to Globus 3
  – Incorporate OGSA-DAI
• Hierarchical Scheduler
  – Overcome limitations
  – Global Grid vision

27 1. Extend compute scheduler
• Compute Grid
• Data Grid
[Figure labels: GE, GridFTP, Site, SRB, OGSA-DAI (hides ODBC, JDBC, XMLDB etc.), Globus]

28 2. Hierarchical Scheduler
• Unified Interface
  – Grid Scalability
• Query child DRMs for capabilities
• Pass Job Specification to the child
[Figure labels: Grid Engine; Hierarchical Scheduler and Web Services Layer (shown twice); Scotland, Edinburgh, EPCC; "Same Interface"]
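The idea on this slide, that a parent scheduler queries its children for capabilities and passes the job specification down through the same interface, resembles a composite pattern. A minimal sketch follows. The class names, the capability fields (`free_slots`, `archs`), and the job specification format are hypothetical, since the slide does not define them. Only the Scotland, Edinburgh and EPCC labels come from the figure.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class Scheduler(ABC):
    """The single interface shared by leaf DRMs and hierarchical schedulers."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def capabilities(self) -> Dict: ...

    @abstractmethod
    def submit(self, job_spec: Dict) -> Optional[str]: ...


class LeafDRM(Scheduler):
    """Hypothetical stand-in for a local DRM such as a Grid Engine installation."""

    def __init__(self, name: str, free_slots: int, arch: str) -> None:
        super().__init__(name)
        self.free_slots = free_slots
        self.arch = arch

    def capabilities(self) -> Dict:
        return {"free_slots": self.free_slots, "archs": {self.arch}}

    def submit(self, job_spec: Dict) -> Optional[str]:
        if self.free_slots < 1 or job_spec.get("arch") not in (None, self.arch):
            return None
        self.free_slots -= 1
        return self.name


class HierarchicalScheduler(Scheduler):
    """Aggregates children and forwards jobs to the first child that accepts."""

    def __init__(self, name: str, children: Iterable[Scheduler]) -> None:
        super().__init__(name)
        self.children = list(children)

    def capabilities(self) -> Dict:
        caps = [child.capabilities() for child in self.children]
        return {
            "free_slots": sum(c["free_slots"] for c in caps),
            "archs": set().union(*(c["archs"] for c in caps)),
        }

    def submit(self, job_spec: Dict) -> Optional[str]:
        arch = job_spec.get("arch")
        for child in self.children:
            caps = child.capabilities()  # query the child before passing the job down
            if caps["free_slots"] > 0 and (arch is None or arch in caps["archs"]):
                placed = child.submit(job_spec)
                if placed is not None:
                    return f"{self.name}/{placed}"
        return None
```

Because parents and leaves share one interface, a Scotland-level scheduler can sit above an Edinburgh-level one, which in turn sits above an EPCC leaf, without any level knowing what kind of child it is talking to.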

29 Conclusions
• Before proceeding
• Examine Globus 3 analysis
• Examine data technologies, i.e. OGSA-DAI, etc.
• Informed decision on whether to
  – Extend Compute Scheduler, or
  – Build Hierarchical Scheduler, or some subset of this
• Delivery in December 2003

