Oxford Interdisciplinary e-Research Centre I e R C OxGrid, A Campus Grid for the University of Oxford Dr. David Wallom Campus Grid Manager.


1 Oxford Interdisciplinary e-Research Centre (IeRC)
OxGrid, A Campus Grid for the University of Oxford
Dr. David Wallom, Campus Grid Manager

2 Outline
What is a grid?
Why make a campus grid?
How are we making it?
–Central Systems
–Software
–Resources
–Users
How can the ICT/ECE help this activity?

3 What makes a Grid a Grid?
Single sign-on to multiple resources located in different administrative domains.
A Virtual Organisation of users that spans physical organisational boundaries.

4 The Problem
Many new research problems need massive computational capacity and data access.
Research work is increasingly limited by the capacity of accessible resources.

5 The Solution
If the computational or data need is too large for a single existing resource, construct a system able to use a number of appropriate resources concurrently.
–Designed so that:
  users have single sign-on to access multiple resources and can switch between them seamlessly
  the layout can be dynamically altered without user interruption
  once a job has been started or data placed on a remote resource, its status is monitored to make sure it stays running/available

6 Why make a campus grid?
Many computers throughout the University are under-utilised:
–PCs, already purchased, depreciating daily
  Idle time and unused disk space are being wasted.
  e.g. OULS has up to 1200 desktop computers.
–Clusters are expensive to purchase, house and run (extra FTEs).
  Rarely 100% utilised
Users are forced to queue to find suitable resources for their research.

7 Why make a campus grid?
Develop and deploy Grid technology to use under-utilised resources:
–Higher utilisation
  Connect them together so that more often than not a free resource is available, minimising queue time.
–Amplify system administrator effort.
–Substantially increase the research computing power available
  Ensure that applications reach a suitable resource ASAP, certainly quicker than in a single cluster

8 OxGrid, a University Campus Grid
Single entry point for Oxford users to shared and dedicated resources
Seamless access to National Grid Service and OSC for registered users
Single sign-on using PKI technology integrated with current methods
[Diagram labels: NGS, OSC, OxGrid Central Management, Resource Broker, MDS/VOM, Storage, College Resources, Departmental Resources, Oxford Users]

9 Authorisation And Authentication
Initially use the standard UK e-Science Certification Authority
–X.509 digital certificates issued on a per user basis.
–OUCS is a Registration Authority for this CA
For users that only wish to access internal (university) resources, a Kerberos CA has been installed, controlled by the Oxford central Kerberos system (Herald username)
Use an online credential repository to minimise user-certificate interaction

10 Central System Components
Information Service
–Contains all system status information on which the resource broker makes decisions, retrieving information from all clients in the system
Resource Broker
–User access and distribution of submitted tasks to appropriate resources
Systems monitoring
–Monitoring system for the helpdesk, which is the first point of contact in case of problems
Virtual Organisation Management and Resource Usage Service
–Control a virtual community whose members can use various resources
–Create accounting information so that full system as well as single resource use can be recorded and hence possibly charged for
Storage
–Create a dynamic multi-homed virtual file system
–User metadata mark-up for improved data mining

11 Grid Middleware
Virtual Data Toolkit
–Chosen for stability & support structure
–Platform independent installation method
–Widely used in other European production grid systems
–Contains Globus Toolkit™ version 2.4 with several enhancements
  GSI enhanced OpenSSH
  MyProxy Client & Server

12 Information Server
Globus Grid Resource Information Index
Central LDAP database for system information
System information, CPU, memory etc.
Scheduler queue status, number of running & queued tasks
Further additions to published data easily managed
Pull model for retrieving data from clients
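
The pull model on this slide can be illustrated with a small sketch. This is not the Globus MDS or LDAP interface. The class `InformationIndex`, the provider callables and the attribute names (`cpus`, `free_mem_mb`, `running`, `queued`) are all hypothetical stand-ins for whatever each client actually publishes.

```python
from typing import Callable, Dict

# Hypothetical record shape: each client reports a flat dict of attributes.
Record = Dict[str, float]


class InformationIndex:
    """Toy central index that pulls status from registered clients."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[], Record]] = {}
        self.cache: Dict[str, Record] = {}

    def register(self, name: str, provider: Callable[[], Record]) -> None:
        """Add a client; the provider stands in for a remote query."""
        self._providers[name] = provider

    def refresh(self) -> None:
        """Pull fresh data from every client, skipping clients that fail."""
        fresh: Dict[str, Record] = {}
        for name, provider in self._providers.items():
            try:
                fresh[name] = provider()
            except Exception:
                # An unreachable client simply drops out of this snapshot.
                continue
        self.cache = fresh


index = InformationIndex()
index.register("cluster-a", lambda: {"cpus": 64, "free_mem_mb": 2048, "running": 40, "queued": 5})
index.register("pc-pool-b", lambda: {"cpus": 200, "free_mem_mb": 512, "running": 10, "queued": 0})
index.refresh()
```

Because the index initiates every query, adding a newly published attribute only requires the client to return one more key.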

13 Resource Broker
Uses the Condor-G™ meta-scheduler
–Can be considered a large batch processing system
–Condor-G allows treatment of a remote resource (cluster, PC pool) as a local resource
–Command-line tools available to perform job management (submit, query, cancel, etc.) with detailed logging
–Simple job submission language which is translated into remote scheduler specific language
Custom script for determination of resource status & priority.
Integrates the Condor resource description mechanism with the Globus Monitoring and Discovery Service.

14 OxGrid specific information added
Priority of resource dependent on current load measured against possible load
List of installed software on each node
Resource usage permissions (registered users of NGS, OSC)
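
The slide does not give the formula used for resource priority. As a sketch only, one plausible reading of "current load measured against possible load" is the free fraction of a resource. The function names and the formula below are assumptions, not the OxGrid script.

```python
from typing import Dict, List, Tuple


def resource_priority(current_load: float, possible_load: float) -> float:
    """Assumed priority: fraction of capacity still free, clamped to [0, 1]."""
    if possible_load <= 0:
        return 0.0  # a resource with no capacity is never preferred
    free_fraction = 1.0 - current_load / possible_load
    return max(0.0, min(1.0, free_fraction))


def rank_resources(loads: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order resource names from most to least free capacity."""
    return sorted(loads, key=lambda name: resource_priority(*loads[name]), reverse=True)


# Hypothetical resources as (current load, possible load) pairs.
ranking = rank_resources({"cluster-a": (60, 64), "pc-pool-b": (50, 200)})
```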

15 Job to Resource Matching
For each resource that is accessible to the Resource Broker a machine advertisement is created.
–Contains information such as CPU type, available memory and any additional information such as load etc.
For each job that is submitted to the Resource Broker a job advertisement is created.
–This has the job requirements, such as CPU type, memory necessary etc.
A specific daemon within the system does matchmaking between the job requirements and the resource properties.
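
A minimal sketch of the matchmaking idea follows. Condor expresses advertisements in its own ClassAd language, so the plain dictionaries, the `JobAd` class and the "least loaded wins" tie-break here are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# A machine advertisement is modelled as a dict of properties (assumed keys).
MachineAd = Dict[str, Any]


@dataclass
class JobAd:
    """Job advertisement: a name plus a predicate over machine properties."""
    name: str
    requirements: Callable[[MachineAd], bool]


def match(job: JobAd, machines: List[MachineAd]) -> Optional[MachineAd]:
    """Return the least loaded machine that satisfies the job, or None."""
    candidates = [m for m in machines if job.requirements(m)]
    if not candidates:
        return None
    return min(candidates, key=lambda m: float(m["load"]))


machines = [
    {"name": "cluster-a", "arch": "x86_64", "memory_mb": 4096, "load": 0.9},
    {"name": "pc-pool-b", "arch": "x86", "memory_mb": 1024, "load": 0.1},
    {"name": "cluster-c", "arch": "x86_64", "memory_mb": 8192, "load": 0.3},
]
job = JobAd("docking", lambda m: m["arch"] == "x86_64" and m["memory_mb"] >= 2048)
chosen = match(job, machines)
```

Here `pc-pool-b` is the least loaded machine overall but fails the job's requirements, so the match falls to `cluster-c`.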

16 Resource Broker Operation

17 Virtual Organisation Management
Globus uses a mapping between the Distinguished Name (DN) defined in a digital certificate and local usernames on each resource.
For each resource that a user expects to use, it is important that their DN is mapped locally.
Also have to make sure the correct resources are registered.
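
The DN-to-username mapping is conventionally held in a Globus grid-mapfile, where each line is a quoted DN followed by a local account. The sketch below parses that simple form. The DNs and account names are invented for illustration, and entries that map one DN to several accounts are not handled.

```python
import shlex
from typing import Dict


def parse_grid_mapfile(text: str) -> Dict[str, str]:
    """Map each quoted Distinguished Name to its local username."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {line!r}")
        dn, local_user = parts
        mapping[dn] = local_user
    return mapping


# Hypothetical entries; real DNs come from the user's certificate.
example = '''
# DN                                                  local account
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=jane example" oxgrid001
"/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=john sample" oxgrid002
'''
users = parse_grid_mapfile(example)
```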

18 Virtual Organisation Management and Accounting
OxVOM
–Custom in-house designed Web based user interface
–Persistent information stored in relational database
–User DN list retrieved by remote resources using standard tools
Resource Usage Service
–Installed software altered to include commands to determine job start and stop time as well as interface with host scheduling system
–Using Global Grid Forum User Record Usage Service standard
–Information returned from client to RUS server when job completed and stored in persistent database

19 OxGrid VOM

20 Resource Usage Service
Enables presentation of system use to users as well as system owners
Can form the basis of a charging model
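
As a sketch of how job start and stop times could feed both views of system use, the code below totals CPU-hours per user and per resource. The `UsageRecord` fields are assumptions and do not follow the Global Grid Forum record schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-job record returned when a job completes."""
    user: str
    resource: str
    start: int  # job start time, seconds since the epoch
    stop: int   # job stop time, seconds since the epoch
    cpus: int = 1


def cpu_hours(record: UsageRecord) -> float:
    """Wall-clock duration multiplied by the number of CPUs, in hours."""
    return (record.stop - record.start) * record.cpus / 3600.0


def summarise(records: Iterable[UsageRecord], key: str) -> Dict[str, float]:
    """Total CPU-hours grouped by 'user' or by 'resource'."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += cpu_hours(record)
    return dict(totals)


records = [
    UsageRecord("alice", "cluster-a", 0, 3600, cpus=4),
    UsageRecord("bob", "cluster-a", 0, 1800),
    UsageRecord("alice", "pc-pool-b", 0, 7200),
]
by_user = summarise(records, "user")
by_resource = summarise(records, "resource")
```

Multiplying either summary by a rate per CPU-hour would give one simple charging model.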

21 Systems Monitoring
‘Ganglia’ monitoring tool for system status and graphical representation
Simple interface showing immediate hardware problems as well as system load
Well understood by helpdesk and support staff
Open source with simple configuration

22 Ganglia System Monitoring

23 Core Resources
Individual Departmental Clusters (PBS, LSF, SGE)
–Grid software interfaces
–Management of users
–Owner controlled access through local management software
Condor clusters of PCs
–Single master running up to ~500 nodes
–Condor masters run either by owners or IeRC

24 External Resources
Only accessible to users that have registered with them
–National Grid Service
  Peered access with individual systems
–OSC
  Gatekeeper system
  User management done through standard account issuing procedures and manual DN mapping
  Controlled grid submission to Oxford Supercomputing Centre

25 Services necessary to connect to OxGrid
For a system to connect to OxGrid
–Must support a minimum software set (without which it is impossible to submit jobs from the Resource Broker)
  Globus 2.4 job management and RUS compatible jobmanager
  MDS compatible information server
–Desirable though not mandated
  OxVOM compatible grid-mapfile installation scripts
With a scheduling system installed the system administrator is in control

26 Connecting Clusters into OxGrid, 1
Direct connection
–Install middleware etc. onto system head nodes
  Automated installation script
  Well known procedure
–Known port numbers for services and port range for data transfer
–Addition of ~30 user pool accounts
Example of this type of setup is Oxford NGS node
–Contact Steven Young (OeSC)

27 Connecting Clusters into OxGrid, 2
Indirect
–Separate gatekeeper system with submission components of local scheduler
  Transfer Queues on each gatekeeper
  Decouples Globus from local resources
–Hides internals from the Grid users
–Many clusters can be handled by one system jobmanager
Example of this type of installation is the old OSC Gatekeeper.
–Contact Jon Lockley (OSC)

28 Connecting PCs, 1
Student labs, libraries and college terminal rooms
Very different usage patterns for this type of resource
–Systems inaccessible out of hours: greatest performance from dual boot using Windows/Scientific Linux
  Can have environmental and power considerations
–24 hour access: coLinux virtual machine installation running in parallel with native OS
Both of these types of systems use Condor and a Linux Condor master server.

29 Connecting PCs, 2
Install Windows Condor client
–Runs as a system service
  Configured either to hold when a local user is active or to run at all times with low priority
–Studies by several groups have shown that for modern systems a student user sees no system performance difference between the two
–Downside: significant extra effort is needed because of code recompiling and porting.
  Some code will not run because external libraries are not available
–‘Services for Unix’ being investigated to run Linux jobs natively on Windows systems.

30 Environmentally aware Condor systems
Increasingly system owners shut down machines that are not being used.
–Save electricity
Develop a scheme to still use these systems within OxGrid
–Take advantage of Wake-On-LAN technology.
–Automate load balancing to start and stop worker nodes as necessary.
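
Wake-On-LAN works by sending a "magic packet": six 0xFF bytes followed by the target's MAC address repeated sixteen times, usually as a UDP broadcast. The sketch below builds and sends such a packet. The function names, the broadcast address and the choice of port 9 are assumptions for illustration, not details of the OxGrid scheme.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-On-LAN payload for a MAC such as 00:11:22:33:44:55."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"MAC address must have 12 hex digits: {mac!r}")
    mac_bytes = bytes.fromhex(digits)  # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (assumed UDP port 9)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


# Example with a made-up address; call wake("00:11:22:33:44:55") to send it.
packet = build_magic_packet("00:11:22:33:44:55")
```

A load balancer could call `wake` for sleeping worker nodes when the queue grows, provided the network cards and switches are configured to pass the broadcast.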

31 Connecting Others
Sun
–Create Sun Grid Engine clusters and then perform direct connection method
Mac
–Apple have their own grid software, Xgrid
  Not fully tested
–Supported by Condor

32 Data Management
Engage data-intensive as well as computationally intensive research groups
Provide a remote store for those groups that cannot resource their own
Distribute the client software as widely as possible, including departments that are not currently engaged in e-Research

33 Data Management
Software for creation of system
–Storage Resource Broker to create large virtual datastore
  Through a central metadata catalogue, users interface with a single virtual file system though physical volumes may be on several network resources
  Built-in metadata capability
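
The role of the central metadata catalogue can be sketched as a lookup from a logical path to a physical location, plus searchable user metadata. This is a toy model, not the SRB client API. The class, its methods, the server names and the paths are all hypothetical.

```python
from typing import Dict, List, Tuple


class MetadataCatalogue:
    """Toy catalogue mapping logical paths to (server, physical path)."""

    def __init__(self) -> None:
        self._locations: Dict[str, Tuple[str, str]] = {}
        self._metadata: Dict[str, Dict[str, str]] = {}

    def register(self, logical: str, server: str, physical: str, **metadata: str) -> None:
        """Record where a file lives and any user-supplied metadata."""
        self._locations[logical] = (server, physical)
        self._metadata[logical] = dict(metadata)

    def resolve(self, logical: str) -> Tuple[str, str]:
        """Return the storage server and physical path behind a logical path."""
        return self._locations[logical]

    def find(self, **criteria: str) -> List[str]:
        """List logical paths whose metadata matches every criterion."""
        return sorted(
            path
            for path, meta in self._metadata.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        )


catalogue = MetadataCatalogue()
catalogue.register("/oxgrid/chem/run1.dat", "disk-server1", "/vol3/a81f", project="docking")
catalogue.register("/oxgrid/hum/scan7.tif", "disk-server2", "/data/77c2", project="manuscripts")
location = catalogue.resolve("/oxgrid/chem/run1.dat")
```

Users only ever see the logical paths, so a file can move between servers by updating one catalogue entry.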

34 SRB Architecture
[Diagram labels: MCAT, Disk Server1, Disk Server2, Mcat Server, USER]

35 SRB as a Data Grid
[Diagram labels: SRB, MCAT, DB]
SRB Data Grid has arbitrary number of servers
Complexity is hidden from users

36 SRB Client Implementations
inQ – Windows GUI browser
Jargon – Java SRB client classes
–Pure Java implementation
mySRB – Web based GUI
–run using web browser
Java Admin Tool
–GUI for User and Resource management
Matrix – Web service for SRB work flow

37 How users interact with OxGrid
Log in to system head node (Resource Broker)
Create digital credential
Use ‘job-submission’ script to create and submit jobs onto Condor-G system.
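
The internals of the ‘job-submission’ script are not shown in the slides. As a hedged sketch, a wrapper of this kind could generate a Condor-G submit description like the one below and pass it to `condor_submit`. The `universe = globus` and `globusscheduler` keywords are the form used by Condor releases of the Globus 2.4 era; later releases use `universe = grid` with `grid_resource`. The function name, host name and file names are hypothetical.

```python
from typing import Sequence


def render_submit_description(
    executable: str,
    arguments: Sequence[str],
    gatekeeper: str,
    jobmanager: str = "jobmanager-pbs",
) -> str:
    """Build the text of a Condor-G submit file for one job."""
    lines = [
        "universe = globus",
        f"globusscheduler = {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        f"arguments = {' '.join(arguments)}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical gatekeeper; in OxGrid the Resource Broker would choose the target.
description = render_submit_description(
    "dock_protein", ["--input", "ligand.pdb"], "gatekeeper.example.ox.ac.uk"
)
```

Writing `description` to a file and running `condor_submit` on it would queue the job, assuming a valid proxy credential has already been created.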

38 Supporting OxGrid
First point of contact is OUCS Helpdesk through support email.
–Preset list of questions to ask and log files to see if available.
–Not expected to do any actual debugging.
–Pass problems on to Grid experts, who answer grid software problems themselves and pass hardware problems, system by system, to each system's own maintenance staff.
Significant cluster support expertise within OeSC/IeRC.
As one of the UK e-Science Centres we also have access to the Grid Support Centre.

39 Users
Installed several example applications
–Plasma physics
–Polymer physics
–Biochemistry protein docking
–Graphics rendering
We have our first Oxford user code example
–Dr Peter Grout, Chemistry
Contacting currently registered users of both OSC and NGS.
–Beneficial to these systems to remove ‘serial’ users that don’t need to be there, providing more capability to those that must be there.
Data provision is an integral component of the grid
–Contacting Humanities and other large data users

40 Collaboration
Configuring computational components to share resources between Harvard & Monash Universities as proof of principle of global campus grids.
Configuring Storage System to allow safe, secure multi-site storage of data with Monash.

41 How the ICT Strategy & ECE can help
Produce single uniform configuration of ~2000 systems.
Willingness at the design outset to include the capacity to use systems for computation, and hence to include it as a key criterion in the final system choice.
Consider using a supported architecture that is popular with computationally active researchers.
Use underlying system management software that is flexible enough to allow for usage changes of resources, e.g. Altiris.
Persuade people that efficient usage and sharing of resources is in everyone's best interests.

42 The Future
Improve RB system usage algorithm
Install Service based grid software on test system to provide transition information
Package central server modules for public distribution

43 The Future, 2
Develop Windows/Linux Condor pools so that all shared systems can be included
Continue contacting users to expand the user base
Design and construct user training courses.

44 Conclusions
Users are already able to log onto the Resource Broker and schedule work onto the NGS, OSC and OUCS Condor Systems
We are working as quickly as possible to engage more users
We need these users to then go out and evangelise to bring in both more users and resources.

45 Contact
Email: david.wallom@ierc.ox.ac.uk
Telephone: 01865 283378

