WestGrid Overview Dr. Rob Simmonds Distributed Systems Architect.


1 WestGrid Overview Dr. Rob Simmonds Distributed Systems Architect

2 Talk Overview The WestGrid project The WestGrid HPTC resources Grid services for HPTC and how they will be used in WestGrid

3 WestGrid Project 8 institutions More than 250 researchers Technical and operational officers HPTC: compute resources and storage Visualization and collaboration

4 WestGrid People PIs –Jonathan Borwein (SFU), Gren Patey (UBC), Jonathan Schaeffer (UofA), Brian Unger (UofC), Mike Vetterli (SFU/TRIUMF) HPC planning committee –Rob Balantyne, Matthew Choptuik, Corrie Kost, Harold Esche, Paul Lu, Richard Marchand, Seamus O'Shea, Mark Thachuk, Ron Senda, Martin Siegert, Rob Simmonds, Mike Vetterli Visualization planning committee –Lyn Bartram, Kelly Booth, Pierre Boulanger, Brian Corrie, Sara Diamond, Larry Katz, John MacDonald, Trever Woods CAO –Ken Hewitt

5 WestGrid HPTC Resources 140TB IBM storage server (Power4/AIX) 1008 processor IBM cluster (IA-32/Linux) 256 processor SGI Origin (MIPS/Irix) 144 processor HP SC45 (Alpha/Tru64) All connected by Canada’s world class networks

6 Grid Computing “Grid” is a set of software services –Combines meta-computing, resource discovery and security –Designed to enable access to resources in different management domains –Grid services will enable WestGrid resources to be integrated into individual researchers’ computing environments

7 Grid Standardization Global Grid Forum (GGF) is working to provide standards Open Grid Services Architecture (OGSA) defines low-level Grid services

8 Grid toolkits Globus (Public domain – ANL/ISI) –Currently version 2.x used for production –Version 3 provides a reference implementation for OGSA Legion (Commercial – Avaki) –Provides more support for data handling –Will support OGSA

9 Grid Security Infrastructure Ability for trusted users to access remote resources without re-authentication Ability for trusted jobs to access remote resources without re-authentication Protection against stolen credentials Avoid requirement for dedicated, highly available security server(s)

10 Certificate Authority Model CA issues certificates to trusted users and services Certificates used to authenticate with remote resources that trust issuing CA Grid Canada CA will be trusted by WestGrid resources

11 GSI Proxy Certificates User credentials delegated from user certificate to proxy certificate –Proxy certificate used for authentication Proxy certificates have limited lifetime –can also be limited to only authenticate with certain services Proxy certificate copied to remote resource when job is started
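
The two limits mentioned on this slide (a limited lifetime and an optional restriction to certain services) can be illustrated with a small model. This is only a sketch: `ProxyCertificate` and `delegate` are hypothetical names invented for illustration and are not part of Globus or GSI, and the service names used with it are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProxyCertificate:
    """Hypothetical stand-in for a GSI proxy certificate (not a Globus type)."""
    subject: str                               # identity delegated from the user certificate
    not_after: datetime                        # end of the proxy's limited lifetime
    allowed_services: frozenset = frozenset()  # empty set means "no restriction"

    def is_valid_for(self, service: str, now: datetime) -> bool:
        """True if the proxy is unexpired and may authenticate with `service`."""
        if now >= self.not_after:
            return False
        return not self.allowed_services or service in self.allowed_services


def delegate(subject: str, lifetime: timedelta, now: datetime,
             services=()) -> ProxyCertificate:
    """Create a short-lived proxy for `subject`, optionally limited to `services`."""
    return ProxyCertificate(subject, now + lifetime, frozenset(services))
```

Because the proxy expires on its own, a stolen proxy is only useful until `not_after`, which is the protection against stolen credentials that the security slide asks for.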

12 Globus Security Commands Users can request a certificate using ‘grid-cert-request’ –This creates userkey.pem and usercert_request.pem in ~/.globus/ Certificate request file sent to CA –usercert.pem is returned and placed in ~/.globus/ Aim to automate this process for WestGrid users

13 Globus Security – Cont. Proxy certificate created using ‘grid-proxy-init’ Proxy certificate examined using ‘grid-proxy-info’ Proxy certificate destroyed using ‘grid-proxy-destroy’ Proxy certificates could be created during login process

14 GSI initialization demo …

15 Enabling Access to Resources Holding certificate from trusted CA does not guarantee access to resources Users given access to resource by being included in resource’s grid-mapfile –This allows owner of resource to choose which users are allowed to use the resource The grid-mapfile maps Grid user to a local account
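
As an illustration of the mapping step, here is a minimal sketch of reading a grid-mapfile, assuming the usual layout of one quoted certificate subject per line followed by one or more comma-separated local account names. The subject names and accounts in `SAMPLE` are invented, and the function names are hypothetical helpers rather than Globus APIs.

```python
import shlex

# Invented example entries; real subject names are issued by the CA.
SAMPLE = '''
# certificate subject                 local account(s)
"/C=CA/O=Grid/CN=Jane Doe" jdoe
"/C=CA/O=Grid/CN=John Smith" jsmith,shared
'''


def parse_grid_mapfile(text: str) -> dict:
    """Map each certificate subject to its list of local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = shlex.split(line)         # handles the quoted subject name
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {line!r}")
        subject, accounts = parts
        mapping[subject] = accounts.split(",")
    return mapping


def local_account(mapping: dict, subject: str):
    """Return the first local account for `subject`, or None if not listed."""
    accounts = mapping.get(subject)
    return accounts[0] if accounts else None
```

A user whose subject is absent from the file gets `None`, which matches the point of the slide: a valid certificate alone does not grant access.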

16 Globus Job Starting Run job on remote resource using ‘globus-job-run’ –Remote resource must trust the CA that signed the user’s certificate and user must be mentioned in grid-mapfile –Proxy certificate is copied to GASS cache on remote resource to enable program to authenticate with other remote resources

17

18 Batch Job Starting ‘globus-job-submit ’ –This returns a URL used to query job ‘globus-job-status ’ –Find out if the job is waiting, running or finished ‘globus-job-get-output ’ –Get output produced by job. This is stored in the GASS cache on the host where the job is running ‘globus-job-clean ’ –Remove the GASS cache entry for the job in question
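
The submit, poll, fetch, clean sequence can be sketched as a polling loop. This is a sketch under assumptions: `get_status` is a hypothetical hook that in practice might wrap a call to globus-job-status, the state names simply mirror "waiting, running or finished" from the slide, and the job URL in any usage is a placeholder.

```python
import time
from typing import Callable

# Illustrative state names mirroring "waiting, running or finished".
WAITING, RUNNING, FINISHED = "waiting", "running", "finished"


def wait_for_job(job_url: str,
                 get_status: Callable[[str], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 100,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `get_status(job_url)` until the job reports FINISHED.

    Returns True once the job is finished, or False if `max_polls`
    status checks pass without the job finishing.
    """
    for _ in range(max_polls):
        if get_status(job_url) == FINISHED:
            return True
        sleep(poll_seconds)
    return False
```

Only after `wait_for_job` returns True would a caller fetch the output and then clean the cache entry, since cleaning first would discard the stored output.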

19 GridFTP ‘globus-url-copy ’ –Copies file from one location to another file:/ - a file on a local file-system gsiftp:// / - a file on GridFTP server Extensions to standard FTP include –Third party transfers –Parallel transfers
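
To make the two URL forms and the idea of a third-party transfer concrete, the following sketch classifies a copy by its source and destination schemes. The helper names and the example hosts are hypothetical; only the `file:` and `gsiftp:` schemes come from the slide.

```python
from urllib.parse import urlsplit


def is_remote(url: str) -> bool:
    """True for gsiftp:// URLs (GridFTP server), False for file:/ URLs."""
    scheme = urlsplit(url).scheme
    if scheme == "gsiftp":
        return True
    if scheme == "file":
        return False
    raise ValueError(f"unsupported URL scheme in {url!r}")


def transfer_kind(source: str, destination: str) -> str:
    """Classify a copy as 'local', 'upload', 'download' or 'third-party'."""
    src_remote, dst_remote = is_remote(source), is_remote(destination)
    if src_remote and dst_remote:
        return "third-party"      # server to server; data bypasses the client
    if src_remote:
        return "download"
    if dst_remote:
        return "upload"
    return "local"
```

For example, `transfer_kind("gsiftp://a.example.org/data/in.dat", "gsiftp://b.example.org/data/out.dat")` returns `"third-party"`: the client only directs the transfer between the two servers.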

20 Credential Repository NCSA’s MyProxy server provides an online credential repository User stores proxy certificate in repository –This certificate can be long-lived User can later recover a short-lived certificate from the repository
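
The relationship between the long-lived stored credential and the short-lived recovered one can be sketched with a toy in-memory model. This is not MyProxy's API: `CredentialRepository` and its methods are hypothetical, and authentication to the repository is deliberately left out.

```python
from datetime import datetime, timedelta


class CredentialRepository:
    """In-memory toy model of an online credential repository."""

    def __init__(self):
        self._expiry = {}   # user name -> expiry time of the stored credential

    def store(self, user: str, now: datetime, lifetime: timedelta) -> None:
        """Deposit a long-lived proxy credential for `user`."""
        self._expiry[user] = now + lifetime

    def retrieve(self, user: str, now: datetime, requested: timedelta) -> datetime:
        """Return the expiry time of a short-lived proxy derived from the stored one.

        The derived proxy never outlives the stored credential.
        """
        stored_expiry = self._expiry.get(user)
        if stored_expiry is None or now >= stored_expiry:
            raise LookupError(f"no valid stored credential for {user!r}")
        return min(now + requested, stored_expiry)
```

Capping the derived lifetime at the stored credential's expiry is a design assumption of this sketch; it keeps the recovered proxy from extending the trust the user originally deposited.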

21 Credential Repository Uses Used to authenticate with environment when user does not have access to their certificate –e.g., in a Web portal Could be used to authenticate and get proxy certificate during login process eliminating need for Unix passwords

22 MyProxy Commands myproxy-init -s –Put a proxy certificate into the MyProxy server on the specified host –Can specify host using environment variable myproxy-info -s –View information about user’s proxy certificate myproxy-get-credential –Get a proxy certificate myproxy-destroy –Remove proxy certificate from the MyProxy server

23 Inserting Credential

24 Recovering Credential

25 MyProxy Certificate Renewal Allows automated proxy certificate renewal Special proxy certificate enables trusted service to renew standard proxy certificate –e.g., trust a local scheduler to renew the certificate before starting a job Should help to prevent users resorting to insecure means for automating proxy renewal

26 GSI Enabled SSH Tools GSI enabled versions of OpenSSH tools will be used in WestGrid –gsi-ssh Authenticates through GSI and copies proxy certificates to remote host –gsi-scp Authenticates through GSI

27 GSI Enabled SSH

28 Resource Discovery Globus uses MDS for resource discovery –GRIS – provides information about individual hosts –GIIS – provides information about groups of hosts In WestGrid each of the 4 major resources will run a GRIS At least one GIIS will be provided to hold aggregate information –Probably use one per site

29 MDS Publish information to LDAP servers –Information used by Grid services to locate needed resources Publish information such as –Type(s) of job scheduler available –Parameters accepted by job scheduler –Number of processors –Amount of RAM, disk or tape –Software and license availability
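
How a Grid service might use published information to locate a resource can be sketched as a simple filter over records. The record type and function are hypothetical and do not reflect the MDS schema or an LDAP query; the processor counts and platforms are the ones listed on the WestGrid HPTC Resources slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical record of what a host might publish; not the MDS schema."""
    name: str
    processors: int
    platform: str


# Processor counts and platforms as listed on the HPTC Resources slide.
PUBLISHED = [
    ResourceRecord("IBM cluster", 1008, "IA-32/Linux"),
    ResourceRecord("SGI Origin", 256, "MIPS/Irix"),
    ResourceRecord("HP SC45", 144, "Alpha/Tru64"),
]


def find_resources(records, min_processors=0, platform=None):
    """Return the names of records that satisfy the requested constraints."""
    return [r.name for r in records
            if r.processors >= min_processors
            and (platform is None or r.platform == platform)]
```

A real deployment would express the same constraints as a search against the LDAP servers, with further attributes such as scheduler type, memory and software licences.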

30 MDS Example

31 Meta-scheduling A meta-scheduler is used to submit jobs to other job schedulers WestGrid will employ meta-scheduling –Condor-G, Silver and Trellis are under consideration –Multiple meta-schedulers could be used Hierarchical meta-scheduling can be employed
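
The basic idea of a meta-scheduler, choosing which local scheduler receives a job, can be sketched as follows. The selection rule here (prefer a site that can start the job now, then the shortest queue) is an assumption made for illustration and is not how Condor-G, Silver or Trellis decide; `LocalScheduler` and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LocalScheduler:
    """Hypothetical view of one site's batch scheduler."""
    site: str
    free_processors: int
    queued_jobs: int


def choose_scheduler(schedulers, processors_needed):
    """Pick the site that can start the job now with the shortest queue.

    Falls back to the shortest queue overall if no site has enough free
    processors. Returns None when `schedulers` is empty.
    """
    if not schedulers:
        return None
    able = [s for s in schedulers if s.free_processors >= processors_needed]
    candidates = able or list(schedulers)
    # Tie-break on site name so the choice is deterministic.
    return min(candidates, key=lambda s: (s.queued_jobs, s.site))
```

Hierarchical meta-scheduling would apply the same kind of choice at more than one level, for example first across sites and then across the schedulers within the chosen site.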

32 Condor-G Can be used to submit jobs to specific machines Can use ‘glideins’ to add resources to local condor pool New version will include support for batch scheduler advertisements

33 Condor-G : Glidein Example Movie at http://www.cpsc.ucalgary.ca/~simmonds/EdmontonTalk1/condor_demo1.avi

34 Result: Solar System Viz Movie at http://www.cpsc.ucalgary.ca/~simmonds/EdmontonTalk1/solarsystem.avi

35 WestGrid Accounting Use MDS to publish accounting information from each site to LDAP WestGrid-wide accounting calculated and also published in secure LDAP Users will be able to gain access to information, filtered by a policy manager

36 Scheduling Priorities Plan to use accounting information to provide fairness in scheduling priorities across WestGrid Feed values calculated using global accounting information back into local batch schedulers
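
One way global accounting could be turned into values for local batch schedulers is a fair-share factor per group. The slide does not give a formula, so the ratio below (target share divided by actual share, with a floor) is purely an assumed illustration, and the function name and group labels are hypothetical.

```python
def fair_share_factors(usage_hours: dict, target_share: dict) -> dict:
    """Return a priority factor per group.

    A factor above 1 means the group has used less than its target share,
    below 1 means it has used more. `usage_hours` holds usage summed across
    all sites; `target_share` holds each group's entitled fraction of the
    total (values should sum to 1).
    """
    total = sum(usage_hours.values())
    factors = {}
    for group, target in target_share.items():
        used = usage_hours.get(group, 0.0)
        actual = used / total if total > 0 else 0.0
        # Floor the actual share so an idle group gets a large but finite boost.
        factors[group] = target / max(actual, 0.01)
    return factors
```

Each site would then feed its groups' factors into its own scheduler's priority calculation, so that heavy use at one site lowers a group's priority at the others.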

37 Data Storage Grid enabled access to storage –Accessible from researcher’s desktop Distributed file systems currently limited –Security and caching issues Data repository systems provide much of the functionality required –SRB from SDSC –Giggle from ISI/ANL

38 Repository management Large network available file stores Annotation – meta-data tagging Data representation optimization –Files, collections and containers User level replication aided by catalogs

39 Look at SRB

40 SRB – “S commands”

41 Wide Area Message Passing MPICH-G2 enables running of message passing jobs in Grid environment Attempts to use best MPI implementation at each site Provides process mapping configuration to group tightly coupled processes
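
The purpose of the process mapping, keeping tightly coupled processes on the same site so that most messages stay off the wide-area network, can be sketched as below. The helpers and site names are hypothetical and do not represent the library's configuration format.

```python
from collections import defaultdict


def group_ranks_by_site(rank_to_site: dict) -> dict:
    """Group MPI ranks by the site they run on, in ascending rank order."""
    groups = defaultdict(list)
    for rank in sorted(rank_to_site):
        groups[rank_to_site[rank]].append(rank)
    return dict(groups)


def crosses_wide_area(rank_a: int, rank_b: int, rank_to_site: dict) -> bool:
    """True if a message between the two ranks would travel between sites."""
    return rank_to_site[rank_a] != rank_to_site[rank_b]
```

With a placement such as `{0: "site-a", 1: "site-a", 2: "site-b", 3: "site-b"}`, ranks 0 and 1 exchange messages locally while a message from rank 1 to rank 2 crosses sites, so an application would ideally arrange its heaviest communication within each group.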

42 Web Portals Enable access to Grid services via web browser Start a secure session then authenticate this session with GSI using credential server Web session now acts as you in Grid environment WestGrid mock up

43

44

45

46

47 Getting a WestGrid Account Centralized Web-based account requests We get certificate or you use existing certificate We set up accounts, install certificates and email you

48 WestGrid Grid Environment Initial Grid services use –Globus, MyProxy, OpenSSH, SRB Services include –Job starting, resource discovery, credential management and repository management Working on having meta-scheduler(s) –Condor-G, …

49 Lots of work to do … Distributed file systems Improved replica management Fine-grain security Performance measurement and analysis Credential based information discovery Enhanced meta-scheduling Workflow

50 Credits – TeleSim helpers Mark Fox mfox@cpsc.ucalgary.ca (TeleSim programmer) –Web portals, demo Andrey Mirchovski mirchov@cpsc.ucalgary.ca (TeleSim research student) –Security and chief Globus critic Phil Rizk rizkp@cpsc.ucalgary.ca (Hons project student/TeleSim programmer) –MDS, accounting and Web services

51 Questions and Comments …

