Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid.


1 Condor, Globus and SRB: Tools for Constructing a Campus Grid
Jon Wakelin

2 Overview
Condor
Globus
Storage Resource Broker (SRB)
UoBGrid
Summary

3 Condor Overview
High Throughput Computing Environment
  – From networked resources (Condor pool)
Like other schedulers
  – Queuing mechanism
  – Prioritisation scheme
  – Scheduling policy
Unlike other schedulers
  – Doesn’t need dedicated resources
  – Desktop workstations, library or PC lab computers
  – Cycle scavenging

4 Class Ads
Classified advertisements
  – Machine Class Ads (for sale)
  – Job Class Ads (wanted)
Machine Class Ads
  – Created from information “advertised” by machines in the Condor pool
  – Can add extra Class Ad information
Job Class Ads
  – Created from information in the Condor submit file
  – Created from default values
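The matchmaking idea on slide 4 can be illustrated with a toy model. This is a minimal sketch, not Condor's ClassAd language or API: ads are plain Python dictionaries, the job's requirements are a Python predicate, and the machine names and Memory values are made up. Real matchmaking is also two-sided (machines can state requirements on jobs), which this sketch leaves out.

```python
# Toy model of Class Ad matchmaking; names and values are hypothetical.
machine_ads = [
    {"Name": "pc-lab-01", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 512},
    {"Name": "sgi-01", "Arch": "SGI", "OpSys": "IRIX65", "Memory": 1024},
]

job_ad = {
    "Cmd": "/bin/ls",
    # The job's requirements, written as a predicate over a machine ad.
    "Requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
}


def match(job, machines):
    """Return the names of machines whose ads satisfy the job's requirements."""
    return [m["Name"] for m in machines if job["Requirements"](m)]


matches = match(job_ad, machine_ads)
print(matches)
```

Only the first machine advertises both attributes the job asks for, so it is the only match.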

5 Different roles in a Condor pool
Central Manager
Submit
Execute
Or a combination of these
  – e.g. submit and execute node
Different daemons will be started depending on the role of the machine

6 Condor Daemons
All Machines
  – condor_master: controls other daemons
Central Manager
  – condor_collector: collects information from other machines
  – condor_negotiator: performs matchmaking
Execute
  – condor_startd: starts, stops, suspends jobs
Submit
  – condor_schedd: maintains queue of jobs
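The role-to-daemon mapping on slides 5 and 6 can be written down as a small lookup. This is only an illustration of the table above; the role keys and the `daemons_for` helper are hypothetical names. In a real pool the set of daemons is chosen through Condor's own configuration (the DAEMON_LIST setting), not through code like this.

```python
# Daemons per role, as listed on slide 6. Role keys are hypothetical labels.
ROLE_DAEMONS = {
    "central_manager": ["condor_collector", "condor_negotiator"],
    "execute": ["condor_startd"],
    "submit": ["condor_schedd"],
}


def daemons_for(roles):
    """Return the daemons a machine with the given roles would run."""
    daemons = ["condor_master"]  # runs on every machine
    for role in roles:
        for daemon in ROLE_DAEMONS[role]:
            if daemon not in daemons:
                daemons.append(daemon)
    return daemons


print(daemons_for(["submit", "execute"]))
```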

7 Job Submission
  Executable = /bin/ls
  Arguments  = -l
  InitialDir = /usr/bin
  Output     = out
  Error      = err
  Queue

8 Job Submission
  Executable = /bin/ls
  Arguments  = -l
  InitialDir = /usr/bin
  # $(Process) expands to 0 and 1, one value per queued job
  Output     = out.$(Process)
  Error      = err.$(Process)
  Queue 2

9 Job Submission
  Executable   = /bin/ls
  Arguments    = -l
  InitialDir   = /usr/bin
  Output       = out.$(Process)
  Error        = err.$(Process)
  Requirements = ((Arch == "INTEL" && OpSys == "LINUX") || (Arch == "INTEL" && OpSys == "IRIX65"))
  Queue 2

10 Job Submission
  # $$(Arch) and $$(OpSys) are filled in from the matched machine's Class Ad
  Executable   = gaussian.$$(Arch).$$(OpSys)
  InitialDir   = /home/jon
  Input        = chlorobenzene.in
  Output       = chlorobenzene.$(Process)
  Error        = chlorobenzene.$(Process)
  Requirements = ((Arch == "INTEL" && OpSys == "LINUX") || (Arch == "INTEL" && OpSys == "IRIX65"))
  Queue 2
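To make the two kinds of substitution on slides 8 to 10 concrete, here is a minimal Python sketch of how the names would come out. It is not how Condor implements macro expansion; the `expand` helper and the sample machine ad are hypothetical, and only the two forms used on the slides are handled.

```python
import re


def expand(template, process, machine_ad):
    """Expand $$(Attr) from the matched machine ad, then $(Process)."""
    # $$(Attr): substituted from the ad of the machine the job was matched to.
    out = re.sub(r"\$\$\((\w+)\)", lambda m: str(machine_ad[m.group(1)]), template)
    # $(Process): the job's index within the submission, starting at 0.
    out = re.sub(r"\$\(process\)", str(process), out, flags=re.IGNORECASE)
    return out


machine_ad = {"Arch": "INTEL", "OpSys": "LINUX"}  # hypothetical matched machine
executable = expand("gaussian.$$(Arch).$$(OpSys)", 0, machine_ad)
outputs = [expand("chlorobenzene.$(Process)", p, machine_ad) for p in range(2)]
print(executable, outputs)
```

With "Queue 2" the two jobs get process numbers 0 and 1, so each writes to its own output file, while the executable name depends on where each job lands.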

11 Condor Commands
condor_submit
condor_q
condor_rm
condor_status
  – Displays pool status in a succinct format
condor_status -l
  – Displays full Class Ad information

12 Condor-G
Condor interface to access Globus resources
  – Condor submit file
  – Condor commands
  – Keeps log of runs
  – Adds fault tolerance
Can be used to perform matchmaking
  – Must create machine Class Ads manually
  – condor_advertise command
  – Can be used to create a resource broker
  – No RB functionality in Globus Toolkit

13 Globus Toolkit Overview
Globus is a toolkit, not a turnkey solution
Globus Toolkit 2.4.3 is a common choice for production grids
Four main components
  – Authentication (GSI)
  – Resource management (GRAM)
  – Data transfer (GridFTP)
  – Resource discovery and monitoring (MDS)

14 Authentication
Grid users need to obtain a certificate
Applications can use the certificate to establish the identity of the user, i.e. authenticate the user

15 PKI Authentication
Public Key Infrastructure
  – Public/private keys
  – Used to encrypt data
  – And to sign certificates
Certification Authority (CA)
  – User creates certificate request
  – CA signs certificates
  – UK eScience CA at RAL
Certificate contains
  – Identity/Distinguished Name (DN)
  – Public key
  – Signature and identity of CA

16 GSI Authentication
Grid Security Infrastructure
  – Extensions to PKI (X.509, SSL extensions)
  – Single sign-on
  – Delegation
  – grid-proxy-init: command to create proxy certificates
[Diagram labels: CA, User, Proxy, Signature]

17 Resource Management (GRAM)
Grid Resource Allocation Manager
  – Gatekeeper
  – Resource Specification Language
  – JobManagers

18 GRAM - Gatekeeper
Daemon runs on a grid resource
Processes incoming Globus requests
Authenticates users
  – Configured to trust a given CA
  – e.g. UK eScience CA at RAL
Maps user to local account
  – DN => username
  – grid-mapfile
Passes the job on to the jobmanager
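The DN => username step on slide 18 amounts to a lookup in the grid-mapfile, where each line holds a quoted DN followed by a local account name. The following is a minimal sketch of that lookup, not the gatekeeper's actual code; the DNs, usernames and helper names are hypothetical.

```python
import shlex

# Hypothetical grid-mapfile contents, for illustration only.
GRID_MAPFILE = '''
# "Distinguished Name" local-username
"/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user" exuser
"/C=UK/O=eScience/OU=Bristol/L=IS/CN=another user" another
'''


def parse_grid_mapfile(text):
    """Return a dict mapping each DN to its local username."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, username = shlex.split(line)[:2]  # shlex keeps the quoted DN whole
        mapping[dn] = username
    return mapping


def local_account(dn, mapping):
    """Return the local account for a DN, or None if the DN is not listed."""
    return mapping.get(dn)


mapping = parse_grid_mapfile(GRID_MAPFILE)
print(local_account("/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user", mapping))
```

A DN that does not appear in the file maps to no account, which is the case where the request would be refused.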

19 GRAM - RSL
Resource Specification Language
  – (attribute op value) in parentheses
Operators
  – Numerical operators within clauses (<, <=, >, >=, =, !=)
  – Logical operators between clauses (&, |)
Attributes
  – Predefined
    – executable, arguments, stdin, stdout, stderr, environment
    – maxCpuTime, maxWallTime, maxMemory, project, queue
  – User defined
    – May be handled by subsequent application

20 GRAM - RSL
  &(executable="/bin/ls")
   (arguments="-l")
   (directory="/usr/bin")
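The structure of the RSL string on slide 20 (a conjunction of attribute=value clauses) can be seen by splitting it into a dictionary. This is a minimal sketch that handles only that simple case: a leading &, the = operator and double-quoted values. It is not a full RSL parser, and `parse_simple_rsl` is a hypothetical helper name.

```python
import re


def parse_simple_rsl(rsl):
    """Parse '&(attr="value")(attr="value")...' into a dict."""
    rsl = rsl.strip()
    if not rsl.startswith("&"):
        raise ValueError("only conjunctions (&) are handled in this sketch")
    # Each clause looks like (name="value"); whitespace between clauses is ignored.
    clauses = re.findall(r'\(\s*(\w+)\s*=\s*"([^"]*)"\s*\)', rsl)
    return dict(clauses)


job = parse_simple_rsl('&(executable="/bin/ls") (arguments="-l") (directory="/usr/bin")')
print(job)
```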

21 GRAM - JobManagers
Perl modules
  – Convert RSL into scheduler-specific language
Reference implementations
  – Fork, Condor, PBS, LSF
May need to roll your own
  – e.g. LoadLeveler, SGE
  – Or just to add extra functionality
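What "convert RSL into scheduler-specific language" on slide 21 means in practice can be sketched for PBS. The real JobManagers are Perl modules; this Python sketch is only an illustration of the translation step, the `rsl_to_pbs` helper is hypothetical, and it assumes maxWallTime is given in minutes and that the RSL has already been parsed into a dictionary.

```python
def rsl_to_pbs(job):
    """Build a PBS batch script from a dict of RSL attributes."""
    lines = ["#!/bin/sh"]
    if "queue" in job:
        lines.append("#PBS -q " + job["queue"])
    if "maxWallTime" in job:
        # Assumption: maxWallTime is in minutes; PBS walltime uses hh:mm:ss.
        hours, minutes = divmod(int(job["maxWallTime"]), 60)
        lines.append("#PBS -l walltime=%02d:%02d:00" % (hours, minutes))
    if "stdout" in job:
        lines.append("#PBS -o " + job["stdout"])
    if "stderr" in job:
        lines.append("#PBS -e " + job["stderr"])
    if "directory" in job:
        lines.append("cd " + job["directory"])
    command = job["executable"]
    if job.get("arguments"):
        command += " " + job["arguments"]
    lines.append(command)
    return "\n".join(lines) + "\n"


script = rsl_to_pbs({
    "executable": "/bin/ls",
    "arguments": "-l",
    "directory": "/usr/bin",
    "queue": "short",       # hypothetical queue name
    "maxWallTime": "90",
})
print(script)
```

A jobmanager for another scheduler such as SGE would keep the same input and emit that scheduler's directives instead.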

22 Data Management (GridFTP)
File Transfer Protocol
  – Extension of the standard FTP protocol stack to include extra functionality
    – GSI authentication
    – Third-party transfers
    – Striped transfers
  – User application is globus-url-copy

23 Information Services (MDS)
Collect and provide status information about Grid resources
MDS: Monitoring and Discovery Service
  – GRIS: Grid Resource Information Service
    – Collects info about local resource
    – Reports to GIIS server
  – GIIS: Grid Index Information Service
    – Aggregates information from GRIS servers
    – One per organisation
  – Same executable with different configuration

24 Storage Resource Broker (SRB) Overview
Uniform interface to heterogeneous data storage resources
  – Unix, IRIX, Linux file systems
  – Windows
  – Databases
  – Physical media (tape storage)
SRB is middleware
  – Allows access to a wide range of data resources
  – Allows a wide range of user applications to be written
  – All accessed through a “narrow” API
[Diagram labels: Applications, API, Storage]

25 SRB Access
Applications
  – Scommands: command line
  – MySRB: Web access
  – inQ: Windows GUI
APIs
  – Java, C, C++, Python, Perl

26 UoBGrid Overview
What is a Campus Grid?
Our Situation
Software Choices
Services

27 What is a Campus Grid?
A Grid: single sign-on to multiple resources located in different administrative domains

28 Our Situation
Dedicated departmental clusters
  – Windows Condor pools not a requested resource
Separation of user communities
  – Parallel vs serial usage
All contained within a single firewall domain
Wanted to become partners in the NGS
  – Systems must be compatible
  – Encourage our users to become NGS users
Full Economic Costing coming soon!
  – Important to keep usage records
  – Ensure best usage of purchased resources for a sustainable future

29 Software Choices
Condor 6.6.7
Globus 2.4.3
MyProxy
GSI-SSH
Storage Resource Broker (SRB)
Virtual Data Toolkit (VDT)
  – Bundles many useful tools
  – Platform-independent installation
  – Supported release of Globus Toolkit, MyProxy and GSI-SSH

30 Planned Resources

31 Current Resources
4 Servers
  – RB: Resource Broker
  – VOM: Virtual Organisation Manager
  – MDS: Monitoring and Discovery Service
  – SRB: Storage Resource Broker
4 Compute Resources
  – Monster2: SGE, 20 CPU
  – Tuya: PBS, 16 CPU
  – Grendel: PBS, 110 CPU
  – BSESrv1: PBS, 28 CPU

32 Resource Broker
Condor-G with matchmaking
Custom script for determination of resource status
  – Converts MDS information into Condor Class Ads
  – Adds information about available software
User submission script
  – Creates Condor submit file
  – Software requirements passed into Condor submit file
  – Submits jobs
  – Sends data to SRB
http://cerb-rb.bris.ac.uk/cgi-bin/rb_status.cgi
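The "converts MDS information into Condor Class Ads" step on slide 32 can be pictured as turning a record of resource attributes into "Name = value" text. The sketch below is not the UoBGrid script, which the slides do not show; the `to_classad` helper, the attribute names, the host name and the values are all hypothetical.

```python
def to_classad(resource):
    """Render a dict of resource attributes as 'Name = value' Class Ad text."""
    def fmt(value):
        if isinstance(value, bool):          # check bool before int
            return "TRUE" if value else "FALSE"
        if isinstance(value, (int, float)):
            return str(value)
        return '"%s"' % value                # strings are double-quoted
    return "\n".join("%s = %s" % (k, fmt(v)) for k, v in resource.items()) + "\n"


# Hypothetical record, standing in for what a query of the MDS might return,
# plus a locally added flag describing available software.
resource = {
    "MyType": "Machine",
    "Name": "grendel",
    "gatekeeper_url": "grendel.example.org/jobmanager-pbs",
    "FreeCpus": 12,
    "HasGaussian": True,
}
ad = to_classad(resource)
print(ad)
```

Text of this kind is what would then be published to the pool with the condor_advertise command mentioned on slide 12, so that a job requiring, say, a particular software flag can be matched to the resource.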

33 Virtual Organisation Manager
Built using
  – Webserver
    – Apache + mod_ssl
    – Perl CGI
  – Postgres database
  – Modified Globus JobManagers
Functionality
  – Record of users and machines
  – Administrative functions
  – Accounting/usage statistics

34 Virtual Organisation Manager
Admin via web interface (https)
  – Access based on Certificate/DN
  – Add/remove users
  – Add/remove resources
  – Control users' access to resources
Constructs grid-mapfiles for all resources
https://cerb-vom.bris.ac.uk/vom-bin/VOM.cgi
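Constructing a grid-mapfile per resource, as described on slide 34, is a join between the user records and the access-control records. The following is a minimal sketch under that reading; the VOM itself is a Perl CGI application backed by Postgres, and the data layout, DNs, usernames and `build_grid_mapfiles` helper here are hypothetical.

```python
def build_grid_mapfiles(users, access):
    """users: DN -> local username; access: resource -> DNs allowed on it.

    Returns resource -> grid-mapfile text, one '"DN" username' line per user.
    """
    mapfiles = {}
    for resource, dns in access.items():
        lines = ['"%s" %s' % (dn, users[dn]) for dn in sorted(dns)]
        mapfiles[resource] = "\n".join(lines) + "\n"
    return mapfiles


# Hypothetical records for illustration.
users = {
    "/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user": "exuser",
    "/C=UK/O=eScience/OU=Bristol/L=IS/CN=another user": "another",
}
access = {
    "grendel": list(users),
    "tuya": ["/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user"],
}
mapfiles = build_grid_mapfiles(users, access)
print(mapfiles["tuya"])
```

Removing a user's access to a resource then means regenerating that resource's file without the user's line.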

35 Virtual Organisation Manager
Accounting/usage statistics
  – Usage by machine
  – Usage by users
Modified GRAM JobManagers
  – Job details sent to DB on completion
  – executable, arguments, start time, end time, CPU, wall time, memory, virtual memory, jobmanager-type, number of nodes
http://cerb-vom.bris.ac.uk/cgi-bin/VOM-usage-stats.cgi
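The "usage by machine" and "usage by users" views on slide 35 are totals over the per-job records. This is a minimal sketch of that aggregation using an in-memory list in place of the Postgres database; the record fields, user names and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical job records, one per completed job (wall time in seconds).
records = [
    {"user": "alice", "machine": "grendel", "wall_time": 120.0},
    {"user": "bob", "machine": "grendel", "wall_time": 300.0},
    {"user": "alice", "machine": "tuya", "wall_time": 60.0},
]


def total_by(records, key, field="wall_time"):
    """Sum `field` over all records, grouped by the value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record[field]
    return dict(totals)


by_machine = total_by(records, "machine")
by_user = total_by(records, "user")
print(by_machine, by_user)
```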

36 Resource Monitor
Runs GIIS
  – Collects information from UoBGrid resources
Runs Big Brother monitoring software
  – Client/server model
  – Server pings registered resources
  – Client records local system info and reports to server

37 Big Brother Monitoring System
Web-accessible status page that is easy for helpdesk and admin staff to understand.

38 Storage Resource Broker
All UoBGrid users given an SRB account
GSI authentication enabled for Scommands
Access via certificate

39 UoBGrid architecture diagram (repeated on slides 39 to 46, with a new caption on each slide from 40 onwards)
[Diagram labels: User; servers SRB, MDS, RB, VOM; UoB Grid resources Monster 2 (SGE), bserv (PBS), Grendel (PBS), Tuya (PBS); NGS with BDII and sites RAL, Oxford, Leeds, Man]

40 Compute resources running GRIS report to information servers

41 Resource Broker polls information servers and converts MDS information into Condor Class Ads

42 User logs on to Resource Broker to submit job. Jobs are matched to resources using Condor

43 Job details sent to machine by Condor-G

44 Upon completion, output files are sent back to the Resource Broker

45 If a job runs on UoB Grid resources, run details are sent to the VOM DB, for NGS and UoB users alike.

46 Finally, output files are sent from the RB to the Storage Resource Broker

47 Summary
Condor
  – Standalone: high throughput computing system
  – Matchmaking with Class Ads
  – Condor-G: interface to Globus Toolkit
Globus Toolkit
  – Applications, protocols, APIs
  – GSI: certificates (DN, public key, digital signature)
UoBGrid
  – Centralised access to disparate resources
  – Custom components created to fill functionality gaps
  – Globus gives authenticated access to resources
  – Condor provides matchmaking (i.e. brokering)
  – SRB provides storage

48 Useful URLs
SRB: http://www.sdsc.edu/srb/
Globus: http://www.globus.org/
Condor: http://www.cs.wisc.edu/condor/
UK eScience CA: https://ca.grid-support.ac.uk/
NGS: http://www.ngs.ac.uk/
UoBGrid: http://escience.bris.ac.uk

