
1 UCSD, OSG Summer School 2011: Life of an OSG job
2011 OSG Summer School
A peek behind the scenes: The life of an OSG job
by Igor Sfiligoi, University of California San Diego

2 Summary of past lessons
● HTC is maximizing CPU use over long periods
  ● And getting lots of computation done
● DHTC is HTC over many sites
  ● Using an overlay system makes life easier
● Each compute site is independent
  ● Local authentication and authorization
  ● Local HTC systems
● Cloud similar to Grid in many ways

3 The Open Science Grid (1)
● OSG is an umbrella organization
  ● Does not write software
  ● Does not own compute resources
● OSG negotiates between affected parties and sets standards (partial list):
  ● X.509+VOMS for authentication and authorization
  ● Globus GT2 for Grid CE technology
  ● A standard software distribution

4 The Open Science Grid (2)
● OSG runs common services (partial list):
  ● Troubleshooting
  ● Information services
  ● Accounting services
  ● A glidein factory

5 Running on OSG
● The easiest way is to join an existing OSG VO and use the instructions they have
  ● Most VOs are quite mature at this point, with good procedures in place
● The 2nd easiest thing is to install a Condor overlay pool and hook it up to the OSG glidein factory
  ● Factory admins will do a large fraction of the Grid-related tasks for you
● But there is also the direct submission path
  ● You may want to do something new

6 Are we done for the day?
I just want to do my science. I will take the easy route and never submit directly to OSG.
But then it is up to me to fix your screw ups!
Good. This is the spirit.
You still should learn.

7 The life of an OSG job
Using OSG directly
Because knowing the details helps you make better decisions

8 OSG job basics
● Always use Condor-G
  ● Direct use of Globus client tools is not scalable
● Know how to discover which sites support your VO
  ● Not everybody will let you in
● Know what to do when things go wrong
  ● Within a large (D)HTC system, something will occasionally go wrong!

9 Using an OSG CE
● We have seen how to use Condor-G this morning
  gt2 itbv-ce-pbs.uchicago.edu/jobmanager-pbs
    gt2: Technology
    itbv-ce-pbs.uchicago.edu: Hardware IP address
    jobmanager-pbs: Local HTC
There are a few more knobs, but we can ignore them
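The resource string on this slide is what goes into the grid_resource line of a Condor-G submit description. The following is a minimal sketch, not the author's material, that assembles such a file from the three parts; the executable name `my_job.sh` and the output, error, and log file names are placeholders chosen for illustration.

```python
def condor_g_submit(technology, ce_host, jobmanager, executable):
    """Return the text of a Condor-G submit description."""
    # "<technology> <CE host>/<jobmanager>", as shown on the slide
    grid_resource = f"{technology} {ce_host}/{jobmanager}"
    lines = [
        "universe = grid",
        f"grid_resource = {grid_resource}",
        f"executable = {executable}",
        "output = job.out",   # placeholder file names
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


submit_text = condor_g_submit(
    "gt2", "itbv-ce-pbs.uchicago.edu", "jobmanager-pbs", "my_job.sh"
)
print(submit_text)
```

The printed text could be saved to a file and handed to condor_submit; the "few more knobs" mentioned on the slide are deliberately left out.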

10 The jobmanager
● Globus by default uses jobmanager-fork
  ● Jobs run directly on the CE node
● Users must explicitly specify the proper jobmanager to get into the HTC system
● jobmanager-fork is useful just for basic testing
  ● e.g. checking if my proxy is authorized
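Because a forgotten jobmanager silently falls back to jobmanager-fork, it can be worth checking a contact string before submitting. A small sketch under that assumption; the helper names `split_contact` and `runs_on_ce_node` are hypothetical and not part of Globus or Condor.

```python
# Per the slide: what Globus uses when no jobmanager is named.
DEFAULT_JOBMANAGER = "jobmanager-fork"


def split_contact(contact):
    """Split 'host/jobmanager-x' into (host, jobmanager)."""
    host, _, jobmanager = contact.partition("/")
    # An omitted jobmanager means the default one is used.
    return host, jobmanager or DEFAULT_JOBMANAGER


def runs_on_ce_node(contact):
    """True if the job would run directly on the CE node."""
    return split_contact(contact)[1] == DEFAULT_JOBMANAGER


print(runs_on_ce_node("itbv-ce-pbs.uchicago.edu"))
print(runs_on_ce_node("itbv-ce-pbs.uchicago.edu/jobmanager-pbs"))
```

A submission wrapper might refuse, or at least warn, when `runs_on_ce_node` is true for anything other than a quick authorization test.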

11 Finding sites to use
● You should start with a couple of friendly sites
  ● They will tell you how to talk to them
  ● They will help you debug the initial problems
● But when you want to go bigger, you need an information system that will tell you what is out there
  ● OSG provides a BDII information system

12 OSG BDII
● LDAP based
  http://en.wikipedia.org/wiki/LDAP
● The data is structured using the GLUE schema
  http://vdt.cs.wisc.edu/components/glue-schema.html
● It will tell you which sites (claim to) support you, plus
  ● CE URL
  ● Site description
More in the hands-on session.
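Since the BDII is an LDAP server, it can be queried with a standard LDAP client. The sketch below only builds an ldapsearch command line; it does not run it. Everything site-specific is an assumption: the host `bdii.example.org` and VO name `myvo` are placeholders, port 2170 and the base `mds-vo-name=local,o=grid` are common BDII conventions that should be confirmed for the actual service, and the attribute names follow my reading of the GLUE 1.x schema.

```python
import shlex


def bdii_query_command(bdii_host, vo_name, port=2170,
                       base="mds-vo-name=local,o=grid"):
    """Build an ldapsearch command listing CEs that claim to support a VO."""
    # GLUE 1.x: a CE advertises the VOs it accepts as "VO:<name>" rules.
    ldap_filter = (
        f"(&(objectClass=GlueCE)"
        f"(GlueCEAccessControlBaseRule=VO:{vo_name}))"
    )
    return [
        "ldapsearch", "-x", "-LLL",           # anonymous bind, plain LDIF
        "-H", f"ldap://{bdii_host}:{port}",
        "-b", base,
        ldap_filter,
        "GlueCEUniqueID", "GlueCEInfoContactString",  # attributes to return
    ]


cmd = bdii_query_command("bdii.example.org", "myvo")  # placeholder host and VO
print(" ".join(shlex.quote(part) for part in cmd))
```

Keep in mind the slide's caveat: the answer lists sites that claim to support the VO, so a test job is still the real check.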

13 Job errors
● Many possible error sources (partial list):
  ● Authentication/authorization
  ● Jobs never start
  ● Jobs fail without any output coming back
  ● Wrong OS
  ● Missing libraries (or other files)

14 Auth errors
● Possible causes:
  ● Site is not interested in supporting you
  ● Misconfigured site
  ● Expired proxy
● Difficult to debug
  ● First rule of security is to give the attacker as little info as possible
  ● Even if the “attacker” is a legitimate user!

15 Jobs never start
● Could be a legitimate situation
  ● Other users just have higher priority than you!
  ● Not completely unusual when you are an opportunistic user
● But can be a site problem
  ● Misconfiguration
  ● CE “forgets” about your job
Difficult to tell

16 Jobs failing without output
● Black hole effect
  ● Typically a broken worker node (HW problems, misconfiguration, etc.)
  ● Can “eat” hundreds of jobs before being detected (and it may not be easy to detect!)
● Pilot paradigm helps here
  ● Little damage if pilots are “eaten”
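One way to spot a black hole is to look for worker nodes where many jobs fail very quickly and none succeed. This is a rough heuristic sketch, not a method from the talk; the record format `(node, runtime_seconds, succeeded)`, the node names, and both thresholds are assumptions made for illustration.

```python
from collections import defaultdict


def find_black_holes(job_records, min_failures=10, max_runtime=60):
    """Return nodes with many fast failures and no successful jobs.

    job_records: iterable of (node, runtime_seconds, succeeded) tuples.
    """
    fast_failures = defaultdict(int)
    successes = defaultdict(int)
    for node, runtime, succeeded in job_records:
        if succeeded:
            successes[node] += 1
        elif runtime <= max_runtime:
            fast_failures[node] += 1
    return sorted(
        node for node, count in fast_failures.items()
        if count >= min_failures and successes[node] == 0
    )


records = [("node17", 5, False)] * 12 + [
    ("node03", 3600, True),
    ("node03", 4, False),
]
print(find_black_holes(records))  # prints ['node17']
```

The thresholds would need tuning per workload: legitimately short jobs, for instance, make the runtime cut-off meaningless.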

17 Wrong OS
● You may compile for Red Hat Linux 5, but land on Ubuntu (or even Windows)
● Could be your fault
  ● Site clearly advertised it was a Windows site
● But could be a site problem
  ● Mistakenly re-installed a worker node from the wrong CD
● Pilot paradigm again can help
  ● Pilot setup not site controlled
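A pilot, or a wrapper around the job itself, can verify the platform before starting real work. A minimal sketch using only the Python standard library; the function name `os_matches` is hypothetical, and the check only separates OS families and CPU architectures, so it would not tell Red Hat from Ubuntu.

```python
import platform


def os_matches(expected_system="Linux", expected_machine=None):
    """Check the worker node's OS family and, optionally, its architecture."""
    if platform.system() != expected_system:
        return False
    if expected_machine is not None and platform.machine() != expected_machine:
        return False
    return True


if not os_matches("Linux"):
    print("Unexpected platform:", platform.system(), platform.machine())
```

Failing fast with a clear message at this point is easier to diagnose than a binary that refuses to start.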

18 Missing libs/files
● Sites don't advertise what files you will find on a worker node
  ● At best you can make a good guess
● Particularly problematic the first time you use a site
  ● Or if you have not used it for a while
● But there are also the broken/misconfigured nodes
● Once again, pilots can help
  ● Discover and publish what files are available
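The "discover" half of that last bullet can be as simple as probing for the paths a job depends on. A minimal sketch; the function names are hypothetical, and how the result is published (for example as attributes advertised by the pilot) is left open because the slides do not specify it.

```python
import os


def probe_files(candidate_paths):
    """Map each candidate path to whether it exists on this node."""
    return {path: os.path.exists(path) for path in candidate_paths}


def missing_files(candidate_paths):
    """List the candidate paths that are absent on this node."""
    return [p for p, found in probe_files(candidate_paths).items() if not found]


# Placeholder paths; a real job would list its own dependencies.
needed = [os.getcwd(), "/no/such/path/xyz"]
print(probe_files(needed))
print(missing_files(needed))
```

A pilot could run such a probe once per node and only accept user jobs whose requirements are met.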

19 Troubleshooting
● Most of the time, you cannot fix the problem yourself
  ● Some help from the site admin will be needed
● Too many sites to know all admins
● Use the GOC (Grid Operations Center)
  goc@opensciencegrid.org
  https://ticket.grid.iu.edu/goc/open
  ● They will route your request to the right people

20 Get your hands dirty
● This is all the theory I want you to know
● Exercise time
● Feel free to ask questions

21 Copyright statement
● This presentation contains images copyrighted by ToonClipart.com
● These images have been licensed to Igor Sfiligoi for use in his presentations
● Any other use of them is prohibited

