Using the Grid: a user's perspective. Ivan Hollins, University of Birmingham
What I have been doing…
Using the Light Job Submission Framework (LJSF): https://uimon.cern.ch/twiki/bin/view/Atlas/LJSFGettingStarted
Doing full simulation with ATLAS software only.
Have run ~400k full-simulation events, equating to ~32k jobs on the Grid, from July to date.
Warning
I am very much a novice user.
My experiences depend heavily on LJSF, which sits on top of the Grid tools and may be as much responsible for some of those experiences as the Grid itself.
Summary of Nice things…
Good documentation REALLY helps to understand the tools.
The Hello World example is very nice.
Once the language is mastered, operations seem fairly intuitive.
Ranking and interactive features are very nice (although I sometimes felt a little in the dark as to which was the most appropriate way to rank sites).
The Ganglia tool at RAL is nice, as is GStat: http://ganglia.gridpp.rl.ac.uk/specials/pbs.php?h=OpenPBS%20server%2fcsflnx353.rl.ac.uk&m=%5Bnone%5D&r=day&s=descending
Summary of Not so Nice things…
Getting Started
Getting all my certificates and registering seemed complicated and unclear. A step-by-step idiot's guide would be useful.
It was also unclear which VO I should join (ATLAS has different options).
Running
Sites that advertise software versions that aren't installed properly.
–Could there be a monitoring tool to periodically test the sites and weed these out?
Some jobs just hang! I have absolutely no idea why. It would be nice to be able to see how much CPU each job is using, so that it is easy to tell which have hung.
Resource Brokers can be a bit temperamental. I had to keep changing them, and it was not obvious at first how to do this. Can the RBs be hidden?
Registering files to the Grid works well overall, with a couple of exceptions.
–You seem to be able to overwrite a catalogue entry with a new file; however, the storage element doesn't allow files to be rewritten. End result: Cat ID != File ID.
–You can also register a non-existent file! Not sure if this is a Grid or LJSF thing…
Storage Elements
–Not immediately obvious where to store data, what the choices were, etc.
–CASTOR: obviously not directly a Grid problem, but it has been a big source of pain. Once a file goes onto tape, the retrieval time is so long that your job times out and ends.
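As an illustration of the wish under "Running" to spot hung jobs from their CPU usage, here is a minimal sketch. It assumes some monitoring source can supply periodic readings of each job's accumulated CPU time; the talk does not name such a source. The `CpuSample` record, the `find_hung_jobs` function, the job IDs and the thresholds are all hypothetical and are not part of LJSF or any Grid tool.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """One reading of a job's accumulated CPU time (hypothetical record)."""
    wall_time: float    # seconds on the wall clock when the reading was taken
    cpu_seconds: float  # total CPU time the job had consumed at that moment


def find_hung_jobs(samples_by_job, stall_window=3600.0, min_cpu_progress=1.0):
    """Return IDs of jobs whose CPU time barely advanced over the stall window.

    Both thresholds are assumptions chosen for illustration only.
    """
    hung = []
    for job_id, samples in samples_by_job.items():
        if len(samples) < 2:
            continue  # not enough history to judge this job
        ordered = sorted(samples, key=lambda s: s.wall_time)
        latest = ordered[-1]
        # Readings taken at least one full stall window before the latest one.
        older = [s for s in ordered
                 if latest.wall_time - s.wall_time >= stall_window]
        if not older:
            continue  # job has not been observed for a full window yet
        baseline = older[-1]  # the most recent of those older readings
        if latest.cpu_seconds - baseline.cpu_seconds < min_cpu_progress:
            hung.append(job_id)
    return hung


# Made-up readings for three hypothetical jobs.
example = {
    "job-a": [CpuSample(0, 0), CpuSample(1800, 1700), CpuSample(3600, 3500)],
    "job-b": [CpuSample(0, 10), CpuSample(1800, 10.2), CpuSample(3600, 10.3)],
    "job-c": [CpuSample(0, 5)],
}
print(find_hung_jobs(example))  # prints ['job-b']
```

A job waiting on a slow tape recall would also show no CPU progress, so a check like this could only flag candidates for a closer look rather than prove that a job has hung.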