Grid in action: from EasyGrid to LCG testbed and gridification techniques. James Cunha Werner University of Manchester Christmas Meeting - 2005
Going to grid Conventional way: Usual code (your cuts). Run BetaMiniApp on several data files, one after the other. When all the data is processed, you have results! Grid way: Same usual code (your cuts). Run several copies of BetaMiniApp, each running independently on one data file. At the end, join all the results! EasyGrid does it for you!
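The difference between the two ways can be sketched in a few lines of Python. This is only an illustration, not EasyGrid or BetaMiniApp code: the file names, the event values, the `analyse` function and the cut (`value > 10`) are all hypothetical, and local threads stand in for grid jobs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in data: file name -> list of event values.
DATA_FILES = {
    "run1.dat": [3, 12, 25, 7],
    "run2.dat": [18, 2, 40],
    "run3.dat": [9, 11],
}

def analyse(name):
    """Stand-in for one BetaMiniApp run: apply the cuts to ONE data file."""
    events = DATA_FILES[name]
    passed = sum(1 for value in events if value > 10)  # the hypothetical cut
    return Counter(total=len(events), passed=passed)

def conventional(names):
    """One job processes every file, one after the other."""
    result = Counter()
    for name in names:
        result += analyse(name)
    return result

def grid_way(names, workers=3):
    """One independent job per file, then join the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyse, names))
    return sum(partials, Counter())

names = sorted(DATA_FILES)
sequential = conventional(names)
parallel = grid_way(names)
print(sequential == parallel, dict(parallel))
```

Because each file is analysed independently, the joined result is the same as the sequential one; only the elapsed time changes.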
General overview Gridification algorithms for generic software; EasyGrid for datasets; EasyTau for selected events; Grid testbed; users’ software
EasyGrid: an overview Prototype for future development. RPA = guarantee of useful software. Provides full support for the job submission system: –Recovers results into the user’s directory –Generates reports for further analysis (aborts and abends) in one history file. It is a framework that users can adapt to their own needs and applications. Fully operational and integrated with LCG.
Christmas 2004: My goals were… Develop a fail-proof submission system. Write web pages covering all the elementary tasks in HEP/BaBar, to help students and newcomers. Understand the q-qbar interaction through pi0. What I have achieved in 2005…
Achievements with EasyGrid User-friendly framework, flexible and reliable. It provides users with results, or with the information needed for further analysis. Tutorial web pages for PhD students and new researchers. http://www.hep.man.ac.uk/u/jamwer Pi0 project: analysis of 500 million events and generation of 5 million Monte Carlo events in 5 weeks. http://www.hep.man.ac.uk/u/jamwer/pi0alg5.html Anti-deuteron project: 1,500 million events in 1 week, running at several sites in the UK, with more than 200 jobs in parallel. http://www.hep.man.ac.uk/u/jamwer/deutdesc.html
LCG installation and debugging There are several problems in the LCG grid: –a high number of jobs fail when running more than 200 jobs; –installation issues; –performance issues. Installation of a complete testbed from scratch using 10 obsolete computers: http://www.hep.man.ac.uk/u/jamwer/#sec0
Testbed stress test Processing time is zero: BetaMiniApp is replaced by a program that prints the dataset name and waits for some time (e.g. 300 s). 1,000 jobs are submitted in each test to a testbed with 6 WNs.
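A stand-in job of this kind might look like the sketch below. It is not the program used on the testbed; the function name `stub_job` and its parameters are assumptions made for illustration, and the short wait in the usage line replaces the 300 s used in the real test.

```python
import time

def stub_job(dataset, wait_seconds=300.0):
    """Replace the real analysis: print the dataset name, wait, and return."""
    print(f"dataset: {dataset}")
    time.sleep(wait_seconds)  # simulated run time; no physics processing
    return dataset

# Short wait for a quick local check of the stub itself.
finished = stub_job("example-dataset", wait_seconds=0.01)
```

With no real processing in the job, any failure or delay observed comes from the grid middleware rather than from the analysis code.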
Number of jobs/WN
            T0     T1       T2
Sub Fail    0      0        0
Aborts (1)  84     122      0
bf33        296    144      6
bf34        306    148      161
bf35        314    156      195
bf36        0      165      211
bf37        0      172      213
bf38        0      91 (2)   214
T0 and T1: time between submissions is zero (continuous flow).
T0: WNs bf36, bf37 and bf38 were running without pbs_mom started.
T1: one WN crashed during the test (2).
T2: time between submissions: 30 s. CE (bf32) CPU use was >90%.
(1) Cannot plan: BrokerHelper: no compatible resources
Recommendations CEs are heavily loaded in the Grid (>90% CPU load!), and this affects grid performance: the number of WNs for each CE can be defined by the minimum value of submission delay and minimum queue time. Running a single CE for a large farm is a limiting factor. More matched CEs per RB would reduce failures and increase performance. A file system study will provide more information soon.
Research in gridification technologies for conventional software Users spend years developing their source code, and they will not throw it away just to use web services. I developed an algorithm that allows users to run their own software on top of a web service layer with LCG middleware. Preliminary tests using “fake” web services (simulated with PVM) show it is a viable and flexible approach.
Gridification algorithm Creates parallel processes using PVM with ssh as the remote shell. A central job distributes tasks to the parallel processes whenever slave processes return results, so there is no need for load balancing! Handles slave failures by resubmitting tasks to available slaves. There is no checkpoint system (not worth it). Transfer time can be a bottleneck. Task streams are implemented. Results with 300 empty processes on one laptop show a transfer time of 185 ms/process.
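The dispatch pattern described on this slide can be sketched as follows. This is not the author's PVM implementation: Python threads from `concurrent.futures` stand in for PVM slave processes, and `run_master`, `flaky_square`, `n_slaves` and `max_attempts` are hypothetical names chosen for the example. Tasks are assumed to be hashable so they can be used as result keys.

```python
import threading
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_master(tasks, work, n_slaves=4, max_attempts=3):
    """Give each free slave one task at a time; resubmit tasks that fail."""
    pending = deque((task, 1) for task in tasks)  # (task, attempt number)
    running = {}                                  # future -> (task, attempt)
    results = {}
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        while pending or running:
            # Hand one task to every idle slave.
            while pending and len(running) < n_slaves:
                task, attempt = pending.popleft()
                running[pool.submit(work, task)] = (task, attempt)
            # Block until at least one slave has returned.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                task, attempt = running.pop(future)
                if future.exception() is None:
                    results[task] = future.result()
                elif attempt < max_attempts:
                    pending.append((task, attempt + 1))  # resubmit
                else:
                    results[task] = None                 # give up on this task
    return results

_seen = set()
_lock = threading.Lock()

def flaky_square(n):
    """Hypothetical task: fails the first time it gets a multiple of 3."""
    with _lock:
        first_time = n not in _seen
        _seen.add(n)
    if n % 3 == 0 and first_time:
        raise RuntimeError(f"slave failed on task {n}")
    return n * n

results = run_master(range(10), flaky_square)
print(results)
```

Because a slave only receives a new task after returning its previous one, faster slaves naturally take more tasks, which is why no separate load-balancing step is needed. A failed task simply goes back into the queue, so no checkpointing is involved.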
Conclusion EasyGrid is operational. Benchmarks were a proof of concept under real conditions. The LCG testbed is operational, providing results and supporting performance analysis and tuning. The gridification algorithm is running on one laptop with Genetic Programming/AI.
New Year resolutions Analysis of Linux kernel related file server issues. LCG performance study and Linux kernel tuning. Implementation of EasyTau: a submission module for the TauUser package using EasyGrid (running on ntuples). Gridification algorithm running with LCG and commercial applications (WebSphere, Tivoli, Symphony, etc.). EasyGrid product development and startup. Run the pi0 project again with the EasyGrid product and maybe … publish a paper about gridification!