
Slide 1: Distributed computing at the Facility level: applications and attitudes
Tom Griffin, STFC ISIS Facility
tom.griffin@stfc.ac.uk
NOBUGS 2008, Sydney

Slide 2: Spare cycles
- Typical PC CPU usage is about 10%
- Usage is minimal from 5pm – 8am
- Most desktop PCs are really fast
- This is a waste of energy
- How can we use ("steal"?) unused CPU cycles to solve computational problems?

Slide 3: Types of Application
- CPU intensive
- Low to moderate memory use
- Not too much file output
- Coarse grained
- Command line / batch driven
- Licensing issues?

Slide 4: Distributed computing solutions
- Lots of choice: Condor, Grid Engine, Grid MP... We chose Grid MP.
- Server hardware: two dual-Xeon 2.8GHz servers, RAID 10
- Software: servers run Red Hat Enterprise Linux with DB2; unlimited Windows (and other) clients
- Programming: web services interface (XML, SOAP), accessed with C++, Java, or C# (a hedged setup sketch follows this list)
- Management console: web-browser based; can manage services, jobs, devices, etc.
- Large industrial user base: GSK, J&J, Novartis, etc.
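The code on slides 9 and 10 uses two web-service proxy objects, `ud` (the main Mgsi service) and `ft` (file transfer), plus an `auth` credential passed to every call. As a hedged sketch of how they might be set up — the class names, URLs, and credential fields below are guesses modelled on those later calls, not verified Grid MP API:

// Hedged sketch only: proxy class names and auth structure are assumptions.
Mgsi.MgsiService ud = new Mgsi.MgsiService();          // hypothetical proxy class
ud.Url = "https://gridserver.example/services/mgsi";   // placeholder URL

FileTransfer.FileTransferService ft =
    new FileTransfer.FileTransferService();            // hypothetical proxy class
ft.Url = "https://gridserver.example/services/ft";     // placeholder URL

Mgsi.Auth auth = new Mgsi.Auth();                      // hypothetical credential type
auth.username = "grid_user";                           // placeholder credentials
auth.password = "********";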

Slide 5: Installing and Running Grid MP
- Server installation: about 2 hours
- Client installation: create MSI and RPM using 'setmsiprop'; about 30 seconds
- Manual install: better security on Linux and Macs

Slide 6: Adapting a program for Grid MP
1. Think about how to split your data
2. Wrap your executable
3. Write the application service
- The application service does the pre- and post-processing, and is fairly easy to write
- It interfaces to the grid via web services (C++, Java, C#); a sketch of its overall shape follows this list
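As a rough illustration of that shape, here is a minimal, self-contained C# sketch: pre-process the input into one chunk per workunit, then post-process by merging the per-workunit results. The file names, chunk count, and round-robin split are illustrative assumptions; the real submission step would use the Mgsi web-service calls shown on slides 9 and 10.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class ApplicationServiceSketch
{
    // Pre-processing: split one input file into n chunks, one per workunit.
    // Round-robin assignment keeps the chunk sizes balanced.
    static List<string> SplitInput(string inputPath, int n)
    {
        string[] lines = File.ReadAllLines(inputPath);
        var chunkPaths = new List<string>();
        for (int i = 0; i < n; i++)
        {
            string path = "chunk_" + i + ".dat";
            File.WriteAllLines(path,
                lines.Where((line, idx) => idx % n == i).ToArray());
            chunkPaths.Add(path);
        }
        return chunkPaths;
    }

    // Post-processing: concatenate the per-workunit result files.
    static void MergeResults(IEnumerable<string> resultPaths, string outPath)
    {
        File.WriteAllLines(outPath,
            resultPaths.SelectMany(p => File.ReadAllLines(p)).ToArray());
    }

    static void Main()
    {
        // 64 workunits, matching the 64 runs quoted later in the talk.
        List<string> chunks = SplitInput("input.dat", 64);

        // ...upload each chunk and create the workunits via the web
        // services (slides 9-10), wait for the job to finish, then
        // download the per-workunit result files...

        MergeResults(chunks.Select(c => c.Replace("chunk", "result")),
                     "merged.dat");
    }
}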

Slide 7: Package your executable
- Hierarchy: a program contains modules; a module contains the executable
- The package is uploaded to, and resident on, the server
- A module bundles the executable, DLLs, standard data files, and environment variables
- Options: compress? encrypt?

Slide 8: Create / run a job
[Diagram: on the client side, packages (Pkg1–Pkg4) are grouped into datasets ("Molecules", "Proteins") and sent to the server over https; "Create job" generates the cross product of the datasets as workunits on the server side; then "Start job".]

Slide 9: Code examples

// Create the job itself...
Mgsi.Job job = new Mgsi.Job();
job.application_gid = app.application_gid;
job.description = txtJobName.Text.Trim();
job.state_id = 1;
job.job_gid = ud.createJob(auth, job);

// ...then a job step describing how each workunit runs.
Mgsi.JobStep js = new Mgsi.JobStep();
js.job_gid = job.job_gid;
js.state_id = 1;
js.max_concurrent = 1;
js.max_errors = 20;
js.num_results = 1;
js.program_gid = prog.program_gid;

Slide 10: Code examples

// Create a dataset to hold one input file per workunit.
Mgsi.DataSet ds = new Mgsi.DataSet();
ds.job_gid = job.job_gid;
ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
ds.data_set_gid = ud.createDataSet(auth, ds);

// Upload an input file for each workunit and register it in the dataset.
Mgsi.Data[] datas = new Mgsi.Data[(int)numWorkunits.Value];
for (int i = 1; i <= numWorkunits.Value; i++)
{
    FileTransfer.UploadData uploadD =
        ft.uploadFile(auth, Application.StartupPath + "\\testdata.tar");
    Mgsi.Data data = new Mgsi.Data();
    data.data_set_gid = ds.data_set_gid;
    data.index = i;
    data.file_hash = uploadD.hash;
    data.file_size = long.Parse(uploadD.size);
    datas[i - 1] = data;
}
ud.createDatas(auth, datas);

// Generate the workunits from the dataset (the cross product from slide 8).
ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid,
    new string[] { ds.data_set_gid }, options);
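The slides stop at workunit creation; presumably the application service then starts the job and waits for it to finish before post-processing. A hedged sketch of that follow-up, in which startJob, getJobStatus, and the "done" state value are all assumed names rather than verified Grid MP API:

// Assumed shape only: start the job, then poll until it completes.
ud.startJob(auth, job.job_gid);                   // hypothetical call ("Start job", slide 8)
while (ud.getJobStatus(auth, job.job_gid) != 4)   // hypothetical call; 4 = assumed "done" state
{
    System.Threading.Thread.Sleep(60 * 1000);     // poll once a minute
}
// ...then download the result files and post-process them.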

Slide 11: Performance
- Famotidine form B: 13 degrees of freedom, space group P21/c, V = 1421 Å³, synchrotron data to 1.64 Å
- 1 × 10^7 moves per run, 64 runs
- Standard DASH on a 2.4GHz Core 2 Quad, using a single core: job complete in 9 hours
- GDASH submitted to a test grid of 5 in-use PCs (4 × 2.4GHz Core 2 Quad, 1 × 2.8GHz Core 2 Quad): job complete in 24 minutes
- Speedup: 540 minutes / 24 minutes = 22.5×

Slide 12: Performance – 999 SA runs, full grid
[Plot: workunits completed against time]
- 317 cores from 163 devices:
  - 42 Athlons, 1.6–2.2 GHz
  - 168 Core 2 Duos, 1.8–3.0 GHz
  - 36 Core 2 Quads, 2.4–2.8 GHz
  - 1 Duron @ 1.2 GHz
  - 42 P4s, 2.4–3.6 GHz
  - 27 Xeons, 2.5–3.6 GHz
- 4 days 18 hours of CPU time in ~40 minutes of elapsed time

Slide 13: A Particular Success - McStas
- HRPD supermirror guide design: a complex design
- Meaningful simulations take a long time, and we want to try lots of ideas
- Many runs of >200 CPU days
- The simpler model was the best value
- Massive improvement in flux; significant cost savings

Slide 14: Problems
- McStas interacted badly with Symantec Anti-Virus on machines in the wild
- The problem did not show up in testing
- McStas is now restricted to night-time running only

Slide 15: User Attitudes
- A range of reactions, including "theft": "I'm not having that on my machine"
- The grid is the first thing to get blamed when something goes wrong
- It is gaining more trust, helped by evangelism from users

Slide 16: Flexibility with virtualisation
- A request to run the 'GARefl' code, but ISIS is Windows based, with few Linux PCs
- VMware Server is freeware: 8 hosts gave 26 cores
- More cores = more demand: 56 real cores recruited from servers and a 64-core Beowulf cluster, plus 10 Mac cores
- The approach: run Linux as a job

Slide 17: Flexibility with virtualisation
[Figure only; no transcript text]

Slide 18: The Future
- The grid is growing in power every day: new machines are added, while old ones are still left on
- Electricity: an energy-saving drive at STFC means switching machines off
- Wake-on-LAN 'magic packets' + remote hibernate (a sketch of sending a magic packet follows)
- Laptops: good or bad?
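A Wake-on-LAN magic packet is just a UDP broadcast containing six 0xFF bytes followed by the target machine's MAC address repeated 16 times. A minimal C# sketch of sending one (the MAC address here is a placeholder; port 9 is the conventional choice, though 7 is also common):

using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

class WakeOnLan
{
    // Build and broadcast a standard Wake-on-LAN magic packet:
    // 6 bytes of 0xFF, then the MAC address repeated 16 times.
    static void Wake(string mac)
    {
        byte[] macBytes = mac.Split(':', '-')
                             .Select(part => Convert.ToByte(part, 16))
                             .ToArray();

        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++)
            packet[i] = 0xFF;
        for (int i = 0; i < 16; i++)
            Buffer.BlockCopy(macBytes, 0, packet, 6 + i * 6, 6);

        using (UdpClient udp = new UdpClient())
        {
            udp.EnableBroadcast = true;
            udp.Send(packet, packet.Length,
                     new IPEndPoint(IPAddress.Broadcast, 9));
        }
    }

    static void Main()
    {
        Wake("00:1A:2B:3C:4D:5E"); // placeholder MAC address
    }
}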

Slide 19: Summary
- Distributed computing: perfect for coarse-grained, CPU-intensive, 'disk-lite' applications
- Resources: use existing resources; their power increases with time, with no need to write off assets
- Scalable: not just faster, it allows one to try different scenarios
- Virtualisation: Linux under Windows, Windows under Linux
- Green credentials: the PCs are running anyway, so it is better to utilise them, and they can be powered down and up

Slide 20: Acknowledgements
- ISIS Data Analysis Group: Kenneth Shankland, Damian Flannery
- STFC FBU IT Service Desk and ISIS Computing Group
- Key users: Richard Ibberson (HRPD), Stephen Holt (GARefl)

Questions?

