Converting ASGARD into a MC-Farm for Particle Physics. Beowulf-Day 17.01.05, A. Biland, IPP/ETHZ.

Presentation transcript:

1 Converting ASGARD into a MC-Farm for Particle Physics. Beowulf-Day 17.01.05, A. Biland, IPP/ETHZ

2 Beowulf Concept. Three main components:

3 Beowulf Concept. Three main components: CPU nodes

4 Beowulf Concept. Three main components: CPU nodes, network

5 Beowulf Concept. Three main components: CPU nodes, network, fileserver

6 Beowulf Concept. Three main components: CPU nodes ($$$$$$$$$?), network ($$$$?), fileserver ($$$?). How much of the (limited) money to spend on what?

7 Beowulf Concept. Intended (main) usage: "Eierlegende Woll-Milch-Sau" (German idiom: an egg-laying wool-milk-sow, i.e. one size fits everything). Put a ~equal amount of money into each component ==> OK for (almost) any possible use, but a waste of money for most applications.

8 Beowulf Concept. Intended (main) usage: ~80% CPU nodes, ~10% network, ~10% fileserver [ASGARD, HREIDAR-I]: CPU-bound jobs with limited I/O and inter-CPU communication.

9 Beowulf Concept. Intended (main) usage: ~50% CPU nodes, ~40% network, ~10% fileserver [HREIDAR-II]: jobs with high inter-CPU communication needs (parallel processing).

10 Beowulf Concept. Intended (main) usage: ~50% CPU nodes, ~10% network, ~40% fileserver: jobs with high I/O needs or large datasets (data analysis).

11 Fileserver Problems: a) Speed (parallel access)
Inexpensive fileservers reach a disk I/O of ~50 MB/s.
500 single-CPU jobs ==> 50 MB/s / 500 jobs = 100 kB/s per job
(this is an upper limit; the values reached in practice are much smaller).
Using several fileservers in parallel:
-- difficult data management (where is which file?) [use parallel filesystems?]
-- hot spots (all jobs want to access the same dataset) [data replication ==> $$$]
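The per-job bandwidth bound is simple arithmetic; a minimal Python sketch using the slide's figures:

```python
# Per-job bandwidth when 500 single-CPU jobs share one inexpensive fileserver.
server_io = 50e6      # aggregate disk I/O of the fileserver, bytes/s (~50 MB/s)
n_jobs = 500          # concurrent single-CPU jobs

per_job = server_io / n_jobs
print(f"{per_job / 1e3:.0f} kB/s per job")  # -> 100 kB/s (upper limit)
```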

12 Fileserver Problems: a) Speed (parallel access). How (not) to read/write the data:
Bad: NFS (constant transfer of small chunks of data) ==> constant disk repositioning ==> disk I/O --> 0
(somewhat improved by a large (>>100 MB) in-memory cache; but once the write cache is full, the server blocks for a long time while flushing to disk)
~OK: rcp (transfer of large blocks from/to the local /scratch); but /scratch is rather small on ASGARD, and what if many jobs want to transfer at the same time?
Best: the fileserver initiates rcp transfers on request; but this needs user discipline and is not very transparent, ...
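A minimal Python sketch of the staging pattern the slide prefers over NFS: copy input in one large block to the node-local /scratch, work there, and push the result back in one block. The paths and the process_events binary are hypothetical, and the sketch assumes the fileserver directories are mounted on the node (the slide's actual transfer mechanism is rcp):

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

SCRATCH = Path("/scratch")        # node-local disk (4 GB per node on ASGARD)

def run_staged(input_on_server: Path, output_on_server: Path) -> None:
    """Stage in, process on local disk, stage out in large sequential transfers."""
    with tempfile.TemporaryDirectory(dir=SCRATCH) as tmp:
        local_in = Path(tmp) / input_on_server.name
        local_out = Path(tmp) / output_on_server.name
        shutil.copy(input_on_server, local_in)    # one large read from the fileserver
        subprocess.run(["process_events", str(local_in), str(local_out)],
                       check=True)                # hypothetical analysis binary
        shutil.copy(local_out, output_on_server)  # one large write back
```

Compared to NFS, the fileserver sees two long sequential transfers per job instead of a constant stream of small requests, so its disks spend far less time repositioning.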

13 Fileserver Problems: b) Capacity
500 jobs producing data, each writing 100 kB/s ==> 50 MB/s to the fileserver ==> ~4.3 TB/day!
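The capacity figure follows from the same numbers; a quick check of the daily volume (decimal units):

```python
# Aggregate data volume if all 500 jobs write at their 100 kB/s share.
n_jobs = 500
rate_per_job = 100e3          # bytes/s per job
seconds_per_day = 86_400

total_rate = n_jobs * rate_per_job                 # 50 MB/s to the fileserver
per_day = total_rate * seconds_per_day
print(f"{total_rate / 1e6:.0f} MB/s, {per_day / 1e12:.1f} TB/day")  # 50 MB/s, 4.3 TB/day
```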

14 Particle Physics MC
Need a huge number of statistically independent events; #events >> #CPUs
==> an 'embarrassingly parallel' problem ==> 5x500 MIPS is as good as 1x2500 MIPS.
Usually two sets of programs:
a) Simulation: produces huge, very detailed MC files (adapted standard packages [GEANT, CORSIKA, ...])
b) Reconstruction: reads the MC files, writes much smaller reco files containing selected events and physics data (special software developed by each experiment)
Mass production: only the reco files are needed ==> combine both tasks in one job and use the local /scratch (see the sketch below).
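A minimal sketch of that combined mass-production job: the simulation writes its huge MC file to the node-local /scratch, the reconstruction reads it there, and only the small reco file is shipped to the fileserver. The simulate and reconstruct program names and the /work/mc output area are hypothetical placeholders:

```python
import shutil
import subprocess
from pathlib import Path

SCRATCH = Path("/scratch")       # node-local disk: the huge MC file never leaves it
RECO_STORE = Path("/work/mc")    # hypothetical reco-file area on the fileserver

def produce(run_id: int) -> None:
    mc_file = SCRATCH / f"run{run_id}.mc"      # huge, very detailed; temporary
    reco_file = SCRATCH / f"run{run_id}.reco"  # small; the only output we keep

    # Step 1: simulation (e.g. a GEANT/CORSIKA-based package) writes to /scratch.
    subprocess.run(["simulate", "--run", str(run_id), "--out", str(mc_file)],
                   check=True)
    # Step 2: reconstruction reads the local MC file, writes the small reco file.
    subprocess.run(["reconstruct", str(mc_file), str(reco_file)], check=True)

    # Only the reco file crosses the network, in one large transfer.
    shutil.copy(reco_file, RECO_STORE / reco_file.name)
    mc_file.unlink()
    reco_file.unlink()
```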

15 ASGARD Status: 24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch.

16 ASGARD Status: 24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch.
Needed: ++bandwidth and ++capacity on the fileserver side, and guaranteed /scratch space per job.

17 ASGARD Upgrade: 24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2.
Per frame: 4x 400 GB SATA disks in RAID-10 (800 GB usable).
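RAID-10 stripes across mirrored pairs, so half the raw capacity is usable; a quick check of the slide's figure (the 10-frame total is derived here, not stated on the slide):

```python
# Usable capacity of the per-frame RAID-10 array.
disks_per_frame, disk_gb = 4, 400
raw_gb = disks_per_frame * disk_gb      # 1600 GB raw per frame
usable_gb = raw_gb // 2                 # mirroring halves it -> 800 GB usable
frames = 10
print(f"{usable_gb} GB/frame, {usable_gb * frames / 1000:.0f} TB across 10 frames")
```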

18 ASGARD Upgrade: 24 nodes/frame, 10 frames; fileservers: /home, /work, /arch.
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2.
Per frame: 4x 400 GB SATA disks in RAID-10 (800 GB usable).
By adding 10 fileservers (~65 kFr), ASGARD can serve for ~2 years as a MC farm and GRID testbed ...

