Instrumenting Folding@Work. Badi Abdul-Wahid, RJ Nowling. CSE 60641 Operating Systems, Professor Striegel.

1 Instrumenting Folding@Work Badi Abdul-Wahid, RJ Nowling CSE 60641 Operating Systems Professor Striegel

2 Overview
Problem Description
– Experimental Structure
– Folding@Work Workflow
Benchmarks
Results
– Weak Scaling (ns / day)
– Server Capacity
– Available Workers Over Time
– Variability of Computation Time
Conclusions

3 Experimental Structure

4 Folding@Work Workflow

5 Benchmarks
Tasks: 1 ns generations (approx. 2 hr each on the test machine); 10 consecutive generations per simulation
Weak Scaling
– 10 simulations / 10 workers
– 100 simulations / 100 workers
– 1,000 simulations / 1,000 workers
Workers: Condor jobs, with SGE jobs added later
1 trial of each configuration; took ~2 days to run
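
The weak-scaling benchmark can be checked against a simple ideal. The sketch below assumes every 1 ns generation takes exactly the slide's approximate 2 hr on one worker. It also assumes workers never idle waiting on the master or on transfers.

Under those assumptions, one worker produces 12 ns/day and N workers ideally produce 12N ns/day. The wall time for 10 generations stays near 20 hr at every scale. The constant and function names are illustrative and are not taken from the F@W code.

```python
# Ideal weak-scaling throughput for the F@W benchmark (a sketch).
# Assumptions: each generation simulates 1 ns and takes 2 h of wall-clock
# time on one worker; workers are never idle. Names are hypothetical.

NS_PER_GENERATION = 1.0      # simulated time per work unit (ns)
HOURS_PER_GENERATION = 2.0   # approximate wall-clock time per work unit
GENERATIONS_PER_SIM = 10     # consecutive generations per simulation


def ideal_ns_per_day(workers: int) -> float:
    """Aggregate simulated ns/day if every worker stays busy."""
    per_worker = NS_PER_GENERATION * 24.0 / HOURS_PER_GENERATION
    return per_worker * workers


def scaling_efficiency(measured_ns_per_day: float, workers: int) -> float:
    """Fraction of the ideal weak-scaling throughput actually achieved."""
    return measured_ns_per_day / ideal_ns_per_day(workers)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        total_ns = n * GENERATIONS_PER_SIM * NS_PER_GENERATION
        ideal_hours = GENERATIONS_PER_SIM * HOURS_PER_GENERATION
        print(f"{n:>5} workers: ideal {ideal_ns_per_day(n):>8.0f} ns/day, "
              f"{total_ns:.0f} ns total in ~{ideal_hours:.0f} h")
```

Dividing a measured aggregate ns/day by `ideal_ns_per_day(n)` gives a weak-scaling efficiency. A value near 1.0 means that growing the workers and simulations together did not slow the run.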

6 Weak Scaling of F@W

7 Server Capacity (Wait Time)

8 Available Workers over Time

9 Transfer Times

10 Variability of Computation Time

11 Example Execution Timeline

12 Performance Model

13 Weak Scaling (updated)

14 Wait Times

15 Tasks Waiting

16 Identified Areas of Improvement
Availability of Resources
– Benchmarks were limited by the number of sustained workers available through Condor
– New feature: the WorkQueue Worker Pool can be used to start new workers
WorkQueue Limits the Number of Workers
– Increasing the number of allowed file descriptors let up to 2,500 workers connect
– Bad behavior occurring in calls to select()
– Working with WorkQueue developers to switch to poll()
Long-Running Work Units Delay Completion of Trajectories
– Some work units are never returned or take a very long time
– This prevents trajectories from finishing
– Use the fast abort feature to re-assign work units that take longer than a specified time
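
The fast-abort idea can be sketched as a periodic check on the master's table of running work units. This is a hypothetical illustration, not WorkQueue's API. `RunningUnit`, `fast_abort`, and the fixed `time_limit` are assumptions, and WorkQueue's own fast-abort feature may choose its limit differently. A unit that is never returned is caught by the same rule, because its elapsed time keeps growing.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class RunningUnit:
    """Hypothetical record of one dispatched work unit."""
    unit_id: str
    worker: str
    started_at: float  # seconds, same clock as `now`


def fast_abort(running: Dict[str, RunningUnit],
               ready: Deque[str],
               now: float,
               time_limit: float) -> List[str]:
    """Re-queue work units that have run longer than `time_limit` seconds.

    Returns the IDs of the re-queued units. (Sketch: a real master would
    also tell the old worker to stop.)
    """
    slow = sorted(uid for uid, unit in running.items()
                  if now - unit.started_at > time_limit)
    for uid in slow:
        del running[uid]       # forget the original assignment
        ready.appendleft(uid)  # front of queue: a trajectory is blocked on it
    return slow


if __name__ == "__main__":
    running = {
        "traj7-gen3": RunningUnit("traj7-gen3", "worker-12", started_at=0.0),
        "traj2-gen5": RunningUnit("traj2-gen5", "worker-40", started_at=9_000.0),
    }
    ready = deque(["traj9-gen1"])
    # Limit of twice the ~2 h nominal runtime (an arbitrary choice).
    print(fast_abort(running, ready, now=15_000.0, time_limit=4 * 3600.0))
    print(list(ready))
```

Re-queued units go to the front of the ready queue because a whole trajectory waits on each one. A real master would also need a rule for results that arrive from the original worker after re-assignment, for example keeping whichever copy finishes first.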

17 Conclusion
Accomplished
– Identified key metrics (ns / day, wait time)
– Developed a scaling model
– Tested the model
Conclusions
– Real scientific applications scale well
– Forcing short work units adds load to the master
– Performance model validated
– “Self-correcting” behavior
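
The conclusion that short work units load the master can be illustrated with a toy throughput model. This is not the authors' performance model. It assumes the master spends a fixed, hypothetical `master_cost_s` seconds of serial work on each returned unit, and that each worker simulates at a constant rate.

With N workers and units lasting T seconds, units return at N/T per second, so the master's utilization is N × master_cost_s / T. Once that exceeds 1, the master caps throughput at 1 / master_cost_s units per second. Shorter units also carry fewer ns each, which lowers that cap further.

```python
# Toy model of master load vs. work-unit length (a sketch, not the
# authors' model). `master_cost_s` is a hypothetical per-unit cost.

SECONDS_PER_DAY = 86_400.0


def master_utilization(workers: int, unit_seconds: float,
                       master_cost_s: float) -> float:
    """Fraction of time the master is busy handling returned units."""
    return workers * master_cost_s / unit_seconds


def effective_ns_per_day(workers: int, unit_seconds: float,
                         master_cost_s: float,
                         ns_per_worker_second: float) -> float:
    """Throughput limited by whichever is slower: workers or the master."""
    ns_per_unit = ns_per_worker_second * unit_seconds
    units_per_second = min(workers / unit_seconds, 1.0 / master_cost_s)
    return units_per_second * ns_per_unit * SECONDS_PER_DAY


if __name__ == "__main__":
    rate = 1.0 / 7200.0  # 1 ns per ~2 h, from the benchmark slide
    for unit_s in (7200.0, 1800.0):
        print(unit_s, master_utilization(1000, unit_s, 5.0),
              effective_ns_per_day(1000, unit_s, 5.0, rate))
```

Take the benchmark rate of 1 ns per 2 hr and an assumed 5 s of master work per unit. With 1,000 workers running 2 hr units, the master is about 69% busy and the run reaches the ideal 12,000 ns/day. With 30 min units, utilization rises above 1 and the master's cap drops throughput to 4,320 ns/day.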