Flexibility and Interoperability in a Parallel MD code Robert Brunner, Laxmikant Kale, Jim Phillips University of Illinois at Urbana-Champaign.


1 Flexibility and Interoperability in a Parallel MD code Robert Brunner, Laxmikant Kale, Jim Phillips University of Illinois at Urbana-Champaign

2 Contributors Principal investigators –Laxmikant Kale, Klaus Schulten, Robert Skeel Development team –Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, Ari Shinozaki, …

3 Middle layers Applications Parallel Machines “Middle Layers”: Languages, Tools, Libraries

4

5 Molecular Dynamics Collection of [charged] atoms, with bonds Newtonian mechanics At each time-step Calculate forces on each atom (bonds; non-bonded: electrostatic and van der Waals) Calculate velocities and advance positions 1 femtosecond time-step, millions needed! Thousands of atoms (1,000 - 100,000)

6 Molecular Dynamics Collection of [charged] atoms, with bonds Newtonian mechanics At each time-step –Calculate forces on each atom (bonds; non-bonded: electrostatic and van der Waals) –Calculate velocities and advance positions 1 femtosecond time-step, millions needed! Thousands of atoms (1,000 - 100,000)
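The per-time-step loop on this slide can be sketched as follows. This is a minimal illustration, not NAMD's integrator: it assumes a bare Coulomb force with the constant set to 1, ignores bonds and van der Waals terms, and uses a simple semi-implicit Euler update. The names `coulomb_forces` and `step` are hypothetical.

```python
import math

def coulomb_forces(positions, charges):
    """All-pairs electrostatic forces, O(N^2), with the Coulomb constant set to 1."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            scale = charges[i] * charges[j] / r**3
            for k in range(3):
                forces[i][k] += scale * d[k]   # force on i due to j
                forces[j][k] -= scale * d[k]   # equal and opposite force on j
    return forces

def step(positions, velocities, charges, masses, dt):
    """One time-step: compute forces, update velocities, then advance positions."""
    forces = coulomb_forces(positions, charges)
    for i in range(len(positions)):
        for k in range(3):
            velocities[i][k] += forces[i][k] / masses[i] * dt
            positions[i][k] += velocities[i][k] * dt
```

A simulation would call `step` once per femtosecond-scale time-step, which is why millions of calls are needed and why the force routine dominates the cost.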

7 Further MD Use of cut-off radius to reduce work –8 - 14 Å –Faraway charges ignored! 80-95% of the work is non-bonded force computations Some simulations need faraway contributions
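The effect of a cut-off radius can be illustrated with a short sketch. The function name `pairs_within_cutoff` is hypothetical, and this brute-force version still visits every pair to test the distance; a production code would use a spatial structure so that faraway pairs are never examined at all.

```python
from itertools import combinations

def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose separation is at most `cutoff`."""
    cutoff_sq = cutoff * cutoff
    result = []
    for (i, p), (j, q) in combinations(enumerate(positions), 2):
        dist_sq = sum((a - b) ** 2 for a, b in zip(p, q))
        if dist_sq <= cutoff_sq:      # compare squared distances, no sqrt needed
            result.append((i, j))
    return result
```

Only the returned pairs would contribute non-bonded forces; pairs beyond the cut-off are ignored.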

8 NAMD Design Objectives Performance Scalability –To a small and large number of processors –small and large molecular systems Modifiable and extensible design –Ability to incorporate new algorithms –Reusing new libraries without re-implementation –Experimenting with alternate strategies

9 Force Decomposition Distribute force matrix to processors Matrix is sparse, non-uniform Each processor has one block Communication: N/sqrt(P) Ratio: sqrt(P) Better scalability (can use 100+ processors) Hwang, Saltz, et al.: 6% on 32 PEs, 36% on 128 processors

10 Spatial Decomposition

11 Spatial decomposition modified

12 Implementation Multiple Objects per processor –Different types: patches, pairwise forces, bonded forces –Each may have its data ready at different times –Need ability to map and remap them –Need prioritized scheduling Charm++ supports all of these

13 Charm++ Data Driven Objects Object Groups: –global object with a “representative” on each PE Asynchronous method invocation Prioritized scheduling Mature, robust, portable http://charm.cs.uiuc.edu

14 Data driven execution Scheduler Message Q
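The scheduler-plus-message-queue picture can be sketched in a few lines. This is a toy model, not the Charm++ runtime: the `Scheduler` and `Recorder` classes and their method names are hypothetical, and it assumes that a smaller number means a higher priority.

```python
import heapq
import itertools

class Scheduler:
    """Toy data-driven scheduler: objects run only when a message for them is dequeued."""
    def __init__(self):
        self._queue = []
        self._order = itertools.count()   # tie-breaker keeps equal priorities FIFO

    def send(self, priority, obj, method, *args):
        """Asynchronous invocation: enqueue the call and return immediately."""
        heapq.heappush(self._queue, (priority, next(self._order), obj, method, args))

    def run(self):
        """Deliver messages in priority order until the queue is empty."""
        while self._queue:
            _, _, obj, method, args = heapq.heappop(self._queue)
            getattr(obj, method)(self, *args)

class Recorder:
    """Hypothetical object that records the order in which its method is invoked."""
    def __init__(self):
        self.seen = []

    def record(self, scheduler, label):
        self.seen.append(label)
```

Because each method receives the scheduler, a handler can send further messages, which is how one object's completion triggers work on another.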

15 Object oriented design Two top level classes: –Patches: cubes containing atoms –Computes: force calculation Home patches and Proxy patches –Home patch sends coordinates to proxies, and receives forces from them –Each compute interacts with local patches only
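The home/proxy relationship can be sketched as below. All names (`HomePatch`, `ProxyPatch`, `spring_compute`) are hypothetical, coordinates are one-dimensional for brevity, and direct attribute assignment stands in for the messages a parallel code would send between processors.

```python
class ProxyPatch:
    """Local stand-in for a patch whose home lives elsewhere."""
    def __init__(self):
        self.coords = []
        self.forces = []

class HomePatch:
    """Owns the atoms; sends coordinates to proxies and sums forces from them."""
    def __init__(self, coords):
        self.coords = list(coords)
        self.forces = [0.0] * len(self.coords)
        self.proxies = []

    def add_proxy(self):
        proxy = ProxyPatch()
        self.proxies.append(proxy)
        return proxy

    def send_coordinates(self):
        for proxy in self.proxies:
            proxy.coords = list(self.coords)
            proxy.forces = [0.0] * len(self.coords)   # fresh accumulator each step

    def receive_forces(self):
        for proxy in self.proxies:
            for i, f in enumerate(proxy.forces):
                self.forces[i] += f

def spring_compute(patch, k=1.0):
    """Hypothetical compute: touches only its local patch, pulling atoms toward x = 0."""
    for i, x in enumerate(patch.coords):
        patch.forces[i] += -k * x
```

Each compute sees only a local patch object, so it needs no knowledge of where the home patch actually resides.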

16 Compute hierarchy Many compute subclasses: –Allow reuse of coordination code –Reuse of bookkeeping tasks –Easy to add new types of force objects Example: steered molecular dynamics Implementor focuses on the new force functionality
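A sketch of how a base class can carry the coordination and bookkeeping while a subclass supplies only the force calculation. The class names are hypothetical, patches are plain dictionaries here, and `ConstantPullCompute` is a toy constant force, not the steered molecular dynamics implementation.

```python
class Compute:
    """Base class: waits until every patch it depends on has reported ready."""
    def __init__(self, patches):
        self.patches = patches
        self._pending = len(patches)

    def patch_ready(self):
        """Called once per patch per step; fires the force routine on the last one."""
        self._pending -= 1
        if self._pending == 0:
            self._pending = len(self.patches)   # reset for the next time-step
            self.compute_forces()

    def compute_forces(self):
        raise NotImplementedError("subclasses provide the force calculation")

class ConstantPullCompute(Compute):
    """New force type: the implementor writes only this method."""
    def __init__(self, patches, pull):
        super().__init__(patches)
        self.pull = pull

    def compute_forces(self):
        for patch in self.patches:
            patch["forces"] = [f + self.pull for f in patch["forces"]]
```

Adding another force type means writing another small subclass; the readiness counting is inherited unchanged.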

17 Multi-paradigm programming Long-range electrostatic interactions –Some simulations require this feature –Contributions of faraway atoms can be computed infrequently –PVM-based library, DPMTA, developed at Duke by John Board et al. Patch life cycle –better expressed as a thread

18 Converse Supports multi-paradigm programming Provides portability Makes it easy to implement RTS for new paradigms Several languages/libraries: –Charm++, threaded MPI, PVM, Java, md-perl, pC++, Nexus, Path, Cid, CC++, ...

19 Namd2 with Converse

20 Separation of concerns Different developers, with different interests and knowledge, can contribute effectively –Separation of communication and parallel logic –Threads to encapsulate “life-cycle” of patches –Adding new integrator, improving performance, new MD ideas, can be performed modularly and independently

21 Load balancing Collect timing data for several cycles Run heuristic load balancer –Several alternative ones Re-map and migrate objects accordingly –Registration mechanisms facilitate migration Needs a separate talk!
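One possible heuristic for the re-mapping step is a greedy assignment, sketched below. The slide does not say which heuristics are used, so this is an assumed example: `greedy_remap` is a hypothetical name, and the input is taken to be the measured load of each object from the timing cycles.

```python
import heapq

def greedy_remap(object_loads, num_procs):
    """Assign objects, heaviest first, to whichever processor is currently least loaded."""
    heap = [(0.0, p) for p in range(num_procs)]   # (accumulated load, processor id)
    heapq.heapify(heap)
    mapping = {}
    by_load = sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True)
    for obj, load in by_load:
        total, proc = heapq.heappop(heap)         # least-loaded processor so far
        mapping[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    return mapping
```

The returned mapping says where each object should live; objects whose processor changes are the ones that would be migrated.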

22 Performance: size of system

23 Performance: various machines

24 Speedup

25 Conclusion Multi-domain decomposition works well for dynamically evolving, or irregular apps –When supported by data driven objects (Charm++), user level threads, callbacks Multi-paradigm programming is effective! Object oriented parallel programming: –promotes reuse, –good performance Measurement based load balancing

