1 Combining Incremental and Parallel Methods for Large-scale Physics Simulation
Daniel Tracy, UCSD CHMPR Software Engineer
Erik Hill, UCSD CHMPR Software Engineer
Sheldon Brown, UCSD CHMPR Site Director

2 Review of Work to Date

3 ScalableEngine
- Built to handle large VR environments efficiently (massive object count, low activity)
- Only physics system capable of handling Scalable City in real time
- Overhead proportional to level of activity rather than environment scale or object count
- Novel broad phase [1] and physics pipeline [2] methods published
  1. Efficient Large-Scale Sweep and Prune Methods with AABB Insertion and Removal. IEEE VR 2009
  2. Accelerating Physics in Large, Continuous Virtual Environments. Concurrency and Computation: Practice and Experience, 24(2):125-134, 2012
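
The published broad phase is a sweep-and-prune variant. A minimal single-axis sketch of the underlying idea follows; function and variable names are illustrative, and the published method's incremental AABB insertion and removal machinery is omitted here:

```python
def sweep_and_prune(aabbs):
    """aabbs: dict of object id -> (min_x, max_x) interval on one axis.
    Returns the set of potentially colliding (overlapping) id pairs."""
    # Flatten each interval into tagged endpoints and sort along the axis.
    endpoints = []
    for obj_id, (lo, hi) in aabbs.items():
        endpoints.append((lo, 0, obj_id))  # 0 = opening endpoint
        endpoints.append((hi, 1, obj_id))  # 1 = closing endpoint
    endpoints.sort()  # at equal coordinates, opens sort first, so touching intervals count as overlapping

    active = set()  # intervals currently open during the sweep
    pairs = set()
    for _, kind, obj_id in endpoints:
        if kind == 0:
            # Every interval still open when we open this one overlaps it.
            for other in active:
                pairs.add(frozenset((obj_id, other)))
            active.add(obj_id)
        else:
            active.discard(obj_id)
    return pairs
```

The full 3D method sweeps an axis per dimension and intersects the results; the incremental variant keeps the endpoint list sorted across frames so mostly-idle scenes cost little, matching the "overhead proportional to activity" goal above.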

4 ScalableEngine: Full Physics Pipeline
- Excluding unnecessary work again yields lower asymptotic complexity
- Note: only a constant number of bodies undergo active physics computation

5 ScalableEngine: Multi-user System
- Scalable City developed into massively multi-user client-server system
- Player count increases activity level
- For multi-user, other factors matter as well:
  - Computational efficiency
  - Parallelism

6 ScalableEngine
- Best engine at handling large environments
- Heavy computation similar to other software
- As activity increases, the advantage matters less: regions with high activity see less benefit!
- Parallelized ScalableEngine by multithreading all aspects of computation
- Improved performance, but not enough for massively multiplayer: traditional physics does not parallelize well

7 ScalableEngine: Multithreaded Physics
- Limited parallelism in traditional physics methods

8 CLEngine
- Developed new physics simulation system from scratch, focused on massive parallelism
- Based on the work of Thomas Jakobsen [1]; design modified for parallel application
- OpenCL utilized for portability to various compute devices (CPUs, GPUs, accelerators)

9 CLEngine: Core
- Object representation broken down into particles and stick constraints
- Rigid body volume behavior is emergent
- All constraints independently solvable
- Very fine-grained, highly parallel core
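
The particle-and-stick core follows Jakobsen's scheme: particles advance by position Verlet integration, then distance ("stick") constraints are relaxed iteratively. A minimal 2D sketch with illustrative names (not CLEngine's actual API):

```python
import math

def verlet_step(x, x_prev, accel, dt):
    """Position Verlet for one coordinate; velocity is implicit in x - x_prev."""
    return 2.0 * x - x_prev + accel * dt * dt

def relax_sticks(pos, sticks, iterations=20):
    """pos: list of [x, y] particles; sticks: list of (i, j, rest_length).
    Each pass moves both endpoints halfway toward the rest length, so every
    constraint is independently solvable -- the property that makes the core
    fine-grained and highly parallel."""
    for _ in range(iterations):
        for i, j, rest in sticks:
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = math.hypot(dx, dy)
            if dist == 0.0:
                continue  # coincident particles: no defined correction direction
            corr = 0.5 * (dist - rest) / dist
            pos[i][0] += corr * dx
            pos[i][1] += corr * dy
            pos[j][0] -= corr * dx
            pos[j][1] -= corr * dy
    return pos
```

Because each stick correction touches only its two particles, constraints can be partitioned into independent batches and solved concurrently across OpenCL work items; rigid-volume behavior emerges from the constraint network rather than from an explicit rigid-body transform.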

10 CLEngine: Host Interface
- OpenCL weakness: expensive communication on dedicated GPUs
- Designed to reduce communication by:
  - Keeping many contiguous stages on the card
  - Accelerating communication with a transport kernel
  - Reducing communication to state deltas
[Diagram: pipeline stages - collision detection, contact graph, integration]
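
The "state deltas" idea can be sketched as follows; the dictionary-based interface is an illustrative stand-in for the actual device-buffer protocol:

```python
def make_delta(prev_state, curr_state):
    """Both states: dict of object id -> transform (any comparable value).
    Returns (changed, removed): only what must cross the host-device channel."""
    changed = {k: v for k, v in curr_state.items()
               if prev_state.get(k) != v}
    removed = [k for k in prev_state if k not in curr_state]
    return changed, removed

def apply_delta(state, changed, removed):
    """Reconstructs the current state on the receiving side."""
    state.update(changed)
    for k in removed:
        state.pop(k, None)
    return state
```

In a mostly-quiet environment the delta is far smaller than the full state, so per-frame transfer cost tracks activity level rather than object count, mirroring the incremental CPU engine's design goal.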

11 CLEngine: Performance
- 3-6 times CPU performance for a single thread
- Higher parallelism acceleration curve
- Many optimizations still not done!
- GPU targetable for extreme performance
  - Optimizations are more critical: communication, local memory, vector types

12 CLEngine Prototype Limitations
- Originally designed for an "all active" system
  - Not state-aware, no incremental processing
  - Does more total work than current CPU engine
- We want both advantages simultaneously!
- Multiple ways to achieve this
- Challenges imposed by slow communication:
  - Integrating a broad phase solution efficiently
  - Reporting results usefully and efficiently

13 Finished Work
Assets, Dynamics and Behavior Computation for Virtual Worlds and Computer Games

14 CLEngine Final Design
- Design space exploration finished:
  - Role with respect to the system holistically
  - Broad phase CD
  - Optimizing communication
  - Keeping the model simple & maintainable
- Options explored through partial or complete implementation or mathematical analysis
  - Hash grids, space-filling curves, GPU/CPU hybrid

15 CLEngine Final Design
- CPU-based incremental sorting broad phase
  - Linear overhead for moving objects
  - Handles mixed object sizes & distributions well
  - Leverages our mature work in broad phase CD
- CLEngine only aware of a subset of the universe
  - Active objects & objects they overlap in BP
- Changes in the CLEngine set batched & transmitted
- Transforms of the subset sent back to the CPU

16 Geometry vs. Constraints
- Translation: geometry vs. constraint data
- Physics subsystem now expects a constraint set
- Other subsystems think in terms of geometry
- Bidirectional translation performed during communication
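
A toy version of the geometry-to-constraint direction, assuming a 2D rectangular body: four corner particles plus six sticks (four edges and both diagonals) so the shape cannot shear. All names are hypothetical, not CLEngine's actual translation layer:

```python
import math

def box_to_constraints(cx, cy, w, h):
    """Translate an axis-aligned box (center, width, height) into the
    particle/stick representation the physics subsystem expects."""
    corners = [(cx - w / 2, cy - h / 2), (cx + w / 2, cy - h / 2),
               (cx + w / 2, cy + h / 2), (cx - w / 2, cy + h / 2)]
    sticks = []
    # Fully connect the four corners: 4 edges + 2 diagonals = 6 sticks.
    for i in range(4):
        for j in range(i + 1, 4):
            (x0, y0), (x1, y1) = corners[i], corners[j]
            sticks.append((i, j, math.hypot(x1 - x0, y1 - y0)))
    return corners, sticks
```

The reverse translation (constraints back to a transform the rendering and gameplay subsystems can use) would fit particle positions back to the body's reference shape during communication.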

17 Scalable City Integration
- CLEngine uses the CPU-based broad phase to:
  - Invoke object-object collisions
  - Discriminate object-heightmap pairs for many-city configurations
  - Maintain the set of CLEngine-visible objects
- Extending the visibility set to BP overlaps allows dynamic object activation via collisions
- CLEngine also expected to trigger object resting
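
One way collision-driven activation over broad phase overlaps could work is transitive propagation: any inactive object overlapping an active one becomes active, and may in turn wake its own neighbors. This is a hedged sketch of that policy, not necessarily the system's exact rule:

```python
def propagate_activation(active, overlaps):
    """active: set of object ids; overlaps: iterable of (a, b) broad phase
    pairs. Activates, transitively, every object reachable from an active
    object through overlap pairs."""
    overlaps = list(overlaps)
    changed = True
    while changed:
        changed = False
        for a, b in overlaps:
            # Exactly one endpoint active: wake the other.
            if (a in active) != (b in active):
                active.add(a)
                active.add(b)
                changed = True
    return active
```

Keeping inactive-but-overlapping objects in the CLEngine-visible set is what makes this possible: the engine can detect the first contact that should wake them without the CPU having to predict it.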

18 Scalable City Integration
- Integrated CLEngine into Scalable City
  - Using final design decisions as the guiding model
  - Presently operating as a run-time selectable physics engine
- Integration occurring in stages
  - Current stage is suboptimal but simpler
- Resolving issues incrementally at each stage
  - Behavioral differences, bugs, corner cases

19 Scalable City Integration
Integration deficiencies, TBD:
- Altering the rotation matrix from the CPU is unsupported
- Broad phase results communicated en masse
  - Not communicating a delta from the previous update
  - Very important for object-landscape spatial overlaps!
- CLEngine does not rest objects
  - Once activated, they remain activated
  - This operation is imperative for continuous simulation
- CLEngine does not return events
  - Collision events used for sound feedback, etc.

20 Proposed Work

21 Proposed Work: Completion
- More extensive testing for the current stage
  - Intermittent anomalies in some cities
- Complete feature parity with the previous engine
  - Object resting based on heightmap feedback
  - Collision event feedback for the audio system
- Move "rigid body to constraint" translation to CL
  - Communicate purely in terms of rigid bodies
  - Will allow the CPU to alter rotation matrices (IK)
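
Object resting generally amounts to a quiescence test. The sketch below uses a generic criterion (displacement below a threshold for several consecutive frames) as a stand-in; the heightmap-feedback criterion proposed above is not modeled here, and all class names and thresholds are illustrative:

```python
class RestTracker:
    """Deactivate a body once its per-frame displacement stays below a
    threshold for enough consecutive frames (illustrative values)."""

    def __init__(self, threshold=1e-3, frames_required=30):
        self.threshold = threshold
        self.frames_required = frames_required
        self.quiet_frames = {}  # object id -> consecutive quiet frame count

    def update(self, obj_id, displacement):
        """Call once per frame; returns True when the object should rest."""
        if displacement < self.threshold:
            self.quiet_frames[obj_id] = self.quiet_frames.get(obj_id, 0) + 1
        else:
            self.quiet_frames[obj_id] = 0  # any real motion resets the count
        return self.quiet_frames[obj_id] >= self.frames_required
```

Without some such test, the "once activated, they remain activated" deficiency above persists, and the total work done grows monotonically over a continuous simulation.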

22 Proposed Work: Optimize Communication
- Maintain B.P. overlap sets incrementally
  - Specialized, delta-tracking structures
  - Maintain state across the narrow channel by tracking internal changes and producing a delta set
  - Designed to minimize internal changes on ops
- Use for visibility set maintenance as well
- Integrate with kernel-based batch update
  - Allows fast editing of multiple buffers with a single blit
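
A delta-tracking set absorbs mutations locally and emits only the net changes on demand, so an add followed by a remove of the same object never crosses the channel at all. A minimal sketch with an illustrative interface:

```python
class DeltaSet:
    """A set that records the net (adds, removes) since the last flush,
    suitable for mirroring state across a narrow host-device channel."""

    def __init__(self):
        self.items = set()
        self._added = set()
        self._removed = set()

    def add(self, x):
        if x not in self.items:
            self.items.add(x)
            if x in self._removed:
                self._removed.discard(x)  # remove-then-add cancels out
            else:
                self._added.add(x)

    def remove(self, x):
        if x in self.items:
            self.items.discard(x)
            if x in self._added:
                self._added.discard(x)  # add-then-remove cancels out
            else:
                self._removed.add(x)

    def flush(self):
        """Return and clear the net (adds, removes) since the last flush."""
        delta = (self._added, self._removed)
        self._added, self._removed = set(), set()
        return delta
```

Pairing flush output with the kernel-based batch update would let the accumulated edits to multiple device buffers land in one transfer, as the slide proposes.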

23 Proposed Work: Optimize Computation
- Rewrite the stick constraint solver
  - Originally designed for highest parallelism
  - Incurs high memory traffic for intermediate data
  - Single-pass design expected to be more efficient, but must compute at object resolution
- GPU-specific:
  - Local memory optimizations using CL
  - Vector data types for some CL implementations

24 Proposed Work: Distributed Layer
- Distributed system will be done as the final stage
  - Increases scale beyond the memory bandwidth constraints of a single multi-core system
- Topology may closely correspond to cities
- Use MPI for object migration and ghosting
- Limiting incurred communication & overlapping computation will help increase scalability
- Deploy on large, multi-blade servers
- Test in a globally segmented system over the internet

25 Proposed Work: Conclusion
- Physics system nearing completion
  - Expected to be finished in 5-6 months!
- Scalable City integration underway
  - Feature parity & optimization being improved
- Single-node performance (in speed & power use) expected to be very good
- Distributed layer will remove the performance ceiling

