GeantV – Adapting simulation to modern hardware

Classical simulation: flexible, but limited adaptability towards the full potential of current & future hardware
- One track at a time
- Three stacks (leptons, γ, other)
- Single event transport
- Event-level parallelism (threading)
- Cache coherency: low
- Vectorization: low (scalar auto-vectorization)

GeantV simulation: engineered to profit from all processing pipelines
- Lots of tracks in flight
- Many baskets: 10s to 1000s
- Multi event transport
- Fine-grain parallelism + threads
- Cache coherency: high
- Vectorization: high (explicit multi-particle interfaces)
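The difference between a scalar interface and an explicit multi-particle interface can be sketched as follows. This is only an illustration of the interface shape, not GeantV code: the `Track` and `Basket` types, the 1-D geometry and the function names are all hypothetical, and pure Python does not itself vectorize anything. The point is that a basket stores each field contiguously (structure of arrays), which is the layout a SIMD implementation would operate on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    # Hypothetical 1-D track: position and direction along one axis.
    x: float
    dx: float


def step_one(track: Track, length: float) -> None:
    """Scalar interface: advance a single track."""
    track.x += track.dx * length


@dataclass
class Basket:
    # Structure-of-arrays layout: one contiguous list per field.
    x: List[float]
    dx: List[float]


def step_basket(basket: Basket, length: float) -> None:
    """Multi-particle interface: advance every track in the basket in one call."""
    basket.x = [x + dx * length for x, dx in zip(basket.x, basket.dx)]


# One track at a time (classical style).
tracks = [Track(0.0, 1.0), Track(1.0, -0.5)]
for t in tracks:
    step_one(t, 2.0)

# Same tracks, transported together as one basket.
basket = Basket(x=[0.0, 1.0], dx=[1.0, -0.5])
step_basket(basket, 2.0)
```

Both paths produce the same positions; only the calling convention and data layout differ.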

A future task approach of GeantV

- Feeder task: reads a number of events from file and invokes the concurrent basketizer service.
- Basketizer(s) (concurrent service): injects full baskets.
- Basket queue (concurrent service).
- Transport task: transports one basket for one step. It may be further split into subtasks.
- Flow control task: decision points "event finished?" and "queue empty?".
- Garbage collector: forces partially filled baskets into the basket queue to boost concurrency.
- Scoring: a user task reading track info and creating "hits".
- Digitizer task: a user task working on "hits" data.
- I/O task: writes data (hits, digits, kinematics) to disk.

Diagram edge labels: spawn; inject particle; enqueue basket; dequeue basket; input; inspect; command "dump all your baskets"; reuse tracks keeping locality; output transported tracks.
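A minimal single-threaded sketch of how the basketizer, basket queue, transport step and garbage collector described above could interact. Everything here is an assumption made for illustration: the class and function names, the `(event_id, steps_left)` track representation, and the grouping key (`steps_left % 2`), which merely stands in for whatever criterion the real basketizer uses. Flow control is reduced to "stop when the queue stays empty after a flush"; the "event finished?" check and the user tasks are not modelled.

```python
from collections import defaultdict, deque


class Basketizer:
    """Groups tracks by key and injects full baskets into the basket queue."""

    def __init__(self, capacity, queue):
        self.capacity = capacity
        self.queue = queue
        self.pending = defaultdict(list)

    def add_track(self, key, track):
        bucket = self.pending[key]
        bucket.append(track)
        if len(bucket) == self.capacity:
            self.queue.append((key, bucket))   # inject a full basket
            self.pending[key] = []

    def flush(self):
        """Garbage-collector role: force partially filled baskets into the queue."""
        for key, bucket in list(self.pending.items()):
            if bucket:
                self.queue.append((key, bucket))
        self.pending.clear()


def transport(basket):
    """Transport one basket for one step; return the tracks still alive."""
    _key, tracks = basket
    return [(ev, left - 1) for ev, left in tracks if left > 1]


def run(events, capacity=4):
    """events maps event_id -> list of step counts (>= 1), one per track."""
    queue = deque()
    basketizer = Basketizer(capacity, queue)
    # Feeder: inject the tracks of every event (multi-event transport).
    for event_id, track_steps in events.items():
        for steps in track_steps:
            basketizer.add_track(steps % 2, (event_id, steps))
    steps_done = 0
    while True:
        if not queue:
            basketizer.flush()          # queue empty: dump all baskets
            if not queue:
                break                   # nothing left anywhere
        basket = queue.popleft()
        steps_done += len(basket[1])
        for track in transport(basket):
            basketizer.add_track(track[1] % 2, track)   # re-basketize survivors
    return steps_done
```

For example, `run({0: [3, 1, 2], 1: [5, 2]})` performs one transported step per remaining step of every track, regardless of basket capacity.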

Some issues for migrating to tasks

- Flow control: proof of principle that it works.
- Thread ID integration: we currently have static threads with unique IDs; how do we deal with this in task mode? Thread/task data ownership is driven by the ID system.
- Avoiding task bloat: spawning a transport task per basket can create large overhead. Keep the transport task "living" for a longer period?
- Locality: preventing the task system from migrating threads arbitrarily, especially when we move to NUMA awareness.
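The task-bloat concern can be illustrated by counting spawned tasks under two strategies: one task per basket versus a few longer-lived tasks that each keep draining a shared queue. This is a sketch using Python's standard `concurrent.futures` thread pool as a stand-in for a task system; the function names and the trivial `transport` body are hypothetical and say nothing about the actual cost in GeantV.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def transport(basket):
    """Stand-in for transporting one basket; returns a checksum of its tracks."""
    return sum(basket)


def per_basket_tasks(baskets, n_workers):
    """Spawn one task per basket. Returns (tasks spawned, checksum)."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(transport, b) for b in baskets]
        return len(futures), sum(f.result() for f in futures)


def long_lived_tasks(baskets, n_workers):
    """Spawn n_workers tasks that each live until the queue is drained."""
    work = queue.Queue()
    for b in baskets:
        work.put(b)

    def drain():
        total = 0
        while True:
            try:
                basket = work.get_nowait()
            except queue.Empty:
                return total
            total += transport(basket)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(drain) for _ in range(n_workers)]
        return len(futures), sum(f.result() for f in futures)
```

With 100 baskets and 4 workers, the first strategy spawns 100 tasks and the second spawns 4, while both transport the same baskets.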

Scalability on many-core

- Fine-grain MT prevents scaling to a high number of threads; this is an issue for many-core architectures.
- Implement a new MP approach with a common events queue as feeder.
- Lightweight interaction.

Plot labels: lock-free algorithm (memory polling); algorithm using spinlocks; rebasketizing.
Hardware label: 2x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz.
Diagram labels: RAM_1, RAM_2, Scheduler1, Scheduler2, Transport, CPU1, CPU2; RAM_1, RAM_2, Scheduler, Transport, CPU1, CPU2; Process; Events queue.
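The "common events queue as feeder" idea can be sketched as several independent workers pulling event IDs from one shared queue, so that the only interaction between them is the dequeue. Note the substitution: the slide describes a multi-process (MP) approach, whereas this sketch uses Python threads and `queue.Queue` purely to keep the example short and portable. The names `worker` and `run_workers` are hypothetical, and the transport itself is a placeholder.

```python
import queue
import threading


def worker(worker_id, events, results, lock):
    """Pull events from the shared queue until it is empty."""
    while True:
        try:
            event_id = events.get_nowait()
        except queue.Empty:
            return
        # Placeholder: a real worker would transport the whole event here.
        with lock:
            results.append((event_id, worker_id))


def run_workers(n_events, n_workers):
    events = queue.Queue()
    for event_id in range(n_events):     # the feeder fills the common queue
        events.put(event_id)

    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(i, events, results, lock))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each event is processed exactly once, by whichever worker dequeues it first; which worker that is depends on scheduling.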

Future: NUMA aware GeantV

- Replicate schedulers on NUMA clusters
- One basketizer per quadrant
- 2 supported modes:
  - MPI/shared memory dispatch running one GeantV process per quadrant (previous slide)
  - Single process spawning one scheduler per quadrant
- Loose communication between NUMA nodes at the basketizing step

Diagram labels: four nodes (0 to 3), each with Tracks, Transport, Basketizer and Scheduler, plus a Global basketizer.
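One possible reading of "loose communication at the basketizing step" is sketched below: each node fills baskets from its own tracks, and only the leftovers that cannot fill a local basket are pooled by a global basketizer. This policy is an assumption made for illustration, not something the slide specifies, and the names `LocalBasketizer` and `basketize` are hypothetical. No actual NUMA pinning is done here.

```python
class LocalBasketizer:
    """Per-node basketizer: only ever sees tracks that live on its node."""

    def __init__(self, node, capacity):
        self.node = node
        self.capacity = capacity
        self.pending = []
        self.baskets = []

    def add(self, track):
        self.pending.append(track)
        if len(self.pending) == self.capacity:
            self.baskets.append(self.pending)
            self.pending = []

    def drain_partial(self):
        """Hand over the partially filled basket, leaving this node empty."""
        leftover, self.pending = self.pending, []
        return leftover


def basketize(tracks_per_node, capacity):
    """Return (local baskets per node, baskets built by the global basketizer)."""
    local = [LocalBasketizer(n, capacity) for n in range(len(tracks_per_node))]
    for basketizer, tracks in zip(local, tracks_per_node):
        for track in tracks:
            basketizer.add(track)

    # Global basketizer (assumed policy): pool only the cross-node leftovers.
    pooled = [t for b in local for t in b.drain_partial()]
    global_baskets = [
        pooled[i:i + capacity] for i in range(0, len(pooled), capacity)
    ]
    return [b.baskets for b in local], global_baskets
```

Under this policy, full baskets never leave their node, which keeps most transported tracks local; only the tail of each node's track population crosses node boundaries.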
