GeantV – Adapting simulation to modern hardware

Presentation transcript:

GeantV – Adapting simulation to modern hardware

Classical simulation: flexible, but limited adaptability towards the full potential of current & future hardware
  One track at a time
  Three stacks (leptons, γ, other)
  Single event transport
  Event-level parallelism (threading)
  Cache coherency – low
  Vectorization – low (scalar auto-vectorization)

GeantV simulation: engineered to profit from all processing pipelines
  Lots of tracks in flight
  Many baskets – 10s to 1000s
  Multi event transport
  Fine-grain parallelism + threads
  Cache coherency – high
  Vectorization – high (explicit multi-particle interfaces)
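To make the "explicit multi-particle interfaces" point concrete, here is a minimal sketch, not GeantV's actual API. The `Basket` layout, its field names and the `propagate` function are all hypothetical, and the geometry is reduced to one dimension for brevity. The idea it illustrates is that a basket stores each track field in its own contiguous array, so one call advances all tracks in a loop the compiler can vectorize.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays basket: one contiguous array per track
// field, so a loop over tracks reads and writes memory sequentially.
struct Basket {
    std::vector<double> x;     // position (1D for brevity)
    std::vector<double> dx;    // direction
    std::vector<double> step;  // step length proposed for this track

    void add(double x0, double dx0, double s) {
        x.push_back(x0);
        dx.push_back(dx0);
        step.push_back(s);
    }
    std::size_t size() const { return x.size(); }
};

// Multi-particle interface: one call advances every track in the basket.
// The loop body has no branches or cross-iteration dependencies, which is
// the shape auto-vectorizers and explicit SIMD code both want.
void propagate(Basket& b) {
    const std::size_t n = b.size();
    for (std::size_t i = 0; i < n; ++i) {
        b.x[i] += b.dx[i] * b.step[i];
    }
}
```

A classical one-track-at-a-time interface would instead call a scalar function once per track, usually through virtual dispatch, which leaves little for the vector units to do.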

A future task approach of GeantV

Feeder task: reads a number of events from file and invokes the concurrent basketizer service.
Basketizer(s): concurrent service that injects full baskets.
Basket queue: concurrent service (enqueue basket, dequeue basket).
Transport task: transports one basket for one step. The transport task may be further split into subtasks.
Flow control task: event finished? queue empty?
Garbage collector: forces partially filled baskets into the basket queue to boost concurrency (command: "dump all your baskets").
Scoring: a user task reading track info and creating "hits".
Digitizer task: a user task working on "hits" data.
I/O task: writes data (hits, digits, kinematics) to disk.

[Diagram labels: spawn, inject particle, input, inspect, reuse tracks keeping locality, output transported tracks]
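The interplay between basketizer, basket queue and garbage collector can be sketched as follows. This is an illustration under assumptions, not the GeantV implementation: the class names, the `Track` fields, the mutex-based queue and the choice of grouping tracks by a `volume` key are all hypothetical. The basketizer injects a basket into the queue only once it is full, and `flush()` plays the garbage collector role by forcing partially filled baskets out.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

struct Track { int event; int volume; };  // hypothetical minimal track
using Basket = std::vector<Track>;

// Concurrent basket queue (a plain mutex here; a real one may be lock-free).
class BasketQueue {
    std::mutex m_;
    std::deque<Basket> q_;
public:
    void enqueue(Basket b) {
        std::lock_guard<std::mutex> l(m_);
        q_.push_back(std::move(b));
    }
    std::optional<Basket> dequeue() {
        std::lock_guard<std::mutex> l(m_);
        if (q_.empty()) return std::nullopt;
        Basket b = std::move(q_.front());
        q_.pop_front();
        return b;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> l(m_);
        return q_.size();
    }
};

// Basketizer: collects tracks per key and injects only full baskets.
class Basketizer {
    std::mutex m_;
    std::map<int, Basket> pending_;  // assumption: grouped by volume id
    std::size_t capacity_;
    BasketQueue& out_;
public:
    Basketizer(std::size_t capacity, BasketQueue& out)
        : capacity_(capacity), out_(out) {}

    void add(const Track& t) {
        Basket full;
        {
            std::lock_guard<std::mutex> l(m_);
            Basket& b = pending_[t.volume];
            b.push_back(t);
            if (b.size() < capacity_) return;  // not full yet, keep filling
            full.swap(b);                      // take the full basket out
        }
        out_.enqueue(std::move(full));
    }

    // Garbage collector command ("dump all your baskets"): push every
    // partially filled basket. Returns the number of baskets pushed.
    std::size_t flush() {
        std::map<int, Basket> drained;
        {
            std::lock_guard<std::mutex> l(m_);
            drained.swap(pending_);
        }
        std::size_t n = 0;
        for (auto& kv : drained) {
            if (!kv.second.empty()) {
                out_.enqueue(std::move(kv.second));
                ++n;
            }
        }
        return n;
    }
};
```

A transport task would then loop on `dequeue()`, transport the basket for one step, and hand surviving tracks back to `add()`.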

Some issues for migrating to tasks

Flow control
  Proof of principle that it works
Thread ID integration
  We now have static threads with unique IDs; how should this be handled in task mode?
  Thread/task data ownership is driven by the ID system
Avoiding task bloat
  Spawning a transport task per basket can create large overhead
  Keep transport tasks "living" for a longer period?
Locality
  Prevent the task system from migrating threads arbitrarily
  Especially when we move to NUMA awareness
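One possible reading of the "keep transport tasks living longer" idea is sketched below. It is an assumption-laden illustration, not the approach GeantV adopted: `WorkerData` and `runLongLivedWorkers` are hypothetical names, and plain `std::thread` workers stand in for tasks. Each worker keeps a stable ID for its whole lifetime, owns the data slot indexed by that ID, and claims many baskets from a shared counter instead of being spawned once per basket.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-worker data, owned by the worker with the matching id.
struct WorkerData {
    std::size_t basketsDone = 0;
};

// Process nBaskets with nWorkers long-lived workers. Each worker claims
// basket indices from a shared atomic counter until none are left, so the
// spawn cost is paid once per worker rather than once per basket.
std::vector<WorkerData> runLongLivedWorkers(std::size_t nBaskets,
                                            std::size_t nWorkers) {
    std::vector<WorkerData> data(nWorkers);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    pool.reserve(nWorkers);

    for (std::size_t id = 0; id < nWorkers; ++id) {
        pool.emplace_back([&, id] {
            for (;;) {
                const std::size_t i = next.fetch_add(1);
                if (i >= nBaskets) break;
                // Stand-in for transporting basket i; only this worker
                // ever touches data[id], so no lock is needed.
                ++data[id].basketsDone;
            }
        });
    }
    for (auto& t : pool) t.join();
    return data;
}
```

The per-worker counts vary from run to run, but their sum always equals `nBaskets`. In a task library the stable `id` would have to come from somewhere else, which is exactly the thread ID integration question raised on the slide.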

Scalability on many-core

Fine-grain MT prevents scaling to a high number of threads
Issue for many-core architectures
Implement a new MP approach with a common events queue as feeder
Lightweight interaction
Lock-free algorithm (memory polling)
Algorithm using spinlocks
Rebasketizing

[Figure labels: 2x Intel(R) Xeon(R) CPU E GHz; RAM_1, RAM_2, CPU1, CPU2; Scheduler1, Scheduler2, Transport; Scheduler, Transport; Process; Events queue]
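A common events queue guarded by a spinlock could look roughly like the sketch below. The slide describes a multi-process (MP) setup; this example uses threads inside one process as a stand-in, and `SpinLock`, `EventQueue` and `consume` are hypothetical names rather than GeantV classes. The point is only to show how lightweight the interaction is: a consumer holds the lock just long enough to claim the next event number.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal spinlock built on std::atomic_flag. Waiters busy-poll the flag
// instead of sleeping, which is cheap when critical sections are tiny.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin until the holder clears the flag
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Common events queue acting as feeder: consumers claim event numbers.
class EventQueue {
    SpinLock lock_;
    std::size_t next_ = 0;
    std::size_t total_;
public:
    explicit EventQueue(std::size_t total) : total_(total) {}

    // Writes the claimed event number to `event`; returns false once
    // every event has been handed out.
    bool pop(std::size_t& event) {
        std::lock_guard<SpinLock> guard(lock_);
        if (next_ >= total_) return false;
        event = next_++;
        return true;
    }
};

// Run nConsumers workers against one queue of nEvents events and report
// how many events each consumer claimed.
std::vector<std::size_t> consume(std::size_t nEvents, std::size_t nConsumers) {
    EventQueue q(nEvents);
    std::vector<std::size_t> taken(nConsumers, 0);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < nConsumers; ++c) {
        workers.emplace_back([&q, &taken, c] {
            std::size_t ev = 0;
            while (q.pop(ev)) ++taken[c];  // stand-in for simulating event ev
        });
    }
    for (auto& w : workers) w.join();
    return taken;
}
```

In a real multi-process version the queue state would live in shared memory, and the lock would need to be usable across processes.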

Future: NUMA-aware GeantV

Replicate schedulers on NUMA clusters
  One basketizer per quadrant
2 supported modes
  MPI/shared memory dispatch running one GeantV process per quadrant (previous slide)
  Single process spawning one scheduler per quadrant
Loose communication between NUMA nodes at the basketizing step

[Figure labels: Tracks, Transport, Basketizer N and Scheduler N for N = 0 to 3; Global basketizer]