Status of the vector transport prototype Andrei Gheata 12/12/12.

1 Status of the vector transport prototype
2 Current implementation
[Diagram: data flow of the current implementation. Generate(N events) injects tracks; the main scheduler loops over tracks and pushes them into per-volume baskets (ivolume = 0, 1, 2, …, n), injecting or replacing transportable baskets in a deque and injecting priority baskets on demand. Worker threads pop baskets and run Stepping(tid, &tracks); a dispatch & garbage-collect thread handles crossing tracks (itrack, ivolume), pushes/replaces full track collections, recycles baskets and track collections, and refills the transport pick-up baskets. A digitize & I/O thread consumes hits, runs Digitize(iev) and flushes to disk.]
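The worker side of this data flow can be sketched as follows. This is a minimal illustration, not the prototype's code: `Track`, `Basket`, `BasketQueue` and `WorkerLoop` are hypothetical stand-ins, and the "stepping" here only marks a volume crossing so the re-basketizing hand-off is visible.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

// Illustrative stand-ins for the prototype's track/basket types.
struct Track  { int itrack; int ivolume; };
struct Basket { int ivolume; std::vector<Track> tracks; };

// Concurrent deque feeding worker threads, as in the "transportable
// baskets" queue on the slide.
class BasketQueue {
  std::deque<Basket> q_;
  std::mutex m_;
  std::condition_variable cv_;
  bool done_ = false;
public:
  void Push(Basket b) {
    { std::lock_guard<std::mutex> l(m_); q_.push_back(std::move(b)); }
    cv_.notify_one();
  }
  // Blocks until a basket is available or the queue is shut down;
  // returns false only on shutdown with an empty queue.
  bool Pop(Basket& out) {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [&] { return !q_.empty() || done_; });
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop_front();
    return true;
  }
  void Shutdown() {
    { std::lock_guard<std::mutex> l(m_); done_ = true; }
    cv_.notify_all();
  }
};

// Worker loop: pop a transportable basket, "step" its tracks, and hand
// crossing tracks (itrack, ivolume) back for re-basketizing.
void WorkerLoop(int tid, BasketQueue& in, BasketQueue& crossing) {
  (void)tid;  // a real Stepping(tid, &tracks) would use the thread id
  Basket b;
  while (in.Pop(b)) {
    Basket out{b.ivolume + 1, {}};  // pretend every track crosses a boundary
    for (Track& t : b.tracks) {
      t.ivolume = b.ivolume + 1;
      out.tracks.push_back(t);
    }
    crossing.Push(std::move(out));
  }
}
```

Calling `Shutdown()` lets workers drain the remaining baskets and exit, which is one simple way to terminate the dispatch loop cleanly.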

3 Extending the transport data flow
Current prototype: are there bottlenecks in scheduling?
Can other types of work and resources share the same concurrency model?
[Diagram: the scheduler dispatches baskets with tracks to Transport(); crossing tracks return to the scheduler; hit blocks go through a deque to Digitize(block) and ProcessHits(vector); digits data for buffered events Ev 0 … Ev n fill per-event I/O buffers, with priority events flushed to disk.]

4 Runnable and executor (Java)
A simple task concurrency model based on a Run() interface.
Single queue management for all the different processing tasks:
– Minimizes the overhead of work balancing
– Priority management at the level of the work queue
In practice, our runnables are transport, digitization, I/O, …
– Lower-level splitting is possible: geometry, physics processes, …
Flow = runnables producing other runnables that can be processed independently.
Further improvements:
– Scheduling executed by the worker threads themselves (no need for a separate scheduler)
– A busy worker can process its own runnable result(s) on the same thread
[Diagram: a Runnable wrapping data and a Run() method enters a concurrent queue; an Executor runs the task and returns a Future; results feed a second concurrent queue.]
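The "runnables producing other runnables" flow, borrowed from Java's Runnable/Executor model, can be sketched in C++ as below. All names (`Runnable`, `Executor`, the demo `TransportTask`/`DigitizeTask`) are illustrative, and the executor is reduced to a single-threaded drain loop to keep the flow visible.

```cpp
#include <deque>
#include <memory>
#include <mutex>
#include <vector>

// Minimal Run() interface: a runnable may return follow-up runnables,
// which go back into the same queue (the "flow" on the slide).
struct Runnable {
  virtual ~Runnable() = default;
  virtual std::vector<std::unique_ptr<Runnable>> Run() = 0;
};

// Single work queue for all task types (transport, digitization, I/O, ...).
class Executor {
  std::deque<std::unique_ptr<Runnable>> queue_;
  std::mutex m_;
public:
  void Submit(std::unique_ptr<Runnable> r) {
    std::lock_guard<std::mutex> l(m_);
    queue_.push_back(std::move(r));
  }
  // A worker drains the queue; produced runnables re-enter the same
  // queue, so the worker that created them can also process them
  // (the slide's improvement: no separate scheduler thread).
  void DrainOneWorker() {
    for (;;) {
      std::unique_ptr<Runnable> r;
      {
        std::lock_guard<std::mutex> l(m_);
        if (queue_.empty()) return;
        r = std::move(queue_.front());
        queue_.pop_front();
      }
      for (auto& next : r->Run()) Submit(std::move(next));
    }
  }
};

// Illustrative runnables: transport produces a digitization follow-up.
struct DigitizeTask : Runnable {
  int* counter;
  explicit DigitizeTask(int* c) : counter(c) {}
  std::vector<std::unique_ptr<Runnable>> Run() override {
    ++*counter;  // stand-in for digitizing a block of hits
    return {};
  }
};
struct TransportTask : Runnable {
  int* counter;
  explicit TransportTask(int* c) : counter(c) {}
  std::vector<std::unique_ptr<Runnable>> Run() override {
    std::vector<std::unique_ptr<Runnable>> next;
    next.emplace_back(new DigitizeTask(counter));  // runnable -> runnable
    return next;
  }
};
```

Priorities could be added by replacing the deque with a priority queue keyed on event age, matching the slide's "priority management at the level of the work queue".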

5 GPU-friendly tasks
Task = code with clearly defined input/output data that can be executed in a flow.
Independent GPU tasks:
– Fully mapped on the GPU
Mixed CPU/GPU tasks:
– The GPU kernel result is blocking for the CPU code that follows it
How do we deal with the blocking part, which pays the overhead of memory-bus latency?
[Diagram: a Run() flow alternating CPU code sections with GPU kernel launches.]
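A mixed CPU/GPU task with a blocking kernel result might look like the sketch below. This is an assumption-laden illustration: `std::async` stands in for a real GPU runtime (no actual device API is used), and the `fut.get()` call marks the blocking point the slide asks about.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Stand-in for a GPU kernel: doubles each element. In a real mixed
// task this would be a device kernel launch plus a result copy-back.
std::vector<float> FakeGpuKernel(std::vector<float> in) {
  for (float& x : in) x *= 2.0f;
  return in;
}

// Mixed CPU/GPU Run() body from the slide: CPU part, GPU kernel whose
// result blocks the following CPU part.
float MixedTaskRun(std::vector<float> data) {
  // CPU part 1: prepare/scatter the input.
  for (float& x : data) x += 1.0f;

  // Launch the "kernel"; the future models launch + memory-bus latency.
  std::future<std::vector<float>> fut =
      std::async(std::launch::async, FakeGpuKernel, std::move(data));

  // This is the blocking part: the thread stalls until the kernel
  // result arrives. Handing the slot to another pooled thread here is
  // exactly what the broker scheme on the next slide addresses.
  std::vector<float> out = fut.get();

  // CPU part 2: gather/reduce the kernel result.
  return std::accumulate(out.begin(), out.end(), 0.0f);
}
```

For input {1, 2, 3} the CPU part produces {2, 3, 4}, the "kernel" yields {4, 6, 8}, and the reduction returns 18.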

6 Scheduling work for GPU-like resources
Resource pool of idle CPU threads, controlled by a CPU broker (messaging wake-up):
– CPU broker policy: resource balancing (N cores -> N active threads)
Some runnables are "GPU friendly", i.e. part of their Run() processing has both CPU and GPU implementations.
A CPU thread taking a GPU-friendly runnable asks the GPU broker whether resources are available:
– If yes, it scatters the work, pushes it to the GPU and goes to wait/notify; otherwise it just runs on the CPU
– … but not before notifying the CPU resource broker, which may decide to wake up a thread from the pool
When the result comes back from the GPU, the thread resumes processing.
At the end of a runnable cycle, the CPU broker corrects the workload.
Goal: keep both the CPU and the GPU busy while avoiding oversubscription (hyperthreading).
[Diagram: a runnable queue feeding active CPU threads; idle threads sleep in a CPU thread pool and are woken by the CPU broker; GPU work is scattered/gathered through a low-latency GPU broker with push/sleep and Notify()/resume handshakes.]
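The CPU-broker bookkeeping described above can be sketched as follows. The class name and methods are hypothetical, and the sketch only shows the slot accounting (N cores -> N active threads): a thread about to block on a GPU result releases its slot so the broker can wake a pooled thread, and re-acquires a slot when the result arrives.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical CPU broker: keeps the number of *running* threads at
// `target`. Threads park in Start() until a slot is free; a thread
// entering a GPU wait frees its slot and notifies one sleeper.
class CpuBroker {
  std::mutex m_;
  std::condition_variable cv_;
  int target_;  // desired number of active CPU threads (e.g. N cores)
  int active_;  // threads currently doing CPU work
public:
  explicit CpuBroker(int target) : target_(target), active_(0) {}

  // A pooled thread wants to run: sleep until a slot is available.
  void Start() {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [&] { return active_ < target_; });
    ++active_;
  }

  // About to block on a GPU result: free our slot, wake a sleeper so
  // the CPU stays busy while we wait.
  void EnterGpuWait() {
    { std::lock_guard<std::mutex> l(m_); --active_; }
    cv_.notify_one();
  }

  // GPU result arrived: re-acquire a slot (may park until one frees).
  void LeaveGpuWait() { Start(); }

  int Active() {
    std::lock_guard<std::mutex> l(m_);
    return active_;
  }
};
```

The "workload correction at the end of a runnable cycle" would amount to adjusting `target_` based on the observed CPU/GPU occupancy, which this sketch leaves out.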

