1/13 Future computing for particle physics, 15-17 June 2011, Edinburgh
A GPU-based Kalman filter for ATLAS Level 2 Trigger
Dmitry Emeliyanov, Particle Physics Department, Rutherford Appleton Laboratory

Future computing for particle physics, June 2011, Edinburgh 2/13
Outline
- GPUs for the ATLAS High-Level Trigger (HLT)
- Inner Detector tracking at the Level 2 Trigger
- GPU-based implementation of the track fitting algorithm
- Timing results
- GPU code optimization
- Discussion
- Conclusion and outlook

Future computing for particle physics, June 2011, Edinburgh 3/13
GPUs for ATLAS Trigger
The R&D programme for the ATLAS higher-luminosity upgrade includes the Trigger system and the HLT software. In general, various upgrade approaches are possible:
- using more/better CPUs for the HLT farms
- vectorization of the HLT software to process more than one event per CPU core
- using GPUs for time-critical parts of the HLT code which are suitable for SIMT parallelization
The GPU-based option is feasible since the ATLAS HLT uses dedicated farms which can, in principle, be equipped with GPU cards. To prove this feasibility, a few GPU-accelerated algorithms for Level 2 Trigger (LVL2) tracking have been developed.

Future computing for particle physics, June 2011, Edinburgh 4/13
Tracking at LVL2 Trigger
LVL2 operates independently on Regions-of-Interest (RoIs) identified by the Level 1 Trigger.
[Diagram: cross-section view of the ATLAS detector with a Level 1 Trigger RoI. Data preparation issues data requests to the detector and converts raw data into spacepoints; LVL2 track finding (interaction vertex finding by a Hough transform along the ATLAS z-axis, followed by combinatorial track finding) produces track candidates, which are passed to the Kalman track fit to produce the final tracks.]

Future computing for particle physics, June 2011, Edinburgh 5/13
Track fitting at LVL2 Trigger
The track fit provides track parameter estimates and covariance matrices at both ends of a track. The track fitter consists of a forward Kalman filter and a backward smoother. It is implemented in C++ using OO techniques and STL containers.

Future computing for particle physics, June 2011, Edinburgh 6/13
The Kalman filter algorithm
The Kalman filter is a recursive algorithm which updates the track parameter estimate and covariance using a hit and the extrapolated estimate and covariance. Hits are described by a linear model:
$m_k = H_k x_k + \nu_k$,
where $m_k$ are the hit coordinates, $H_k$ is the measurement matrix, and $\nu_k$ is Gaussian noise with zero mean and covariance $V_k$. The track parameter and covariance update by the Kalman filter:
$x_k = \tilde{x}_k + K_k r_k, \qquad C_k = (I - K_k H_k)\,\tilde{C}_k$,
where $K_k = \tilde{C}_k H_k^T (V_k + H_k \tilde{C}_k H_k^T)^{-1}$ is the Kalman gain matrix and $r_k = m_k - H_k \tilde{x}_k$ is the "hit-track" residual.
[Illustration: a hit pulls the extrapolated track position towards the updated track position.]
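A minimal sketch of this update step in CUDA, assuming a 5-parameter track state and a 1-D measurement (function and variable names are illustrative, not the actual ATLAS code):
```cuda
// One thread performs the gain-matrix update for one track and one 1-D hit.
// State: 5 track parameters x[5], covariance C[5][5] (symmetric);
// measurement model m = H*x + noise with variance V.
__device__ void kalmanUpdate1D(float* x, float C[5][5],
                               const float* H, float m, float V)
{
    // r = m - H*x  (hit-track residual)
    float r = m;
    for (int i = 0; i < 5; ++i) r -= H[i] * x[i];

    // CHt = C*H^T,  S = V + H*C*H^T (residual covariance, a scalar here)
    float CHt[5];
    float S = V;
    for (int i = 0; i < 5; ++i) {
        CHt[i] = 0.f;
        for (int j = 0; j < 5; ++j) CHt[i] += C[i][j] * H[j];
        S += H[i] * CHt[i];
    }

    // Kalman gain K = C*H^T / S; update parameters and covariance
    float invS = 1.f / S;
    float K[5];
    for (int i = 0; i < 5; ++i) {
        K[i] = CHt[i] * invS;
        x[i] += K[i] * r;
    }
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 5; ++j)
            C[i][j] -= K[i] * CHt[j];   // C = (I - K*H)*C
}
```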

Future computing for particle physics, June 2011, Edinburgh 7/13
Adapting the fitter for GPU
The Kalman filter/smoother code is sequential, so track-level parallelism is used: one thread per track, with N tracks fitted by N threads in parallel.
Limitations/requirements:
- no dynamic memory allocation is possible
- arrays and structs are used instead of STL vectors and C++ objects
Data structures: structure-of-arrays (SoA):
- Array1: [data for thr #1][data for thr #2]...[data for thr #N]
Vector data types (float4) give a compact representation of the detector plane geometry and magnetic field: the plane centre c (3 floats), the rotation matrix m (9 floats), the field B at the centre (3 floats), and the radiation thickness x0 (1 float) are packed into 4 float4 words, each of which can be read at once, as sketched below.
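A sketch of this data layout (the struct and field names are illustrative assumptions, not taken from the ATLAS code):
```cuda
// Detector plane geometry and field packed into 4 float4 words (16 floats),
// so each word can be fetched with a single 128-bit load.
struct DetectorPlaneGeometrySoA {   // one entry per detector plane
    float4* word0;  // (c.x, c.y, c.z, m[0][0])         centre + start of rotation
    float4* word1;  // (m[0][1], m[0][2], m[1][0], m[1][1])
    float4* word2;  // (m[1][2], m[2][0], m[2][1], m[2][2])
    float4* word3;  // (B.x, B.y, B.z, x0)               field at centre + rad. thickness
};

// Track inputs in SoA form: thread tid reads element [tid] of each array,
// so neighbouring threads access neighbouring addresses (coalesced loads).
struct TrackInputSoA {
    float* d0;      // [data for thr #1][data for thr #2]...[data for thr #N]
    float* z0;
    float* phi0;
    float* cotTheta;
    float* qOverPt;
};
```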

Future computing for particle physics, June 2011, Edinburgh 8/13
Using single FP precision
Double-precision arithmetic is available on modern GPUs, but:
- double-precision performance is typically half that of single precision
- using single precision halves the volume of data transferred to/from the GPU
The original fitter code is sensitive to single precision: the backward smoother suffers from divergence of the covariance, because covariance matrix elements of very different magnitudes ("BIG" vs "small") enter the same calculation, causing a loss of precision.
A trivial fix: using GeV instead of MeV for the track momentum reduces the dynamic range of the covariance matrix elements. In addition, a different smoothing algorithm is used to improve the numerical stability of the fitter (next slide).
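A small self-contained illustration of the dynamic-range problem (the numbers are only indicative, not taken from the fitter):
```cuda
#include <cstdio>

// float carries ~7 decimal digits of precision, so when two covariance
// elements differ by many orders of magnitude the small one is silently
// lost in sums and differences:
int main()
{
    float big   = 1.0e8f;                 // "BIG" covariance element
    float small = 1.0f;                   // "small" covariance element
    printf("%g\n", (big + small) - big);  // prints 0, not 1
    // Rescaling momentum from MeV to GeV multiplies q/p by 1e3 (and its
    // covariance element by 1e6), compressing this dynamic range.
    return 0;
}
```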

Future computing for particle physics, June 2011, Edinburgh 9/13
FPS: Fixed-Point Smoother
An FPS is just a Kalman filter which operates on an extended track parameter vector and covariance: the parameters at the beginning of the track together with the parameters at the current measurement (k).
Advantages:
- a single-pass algorithm which does not require covariance inversion (needed for the traditional smoother)
- no need to store track parameter estimates and covariances for intermediate measurements, a great saving of GPU memory
- bonus: a cross-covariance is readily available, which comes in very handy if we want to add more measurements or merge two fitted track segments
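A sketch of the extended state, in the standard fixed-point smoother notation (assumed here, since the slide's formulas were not recoverable):
```latex
X_k = \begin{pmatrix} x_0 \\ x_k \end{pmatrix},
\qquad
\operatorname{cov}(X_k) =
\begin{pmatrix} C_{00,k} & C_{0k} \\ C_{0k}^{T} & C_{kk} \end{pmatrix},
```
where $x_0$ holds the parameters at the beginning of the track (refined by every new hit), $x_k$ the parameters at measurement $k$, and $C_{0k}$ is the readily available cross-covariance mentioned above.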

Future computing for particle physics, June 2011, Edinburgh 10/13
Timing results
CPU: Intel Westmere 2.4 GHz; GPU: NVIDIA Tesla C2050 (Fermi architecture).
Data: muon tracks from full ATLAS Monte Carlo simulation, arranged into "events" with up to N = 3000 tracks.
I/O size (3000 tracks): input data 4.6 MB, output tracks 0.3 MB.
[Plot: track fit time vs number of tracks for CPU and GPU; a speed-up factor of about x12 is indicated.]

Future computing for particle physics, June 2011, Edinburgh 11/13
GPU code optimization
A set of optimizations has been applied:
1. original code
2. 32 threads/block
3. reduced memory footprint (fewer local variables, upper-triangular covariance matrix)
4. track state (covariance + parameters) stored in fast ("shared") memory
5. Jacobian in "shared" memory to speed up the calculation
[Plot: GPU time (ms) vs number of tracks for each optimization step, with the time split into track state handling, extrapolation and update; the combined optimizations reduce the GPU time by about a factor of 2.5.]
The optimized code gives ~20x speed-up w.r.t. the CPU. Optimizations 2-5 are sketched below.
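A minimal kernel skeleton illustrating optimizations 2-5 (the kernel name, array shapes and block size are assumptions for illustration only):
```cuda
#define BLOCK_SIZE 32   // optimization 2: 32 threads per block

__global__ void fitTracksKernel(/* SoA inputs/outputs elided */)
{
    // Optimizations 3-4: per-thread track state kept in fast on-chip
    // "shared" memory; only the 15 upper-triangular elements of the
    // symmetric 5x5 covariance are stored.
    __shared__ float par[BLOCK_SIZE][5];    // track parameters
    __shared__ float cov[BLOCK_SIZE][15];   // upper-triangular covariance
    // Optimization 5: transport Jacobian also in shared memory.
    __shared__ float jac[BLOCK_SIZE][25];

    int tid = threadIdx.x;   // one thread per track within the block
    // ... load the track assigned to this thread, then loop over its hits,
    //     extrapolating and updating par[tid], cov[tid] via jac[tid] ...
}
```
Total shared-memory use in this sketch is 32 x 45 floats = 5.76 kB per block, well within the 48 kB available per multiprocessor on Fermi.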

Future computing for particle physics, June 2011, Edinburgh 12/13
Discussion
A 20x speed-up is seen at 3000 tracks, and data transfer via PCI-E (time, I/O size) is not a bottleneck. For a low number of tracks, however, the GPU-based track fitting is only slightly faster than on the CPU: with N threads = N tracks there are not enough threads to occupy all GPU cores, so track-level parallelism is not suitable when the number of tracks is low.
Possible solutions for low-multiplicity events:
- process tracks from a few RoIs in parallel
- use a different algorithm which would allow for mixed parallelism:
  - using separate threads for the track parameter and track covariance extrapolations
  - using parallel matrix multiplications in the track covariance extrapolation (sketched below)
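A sketch of the parallel covariance extrapolation idea, assuming 25 cooperating threads per track (one per element of the 5x5 result); this is an illustration of the proposal, not existing code:
```cuda
// Covariance transport C' = F * C * F^T with 5x5 matrices. Assumes the
// block consists of exactly the 25 threads working on this track, and
// that F, C, Cout and the scratch buffer FC live in shared memory.
__device__ void transportCovariance(const float* F,   // 5x5 Jacobian
                                    const float* C,   // 5x5 covariance
                                    float* Cout,      // 5x5 result
                                    float* FC)        // 5x5 scratch
{
    int i = threadIdx.x / 5;   // row    of the element this thread owns
    int j = threadIdx.x % 5;   // column of the element this thread owns

    // Step 1: FC = F * C (each thread computes one element)
    float s = 0.f;
    for (int k = 0; k < 5; ++k) s += F[i*5 + k] * C[k*5 + j];
    FC[i*5 + j] = s;
    __syncthreads();           // all of FC must be ready before step 2

    // Step 2: Cout = FC * F^T
    s = 0.f;
    for (int k = 0; k < 5; ++k) s += FC[i*5 + k] * F[j*5 + k];
    Cout[i*5 + j] = s;
}
```
Compared with one thread doing the full product serially, this exposes 25-way parallelism per track, which helps precisely when the number of tracks alone cannot fill the GPU.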

Future computing for particle physics, June 2011, Edinburgh 13/13
Conclusion and outlook
- The design of a GPU-based Kalman track fit for the ATLAS Level 2 Trigger has been presented.
- Techniques for optimizing the fitter code have been demonstrated.
- The optimized GPU-based track fitter shows a speed-up factor of 20 for high track-multiplicity events.
- Future work: exploiting the inherent parallelism of the track state extrapolation.