CA+KF Track Reconstruction in the STS I. Kisel GSI / KIP CBM Collaboration Meeting GSI, February 28, 2008.


1 CA+KF Track Reconstruction in the STS I. Kisel GSI / KIP CBM Collaboration Meeting GSI, February 28, 2008

2 Track Finder: What Is the Next Step?
28 February 2008, GSI; Ivan Kisel, GSI; slide 2/14
- Optimize the STS geometry (strips, sector navigation)
- Mathematical and computational optimization
- SIMDization of the algorithm (from scalars to vectors)
- MIMDization (multi-threads, multi-cores)
- High track density
- Non-homogeneous magnetic field
- Fake space points dominate
- Single-sided strip detectors
- Detector inefficiency
- Not perfectly aligned system
- On-line event selection
- Large PC farm
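The "from scalars to vectors" step listed above can be illustrated with a minimal sketch. It is not taken from the L1 code: `Float4` is a hypothetical four-lane pack standing in for a hardware SIMD register, and `extrapolate` (a straight-line extrapolation) is chosen only because it is short. The idea shown is that arithmetic written once against a generic type can be instantiated for a single value or for several values processed together.

```cpp
// Hypothetical 4-wide pack of floats (an assumption for illustration,
// standing in for a hardware SIMD register).
struct Float4 {
    float v[4];
};

// Lane-wise addition: each of the four lanes is handled independently.
constexpr Float4 operator+(const Float4& a, const Float4& b) {
    return Float4{{a.v[0] + b.v[0], a.v[1] + b.v[1],
                   a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

// Lane-wise multiplication.
constexpr Float4 operator*(const Float4& a, const Float4& b) {
    return Float4{{a.v[0] * b.v[0], a.v[1] * b.v[1],
                   a.v[2] * b.v[2], a.v[3] * b.v[3]}};
}

// Straight-line extrapolation x + tx * dz, written once for any type T
// that provides operator+ and operator*. The formula is only an example.
template <typename T>
constexpr T extrapolate(const T& x, const T& tx, const T& dz) {
    return x + tx * dz;
}
```

Instantiated with `float`, `extrapolate` handles one value; instantiated with `Float4`, the same source line handles four values at once. A production version would presumably map the pack type onto real vector instructions rather than spelling out the lanes.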

3 Data Acquisition System
[Diagram. Labels: Detector; Event Builder Network (N x M); Scheduler; Sub-Farm; RU; Farm Control System; "SF n available"; PC Farm (10^? PCs). Rates: 10^7 ev/s, 100 ev/slice, 10^5 sl/s, 50 kB/ev, 5 MB/slice. Data per time slice: MAPS, STS, RICH, TRD, ECAL.]
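The rates on this slide can be cross-checked with a few lines of arithmetic, reading the labels as 10^7 ev/s and 10^5 sl/s. The sketch below assumes 1 kB = 1000 bytes; the constant and function names are invented for illustration. The total input bandwidth is a derived figure and does not appear on the slide.

```cpp
#include <cstdint>

// Numbers read from the slide.
constexpr std::uint64_t kEventsPerSecond = 10000000ULL;  // 10^7 ev/s
constexpr std::uint64_t kEventsPerSlice  = 100ULL;       // 100 ev/slice
constexpr std::uint64_t kBytesPerEvent   = 50000ULL;     // 50 kB/ev, assuming 1 kB = 1000 B

// Slices per second follow from the event rate and the slice size.
constexpr std::uint64_t slicesPerSecond() {
    return kEventsPerSecond / kEventsPerSlice;
}

// Bytes per slice follow from the slice size and the event size.
constexpr std::uint64_t bytesPerSlice() {
    return kEventsPerSlice * kBytesPerEvent;
}

// Derived total input bandwidth (not stated on the slide).
constexpr std::uint64_t bytesPerSecond() {
    return kEventsPerSecond * kBytesPerEvent;
}

static_assert(slicesPerSecond() == 100000ULL, "10^5 sl/s, as on the slide");
static_assert(bytesPerSlice() == 5000000ULL, "5 MB/slice, as on the slide");
```

Under these assumptions the farm would have to absorb about 5 x 10^11 bytes per second (500 GB/s).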

4 Cell Blade – a Sub-Farm with (2+16) Cores
[Diagram. Labels: Tracking and Vertexing Units; Sub-Farm Management Unit; Sub-Farm Decision/Selection Unit; FPGA; PC (x5); Sub-Farm.]

5 Welcome to the Era of Multicore HPC
- Gaming, STI: Cell
- GP GPU, Nvidia: Tesla
- GP CPU, Intel: Larrabee
- CPU/GPU, AMD: Fusion
- ??
- High performance computing (HPC)
- Highest clock rate is reached
- Performance/power optimization
- Heterogeneous systems of many (>8) cores
- Similar programming languages (Ct and CUDA), but standards are unlikely
- We need a uniform approach to all CPU/GPU families
- How to take advantage of the additional cores?

6 NVIDIA GeForce 9600 GT GPU: 64 Cores
- 64 processors
- 1.625 GHz frequency
- double precision (?)
- 170 EUR price

7 Intel Polaris: 80 Cores
- 3.16 GHz, 0.95 Volt, 62 Watt -> 1.01 Teraflops

8 Cell Processor: 1+8 Cores

9 Computer Physics Communications 178 (2008) 374-383

10 Speed-up of the Kalman Filter Track Fit

11 Structure and Data: a Bottleneck
cbmroot/L1: L1Algo, L1Geometry, L1Event (L1Strips, L1Hits), L1Tracks

Input:
Strips:
    float vStripValues[NStrips];         // strip coordinates (32b)
    unsigned char vStripFlags[NStrips];  // strip iStation (6b) + used (1b) + used_by_dublets (1b)
Hits:
    struct L1StsHit {
        unsigned short int f, b;         // front (16b) and back (16b) strip indices
    };
    L1StsHit vHits[NHits];

Output:
    unsigned short int vRecoHits[NRecoHits];  // hit index (16b)
    unsigned char vRecoTracks[NRecoTracks];   // N hits on track (8b)

Internal:
    class L1Triplet {
        unsigned short int w0;  // left hit (16b)
        unsigned short int w1;  // first neighbour (16b) or middle hit (16b)
        unsigned short int w2;  // N neighbours (16b) or right hit (16b)
        unsigned char b0;       // chi2 (5b) + level (3b)
        unsigned char b1;       // qp (8b)
        unsigned char b2;       // qp error (8b)
    };

- A standalone L1Algo module
- About 300 kB per central event
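The slide packs several fields into single bytes, for example the strip flags: iStation (6b) + used (1b) + used_by_dublets (1b). A minimal sketch of how such a byte could be written and read is given below. The bit order is an assumption, since the slide gives only the field widths, and the helper names are hypothetical rather than part of L1Algo.

```cpp
#include <cstdint>

// Assumed layout (the slide gives widths only, not positions):
//   bits 0-5: station index, bit 6: used, bit 7: used by doublets.
constexpr std::uint8_t packStripFlags(unsigned station, bool used,
                                      bool usedByDoublets) {
    return static_cast<std::uint8_t>((station & 0x3Fu) |
                                     (used ? 0x40u : 0u) |
                                     (usedByDoublets ? 0x80u : 0u));
}

// Extract the 6-bit station index.
constexpr unsigned stripStation(std::uint8_t flags) {
    return flags & 0x3Fu;
}

// Test the "used" bit.
constexpr bool stripUsed(std::uint8_t flags) {
    return (flags & 0x40u) != 0u;
}

// Test the "used by doublets" bit.
constexpr bool stripUsedByDoublets(std::uint8_t flags) {
    return (flags & 0x80u) != 0u;
}
```

With this layout a strip costs five bytes in total: four for the coordinate and one for all three flags. The same masking pattern would apply to the `b0` byte of `L1Triplet`, which holds chi2 (5b) and level (3b).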

12 Parallelization of the CA Track Finder
1 Create tracklets
2 Collect tracks
GSI, KIP, CERN

13 Kalman Filter Track Fit on Multicore Systems: Multithreading
[Plot: real fit time per track (µs) versus number of tasks, logarithmic scale. Håvard Bjerke]

14 Summary and Plans
Summary:
- SIMDized CA track finder works well
- Work on single-sided strip detectors started
- Multithreaded Kalman filter track fit
Plans:
- Learn Ct (Intel) and CUDA (Nvidia) programming languages
- Investigate large multi-core systems (CPU and GPU)
- Parallelize the CA track finder
- Parallel hardware -> parallel languages -> parallel algorithms

15 Double-Sided vs. Single-Sided Strip Detectors: Tracking Efficiency

Track category                    D-S: Efficiency, %   S-S: Efficiency, %
Reference set (>1 GeV/c)          96.0                 94.5
All set (>=4 hits, >100 MeV/c)    89.6                 87.4
Extra set (<1 GeV/c)              76.5                 73.2
Clone                             0.8                  1.1
Ghost                             3.4                  5.0
MC tracks/event found             661                  654
Time/event, s                     0.8                  25.6

- Standard geometry with all strips
- Thickness is the same for D-S and S-S strip stations
- Front stations positioned as in sts_allstrips.geo, back stations shifted in Z by +1 cm
- Fake space points are produced as in the double-sided scenario (within the same sector)
- True space points taken from MC (different sectors possible)
- No SIMDization
- No sorting of strips
- No sector navigation
- No memory optimization

