ALICE O2 project. B. von Haller on behalf of the O2 project, 19.05.2015, CERN.

1 ALICE O2 project
B. von Haller on behalf of the O2 project, 19.05.2015, CERN

2 Overview
▶ O2 Project
  ▶ Upgrade of the Offline and Online computing
  ▶ Members of HLT, DAQ, Offline
  → Build a unified computing system for after LS2
▶ Guided tour of the O2 TDR (submitted to the LHCC on 20 April 2015)
  ▶ Rationales
  ▶ General idea and architecture
  ▶ Computing needs
B. von Haller | O2 Project | 19.05.2015 1

3 Rationales
1. After LS2, the LHC will deliver minimum-bias Pb-Pb collisions at 50 kHz
   → ~100 x higher rate than now
2. Running scenarios
   ▶ Goal: 13 nb⁻¹ for Pb-Pb collisions (minimum bias)
3. Physics topics addressed by the ALICE upgrade
   ▶ Very small signal-to-noise ratio and large background
   ▶ Requires very large statistics
   ▶ Triggering techniques very inefficient if not impossible
→ Too much data to be stored
→ Compress data intelligently by processing it online
B. von Haller | TDR EC | 11.03.2015 2

4 Readout
▶ TPC: continuous read-out to cope with the 50 kHz interaction rate
Read-out parameters
Detector          Max read-out rate (kHz)   Data rate for Pb-Pb at 50 kHz (GB/s)   Average data size per interaction (MB)
TPC               50                        1012 (92.5%)                           20.7
ITS               100                       40 (3.6%)                              0.8
TRD               90.9                      20 (1.8%)                              0.5
MFT               100                       10 (0.9%)                              0.2
Other detectors   -                         11.269 (1.2%)                          0.25
Total                                       1093                                   22.4
Detector links and read-out boards
                    DDL1   DDL2   GBT    CRORC   CRU
Number of links     15     40     7998
Number of boards                         13      463

5 O2 architecture (1)
Diagram: synchronous data flow from the detectors to storage
▶ Detectors electronics (TPC, ITS, TRD, …) with trigger and clock
  ▶ Detector data samples interleaved with synchronized heartbeat triggers
▶ FLPs, O(100): raw data input and local processing
  ▶ Buffering, local aggregation, time slicing
  ▶ Data Reduction 0, e.g. clustering
  ▶ Calibration 0 on local data, i.e. partial detector
  ▶ Tagging, QC
  ▶ Output: sub-timeframes, partially compressed sub-timeframes
▶ Frame dispatch
▶ EPNs, O(1000): global processing
  ▶ Timeframe building, full timeframe
  ▶ Detector reconstruction, e.g. track finding
  ▶ Data Reduction 1
  ▶ Calibration 1 on full detectors, e.g. space charge distortion
  ▶ QC
  ▶ Output: compressed timeframes
▶ Storage: CTF, AOD
▶ T0/T1: Archive
▶ Facility labels: O2/T0/T1
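
The FLP to EPN step of this architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the O2 implementation: the names `SubTimeframe`, `dispatch` and `build_timeframes` are hypothetical, and the round-robin choice of EPN is an assumption made for the example. The sketch shows the one idea taken from the slide: every sub-timeframe of a given time window has to reach the same EPN before a full timeframe can be built.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTimeframe:
    tf_id: int      # index of the time window (hypothetical field)
    flp_id: int     # FLP that produced this slice (hypothetical field)
    payload: bytes  # locally processed detector data


def dispatch(tf_id: int, n_epns: int) -> int:
    """Pick the EPN for a time window (round-robin is an assumption)."""
    return tf_id % n_epns


def build_timeframes(stfs, n_flps, n_epns):
    """Collect sub-timeframes per EPN and return the completed timeframes."""
    pending = defaultdict(dict)  # (epn, tf_id) -> {flp_id: payload}
    complete = {}                # tf_id -> (epn, full timeframe payload)
    for stf in stfs:
        epn = dispatch(stf.tf_id, n_epns)
        parts = pending[(epn, stf.tf_id)]
        parts[stf.flp_id] = stf.payload
        if len(parts) == n_flps:  # one slice from every FLP has arrived
            full = b"".join(parts[flp] for flp in sorted(parts))
            complete[stf.tf_id] = (epn, full)
            del pending[(epn, stf.tf_id)]
    return complete


# Three time windows, two FLPs, two EPNs.
stfs = [SubTimeframe(tf, flp, bytes([flp])) for tf in range(3) for flp in range(2)]
timeframes = build_timeframes(stfs, n_flps=2, n_epns=2)
```

Keeping the dispatch decision a pure function of the timeframe index means every FLP can compute the destination independently, without a central coordinator; whether the real system does this is not stated on the slide.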

6 O2 architecture (2)
Diagram: synchronous and asynchronous data flows around the EPNs
▶ Elements: EPNs, Storage (CTF, AOD), T0/T1 Archive, Condition & Calibration Database, Quality Control, O2/T0/T1 O(1)
▶ Data labels: sub-timeframes, timeframes, compressed timeframes, CCDB objects, QC data, ESD, AOD
▶ Flow labels: synchronous, asynchronous

7 O2 architecture (3)
Diagram: asynchronous processing, analysis and simulation
▶ O2/T0/T1, O(1): reconstruction passes and event extraction
  ▶ Event extraction, tagging, global reconstruction, Calibration 2, AOD extraction, QC
  ▶ Input: compressed timeframes; output: ESD (Event Summary Data), AOD (Analysis Object Data)
▶ Analysis Facilities, O(1): analysis of AOD from storage; output: histograms, trees
▶ T2, O(10): simulation, reconstruction, event building, AOD extraction, QC; data: CTF, AOD
▶ Storage: CTF, AOD
▶ T0/T1: Archive

8 Computing Model
▶ O2: RAW -> CTF -> ESD -> AOD
▶ T0/T1: CTF -> ESD -> AOD
▶ T2/HPC: MC -> CTF -> ESD -> AOD
▶ AF: AOD -> HISTO, TREE
▶ Multiplicity labels in the diagram: 1, 1..n, 1..3
▶ Data exchanged between facilities: CTF, AOD

9 O2 software design
▶ Message-based multi-processing
  ▶ Ease of development
  ▶ Easy to scale horizontally
  ▶ Possibility to extend with different hardware
  ▶ Multi-threading within processes possible
▶ ALFA: ALICE-FAIR concurrency framework
  ▶ Provides the data transport layer
  ▶ ZeroMQ
  ▶ Arbitrary payload
Diagram: software layers: libraries and tools, ALFA, FairRoot, and the experiment frameworks Cbm, Panda, ALICE O2, ...

10 Physics software design
Processing workflow
▶ Step 0 (all FLPs): raw data, local processing, e.g. clusterization, calibration
▶ Step 1 (EPN): detector reconstruction, e.g. TPC & ITS track finding
▶ Step 2: inter-detector matching procedures
▶ Step 3: final calibration, 2nd matching
▶ Step 4: final matching, PID, event extraction
▶ EPN processing is split into synchronous and asynchronous parts
▶ Data products: CTF, AOD

11 Technology survey (1)
Comparison of FPGA and CPU for the Fast Cluster Finder
Figure: performance of the FPGA-based FastClusterFinder algorithm for DDL1 and DDL2 compared to the software implementation on a recent server PC.
→ FPGA is the selected platform in this case

12 Technology survey (2)
Comparison of CPU and GPU for the HLT TPC CA Tracker
Figure: tracking time of the HLT TPC Cellular Automata tracker on a Nehalem CPU (6 cores) and an NVIDIA Fermi GPU.
→ GPU is the selected platform in this case

13 Demonstrators – TPC CA Tracker
→ Verified linear rise of the processing time of TPC track finding for data samples corresponding to a timeframe of 1 ms

14 Computing requirements for processing
▶ Total: ~100000 CPU cores and 5000 GPU chips
▶ Annotations on the requirements table:
  ▶ Goes together; merging and fitting can run on GPUs too
  ▶ Being ported to GPU, conversion factor unknown
  ▶ Theoretically could run on GPU

15 Data reduction – TPC
→ A data reduction factor of 20 for the TPC is feasible

16 Data reduction – Global
Table: data rates for input to the O2 system and output to permanent storage for routine data taking with Pb-Pb at 50 kHz interaction rate.

17 Data types characteristics
▶ TF size: duration of the time window (t_TF)
  ▶ Data lost at the edges: 0.1/t_TF (t_TF in ms)
  ▶ For calibration and reconstruction: 20 ms - 100 ms
  ▶ Shorter is better for buffering and distribution
  → 20 ms (1000 interactions in Pb-Pb at 50 kHz)
Data type     Size (GB)     Tape copy
TF (Pb-Pb)    10            No
CTF (Pb-Pb)   1.6           Yes
ESD           15% of CTF    No
AOD           10% of CTF    Yes
MC            100% of CTF   No
MCAOD         30% of ESD    Yes
HISTO         1% of ESD     No
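
The timeframe arithmetic on this slide can be checked with a short Python sketch. It assumes that "0.1/t_TF" is the fraction of data lost at the timeframe edges with t_TF expressed in milliseconds, and that the percentages in the table are sizes relative to the 1.6 GB CTF; both readings are assumptions, and the function and variable names are made up for the example.

```python
INTERACTION_RATE_HZ = 50_000  # Pb-Pb interaction rate quoted on the slide


def interactions_per_tf(t_tf_ms: float, rate_hz: float = INTERACTION_RATE_HZ) -> float:
    """Number of interactions contained in one timeframe of t_tf_ms milliseconds."""
    return rate_hz * t_tf_ms / 1000.0


def edge_loss_fraction(t_tf_ms: float) -> float:
    """Fraction of data lost at the edges, read as 0.1 / t_TF with t_TF in ms."""
    return 0.1 / t_tf_ms


# Derived data sizes, assuming the percentages refer to the 1.6 GB CTF.
CTF_GB = 1.6
sizes_gb = {
    "CTF": CTF_GB,
    "ESD": 0.15 * CTF_GB,
    "AOD": 0.10 * CTF_GB,
    "MC": 1.00 * CTF_GB,
}
sizes_gb["MCAOD"] = 0.30 * sizes_gb["ESD"]
sizes_gb["HISTO"] = 0.01 * sizes_gb["ESD"]

for t_tf_ms in (20, 100):
    print(t_tf_ms, interactions_per_tf(t_tf_ms), edge_loss_fraction(t_tf_ms))
```

Under these assumptions a 20 ms window holds 1000 interactions and loses 0.5% of the data at its edges, while a 100 ms window loses 0.1% at the cost of five times more buffering per timeframe.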

18 Data storage requirements
Table: number of reconstructed collisions and storage requirements for scenarios.
Table: number of simulated events and storage requirements.
~55 PB

19 O2 facility design (1)
Diagram: data path through the facility
▶ Detectors: 8100 read-out links, 1.2 TB/s
▶ 250 FLPs, 2 CRUs per FLP
▶ Switching network: input 250 ports, output 1500 ports, 500 GB/s
▶ 1500 EPNs, 2 GPUs per EPN
▶ Storage: 1500 x 60 MB/s = 90 GB/s
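
The figures on this slide can be cross-checked with a few lines of Python. This is a sketch of the arithmetic only; it assumes that the 500 GB/s label is the aggregate traffic from the FLPs into the switching network, which the transcript does not state explicitly. All variable names are made up for the example.

```python
N_LINKS = 8100               # read-out links
N_FLPS = 250                 # First Level Processors
N_EPNS = 1500                # Event Processing Nodes
EPN_OUTPUT_MB_S = 60         # per-EPN output to storage
FLP_AGGREGATE_GB_S = 500     # assumption: total FLP -> network traffic

# Aggregate rate to storage: 1500 x 60 MB/s.
storage_gb_s = N_EPNS * EPN_OUTPUT_MB_S / 1000

# Average load per node under the assumption above.
per_flp_gb_s = FLP_AGGREGATE_GB_S / N_FLPS
per_epn_input_mb_s = FLP_AGGREGATE_GB_S * 1000 / N_EPNS
links_per_flp = N_LINKS / N_FLPS

print(f"to storage: {storage_gb_s} GB/s")
print(f"per FLP output: {per_flp_gb_s} GB/s")
print(f"per EPN input: {per_epn_input_mb_s:.0f} MB/s")
print(f"links per FLP: {links_per_flp}")
```

The first result reproduces the 90 GB/s quoted for storage; the per-node averages are derived values, not numbers from the slide.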

20 O2 facility design (2)
Network layout 2: 4 independent EPN subfarms
▶ FLP 1 … FLP 256, each with 4 x 10 Gb/s links
▶ Network Sub-Farm 1: EPN 1 … 375
▶ Network Sub-Farm 2: EPN 376 … 750
▶ Network Sub-Farm 3: EPN 751 … 1125
▶ Network Sub-Farm 4: EPN 1126 … 1500
▶ EPN links: 10 Gb/s
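
The EPN numbering of layout 2 follows from splitting 1500 EPNs evenly over 4 subfarms. A minimal Python sketch of that mapping is shown here; the function names `subfarm_of` and `epn_range` are hypothetical, and the even split is read off the EPN ranges on the slide.

```python
N_EPNS = 1500
N_SUBFARMS = 4
EPNS_PER_SUBFARM = N_EPNS // N_SUBFARMS  # 375 EPNs in each subfarm


def subfarm_of(epn: int) -> int:
    """Return the subfarm number (1-based) that a given EPN (1-based) belongs to."""
    if not 1 <= epn <= N_EPNS:
        raise ValueError(f"EPN index out of range: {epn}")
    return (epn - 1) // EPNS_PER_SUBFARM + 1


def epn_range(subfarm: int) -> tuple:
    """Return the first and last EPN number of a subfarm (1-based, inclusive)."""
    if not 1 <= subfarm <= N_SUBFARMS:
        raise ValueError(f"subfarm index out of range: {subfarm}")
    first = (subfarm - 1) * EPNS_PER_SUBFARM + 1
    return first, first + EPNS_PER_SUBFARM - 1


for sf in range(1, N_SUBFARMS + 1):
    print(sf, epn_range(sf))
```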

21 O2 facility design (3)
Network layout 3: Super-EPNs
▶ FLPs in 10 groups of 25: FLP 1 … 25 up to FLP 226 … 250
▶ 50 Super-EPNs: SEPN 1 … 50
▶ 30 EPNs behind each SEPN: EPN 1 … 30 up to EPN 1471 … 1500
▶ Link labels: 40/56 Gb/s, 10 x 40/56 Gb/s, 2 x 40/56 Gb/s, 10 Gb/s

22 O2 facility design (4)
Simulation – Link speed
▶ Left: network layout 2. Link speed on the FLPs and EPNs for a network layout with 4 EPN subfarms, for 100 parallel transfers from the FLPs.
▶ Right: network layout 3. Link speed on the FLPs and Super-EPNs (configuration based on an InfiniBand network at 56 Gb/s).

23 O2 facility design (6)
Simulation – System scalability
Figure: latency of the timeframes for different interaction rates using layout 2 (left) and layout 3 (right)
→ Layout 2 is cheaper but scales up to 90 kHz only.

24 O2 facility – Power and cooling

25 Schedule
Timeline: 2015 to 2020
▶ Today: 6/15
Detectors milestones
▶ ITS half-layer test: 1/17
▶ TPC read-out test: 4/17
▶ ITS surface test: 9/18
▶ TPC RCUs installation CR1: 1/19
▶ Data taking, cosmics with core detectors: 7/19
▶ TPC pre-commissioning on surface: 7/19
▶ TPC commissioning in cavern: 1/20
▶ End of commissioning: 6/20
O2 milestones
▶ O2 system v1 (1 CRU, 1 FLP, basic data processing, control, logging, QC, monitoring): 1/17
▶ O2 system v2 (1 detector, e.g. ITS, full read-out capability): 4/18
▶ 10% data processing and storage HW installation: 9/18
▶ 90% data processing and storage HW installation: 11/19
▶ Full system ready: 2/20

26 Conclusion
▶ O2 is a new project with very ambitious requirements
  ▶ > 1 TB/s detector input, ~100 x more than today
  ▶ Online synchronous compression factor of 14
▶ Major paradigm change with combined offline and online computing
  ▶ 1 framework
  ▶ 1 facility
▶ Challenging schedule
▶ TDR submitted

27
▶ TDR draft available here: https://cds.cern.ch/record/2011297
▶ Thank you for your attention

28 Introduction (Chapter 1)
1. After LS2, the LHC will deliver minimum-bias Pb-Pb collisions at 50 kHz
   ▶ 100 x more data than today
2. Physics topics addressed by the ALICE upgrade
   ▶ Very small signal-to-noise ratio and large background
   ▶ Triggering techniques very inefficient if not impossible
   ▶ Needs large statistics
3. Running scenarios
   ▶ Goal: 13 nb⁻¹ for Pb-Pb collisions (minimum bias)
→ Too much data to be stored
→ Compress data intelligently by processing it online
Diagram: data flow from the detectors electronics to storage
▶ Detectors electronics: continuous and triggered streams of raw data
▶ Readout, data aggregation, local data processing: compressed sub-timeframes
▶ Data aggregation, synchronous global data processing: compressed timeframes
▶ Data storage and archival: compressed timeframes
▶ Asynchronous data processing, event extraction: reconstructed events

29 O2 software design (3)
Chapter 7 – Data Format

30 O2 software design (2)
Chapter 7 – Control, Configuration and Monitoring
▶ Facility control, configuration and monitoring
  ▶ CCM will combine control of data taking and of asynchronous data processing
  ▶ 140000 commands to 70000 processes (peak)
  ▶ 600 kHz monitoring data
Diagram: Control, Configuration and Monitoring linked to the LHC, Trigger, DCS, Grid, Grid jobs and O2 processes; arrows labelled commands, configuration data, status and monitoring data

31 O2 software design (4)
Chapter 7 – DCS
▶ Dedicated FLP for DCS
▶ An O2 process retrieves the conditions data and inserts them into DCS data frames
→ The required DCS data are embedded in the data
→ They are available for reconstruction and calibration after the frame building

32 Physics programme and data taking scenarios (Chapter 2)
ALICE running scenarios:
Year   System   √s_NN (TeV)   L_int        N collisions
2020   pp       14            0.4 pb⁻¹     2.7 · 10^10
       Pb-Pb    5.5           2.85 nb⁻¹    2.3 · 10^10
2021   pp       14            0.4 pb⁻¹     2.7 · 10^10
       Pb-Pb    5.5           2.85 nb⁻¹    2.3 · 10^10
2022   pp       14            0.4 pb⁻¹     2.7 · 10^10
       pp       5.5           6 pb⁻¹       4 · 10^11
2025   pp       14            0.4 pb⁻¹     2.7 · 10^10
       Pb-Pb    5.5           2.85 nb⁻¹    2.3 · 10^10
2026   pp       14            0.4 pb⁻¹     2.7 · 10^10
       Pb-Pb    5.5           1.4 nb⁻¹     1.1 · 10^10
       p-Pb     8.8           50 nb⁻¹      10^11
2027   pp       14            0.4 pb⁻¹     2.7 · 10^10
       Pb-Pb    5.5           2.85 nb⁻¹    2.3 · 10^10
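
As a consistency check, the Pb-Pb entries of the running scenarios can be summed and compared with the 13 nb⁻¹ goal quoted earlier in the talk. The Python sketch below assumes that the Pb-Pb integrated luminosities are given in nb⁻¹ and that the per-year values are 2.85, 2.85, 2.85, 1.4 and 2.85; this is a reading of the flattened table, not something the transcript states unambiguously. The implied cross-section is a derived value, not a number from the slides.

```python
# Pb-Pb rows of the running scenarios (assumed units: nb^-1 and collisions).
PB_PB_LINT_NB = {2020: 2.85, 2021: 2.85, 2025: 2.85, 2026: 1.4, 2027: 2.85}
PB_PB_COLLISIONS = {2020: 2.3e10, 2021: 2.3e10, 2025: 2.3e10, 2026: 1.1e10, 2027: 2.3e10}

GOAL_NB = 13.0  # Pb-Pb goal quoted in the rationales

total_lint_nb = sum(PB_PB_LINT_NB.values())
total_collisions = sum(PB_PB_COLLISIONS.values())

# N = sigma * L, and 1 nb^-1 = 1e9 b^-1, so sigma [b] = N / (L [nb^-1] * 1e9).
implied_sigma_b = PB_PB_COLLISIONS[2020] / (PB_PB_LINT_NB[2020] * 1e9)

print(f"total Pb-Pb luminosity: {total_lint_nb:.2f} nb^-1 (goal {GOAL_NB})")
print(f"total Pb-Pb collisions: {total_collisions:.3g}")
print(f"implied cross-section: {implied_sigma_b:.2f} b")
```

With this reading the yearly values add up to 12.8 nb⁻¹, close to the quoted goal, and the collision counts correspond to a cross-section of roughly 8 b.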

33 Requirements (1)
Input rates
Detector   Max read-out rate (kHz)   Data rate for Pb-Pb at 50 kHz (GB/s)   Average data size per interaction (MB)
ACO        100                       0.014                                  0.00028
CTP        200                       0.02                                   0.0004
EMC        50                        4                                      0.08
FIT        50                        0.115                                  0.023
HMP        2.5                       0.06                                   0.024
ITS        100                       40 (3.6%)                              0.8
MCH        100                       2.2                                    0.04
MFT        100                       10 (0.9%)                              0.2
MID        100                       0.3                                    0.006
PHS        50                        2                                      0.04
TOF        200                       2.5                                    0.05
TPC        50                        1012 (92.5%)                           20.7
TRD        90.9                      20 (1.8%)                              0.5
ZDC        100                       0.06                                   0.0012
Total                                1093                                   22.4
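
The total input rate can be recomputed from the per-detector values. The Python sketch below takes the data rates as read from the table (the split of the run-together digits in the transcript is an assumption) and sums them; the variable names are made up for the example.

```python
# Data rate for Pb-Pb collisions at 50 kHz, in GB/s, per detector.
RATES_GB_S = {
    "ACO": 0.014, "CTP": 0.02, "EMC": 4, "FIT": 0.115, "HMP": 0.06,
    "ITS": 40, "MCH": 2.2, "MFT": 10, "MID": 0.3, "PHS": 2,
    "TOF": 2.5, "TPC": 1012, "TRD": 20, "ZDC": 0.06,
}

MAIN_DETECTORS = ("TPC", "ITS", "TRD", "MFT")

total_gb_s = sum(RATES_GB_S.values())
other_gb_s = sum(rate for det, rate in RATES_GB_S.items() if det not in MAIN_DETECTORS)
tpc_share = RATES_GB_S["TPC"] / total_gb_s

print(f"total: {total_gb_s:.3f} GB/s")
print(f"other detectors: {other_gb_s:.3f} GB/s")
print(f"TPC share: {100 * tpc_share:.1f}%")
```

The sum agrees with the 1093 GB/s total, and the detectors other than TPC, ITS, TRD and MFT add up to the 11.269 GB/s shown as "Other detectors" on the Readout slide.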

34 Requirements (2)
Read-out
Detector   Number of links   Number of read-out boards
ACO        1                 1
EMC        20                4
FIT        2                 1
HMP        14                3
ITS        495               23
MCH        480               20
MFT        -                 -
MID        2                 2
PHS        16                3
TOF        72                3
TPC        5904              360
TRD        1044              54
ZDC        1                 1
CTP        2                 1
Total links by type: DDL1 15, DDL2 40, GBT 7998
Total boards by type: CRORC 13, CRU 463
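
The per-detector counts can be checked against the totals by type. The Python sketch below assumes a particular split of the run-together digits in the transcript (for example TPC as 5904 links and 360 boards, MID as 2 and 2); the link or board type of each detector is not recoverable from the transcript, so only the grand totals are compared.

```python
# Assumed reading of the per-detector rows: (number of links, number of boards).
READOUT = {
    "ACO": (1, 1), "EMC": (20, 4), "FIT": (2, 1), "HMP": (14, 3),
    "ITS": (495, 23), "MCH": (480, 20), "MID": (2, 2), "PHS": (16, 3),
    "TOF": (72, 3), "TPC": (5904, 360), "TRD": (1044, 54), "ZDC": (1, 1),
    "CTP": (2, 1),
}

# Totals by type as given in the last row of the table.
TOTAL_LINKS_BY_TYPE = {"DDL1": 15, "DDL2": 40, "GBT": 7998}
TOTAL_BOARDS_BY_TYPE = {"CRORC": 13, "CRU": 463}

links = sum(n_links for n_links, _ in READOUT.values())
boards = sum(n_boards for _, n_boards in READOUT.values())

print(links, sum(TOTAL_LINKS_BY_TYPE.values()))
print(boards, sum(TOTAL_BOARDS_BY_TYPE.values()))
```

Both sums match the totals, which supports this reading of the rows.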

35 Project organisation (1) (Chapter 11)
▶ Total: 112 FTE for the period 2015-19
▶ Compatible with the 120 FTEs from institutes
Task                                             Human resources (FTE)   Institutes
Architecture                                     2                       CERN, FIAS, GSI, IRI
Tools, procedure and software process            2                       CERN, IPNO, JU, LIPI, WRCP
Data flow, detector read-out                     12                      CALTECH, CERN, FESB, FIAS, IRI, LIPI, WRCP
Computing platforms                              12                      CERN, FIAS, IRI, JU, KISTI, KMUTT, KU, ORNL
Software framework and data model                14                      CERN, IPNO, GSI, LBNL
Calibration                                      16                      JU, WSU
Reconstruction                                   16                      CERN, FESB, GSI, IPHC, LIPI, LPC, SUBATECH, UH, WSU
Physics simulation                               14                      CERN, CU, IPHC, IPNO, LBNL, ORNL, UH, UTK
Data Quality monitoring and visualization        6                       CERN, ISS, JU, WUT
Control, configuration, monitoring and logging   10                      ASCR, CALTECH, CERN, CU, KMUTT, IRI
O2 facility hardware procurement, installation   8                       CERN, FIAS, IRI, GSI
O2 facility and grid/cloud operations            M&O                     CERN, KISTI
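
The quoted total of 112 FTE can be verified by adding the per-task values. A short Python check is given below; the task covered by M&O has no FTE figure and is left out, and the dictionary name is made up for the example.

```python
# Human resources per task, in FTE, for the period 2015-19.
FTE_PER_TASK = {
    "Architecture": 2,
    "Tools, procedure and software process": 2,
    "Data flow, detector read-out": 12,
    "Computing platforms": 12,
    "Software framework and data model": 14,
    "Calibration": 16,
    "Reconstruction": 16,
    "Physics simulation": 14,
    "Data Quality monitoring and visualization": 6,
    "Control, configuration, monitoring and logging": 10,
    "O2 facility hardware procurement, installation": 8,
}

AVAILABLE_FTE = 120  # FTEs from institutes, as quoted on the slide

total_fte = sum(FTE_PER_TASK.values())
print(f"required: {total_fte} FTE, available: {AVAILABLE_FTE} FTE")
```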

36 O2 facility design (7) (Chapter 10)
▶ Demonstrators, e.g.
  ▶ Existing HLT TPC algorithms interfaced to the new ALFA framework
  ▶ HLT development cluster infrastructure with ~40 nodes, 30 nodes with GPU hardware
  ▶ FLP and EPN data distribution and transport devices
▶ Verified in the prototype
  ▶ TPC reconstruction topology using 2011 Pb-Pb data
  ▶ FLP-EPN data transport network with 36 FLPs and 28 EPNs
  ▶ Reproduced the performance of HLT TPC processing in ALFA
  ▶ Verified linear rise of the processing time of TPC track finding for data samples corresponding to a timeframe of 1 ms
▶ Ongoing work

37 Project organisation (3) (Chapter 11)
Milestones relative to the framework and the facility at P2
▶ A, Q1 2017: Version 1, 1 CRU + QC (e.g. ITS half-layer test)
▶ B, Q2 2018: Version 2, 1 detector full read-out (e.g. ITS or TPC surface test)
▶ C, Q4 2019: P2 installation and commissioning, all FLPs, 10% EPNs
▶ Q2 2020: Production, full deployment

38 TDR Schedule
Timeline: February to June 2015
▶ 5/2/2015 - 18/2/2015: Comments on the TDR by the O2 project members
▶ 19/2/2015 - 1/3/2015: TDR editing
▶ 23/2/2015 - 27/2/2015: Proof-reading (Frank)
▶ 2/3/2015 - 15/3/2015: Comments on the TDR by the whole ALICE Collaboration
▶ 17/3/2015: ALICE internal review
▶ 18/3/2015 - 5/4/2015: Modification by the authors
▶ 6/4/2015 - 19/4/2015: Final editing of the TDR before submission
▶ 20/4/2015: Submission of the TDR to the LHCC
▶ 20/4/2015 - 31/5/2015: LHCC review
▶ 1/6/2015 - 4/6/2015: LHC Committee
▶ 2/6/2015: Presentation of the TDR to the LHCC
Prof. Borut Paul Kersevan, ATLAS (former computing coordinator)
Tonko Ljubicic, BNL, STAR (Online project leader)
Niko Neufeld, CERN, LHCb (Online)

39 O2 facility design (5)
Simulation – Bisection data traffic
Figure: bisection data traffic in the system for one of the 4 EPN subfarms of layout 2 (left) and for the whole of layout 3 (right)
