2nd ASTERICS-OBELICS Workshop
16-19 October 2017, Barcelona, Spain
ASTERICS & KM3NeT
H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).
10/10/2017
CORELib: A COsmic Ray Event LIBrary for Open Access (D-ANA)
Bernardino Spisso, INFN
Introduction
Cosmic rays are a common background source for experiments in astroparticle physics and neutrino astronomy. The computing power required to simulate air showers depends heavily on the energy window of interest, the simulated processes, the minimum energy of the products and the inclination of the primaries. CORELib is a cosmic ray event library that is meant to be openly accessible and to satisfy a broad range of needs. Although models are always changing and improving, there is a need for a reference dataset that is also suitable for developing and comparing the performance of reconstruction and classification algorithms. The status of production is reviewed and the challenges in data sharing are discussed.
Computing centres and pools provide resources for KM3NeT
Tier | Computing facility | Main task | Access
Tier-0 | at detector site | online processing | direct access, direct processing
Tier-1 | CC-IN2P3 | general offline processing and central data storage | direct access, batch processing and grid access
Tier-1 | CNAF | | grid access
Tier-1 | ReCaS | general offline processing, interim data storage |
Tier-1 | HellasGrid | reconstruction of data |
Tier-1 | HOU computing cluster | simulation processing | batch processing
Tier-2 | local computing clusters | simulation and analysis | varying
KM3NeT on the GRID
VO Central Services
Service | Site
Authentication/authorization system (VOMS) | RECAS-NAPOLI
User Interface | RECAS-NAPOLI, HellasGrid-Okeanos, CNAF, Frascati
Logical File Catalog |
Job submission and management system (WMS) | HellasGrid-Afroditi
KM3NeT is starting on the GRID
Main task: CORELib (COsmic Ray Event Library)
CORELib: COsmic Ray Event Library
• Background to many experiments
• Also a tuning benchmark
• Potentially useful to other communities
• Currently using CORSIKA as generator
Status of production
Proton-induced showers (1st delivery production):
• HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
• LE model: GHEISHA
• about 21M events per HE model
• 7 energy bins (2×10^2 GeV to 10^3 GeV, plus bins equally spaced in logarithm from 1 TeV to 10^9 GeV)
• power-law spectrum with spectral index -2
• zenith angle from 0 to 89 degrees
Nuclei-induced showers:
• HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
• 7 energy bins (A×2×10^2 GeV to A×10^3 GeV, plus bins equally spaced in logarithm from A×1 TeV to A×10^9 GeV)
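The binning described above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the function names are hypothetical, A is assumed to be the mass number of the primary nucleus, and the inverse-transform sampler assumes that energies follow dN/dE ∝ E^index inside each bin.

```python
import random


def energy_bins(mass_number=1):
    """Return the 7 (low, high) bin edges in GeV, scaled by the mass number A.

    Assumption: one bin from 2x10^2 to 10^3 GeV, then one bin per decade
    from 10^3 GeV (1 TeV) up to 10^9 GeV.
    """
    edges = [2e2] + [10.0 ** k for k in range(3, 10)]
    return [(mass_number * lo, mass_number * hi) for lo, hi in zip(edges, edges[1:])]


def sample_power_law(e_min, e_max, rng, index=-2.0):
    """Draw one energy from dN/dE ~ E^index on [e_min, e_max] (index != -1).

    Uses inverse-transform sampling of the cumulative distribution.
    """
    g = index + 1.0
    u = rng.random()
    return (e_min ** g + u * (e_max ** g - e_min ** g)) ** (1.0 / g)


if __name__ == "__main__":
    rng = random.Random(42)
    for lo, hi in energy_bins():
        print(f"{lo:.0e} - {hi:.0e} GeV, example energy: {sample_power_law(lo, hi, rng):.3e}")
```

Calling energy_bins(4) gives the corresponding bins for a helium primary under the same assumption about A.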
Status of production
Production done with and without Cherenkov radiation
Energy range (GeV) | Number of events
200-1000 | 10^7
10^3-10^4 | 10^7
10^4-10^5 | 10^6
10^5-10^6 | 10^5
10^6-10^7 | 10^4
10^7-10^8 | 10^3
10^8-10^9 | 10^2
High energy model | Low energy model | Option TAULEP | Option CHARM
QGSJET01 | GHEISHA | X | X
QGSJETII-04 | GHEISHA | X |
EPOS LHC | GHEISHA | X |
About 21M events per HE model (~1% of the total production foreseen by KM3NeT)
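As a cross-check of the quoted total, the per-bin counts can simply be summed. The sketch below assumes that the 10^3-10^4 GeV bin shares the 10^7 count of the first bin, which is the reading that reproduces the "about 21M" figure; the names are illustrative only.

```python
# Per-bin event counts of the first production: (low edge, high edge, events).
# Energies are in GeV. Assumption: the 10^3-10^4 GeV bin holds 10^7 events,
# like the 200-1000 GeV bin.
BINS = [
    (2e2, 1e3, 10**7),
    (1e3, 1e4, 10**7),
    (1e4, 1e5, 10**6),
    (1e5, 1e6, 10**5),
    (1e6, 1e7, 10**4),
    (1e7, 1e8, 10**3),
    (1e8, 1e9, 10**2),
]


def total_events(bins):
    """Total number of events over all energy bins."""
    return sum(n for _, _, n in bins)


def events_above(bins, e_min):
    """Number of events in bins whose low edge is at or above e_min (GeV)."""
    return sum(n for lo, _, n in bins if lo >= e_min)


if __name__ == "__main__":
    print("total events per HE model:", total_events(BINS))
    print("events above 10^7 GeV:", events_above(BINS, 1e7))
```

Under that assumption the total is 21,111,100 events per HE model, of which only 1,100 lie above 10^7 GeV.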
CORSIKA (COsmic Ray SImulation for KAscade; Dieter Heck, Tanguy Pierog, Johannes Knapp et al.) is a program for detailed simulation of extensive air showers initiated by high-energy cosmic ray particles. Protons, light nuclei up to iron, photons, and many other particles may be treated as primaries.
CORSIKA produces two main output types:
• Control output (text files)
• Particle list (binary files)
The DAT file contains the information on the secondary particles of the shower.
The CER file contains the photons produced by the Cherenkov effect.
Each event represents a different simulated shower.
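To give an idea of how the binary particle list can be inspected, the following sketch counts the events in a DAT file by looking for event-header sub-blocks. It rests on several assumptions that should be checked against the CORSIKA User's Guide for the actual run: non-thinned output (273 four-byte words per sub-block, 21 sub-blocks per record), Fortran sequential records framed by 4-byte little-endian length markers, and header sub-blocks starting with the ASCII tag "EVTH". The function name is hypothetical.

```python
import struct

# Assumed layout of a standard (non-thinned) CORSIKA particle file.
WORD_BYTES = 4
SUBBLOCK_WORDS = 273
SUBBLOCKS_PER_RECORD = 21
SUBBLOCK_BYTES = SUBBLOCK_WORDS * WORD_BYTES
RECORD_BYTES = SUBBLOCK_BYTES * SUBBLOCKS_PER_RECORD  # 22932 bytes


def count_events(path):
    """Count the EVTH (event header) sub-blocks in a CORSIKA DAT file."""
    n_events = 0
    with open(path, "rb") as f:
        while True:
            marker = f.read(4)
            if len(marker) < 4:
                break  # clean end of file
            (length,) = struct.unpack("<i", marker)
            if length != RECORD_BYTES:
                raise ValueError(f"unexpected record length {length}")
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError("truncated record")
            f.read(4)  # skip the trailing Fortran record marker
            for i in range(SUBBLOCKS_PER_RECORD):
                start = i * SUBBLOCK_BYTES
                if payload[start:start + 4] == b"EVTH":
                    n_events += 1
    return n_events
```

A file written with the thinning option, or on a platform with different record markers, would need different constants.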
CORSIKA is a program based on the Monte Carlo approach to study the evolution and the features of particle showers in the atmosphere.
• Initially developed to run simulations for the KASCADE experiment located in Karlsruhe, Germany
• Now CORSIKA can simulate a particle shower for different atmosphere parameterizations and observation levels
In CORELib we chose sea level as the observation level and the standard European atmosphere. This choice makes the library suitable for use by other communities.
Note: KM3NeT could ignore Cherenkov photons and near-horizontal muons, but we chose to include them as a service to the community.
[Figure: plots of the output showers for different primary particles at an energy of 10^4 GeV. Panels: photon, proton, He.]
Time estimation
The following results were obtained by analysing the CORSIKA output files. The computation time is the difference between the last and the first “PRESENT TIME” UTC date in the CORSIKA standard output; it therefore has an uncertainty of 2 seconds plus a (negligible?) systematic shift due to the load time of the program and libraries.
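The procedure described above could be automated roughly as follows. This is only a sketch: the exact layout of the "PRESENT TIME" lines is an assumption here (day.month.year followed by hour:minute:second), so the regular expression may need adjusting to the real standard output, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout of a timestamp line, e.g.
#   PRESENT TIME : 10.10.2017  12:00:00 UTC
TIME_PATTERN = re.compile(
    r"PRESENT TIME\s*:\s*(\d{2}\.\d{2}\.\d{4})\s+(\d{2}:\d{2}:\d{2})"
)


def run_duration_seconds(log_text):
    """Seconds between the first and the last PRESENT TIME stamp in a log."""
    stamps = [
        datetime.strptime(f"{day} {clock}", "%d.%m.%Y %H:%M:%S")
        for day, clock in TIME_PATTERN.findall(log_text)
    ]
    if len(stamps) < 2:
        raise ValueError("need at least two PRESENT TIME lines")
    return (stamps[-1] - stamps[0]).total_seconds()


if __name__ == "__main__":
    example = (
        " PRESENT TIME : 10.10.2017  12:00:00 UTC\n"
        " ... shower output ...\n"
        " PRESENT TIME : 10.10.2017  13:30:05 UTC\n"
    )
    print(run_duration_seconds(example), "s")
```

Each stamp has a resolution of one second, which is where the 2-second uncertainty on the difference comes from.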
Size estimation
The event size has been calculated by dividing the total size of the file by the number of events. Both curves are linear in energy (exponential in the logarithm of the energy).
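The size estimate amounts to a single division. A minimal sketch, assuming the number of events in the file is already known (for example from the run summary); the function name is hypothetical:

```python
import os


def mean_event_size(path, n_events):
    """Average size in bytes of one event: total file size / number of events."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return os.path.getsize(path) / n_events
```

The result is an average that also includes the run-level header and trailer blocks of the file, so it slightly overestimates the size of a single shower.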
Ongoing production: motivations
Physics biasing: replaces the natural distribution of some process with “fake” PDFs that limit events to what is useful for the simulation.
Primary particle biasing (variance reduction): increases the number of primary particles generated in a particular phase-space region of interest; the PDF of the primary particle is modified accordingly.
Use case: increase the number of high-energy particles in the cosmic ray spectrum.
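Primary particle biasing is a form of importance sampling: events are drawn from a convenient distribution and each one carries a weight equal to the ratio of the target PDF to the sampling PDF. The sketch below illustrates the idea for a single energy bin, assuming a flat sampling distribution in energy and a target power law dN/dE ∝ E^-gamma with gamma = 2. The function names and the choice of bin are illustrative, not taken from the CORELib production scripts.

```python
import random


def sample_flat(e_min, e_max, rng):
    """Draw an energy uniformly in [e_min, e_max] (the biased distribution)."""
    return rng.uniform(e_min, e_max)


def weight_to_power_law(energy, e_min, e_max, gamma=2.0):
    """Weight that maps a flat-sampled event onto a E^-gamma spectrum.

    weight = target_pdf(E) / sampling_pdf(E), both normalised on the bin.
    Valid for gamma != 1.
    """
    norm = (gamma - 1.0) / (e_min ** (1.0 - gamma) - e_max ** (1.0 - gamma))
    target_pdf = norm * energy ** (-gamma)
    sampling_pdf = 1.0 / (e_max - e_min)
    return target_pdf / sampling_pdf


if __name__ == "__main__":
    rng = random.Random(0)
    lo, hi = 1e3, 1e4  # GeV, one decade bin
    energies = [sample_flat(lo, hi, rng) for _ in range(100000)]
    weights = [weight_to_power_law(e, lo, hi) for e in energies]
    # The weights average to about 1 when both PDFs are normalised.
    print(sum(weights) / len(weights))
```

With such weights, a flat production can be reweighted to any steeper spectrum while keeping many more simulated events at the high-energy end of the bin.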
Ongoing production
Proton-induced showers:
• New production ongoing using a flat spectrum (estimated 32× increase in output size!)
• HE models: QGSJET01 with CHARM, QGSJET01 with TAULEP, QGSJET-II with TAULEP, EPOS-LHC with TAULEP
• LE model: GHEISHA
• about 15M events per HE model
• 7 energy bins (2×10^2 GeV to 10^3 GeV, plus bins equally spaced in logarithm from 1 TeV to 10^9 GeV)
• flat power-law spectrum with spectral index 0
• zenith angle from 0 to 89 degrees
• 3,000,000 high-energy events (10^7-10^9 GeV) vs. 1,100 in the previous productions
• Estimated about 10 days of computation time on 1064 cores for each HE model
Ongoing production
Events are distributed uniformly among the energy bins.
Energy range (GeV) | Number of events
200-1000 | 15×10^5
10^3-10^4 | 15×10^5
10^4-10^5 | 15×10^5
10^5-10^6 | 15×10^5
10^6-10^7 | 15×10^5
10^7-10^8 | 15×10^5
10^8-10^9 | 15×10^5
High energy model | Low energy model | Option TAULEP | Option CHARM
QGSJET01 | GHEISHA | X | X
QGSJETII-04 | GHEISHA | X |
EPOS LHC | GHEISHA | X |
About 10M events per HE model.
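The totals implied by a flat distribution can be worked out directly. The short calculation below assumes that every one of the 7 bins holds the same 15×10^5 events, and compares the two highest bins with the 10^3 + 10^2 events that the earlier production had in the same energy range; the variable names are illustrative.

```python
N_BINS = 7
EVENTS_PER_BIN = 15 * 10**5  # assumption: same count in every energy bin

# Total size of the flat production per HE model.
total_flat = N_BINS * EVENTS_PER_BIN

# Events between 10^7 and 10^9 GeV, i.e. in the two highest bins.
high_energy_flat = 2 * EVENTS_PER_BIN

# Same energy range in the earlier production (10^3 + 10^2 events).
high_energy_previous = 10**3 + 10**2

gain = high_energy_flat / high_energy_previous

if __name__ == "__main__":
    print(total_flat, high_energy_flat, high_energy_previous, round(gain))
```

Under this assumption the flat production holds 10.5 million events per HE model and increases the statistics above 10^7 GeV by a factor of roughly 2,700.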
Data storage and sharing
The estimated total data volume at the end of the ongoing production is around 10 TB.
Where and how should the data be stored and shared?
• Dedicated hardware?
• Cloud?
• EUDAT? (However, the standard account is intended for documents and small-scale data: 20 GB per record.)
At present the first productions (about 600 GB) are stored and available via SFTP on a local server hosted at the University of Salerno.
Temporary repository
CORELib can be downloaded via SFTP at corelib@193.205.188.227 (pwd = Asterics2020).
The CORSIKA output binary files are supplied in a compressed tar file, together with a text file containing the full standard output of the corresponding run.
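Once an archive has been downloaded, its contents can be inspected before unpacking with the Python standard library. This is a generic sketch: the archive name is a placeholder, since the actual file names in the repository are not given here, and the function name is hypothetical.

```python
import tarfile


def list_archive(path):
    """Return (name, size in bytes) for every regular file in a tar archive.

    The mode "r:*" lets tarfile detect the compression (gzip, bzip2, xz)
    automatically.
    """
    with tarfile.open(path, "r:*") as tar:
        return [(m.name, m.size) for m in tar.getmembers() if m.isfile()]


if __name__ == "__main__":
    import sys

    # Usage: python list_archive.py <downloaded-archive>
    for name, size in list_archive(sys.argv[1]):
        print(f"{size:>14d}  {name}")
```

Listing first is a cheap way to check that both the binary output and the accompanying standard-output text file are present before extracting several gigabytes.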
The first two productions are contained in the directory “Standard”.
There are two variants of the Cherenkov production, which share the same features and differ only in the total number of files:
• The “Cherenkov” directory contains the production split into 1160 files
• The “Cherenkov-201” directory contains the production split into 201 files
All the Cherenkov production files are contained in 4 subdirectories named after the HE models.
The details of each file of the first two productions are reported in the Standard.xlsx file, an Excel 2016 spreadsheet.
For the Cherenkov runs, besides the Cherenkov.xlsx and Cherenkov-201.xlsx spreadsheet files, there are two SQLite files named Cherenkov.db and Cherenkov-201.db, which can be queried using SQL (e.g. with the sqlite3 program). These databases contain summary information about each run.
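The same databases can also be opened from Python with the standard sqlite3 module. Because the table and column names are not listed here, the sketch below makes no assumption about the schema and simply reports what the file contains; the function name is hypothetical.

```python
import sqlite3


def describe_database(path):
    """Return {table name: [column names]} for an SQLite file."""
    con = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the name.
        return {
            table: [col[1] for col in con.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        con.close()


if __name__ == "__main__":
    import sys

    # Usage: python describe_db.py Cherenkov.db
    for table, columns in describe_database(sys.argv[1]).items():
        print(table, columns)
```

Once the schema is known, ordinary SELECT statements can be run on the same connection to pick runs by model, energy bin or any other summary quantity the tables provide.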
Depending on the user's needs, the CORSIKA output can be used directly as a binary file or translated into a human-readable ASCII format. Various “translators” are available (e.g. KM3NeT/ANTARES use CORANT). The very general converter named “corsikaread” is supplied with CORSIKA in the src/utils/ directory.
Another way to handle the output files is to use the free C++ library COAST (https://web.ikp.kit.edu/rulrich/coast.html), which provides tools to convert the standard binary CORSIKA output into a ROOT file.
• COAST allows the particle data to be arranged in a ROOT TTree
• It provides some basic graphical tools such as ROOT plots or histograms
Conclusions
• CORELib is flexible: several models are used to provide simulations
• CORELib is “plug-and-play”: common data formats, immediate usage
• CORELib is open-access: SFTP with common user/pwd
• CORELib is extensible: we provide the full set of parameters, so other collaborations or institutions that want to add datasets can do so with or without overlap
• CORELib is in the spirit of ASTERICS: a tool needed by KM3NeT, whose features have been extended to prove useful to many people in the community (e.g., Cherenkov radiation and high inclination would not be needed by KM3NeT)
Acknowledgements
H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).
Thank you for your attention.