EGI-InSPIRE RI
Data Management Highlights in TSA3.3: Services for HEP
Fernando Barreiro Megino, Domenico Giordano, Maria Girone, Elisa Lanciotti, Daniele Spiga, on behalf of CERN-IT-ES-VOS and SA3
EGI Technical Forum – Data management highlights
Outline
– Introduction: WLCG today
– LHCb Accounting
– Storage Element and File Catalogue consistency
– ATLAS Distributed Data Management: breaking cloud boundaries
– CMS Popularity and Automatic Site Cleaning
– Conclusions
WLCG today
– 4 experiments: ALICE, ATLAS, CMS, LHCb
– Over 140 sites
– ~150k CPU cores
– >50 PB of disk
– A few thousand users
– O(1M) file transfers/day
– O(1M) jobs/day
LHCb Accounting
An agent generates a daily accounting report based on the information available in the book-keeping system, broken down by metadata: location, data type, event type and file type. The information is displayed in a dynamic web page.
These reports are currently the main input for clean-up campaigns.
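The agent's aggregation step can be sketched as below. The record layout, field names and values are invented for illustration; the real agent runs against the LHCb book-keeping system:

```python
from collections import defaultdict

def accounting_report(files):
    """Aggregate book-keeping records into an accounting report.

    Each record carries the metadata fields from the slide (location,
    data type, event type, file type) plus a size in bytes.  Returns
    file count and total size per metadata combination.
    """
    report = defaultdict(lambda: {"files": 0, "bytes": 0})
    for f in files:
        key = (f["location"], f["data_type"], f["event_type"], f["file_type"])
        report[key]["files"] += 1
        report[key]["bytes"] += f["size"]
    return dict(report)

# Invented example records, for illustration only:
records = [
    {"location": "CERN", "data_type": "DST", "event_type": "minbias",
     "file_type": "ROOT", "size": 2_000_000_000},
    {"location": "CERN", "data_type": "DST", "event_type": "minbias",
     "file_type": "ROOT", "size": 1_500_000_000},
    {"location": "CNAF", "data_type": "RAW", "event_type": "minbias",
     "file_type": "RAW", "size": 3_000_000_000},
]
report = accounting_report(records)
```

A web frontend would then render one row per metadata combination, which is exactly the breakdown a clean-up campaign needs.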
Storage Element and File Catalogue consistency
Grid Storage Elements (SEs) are decoupled from the File Catalogue (FC), so inconsistencies can arise:
1. Dark data: data in the SEs but not in the FC. Wastes disk space.
2. Lost/corrupted files: data in the FC but not in the SEs. Causes operational problems, e.g. failing jobs.
Dark data is identified through consistency checks using full storage dumps. This requires one common format and procedure covering the various SEs (DPM, dCache, StoRM and CASTOR) and the three experiments (ATLAS, CMS and LHCb).
Decision: text and XML formats. Required information: space token, LFN (or PFN), file size, creation time and checksum.
The storage dump should be provided on a weekly/monthly basis or on demand.
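At its core, the consistency check is a set comparison between a storage dump and the catalogue. A minimal sketch, assuming both sides are reduced to a mapping of LFN to file size (the real procedure also compares checksums and creation times):

```python
def consistency_check(storage_dump, catalogue):
    """Compare a full SE storage dump against the file catalogue.

    Both inputs map LFN -> file size.  Dark data: on the SE but
    unknown to the catalogue.  Lost files: in the catalogue with no
    replica on the SE.  Size mismatches hint at corruption (comparing
    checksums, when available, is the stronger test).
    """
    se, fc = set(storage_dump), set(catalogue)
    dark = se - fc
    lost = fc - se
    mismatched = {lfn for lfn in se & fc
                  if storage_dump[lfn] != catalogue[lfn]}
    return dark, lost, mismatched

# Invented example inputs:
dump = {"/lhcb/a.dst": 100, "/lhcb/b.dst": 200, "/lhcb/c.dst": 300}
catalogue = {"/lhcb/b.dst": 200, "/lhcb/c.dst": 999, "/lhcb/d.dst": 400}
dark, lost, mismatched = consistency_check(dump, catalogue)
```

Dark data can then be scheduled for deletion or re-registration, while lost files trigger replica recovery.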
Example of good synchronization: LHCb storage usage at CNAF
CNAF provides storage dumps daily; the checks are done centrally with the LHCb Data Management tools.
Preliminary results show good SE-LFC agreement.
Small discrepancies (O(1 TB)) are not a real problem: they can be due to the delay between uploading to the SE and registering in the LFC, and to the delay in refreshing the information in the LHCb database.
Original data distribution model
Hierarchical tier organization based on the MONARC network topology, developed over a decade ago. Sites are grouped into clouds for organizational reasons.
Possible communications:
– T0-T1 and T1-T1, over the Optical Private Network
– Intra-cloud T1-T2, over national networks
Restricted communications (over the general public network):
– Inter-cloud T1-T2
– Inter-cloud T2-T2
But the network capabilities are not the same anymore, and many use cases require breaking these boundaries!
Machinery in place
Purpose: generate full-mesh transfer statistics for monitoring, for site commissioning, and to feed back into the system.
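The kind of full-mesh statistic involved can be sketched as a per-link aggregation over individual transfer records (the record layout and site names here are invented; the real machinery consumes FTS transfer monitoring data):

```python
from collections import defaultdict

def link_statistics(transfers):
    """Aggregate per-link statistics from individual transfer records.

    Each record: (source, destination, ok, bytes, seconds).  Returns,
    for every directed link in the mesh, the transfer success rate
    and the average throughput of successful transfers in MB/s.
    """
    stats = defaultdict(lambda: {"ok": 0, "total": 0,
                                 "bytes": 0.0, "seconds": 0.0})
    for src, dst, ok, nbytes, seconds in transfers:
        link = stats[(src, dst)]
        link["total"] += 1
        if ok:
            link["ok"] += 1
            link["bytes"] += nbytes
            link["seconds"] += seconds
    return {
        link: {
            "success_rate": s["ok"] / s["total"],
            "mb_per_s": (s["bytes"] / 1e6 / s["seconds"]) if s["seconds"] else 0.0,
        }
        for link, s in stats.items()
    }

# Invented sample records (source, destination, success, bytes, seconds):
transfers = [
    ("T1_RAL", "T2_UK_X", True, 1e9, 100.0),
    ("T1_RAL", "T2_UK_X", False, 0, 0.0),
    ("T1_CNAF", "T2_IT_Y", True, 5e8, 25.0),
]
mesh = link_statistics(transfers)
```

Tracking both success rate and throughput per direction is what makes issues like asymmetric links visible.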
Consequences
Link commissioning:
– Sites optimizing network connections, e.g. the UK experience
– Revealed various network issues, e.g. asymmetric network throughput at several sites (also affecting other experiments)
Definition of T2Ds, "directly connected T2s": commissioned sites with good network connectivity, which benefit from closer transfer policies.
Gradual flattening of the ATLAS Computing Model in order to reduce limitations on:
– Dynamic data placement
– Output collection of multi-cloud analysis
Current development of a generic, detailed FTS monitor:
– FTS servers publishing file-level information (CERN-IT-GT)
– Information exposed through a generic web interface and API (CERN-IT-ES)
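Conceptually, T2D commissioning is a classification of sites by the quality of their inter-cloud links. The sketch below illustrates the idea only; the thresholds and the notion of "enough good links" are hypothetical, not the actual ATLAS commissioning criteria:

```python
def classify_t2d(link_stats, site, min_rate=0.9, min_mb_per_s=5.0, min_links=3):
    """Flag a T2 as 'directly connected' (T2D) when enough of its
    links meet hypothetical commissioning thresholds.

    `link_stats` maps (source, destination) to a dict with
    'success_rate' and 'mb_per_s'.  All thresholds are invented for
    illustration.
    """
    good = [
        s for (src, dst), s in link_stats.items()
        if site in (src, dst)
        and s["success_rate"] >= min_rate
        and s["mb_per_s"] >= min_mb_per_s
    ]
    return len(good) >= min_links

# Invented link statistics around a single T2:
stats = {
    ("T1_A", "T2_X"): {"success_rate": 0.95, "mb_per_s": 12.0},
    ("T2_X", "T1_B"): {"success_rate": 0.97, "mb_per_s": 8.0},
    ("T1_C", "T2_X"): {"success_rate": 0.92, "mb_per_s": 6.0},
    ("T1_D", "T2_X"): {"success_rate": 0.50, "mb_per_s": 1.0},
}
```

With these numbers, three of the four links pass, so the site qualifies under the illustrative policy; the poor T1_D link alone does not disqualify it.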
CMS Popularity
To manage storage more efficiently, it is important to know which data (i.e. which files) is accessed most, and what the access patterns are.
The CMS Popularity service now tracks the utilization of 30 PB of files over more than 50 sites.
Data flow: CRAB (the CMS distributed analysis framework) reports input files, input blocks and lumi ranges to the Dashboard DB; jobs are pulled from there and translated to file-level entities in the Popularity DB; the popularity information is served through a web frontend and to external systems (e.g. the cleaning agent).
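The translation from job-level monitoring to file-level popularity can be sketched as follows. The record layout and names are invented for illustration; the real service pulls job records from the Dashboard DB:

```python
from collections import Counter, defaultdict

def popularity(job_records):
    """Translate job-level records into file-level popularity counters:
    number of accesses and number of distinct users per file."""
    accesses = Counter()
    users = defaultdict(set)
    for job in job_records:
        for lfn in job["input_files"]:
            accesses[lfn] += 1
            users[lfn].add(job["user"])
    return {lfn: {"accesses": accesses[lfn], "users": len(users[lfn])}
            for lfn in accesses}

# Invented example job records:
jobs = [
    {"user": "alice", "input_files": ["/store/f1.root", "/store/f2.root"]},
    {"user": "bob",   "input_files": ["/store/f1.root"]},
    {"user": "alice", "input_files": ["/store/f1.root"]},
]
pop = popularity(jobs)
```

Counting distinct users as well as raw accesses distinguishes a file hammered by one user's jobs from one that is genuinely popular across the collaboration.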
CMS Popularity Monitoring
Automatic site cleaning
It is equally important to know which data is not accessed, so that site clean-up can be automated.
Victor is an agent running daily on a dedicated machine. Its cycle:
1. Select groups filling their pledged space at T2s (using group pledges and PhEDEx space information).
2. Select unpopular replicas (using the Popularity service and PhEDEx replica popularity).
3. Publish the decisions: replicas to delete go to PhEDEx, and the deleted replicas and group-site associations are shown on the popularity web frontend.
The project was initially developed for ATLAS and has now been extended to CMS. It has a plug-in architecture: a common core plus experiment-specific plug-ins wrapping the experiments' Data Management API calls.
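The selection step of such a cleaning cycle can be sketched for a single group at a single site. The watermark, the tie-breaking rule and all numbers are illustrative, not Victor's actual policy:

```python
def select_replicas_to_delete(used, pledged, replicas, watermark=0.8):
    """One cleaning decision for a single group at a site.

    If the group's used space exceeds its pledge, pick the least
    popular replicas (fewest accesses; among equals, largest first)
    until usage drops below `watermark * pledged`.

    `replicas` is a list of (name, size, accesses) tuples; sizes and
    the watermark are in the same (arbitrary) units.
    """
    if used <= pledged:
        return []
    target = watermark * pledged
    to_delete = []
    # Least popular first; among equally popular, free big replicas first.
    for name, size, accesses in sorted(replicas, key=lambda r: (r[2], -r[1])):
        if used <= target:
            break
        to_delete.append(name)
        used -= size
    return to_delete

# Invented example: group is 10 units over a pledge of 100.
victims = select_replicas_to_delete(
    used=110, pledged=100,
    replicas=[("A", 20, 0), ("B", 15, 0), ("C", 50, 10)])
```

Here the two never-accessed replicas are enough to fall below the watermark, so the popular replica C survives. The decisions would then be published to the experiment's data management system (PhEDEx for CMS) through the experiment-specific plug-in.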
Conclusions
The first two years of data taking at the LHC were successful. Data volumes and user activity keep increasing, and we are learning how to operate the infrastructure efficiently.
Common challenges for all experiments:
– Automating daily operations
– Optimizing the usage of storage and network resources
– Evolving the computing models
– Improving data placement strategies