The Data Lifetime model

Simone Campana (CERN), ATLAS Week, 8 October 2014

The new data lifecycle model
- Every dataset will have a lifetime set at creation
  - The lifetime can be infinite (e.g. RAW data)
  - The lifetime can be extended, e.g. if the dataset has been accessed recently, or if there is a known exception
- Every dataset will have a retention policy
  - E.g. RAW needs at least 2 copies on tape; at least one copy of AODs is needed on tape
- Lifetimes are being agreed with the ATLAS Computing Resources management
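A minimal sketch of how such a per-type lifetime with a "reset on access" extension rule could look; the table values echo the numbers quoted later in these slides, but the table, function names and behaviour are illustrative assumptions, not the actual DQ2/Rucio interface:

```python
from datetime import datetime, timedelta

# Hypothetical lifetime table per data type, in months; None means an infinite
# lifetime (e.g. RAW data). The real values are being agreed with CREM.
LIFETIME_MONTHS = {"RAW": None, "EVNT": 48, "AOD": 24, "ESD": 12, "HITS": 24}

def expiry_date(created, datatype, last_access=None):
    """Return the dataset's expiry date, or None for an infinite lifetime.
    The clock starts at creation but is reset by a more recent access,
    which is one way to realise the 'extend if recently accessed' rule."""
    months = LIFETIME_MONTHS.get(datatype)
    if months is None:
        return None
    reference = max(created, last_access) if last_access else created
    return reference + timedelta(days=30 * months)

def is_expired(created, datatype, last_access=None, now=None):
    """True if the dataset's lifetime has run out."""
    expiry = expiry_date(created, datatype, last_access)
    return expiry is not None and expiry < (now or datetime.utcnow())
```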

Effect of the data lifecycle model
- Datasets with an expired lifetime can disappear at any time from (data)disk and datatape
  - groupdisk and localgroupdisk are exempt
- "Organized" expiration lists will be distributed to the groups
- ATLAS Distributed Computing will flexibly manage data replication and reduction, within the boundaries of lifetime and retention. For example:
  - Increase/reduce the number of copies based on data popularity
  - Re-distribute data at T2s rather than T1s and vice versa
  - Move data to tape and free up disk space

Apply lifetime: what I would do in DQ2
- Write a script and get a list of all ATLAS datasets once a month, with creation time and last-access time (atime)
- Apply the lifetime policy for each project/type as defined by CREM
- Get a list of datasets to delete, grouped by project/type
- Email the lists to the physics groups
- Delete after 4 weeks if there is no feedback (hide after 2 weeks)
- If there is feedback, whitelist the contested datasets
A sketch of this monthly flow is given below.
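A possible shape for the monthly pass, as a sketch only under stated assumptions: the catalogue dump format, the notify callback and the whitelist handling are invented here and are not the actual DQ2 tooling:

```python
from datetime import datetime, timedelta
from collections import defaultdict

HIDE_AFTER = timedelta(weeks=2)    # hide candidates after 2 weeks without feedback
DELETE_AFTER = timedelta(weeks=4)  # delete candidates after 4 weeks without feedback

def monthly_cleanup(datasets, is_expired, whitelist, notify, today=None):
    """One monthly pass over a catalogue dump.

    datasets   : iterable of (name, project_type, created, atime) tuples,
                 assumed to come from a monthly listing of all ATLAS datasets
    is_expired : callable implementing the per-project/type lifetime policy (CREM)
    whitelist  : set of dataset names for which a group has given feedback
    notify     : callable used to email a physics group its deletion list
    """
    today = today or datetime.utcnow()
    candidates = defaultdict(list)
    for name, project_type, created, atime in datasets:
        if name in whitelist:
            continue                                # contested: keep the dataset
        if is_expired(created, project_type, atime):
            candidates[project_type].append(name)
    for project_type, names in candidates.items():
        notify(project_type, names)                 # e.g. email the physics group
    # Candidates are only hidden / deleted if no feedback arrives by these dates.
    return candidates, today + HIDE_AFTER, today + DELETE_AFTER
```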

Flexibly manage replication/reduction
- I would keep some, though little, planned replication
- I would keep PD2P
- I would run periodic (monthly?) rearrangement scripts, taking into account retention policy, lifetime and popularity
- Examples follow

Example: EVNTs
- Retention policy: do not lose; keep at least 2 copies on disk (no tape)
- Lifetime: 4 years
- Pre-placement: 4 copies at T1s
- PD2P: none
- Reduction:
  - if not accessed in 1 year, reduce to 2 copies at T1s
  - if not accessed in 2 years, reduce to 2 copies (T1 and T2)

Example: data AODs
- Retention policy: do not lose; keep at least 1 copy on tape
- Lifetime: 2 years
- Pre-placement: 1 copy on tape, 2 copies at T1s, 2 copies at T2s
- PD2P: replicate at T2s if used (tuning the PD2P algorithm)
- Reduction:
  - if not accessed in 6 months, reduce to 1 tape + 1 T1 disk copy
  - if not accessed in 12 months, reduce to 1 tape + 1 disk copy
  - if not accessed in 18 months, reduce to 1 tape copy
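For illustration, the two example policies above could be written down as a small policy table that a monthly rearrangement script consults; the field names and the target_layout helper below are hypothetical, not the actual ADC configuration:

```python
from datetime import timedelta

# Hypothetical encoding of the two example policies above.
# Each reduction step is (idle time threshold, target replica layout).
POLICIES = {
    "EVNT": {
        "lifetime": timedelta(days=4 * 365),
        "pre_placement": {"T1_disk": 4},
        "pd2p": False,
        "reduction": [
            (timedelta(days=365),     {"T1_disk": 2}),
            (timedelta(days=2 * 365), {"T1_disk": 1, "T2_disk": 1}),
        ],
    },
    "data_AOD": {
        "lifetime": timedelta(days=2 * 365),
        "pre_placement": {"tape": 1, "T1_disk": 2, "T2_disk": 2},
        "pd2p": True,
        "reduction": [
            (timedelta(days=182), {"tape": 1, "T1_disk": 1}),
            (timedelta(days=365), {"tape": 1, "disk": 1}),
            (timedelta(days=547), {"tape": 1}),
        ],
    },
}

def target_layout(datatype, idle_time):
    """Return the replica layout a dataset of this type should be reduced to,
    given how long it has been idle (pre-placement if no reduction step applies)."""
    policy = POLICIES[datatype]
    layout = policy["pre_placement"]
    for threshold, reduced in policy["reduction"]:
        if idle_time >= threshold:
            layout = reduced
    return layout
```

A rearrangement pass would then compare each dataset's current replicas with target_layout(type, idle_time) and schedule the corresponding deletions or tape migrations.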

Backup

Impact: gain of space
What happens if we set the lifetime today for the currently unused data?
Examples from my favorite types:
- ESDs: we can live without ESDs, but some are immortal
- RDOs: produced on demand, deleted on demand (never)
- HITS: as they are on tape, why bother cleaning?

Datatype | Lifetime (months) | Expired data (TB) | Total data (TB)
ESD      | 12                | 17,327            | 23,262
RDO      |                   | 942               | 2,223
HITS     | 24                | 6,163             | 14,938

As a next step, we will do a full dry run based on the lifetimes discussed with Computing Resource Management.
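To make the potential space gain explicit, a quick check on the table values (the TB figures are from the table; the percentages are derived here, not quoted in the slides):

```python
# Expired fraction per data type, from the table above (values in TB).
table = {"ESD": (17_327, 23_262), "RDO": (942, 2_223), "HITS": (6_163, 14_938)}
for dtype, (expired, total) in table.items():
    print(f"{dtype}: {expired:,} of {total:,} TB expired ({expired / total:.0%})")
# -> roughly 74% of ESD, 42% of RDO and 41% of HITS volume would be eligible to expire.
```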

Impact: staging from tape
What happens if we remove all "unused" data from disk and keep it on tape only? ("unused" here = not accessed in 9 months)
[Plot: Data staged per week (TB), simulation based on last year's data access, ~15 TB]
[Plot: Data staged per week (TB), tape access from Reconstruction and Reprocessing in 2014, ~750 TB]
We would have to restage from tape about 20 TB/week, compared with 1 PB/week for reco/repro (a 2% increase). In terms of number of files, it is a 10% increase.