Concerns from Experiments: ATLAS
Richard P Mount, SLAC National Accelerator Laboratory
LHCONE Workshop, February 10, 2014

ATLAS Computing at the Start of Run 1
Distribute data to WLCG sites according to a policy approved by a committee:
- 2.5 disk copies of the ESD at Tier 1s
- 2.5 disk copies of the ESD at Tier 2s
- 2 primary + 8 secondary disk copies of the AOD, etc.
Send Grid jobs to the data.
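A minimal sketch of the disk arithmetic implied by such a static replica policy, assuming purely hypothetical per-copy volumes (the real ESD and AOD sizes are not given on the slide):

```python
# Hypothetical illustration of how a static replica policy multiplies disk usage.
# The per-format volumes below are made-up placeholders, not real ATLAS numbers.
esd_volume_pb = 10.0   # one full copy of the ESD, in PB (assumed)
aod_volume_pb = 2.0    # one full copy of the AOD, in PB (assumed)

policy = {
    "ESD at Tier 1s (2.5 copies)": 2.5 * esd_volume_pb,
    "ESD at Tier 2s (2.5 copies)": 2.5 * esd_volume_pb,
    "AOD (2 primary + 8 secondary)": (2 + 8) * aod_volume_pb,
}

total = sum(policy.values())
for placement, pb in policy.items():
    print(f"{placement:32s} {pb:6.1f} PB")
print(f"{'Total policy-driven disk':32s} {total:6.1f} PB")
```

Whatever sizes are assumed, the point is that every unit of recorded data becomes many units of disk, which is what made the policy hard to sustain.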

ATLAS Disk Space Usage – Early Run 1
(chart: ATLAS disk space usage during early Run 1)

ATLAS Disk Space Usage – The Crisis
(chart: ATLAS disk space usage during Run 1, annotated "Initial data distribution")

ATLAS Disk Space Usage – The Solution
(chart: ATLAS disk space usage, annotated "Initial data distribution" and "PD2P")
PanDA Dynamic Data Placement (PD2P):
- Suppress most policy-based replication to T2s
- Replicate datasets to T2s when they are in demand at T1s
- Re-broker jobs from T1 queues to T2 queues when the data they need arrives at T2s
Physicist experience: nobody noticed!
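A simplified sketch of the demand-driven placement and re-brokering idea described above; this is not the actual PanDA/PD2P code, and the threshold, data structures and helper callbacks are assumptions for illustration:

```python
# Sketch of demand-driven replication plus job re-brokering (hypothetical).
REUSE_THRESHOLD = 2        # replicate once a dataset is this popular at T1s (assumed)

demand_count = {}          # dataset -> number of T1 job requests seen
t2_replicas = set()        # (dataset, t2_site) pairs already replicated

def on_t1_job_request(dataset, pick_t2_site, start_replication):
    """Called when jobs queued at a Tier-1 request a dataset."""
    demand_count[dataset] = demand_count.get(dataset, 0) + 1
    if demand_count[dataset] >= REUSE_THRESHOLD:
        t2 = pick_t2_site(dataset)            # e.g. a T2 with free disk and CPU
        if (dataset, t2) not in t2_replicas:
            start_replication(dataset, t2)    # asynchronous transfer T1 -> T2
            t2_replicas.add((dataset, t2))

def on_replication_complete(dataset, t2, waiting_jobs, rebroker):
    """Once the data has arrived at the T2, move the waiting jobs there."""
    for job in waiting_jobs(dataset):
        rebroker(job, from_queue="T1", to_queue=t2)
```

The essential property, reflected in the slide's "nobody noticed", is that users keep submitting jobs against dataset names while the system decides where the data and the jobs end up.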

ATLAS Disk Space Usage – Network Impact
(chart: ATLAS disk space usage, annotated "Initial data distribution" and "PD2P")

PD2P: Dataset Reuse in 2012 – Qualified Success
(chart: dataset reuse statistics for 2012)

Recent ATLAS DDM Operations
- Life without ESD
- Regular T1 disk crises
- T1 disks almost filled with data marked as "primary"
- October 2013 C-RSG: "whilst we welcome the more aggressive policy for the deletion of unused data, we think that, given the volume of unread data and the cost of disk, unused space could be recovered more aggressively and more of the disk-resident data at T2s placed under 'load on demand' management."
- To survive (not just to pacify the C-RSG) we have to aggressively delete (or not replicate) little-used data
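A minimal sketch of what access-based recovery of disk space can look like; the idle threshold, the Replica record and the selection rule are assumptions for illustration, not the actual ATLAS DDM policy:

```python
import datetime as dt

class Replica:
    """Hypothetical replica record; real DDM bookkeeping is far richer."""
    def __init__(self, dataset, site, is_primary, last_access, size_tb):
        self.dataset = dataset
        self.site = site
        self.is_primary = is_primary      # primary copies are never auto-deleted
        self.last_access = last_access    # datetime of last read
        self.size_tb = size_tb

def select_victims(replicas, needed_tb, idle_days=90, now=None):
    """Pick least-recently-used secondary replicas until enough space is freed."""
    now = now or dt.datetime.utcnow()
    idle = [r for r in replicas
            if not r.is_primary and (now - r.last_access).days >= idle_days]
    idle.sort(key=lambda r: r.last_access)        # oldest access first
    victims, freed = [], 0.0
    for r in idle:
        if freed >= needed_tb:
            break
        victims.append(r)
        freed += r.size_tb
    return victims, freed
```

The same access statistics can equally be used to decide not to replicate a dataset in the first place, as the slide suggests.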

The Past: Exponential Growth of CPU, Storage, Networks
(chart: decades of exponential growth in CPU, storage and network capacity)

Disk Cost – the Driver of Network Use
For 30 years we saw bytes/CHF rise by a factor of two every 18 months or less.
Disk seems to have reached the end of the road with current technology. New technologies are said to be almost production-ready, but:
- Expect modest growth (factor 2 every 4 years?) in bytes/CHF
- We already spend more on disk than on CPU
"Solution": rely more on the network for just-in-time or real-time data distribution.
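A back-of-the-envelope comparison of the two growth regimes quoted above (doubling every 18 months versus doubling every 4 years); the six-year horizon is an arbitrary illustration:

```python
# Annual improvement in bytes/CHF under the historical and expected growth rates.
historical_doubling_years = 1.5   # "factor of two every 18 months"
future_doubling_years = 4.0       # "factor 2 every 4 years?"

annual_hist = 2 ** (1 / historical_doubling_years)
annual_future = 2 ** (1 / future_doubling_years)
print(f"historical: x{annual_hist:.2f} per year")    # ~1.59
print(f"expected:   x{annual_future:.2f} per year")  # ~1.19

years = 6  # arbitrary horizon for illustration
print(f"after {years} years: x{annual_hist**years:.0f} vs x{annual_future**years:.1f}")
# roughly x16 historically versus x2.8 with the slower growth
```

The compounding gap is why flat budgets that used to buy rapidly growing disk capacity no longer do, pushing load onto the network instead.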

FAX: Federated ATLAS Xrootd
Access any ATLAS data by name from T3, T2 or T1 sites without first copying the data.
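A minimal example of what access by name looks like from the user side, assuming PyROOT with XRootD support is available; the redirector hostname, file path and tree name are placeholders, not real ATLAS endpoints:

```python
import ROOT  # PyROOT; assumes a ROOT build with XRootD support

# Hypothetical redirector and file path, for illustration only.
url = "root://fax-redirector.example.org//atlas/rucio/some.dataset/file.root"

# TFile::Open understands root:// URLs, so the file is read over the WAN
# through the federation instead of being copied to local storage first.
f = ROOT.TFile.Open(url)
if f and not f.IsZombie():
    tree = f.Get("CollectionTree")   # tree name is an assumption
    print("entries:", tree.GetEntries() if tree else "tree not found")
    f.Close()
```

The same name-based access works from a Tier-3 desktop, a Tier-2 worker node or a Tier-1 batch slot, which is exactly what exposes it to WAN bandwidth and latency.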

ATLAS DDM Throughput, 2009 to Now
(chart: ATLAS DDM data transfer throughput from 2009 onwards)

Concerns
- Anecdotal evidence of throughput limitations; 10 Gb/s is becoming marginal for large T2s
- Not just the network: local network, file-server hardware
- Increasing reliance on rapid replication
- Global cost-optimization needed:
    CPU:        recalculate or store
    Disk/Tape:  store or recalculate
    Network:    transfer or store
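A toy illustration of that store-versus-recompute-versus-transfer trade-off; every unit cost below is an invented placeholder, not an ATLAS accounting figure:

```python
# Toy cost model: for one derived dataset, compare keeping it on disk,
# recomputing it on demand, or fetching it over the network on demand.
size_tb = 50.0                     # assumed dataset size
accesses_per_year = 2              # assumed access pattern

cost_disk_per_tb_year = 20.0       # CHF per TB-year of disk (assumed)
cost_cpu_recompute = 3000.0        # CHF per full recomputation (assumed)
cost_network_per_tb = 5.0          # CHF per TB transferred (assumed)

options = {
    "store on disk":       size_tb * cost_disk_per_tb_year,
    "recompute on demand": accesses_per_year * cost_cpu_recompute,
    "transfer on demand":  accesses_per_year * size_tb * cost_network_per_tb,
}

for name, chf in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {chf:10.0f} CHF/year")
```

With these made-up numbers the network option wins for rarely accessed data, which is the direction the slides argue the balance is moving.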

Implications for the Network
- Massive, policy-driven, predictable data distribution will continue, but its growth will be modest.
- Bursty traffic ("there are idle CPUs at site X, so replicate some data from site Y as quickly as possible") will become very important.
- Real-time remote access to data will become important:
  - ATLAS does not yet fully understand how network bandwidth and latency will constrain this access
  - It won't be used where it doesn't work well!