Federated Storage Workshop Summary
For the pre-GDB (Data Access) Meeting, 5/13/14
Andrew Hanushevsky, SLAC National Accelerator Laboratory
Workshop held April 10-11, 2014 at SLAC

ALICE (AliEn)
- Long history of using federated storage
- Catalog-driven, read-write, exclusively XRootD federation
  - Though other protocols are supported for various types of files
- Heavy production usage
  - Past 6 months: 12 PB written (0.85 GB/s) and 126 PB read back (8.4 GB/s)
- Small reads cause a large WAN I/O performance hit
  - Local storage average: 1.44 MB/s; remote storage average: 0.54 MB/s
  - Average CPU efficiency: 55%
- Waiting for XRootD R4, with migration planned

ATLAS (FAX)
- Rocky start due to the LFC bottleneck in Name2Name translation
  - Now that sites use the Rucio naming convention it is very stable (see the path sketch after this slide)
- 47 sites deployed
- HammerCloud testing is an ongoing activity
  - Architectural issues at SLAC and SWT2 will be fixed to improve performance
- PanDA is now enabled for FAX failover
  - Saves a significant number of jobs that would otherwise fail
- Continuous monitoring and improvement wherever possible
- This is essentially production now
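For context on why the Rucio naming convention removes the Name2Name bottleneck: the site-local path can be computed from a file's scope and name alone, with no catalog lookup. The minimal Python sketch below follows the commonly documented Rucio deterministic scheme; the site prefix and file names are hypothetical, and real per-site prefixes vary.

```python
# Sketch of the Rucio deterministic name-to-path convention: the
# physical path is derived from an md5 hash of "scope:name", so no
# LFC/catalog lookup is needed. The prefix below is a placeholder.
import hashlib

def rucio_path(scope, name, prefix="/atlas/rucio"):
    h = hashlib.md5(("%s:%s" % (scope, name)).encode()).hexdigest()
    return "%s/%s/%s/%s/%s" % (prefix, scope, h[0:2], h[2:4], name)

# Hypothetical example file, for illustration only.
print(rucio_path("mc15_13TeV", "AOD.12345._000001.pool.root.1"))
```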

CMS (AAA)
- Took a lot of work, especially to improve WAN I/O
  - Required for a workable large-scale federation
- Today over 80% of T1/T2 sites are federated
  - Those that are not are poorly performing sites, so this is acceptable
- Federated access runs at the same I/O level as bulk file transfer
- Used for fallback and small-scale opportunistic scheduling
- Read-only now; looking into how to do read-write

COEPP (Australia)
- Initially a read-only federation
- Problems:
  - WAN transfer rates are problematic when the TTreeCache is turned off
  - Caching is problematic using the FRM (File Residency Manager) facility

Federated write access (UK)
- Impact on the infrastructure is an open topic
  - Need to research and understand the access patterns
- Doing stress tests via HammerCloud
  - Jobs work fine on well-connected sites

Analysis
- Predict the impact of federated access on EOS
- Understand the relationship between I/O and CPU
  - Propose events/sec as the efficiency measure (a worked example follows this slide)
  - Does it matter? Users don't seem to care as long as the job runs
    - E.g., not turning on vector reads, TTreeCache, etc.
- Impact of failover on CERN is minimal (<1%)
  - More of a CPU issue as more people take up slots
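To make the proposed measures concrete, the short sketch below computes events per second and CPU efficiency for a single job; the job numbers are invented purely for illustration and do not come from the workshop.

```python
# Illustration of the proposed metrics: events per second of wall time,
# and CPU efficiency as CPU time divided by wall time.
def events_per_second(n_events, wall_seconds):
    return n_events / wall_seconds

def cpu_efficiency(cpu_seconds, wall_seconds):
    return cpu_seconds / wall_seconds

# Invented job numbers, for illustration only.
n_events, cpu_s, wall_s = 50000, 3600.0, 6545.0
print("events/s: %.1f" % events_per_second(n_events, wall_s))      # ~7.6
print("CPU eff : %.0f%%" % (100 * cpu_efficiency(cpu_s, wall_s)))  # ~55%
```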

Developing a cost matrix is essential
- Cost of data transfer between any endpoint pair
- Essential for the scheduler to use this so that it does not over-commit a site (a minimal sketch follows this slide)
- May require some kind of throttling anyway

Monitoring data is rapidly increasing
- 1 TB of monitoring data collected so far
- The increasing rate and variety is causing scaling issues
- Provides a wealth of information, so new uses are being developed
  - File popularity, space resource usage, etc.
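The cost matrix is essentially a lookup table keyed by (source, destination) endpoint pairs that a scheduler consults before placing work or transfers. The minimal Python sketch below illustrates the idea only; the CostMatrix class, the site names, and the cost values are hypothetical and not part of any WLCG tool.

```python
# Minimal sketch of a transfer cost matrix keyed by endpoint pairs.
# Costs would normally be refreshed from monitoring data; the values
# and site names here are placeholders.
from collections import defaultdict

class CostMatrix:
    def __init__(self, default_cost=float("inf")):
        # Unknown pairs default to "infinitely expensive".
        self._costs = defaultdict(lambda: default_cost)

    def update(self, source, destination, cost):
        self._costs[(source, destination)] = cost

    def cost(self, source, destination):
        return self._costs[(source, destination)]

    def cheapest_source(self, sources, destination):
        # Pick the replica location with the lowest transfer cost.
        return min(sources, key=lambda s: self.cost(s, destination))

matrix = CostMatrix()
matrix.update("MWT2", "SLAC", 1.3)   # hypothetical cost units
matrix.update("BNL", "SLAC", 0.7)
print(matrix.cheapest_source(["MWT2", "BNL"], "SLAC"))  # -> BNL
```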

Effective access requires retooling the application
- Latency ranges from 30 to 300 ms
- Caching is a must, but ROOT's TTreeCache is not always sufficient
  - During training (the first 20 entries) every object is read with a separate network access
  - A typical CMS file with 1000 branches leads to roughly 20 minutes of training time
  - At 130 ms latency, WAN file access takes too much time
- The solution is bulk loading before training, which reduces startup time (illustrated in the sketch after this slide)
  - Bulk training yields an order-of-magnitude improvement
  - The changes will be integrated into the ROOT core by the ROOT team
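As an illustration of the standard ROOT API in this area (not the specific bulk-training patch discussed at the workshop), the PyROOT sketch below enables a TTreeCache on a remote file, registers all branches up front, and skips the training reads; the federation URL and tree name are placeholders.

```python
# PyROOT sketch: enable TTreeCache for a remote read so baskets are
# fetched in large coalesced requests instead of one per object.
# The URL and tree name below are hypothetical placeholders.
import ROOT

url = "root://redirector.example.org//store/user/sample.root"
f = ROOT.TFile.Open(url)
tree = f.Get("Events")

tree.SetCacheSize(100 * 1024 * 1024)   # 100 MB TTreeCache
tree.AddBranchToCache("*", True)       # register all branches up front
tree.StopLearningPhase()               # skip the per-object training reads

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                   # reads now go through the cache
```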

Understanding access patterns is crucial
- Needed to develop new computing models and caching proxies
  - Can be used to drive a simulation of a caching proxy
  - Can be used to improve CPU efficiency
- Data access in user areas is the least predictable
- The analysis needs to be redone periodically because access patterns change
  - Sensitive to data formats

Federated file open rate
- Generally, it will scale to CMS needs
  - The file open rate needs to reach 100 Hz; it has been tested to 200 Hz
- However, it is very sensitive to SE type
  - Investigation is under way as to why there is such high variability (a rough measurement sketch follows this slide)

Read rate
- US sites do well
- However, there is large variability even among similar SEs
  - This may be due to local load
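For context, a file-open-rate measurement can be approximated with the XRootD Python bindings as sketched below; the redirector URL and file list are placeholders, and the real scale tests quoted above use far more elaborate harnesses.

```python
# Rough sketch: measure how many federated file opens per second a
# client can sustain. Requires the XRootD Python bindings (pyxrootd);
# the redirector and paths below are hypothetical placeholders.
import time
from XRootD import client
from XRootD.client.flags import OpenFlags

redirector = "root://redirector.example.org:1094/"
paths = ["/store/test/file%04d.root" % i for i in range(100)]

start, opened = time.time(), 0
for path in paths:
    f = client.File()
    status, _ = f.open(redirector + path, OpenFlags.READ)
    if status.ok:
        opened += 1
        f.close()

elapsed = time.time() - start
print("open rate: %.1f Hz (%d of %d opened)"
      % (opened / elapsed, opened, len(paths)))
```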

PanDA
- Finding a spot for federated storage in an exabyte environment
- Rescuing failed transfers (illustrated after this slide)
  - Done and working well
- Mitigating bottlenecks in job queues near local data
  - Opportunistic scheduling (alpha testing successful)
  - Significantly reduced wait times
- Actively using federated storage for transfers
  - May need to throttle the transfer rate
  - Already doing this by limiting the number of jobs
- Many other possibilities
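The failover and rescue idea, also mentioned for FAX and AAA above, boils down to retrying a failed local access through the federation. The sketch below is only an illustration of that logic, not PanDA or pilot code; the paths and redirector are hypothetical.

```python
# Illustration of failover-to-federation: try the local replica first,
# and only go through the global redirector if the local open fails.
# Paths and redirector are hypothetical placeholders.
import ROOT

def open_with_failover(local_path, global_name,
                       redirector="root://redirector.example.org/"):
    f = ROOT.TFile.Open(local_path)
    if f and not f.IsZombie():
        return f                                      # local replica is fine
    return ROOT.TFile.Open(redirector + global_name)  # fall back to federation

f = open_with_failover("/data/localgroupdisk/sample.root",
                       "/atlas/rucio/user/sample.root")
```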

HTTP plug-in for XRootD
- Provides an additional access mode (HTTP/HTTPS/WebDAV)

Full plug-in environment in the XRootD client
- Enables additional application-specific processing modes

Caching XRootD proxy server
- Reduces network latency
- Automatically manages site storage
- Repairs local storage

HTTP-based federations
- Another way of accessing federated storage

Next Federated Storage Workshop
- Target is January 2015
- Location: UK or Rome