Integration Program Update
Rob Gardner
US ATLAS Tier 3 Workshop / OSG All Hands Meeting, LIGO
March 2, 2009

Overview
Phased program of work for the US ATLAS Facility
Establishing a baseline set of deliverables that integrates fabric resources, ATLAS software and services, grid-level services, operational components, and user tools
Launched June 2007 – now in Phase 8
Focus is primarily on the US ATLAS Tier 1 and Tier 2 facilities, with some participation from (Tier 2-like or Tier 2-satellite) Tier 3 sites
/IntegrationProgram.html

Integration Program – Tasks

Integration Program – Sites
Phase 7 site certification table

Coordination
Facility Analysis Performance – Tuesday, bi-weekly, 10:30 am Eastern
Computing Operations and Integration meeting – Wednesday, weekly, 1 pm Eastern
Throughput Performance – Tuesday, weekly, 3 pm Eastern
Data Management – Tuesday, weekly, 4 pm Eastern

Analysis Queue Performance
Background
– We are beginning to see Tier 2 analysis jobs that stress storage systems in ways unlike production jobs
– Many analysis workflows
General objectives
– Support testing and validation of analysis queues
– Address issues uncovered during functional tests in ATLAS
– Measure analysis queue performance against a number of metrics
Overarching goal is to assess facility readiness for analysis workloads in terms of:
– Scale: response to large numbers of I/O-intensive jobs
– Stability and reliability of supporting services (gatekeepers, doors, etc.)
– Physics throughput (users' latency and efficiency)

Working group
Organized bi-weekly meeting with representatives from sites, software, operations, and physics
Not a general analysis user-support meeting
– However, it will hear reports of user problems and track those that indicate systematic problems in the infrastructure (facilities or distributed services)
Feedback to site administrators and to ADC development & operations

Queue Metrics
For a specified set of validated job archetypes…
– DnPD making for Tier 3 export
– TAG-based analysis for Tier 3 export
– Others as appropriate
– Data access methods: direct read from storage; copy to local work scratch
…define queue metrics such as:
– Completion times for reference benchmarks
– Local I/O access performance
– Elapsed user response times for a reference job set: submission to queued, time in queue, total elapsed to completion
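To make the elapsed-time metrics concrete, here is a minimal Python sketch (not from the original slides) that computes the submission-to-queued, in-queue, and total elapsed times from per-job timestamps. The JobRecord fields are hypothetical stand-ins for whatever the queue monitoring actually records, not a real PanDA schema.

# Minimal sketch: elapsed-time queue metrics from per-job timestamps.
# Field names (submitted, queued, started, finished) are assumptions for illustration.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class JobRecord:
    submitted: datetime   # time the user submitted the job
    queued: datetime      # time the job was accepted into the analysis queue
    started: datetime     # time the job began running at the site
    finished: datetime    # time the job completed

def queue_metrics(jobs):
    """Return median submission-to-queued, in-queue, and total elapsed times in seconds."""
    to_queued = [(j.queued - j.submitted).total_seconds() for j in jobs]
    in_queue = [(j.started - j.queued).total_seconds() for j in jobs]
    total = [(j.finished - j.submitted).total_seconds() for j in jobs]
    return {
        "submission_to_queued_s": median(to_queued),
        "time_in_queue_s": median(in_queue),
        "total_elapsed_s": median(total),
    }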

Measurement
Measurement modality:
– Regular testing (robot-like)
– Stress testing
– Response versus scale (for example, queue response to job sets of 10/100/200/500/1000 I/O-intensive jobs of various types) to determine breaking points
Deliverables (preliminary):
– March 15: Specify and test a well-defined set of job archetypes representing likely user analysis workflows hitting the analysis queues
– March 31: Specify the set of queue metrics and make test measurements
– April 15: Take a facility-wide set of metrics
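A hedged sketch of the "response versus scale" stress test follows; submit_job_set() and wait_for_completion() are hypothetical placeholders for the real submission tooling, and the 10% failure cutoff is an illustrative choice, not a value from the slides.

# Sketch only: submit progressively larger job sets and record how the queue responds.
import time

def run_scaling_test(submit_job_set, wait_for_completion,
                     set_sizes=(10, 100, 200, 500, 1000)):
    """Submit I/O-intensive job sets of increasing size, recording completion counts
    and wall time, and stop at the first size where more than 10% of jobs fail."""
    results = []
    for n in set_sizes:
        t0 = time.time()
        job_ids = submit_job_set(n)                # submit n identical archetype jobs
        done, failed = wait_for_completion(job_ids)
        results.append({"jobs": n, "completed": done,
                        "failed": failed, "wall_time_s": time.time() - t0})
        if failed > 0.1 * n:                       # crude breaking-point criterion
            break
    return results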

Throughput & Networking
Led by Shawn
Implement a consistent perfSONAR infrastructure for US ATLAS
– Include "mesh-like" testing between Tier-n sites
– Improve access to perfSONAR data (new front page for perfSONAR; Jay's additions for graphing/display/monitoring)
Document existing monitoring (Netflow, FTS, ATLAS dashboard, dCache monitoring)
Implement a standard dataset for disk-to-disk tests between BNL and the Tier-n sites
Work with Internet2 and ESnet for dedicated T1–T2 circuits
Continue establishing performance benchmarks
– 200/400 MB/s T1 → T2, >1 GB/s T1 → multiple T2s
Iterate on all of the above as needed
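As an illustration of checking measured rates against the benchmark above, here is a small Python sketch that flags Tier 1 → Tier 2 pairs falling short of the 200 MB/s target. The site pairs and rates in the example are made-up values, not real perfSONAR or disk-to-disk measurements.

# Illustrative sketch: compare measured disk-to-disk rates to the per-pair target.
TARGET_MBPS = 200.0      # baseline per-pair target (MB/s); the stretch goal is 400

def sites_below_target(measured_rates, target=TARGET_MBPS):
    """Return the (src, dst) pairs whose measured rate is below the target."""
    return {pair: rate for pair, rate in measured_rates.items() if rate < target}

if __name__ == "__main__":
    measured_rates = {           # hypothetical averages in MB/s, for illustration only
        ("BNL", "AGLT2"): 310.0,
        ("BNL", "MWT2"): 180.0,
        ("BNL", "SWT2"): 240.0,
    }
    for (src, dst), rate in sites_below_target(measured_rates).items():
        print(f"{src} -> {dst}: {rate:.0f} MB/s is below the {TARGET_MBPS:.0f} MB/s target")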

Data Management
Led by Kaushik
Storage validation
– Consistency checking: filesystem ↔ LFC ↔ DQ2 catalog
– Standardize cleanup tools & methods
Storage reporting (ATLAS, WLCG)
Data placement policies, token management, and capacity planning
Data deletion policies
Data transfer problems
Dataset distribution latency
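The three-way consistency check can be sketched as set differences between namespace dumps. The sketch below assumes each system (storage filesystem, LFC, DQ2) can be dumped to a set of file identifiers; it is illustrative only and not the actual validation tooling.

# Sketch: classify discrepancies between the storage namespace and the catalogs.
def consistency_report(fs_files, lfc_entries, dq2_entries):
    """Compare file-identifier sets dumped from the filesystem, LFC, and DQ2."""
    return {
        # on disk but unknown to the LFC: "dark data", candidates for cleanup
        "dark_data": fs_files - lfc_entries,
        # in the LFC but missing from disk: lost files needing recovery or catalog cleanup
        "lost_files": lfc_entries - fs_files,
        # registered in DQ2 datasets but with no LFC entry at the site
        "orphaned_in_dq2": dq2_entries - lfc_entries,
    }

# Example usage with toy data:
report = consistency_report(
    fs_files={"f1", "f2", "f3"},
    lfc_entries={"f1", "f2", "f4"},
    dq2_entries={"f1", "f4", "f5"},
)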

Challenge Summary (incomplete!)
Analysis
– Establishing performance at scale
– Very dependent on storage services: developments in dCache and xrootd will be very important
Throughput
– Monitoring infrastructure and performance benchmarks
Data management
– Standardizing management tools and placement/deletion policies, capabilities, performance
Tier 3
– Exporting datasets to Tier 3s efficiently