AMOD Report. Doug Benjamin, Duke University.

Running jobs, last 7 days (plot; scale mark: 120K; legend: MC sim, Users, MC Rec, Group)

DDM activity, last 7 days (plot; scale mark: 40 TB)

FT3-Pilot and Functional test issues

Ongoing FT3-Pilot problems (GGUS:97359 and GGUS:97419) affect Functional tests to all sites.
o Var area full on a single FT3-Pilot machine
o Functional tests are served by FT3-Pilot
o Rucio testing uses the same machinery
o Immediate issue solved
o Additional resources requested

FT3-Pilot (GGUS:97359): a problem with a cached proxy affected functional tests to all sites (solved).

Functional Tests to Tier 1 sites stopped for a couple of days; Santa Claus needed to be restarted.
o Wednesday: network intervention likely the cause (next slide)
o Service restored over the weekend
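A full var area on a single machine is a condition that a simple free-space check can flag early. The following is a minimal sketch in Python, not the check used by the service: the function names and the 90% threshold are hypothetical, and only the standard-library call `shutil.disk_usage` is relied on.

```python
import shutil


def usage_fraction(path):
    """Return the in-use fraction of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_nearly_full(path, threshold=0.90):
    """True when usage is at or above `threshold` (a hypothetical limit)."""
    return usage_fraction(path) >= threshold
```

Calling `is_nearly_full("/var")` on the affected machine would be one way to use it, assuming the area is mounted at that path.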

Wednesday network router upgrades

Wednesday (18-Sep): various core routers were upgraded. Outages were planned to be sporadic and brief (finished by 10:00 am).

But several redundant routers were upgraded simultaneously instead of in series. Net result:
o Site-level monitoring frozen and offline
o Many VMs not accessible
o Lxvoadm (group of machines used to access critical VMs in ATLAS distributed computing): not accessible until 2 hours after the planned outage time
o Santa Claus: in stopped state, but SLS monitoring was green (after it had been restored)
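The difference between upgrading redundant routers in series and upgrading them simultaneously can be illustrated with a small Python sketch. Everything in it is hypothetical (the router names, the helper `min_in_service`, and the two upgrade plans); it only counts how many members of one redundant set stay in service during each upgrade batch.

```python
def min_in_service(group, batches):
    """Smallest number of routers in `group` still in service during any batch.

    `group` is a list of router names forming one redundant set;
    `batches` is a list of sets, each holding the routers that are
    upgraded (and therefore offline) at the same time.
    """
    return min(len([r for r in group if r not in batch]) for batch in batches)


# Hypothetical redundant pair; the names are illustrative only.
pair = ["router-a", "router-b"]

in_series = [{"router-a"}, {"router-b"}]   # one router at a time
simultaneous = [{"router-a", "router-b"}]  # both routers at once

print(min_in_service(pair, in_series))     # 1: one router always carries traffic
print(min_in_service(pair, simultaneous))  # 0: the whole redundant pair is offline
```

A result of 0 for any redundant set corresponds to the kind of outage described above, where services behind the routers become unreachable.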

Lost files - AFS issues

TRIUMF: many tens of thousands of files lost during storage system migration; exact extent being determined.

AFS: at ~13:03 on 19-Sept a spurious rm process on /afs/cern.ch/atlas/offline/* removed RW areas, including the panda client areas needed by Hammer Cloud. Computing operations restored the needed area from tape promptly when alerted on 20-Sept. The exact cause of the rm is unknown (INC:388802). ATLAS investigation continues.
o Users and Hammer Cloud affected
o Panda Client code removed
o Various areas restored from tape

Thanks

Thanks to the ADCOS shifters and experts who helped report and debug the issues during the week.
Thanks to ATLAS central operations for recovering from the unexpected outages on Wednesday.
Thanks to the CERN IT staff who helped restore services.
Special thanks to Ale DiG., whose patience with DB is always appreciated, especially on the weekend.