CMS Issues

Background – RAL Infrastructure
[Architecture diagram: the CASTOR instance headnodes, each running the transfer manager (TM), name server (nsd), xrootd manager (Xrd-mgr), request handler (Rhd), stagerd and tape gateway (TGW); a common layer of central daemons (Cupv, Vmgr, Vdqm, nsd); and 20 disk servers serving data over the CASTOR and XROOT protocols.]

Background – xroot infrastructure
[Architecture diagram: the 20 disk servers sit behind the xroot manager and the RAL xroot redirector (4.X); the redirector reports to a European redirector and on to the global redirectors. Local WNs use the local redirection path, while the wider Grid comes in via the European/global redirectors.]

The Problem…s
Pileup workflow
– Local jobs had a 95% failure rate
– Jobs that managed to run had only 30% efficiency
AAA failure
– Despite being the second site to integrate into AAA
– 100% failure for periods of 30 minutes to several days

Tackling the Problems

Pileup Broken Down
Data accessed through xroot
>95% of the data is at RAL
Two problems in one
– Slow opening times (15 → 600 secs; see the timing sketch below)
– Slow transfer rates
– 100% CPU wait-on-I/O (WIO)
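One way to quantify the slow-open symptom is simply to time repeated xroot opens from a worker node. The sketch below is a minimal example only, assuming the XRootD Python bindings (pyxrootd) are installed; the URL is a placeholder, not a real RAL path.

```python
# Minimal timing sketch (not the production workflow): measure how long a
# single xroot open takes, assuming the XRootD Python bindings (pyxrootd)
# are installed. The URL below is a placeholder, not a real RAL path.
import time
from XRootD import client
from XRootD.client.flags import OpenFlags

URL = "root://xrootd.example.ac.uk//store/example/pileup_sample.root"  # placeholder

def time_open(url):
    """Return (seconds to open, success flag) for one xroot open."""
    f = client.File()
    start = time.time()
    status, _ = f.open(url, OpenFlags.READ)
    elapsed = time.time() - start
    if f.is_open():
        f.close()
    return elapsed, status.ok

if __name__ == "__main__":
    for attempt in range(5):
        elapsed, ok = time_open(URL)
        print("open %d: %.1fs, success=%s" % (attempt, elapsed, ok))
```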

Slow Opening Times
No single obvious bottleneck
– Delays at all phases
– Almost all DB time spent in SubRequestToDo

Solution 1 (aka The Go Faster Stripes Solution)

Database Surgery
DBMS_ALERT suspected of adding to delays under load
– Modified the DB code to sleep for 50 ms instead (limiting subReqToDo to ~20 polls per second; see the sketch below)
Tested on preprod (functionally)
– Improved open times from 3-15 secs to 0-5 secs
Deployed on all instances
Made NO difference to the CMS problem 
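The real change was inside the CASTOR stager's PL/SQL, which is not reproduced here. The Python sketch below only illustrates the idea: replace an alert-style wait (DBMS_ALERT in the original) with a 50 ms polling sleep, which caps the polling rate at roughly 20 calls per second. The function names in it are hypothetical stand-ins.

```python
# Conceptual Python sketch only -- the real change was PL/SQL in the CASTOR
# stager DB. It illustrates replacing an alert-style wait (DBMS_ALERT in the
# original) with a 50 ms polling sleep, capping the poll rate at ~20/s.
# fetch_subrequest_to_do() and handle() are hypothetical stand-ins.
import time

POLL_SLEEP = 0.05  # 50 ms -> at most ~20 polls per second when the queue is empty

def fetch_subrequest_to_do():
    """Hypothetical stand-in for the subReqToDo database call."""
    return None  # pretend the queue is empty in this sketch

def handle(subrequest):
    """Hypothetical stand-in for processing one sub-request."""
    pass

def worker_loop(max_idle_polls=100):
    """Poll for work, sleeping between empty polls instead of waiting on an alert."""
    idle = 0
    while idle < max_idle_polls:
        subrequest = fetch_subrequest_to_do()
        if subrequest is None:
            idle += 1
            time.sleep(POLL_SLEEP)  # the rate-limiting sleep
            continue
        idle = 0
        handle(subrequest)

if __name__ == "__main__":
    worker_loop()
```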

Solution 2 (aka The Heart Bypass Solution)

Bypassing the Scheduler
Modified xroot to disable scheduling
RISK
– Nothing restricting access to the disk servers
– ONLY applied to CMS
RESULT
– Open times reduced to 1-30 seconds
– WIO still flatlining at 100%
‘SUCCESS’

Improving IO
Difficult to test
– Could not generate the load artificially
– Needed the pileup workflow to be executing
So: testing on production ;)
Did ‘the usual’
– Reducing allowed connections
– Throttling batch jobs

Solution 3 (aka The Don’t Do This Solution)
Change the Linux IO scheduler
– Now easy and can be done in-situ (see the sketch below)
Four schedulers (plus options)
– cfq (the default), anticipatory, deadline, noop
– Plus associated config
Switched to noop
– WIO dropped to 60%
– Network rate increased 4x
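For reference, switching the block IO scheduler in-situ amounts to writing the scheduler name into sysfs, equivalent to "echo noop > /sys/block/<dev>/queue/scheduler". The sketch below shows one way to do it from Python; the device name is a placeholder and the script must run as root.

```python
# Minimal sketch of switching the block IO scheduler in-situ via sysfs,
# assuming a Linux kernel of that era offering cfq/anticipatory/deadline/noop.
# The device name is a placeholder; run as root.
DEVICE = "sda"  # placeholder -- pick the data disk(s) on the disk server
SYSFS = "/sys/block/%s/queue/scheduler" % DEVICE

def current_scheduler():
    """Return (available schedulers, active scheduler) for DEVICE."""
    with open(SYSFS) as f:
        entries = f.read().split()          # e.g. ['noop', 'deadline', '[cfq]']
    active = next(e.strip("[]") for e in entries if e.startswith("["))
    return [e.strip("[]") for e in entries], active

def set_scheduler(name):
    """Activate the named scheduler for DEVICE (takes effect immediately)."""
    with open(SYSFS, "w") as f:
        f.write(name)

if __name__ == "__main__":
    available, active = current_scheduler()
    print("available: %s, active: %s" % (available, active))
    set_scheduler("noop")
    print("now active: %s" % current_scheduler()[1])
```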

XROOT Problems

Observations
Random failures (or, more correctly, random successes)
Local access was OK (if slow – see previous)
Lack of visibility up the hierarchy didn’t help
– REALLY difficult to debug

Investigating the Problem
Set up a parallel infrastructure
– Replicated the manager, the RAL redirector and a European redirector
Immediately saw the same issue…

Causes of Failure…
Caching!
– cmsd and xrootd timed out at different times
– xrootd can return ENOENT, but if cmsd later gets a response, subsequent accesses work
– If cmsd doesn’t get a response, all future requests get ENOENT
But why the slow response…?

Log Mining…
Each log on its own made performance look good
Part of the problem
– Coarse time resolution in xroot 3.3.X
– And logging generally
Finally found delays in the ‘local’ nsd
– Processing time was good
– But delays in servicing requests (see the parsing sketch below)
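The log-mining step boils down to pairing up, per request, the time a request arrives with the time it is actually serviced, and looking at the gap. The sketch below shows that pairing in Python; the log format in it is hypothetical (it is not the real nsd format) and is only meant to illustrate the technique.

```python
# Illustrative only: a generic log-mining sketch for finding service delays.
# The log format below is hypothetical (NOT the real nsd format) -- it assumes
# lines carrying a timestamp, a "queued"/"served" marker and a request id.
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .* "
    r"(?P<event>queued|served) req=(?P<reqid>\S+)"
)

def service_delays(lines):
    """Yield (reqid, seconds between 'queued' and 'served') per request."""
    queued = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        reqid = m.group("reqid")
        if m.group("event") == "queued":
            queued[reqid] = ts
        elif reqid in queued:
            yield reqid, (ts - queued.pop(reqid)).total_seconds()

if __name__ == "__main__":
    sample = [
        "2014-09-01 10:00:00.100 nsd queued req=42",
        "2014-09-01 10:00:07.600 nsd served req=42",
    ]
    for reqid, delay in service_delays(sample):
        print("req %s waited %.1fs before being serviced" % (reqid, delay))
```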

Solution – RAL Infrastructure
[Architecture diagram of the revised setup: the CASTOR instance headnodes (TM, nsd, Xrd-mgr, Rhd, stagerd, TGW) and the 20 disk servers (CASTOR/XROOT) as before, plus dedicated nsd + Xrd-mgr nodes feeding the xroot redirector (4.X); the redirector serves local WNs directly and connects to the EU and global redirectors, through which remote WNs on the Grid come in.]