Kevin Thaddeus Flood, University of Wisconsin

Presentation transcript:

Tier-2 Workshop
Kevin Thaddeus Flood, University of Wisconsin

Tier-2 Workshop
- Tier-2 workshop on Thursday, 16:00-18:00 (40-2-A01): http://indico.cern.ch/conferenceDisplay.py?confId=37243
- Users/admins round-table discussion: identify and address problems, share solutions, standardize "best practices"
- Several EWK users/admins shared their experiences: Frank Wuerthwein, Dmytro Kovalskyi, Ezio Torassa, Pablo Garcia, Kalanand Mishra. Thanks!

Tier-2 User Feedback
EWK user experience is generally very positive:
- Able to successfully run/manage large numbers of jobs, merge output files, and efficiently generate physics
- Jobs are submitted through both CRAB and the local batch manager, and output datasets are published to DBS (a rough sketch of such a submission follows below)
- T2s have good login pools: large RAM and fast, multi-core machines shorten compile/link times
- PhEDEx dataset transfer tools are easy to use
- Fast network links between user sites and the T2s
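For context, a minimal sketch of what a CRAB 2-era submission that stages output to a T2 and publishes it to DBS might have looked like, driven from Python. The dataset path, pset name, site name, and configuration keys are illustrative and quoted from memory; exact key names varied between CRAB versions.

```python
# Hedged sketch of a CRAB 2-style task: write a crab.cfg, then create and
# submit it. All values below are placeholders; key names may differ by
# CRAB version.
import configparser
import subprocess

cfg = configparser.ConfigParser()
cfg["CRAB"] = {"jobtype": "cmssw", "scheduler": "glite"}
cfg["CMSSW"] = {
    "datasetpath": "/SomePrimaryDataset/SomeProcessedDataset/RECO",  # placeholder
    "pset": "analysis_cfg.py",        # the user's CMSSW configuration
    "total_number_of_events": "-1",   # run over the whole dataset
    "number_of_jobs": "100",
}
cfg["USER"] = {
    "return_data": "0",
    "copy_data": "1",                      # stage output to the T2 storage element
    "storage_element": "T2_US_Wisconsin",  # placeholder site/SE name
    "publish_data": "1",                   # register the output dataset in DBS
    "publish_data_name": "my_analysis_v1",
}

with open("crab.cfg", "w") as f:
    cfg.write(f)

# CRAB 2 command-line style: create the task, then submit all jobs.
subprocess.run(["crab", "-create"], check=True)
subprocess.run(["crab", "-submit"], check=True)
```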

Tier-2 User Feedback
Some issues still to be addressed:
- Batch priorities sometimes seem out of balance; what is the optimal assignment of batch slots when competing tasks collide?
- Widely varying job failure rates across sites
- Job failure rate scales with the number of input events, perhaps due to memory leaks, with more failures where less memory is available; more of a problem with earlier releases, before 2_1_X
- Difficult to debug failures when remote jobs write output files back to the user facility (an FNAL user problem, so likely a T2 issue)
- No efficient/easy method for transferring user files offsite from T2 login-pool disk: "copy [file] into my /store/user at UCSD, then lcg-cp it into castor, and back out onto my desktop [at CERN]" (a sketch of this multi-hop copy follows below)
- Grid tools for managing /store/user are difficult; it is much easier when the area can be managed interactively from the login pool
- No standardized Linux installation at T2s: favorite/useful tools are missing in some cases, and out-of-date/insecure tools are present in others (Firefox 1.5!)
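The quoted workaround can be sketched concretely. Below is a rough illustration of the two lcg-cp hops (T2 /store/user into CASTOR at CERN, then back out onto a desktop); the SRM endpoints, paths, and lcg-cp options are placeholders from memory, not exact values from any site.

```python
# Hedged sketch of the multi-hop copy users describe above. Endpoints and
# paths are invented placeholders; lcg-cp options are quoted from memory.
import subprocess

T2_SRC = ("srm://srm.t2.example.edu:8443/srm/managerv2?SFN="
          "/pnfs/t2.example.edu/store/user/someuser/myfile.root")
CASTOR_DST = ("srm://srm-cms.cern.ch:8443/srm/managerv2?SFN="
              "/castor/cern.ch/user/s/someuser/myfile.root")
LOCAL_DST = "file:///tmp/myfile.root"

def lcg_copy(src: str, dst: str) -> None:
    """Run one lcg-cp hop between two storage URLs."""
    subprocess.run(["lcg-cp", "-v", "--vo", "cms", src, dst], check=True)

lcg_copy(T2_SRC, CASTOR_DST)     # hop 1: T2 /store/user into CASTOR
lcg_copy(CASTOR_DST, LOCAL_DST)  # hop 2: CASTOR back out to the desktop
```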

Tier-2 Admin Issues
- Need standard procedures to link PAG/POG users to T2 admins for routine and special communication, something more than announcement/complaint lists
- Publishing datasets works well for storage management, but unpublished datasets provide an unregulated storage "backdoor": "30% of the used disk space in Legnaro is written with this 'back door'. [It] is difficult to follow what people [are] doing; it will be hard work to clean DBS and files when this usage becomes too large."
- No good replication/backup options for dCache: two options exist, but neither scales well
- Connecting problematic batch jobs (e.g., large wall time, small CPU usage) to particular users can be difficult (a sketch of one such check follows below)
- Could use more info on how to use VOMRS/siteDB
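As an illustration of the kind of check an admin might script for the job-accounting item, here is a hedged sketch that flags owners of long-running jobs that used little CPU, assuming a Condor-based batch system (as at Wisconsin); the ClassAd attribute names and condor_history flags are quoted from memory, and the thresholds are arbitrary.

```python
# Hedged sketch: find users whose finished jobs had large wall time but
# small CPU usage, via condor_history. Attribute names/flags may need
# adjusting for a given Condor version; thresholds are arbitrary examples.
import subprocess
from collections import defaultdict

MIN_WALL_SECONDS = 6 * 3600   # "large wall time"
MAX_CPU_FRACTION = 0.10       # "small CPU cycles" relative to wall time

out = subprocess.run(
    ["condor_history",
     "-format", "%s ", "Owner",
     "-format", "%f ", "RemoteWallClockTime",
     "-format", "%f\n", "RemoteUserCpu"],
    capture_output=True, text=True, check=True,
).stdout

suspects = defaultdict(int)
for line in out.splitlines():
    try:
        owner, wall, cpu = line.split()
        wall, cpu = float(wall), float(cpu)
    except ValueError:
        continue  # skip malformed records
    if wall > MIN_WALL_SECONDS and cpu < MAX_CPU_FRACTION * wall:
        suspects[owner] += 1

for owner, njobs in sorted(suspects.items(), key=lambda kv: -kv[1]):
    print(f"{owner}: {njobs} jobs with long wall time but little CPU")
```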

Tier-2 PAG/POG Convenor Feedback
- Lots of spare CPU cycles are available for private production, but the data-management tools are difficult for users/admins
- Analysis vs. production queues need to be balanced in favor of physics users rather than production; at Wisconsin, production sleeps when there are pending analysis jobs
- Need management tools for PAG/POG convenors (a sketch of the requested report follows below):
  - available/consumed storage reports, generated periodically and at thresholds (80/90/95/99% full)
  - formal accounting for the officially allocated storage space (an exact specification of what counts against the storage cap)
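A minimal sketch of the requested threshold report. The groups, allocations, and the way usage is obtained (a plain dictionary here) are entirely hypothetical; a real version would query the site's storage accounting (e.g. dCache) instead.

```python
# Hedged sketch of a per-group storage report with the thresholds asked
# for above. All numbers and group names are invented placeholders.
THRESHOLDS = (0.80, 0.90, 0.95, 0.99)

# group -> (used TB, allocated TB); illustrative values only
usage_tb = {
    "EWK": (38.0, 40.0),
    "Top": (12.5, 25.0),
    "QCD": (19.8, 20.0),
}

def report(usage):
    for group, (used, allocated) in sorted(usage.items()):
        frac = used / allocated
        crossed = [t for t in THRESHOLDS if frac >= t]
        status = f"above the {crossed[-1]:.0%} threshold" if crossed else "ok"
        print(f"{group:>4}: {used:6.1f} / {allocated:6.1f} TB "
              f"({frac:6.1%}) {status}")

report(usage_tb)
```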