BINP Contribution to ATLAS TDAQ (June 2010)


Overview

BINP has contributed to all the activities of the ATLAS Trigger/DAQ SysAdmin Group since 2007:
– D.Popov (2007–2008, 1 visit)
– A.Zaytsev (2008–2010, 2 visits up to now)
– A.Korol (1 visit up to now)
– A.Bogdanchikov (2 visits up to now)

The contribution includes:
– Support of the existing TDAQ environment (> 1500 servers, > 200 racks of equipment in SDX1 and USA15, plus the ATLAS Main Control Room and Satellite Control Room equipment)
– Support of ATLAS Point 1 users (> 3000 users, > 300 user roles)
– Development of various system administration tools for internal use within the group
– Building and validating hardware solutions for future use in the ATLAS TDAQ environment
– Taking part in 24-hour TDAQ SysAdmin shifts (since mid-summer 2008)

[Slide: site overview diagram — IT Centre, LHC Point 1, Lab 4, Lab 32, Lab 40, SysAdmins]


[Slide: ATLAS Point 1 Computing Facilities — ATLAS-Novosibirsk Group, Novosibirsk, 11 June 2009]

SysAdmin Group Evolution (2007–2010)

– Nominal amount of resources assigned to the team: 10 FTE (stabilized)
– Minimum number of people ever observed in the team: 4 (2009Q1)
– Present situation: 10 sysadmins working on site, plus sysadmins at remote sites (Pakistan + Russia: BINP only)
  – 3 shifts per month per person on average
– Two rotation cycles are now established:
  – 3 people in the loop (BINP: the 2nd cycle ongoing; remote operations are allowed)
  – 10 people in the loop (Pakistan: the 1st cycle ongoing; remote operations not allowed)
– 80% of the team has been renewed since 2007
– No more than 30% staff renewal is expected in 2010–2011
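A quick sanity check of these staffing figures, assuming (per the Overview) that one 24-hour shift slot per calendar day has to be covered:

```python
# Back-of-the-envelope check of the shift load quoted above
# (assumption: one 24-hour shift slot per calendar day to cover).
on_site_admins = 10
shifts_per_person_per_month = 3
days_per_month = 30

covered = on_site_admins * shifts_per_person_per_month  # 30 slots/month
print(f"{covered} covered slots vs {days_per_month} days: "
      f"{covered / days_per_month:.0%} coverage")
```

Ten people at three shifts each thus cover roughly one shift per day, which is consistent with the 4-person minimum of 2009Q1 having been a serious staffing squeeze.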

Previous Achievements (2009)

– Migration of the ATLAS Gateways to new servers provided with a XEN-based virtualization solution:
  – Initial deployment was performed in 2008Q4
  – Migration was finalized in 2009Q2–Q3
– Implementation of bulk server firmware upgrade tools for the netbooted nodes deployed in ATLAS Point 1 (see the sketch below):
  – Successfully applied in 2008Q4 to upgrade more than 1000 nodes installed in SDX1
– Deployment and support of the ATLAS Remote Monitoring servers:
  – Evaluation of commercial and free NX servers and SGD (Sun Global Desktop) based solutions for the ATLAS remote monitoring infrastructure
– Implementation of monitoring and accounting data analysis tools based on the ROOT toolkit, successfully applied in 2008Q4–2009Q2 for:
  – ATLAS DCS and Nagios RRD temperature data analysis for SDX1
  – ATLAS Gateway accounting system data visualization
– Contributing to the everyday activities of the group, including ATLAS TDAQ SysAdmin shifts since Sep 2008, and taking part in multiple hardware maintenance operations in SDX1 and the ATLAS Control Room
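The bulk firmware upgrade campaign lends itself to a short illustration. The sketch below is a minimal, hypothetical orchestration loop, not the actual ATLAS Point 1 tooling: the node names, the `flash_bios` command, and the concurrency limit are all assumptions.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a bulk firmware upgrade over SSH for netbooted
nodes; the real ATLAS Point 1 tools are internal and not reproduced here."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"pc-sdx1-{i:04d}" for i in range(1, 1001)]    # assumed host names
CMD = "flash_bios --firmware /mnt/fw/latest.bin --yes"  # assumed vendor command

def upgrade(node: str) -> tuple[str, bool]:
    """Flash one node over SSH; timeouts keep a hung node from stalling the run."""
    try:
        res = subprocess.run(
            ["ssh", "-o", "ConnectTimeout=10", node, CMD],
            capture_output=True, timeout=600,
        )
        return node, res.returncode == 0
    except subprocess.TimeoutExpired:
        return node, False

# Throttle parallelism so the boot and file servers are not overwhelmed.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(upgrade, NODES))

failed = [n for n, ok in results if not ok]
print(f"{len(NODES) - len(failed)} nodes upgraded, {len(failed)} failed")
```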

Recent Achievements (2010Q1–Q2)

– Major upgrade of the ATLAS Remote Monitoring nodes:
  – Reinstalled the nodes under SLC5.4 x86_64
  – The current installation is fully documented
– Supporting the ATLAS P1 Gateways and Remote Monitoring nodes:
  – Keeping the nodes up to date
  – Adding more functionality and increasing the reliability of these subsystems
  – Getting smoothly through the highest peaks of user activity, e.g. the recent LHC media day (Mar 30, 2010)
– Continuing to contribute to the everyday activities of supporting the ATLAS TDAQ computing environment over the LHC data-taking period
– Providing the ATLAS TDAQ SysAdmin team with virtualized nodes used for testing new components (see the sketch below), e.g.:
  – New ATLAS P1 webservers
  – Tools for deploying the nodes of the ATLAS HLT farm (BWM, Quattor/Puppet), etc.
– Taking part in commissioning the new ATLAS TDAQ HLT computing hardware to be deployed in Point 1 in 2010Q3:
  – 10 racks of equipment (new high-density computing nodes)
  – Adding more than 5000 CPU cores to the ATLAS HLT computing farm (SDX1)
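To give a flavor of how such disposable test VMs can be stood up, here is a minimal sketch using the libvirt Python bindings. The Xen connection URI, the VM name, and the stripped-down domain XML are assumptions; a real definition would also carry disk, network, and boot sections.

```python
"""Minimal sketch: define and boot a disposable test VM via libvirt."""
import libvirt

DOMAIN_XML = """
<domain type='xen'>
  <name>atcn-test-vm</name>
  <memory unit='MiB'>2048</memory>
  <vcpu>2</vcpu>
  <os><type>hvm</type></os>
</domain>
"""

conn = libvirt.open("xen:///system")  # connect to the local Xen host
dom = conn.defineXML(DOMAIN_XML)      # register a persistent domain
dom.create()                          # boot it
print(f"{dom.name()} active: {bool(dom.isActive())}")
conn.close()
```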

New High-Density Machines for the HLT Farm

New HLT racks: 95 boxes in total
– One 2U box holds 4 motherboards
– 10 HLT racks × 8 boxes = 80 boxes
– 15 extras for ONL/MON, LFSes, and replacements

Overall Dell chassis features (per 1U):
– 4 CPU sockets
– 16 physical CPU cores
– 32 CPU threads
– 64 GB RAM
– 1 kW power draw (≈ $300 per CPU thread)
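Assuming the "more than 5000 CPU cores" figure from the previous slide counts hardware threads, the per-unit numbers above are self-consistent, as a quick check shows:

```python
# Consistency check of the HLT capacity figures (assumes the ">5000
# CPU cores" quoted on the previous slide counts hardware threads).
racks, boxes_per_rack = 10, 8
u_per_box, threads_per_u = 2, 32

hlt_boxes = racks * boxes_per_rack               # 80 boxes in HLT racks
threads = hlt_boxes * u_per_box * threads_per_u  # 80 * 2 * 32 = 5120
print(f"{hlt_boxes} boxes -> {threads} CPU threads")  # just over 5000
```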

Areas of Our Responsibility (2009–2010)

Support/Maintenance (since 2009)
– ATLAS P1 Gateways (‘atlasgw’)
– Preseries Gateways (‘preseriesgw’)
– ATLAS RMON infrastructure (‘pc-atlas-rmon’)

Development/Validation (added in 2010)
– ATCN test VM box (test webservers, LFC, Puppet, ClamAV)
– GPN test VM box (test public webserver, Puppet, upgraded BWM infrastructure VMs)

Future prospects (starting from 2010Q3)
– Putting the virtualized BWM infrastructure into production
– Virtualization of Lab32 (for the sake of consolidation)
– Virtualization of the ATLAS TDAQ MON subsystem (achieving higher stability)
– Load-balancing solutions for the Point 1 proxy and webservers (achieving better handling of peak load; see the sketch below)
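To make the load-balancing prospect concrete, below is a toy round-robin selector with a TCP health check. The backend hostnames are hypothetical, and a production deployment would use a dedicated balancer rather than application-level code like this.

```python
"""Toy round-robin backend selection with a TCP health check."""
import itertools
import socket

BACKENDS = ["p1-web-01", "p1-web-02", "p1-web-03"]  # hypothetical names
_ring = itertools.cycle(BACKENDS)

def healthy(host: str, port: int = 80, timeout: float = 0.5) -> bool:
    """A backend counts as healthy if its service port accepts a TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend() -> str:
    """Walk the ring until a healthy backend turns up (at most one full lap)."""
    for _ in range(len(BACKENDS)):
        host = next(_ring)
        if healthy(host):
            return host
    raise RuntimeError("no healthy backend available")
```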

Generic Milestones

In the Past (up to 2010Q2)
– ATLAS RMONs reinstalled under SLC5 (Feb 2010)
– LHC Media Day (Mar 30, 2010): continuous data-taking period begins; no more intensive development allowed
– ATLAS P1 Gateways upgrade (new VM image, Apr 2010)
– ATLAS P1 Gateways proxy authentication schema upgrade (migration to NTLM, May 2010)
– Recovery from an 18 kV power line failure (end of May 2010)

Near Future (2010)
– “LHC First Heavy Ion Physics” public event (?)
– Put the extra 5000 CPU cores into production in the HLT farm (SDX1)
– Put ConfDB UI v2.0 into production
– Migrate to the new ATLAS Point 1 webservers
– Christmas shutdown

Distant Future (2011)
– Put an improved access manager into production
– Replace the extender solution for the ACR (?)
– LHC long-term shutdown at the end of 2011

Talks and Conference Contributions (2008–2009)
– ATLAS TDAQ Week, 2008Q2
– ATLAS TDAQ Week, 2008Q4
– CHEP2009 poster contribution, Mar 2009 (published)

Talks and Conference Contributions (2010)
– ICSOFT2010 poster contribution, Jul 2010
– CHEP2010, Oct 2010 (accepted)

Questions & Discussion