PROOF Cluster Management in ALICE Jan Fiete Grosse-Oetringhaus, CERN PH/ALICE CAF / PROOF Workshop, 29.11.07.

Presentation transcript:


CAF Test Setup
Test setup since May 2006
– 40 machines, 2 CPUs each, 200 GB disk
– Normal lxbatch machines (taken out of ALICE's LSF share)
– 5 machines as development partition
– 35 machines as production partition
The machines form an xrootd disk pool
– xrootd redirector and PROOF master on the head node
– Other nodes are xrootd disk servers and PROOF slaves
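The partition sizes above can be turned into a rough capacity estimate. The following is a minimal sketch, not the actual CAF tooling: the `Partition` class and its field names are hypothetical, and it assumes that each partition has its own head node, that the head node does not serve disk, and that the 200 GB figure is per machine. None of these three points is stated explicitly on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical description of one CAF partition (names are illustrative)."""
    name: str
    machines: int
    cpus_per_machine: int = 2       # from the slide: 2 CPUs each
    disk_gb_per_machine: int = 200  # assumption: 200 GB is per machine

    @property
    def worker_nodes(self) -> int:
        # Assumption: one machine per partition is the head node
        # (xrootd redirector + PROOF master); the rest are workers.
        return self.machines - 1

    @property
    def worker_cpus(self) -> int:
        return self.worker_nodes * self.cpus_per_machine

    @property
    def pool_disk_gb(self) -> int:
        # Assumption: only worker nodes act as xrootd disk servers.
        return self.worker_nodes * self.disk_gb_per_machine


development = Partition("development", machines=5)
production = Partition("production", machines=35)

for p in (development, production):
    print(f"{p.name}: {p.worker_nodes} workers, "
          f"{p.worker_cpus} CPUs, {p.pool_disk_gb} GB disk pool")
```

Under these assumptions the production partition offers 34 worker nodes; if the head node also served disk, the pool would be one machine larger.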

Installation
ROOT (including xrootd) installed as RPM (deployed by IT, QUATTOR)
– tar files for quick updates
– ROOT version including sources also installed on AFS
   – As reference
   – For users running on lxplus
AliRoot built against ROOT version on CAF, installed to AFS

Configuration
For each partition (development, production)
– xrootd/PROOF configuration (3 files) on AFS (public, no token needed!)
– Staging script + AliEn API commands (runs on all slaves) on AFS
– AliEn user certificate deployed to each machine (to read files from AliEn)
Starting/stopping of services
– Using the wassh tool

Monitoring
Usual Lemon monitoring by IT
ALICE monitoring powered by MonALISA
Machines
– CPU, memory, network
– Alerts
Queries → CPU quotas
Staging → disk quotas
Detailed query monitoring (to be enabled in query)
– CPU, disk used by single queries
– Network traffic correlation
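The slide says that monitored queries feed CPU quotas and that staging feeds disk quotas, but it does not say how. One way to picture the idea is to sum the monitored usage per user group and compare it with a per-group limit. This is only an illustrative sketch: the function names, the record layout, the group names and all numbers are invented for the example and are not taken from MonALISA or from the CAF setup.

```python
from collections import defaultdict


def usage_by_group(records):
    """Sum monitored usage per group from (group, amount) records."""
    totals = defaultdict(float)
    for group, amount in records:
        totals[group] += amount
    return dict(totals)


def over_quota(usage, quotas):
    """Return the groups whose usage exceeds their quota, sorted by name.

    Groups without a configured quota are treated as unlimited.
    """
    return sorted(
        group for group, used in usage.items()
        if used > quotas.get(group, float("inf"))
    )


# Illustrative numbers only: CPU seconds reported per finished query.
query_cpu_records = [("groupA", 1200.0), ("groupB", 300.0), ("groupA", 900.0)]
cpu_quotas = {"groupA": 2000.0, "groupB": 1000.0}

cpu_usage = usage_by_group(query_cpu_records)
print(over_quota(cpu_usage, cpu_quotas))
```

The same two functions would apply to disk quotas by feeding in staged gigabytes per group instead of CPU seconds.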

Experience
xrootd/olbd configuration sometimes requires a bit of trying, calm reflection and thinking outside of the box → in the end everything is logical
Certain turn-around time for RPMs, therefore "manual" deployment is advisable at the beginning
Restarting kills all user sessions
– Makes updates difficult in a production system
Conclusion: Administration and user support always take more time than one thinks…