Status of the Bologna Computing Farm and GRID related activities Vincenzo M. Vagnoni Thursday, 7 March 2002

Outline
- Currently available resources
- Farm configuration
- Performance
- Scalability of the system (in view of the DC)
- Resources foreseen for the DC
- Grid middleware issues
- Conclusions

Current resources
- Core system (hosted in two racks at INFN-CNAF):
  - 56 CPUs in dual-processor machines (18 PIII 866 MHz + 32 PIII 1 GHz + 6 PIII Tualatin 1.13 GHz), 512 MB RAM each
  - 2 Network Attached Storage (NAS) systems: 1 TB in RAID5 with 14 IDE disks + hot spare, and 1 TB in RAID5 with 7 SCSI disks + hot spare
  - 1 Fast Ethernet switch with Gigabit uplink
  - Ethernet-controlled power distributor for remote power cycling
- Additional resources provided by INFN-CNAF: 42 CPUs in dual-processor machines (14 PIII 800 MHz, 26 PIII 1 GHz, 2 PIII Tualatin 1.13 GHz)

Farm Configuration (I)
- Diskless processing nodes with the OS centralized on a file server (root over NFS):
  - Makes the introduction or removal of a node trivial, i.e. no software installation on local disks
  - Allows easy interchange of CEs (computing elements) when resources are shared (e.g. among various experiments), and permits dynamic reallocation of them without additional work
  - Very stable: no real drawback observed in about 1 year of running
- Improved security:
  - Private network IP addresses and an Ethernet VLAN give a high level of isolation
  - Access to external services (AFS, mccontrol, bookkeeping DB, servlets of various kinds, …) is provided by means of NAT on the GW (a minimal sketch is given below)
- The most important critical systems (single points of failure), though not yet all of them, have been made redundant:
  - Two NAS units in the core system with RAID5 redundancy
  - GW and OS server: operating systems installed on two RAID1 (mirrored) disks
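The slide does not say how the NAT on the gateway was implemented; purely as an illustration, here is a minimal sketch of IP masquerading on a Linux gateway using iptables. The private subnet and interface name are hypothetical, and a 2002-era setup may well have used ipchains or different tooling instead.

```python
#!/usr/bin/env python
# Illustrative sketch only: enable NAT (masquerading) on a gateway so that
# private-network worker nodes can reach external services (AFS, databases, ...).
# The subnet and interface names below are assumptions, not the farm's actual values.
import subprocess

PRIVATE_SUBNET = "192.168.1.0/24"   # hypothetical farm VLAN
PUBLIC_IFACE = "eth0"               # hypothetical uplink interface of the GW

def run(cmd):
    """Run a command and fail loudly if it returns non-zero."""
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

def enable_nat():
    # Let the kernel forward packets between interfaces
    run(["sysctl", "-w", "net.ipv4.ip_forward=1"])
    # Masquerade traffic from the private subnet leaving through the public interface
    run(["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", PRIVATE_SUBNET, "-o", PUBLIC_IFACE, "-j", "MASQUERADE"])

if __name__ == "__main__":
    enable_nat()
```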

Farm Configuration (II)

[Photo of the installation: Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor (32 channels), rack of 1U dual-processor motherboards]

Performance
- The system has been fully integrated in LHCb MC production since August 2001: 20 CPUs until December, 60 CPUs until last week, 100 CPUs now
- Produced mostly bb inclusive DST2 with the classic detector (SICBMC v234 and SICBDST v235r4, 1.5 M events) + some 100k channel data sets for LHCb light studies
- Typically about 20 hours are needed on a 1 GHz PIII for the full chain (minbias RAWH + bbincl RAWH + bbincl piled-up DST2) for 500 events
- The farm can therefore produce about (500 events/day) * (100 CPUs) = 50,000 events/day, i.e. 350,000 events/week, i.e. 1.4 TB/week (RAWH + DST2); see the back-of-the-envelope check below
- Data transfer to CASTOR at CERN is done with standard ftp (15 Mbit/s out of an available bandwidth of 100 Mbit/s), but tests with bbftp reached a very good throughput (70 Mbit/s); still waiting for IT to install a bbftp server at CERN
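A quick back-of-the-envelope check of the production and transfer figures quoted above. All inputs are taken from the slide; the per-event size and the transfer times are simply derived from the quoted weekly totals.

```python
# Back-of-the-envelope check of the production and transfer figures quoted above.
# All inputs are the numbers stated on the slide; derived quantities are marked.

events_per_cpu_per_day = 500          # full chain, ~20 h on a 1 GHz PIII
n_cpus = 100

events_per_day = events_per_cpu_per_day * n_cpus       # 50,000 events/day
events_per_week = events_per_day * 7                   # 350,000 events/week

weekly_volume_tb = 1.4                                  # RAWH + DST2, as quoted
mb_per_event = weekly_volume_tb * 1e6 / events_per_week # derived: ~4 MB/event

# Time to ship one week of data to CASTOR at the measured rates (derived)
weekly_volume_bits = weekly_volume_tb * 1e12 * 8
days_at_ftp = weekly_volume_bits / 15e6 / 86400          # ~8.6 days: plain ftp cannot keep up
days_at_bbftp = weekly_volume_bits / 70e6 / 86400        # ~1.9 days: bbftp comfortably can

print(f"{events_per_day} events/day, {events_per_week} events/week")
print(f"~{mb_per_event:.1f} MB/event, ftp: {days_at_ftp:.1f} days, bbftp: {days_at_bbftp:.1f} days per weekly volume")
```

The last two numbers are why the bbftp server at CERN matters: at 15 Mbit/s the weekly output cannot be shipped within a week, while at 70 Mbit/s it can.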

Scalability
- Production tests have been made in the last few days with 82 MC processes running in parallel, using the two NAS systems independently (instead of sharing the load between them)
- Each NAS worked at 20% of its full performance, i.e. each of them can be scaled up by much more than a factor of 2
- Distributing the load, we are confident this system can handle more than 200 CPUs working at the same time at 100% (i.e. without bottlenecks); a simple headroom estimate is sketched below
- For analysis we want to test other technologies: we plan to test a Fibre Channel network (SAN, Storage Area Network) on some of our machines, with a nominal 1 Gbit/s bandwidth to Fibre Channel disk arrays
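A rough headroom estimate from the figures above, assuming the 82 processes were split evenly between the two NAS units and that the NFS load grows roughly linearly with the number of processes (the same assumption behind the slide's conclusion).

```python
# Rough NAS headroom estimate implied by the test above. Assumptions: the 82
# processes were split evenly between the two NAS units, and load scales linearly.

processes_in_test = 82        # MC processes running in parallel
nas_units = 2
nas_utilisation = 0.20        # each NAS at 20% of its full performance

processes_per_nas = processes_in_test / nas_units       # 41 processes per NAS
capacity_per_nas = processes_per_nas / nas_utilisation   # ~205 processes per NAS at 100%
total_capacity = capacity_per_nas * nas_units            # ~410 processes in total

print(f"~{capacity_per_nas:.0f} processes per NAS, ~{total_capacity:.0f} in total")
# Even a single NAS could serve ~200 concurrent processes, consistent with the
# claim that more than 200 CPUs can run without I/O bottlenecks.
```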

Resources for the DC
- Additional resources from INFN-CNAF are foreseen for the DC period
- We will join the DC with of the order of CPUs (around 1 GHz or more), 5 TB of disk storage and a local tape storage system (CASTOR-like? Not yet officially decided)
- Some work is still needed to make the system fully redundant

Grid issues (A. Collamati)
- 2 nodes are reserved at the moment for tests of GRID middleware
- The two nodes form a mini-farm, i.e. they have exactly the same configuration as the production nodes (one master node and one slave node) and can run MC jobs as well
- Globus has been installed and the first trivial tests of job submission through PBS were successful (an illustrative sketch is given below)
- Next step: test job submission via Globus on a large scale, by extending the PBS queue of the Globus test farm to all our processing nodes, with no interference with the working distributed-production system
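As an illustration of the kind of trivial submission test described above, a minimal sketch using the Globus Toolkit command-line client against the PBS jobmanager. The gatekeeper contact string is hypothetical, and the exact jobmanager name depends on the local Globus installation.

```python
# Illustrative sketch of a trivial Globus-to-PBS submission test.
# The gatekeeper host is hypothetical; "jobmanager-pbs" is the conventional name
# for a PBS-backed Globus jobmanager, but the local configuration may differ.
import subprocess

GATEKEEPER = "gridtest.bo.infn.it/jobmanager-pbs"   # hypothetical contact string

def run_remote(executable, *args):
    """Run a simple executable on the test farm through the Globus PBS jobmanager."""
    cmd = ["globus-job-run", GATEKEEPER, executable, *args]
    print("+", " ".join(cmd))
    return subprocess.check_output(cmd, text=True)

if __name__ == "__main__":
    # End-to-end check: the output should be the hostname of a worker node,
    # proving that the job went through the gatekeeper and the PBS queue.
    print(run_remote("/bin/hostname"))
```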

Conclusions
- Bologna is ready to join the DC with a reasonable amount of resources
- Scalability tests were successful
- The farm configuration is quite stable
- We need the bbftp server installed at CERN to fully exploit WAN connectivity and throughput
- We are waiting for CERN's decision on the DC period before the final allocation of INFN-CNAF resources
- Work on GRID middleware has started, and the first results are encouraging
- We plan to install Brunel ASAP