Storage at TIER1 CNAF
Workshop Storage INFN CNAF, 20/21 March 2006, Bologna
Ricci Pier Paolo, on behalf of INFN TIER1 Staff


Contents
- Disk SAN hardware/software status and summary
- CASTOR and CASTOR2 hardware/software status and summary
- Tools for monitoring and accounting

Hardware Status
Disk: FC, IDE, SCSI and NAS technologies, 470 TB raw (~450 TB FC-SATA)
- 2005 tender: 200 TB raw (~2260 Euro/TB net + VAT)
- Additional 20% of the last tender acquisition requested
- Tender for 400 TB (not before Fall 2006)
Tape libraries:
- STK L180, ~10 TB native (only used for backups)
- STK L5500 with 6 LTO-2 drives and 1200 tapes => 240 TB
- 4 9940B drives (+3 to be installed in the next weeks) with tapes => 130 TB (260)
(1.5 kEuro/TB initial cost => 0.35 kEuro/TB pure tape cost)

TIER1 INFN CNAF Storage
[Overview diagram, flattened in the transcript. Recoverable elements: Linux SL 3.0 client nodes reach the storage over the WAN or the TIER1 LAN through NFS/RFIO/GridFTP diskservers (H.A. diskservers with Qlogic FC HBA 2340), the CASTOR HSM servers and a W2003 server running LEGATO Networker for backup. Tape: STK180 with 100 LTO-1 tapes (10 TB native) and STK L5500 robot (5500 slots) with 6 IBM LTO-2 and 4 STK 9940B drives; HSM capacity ~400 TB. SAN 1 (400 TB raw): Brocade fabric (2 Silkworm FC switches and 1 Director FC switch) serving IBM FastT900 (DS 4500), STK FlexLine 600 and STK BladeStore arrays (4 FC interfaces each) and Infortrend A16F-R1211-M2 (5 x 6400 GB SATA) + JBOD. SAN 2 (40 TB): 2 Gadzoox Slingshot FC switches with AXUS BROWIE (~2200 GB, 2 FC interfaces) and Infortrend A16F-R1A2-M1 (4 x 3200 GB SATA). NAS (20 TB): PROCOM 3600 FC NAS2 (7000 GB), a second PROCOM 3600 FC NAS, and the 3ware IDE units NAS1/NAS4.]

STK FlexLine 600 in production
The GPFS, dCache and CASTOR2 tests use 2 of the 4 FlexLine 600 arrays (the 1st and the 2nd), for a total of 100 TB raw.
The other 2 are in production (CDF, BABAR):
- unsupported IBM GPFS, in agreement with CDF
- Xrootd over an xfs filesystem for BABAR
We have asked INFN for the 6th/5th upgrade of the tender (48 TB raw and a fifth controller that could be expanded if needed). Investigation of a major failure (suspected loss of a 4 TB RAID5 array) is under way.
Hardware: 16 diskservers with dual Qlogic FC HBA 2340 (Sun Fire U20Z, dual Opteron 2.6 GHz, 4 x 1 GB DDR 400 MHz RAM, 2 x 73 GB 10K SCSI U320 in RAID1); Brocade Director FC switch (fully licensed) with 64 ports (out of 128) in 4 x 16-port blades; 4 FlexLine 600 with 200 TB raw (150 TB net), RAID5, with redundant 2 Gb/s connections to the switch.

DISK access
FC SAN, zoned in 50 TB units with 4 diskservers each; each diskserver has 2 x 2 Gb/s FC connections, attached to two different switches or blades.
Storage unit: ~50 TB IBM FastT900 (DS 4500) with dual redundant controllers (A, B), internal mini-hubs (1, 2) and 2 Gb/s FC connections. One controller can sustain at most 120-200 MB/s read-write.
Diskservers: generic 1U servers with 2 Qlogic 2300 HBAs, running Linux CERN SL 3.0. FC path failover HA is provided by Qlogic SANsurfer; 4 diskservers for every ~50 TB.
RAID5 logical disks are exported as LUNs (LUN0 => /dev/sda, LUN1 => /dev/sdb, ...).
Application HA: NFS server and rfio server with Red Hat Cluster AS 3.0 (tested but not actually used in production); GPFS with an NSD primary/secondary configuration, e.g. /dev/sda primary on diskserver 1 and secondary on diskserver 2, /dev/sdb primary on diskserver 2 and secondary on diskserver 1 (see the sketch below).
Access from the farms (rack-mountable 1U biprocessor nodes, currently about 1000 nodes for 1300 kSpecInt2000) goes over Gb Ethernet connections (nfs, rfio, xrootd, GPFS, GridFTP), towards the WAN or the TIER1 LAN.
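As an illustration of the primary/secondary NSD layout described above, a minimal disk-descriptor sketch for mmcrnsd follows. This is only a sketch: the exact descriptor syntax depends on the GPFS release in use at the time, and the server names, device names and failure groups are hypothetical, not taken from the actual CNAF configuration.

    # Hypothetical GPFS disk descriptors (old-style mmcrnsd format):
    # DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
    /dev/sda:diskserver1:diskserver2:dataAndMetadata:1
    /dev/sdb:diskserver2:diskserver1:dataAndMetadata:2

Running mmcrnsd -F on a file with such descriptors would create NSDs that are served primarily by one diskserver and taken over by the partner diskserver if it fails, which is the HA pattern the slide describes.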

TIER1 CNAF SAN Disk Storage
Consolidation of the primary SAN (400 TB raw) is concluded. The hardware is based on:
- Brocade switches: the SAN is one single fabric, managed with a single web management tool plus the Fabric Manager software for failure and performance monitoring.
  - 1 Director with 64 2 Gb/s ports (out of 128). The tender price was 1.1 kEuro/port; the price/port of Brocade lower-class switches would be at least 50% lower.
  - 2 Silkworm 3900 with 32 2 Gb/s ports each (currently not on the market), each connected with a 2 x 2 Gb/s trunked uplink.
- Qlogic QLA2340 HBAs: HA failover implemented with the SANsurfer configuration.
Disk storage:
- 4 x IBM FastT900 DS4500 (4 x 2 Gb/s output per box), 170 TB => 14 primary diskservers with a single HBA
- 4 x FlexLine 600 (4 x 2 Gb/s), 200 TB => 16 primary diskservers with double HBAs
- 5 x Infortrend A16F-R1211-M2 (2 x 2 Gb/s) + JBOD, 30 TB => 5 primary diskservers with a single HBA
About 6-12 TB raw per diskserver, depending on the filesystem/protocol, could be enough. Other diskservers (4-8) access the SAN storage for specific uses (grid SE, Oracle RAC, etc.).
Fibre Channel physical connections, failover and zoning are configured in the simplest way (a zoning sketch is shown below); in most cases traffic from the diskservers stays in the local switch, so uplink usage is minimized.
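A minimal sketch of what the simple zoning mentioned above could look like on the Brocade Fabric OS CLI, pairing one diskserver HBA with one array controller port. The alias names, the zone and configuration names and the WWPNs are placeholders, not the real CNAF objects; the assumption is that the fabric configuration already exists and only gains a new zone.

    alicreate "diskserver01_hba0", "21:00:00:e0:8b:xx:xx:xx"
    alicreate "fastt900_1_ctrlA", "20:04:00:a0:b8:xx:xx:xx"
    zonecreate "z_diskserver01_fastt900_1", "diskserver01_hba0; fastt900_1_ctrlA"
    cfgadd "TIER1_SAN_cfg", "z_diskserver01_fastt900_1"
    cfgenable "TIER1_SAN_cfg"

Keeping each zone to a single initiator and a single target port is the simplest scheme and also keeps diskserver traffic on the local switch, as the slide notes.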

SAN Monitoring & Web Tools
[Screenshots: the Fabric Manager software and the web management tool used for the single SAN.]

SAN Expansion
Scenario 1) Peripheral switches based on the Brocade Silkworm 4100 family (lower class than the Director), attached like the current Silkworm 3900s with 2 x 2 Gb/s trunked uplinks. Lower price/port compared to the Fabric switches (24000 or 48000); they are fully compatible but not fully redundant.
Scenario 2) Include a new SilkWorm Fabric Director (with an expansion capability of 256 4 Gb/s ports) in one of the next tenders and provide a DUAL FABRIC SAN (a logically/physically divided SAN) for the best redundancy. In the following years the SAN would then be expanded by filling up the 2 Directors and adding low-cost peripheral switches around these two central core Directors. The price/port of the Directors could be a factor of 2-3 higher than the Silkworm 4100 family.

Disk storage summary
- Main storage (IBM FastT900, STK FLX680) organized in one fabric Storage Area Network (3 Brocade switches, star topology).
- Level-1 diskservers connected via FC, usually in a GPFS cluster: ease of administration, load balancing and redundancy.
- Some level-2 diskservers connected to the storage only via GPFS (over IP), so the LCG and FC dependencies on the OS are decoupled. WNs are not members of the GPFS cluster (but scalability to a large number of WNs is currently under investigation); see the sketch below.
- Supported protocols: rfio, gridftp, xrootd (BaBar), NFS, AFS.
  - NFS is used mainly for accessing experiment software; it is strongly discouraged for data access.
  - AFS is used only by CDF for accessing experiment software.
- We had good experience with HA for diskserver services (Red Hat Cluster 2.1 and 3.0) but hit hardware compatibility problems (for fencing nodes). We plan to upgrade to and test the latest 3.0 release and evaluate 4.0.
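A minimal sketch of how a level-2 server (or, in the scalability tests, a WN) could be attached to the GPFS cluster as an IP-only client, without FC access to the NSDs. The node and filesystem names are hypothetical and the command options are indicative of later GPFS releases, not necessarily those of the version in use at the time.

    # run on a cluster manager node; names are placeholders
    mmaddnode -N level2-server01            # add the node to the GPFS cluster
    mmstartup -N level2-server01            # start GPFS on the new node
    mmmount gpfs_data -N level2-server01    # mount the filesystem; I/O flows over IP to the NSD servers

The point of the sketch is that such a node never touches the SAN directly: all its reads and writes are forwarded over Ethernet to the level-1 NSD servers.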

Castor HSM Status
At present our CASTOR system consists of:
- 1 STK L5500 silo, partitioned for the 2 cartridge form factors: about 2000 LTO-2 slots and about 3500 9940B slots
- 6 LTO-2 drives with 2 Gb/s FC interface
- 4 9940B drives with 2 Gb/s FC interface (3 more in installation for the Service Challenge requirement)
- Sun Blade V100 with 2 internal IDE disks in software RAID-0, running ACSLS
- LTO-2 tapes: 240 TB
- 9940B tapes: 250 TB
The silo cannot host the next-generation T10000 drives (500 GB).

Castor Status (2)
10 tapeservers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA, running the STK CSC Development Toolkit provided by CERN (under a licence agreement with STK): ssi, tpdaemon and rtcpd.
The tapeservers are connected directly to the FC output of the drives:
DRIVE LTO-2 0,0,10,0 -> tapesrv-0.cnaf.infn.it
DRIVE LTO-2 0,0,10,1 -> tapesrv-1.cnaf.infn.it
DRIVE LTO-2 0,0,10,2 -> tapesrv-2.cnaf.infn.it
DRIVE LTO-2 0,0,10,3 -> tapesrv-3.cnaf.infn.it
DRIVE LTO-2 0,0,10,4 -> tapesrv-4.cnaf.infn.it
DRIVE LTO-2 0,0,10,5 -> tapesrv-5.cnaf.infn.it
DRIVE 9940B 0,0,10,6 -> tapesrv-6.cnaf.infn.it
DRIVE 9940B 0,0,10,7 -> tapesrv-7.cnaf.infn.it
DRIVE 9940B 0,0,10,8 -> tapesrv-7.cnaf.infn.it
DRIVE 9940B 0,0,10,9 -> tapesrv-7.cnaf.infn.it

Castor Status (3)
castor.cnaf.infn.it (central machine): 1 IBM x345 2U machine, 2 x 3 GHz Intel Xeon, RAID1 disks, double power supply, O.S. Red Hat A.S. 3.0. It runs all the central CASTOR services (Nsdaemon, vmgrdaemon, Cupvdaemon, vdqmdaemon, msgdaemon) and the ORACLE client for the central database.
castor-4.cnaf.infn.it (ORACLE machine): 1 IBM x345, O.S. Red Hat A.S. 3.0, running ORACLE DATABASE 9i rel 2.
One more x345 machine is on standby; it stores all the backup information of the ORACLE db (.exp, .dbf) and can replace the above machines if needed. HA on the central-services or ORACLE machine is not yet implemented (not needed so far).
castor-1.cnaf.infn.it (monitoring machine): 1 DELL 1650, R.H. 7.2, running the CASTOR monitoring service (Cmon daemon) and the NAGIOS central service for monitoring and notification. It also hosts the rtstat and tpstat commands, which are usually run with the -S option against the tapeservers.

Castor Status (4)
Stagers with diskserver: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA accessing our SAN and running Cdbdaemon, stgdaemon and rfiod. There is 1 stager for each LHC experiment plus 2 generic stagers:
disksrv-1.cnaf.infn.it  ATLAS stager with 2 TB directly connected
disksrv-2.cnaf.infn.it  CMS stager with 3.2 TB directly connected
disksrv-3.cnaf.infn.it  LHCb stager with 3.2 TB directly connected
disksrv-4.cnaf.infn.it  ALICE stager with 3.2 TB directly connected
disksrv-5.cnaf.infn.it  TEST, PAMELA, ARGO stager
disksrv-6.cnaf.infn.it  stager with 2 TB local (archive purposes: LVD, ALICE TOF, CDF, VIRGO, AMS, BABAR and other HEP experiments)
Diskservers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA accessing our SAN and running rfiod.

Castor Status
- CASTOR-2 has been fully installed and is currently under test.
- CASTOR-1 (production) and CASTOR-2 (test) can share the same resources and are currently "living together" in our software implementation.
- The SC will run on the CASTOR-2 environment, but production activities will remain on the CASTOR-1 services. After the SC, CASTOR-2 will be used in production and the CASTOR-1 services will be dismissed.
- The LTO-2 drives are not usable in a real production environment with the present CASTOR release. They are used only for archiving copies of disk data, or with a large staging (disk buffer) area that reduces tape access almost to zero.
- In 1.5 years of activity, using the 9940B drives has drastically reduced the error rate (only 1-3% of 9940 tapes marked RDONLY due to SCSI errors) and hang problems are negligible. The hardware problems were solved by the 9940B technology drives; 3 more will be installed before the next SC phase (total of 7 9940B drives and 6 LTO-2).
- The CASTOR development team at CERN is currently suffering a critical lack of manpower. Support is granted ONLY for Tier1s; Tier2s won't be considered.

Castor Status
[Layout diagram, flattened in the transcript. Recoverable elements: STK L5500 with 6 LTO-2 drives (20-30 MB/s) and 4 9940B drives (25-30 MB/s); 1300 LTO-2 tapes (200 GB native) plus 9940B tapes (200 GB native); total capacity with 200 GB cartridges: 250 TB LTO-2 (400 TB max) and 260 TB 9940B (700 TB max). Sun Blade V100 with 2 internal IDE disks in software RAID-1 running ACSLS 7.0 on Solaris. 1 CASTOR (CERN) central-services server (RH AS 3.0), 10 tapeservers (Linux RH AS 3.0, Qlogic HBA), 1 stager with diskserver and 15 TB of local staging area, 1 ORACLE 9i rel 2 DB server (RH AS 3.0), rfio diskservers (RH 3.0) with variable staging area. The drives use point-to-point 2 Gb/s FC connections; fully redundant 2 Gb/s FC connections (dual-controller hardware and Qlogic SANsurfer path failover software) link the diskservers to SAN 1 and SAN 2 and the servers to the WAN or TIER1 LAN.]
Experiment      Staging area (TB)   Tape pool (TB)          % RDONLY
ALICE           9                   12 (LTO-2)              8%
ATLAS           20                  37 (9940) + 8 (LTO-2)   2% / 30%
CMS             12                  22 (9940)               0%
LHCb            18                  43 (LTO-2)              10%
BABAR (copy)    8                   20 (LTO-2)              2%
CDF (copy)      2                   9 (LTO-2)               5%
AMS             3                   5 (9940)                0
ARGO+oth        2                   8 (9940)                1%

Castor Status
Storage Element front-ends for CASTOR (see the zone fragment sketched below):
- castorgrid.cr.cnaf.infn.it (DNS alias load balanced over 4 machines for WAN gridftp)
- sc.cr.cnaf.infn.it (DNS alias load balanced over 8 machines for SC WAN gridftp, with a dedicated link)
SRM1 is installed and in production on the above machines.
CASTOR2 stager installation (NOT YET IN PRODUCTION):
- castor-6 (HW HA): stager + request handler + MigHunter + rtcpclientd
- oracle01 (HW HA): stager DB
- castorlsf01 (HW HA): LSF master
- diskserv-san-13: DLF + DLF DB + rmmaster + expert
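As an illustration of the DNS-alias approach above, a minimal BIND zone fragment for a round-robin alias is sketched here. The record layout (several A records under one name, with a short TTL so that changes propagate quickly) is only indicative; the IP addresses are placeholders, not the real CNAF ones.

    ; hypothetical fragment of the cr.cnaf.infn.it zone (placeholder addresses)
    castorgrid   60   IN   A   192.0.2.11
    castorgrid   60   IN   A   192.0.2.12
    castorgrid   60   IN   A   192.0.2.13
    castorgrid   60   IN   A   192.0.2.14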

CASTOR Grid SE
GridFTP access goes through the castorgrid SE, a DNS name pointing to 4 servers, with DNS round-robin for load balancing.
During LCG Service Challenge 2 a load-average-based selection was also introduced: every M minutes the IP of the most loaded server is replaced in the alias (see graph). A sketch of this logic follows.
This method worked well; it is still used in production and will be used in the next SC phases.
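A minimal sketch of the kind of logic described above, assuming the per-server load averages are available from the monitoring system and that the alias is rebuilt through a dynamic DNS update or a regenerated zone file. The hostnames, the load source and the DNS update hook are hypothetical stubs, not the actual CNAF tool.

    #!/usr/bin/env python
    # Hypothetical sketch: drop the most loaded gridftp server from the castorgrid alias.
    # get_load_average() and publish_alias() stand in for the real monitoring and DNS update steps.

    SERVERS = {                      # placeholder hostnames -> placeholder IPs
        "gridftp-1": "192.0.2.11",
        "gridftp-2": "192.0.2.12",
        "gridftp-3": "192.0.2.13",
        "gridftp-4": "192.0.2.14",
    }

    def get_load_average(host):
        """Return the load average of 'host' (stub: would query the monitoring system)."""
        raise NotImplementedError

    def publish_alias(ips):
        """Rewrite the A records of the alias (stub: nsupdate or zone regeneration)."""
        raise NotImplementedError

    def refresh_alias():
        loads = {h: get_load_average(h) for h in SERVERS}
        busiest = max(loads, key=loads.get)               # pick the most loaded server
        active = [ip for h, ip in SERVERS.items() if h != busiest]
        publish_alias(active)                             # alias now resolves only to the other servers

    # refresh_alias() would be run every M minutes, e.g. from cron.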

Monitoring (Nagios)
[Monitoring plots: LHCb CASTOR tape pool occupancy, number of processes on a CMS disk SE, eth0 traffic through a CASTOR LCG SE.]
Other parameters such as overall I/O performance, status of the RAID systems and space occupation on disks are also constantly monitored (see the sketch below).
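A minimal sketch of how one of these checks could be declared as a Nagios object. The host name, the generic-service template and the remote check_raid plugin invoked through NRPE are assumptions for illustration, not the actual CNAF configuration.

    # hypothetical Nagios service definition for a diskserver RAID check via NRPE
    define service{
        use                  generic-service        ; assumed local service template
        host_name            diskserv-san-13        ; placeholder host
        service_description  RAID status
        check_command        check_nrpe!check_raid  ; check_raid is an assumed NRPE-side plugin
        }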

Disk Accounting
[Accounting plots: pure disk space (TB) and CASTOR disk space (TB).]