Presentation transcript: "INFN-T1 site report", Andrea Chierici, on behalf of INFN-T1 staff, 28th October 2009

1 INFN-T1 site report. Andrea Chierici, on behalf of INFN-T1 staff. 28th October 2009

2 Overview: Infrastructure, Network, Farming, Storage

3 Infrastructure

4 Infrastructure evolution, INFN-T1 2005 vs INFN-T1 2009:
- Racks: 40 vs 120
- Power source: University vs directly from supplier (15 kV)
- Power transformers: 1 (~1 MVA) vs 3 (~2.5 MVA)
- UPS: 1 diesel engine/UPS (~640 kVA) vs 2 rotary UPS (~3400 kVA) + 1 diesel engine (~640 kVA)
- Chillers: 1 (~530 kVA) vs 7 (~2740 kVA)

5 Power and cooling layout (diagram): 15000 V feed, UPS capacity up to 3.8 MW, chillers; power figures of 1.4 MW, 1.2 MW and 1 MW shown for the individual blocks.

6 Mechanical and electrical surveillance

7 Network

8 INFN CNAF Tier-1 network (diagram). WAN: GARR connectivity through a Cisco 7600 and a Cisco Nexus 7000, with a 10 Gb/s dedicated LHC-OPN link for T0-T1 (CERN) and T1-T1 (PIC, RAL, TRIUMF) traffic, general-purpose 2x10 Gb/s connectivity for the other T1-T1s (BNL, FNAL, TW-ASGC, NDGF) and for T1-T2 traffic, and T0-T1 LHC-OPN backup paths via CNAF-KIT, CNAF-IN2P3 and CNAF-SARA. LAN: Extreme BD10808 and BD8810 core switches interconnected at 4x10 Gb/s; worker nodes attached to Extreme Summit 450/400 switches at 2x1 Gb/s per node with 4x1 Gb/s uplinks (to be upgraded to 10 Gb/s or 2x10 Gb/s in case of network congestion); storage servers, disk servers and CASTOR stagers reach the Fibre Channel storage devices over the SAN.

9 Farming

10 New tender: 1U twin solution with these specs:
- 2x Intel Nehalem E5520 @ 2.26 GHz
- 24 GB RAM
- 2x 320 GB SATA HD @ 7200 rpm
- 2x 1 Gbps Ethernet
118 twins, reaching 20500 HEP-SPEC, measured on SLC44. Delivery and installation foreseen within 2009.
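As a rough consistency check of the quoted figures (a minimal sketch: the per-node number is derived here, not stated on the slide, and assumes one twin unit hosts two independent nodes):

```python
# Back-of-the-envelope check of the new-tender figures from the slide.
# Assumption (not stated explicitly): each 1U "twin" hosts 2 independent nodes.
twin_units = 118
nodes = twin_units * 2                 # 236 dual-socket E5520 nodes
total_hep_spec = 20500                 # figure quoted on the slide
per_node = total_hep_spec / nodes      # ~86.9 HEP-SPEC per node
print(f"{nodes} nodes, ~{per_node:.1f} HEP-SPEC per node")
```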

11 Computing resources: Including the machines from the new tender, INFN-T1 computing power will reach 42000 HEP-SPEC within 2009. A further increase within January 2010 will bring us to 46000 HEP-SPEC. Within May 2010 we will reach 68000 HEP-SPEC (as pledged to WLCG); this will basically triple the current computing power.
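A quick check that the "triple" claim is consistent with the other numbers on the slide (a sketch; the current capacity is derived here, not stated):

```python
# Rough consistency check of the "triple current capacity" claim.
# Assumption: current capacity = 2009 target minus the new tender's contribution.
target_2009 = 42000          # HEP-SPEC, including the new tender
new_tender = 20500           # HEP-SPEC from the 118 twins
target_mid_2010 = 68000      # HEP-SPEC pledged to WLCG for May 2010
current = target_2009 - new_tender
print(f"current ~{current} HEP-SPEC, growth factor ~{target_mid_2010 / current:.1f}x")
```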

12 Resource usage per VO (plot)

13 KSI2K pledged vs. used (plot)

14 New accounting system:
- Grid, local and overall job visualization
- Tier1/Tier2 separation
- Several parameters monitored (avg and max RSS, avg and max VMem added in the latest release)
- KSI2K/HEP-SPEC accounting
- WNoD accounting
Available at: http://tier1.cnaf.infn.it/monitor
Feedback welcome to: farming@cnaf.infn.it
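To make the monitored quantities concrete, here is a minimal aggregation sketch. The record format and values are hypothetical, not the actual CNAF accounting schema; it only illustrates the kind of per-VO roll-up (job counts, memory figures, HEP-SPEC-weighted wall time) the monitor exposes.

```python
# Hypothetical per-job records: VO, wall time [s], avg/max RSS [MB], HEP-SPEC of the slot.
from collections import defaultdict

jobs = [
    {"vo": "cms",   "walltime": 36000, "avg_rss": 1200, "max_rss": 1900, "hepspec": 8.7},
    {"vo": "atlas", "walltime": 28800, "avg_rss":  900, "max_rss": 1500, "hepspec": 8.7},
]

summary = defaultdict(lambda: {"jobs": 0, "hs_hours": 0.0, "max_rss": 0})
for j in jobs:
    s = summary[j["vo"]]
    s["jobs"] += 1
    s["hs_hours"] += j["hepspec"] * j["walltime"] / 3600.0   # HEP-SPEC x hours
    s["max_rss"] = max(s["max_rss"], j["max_rss"])

for vo, s in summary.items():
    print(f'{vo}: {s["jobs"]} jobs, {s["hs_hours"]:.0f} HEP-SPEC-hours, max RSS {s["max_rss"]} MB')
```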

15 New accounting: sample picture (screenshot)

16 GPU Computing (1): We are investigating GPU computing.
- NVIDIA Tesla C1060, used for porting software and performing comparison tests
- Meeting with Bill Dally (chief scientist and vice president of NVIDIA): https://agenda.cnaf.infn.it/conferenceDisplay.py?confId=266

17 GPU Computing (2): Applications currently tested:
- Bioinformatics: CUDA-based paralog filtering in Expressed Sequence Tag clusters
- Physics: implementation of a second-order electromagnetic particle-in-cell code on the CUDA architecture
- Physics: spin-glass Monte Carlo simulations
The first two applications showed more than a 10x increase in performance.
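For readers unfamiliar with the third application area, here is an illustrative CPU baseline of a spin-glass Monte Carlo update. This is a minimal sketch, not the code used at CNAF: a 2D Edwards-Anderson model with Metropolis updates, the kind of kernel that a CUDA port would parallelize over many spins and replicas.

```python
# 2D Edwards-Anderson spin glass with Metropolis updates (CPU reference sketch).
import numpy as np

rng = np.random.default_rng(0)
L, T, sweeps = 16, 1.5, 50
spins = rng.choice([-1, 1], size=(L, L))
Jh = rng.choice([-1, 1], size=(L, L))   # coupling between (i, j) and (i, j+1)
Jv = rng.choice([-1, 1], size=(L, L))   # coupling between (i, j) and (i+1, j)

def local_field(i, j):
    return (Jh[i, j] * spins[i, (j + 1) % L] +
            Jh[i, (j - 1) % L] * spins[i, (j - 1) % L] +
            Jv[i, j] * spins[(i + 1) % L, j] +
            Jv[(i - 1) % L, j] * spins[(i - 1) % L, j])

def energy():
    e = 0
    for i in range(L):
        for j in range(L):
            e -= spins[i, j] * (Jh[i, j] * spins[i, (j + 1) % L] +
                                Jv[i, j] * spins[(i + 1) % L, j])
    return e

for _ in range(sweeps):
    for _ in range(L * L):               # one Metropolis sweep
        i, j = rng.integers(L), rng.integers(L)
        dE = 2 * spins[i, j] * local_field(i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[i, j] *= -1

print("energy per spin:", energy() / (L * L))
```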

18 GPU Computing (3): We plan to buy 2 more workstations in 2010, with 2 GPUs each.
- We are waiting for the Fermi architecture, foreseen for spring 2010
We will continue the activities currently ongoing and will probably test some Monte Carlo simulations for SuperB. We also plan to test selection and shared usage of GPUs via grid.

19 Storage

20 2009-2010 tenders
Disk tender requested:
- Baseline: 3.3 PB raw (~2.7 PB-N)
- 1st option: 2.35 PB raw (~1.9 PB-N)
- 2nd option: 2 PB raw (~1.6 PB-N)
- Options to be requested during Q2 and Q3 2010
- New disk in production around the end of Q1 2010
4000 tapes (~4 PB) acquired with the library tender:
- 4.9 PB needed at the beginning of 2010
- 7.7 PB probably needed by mid-2010
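A small check of the raw-to-net ratio implied by the slide's own numbers (assumption: "PB-N" denotes net usable petabytes after RAID and filesystem overhead):

```python
# Raw vs. net capacity for the three disk-tender configurations quoted above.
offers = {"baseline": (3.3, 2.7), "1st option": (2.35, 1.9), "2nd option": (2.0, 1.6)}
for name, (raw, net) in offers.items():
    print(f"{name}: {net / raw:.0%} of raw capacity usable")   # ~80-82% in all cases
```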

21 Castor@INFN-T1
- To be upgraded to 2.1.7-27
- 1 SRM v2.2 end-point available (supported protocols: rfio, gridftp)
- Still cumbersome to manage: requires frequent interventions in the Oracle DB; lack of management tools
- CMS migrated to StoRM for D0T1

22 WLCG Storage Classes at INFN-T1 today
Storage classes offer different levels of storage quality (e.g. copy on disk and/or on tape); DnTm = n copies on disk and m copies on tape. Implementation of 3 storage classes is needed for WLCG (but usable also by non-LHC experiments):
- Disk0-Tape1 (D0T1), "custodial nearline": data are migrated to tape and deleted from disk when the staging area is full; space is managed by the system; disk is only a temporary buffer.
- Disk1-Tape0 (D1T0), "replica online": data are kept on disk, no tape copy; space is managed by the VO.
- Disk1-Tape1 (D1T1), "custodial online": data are kept on disk AND one copy is kept on tape; space is managed by the VO (i.e. if the disk is full, the copy fails).
Implementations currently in use: CASTOR, and GPFS/TSM + StoRM.
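The DnTm semantics can be summarised in a small data structure. This is an illustrative encoding for clarity only, not an interface of StoRM or CASTOR:

```python
# Minimal encoding of the three WLCG storage classes described above.
from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str             # DnTm label
    disk_copies: int      # n: copies kept on disk
    tape_copies: int      # m: copies kept on tape
    space_managed_by: str

CLASSES = [
    StorageClass("D0T1 (custodial nearline)", 0, 1, "system"),  # disk is a temporary buffer
    StorageClass("D1T0 (replica online)",     1, 0, "VO"),
    StorageClass("D1T1 (custodial online)",   1, 1, "VO"),      # copy fails if disk is full
]

for c in CLASSES:
    print(f"{c.name}: disk={c.disk_copies}, tape={c.tape_copies}, space managed by {c.space_managed_by}")
```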

23 YAMSS: present status
Yet Another Mass Storage System: a scripting and configuration layer to interface GPFS and TSM.
- Can work driven by StoRM or stand-alone; experiments not using the SRM model can work with it
- GPFS-TSM (no StoRM) interface ready: full support for migrations and tape-ordered recalls
- StoRM: in production at INFN-T1 and in other centres around the world for "pure" disk access (i.e. no tape); integration with YAMSS for migrations and tape-ordered recalls is ongoing (almost completed)
- Bulk migrations and recalls tested with a typical use case (stand-alone YAMSS, without StoRM): the weekly production workflow of the CMS experiment
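To show what a "scripting layer over GPFS and TSM" can look like, here is a hypothetical sketch, not the actual YAMSS code. It assumes the TSM for Space Management client (dsmmigrate/dsmrecall) is installed on the HSM nodes; the file paths are placeholders.

```python
# Hypothetical thin wrapper around the TSM HSM client, in the spirit of YAMSS.
import subprocess

def migrate(files):
    """Push a batch of disk-resident GPFS files to tape via the TSM HSM client."""
    subprocess.run(["dsmmigrate"] + files, check=True)

def recall(files, tape_order=None):
    """Recall files from tape; if a tape ordering is known (volume, position),
    follow it to minimise mounts and seeks (tape-ordered recall)."""
    ordered = sorted(files, key=tape_order) if tape_order else files
    for f in ordered:
        subprocess.run(["dsmrecall", f], check=True)

if __name__ == "__main__":
    migrate(["/gpfs/cms/store/file1.root", "/gpfs/cms/store/file2.root"])  # placeholder paths
```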

24 Why GPFS & TSM
Tivoli Storage Manager (developed by IBM) is a tape-oriented storage manager widely used, also in the HEP world (e.g. FZK).
- Built-in functionality is present in both products to implement backup and archiving from GPFS.
- The development of an HSM solution is based on the combination of features of GPFS (since v3.2) and TSM (since v5.5).
- Since GPFS v3.2, the new concept of "external storage pool" extends policy-driven Information Lifecycle Management (ILM) to tape storage. External pools are real interfaces to external storage managers, e.g. HPSS or TSM.
- HPSS is very complex (no benefits in this sense compared to CASTOR).
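As an illustration of the external-pool mechanism, the sketch below writes a small ILM policy and asks GPFS to apply it. The rule syntax follows the GPFS 3.2 ILM policy language as I understand it, but the interface script path, device name and thresholds are made-up placeholders, not the CNAF configuration.

```python
# Illustrative only: a GPFS policy that hands migration candidates to an
# external storage manager (e.g. the TSM interface script) via an external pool.
import subprocess, tempfile

POLICY = """
RULE EXTERNAL POOL 'hsm' EXEC '/usr/local/yamss/bin/hsm_interface'
RULE 'to_tape' MIGRATE FROM POOL 'system' THRESHOLD(90,80) TO POOL 'hsm'
"""

with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
    f.write(POLICY)
    policy_file = f.name

# mmapplypolicy scans the file system and invokes the external-pool script
# with the list of selected files ("/dev/gpfs_cms" is a placeholder device).
subprocess.run(["mmapplypolicy", "/dev/gpfs_cms", "-P", policy_file], check=True)
```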

25 YAMSS: hardware set-up (diagram): ~500 TB for GPFS on a CX4-960 (20x4 Gbps to the SAN); 6 NSD servers (6x2 Gbps on the LAN); 4 GridFTP servers (4x2 Gbps); 3 HSM/STA nodes; a TSM server with its database (4 Gbps FC); 8 T10KB tape drives (1 TB per tape, 1 Gbps per drive); SAN and TAN interconnects (8x4 Gbps and 3x4 Gbps links shown in the figure).

26 YAMSS: validation tests
- Concurrent access in r/w to the MSS, both for transfers and from the farm
- StoRM not used in these tests
- 3 HSM nodes serving 8 T10KB drives: up to 6 drives used for recalls, up to 2 drives used for migrations
- On the order of 1 GB/s of aggregated traffic: ~550 MB/s from tape to disk, ~100 MB/s from disk to tape, ~400 MB/s from disk to the computing nodes (not shown in the graph)
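A trivial sum of the per-stream rates quoted above, just to confirm they add up to the "order of 1 GB/s" aggregate figure:

```python
# Sum of the three concurrent streams measured in the validation test.
rates_mb_s = {"tape -> disk": 550, "disk -> tape": 100, "disk -> farm": 400}
total = sum(rates_mb_s.values())
print(f"aggregate ~{total} MB/s (~{total / 1000:.1f} GB/s)")
```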

27 Questions?

