Tier 1. Luca dell’Agnello, INFN – CNAF, Bologna. Workshop CCR, Paestum, 9-12 June 2003.


1 Tier 1
Luca dell’Agnello, INFN – CNAF, Bologna
Workshop CCR, Paestum, 9-12 June 2003

2 INFN – Tier1
INFN computing facility for the HNEP community
  - Location: INFN-CNAF, Bologna (Italy)
      o One of the main nodes on the GARR network
  - Ending prototype phase this year
  - Fully operational next year
  - Personnel: ~10 FTEs
Multi-experiment
  - LHC experiments, Virgo, CDF
  - BABAR (3rd quarter 2003)
  - Resources dynamically assigned to experiments according to their needs
Main (~50%) Italian resource for LCG
  - Coordination with Tier0 and other Tier1s (management, security, etc.)
  - Coordination with Italian Tier2s and Tier3s
  - Participation in grid test-beds (EDG, EDT, GLUE)
  - Participation in CMS, ATLAS, LHCb, Alice data challenges
  - GOC (deployment in progress)

3 Networking
CNAF interconnected to GARR-B backbone at 1 Gbps
  - Giga-PoP co-located
  - GARR-B backbone at 2.5 Gbps
LAN: star topology
  - Computing elements connected via FE to rack switch
      o 3 Extreme Summit, 48 FE + 2 GE ports
      o 3 Cisco 3550, 48 FE + 2 GE ports
      o 8 Enterasys, 48 FE + 2 GE ports
  - Servers connected to GE switch
      o 1 3Com L2, 24 GE ports
  - Uplink via GE to core switch
      o Extreme 7i with 32 GE ports
      o Enterasys ER16 Gigabit switch router
  - Disk servers connected via GE to core switch

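4 LAN TIER1
[Network diagram; labels recovered in reading order]
Switches marked (*) have VLAN tagging enabled: FarmSW1 (*), FarmSW2 (*), FarmSWG1 (*), FarmSW3 (*), Switch-lanCNAF (*), LHCBSW1 (*)
Other devices: SSR2000, Catalyst6500, Fcds1, Fcds2, Fcds3
Storage: 8 TB F.C., 2 TB SCSI, NAS2 (131.154.99.192), NAS3 (131.154.99.193)
Links: LAN CNAF at 1 Gbps, GARR link at 1 Gbps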

5 VLAN Tagging
Define VLANs across switches
  - Independent of switch brand (standard 802.1q)
Adopted solution for complete granularity
  - Each switch port is associated with one VLAN identifier
  - Each rack switch uplink propagates VLAN information
  - VLAN identifiers are propagated across switches
  - Each farm has its own VLAN
  - Avoids recabling (or physically moving hardware) to change the topology
Level 2 isolation of farms
  - Aid for enforcement of security measures
Possible to define multi-tag ports (for servers)
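As an illustration of what "VLAN information" means on the wire, the following minimal sketch builds and parses the 4-byte 802.1Q tag that a tagging switch inserts into an Ethernet frame. The tag layout (TPID 0x8100, then 3 bits of priority, 1 DEI bit and a 12-bit VLAN ID) comes from the 802.1Q standard; the farm names and VLAN numbers in `FARM_VLANS` are hypothetical and are not taken from the talk.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_vlan_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag: TPID (16 bits) + TCI (PCP 3, DEI 1, VID 12)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094 (0 and 4095 are reserved)")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_id(tag: bytes) -> int:
    """Extract the 12-bit VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


# Hypothetical farm-to-VLAN assignment, one VLAN per farm.
FARM_VLANS = {"cms": 101, "atlas": 102, "lhcb": 103}

tag = build_vlan_tag(FARM_VLANS["cms"])
print(tag.hex())           # 81000065
print(parse_vlan_id(tag))  # 101
```

Because the VLAN ID travels inside each frame, an uplink between switches can carry traffic for several farms at once, which is what lets a server change farm without being recabled.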

6 Computing units (1)
160 1U rack-mountable Intel dual-processor servers
  - 800 MHz – 2.2 GHz
160 1U dual-processor Pentium IV 2.4 GHz servers to be shipped this month
1 switch per rack
  - 48 FastEthernet ports
  - 2 Gigabit uplinks
  - Interconnected to core switch via 2 pairs of optical fibers
      o Also 4 UTP cables available
1 network power control per rack
  - 380 V three-phase power as input
  - Outputs 3 independent 220 V lines
  - Completely programmable (permits gradual switching on of servers)
  - Remotely manageable via web
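The rack switch figures above imply an uplink oversubscription ratio, which the short calculation below works out. It is a sketch that assumes all 48 FastEthernet ports could transmit at line rate at once and that both Gigabit uplinks are active and share the load; the talk does not say how the uplinks are actually used.

```python
FE_MBPS = 100    # FastEthernet line rate
GE_MBPS = 1000   # Gigabit Ethernet line rate


def oversubscription(access_ports: int, access_mbps: int,
                     uplink_ports: int, uplink_mbps: int) -> float:
    """Ratio of total access bandwidth to total uplink bandwidth."""
    return (access_ports * access_mbps) / (uplink_ports * uplink_mbps)


ratio = oversubscription(48, FE_MBPS, 2, GE_MBPS)
print(f"{ratio:.1f}:1")  # 2.4:1
```

With a single active uplink the same rack would be oversubscribed 4.8:1.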

7 Computing units (2)
OS: Linux RedHat (6.2, 7.2, 7.3, 7.3.2)
  - Experiment specific library software
  - Goal: have generic computing units
      o Experiment specific library software in standard position (e.g. /opt/cms)
Centralized installation system
  - LCFG (EDG WP4)
  - Integration with central Tier1 db (see below)
  - Each farm on a distinct VLAN
      o When moved from one farm to another, a server changes IP address (not name)
  - Single DHCP server for all VLANs
  - Support for DDNS (cr.cnaf.infn.it) in progress
Queue manager: PBS
  - “Pro” version not obtainable (only for edu)
  - Free version not flexible enough
  - Tests of integration with MAUI in progress

8 Tier1 Database
Resource database and management interface
  - Hw server characteristics
  - Sw server configuration
  - Server allocation
  - Postgres database as back end
  - Web interface (apache+mod_ssl+php)
Possible direct access to db for some applications
  - Monitoring system
  - nagios
Interface to configure switches and interoperate with LCFG
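To make the idea of a resource database concrete, here is a minimal sketch of how server characteristics and farm allocation could be modelled. The talk does not show the real schema, so every table, column, hostname and VLAN number below is hypothetical, and Python's built-in `sqlite3` stands in for the Postgres back end so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres back end (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE farm (
    id      INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL,
    vlan_id INTEGER UNIQUE NOT NULL   -- one VLAN per farm
);
CREATE TABLE server (
    id       INTEGER PRIMARY KEY,
    hostname TEXT UNIQUE NOT NULL,
    cpu_mhz  INTEGER,                 -- hardware characteristics
    os       TEXT,                    -- software configuration
    farm_id  INTEGER REFERENCES farm(id)
);
""")
conn.executemany("INSERT INTO farm (name, vlan_id) VALUES (?, ?)",
                 [("cms", 101), ("lhcb", 103)])
conn.execute(
    "INSERT INTO server (hostname, cpu_mhz, os, farm_id) "
    "VALUES (?, ?, ?, (SELECT id FROM farm WHERE name = ?))",
    ("wn-001", 2400, "RedHat 7.3", "cms"),
)


def reallocate(db: sqlite3.Connection, hostname: str, farm_name: str) -> None:
    """Move a server to another farm; the hostname stays the same."""
    farm = db.execute("SELECT id FROM farm WHERE name = ?", (farm_name,)).fetchone()
    if farm is None:
        raise LookupError(f"unknown farm: {farm_name}")
    cur = db.execute("UPDATE server SET farm_id = ? WHERE hostname = ?",
                     (farm[0], hostname))
    if cur.rowcount != 1:
        raise LookupError(f"unknown server: {hostname}")
    db.commit()


def vlan_of(db: sqlite3.Connection, hostname: str):
    """Return the VLAN a server currently belongs to, or None."""
    row = db.execute(
        "SELECT farm.vlan_id FROM server JOIN farm ON farm.id = server.farm_id "
        "WHERE server.hostname = ?", (hostname,)).fetchone()
    return row[0] if row else None


reallocate(conn, "wn-001", "lhcb")
print(vlan_of(conn, "wn-001"))  # 103
```

A switch-configuration or installation tool could read the same tables to decide which VLAN a port should carry after a reallocation.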

9 Monitoring/Alarms
Monitoring system developed at CNAF
  - Socket server on each computer
  - Centralized collector
  - ~100 variables collected every 5 minutes
      o Data archived on flat file
          – In progress: XML structure for data archives
  - User interface: http://tier1.cnaf.infn.it/monitor/
      o Next release: JAVA interface (collaboration with D. Galli, LHCb)
Critical parameters periodically checked by nagios
  - Connectivity (i.e. ping), system load, bandwidth use, ssh daemon, pbs, etc.
  - User interface: http://tier1.cnaf.infn.it/nagios/
  - In progress: configuration interface
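The "socket server on each computer plus centralized collector" pattern can be sketched in a few lines of Python. This is not the CNAF code: the wire format (one line of `key=value` pairs), the variable names and the placeholder values are all assumptions made for illustration, and the agent and the collector run in one process on localhost so that the example is self-contained.

```python
import socket
import socketserver
import threading
import time


def read_metrics() -> dict:
    """Placeholder values; a real agent would report ~100 variables."""
    return {"timestamp": int(time.time()), "load1": 0.42, "disk_free_mb": 1024}


class MetricsHandler(socketserver.StreamRequestHandler):
    """Agent side: answer every connection with one line of key=value pairs."""

    def handle(self):
        line = " ".join(f"{key}={value}" for key, value in read_metrics().items())
        self.wfile.write((line + "\n").encode("ascii"))


def collect(host: str, port: int, timeout: float = 5.0) -> dict:
    """Collector side: poll one agent and return its variables as strings."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="ascii") as stream:
            line = stream.readline().strip()
    return dict(item.split("=", 1) for item in line.split())


# Start an agent on an ephemeral localhost port, then poll it once.
server = socketserver.TCPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

sample = collect(host, port)
# One flat-file record per poll: source host followed by the variables.
record = host + " " + " ".join(f"{k}={v}" for k, v in sample.items())
print(record)

server.shutdown()
server.server_close()
```

A production collector would loop over all hosts every 5 minutes and append each record to the archive file.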

10 Remote control
KVM switches permit remote control of server consoles
  - 2 models under test
Paragon UTM8 (Raritan)
  - 8 analog (UTP/fiber) output connections
  - Supports up to 32 daisy chains of 40 servers (UKVMSPD modules needed)
  - Cost: 6 KEuro + 125 Euro/server (UKVMSPD module)
  - IP-reach (expansion to support IP transport): 8 KEuro
Autoview 2000R (Avocent)
  - 1 analog + 2 digital (IP transport) output connections
  - Supports connections to up to 16 servers
      o 3 switches needed for a standard rack
  - Cost: 4.5 KEuro
NPCs (Network Power Control) permit remote and scheduled power cycling via SNMP calls or web
  - Bid under evaluation
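The quoted prices can be turned into a rough per-rack comparison. The sketch below assumes a standard rack of 40 servers, one Paragon daisy chain per rack, a single Paragon base unit shared by all racks, and an Autoview price of 4.5 KEuro per switch with no per-server extras; none of these assumptions is confirmed by the talk, and cabling is ignored.

```python
import math

SERVERS_PER_RACK = 40  # standard rack size used elsewhere in the talk


def paragon_cost_eur(racks: int, with_ip_reach: bool = False) -> int:
    """One Paragon UTM8 base unit plus one UKVMSPD module per server."""
    if not 1 <= racks <= 32:
        raise ValueError("one unit supports at most 32 daisy chains")
    cost = 6000 + 125 * racks * SERVERS_PER_RACK
    if with_ip_reach:
        cost += 8000  # optional IP transport expansion
    return cost


def autoview_cost_eur(racks: int) -> int:
    """Autoview 2000R switches, each serving up to 16 servers."""
    switches_per_rack = math.ceil(SERVERS_PER_RACK / 16)  # 3 for a 40-server rack
    return 4500 * switches_per_rack * racks


for racks in (1, 4):
    print(racks, paragon_cost_eur(racks), autoview_cost_eur(racks))
# 1 11000 13500
# 4 26000 54000
```

Under these assumptions the Paragon solution gets relatively cheaper as racks are added, because its base cost is paid once.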

11 Raritan

12 Avocent

13 Storage
Access to on-line data: DAS, NAS, SAN
  - 32 TB (> 70 TB this month)
  - Data served via NFS v3
Test of several hw technologies (EIDE, SCSI, FC)
  - Bid for FC switch
Study of large file system solutions (> 2 TB) and load balancing/failover architectures
  - GFS (load balancing)
      o Problems with lock server (better in hw?)
  - GPFS (load balancing, large file systems)
      o Not that easy to install and configure
  - HA (failover)
“SAN on WAN” tests (collaboration with CASPUR)
Tests with PVFS (LHCb, Alice)

14 STORAGE CONFIGURATION
[Diagram; labels recovered in reading order]
Client side (gateway or all farms must access storage), via WAN or TIER1 LAN
PROCOM NAS2, Nas2.cnaf.infn.it, 8100 Gbyte; VIRGO, ATLAS
Fileserver CMS (or more in cluster or HA), diskserv-cms-1.cnaf.infn.it
PROCOM NAS3, Nas3.cnaf.infn.it, 4700 Gbyte; ALICE, ATLAS
IDE NAS4, Nas4.cnaf.infn.it, 1800 Gbyte; CDF, LHCB
AXUS BROWIE, about 2200 Gbyte, 2 FC interfaces
DELL POWERVAULT, 7100 Gbyte, 2 FC interfaces, fail-over support
FC switch (on order)
RAIDTEC, 1800 Gbyte, 2 SCSI interfaces
CASTOR server + staging
STK180 with 100 LTO (10 Tbyte native)
Fileserver Fcds3.cnaf.infn.it

15 Mass Storage Resources
StorageTek library with 9840 and LTO drives
  - 180 tapes (100/200 GB each)
StorageTek L5500 with 2000-5000 slots on order
  - LTOv2 (200/400 GB each)
  - 6 I/O drives
  - 500 tapes ordered
CASTOR as front-end software for archiving
  - Direct access for end-users
  - Oracle as back-end

16 TAPE HARDWARE

17 CASTOR
Developed and maintained at CERN
Chosen as front-end for archiving
Features
  - Needs a staging area on disk (~20% of tape)
  - ORACLE database as back-end for full capability (a MySQL interface is also included)
      o ORACLE database is backed up daily
  - Every client needs to install the CASTOR client package (works on almost all major OSes, including Windows)
      o Access via rfio command
CNAF setup
  - Experiment access from TIER1 farms via rfio with UID/GID protection from single server
  - National Archive support via rfio with UID/GID protection from single server
  - Grid SE tested and working
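The ~20% staging rule can be checked against numbers given elsewhere in the talk: 100 LTO cartridges at 100 GB native give the 10 Tbyte quoted on the storage configuration slide, and 20% of that is the 2 TB staging disk shown on the "CASTOR at CNAF" slide. The sketch below does that arithmetic; applying the same rule to the 500 ordered LTOv2 tapes is an extrapolation, not a figure from the talk.

```python
def tape_capacity_tb(cartridges: int, native_gb: int) -> float:
    """Native (uncompressed) capacity of a set of cartridges, in TB."""
    return cartridges * native_gb / 1000


def staging_area_tb(tape_tb: float, fraction: float = 0.20) -> float:
    """Disk staging area sized as a fraction of tape capacity."""
    return tape_tb * fraction


current = tape_capacity_tb(100, 100)   # 100 LTO cartridges, 100 GB native each
print(f"{current:.0f} TB on tape -> {staging_area_tb(current):.0f} TB staging")
# 10 TB on tape -> 2 TB staging

ordered = tape_capacity_tb(500, 200)   # 500 LTOv2 cartridges, 200 GB native each
print(f"{ordered:.0f} TB on tape -> {staging_area_tb(ordered):.0f} TB staging")
# 100 TB on tape -> 20 TB staging
```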

18 CASTOR at CNAF
[Diagram; labels recovered in reading order]
2 9840 drives; 4 LTO Ultrium drives; SCSI; LEGATO NSR (backup); robot access via SCSI; ACSLS; CASTOR; STK L180; LAN; 2 TB staging disk

19 New Location
The present location (at CNAF office level) is not suitable, mainly due to:
  - Insufficient space
  - Weight (~700 kg per 0.5 m² for a standard rack with 40 1U servers)
Moving to the final location (early) this summer
  - New hall in the basement (floor -2) almost ready
  - ~1000 m² of total space
      o Computers
      o Electric power system (UPS, MPU)
      o Air conditioning system
  - Easily accessible by lorries from the road
  - Not suitable for office use (remote control)

20 Electric Power
220 V single-phase needed for computers
  - 4-8 kW per standard rack (with 40 dual-processor servers)
  - 16-32 A
380 V three-phase for other devices (tape libraries, air conditioning, etc.)
To avoid black-outs, Tier1 has standard protection systems, installed in the new location:
  - UPS (Uninterruptible Power Supply)
      o Located in a separate room (conditioned and ventilated)
      o 800 kVA (~640 kW)
  - Electric generator
      o 1250 kVA (~1000 kW)
  - Up to 80-160 racks
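The power figures on this slide are mutually consistent, as the short calculation below shows. It assumes a power factor of 0.8, which is what the quoted kVA-to-kW conversions imply, and it ignores the load of air conditioning and other devices when counting racks.

```python
POWER_FACTOR = 0.8  # assumption implied by 800 kVA ~ 640 kW


def real_power_kw(apparent_kva: float, power_factor: float = POWER_FACTOR) -> float:
    """Convert apparent power (kVA) to real power (kW)."""
    return round(apparent_kva * power_factor, 3)


def racks_supported(available_kw: float, rack_kw: float) -> int:
    """Number of whole racks that fit in the available real power."""
    return int(available_kw // rack_kw)


ups_kw = real_power_kw(800)
generator_kw = real_power_kw(1250)
print(ups_kw, generator_kw)  # 640.0 1000.0

# 8 kW per rack is the heavy case, 4 kW the light one.
print(racks_supported(ups_kw, 8), racks_supported(ups_kw, 4))  # 80 160
```

The 80-160 rack range on the slide therefore matches the UPS rating rather than the larger generator.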

21 Summary & conclusions
INFN-TIER1 is closing the prototype phase
  - But still testing new technological solutions
Going to move the resources to the final location
Interoperation with grid projects (EDG, EDT, LCG)
Starting integration with LCG
Participating in CMS DC04
  - ~70 computing servers
  - ~4M events (40% of Italian commitment), 15+60 (Tier0) TB of data (July to December 03)
  - Analysis of simulated events (January to February 04)
  - Interoperation with Tier0 (CERN) and Tier2 (LNL)

