1 Status of the Bologna Computing Farm and GRID related activities Vincenzo M. Vagnoni Thursday, 7 March 2002

2 Outline
- Currently available resources
- Farm configuration
- Performance
- Scalability of the system (in view of the DC)
- Resources foreseen for the DC
- Grid middleware issues
- Conclusions

3 Current resources
- Core system (hosted in two racks at INFN-CNAF):
  - 56 CPUs in dual-processor machines (18 PIII 866 MHz + 32 PIII 1 GHz + 6 PIII Tualatin 1.13 GHz), 512 MB RAM
  - 2 Network Attached Storage systems:
    - 1 TB in RAID5, with 14 IDE disks + hot spare
    - 1 TB in RAID5, with 7 SCSI disks + hot spare
  - 1 Fast Ethernet switch with Gigabit uplink
  - Ethernet-controlled power distributor for remote power cycling
- Additional resources from INFN-CNAF:
  - 42 CPUs in dual-processor machines (14 PIII 800 MHz, 26 PIII 1 GHz, 2 PIII Tualatin 1.13 GHz)

4 Farm Configuration (I)
- Diskless processing nodes with the OS centralized on a file server (root over NFS)
  - Makes adding or removing a node trivial, i.e. no software installation on local disks is needed
  - Allows easy interchange of CEs when resources are shared (e.g. among various experiments) and permits dynamic allocation of the latter without additional work
  - Very stable! No real drawback observed in about 1 year of running
- Improved security
  - Private network IP addresses and an Ethernet VLAN give a high level of isolation
  - Access to external services (afs, mccontrol, bookkeeping DB, servlets of various kinds, ...) is provided by NAT on the gateway (see the sketch after this slide)
- The most important critical systems (single points of failure) have been made redundant, though not everything yet
  - Two NAS units in the core system with RAID5 redundancy
  - GW and OS server: operating systems installed on two RAID1 (mirrored) disks
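A minimal sketch of the NAT piece follows, assuming a Linux gateway using iptables masquerading; the private subnet, interface name, and the use of iptables itself are illustrative assumptions, not details given in the talk.

```python
# Hypothetical sketch: enable NAT (masquerading) on the farm gateway so that
# diskless nodes on the private VLAN can reach external services (afs,
# mccontrol, bookkeeping DB, ...). The subnet and interface name below are
# placeholders, not the farm's actual values.
import subprocess

PRIVATE_SUBNET = "192.168.1.0/24"   # assumed private network of the worker nodes
PUBLIC_IFACE = "eth0"               # assumed interface towards the outside world

def run(cmd):
    """Run a command, raising an error if it fails."""
    subprocess.run(cmd, check=True)

# Let the kernel forward packets between the private and public sides.
run(["sysctl", "-w", "net.ipv4.ip_forward=1"])

# Rewrite the source address of outgoing packets from the private subnet,
# so that replies from external services are routed back through the gateway.
run(["iptables", "-t", "nat", "-A", "POSTROUTING",
     "-s", PRIVATE_SUBNET, "-o", PUBLIC_IFACE, "-j", "MASQUERADE"])
```

With a rule like this the worker nodes keep purely private addresses, while their outbound connections to external services appear to come from the gateway.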

5 Farm Configuration (II)

6 [Hardware photos: Fast Ethernet switch, 1 TB NAS, Ethernet-controlled power distributor (32 channels), rack of 1U dual-processor motherboards]

7 Performance
- The system has been fully integrated in the LHCb MC production since August 2001
  - 20 CPUs until December, 60 CPUs until last week, 100 CPUs now
- Produced mostly bb inclusive DST2 with the classic detector (SICBMC v234 and SICBDST v235r4, 1.5 M events), plus some 100k-event channel data sets for LHCb light studies
- Typically about 20 hours are needed on a 1 GHz PIII for the full chain (minbias RAWH + bbincl RAWH + bbincl piled-up DST2) for 500 events
  - The farm is therefore capable of producing about (500 events/day) * (100 CPUs) = 50,000 events/day, i.e. 350,000 events/week, i.e. 1.4 TB/week of RAWH + DST2 (see the estimate sketched below)
- Data transfer to CASTOR at CERN is done with standard ftp (15 Mbit/s out of an available bandwidth of 100 Mbit/s), but tests with bbftp reached very good throughput (70 Mbit/s)
  - Still waiting for IT to install a bbftp server at CERN
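As a quick cross-check, the sketch below just redoes the slide's arithmetic; all inputs (500 events/day per CPU, 100 CPUs, 1.4 TB/week, the 15 vs 70 Mbit/s transfer rates) come from the slide, and the sustained-rate line only makes explicit why the plain-ftp transfers are marginal compared with bbftp.

```python
# Re-derivation of the production and transfer numbers quoted on this slide.
EVENTS_PER_DAY_PER_CPU = 500        # full chain on a 1 GHz PIII: ~20 h for 500 events
N_CPUS = 100

events_per_day = EVENTS_PER_DAY_PER_CPU * N_CPUS    # 50,000 events/day
events_per_week = 7 * events_per_day                # 350,000 events/week

weekly_volume_bits = 1.4e12 * 8                     # 1.4 TB/week of RAWH + DST2
required_mbit_s = weekly_volume_bits / (7 * 86400) / 1e6

print(f"{events_per_day:,} events/day, {events_per_week:,} events/week")
print(f"sustained WAN rate needed to keep up: {required_mbit_s:.0f} Mbit/s "
      f"(standard ftp gives ~15 Mbit/s, bbftp tests reached ~70 Mbit/s)")
```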

8 Scalability
- Production tests were made these days with 82 MC processes running in parallel
  - Using the two NAS systems independently (instead of sharing the load between them)
  - Each NAS worked at 20% of its full performance, i.e. each of them can be scaled up by much more than a factor of 2
  - By distributing the load (see the sketch below), we are pretty sure this system can handle more than 200 CPUs working at the same time at 100% (i.e. without bottlenecks)
- For the analysis we want to test other technologies
  - We plan to test a fibre channel network (SAN, Storage Area Network) on some of our machines, with a nominal 1 Gbit/s bandwidth to fibre channel disk arrays
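A hypothetical sketch of the load-distribution idea: rather than pointing all MC jobs at one NAS, alternate their work areas between the two units so that neither becomes a bottleneck. The mount points and job IDs below are made-up placeholders, not the farm's actual paths.

```python
import itertools
import os

NAS_MOUNTS = ["/nas1/lhcb/mcprod", "/nas2/lhcb/mcprod"]   # assumed mount points

def assign_work_dirs(job_ids):
    """Round-robin each job's working directory over the available NAS units."""
    assignment = {}
    for job_id, mount in zip(job_ids, itertools.cycle(NAS_MOUNTS)):
        assignment[job_id] = os.path.join(mount, f"job_{job_id:04d}")
    return assignment

if __name__ == "__main__":
    jobs = range(82)   # 82 parallel MC processes, as in the scalability test
    for job_id, workdir in sorted(assign_work_dirs(jobs).items())[:4]:
        print(job_id, "->", workdir)
```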

9 Resources for the DC
- Additional resources from INFN-CNAF are foreseen for the DC period
- We will join the DC with of the order of 150-200 CPUs (around 1 GHz or more), 5 TB of disk storage and a local tape storage system (CASTOR-like? Not yet officially decided)
- Some work is still needed to make the system fully redundant

10 Grid issues (A. Collamati)
- 2 nodes are reserved at the moment for tests of the GRID middleware
- The two nodes form a mini-farm, i.e. they have exactly the same configuration as the production nodes (one master node and one slave node) and can run MC jobs as well
- Globus has been installed, and first trivial tests of job submission through PBS were successful (along the lines sketched below)
- Next step: test job submission via Globus on a large scale by extending the PBS queue of the Globus test farm to all our processing nodes
  - No interference with the distributed-production working system
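A minimal sketch of the kind of trivial submission test mentioned above, assuming the Globus Toolkit command-line clients and a PBS jobmanager are installed; the gatekeeper contact string and the executable are placeholders, not the test farm's real names.

```python
import subprocess

GATEKEEPER = "gridtest.bo.infn.it/jobmanager-pbs"   # assumed contact string

def submit(executable, *args):
    """Submit a job through the Globus gatekeeper to the PBS queue and
    return the job contact URL printed by globus-job-submit."""
    result = subprocess.run(
        ["globus-job-submit", GATEKEEPER, executable, *args],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

def status(job_contact):
    """Ask the jobmanager for the current state of a submitted job."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    job = submit("/bin/hostname")
    print("submitted:", job)
    print("state:", status(job))
```

Extending this to the full farm would then only be a matter of adding the production nodes to the PBS queue behind the same gatekeeper.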

11 Conclusions
- Bologna is ready to join the DC with a reasonable amount of resources
- Scalability tests were successful
- The farm configuration is pretty stable
- We need the bbftp server installed at CERN to fully exploit WAN connectivity and throughput
- We are waiting for CERN to decide the DC period before the final allocation of INFN-CNAF resources
- Work on GRID middleware has started, and the first results are encouraging
- We plan to install Brunel ASAP

