

Farm Management
https://bbrweb.pd.infn.it:5212/farm/

D. Andreotti 1), A. Crescente 2), A. Dorigo 2), F. Galeazzi 2), M. Marzolla 3), M. Morandin 2), F. Safai Tehrani 4), R. Stroili 2), G. Tiozzo 2), G. Vedovato 2) and the BaBar Computing Group

1) I.N.F.N. of Ferrara, Italy
2) Univ. and I.N.F.N. of Padova, Italy
3) Univ. “Ca’ Foscari”, Venezia and I.N.F.N. of Padova, Italy
4) I.N.F.N. of Roma, Italy

A new dedicated facility for (re)processing of BaBar raw data, supported by INFN, was installed in Padova (Italy) in 2002 as part of the distributed TierA system at the disposal of the experiment. The facility consists of four independent farms, each capable of processing 2 million events (corresponding to 160 pb^-1 of raw data) per day. Reconstructed data are stored in an Objectivity federation, checked and finally transferred to SLAC. The facility exploits commodity CPU and disk storage while preserving good reliability, high performance and well-organized system management. The center, which now comprises approx. 200 dual-CPU PIII machines and 30 TB of disk space, has been in operation since October 2002, and experience so far has been very satisfactory.

Existing hardware:
All machines: 2 x 1.26 GHz CPU, 1 GB RAM
- 140 clients, 40 GB local IDE disk (software RAID)
- 20 servers, same configuration as clients, Gigabit Ethernet
- 30 storage servers, 1.28 TB IDE disk with 3ware RAID controller, Gigabit Ethernet
- 5 “PR” servers, up to 0.35 TB SCSI disk (10k RPM) with ServeRAID SCSI controller, Gigabit Ethernet
- one tape library for 700 LTO tapes (70 TB uncompressed)

New acquisitions:
- new tape library for 700 LTO2 tapes (140 TB uncompressed)
- 103 clients, 2 x Xeon 2.4 GHz, 2 GB RAM
- 14 storage servers, 2 x Xeon 2.4 GHz, 2 GB RAM,
  1.4 TB IDE disk
- 10 “PR” servers, 2 x Xeon 2.4 GHz, 2 GB RAM

Machines are organized into 4 identical farms, each with:
- 60 CPUs
- 160 pb^-1/day
- ~2,000,000 events/day (output)
- 160 GB/day input (raw) data
- 330 GB/week output (Objy) data

Farm management uses IBM's xCAT (eXtreme Cluster Administration Toolkit), which allows:
- remote power control (*)
- remote BIOS console (*)
- remote OS console
- remote software reset
- parallel remote shell
- network installation
- ...
(*) on IBM machines only

PerfMC (presented at CHEP03), a high-performance monitoring program developed for this farm:
- scalable
- efficient
- requires low resources
- easily configurable using XML
- operates in background (no GUI)

[Diagram: PerfMC architecture. Labels: XML Configuration File (<?xml version="1.0" standalone="no"?> <!DOCTYPE monitor SYSTEM "monitor.dtd"> ...), SNMP Poller, Monitored Hosts (Host 1, Host 2, ..., Host n), In-core Status, RRD, Graphs, XSLT Engine, Stylesheets, HTML Pages, Filter (PHP...), HTTPD]

First BaBar data processing farm fully based on:
- Linux
- cheap hardware

First boot:
Machines must support PXE.

Software installation:
The Kickstart installation method is preferred because it is easier to configure according to machine type. Cloning (hard disk copy) or imaging (partition copy) methods are also possible. Second-level repositories can be used.

Network configuration:
All machines are on a private network. A few front-end machines have two interfaces. Public machines resolve private names using a NIS server.

[Figure: screenshot of parallel installation of >100 clients]

Farm performance:
- System CPU: ~60%
- User CPU: ~100%
- Network: 300 Mb/s
- Disk write: 50 Mb/s
The system is continuously stressed!

Farm monitoring:
SysAlarm: home-made Perl tool that parses system logfiles and saves errors in a MySQL database.
Log server: used to centralize system logs on one machine.
[Figure: monitoring graph; axis label: time_of_day]
Monitored quantities:
- CPU
- Disk I/O
- Network I/O
- Temperatures
Total disk needed for whole farm: 5 GB.
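The SysAlarm idea mentioned above (parse centralized system logs, keep only the errors, save them in a database) can be illustrated with a minimal sketch. This is not the SysAlarm code: the original tool is written in Perl and uses MySQL, while this sketch uses Python with the standard-library sqlite3 module as a stand-in. The log line format, the error keywords, the `alarms` table and all host names are assumptions made for illustration.

```python
import re
import sqlite3

# Assumed syslog-style layout: "Oct 12 03:14:07 node042 kernel: message text"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s(?P<source>[^:]+):\s(?P<msg>.*)$"
)
# Assumed keywords marking a line as an error; the real tool's rules are not given.
ERROR_RE = re.compile(r"\b(error|fail(ed|ure)?|panic|timeout)\b", re.IGNORECASE)


def parse_errors(lines):
    """Yield (timestamp, host, source, message) for lines that look like errors."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and ERROR_RE.search(m.group("msg")):
            yield m.group("ts"), m.group("host"), m.group("source"), m.group("msg")


def store_errors(conn, lines):
    """Insert parsed error records into the hypothetical 'alarms' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alarms (ts TEXT, host TEXT, source TEXT, msg TEXT)"
    )
    rows = list(parse_errors(lines))
    conn.executemany("INSERT INTO alarms VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def errors_per_host(conn):
    """Return {host: error_count}, e.g. to spot a misbehaving node."""
    cur = conn.execute("SELECT host, COUNT(*) FROM alarms GROUP BY host")
    return dict(cur.fetchall())


# Invented sample lines, as they might arrive on a central log server.
SAMPLE_LOG = [
    "Oct 12 03:14:07 node042 kernel: EXT3-fs error (device hda1): read failure",
    "Oct 12 03:14:09 node042 sshd[311]: Accepted publickey for root",
    "Oct 12 03:15:30 node007 nfs: server srv01 not responding, timeout",
    "Oct 12 03:16:02 node042 kernel: I/O error on device hda",
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    stored = store_errors(conn, SAMPLE_LOG)
    print(stored, errors_per_host(conn))
```

Because all logs are collected on one machine, a single pass like this over the central log files would be enough to fill the database; grouping by host is one simple way to see which node generates most of the errors.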
Monitoring is based on:
- SNMP, to be compatible with the widest variety of hardware, using asynchronous non-blocking SNMPv2 bulk Get requests
- the RRDtool library, for graphs

Extensive work was done to optimize resources and to reduce bottlenecks (e.g., minimizing usage of NFS).

Problems:
- vendor driver availability and support for different Linux releases
- had to recompile for large file support
- NFS is not optimal under (heavy load on) Linux

MySQL is widely used for farm monitoring, management and production: 12 databases, 3.5 GB total.
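The asynchronous, non-blocking polling approach named above can be sketched in a few lines. This is only an illustration of the pattern, not PerfMC itself: no real SNMP traffic is generated, `poll_host` is a stub that fakes a reply after a short delay, and the fixed-length per-host buffer merely imitates the bounded, round-robin storage that RRDtool provides. The metric names, the host names and `HISTORY_SLOTS` are assumptions.

```python
import asyncio
import random
from collections import deque

HISTORY_SLOTS = 5  # assumed ring size: only the newest samples are kept per host


async def poll_host(host, timeout=1.0):
    """Stub for one bulk request; returns (host, readings) or (host, None)."""
    async def fake_request():
        await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated network latency
        return {"cpu_user": random.uniform(0, 100), "net_in": random.uniform(0, 300)}
    try:
        return host, await asyncio.wait_for(fake_request(), timeout)
    except asyncio.TimeoutError:
        return host, None  # an unresponsive host must not stall the others


async def poll_cycle(hosts, history):
    """Issue all requests concurrently and append results to per-host ring buffers."""
    results = await asyncio.gather(*(poll_host(h) for h in hosts))
    for host, sample in results:
        if sample is not None:
            history.setdefault(host, deque(maxlen=HISTORY_SLOTS)).append(sample)
    return history


def run_cycles(hosts, cycles):
    """Run several polling cycles and return the bounded history."""
    async def main():
        history = {}
        for _ in range(cycles):
            await poll_cycle(hosts, history)
        return history
    return asyncio.run(main())


if __name__ == "__main__":
    hosts = [f"node{i:03d}" for i in range(20)]
    history = run_cycles(hosts, cycles=8)
    print(len(history), "hosts,", len(history["node000"]), "samples kept per host")
```

Since all requests in a cycle are in flight at the same time, the duration of a cycle is set by the slowest reply rather than by the number of hosts, and the bounded buffers keep storage constant however long the poller runs.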

