CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it HEPiX Report Helge Meinhard, David Gutierrez, Jérôme Belleman / CERN-IT Technical Forum/Computing.

1 CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it
HEPiX Report
Helge Meinhard, David Gutierrez, Jérôme Belleman / CERN-IT
Technical Forum/Computing Seminar, 16 November 2012

2 Outline
Meeting organisation; site reports; computing; miscellaneous (Helge Meinhard)
Security and networking; storage (David Gutierrez)
IT infrastructure; grids, clouds and virtualisation (Jérôme Belleman)
HEPiX report – Helge.Meinhard at cern.ch – 16-Nov-2012

3 HEPiX
Global organisation of service managers and support staff providing computing facilities for HEP
Covering infrastructure and all platforms of interest (Unix/Linux, Windows, Grid, …)
Aim: Present recent work and future plans, share experience, advise managers
Meetings: about two per year (spring in Europe, autumn typically in North America)

4 HEPiX Autumn 2012 (1)
Held 15 – 19 October at the Institute of High Energy Physics (IHEP) of the Chinese Academy of Sciences, Beijing, People’s Republic of China
  – 1300 staff, 400 students, 120 M$ budget
  – BEPC II accelerator, BES III detector; members in Belle II, CMS, ATLAS; neutrino experiments
  – Particle astrophysics, theory, synchrotron lab
  – Tier 2 centre in LCG for ATLAS and CMS
Excellent local organisation
  – Gang Chen and his team made the meeting run very smoothly
  – Network including Wi-Fi, video conferencing (Vidyo – 4 remote presentations), … all working like a charm
  – Beijing: Growing and changing at an incredible speed
      Cars have almost entirely replaced bicycles…
Sponsored by Huawei and Western Digital

5 HEPiX Autumn 2012 (2)
Format: Pre-defined tracks with conveners and invited speakers per track
  – Rich, interesting and packed agenda
      Contrary to last time, Silverman’s law applied once more – agenda was full, but not overcrowded
  – Judging by number of submitted abstracts, good balance between tracks: IT infrastructure (12 talks), network and security (11 talks), computing (8 talks), grids/clouds/virtualisation (7 talks), storage and file systems (7 talks), miscellaneous (4 talks)… plus one BoF session (on batch systems) and 11 site reports
Full details and slides: http://indico.cern.ch/conferenceDisplay.py?confId=199025
Trip report by Alan Silverman available, too: http://cdsweb.cern.ch/record/1485643

6 HEPiX Autumn 2012 (3)
67 registered participants, of which 9 or 10 from CERN
  – CERN: Belleman, Cass, Fedorko, Grzywaszewski, Gutierrez, Lopienski, Meinhard, Salter, (Silverman,) Traylen
  – 20 from Asia, 39 from Europe, 6 from USA, 2 from Australia
  – Plus some more colleagues from IHEP
Representing 27 institutes, 2 sponsors
  – 9 from Asia, 15 from Europe, 2 from USA, 1 from Australia
  – 2 worldwide sponsor companies
Compare with Prague (spring 2012): 97 participants, of which 12 or 13 from CERN; Vancouver (autumn 2011): 98 participants, of which 10 or 11 from CERN
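
The regional breakdown on this slide can be cross-checked against the stated totals. A minimal sketch in Python, using only the figures quoted on the slide (the variable names are illustrative and not part of the original report):

```python
# Figures as quoted on the slide; variable names are illustrative only.
participants_by_region = {"Asia": 20, "Europe": 39, "USA": 6, "Australia": 2}
institutes_by_region = {"Asia": 9, "Europe": 15, "USA": 2, "Australia": 1}

# Sum the per-region counts and compare with the totals stated on the slide.
total_participants = sum(participants_by_region.values())
total_institutes = sum(institutes_by_region.values())

print(total_participants)  # 67, matching the registered participants
print(total_institutes)    # 27, matching the institutes represented
```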

7 HEPiX Autumn 2012 (4)
60 talks, of which 13 from CERN
  – Compare with Prague: 74 talks, of which 22 from CERN
  – Compare with Vancouver: 55 talks, of which 15 from CERN
Next meetings:
  – Spring 2013: CNAF, Bologna, Italy, 15 – 19 April
      Batch systems; energy efficiency; network monitoring?; Windows 8 etc.?
  – Autumn 2013: University of Michigan, Ann Arbor, US, 28 October – 01 November
  – Spring 2014: Interest by LAPP Annecy (to be confirmed)
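
To compare the three meetings on equal footing, the CERN talk counts can be turned into shares of the total. A small sketch in Python, assuming nothing beyond the numbers quoted on the slide (the names `talks` and `cern_share` are illustrative):

```python
# (total talks, talks from CERN) per meeting, as quoted on the slide.
talks = {
    "Beijing (autumn 2012)": (60, 13),
    "Prague (spring 2012)": (74, 22),
    "Vancouver (autumn 2011)": (55, 15),
}

# Fraction of talks given by CERN speakers at each meeting.
cern_share = {meeting: cern / total for meeting, (total, cern) in talks.items()}

for meeting, share in cern_share.items():
    print(f"{meeting}: {share:.1%} of talks from CERN")
```

On these figures the CERN share was lower in Beijing than at either of the two previous meetings.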

8 Site reports (1): Hardware
Only a few details this time
CPU servers: same trends
  – 12...48 core dual-CPU servers, 2...4 GB/core. Typical chassis: 2U Twin2; some A-brand blades (one failing blade has taken an entire chassis down)
Disk storage
  – External disk enclosures gaining popularity
      4U trays with 48…60 drives becoming popular
      No clear sign that nearline SAS is taking off
  – A-brand’s extension disk tray has got firmware…
  – IBM SONAS

9 Site reports (2): Hardware (cont’d)
Tapes
  – An increasing number of sites mentioned T10kC in production
  – LTO popular, many sites investigating (or moving to) LTO5; some migration from LTO to T10kC
HPC
  – IB still popular; two large clusters at GSI, one for computing, one for parallel file system (Lustre)
  – 10GE ramping up
Odds and ends: Suppliers going bust

10 Site reports (3): Software
Storage
  – CVMFS now a standard service, with only minor issues
  – Increasing interest in NFS interfaces for dCache and DPM
  – Lustre mentioned often – works well with controlled use cases and new, homogeneous hardware, but issues with some use cases and older hardware
  – Enstore/dCache: small file aggregation in production at FNAL
OS
  – GSI moving from flat to hierarchical Windows domain (domain controllers on VMs); LAL has completed the move of its Windows domain to the IN2P3 “forest”
Mail/calendaring services
  – Exchange 2003 and/or Lotus to Exchange 2010 (FNAL: 3’000 accounts total)
  – DESY considering alternative solutions to replace Exchange 2003: OpenXchange, Zimbra, Zarafa

11 Site reports (4): Software (cont’d)
Virtualisation
  – Most sites experimenting with KVM
  – Some use of VMware (and complaints about cost level…) and of Hyper-V
  – Australia: migrating from KVM to Citrix
  – Most sites run critical services on VMs
Clouds
  – OpenStack
  – OpenNebula
Miscellaneous: DokuWiki, Redmine, git
Configuration management
  – Puppet seems to be clear winner, still on the rise
  – Chef, Quattor used as well
  – Declining interest in cfengine (3)

12 Site reports (5): Software (cont’d), Infrastructure
Monitoring
  – Some sites migrating from Nagios to Icinga, one site considering Zenoss
  – Ganglia used frequently for performance monitoring
  – perfSONAR being deployed everywhere
Infrastructure
  – A number of upgrade projects (IHEP from 800 kW to 1’800 kW)
  – GSI: Cube prototype working fine even at 32 °C outside
  – RAL: switch gear in power supply line being replaced, higher risk until the end of November
  – LAL: Major chiller failure
  – FNAL: During hot summer, had to throttle down major services
  – DESY: During power supply maintenance, batteries on full load – some exploded, acid on the floor… resulting in extended power outage
  – DESY: 20 kUSD network line card destroyed by concrete dust due to drilling

13 Computing: Batch systems (1)
8 talks, BoF session
Site reports: Torque/Maui for small to medium size installations; PBSpro; GridEngine; Slurm (mentioned 3 times)
FNAL: HTCondor since 2002 for part of their facilities (CDF)
  – Many features added on FNAL’s request
  – Main scalability concern is condor_schedd; single-threaded, supports up to 30 k simultaneous jobs now, goal is 150 k
CERN: LSF: large installation, heterogeneous user base, 400 k jobs per day
  – Issues: slow response to queries and submissions, slow dispatching, fairshare scheduling, complex and not very dynamic setup, limited scalability
  – Targeting 12’000 physical nodes, 300’000 job slots
  – Currently looking at Slurm, GE, Condor, LSF8
  – Recent work on monitoring and accounting
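
The scalability figures on this slide imply two simple ratios: the average number of job slots per node in CERN's target, and the growth factor FNAL is aiming for in condor_schedd. A back-of-the-envelope sketch in Python, using only the numbers quoted on the slide (variable names are illustrative):

```python
# CERN batch target, as quoted on the slide.
target_nodes = 12_000
target_job_slots = 300_000

# condor_schedd capacity at FNAL: current figure and stated goal.
schedd_jobs_now = 30_000
schedd_jobs_goal = 150_000

# Average job slots each physical node would need to provide.
slots_per_node = target_job_slots / target_nodes

# Factor by which the schedd's simultaneous-job capacity would have to grow.
schedd_growth_factor = schedd_jobs_goal / schedd_jobs_now

print(slots_per_node)        # 25.0
print(schedd_growth_factor)  # 5.0
```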

14 Computing: Batch systems (2)
KIT: 1000 nodes, split into two PBS instances due to PBS limitations
  – Tested Torque/Maui, GE; selected Univa GE
  – Migration started in July, to finish in December
  – GE: learning curve… but stable, flexible, with good support
IN2P3-CC: Migration to Oracle GE completed in December 2011
  – A lot of interfacing done by IN2P3
  – Shadow master abandoned due to instabilities
  – Difficult to get job information; no native grid support
  – Oracle support not brilliant; difficult to get in contact with developers; no road map for GE; only serious bugs got fixed
  – Getting in touch with Univa

15 Computing: Batch systems (3)
DESY Zeuthen: Added Certificate Security Protocol support to UGE
NDGF: Slurm experience: very positive, easier and more stable than predecessors (Torque/Maui)
  – Defaults often not adequate, tuning needed
INFN Bari: Testing Slurm
  – Tested a long list of functionalities
  – Scheduling powerful, but can be improved by using MOAB or LSF scheduler
  – No RPM; no way to transfer output back to submission host
  – Rather steep learning curve
  – Tests with 6’000 cores and 100’000 jobs all successful, very moderate load on master
  – Grid integration (Cream CE) progressing

16 Miscellaneous
CERN mobile Web site

