
HEPiX Autumn Meeting 2014, University of Nebraska, Lincoln
Arne Wiebalck, Liviu Valsan, Borja Aparicio



2 HEPiX Autumn Meeting 2014
University of Nebraska, Lincoln
http://indico.cern.ch/event/320819/
Arne Wiebalck, Liviu Valsan, Borja Aparicio
Wiebalck, Valsan, Aparicio: HEPiX Autumn 2014 Summary

3 HEPiX
Global organization of service managers and support staff providing computing facilities for the HEP community
Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …
Meetings are held twice per year
- Spring: Europe, Autumn: U.S./Asia
Reports on status and recent work, work in progress & future plans
- Usually no showing-off, honest exchange of experiences

4 Outline
- 2014 Autumn Meeting & HEPiX News
- Site Reports
- End User Services & OS
- Grids, Clouds, and Virtualization
- Storage and File systems
- Computing and Batch
- IT Facilities
- Networking and Security
- Basic IT Services
Presenters: Arne, Liviu, Borja

5 HEPiX Autumn 2014
Oct 13 – 17, 2014 at the University of Nebraska Lincoln
- Well organized, rich program
- Eduroam, Indico (intervention, incident, power cut)
93 registered participants
- Many first timers again
- 6/8 US-CMS Tier-2 sites, 2/5 US-ATLAS Tier-2 sites
- 45 sites represented
60 contributions
- 96 slides (in 25 minutes!)
- 300 words per slide …

6 Lincoln, Nebraska
About 22 hours door to door …


8 HEPiX News
Tony Wong (BNL) new HEPiX co-chair
- 3-year term
Next meetings
- Spring 2015: Oxford (UK) March 23 – 27
- Autumn 2015: BNL (US) Oct 12 – 16
- Spring 2016: DESY Zeuthen (DE), Berlin/Potsdam (TBC)

9 HEPiX Working Groups
IPv6
- Deployment/readiness following Tier structure
  https://www.gridpp.ac.uk/wiki/2014_IPv6_WLCG_Site_Survey
- Experiments pushing for services at T1/T2
Benchmarking
- Awaiting SPEC CPUv6
- Suggestion of a “fast” benchmark (minutes)

10 Site Reports
15 site reports: T0, 7x T1s, 7x T2s
(Move to) HTCondor still very visible
- Talk from HTCondor team
- INFN (on LSF now) will start evaluation
KIT’s “Dropbox”: bwSync&Share
- 8’000 users
- Based on PowerFolder
Ganeti used at multiple sites
- VM cluster management tool from Google
- Overall positive experience

11 Site Reports
Ceph
- Still gaining momentum: many PoCs (RAL: 1PB, BNL: 3PB)
- Vivid mail exchange, BoF Session in Oxford?
Energy efficiency
- No WG, but many activities (refurbishments)
- “Energy accounting” discussions
INFN still investigating micro-server options
- Moonshot and other Avoton based solutions
- Experiments seem fine with performance/power ratio
During “dark data” cleanup NDGF deleted all ALICE tape data due to misunderstanding of what “NDGF data” means
- ALICE::NDGF vs. ALICE::NDGF_tape
- 200TB of data now being backfilled …

12 CERN Site Report
“What about Ceph @ CERN?”

13 CERN Site Report
“What about Ceph @ CERN?”
“Are there ever power cuts at CERN?”

14 End User Services & OS
Six talks in total, three from CERN
- Thomas: CC7
- Borja: Issue tracking and VCS
- Michail: FTS3
Scientific Linux / CentOS
- FNAL SL team continue to provide Scientific Linux
- No competition with other rebuilds
- Rebuild from git.centos.org: difficult (as not supported)
So, after the initial discussions at the Annecy meeting, the community seems to part ways …

15 Virtualization
Six talks in total, five from CERN
- Laurence: Experiment’s Cloud Computing Adoption
- Andrea: WLCG Monitoring
- Helge: Volunteer Computing
- Arne: Cloud Report, VM IO Performance
RAL starting batch virtualization
- “Burst batch into the cloud”
- Successful PoC: Vacuum model integration with HTCondor
Virtualization @ GSI: MS Windows on KVM
- Windows domain restructuring: all on VMs, all on KVM
- Partly in prod (CA, TS), partly in testing (DC, Exchange)
- No support issue


17 Storage and Filesystems
Ten talks in total, five from CERN:
- Luca:
  - EOS across 1000 km
  - CERNbox + EOS: Cloud Storage for Science
- Andrea: DPM performance tuning hints for HTTP/WebDAV and Xrootd
- Ruben: Experience in running relational databases on clustered storage
- Liviu: SSD Benchmarking at CERN
  https://lvalsan.web.cern.ch/lvalsan/ssd_benchmarking

18 OpenZFS on Linux
OpenZFS
- Large set of features
- Independent of the Linux kernel
LLNL:
- Three Lustre filesystems, ~100 PB, OpenZFS backend
- Moving to commodity JBODs
- Work ongoing for improving Linux boot time with large number of drives

19 Ceph Based Storage Systems for RACF
- Deployment of same scale as at CERN
- Lots of performance and stability tests
- Object storage, block storage and file system (Ceph FS)
- On several platforms (including HP Moonshot)
- Different networking solutions

20 Using XRootD to Minimize Hadoop Replication
Hadoop replication via XRootD
- Reduced local Hadoop replication to 1
In case of corrupt local blocks:
- Request blocks via XRootD
- Cache locally
- Repair broken blocks locally in Hadoop
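
The fallback flow on this slide can be illustrated with a toy model. The sketch below is not the site's implementation and does not call Hadoop or XRootD: `LocalStore`, `read_block` and the `fetch_remote` callback are hypothetical names, and the SHA-256 check is an assumed stand-in for whatever corruption detection the real system uses. It only shows the order of operations: serve a healthy local block, otherwise fetch the block remotely, cache it, and repair the local copy.

```python
import hashlib
from typing import Callable, Dict


class LocalStore:
    """Toy stand-in for a node's local block storage (one local replica)."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.checksums: Dict[str, str] = {}

    def put(self, block_id: str, data: bytes) -> None:
        # Store the block together with the checksum used to detect corruption.
        self.blocks[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()

    def is_healthy(self, block_id: str) -> bool:
        # A block is healthy if it exists and still matches its recorded checksum.
        data = self.blocks.get(block_id)
        if data is None:
            return False
        return hashlib.sha256(data).hexdigest() == self.checksums.get(block_id)


def read_block(block_id: str, local: LocalStore,
               fetch_remote: Callable[[str], bytes]) -> bytes:
    """Return a block, falling back to a remote fetch if the local copy is bad."""
    if local.is_healthy(block_id):
        return local.blocks[block_id]
    # Hypothetical remote read; in the talk this role is played by XRootD.
    data = fetch_remote(block_id)
    expected = local.checksums.get(block_id)
    if expected is not None and hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"remote copy of {block_id} does not match checksum")
    # Cache the fetched data, which also repairs the broken local block.
    local.put(block_id, data)
    return data
```

After a repair, later reads of the same block are served locally again, so the remote path is only exercised while a block is actually damaged.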

21 Computing and Batch Systems
Six talks in total, one from CERN:
- Two presentations on benchmarking
- Four presentations on batch systems

22 Benchmarking activities
Intel Xeon E5-2600 v3 (Haswell)
- Showing good performance
Intel Avoton: very good HS06 / Watt ratio
ARM 32-bit HS06 / Watt in between Xeon & Avoton

23 Fast Benchmark
Some requirements are clear:
- Open source
- Easy to run
- Small
Other requirements not so clear:
- How fast? Reproducible? Reliable?
- Single core or multicore?

24 Fast Benchmark Proposals
Geant4 based
- Linux x86-64 & ARM
- Realistic detector geometry
- Footprint: 1/4 to 1/3 of real experiment
- CPU bound, no I/O
LHCb fast benchmark
- Small Python script, single threaded

25 Next generation HEP-SPEC06
- Next SPEC CPU benchmark (CPUv6) in beta
- Should be released before the end of the year
- Will probably not run with the default SLC 6 compiler
- GCC on CentOS 7 should be fine, config file will be provided by GridKa

26 Batch Systems
All four talks about HTCondor:
- Two talks from developers
- Jérôme’s talk: HTCondor pilot @ CERN
- Open Science Grid adopting HTCondor

27 IT Facilities and Business Continuity
Three talks, two from CERN
- First Experience with the Wigner Data Centre
- Joint procurement of IT equipment and services
UPS Monitoring with Sensaphone
- Multi-level email / SMS alerting
- Gradual shutdown of servers in case of power cut or cooling failure
- Wireless temperature sensors used to build 3D heatmap

28 NERSC
New Computational Research and Theory (CRT) Building
- Year-round free air and water cooling
- PUE < 1.1
- 42 MW to building
- 12.5 MW provisioned
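
For readers unfamiliar with the metric, PUE (power usage effectiveness) is conventionally defined as total facility power divided by the power delivered to IT equipment. The symbols below are introduced here for illustration and do not appear on the slide. Under that definition, the quoted bound means that cooling and other overheads stay below 10% of the IT load:

```latex
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}},
\qquad
\mathrm{PUE} < 1.1
\;\Longrightarrow\;
P_{\text{facility}} - P_{\text{IT}} < 0.1\, P_{\text{IT}}
```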


30 Networking and Security
Four networking talks, two security, one from CERN
- Stefan: Situational Awareness: Computer Security
IPv6 Deployment
- HEPiX IPv6 Working Group: WLCG dual-stack services deployment. Testing
- Open Science Grid: Client/Server are dual-stack? Server is but not the client?
InfiniBand Based Networking evaluation
- Brookhaven National Laboratory (USA)
  https://indico.cern.ch/event/320819/session/4/contribution/46/material/slides/0.pdf
ESnet: Extension to Europe
- US Department of Energy
- “Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources or data”
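
The dual-stack question raised above can be approximated from the name-resolution side. The following is a minimal sketch, not a tool from the working group: `classify_families` and `resolved_families` are hypothetical helper names, and the check only reports which address families a hostname resolves to through the local resolver. It says nothing about whether the service actually answers over IPv4 or IPv6.

```python
import socket
from typing import Iterable, Set


def classify_families(families: Iterable[int]) -> str:
    """Classify a set of address families as dual-stack, single-stack or unresolved."""
    fams = set(families)
    has_v4 = socket.AF_INET in fams
    has_v6 = socket.AF_INET6 in fams
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "IPv6-only"
    if has_v4:
        return "IPv4-only"
    return "unresolved"


def resolved_families(host: str, port: int = 443) -> Set[int]:
    """Return the address families the local resolver reports for host."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name did not resolve at all.
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {info[0] for info in infos}
```

A call such as `classify_families(resolved_families("some.host.example"))`, with a placeholder hostname, covers the server side of the question. The client side would additionally need a working IPv6 route on the machine running the check.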

31 Basic IT Services 1/2
Seven talks, three from CERN
- Ben: Configuration Services at CERN: Update
- Rubén: Database on Demand: insight how to build your DbaaS
- Aris: Ermis service for DNS Load Balancer configuration
Monitoring with Nagios
- NERSC – US Department of Energy
- Monitoring clusters of 1000s of compute nodes

32 Basic IT Services 2/2
CFEngine
- ATLAS Great Lakes Tier 2 (AGLT2)
- Change management: SVN → Push to production
Puppet at USCMS-T1 – Fermilab
- Modules + Data in Hiera approach. PuppetDashboard instead of TheForeman
- Change management: Git branches → Push to production
- Continuous Integration? Not yet but Beaker is the main candidate
- Secrets? “hiera-eyaml” Not a good solution
Puppet at BNL
- RHIC and ATLAS Computing Facility
- Emphasis on Change Management and Cultural Management
- Test environments + self-approve delay
- Looking for automatic testing
