HEPiX Spring Meeting 2015, University of Oxford, UK (Arne Wiebalck, Julien Leduc, Adam Krajewski)

2 HEPiX Spring Meeting 2015
University of Oxford, UK
http://indico.cern.ch/event/346931/
Arne Wiebalck, Julien Leduc, Adam Krajewski
Wiebalck, Leduc, Krajewski: HEPiX Spring 2015 Summary

3 HEPiX
Global organization of service managers and support staff providing computing facilities for the HEP community
Participating sites include BNL, CERN, DESY, FNAL, IN2P3, INFN, NIKHEF, RAL, TRIUMF …
Meetings are held twice per year
- Spring: Europe, Autumn: U.S./Asia
Reports on status and recent work, work in progress & future plans
- Usually no showing-off, honest exchange of experiences

4 Outline
2015 Spring Meeting & General HEPiX News
Site Reports (17)
Grids, Clouds, and Virtualization (8)
Storage and File systems (8)
Computing and Batch (17)
IT Facilities (2)
End User Services & Operating Systems (10)
Networking and Security (10)
Basic IT Services (7)
Closing remarks
Presenters: Arne, Julien, Adam

5 HEPiX Spring 2015
Mar 23 – 27, 2015 at the Physics Department, Oxford University, UK
134 registered participants (record!)
- Many first timers again
- 75% from Europe, ~20 from 8 companies
- 45 different affiliations
83 contributions (+30%)
- slots cut down to 25 mins
- Ceph BoF, IPv6 tutorial

7 HEPiX Working Groups
Benchmarking
- Awaiting SPEC CPUv6
- Suggestion of a “fast” benchmark (minutes)
- First test of a candidate provided by LHCb

8 Site Reports (1)
17 site reports: about half from T0/T1
HTCondor continues to be very visible
- Many sites are considering a move (e.g. DESY or KISTI)
- Mostly due to scalability issues with current solutions
- Feedback from sites running it is very positive
- INFN renewed LSF contract “for the last time”
Config mgmt: Puppet still gaining popularity
- Quattor flag held up by some (few) sites
- Ansible mentioned as well (NERSC)
… (reminds me of Umeå 2009)

9 Site Reports (2)
Storage: Ceph clearly dominating reports …
- Some sites well advanced (e.g. BNL, RAL, CERN)
- Many sites exploring what to do with Ceph
… but Lustre (re)gains some popularity
- Beyond GSI & JLAB, sites are considering deployment (e.g. NIKHEF)
- Apparently sites see a need for a distributed file system (specialised ones? DESY considers moving out of AFS)
SL vs. CentOS: not a hot topic
- No rivalry, sites do not worry

10 Site Reports (3)
Monitoring: being redone at several sites
- With usual suspects: Flume, ES, Kibana, Grafana, …
Cgroups starting to be used more widely
- Issues on various batch installations (kernel panics)
IHEP: per-user but managed VMs
- no root access, no console access
- an option for lxplus++ ?

11 GSI’s Cube: “3-d” CC
- 6 floors (128 racks, 36k U)
- PUE < 1.1
- used for heating
- cable length
More details next time!

12 Virtualization (1)
8 talks, 2 from CERN
- Bruno: Cloud report & Heat
OpenStack community within HEP growing
- Different approaches (e.g. IHEP only 3 images, RAL only 1 flavor)
- Mostly used for dev machines, some for services, few for compute (normal virtualization phase-in)
“ATLAS on Amazon” (BNL/AWS)
- Practical feasibility of commercial clouds for ATLAS production – at full scale!
- Joint work with the AWS Scientific Computing Group
- Areas: compute (capacity?), networking (direct links?), storage (from “keep” to “delete”), … std vs scientific computing
- First test w/ 20k slots was economical, next test: 100k cores
http://indico.cern.ch/event/346931/session/9/contribution/20/material/slides/0.pdf
https://indico.cern.ch/event/346931/session/9/contribution/54/material/slides/1.pdf

13 Virtualization (2)

15 Storage and File systems (1)
8 talks, 4 about Ceph
Panel and BoF: Ask the Ceph experts
Ceph as a building block for many services
RACF at BNL
- CephFS in production since 2014Q3
- Awaiting RDMA support to ditch IBoE
RAL
- Ceph as a large-scale object store to replace their Castor disk-only storage
- Using XRootD and GridFTP plugins
- Reported on the experience of losing monitors: now using 3 physical monitors, physically distributed
- Going to erasure coding: 3 replicas are too expensive, aiming for +30% HA overhead for Ceph storage
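The cost argument behind moving from 3 replicas to erasure coding comes down to simple arithmetic. The sketch below is not from the talk: the function names and the example k+m profiles are assumptions chosen for illustration, and only the "3 replicas" and "+30%" figures come from the slide.

```python
def replication_overhead(replicas: int) -> float:
    """Extra raw capacity per unit of user data (2.0 means +200%)."""
    if replicas < 1:
        raise ValueError("need at least one copy")
    return float(replicas - 1)


def erasure_overhead(k: int, m: int) -> float:
    """Extra raw capacity for k data chunks plus m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return m / k


if __name__ == "__main__":
    print(f"3 replicas: +{replication_overhead(3):.0%}")
    # Hypothetical profiles; the slide does not say which one RAL chose.
    for k, m in [(4, 2), (8, 3), (10, 3)]:
        print(f"EC {k}+{m}: +{erasure_overhead(k, m):.0%}, "
              f"survives the loss of {m} chunks")
```

Under these assumptions a 10+3 profile lands at the +30% mentioned on the slide, against +200% for three full copies.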

16 Storage and File systems (2)
Distributed file systems:
GPFS: DESY Petra III data taking and analysis infrastructure is moving to GPFS after detector upgrades (DESY IBM partnership)
BeeGFS experience: DESY wants to use this as a replacement for GPFS and Lustre
- Formerly FhGFS from Fraunhofer, renamed in 2014
- Project will become open source with commercial support available

17 Storage and File systems (3)
DESY experimenting with HGST Open Ethernet drives to build a dCache cluster
- Each disk runs Linux (2 GB RAM, disk is sda, network is eth0), 60 disks per 4U enclosure
- They recompiled the dCache pool code and run it directly on the disks
- Future tests: reuse this HW to test Ceph deployment on disks

18 Computing and Batch (1)
17 talks: 8 benchmarking + 9 batch systems
Commissioning cloud resources
- Several simple metrics (wallclock, CPU usage, data stage-in time, cvmfs software setup time) allow quick commissioning of cloud resources
- A stable cloud is easy to integrate in production
- Lots of effort to optimize performance

19 Computing and Batch (2)
BNL remote evaluation of HW
- Need to speed up acquisition processes (partnership with vendors)
- Long acquisition processes mean money lost
Beyond HS06/fast benchmark
- Candidates (SPEC CPUv6, multithreaded Geant4), mandatory compiler flags (-O2?) ...
Fast benchmark
- LHCb fast benchmark, HS06/LHCb ratio between 1.2 and 1.6 (but can go >2)

20 Computing and Batch (3)
Alternate CPUs:
- Intel Atom Avoton, Tegra K1 (ARM 32-bit) extensively tested
- ARM 64-bit software support is improving
- Working on integration in CERN environment (PXE boot, Puppet, Koji ...)
- Test platforms available through CERN TechLab
https://twiki.cern.ch/twik/bin/viewauth/IT/TechLab

21 Computing and Batch (4)
Univa GE is popular
- Only one to support the DRMAA2 standard now
HTCondor is more popular
- Very large, reactive community
- Lots of additional tools developed by communities (HEP, HCCondor)
Monitoring CPU and memory usage with cgroups
- Batch schedulers can isolate jobs in cgroups
- Allows understanding resource utilization per type of job (analysis, reconstruction, ...) => refine scheduling policies
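Per-job accounting of this kind can be sketched by reading the counters the kernel exposes for a job's cgroup. The example below is an illustration only and does not reproduce any site's tooling: it assumes the cgroup v1 layout (separate cpuacct and memory hierarchies), the function names are hypothetical, and the directories passed in depend on how the batch system names its per-job cgroups.

```python
from pathlib import Path


def read_counter(path: Path) -> int:
    """Read a single integer counter from a cgroup control file."""
    return int(path.read_text().strip())


def job_usage(cpuacct_dir, memory_dir) -> dict:
    """Collect CPU time and memory counters for one job's cgroup (v1 layout)."""
    cpu_ns = read_counter(Path(cpuacct_dir) / "cpuacct.usage")  # nanoseconds
    mem = read_counter(Path(memory_dir) / "memory.usage_in_bytes")
    peak = read_counter(Path(memory_dir) / "memory.max_usage_in_bytes")
    return {
        "cpu_seconds": cpu_ns / 1e9,
        "memory_bytes": mem,
        "peak_memory_bytes": peak,
    }


def cpu_efficiency(cpu_seconds: float, wall_seconds: float, cores: int = 1) -> float:
    """CPU time used divided by the CPU time the allocated cores could have delivered."""
    if wall_seconds <= 0 or cores < 1:
        raise ValueError("wall_seconds must be > 0 and cores >= 1")
    return cpu_seconds / (wall_seconds * cores)
```

Aggregating these numbers by job type (analysis, reconstruction, ...) gives the per-class utilization picture the slide refers to.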

22 IT Facilities (1)
2 talks from CERN
Recent operational issues at CERN
- 14/10/16 power incident (+ Murphy's law)

23 IT Facilities (2)
Another operational incident: dust on tape
- Thanks to the vendor, impact was limited
- Development of a homemade dust sensor to monitor dust inside tape libraries at CERN

25 End User Services & OS (1)
10 talks total, 5 from CERN
- Andreas: CERN Search and Social for the Enterprise Web Experience
- Thomas: Evolutions in the CERN Conferencing Services Landscape
- Arne: CERN CentOS 7 Update
- Nils: Update on software collaboration services at CERN
- Status of volunteer computing at CERN
HEP Software Foundation
- Collaboration started for HEP software/computing efforts (kickoff meeting April 2014, first workshop January 2015)
- Objectives: sharing expertise, catalyzing common SW projects, promoting collaboration in new developments
- Website: http://hepsoftwarefoundation.org
Scientific Linux Current Status
- Development continues, SL 7.1 released on April 10th 2015
- Researching containerization possibilities: Docker image, Scientific Linux Project Atomic distro

26 End User Services & OS (2)
SciDB at NERSC
- Testbed evaluation
- Cluster of ~20 nodes, normally 100 GB – 1 TB data, even 20+ TB
- Happy with the results, decided to go with a production-level cluster
Lustre at the Sanger Institute
- 11 Lustre volumes, 6 PB storage
- Problems analyzing storage usage
- Solved by implementing an efficient, parallel file tree walker using MPI
Zimbra at DESY
- Replacement of UNIX mail and Microsoft Exchange
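The idea behind a parallel tree walker is to split the namespace into independent subtrees and scan them concurrently. The sketch below is not the Sanger implementation: it swaps MPI for a local thread pool so that it runs with the standard library alone, and the function names and the split at the top-level directories are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def subtree_usage(root):
    """Return (file_count, total_bytes) for one subtree, walked serially."""
    files = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            files += 1
            size += st.st_size
    return files, size


def parallel_usage(root, workers=8):
    """Scan each top-level subdirectory of root in its own worker thread."""
    subdirs = []
    files = 0
    size = 0
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files += 1
                size += entry.stat(follow_symlinks=False).st_size
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for sub_files, sub_size in pool.map(subtree_usage, subdirs):
            files += sub_files
            size += sub_size
    return files, size
```

Threads help here because the work is dominated by metadata calls that wait on the file system. An MPI version would distribute the subtrees across ranks on several nodes and combine the partial sums with a reduction.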

27 Networking and Security (1)
10 talks in total, 2 from CERN:
- Adam: Effects of packet loss and delay on TCP performance
- Romain: Computer Security Update + phishing demonstration
IPv6 Working Group
- Lots of sites still not IPv6-ready (especially T2)
- Testing and deploying dual-stack services if performance is sufficient
- Dual-stack perfSONAR should be provided in 2015
perfSONAR
- Network and Transfer Metric Working Group started in May 2014
- OSG datastore: community data store for all perfSONAR metrics, to enter production in Q3 2015
- Integrating perfSONAR with FTS and experiments to optimize transfers
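Why packet loss and delay matter so much for wide-area transfers can be illustrated with the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) · C / √p with C ≈ √(3/2). This summary does not say which model or measurements the talk used, so the formula and the function name below are an illustrative assumption, not the talk's content.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on TCP throughput in bits per second."""
    if rtt_s <= 0 or not (0 < loss_rate <= 1):
        raise ValueError("need rtt_s > 0 and 0 < loss_rate <= 1")
    c = math.sqrt(1.5)  # constant from the Mathis et al. model
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Same loss rate, increasing round-trip time (LAN, European, transatlantic scale).
    for rtt_ms in (1, 20, 150):
        bps = mathis_throughput_bps(1460, rtt_ms / 1000, 1e-4)
        print(f"RTT {rtt_ms:>3} ms: {bps / 1e6:.1f} Mbit/s")
```

In this model throughput falls linearly with round-trip time and with the square root of the loss rate, which is why a loss rate that goes unnoticed on a LAN can cripple a long-distance transfer.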

28 Networking and Security (2)
WLCG Cloud Traceability Working Group
- Looking into incident traceability in emerging cloud computing environments
- Best practices for gathering additional logging information in cloud frameworks, configuring VMs etc.
Operational Security in the EGI and WLCG
- Security policies: reporting vulnerabilities is essential
- Only 8 incidents last year, quite successful prevention
- Now re-working policies to face cloud computing technology threats
OSSEC at ScotGrid Glasgow
- Visualizing with Elasticsearch / Logstash / Kibana

29 Basic IT Services (1)
7 talks, 3 from CERN:
- Alberto: Configuration management at CERN: Status and directions
- Francisco: Towards a modernisation of CERN’s telephony infrastructure
- Andrei: Updates from Database Services at CERN
Config Management at RACF
- Deployed Puppet Server in production
- Catalog compilation avg 1.97 sec -> 1.00 sec
- Looking into Jenkins CI for testing pending production changes
- MCollective in testing, plans to put it in production
MCollective at DESY
- Successfully deployed in production for the following use cases: steering Puppet agent runs, querying the infrastructure, small parallel-ssh tasks (e.g. package updates)
- Performance problems caused by SSH key plugin, now fixed

30 Basic IT Services (2)
Subtlenoise by Lancaster University
- Small framework to leverage acoustics during monitoring shifts
- “produces low-impact but information-rich soundscapes in realtime”
- https://github.com/ptrlv/subtlenoise
Update on Quattor
- Still in development, Quattor 15.2.0 released March 23rd 2015
- ~15 institutes participating, over 2500 commits on GitHub in 2014
- Active community

32 HEPiX Board News
Next meetings
- Autumn 2015: BNL (US) Oct 12 – 16 (to be held jointly with the WLCG GDB)
- Spring 2016: DESY Zeuthen (DE) April 18-22
- Autumn 2016: U.S. West Coast candidates, but also other proposals
Discussions about swapping the European/US location cycle

33 Questions?
