
1 Virtualisation, Clouds and IaaS at CERN Helge Meinhard CERN, IT Department VTDC Delft (NL), 19 June 2012

2 Outline
– Introduction: CERN, physics at the LHC, the LHC machine and detectors, data processing challenges, WLCG, the CERN computer centre
– Past and present (phase I): CERN virtualisation infrastructure, service consolidation, lxcloud
– Present and future (phase II): remote data centre, new tool suite, IaaS

3 (image-only slide; no transcript text)

4 (Scale diagram from the atom and the proton out to the Earth, the Sun, galaxies and the universe, with instruments from the LHC to AMS, Hubble, ALMA and the VLT.) The LHC acts as a super-microscope to study the physics laws of the first moments after the Big Bang – a symbiosis between particle physics, astrophysics and cosmology.

5 Enter a New Era in Fundamental Science – The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever, is the most exciting turning point in particle physics: the exploration of a new energy frontier. LHC ring: 27 km circumference, hosting the four experiments ATLAS, CMS, ALICE and LHCb.

6 (image-only slide; no transcript text)

7 The LHC Computing Challenge
– Signal/noise: 10^-13 (10^-9 offline)
– Data volume: high rate × large number of channels × 4 experiments → 22 PetaBytes of new data each year
– Compute power: event complexity × number of events × thousands of users → 200k CPUs, 45 PB of disk storage
– Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere → GRID technology
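As a rough order-of-magnitude check (not from the slides, just the stated 22 PB/year spread over roughly 3.15×10^7 seconds per year), the sustained average ingest rate is:

```latex
\frac{22\ \text{PB/year}}{3.15\times 10^{7}\ \text{s/year}}
  \approx 7\times 10^{8}\ \text{B/s} \approx 0.7\ \text{GB/s}
```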

8 Worldwide LHC Computing Grid
– Tier 0: CERN – data acquisition and initial processing, data distribution, long-term curation
– Tier 1: 11 major centres – managed mass storage, data-heavy analysis, dedicated 10 Gbps lines to CERN
– Tier 2: more than 200 centres in more than 30 countries – simulation, end-user analysis
– Tier 3: from physicists' desktops to small workgroup clusters – not covered by the MoU
(Diagram: CERN Tier 0 linked to Tier 1 centres in Germany, the USA, the UK, France, Italy, Spain, Taiwan, the Nordic countries and the Netherlands; Tier 2 labs and universities forming grids for regional groups and physics study groups; Tier 3 physics-department desktops.)

9 The CERN Data Centre in Numbers
Data Centre Operations (Tier 0):
– 24x7 operator support and system administration services to support 24x7 operation of all IT services
– Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
– Management and automation framework for large-scale Linux clusters
Key figures:
– High-speed routers (640 Mbps – 2.4 Tbps): 24; Ethernet switches: 350; 10 Gbps ports: 2,000; switching capacity: 4.8 Tbps; 1 Gbps ports: 16,939; 10 Gbps ports: 558
– Racks: 828; boxes: 11,728; processors: 15,694; cores: 64,238; HEPSpec06: 482,507
– Disks: 64,109; raw disk capacity: 63,289 TiB; memory modules: 56,014; memory capacity: 158 TiB; RAID controllers: 3,749
– Tape drives: 160; tape cartridges: 45,000; tape slots: 56,000; tape capacity: 34,000 TiB
– IT power consumption: 2,456 kW; total power consumption: 3,890 kW

10 Functionality drill-down – Clusters: sets of machines with an identical configuration that differs from other clusters.

11 Problem Statement (Phase I)
Small clusters:
– Far too many clusters and too many managers
– The small size of some clusters makes disruptive upgrades (OS/software upgrades, HW life-cycle management) very difficult
– Many servers poorly used
Large clusters:
– Effective, efficient management is a must
Virtualisation addresses part of these problems.

12 Phase I: CERN Virtualisation Infrastructure (1)
Addressing the small-cluster problem: custom virtual machines in the CERN computer centre
– VMs have a long-term lifetime of months/years
– User kiosk for requesting a VM in less than 30 minutes
Based on Microsoft's System Center Virtual Machine Manager (SCVMM) on top of Hyper-V:
– Enterprise-class centralised management
– Rich feature set: grouping of hypervisors with delegation of administrative privileges, VM migration, high availability, checkpoints, PowerShell snap-in for administration/scripting
Hardware implementation using cells of blade servers and redundant iSCSI arrays

13 Phase I: CERN Virtualisation Infrastructure (2)
Why SCVMM/Hyper-V? At the time, the only cost-effective solution for CERN offering the required advanced management functionality.
Current status:
– Checkpointing implemented
– Hypervisors upgraded to Windows 2008 R2 SP1
– Dynamic memory allocation allowing for overcommitting memory
Growth: Nov 2010: 680 VMs on 170 hypervisors; Feb 2012: 2,450 VMs on 350 hypervisors (42% Windows VMs, 58% Linux VMs)

14 Phase I: Service Consolidation
More than 600 Linux machines in CVI run as fully managed machines for physics services (installation, configuration, monitoring).
We offer managed CERN Linux VMs with (some) combinations of:
– 1-4 CPUs
– 1-8 GB memory
– 100-2000 GB disk
– 1 Gbps paravirtualised network (100 Mbps during installation)
CPU, disk and network are happily overcommitted:
– Typical physical CPU usage on hypervisors < 30%
– Typical physical network usage < 2%
– Real disk usage vs. committed capacity < 20%
Memory is not overcommitted.
Current statistics: 627 VMs, 1,245 virtual CPUs, 3,235 GB memory, 59 TB disk used (out of 265 TB allocated)

15 Phase I: lxcloud (1)
Addressing the large-cluster problem. Aims:
– Dynamic provisioning of resources to users of our large batch computing service (SLC 5 vs. SLC 6, user-group-specific customisations of the environment)
– Test provisioning of a generic cloud interface (EC2) to selected users
Hardware: O(60) physical batch worker nodes (out of 4,000) with local storage; fully managed SLC 6 machines with KVM and KSM

16 Phase I: lxcloud (2)
Image repository and image distribution mechanism:
– Images for virtual batch servers derived from fully managed golden nodes
– User-supplied images for the EC2 interface
– Internal distribution to hypervisors via a torrent-like mechanism
– Sharing images across (WLCG) sites is being discussed in the context of HEPiX
VM provisioning system: OpenNebula 3.2; looking at OpenStack (see later)
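To illustrate what a generic EC2 interface of this kind gives selected users, here is a minimal sketch using the Python boto library against an EC2-compatible endpoint; the hostname, port, path, credentials and image ID are placeholders, not actual lxcloud values.

```python
import boto
from boto.ec2.regioninfo import RegionInfo

# Hypothetical EC2-compatible endpoint exposed by the cloud layer;
# host, port, path, credentials and AMI ID are all placeholders.
region = RegionInfo(name="lxcloud", endpoint="ec2.example.cern.ch")
conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        is_secure=False, port=8773,
                        path="/services/Cloud", region=region)

# Start one small instance from a (hypothetical) registered image.
reservation = conn.run_instances("ami-00000001",
                                 instance_type="m1.small",
                                 key_name="my-keypair")
print(reservation.instances[0].id)
```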

17 Phase II: New Challenges
– The CERN data centre is reaching its limits
– IT staff numbers remain fixed, but more computing capacity is needed
– Tools are high-maintenance and becoming increasingly brittle
– Inefficiencies exist, but their root cause cannot easily be identified

18 CERN Data Centre
– More capacity needed for the processing of LHC data; for various reasons, it is not possible to provide additional capacity at CERN
– 2010: call for expressions of interest among CERN member states; 2011: call for tender; 2012: adjudication – Wigner Institute in Budapest, Hungary
– Timescales: prototyping in 2012, testing in 2013, production in 2014
– This will be a hands-off facility for CERN: only smart hands there, everything else done remotely
– Disaster recovery for key services in the primary CERN data centre becomes a realistic scenario

19 Usage Model(s)
Various possible models of usage – sure to evolve. The less specific the hardware installed, the easier it is to change its function.
Vision: run both massively scaled services ("cattle") and carefully set-up special services ("pets") as virtual machines on top of cattle-style hypervisors.

20 Infrastructure Tools Evolution (1)
– We had to develop our own toolset in 2002 (installation, configuration, monitoring)
– Nowadays, CERN's compute capacity is no longer leading edge, many options are available for open-source fabric management, and we need to scale to meet the upcoming capacity increase
– If there is a requirement that is not covered by an open-source tool, we should question the need; if we are the first to need it, contribute it back to the open-source tool
– A large community out there takes the tool-chain approach and has scaling needs matching ours, O(100k) servers and many applications: many small tools for specific purposes linked together, making it easy to exchange one tool for an alternative

21 Infrastructure Tools Evolution (2) (diagram-only slide; no transcript text)

22 Infrastructure Tools Evolution (3)
Configuration management using off-the-shelf components:
– Puppet – configuration definition
– Foreman – GUI and data store
– Git – version control
– Mcollective – remote execution
Integrated with the CERN single sign-on, the CERN certificate authority and the installation server.
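As one small, hedged example of gluing such off-the-shelf components together: Foreman exposes a REST/JSON API that scripts can query. The hostname, credentials and exact response layout below are assumptions and vary with the Foreman version.

```python
import requests

# Hypothetical Foreman instance and credentials.
FOREMAN_URL = "https://foreman.example.cern.ch"

resp = requests.get(FOREMAN_URL + "/api/hosts",
                    auth=("admin", "changeme"),
                    headers={"Accept": "application/json"})
resp.raise_for_status()

# The JSON layout differs between Foreman API versions; print it raw here.
print(resp.json())
```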

23 Infrastructure as a Service
Goals:
– Improve repair processes with virtualisation
– More efficient use of our hardware
– Better tracking of usage
– Enable remote management for the new data centre
– Support potential new use cases (PaaS, cloud)
– Sustainable support model
At scale for 2015: 15,000 servers, 90% of hardware virtualised, 300,000 VMs needed

24 OpenStack
– Open-source cloud software
– Supported by 173 companies, including IBM, Red Hat, Rackspace, HP, Cisco, AT&T, …
– Vibrant development community and ecosystem
– Infrastructure as a Service at our scale
– Started in 2010 but maturing rapidly

25 OpenStack at CERN (1)
(Architecture diagram: Nova – compute, scheduler, network, volume; Glance – image registry; Keystone – identity; Horizon – dashboard.)

26 OpenStack at CERN (2)
Multiple uses of IaaS:
– Server consolidation
– Classic batch (single- or multi-core)
– Cloud VMs such as CERNVM
Scheduling options:
– Availability zones for disaster recovery
– Quality-of-service options to improve efficiency, e.g. for build machines and public login services
– Batch system scalability is likely to be an issue
Accounting:
– Use the underlying services of the IaaS and the hypervisors for reporting and quotas
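A minimal sketch (assuming the python-novaclient library, with placeholder credentials, endpoint, image, flavour and availability-zone names) of how a batch-worker or CERNVM-style instance might be requested from Nova, including an availability zone of the kind mentioned above:

```python
from novaclient import client

# Keystone endpoint, credentials, project, image and flavour names,
# and the availability zone are all placeholders.
nova = client.Client("2", "username", "password", "project",
                     "https://keystone.example.cern.ch:5000/v2.0")

image = nova.images.find(name="SLC6-server-x86_64")
flavor = nova.flavors.find(name="m1.small")

server = nova.servers.create(name="batch-worker-001",
                             image=image.id,
                             flavor=flavor.id,
                             availability_zone="cern-geneva-a")
print(server.id, server.status)
```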

27 Monitoring
Action needed:
– More than 30 monitoring applications; number of producers: ~40k; input data volume: ~280 GB per day
– Covering a wide range of different resources: hardware, OS, applications, files, jobs, etc.
– Application-specific monitoring solutions using different technologies (including commercial tools), yet sharing similar needs: aggregate metrics, get alarms, etc.
– Limited sharing of monitoring data; hard to implement complex monitoring queries

28 Monitoring: New Architecture
(Architecture diagram: producers with sensors publish monitoring data to a messaging broker; consumers feed a storage and analysis engine, operations tools, and dashboards and APIs. Technologies shown include Apollo (broker), Lemon, Hadoop and Splunk.)
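To make the producer → broker → consumer flow concrete, here is a hedged sketch of a metric producer publishing to a STOMP-capable broker such as Apollo, using the Python stomp.py library; the broker host, credentials and topic name are placeholders, and connection setup details vary between stomp.py versions.

```python
import json
import socket
import time

import stomp  # stomp.py STOMP client; Apollo speaks the STOMP protocol

# Hypothetical broker endpoint, credentials and destination topic.
conn = stomp.Connection([("broker.example.cern.ch", 61613)])
conn.connect("producer", "secret", wait=True)

metric = {
    "host": socket.gethostname(),
    "metric": "loadavg_1min",
    "value": 0.42,
    "timestamp": int(time.time()),
}

# Each sensor reading becomes one JSON message on a monitoring topic.
conn.send(destination="/topic/monitoring.metrics", body=json.dumps(metric))
conn.disconnect()
```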

29 Current Tool Snapshot (subject to change!)
(Tool-chain diagram: Jenkins; Koji and Mock; Puppet with a stored-config DB; Foreman; AIMS/PXE; Yum repositories served by Pulp; mcollective and yum; JIRA; Lemon; git and SVN; OpenStack Nova; hardware database.)

30 Timelines
– 2012: prepare formal project plan; establish IaaS in the CERN data centre; monitoring implementation as per WG; migrate lxcloud users; early adopters to use the new tools
– 2013 (LS1, new data centre): extend IaaS to the remote data centre; business continuity; migrate CVI users; general migration to the new tools with SLC6 and Windows 8
– 2014 (LS1, to November): phase out legacy tools such as Quattor

31 Conclusions
– The remote Tier 0 and other challenges require us to rethink the way we run the computer centre and our services
– Virtualisation has proved to be the right way forward (CVI/service consolidation and lxcloud)
– Now unifying on a single tool (OpenStack) and going much further: coverage of machines and services; a tool chain for installation, configuration, monitoring and IaaS; proof of concept done rapidly and very successfully
– People are highly motivated

32 More Information
– HEPiX Agile Infrastructure talks: http://cern.ch/go/99Ck
– Tier-0 upgrade: http://cern.ch/go/NN98
– Other info or contacts: Helge.Meinhard (at) cern.ch

33 Acknowledgements
Numerous colleagues and collaborators at CERN, including:
– Ian Bird
– Tim Bell
– Gavin McCance
– Ulrich Schwickerath
– Alexandre Lossent
– Jose Castro Leon
– Jan van Eldik
– Belmiro Moreira

34 Thank you (closing slide, repeating the tool-chain diagram from slide 29)

