Configuration Management Evolution at CERN Gavin


Agile Infrastructure
- Why we changed the stack
- Current status
- Technology challenges
- People challenges
- Community
14/10/2013 CHEP 2013

Why?
- Homebrew stack of tools
- Twice the number of machines, no new staff
- New remote data-centre
- Adopting a more dynamic Cloud model
- “We’re not special”
- Existence of open source tool chain: OpenStack, Puppet, Foreman, Kibana
- Staff turnover
- Use standard tools: we can hire for it and people can be hired for it when they leave

Agile Infrastructure “stack”
- Our current stack has been stable for one year now
- See plenary talk at last CHEP (Tim Bell et al)
- Virtual server provisioning
  - Cloud “operating system”: OpenStack -> (Belmiro, next)
- Configuration management
  - Puppet + ecosystem as configuration management system
  - Foreman as machine inventory tool and dashboard
- Monitoring improvements
  - Flume + Elasticsearch + Kibana -> (Pedro, next++)

Puppet
- Puppet manages nodes’ configuration via “manifests” written in Puppet DSL
- All nodes check in frequently (~1-2 hours) and ask for configuration
- Configuration applied frequently to minimise drift
- Using the central puppet master model rather than the masterless model
- No shipping of code, central caching and ACLs

Separation of data and code
- Puppet “Hiera” splits configuration “data” from “code”
- Treat Puppet manifests really as code
- More reusable manifests
- Hiera is quite new: old manifests are catching up
- Hiera can use multiple sources for lookup
- Currently we store the data in git
- Investigating DB for “canned” operations
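The split between data and code can be illustrated with a minimal Python sketch of a hierarchical lookup. This is not Hiera itself: the hierarchy levels, the key names and the example hosts are all assumptions made for illustration. Only the idea of consulting several sources, most specific first, comes from the slide.

```python
from typing import Any, Dict, List, Optional

# Hypothetical hierarchy, most specific level first. Real Hiera reads its
# levels from data files (kept in git, according to the slide).
HIERARCHY: List[str] = ["fqdn/{fqdn}", "hostgroup/{hostgroup}", "common"]

# Hypothetical data: each level overrides only the keys it needs to.
DATA: Dict[str, Dict[str, Any]] = {
    "common": {"ntp_servers": ["ntp1.example.org"], "ssh_port": 22},
    "hostgroup/batch": {"ssh_port": 2222},
    "fqdn/node01.example.org": {"ntp_servers": ["ntp9.example.org"]},
}


def lookup(key: str, facts: Dict[str, str], default: Optional[Any] = None) -> Any:
    """Return the first value found for key, walking the hierarchy top-down."""
    for template in HIERARCHY:
        source = template.format(**facts)
        level = DATA.get(source, {})
        if key in level:
            return level[key]
    return default


facts = {"fqdn": "node01.example.org", "hostgroup": "batch"}
print(lookup("ssh_port", facts))
print(lookup("ntp_servers", facts))
```

The manifest (code) only ever asks for `ssh_port`; which value it gets is decided entirely by the data layers, which is what makes the manifest reusable across hostgroups.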

Modules and Git
- Manifests (code) and Hiera (data) are version controlled
- Puppet can use git’s easy branching to support parallel environments
- Later…

Foreman
- Lifecycle management tool for VMs and physical servers
- External Node Classifier: tells the puppet master what a node should look like
- Receives reports from Puppet runs and provides dashboard
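To make the External Node Classifier role concrete, here is a minimal Python sketch of what an ENC answers. A Puppet ENC is given a node name and returns a YAML document with `classes`, `parameters` and `environment`. The inventory, host name and class names below are hypothetical stand-ins for what Foreman would hold in its database, and this is not Foreman's implementation.

```python
from typing import Any, Dict, List

# Hypothetical inventory standing in for Foreman's host database.
INVENTORY: Dict[str, Dict[str, Any]] = {
    "node01.example.org": {
        "environment": "production",
        "classes": ["ntp", "ssh"],
        "parameters": {"hostgroup": "batch"},
    },
}


def classify(fqdn: str) -> str:
    """Render the ENC answer for one node as a small YAML document."""
    node = INVENTORY[fqdn]
    lines: List[str] = ["---", "classes:"]
    lines += [f"  - {name}" for name in node["classes"]]
    lines.append(f"environment: {node['environment']}")
    lines.append("parameters:")
    lines += [f"  {key}: {value}" for key, value in node["parameters"].items()]
    return "\n".join(lines) + "\n"


print(classify("node01.example.org"), end="")
```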

Deployment at CERN
- Puppet 3.2
- Foreman 1.2
- Been in real production for 6 months
- Over 4000 hosts currently managed by Puppet
- SLC5, SLC6, Windows
- ~100 distinct hostgroups in CERN IT + Expts
- New EMI Grid service instances puppetised
- Batch/Lxplus service moving as fast as we can drain it
- Data services migrating with new capacity
- AI services (OpenStack, Puppet, etc)

Key technical challenges
- Service stability and scaling
- Service monitoring
- Foreman improvements
- Site integration

Scalability experiences
- Most stability issues we had were down to scaling issues
- Puppet masters are easy to load-balance
- We use standard apache mod_proxy_balancer
- We currently have 16 masters
- Fairly high IO and CPU requirements
- Split up services: Puppet critical vs. non-critical
- Diagram labels: backend nodes “Bulk”, 4 backend nodes “Interactive”
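A rough back-of-envelope estimate of the load behind these numbers, as a Python sketch. The host count, master count and check-in interval are taken from the slides. The estimate assumes that check-ins are spread uniformly in time and shared evenly across all masters, which is a simplification because the slide says the masters are split into “Bulk” and “Interactive” pools.

```python
HOSTS = 4000    # hosts managed by Puppet, from the deployment slide
MASTERS = 16    # Puppet masters, from this slide


def runs_per_minute(hosts: int, interval_hours: float) -> float:
    """Agent check-ins per minute if every host checks in once per interval."""
    return hosts / (interval_hours * 60.0)


# The slides give the check-in interval as roughly 1-2 hours.
for interval in (1.0, 2.0):
    total = runs_per_minute(HOSTS, interval)
    per_master = total / MASTERS
    print(f"{interval:.0f} h interval: {total:.1f} runs/min in total, "
          f"{per_master:.2f} runs/min per master")
```

Under these assumptions each master sees only a handful of agent runs per minute on average, so the bursts (for example mass installations) matter more than the steady state.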

Scalability guidelines

Scalability experiences
- Foreman is easy to load-balance
- Also split into different services
- That way Puppet and Foreman UI don’t get affected by e.g. massive installation bursts
- Diagram labels: Load balancer, ENC, Reports processing, UI/API

PuppetDB
- All puppet data sent to PuppetDB
- Querying at compile time for Puppet manifests, e.g. configure load-balancer for all workers
- Scaling is still a challenge
- Single instance, manual failover for now
- Postgres scaling
- Heavily IO bound (we moved to SSDs)
- Get the book
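The “configure load-balancer for all workers” use case can be sketched in Python as follows. This does not call PuppetDB: the `NODES` dictionary is a hypothetical stand-in for the result of a query, and the class name `worker` and port 8080 are assumptions. `BalancerMember` is the Apache mod_proxy_balancer directive used to list backends.

```python
from typing import Dict, List

# Hypothetical snapshot of stored node data: node name -> applied classes.
NODES: Dict[str, List[str]] = {
    "web01.example.org": ["worker", "ntp"],
    "web02.example.org": ["worker"],
    "db01.example.org": ["database"],
}


def nodes_with_class(nodes: Dict[str, List[str]], wanted: str) -> List[str]:
    """Return the sorted names of all nodes that have the given class."""
    return sorted(name for name, classes in nodes.items() if wanted in classes)


def balancer_members(hosts: List[str], port: int = 8080) -> List[str]:
    """Render one Apache BalancerMember line per backend host."""
    return [f"BalancerMember http://{host}:{port}" for host in hosts]


workers = nodes_with_class(NODES, "worker")
print("\n".join(balancer_members(workers)))
```

The point of doing this at compile time is that the balancer configuration follows the inventory: adding a worker node changes the generated member list on the next run without anyone editing the balancer by hand.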

Monitor response times
- Monitor response, errors and identify bottlenecks
- Currently using Splunk; will likely migrate to Elasticsearch and Kibana
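As an illustration of the kind of numbers such monitoring produces, here is a small Python sketch that computes a nearest-rank percentile and an error rate. The sample data, the choice of percentile and the rule that status codes of 500 and above count as errors are assumptions. In the talk these figures come from Splunk, not from hand-written code.

```python
import math
from typing import List, Tuple

# Hypothetical samples: (response time in seconds, HTTP status code).
SAMPLES: List[Tuple[float, int]] = [
    (0.8, 200), (1.2, 200), (0.9, 200), (7.5, 500), (1.1, 200),
    (1.0, 200), (3.2, 200), (0.7, 200), (1.3, 200), (9.0, 503),
]


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def error_rate(samples: List[Tuple[float, int]]) -> float:
    """Fraction of requests that returned a server error (status >= 500)."""
    errors = sum(1 for _, status in samples if status >= 500)
    return errors / len(samples)


times = [seconds for seconds, _ in SAMPLES]
print("median:", percentile(times, 50))
print("p95:", percentile(times, 95))
print("error rate:", error_rate(SAMPLES))
```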

Upstream improvements
- CERN strategy is to run the main-line upstream code
- Any developments we do get pushed upstream, e.g. Foreman power operations, CVE reported
- Diagram labels: Foreman Proxy, Physical box, IPMI, VM, OpenStack Nova API

Site integration
- Using open source doesn’t completely get you away from coding your own stuff
- We’ve found every time Puppet touches our existing site infrastructure a new “service” or “plugin” is born
- Implementing our CA audit policy
- Integrating with our existing PXE setup and burn-in/hardware allocation process; possible convergence on tools in the future (Razor?)
- Implementing Lemon monitoring “masking” use-cases: nothing upstream, yet

People challenges
- Debugging tools and documentation needed! PuppetDB helpful here
- Can we have X’, Y’ and Z’? Just because the old thing did it like that doesn’t mean it was the only way to do it
- Real requirements are interesting to others too
- Re-understanding requirements, documentation and training
- Great tools: how do 150 people use them without stepping on each other?
- Workflow and process

Use git branches to define isolated puppet environments
Diagram labels: Your special feature, My special feature, QA, Production, Your test box, Most machines, “QA” machines

Easy git cherry pick

Git workflow

Git model and flexible environments
- For simplicity we made it more complex
- Each Puppet module / hostgroup now has its own git repo (~200 in all)
- Simple git-merge process within module
- Delegated ACLs to enhance security
- Standard “QA” and “production” branches that machines can subscribe to
- Flexible tool (Jens, to be open-sourced by CERN) for defining “feature” developments
- Everything from “production” except for the change I’m testing on my module
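The “everything from production except my module” idea can be sketched in Python as a mapping from module to branch. This is only an illustration of the concept, not the CERN tool mentioned on the slide. The function name, module names and branch names are hypothetical.

```python
from typing import Dict, Iterable


def build_environment(modules: Iterable[str],
                      overrides: Dict[str, str],
                      base: str = "production") -> Dict[str, str]:
    """Map every module to the base branch, except the modules under test."""
    names = list(modules)
    unknown = set(overrides) - set(names)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    return {name: overrides.get(name, base) for name in names}


# A feature environment: only the 'batch' module comes from a feature branch.
env = build_environment(["ntp", "ssh", "batch"], {"batch": "my_feature"})
for module, branch in env.items():
    print(f"{module}: {branch}")
```

Because each module lives in its own repository, an environment is just this table of module and branch pairs, and a test box subscribed to it sees exactly one difference from production.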

Strong QA process
- Mandatory QA process for “shared” modules
- Recommended for non-shared modules
- Everyone is expected to have some nodes from their service in the QA environment
- Normally changes are QA’d for at least 1 week. Hit the button if it breaks your box!
- Still iterating on the process
- Not bound by technology
- Is one week enough? Can people “freeze”?
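The one-week rule can be written down as a small Python check. This is a sketch of the policy as stated on the slide, assuming that “hit the button” means someone has flagged the change as breaking. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

# Minimum time a change stays in QA before promotion, per the slide.
MIN_QA_PERIOD = timedelta(weeks=1)


def can_promote(entered_qa: datetime, now: datetime, stopped: bool = False) -> bool:
    """A change may go to production after a week in QA, unless someone stopped it."""
    if stopped:
        return False
    return now - entered_qa >= MIN_QA_PERIOD


entered = datetime(2013, 10, 1, 9, 0)
print(can_promote(entered, datetime(2013, 10, 5, 9, 0)))
print(can_promote(entered, datetime(2013, 10, 8, 9, 0)))
print(can_promote(entered, datetime(2013, 10, 8, 9, 0), stopped=True))
```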

Community collaboration
- Traditionally one of HEP’s strong points
- There’s a large existing Puppet community with a good model: we can join it and open-source our modules
- New HEPiX working group being formed now
- Engage with existing Puppet community
- Advice on best practices
- Common modules for HEP/Grid-specific software

for the modules we share
Pull requests welcome!

Summary
- The Puppet / Foreman / Git / OpenStack model is working well for us
- 4000 hosts in production, migration ongoing
- Key technical challenges are scaling and integration, which are under control
- Main challenge now is people and process
- How to maximise the utility of the tools
- The HEP and Puppet communities are both strong and we can benefit if we join them together

Backup slides

Diagram labels: Bamboo; Koji, Mock; AIMS/PXE, Foreman; Yum repo, Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP