An epic journey of infrastructure modernization for a DevOps world


1 An epic journey of infrastructure modernization for a DevOps world
Flight of the Hippo: An epic journey of infrastructure modernization for a DevOps world. I am going to tell you a story today about the last few years at Yale: about our stops and starts in the DevOps world, and about all the steps along the way. We will call it “Flight of the Hippo”, and it will be incredible! - J. Greg Mackinnon, Windows Technical Lead, Windows Services, Information Technology Services, Yale University

2 Expedition Overview: Prologue: Pre-flight check.......Welcome to Yale
Ch 1: Flight plan The DevOps/Cloud Initiative Ch 2: Pilot training DevOps: What it is (not) Ch 3: Crash! The fall of Spinup Ch 4: Back to flight school .....The Hippo takes flight Ch 5: Spinup returns! DevOps at last? Epilogue: The adventure continues… Here is what will be covered... Let’s take another poll: who wants to hear more about DevOps? More about infrastructure improvements (SCCM, PowerShell)? I want to point out that there will be a great deal of exaggeration, hyperbole, and glossing over of details, and quite a bit of not giving credit where credit is due. MANY people contributed to Yale’s DevOps initiative, me perhaps least of all.

3 Audience Poll Who knows what DevOps actually is?
Who is part of a DevOps group? (a real DevOps group) Have you ever: used Git? (Or another modern version control program, such as Hg?) Have you ever submitted a pull request?

4 Prologue: Welcome to Yale
A department suffering from massive technical debt: 1100 server instances (60% Server 2008 R2, 7% Server 2003); >100 application workloads on a mix of SQL Server and Oracle databases; SCCM 2007, with a handful of packages; SCOM 2007, with no notifications configured; “OpsView” monitoring: a rebranded Nagios*, with a forked NSClient++ agent using deprecated NRPE checks

5 Prologue: Welcome to Yale (cont’d)
WSUS patching solution 30 days to production compliance VMware virtualization platform 30% workloads on off-support ESXi 4 hosts Remaining workloads on ESXi 5.5, unpatched. Four full-time Windows systems administrators Architecture group (Design Services) not involved in ongoing engineering or operational details. New “Cloud Engineering” team also sheds operational responsibilities. Yale... The school that invented CAS and uPortal (Campus Pipeline). Often quite visionary. I love Yale, I grew up at Yale, and thus I am allowed to criticize the current state of affairs in systems administration. We are a big, heavy hippo. Strong, determined, and really hard to move.

6 Chapter 1: The DevOps/Cloud Initiative
“Yale will be a zero-datacenter institution.” Vision announced: SaaS -> PaaS -> IaaS* (on-prem virtual or physical server as a last resort). Infrastructure as Code: servers should be “Cattle, not Pets*”. Reorganizations: Systems and Database Administration -> DevOps Engineering. Automation initiatives: Architects charged with implementation of self-service (the Spinup tool); remaining Sys Admins -> DevOps engineers will automate “All the Things”. A “Cloud Architect” is hired for the Architecture group to develop a new cloud architecture and to “bring self-service to Yale”. But in Engineering/Ops, no new permanent staffing resources are made available. An offshore support pilot is initiated to reduce “toil” tasks and to reduce the trauma of on-call weeks. We are being asked to make the hippo fly!

7 Chapter 2: Pilot Orientation (What is “DevOps”?)
The use of specific processes and tools that merge Agile software development practices with systems platforms, resulting in faster and more reliable software delivery. A flexible combination of culture, processes, and tools to streamline delivery of software or services. A vague term describing practices that make IT processes faster and more adaptable. Not as good as BizDevNetSecOps*. Just another industry buzzword; who cares? Straw poll: How do you feel about DevOps? Which definition sounds right to you? Do you have another one? Yale’s DevOps implementation treats DevOps as being more like option 2 or 3, as no career developers were engaged for the initiative, and no in-house applications were targeted for initial deployment within a DevOps framework. Our initial targets for DevOps services were self-service delivery of OS platforms and of generic services that run on them, such as databases or web servers.

8 DevOps: Formula for Success:
Prepare all stakeholders: Devs (or Vendors, or Architects): willing to change tools? Willing to engage with Ops/QA? Quality Assurance: Change Management: accepting of Agile? ISO: willing to accept process changes? Operations: must engage... DBAs, Storage Admins, Networking (SDN!), Sys Admins, Virtualization, Cloud Platform. “Prepare the catcher” analogy. (Now imagine that the ball is a baby.) Developers, QA, and Operations staff all need to be on board in order for DevOps to succeed. Each group brings its own set of concerns to the table: Devs want faster builds and delivery, and don’t want to be stonewalled by QA and Ops. QA wants no unnecessary risk (and sometimes appears to see all risk as unnecessary). Ops wants stability (and often seems disinterested in other concerns). In order for DevOps to succeed, all parties must acknowledge each others’ needs and concerns! There must also be a willingness to change and a commitment to success. Full engagement of all constituents is a must!

9 Chapter 2: Spinup - A DevOps Project
Spinup “Sandbox” (version 1): A DevOps self-service server provisioning tool. The project is initiated by the Architecture (Dev) group and deployed using repurposed hardware and entirely new tooling (OpenStack*, Puppet, Foreman, Ansible). Ongoing operation and maintenance of this service is to be transferred to the existing Operational and Engineering teams; whether to current or new staff remains to be determined. Show the slides, then ask for audience feedback: what went wrong? Operational handoff issues: Spinup used OpenStack on KVM (virtualization) and Ceph (storage), using only one subnet. It was deployed with Puppet and Foreman, with code in GitHub. (Windows integration utilized OpenStack “user space” and Ansible. Note that at this time, Ansible cannot be run from a Windows machine. The current Windows staff has no experience with Python or Ruby scripting, no exposure to YAML, and no experience with Git or revision control.) The existing Linux team used Chef, Jenkins, GitHub, VMware, NetApp storage, and F5 load balancers. The Windows team used the same infrastructure as Linux, minus Chef, Jenkins, and Git. What’s wrong here?

10 Spinup v1: What Went Right?
Process: the “Agile Lite” approach allowed people to participate based on availability, and allowed work to be accomplished in small, readily digestible chunks. People: over time, team leadership (Architecture) became more receptive to feedback from Engineering / Operations. Tools: exposed staff to the DevOps tool landscape and started a dialog about how we choose tools. DevOps is a combination of people, process, and tools. What did we get right on those fronts? But it wasn’t enough: service instability becomes a major issue, and the service ultimately is shuttered.

11 Spinup Crash: What Went Wrong
People/Culture: No clear strategy or leadership (the self-service mandate handed to Architecture was not shared with Engineering/Ops). Constant push for new features over addressing bugs and backlog (a values conflict). Tools: Engineering could not reach consensus on tools, languages, or coding practices; constant “bike-shedding”*. Budgeting: the working assumption was “If you build it, they will come”; money was expected to arrive for support staff, but no such funds became available. Architect staff time was consumed in the servicing of failing hardware. Additional tooling issues included: assertions that Windows admins should use Chef for ongoing operations; assertions that Chef was inadequate for Windows and that we should use Ansible and Puppet instead; debates on the merits of “Chocolatey” as a tool for server application deployment; and debates about the use of BigFix for server management. People/Culture issues: plenty of blame to go around. (The man who points his finger at his neighbor also points three fingers at himself.) Windows admins were not ready for these changes, and passively resisted DevOps through disinterest, non-participation, and a general refusal to adapt (“Dinosaur Syndrome”). “Chronic” reorganization (three reorgs in two years) and a lack of clarity around territory and ownership (power games) did not help.

12 Spinup Crash: What Went Wrong (for the Windows team):
This Hippo won’t fly: Failure to recognize outstanding technical debt as the leading resource constraint!!! [Insert platitude here] Distractions: Major security incident Legacy services retirement projects (Server 2003, Windows file services) Datacenter migration project VMware infrastructure issues Platitudes: Don’t build your castles of sand. You can’t help others if you can’t help yourself. In addition to previously discussed debt, we had these new distractions as well.

13 Chapter 4: The Hippo goes to flight school
A new strategy… Pay down the technical debt: Update the tooling Secure the environment Update the infrastructure Retrain: Revision control Code review Team Building: Forge trust-bridges with more constituents.

14 Chapter 4, Scene 1: Paying down the debt
Use the tools that you know to put your house in order. Use new techniques when possible to prepare for DevOps. Which magical tool can do this? System Center Configuration Manager. Oh no, not SCCM! How is that DevOps? It’s not, but it serves a need: ongoing server maintenance (software delivery and updates), fixed maintenance windows, inventory, and reporting. …and it can be adapted to work (sort of) within a DevOps framework.

15 SCCM? Isn’t that “old school?”
Yup. But we need to manage AV and software updates anyway. And this is SCCM “Current Branch”: new features twice per year (OMS integration, remote script execution). It’s the “gateway to the cloud”. Presenting a business case.

16 Deliver required infrastructure updates:
.NET Framework >= 4.5.1; Windows Management Framework >= 5.1 (enables PowerShell 5.1, DSC, and JEA); missing OS and SQL Service Packs; missing historical updates (especially the “Visual C++ Runtime”). Enable on-demand OS upgrades using SCCM OSD. Having a functional configuration management tool makes it possible to bring the infrastructure up to date. Having up-to-date infrastructure proves the value of the tool and builds credibility with management. Modern OS + current management components + timely patching = less time fighting fires, more time for infrastructure improvements.
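As an illustration only (not the actual scripts used at Yale), a baseline check like the one below could serve as the discovery script for an SCCM configuration item that reports whether a server already meets the .NET 4.5.1 / WMF 5.1 bar. The registry path and release threshold are the standard ones for the .NET Framework 4.x family; everything else is a hypothetical sketch.

  # Hypothetical compliance check for the baseline described above.
  # .NET Framework 4.5.1 or later publishes a 'Release' value of 378675 or higher here:
  $netKey = 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full'
  $netRelease = (Get-ItemProperty -Path $netKey -ErrorAction SilentlyContinue).Release
  $netOk = ($netRelease -ge 378675)

  # WMF 5.1 ships PowerShell 5.1, so the engine version is a reasonable proxy.
  $psOk = ($PSVersionTable.PSVersion -ge [Version]'5.1')

  # SCCM compliance rules key off the script output; return a simple string.
  if ($netOk -and $psOk) { 'Compliant' } else { 'NonCompliant' }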

17 Infrastructure Updates: VMware
vSphere 6.5 upgrade: performance improvements with storage integration APIs; delivers a REST API and containers. Infrastructure changes: large volume support.
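The REST API mentioned above is easy to exercise from PowerShell. A minimal, hedged sketch (the vCenter host name is a placeholder, and a trusted certificate on vCenter is assumed) that authenticates and lists virtual machines:

  # Illustrative call against the vSphere 6.5 REST API; not Yale's actual tooling.
  $vc   = 'vcenter.example.edu'           # placeholder vCenter host
  $cred = Get-Credential                  # prompts for vSphere credentials
  $pair = '{0}:{1}' -f $cred.UserName, $cred.GetNetworkCredential().Password
  $auth = 'Basic ' + [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($pair))

  # Create an API session; the response body carries the session token.
  $session = Invoke-RestMethod -Method Post -Uri "https://$vc/rest/com/vmware/cis/session" `
      -Headers @{ Authorization = $auth }

  # Use the token to list VMs.
  (Invoke-RestMethod -Method Get -Uri "https://$vc/rest/vcenter/vm" `
      -Headers @{ 'vmware-api-session-id' = $session.value }).value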

18 Chapter 4, Scene 2: The Hippo learns to code
DevOps tools implement “Infrastructure as Code”: disciplined code control is a necessity. Traditional Windows administration is light on code (or even code-free!). Scripts tend to be “personal” and disposable, often implemented using legacy languages (VBScript, Perl, PowerShell v1/2). Code sharing and collaboration are not the norm. Let’s start learning... If we are to bridge the gap with the Dev world, we need to start learning how they do things...

19 SCCM Scripting: Better with Git and PowerShell
SCCM environments can house hundreds or thousands of scripts; without standards and collaboration, these become unmanageable. Use distributed authoring and revision control with Git. Use branching and pull requests to improve standardization and group learning. Standardize the code:* comment-based help, script templates, common modules, PSScriptAnalyzer. Capture the non-code elements of CM components using code:* Get-CMApplication | Export-CMApplication; Get-CMConfigurationItem | Export-CMConfigurationItem; Get-CMBaseline | Export-CMBaseline. *See “Additional Resources” for links to this code
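As a hedged sketch of the “capture the non-code elements as code” idea (separate from the accelerator scripts linked under “Additional Resources”; the repository path and folder layout are hypothetical), each application, configuration item, and baseline can be exported to a file that is committed to Git alongside the scripts:

  # Illustrative export of CM objects into a Git working tree. Run from a
  # ConfigurationManager drive (e.g. after Set-Location 'ABC:'), and assumes
  # the apps\, ci\, and baselines\ subfolders already exist in the repo.
  $repo = 'C:\src\sccm-exports'   # hypothetical local clone of the Git repo

  foreach ($app in Get-CMApplication) {
      Export-CMApplication -InputObject $app `
          -Path (Join-Path $repo ('apps\' + $app.LocalizedDisplayName + '.zip'))
  }
  foreach ($ci in Get-CMConfigurationItem) {
      Export-CMConfigurationItem -InputObject $ci `
          -Path (Join-Path $repo ('ci\' + $ci.LocalizedDisplayName + '.cab'))
  }
  foreach ($bl in Get-CMBaseline) {
      Export-CMBaseline -InputObject $bl `
          -Path (Join-Path $repo ('baselines\' + $bl.LocalizedDisplayName + '.cab'))
  }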

20 SCCM: Application Development Workflow
Discussion of Git pull/push and branching logic, and how it is incorporated in our SCCM workflows. We are not using a CI service at this time... That would be a bit overkill, since there is no easy way to automate all of the steps in the creation of an SCCM application package at this time.
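The branch-and-pull-request flow discussed here is plain Git usage; a representative sequence from a PowerShell prompt might look like the following (the repository URL and branch name are hypothetical):

  # Work happens on a topic branch of the shared scripts repository.
  git clone https://github.com/example-org/sccm-scripts.git    # hypothetical URL
  Set-Location .\sccm-scripts
  git checkout -b feature/add-7zip-install

  # ...edit install scripts, drop in the exported application definition...
  git add .
  git commit -m 'Add 7-Zip install script and exported application definition'
  git push --set-upstream origin feature/add-7zip-install

  # A pull request is then opened in GitHub so a second admin reviews the
  # change before it is merged and imported into the production site.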

21 Chapter 5: Spinup Returns!
Version 2.0: re-engineered with a more compact tool set. Foreman and Puppet scrapped, deferring to the existing Chef and SCCM tools; OpenStack scrapped, with VMware and AWS integration preferred. Two flavors: “Spinup::Managed” lands first, with systems administrators as the end users. A revised Agile process brings more formalized goals, better engagement, and more designated staff. It utilizes the newly stabilized virtualization infrastructure and a modest initial feature set: “Just address the general deployment process.” “Spinup::Self-Service” lands next, with no on-prem infrastructure requirements (AWS only). Time for a review: which DevOps problems have been corrected from the previous version of Spinup?

22 Spinup 2 <-> SCCM integration
AD attribute-based collections*: automate patch window selection and application deployment. Although not a perfect union, we are able to integrate SCCM into Spinup’s DevOps workflows. Prior work to map patch maintenance windows into AD computer account attributes was helpful here; it is easier both for the AD admins and the Spinup developers to manage. We just delegate rights to create/delete computers and to write to the desired extensionAttributes. Spinup devs don’t have to deal with moving computer objects or managing AD group memberships; they like updating attributes! Although, it is worth noting that reporting on maintenance windows is a bit of a pain now. *Scripting creation of collections was site-specific, but I will share the code on request.
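To make the integration concrete, here is a hedged sketch of what one attribute-based collection might look like (collection name, attribute value, and query are illustrative; it also assumes Active Directory System Discovery has been extended to inventory extensionAttribute1 so that it appears on SMS_R_System):

  # Illustrative only: a device collection whose membership is driven by the
  # value Spinup writes into the delegated AD attribute.
  $wql = 'select SMS_R_System.ResourceID, SMS_R_System.Name from SMS_R_System ' +
         "where SMS_R_System.extensionAttribute1 = 'PatchWindow-Sat-0200'"

  $col = New-CMDeviceCollection -Name 'Patch Window - Saturday 02:00' `
      -LimitingCollectionName 'All Systems'
  Add-CMDeviceCollectionQueryMembershipRule -CollectionName $col.Name `
      -RuleName 'extensionAttribute1 match' -QueryExpression $wql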

23 Windows elements of Spinup 2:
AWS tooling: SSM service and SSM Agent; Active Directory Connector. VMware tooling: built from the “Fog” API, moving towards the vSphere 6.5 REST API; uses temporary “Customization Specifications”; runs Sysprep. Further work to make Windows behave in the DevOps Spinup environment.

24 Epilogue: Edging towards “true” DevOps
People: Windows admins are only starting to embrace the DevOps culture. Who is at fault? Who needs to change? Can the existing team become DevOps (fly), or do we have to maintain the status quo (float)? Tools: orchestration of actions across Windows servers remains a weakness; we are not likely to deploy System Center Orchestrator, and SCCM still takes hours to converge to a desired state. We would prefer to use a modern desired-state tool to ensure fast and consistent deployments. Possible futures in: Azure OMS and Azure Automation, Chef and Chef Automate, AWS SSM. Let’s revisit our elements of DevOps: People, Process, and Tools. How have we improved? People: working on this presentation has afforded me some opportunity for introspection... Who are these Windows admins of whom I speak? That would be me. So who needs to change? Well, DevOps advocates are still failing to make a compelling case for how their tooling and lifestyle benefit existing infrastructure. Too much emphasis on startups... Yale is not Netflix! But still, the “three fingers” point back at myself. Maybe it is up to us to determine how DevOps can help us, and in order to make this determination, we need to be willing to learn. Tools: IT’S TOO SLOW! ...And it is going to be a heavy lift to get to something different.

25 Epilogue: …still grounded
Making Hippos fly is hard work: for now we must be content to float. Legacy infrastructure needs will persist for years: bespoke / made-to-order server management, consistent patch deployment, and reporting for compliance (malware scanning, patch state, config items). Hopefully, these efforts to make legacy infrastructure more “agile” will one day allow us to fly. Cue Monty Python: “They don’t so much fly as plummet.”

26 The End. Questions? Comments? Contact: Greg Mackinnon
(work): | (personal): | LinkedIn: j-greg-mackinnon | Twitter: not so much

27 Urban Footnotes:
Nagios: That free monitoring tool that still can’t monitor Windows to save its life.
Cattle-not-Pets: A new way to insult the practices of systems administrators. “You sure wuv your widdle puddy-tat there, don’t-cha?” See also “Snowflake”.
SaaS -> PaaS -> IaaS: A presumed hierarchy of solution preferability: Software as a Service is preferred over Platform as a Service, and Platform as a Service is preferred over Infrastructure as a Service. (Where is just “Infrastructure”? Well, if you have to ask...)
Critical Non-Value Work: A semi-polite way for DevOps to tell Security that they are not allowed to interfere with productivity. But is it really better to tell anyone that their work does not add value? Ouch!
BizDevNetSecOps: Just me being silly. BizDevNetSecOps is not a thing. We only have DevSecOps, BizDevOps, and NetDevOps. Or did I just invent a new paradigm?

28 Urban Footnotes:
OpenStack: Really just a collection of APIs that interface with and abstract other infrastructure management products such as KVM (or other hypervisors), Ceph (or other storage solutions), and others. Does not quite qualify as IaaS, so what is it doing here?
Foreman: A neat tool that abstracts infrastructure components to enable automated management of IaaS. Sits on top of OpenStack and other tools. An abstraction of an abstraction? Our Spinup v1 tool sat on top of this. An abstraction of an abstraction of an abstraction? This is just like “Inception”! *BRAAM*
Bike-Shedding: A problem summarized by the phrase: “Everyone agrees that we need a bike shed. Why are we fighting about the color?”

29 Additional Resources:
SCCM Accelerator Scripts:

30 Extra slides...

31 DevOps: Formula for Failure
Rebranding engagement failures: rebrand your Dev teams as the DevOps team (with no staffing changes or task re-prioritization), or rebrand your Ops teams as the DevOps team. Other engagement failures: declare yourself a DevOps organization without engaging QA/ISO (example: a “Critical Non-Value Work*” classification for security issues), or neglect to engage other critical stakeholders in your organization. Planning failures: “Greenfield the Superfund site.”

Rebranding: Aspirational statements are fine, but you cannot actually change exclusively through aspiration. Real conversion from traditional SysOps to DevOps requires extensive training and a major shift in culture. Note here, when labeling “Ops” as “SysOps”, that there is a major change in the general approach to computing. In DevOps there is a lot of focus on deployment technologies and much less focus on ongoing management. Why? 1. Because that is how the tools work, and 2. Because that’s the philosophy. If you automate all aspects of deployment, you could, in theory, redeploy all your servers every patch cycle. But if you fall short of this mark, you will be burdened with “legacy” management overhead in addition to DevOps management. Daunting!

Engagement: DevOps can be perceived as threatening to existing organizational structures. “You are trying to bypass us!” is a common reaction to declared DevOps projects. DevOps tooling may require new hardware (such as vendor-supported OpenStack equipment, or AzureStack clusters), entirely new virtualization stacks and datacenter automation technologies (OpenStack, AWS, Azure), automation tools (Puppet, Ansible, Jenkins, etc.), and coding practices (testing frameworks, Git, Continuous Integration services). The training demands can be massive. Failure to consider how you are going to support all this newness can be disastrous. Can you manage the servers and storage? Can you deal with the security implications?

Existing service considerations: DevOps is sufficiently different that there is no established or easy way to convert existing legacy infrastructure and applications to DevOps-mode operations. If you are in the middle of cleaning up a toxic waste spill, would you build a luxury golf course on top of it, before the cleanup was done? Similarly, you can’t convert to DevOps operations while you are up to your ears in legacy infrastructure debt. (At least, not without additional resources.)

32 Eliminate point-and-click workflows: SCCM “on rails”
SCCM deployments consist of: metadata (~17 clicks), installer binaries (~18 clicks), installer scripts (tappity-tappity-tap), requirements logic (~12 clicks), dependency logic (~10 clicks), detection logic (~16 clicks), return codes (~8 clicks), supersedence rules (~10 clicks), and deployment rules (~32 clicks). How many possible clicks in an SCCM deployment? 123 clicks! Some built-in options for reducing errors: copy (clone) existing applications, Automated Deployment Rules (ADRs), and deployment templates.
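For comparison, roughly the same result can be expressed in a few lines of PowerShell. A hedged sketch only: the application, content path, and collection names are placeholders, and exact parameter names can vary a little between ConfigurationManager module versions.

  # Hypothetical scripted equivalent of the click-heavy application wizard.
  New-CMApplication -Name '7-Zip 19.00 x64' -Publisher 'Igor Pavlov' -SoftwareVersion '19.00'

  Add-CMScriptDeploymentType -ApplicationName '7-Zip 19.00 x64' `
      -DeploymentTypeName 'Install - MSI' `
      -ContentLocation '\\fileserver\pkgsrc\7zip\19.00' `
      -InstallCommand 'msiexec /i 7z1900-x64.msi /qn' `
      -ScriptLanguage PowerShell `
      -ScriptText 'if (Test-Path "C:\Program Files\7-Zip\7z.exe") { "Installed" }'

  New-CMApplicationDeployment -Name '7-Zip 19.00 x64' `
      -CollectionName 'All Servers - Pilot' `
      -DeployAction Install -DeployPurpose Required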

33 Streamline SCCM with Powershell
Improved ADR scheduling: Invoke-CMSoftwareUpdateAutoDeploymentRule. Configuration of Cumulative Update timeouts: Get-CMSoftwareUpdate | Set-CMSoftwareUpdate -MaximumExecutionMins [mins]. Auto-adjust applications: Set-CMDeploymentType, Set-CMApplication, Set-CMApplicationDeployment. The SCCM PowerShell module originally released for SCCM 2012 R2 was poorly documented and buggy; any “real” work required calling COM objects and performing arcane coding rituals. The SCCM Current Branch PS module is far more reliable.
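A hedged sketch of the two update-related tasks above (the ADR name, filter string, and timeout value are illustrative, and parameter names are as they appear in recent ConfigurationManager modules):

  # Force an Automated Deployment Rule to evaluate now rather than waiting
  # for its schedule (handy right after Patch Tuesday content syncs).
  Get-CMSoftwareUpdateAutoDeploymentRule -Name 'Server Updates - Monthly' |
      Invoke-CMSoftwareUpdateAutoDeploymentRule

  # Raise the maximum runtime on cumulative updates so they are not killed
  # mid-install by a short maintenance window.
  Get-CMSoftwareUpdate -Fast |
      Where-Object LocalizedDisplayName -like '*Cumulative Update*' |
      Set-CMSoftwareUpdate -MaximumExecutionMins 120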

34 Migrate WSUS to SCCM using PowerShell
Read WSUS group membership from AD: Get-ADGroup -Filter [] | Get-ADGroupMember. Write the equivalent patch time to an AD extensionAttribute: Set-ADObject -Identity [computer] -Replace @{"extensionAttribute1" = [maintenance time]}. Create extensionAttribute-based collections: New-CMCollection, Add-CMDeviceCollectionQueryMembershipRule*. Apply maintenance windows to the collections: New-CMSchedule, New-CMMaintenanceWindow. Swift on Security: “On the plus side, if Twitter does double the maximum post length, I might actually be able to start posting PowerShell code.” *Wins prize for longest cmdlet name ever?
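Tying those steps together, a hedged end-to-end sketch (group name, attribute value, collection name, and schedule are all placeholders; it assumes the ActiveDirectory and ConfigurationManager modules are loaded and a CM drive is current):

  # 1. Read a legacy WSUS targeting group and stamp its maintenance time into
  #    an AD attribute on each member computer.
  $window = 'PatchWindow-Sat-0200'                        # placeholder value
  Get-ADGroup -Filter "Name -eq 'WSUS-Saturday-0200'" |   # placeholder group
      Get-ADGroupMember |
      Where-Object objectClass -eq 'computer' |
      ForEach-Object {
          Set-ADObject -Identity $_.DistinguishedName `
              -Replace @{ extensionAttribute1 = $window }
      }

  # 2. Apply a recurring maintenance window to the matching attribute-based
  #    collection (created as in the earlier Spinup collection sketch).
  $col   = Get-CMDeviceCollection -Name 'Patch Window - Saturday 02:00'
  $sched = New-CMSchedule -DayOfWeek Saturday -Start (Get-Date '02:00') `
      -DurationInterval Hours -DurationCount 4 -RecurCount 1
  New-CMMaintenanceWindow -CollectionId $col.CollectionID `
      -Name 'Saturday 02:00 window' -Schedule $sched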

