UCA DATACENTRE MIGRATION & DISASTER RECOVERY PROJECT
- CREATE A NEW EXTERNAL DATACENTRE ENVIRONMENT AND MIGRATE ALL SYSTEMS
- CREATE A FURTHER EXTERNAL DATACENTRE AND IMPLEMENT DISASTER RECOVERY FACILITIES
- UPGRADE BANDWIDTH AND IMPROVE NETWORK RESILIENCE
By Fahri Zihni, independent consultant and former Project Manager, UCA
firstname.lastname@example.org email@example.com www.ucreative.ac.uk
PROJECT OBJECTIVES
During 2011-13, UCA procured datacentre hosting facilities for two UCA “virtual” datacentres, having virtualised all core applications beforehand. The procurement process was preceded by a comprehensive business case and investment appraisal, which concluded that the best option was to externalise. The datacentre facilities were provided by the University of London Computer Centre (ULCC) for main data operations, and by the Custodian datacentre, Maidstone, for Disaster Recovery (DR) purposes.
In June 2013, an external Project Manager was appointed and a project team was established to build a new technical environment at ULCC, migrate all systems to it from the Maidstone campus between July 2013 and July 2014, and create a DR facility at Custodian. Upgrading the network and improving its resilience for all four campuses was also part of the project.
PROJECT OUTCOMES - 1
The project succeeded in all respects. All systems are operating from ULCC (3 racks), with a DR facility established at Custodian (1 rack). Networks have been upgraded between all sites.
There were many benefits to the university from externalising datacentre facilities:
- Avoided building a new datacentre at a capital cost of over £1m.
- Ongoing rental costs for 4 racks are perceived to be significantly lower than those of an alternative in-house datacentre, taking into account land, building, rental value and maintenance costs.
- No ongoing human resources or skills are required for the maintenance of UPSs, electrical power, generator, air conditioning, buildings maintenance, cleaning etc.
- Higher levels of systems resilience (99.9% availability) offered by industry-standard datacentres with modern equipment and buildings which enjoy economies of scale.
- Better environmental performance attained in new datacentres, reducing the university’s carbon footprint.
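As a rough illustration of what a 99.9% availability figure means in practice, the sketch below converts an availability percentage into the maximum downtime it permits over a year. It assumes a 365-day year and that availability is measured over the whole year; the function name is hypothetical and the actual measurement terms in the ULCC and Custodian contracts are not stated in this presentation.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def max_downtime_hours(availability_percent: float,
                       period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours


if __name__ == "__main__":
    hours = max_downtime_hours(99.9)
    print(f"99.9% availability allows up to {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.9% availability corresponds to roughly 8.76 hours of permitted downtime per year.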
PROJECT OUTCOMES - 2
- Notionally, floor space which would have been taken up by an in-house datacentre at another campus is freed up for alternative activities closer to the core business of the university (learning, teaching, research).
- The IT staffing requirement remains much the same as before, with staff managing systems remotely, as they essentially had done with the in-house datacentre beforehand.
Networks upgrade benefits:
- Noticeable difference to end users in systems performance at each of the four sites.
- Improved resilience in case of a line failure.
- A better platform for strategic future developments (e.g. video conferencing, video streaming etc).
PROJECT OUTCOMES - 3
Disaster Recovery provision benefits:
- Ability to recover core systems in the event of a disaster.
Incidental benefits:
- Some applications which could not run the latest versions of systems software in the new environment were re-implemented from scratch, providing the most advanced versions of software to end users (e.g. Eduroam, MOTP, SCCM).
- The advent of the migrations project prompted some users to review and upgrade their applications’ hardware and software (e.g. Planet e-stream, OpenAthens), and the IT department to improve its operating environment (e.g. standardising SQL versions).
PROJECT OUTCOMES - 4
Externalisation of main Datacentre: disadvantages
- Moving the main Datacentre to London meant longer distances for staff to travel. However, after all the equipment and systems were installed, these visits became very infrequent, with much of the incidental work (e.g. receiving goods) being handled by ULCC staff.
- Managing two external contracts carries a management overhead. However, after the initial set of discussions and agreement of contracts, these became minor contracts to maintain compared with many other IT Department contracts.
Overall, the benefits very significantly outweighed the disadvantages.
Conclusions for the wider HE Sector
Today, there is a highly competitive and efficient datacentre market, accessed directly, via Janet frameworks, or via the Cloud. There are also HE shared datacentre options. In parallel, the price-performance of wide area networks has been improving year by year. These changes make it more feasible than ever to externalise datacentre operations.
The UCA experience shows that in-house datacentres can be migrated successfully to external sites within a relatively short period of time and can provide universities with a better solution in many respects: lower costs, higher levels of reliability, long-term scalability and environmental benefits. Release of datacentre assets also enables institutions to use available facilities for core university activities: learning, teaching and research.
DRIVERS FOR CHANGE
- The sale of the Maidstone campus mandated the re-location of the UCA main datacentre and the migration of systems by the immovable date of 31 July 2014, with no scope for slippage.
- A requirement from Audit, and the University’s own desire to establish a Disaster Recovery facility to support business continuity, meant that a second datacentre had to be deployed.
- There was a clear requirement to upgrade fragile, high-risk, single-connection network links from 100Mb/s to 1Gb/s, and to provide secondary resilient lines.
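To give a sense of scale for the move from 100Mb/s to 1Gb/s links, the sketch below estimates the ideal time to move a given volume of data over each link speed. The 1 TB data volume is purely illustrative and not a figure from the project, and the calculation ignores protocol overhead, contention and latency, so real transfers would take longer.

```python
def ideal_transfer_hours(data_bytes: float, link_bits_per_second: float) -> float:
    """Return the ideal transfer time in hours, ignoring overhead and contention."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    seconds = data_bytes * 8 / link_bits_per_second  # 8 bits per byte
    return seconds / 3600


# Hypothetical data volume for illustration only (1 TB = 10**12 bytes).
HYPOTHETICAL_DATA_BYTES = 1e12

OLD_LINK_BPS = 100e6  # 100 Mbit/s link
NEW_LINK_BPS = 1e9    # 1 Gbit/s link

if __name__ == "__main__":
    for label, speed in (("100Mb/s", OLD_LINK_BPS), ("1Gb/s", NEW_LINK_BPS)):
        hours = ideal_transfer_hours(HYPOTHETICAL_DATA_BYTES, speed)
        print(f"{label}: about {hours:.1f} hours for 1 TB")
```

Whatever the data volume, the tenfold increase in link speed cuts the ideal transfer time by a factor of ten.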
Process 1
FINANCIAL APPRAISAL AND PROCUREMENT
During 2011-12, when it became clear that the Maidstone campus would be unavailable beyond July 2014, there was a comprehensive financial and business appraisal of options for providing a new datacentre, including a cost-benefit and risk analysis for each. This included evaluation of:
- In-house options
- Commercial options
- HE Sector providers: Eduserv, ULCC, Janet brokerage, YHMAN
- Shared Services options
- Cloud options
This was followed by a procurement process with key selection criteria based on price, quality of service, HE experience, Janet connectivity and location of datacentre. Following procurement, ULCC was selected as the primary site and Custodian as the DR provider site.
Process 2
DATA CENTRE BUILD AT ULCC
PROCESS: Order, install and configure new hardware, software and comms technologies at ULCC, including Hyper-V, HP 3PAR SAN, blade servers, StoreOnce, Veeam, Juniper firewalls and a 2x1Gb connection at ULCC to Janet.
RISKS/ISSUES IDENTIFIED, ENCOUNTERED AND ADDRESSED:
- Late delivery of hardware: addressed by re-scheduling and strengthening relationships with suppliers.
- Learning curves, as these were mostly new technologies to the Infrastructure Team: addressed through training.
- Dependence on individuals and capacity issues: addressed through increased contractor resources.
- Hardware and software supplier support issues: addressed through strong communication and escalation (HP systems implementation).
- Technical Services supplier issues: initial plans reviewed following purchase of virtual backup systems software, and alternative plans agreed (System Professional Ltd).
Process 3
MIGRATE SYSTEMS - 1
PROCESS: Migrate 65 systems from the Maidstone campus to ULCC, involving Infrastructure, Development and system owners as well as external suppliers.
RISKS/ISSUES IDENTIFIED, ENCOUNTERED AND ADDRESSED:
- Concerns over system-owner engagement: addressed through three sets of detailed meetings with each user group to discuss and agree migration dates and testing/acceptance programmes.
- System-owner changes to agreed dates: addressed through face-to-face discussions and escalation to Head of IT and Project Board.
- Conflicting priorities between this project, other projects and urgent operational problems: addressed locally or through escalation to Head of IT.
- Acceptance test decisions to migrate/revert: each application migration was evaluated jointly by System Owners, Infrastructure, Development, Project Manager and Head of IT, and the decision was made collectively (only one system reverted).
Process 3
MIGRATE SYSTEMS - 2
RISKS/ISSUES IDENTIFIED, ENCOUNTERED AND ADDRESSED (contd.):
- Concerns over identification of IP addresses for each application, firewall rules etc: sought advice from system suppliers and increased lines of communication between Infrastructure and Development.
- Staff morale low due to concerns about re-structuring: promoted the value of the project to IT staff in relation to career development.
- Staff sickness absences: re-assigned work where possible, appointed contractors and made the case for out-of-hours work.
- Change from a tape-based to a disk-based backup policy: obtained and acted on advice from internal audit (Mazars).
- Some overnight backups failing intermittently: Microsoft, HP and Veeam phone conferences arranged. The problem was not resolved by the deadline given to the suppliers, so “Plan B” was implemented instead and the snapshot software licence refunded.
- Concern about downtimes during migrations: mitigated by strong lines of communication between all stakeholders and decisive actions arising from these. User satisfaction was high.
Process 4
CREATE A DISASTER RECOVERY FACILITY AT CUSTODIAN
PROCESS: Create a DR facility at Custodian.
RISKS/ISSUES IDENTIFIED, ENCOUNTERED AND ADDRESSED:
- A datacentre was set up at Custodian by re-using the SAN, blade servers etc formerly at the Maidstone campus.
- Achieving a DR solution was optimistic given the massive scope of the project and competition for resources, particularly at a time of significant staff losses. Despite this, a simple but effective solution was implemented which is acceptable to Audit and can be further improved as time and resources allow.
Process 5
UPGRADE NETWORK AND IMPROVE RESILIENCE
PROCESS: Upgrade all site connections to 1Gb and create resilient links. All four campuses and both datacentres are now running 1Gb into Janet.
RISKS/ISSUES IDENTIFIED, ENCOUNTERED AND ADDRESSED:
- Janet network delivery dates: face-to-face and personal communications with the regional Janet representative, and a shared work-in-progress table of site status, were helpful in keeping track and making things happen. All six sites are now running 1Gb and there is only one campus without a secondary backup line. This is scheduled to be completed soon.
- Intermittent packet losses between Janet and ULCC: there was a fairly serious problem for five days, but Janet addressed the issue and the matter was resolved.
[Network diagram, June 2013: Maidstone campus (current DC), Farnham, Canterbury campus, Rochester campus and Epsom campus, connected via KPSN to ULCC, Custodian and Internet / JANET; link speeds shown: 100Mb and 1Gb.]
[Network diagram, current position (October 2014): Farnham, Canterbury, Rochester and Epsom, connected via KPSN to ULCC, Custodian and Internet / JANET; all link speeds shown as 1Gb.]
STAKEHOLDERS: THE HUMAN DIMENSION
STAFF
After an ambivalent start due to a staff re-structuring exercise running in parallel, the project inspired many staff, who then became fully engaged with the programme. Staff worked hard despite conflicting priorities between the project, other initiatives and day-to-day emergencies.
SYSTEM OWNERS AND END USERS
Three face-to-face meetings were held with each system-owner group to explain what would happen, agree dates for migration, and agree a testing plan. This was time-consuming work, but it paid off, with users delivering on their responsibilities. System owners expressed high levels of satisfaction with the migrations at the end of the project, and there was some positive student feedback.
SUPPLIERS/PRODUCTS
Overall, support from all suppliers was good. System Professional Ltd provided excellent on-site high-level technical expertise. HP was the main hardware supplier and, although issues were encountered, account management was strong and these issues were addressed. StoreOnce worked well. Veeam proved to be an excellent product and Hyper-V performed well. The Janet regional representative was very supportive.
PROJECT BOARD
Strong project support was provided at Deputy VC, PVC Resources and Head of IT levels, with independent advice coming from an experienced interim Director of IT, in addition to the Project Manager. This was a small but focussed, nimble-footed group, which was able to make prompt and incisive decisions.
AUDIT
Audit (Mazars) was kept up to speed throughout the project and provided useful advice on testing and backup policy.