Taming the Data Center (SDR 1.3), Butch Adkins, Infrastructure & Operations

Presentation transcript:

1 Taming the Data Center (SDR 1.3), Butch Adkins, Infrastructure & Operations

2 Data Center World – Certified Vendor Neutral. Each presenter is required to certify that their presentation will be vendor-neutral. As an attendee, you have the right to enforce this no-sales-pitch policy by alerting the speaker if you feel the session is not being presented in a vendor-neutral fashion. If the issue continues, please alert Data Center World staff after the session is complete.

3 Agenda: Background; Outages; Outages prevented; Takeaways: 1. Know your environment, 2. Establish relationships, 3. Plan, 4. Communicate

4 Backstory: Eastern Kentucky University, 1979; 33rd year at UK; 7 significant sets of job responsibilities; 12 offices; 18 managers [four multiple times]; April 2001

5 McVey Hall Data Center, 2001: 20 staff; computing facility with about 60 servers; mainframe; multiple tape robots; research computer; printing; bubble sheet scanning; building

6 McVey Hall Data Center: McVey Hall was built in 1928. The Computing Center began development in the basement in the late 1950s. Five expansions followed, the last in 1987. 7,400 sq ft of raised floor space.

7 McVey Hall

8 McVey Hall Data Center: McVey Hall was built in 1928. The Computing Center began development in the basement in the late 1950s. Five expansions followed, the last in 1987. 7,400 sq ft of raised floor space.

9 McVey Hall Data Center

10 Challenges: Computer room vs. data center; Increased hardware count [100 to over 1,000]; Decommissioned mainframe [2006]; Research computing power requirements [195 kVA]; Power [725 kVA]; Air conditioning [155 tons]; Space; Virtualization [700 VMs]; Cloud; Uptime expectations
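
The power and cooling figures on this slide use different units (kVA for power, tons for air conditioning). A minimal sketch for putting them on a common kW footing, assuming the standard definition of one ton of refrigeration (12,000 BTU/h, about 3.517 kW) and an illustrative power factor of 0.9 that is not from the slide. Since nearly all electrical power drawn by IT equipment ends up as heat, kW of load and kW of heat removal are the natural quantities to compare.

```python
# 1 ton of refrigeration = 12,000 BTU/h, which is about 3.517 kW of heat removal.
KW_PER_TON = 3.51685


def tons_to_kw(tons: float) -> float:
    """Convert cooling capacity in tons of refrigeration to kW."""
    return tons * KW_PER_TON


def kva_to_kw(kva: float, power_factor: float) -> float:
    """Convert apparent power (kVA) to real power (kW) at a given power factor."""
    return kva * power_factor


cooling_kw = tons_to_kw(155)      # slide: air conditioning [155 tons]
power_kw = kva_to_kw(725, 0.9)    # slide: power [725 kVA]; the 0.9 power factor is an assumption
print(f"cooling ~{cooling_kw:.1f} kW, power ~{power_kw:.1f} kW")
```

Changing the assumed power factor changes the kW figure directly, so the comparison is only as good as that assumption.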

12 Five 9s…really? Downtime per year at each availability level:
90%: 36.5 days
99%: 3.65 days
99.9%: 8 hours, 46 minutes
99.99%: 52.6 minutes
99.999%: 5 minutes, 15 seconds
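
Each downtime figure is the unavailable fraction (1 minus the availability) multiplied by the length of a year. A minimal sketch of that arithmetic, assuming a 365-day year; the function names are illustrative only. At 99.999% it works out to about 5 minutes 15 seconds per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def annual_downtime_minutes(availability_pct: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def describe(minutes: float) -> str:
    """Render a duration in minutes as days, hours, minutes, and whole seconds."""
    total_seconds = round(minutes * 60)
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    mins, secs = divmod(rem, 60)
    return f"{days}d {hours}h {mins}m {secs}s"


for pct in (90, 99, 99.9, 99.99, 99.999):
    print(f"{pct}%: {describe(annual_downtime_minutes(pct))}")
```

The loop prints 36d 12h for 90%, 3d 15h 36m for 99%, 8h 45m 36s for 99.9%, 52m 34s for 99.99%, and 5m 15s for 99.999%.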

13 Power: UPS C; UPS battery relocation; Talk-a-phone; Dual feed outage

15 UPS battery relocation: UPS A utility bypass breaker closed; UPS A "fails"; UPS C loses power; MDP2 breaker opens; 60% of the building load dropped; power restored after 8 minutes. The failure was caused by the A and C phases on UPS A being reversed. The original UPS installation was verified…sort of.
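
Swapping any two phases, such as A and C, reverses the phase rotation from ABC to ACB, which is the kind of wiring error a rotation check at installation is meant to catch. A minimal sketch of such a check, assuming voltage phase angles in degrees are available from a meter; the function name and the 10-degree tolerance are hypothetical, and this is not how the UPS installation described here was verified.

```python
def phase_sequence(angle_a: float, angle_b: float, angle_c: float, tol: float = 10.0) -> str:
    """Classify phase rotation from measured voltage phase angles (degrees).

    In ABC (positive) rotation, B lags A by 120 degrees and C lags A by 240 degrees.
    Swapping any two conductors yields ACB (negative) rotation.
    """
    def near(value: float, target: float) -> bool:
        # Smallest signed angular difference, wrapped into [-180, 180).
        diff = (value - target + 180.0) % 360.0 - 180.0
        return abs(diff) <= tol

    b_rel = (angle_b - angle_a) % 360.0
    c_rel = (angle_c - angle_a) % 360.0
    if near(b_rel, 240.0) and near(c_rel, 120.0):
        return "ABC"
    if near(b_rel, 120.0) and near(c_rel, 240.0):
        return "ACB"
    return "unknown"


print(phase_sequence(0, -120, 120))   # correctly wired feed
print(phase_sequence(120, -120, 0))   # same feed with the A and C conductors swapped
```

The first call reports ABC and the second ACB; a mismatch between a source and the equipment it feeds is the red flag.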

16 Power: UPS C; UPS battery relocation; Talk-a-phone; Dual feed outage

18 Power: Dual feed, main-tie-main; UPS C; UPS battery relocation; Talk-a-phone; Dual feed outage
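
A main-tie-main arrangement protects against losing one utility feed, but not both, which is what happened to McVey Hall in April 2013. A simplified, hypothetical model of generic main-tie-main transfer logic (not the actual McVey Hall switchgear or its controls) makes that limit concrete; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class BreakerState:
    main1_closed: bool
    main2_closed: bool
    tie_closed: bool


def transfer(feed1_live: bool, feed2_live: bool) -> BreakerState:
    """Hypothetical, simplified main-tie-main transfer logic."""
    if feed1_live and feed2_live:
        # Normal operation: each main feeds its own bus; the tie stays open
        # so the two utility sources are never paralleled.
        return BreakerState(main1_closed=True, main2_closed=True, tie_closed=False)
    if feed1_live:
        # Feed 2 lost: open main 2, then close the tie so feed 1 carries both buses.
        return BreakerState(main1_closed=True, main2_closed=False, tie_closed=True)
    if feed2_live:
        # Feed 1 lost: mirror image of the case above.
        return BreakerState(main1_closed=False, main2_closed=True, tie_closed=True)
    # Both feeds lost: there is no source to transfer to, so everything opens;
    # any load still running depends on stored energy such as UPS batteries.
    return BreakerState(main1_closed=False, main2_closed=False, tie_closed=False)


def buses_energized(state: BreakerState, feed1_live: bool, feed2_live: bool) -> tuple[bool, bool]:
    """Return (bus 1 energized, bus 2 energized) for a breaker state and feed status."""
    src1 = state.main1_closed and feed1_live
    src2 = state.main2_closed and feed2_live
    bus1 = src1 or (state.tie_closed and src2)
    bus2 = src2 or (state.tie_closed and src1)
    return bus1, bus2


for f1, f2 in [(True, True), (True, False), (False, True), (False, False)]:
    state = transfer(f1, f2)
    print(f1, f2, buses_energized(state, f1, f2))
```

Every single-feed failure leaves both buses energized; only the dual-feed case leaves both dark.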

20 April 2013, just another Tuesday:
8:33 a.m. McVey Hall loses power from both feeds
8:45 a.m. Called UPS CE; called UPS vendor for support
8:51 a.m. Power restored to the building
8:57 a.m. Called UPS CE for reset procedure
9:09 a.m. First UPS reset
9:14 a.m. Second UPS reset
9:20 a.m. All PDU breakers reset
10:08 a.m. Requested estimate to replace batteries
10:30 a.m. Different UPS CE returns call
11:37 a.m. Received battery quote for next-day install
11:45 a.m. CE arrives onsite [3-hour response]
12:45 p.m. Essential functions restored
4 hours 15 minutes
4:15 p.m. = newspaper headlines

21 Planned Data Center Outage: UPS C EOL notification, extension expiring; Reduced need for power; UPS C 25% rack penetration; Additional circuit breaker installation; Required maintenance on rack-out breakers

22 UPS C Decommissioning TCO, 2013-2016
Projected UPS C costs:
2013-14 maintenance: $7,058
2014-15 maintenance: $7,058
2015-16 maintenance: $7,058
2013 batteries: $27,000
2015 capacitor replacement: $12,400
Estimated total cost: $60,574
Cost of decommissioning UPS C:
PPD electricians: $7,470
PPD materials: $3,391
Total cost: $10,861
7-month 2013 maintenance cost for UPS C: $4,118
Net savings: $45,595
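
The totals on this slide can be reproduced from the line items. A minimal sketch using only the slide's figures; note that reaching the $45,595 net savings requires subtracting both the decommissioning cost and the 7-month 2013 maintenance cost from the projected total, which is an inference from the arithmetic rather than something the slide states.

```python
# Figures copied from the UPS C decommissioning slide (US dollars).
projected_costs = {
    "2013-14 maintenance": 7_058,
    "2014-15 maintenance": 7_058,
    "2015-16 maintenance": 7_058,
    "2013 batteries": 27_000,
    "2015 capacitor replacement": 12_400,
}
decommissioning_costs = {
    "PPD electricians": 7_470,
    "PPD materials": 3_391,
}
maintenance_2013_seven_months = 4_118

projected_total = sum(projected_costs.values())
decommissioning_total = sum(decommissioning_costs.values())
# Inferred: the net figure subtracts decommissioning and the 7 months of 2013 maintenance.
net_savings = projected_total - decommissioning_total - maintenance_2013_seven_months

print(projected_total, decommissioning_total, net_savings)  # 60574 10861 45595
```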

24 2012 NFPA 70E

25 Arc Flash: Arc flash is the light and heat produced by an electric arc supplied with enough electrical energy to cause substantial damage, harm, fire, or injury.

26 December 28

27 Planned Data Center Outage:
August 2012: Began discussing UPS C
Spring 2013: Began strategic planning
July 2013: Requested estimates
September 2013: Plan is blessed
October 2013: Plan is unveiled
November 2013: Network is not redundant
December 28, 2013: Plan is executed

28 Stats page:
Staff directly involved: over 200
Number of VMs shut down: over 700
Initial shutdown of systems: 7 p.m. December 27
Planned outage start: 11 a.m. December 28
Actual outage start: 11:32 a.m. [+32]
12:02 p.m.: Only fallout; no power to Infoblox
12:37 p.m.: PPD electricians complete switchgear maintenance
3:25 p.m.: PPD electricians complete the UPS A "backfeed"
5:37 p.m.: All power restored [-1:23]
Estimated outage completion: 7 p.m.
10:00 p.m.: Substantial return to operation
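
The bracketed offsets on this slide are simple clock arithmetic: the actual start was 32 minutes after the planned 11 a.m., and power came back 1 hour 23 minutes before the estimated 7 p.m. completion. A minimal sketch of that arithmetic, assuming both times in each pair fall on the same day; the helper names are illustrative.

```python
from datetime import datetime

TIME_FORMAT = "%I:%M %p"  # e.g. "11:32 AM"; both times must be on the same day


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as same-day clock times."""
    t0 = datetime.strptime(start, TIME_FORMAT)
    t1 = datetime.strptime(end, TIME_FORMAT)
    return int((t1 - t0).total_seconds() // 60)


def as_h_mm(minutes: int) -> str:
    """Format a minute count as h:mm, the style used on the slide."""
    hours, mins = divmod(minutes, 60)
    return f"{hours}:{mins:02d}"


late_start = minutes_between("11:00 AM", "11:32 AM")    # planned vs. actual start
early_finish = minutes_between("5:37 PM", "7:00 PM")    # power restored vs. estimated completion
power_off = minutes_between("11:32 AM", "5:37 PM")      # actual start to all power restored

print(late_start, as_h_mm(early_finish), as_h_mm(power_off))  # 32 1:23 6:05
```

The same helper gives 4:44 for 4:01 p.m. to 8:45 p.m., the interval from the first system shutdown to major systems restored in the cooling incident.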

29 Air Conditioning: Chiller; High-density in-row cooling + 3 CRAHs; Power sensitivity; Service clearance; Thermal expansion valve failure; Storage; Fuse lugs; System thresholds; Relationships

30 Chiller Maintenance

31 Chiller Maintenance

32 Air Conditioning: Chiller; High-density in-row cooling + 3 CRAHs; Power sensitivity; Service clearance; Thermal expansion valve failure; Storage; Fuse lugs; System thresholds; Relationships

33 Fire/Power/AC/Storage/VMs:
2:03 p.m. Two CRACs lose power
2:35 p.m. Temperatures increased; units reported down
3:40 p.m. First warning from storage
4:01 p.m. First system shuts down: SQL, VMfarm, Blackboard, Drupal, Account Manager, SharePoint, myUK portal
4:04 p.m. AC technician arrives
5:23 p.m. Electrician arrives
5:51 p.m. Temporary fix in place
6:14 p.m. Cool enough to restart storage systems
8:45 p.m. Major systems restored [+4:44]

35 Fire Outages: 1986 cooling fan; Halon; 2001 Administration Building; Popcorn; CRAC lug; Window AC unit fan motor; PC

37 Water: Roof drain; Water fountain; Transformer vault; Drainage; Sump pump; Steam leak

38 Takeaways: 1. Know your environment, 2. Establish relationships, 3. Plan, 4. Communicate

39 Questions???

40 Thank you. Butch Adkins, Infrastructure & Operations, University of Kentucky, butch@uky.edu

